Responsibilities:
- Hands-on building of ETL pipelines using our internal framework written in Java and Python (a minimal sketch of this kind of pipeline appears after this list)
- Hands-on development of real-time REST APIs or other solutions for streaming data from Graph
- Modify existing application code or interfaces, or build new application components from detailed requirements.
- Analysis of requirements, support of the design, development of the code, testing, debugging, deployment, and maintenance of those programs and interfaces.
- Documentation of the work is essential
- Participation in most aspects of programming and application development, including file design, update, storage, and retrieval
- Enhance processes to resolve operational problems and add new functions, taking into consideration schedule, resource constraints, process complexity, dependencies, assumptions, and application structure
- Ability to maintain the developed solution on an ongoing basis is essential
- Ability to follow the existing development methodology and coding standards, and ensure compliance with internal and external regulatory requirements
- Develop and implement databases, data collection systems, data analytics, and other strategies that optimize statistical efficiency and quality
- Acquire data from primary or secondary data sources and maintain databases / data systems
- Work with management to prioritize business and information needs
- Locate and define new process improvement opportunities
- Document design and data flow for existing and new applications being built.
- Coordinate with multiple teams (QA, Operations, and other development teams) within the organization.
- Testing methods, including unit and integration testing (PyTest, PyUnit)
- Ability to integrate with large teams, demonstrating strong verbal and written communication skills
- Utilization of software configuration management tools
- Code deployment and code versioning tools
- Excellent communication skills
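As a rough illustration of the ETL responsibilities above, the following is a minimal, self-contained Python sketch of an extract-transform-load step. It is not the internal framework mentioned in this posting; the CSV source, the sqlite3 target, and all function and table names are hypothetical, chosen only to keep the example runnable.

    # Minimal ETL sketch (illustrative only; not the internal framework referenced above).
    # It extracts rows from an in-memory CSV source, filters and casts fields, and loads
    # the result into a SQLite table. Every name here is hypothetical.
    import csv
    import io
    import sqlite3

    RAW_CSV = "order_id,amount,currency\n1,19.99,USD\n2,5.00,EUR\n3,42.50,USD\n"

    def extract(source: str) -> list[dict]:
        """Read raw CSV text into a list of row dictionaries."""
        return list(csv.DictReader(io.StringIO(source)))

    def transform(rows: list[dict]) -> list[tuple]:
        """Keep USD orders only and cast the amount to a float."""
        return [(int(r["order_id"]), float(r["amount"]))
                for r in rows if r["currency"] == "USD"]

    def load(rows: list[tuple], conn: sqlite3.Connection) -> None:
        """Write the transformed rows into the target table."""
        conn.execute("CREATE TABLE IF NOT EXISTS usd_orders (order_id INTEGER, amount REAL)")
        conn.executemany("INSERT INTO usd_orders VALUES (?, ?)", rows)
        conn.commit()

    if __name__ == "__main__":
        conn = sqlite3.connect(":memory:")
        load(transform(extract(RAW_CSV)), conn)
        print(conn.execute("SELECT COUNT(*), SUM(amount) FROM usd_orders").fetchone())

In the actual role the extract step would presumably read from Graph or another upstream source, and the load step would target a warehouse such as Redshift or Databricks rather than SQLite, but the extract / transform / load shape stays the same.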
Qualifications:
- Bachelor's degree, preferably with a Computer Science background.
- At least 5 years of experience implementing complex ETL pipelines, preferably with the Spark toolset.
- At least 5 years of experience with Python, particularly within the data space
- Technical expertise regarding data models, database design and development, data mining, and segmentation techniques
- Good experience writing complex SQL and ETL processes
- Excellent coding and design skills, particularly in either Scala or Python.
- Strong practical working experience with Unix scripting in at least one of Python, Perl, or Shell (either bash or zsh).
- Experience in AWS technologies such as EC2, Redshift, CloudFormation, EMR, AWS S3, and AWS Analytics is required.
- Experience designing and implementing data pipelines in an on-prem / cloud environment is required.
- Experience building / implementing data pipelines using Databricks, on-prem, or a similar cloud database.
- Expert-level knowledge of using SQL to write complex, highly optimized queries across large volumes of data.
- Hands-on object-oriented programming experience using Python is required.
- Professional work experience with Spark, including building real-time data streams.
- Knowledge or experience in architectural best practices in building data lakes
- Develop and work with APIs
- Develop and maintain scalable data pipelines and build out new API integrations to support continuing increases in data volume and complexity.
- Collaborate with analytics and business teams to improve data models that feed business intelligence tools, increase data accessibility, and foster data-driven decision making across the organization.
- Implement processes and systems to monitor data quality, ensure production data accuracy, and provide access for key stakeholders and business processes.
- Write unit / integration tests, contribute to the engineering wiki, and write documentation.
- Perform the data analysis required to troubleshoot data-related issues and assist in their resolution.
- Experience developing data integrations and data quality frameworks based on established requirements.
- Experience with CI / CD processes and tools (e.g., Concourse, Jenkins).
- Experience with test-driven development: writing unit tests and measuring test coverage using the PyTest, PyUnit, and pytest-cov libraries (a brief sketch follows this list).
- Experience working in an Agile team environment.
- Good understanding and usage of algorithms and data structures
- Good experience building reusable frameworks.
- AWS certification is preferable: AWS Developer / Architect / DevOps / Big Data
- Excellent communication skills, both verbal and written
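To give a rough sense of the test-driven development qualification above, here is a minimal pytest-style sketch. The transformation helper being tested (dedupe_keep_latest) and its expected behaviour are assumptions invented for this example; coverage would be collected separately with pytest-cov (for instance, pytest --cov).

    # Illustrative pytest sketch for a hypothetical transformation helper.
    # Run with: pytest test_dedupe.py  (add --cov with pytest-cov installed for coverage)
    import pytest

    def dedupe_keep_latest(records: list[dict]) -> list[dict]:
        """Hypothetical helper: keep only the highest-versioned record per id."""
        latest: dict[int, dict] = {}
        for rec in records:
            version = rec["version"]  # raises KeyError for malformed records
            current = latest.get(rec["id"])
            if current is None or version > current["version"]:
                latest[rec["id"]] = rec
        return list(latest.values())

    def test_keeps_highest_version_per_id():
        rows = [
            {"id": 1, "version": 1, "value": "old"},
            {"id": 1, "version": 2, "value": "new"},
            {"id": 2, "version": 1, "value": "only"},
        ]
        result = {r["id"]: r["value"] for r in dedupe_keep_latest(rows)}
        assert result == {1: "new", 2: "only"}

    def test_empty_input_returns_empty_list():
        assert dedupe_keep_latest([]) == []

    def test_rejects_record_without_version():
        with pytest.raises(KeyError):
            dedupe_keep_latest([{"id": 1}])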