Responsibilities:
• Hands-on building of ETL pipelines using our internal framework written in Java and
Python
• Hands-on design and delivery of real-time REST APIs or other solutions for streaming data from Graph
• Modify existing application code or interfaces or build new application components
from detailed requirements.
• Analysis of requirements; support of the design; and development, testing, debugging, deployment, and maintenance of those programs and interfaces. Documentation of the work is essential.
• Participation in most aspects of programming and application development,
including file design, update, storage, and retrieval
• Enhance processes to resolve operational problems and add new functions, taking into consideration schedule, resource constraints, process complexity, dependencies, assumptions, and application structure
• Ability to maintain the developed solution on an ongoing basis is essential
• Ability to follow the existing development methodology and coding standards, and
ensure compliance with the internal and external regulatory requirements
• Develop and implement databases, data collection systems, data analytics and other
strategies that optimize statistical efficiency and quality
• Acquire data from primary or secondary data sources and maintain databases/data
systems
• Work with management to prioritize business and information needs
• Locate and define new process improvement opportunities
• Document design and data flow for existing and new applications being built.
• Coordinate with multiple teams (QA, Operations, and other development teams) within the organization
• Testing methods, including unit and integration testing with PyTest and PyUnit (a minimal sketch follows this list)
• Ability to integrate with large teams, demonstrating strong verbal and written
communication skills
• Utilization of software configuration management, code deployment, and code versioning tools
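As an illustration of the unit and integration testing named in the list above, a minimal PyTest sketch follows. The transform_record function and its behavior are hypothetical, invented for this example, and are not part of the internal framework:

    # test_transform.py -- minimal PyTest sketch; transform_record is hypothetical
    import pytest

    def transform_record(record: dict) -> dict:
        # Hypothetical ETL step: lowercase keys and strip whitespace from strings
        return {k.lower(): v.strip() if isinstance(v, str) else v
                for k, v in record.items()}

    def test_transform_record_normalizes_keys_and_values():
        assert transform_record({"Name": " Ada "}) == {"name": "Ada"}

    def test_transform_record_rejects_non_dict_input():
        # A list has no .items(), so the transform should fail loudly
        with pytest.raises(AttributeError):
            transform_record(["not", "a", "dict"])

Coverage for tests like these can be measured with the pytest-cov plugin, e.g. by running pytest --cov=your_package (your_package is a placeholder).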
Qualifications:
• Bachelor's degree, preferably with a Computer Science background.
• 5+ years of experience implementing complex ETL pipelines, preferably with the Spark toolset.
• 5+ years of experience with Python, particularly within the data space
• Technical expertise in data models, database design and development, and data mining and segmentation techniques
• Good experience writing complex SQL and ETL processes
• Excellent coding and design skills, particularly in either Scala or Python.
• Strong practical working experience with Unix scripting in at least one of Python, Perl, or shell (bash or zsh).
• Experience with AWS technologies such as EC2, Redshift, CloudFormation, EMR, S3, and AWS Analytics is required.
• Experience designing and implementing data pipelines in an on-prem/cloud environment is required.
• Experience building/implementing data pipelines using Databricks, on-prem platforms, or a similar cloud database.
• Expert-level knowledge of using SQL to write complex, highly optimized queries across large volumes of data (see the SQL sketch after this list).
• Hands-on object-oriented programming experience using Python is required.
• Professional work experience building real-time data streams using Spark (a minimal streaming sketch follows this list).
• Knowledge or experience in architectural best practices in building data lakes
• Develop and work with APIs
• Develop and maintain scalable data pipelines and build out new API integrations to
support continuing increases in data volume and complexity.
• Collaborate with analytics and business teams to improve data models that feed
business intelligence tools, increase data accessibility, and foster data-driven decision
making across the organization.
• Implement processes and systems to monitor data quality, ensure production data accuracy, and ensure access for key stakeholders and business processes.
• Write unit/integration tests, contribute to the engineering wiki, and document work.
• Perform data analysis required to troubleshoot data-related issues and assist in their resolution.
• Experience developing data integrations and data quality frameworks based on established requirements.
• Experience with CI/CD processes and tools (e.g., Concourse, Jenkins).
• Experience with test-driven development: writing unit tests and measuring test coverage using the PyTest, PyUnit, and pytest-cov libraries.
• Experience working in an Agile environment.
• Good understanding and usage of algorithms and data structures
• Good experience building reusable frameworks.
• AWS certification is preferred: AWS Developer, Architect, DevOps, or Big Data
• Excellent communication skills, both verbal and written
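As a minimal sketch of the real-time Spark streaming work described in the list above, assuming Spark Structured Streaming with a Kafka source (the broker address and the events topic are hypothetical placeholders):

    # Minimal PySpark Structured Streaming sketch; broker and topic are hypothetical
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, window

    spark = SparkSession.builder.appName("streaming-sketch").getOrCreate()

    # Read a stream of events from Kafka
    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "localhost:9092")
              .option("subscribe", "events")
              .load())

    # Count events per one-minute window and print running totals to the console
    counts = events.groupBy(window(col("timestamp"), "1 minute")).count()

    query = (counts.writeStream
             .outputMode("complete")
             .format("console")
             .start())
    query.awaitTermination()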
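For the expert-level SQL noted above, one common pattern for optimized queries over large volumes is window-function deduplication; the events table and its columns here are hypothetical:

    # Hypothetical example: keep only the latest row per id via a window function
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("sql-sketch").getOrCreate()

    latest = spark.sql("""
        SELECT id, payload, updated_at
        FROM (
            SELECT id, payload, updated_at,
                   ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC) AS rn
            FROM events  -- hypothetical table
        ) t
        WHERE t.rn = 1
    """)
    latest.show()

Ranking once with ROW_NUMBER and filtering typically beats a correlated subquery or self-join on large data, since it requires only a single shuffle by id.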
Python Developer • Philadelphia, PA, United States