Location : Jersey City, NJ / ONSITE / NO C2C / NO C2H / NO THIRD PARTIES / W2 ONLY
LOCALS TO NJ / NY ONLY - NO RELOCATION CANDIDATES
Skillset : Data Engineer
Must Haves : Python, PySpark, AWS ECS, Glue, Lambda, S3
Nice to Haves : Java, Spark, React.js
Interview Process : 2 rounds; the second round will be onsite.
You're ready to gain the skills and experience needed to grow within your role and advance your career, and we have the perfect software engineering opportunity for you.
As a Data Engineer III - Python / Spark / Data Lake at JPMorgan Chase within the Consumer and Community Bank, you will be a seasoned member of an agile team, tasked with designing and delivering reliable data collection, storage, access, and analytics solutions that are secure, stable, and scalable. Your responsibilities will include developing, testing, and maintaining essential data pipelines and architectures across diverse technical areas, supporting various business functions to achieve the firm's business objectives.
Job responsibilities :
Supports review of controls to ensure sufficient protection of enterprise data.
Advises on and makes custom configuration changes in one or two tools to generate a product at the business's or customer's request.
Updates logical or physical data models based on new use cases.
Frequently uses SQL and understands NoSQL databases and their niche in the marketplace.
Adds to team culture of diversity, opportunity, inclusion, and respect.
Develops enterprise data models.
Designs, develops, and maintains large-scale data processing pipelines and their infrastructure.
Leads code reviews and mentors teammates through the process.
Drives data quality and ensures data is accessible to analysts and data scientists.
Ensures compliance with data governance requirements and keeps data engineering practices aligned with business goals.
Required qualifications, capabilities, and skills
Formal training or certification on data engineering concepts and 2+ years applied experience
Experience across the data lifecycle, advanced experience with SQL (e.g., joins and aggregations), and working understanding of NoSQL databases
Experience with statistical data analysis and ability to determine appropriate tools and data patterns to perform analysis
Extensive experience with AWS and with the design, implementation, and maintenance of data pipelines using Python and PySpark.
Proficient in Python and PySpark, able to write and execute complex queries to perform data curation and build the single- and multi-dimensional views required by end users (see the PySpark sketch after this list).
Proven experience in performance tuning to ensure jobs run at optimal levels without performance bottlenecks.
Advanced proficiency in leveraging Gen AI models from Anthropic, OpenAI, or Google via their APIs / SDKs (see the sketch after this list).
Advanced proficiency in a cloud data lakehouse platform such as AWS data lake services, Databricks, or Hadoop; a relational data store such as Postgres, Oracle, or similar; and at least one NoSQL data store such as Cassandra, DynamoDB, MongoDB, or similar.
Advanced proficiency in a cloud data warehouse such as Snowflake or AWS Redshift.
Advanced proficiency in at least one scheduling / orchestration tool such as Airflow, AWS Step Functions, or similar (see the Airflow sketch after this list).
Proficiency in Unix scripting; data structures; data serialization formats such as JSON, Avro, or Protobuf; big-data storage formats such as Parquet or Iceberg; data processing methodologies such as batch, micro-batching, or streaming; one or more data modelling techniques such as Dimensional, Data Vault, Kimball, or Inmon; Agile methodology; TDD or BDD; and CI / CD tools.
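To illustrate the curation work described above, here is a minimal PySpark sketch. The bucket names, table columns, and S3 paths are hypothetical stand-ins, not details from this posting.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("curation-example").getOrCreate()

# Hypothetical inputs: raw account and transaction data landed in S3.
accounts = spark.read.parquet("s3://example-bucket/raw/accounts/")
transactions = spark.read.parquet("s3://example-bucket/raw/transactions/")

# Join and aggregate to build a curated, multi-dimensional view:
# total spend per account, region, and month.
monthly_spend = (
    transactions
    .join(accounts, on="account_id", how="inner")
    .withColumn("month", F.date_trunc("month", F.col("txn_date")))
    .groupBy("account_id", "region", "month")
    .agg(
        F.sum("amount").alias("total_spend"),
        F.count("*").alias("txn_count"),
    )
)

# Persist the curated view in a columnar format for downstream analysts.
monthly_spend.write.mode("overwrite").partitionBy("month").parquet(
    "s3://example-bucket/curated/monthly_spend/"
)
```

Partitioning the output by month is one common lever for the performance tuning mentioned above: downstream jobs can prune partitions instead of scanning the full dataset.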
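Likewise, a minimal sketch of the scheduling / orchestration requirement using Airflow. The DAG id, schedule, and task bodies are illustrative assumptions, and the `schedule` argument assumes a recent Airflow 2.x release.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    # Placeholder: pull raw data from a source system into S3.
    ...


def transform():
    # Placeholder: run the PySpark curation job (e.g., via spark-submit).
    ...


with DAG(
    dag_id="example_curation_pipeline",  # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)

    # Run transform only after extract succeeds.
    extract_task >> transform_task
```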
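For the Gen AI requirement, a minimal sketch using the Anthropic Python SDK; the model name and prompt are assumptions, and OpenAI and Google expose equivalent calls in their own SDKs.

```python
import anthropic

# Assumes ANTHROPIC_API_KEY is set in the environment.
client = anthropic.Anthropic()

# Hypothetical use: ask a model to draft documentation for a curated table.
message = client.messages.create(
    model="claude-sonnet-4-20250514",  # assumed model name; substitute a current one
    max_tokens=512,
    messages=[
        {
            "role": "user",
            "content": "Write a one-paragraph description of a table with "
                       "columns account_id, region, month, total_spend, txn_count.",
        }
    ],
)

print(message.content[0].text)
```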
Preferred qualifications, capabilities, and skills
Knowledge of data governance and security best practices.
Experience in carrying out data analysis to support business insights.
Strong Python and Spark skills.