ONLY C2C
Maximum Pay Rate
USD 65.00
JOB#121927
Data Engineer
Vanguard Group
Worksite Address (Hybrid, 3 days onsite)
1001 Cedar Hollow Road
Malvern, Pennsylvania 19355
Job Description
Overview
We are seeking a highly experienced Data Engineer with at least 5 years of experience. This role is critical to meeting product rollout deadlines, as the team's work is a hard, direct dependency for other product feature rollouts. The ideal candidate will be a hands-on developer with deep expertise in the AWS data stack, focusing primarily on data engineering and pipeline development.
Key Responsibilities
Develop and Implement Data Pipelines: Design, build, and maintain robust data pipelines, primarily using AWS Glue and PySpark.
Data Sourcing and Transformation: Source data from various systems, including Redshift and Aurora, performing the necessary streaming transformations and heavy data cleaning.
Data Delivery: Push the resulting cleaned datasets into S3 buckets.
External Integration: Manage the secure transfer of resulting files via SFTP to an external third-party company's server, adhering to non-negotiable external integration deadlines.
Collaboration: Work closely with the team to consult on the best and most efficient solutions for achieving the required data outputs, given the constraints of the AWS Glue / PySpark environment.
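The responsibilities above describe a source → clean → deliver pipeline. A minimal plain-Python sketch of the cleaning step is shown below; in the actual role this logic would live in an AWS Glue job operating on PySpark DataFrames, and all field names here are hypothetical.

```python
# Sketch of the "heavy data cleaning" step in the source -> clean -> deliver
# flow. Plain Python stands in for PySpark here purely for illustration;
# field names (customer_id, email) are hypothetical.

def clean_records(records):
    """Drop rows missing a primary key and normalize string fields."""
    cleaned = []
    for row in records:
        if row.get("customer_id") is None:  # drop rows without a usable key
            continue
        cleaned.append({
            "customer_id": row["customer_id"],
            "email": (row.get("email") or "").strip().lower(),
        })
    return cleaned

raw = [
    {"customer_id": 1, "email": "  A@Example.COM "},
    {"customer_id": None, "email": "orphan@example.com"},  # dropped
    {"customer_id": 2, "email": None},
]
print(clean_records(raw))
# -> [{'customer_id': 1, 'email': 'a@example.com'}, {'customer_id': 2, 'email': ''}]
```

In a Glue job the same transformation would typically be expressed as DataFrame filters and column expressions before writing the cleaned dataset to S3.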
Required Qualifications and Skills
AWS Data Stack: Heavy expertise in the AWS ecosystem, specifically AWS Glue.
PySpark Expertise: Hands-on experience with PySpark on complex application implementations is required.
Database Knowledge: Heavy knowledge of both relational (e.g., Redshift, Aurora) and NoSQL databases, and how to leverage them within the AWS Glue / PySpark environment.
Experience Level: Looking for experienced engineers with at least 5 years of experience.
Data Engineering Fundamentals: Strong general knowledge of how to efficiently ingest, transform, and push out data.
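The external-integration step (delivering cleaned files via SFTP) can be sketched by driving the OpenSSH `sftp` client in batch mode; the host, user, and paths below are placeholders, and production jobs often use a library such as paramiko or AWS Transfer Family instead.

```python
# Sketch of pushing a cleaned file to a third party over SFTP using the
# OpenSSH client in batch mode. Host, user, and paths are hypothetical;
# key-based authentication is assumed.
import subprocess

def build_sftp_batch(local_path: str, remote_dir: str) -> str:
    """Build the batch-mode command script for a single file upload."""
    return f"cd {remote_dir}\nput {local_path}\nbye\n"

def upload(local_path: str, remote_dir: str, host: str, user: str) -> None:
    batch = build_sftp_batch(local_path, remote_dir)
    # `sftp -b -` reads the batch commands from stdin.
    subprocess.run(
        ["sftp", "-b", "-", f"{user}@{host}"],
        input=batch.encode(),
        check=True,
    )

# Building the batch script requires no network access:
print(build_sftp_batch("cleaned_output.csv", "/inbound"))
# -> cd /inbound
#    put cleaned_output.csv
#    bye
```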
Industry
Banking, Financial Services & Insurance
Estimated Start Date
2/3/2026
Estimated End Date
8/7/2026
Key Skills
Apache Hive, S3, Hadoop, Redshift, Spark, AWS, Apache Pig, NoSQL, Big Data, Data Warehouse, Kafka, Scala
Employment Type: Full Time
Experience: 5 years
Vacancy: 1
Data Engineer • Malvern, Pennsylvania, USA