Job Title: Data Engineer
Duration: 10 months
Location: Plano, TX (Hybrid)
Interview: Coding exercise at the time of interview
Overview: We are seeking a skilled professional with over 5 years of experience in AWS, Python, Scala, Spark, and SQL. The ideal candidate will have a strong understanding of data processing, API development, and data pipeline construction.
Key Responsibilities: - Build and maintain APIs to support enterprise-wide data needs.
- Analyze large volumes of data and write efficient SQL queries.
- Develop and manage billing data pipelines.
- Design and implement ETL pipelines for processing large-scale data.
- Collaborate with the compliance team as part of the centralized data management team.
- Ensure compliance with regulatory and ethical standards through effective data management.
- Handle high-volume data processing and optimize performance.
- Work closely with stakeholders to gather requirements and provide data-driven solutions.
- Experience working with financial institutions is preferred.
About the Team: - We support all compliance-related activities by serving as a centralized data management team. Our primary responsibility is to manage customer data, ensuring accuracy and security.
- This project is an ongoing initiative with no immediate end, involving multiple processes tailored to various teams. Our current focus is on expanding our capabilities by developing new data products to support additional teams and use cases.
- We are expanding rapidly, which allows us to grow the team and broaden our scope across the product.
- Currently, our focus is primarily on customer data, but we are transitioning to include customer transactions.
- This expansion is expected to evolve into a multi-year project, although commitments beyond the current year are uncertain due to external factors.
Data Volume and Infrastructure: - We manage approximately 40 million records daily, with potential increases beyond this figure.
- Our infrastructure includes 16 to 20 data pipelines, ingesting customer data from various sources.
- These pipelines process data volumes ranging from 25 million to 150 million records, with delta processing to optimize storage and performance.
- On average, we handle 60 to 70 million records within our data pipelines.
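The delta processing mentioned above can be illustrated with a minimal Python sketch: rather than reprocessing a full snapshot on every load, only records that are new or changed since the previous load are passed downstream. The record shape, field names, and `compute_delta` helper here are hypothetical, for illustration only; in practice this would typically be done in Spark against keyed datasets.

```python
# Minimal sketch of delta (incremental) processing.
# Record shape and field names are hypothetical examples.

def compute_delta(previous, current, key="id"):
    """Return records in `current` that are new or changed versus `previous`."""
    prev_by_key = {rec[key]: rec for rec in previous}
    delta = []
    for rec in current:
        old = prev_by_key.get(rec[key])
        if old is None or old != rec:  # new record, or any field changed
            delta.append(rec)
    return delta

yesterday = [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}]
today = [{"id": 1, "name": "A"}, {"id": 2, "name": "B2"}, {"id": 3, "name": "C"}]

# Only id 2 (changed) and id 3 (new) need to be written downstream.
print(compute_delta(yesterday, today))
```

Processing only the changed subset is what keeps storage and compute manageable when daily volumes run into the tens of millions of records.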
Skills: - AWS: 20%
- Python: 20%
- Scala: 20%
- Spark: 20%
- SQL: 20%
For immediate consideration please click APPLY.