Position Description:
This role requires the candidate to be based in Cleveland, OH; Miamisburg, OH; Pittsburgh, PA; Dallas, TX; or Birmingham, AL.
We are seeking a highly skilled developer to join our Data Products Capabilities team. In this role, you will be the primary owner of back-end service development and integration, working closely with product managers, architects, and front-end teams. This is a hands-on position that demands strong technical depth, autonomy, and the ability to deliver scalable microservices and APIs that power seamless customer experiences.
Your future duties and responsibilities:
- Design and develop scalable ETL/ELT pipelines using PySpark and Python for batch and real-time processing.
- Build and optimize Spark Streaming applications for real-time ingestion, transformation, and event-driven processing using Kafka or other messaging systems (see the illustrative sketch after this list).
- Develop distributed data-processing workflows on Apache Spark, ensuring efficient computation and fault tolerance.
- Work extensively with SQL for data transformation, aggregation, and performance-tuned querying across large datasets.
- Integrate pipelines with Hadoop ecosystem components (HDFS, Hive, YARN) and modern data platforms.
- Implement data-quality checks, validations, and reconciliation logic for both batch and streaming data.
- Tune Spark jobs using partitioning, caching, broadcast joins, and resource-optimization techniques.
- Build CI/CD workflows using Git, Jenkins, and Bitbucket for automated deployments and version control.
- Collaborate with cross-functional teams to troubleshoot, monitor, and improve data pipelines in production environments.
- Ensure compliance with data security, governance, and access control practices.
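For illustration, here is a minimal sketch of the kind of pipeline this role owns: a PySpark Structured Streaming job that ingests JSON events from Kafka, applies a basic data-quality filter, and enriches records via a broadcast join. The topic name, broker address, schema, and file paths are hypothetical, and Spark 3.x with the spark-sql-kafka connector on the classpath is assumed.

```python
# Illustrative sketch only: topic, broker, schema, and paths are hypothetical.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("events-ingest").getOrCreate()

# Schema for the incoming JSON events (hypothetical).
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("customer_id", StringType()),
    StructField("amount", DoubleType()),
])

# Small static reference table; broadcasting it avoids a shuffle in the join.
customers = spark.read.parquet("/data/ref/customers")  # hypothetical path

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
    .option("subscribe", "events")                     # hypothetical topic
    .load()
    # Kafka values arrive as bytes; cast to string and parse the JSON payload.
    .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
    # Basic data-quality gate: drop malformed or non-positive records.
    .filter(F.col("event_id").isNotNull() & (F.col("amount") > 0))
    # Stream-static broadcast join against the small dimension table.
    .join(F.broadcast(customers), "customer_id")
)

query = (
    events.writeStream
    .format("parquet")
    .option("path", "/data/out/events")                # hypothetical sink
    .option("checkpointLocation", "/data/chk/events")  # required for recovery
    .trigger(processingTime="1 minute")
    .start()
)
query.awaitTermination()
```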
Required qualifications to be successful in this role:
- 6-8 years of hands-on experience with Python and PySpark development.
- Strong expertise in Spark DataFrames, RDDs, Spark SQL, and distributed data processing.
- Practical experience building Spark Streaming or Structured Streaming applications.
- Solid understanding of ETL/ELT pipeline development using PySpark.
- Strong proficiency with SQL and query optimization.
- Experience with the Hadoop ecosystem (HDFS, Hive, YARN) or similar big-data platforms.
- Experience with containerization and orchestration (e.g., Docker, Kubernetes) is an advantage.
- Knowledge of CI/CD tools such as Git, Jenkins, and Bitbucket.
- Understanding of job monitoring, logging, and performance tuning for both batch and streaming workloads.

Other Information:
CGI is required by law in some jurisdictions to include a reasonable estimate of the compensation range for this role. The determination of this range includes, but is not limited to, factors such as skill set, level, experience, relevant training, and licensure and certifications. To support the ability to reward merit-based performance, CGI typically does not hire individuals at or near the top of the range for their role. Compensation decisions are dependent on the facts and circumstances of each case. A reasonable estimate of the current range for this role in the U.S. is $70,.00 - $,.00.
CGI's benefits are offered to eligible professionals on their first day of employment and include:
- Competitive compensation
- Comprehensive insurance options
- Matching contributions through the 401(k) plan and the share purchase plan
- Paid time off for vacation, holidays, and sick time
- Paid parental leave
- Learning opportunities and tuition assistance
- Wellness and well-being programs
Skills:
- Apache Kafka
- Database
- GraphQL
- Java
- Shell Script
- Spring Boot