PySpark, AWS
Core responsibilities
- Data pipeline development: Design, develop, and maintain high-performance data pipelines using PySpark.
- Performance optimization: Optimize and tune existing data processing workflows for better performance and efficiency.
- Data transformation: Implement complex data transformations and integrations, such as reading from external sources, merging data, and loading into target destinations (see the sketch after this list).
- Troubleshooting: Monitor and troubleshoot performance issues, errors, and other problems in data processing systems.
- Collaboration: Work with cross-functional teams such as data scientists, data engineers, and business analysts to understand requirements and deliver solutions.
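The transformation work described above typically follows a read-merge-load pattern. The following is a minimal PySpark sketch of that pattern, not a prescribed implementation; the S3 paths, column names, and table layout are hypothetical placeholders, not details from this posting.

```python
from pyspark.sql import SparkSession, functions as F

# Hypothetical example of the read-merge-load pattern described above.
spark = SparkSession.builder.appName("orders_pipeline").getOrCreate()

# Read from external sources (assumed S3 locations).
orders = spark.read.parquet("s3://example-bucket/raw/orders/")
customers = spark.read.parquet("s3://example-bucket/raw/customers/")

# Merge: enrich orders with customer attributes, then aggregate.
enriched = (
    orders.join(customers, on="customer_id", how="left")
          .withColumn("order_date", F.to_date("order_ts"))
)
daily_revenue = (
    enriched.groupBy("order_date", "customer_region")
            .agg(F.sum("amount").alias("total_revenue"))
)

# Load into a target destination (here, partitioned Parquet on S3).
(daily_revenue.write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("s3://example-bucket/curated/daily_revenue/"))
```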
Required skills and qualifications
- Technical skills: Strong proficiency in Python and Apache Spark is essential, along with experience in distributed computing concepts.
- Big data ecosystem: Experience with big data technologies such as Hadoop, Hive, and data storage solutions (e.g., HDFS, AWS S3) is often required.
- SQL: Proficiency in SQL for querying and data modeling is a must.
- Cloud platforms: Familiarity with cloud environments such as AWS, Google Cloud, or Azure is a significant advantage.
- Development tools: Experience with version control (Git) and CI/CD tools like Jenkins is often expected.
- Other skills: Knowledge of Linux, shell scripting, and agile methodologies is beneficial.

Diverse Lynx LLC is an Equal Employment Opportunity employer. All qualified applicants will receive due consideration for employment without any discrimination. All applicants will be evaluated solely on the basis of their ability, competence, and proven capability to perform the functions outlined in the corresponding role. We promote and support a diverse workforce across all levels in the company.