Fractal is looking for a proactive and driven AWS Lead Data Architect / Engineer to join our cloud and data tech team. In this role, you will design the system architecture and solutions, ensure the platform is both scalable and performant, and build automated data pipelines.
Responsibilities:
- Design & Architecture of Scalable Data Platforms
  - Design, develop, and maintain large-scale data processing architectures on the Databricks Lakehouse Platform to support business needs.
  - Architect multi-layer data models with Bronze (raw), Silver (cleansed), and Gold (curated) layers for various domains (e.g., Retail Execution, Digital Commerce, Logistics, Category Management); a minimal illustrative sketch follows this list.
  - Leverage Delta Lake, Unity Catalog, and advanced Databricks features for governed data sharing, versioning, and reproducibility.
- Client & Business Stakeholder Engagement
  - Partner with business stakeholders to translate functional requirements into scalable technical solutions.
  - Conduct architecture workshops and solutioning sessions with enterprise IT and business teams to define data-driven use cases.
- Data Pipeline Development & Collaboration
  - Collaborate with data engineers and data scientists to develop end-to-end pipelines using Python, PySpark, and SQL.
  - Enable data ingestion from diverse sources such as ERP (SAP), POS data, syndicated data, CRM, e-commerce platforms, and third-party datasets.
- Performance, Scalability, and Reliability
  - Optimize Spark jobs for performance, cost efficiency, and scalability through appropriate cluster sizing, caching, and query optimization techniques.
  - Implement monitoring and alerting using Databricks observability features, Ganglia, and cloud-native tools.
- Security, Compliance & Governance
  - Design secure architectures using Unity Catalog, role-based access control (RBAC), encryption, token-based access, and data lineage tools to meet compliance policies.
  - Establish data governance practices including Data Fitness Index, quality scores, SLA monitoring, and metadata cataloging.
- Adoption of AI Copilots & Agentic Development
  - Utilize GitHub Copilot, Databricks Assistant, and other AI code agents for:
    - Writing PySpark, SQL, and Python code snippets for data engineering and ML tasks.
    - Generating documentation and test cases to accelerate pipeline development.
    - Interactive debugging and iterative code optimization within notebooks.
  - Advocate for agentic AI workflows that use specialized agents for:
    - Data profiling and schema inference.
    - Automated testing and validation.
- Innovation and Continuous Learning
  - Stay abreast of emerging trends in Lakehouse architectures, Generative AI, and cloud-native tooling.
  - Evaluate and pilot new features from Databricks releases and partner integrations to improve the modern data stack.
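
To make the medallion-layer responsibility above concrete, here is a minimal sketch of a Bronze → Silver → Gold flow using PySpark and Delta Lake on Databricks. Table names, paths, and column names are illustrative assumptions only, not part of any actual Fractal or client codebase.

```python
# Minimal Bronze -> Silver -> Gold sketch (illustrative; table/column names are assumptions).
# Assumes a Databricks notebook where `spark` is already available.
from pyspark.sql import functions as F

# Bronze: land raw POS data as-is, preserving source fidelity.
bronze_df = (
    spark.read.format("json")
    .load("/mnt/raw/pos_transactions/")  # hypothetical landing path
)
bronze_df.write.format("delta").mode("append").saveAsTable("bronze.pos_transactions")

# Silver: cleanse and conform (deduplicate, fix types, apply basic quality filters).
silver_df = (
    spark.table("bronze.pos_transactions")
    .dropDuplicates(["transaction_id"])
    .withColumn("transaction_ts", F.to_timestamp("transaction_ts"))
    .filter(F.col("amount") > 0)
)
silver_df.write.format("delta").mode("overwrite").saveAsTable("silver.pos_transactions")

# Gold: curated, business-level aggregate for a Retail Execution style use case.
gold_df = (
    spark.table("silver.pos_transactions")
    .groupBy("store_id", F.to_date("transaction_ts").alias("sales_date"))
    .agg(F.sum("amount").alias("daily_sales"))
)
gold_df.write.format("delta").mode("overwrite").saveAsTable("gold.daily_store_sales")
```

In practice the Gold layer would typically be registered under Unity Catalog and exposed to BI tools; the three-schema naming here is only one common convention.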
Requirements:
- Bachelor's or Master's degree in Computer Science, Information Technology, or a related field.
- 8-12 years of hands-on experience in data engineering, with at least 5 years on Python and Apache Spark.
- Expertise in building high-throughput, low-latency ETL/ELT pipelines on AWS/Azure/GCP using Python, PySpark, and SQL.
- Excellent hands-on experience with workload automation tools such as Airflow, Prefect, etc. (a minimal orchestration sketch follows this list).
- Familiarity with building dynamic ingestion frameworks from structured and unstructured data sources, including APIs, flat files, RDBMS, and cloud storage.
- Experience designing Lakehouse architectures with Bronze, Silver, and Gold layering.
- Strong understanding of data modelling concepts, star/snowflake schemas, dimensional modelling, and modern cloud-based data warehousing.
- Experience designing data marts on cloud data warehouses and integrating them with BI tools (Power BI, Tableau, etc.).
- Experience building CI/CD pipelines using tools such as AWS CodeCommit, Azure DevOps, and GitHub Actions.
- Knowledge of infrastructure-as-code (Terraform, ARM templates) for provisioning platform resources.
- In-depth experience with AWS cloud services such as Glue, S3, Redshift, etc.
- Strong understanding of data privacy, access controls, and governance best practices.
- Experience working with RBAC, tokenization, and data classification frameworks.
- Excellent communication skills for stakeholder interaction, solution presentations, and team coordination.
- Proven experience leading or mentoring global, cross-functional teams across multiple time zones and engagements.
- Ability to work independently in agile or hybrid delivery models while guiding junior engineers and ensuring solution quality.
- Must be able to work in the PST time zone.
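
As one possible illustration of the workload-automation requirement above, the sketch below shows a minimal Airflow DAG that schedules a daily PySpark ETL run. The DAG id, schedule, task name, and script path are assumptions for illustration only.

```python
# Minimal Airflow DAG sketch for a daily ETL run (illustrative; ids, paths, and
# schedules are assumptions, not an actual production configuration).
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "data-engineering",
    "retries": 2,
    "retry_delay": timedelta(minutes=10),
}

with DAG(
    dag_id="daily_pos_ingestion",        # hypothetical DAG name
    default_args=default_args,
    start_date=datetime(2024, 1, 1),
    schedule_interval="0 3 * * *",       # run daily at 03:00
    catchup=False,
) as dag:
    # Submit the PySpark job; in a Databricks-centric stack this plain
    # spark-submit call would typically be replaced by a Databricks operator
    # or a Databricks Workflows job.
    run_etl = BashOperator(
        task_id="run_pos_etl",
        bash_command="spark-submit /opt/jobs/pos_etl.py",  # hypothetical script path
    )
```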