Data Platform Operations
Own the day-to-day operational health of cloud-based data pipelines and platforms
Ensure high data availability, freshness, accuracy, and completeness
Lead operational support for batch and streaming data workloads
Data Reliability & Quality
Define and manage data SLAs, SLOs, and reliability metrics
Implement and maintain data quality checks, validations, and monitoring
Design processes for backfills, reprocessing, and failure recovery
Cloud & Infrastructure
Operate and optimize GCP-based data services, including BigQuery, Cloud Storage, Pub/Sub, and GKE
Partner with platform and SRE teams on scalability, performance, and cost optimization
Manage data infrastructure using Infrastructure as Code (Terraform)
Automation & Tooling
Build and maintain Python-based automation for data operations and monitoring
Improve reliability and repeatability through standardized tooling and workflows
Support and enhance data orchestration platforms (e.g., Airflow / Cloud Composer)
Incident Response & Operational Excellence
Lead response to data incidents, including triage, mitigation, and root cause analysis
Drive post-incident reviews and track corrective actions
Create and maintain runbooks, operational documentation, and playbooks
CI/CD & Governance
Implement CI/CD best practices for data pipelines
Promote testing, version control, and deployment standards across data workflows
Ensure data platforms align with security, governance, and access control requirements
Leadership & Collaboration
Act as a technical leader within the DataOps function
Partner closely with data engineering teams, SRE / platform engineering, and analytics and business stakeholders
Mentor engineers and help raise the operational maturity of the data organization
Qualifications
+ years experience in Data Engineering, DataOps, or Data Platform Operations
+ years experience operating cloud-based data platforms in production
Strong hands-on experience with Google Cloud Platform, including GKE, Compute Engine, Cloud Storage, and Pub/Sub (or equivalents), Cloud Monitoring and Logging, BigQuery, Dataflow, Datastream, IAM and networking, and Composer / Airflow
Kubernetes: deployment, scaling, reliability patterns
Observability: GCP Cloud Monitoring, Logging
Strong proficiency in Python for data pipelines, automation, and operational tooling
Experience with data orchestration frameworks (Airflow preferred)
Experience with Infrastructure as Code (Terraform)
Experience with Azure DevOps
Proven experience leading data incident response and operational improvements
Strong SQL skills for data analysis and troubleshooting