Senior DevOps Engineer
Hybrid: 3 days/week onsite in Concord, CA
We are hiring two Senior DevOps Engineers to lead build, automation, and operations for modern cloud platforms with emphasis on AIOps and the model/ML development lifecycle. You will partner with Data/ML, Platform, and Security teams to deliver resilient and scalable systems that support AI-enabled applications and services.
Due to client requirements, applicants must be willing and able to work on a W2 basis. For our W2 consultants, we offer a great benefits package that includes medical, dental, and vision benefits, a 401(k) with company matching, and life insurance.
Rate: $70.00 to $75.00/hr on W2
Responsibilities:
- Design and operate CI/CD pipelines for microservices, data services, and ML workloads.
- Implement Infrastructure as Code for cloud environments across AWS, Azure, or GCP.
- Build observability for metrics, logs, and traces, define SLOs and error budgets, and author automated runbooks.
- Drive reliability engineering practices including capacity planning, chaos testing, and incident response.
- Integrate AI/ML tooling to enhance monitoring, anomaly detection, auto-remediation, and incident prediction.
- Operationalize model monitoring and data drift detection with alerting aligned to business KPIs.
- Support the end-to-end model lifecycle, including data preparation, experiment tracking, model registry, CI/CD for ML, feature stores, and model serving.
- Implement governance for model lineage, approvals, versioning, reproducibility, and compliance controls.
- Embed security in pipelines with SAST/DAST, dependency scanning, and secrets management.
- Enforce RBAC, least privilege, policy as code, and auditable change management.
- Partner with Engineering, Data Science, and Product to align on architecture and SLAs.
- Mentor engineers and lead technical deep dives and incident postmortems.
Experience Requirements:
- 7 to 10+ years in DevOps, SRE, or Platform Engineering with production systems.
- Expertise with at least one major cloud provider such as AWS, Azure, or GCP, and strong skills with Terraform or an equivalent IaC tool.
- Mastery of CI/CD tools such as GitHub Actions, GitLab CI, Azure DevOps, or Jenkins.
- Containerization and orchestration with Docker and Kubernetes at scale.
- Observability with Prometheus and Grafana, OpenTelemetry, ELK/EFK, Datadog, New Relic, or similar.
- Practical AIOps experience, including anomaly detection, intelligent alerting, and automated runbooks, or adjacent experience with a willingness to lead AIOps adoption.
- Hands-on experience with the model development lifecycle (MDLC) and MLOps, including experiment tracking (such as MLflow), model registry, model serving, feature stores, and model monitoring.
- Strong scripting and coding for automation using Python and Bash, with Go as a plus.
- Security-first mindset, including secrets management with Vault or KMS, container and image scanning, SBOM, and policy guardrails.
- Experience with data platforms such as Spark, Databricks, and Kafka, and with event-driven designs (preferred).
- GPU workload orchestration, along with cost and performance optimization for AI workloads (preferred).
- Governance and compliance experience such as SOC 2, ISO 27001, or HIPAA (preferred).
- FinOps exposure and cost observability in cloud environments (preferred).
- GitOps implementation with Argo CD or Flux (preferred).