Senior DevOps Engineer
Hybrid: 3 days / week onsite in Concord, CA
We are hiring two Senior DevOps Engineers to lead build, automation, and operations for modern cloud platforms with emphasis on AIOps and the model / ML development lifecycle. You will partner with Data / ML, Platform, and Security teams to deliver resilient and scalable systems that support AI-enabled applications and services.
Due to client requirements, applicants must be willing and able to work on a W2 basis. For our W2 consultants, we offer a great benefits package that includes medical, dental, and vision benefits, 401k with company matching, and life insurance.
Rate: $70.00 to $75.00 / hr (W2)
Responsibilities:
- Design and operate CI / CD pipelines for microservices, data services, and ML workloads.
- Implement Infrastructure as Code for cloud environments across AWS, Azure, or GCP.
- Build observability for metrics, logs, and traces, define SLOs and error budgets, and author automated runbooks.
- Drive reliability engineering practices including capacity planning, chaos testing, and incident response.
- Integrate AI / ML tooling to enhance monitoring, anomaly detection, auto-remediation, and incident prediction.
- Operationalize model monitoring and data drift detection with alerting aligned to business KPIs.
- Support end-to-end model lifecycle including data preparation, experiment tracking, model registry, CI / CD for ML, feature stores, and model serving.
- Implement governance for model lineage, approvals, versioning, reproducibility, and compliance controls.
- Embed security in pipelines with SAST / DAST, dependency scanning, and secrets management.
- Enforce RBAC, least privilege, policies-as-code, and auditable change management.
- Partner with Engineering, Data Science, and Product to align on architecture and SLAs.
- Mentor engineers and lead technical deep dives and incident postmortems.
Experience Requirements:
- 7 to 10+ years in DevOps, SRE, or Platform Engineering with production systems.
- Expertise with at least one major cloud provider such as AWS, Azure, or GCP and strong Terraform or equivalent IaC.
- Mastery of CI / CD tools such as GitHub Actions, GitLab CI, Azure DevOps, or Jenkins.
- Containerization and orchestration with Docker and Kubernetes at scale.
- Observability with Prometheus and Grafana, OpenTelemetry, ELK / EFK, Datadog, New Relic, or similar.
- Practical AIOps experience including anomaly detection, intelligent alerting, and automated runbooks or adjacent experience with willingness to lead AIOps adoption.
- Hands-on MDLC and MLOps including experiment tracking such as MLflow, model registry, model serving, feature stores, and model monitoring.
- Strong scripting and coding for automation using Python and Bash, with Go as a plus.
- Security-first mindset including secret management with Vault or KMS, container and image scanning, SBOM, and policy guardrails.
- Experience with data platforms such as Spark, Databricks, and Kafka and event-driven designs (preferred).
- GPU workload orchestration and cost and performance optimization for AI workloads (preferred).
- Governance and compliance experience such as SOC 2, ISO 27001, or HIPAA (preferred).
- FinOps exposure and cost observability in cloud environments (preferred).
- GitOps implementation with Argo CD or Flux (preferred).