Talent.com
CloudDevs: Senior Site Reliability Engineer (SRE)

Breakout Tools • San Francisco, CA, United States
Job type: Full-time

CloudDevs works with fast-moving, venture-backed startups across the US. We’re building a pool of world-class Site Reliability Engineers for current roles and for upcoming opportunities. You will either be placed directly into one of our partner startups or added to our vetted SRE network for future projects.

This role is ideal for engineers who care about reliability, metrics, performance, and building simple, scalable systems. If you enjoy designing for scale and improving how teams ship software, you’ll fit right in.

Key Responsibilities

  • Work as a hands‑on engineer focused on system reliability, performance, and observability.
  • Define and track SLIs, SLOs, and error budgets.
  • Optimize monitoring cost and signal quality across metrics, logs, and traces.
  • Improve deployment safety, canary rollouts, and UAT pipelines.
  • Build tools for automated and local performance testing and track benchmarks.
  • Lead resilience work like failover drills, chaos tests, and redundancy checks.
  • Partner with engineering teams to improve scaling patterns and architecture as the product grows.
  • Support incident response processes and help reduce operational noise.
  • Write clean, maintainable code in Go, Python, or Node.js.
  • Contribute to CI/CD improvements and automation efforts.
  • Collaborate with engineers across teams to raise reliability standards.

Requirements

  • 5+ years in SRE, DevOps, or Platform Engineering roles.
  • Strong experience with cloud infrastructure (AWS preferred), Terraform, and Kubernetes.
  • Deep knowledge of observability tools like Datadog, Prometheus, or OpenTelemetry.
  • Strong debugging skills across services, networking, and data layers.
  • Hands‑on experience designing and monitoring SLIs/SLOs.
  • Experience with CI/CD tools such as GitHub Actions, Jenkins, or ArgoCD.
  • Ability to write production‑grade code in Go, Python, or Node.js.
  • Comfort working independently in fast‑paced environments.

Nice to Have

  • Experience tuning observability costs and optimizing data ingestion.
  • Exposure to chaos engineering and progressive deployments.
  • Background with high‑throughput or latency‑sensitive systems.
  • AWS at scale (EKS, Lambda, DynamoDB, S3).
  • Experience in regulated industries like fintech or payments, or in SOC 2 environments.
  • Performance testing pipelines or load‑testing automation.
  • Experience handling systems processing tens of millions of API calls.

Open Pool for SREs

Even if you don’t meet every requirement or aren’t a fit for the current role, strong SREs with real production experience are welcome to join our talent pool. We regularly place engineers with different strengths across reliability, DevOps, platform, observability, backend, and infrastructure engineering.
