Talent.com
Software Engineer, Site Reliability (SRE)
Sierra Business Solution • San Francisco, CA, United States
Posted 30 days ago
Job type: Full-time

About Us

  • We are an in‑person company based in San Francisco with growing offices in Atlanta, New York, and London, building a platform that helps businesses create better, more human customer experiences with AI.
  • Our core values are Trust, Customer Obsession, Craftsmanship, Intensity, and Family.
  • Company founders: Bret Taylor, former Salesforce and Facebook executive; Clay Bavor, former Google Labs leader.

What You’ll Do

  • Own Sierra’s observability stack—monitoring, alerting, logging, and tracing—to give engineers clear visibility into system health and performance.
  • Partner with product and platform engineers to design reliable, scalable systems from day one.
  • Design and implement scalable, secure cloud infrastructure (AWS) using Terraform and modern DevOps tooling.
  • Improve reliability and scalability of LLM deployments, ensuring robust, cost‑effective operation.
  • Lead improvements to deployment pipelines, CI/CD tooling, and incident‑management processes.
  • Define the foundation of SRE practices at Sierra, influencing culture, tooling, and best practices.

What You’ll Bring

  • 5+ years of hands‑on experience in Site Reliability or infrastructure engineering for complex SaaS or cloud‑based systems.
  • Experience designing for availability, scalability, and reliability at both infrastructure and application layers.
  • Deep experience with Terraform, AWS services, container orchestration, and cloud networking (IAM, VPC).
  • Strong background in observability systems (Prometheus, Grafana, Datadog, or similar).
  • Experience working with enterprise customers and familiarity with compliance and networking needs.
  • Comfortable working in fast‑moving environments and collaborating across teams.
  • Degree in Computer Science or equivalent professional experience.

Even Better

  • Experience with LLM infrastructure—optimizing inference, managing fine‑tuned models, or large‑scale deployment.
  • Early‑stage startup experience defining SRE culture and tooling from scratch.
  • Familiarity with incident‑management automation or self‑healing infrastructure patterns.

Benefits

  • Unlimited Paid Time Off
  • Medical, Dental, and Vision benefits
  • Life Insurance and Disability Benefits
  • 401(k) retirement plan with company match
  • Parental Leave and fertility benefits via Carrot
  • Lunch, snacks, coffee, and discretionary stipend
  • Equity plans per applicable policies

Equality & Diversity

We actively encourage applicants of all backgrounds to apply. We strive to evaluate all applicants consistently without regard to race, color, religion, gender, sexual orientation, age, disability, veteran status, or any other protected characteristic.

