Talent.com
Software Engineer, Site Reliability (SRE)

Sierra Business Solution • San Francisco, CA, US
Job type: Full-time

About Us

We are an in-person company based in San Francisco with growing offices in Atlanta, New York, and London, building a platform that helps businesses create better, more human customer experiences with AI.

Our core values are Trust, Customer Obsession, Craftsmanship, Intensity, and Family.

Company founders: Bret Taylor, former Salesforce and Facebook executive; Clay Bavor, former Google Labs leader.

What You'll Do

Own Sierra's observability stack—monitoring, alerting, logging, and tracing—to give engineers clear visibility into system health and performance.

Partner with product and platform engineers to design reliable, scalable systems from day one.

Design and implement scalable, secure cloud infrastructure (AWS) using Terraform and modern DevOps tooling.

Improve reliability and scalability of LLM deployments, ensuring robust, cost-effective operation.

Lead improvements to deployment pipelines, CI/CD tooling, and incident-management processes.

Define the foundation of SRE practices at Sierra, influencing culture, tooling, and best practices.

What You'll Bring

5+ years of hands-on experience in Site Reliability or infrastructure engineering for complex SaaS or cloud-based systems.

Experience designing for availability, scalability, and reliability at both infrastructure and application layers.

Deep experience with Terraform, AWS services, container orchestration, and cloud networking (IAM, VPC).

Strong background in observability systems (Prometheus, Grafana, Datadog, or similar).

Experience working with enterprise customers and familiarity with compliance and networking needs.

Comfortable working in fast-moving environments and collaborating across teams.

Degree in Computer Science or equivalent professional experience.

Even Better

Experience with LLM infrastructure—optimizing inference, managing fine-tuned models, or large-scale deployment.

Early-stage startup experience defining SRE culture and tooling from scratch.

Familiarity with incident-management automation or self-healing infrastructure patterns.

Benefits

Unlimited Paid Time Off

Medical, Dental, and Vision benefits

Life Insurance and Disability Benefits

401(k) retirement plan with company match

Parental Leave and fertility benefits via Carrot

Lunch, snacks, coffee, and discretionary stipend

Equity plans per applicable policies

Equality & Diversity

We actively encourage applicants of all backgrounds to apply. We strive to evaluate all applicants consistently without regard to race, color, religion, gender, sexual orientation, age, disability, veteran status, or any other protected characteristic.
