Talent.com
Site Reliability Engineer

Gallup • San Francisco, California, United States

Build Gallup's observability foundation and shift how we detect, respond to and prevent system issues before they affect customers.

As a founding member of Gallup’s new site reliability engineering team, you’ll define and scale our observability strategy across engineering and bring reliability engineering principles — automation, observability and continuous improvement — to everything we build. You’ll unify different teams’ monitoring solutions into a cohesive, proactive approach, consolidate our tooling, build automated workflows and establish processes that help us catch problems before they become incidents.

In this role, you’ll shape Gallup’s global technology platform to ensure the systems delivering analytics and insights to millions remain fast, resilient and always available. If you’re eager to drive resilience in systems that empower people and organizations worldwide, this is your opportunity — apply today.

What You’ll Do

  • Establish the foundation of Gallup’s SRE function by defining standards, best practices and scalable systems that will grow with the organization
  • Build and evolve observability infrastructure using tools like Dynatrace, Datadog, Grafana and PagerDuty to monitor applications running on AWS
  • Design and implement automated alerting workflows that integrate directly with Slack
  • Establish incident response processes that integrate monitoring, alerting and team communication to reduce recovery time and improve service continuity
  • Create dashboards and metrics that give engineering teams real-time insight into application performance and system reliability
  • Identify opportunities for automation and design self-healing systems in partnership with DevOps engineers
  • Enable end-to-end monitoring and faster issue detection by partnering with application teams to embed observability into Java, .NET and Python services
  • Lead initiatives that help engineering teams adopt and use observability tools effectively
  • Identify patterns in system behavior that indicate potential issues before they affect customers

What Makes You Stand Out

  • Observability expertise: You've built or scaled monitoring and observability practices, not just maintained existing systems.
  • Tool consolidation experience: You've successfully unified fragmented monitoring solutions across multiple teams.
  • AI mindset: You reduce repetitive operational work through thoughtful automation and workflow design.
  • Incident response leadership: You've designed or improved incident management processes and know how to balance speed with thoroughness.
  • Communication and enablement: You go beyond building dashboards; you guide others in how to instrument their code and interpret metrics.

What You Need

  • Bachelor's degree in computer science, MIS or a related field, or equivalent experience, required
  • At least three years of experience in site reliability engineering, DevOps or infrastructure roles with a focus on monitoring and observability required
  • Experience with observability and monitoring tools such as Dynatrace (preferred), Datadog, Grafana or similar platforms required
  • Experience with incident management tools like PagerDuty or similar alerting systems required
  • Strong understanding of AWS cloud infrastructure and how to monitor distributed systems required
  • Experience integrating monitoring and alerting systems with collaboration platforms like Slack required
  • Ability to work with application teams across multiple languages and frameworks (e.g., Java, .NET, Python) required
  • Knowledge of metrics, logging and tracing as pillars of observability required
  • Experience writing scripts or automation (e.g., Python, Bash, PowerShell) to support monitoring workflows required
  • Experience with containerized applications and infrastructure as code preferred
  • A commitment to working on-site at Gallup’s San Francisco office at least three days a week required

About Gallup

At Gallup, we change the world, one client at a time, through extraordinary analytics and advice on everything important facing humankind.

Gallup offers a robust benefits package that includes medical, dental, vision, life and other insurance options; a fully vested 401(k) retirement savings plan with company matching; an employee stock ownership program; mass transit reimbursement; family-building benefits; an employee assistance program; and various reimbursements and activities that enhance our associates’ wellbeing. We also offer an estimated annual salary range of $150,000-$200,000 for this role. Salaries are based on a variety of factors, including an individual’s education, experience and skills.

Gallup is an equal opportunity employer. We consider all qualified applicants without regard to race, color, religion, sex, national origin, disability, protected veteran status, sexual orientation, gender identity, or any other legally protected basis, in accordance with applicable law.

To review Gallup’s Privacy Statement, please click this link: https://www.gallup.com/privacy. This privacy policy is meant to help you understand what information we collect, why we collect it, and how you can update, manage and delete your information. Your application and the information you provide will be processed and stored in the United States.

Similar Jobs

Lead Site Reliability Engineer
Stuut • San Francisco, CA, US
Stuut is transforming accounts receivable for B2B companies, making collections smarter and faster for companies that have historically relied on manual processes that are labor intensive and ...

Senior / Staff Site Reliability Engineer
Mochi Health • San Francisco, CA, US
Mochi Health's mission is to be the discovery layer of healthcare. We are building a platform that makes it easier for patients to find the right providers, access the right medications, and tak...

Site Reliability Engineer - Platform
CodeRabbit • San Francisco, CA, United States
CodeRabbit is an innovative research and development company focused on building extraordinarily productive human‑machine collaboration systems. Our primary goal is to create the next generation of ...

Site Reliability Engineer
Mercor, Inc. • San Francisco, California, United States
About Mercor: Mercor is at the intersection of labor markets and AI research. We partner with leading AI labs and enterprises to provide the human intelligence essential to AI development. Our vast ta...

Senior Site Reliability Engineer Cloud Platform
Zilliz • Redwood City, CA, US
Zilliz is a fast-growing startup developing the industry’s leading vector database company for enterprise-grade AI. Founded by the engineers behind Milvus, the world’s most pop...

Site Reliability Engineer
VirtualVocations • San Francisco, California, United States
A company is looking for a Site Reliability Engineer (SRE) with strong GitLab platform expertise. Key Responsibilities: Administer and optimize GitLab, Jira, and Confluence for reliability, securit...

Site Reliability Engineer
gamma.app • San Francisco, CA, United States
We're building the creative layer for modern communication. Every month, over a billion people make presentations, but the tools they use to make them haven't evolved in decades. We're changing that...

Site Reliability Engineer
The Voleon Group • Berkeley, CA, United States
Voleon is a technology company that applies state‑of‑the‑art AI and machine learning techniques to real‑world problems in finance. For nearly two decades, we have led our industry and worked at the ...

Site Reliability Engineering
Forhyre • San Francisco, CA, US
Forhyre is looking for engineers who can bring unique perspectives and innovative ideas to all areas of development and are interested in continuing to improve our platform through the ever-changin...

Senior Site Reliability Engineer | Patreon
Ziphire.hr • San Francisco, CA, US
Patreon is seeking a Senior Site Reliability Engineer to join our dynamic technology team in San Francisco, CA. In this pivotal role, you will be instrumental in ensuring the reliability and perform...

Senior Site Reliability Engineer - Platform
Quizlet • San Francisco, CA, US
At Quizlet, our mission is to help every learner achieve their outcomes in the most effective and delightful way. Our $1B+ learning platform serves tens of millions of students every month, in...

Senior Site Reliability Engineer
Zipline • South San Francisco, CA, US
Do you want to change the world? Zipline is on a mission to transform the way goods move. Our aim is to solve the world's most urgent and complex access challenges by building, manufacturing and...

Site Reliability Engineer
Zoox • Foster City, CA, US
Zoox is seeking a Site Reliability Engineer to help ensure the availability, performance, and resilience of the services that power the development and operation of our autonomous vehicles. In this ...

Site Reliability Engineer
Happyrobot Inc. • San Francisco, California, United States
About HappyRobot: HappyRobot is the AI-native operating system for the real economy, a system that closes the circuit between intelligence and action. By combining real-time truth, specialized AI work...

Site Reliability Engineer
Fractal • San Francisco, CA, United States
This range is provided by Fractal. Your actual pay will be based on your skills and experience; talk with your recruiter to learn more. Fractal Analytics is a strategic AI partner to Fortune 500 com...

Senior+ Site Reliability Engineer
Crusoe • San Francisco, CA, US
Crusoe's mission is to accelerate the abundance of energy and intelligence. We’re crafting the engine that powers a world where people can create ambitiously with AI, without sacrif...

Site Reliability Engineer
HappyRobot • San Francisco, California, United States
About HappyRobot: HappyRobot is the AI‑native operating system for the real economy, a system that closes the circuit between intelligence and action. By combining real‑time truth, specialized AI work...

Senior Site Reliability Engineer
Gradle Technologies • San Francisco, CA, US
Develocity is a first-of-its-kind toolchain observability and acceleration platform that helps software teams adopt and improve DORA capabilities (including continuous delivery) in order to achieve...