Talent.com
Staff Reliability Engineer
Celonis • Redwood City, CA, United States
Job type: Full-time
Job description

We're Celonis, the global leader in Process Mining technology and one of the world's fastest-growing SaaS firms. We believe there is a massive opportunity to unlock productivity by placing data and intelligence at the core of business processes - and for that, we need you to join us.

The Team

As a member of our Reliability Engineering team, you will play a critical role in ensuring the health, performance, and resilience of our platform. The team applies advanced software engineering and Site Reliability Engineering (SRE) principles to drive system reliability, scalability, and operational excellence across the organization.

The Role

  • Join a highly technical, collaborative, and innovation-driven team that blends Site Reliability Engineering with modern Software Engineering practices to build resilient and scalable systems.
  • Lead reliability efforts for a fleet of 80+ FedRAMP-compliant microservices running on Kubernetes, applying SRE principles to drive observability, automation, and incident prevention.
  • Develop and enforce SLOs, SLAs, and error budgets to drive reliability-focused development.
  • Provide mentorship and technical leadership across the SRE and engineering teams.
  • Own high-priority application incident escalations, performing deep technical analysis and restoration within defined SLOs, while continuously improving detection and response mechanisms.
  • Engineer solutions to enhance the availability, latency, and performance of production services—automating manual processes to eliminate toil and scale operational efficiency.
  • Collaborate closely with platform and application engineering teams to conduct post-incident reviews, extract insights, and implement systemic changes that improve overall reliability.

The qualifications you need:

  • Bachelor’s or Master’s degree in Computer Science, Software Engineering, or a related technical field (or equivalent hands-on experience).
  • Minimum of 8 years of experience in software engineering or SRE roles.
  • Deep experience with cloud platforms (AWS, GCP, or Azure).
  • Proficiency in Java, the Spring framework, and Python (or a similar scripting language) in a Linux environment.
  • Prior experience contributing to Site Reliability Engineering initiatives or similar operational roles.
  • Demonstrated ability to lead projects and influence engineering culture.
  • Knowledge of SRE principles, including SLI/SLO design, error budgets, and toil reduction strategies.
  • Excellent written and verbal communication skills in English.
  • Please note: This position is not eligible for immigration visa sponsorship, now or in the future.

Nice to Have

  • Experience with observability and monitoring tools (e.g., Datadog).
  • Experience in developing and operating production-grade, scalable services using Kubernetes and elastic cloud architectures.
  • Experience with CI/CD pipelines and tools such as ArgoCD, GitHub Actions, or similar.
  • Experience with Infrastructure as Code (IaC) tools such as Terraform and Kustomize.
  • Exposure to incident management practices, on-call rotations, and postmortem culture.
  • Visa sponsorship is not offered for this role.

    Total compensation package will include base salary + bonus / commission + equity + benefits (health, dental, life, 401k, and paid time off). Please note that the base salary range is a guideline, and that the actual total compensation offer will be determined based on various factors, including, but not limited to, applicant's qualifications, skills, experiences, and location.

    The base salary range for the role in California, based on a full-time schedule, is $195,000 to $235,000 USD.

    What Celonis Can Offer You:

  • The unique opportunity to work with industry-leading process mining technology
  • Investment in your personal growth and skill development (clear career paths, internal mobility opportunities, L&D platform, mentorships, and more)
  • Great compensation and benefits packages (equity (restricted stock units), life insurance, time off, generous leave for new parents from day one, and more). For intern and working student benefits, click here.
  • Physical and mental well-being support (subsidized gym membership, access to counseling, virtual events on well-being topics, and more)
  • A global and growing team of Celonauts from diverse backgrounds to learn from and work with
  • An open-minded culture with innovative, autonomous teams
  • Business Resource Groups to help you feel connected, valued and seen (Black@Celonis, Women@Celonis, Parents@Celonis, Pride@Celonis, Resilience@Celonis, and more)
  • A clear set of company values that guide everything we do: Live for Customer Value, The Best Team Wins, We Own It, and Earth Is Our Future

    About Us:

    Celonis helps some of the world’s largest and most esteemed brands make processes work for people, companies and the planet. With over 5,000 enterprise customer deployments across nearly every industry, the Celonis Process Intelligence Platform uses process mining and AI to give you a living digital twin of your business operation. It’s system-agnostic and without bias, and empowers companies to reduce waste, create value and benefit people across the top, bottom, and green lines. Since 2011, the Celonis platform has enabled its customers to identify more than $18 billion in value. Celonis is headquartered in Munich, Germany, and New York City, USA, with more than 20 offices worldwide.

    Get familiar with the Celonis Process Intelligence Platform by watching this video.

    Data Privacy, Equal Opportunity, and Accessibility Information

    Celonis is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment and equal opportunity in all aspects of employment. We will not tolerate any unlawful discrimination or harassment of any kind. We make all employment decisions without regard to race/ethnicity, color, sex, pregnancy, age, sexual orientation, gender identity or expression, transgender status, national origin, citizenship status, religion, physical or mental disability, veteran status, or any other factor protected by applicable anti-discrimination laws. As a US federal contractor, we are committed to the principles of affirmative action in accordance with applicable laws and regulations. Different makes us better.

    Any information you submit to Celonis as part of your application will be processed in accordance with Celonis’ Statements on Data Privacy, Equal Opportunity and Accessibility.

    Please be aware of common job offer scams, impersonators and frauds. Learn more here.

    By submitting this application, you confirm that you agree to the storing and processing of your personal data by Celonis as described in our Privacy Notice for the Application and Hiring Process.
