Talent.com
Site Reliability Engineer - Observability
Rivian and Volkswagen Group Technologies • Palo Alto, CA, United States

Overview

We are seeking a Senior Site Reliability Engineer (SRE) specializing in Observability to join RivianVW's Data Platform - Production Engineering team. In this role, you will design, implement, and scale robust observability systems to ensure the health, performance, and reliability of our production environment. You will collaborate closely with cross-functional teams to create telemetry solutions that provide actionable insights into our distributed systems.

Responsibilities

  • Observability Platform Design: Architect, implement, and maintain observability systems, leveraging tools like Datadog, the LGTM stack, OpenTelemetry, and Vector to enable real-time performance monitoring, logging, and alerting.
  • Telemetry Optimization: Evolve and scale telemetry pipelines to ensure low latency and high availability for metrics, logs, and traces across multi-cloud environments.
  • Performance Engineering: Proactively identify performance bottlenecks, optimize systems, and provide recommendations for reliability improvements.
  • Scalable Automation: Implement automation solutions to scale systems sustainably while driving improvements in reliability and deployment velocity.
  • Incident Management: Collaborate with the incident response team to establish data-driven debugging and troubleshooting processes using observability data.
  • Tooling Development: Create and maintain self-service observability tools and dashboards to empower teams across the organization.
  • Cross-functional Collaboration: Partner with development, DevOps, and infrastructure teams to define SLOs/SLIs and ensure observability is embedded throughout the software lifecycle.

Qualifications

  • Educational Background: Bachelor’s degree in Computer Science, Engineering, or equivalent practical experience.
  • Experience: 5+ years in Site Reliability Engineering or a related role with a strong emphasis on observability.
  • Technical Expertise:
      • Proficiency in designing and operating observability platforms with tools like Prometheus, Grafana, Loki, Jaeger, or Datadog.
      • Experience with OpenTelemetry and distributed tracing in microservices architectures.
      • Deep knowledge of Kubernetes (e.g., EKS), ArgoCD, and Crossplane.
  • Programming Skills: Strong proficiency in Python, Go, or similar languages for building automation and custom telemetry solutions.
  • Cloud & Systems: Familiarity with multi-cloud setups, containerization (Docker), and Linux system fundamentals.
  • Soft Skills: Exceptional problem-solving, communication, and a data-driven approach to decision-making.

Pay Disclosure

Salary Range/Hourly Rate for California-Based Applicants: $146,900 - $194,610 USD

Actual compensation will be determined based on experience, location, and other factors permitted by law.

Benefits Summary

Rivian and Volkswagen Group Technologies provides robust medical, prescription, dental and vision insurance packages for full-time employees, their spouse or domestic partner, and their children up to age 26. Coverage is effective on the first day of employment.

Equal Opportunity

Rivian and Volkswagen Group Technologies is committed to creating a diverse environment and is proud to be an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, national origin, ancestry, sex, sexual orientation, gender, gender expression, gender identity, genetic information or characteristics, physical or mental disability, marital/domestic partner status, age, military/veteran status, medical condition, or any other characteristic protected by law. We are also committed to ensuring compliance with all applicable fair employment practice laws regarding citizenship and immigration status.

Accommodations

Rivian and Volkswagen Group Technologies is committed to ensuring that our hiring process is accessible for persons with disabilities. If you have a disability or limitation, such as those covered by the Americans with Disabilities Act, that requires accommodations to assist you in the search and application process, please email us at candidateaccommodations@rivian.com.

Candidate Data Privacy

Rivian and Volkswagen Group Technologies may collect, use and disclose your personal information or personal data (within the meaning of the applicable data protection laws) when you apply for employment and/or participate in our recruitment processes. This data includes contact, demographic, communications, educational, professional, employment, social media/website, network/device, recruiting system usage/interaction, security and preference information. Rivian and Volkswagen Group Technologies may use your Candidate Personal Data for the purposes of (i) tracking interactions with our recruiting system; (ii) carrying out, analyzing and improving our application and recruitment process, including assessing you and your application and conducting employment, background and reference checks; (iii) establishing an employment relationship or entering into an employment contract with you; (iv) complying with our legal, regulatory and corporate governance obligations; (v) recordkeeping; (vi) ensuring network and information security and preventing fraud; and (vii) as otherwise required or permitted by applicable law. Rivian and Volkswagen Group Technologies may share your Candidate Personal Data with internal personnel, Rivian and Volkswagen Group Technologies affiliates, and service providers including background checks, staffing services, and cloud services. They may transfer or store internationally your Candidate Personal Data, including to or in the United States, Canada, and the European Union, and this data may be subject to the laws and accessible to authorities of such jurisdictions. Please see our Candidate Data Privacy Notice (English) and Candidate Data Privacy Notice (Serbian) for more information.

Please note that we are currently not accepting applications from third-party application services.

  • Seniority level: Not Applicable
  • Employment type: Full-time
  • Job function: Engineering and Information Technology
  • Industries: Software Development