Talent.com
Senior Site Reliability Engineer
Mango • Los Angeles, CA, United States
Job type: Full-time

About Mango, Inc.

Mango is building a new type of microscope for rapid bioburden testing.

Description

We are seeking a Senior Site Reliability Engineer to own and evolve the infrastructure that supports our on-premise instruments, data systems, and machine learning pipelines. This role combines systems-level engineering with software craftsmanship, requiring a deep understanding of how compute, storage, and networking layers interact under real workloads.

You will be the go-to expert for diagnosing performance issues across our on-prem systems, from kernel-level I/O bottlenecks to distributed service latency. You will also build robust automation that keeps our systems consistent and observable.

Key Responsibilities

Infrastructure Design & Reliability

Design, deploy, and maintain our on-premise and hybrid infrastructure, which includes Dell PowerEdge and PowerVault servers, prosumer NAS units, and high-throughput data processing clusters. Implement fault-tolerant systems with reproducible deployments and clear observability.

Performance & Systems Analysis

Investigate complex performance issues across hardware, OS, and software boundaries. You will use Linux tooling alongside in-house application-level metrics to uncover root causes in filesystems, caching layers, or I/O scheduling.

Automation & Tooling

Build automation for system provisioning, configuration management, and software deployment using Python, Go, Ansible, or similar frameworks. Develop lightweight services and tools that make reliability visible and maintainable.

Collaboration

Work closely with our software and hardware teams to co-design systems that meet the needs of high-resolution imaging and ML inference workloads. Translate hardware realities into software reliability guarantees.

Observability & Incident Response

Develop and maintain monitoring, alerting, and logging systems to ensure early detection of issues. Lead incident response and post-mortem efforts with a focus on learning and prevention.

Documentation & Communication

Produce clear documentation, from network topology diagrams to kernel tuning rationales, and communicate findings effectively to the broader team.

General Qualifications

  • Deep understanding of Linux systems and performance (I/O schedulers, RAID, caching, NUMA, kernel parameters).
  • Hands-on experience designing and managing on-premise servers, storage arrays, or HPC clusters.
  • Comfort with automation and software development (Python, Go, Bash, or similar).
  • Strong diagnostic and analytical skills: the ability to decompose performance problems across multiple layers.
  • Proven track record of improving system reliability, throughput, and maintainability in a fast-paced environment.
  • Excellent written and verbal communication skills for cross-disciplinary collaboration.
  • Self-driven, curious, and motivated by understanding systems deeply rather than just maintaining them.

Bonus Qualities (Not Required)

  • 5-10 years of relevant industry experience in systems engineering, SRE, or infrastructure software roles.
  • Experience tuning Linux filesystems (ext4, btrfs) and software RAID (mdadm).
  • Familiarity with containerization and orchestration (Docker, Compose, Kubernetes).
  • Knowledge of networking fundamentals (VLANs, bonding, LACP, 10 GbE/40 GbE).
  • Experience supporting data-heavy scientific or ML workloads.
  • Demonstrated technical leadership, such as mentoring others in debugging, reliability, or performance analysis.

Salary

$150,000 - $175,000 per year

Similar Jobs

Senior Site Reliability Engineer
Varda Space Industries • El Segundo, CA, United States
Full-time +1
Low Earth orbit is open for business. Varda is accelerating the development of commercial space infrastructure, from in-orbit pharmaceutical processing to reliable and economical reentry capsules. Fr...

Site Reliability Engineer
Piper Companies • Los Angeles, CA, United States
Full-time
Zachary Piper Solutions is seeking an experienced Site Reliability Engineer (SRE). Unlike a traditional platform or internal tooling role, this position works. Secret and higher environments. AWS C2E,...

Senior Reliability Engineer
Medtronic • Los Angeles, CA, United States
Full-time
We anticipate the application window for this opening will close on 7 Jan 2026. At Medtronic you can begin a life-long career of exploration and innovation, while helping champion healthcare acces...

Senior Technology Site Reliability Engineer
Cooley Corp. • Los Angeles, CA, United States
Full-time
Senior Technology Site Reliability Engineer. Cooley is seeking a Senior Site Reliability Engineer to join the Infrastructure & Development Operations team. The Senior Technology Site Reliability Engin...

Senior Technology Site Reliability Engineer
Cooley LLP • Santa Monica, CA, United States
Full-time
Senior Technology Site Reliability Engineer. Cooley is seeking a Senior Site Reliability Engineer to join the Infrastructure & Development Operations. The Senior Technology Site Reliability Engineer(...

Reliability Engineer
Teledyne • El Segundo, CA, United States
Permanent
Teledyne Technologies Incorporated provides enabling technologies for industrial growth markets that require advanced technology and high reliability. These markets include aerospace and defense, fa...

Senior Site Reliability Engineer
VirtualVocations • Long Beach, California, United States
Full-time
A company is looking for a Senior Site Reliability Engineer. Key Responsibilities: Design and implement infrastructure and automation scripts for software applications on AWS. Optimize and monitor ...

Senior Technology Site Reliability Engineer
Cooley • Santa Monica, CA, United States
Full-time
Senior Technology Site Reliability Engineer. Cooley is seeking a Senior Site Reliability Engineer to join the Infrastructure & Development Operations team. The Senior Technology Site Reliability Engin...

Senior Site Reliability Engineer
LegalZoom • Los Angeles, CA, United States
Full-time
LegalZoom is on a mission to help people navigate the legal system with confidence and clarity. As a leader in online legal services for over 20 years, we combine technology, attorney-led solutions,...

Staff Site Reliability Engineer / Startup / On Site
Motion Recruitment • El Segundo, CA, United States
Full-time
Are you looking for your next big challenge? A tech-forward startup in the physical security industry is looking for a Senior Staff Site Reliability Engineer to join their Platform team. You'll arch...

Site Reliability Engineer II
Aeg Worldwide Inc • Los Angeles, CA, United States
Full-time
AXS connects fans with the artists and teams they love. Each year we sell millions of tickets to thousands of incredible events - from concerts and festivals to sports and theater - at some of the m...

Site Reliability Engineer, Consultant
Blue Shield of CA • Long Beach, CA, United States
Full-time
We are seeking an Experienced Site Reliability Engineer (SRE) to lead reliability, scalability, and performance initiatives across our production systems. In this role, you will blend software engin...

Site Reliability Engineering
SARIAN Co • Los Angeles, CA, United States
Full-time
Role: Site Reliability Engineering (SRE). Experience in Cloud platforms (AWS, Azure, Google Cloud) and hybrid environments. Proficiency in container technologies (Docker, Container, Podman). Strong kn...

Site Reliability Engineer, GNC (Falcon)
SpaceX • Hawthorne, CA, United States
Permanent
Site Reliability Engineer, GNC (Falcon). SpaceX was founded under the belief that a future where humanity is out exploring the stars is fundamentally more exciting than one where we are not. Today Sp...

Reliability Engineer
Teledyne FLIR LLC • El Segundo, CA, United States
Permanent
Teledyne Technologies Incorporated provides enabling technologies for industrial growth markets that require advanced technology and high reliability. These markets include aerospace and defense, fa...

Site Reliability Engineer (Senior or Staff), Fabric
MongoDB • Los Angeles, CA, United States
Full-time
Platform Engineering is the department within SRE that is responsible for a range of critical infrastructure and operational functions that support the broader engineering organization. Among these ...

Site Reliability Engineer II
AXS • Los Angeles, CA, United States
Full-time
AXS connects fans with the artists and teams they love. Each year we sell millions of tickets to thousands of incredible events - from concerts and festivals to sports and theater - at some of the m...

Senior Build Reliability Engineer
Galaxy Technology Hires LLC • Long Beach, CA, United States
Full-time
Senior Build Reliability Engineer. Los Angeles, CA Area – Relocation Assistance Provided. Our client is a rapidly growing startup defense company that brings a new and innovative approach to the deve...