Talent.com
Senior Site Reliability Engineer
Mango • Los Angeles, CA, United States
Job type: Full-time

About Mango, Inc.

Mango is a new type of microscope for rapid bioburden testing.

Description

We are seeking a Senior Site Reliability Engineer to own and evolve the infrastructure that supports our on-premise instruments, data systems, and machine learning pipelines. This role combines systems-level engineering with software craftsmanship, requiring deep understanding of how compute, storage, and networking layers interact under real workloads.

You will be the go-to expert for diagnosing performance issues in our on-prem systems, from kernel-level I/O bottlenecks to distributed service latency. You will also build robust automation that keeps our systems consistent and observable.

Key Responsibilities

Infrastructure Design & Reliability

Design, deploy, and maintain our on-premise and hybrid infrastructure, which includes Dell PowerEdge and PowerVault servers, prosumer NAS units, and high-throughput data processing clusters. Implement fault-tolerant systems with reproducible deployments and clear observability.

Performance & Systems Analysis

Investigate complex performance issues across hardware, OS, and software boundaries. You will use Linux tooling in addition to in-house application-level metrics to uncover root causes in filesystems, caching layers, or I/O scheduling.

Automation & Tooling

Build automation for system provisioning, configuration management, and software deployment using Python, Go, Ansible, or similar frameworks. Develop lightweight services and tools that make reliability visible and maintainable.

Collaboration

Work closely with our software and hardware teams to co-design systems that meet the needs of high-resolution imaging and ML inference workloads. Translate hardware realities into software reliability guarantees.

Observability & Incident Response

Develop and maintain monitoring, alerting, and logging systems to ensure early detection of issues. Lead incident response and post-mortem efforts with a focus on learning and prevention.

Documentation & Communication

Produce clear documentation and communicate findings effectively to the broader team - from network topology diagrams to kernel tuning rationales.

General Qualifications

  • Deep understanding of Linux systems and performance (I/O schedulers, RAID, caching, NUMA, kernel parameters).
  • Hands-on experience designing and managing on-premise servers, storage arrays, or HPC clusters.
  • Comfort with automation and software development (Python, Go, Bash, or similar).
  • Strong diagnostic and analytical skills: ability to decompose performance problems across multiple layers.
  • Proven track record of improving system reliability, throughput, and maintainability in a fast-paced environment.
  • Excellent written and verbal communication skills for cross-disciplinary collaboration.
  • Self-driven, curious, and motivated by understanding systems deeply rather than just maintaining them.

Bonus Qualities (Not Required)

  • 5-10 years of relevant industry experience in systems engineering, SRE, or infrastructure software roles.
  • Experience tuning Linux filesystems (ext4, btrfs) and software RAID (mdadm).
  • Familiarity with containerization and orchestration (Docker, Compose, Kubernetes).
  • Knowledge of networking fundamentals (VLANs, bonding, LACP, 10 GbE/40 GbE).
  • Experience supporting data-heavy scientific or ML workloads.
  • Demonstrated technical leadership - mentoring others in debugging, reliability, or performance analysis.


Salary

$150,000 - $175,000 per year