Site Reliability Engineer

PsiQuantum • Palo Alto, CA, United States

Quantum computing holds the promise of humanity’s mastery over the natural world, but only if we can build a real quantum computer. PsiQuantum is on a mission to build the first real, useful quantum computers, capable of delivering the world-changing applications that the technology has long promised. We know that means we will need to build a system with roughly 1 million qubits that supports fault-tolerant error correction within a scalable architecture and a data center footprint.

By harnessing the laws of quantum physics, quantum computers can provide exponential performance increases over today’s most powerful supercomputers, offering the potential for extraordinary advances across a broad range of industries including climate, energy, healthcare, pharmaceuticals, finance, agriculture, transportation, materials design, and many more.

PsiQuantum has determined the fastest path to delivering a useful quantum computer, years earlier than the rest of the industry. Our architecture is based on silicon photonics, which gives us the ability to produce our components at Tier-1 semiconductor fabs such as GlobalFoundries, where we leverage high-volume semiconductor manufacturing processes, the same processes that are already producing billions of chips for telecom and consumer electronics applications. We also benefit from the quantum-mechanical reality that photons don’t feel heat or electromagnetic interference, allowing us to take advantage of existing cryogenic cooling systems and industry-standard fiber connectivity.

In 2024, PsiQuantum announced two government-funded projects to support the build-out of our first Quantum Data Centers and utility-scale quantum computers in Brisbane, Australia, and Chicago, Illinois. Both projects are backed by nations that understand quantum computing’s potential impact and the need to scale this technology to unlock that potential. And we won’t just be building the hardware, but also the fault-tolerant quantum applications that will provide industry-transforming results.

Quantum computing is not just an evolution of the decades-old advancement in compute power. It provides the key to mastering our future, not merely discovering it. The potential is enormous, and we have the plan to make it real. Come join us.

There’s much more work to be done and we are looking for exceptional talent to join us on this extraordinary journey!

Job Summary

Join the OS/Platform team as a Site Reliability Engineer (SRE) and keep our services healthy, observable, and fast. Partnering with the Platform Engineering group, you’ll own the day‑to‑day operation of our monitoring stack (Grafana, Prometheus, Loki, and Tempo), crafting dashboards that surface golden signals and drive real‑time insight. You’ll codify reliability through SLIs/SLOs, automate runbooks in Python, and lead incident response to maintain world‑class uptime across both on‑prem and AWS environments.

Responsibilities

Define, implement, and iterate on Service Level Indicators and Service Level Objectives (SLIs/SLOs) and error budgets for critical services.
Build and maintain Grafana dashboards that visualize golden signals (latency, traffic, errors, saturation) for engineers and stakeholders.
Operate and tune our observability pipeline (Prometheus, Loki, Tempo) to ensure scalable, low‑latency telemetry ingestion and alerting.
Drive incident response: triage, mitigate, perform post‑incident reviews, and implement preventive actions.
Develop automation and self‑service tooling in Python/Bash to streamline alerts, runbooks, and operational tasks.
Collaborate with Platform and Product teams on capacity planning, performance testing, and change management.
Improve CI/CD health checks and release safety nets within GitLab.
Contribute to infrastructure as code (Terraform, Ansible) for monitoring stack deployments and upgrades.

Experience / Qualifications

Bachelor’s degree or higher in Computer Science, Engineering, or another related technical field.

5+ years in an SRE, DevOps, or Production Engineering role supporting distributed systems in production.

Hands‑on expertise with observability tools: Grafana, Prometheus, Loki, Tempo (or equivalent).

Proven track record designing dashboards and alerts around golden signals and the USE (Utilization, Saturation, Errors) and RED (Rate, Errors, Duration) methodologies.

Solid scripting/automation skills in Python and Bash; familiarity with GitLab CI pipelines.

Operational experience with Kubernetes and containerized workloads.

Working knowledge of AWS services, networking fundamentals, and load balancing.

Experience running incident response and writing actionable post‑mortems.

Familiarity with Infrastructure as Code (Terraform, Ansible) and configuration management.

Exposure to regulated environments and multi‑region architectures is a plus.

Strong communication and collaboration skills; comfortable acting as a generalist across infrastructure, application, and data layers.

PsiQuantum provides equal employment opportunity for all applicants and employees. PsiQuantum does not unlawfully discriminate on the basis of race, color, religion, sex (including pregnancy, childbirth, or related medical conditions), gender identity, gender expression, national origin, ancestry, citizenship, age, physical or mental disability, military or veteran status, marital status, domestic partner status, sexual orientation, genetic information, or any other basis protected by applicable laws.

Note: PsiQuantum will only reach out to you using an official PsiQuantum email address and will never ask you for bank account information as part of the interview process. Please report any suspicious activity to recruiting@psiquantum.com.

We are not accepting unsolicited resumes from employment agencies.

The ranges below reflect the target ranges for a new hire's base salary. One is for the Bay Area (within 50 miles of HQ, Palo Alto); the second (if applicable) is for elsewhere in the US (beyond 50 miles of HQ, Palo Alto). If there is only one range, it applies to the specific location where the position will be based. Actual compensation may fall outside these ranges and depends on various factors, including but not limited to a candidate's qualifications, including relevant education and training, competencies, experience, geographic location, and business needs. Base pay is only one part of the total compensation package. Full-time roles are eligible for equity and benefits. Base pay is subject to change and may be modified in the future.

U.S. Base Pay Range

$120,000—$140,000 USD

Bay Area Pay Range

$145,000—$165,000 USD

Seniority level

Mid-Senior level

Employment type

Full-time

Job function

Engineering and Information Technology

Industries

Computer Hardware Manufacturing
