Talent.com
Site Reliability Engineer
Amiri Recruiting • Mountain View, CA, US
Full-time

Job Description

Site Reliability Engineer

Onsite: Bay Area, CA

What You’ll Do (Day-to-Day)

Own and manage our cloud infrastructure (GCP or AWS, plus on-prem).

Build, maintain, and optimize Kubernetes clusters (including GPU-backed clusters).

Implement and improve CI/CD pipelines (GitHub Actions).

Write and maintain Infrastructure as Code (Terraform).

Monitor system health and performance using Grafana and other observability tools.

Ensure high availability, reliability, and uptime across platforms.

Handle infrastructure maintenance, upgrades, and scaling.

Administer and improve our platform architecture and apply general security best practices across the stack.

Note: This is an internal-facing role with no customer interaction.

Must-Have:

4+ years in SRE, DevOps, or Infrastructure Engineering

Solid experience with GCP or AWS (hybrid/on-prem a plus)

Experience with Kubernetes cluster management (GPU experience a bonus)

Hands-on experience with Terraform and CI/CD (GitHub)

Experience with monitoring/observability (Grafana, etc.)

Strong understanding of high availability and infrastructure reliability

Familiarity with platform/cluster architecture and administration

Security mindset and ability to apply best practices

Nice-to-Have:

Startup experience (you enjoy building, not just maintaining)

Experience with scalable GPU infrastructure for AI/ML

Related jobs

Site Reliability Engineer

Fortinet • Sunnyvale, CA, United States
Full-time
At Fortinet, we strive to provide a supportive, collaborative environment where people are empowered to do the best work of their careers. Our team members enjoy solving complex problems, and obsess...

Site Reliability Engineering

Forhyre • Sunnyvale, CA, US
Full-time
Forhyre is looking for engineers who can bring unique perspectives and innovative ideas to all areas of development and are interested in continuing to improve our platform through the ever-changin...

Senior Technology Site Reliability Engineer

Cooley LLP • Palo Alto, CA, United States
Full-time
Cooley is seeking a Senior Site Reliability Engineer to join the Infrastructure & Development Operations. The Senior Technology Site Reliability Engineer (...

Site Reliability Engineer, AI / ML Infrastructure

Boson AI • Santa Clara, CA, US
Full-time
We're looking for a Senior Site Reliability Engineer to help us run one of the most exciting GPU clusters around: our Toronto datacenter packed with NVIDIA H100 and A100 GPUs, over 20PB of...

Site Reliability Engineer

PsiQuantum • Palo Alto, CA, United States
Full-time
Quantum computing holds the promise of humanity's mastery over the natural world, but only if we can build a... PsiQuantum is on a mission to build the first real, useful quantum computers, capable of...

Site Reliability Engineer - Kubernetes Platform

Pantera Capital • Palo Alto, CA, United States
Full-time
xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excelle...

Site Reliability Engineer - Remote

PayNearMe • Santa Clara, CA, US
Full-time • Remote
At PayNearMe, we’re on a mission to make paying and getting paid as simple as possible. We build innovative technology that transforms the way businesses and their customers experience payment...

Sr Principal Site Reliability Engineer (SASE)

Palo Alto Networks • Santa Clara, CA, US
Full-time
At Palo Alto Networks®, everything starts and ends with our mission: being the cybersecurity partner of choice, protecting our digital way of life. Our vision is a world where each day is safer a...

Customer Reliability Engineer

Cisco Systems, Inc. • San Jose, CA, United States
Full-time
This is a fully remote position open to candidates located in the United States with a strong preference for candidates based on the West Coast, with the ability to work in the Pacific Time Zone. Ap...

Site Reliability Engineer

Foxconn Industrial Internet - FII • San Jose, CA, US
Full-time +1
Foxconn Industrial Internet (Fii) is a world-leading professional design and manufacturing service provider of communication network equipment, cloud service equipment, precision tools and industr...

Site Reliability Engineer - Observability

Rivian and Volkswagen Group Technologies • Palo Alto, CA, United States
Full-time
Senior Site Reliability Engineer (SRE), RivianVW's Data Platform - Production Engineering team. In this role, you will design, implement, and scale robust observability systems to ensure the health, ...

Site Reliability Engineer (SRE)

OPPO • Palo Alto, CA, United States
Full-time
OPPO US Research Center is seeking a skilled and proactive Site Reliability Engineer (SRE). In this role, you will be responsible for ensuring the stability, scalability, and performance of our appl...

Site Reliability Engineer (L2)

Wave Money • Palo Alto, CA, United States
Full-time
Job Location: The Campus, Pun Hlaing Estate, Hlaing Thar Yar Township, Yangon. Working Hours: 8:30 AM to 5:30 PM (Monday to Friday). Site Reliability Engineer is to perform daily support and monitor...

Staff Site Reliability Engineer

Grindr • Palo Alto, CA, United States
Full-time
This range is provided by Grindr. Your actual pay will be based on your skills and experience; talk wit...

Site Reliability Engineer

Cryptoware Technologies Inc • Santa Clara, CA, US
Full-time
Lead the global expansion of Huobi's globe-spanning infrastructure. Work with engineering teams to make sure new features and changes are deployed quickly and safely. Constantly improve our s...

Site Reliability Engineer (SRE) at OPPO US Research Center Palo Alto, CA

OPPO US Research Center • Palo Alto, CA, United States
Full-time
OPPO US Research Center is seeking a skilled and proactive Site Reliability Engineer (SRE). In this role, you will be responsible for e...

Site Reliability Engineer - Kubernetes Platform

xAI • Palo Alto, CA, US
Full-time
xAI's mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering exc...

Sr. Site Reliability Engineer (SRE)

Avenue Code • Mountain View, CA, United States
Full-time
We’re seeking an experienced, highly collaborative SRE to partner with product teams and tackle our most critical infrastructure challenges. You’ll be hands-on in designing, building, and operating ...