Talent.com
Principal Site Reliability Engineer (Prisma AIRS)
Palo Alto Networks • Santa Clara, CA, US
Job type: Full-time

Job Description

Company Description

Our Mission

At Palo Alto Networks® everything starts and ends with our mission:

Being the cybersecurity partner of choice, protecting our digital way of life.

Our vision is a world where each day is safer and more secure than the one before. We are a company built on the foundation of challenging and disrupting the way things are done, and we’re looking for innovators who are as committed to shaping the future of cybersecurity as we are.

Who We Are

We believe collaboration thrives in person. That’s why most of our teams work from the office full time, with flexibility when it’s needed. This model supports real-time problem-solving, stronger relationships, and the kind of precision that drives great outcomes.

Job Description

Your Career

Palo Alto Networks Prisma AIRS leads the industry in advanced AI Security Capabilities, including Runtime Security, Model Scanning and Red Teaming. As a Site Reliability Engineer, you will be embedded directly in an engineering team, enabling deep collaboration with Software Engineers, AI/ML Researchers, Architects, and Product Managers. You will have a direct and fulfilling impact on the future of AI Security.

A Principal Site Reliability Engineer in Prisma AIRS embodies integrity, creativity, and a tireless dedication to continuous improvement. You will have the opportunity to design, build and operate cutting-edge cloud-native applications from the ground up at massive scale. We're looking for resourceful and discerning engineers with a diverse technology background and a bias for action who will accelerate the team with creativity, experience and clean code.

Your Impact

Operate Prisma AIRS Cloud Services through contemporary Reliability Engineering practices.

Design, Build, Operate and Secure Cloud-Native Microservice Applications at Global Scale.

Own End-to-End Service Delivery in Production - Availability, Performance, Scalability, Security.

Partner with Software & ML Engineers to design and build new capabilities and features.

Banish toil through automation - from shell scripting to cluster orchestration to dynamic CI pipelines.

Gain a deep understanding of how we deliver AI Security; you'll be able to troubleshoot a production issue end-to-end, from an inbound HTTP request, through the network, webserver, model inferencing, and database, down to the hardware layer.

Qualifications

Your Experience

You must be an expert in all things Kubernetes; you have a deep understanding of Kubernetes concepts, experience with building and operating production applications in multi-cluster environments, writing Helm charts from scratch and interacting with the Kubernetes API.

You must be an expert in either GCP or AWS, with at least 5 years of experience building and operating production cloud infrastructure at scale.

You must have significant Software Engineering/Development experience building applications in Go and/or Python.

You should have demonstrated experience in network operations, such as cloud networking, network security, and/or distributed computing systems.

You should have demonstrated experience in Linux administration, particularly in the context of cloud-native distributed systems, container runtimes, or Linux server fleets.

You should have experience with Relational Databases and SQL; you know how to read, write and refactor SQL queries, identify opportunities for and design secondary indexes, manage database objects such as tables, views, and stored procedures, and perform backup/restore operations.

You should have experience designing, building and maintaining CI and/or GitOps pipelines for complex multi-application/multi-environment projects.

You should have experience in building application observability through Prometheus/OpenTelemetry metrics, Structured Logging or Distributed Tracing systems.

You may have practical experience in Information Security, such as Cloud/Application/Network Security, and be familiar with compliance programs such as SOC2, ISO/IEC 27001, PCI-DSS, and FedRAMP, or control frameworks such as MITRE ATT&CK, NIST 800-53, OWASP or others.

You may have experience with running LLM/Machine Learning Inferencing Servers at scale across heterogeneous multi-GPU cloud environments.

Additional Information

The Team

Our Prisma AIRS team is a group of highly motivated and innovative engineers and researchers dedicated to solving the most challenging problems in AI security. We thrive in a collaborative environment where we value creativity, ownership, and a commitment to excellence. You will have the opportunity to work with cutting-edge technology and make a significant impact on the future of cybersecurity.

Compensation Disclosure

The compensation offered for this position will depend on qualifications, experience, and work location. For candidates who receive an offer at the posted level, the starting base salary (for non-sales roles) or base salary + commission target (for sales/commissioned roles) is expected to be between $151,600 - $245,294/YR. The offered compensation may also include restricted stock units and a bonus. A description of our employee benefits may be found here.

Our Commitment

We’re problem solvers who take risks and challenge cybersecurity’s status quo. It’s simple: we can’t accomplish our mission without diverse teams innovating, together.

We are committed to providing reasonable accommodations for all qualified individuals with a disability. If you require assistance or accommodation due to a disability or special need, please contact us at accommodations@paloaltonetworks.com.

Palo Alto Networks is an equal opportunity employer. We celebrate diversity in our workplace, and all qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or other legally protected characteristics.

All your information will be kept confidential according to EEO guidelines.
