Talent.com
Software Engineer, Fleet Infrastructure

OpenAI • San Francisco, California, United States
Job type: Full-time

Job description

The Fleet team focuses on running the world’s largest, most reliable, and most frictionless GPU fleet to support OpenAI’s general-purpose model training and deployment. Work on this team ranges from:

Maximizing the number of GPUs doing useful work by building user-friendly scheduling and quota systems

Running a reliable, low-maintenance platform by building push-button automation for Kubernetes cluster provisioning and upgrades

Supporting research workflows with service frameworks and deployment systems

Ensuring fast model startup times through high-performance snapshot delivery, from blob storage down to hardware caching

Much more!

About the Role

As an engineer within Fleet Infrastructure, you will design, write, deploy, and operate infrastructure systems for model deployment and training on one of the world’s largest GPU fleets. The scale is immense, the timelines are tight, and the organization is moving fast; this is an opportunity to shape a critical system in support of OpenAI's mission to advance AI capabilities responsibly.

This role is based in San Francisco, CA. We use a hybrid work model of 3 days in the office per week and offer relocation assistance to new employees.

In this role, you will:

Design, implement, and operate components of our compute fleet, including job scheduling, cluster management, snapshot delivery, and CI/CD systems

Interface with researchers and product teams to understand workload requirements

Collaborate with hardware, infrastructure, and business teams to provide a high-utilization, high-reliability service

You might thrive in this role if you:

Have experience with hyperscale compute systems

Possess strong programming skills

Have experience working in public clouds (especially Azure)

Have experience working with Kubernetes

Have an execution-focused mentality paired with a rigorous focus on user requirements

As a bonus, have an understanding of AI/ML workloads

About OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.

We are an equal opportunity employer and do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, disability or any other legally protected status.

OpenAI Affirmative Action and Equal Employment Opportunity Policy Statement

For US-based candidates: Pursuant to the San Francisco Fair Chance Ordinance, we will consider qualified applicants with arrest and conviction records.

We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link.

OpenAI Global Applicant Privacy Policy

At OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.

Similar jobs

Flight Software Engineer

Planet Labs PBC • San Francisco, CA, United States
Full-time
We believe in using space to help life on Earth. Planet designs, builds, and operates the largest constellation of imaging satellites in history. This constellation delivers an unprecedented dataset ...

Software Engineer, Fleet Orchestration & Infra

OpenAI • San Francisco, CA, United States
Full-time
A leading AI research company in San Francisco seeks a Software Engineer for Fleet Management to design systems that integrate cloud and bare-metal infrastructure. The role involves optimizing large...

Staff Infrastructure Software Engineer, Enterprise AI

Scale AI, Inc. • San Francisco, CA, United States
Full-time
Scale GP is building the next generation of enterprise-grade Generative AI products. Our platform provides APIs for knowledge retrieval, inference, and evaluation, enabling customers to build and de...

Infrastructure Engineer

Kernel • San Francisco, California, United States
Full-time
AI agents use applications, starting with browsers. Our edge is an infrastructure platform that’s extensible, observable, and built for scale from day one. Our platform handles the hard stuff: spinni...

Infrastructure Engineer

Pangian • San Francisco, California, United States
Full-time
Example org is a leading software company. Example org allows real-time collaboration on important example workflows. Founded in 2012, we have over 10,000 customers worldwide and are backed by fantast...

Lead Cloud Infrastructure Engineer

Together AI • San Francisco, California, United States
Full-time
Together AI is hiring a Lead Cloud Infrastructure Engineer to own and operate the cloud foundation that powers our rapidly scaling data platforms. In this role, you will be the primary engineer resp...

Infrastructure Engineer

Mercor • San Francisco, California, United States
Full-time
Mercor is training models that predict how well someone will perform on a job better than a human can. We use our platform to source, vet, and onboard expert contractors who help train AI models in ...

Infrastructure Engineer

WorkOS • San Francisco, California, United States
Full-time
WorkOS builds tools and services for developers to help them implement authentication, identity, authorization, and overall enterprise readiness. We’re a fully distributed team with employees across...

Cloud Infrastructure Engineer

Florvets Structures • San Francisco, California, United States
Remote
Full-time
Position: Cloud Infrastructure Engineer. Florvets Structures is a leading construction and engineering company based in San Francisco, California. We specialize in building innovative and sustainable...

Software Engineer, Infrastructure

Sift Stack, Inc. • San Francisco, CA, United States
Permanent
Design, build, and maintain scalable, resilient infrastructure solutions to support our growing platform and customer base. Collaborate with software engineers to optimize application performance an...

Infrastructure Software Engineer

VirtualVocations • Oakland, California, United States
Full-time
A company is looking for an Infrastructure Software Engineer for their Developer Platform. Key Responsibilities: Build infrastructure to manage extensive metadata and user data, facilitating millio...

Infrastructure Engineer

Vibecode • San Francisco, California, United States
Full-time
We're democratizing software creation. Our platform lets anyone describe an idea and instantly turn it into a working application, no coding required. We're solving one of computing's fundamental chal...

MTS, Infrastructure Engineer

Delphina • San Francisco, California, United States
Full-time
Today’s Data Scientists are in pain: spending their time manually wrangling data, building models through slow trial and error, taking on painstaking rewrites for deployment, and dealing with coun...

Lead Infrastructure Engineer

PIP Labs • San Francisco, California, United States
Full-time
Story aims to grow the creativity of the internet. The internet has introduced Story is building the IP infrastructure for the internet era, where creativity and intelligence move at the speed of cu...

ML Infrastructure Engineer

Phizenix • Menlo Park, California, United States
Full-time
Menlo Park, CA | On-Site | Full-Time / Direct Hire. Client Opportunity | Through Phizenix. Phizenix, a certified minority and women-led recruiting firm, is hiring on behalf of an AI startup pioneering ...

Software Engineer - Infrastructure

Koah Labs • San Francisco, CA, United States
Full-time
Koah Labs is building the ad network to power the next generation of AI-native products. Our mission is to help publishers monetize and help advertisers reach the right audience, without compromisi...

Infrastructure Software Engineer

Alaris • San Francisco, CA, United States
Full-time
Alaris Security is building the core technology stack for intelligent cyber security. Designed for enterprise and defense organizations, our platform unifies security data through our proprietary gr...

Infrastructure Engineer

Tamarind Bio • San Francisco, California, United States
Full-time
We enable any scientist to access AI-powered drug discovery. Thousands of scientists from large pharma companies, top biotechs, and academic institutions use Tamarind to design protein drugs, improv...