Talent.com
Senior Infrastructure Engineer - Supercomputing
Institute of Foundation Models • Sunnyvale, CA, US
Posted 30 days ago
Job type: Full-time

Job Description

About the Institute of Foundation Models

We are a dedicated research lab for building, understanding, using, and risk-managing foundation models. Our mandate is to advance research, nurture the next generation of AI builders, and drive transformative contributions to a knowledge-driven economy.

As part of our team, you'll work at the core of cutting-edge foundation model training alongside world-class researchers, data scientists, and engineers, tackling the most fundamental and impactful challenges in AI development. You will help develop groundbreaking AI solutions with the potential to reshape entire industries. Strategic, innovative problem-solving will be instrumental in establishing MBZUAI (Mohamed bin Zayed University of Artificial Intelligence, the Institute's parent university) as a global hub for high-performance computing in deep learning, driving discoveries that inspire the next generation of AI pioneers.

The Role

We operate some of the world's largest GPU supercomputing clusters to support cutting-edge AI research and large-scale model deployment. We're looking for a Senior Infrastructure Engineer to join our core platform team and help build, operate, and scale our hybrid infrastructure across both on-prem and cloud environments.

This role is ideal for engineers who thrive at the intersection of distributed systems, cloud automation, and high-performance computing.

Key Responsibilities

  • Operate and scale high-performance GPU clusters used for AI training and production inference.
  • Manage infrastructure across on-premises (Slurm-based) HPC environments and cloud providers such as AWS and Azure.
  • Implement and maintain Infrastructure as Code using Pulumi, Terraform, or Ansible.
  • Enhance and secure deployment pipelines using Kubernetes, Flux, and ArgoCD.
  • Help define and enforce security best practices for internal researchers and production services.
  • Continuously improve observability, resiliency, and operational tooling across environments.

Tech Stack

  • Kubernetes, Slurm
  • Pulumi, Terraform, Ansible
  • Rust and Go
  • Flux, ArgoCD
  • AWS, Azure

Professional Experience

  • Strong experience managing compute infrastructure in hybrid environments (on-prem and cloud).
  • Hands-on experience operating Slurm clusters at scale.
  • Proficiency in deploying and managing containerized applications, ideally written in Rust or Go.
  • Solid background in IaC and CI/CD best practices.
  • Experience working with GPU workloads or HPC infrastructure is a strong plus.
  • Familiarity with securing and monitoring multi-tenant compute environments.

Salary depends on level.

Visa Sponsorship

This position is eligible for visa sponsorship.

Benefits Include

  • Comprehensive medical, dental, and vision benefits
  • Bonus
  • 401(k) plan
  • Generous paid time off, sick leave, and holidays
  • Paid parental leave
  • Employee Assistance Program
  • Life and disability insurance

Related Jobs

    Senior Cloud Infrastructure Engineer
    Five9 • San Ramon, CA, US
    Full-time
    Join us in bringing joy to customer experience. Five9 is a leading provider of cloud contact center software, bringing the power of cloud innovation to customers worldwide. Living our values every day...

    Hardcore Engineer - Infrastructure / Supercomputing
    xAI • Palo Alto, CA, US
    Full-time
    xAI's mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering exc...

    Hardcore Engineer - Multimodal Infrastructure
    xAI • Palo Alto, CA, US
    Full-time
    xAI's mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering exc...

    Infrastructure Engineer
    Meshy • Sunnyvale, CA, US
    Full-time
    Meshy is the leading 3D generative AI company on a mission to. Meshy makes it effortless for both professional artists and hobbyists to create unique 3D assets—turning text and images into stu...

    Senior Infrastructure Engineer InfraOps
    BitGo • Palo Alto, California, USA
    Full-time
    BitGo is the leading infrastructure provider of digital asset solutions, delivering custody, wallets, staking, trading, financing, and settlement services from regulated cold storage. Since our founding i...

    Senior Infrastructure Linux & DevOps Engineer
    Matrix Precise, Inc. • Pleasanton, CA, US
    Full-time
    Infra Linux Engineer's primary function will be to advance the infrastructure team from a traditional infrastructure methodology to an infrastructure as code approach. You will be responsible ...

    Sr. Software Engineer, Traffic Infrastructure
    Genesis10 • Sunnyvale, CA, US
    Permanent
    Genesis10 is currently seeking a Sr. Software Engineer, Traffic Infrastructure with our client in their Sunnyvale, CA location. This is a 6+ month contract remote position. Summary: As part of our wo...

    Sr. Cloud Infrastructure Engineer TechOps CICD
    CrowdStrike • Sunnyvale, California, USA
    Full-time
    As a global leader in cybersecurity, CrowdStrike protects the people, processes, and technologies that drive modern organizations. Since 2011, our mission hasn't changed: we're here to stop breaches and w...

    Senior Infrastructure Engineer (Core Infra, US)
    Workato • Palo Alto, CA, US
    Full-time
    Workato transforms technology complexity into business opportunity. As the leader in enterprise orchestration, Workato helps businesses globally streamline operations by connecting data, processes, ...

    Staff Infrastructure Engineer
    Crusoe • Sunnyvale, CA, US
    Full-time
    Crusoe's mission is to accelerate the abundance of energy and intelligence. We're crafting the engine that powers a world where people can create ambitiously with AI — without sacrif...

    Enterprise Cloud Infrastructure Engineer
    InsideHigherEd • Stanford, California, United States
    Full-time
    Enterprise Cloud Infrastructure Engineer. Business Affairs: University IT (UIT), Redwood City, California, United States. Information Technology Services. Post date: Sep 05, 2025. 107211 Requisi...

    (Senior) Software Engineer, Infrastructure (Kubernetes Platform)
    pony.ai • Fremont, CA, US
    Full-time
    Founded in 2016 in Silicon Valley, Pony.Operating Robotaxi, Robotruck and Personally Owned Vehicles (POV) business units, Pony. CNBC Disruptor list of the 50 most innovative and disruptive tech comp...

    Infrastructure Engineer
    Dtex Systems • Fremont, CA, US
    Full-time
    DTEX is seeking an experienced Site Reliability Engineer (SRE) with a strong software engineering background to help drive modernization of our infrastructure and operations. This is a high-impact r...

    Senior Staff Cloud Infrastructure Engineer
    Zscaler • San Jose, California, USA
    Full-time
    Zscaler accelerates digital transformation so our customers can be more agile, efficient, resilient, and secure. Our cloud-native Zero Trust Exchange platform protects thousands of customers from cyber...

    Senior Infrastructure Engineer
    Crusoe • Sunnyvale, CA, US
    Full-time
    Crusoe's mission is to accelerate the abundance of energy and intelligence. We're crafting the engine that powers a world where people can create ambitiously with AI — without sacrif...

    Cloud Infrastructure Engineer
    Forhyre • Sunnyvale, CA, US
    Full-time
    Do you enjoy solving technical issues, empathize with customer user experiences and want to keep up with the latest tech? We are looking for a Cloud Infrastructure Engineer that will work with tale...

    Design Infrastructure Engineer
    Etched • Cupertino, CA, US
    Full-time
    Etched is building AI chips that are hard-coded for individual model architectures. Our first product (Sohu) only supports transformers, but has an order of magnitude more throughput and lower laten...

    Senior Systems Engineer (Contract)
    Blue Star Partners LLC • Pleasanton, CA, US
    Full-time
    Pleasanton, CA – 100% onsite – Local candidates only. Strong potential for extension/direct hire. Hours over 40 will be paid at Time and a Half. The Senior Systems Engineer (Contract) will...