Talent.com
Senior Engineering Manager - Compute Server Bring Up
NVIDIA • Santa Clara, CA, US
Job type: Full-time

Senior Engineering Manager

NVIDIA data center systems have become core to NVIDIA's rapidly growing enterprise and cloud provider businesses. These platforms bring together the full power of NVIDIA GPUs, NVIDIA NVLink, NVIDIA Networking, NVIDIA Data Center CPUs, and a fully optimized NVIDIA AI and HPC software stack.

We are seeking an excellent Senior Engineering Manager to lead the Compute Server Bring-Up team. This team is responsible for the bring-up, integration, validation, and troubleshooting of compute tray platforms in GPU racks, ensuring servers are fully functional and validated against requirements before mass deployment in data centers. You will directly lead all aspects of a group of bring-up engineers and form a larger virtual team spanning NVIDIA software and firmware teams to ensure successful bring-up of compute platforms, both internally and with customers.

What you'll be doing:

  • Own Initial Power-On and Board Bring-Up: Lead the initial power-on and functional validation of compute trays (CPU, GPU, NIC, storage including NVMe, cooling, etc.), both internally and with customers. Ensure all functional requirements are met.
  • Form and lead a virtual team across NVIDIA software and firmware teams to ensure subject matter experts are available as needed throughout bring-up. Report regularly on bring-up status to provide visibility and ensure teams across the company are fully engaged to help.
  • Oversee flashing, updating, and validation of firmware for all server components according to the defined architecture. Ensure appropriate boundary, stress, and regression testing is performed, and confirm that telemetry, logging, and hardware management features work as required. Document pain points, bring-up failures, and recovery flows, and provide actionable feedback to hardware, firmware, and software teams. Ensure usability, firmware/BIOS update coverage, and error reporting for reliable customer installation and operation.
  • Factory & Manufacturing Support: Support manufacturing flows, firmware updates, and diagnostic procedures. Ensure BOM change sign-off and process optimization.
  • Debug, Issue Resolution & Customer Support: Lead root cause analysis and resolution of bring-up failures. Collaborate with partners, ODMs, and customers on technical support.
  • Documentation & Knowledge Transfer: Own and maintain platform design guides, bring-up checklists, and installation instructions. Provide training and enablement for internal and external teams.
  • Product Ownership: Drive product life cycles with QA teams, ensuring robust bring-up, productization, and delivery.
  • Performance Management: Conduct performance evaluations, develop a culture of excellence, and ensure high productivity.

What we need to see:

  • 5+ years of relevant experience managing systems/platform software teams, ideally in server bring-up, firmware development, or data center solutions. Deep experience operating successfully in a matrix environment, forming and leading high-impact virtual teams spanning multiple disciplines.
  • BS, MS, or PhD in EE/CS or a related field (or equivalent experience) with 12+ years of overall experience. Strong knowledge of compute tray designs, firmware enablement, and system-level architecture.
  • Proven track record of delivering scalable server products and solutions for large-scale data centers. Experience collaborating with hardware, firmware, manufacturing, diagnostics, and QA teams.
  • Experience with SCM (Git, Perforce) and project management tools (Jira).
  • Excellent written and oral communication skills, strong work ethic, and dedication to teamwork.
  • Hands-on experience with x86/ARM system architecture and coding (C/C++, Python).
  • You are a self-starter who loves to find creative solutions to complicated problems.
  • Proven excellence in server architecture, collaborating across teams to deliver server products against defined Key Performance Indicators (KPIs).

Ways to stand out from the crowd:

  • Experience leading bring-up for sophisticated compute architectures like GB200 NVL72.

NVIDIA is widely considered to be one of the technology world's most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. If you're creative, hardworking, and self-motivated, we want to hear from you!

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 272,000 USD - 425,500 USD. You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until November 25, 2025. NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Similar jobs

Senior Manager, Distributed Cloud Engineering
F5 • San Jose, CA, United States
Full-time
A leading cloud solutions provider is seeking a Senior Manager of Engineering in San Jose, CA. In this role, you will lead and mentor a team of engineers focused on designing and maintaining distrib...

Solution Manager - Servers (27179)
Supermicro • San Jose, CA, United States
Full-time
Supermicro is a Top Tier provider of advanced server, storage, and networking solutions for Data Center, Cloud Computing, Enterprise IT, Hadoop / Big Data, Hyperscale, HPC and IoT / Embedded customers...

Linux Engineering Manager - Optimisation for Latest Hardware
Canonical • San Jose, CA, United States
Full-time
Be among the f...

Senior Software Engineering Manager, Compute Systems Software
General Motors • Mountain View, CA, United States
Full-time
Hybrid: This role is categorized as hybrid. This means the successful candidate is expected to report to Mountain View, CA, three times per week, at minimum. The Vehicle Experiences Engine (VEE) at G...

Senior Technology Site Reliability Engineering Manager
Cooley LLP • Palo Alto, CA, United States
Full-time
Cooley is seeking a Senior Site Reliability Engineering Manager to join the Infrastructure & Development Operations. The Senior Technology Site...

Manager, End User Engineering
Zscaler • San Jose, CA, United States
Full-time
Our general and administrative teams help to support and scale our great company. Whether striving to grow our workforce, nurture an amazing culture and work environment, support our financial and l...

Distributed Cloud Engineering Leader
F5 Networks, Inc. • San Jose, CA, United States
Full-time
A leading network and security company is seeking an Engineering Sr Manager to lead a team building and operating distributed cloud services. The role requires over 10 years of software engineering ...

Manager, Solution Engineering
Support Revolution • San Jose, CA, United States
Full-time
Supermicro is a Top Tier provider of advanced server, storage, and networking solutions for Data Center, Cloud Computing, Enterprise IT, Hadoop / Big Data, Hyperscale, HPC and IoT / Embedded customers...

Engineering Manager, Terraform Registry — Global Infra Lead
IBM • San Jose, CA, United States
Full-time
A leading technology company is seeking a Software Engineering Manager for its HashiCorp Terraform Registry team in San Jose, California. In this role, you will manage a globally distributed team of...

Senior Product Manager - GPU Server & Cloud Infrastructure
Super Micro Computer Spain, S.L. • San Jose, CA, United States
Full-time
A leading tech company seeks a Sr. Product Manager to lead GPU server and workstation product development. The role involves collaboration with marketing and engineering teams, direct customer engage...

Senior Manager, Foundry Customer Engineering
Samsung Semiconductor • San Jose, CA, US
Full-time
To provide the best candidate experience amidst our high application volumes, each candidate is limited to 10 applications across all open jobs within a 6-month period. Advancing the World's Tec...

Software Engineering Manager II, Google Cloud Compute
Google Inc. • Sunnyvale, CA, United States
Full-time
Note: By applying to this position you will have an opportunity to share your preferred working location from the following: Sunnyvale, CA, U...

Software Engineering Manager II, Google Cloud Compute
Google • Sunnyvale, CA, United States
Full-time
Bachelor’s degree, or equivalent practical experience. Python, C, C++, Java, JavaScript). M...

Manager, Software Engineering - Enterprise Infrastructure
LinkedIn • Mountain View, CA, United States
Full-time
Our products help people make powerful connections, discover exciting opportunities, build necessary skills, and gain valuable insights eve...

Sr. Engineering Manager - Storage Engineering
Cloudera • San Jose, CA, United States
Full-time
Business Area: Engineering • Seniority Level: Mid-Senior level • Job Description: At Cloudera, we empower people to transfor...

Senior Engineering Manager - Platform Integrations
Intuit • Mountain View, CA, United States
Full-time
A leading financial technology company is seeking a Senior Engineering Manager in Mountain View to lead a strategic team focusing on platform integrations and AI innovations. The ideal candidate wil...

Senior Manager, Performance Engineering (Cortex Cloud)
Palo Alto Networks • Santa Clara, CA, US
Full-time
At Palo Alto Networks® everything starts and ends with our mission: Being the cybersecurity partner of choice, protecting our digital way of life. Our vision is a world where each day is safer a...

Senior Engineering Manager - Developer Portal
ID.me • Mountain View, CA, US
Full-time
Consumers can verify their identity with ID.me. Over 152 million users experience streamlined login and identity verification with ID.me. More than 600+ consumer brands use ID.me. Commerce Department and is ap...