Talent.com
Senior High Performance Computing Cluster Administrator

NVIDIA • Remote, CA, US
Posted 30 days ago
Job type
  • Full-time
  • Remote
Job description

NVIDIA's Deep Learning Optimized Frameworks Group is looking for a deeply technical HPC cluster administrator to lead a diverse cluster of GPU-accelerated systems and provide architectural mentorship to product teams in the deep learning and scientific computing domains. As a member of the DLFW Infrastructure team, you will provide leadership in the design and implementation of a groundbreaking GPU compute cluster that runs demanding deep learning, high-performance computing, and other computationally intensive workloads. We are looking for an expert who can identify architectural changes and/or completely new approaches for our GPU compute cluster. In this role, you will help us with the strategic challenges we encounter, including compute, networking, and storage design for large-scale, high-performance workloads and effective resource utilization in a heterogeneous compute environment.

What you'll be doing:

Administer Linux systems, ranging from powerful DGX servers to embedded systems and from bring-up hardware to publicly available systems.

Coordinate storage solutions and plan for growth.

Automate configuration management, software updates, maintenance, and monitoring of system availability using modern DevOps tools (Ansible, GitLab, etc.).

Actively communicate with management about any problems with the equipment and propose resolutions.

Plan, build, and install or upgrade new systems that support NVIDIA DL software.

What we need to see:

You have a BA, BS, or MS in CS, EE, CE or equivalent experience

4+ years of experience deploying and administering HPC clusters

Familiarity with resource schedulers (Slurm preferred, LSF, etc.)

Proven ability to script in Bash, Perl, or Python

Experience with containers (Docker, Singularity, LXC)

Deep understanding of operating systems, computer networks, and high-performance applications

Ability to work well with developers & test engineers

Dedication to providing quality support for your users

Ways to stand out from the crowd:

Familiarity and prior work experience with technologies such as Ansible, Git, Slurm, Zabbix, Prometheus, Grafana, and Docker

Familiarity with GPU usage in compute clusters and with CUDA

Experience with mobile and embedded systems

Basic knowledge of deep learning

Experience coding or scripting in Perl, Python, or Bash

The base salary range is 148,000 USD - 230,000 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
