Talent.com
Senior High Performance Computing Cluster Administrator
NVIDIA • Remote, CA, US
Job type: Full-time, Remote

NVIDIA's Deep Learning Optimized Frameworks Group is looking for a deeply technical HPC cluster administrator to lead a diverse cluster of GPU-accelerated systems and provide architectural mentorship to product teams in the deep learning and scientific computing domains. As a member of the DLFW Infrastructure team, you will provide leadership in the design and implementation of a groundbreaking GPU compute cluster that runs demanding deep learning, high performance computing, and other computationally intensive workloads. We are looking for an expert to identify architectural changes and/or completely innovative approaches for our GPU compute cluster. In this role, you will help us with the strategic challenges we encounter, including compute, networking, and storage design for large-scale, high-performance workloads and effective resource utilization in a heterogeneous compute environment.

What you'll be doing:

Administer Linux systems, ranging from powerful DGX servers to embedded systems, and from bring-up hardware to publicly available systems.

Coordinate storage solutions and plan for growth.

Automate configuration management, software updates, maintenance, and monitoring of system availability using modern DevOps tools (Ansible, GitLab, etc.).

Actively connect with management regarding any problems with the equipment and propose resolutions.

Plan, build, and install or upgrade new systems that support NVIDIA DL software.

What we need to see:

You have a BA, BS, or MS in CS, EE, CE or equivalent experience

4+ years of experience deploying and administering HPC clusters

Familiarity with resource schedulers (Slurm preferred, LSF, etc.)

Proven ability to script in Bash, Perl, or Python

Experience with containers (Docker, Singularity, LXC)

Deep understanding of operating systems, computer networks, and high-performance applications

Ability to work well with developers and test engineers

Dedication to providing quality support for your users

Ways to stand out from the crowd:

Familiarity and prior work experience with technologies such as Ansible, Git, Slurm, Zabbix, Prometheus, Grafana, and Docker

Familiarity with GPU usage in compute clusters and with CUDA

Experience with mobile and embedded systems

Basic knowledge of deep learning

Experience coding or scripting in Perl, Python, or Bash

The base salary range is 148,000 USD - 230,000 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and . NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
