Talent.com
Senior HPC Cluster Systems Administrator
Lawrence Berkeley National Laboratory • Berkeley, CA, United States

Berkeley Lab's (LBNL) Information Technology Division (IT) has an opening for a Senior HPC Cluster Systems Administrator to join their ScienceIT Team!

In this exciting role, you will support the Berkeley Lab research community by building, integrating, and maintaining Linux-based resources, high-performance computing cluster systems, and Kubernetes clusters. This role provides extensive expertise in High Performance Computing infrastructure and delivers advanced Linux solutions to further scientific endeavors at Berkeley Lab. The mission of Scientific Computing under ScienceIT is to facilitate groundbreaking fundamental research globally by providing essential computing tools, networks, and expertise to enable pioneering science.

This position has an anticipated start date of January 5, 2026.

We're here for the same mission, to bring science solutions to the world. Join our team and YOU will play a supporting role in our goal to address global challenges! Have a high level of impact and work for an organization associated with 17 Nobel Prizes!

Why join Berkeley Lab?

We invest in our employees by offering a total rewards package you can count on:

  • Exceptional health and retirement benefits, including pension or 401K-style plans
  • Opportunities to grow in your career - check out our Tuition Assistance Program
  • A culture where you'll belong - we are invested in our teams!
  • In addition to accruing vacation and sick time, we also have an annual Winter Holiday Shutdown
  • Parental bonding leave (for both mothers and fathers)
  • Pet insurance

What You Will Do:

  • Perform Linux system and HPC cluster maintenance and installations, operating system upgrades, system security hardening and intrusion detection, storage and file system management, system hardware, customization of user group working environment, troubleshooting, network monitoring, and crash recovery.
  • Design, deploy, and manage scalable applications using Kubernetes, ensuring the availability, performance, and readiness of the Kubernetes infrastructure.
  • Automate deployment, scaling, and management of containerized applications, and collaborate with DevOps and development teams to streamline CI/CD pipelines.
  • Design, deploy, and manage the global storage platform to ensure high performance, massive scalability, reliability, and future-proof solutions.
  • Support storage technologies such as Lustre and VAST, as well as networks.
  • Resolve I/O issues related to business applications, including diagnosing and resolving complex storage, Linux, and networking challenges in a fast-paced environment.
  • Research new storage management technologies and techniques, and provide recommendations.
  • Participate in developing system administration, security, and network policies, documentation, and tools oriented towards efficient systems management.
  • Participate in cluster support to staff and researchers, including initial installation, integration, and ongoing maintenance of Linux High-Performance Computing cluster systems. This includes travel to remote sites as needed.
  • Co-lead technical efforts with other senior system administrators in areas of HPC technologies such as job schedulers, high-performance interconnects, parallel file systems, cybersecurity, cluster management, container orchestration, VM infrastructure, networking, performance tuning, or data center planning.
  • Co-lead group projects of small to medium size and complexity to implement and deploy new computing technologies and associated services to the research community.

What We Are Looking For:

  • A Bachelor's Degree (or equivalent knowledge/training) in Computer Science, Engineering, or a related discipline, and a minimum of 12 years of relevant experience in Linux system administration within a large distributed computing environment, including experience providing systems and end-user support for multiple scientific or computational research groups, or an equivalent combination of education and experience.
  • Demonstrated ability to manage large-scale, performance-critical environments, including capacity planning, scaling, and optimization.
  • Significant experience deploying, scaling, and managing Kubernetes clusters, with a strong understanding of its architecture (pods, deployments, services, ingress) and container orchestration. Proven proficiency with CI/CD tools like Jenkins or GitLab CI.
  • Proven experience with Red Hat derivatives (CentOS, Scientific Linux, Rocky Linux), Debian, Ubuntu, and large-scale system and configuration management tools (Kickstart, Ansible, Puppet, Chef, Warewulf). Expertise in supporting standard services (NFS, LDAP, SMB, MySQL, Apache/Nginx HTTPD).
  • Strong HPC expertise, including Linux, job schedulers, high-performance interconnects, parallel file systems, cybersecurity, container orchestration, cluster management, VM infrastructure, networking, performance tuning, scientific application support, and data center planning.
  • Proficiency in Python and Bash for building, optimizing, and debugging scientific codes (C, C++, Fortran, Java), including experience with compilers (GCC, Intel), debuggers, Makefiles, and version control (git, Subversion).
  • Expertise in storage system design and optimization (Lustre, S3, VAST, Weka, Ceph, DDN), including a deep understanding of the storage stack (kernel to user space, including file systems, block storage, I/O schedulers, VFS), storage benchmarking, and performance tuning (throughput, latency, IOPS, workload-specific optimizations).
  • Excellent oral and written communication skills, including experience organizing and presenting customer-focused technical data, reports, and projects to audiences with varying degrees of technical expertise.
  • Strong interpersonal skills including experience with research facilitation and project management in a multidisciplinary team environment.

Desired Qualifications:

  • An Advanced Degree (or equivalent knowledge/training) in Computer Science, Engineering, or a related discipline.
  • Experience with software engineering and/or software development.
  • Familiarity with Kubernetes-related tools like Helm, Istio, and Prometheus.
  • Demonstrated experience supporting research at a National Lab and/or in an academic or research environment.

Additional Information:

  • Application Deadline: For full consideration, please apply with a resume and a cover letter describing your interest by December 19, 2025.
  • Appointment Type: This is a full-time, career appointment, exempt (monthly paid) from overtime pay.
  • Salary Information: This position is expected to pay $178,644 - $218,364 annually, which fits within the full salary range of $158,808 - $267,996 annually for job code C70.4. It is not typical for an individual to be offered a salary at or near the top of the range for a position. Salary for this position will be commensurate with the final candidate's qualifications and experience, including skills, knowledge, relevant education, and certifications, and aligned with the internal peer group.
  • Background Check: This position may be subject to a background check. Any convictions will be evaluated to determine if they directly relate to the responsibilities and requirements of the position. Having a conviction history will not automatically disqualify an applicant from being considered for employment.
  • Work Modality: This position is eligible for a hybrid work schedule - a combination of teleworking and performing work on site at Lawrence Berkeley National Lab, 1 Cyclotron Road, Berkeley, CA 94720. Work schedules are dependent on business needs. Individuals working a hybrid schedule must reside within 150 miles of Berkeley Lab. Starting May 7, a REAL ID or other acceptable form of identification is required to access Berkeley Lab sites.
  • Relocation: This position is eligible for relocation assistance.
  • Work Authorization: Applicants must be legally authorized to work in the United States. Berkeley Lab does not provide visa sponsorship for this position.

Want to learn more about working at Berkeley Lab? Please visit: careers.lbl.gov

Equal Employment Opportunity Employer: The foundation of Berkeley Lab is our Stewardship Values: Team Science, Service, Trust, Innovation, and Respect; and we strive to build community with these shared values and commitments. Berkeley Lab is an Equal Opportunity Employer. We heartily welcome applications from all who could contribute to the Lab's mission of leading scientific discovery, excellence, and professionalism. In support of our rich global community, all qualified applicants will be considered for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, age, protected veteran status, or other protected categories under State and Federal law.

Berkeley Lab is a University of California employer. It is the policy of the University of California to undertake affirmative action and anti-discrimination efforts, consistent with its obligations as a Federal and State contractor.

Misconduct Disclosure Requirement: As a condition of employment, the finalist will be required to disclose if they are subject to any final administrative or judicial decisions within the last seven years determining that they committed any misconduct, are currently being investigated for misconduct, left a position during an investigation for alleged misconduct, or have filed an appeal with a previous employer.
