Talent.com
Site Reliability Engineer - Hardware Specialist
xAI • Memphis, Tennessee, United States
Job type: Full-time

About xAI

xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company’s mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All engineers are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.

About the Role

As an SRE - Hardware Specialist, you will serve as a hardware reliability expert focused on firmware, hardware specifications, vendor relations, and failure analysis. You will proactively identify and resolve hardware issues, manage RMA processes, and stay ahead of emerging hardware technologies to support xAI's datacenter operations. This role demands deep technical expertise in hardware diagnostics, vendor negotiations, and forward-looking hardware evaluation.

Responsibilities

  • Analyze firmware packages and hardware specifications for upcoming releases to ensure compatibility, performance, and reliability in xAI's datacenter environment.
  • Investigate and diagnose hardware failures, including "grey failures" (ambiguous or intermittent issues), and prove through rigorous testing and data analysis that they are true hardware defects.
  • Manage vendor relationships, including initiating RMA (Return Merchandise Authorization) claims, negotiating beyond standard processes when necessary, and holding vendors accountable for resolutions.
  • Collaborate with Datacenter Operations Technicians to troubleshoot, repair, and optimize hardware systems in real time.
  • Research and evaluate next-generation hardware technologies that are not yet released, providing insights and recommendations to inform xAI's infrastructure roadmap.
  • Develop and implement monitoring tools, scripts, and processes to detect hardware anomalies early and minimize downtime.
  • Document failure modes, RMA outcomes, and hardware evaluations to build a knowledge base for the team.
  • Participate in on-call rotations and incident response for hardware-related issues in the Memphis datacenter.

Required Qualifications

  • Bachelor's degree in Systems Engineering, Electrical Engineering, Computer Science, or a related field (or equivalent experience).
  • 5+ years of experience in hardware reliability engineering, preferably in high-performance computing or datacenter environments.
  • Proven expertise in firmware analysis, hardware specifications review, and release validation.
  • Strong experience with RMA processes, including filing claims, vendor negotiations, and pushing for resolutions outside standard protocols.
  • Demonstrated ability to diagnose and prove complex hardware failures, including grey or intermittent issues, using tools, logic analyzers, or diagnostic software.
  • Familiarity with datacenter hardware components (e.g., servers, GPUs, networking equipment) and emerging technologies.
  • Proficiency in scripting languages (e.g., Python, Bash) for automation and analysis.
  • Excellent problem-solving skills with a data-driven approach to reliability engineering.
  • Ability to work collaboratively with cross-functional teams, including operations technicians.

Preferred Qualifications

  • Experience in AI / ML infrastructure or supercomputing environments.
  • Knowledge of vendor ecosystems (e.g., NVIDIA, Dell, HP, Supermicro) and supply chain management.
  • Certifications in hardware engineering or reliability (e.g., CRE, CompTIA Server+).
  • Prior work in a fast-paced startup or tech company like xAI.

xAI is an equal opportunity employer.
