Talent.com
HPC Performance and Validation Engineer

INSPYR Solutions • Dallas, TX, United States

Title : HPC Performance and Validation Engineer

Location : Dallas, Texas (Hybrid - relocation assistance available)

Duration : Full-time - direct hire

Compensation : $200,000 - $300,000 (Base) OTE : $400,000+

Work Requirements : US Citizen OR GC Holders

Do you want to tackle the biggest questions in finance with near-infinite compute power at your fingertips?

Our client is a leading quantitative research and technology firm, with offices in London and Dallas.

They are proud to employ some of the best people in their field and to nurture their talent in a dynamic, flexible and highly stimulating culture where world-beating ideas are cultivated and rewarded.

This is a hybrid role based in their new Dallas infrastructure hub where they work on the latest technologies in a cutting-edge environment.

The Role :

  • As an HPC Performance and Validation Engineer, you will take ownership of the validation and optimization of our HPC CPU and GPU calc farms.
  • This critical role will involve developing a validation and performance baselining framework, which ensures system readiness for AI/ML and HPC workloads across multiple architectures. Your role will be essential in providing continuous performance benchmarking, real-time observability, and long-term strategic readiness.
  • You will drive the implementation of advanced tooling and frameworks, maintaining an infrastructure that is crucial to our cutting-edge research efforts. You will be accountable for providing data-driven performance metrics to support architectural design choices as we continue to globally scale our data centre footprint.
  • We are looking for someone with deep technical expertise in compute, storage or networking optimizations and performance engineering who can develop solutions that scale with our growing infrastructure.
  • This role demands a forward-thinking engineer who can anticipate industry trends and adopt emerging architectures and strategies to keep us at the forefront of innovation.

Key Responsibilities :

  • Architecting and implementing a validation framework to certify the readiness and utilization of GPU nodes across a large, distributed HPC environment
  • Defining methodologies to continually assess performance and optimise infrastructure across AI/ML workloads
  • Developing and executing comprehensive performance testing using industry and customer specific benchmarks, ensuring optimal performance across HPC compute, storage and networking
  • Contributing to research reports that describe the benchmarking findings and evaluate overall hardware performance and efficiency
  • Leading efforts to identify, debug and resolve bottlenecks in system performance
  • Building robust, scalable tools for automated validation and testing, utilising Python, Go, Kubernetes and CI/CD pipelines to streamline continuous validation and benchmarking processes
  • Implementing monitoring solutions using Prometheus, Grafana and other modern monitoring technologies to track performance metrics and real-time health of the cluster
  • Defining and implementing best practice for continuous performance validation, ensuring that the infrastructure remains reliable and efficient as new technologies emerge
  • Staying informed on industry trends and advancements to ensure long-term strategic alignment
  • Working cross-functionally with engineering, infrastructure and research teams to align validation efforts with the broader business objectives, ensuring that the platform meets evolving research demands

Who are we looking for?

  • Accelerator performance experience, including profiling and tuning with large-scale GPU clusters
  • In-depth understanding of NVIDIA ClusterKit, Nsight and Validation Suite, MLPerf and DCGM tools for GPUs and DPUs
  • Networking and storage performance experience, including profiling and optimisation with NVIDIA ClusterKit, iPerf or equivalent across InfiniBand/RoCE network implementations
  • System benchmarking experience across Linux and familiarity with the Phoronix Test Suite or equivalent
  • Experience with HPC workloads across distributed global locations, bringing data-driven performance insights to complement key architectural decisions
  • Strong proficiency in developing automation tools and micro-benchmarking frameworks for validation using Python, Go, and Kubernetes in an Ubuntu Linux environment
  • Expertise with key monitoring platforms including OTEL, Prometheus, ELK and Grafana, and in defining and implementing the overall observability strategy for HPC validation and performance monitoring
  • A deep understanding of emerging technologies, architectures and strategies, with the ability to assess their potential impact on infrastructure and adopt them as part of a long-term plan
  • Proven ability to lead complex technical projects, influence decisions and engage with stakeholders across technical and research teams

About INSPYR Solutions

    Technology is our focus and quality is our commitment. As a national expert in delivering flexible technology and talent solutions, we strategically align industry and technical expertise with our clients' business objectives and cultural needs. Our solutions are tailored to each client and include a wide variety of professional services, project, and talent solutions. By always striving for excellence and focusing on the human aspect of our business, we work seamlessly with our talent and clients to match the right solutions to the right opportunities. Learn more about us at inspyrsolutions.com.

    INSPYR Solutions provides Equal Employment Opportunities (EEO) to all employees and applicants for employment without regard to race, color, religion, sex, national origin, age, disability, or genetics. In addition to federal law requirements, INSPYR Solutions complies with applicable state and local laws governing nondiscrimination in employment in every location in which the company has facilities.

    Information collected and processed through your application with INSPYR Solutions (including any job applications you choose to submit) is subject to INSPYR Solutions' Privacy Policy and INSPYR Solutions' AI and Automated Employment Decision Tool Policy: https://www.inspyrsolutions.com/policies/. By submitting an application, you are consenting to being contacted by INSPYR Solutions through phone, email, or text.

    25-16620
