Talent.com
AI Cluster Test Automation Engineer
Advanced Micro Devices, Inc • Santa Clara, California, United States
Job type: Full-time

WHAT YOU DO AT AMD CHANGES EVERYTHING

At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture. We push the limits of innovation to solve the world’s most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

THE ROLE:

AMD is looking for an AI solutions validation engineer who is passionate about complex AI solutions, AI infrastructure, MLOps, and building cluster-scale automation for distributed training and inference workloads. You will be a member of a core team of incredibly talented industry specialists and will work with the very latest hardware and software technology.

THE PERSON:

The ideal candidate should be passionate about software engineering, system design, validation, and automation, and should possess the leadership skills to drive sophisticated issues to resolution. The candidate should be able to communicate effectively and work optimally with different teams across AMD.

KEY RESPONSIBILITIES:

  • Work with AMD’s architecture specialists to validate AI solutions for distributed training and inference workloads with AMD's ROCm software.
  • Build cluster-scale automation for distributed training and inference workloads.
  • Reproduce field defects and develop appropriate tests to prevent future issues.
  • Design, develop and deploy testing tools and automation libraries necessary to perform testing.
  • Lead the adoption of tooling and industry best practices by means of advocacy and outreach to help our development communities level up.
  • Other duties as assigned.

PREFERRED EXPERIENCE:

  • Languages: Python, C, C++, Linux shell scripting.
  • Frameworks / Libraries: TensorFlow, PyTorch, ONNXRT.
  • Tools: Prior experience with Linux, Docker, Kubernetes, SLURM, LLVM compilers.
  • Good experience with complex computer systems used in AI, HPC deployments, and backend network designs in RDMA clusters.
  • Experience in validating complex AI infrastructure (GPUs, networking, RoCEv2, UEC) and running benchmark tests such as IBPerf benchmarking and RCCL / NCCL.
  • Experience with performance profiling of CPUs and GPUs, and with debugging complex compute, network, and storage problems.
  • Experience with running training of LLMs, MoE models, image generation models, and recommendation models with frameworks such as PyTorch, TensorFlow, Megatron-LM, and JAX, and with running training performance benchmarks.
  • Experience with running inference workloads in AI clusters with inference frameworks such as vLLM and SGLang, and with running performance benchmarks for inference.

Desired Skills:

  • Understanding of high-performance computing applications, machine learning and GPU programming, MPI parallel programming, and enabling various ML / inference models.

ACADEMIC CREDENTIALS:

  • Bachelor's degree or higher in Computer Science or a related quantitative field. An advanced degree or equivalent practical work experience is a plus.

This role is not eligible for visa sponsorship.

#LI-CJ3 #LI-Hybrid

Benefits offered are described: AMD benefits at a glance.

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and / or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants’ needs under the respective laws throughout all stages of the recruitment and selection process.

AMD may use Artificial Intelligence to help screen, assess or select applicants for this position. AMD’s “Responsible AI Policy” is available here.

This posting is for an existing vacancy.

