Talent.com
AI Cluster Test Automation Engineer

AMD • Santa Clara, CA, US
Posted 30 days ago
Job type: Full-time
Job description


WHAT YOU DO AT AMD CHANGES EVERYTHING

At AMD, our mission is to build great products that accelerate next-generation computing experiences—from AI and data centers, to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you'll discover the real differentiator is our culture. We push the limits of innovation to solve the world's most important challenges—striving for execution excellence, while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.



THE ROLE:

AMD is looking for an AI Solutions Validation Engineer who is passionate about complex AI solutions, AI infrastructure, MLOps, and building cluster-scale automation for distributed training and inference workloads. You will be a member of a core team of incredibly talented industry specialists and will work with the very latest hardware and software technology.

THE PERSON:

The ideal candidate is passionate about software engineering, system design, validation, and automation, and has the leadership skills to drive complex issues to resolution. The candidate communicates effectively and works well with different teams across AMD.

KEY RESPONSIBILITIES:

  • Work with AMD's architecture specialists to validate AI solutions for distributed training and inference workloads with AMD's ROCm software.
  • Build cluster-scale automation for distributed training and inference workloads.
  • Reproduce field defects and develop appropriate tests to prevent future issues.
  • Design, develop, and deploy the testing tools and automation libraries needed to perform testing.
  • Lead the adoption of tooling and industry best practices through advocacy and outreach to help our development communities level up.
  • Other duties as assigned.

PREFERRED EXPERIENCE:

  • Languages: Python, C, C++, Linux shell scripting.
  • Frameworks/Libraries: TensorFlow, PyTorch, ONNX Runtime.
  • Tools: Prior experience with Linux, Docker, Kubernetes, Slurm, and LLVM compilers.
  • Good experience with the complex computer systems used in AI and HPC deployments, and with backend network designs in RDMA clusters.
  • Experience validating complex AI infrastructure (GPUs, networking, RoCEv2, UEC) and running benchmark tests such as IBPerf and RCCL/NCCL.
  • Experience with performance profiling of CPUs and GPUs, and with debugging complex compute, network, and storage problems.
  • Experience training LLMs, MoE models, image generation models, and recommendation models with frameworks such as PyTorch, TensorFlow, Megatron-LM, and JAX, and running training performance benchmarks.
  • Experience running inference workloads in AI clusters with inference frameworks such as vLLM and SGLang, and running inference performance benchmarks.
  • Desired skills: understanding of high-performance computing applications, machine learning, GPU programming, and MPI parallel programming; enabling various ML/inference models.

ACADEMIC CREDENTIALS:

  • Bachelor's Degree or higher in Computer Science or related quantitative field.
  • An advanced degree or equivalent practical work experience is a plus.

This role is not eligible for visa sponsorship.

#LI-CJ3

#LI-Hybrid



Benefits offered are described in AMD benefits at a glance.

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants' needs under the respective laws throughout all stages of the recruitment and selection process.

AMD may use Artificial Intelligence to help screen, assess or select applicants for this position. AMD's “Responsible AI Policy” is available here.

This posting is for an existing vacancy.
