Talent.com
Director of Machine Learning Engineering - Training and Performance
Advanced Micro Devices, Inc. • San Jose, CA, US
Job type: Full-time

WHAT YOU DO AT AMD CHANGES EVERYTHING

At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you'll discover the real differentiator is our culture. We push the limits of innovation to solve the world's most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

THE ROLE

AMD is seeking a Director of Machine Learning Engineering to join our Models and Applications organization. In this role, you will define and execute the technical vision for distributed training of large-scale generative AI and recommendation models on AMD GPUs. You'll guide a world-class engineering team focused on scaling AI training efficiency, optimizing model performance, and advancing AMD's leadership in AI systems.

This position blends deep technical expertise with strategic leadership. You will partner closely with research, hardware, and software teams to shape the roadmap for AMD's AI training stack - driving innovation at both the model and application levels, influencing how next-generation AI models are trained and deployed efficiently on AMD platforms.

THE PERSON

The ideal candidate is a strategic technical leader with a strong foundation in distributed training and AI infrastructure, coupled with experience building or guiding high-impact ML applications such as recommendation systems and ranking models. You combine visionary thinking with execution excellence, thrive in cross-functional collaboration, and are passionate about scaling AI systems that fully leverage AMD GPU performance across both model and application layers.

KEY RESPONSIBILITIES

Strategic Leadership & Vision: Define and drive AMD's distributed training strategy for large-scale generative and recommendation models. Align technical initiatives with broader AI platform goals and business impact.

Technical Direction & Innovation: Architect and optimize distributed training pipelines (pre-training, SFT, RL, etc.) for large-scale models. Explore new approaches for efficient training and inference of LLMs and ranking systems.

Execution & Delivery: Lead development of high-performance, reliable training pipelines that scale across thousands of GPUs. Ensure world-class efficiency, stability, and model convergence.

Cross-Functional Collaboration: Partner with compiler, runtime, system software, and hardware architecture teams to co-design solutions that maximize end-to-end performance.

Team Leadership & Development: Build, mentor, and empower a team of expert engineers focused on innovation, collaboration, and technical excellence.

Open Source & External Engagement: Drive AMD's engagement in open-source communities through contributions to frameworks such as PyTorch, JAX, TorchTitan, and Megatron-LM. Represent AMD's leadership in AI system design across industry and research communities.

Research & Trends: Stay ahead of emerging advances in distributed training, LLMs, recommendation systems, and AI infrastructure - and translate them into scalable engineering practices.

PREFERRED EXPERIENCE

10+ years in machine learning, distributed systems, or AI infrastructure; 5+ years in technical leadership or management roles.

Proven experience building and optimizing distributed training systems for large models.

Experience in both model-level and application-level development and optimization is preferred.

Strong familiarity with ML frameworks (PyTorch, JAX, TensorFlow) and distributed frameworks (TorchTitan, Megatron-LM).

Hands-on expertise with LLMs, recommendation systems, or ranking models.

Proficiency in Python and C++, including performance profiling, debugging, and large-scale optimization.

Experience collaborating across hardware, compiler, and system software layers.

Excellent communication, leadership, and problem-solving skills with the ability to influence across organizations and external partners.

ACADEMIC CREDENTIALS

Master's or Ph.D. in Computer Science, Artificial Intelligence, Machine Learning, or a related field.

LOCATION

San Jose, CA or Bellevue, WA preferred. Other U.S. locations near AMD offices may be considered.

BENEFITS

Benefits offered are described: AMD benefits at a glance.

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants' needs under the respective laws throughout all stages of the recruitment and selection process.

