Talent.com
Machine Learning Engineer, Training Infrastructure
Intellipro Group • San Francisco, California, United States

Job Title: Machine Learning Engineer, Training Infrastructure

Position Type: Full-time

Location: San Francisco, CA, USA

Salary Range: $150,000 - $250,000 (USD)

Job ID#: 158135

Job Description:

We are looking for an ML Engineer with 3+ years of experience in high-performance computing systems to manage and optimize our computational infrastructure for training and deploying our machine learning models. The ideal candidate has diverse experience managing ML workloads at scale and will support our 3DVAE and video diffusion models. We encourage you to apply even if you don't meet every requirement; we value curiosity, creativity, and the drive to solve hard problems.

Responsibilities

Design, implement, and maintain scalable computing solutions for training and deploying ML models, ensuring infrastructure can handle large video datasets.

Manage and optimize the performance of our computing clusters or cloud instances, such as AWS or Google Cloud, to support distributed training.

Ensure that our infrastructure can handle the resource-intensive tasks associated with training large generative models.

Monitor system performance and implement improvements to maximize efficiency and utilization, using tools like Airflow for orchestration.

Collaborate across research teams to understand their computational needs and provide appropriate solutions, facilitating seamless model deployment.

Requirements:

Bachelor’s degree in Computer Science, Information Technology, or a related field, with a focus on system administration.

Experience with cloud computing platforms such as Amazon Web Services, Google Cloud, or Microsoft Azure, essential for managing large-scale ML workloads.

This role is vital for ensuring the computational backbone supports the company’s ML efforts, focusing on deployment and scalability.

Values engineering processes and version control (CI/CD).

Knowledge of containerization technologies like Docker and Kubernetes, required for deployments at scale.

Understanding of distributed training techniques and how to scale models across multi-node clusters, in line with video generation needs.

Strong problem-solving and communication skills, given the need to collaborate with diverse teams.

About Us:

Founded in 2009, IntelliPro is a global leader in talent acquisition and HR solutions. Our commitment to delivering unparalleled service to clients, fostering employee growth, and building enduring partnerships sets us apart. We continue leading global talent solutions with a dynamic presence in over 160 countries, including the USA, China, Canada, Singapore, Japan, Philippines, UK, India, Netherlands, and the EU.

IntelliPro, a global leader connecting individuals with rewarding employment opportunities, is dedicated to understanding your career aspirations. As an Equal Opportunity Employer, IntelliPro values diversity and does not discriminate based on race, color, religion, sex, sexual orientation, gender identity, national origin, age, genetic information, disability, or any other legally protected group status. Moreover, our Inclusivity Commitment emphasizes embracing candidates of all abilities and ensures that our hiring and interview processes accommodate the needs of all applicants. Learn more about our commitment to diversity and inclusivity at https://intelliprogroup.com/.

Compensation: The pay offered to a successful candidate will be determined by various factors, including education, work experience, location, job responsibilities, certifications, and more. Additionally, IntelliPro provides a comprehensive benefits package, all subject to eligibility.
