MTS-ML Infra, AGI

Amazon • San Francisco, California, United States

Job type: Full-time

Job description

Are you interested in a unique opportunity to advance the accuracy and efficiency of Artificial General Intelligence (AGI) systems? If so, you're in the right place! We are the AGI Autonomy organization, and we are looking for a driven and talented Member of Technical Staff to join us in building state-of-the-art agents.

AGI Autonomy is focused on developing new foundational capabilities for useful AI agents that can take actions in the digital and physical worlds. In other words, we're enabling practical AI that can actually do things for us and make our customers more productive, empowered, and fulfilled.

In this role, you will work closely with research teams to design, build, and maintain systems for training and evaluating state-of-the-art agent models.

Our team works inside the Amazon AGI SF Lab, an environment designed to empower AI researchers and engineers to work with speed and focus. Our philosophy combines the agility of a startup with the resources of Amazon.

Key job responsibilities

Evaluate performance of the training infrastructure, diagnose problems and address any gaps that exist.

Develop reliable infrastructure to schedule training and model evaluation jobs across clusters.

Work closely with researchers to create new techniques, infrastructure, and tooling around emerging research capabilities and evaluating models to meet customer needs.

Manage project prioritization, deliverables, timelines, and stakeholder communication.

Illuminate trade-offs, educate the team on best practices, and influence technical strategy.

Operate in a dynamic environment to deliver high quality software.

About the team

The Amazon AGI SF Lab is focused on developing new foundational capabilities for enabling useful AI agents that can take actions in the digital and physical worlds. In other words, we're enabling practical AI that can actually do things for us and make our customers more productive, empowered, and fulfilled. The lab is designed to empower AI researchers and engineers to make major breakthroughs with speed and focus toward this goal. Our philosophy combines the agility of a startup with the resources of Amazon. By keeping the team lean, we're able to maximize the amount of compute per person. Each team in the lab has the autonomy to move fast and the long-term commitment to pursue high-risk, high-payoff research.

5+ years of non-internship professional software development experience
5+ years of experience programming in at least one software programming language, or a Bachelor's degree in computer science or equivalent
5+ years of experience leading the design or architecture (design patterns, reliability, and scaling) of new and existing systems
Experience as a mentor or tech lead, or leading an engineering team
Experience with the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations, or experience contributing to the architecture and design (design patterns, reliability, and scaling) of new and current systems

Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.

Los Angeles County applicants: Job duties for this position include: work safely and cooperatively with other employees, supervisors, and staff; adhere to standards of excellence despite stressful conditions; communicate effectively and respectfully with employees, supervisors, and staff to ensure exceptional customer service; and follow all federal, state, and local laws and Company policies. Criminal history may have a direct, adverse, and negative relationship with some of the material job duties of this position. These include the duties and responsibilities listed above, as well as the abilities to adhere to company policies, exercise sound judgment, effectively manage stress and work safely and respectfully with others, exhibit trustworthiness and professionalism, and safeguard business operations and the Company's reputation. Pursuant to the Los Angeles County Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.

Pursuant to the San Francisco Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.

Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you're applying in isn't listed, please contact your Recruiting Partner.

Our compensation reflects the cost of labor across several US geographic markets. The base pay for this position ranges from $150,000/year in our lowest geographic market up to $325,000/year in our highest geographic market. Pay is based on a number of factors including market location and may vary depending on job-related knowledge, skills, and experience. Amazon is a total compensation company. Dependent on the position offered, equity, sign-on payments, and other forms of compensation may be provided as part of a total compensation package, in addition to a full range of medical, financial, and/or other benefits. For more information, please visit https://www.aboutamazon.com/workplace/employee-benefits. This position will remain posted until filled. Applicants should apply via our internal or external career site.
