Machine Learning Systems Engineer

Menlo Ventures • Berkeley, CA, United States

Posted 30 days ago

Job type: Full-time

Job description

Who We Are

At RelationalAI, we are building the future of intelligent data systems through our cloud-native relational knowledge graph management system—a platform designed for learning, reasoning, and prediction.

We are a remote-first, globally distributed team with colleagues across six continents. From day one, we’ve embraced asynchronous collaboration and flexible schedules, recognizing that innovation doesn’t follow a 9-to-5.

We are committed to an open, transparent, and inclusive workplace. We value the unique backgrounds of every team member and believe in fostering a culture of respect, curiosity, and innovation. We support each other’s growth and success—and take the well‑being of our colleagues seriously. We encourage everyone to find a healthy balance that affords them a productive, happy life, wherever they choose to live.

We bring together engineers who love building core infrastructure, obsess over developer experience, and want to make complex systems scalable, observable, and reliable.

Machine Learning Systems Engineer

Location: Remote (San Francisco Bay Area / North America / South America)

Experience Level: 3+ years of experience in machine learning engineering or research

About ScalarLM

This role involves working closely with the ScalarLM framework and team.

ScalarLM builds on the vLLM inference engine, the Megatron-LM training framework, and the HuggingFace model hub, unifying them into a single platform for fast LLM training, inference, and self-improving agents, all via an OpenAI-compatible interface. Users can easily run LLM inference and training and build higher-level applications such as agents with a twist: they can teach themselves new abilities via backpropagation.

ScalarLM is inspired by the work of Seymour Roger Cray (September 28, 1925 – October 5, 1996), an American electrical engineer and supercomputer architect who designed a series of computers that were the fastest in the world for decades, and founded Cray Research, which built many of these machines. Called "the father of supercomputing", Cray has been credited with creating the supercomputer industry.

It is a fully open source project (CC‑0 Licensed) focused on democratizing access to cutting‑edge LLM infrastructure that combines training and inference in a unified platform, enabling the development of self‑improving AI agents similar to DeepSeek R1.

ScalarLM is supported and maintained by TensorWave in addition to RelationalAI.

The Role

As a Machine Learning Engineer, you will contribute directly to our machine learning infrastructure and to the ScalarLM open source codebase, and you will build large-scale language model applications on top of it. You’ll operate at the intersection of high-performance computing, distributed systems, and cutting-edge machine learning research, developing the fundamental infrastructure that enables researchers and organizations worldwide to train and deploy large language models at scale.

This is an opportunity to take on technically demanding projects, contribute to foundational systems, and help shape the next generation of intelligent computing.

You Will

Contribute code and performance improvements to the open source project.
Develop and optimize distributed training algorithms for large language models.
Implement high‑performance inference engines and optimization techniques.
Work on integration between vLLM, Megatron‑LM, and HuggingFace ecosystems.
Build tools for seamless model training, fine‑tuning, and deployment.
Optimize performance on advanced GPU architectures.
Collaborate with the open source community on feature development and bug fixes.
Research and implement new techniques for self‑improving AI agents.

Who You Are

Technical Skills

Programming Languages: Proficiency in both C/C++ and Python

High Performance Computing: Deep understanding of HPC concepts, including:

MPI (Message Passing Interface) programming and optimization

Bulk Synchronous Parallel (BSP) computing models

Multi-GPU and multi-node distributed computing

CUDA/ROCm programming experience preferred

Machine Learning Foundations:

Solid understanding of gradient descent and backpropagation algorithms

Experience with transformer architectures and the ability to explain their mechanics

Knowledge of deep learning training and its applications

Understanding of distributed training techniques (data parallelism, model parallelism, pipeline parallelism, large batch training, optimization)

Research and Development

Publications: Experience with machine learning research and publications preferred

Research Skills: Ability to read, understand, and implement techniques from recent ML research papers

Open Source: Demonstrated commitment to open source development and community collaboration

Experience

3+ years of experience in machine learning engineering or research.

Experience with large-scale distributed training frameworks (Megatron‑LM, DeepSpeed, FairScale, etc.).

Familiarity with inference optimization frameworks (vLLM, TensorRT, etc.).

Experience with containerization (Docker, Kubernetes) and cluster management.

Background in systems programming and performance optimization.

Bonus points if you have:

PhD or MS in Computer Science, Computer Engineering, Machine Learning, or related field.

Experience with SLURM, Kubernetes, or other cluster orchestration systems.

Knowledge of mixed precision training, data parallel training, and scaling laws.

Experience with transformer architectures, PyTorch, and decoding algorithms.

Familiarity with the high-performance GPU programming ecosystem.

Previous contributions to major open source ML projects.

Experience with MLOps and model deployment at scale.

Understanding of modern attention mechanisms (multi‑head attention, grouped query attention, etc.).

Why RelationalAI

RelationalAI is committed to an open, transparent, and inclusive workplace. We value the unique backgrounds of our team. We are driven by curiosity, value innovation, and help each other to succeed and to grow. We take the well‑being of our colleagues seriously, and offer flexible working hours so each individual can find a healthy balance that affords them a productive, happy life wherever they choose to live.

🌎 Global Benefits at RelationalAI

At RelationalAI, we believe that people do their best work when they feel supported, empowered, and balanced. Our benefits prioritize well‑being, flexibility, and growth, ensuring you have the resources to thrive both professionally and personally.

We are all owners in the company and reward you with a competitive salary and equity.

Work from anywhere in the world.

Comprehensive benefits coverage, including global mental health support.

Open PTO – Take the time you need, when you need it.

Company Holidays, Your Regional Holidays, and RAI Holidays—where we take one Monday off each month, followed by a week without recurring meetings, giving you the time and space to recharge.

Paid parental leave – Supporting new parents as they grow their families.

We invest in your learning & development.

Regular team offsites and global events – Building strong connections while working remotely through team offsites and global events that bring everyone together.

A culture of transparency & knowledge‑sharing – Open communication through team standups, fireside chats, and open meetings.

Country Hiring Guidelines

RelationalAI hires around the world. All of our roles are remote; however, some locations might carry specific eligibility requirements.

Because of this, understanding location & visa support helps us better prepare to onboard our colleagues.

Our People Operations team can help answer any questions about location after starting the recruitment process.

Privacy Policy

EU residents applying for positions at RelationalAI can see our Privacy Policy here.

California residents applying for positions at RelationalAI can see our Privacy Policy here.

Equal Opportunity

RelationalAI is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, color, gender identity or expression, marital status, national origin, disability, protected veteran status, race, religion, pregnancy, sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances.
