Machine Learning, Platform Engineer

Together AI • San Francisco, CA, United States

This role focuses on enabling custom models and dedicated inference on Together. We are responsible for optimizing autoscaling, minimizing cold starts, achieving the best end-to-end model performance, and providing a best-in-class developer experience with great tooling.

Required Qualifications

  • 5+ years of demonstrated experience building large-scale, fault-tolerant distributed systems and API microservices
  • Experience running serverless inference platforms, doing model bring-up on short notice, being on call, or working at a general cloud provider is a very big plus
  • Good taste and ability to thoughtfully discuss how what you've built has failed over time
  • Experience designing, analyzing and improving efficiency, scalability, and stability of various system resources
  • Excellent understanding of low-level operating system concepts, including concurrency, networking, storage, performance, and scale
  • Expert-level programmer in one or more of Golang, Rust, Python, C++, or Haskell
  • Proficiency in writing and maintaining Infrastructure as Code (IaC) using tools like Terraform
  • Experience with Kubernetes or other container orchestration systems
  • Bachelor's or Master's degree in Computer Science, Computer Engineering, or a related technical field, or equivalent practical experience
  • Experience in writing-heavy roles or companies is a plus

Key Responsibilities

  • New hires may work on multi-cluster orchestration, portfolio optimization, predictive autoscaling, control planes, model bring-up, light model optimization, APIs for managing deployments, inference worker SDKs, and CLI tools.
  • Analyze and improve the robustness and scalability of existing distributed systems, APIs, databases, and infrastructure
  • Partner with product teams to understand functional requirements and deliver solutions that meet business needs
  • Write clear, well-tested, and maintainable software and IaC for both new and existing systems
  • Conduct design and code reviews, create developer documentation, and develop testing strategies for robustness and fault tolerance

About Together AI

Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society, and together we are on a mission to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models. We have contributed to leading open-source research, models, and datasets to advance the frontier of AI, and our team has been behind technological advancements such as FlashAttention, Hyena, FlexGen, and RedPajama. We invite you to join a passionate group of researchers and engineers on our journey to build the next generation of AI infrastructure.

Compensation

We offer competitive compensation, startup equity, health insurance, and other competitive benefits. The US base salary range for this full-time position is $160,000 - $250,000 + equity + benefits. Our salary ranges are determined by location, level, and role. Individual compensation will be determined by experience, skills, and job-related knowledge.

Equal Opportunity

Together AI is an Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.

Please see our privacy policy at https://www.together.ai/privacy
