Software Engineer - Model APIs
Baseten • San Francisco, CA, United States
Job type: Full-time

ABOUT BASETEN

Baseten powers inference for the world's most dynamic AI companies, like OpenEvidence, Clay, Mirage, Gamma, Sourcegraph, Writer, Abridge, Bland, and Zed. By uniting applied AI research, flexible infrastructure, and seamless developer tooling, we enable companies operating at the frontier of AI to bring cutting-edge models into production. With our recent $150M Series D funding, backed by investors including BOND, IVP, Spark Capital, Greylock, and Conviction, we’re scaling our team to meet accelerating customer demand.

THE ROLE

Baseten’s Model Performance (MP) team is responsible for ensuring the models running on our platform are fast, reliable, and cost‑efficient. As part of this team, you’ll focus on Model APIs, the infrastructure powering our hosted API endpoints for the latest open‑source models. This work spans distributed systems, model serving, and developer experience. You’ll join a small, high‑impact team operating at the intersection of product, model performance, and infra, helping to define how developers interact with AI models at scale.

RESPONSIBILITIES

  • Design, build, and operate the Model APIs surface with a focus on advanced inference capabilities: structured outputs (JSON mode, grammar-constrained generation), tool/function calling, and multi-modal serving.
  • Profile and optimize TensorRT-LLM kernels, analyze CUDA kernel performance, implement custom CUDA operators, tune memory allocation patterns for maximum throughput, and optimize communication patterns across multi-GPU setups.
  • Productionize performance improvements across runtimes with a deep understanding of their internals: speculative decoding implementations, guided generation for structured outputs, and custom scheduling and routing algorithms for high-performance serving.
  • Build comprehensive benchmarking frameworks that measure real-world performance across different model architectures, batch sizes, sequence lengths, and hardware configurations.
  • Productionize performance improvements across runtimes (e.g., TensorRT, TensorRT‑LLM): speculative decoding, quantization, batching, and KV‑cache reuse.
  • Instrument deep observability (metrics, traces, logs) and build repeatable benchmarks to measure speed, reliability, and quality.
  • Implement platform fundamentals: API versioning, validation, usage metering, quotas, and authentication.
  • Collaborate closely with other teams to deliver robust, developer‑friendly model serving experiences.

REQUIREMENTS

  • 3+ years of experience building and operating distributed systems or large‑scale APIs.
  • Proven track record of owning low‑latency, reliable backend services (rate‑limiting, auth, quotas, metering, migrations).
  • Infra instincts with performance sensibilities: profiling, tracing, capacity planning, and SLO management.
  • Comfortable debugging complex systems, from runtime internals to GPU execution traces.
  • Strong written communication; able to produce clear design docs and collaborate across functions.

NICE TO HAVE

  • Experience with, or contributions to, open-source LLM inference engines (vLLM, SGLang, TensorRT-LLM, TGI).
  • Knowledge of Kubernetes, service meshes, API gateways, or distributed scheduling.
  • Background in developer‑facing infrastructure or open‑source APIs.

We value infra‑leaning generalists who bring strong engineering fundamentals and curiosity. ML experience is a plus, but not required.

BENEFITS

  • Competitive compensation package.
  • The opportunity to be part of a rapidly growing startup in one of the most exciting engineering fields of our era.
  • An inclusive and supportive work culture that fosters learning and growth.
  • Exposure to a variety of ML startups, offering unparalleled learning and networking opportunities.

Apply now to embark on a rewarding journey in shaping the future of AI! If you are a motivated individual with a passion for machine learning and a desire to be part of a collaborative and forward-thinking team, we would love to hear from you.

At Baseten, we are committed to fostering a diverse and inclusive workplace. We provide equal employment opportunities to all employees and applicants without regard to race, color, religion, gender, sexual orientation, gender identity or expression, national origin, age, genetic information, disability, or veteran status.
