No longer accepting applications
Senior Software Engineer - Model Performance

Inference • San Francisco, CA, United States
7 days ago
Salary: $220,000.00 yearly
Job type: Full-time
Job description

Help us make inference blazingly fast. If you love squeezing every last drop of performance out of GPUs, diving deep into CUDA kernels, and turning optimization techniques into production systems, we'd love to meet you.

About Inference.net

Inference.net trains and hosts specialized language models for companies that need frontier-quality AI at a fraction of the cost. The models we train match GPT-5 accuracy but are smaller, faster, and up to 90% cheaper. Our platform handles everything end-to-end: distillation, training, evaluation, and planet-scale hosting.

We are a well-funded ten-person team of engineers who work in person in downtown San Francisco on difficult, high-impact engineering problems. Everyone on the team has been writing code for over 10 years and has founded and run their own software companies. We are high-agency, adaptable, and collaborative. We value creativity alongside technical prowess and humility. We work hard and deeply enjoy the work that we do. Most of us are in the SF office 4 days a week; hybrid works for Bay Area candidates.

About the Role

You will be responsible for making our inference stack as fast and efficient as possible. Your work spans from implementing known optimization techniques to experimenting with novel approaches, always with the goal of serving models faster and cheaper at scale.

Your north star is inference performance: latency, throughput, cost efficiency, and how quickly we can bring new model architectures into production. You'll work across the full inference stack, from CUDA kernels to serving frameworks, to find and eliminate bottlenecks. This role reports directly to the founding team. You'll have autonomy, a large compute budget, and the technical support to push the limits of what's possible in model serving.

Key Responsibilities

  • Implement and productionize optimization techniques including quantization, speculative decoding, KV cache optimization, continuous batching, and LoRA serving
  • Deep dive into inference frameworks (vLLM, SGLang, TensorRT-LLM) and underlying libraries to debug and improve performance
  • Profile and optimize CUDA kernels and GPU utilization across our serving infrastructure
  • Add support for new model architectures, ensuring they meet our performance standards before going to production
  • Experiment with novel inference techniques and bring successful approaches into production
  • Build tooling and benchmarks to measure and track inference performance across our fleet
  • Collaborate with applied ML engineers to ensure trained models can be served efficiently

Requirements

  • 2+ years of experience in ML systems, inference optimization, or GPU programming
  • Strong proficiency in Python and familiarity with C++
  • Hands-on experience with LLM inference frameworks (vLLM, SGLang, TensorRT-LLM, or similar)
  • Deep understanding of GPU architecture and experience profiling GPU workloads
  • Familiarity with LLM optimization techniques (quantization, speculative decoding, continuous batching, KV cache management)
  • Experience with PyTorch and understanding of how models execute on hardware
  • Track record of measurably improving system performance

Nice-to-Have

  • Experience with CUDA programming
  • Familiarity with serving non-LLM models (TTS, vision, embeddings)
  • Experience with distributed inference and multi-GPU serving
  • Contributions to open-source inference frameworks
  • Experience with Docker and Kubernetes

You don't need to tick every box. Curiosity and the ability to learn quickly matter more.

Compensation

We offer competitive compensation, equity in a high-growth startup, and comprehensive benefits. The base salary range for this role is $220,000 - $320,000, plus equity and benefits, depending on experience.

Equal Opportunity

Inference.net is an equal opportunity employer. We welcome applicants from all backgrounds and don't discriminate based on race, color, religion, gender, sexual orientation, national origin, genetics, disability, age, or veteran status.

If you're excited about making AI inference faster for everyone, we'd love to hear from you. Please send your resume and GitHub to amar@inference.net and/or apply here on Ashby.