Senior Software Engineer - Model Performance

Inference • San Francisco, CA, United States

Job type: Full-time

Inference Optimization Engineer

Help us make inference blazingly fast. If you love squeezing every last drop of performance out of GPUs, diving deep into CUDA kernels, and turning optimization techniques into production systems, we'd love to meet you.

Inference.net trains and hosts specialized language models for companies that need frontier-quality AI at a fraction of the cost. The models we train match GPT-5 accuracy but are smaller, faster, and up to 90% cheaper. Our platform handles everything end-to-end: distillation, training, evaluation, and planet-scale hosting.

We are a well-funded ten-person team of engineers who work in-person in downtown San Francisco on difficult, high-impact engineering problems. Everyone on the team has been writing code for over 10 years and has founded and run their own software company. We are high-agency, adaptable, and collaborative. We value creativity alongside technical prowess and humility. We work hard and deeply enjoy the work that we do. Most of us are in the office 4 days a week in SF; hybrid works for Bay Area candidates.

About the Role

You will be responsible for making our inference stack as fast and efficient as possible. Your work spans from implementing known optimization techniques to experimenting with novel approaches, always with the goal of serving models faster and cheaper at scale.

Your north star is inference performance: latency, throughput, cost efficiency, and how quickly we can bring new model architectures into production. You'll work across the full inference stack, from CUDA kernels to serving frameworks, to find and eliminate bottlenecks. This role reports directly to the founding team. You'll have the autonomy, a large compute budget, and the technical support to push the limits of what's possible in model serving.

Key Responsibilities

Implement and productionize optimization techniques including quantization, speculative decoding, KV cache optimization, continuous batching, and LoRA serving
Deep dive into inference frameworks (vLLM, SGLang, TensorRT-LLM) and underlying libraries to debug and improve performance
Profile and optimize CUDA kernels and GPU utilization across our serving infrastructure
Add support for new model architectures, ensuring they meet our performance standards before going to production
Experiment with novel inference techniques and bring successful approaches into production
Build tooling and benchmarks to measure and track inference performance across our fleet
Collaborate with applied ML engineers to ensure trained models can be served efficiently

Requirements

2+ years of experience in ML systems, inference optimization, or GPU programming

Strong proficiency in Python and familiarity with C++

Hands-on experience with LLM inference frameworks (vLLM, SGLang, TensorRT-LLM, or similar)

Deep understanding of GPU architecture and experience profiling GPU workloads

Familiarity with LLM optimization techniques (quantization, speculative decoding, continuous batching, KV cache management)

Experience with PyTorch and understanding of how models execute on hardware

Track record of measurably improving system performance

Nice-to-Have

Experience with CUDA programming

Familiarity with serving non-LLM models (TTS, vision, embeddings)

Experience with distributed inference and multi-GPU serving

Contributions to open-source inference frameworks

Experience with Docker and Kubernetes

You don't need to tick every box. Curiosity and the ability to learn quickly matter more.

Compensation

We offer competitive compensation, equity in a high-growth startup, and comprehensive benefits. The base salary range for this role is $220,000 - $320,000, plus equity and benefits, depending on experience.

Equal Opportunity

Inference.net is an equal opportunity employer. We welcome applicants from all backgrounds and don't discriminate based on race, color, religion, gender, sexual orientation, national origin, genetics, disability, age, or veteran status.
