Talent.com
Senior Software Engineer, Model Inference
Apple • San Francisco, California, United States
Job type: Full-time

Overview

Summary

Join Apple Maps to help build the best map in the world. In this role on ML Platform, you will help bring advanced deep learning and large language models into high-volume, low-latency, highly available production serving, improving search quality and powering experiences across Maps. You will partner closely with research and product teams, take end-to-end ownership, and deliver measurable results at global scale.

Description

As a Software Engineer on the Apple Maps team, you will lead the design and implementation of large-scale, high-performance inference services that support a wide range of models used across Maps, including deep learning and large language models. You will collaborate closely with research and product partners to bring models into production, with a strong focus on efficiency, reliability, and scalability. Your responsibilities span the full server stack, including onboarding new use cases, optimizing inference across heterogeneous accelerated compute hardware, deploying services on Kubernetes, building and integrating inference engines and control-plane components, and ensuring seamless integration with Maps infrastructure.

Responsibilities

Own the technical architecture of large-scale ML inference platforms, defining long-term design direction for serving deep learning and large language models across Apple Maps.

Lead system-level optimization efforts across the inference stack, balancing latency, throughput, accuracy, and cost through advanced techniques such as quantization, kernel fusion, speculative decoding, and efficient runtime scheduling.

Design and evolve control-plane services responsible for model lifecycle management, including deployment orchestration, versioning, traffic routing, rollout strategies, capacity planning, and failure handling in production environments.

Drive adoption of platform abstractions and standards that enable partner teams to onboard, deploy, and operate models reliably and efficiently at scale.

Partner closely with research, product, and infrastructure teams to translate model requirements into production-ready systems, providing technical guidance and feedback to influence upstream model design.

Optimize inference execution across heterogeneous compute environments, including GPUs and specialized accelerators, collaborating with runtime, compiler, and kernel teams to maximize hardware utilization.

Establish robust observability and performance diagnostics, defining metrics, dashboards, and profiling workflows to proactively identify bottlenecks and guide optimization decisions.

Provide technical leadership and mentorship, reviewing designs, setting engineering best practices, and raising the quality bar across teams contributing to the inference ecosystem.

Continuously evaluate emerging research and industry trends in LLM inference, distributed systems, and ML infrastructure, driving the transition of high-impact ideas into production systems.

Minimum Qualifications

Bachelor's degree in Computer Science, Engineering, or related field (or equivalent experience).

5+ years in software engineering focused on ML inference, GPU acceleration, and large-scale systems.

Expertise in deploying and optimizing LLMs for high-performance, production-scale inference.

Proficiency in Python, Java or C++.

Experience with deep learning frameworks like PyTorch, TensorFlow, and Hugging Face Transformers.

Experience with model serving tools (e.g., NVIDIA Triton, TensorFlow Serving, vLLM).

Experience with optimization techniques like Attention Fusion, Quantization, and Speculative Decoding.

Skilled in GPU optimization (e.g., CUDA, TensorRT-LLM, cuDNN) to accelerate inference tasks.

Skilled in cloud technologies like Kubernetes, Ingress, and HAProxy for scalable deployment.

Preferred Qualifications

Master’s or PhD in Computer Science, Machine Learning, or a related field.

Understanding of MLOps practices, continuous integration, and deployment pipelines for machine learning models.

Familiarity with model distillation, low-rank approximations, and other model compression techniques for reducing memory footprint and improving inference speed.

Strong understanding of distributed systems, multi-GPU / multi-node parallelism, and system-level optimization for large-scale inference.

Pay & Benefits

At Apple, base pay is one part of our total compensation package and is determined within a range. This provides the opportunity to progress as you grow and develop within a role. The base pay range for this role is between $181,100 and $318,400, and your base pay will depend on your skills, qualifications, experience, and location.

Apple employees also have the opportunity to become an Apple shareholder through participation in Apple’s discretionary employee stock programs. Apple employees are eligible for discretionary restricted stock unit awards, and can purchase Apple stock at a discount if voluntarily participating in Apple’s Employee Stock Purchase Plan. You’ll also receive benefits including comprehensive medical and dental coverage, retirement benefits, a range of discounted products and free services, and, for formal education related to advancing your career at Apple, reimbursement for certain educational expenses, including tuition. Additionally, this role might be eligible for discretionary bonuses or commission payments as well as relocation. Learn more about Apple Benefits.

Note: Apple benefit, compensation and employee stock programs are subject to eligibility requirements and other terms of the applicable plan or program.

Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant.

Apple accepts applications to this posting on an ongoing basis.
