Talent.com
Senior Software Engineer, Model Inference
Apple • San Francisco, CA, United States

San Francisco Bay Area, California, United States • Software and Services

Join Apple Maps to help build the best map in the world. In this role on ML Platform, you will help bring advanced deep learning and large language models into high-volume, low-latency, highly available production serving, improving search quality and powering experiences across Maps. You will partner closely with research and product teams, take end-to-end ownership, and deliver measurable results at global scale.

Description

As a Software Engineer on the Apple Maps team, you will lead the design and implementation of large-scale, high-performance inference services that support a wide range of models used across Maps, including deep learning and large language models. You will collaborate closely with research and product partners to bring models into production, with a strong focus on efficiency, reliability, and scalability. Your responsibilities span the full server stack, including onboarding new use cases, optimizing inference across heterogeneous accelerated compute hardware, deploying services on Kubernetes, building and integrating inference engines and control-plane components, and ensuring seamless integration with Maps infrastructure.

Responsibilities
  • Own the technical architecture of large-scale ML inference platforms, defining long-term design direction for serving deep learning and large language models across Apple Maps.
  • Lead system-level optimization efforts across the inference stack, balancing latency, throughput, accuracy, and cost through advanced techniques such as quantization, kernel fusion, speculative decoding, and efficient runtime scheduling.
  • Design and evolve control-plane services responsible for model lifecycle management, including deployment orchestration, versioning, traffic routing, rollout strategies, capacity planning, and failure handling in production environments.
  • Drive adoption of platform abstractions and standards that enable partner teams to onboard, deploy, and operate models reliably and efficiently at scale.
  • Partner closely with research, product, and infrastructure teams to translate model requirements into production-ready systems, providing technical guidance and feedback to influence upstream model design.
  • Optimize inference execution across heterogeneous compute environments, including GPUs and specialized accelerators, collaborating with runtime, compiler, and kernel teams to maximize hardware utilization.
  • Establish robust observability and performance diagnostics, defining metrics, dashboards, and profiling workflows to proactively identify bottlenecks and guide optimization decisions.
  • Provide technical leadership and mentorship, reviewing designs, setting engineering best practices, and raising the quality bar across teams contributing to the inference ecosystem.
  • Continuously evaluate emerging research and industry trends in LLM inference, distributed systems, and ML infrastructure, driving the transition of high-impact ideas into production systems.
Minimum Qualifications
  • Bachelor's degree in Computer Science, Engineering, or related field (or equivalent experience).
  • 5+ years in software engineering focused on ML inference, GPU acceleration, and large-scale systems.
  • Expertise in deploying and optimizing LLMs for high-performance, production-scale inference.
  • Proficiency in Python, Java, or C++.
  • Experience with deep learning frameworks like PyTorch, TensorFlow, and Hugging Face Transformers.
  • Experience with model serving tools (e.g., NVIDIA Triton, TensorFlow Serving, vLLM).
  • Experience with optimization techniques like attention fusion, quantization, and speculative decoding.
  • Skilled in GPU optimization (e.g., CUDA, TensorRT-LLM, cuDNN) to accelerate inference tasks.
  • Skilled in cloud technologies like Kubernetes, Ingress, and HAProxy for scalable deployment.
Preferred Qualifications
  • Master's or PhD in Computer Science, Machine Learning, or a related field.
  • Understanding of MLOps practices, continuous integration, and deployment pipelines for machine learning models.
  • Familiarity with model distillation, low-rank approximations, and other model compression techniques for reducing memory footprint and improving inference speed.
  • Strong understanding of distributed systems, multi-GPU/multi-node parallelism, and system-level optimization for large-scale inference.
Compensation and Benefits

At Apple, base pay is one part of our total compensation package and is determined within a range. The base pay range for this role is between $181,100 and $318,400, and your base pay will depend on your skills, qualifications, experience, and location.

Apple employees also have the opportunity to become an Apple shareholder through participation in Apple's discretionary employee stock programs. Apple employees are eligible for discretionary restricted stock unit awards, and can purchase Apple stock at a discount if voluntarily participating in Apple's Employee Stock Purchase Plan. Additional benefits include comprehensive medical and dental coverage, retirement benefits, a range of discounted products and free services, and reimbursement for certain educational expenses, including tuition. This role might be eligible for discretionary bonuses or commission payments as well as relocation. Learn more about Apple Benefits.

Note: Apple benefit, compensation and employee stock programs are subject to eligibility requirements and other terms of the applicable plan or program.

Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant.

Apple accepts applications to this posting on an ongoing basis.

