Talent.com
Engineering Manager, Support and Customer Engineering
Baseten • San Francisco, CA, US
Job type: Full-time

ABOUT BASETEN

Baseten powers inference for the world's most dynamic AI companies, like OpenEvidence, Clay, Mirage, Gamma, Sourcegraph, Writer, Abridge, Bland, and Zed. By uniting applied AI research, flexible infrastructure, and seamless developer tooling, we enable companies operating at the frontier of AI to bring cutting-edge models into production. With our recent $150M Series D funding, backed by investors including BOND, IVP, Spark Capital, Greylock, and Conviction, we're scaling our team to meet accelerating customer demand.

The Role

As an Engineering Manager (Player & Coach) focused on Support and Customer Engineering, you'll lead a team responsible for the performance, reliability, and success of large-scale ML workloads in production. Combining hands-on technical ownership with managerial leadership, you will guide your team through complex incidents, improve observability and operational practices, and shape how we deliver world-class AI infrastructure support to our customers. While you will actively coach and grow your team, you'll also stay close to the technology, including diving into runtime debugging, optimizing GPU utilization, and helping evolve the Baseten platform based on real-world patterns and customer feedback.

Example Initiatives

Take a look at these blog posts written by members of our Forward Deployed Engineering team:

Forward Deployed Engineering on the frontier of AI

The fastest, most accurate Whisper transcription

Deploy production-ready model servers from Docker images

Deploy custom ComfyUI workflows as APIs

Responsibilities

Lead, mentor, and scale a team of Support Engineers specializing in AI and ML production environments, fostering technical depth, accountability, and a customer-first mindset.

Serve as a player-coach, directly contributing to complex troubleshooting, inference optimization, and incident resolution for high-value enterprise customers.

Diagnose and resolve runtime issues impacting model performance, such as latency spikes, memory pressure, GPU scheduling, and concurrency management.

Debug Kubernetes infrastructure (pods, controllers, networking) and observability stacks using tools like Grafana, Loki, and Prometheus.

Own critical incidents end-to-end — coordinating across Engineering, Product, and Sales to ensure timely resolution, transparent communication, and SLA compliance.

Drive continuous improvement by enhancing diagnostic runbooks, refining alerting strategies, and developing internal automation for faster root-cause analysis.

Collaborate with product and platform teams to surface insights from production issues — shaping roadmap priorities around reliability, inference efficiency, and operational scalability.

Lead initiatives that enhance observability, monitoring, and alerting for AI workloads across distributed compute environments.

Balance tactical execution with strategic vision, ensuring your team not only resolves today's issues but also builds systems that prevent tomorrow's.

Requirements

Proven experience leading or mentoring technical teams in Support Engineering, Infrastructure, or Site Reliability within production AI / ML or distributed systems environments.

Deep Kubernetes troubleshooting expertise, including advanced resource debugging, runtime performance analysis, and observability-driven diagnostics.

Hands-on experience managing distributed systems or AI products at scale — optimizing GPU / CPU utilization, batch sizing, concurrency, and memory efficiency.

Expertise with observability and monitoring tools (Grafana, Prometheus, Loki) and alerting best practices.

Skilled in incident management and customer escalation handling, with a proven ability to drive clarity and confidence in high-stakes situations.

Demonstrated project management and organizational skills, capable of orchestrating multi-stakeholder efforts from incident triage through resolution and RCA.

Bonus / Nice-to-Have

Experience implementing or managing incident-response and ticketing systems (e.g., Zendesk, Pylon).

BENEFITS

Competitive compensation, including meaningful equity.

100% coverage of medical, dental, and vision insurance for employees and dependents

Generous PTO policy, including a company-wide Winter Break (our offices are closed from Christmas Eve to New Year's Day!)

Paid parental leave

Company-facilitated 401(k)

Exposure to a variety of ML startups, offering unparalleled learning and networking opportunities.

Apply now to embark on a rewarding journey in shaping the future of AI! If you are a motivated individual with a passion for machine learning and a desire to be part of a collaborative and forward-thinking team, we would love to hear from you.

At Baseten, we are committed to fostering a diverse and inclusive workplace. We provide equal employment opportunities to all employees and applicants without regard to race, color, religion, gender, sexual orientation, gender identity or expression, national origin, age, genetic information, disability, or veteran status.
