Talent.com
Engineering Manager, Support and Customer Engineering

Baseten • San Francisco, CA, US
Job type: Full-time

ABOUT BASETEN

Baseten powers inference for the world's most dynamic AI companies, like OpenEvidence, Clay, Mirage, Gamma, Sourcegraph, Writer, Abridge, Bland, and Zed. By uniting applied AI research, flexible infrastructure, and seamless developer tooling, we enable companies operating at the frontier of AI to bring cutting-edge models into production. With our recent $150M Series D funding, backed by investors including BOND, IVP, Spark Capital, Greylock, and Conviction, we're scaling our team to meet accelerating customer demand.

The Role

As an Engineering Manager (Player & Coach) focused on Support and Customer Engineering, you'll lead a team responsible for the performance, reliability, and success of large-scale ML workloads in production. Combining hands-on technical ownership with managerial leadership, you will guide your team through complex incidents, improve observability and operational practices, and shape how we deliver world-class AI infrastructure support to our customers. While you will actively coach and grow your team, you'll also stay close to the technology, including diving into runtime debugging, optimizing GPU utilization, and helping evolve the Baseten platform based on real-world patterns and customer feedback.

Example Initiatives

Take a look at these blog posts written by members of our Forward Deployed Engineering team:

Forward Deployed Engineering on the frontier of AI

The fastest, most accurate Whisper transcription

Deploy production-ready model servers from Docker images

Deploy custom ComfyUI workflows as APIs

Responsibilities

Lead, mentor, and scale a team of Support Engineers specializing in AI and ML production environments, fostering technical depth, accountability, and a customer-first mindset.

Serve as a player-coach, directly contributing to complex troubleshooting, inference optimization, and incident resolution for high-value enterprise customers.

Diagnose and resolve runtime issues impacting model performance, such as latency spikes, memory pressure, GPU scheduling, and concurrency management.

Debug Kubernetes infrastructure (pods, controllers, networking) and observability stacks using tools like Grafana, Loki, and Prometheus.

Own critical incidents end-to-end — coordinating across Engineering, Product, and Sales to ensure timely resolution, transparent communication, and SLA compliance.

Drive continuous improvement by enhancing diagnostic runbooks, refining alerting strategies, and developing internal automation for faster root-cause analysis.

Collaborate with product and platform teams to surface insights from production issues — shaping roadmap priorities around reliability, inference efficiency, and operational scalability.

Lead initiatives that enhance observability, monitoring, and alerting for AI workloads across distributed compute environments.

Balance tactical execution with strategic vision, ensuring your team not only resolves today's issues but also builds systems that prevent tomorrow's.

Requirements

Proven experience leading or mentoring technical teams in Support Engineering, Infrastructure, or Site Reliability within production AI / ML or distributed systems environments.

Deep Kubernetes troubleshooting expertise, including advanced resource debugging, runtime performance analysis, and observability-driven diagnostics.

Hands-on experience managing distributed systems or AI products at scale — optimizing GPU / CPU utilization, batch sizing, concurrency, and memory efficiency.

Expertise with observability and monitoring tools (Grafana, Prometheus, Loki) and alerting best practices.

Skilled in incident management and customer escalation handling, with a proven ability to drive clarity and confidence in high-stakes situations.

Demonstrated project management and organizational skills, capable of orchestrating multi-stakeholder efforts from incident triage through resolution and RCA.

Bonus / Nice-to-Have

Experience implementing or managing incident-response and ticketing systems (e.g., Zendesk, Pylon).

BENEFITS

Competitive compensation, including meaningful equity.

100% coverage of medical, dental, and vision insurance for employees and dependents

Generous PTO policy, including a company-wide Winter Break (our offices are closed from Christmas Eve to New Year's Day!)

Paid parental leave

Company-facilitated 401(k)

Exposure to a variety of ML startups, offering unparalleled learning and networking opportunities.

Apply now to embark on a rewarding journey in shaping the future of AI! If you are a motivated individual with a passion for machine learning and a desire to be part of a collaborative and forward-thinking team, we would love to hear from you.

At Baseten, we are committed to fostering a diverse and inclusive workplace. We provide equal employment opportunities to all employees and applicants without regard to race, color, religion, gender, sexual orientation, gender identity or expression, national origin, age, genetic information, disability, or veteran status.
