Talent.com
Engineering Manager, Support and Customer Engineering

Baseten • San Francisco, CA, US
Full-time
This posting is no longer accepting applications.

ABOUT BASETEN

Baseten powers inference for the world's most dynamic AI companies, like OpenEvidence, Clay, Mirage, Gamma, Sourcegraph, Writer, Abridge, Bland, and Zed. By uniting applied AI research, flexible infrastructure, and seamless developer tooling, we enable companies operating at the frontier of AI to bring cutting-edge models into production. With our recent $150M Series D funding, backed by investors including BOND, IVP, Spark Capital, Greylock, and Conviction, we're scaling our team to meet accelerating customer demand.

The Role

As an Engineering Manager (Player & Coach) focused on Support and Customer Engineering, you'll lead a team responsible for the performance, reliability, and success of large-scale ML workloads in production. Combining hands-on technical ownership with managerial leadership, you will guide your team through complex incidents, improve observability and operational practices, and shape how we deliver world-class AI infrastructure support to our customers. While you actively coach and grow your team, you'll also stay close to the technology: diving into runtime debugging, optimizing GPU utilization, and helping evolve the Baseten platform based on real-world patterns and customer feedback.

Example Initiatives

Take a look at these blog posts written by members of our Forward Deployed Engineering team:

Forward Deployed Engineering on the frontier of AI

The fastest, most accurate Whisper transcription

Deploy production-ready model servers from Docker images

Deploy custom ComfyUI workflows as APIs

Responsibilities

Lead, mentor, and scale a team of Support Engineers specializing in AI and ML production environments, fostering technical depth, accountability, and a customer-first mindset.

Serve as a player-coach, directly contributing to complex troubleshooting, inference optimization, and incident resolution for high-value enterprise customers.

Diagnose and resolve runtime issues impacting model performance, such as latency spikes, memory pressure, GPU scheduling, and concurrency management.

Debug Kubernetes infrastructure (pods, controllers, networking) and observability stacks using tools like Grafana, Loki, and Prometheus.

Own critical incidents end-to-end — coordinating across Engineering, Product, and Sales to ensure timely resolution, transparent communication, and SLA compliance.

Drive continuous improvement by enhancing diagnostic runbooks, refining alerting strategies, and developing internal automation for faster root-cause analysis.

Collaborate with product and platform teams to surface insights from production issues — shaping roadmap priorities around reliability, inference efficiency, and operational scalability.

Lead initiatives that enhance observability, monitoring, and alerting for AI workloads across distributed compute environments.

Balance tactical execution with strategic vision, ensuring your team not only resolves today's issues but also builds systems that prevent tomorrow's.

Requirements

Proven experience leading or mentoring technical teams in Support Engineering, Infrastructure, or Site Reliability within production AI / ML or distributed systems environments.

Deep Kubernetes troubleshooting expertise, including advanced resource debugging, runtime performance analysis, and observability-driven diagnostics.

Hands-on experience managing distributed systems or AI products at scale — optimizing GPU / CPU utilization, batch sizing, concurrency, and memory efficiency.

Expertise with observability and monitoring tools (Grafana, Prometheus, Loki) and alerting best practices.

Skilled in incident management and customer escalation handling, with a proven ability to drive clarity and confidence in high-stakes situations.

Demonstrated project management and organizational skills, capable of orchestrating multi-stakeholder efforts from incident triage through resolution and RCA.

Bonus / Nice-to-Have

Experience implementing or managing incident-response and ticketing systems (e.g., Zendesk, Pylon).

BENEFITS

Competitive compensation, including meaningful equity.

100% coverage of medical, dental, and vision insurance for employee and dependents

Generous PTO policy, including a company-wide Winter Break (our offices are closed from Christmas Eve to New Year's Day!)

Paid parental leave

Company-facilitated 401(k)

Exposure to a variety of ML startups, offering unparalleled learning and networking opportunities.

Apply now to embark on a rewarding journey in shaping the future of AI! If you are a motivated individual with a passion for machine learning and a desire to be part of a collaborative and forward-thinking team, we would love to hear from you.

At Baseten, we are committed to fostering a diverse and inclusive workplace. We provide equal employment opportunities to all employees and applicants without regard to race, color, religion, gender, sexual orientation, gender identity or expression, national origin, age, genetic information, disability, or veteran status.


Related jobs

Enterprise, Customer Success Manager

RingCentral, Inc • Belmont, CA, United States
Full-time
It's not every day that you consider starting a new career. We're RingCentral, and we're happy that someone as talented as you is considering this role. First, a little about us: we're a $2 Billion an...

Engineering Manager

Omada Health • South San Francisco, CA, United States
Full-time
Omada Health is on a mission to inspire and engage people in lifelong health, one step at a time. Omada Health is a digital care provider that empowers people to achieve their health goals through s...

Engineering Manager, R2 Storage

Cloudflare, Inc. • San Francisco, CA, United States
Full-time
At Cloudflare, we are on a mission to help build a better Internet. Today the company runs one of the world's largest networks that powers millions of websites and other Internet properties for cust...

Engineering Manager

Forethought • San Francisco, CA, US
Full-time
Launched in 2018, Forethought is the first AI-native platform for enterprise customer support, built on a multi-agent architecture for omnichannel resolution. Trusted by leading companies like Upwor...

Software Engineering Manager, Account Services

Crunchyroll • San Francisco, CA, United States
Full-time
Software Engineering Manager, Account Services at Crunchyroll. This role leads the Account Services Team responsible for building and maintaining account services at massive, multi-million user scal...

Support Engineering Manager

Canonical • San Francisco, CA, United States
Full-time
Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is very widely u...

Customer Support Manager

VirtualVocations • Oakland, California, United States
Full-time
A company is looking for a Customer Support Manager to oversee a team and enhance customer satisfaction. Key Responsibilities: Manage a team of Customer Support Engineers and support global custome...

Engineering Manager, Foundations

P2P • San Francisco, CA, United States
Full-time
Chainlink Labs is the primary contributing developer of Chainlink, the decentralized computing platform powering the verifiable web. Chainlink is the industry-standard platform for providing access ...

Engineering Manager, Enterprise Foundations

Anthropic • San Francisco, CA, US
Full-time
Engineering Manager, Enterprise Foundations. Anthropic's mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as ...

Engineering Manager, Desktop

Anthropic • San Francisco, CA, United States
Full-time
Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group ...

Engineering Manager

GTMnow • San Francisco, CA, United States
Full-time
Owner is the all-in-one platform that restaurants use to succeed online. Thousands of restaurant owners use our tools to build their website, drive online orders, create their own branded app, manag...

Senior Technical Support Customer Success Manager

Qualys • Foster City, CA, United States
Full-time
Come work at a place where innovation and teamwork come together to support the most exciting missions in the world! The Technical Support Customer Success Manager will be responsible for managing key ...

Enablement Engineering Manager

Lumafield • San Francisco, CA, United States
Full-time
Lumafield was founded in 2019 to upgrade manufacturing. We are engineers with deep experience across the product development cycle, from initial ideas to shipping hardware, across industries and spe...

Solutions Engineering Manager, ASEAN

Cloudflare • San Francisco, CA, United States
Full-time
Solutions Engineering Manager, ASEAN. At Cloudflare, we are on a mission to help build a better Internet. Today the company runs one of the world's largest networks that powers millions of websites an...

Senior Manager, Solutions Engineering

Intercom • San Francisco, CA, United States
Full-time
Intercom is the AI Customer Service company on a mission to help businesses provide incredible customer experiences. Our AI agent Fin, the most advanced customer service AI agent on the market, lets...

Support Engineering Manager

Retool • San Francisco, CA, United States
Full-time
Nearly every company in the world runs on custom software for critical operations such as tracking performance metrics, handling customer support workflows, building admin dashboards, and many oth...

Engineering Manager, Benefits Flex Platform

Rippling • San Francisco, CA, United States
Full-time
A technology company in San Francisco is seeking an Engineering Manager for Benefits Flex Products. This role involves overseeing the development of critical features that assist in managing employe...

Engineering Manager, Core Services

Lambda • San Francisco, CA, United States
Full-time
Lambda, The Superintelligence Cloud, builds Gigawatt-scale AI Factories for Training and Inference. Lambda’s mission is to make compute as ubiquitous as electricity and give every person access to a...