Performance engineer • Berkeley, CA
- Software Engineer (AI Performance) • Gimlet Labs, Inc • San Francisco, CA, United States
- Senior Software Engineer - Compute Performance • Lambda • San Francisco, California, United States
- Founding Resilience Engineer - Performance & Observability • Persona • San Francisco, CA, United States
- Performance ML Engineer: CUDA, GPU Systems • Relace • San Francisco, CA, United States
- Building Performance Engineer • Harrison Consulting Solutions • San Francisco, California, USA
- GPU Systems Engineer: High-Performance C++ • 10X Recruiting Partners • San Francisco, CA, United States
- Senior ML Inference Engineer - PyTorch Performance • Comfy • San Francisco, CA, United States
- GPU Performance Engineer • Genmo Inc. • San Francisco, CA, United States
- Sr Building Performance Engineer • HGA • San Francisco, CA, US
- Senior HPC Performance Engineer • NVIDIA • Remote, CA, US
- Backend Engineer - High-Performance Search Systems • Exa • San Francisco, CA, United States
- Performance Engineer • Menlo Ventures • San Francisco, CA, United States
- Performance Modelling Engineer • PageBolt WordPress • San Francisco, CA, United States
- HPC / AI Data Performance Engineer • Lawrence Berkeley National Laboratory • Berkeley, CA, United States
- Sr. Software Engineer - Performance • Databricks • San Francisco, California
- Senior Performance Engineer • VirtualVocations • Oakland, California, United States
- Performance Engineer SMTS • Salesforce, Inc. • San Francisco, CA, United States
- Product Performance Engineer • OpenAI • San Francisco

Software Engineer (AI Performance)
Gimlet Labs, Inc • San Francisco, CA, United States • Full-time
Gimlet Labs is building the foundation for the next generation of AI applications. As generative AI workloads rapidly scale, inference efficiency is becoming the critical bottleneck. Gimlet is redefining AI inference from the ground up, combining cutting-edge research with an integrated hardware-software stack that delivers breakthrough performance, efficiency, and model quality. Gimlet pairs its inference stack with a seamless developer experience, allowing users to deploy, manage, and monitor AI workloads from frameworks like PyTorch and LangChain at production scale in seconds.
Gimlet is spun out of a Stanford research project under Professors Zain Asgar and Sachin Katti. The founding team has deep experience across AI, distributed systems, and hardware with previous successful exits.
Gimlet Labs is seeking a Software Engineer focused on AI Performance. You will be researching and implementing techniques to drive performance and quality optimizations across the latest AI models. You will implement techniques such as quantization, KV caching, and FlashAttention to enable inference efficiency. You will design parallelism strategies to distribute data and workloads across compute nodes at production scale. You will dive deep into GPU code and kernel optimizations to accelerate AI workloads.
Responsibilities
- Evaluating and implementing cutting-edge AI research for model performance and efficiency
- Architecting infrastructure for distributed AI workloads across both the software stack and GPU kernel layers
- Profiling, benchmarking, and analyzing system performance, identifying bottlenecks and optimization opportunities in execution runtimes targeting various hardware systems
Qualifications
Preferred Qualifications