AIML - Sr. Software Development Engineer, Evaluation
Apple - Cupertino, CA, United States
Full-time
At Apple, we create world-class innovative products that seamlessly combine cutting-edge hardware with intelligent software experiences, powered by advanced machine learning technologies. The Evalua...
Program Manager
Tekfortune Inc - San Jose, CA, United States
Permanent
Tekfortune is a fast-growing consulting firm specializing in permanent, contract, and project-based staffing services for the world's leading organizations in a broad range of industries. In this quickly ch...
Program Coordinator
InsideHigherEd - Stanford, California, United States
Full-time +1
Dean of Research, Stanford, California, United States. Administration. Post Date: Mar 27, 2026. Requisition #: 108594. The Ginzton Laboratory houses the research operations of 20 Principal Invest...
Human Evaluation & Content Quality Vendor Operations Manager
US Tech Solutions - Mountain View, CA, United States
Temporary
Location: Mountain View, CA (Hybrid). As a Human Evaluation & Content Quality Vendor Operations Manager, you will play a key role in scaling and optimizing our global human evaluation ecosystem - th...
Program Manager
eTeam - San Jose, CA, United States
Full-time
Candidate will run (manage) the client's B2B data and analytics program, coordinate the team members and stakeholders across workstreams, Jira intake process, newsletters, ensure decks and documents ...
Research Scientist (Model Evaluation)
Sanas - Palo Alto, California, United States, 94301
Full-time
Sanas is pioneering the future of human communication. Founded by a team of Stanford researchers and entrepreneurs with deep industry experience, Sanas has developed the world's first real-time spee...
Program Manager
Insight Global - San Jose, CA, United States
Full-time
Support the launch of One Card, a debit card with integrated payment functions. Partner with Product to translate product requirements into executable program plans. Own program level requirements an...
Program Associate
Stanford - Stanford, CA, United States
Full-time
The Hoover Institution at Stanford University is seeking qualified candidates for the full-time position of Program Associate. To ensure your application information is captured in our official file...
Program Manager
Pride Global - Cupertino, CA, United States
Full-time
Months with Possible Extension. We are seeking a highly capable and dynamic New Product Introduction Operations Program Manager (NPI - OPM) to lead our team in successfully launching Beats products ...
Program Manager
Super Micro Computer - San Jose, CA, United States
Full-time
Supermicro® is a Top Tier provider of advanced server, storage, and networking solutions for Data Center, Cloud Computing, Enterprise IT, Hadoop/Big Data, Hyperscale, HPC and IoT/Embedded customer...
Software Engineer, Data & Evaluation
Waymo - Mountain View, CA, United States
Full-time
Waymo is an autonomous driving technology company with the mission to be the world's most trusted driver. Since its start as the Google Self-Driving Car Project in 2009, Waymo has focused on buildin...
Software Engineer, Metrics, GenAI Model Evaluation
Tesla - Palo Alto, CA, United States
Full-time
The AI Evaluation team is the main line of defense in ensuring customer safety. Working alongside our AI team, you will design metrics that utilize fleet data and run on large inference clusters to ...
Program Manager
ACL Digital - San Jose, CA, United States
Permanent
The Program Manager is responsible for customer product delivery schedules, planning, managing, and tracking. The Program Manager reports to the Chief Executive Officer. Leads all phases of assigned ...
Program Manager
Kaav Inc. - Sunnyvale, CA, United States
Full-time
Proven ability to drive results, especially in a matrixed or influence based environment. Experience and deep understanding of software development methodologies and principles (including SDLC, Lean...
Staff Program Manager
Vaco by Highspring - Mountain View, CA, US
Full-time
Come join an Expert Network Team as a Staff Program Manager, driving readiness for changes impacting a large customer success expert ecosystem. This AI-driven platform and expert network includes th...
Full-stack Engineer, Data Platform - Experimentation & Evaluation
Tik Tok - San Jose, CA, United States
Full-time
Team Introduction: Our mission on the experimentation and evaluation team is to build the next-gen A/B testing platform that empowers the company to make data-driven decisions for its products. The suppo...
Program Manager
Cyngn - Mountain View, CA, United States
Full-time
Based in Mountain View, CA, Cyngn is a publicly-traded autonomous technology company. We deploy self-driving industrial vehicles like forklifts and tuggers to factories, warehouses, and other facili...
Director, Simulation and Evaluation - Autonomous Driving
Bosch Group - Sunnyvale, California, United States
Full-time
Company Description: We Are Bosch. At Bosch, we shape the future by inventing high-quality technologies and services that spark enthusiasm and enrich people’s lives. Our areas of activity are every ...
About the Role: We are seeking an LLM Evaluation Engineer to join a forward-thinking team responsible for developing a sophisticated voice assistant platform. This isn’t your typical QA role – it’s a unique blend of technical engineering, machine learning evaluation, and data analysis. You’ll work closely with cutting-edge conversational AI technology, designing evaluation frameworks, building custom scripts, and creating data visualizations to assess platform performance.
Key Responsibilities:
Design and implement evaluation strategies for voice and language models, including automated testing approaches.
Analyze unstructured data from log store systems to identify performance gaps and optimize user experiences.
Build and maintain custom Python scripts to streamline data processing and generate actionable insights.
Develop visual reports to communicate findings and drive continuous improvement.
Collaborate with cross-functional teams globally to identify and address pain points in conversational AI performance.
Use prompt engineering techniques to refine LLM outputs and articulate system health.
Ideal Candidate:
3+ years of experience in machine learning evaluation, data analysis, or related technical roles.
Intermediate to advanced Python scripting, including log parsing and API testing.
Familiarity with GenAI and LLMs, including automated workflows and API integrations.
Strong analytical mindset, capable of working independently and identifying innovative solutions.
Excellent communication skills, able to present complex findings clearly to both technical and non-technical stakeholders.