Member of Technical Staff, Evaluation
Boson AI, Santa Clara, CA, US
Full-time
Boson AI is an early-stage startup building large language tools for everyone to use. Our founders (Alex Smola, Mu Li), and a team of Deep Learning, Optimization, NLP, AutoML and Statistics scientist...
Program Coordinator (Adult Day Program)
Friends of Children with Special Needs, San Jose, CA, US
Temporary
Salary: $28 - $36 / hourly (depending on experience). Friends of Children with Special Needs (FCSN) is a Bay Area non-profit organization founded in 1996 and focused on helping individuals with speci...
Senior Software Engineer, ML Systems Evaluation
A, Sunnyvale, California, United States
Full-time
Our Wayfinder team is building scalable, certifiable autonomy systems to power the next generation of commercial aircraft.
Our team of experts is driving the maturation of machine learning and other...
Member of Technical Staff, Model Evaluation
xAI, Palo Alto, CA, US
Full-time
xAI's mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge.
Our team is small, highly motivated, and focused on engineering exc...
Program Manager
CBRE Group, Menlo Park, CA, US
Full-time
As a CBRE Program Manager, you will manage a team responsible for facilitating small to medium cross-functional projects and programs.
This job is part of the Program Management function. They are re...
Evaluation Consultant
Paradise Architectural PANELS & STEEL, San Jose, California, USA
Full-time
Paradise Architectural PANELS & STEEL is a leading manufacturer of high-quality architectural panels and steel products. We are committed to providing our clients with innovative and sustainable sol...
Applied Science Manager, GenAI Evaluation Media (GEM)
Amazon, Sunnyvale, California, USA
Full-time
Passionate about creating visual customer experiences that push the boundaries at the forefront of GenAI. The North America Stores GenAI Evaluation Media (GEM) team is seeking an experienced Applied...
Senior Project Manager, Post-Market Safety Evaluation. Abbott is a global healthcare leader that helps people live more fully at all stages of life.
Our portfolio of life-changing technologies spans ...
Responsible AI ML Engineer – Safety & Evaluation
Apple Inc., Cupertino, CA, United States
Full-time
A leading technology company in Cupertino seeks a Machine Learning Engineer focused on Responsible AI. You'll work on developing evaluations for safety and fairness in generative AI applications, co...
Human Evaluation & Content Quality Vendor Operations Manager
US Tech Solutions, Mountain View, CA, US
Temporary
Human Evaluation & Content Quality Vendor Operations Manager. Location: Mountain View, CA (Hybrid). Duration: 5 months contract.
Job Description: As a Human Evaluation & Content Quality Vendor Operati...
System Safety Engineer Autonomous Driving - AV Risk Evaluation
Applied Intuition, Sunnyvale, CA, United States
Full-time
Applied Intuition is the vehicle intelligence company that accelerates the global adoption of safe, AI-driven machines. Founded in 2017 and now valued at $15 billion following its recent Series F fu...
Director, Simulation Evaluation
Waymo, Mountain View, CA, United States
Full-time
Waymo is an autonomous driving technology company with the mission to be the world's most trusted driver. Since its start as the Google Self-Driving Car Project in 2009, Waymo has focused on buildin...
Senior Manager - Search, Evaluation Program Management - Apple Maps
Apple, Cupertino, CA, United States
Full-time
Apple Maps is seeking a top-tier leader for our human evaluation team to ensure the quality and relevance of Maps features! We need someone to lead a team who provides data and insights to improve...
Product Manager, Evaluation & Data Generation
Hippocratic AI, Palo Alto, CA, US
Full-time
Hippocratic AI is seeking a PM to lead the development of our model evaluation and data generation platform. In this role, you'll drive the creation of high-quality training and test datasets that i...
Program Management - Program Manager V
eTeam, Menlo Park, CA, US
Full-time
Location: Remote (EST & CST Preferred). Duration: 6 months (Potential for extension). Lead end-to-end risk assessments for projects and initiatives involving people data, maintaining accountability t...
TLM, Autonomy Evaluation
Nuro, Mountain View, California
Full-time
Nuro is a self-driving technology company on a mission to make autonomy accessible to all. Founded in 2016, Nuro is building the world’s most scalable driver, combining cutting-edge AI with automoti...
Wireless Technologies Evaluation Engineer
Tata Consultancy Services, Cupertino, CA
Full-time
In this role, you will be part of the Product RF Definition team and support the evaluation and characterization of various technologies from an RF perspective.
You will work independently under Product RF...
We are seeking an LLM Evaluation Engineer to join a forward-thinking team responsible for developing a sophisticated voice assistant platform. This isn’t your typical QA role – it’s a unique blend of technical engineering, machine learning evaluation, and data analysis. You’ll work closely with cutting-edge conversational AI technology, designing evaluation frameworks, building custom scripts, and creating data visualizations to assess platform performance.
Key Responsibilities:
Design and implement evaluation strategies for voice and language models, including automated testing approaches.
Analyze unstructured data from log store systems to identify performance gaps and optimize user experiences.
Build and maintain custom Python scripts to streamline data processing and generate actionable insights.
Develop visual reports to communicate findings and drive continuous improvement.
Collaborate with cross-functional teams globally to identify and address pain points in conversational AI performance.
Use prompt engineering techniques to refine LLM outputs and articulate system health.
Ideal Candidate:
3+ years of experience in machine learning evaluation, data analysis, or related technical roles.
Intermediate to advanced Python scripting, including log parsing and API testing.
Familiarity with GenAI and LLMs, including automated workflows and API integrations.
Strong analytical mindset, capable of working independently and identifying innovative solutions.
Excellent communication skills, able to present complex findings clearly to both technical and non-technical stakeholders.