Senior Research Engineer, LLM Evaluation and Behavioral Analysis

Together AI • San Francisco, CA, United States

About the Role

Together AI is building the fastest, most capable open-source-aligned LLMs and inference stack in the world. As part of the Turbo organization, you will be a critical bridge between cutting-edge model research and real-world behavioral reliability. This role focuses on deeply understanding model behavior - probing reasoning, tool use, function calling, multi-step interactions, and subtle failure modes - and building the evaluation systems that ensure models behave intelligently and consistently in production.

You will develop robust evaluation pipelines, design high-quality behavioral test suites, and work closely with training, post-training, inference, and product teams to identify regressions, shape datasets, and influence model improvements. Your work will directly define how Together measures model quality and reliability across releases.

Responsibilities

Build and iterate on evaluation frameworks that measure model performance across instruction following, function calling, long-context reasoning, multi-turn dialog, safety, and agentic behaviors.
Develop specialized evaluation suites for:
Function calling - argument correctness, schema adherence, tool selection, multi-function planning, and error recovery.
Agentic workflows - task decomposition, multi-step planning, self-correction, and autonomous tool-use sequences.
Tool-augmented interactions - search, retrieval, code execution, API-driven actions.
Create automated CI/CD pipelines for A/B comparisons, regression detection, behavioral drift monitoring, and adversarial probing.
Design and curate high-quality evaluation datasets, especially nuanced or challenging cases across domains.
Collaborate with researchers and engineers to diagnose failures, triage regressions, and guide data selection, shaping strategies, objective design, and system improvements.
Work with engineering teams to build dashboards, reports, and internal tools that help visualize behavior changes across releases.
Operate in a fast-paced, high-impact environment with deep technical ownership and close partnership with world-class model researchers and infra engineers.

Requirements

Strong engineering skills with Python, evaluation tooling, and distributed workflows.

Experience working with LLMs or transformer-based models, particularly in model evaluation, testing, or red-teaming.

Ability to reason clearly about qualitative behavior, edge cases, and model failure patterns.

Experience designing experiments, building datasets, and interpreting noisy behavioral signals.

Understanding of function calling and structured output formats.

Familiarity with GPU or distributed compute environments.

Hands-on experience evaluating function-calling models, agentic systems, or tool-augmented LLM pipelines.

Experience with multi-turn or multi-step reasoning tasks.

Familiarity with inference systems, distributed infrastructure, or post-training workflows.

Passion for discovering subtle behaviors, surprising model gaps, or edge-case failures.

About Together AI

Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society. Our mission is to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models. We have contributed to leading open-source research, models, and datasets including FlashAttention, Hyena, FlexGen, ATLAS, and RedPajama. We invite you to join a passionate group of researchers and engineers in building the next generation of AI infrastructure.

Compensation

We offer competitive compensation, startup equity, health insurance, and other benefits. The US base salary range for this full-time position is: $220,000 - $270,000 + equity + benefits. Compensation varies by location, level, and experience.

Equal Opportunity

Together AI is an Equal Opportunity Employer and is proud to offer equal opportunity to all individuals regardless of race, color, ancestry, religion, sex, sexual orientation, national origin, age, citizenship, marital status, disability, gender identity, veteran status, or other protected characteristics.

Please see our privacy policy at https://www.together.ai/privacy
