Talent.com
Evaluation Scenario Writer - AI Agent Testing Specialist

Mindrift • WI, US
Posted 30 days ago
Job type
  • Part-time
  • Remote
Job Description

This opportunity is only for candidates currently residing in the specified country. Your location may affect eligibility and rates. Please submit your resume in English and indicate your level of English.

At Mindrift, innovation meets opportunity. We believe in using the power of collective human intelligence to ethically shape the future of AI.

What we do

The Mindrift platform, launched and powered by Toloka, connects domain experts with cutting-edge AI projects from innovative tech clients. Our mission is to unlock the potential of GenAI by tapping into real-world expertise from across the globe.

About the Role

We’re looking for someone who can design realistic and structured evaluation scenarios for LLM-based agents. You’ll create test cases that simulate human-performed tasks and define gold-standard behavior to compare agent actions against. You’ll work to ensure each scenario is clearly defined, well-scored, and easy to execute and reuse. You’ll need a sharp analytical mindset, attention to detail, and an interest in how AI agents make decisions.

Although every project is unique, your typical work might include:

  • Designing structured test scenarios based on real-world tasks.
  • Defining the golden path and acceptable agent behavior.
  • Annotating task steps, expected outputs, and edge cases.
  • Working with developers to test your scenarios and improve their clarity.
  • Reviewing agent outputs and adapting tests accordingly.

How to get started

Simply apply to this post, qualify, and get the chance to contribute to projects aligned with your skills, on your own schedule. From creating training prompts to refining model responses, you’ll help shape the future of AI while ensuring technology benefits everyone.

Requirements

  • Bachelor’s and/or Master’s degree in Computer Science, Software Engineering, Data Science / Data Analytics, Artificial Intelligence / Machine Learning, Computational Linguistics / Natural Language Processing (NLP), Information Systems, or a related field.
  • Background in QA, software testing, data analysis, or NLP annotation.
  • Good understanding of test design principles (e.g., reproducibility, coverage, edge cases).
  • Strong written communication skills in English.
  • Comfortable with structured formats such as JSON or YAML for describing scenarios.
  • Can define expected agent behaviors (gold paths) and scoring logic.
  • Basic experience with Python and JS.
  • Curious and open to working with AI-generated content, agent logs, and prompt-based behavior.
  • Ready to learn new methods, switch quickly between tasks and topics, and sometimes work with challenging, complex guidelines.
  • Our freelance role is fully remote, so all you need is a laptop, an internet connection, available time, and the enthusiasm to take on a challenge.

Nice to Have

  • Experience in writing manual or automated test cases.
  • Familiarity with LLM capabilities and typical failure modes.
  • Understanding of scoring metrics (precision, recall, coverage, reward functions).

Benefits

Contribute on your own schedule, from anywhere in the world. This opportunity allows you to:

  • Get paid for your expertise, with rates that can go up to $60/hour depending on your skills, experience, and project needs.
  • Take part in a flexible, remote, freelance project that fits around your primary professional or academic commitments.
  • Participate in an advanced AI project and gain valuable experience to enhance your portfolio.
  • Influence how future AI models understand and communicate in your field of expertise.

Related Jobs

    Remote M&A Associate - AI Trainer ($50-$60 / hour)

    Data Annotation • Oshkosh, Wisconsin
    Remote
    Full-time
    We are looking for a finance professional to join our team to train AI models. You will measure the progress of these AI chatbots, evaluate their logic, and solve problems to improve the quality of ...

    Work-From-Home Online Product Tester - $45 per hour

    Online Consumer Panels America • Wisconsin, US
    Remote
    Part-time
    Product Testers are wanted to work from home nationwide in the US to fulfill upcoming contracts with national and international companies. We guarantee 15-25 hours per week with an hourly pay of bet...

    Travel Cath Lab Tech - $3,016 per week in Wisconsin

    AlliedTravelCareers • All Cities, WI, US
    Full-time
    AlliedTravelCareers is working with Talent4Health to find a qualified Cath Lab Tech in Wisconsin! Talent4Health is the most candidate-centric agency in the industry. Talent4Health has client relatio...

    Insurance Claims Environmental

    Diedre Moire Corp. • Oshkosh, WI, US
    Full-time
    Long Tail & Latent Claims Examiner - Oshkosh, WI Insurance Claims Specialist Adjuster Examiner Analyst Attorney Environmental Toxic Tort Asbestos Pollution Health Hazard. REMOTE WORK FROM HOME AVA...

    Immunology Specialist

    Syneos Health / inVentiv Health Commercial LLC • Oshkosh, WI, United States
    Full-time
    You have what it takes: a competitive drive coupled with exceptional sales ability. In this role, you will be responsible for implementing the sales plan by delivering proficient sales presentations...

    Senior Reliability Engineer

    Axelon Services Corporation • Oshkosh, WI, US
    Full-time
    Benefits Included: Healthcare plans, PTO package, 5% annual bonus target. Relocation offered? Not at this time; let's target local talent. Must be onsite: 2855 S Oakwood Road, Oshkosh, WI 54904. Travel ...

    Supplier Quality Engineer

    Genesis10 • Oshkosh, WI, US
    Permanent
    Genesis10 is seeking a Quality Engineer (AQE) for a 6-month contract position with a client located in Milwaukee, WI. Summary: The Quality Engineer is responsible for supporting and facilitating the...

    Project Engineer

    Metalcraft of Mayville Careers • Mayville, Wisconsin, US
    Full-time
    Metalcraft of Mayville is an Equal Opportunity Employer: Minorities / females / veterans / individuals with disabilities / sexual orientation / gender identity. Benefits of working for Metalcraft: Reporting t...

    AQE Quality Engineer

    Axelon Services Corporation • Oshkosh, WI, US
    Full-time
    The Quality Engineer is responsible for support and facilitation of the quality system in all company processes to assure ongoing improvements in quality performance and customer satisfaction. This ...

    Remote FinTech Product Analyst - AI Trainer ($50-$60 / hour)

    Data Annotation • Oshkosh, Wisconsin
    Remote
    Full-time
    We are looking for a finance professional to join our team to train AI models. You will measure the progress of these AI chatbots, evaluate their logic, and solve problems to improve the quality of ...

    Delivery Driver - Sign Up and Start Earning

    DoorDash • Waupun, WI, United States
    Full-time
    DoorDash is the #1 category leader in food delivery, food pickup, and convenience store delivery in the US, trusted by millions of customers every day. As a Dasher, you’ll stay busy with a variety o...

    Financial Specialist

    Dodge County • Juneau, WI, United States
    Full-time
    Position Open Until Filled - Application Review Date: . Please submit Resume and Cover Letter with application. Monday - Friday: 8:00am - 4:30pm with alternative work arrangements may be available aft...

    Remote Product Tester - $25-45 per hour

    Online Consumer Panels America • Wisconsin, US
    Remote
    Part-time
    Product Testers are wanted to work from home nationwide in the US to fulfill upcoming contracts with national and international companies. We guarantee 15-25 hours per week with an hourly pay of bet...

    AI Agent Evaluation Analyst (Freelance)

    Mindrift • WI, US
    Remote
    Part-time
    This opportunity is only for candidates currently residing in the specified country. Your location may affect eligibility and rates. Please submit your resume in English and indicate your level of En...

    Dashers - Sign Up and Start Earning

    DoorDash • Wisconsin
    Full-time
    DoorDash is the #1 category leader in food delivery, food pickup, and convenience store delivery in the US, trusted by millions of customers every day. As a Dasher, you’ll stay busy with a variety o...

    Test Engineer II

    Axelon Services Corporation • Oshkosh, WI, US
    Full-time
    Primarily onsite with occasional work from home as needed. The position requires the person to be in the shop / office to work on the trucks, so WFH is limited due to the nature of our work. Percentage...

    Remote Financial Advising Expert - AI Trainer ($50-$60 / hour)

    Data Annotation • Oshkosh, Wisconsin
    Remote
    Full-time
    We are looking for a finance professional to join our team to train AI models. You will measure the progress of these AI chatbots, evaluate their logic, and solve problems to improve the quality of ...

    Life Enrichment Assistant I-Behavioral Health

    Dodge County • Juneau, WI, United States
    Part-time
    Friday before and Monday after off. Dodge County offers a generous benefits package including: Paid Time Off (PTO) - available for use after 30 days of employment. Health, Dental, Vision Insurance. He...