Talent.com
Freelance Agent Evaluation Engineer
Mindrift • New York, NY, US

Please submit your CV in English and indicate your level of English proficiency.

Mindrift connects specialists with project-based AI opportunities for leading tech companies, focused on testing, evaluating, and improving AI systems. Participation is project-based, not permanent employment.

What this opportunity involves

While each project involves unique tasks, contributors may:

  • Create structured test cases that simulate complex human workflows
  • Define gold-standard behavior and scoring logic to evaluate agent actions
  • Analyze agent logs, failure modes, and decision paths
  • Work with code repositories and test frameworks to validate your scenarios
  • Iterate on prompts, instructions, and test cases to improve clarity and difficulty
  • Ensure that scenarios are production-ready, easy to run, and reusable

What we look for

This opportunity is a good fit for software engineers open to part-time, non-permanent projects. Ideally, contributors will have:

  • 3+ years of software development experience with a strong Python focus
  • Experience with Git and code repositories
  • Comfort with structured formats such as JSON / YAML for scenario description
  • Understanding of core LLM limitations (hallucinations, bias, context limits) and how these affect evaluation design
  • Familiarity with Docker
  • English proficiency - B2

How it works

Apply → Pass qualification(s) → Join a project → Complete tasks → Get paid

Project time expectations

Tasks for this project are estimated to take 6-10 hours to complete, depending on complexity. This is an estimate and not a schedule requirement; you choose when and how to work. Tasks must be submitted by the deadline and meet the listed acceptance criteria to be accepted.

Payment

  • Paid contributions, with rates up to $80 / hour
  • Fixed project rate or individual rates, depending on the project
  • Some projects include incentive payments

Note: Rates vary based on expertise, skills assessment, location, project needs, and other factors. Higher rates may be offered to highly specialized experts. Lower rates may apply during onboarding or non-core project phases. Payment details are shared per project.