Talent.com
AI Evaluation Engineer

Apex Systems • San Francisco, CA

Job #: 3018441

Job Description:

Remote – Working PST schedule

Contract: 6 months + extension opportunity

We are looking for engineers to join our Engineering Team on a 6-month contract (with the possibility of extension). The primary work is split between porting external benchmarks to run on internal infrastructure and developing novel model evaluations. You should be comfortable with fast execution, rapid learning, and engineering work that calls for clear documentation and sharp debugging.

Responsibilities

  • Porting new external benchmarks to the team's internal infrastructure so they can be run as part of the evaluation stack for new model releases.
  • Keeping up to date with new evals and benchmarks, and pitching the team on porting newly released evals.
  • Performing rigorous quality control for new and existing evals.
  • Implementing novel evaluations to measure dangerous capabilities and safety of frontier models.

Requirements

  • Strong Python coding experience and the ability to write clean code fast.
  • Experience working in a small team on a large, shared codebase.
  • Experience designing and building model evaluations.
  • Detail-oriented, with tenacity to dig through transcripts to identify and resolve issues.
  • Ability to quickly and independently learn new skills and frameworks.
  • Team player with strong communication skills.

In addition, it would be advantageous if you have:

  • Demonstrated research experience in the evals space.
  • Experience with agentic evaluations and working with Docker.

Apex Benefits Overview

Apex offers a range of supplemental benefits, including medical, dental, vision, life, disability, and other insurance plans that offer an optional layer of financial protection. We offer an ESPP (employee stock purchase program) and a 401(k) program that typically allows you to contribute within 30 days of starting, with a company match after 12 months of tenure. Apex also offers an HSA (Health Savings Account on the HDHP plan), a SupportLinc Employee Assistance Program (EAP) with up to 8 free counseling sessions, a corporate discount savings program, and other discounts. In terms of professional development, Apex hosts an on-demand training program, provides access to certification prep and a library of technical and leadership courses, books, and seminars once you have 6+ months of tenure, and offers certification discounts and other perks with associations that include CompTIA and IIBA. Apex has a dedicated customer service team for our Consultants that can address questions around benefits and other resources, as well as a certified Career Coach. You can also find a full list of our benefits, programs, support teams, and resources in our ‘Welcome Packet’, which an Apex team member can provide.

    EEO Employer

    Apex Systems is an equal opportunity employer. We do not discriminate or allow discrimination on the basis of race, color, religion, creed, sex (including pregnancy, childbirth, breastfeeding, or related medical conditions), age, sexual orientation, gender identity, national origin, ancestry, citizenship, genetic information, registered domestic partner status, marital status, disability, status as a crime victim, protected veteran status, political affiliation, union membership, or any other characteristic protected by law. Apex will consider qualified applicants with criminal histories in a manner consistent with the requirements of applicable law. If you have visited our website in search of information on employment opportunities or to apply for a position, and you require an accommodation in using our website for a search or application, please contact our Employee Services Department at  or 844-463-6178.

    Apex Systems is a world-class IT services company that serves thousands of clients across the globe. When you join Apex, you become part of a team that values innovation, collaboration, and continuous learning. We offer quality career resources, training, certifications, development opportunities, and a comprehensive benefits package. Our commitment to excellence is reflected in many awards, including ClearlyRated's Best of Staffing® in Talent Satisfaction in the United States and Great Place to Work® in the United Kingdom and Mexico.

    Apex uses a virtual recruiter as part of the application process.

