The Role
This role is responsible for ensuring the reliability, accuracy, and safety of our Veeva AI Agents through rigorous evaluation and systematic validation methodologies. We're looking for experienced candidates with:
1. A meticulous, critical, and curious mindset with a dedication to product quality in a rapidly evolving technological domain
2. Exceptional analytical and systematic problem-solving capabilities
3. Excellent ability to communicate technical findings to both engineering and product management audiences
4. Ability to learn application areas quickly
Thrive in our Work Anywhere environment: We support your flexibility to work remotely or in the office within Canada or the US, ensuring seamless collaboration within your product team's time zone.
Join us and be part of a mission-driven organization transforming the life sciences industry.
What You'll Do
- Evaluation Strategy & Planning: Define and establish comprehensive evaluation strategies for new AI Agents. Prioritize the integrity and coverage of test data sets to reflect real-world usage and potential failure modes.
- LLM Output Integrity Assessment: Programmatically and manually evaluate the quality of LLM-generated content against predefined metrics (e.g., factual accuracy, contextual relevance, coherence, and safety standards).
- Creating High-Fidelity Datasets: Design, curate, and generate diverse, high-quality test data sets, including challenging prompts and scenarios. Evaluate LLM outputs to proactively identify system biases, unsafe content, hallucinations, and critical edge cases.
- Automation of Evaluation Pipelines: Develop, implement, and maintain scalable automated evaluations to ensure efficient, continuous validation of agent behavior and prevent regressions with new features and model updates.
- Root Cause Analysis: Understand model behaviors and assist in the trace and root-cause analysis of identified defects or performance degradations.
- Reporting & Performance Metrics: Clearly document, track, and communicate performance metrics, validation results, and bug status to the broader development and product teams.
Requirements
- Data Integrity & Validation: A strong, specialized understanding of data quality principles, including methods for validating datasets against bias, integrity concerns, and quality standards. Ability to craft diverse and adversarial test data to uncover AI edge cases.
- Prompt Engineering & Model Expertise: Demonstrated skill in advanced prompt engineering techniques to create evaluation scenarios that test the AI's reasoning, action planning, and adherence to system instructions. Deep knowledge of common LLM failure modes (hallucination, incoherence, jailbreaking).
- Automated Evaluation Implementation: Proficiency in designing and deploying automated evaluation pipelines to assess complex, agentic AI behaviors. Familiarity with quality metrics such as task success rate, semantic similarity, and sentiment analysis for output measurement.
- Debugging Agentic Systems: Must be comfortable with the specific challenges of debugging agentic systems, including tracing and interpreting an agent's internal reasoning, tool use, and action sequence to pinpoint failure points.
- Programming & Frameworks: Proficiency in Python for developing custom evaluation frameworks, writing scripts, and integrating pipelines with CI/CD systems. Familiarity with standard test automation tools (e.g., Pytest, modern web automation tools).
- Bachelor's degree in Data Science, Machine Learning, Computer Science, or a related field, with experience in Gen AI/LLMs.
- High work ethic. Veeva is a hard-working company.
- High integrity and honesty. Veeva is a PBC and a “do the right thing” company. We expect that from all employees.
- Applicants must have the unrestricted right to work in the United States or Canada. Veeva will not provide sponsorship at this time.
Perks & Benefits
- Medical, dental, vision, and basic life insurance
- Flexible PTO and company paid holidays
- Retirement programs
- 1% charitable giving program
Compensation
Base pay: $85,000 - $225,000
The salary range listed here has been provided to comply with local regulations and represents a potential base salary range for this role. Please note that actual salaries may fall within, above, or below this range, depending on experience and location. We look at compensation for each individual and base our offer on your unique qualifications, experience, and expected contributions. This position may also be eligible for other types of compensation in addition to base salary, such as variable bonus and/or stock bonus.