Talent.com
Data Engineer, Scientific Data Ingestion

Mithrl • San Francisco, CA, US
Full-time

Job Description

ABOUT MITHRL

We envision a world where novel drugs and therapies reach patients in months, not years, accelerating breakthroughs that save lives.

Mithrl is building the world’s first commercially available AI Co-Scientist—a discovery engine that empowers life science teams to go from messy biological data to novel insights in minutes. Scientists ask questions in natural language, and Mithrl answers with real analysis, novel targets, and patent-ready reports.

Our traction speaks for itself:

12X year-over-year revenue growth

Trusted by leading biotechs and big pharma across three continents

Driving real breakthroughs from target discovery to patient outcomes

WHAT YOU WILL DO

Build and own an AI-powered ingestion & normalization pipeline to import data from a wide variety of sources — unprocessed Excel / CSV uploads, lab and instrument exports, as well as processed data from internal pipelines.

Develop robust schema mapping, coercion, and conversion logic (think: unit normalization, metadata standardization, variable-name harmonization, vendor-instrument quirks, plate-reader formats, reference-genome or annotation updates, batch-effect correction, etc.).
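
As a rough illustration of what variable-name harmonization and unit normalization can look like, here is a minimal Python sketch. It is not Mithrl's implementation: the function names, the choice of micromolar as the canonical unit, and the conversion table are all assumptions made for the example.

```python
import re

# Hypothetical conversion factors to an assumed canonical unit (micromolar).
# Keys are case-sensitive because "mM" and "M" are different units.
TO_MICROMOLAR = {"nM": 1e-3, "uM": 1.0, "mM": 1e3, "M": 1e6}


def harmonize_header(raw: str) -> str:
    """Lower-case a free-text header and collapse other characters to underscores."""
    cleaned = re.sub(r"[^0-9a-z]+", "_", raw.strip().lower())
    return cleaned.strip("_")


def to_micromolar(value: float, unit: str) -> float:
    """Convert a concentration to micromolar; unknown units raise ValueError."""
    # Treat the micro sign and the Greek letter mu as the ASCII "u".
    key = unit.strip().replace("\u00b5", "u").replace("\u03bc", "u")
    if key not in TO_MICROMOLAR:
        raise ValueError(f"unrecognized concentration unit: {unit!r}")
    return value * TO_MICROMOLAR[key]


if __name__ == "__main__":
    print(harmonize_header("  Well-Position (Plate 1) "))
    print(to_micromolar(2, "mM"))
```

Raising on an unrecognized unit, rather than guessing, keeps ambiguous values out of the canonical dataset and leaves the decision to a later review step.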

Use LLM-driven and classical data-engineering tools to structure “semi-structured” or messy tabular data — extracting metadata, inferring column roles / types, cleaning free-text headers, fixing inconsistencies, and preparing final clean datasets.

Ensure all transformations that should only happen once (normalization, coercion, batch-correction) execute during ingestion — so downstream analytics / the AI “Co-Scientist” always works with clean, canonical data.

Build validation, verification, and quality-control layers to catch ambiguous, inconsistent, or corrupt data before it enters the platform.
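
A quality-control layer of this kind might, under simple assumptions, look like the sketch below. The `Issue` record, the `validate_rows` function, and the column names in the usage example are hypothetical and chosen only for illustration; a real pipeline would likely check far more than missing and non-numeric values.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    row: int       # zero-based index of the offending row
    column: str    # column in which the problem was found
    message: str   # human-readable description


def _is_blank(value) -> bool:
    """True for None and for values that are empty after trimming whitespace."""
    return value is None or str(value).strip() == ""


def validate_rows(rows, required, numeric):
    """Return a list of Issues; an empty list means the batch passed."""
    issues = []
    for i, row in enumerate(rows):
        for col in required:
            if _is_blank(row.get(col)):
                issues.append(Issue(i, col, "missing required value"))
        for col in numeric:
            value = row.get(col)
            if _is_blank(value):
                continue  # blanks are handled by the required check
            try:
                float(str(value).strip())
            except ValueError:
                issues.append(Issue(i, col, f"not numeric: {value!r}"))
    return issues


if __name__ == "__main__":
    batch = [
        {"sample_id": "S1", "od600": "0.42"},
        {"sample_id": "", "od600": "n/a"},
    ]
    for issue in validate_rows(batch, required=["sample_id"], numeric=["od600"]):
        print(issue)
```

Collecting every issue instead of stopping at the first one lets the uploader see all problems in a file at once.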

Collaborate with product teams, data science / bioinformatics colleagues, and infrastructure engineers to define and enforce data standards, and ensure pipeline outputs integrate cleanly into downstream analysis and storage systems.

WHAT YOU BRING

Must-have

5+ years of experience in data engineering / data wrangling with real-world tabular or semi-structured data.

Strong fluency in Python and data processing tools (Pandas, Polars, PyArrow, or similar).

Extensive experience with messy Excel / CSV / spreadsheet-style data (inconsistent headers, multiple sheets, mixed formats, free-text fields) and with normalizing it into clean structures.

Comfort designing and maintaining robust ETL / ELT pipelines, ideally for scientific or lab-derived data.

Ability to combine classical data engineering with LLM-powered data normalization / metadata extraction / cleaning.

Strong desire and ability to own the ingestion & normalization layer end-to-end — from raw upload → final clean dataset — with an eye for maintainability, reproducibility, and scalability.

Good communication skills; able to collaborate across teams (product, bioinformatics, infra) and translate real-world messy data problems into robust engineering solutions.

Nice-to-have

Familiarity with scientific data types and “modalities” (e.g. plate-readers, genomics metadata, time-series, batch-info, instrumentation outputs).

Experience with workflow orchestration tools (e.g. Nextflow, Prefect, Airflow, Dagster), or building pipeline abstractions.

Experience with cloud infrastructure and data storage (AWS S3, data lakes / warehouses, database schemas) to support multi-tenant ingestion.

Past exposure to LLM-based data transformation or cleansing agents — building or integrating tools that clean or structure messy data automatically.

Any background in computational biology / lab-data / bioinformatics is a bonus — though not required.

WHAT YOU WILL LOVE AT MITHRL

Mission-driven impact: you’ll be the gatekeeper of data quality, ensuring that all scientific data entering Mithrl becomes clean, consistent, and analysis-ready. You’ll have outsized influence over the reliability and trustworthiness of our entire data + AI stack.

High ownership & autonomy: this role is yours to shape. You decide how ingestion works, define the standards, and build the pipelines. You’ll work closely with our product, data science, and infrastructure teams, shaping how data is ingested, stored, and exposed to end users or AI agents.

Team: Join a tight-knit, talent-dense team of engineers, scientists, and builders.

Culture: We value consistency, clarity, and hard work. We solve hard problems through focused daily execution.

Speed: We ship fast (2x / week) and improve continuously based on real user feedback.

Location: Beautiful SF office with a high-energy, in-person culture.

Benefits: Comprehensive PPO health coverage through Anthem (medical, dental, and vision), plus a 401(k) with top-tier plans.

We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you're interested in this work. We think AI systems like the ones we're building have enormous social and ethical implications. We think this makes representation even more important, and we strive to include a range of diverse perspectives on our team.

Compensation Range: $150K - $200K
