Talent.com
Data Engineer, Scientific Data Ingestion

Mithrl • San Francisco, CA, US
Full-time

Job Description

ABOUT MITHRL

We envision a world where novel drugs and therapies reach patients in months, not years, accelerating breakthroughs that save lives.

Mithrl is building the world’s first commercially available AI Co-Scientist—a discovery engine that empowers life science teams to go from messy biological data to novel insights in minutes. Scientists ask questions in natural language, and Mithrl answers with real analysis, novel targets, and patent-ready reports.

Our traction speaks for itself:

12X year-over-year revenue growth

Trusted by leading biotechs and big pharma across three continents

Driving real breakthroughs from target discovery to patient outcomes

WHAT YOU WILL DO

Build and own an AI-powered ingestion & normalization pipeline to import data from a wide variety of sources — unprocessed Excel / CSV uploads, lab and instrument exports, as well as processed data from internal pipelines.

Develop robust schema mapping, coercion, and conversion logic (think: unit normalization, metadata standardization, variable-name harmonization, vendor-instrument quirks, plate-reader formats, reference-genome or annotation updates, batch-effect correction, etc.).

Use LLM-driven and classical data-engineering tools to structure “semi-structured” or messy tabular data — extracting metadata, inferring column roles / types, cleaning free-text headers, fixing inconsistencies, and preparing final clean datasets.

Ensure all transformations that should happen only once (normalization, coercion, batch correction) execute during ingestion, so that downstream analytics and the AI “Co-Scientist” always work with clean, canonical data.

Build validation, verification, and quality-control layers to catch ambiguous, inconsistent, or corrupt data before it enters the platform.

Collaborate with product teams, data science / bioinformatics colleagues, and infrastructure engineers to define and enforce data standards, and ensure pipeline outputs integrate cleanly into downstream analysis and storage systems.

WHAT YOU BRING

Must-have

5+ years of experience in data engineering / data wrangling with real-world tabular or semi-structured data.

Strong fluency in Python and data processing tools (Pandas, Polars, PyArrow, or similar).

Extensive experience with messy Excel / CSV / spreadsheet-style data (inconsistent headers, multiple sheets, mixed formats, free-text fields) and with normalizing it into clean structures.

Comfort designing and maintaining robust ETL / ELT pipelines, ideally for scientific or lab-derived data.

Ability to combine classical data engineering with LLM-powered data normalization / metadata extraction / cleaning.

Strong desire and ability to own the ingestion & normalization layer end-to-end, from raw upload to final clean dataset, with an eye for maintainability, reproducibility, and scalability.

Good communication skills; able to collaborate across teams (product, bioinformatics, infra) and translate real-world messy data problems into robust engineering solutions.

Nice-to-have

Familiarity with scientific data types and “modalities” (e.g. plate-readers, genomics metadata, time-series, batch-info, instrumentation outputs).

Experience with workflow orchestration tools (e.g. Nextflow, Prefect, Airflow, Dagster), or building pipeline abstractions.

Experience with cloud infrastructure and data storage (AWS S3, data lakes / warehouses, database schemas) to support multi-tenant ingestion.

Past exposure to LLM-based data transformation or cleansing agents — building or integrating tools that clean or structure messy data automatically.

Any background in computational biology / lab-data / bioinformatics is a bonus — though not required.

WHAT YOU WILL LOVE AT MITHRL

Mission-driven impact: you’ll be the gatekeeper of data quality, ensuring that all scientific data entering Mithrl becomes clean, consistent, and analysis-ready. You’ll have outsized influence over the reliability and trustworthiness of our entire data + AI stack.

High ownership & autonomy: this role is yours to shape. You decide how ingestion works, define the standards, and build the pipelines. You’ll work closely with our product, data science, and infrastructure teams, shaping how data is ingested, stored, and exposed to end users or AI agents.

Team: Join a tight-knit, talent-dense team of engineers, scientists, and builders.

Culture: We value consistency, clarity, and hard work. We solve hard problems through focused daily execution.

Speed: We ship fast (2x / week) and improve continuously based on real user feedback.

Location: Beautiful SF office with a high-energy, in-person culture.

Benefits: Comprehensive PPO health coverage through Anthem (medical, dental, and vision) + 401(k) with top-tier plans.

We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you're interested in this work. We think AI systems like the ones we're building have enormous social and ethical implications. We think this makes representation even more important, and we strive to include a range of diverse perspectives on our team.

Compensation Range: $150K - $200K
