Talent.com
Data Engineer, Scientific Data Ingestion
Mithrl Inc. • San Francisco, CA, United States
Job type: Full-time

About Mithrl

We envision a world where novel drugs and therapies reach patients in months, not years, accelerating breakthroughs that save lives.

Mithrl is building the world's first commercially available AI Co-Scientist, a discovery engine that empowers life science teams to go from messy biological data to novel insights in minutes. Scientists ask questions in natural language, and Mithrl answers with real analysis, novel targets, and patent-ready reports.

Our traction speaks for itself:

12X year-over-year revenue growth

Trusted by leading biotechs and big pharma across three continents

Driving real breakthroughs from target discovery to patient outcomes

What you will do

Build and own an AI-powered ingestion and normalization pipeline to import data from a wide variety of sources: unprocessed Excel/CSV uploads, lab and instrument exports, and processed data from internal pipelines.

Develop robust schema mapping, coercion, and conversion logic (think: unit normalization, metadata standardization, variable-name harmonization, vendor-instrument quirks, plate-reader formats, reference-genome or annotation updates, batch-effect correction, etc.).

Use LLM-driven and classical data-engineering tools to structure semi-structured or messy tabular data: extracting metadata, inferring column roles/types, cleaning free-text headers, fixing inconsistencies, and preparing final clean datasets.

Ensure all transformations that should happen only once (normalization, coercion, batch correction) execute during ingestion, so that downstream analytics and the AI Co-Scientist always work with clean, canonical data.

Build validation, verification, and quality-control layers to catch ambiguous, inconsistent, or corrupt data before it enters the platform.

Collaborate with product teams, data science and bioinformatics colleagues, and infrastructure engineers to define and enforce data standards and to ensure pipeline outputs integrate cleanly into downstream analysis and storage systems.

What you bring

Must-have

5+ years of experience in data engineering or data wrangling with real-world tabular or semi-structured data.

Strong fluency in Python and data processing tools (Pandas, Polars, PyArrow, or similar).

Extensive experience dealing with messy Excel/CSV/spreadsheet-style data (inconsistent headers, multiple sheets, mixed formats, free-text fields) and normalizing it into clean structures.

Comfort designing and maintaining robust ETL/ELT pipelines, ideally for scientific or lab-derived data.

Ability to combine classical data engineering with LLM-powered data normalization, metadata extraction, and cleaning.

Strong desire and ability to own the ingestion and normalization layer end-to-end, from raw upload to final clean dataset, with an eye for maintainability, reproducibility, and scalability.

Good communication skills; able to collaborate across teams (product, bioinformatics, infra) and translate real-world messy data problems into robust engineering solutions.

Nice-to-have

Familiarity with scientific data types and modalities (e.g. plate readers, genomics metadata, time series, batch information, instrumentation outputs).

Experience with workflow orchestration tools (e.g. Nextflow, Prefect, Airflow, Dagster), or building pipeline abstractions.

Experience with cloud infrastructure and data storage (AWS S3, data lakes/warehouses, database schemas) to support multi-tenant ingestion.

Past exposure to LLM-based data transformation or cleansing agents: building or integrating tools that clean or structure messy data automatically.

Any background in computational biology, lab data, or bioinformatics is a bonus, though not required.

What you will love at Mithrl

Mission-driven impact: You'll be the gatekeeper of data quality, ensuring that all scientific data entering Mithrl becomes clean, consistent, and analysis-ready. You'll have outsized influence over the reliability and trustworthiness of our entire data + AI stack.

High ownership & autonomy: This role is yours to shape. You decide how ingestion works, define the standards, and build the pipelines. You'll work closely with our product, data science, and infrastructure teams, shaping how data is ingested, stored, and exposed to end users or AI agents.

Team: Join a tight-knit, talent-dense team of engineers, scientists, and builders

Culture: We value consistency, clarity, and hard work. We solve hard problems through focused daily execution

Speed: We ship fast (2x/week) and improve continuously based on real user feedback

Location: Beautiful SF office with a high-energy, in-person culture

Benefits: Comprehensive PPO health coverage through Anthem (medical, dental, and vision) + 401(k) with top-tier plans

We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you're interested in this work. We think AI systems like the ones we're building have enormous social and ethical implications. We think this makes representation even more important, and we strive to include a range of diverse perspectives on our team.

