Talent.com
Senior Data Engineer II - Electronic Health Records (EHR)
Formation Bio • New York City, New York, United States
Full-time

About Formation Bio


Formation Bio is a tech- and AI-driven pharma company differentiated by radically more efficient drug development.

Advancements in AI and drug discovery are creating more candidate drugs than the industry can progress, because clinical trials are costly and slow. Recognizing that this development bottleneck may ultimately limit the number of new medicines that can reach patients, Formation Bio, founded in 2016 as TrialSpark Inc., has built technology platforms, processes, and capabilities to accelerate all aspects of drug development and clinical trials. Formation Bio partners with pharma companies, research organizations, and biotechs, and acquires or in-licenses drugs from them, to develop programs past clinical proof of concept and beyond, ultimately helping to bring new medicines to patients. The company is backed by investors across pharma and tech, including a16z, Sequoia, Sanofi, Thrive Capital, Sam Altman, John Doerr, Spark Capital, SV Angel Growth, and others.

You can read more at the following links:

  • Our Vision for AI in Pharma
  • Our Current Drug Portfolio
  • Our Technology & Platform

At Formation Bio, our values are the driving force behind our mission to revolutionize the pharma industry. Every team and individual at the company shares these same values, and every team and individual plays a key part in our mission to bring new treatments to patients faster and more efficiently.

About the Position

We’re looking for a Senior Data Engineer to join the Data Platform team at Formation Bio to help transform Electronic Health Records (EHR) data into structured, analytics-ready assets. In this role, you’ll partner closely with our Data Science team to model, transform, and refine data for operational and scientific use cases.

This position sits at the intersection of healthcare data engineering, modern data platform infrastructure, and generative AI. While your initial focus will be on building high-quality EHR models for the Formation Bio platform, you’ll also contribute to our broader data architecture by using tools like Snowflake, Dagster, and dbt to enable scalable, governed, and highly reliable pipelines.

The ideal candidate combines deep data engineering experience with GenAI fluency (e.g., LLM-based entity extraction, summarization, classification) and strong technical expertise in modern data tooling. You’ll play a key role in shaping how healthcare data becomes discoverable, structured, and impactful across the organization.

Responsibilities

  • Model and transform raw EHR data into clean, canonical, and analytics-ready datasets using SQL, Python, and clinical standards like FHIR, HL7, or OMOP.
  • Build and manage scalable data pipelines using Dagster for orchestration, dbt for transformation, and Snowflake as the primary compute and storage engine.
  • Collaborate with Data Science and product stakeholders to co-develop cohort logic, derived features, and structured outputs that meet real-world scientific needs.
  • Apply Generative AI techniques within transformation layers—using LLMs for named entity recognition, document summarization, classification, and schema alignment.
  • Write robust, testable, and version-controlled code that adheres to CI/CD and data governance best practices.
  • Implement data validation and observability frameworks to ensure quality, trust, and reproducibility of datasets.
  • Document transformation logic, assumptions, and data lineage in collaboration with metadata and cataloging systems.
  • Contribute to the evolution of the Data Platform by helping define standards, patterns, and best practices around GenAI and platform-scale data engineering.

About You

  • You have 5+ years of experience in data engineering, ideally with at least 2 years working in healthcare or life sciences, including direct exposure to EHR datasets.
  • You have experience with ontologies and biomedical schemas (e.g., UMLS, LOINC, ICD-9/ICD-10, MeSH).
  • You understand the data modalities found in EHR datasets, including billing claims, lab results, visit notes, and images.
  • You have experience in biomedical feature engineering (e.g., variable transformations and derivations).
  • You’re fluent in SQL and Python, and you’ve built and maintained production-grade pipelines that support analytics, science, or operational workflows.
  • You have hands-on expertise with modern data infrastructure.
  • You’re experienced in applying GenAI techniques within pipelines, including prompt engineering, LLM-based entity extraction, and classification/summarization workflows.
  • You value clarity, documentation, and structured thinking—especially when working with complex data like healthcare records.
  • You have a growth mindset and are excited to build bridges between isolated data environments and governed, shared models that power scientific innovation.
  • Bonus: You’ve worked in regulated or privacy-sensitive data environments, and you’re familiar with governance models for PHI or other sensitive data.

Formation Bio is prioritizing hiring in key hubs, primarily the New York City and Boston metro areas, with additional growth in the Research Triangle (NC) and San Francisco Bay Area. Please only apply if you reside in these locations or are willing to relocate.


Compensation:


The target salary range for this role is: $230,000 - $280,000.

Salary ranges are informed by a number of factors including geographic location. The range provided includes base salary only. In addition to base salary, we offer equity, comprehensive benefits, generous perks, hybrid flexibility, and more. If this range doesn't match your expectations, please still apply because we may have something else for you.

You will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age, or veteran status.


Similar jobs
Senior Databricks Data Engineer
Axiom Path • City of Jersey City, New Jersey, US
Full-time
Be Part Of A High-Performing Global Financial Technology Team: This opportunity sits within the technology division of a globally recognized financial institution kn...

Senior Data Engineer – Vice President
Morgan Stanley • New York, NY, United States
Full-time
In the Technology division, we leverage innovation to build the connections and capabilities that power our Firm, enabling our clients and colleagues to redefine markets and shape the future of our...

Staff Data Platform Engineer - Scale Databricks & AWS
Gemini • New York, NY, United States
Full-time
A leading crypto and Web3 platform in San Francisco is seeking a Staff Data Platform Engineer to own and evolve the data warehouse infrastructure. This role requires a minimum of 8 years of experien...

Research and Development Engineer I/II
Cresilon, Inc. • New York, New York, US
Full-time
Cresilon® is a Brooklyn-based biotechnology company that develops, manufactures, and markets hemostatic medical devices utilizing the company's proprietary hydrogel...

Senior AI Solutions Engineer
IPG DXTRA • New York, NY, United States
Full-time
AI core team and play a crucial role in this transformation. Reporting to our VP, Global AI Technology Lead, you'll drive the practical implementation of AI solutions that revolutionize how we work ...

Manager II, Engineering - AI Platform Training, Serving and Storage (NorAm)
Datadog • New York City, NY, United States
Full-time
The AI platform is responsible for all AI infrastructure across Datadog. Our mission is to provide tools and platforms that enable data scientists and engineers to conduct large-scale training and i...

Senior Data Engineer: Scale Data for Accessible Mental Health
Headway • New York, NY, United States
Full-time
An innovative healthcare technology company is seeking a Senior Data Engineer to enhance data accessibility in mental healthcare. The role involves designing scalable data platforms, ensuring data g...

Remote Senior Azure Data Engineer: Lead Data Integration
Cognizant • Hoboken, NJ, United States
Remote • Full-time
A leading technology company is seeking a Sr. Azure Data Engineer to drive data integration and migration initiatives across enterprise platforms. This remote position requires expertise in ETL proce...

Security Engineer III ~ Data Loss Prevention
Capital Group • New York, New York, United States
Full-time
We want you to feel comfortable doing great work and bringing your best, authentic self to everything you do. We value your talents, traditions, and uniqueness, and we're committed to fostering a str...

Software Engineer II - Biostatistics Service
Memorial Sloan • New York, New York, United States
Full-time
The people of Memorial Sloan Kettering Cancer Center (MSK) are united by a singular mission: ending cancer for life. Our specialized care teams provide personalized, compassionate, expert care to pa...

Senior Data Analytics Engineer
Virtual Vocations Inc • Jackson Heights, NY, United States
Full-time
A company is looking for a Senior Data Analytics Engineer to join their Data Solutions team. Key Responsibilities: Lead data analytics projects to build innovative solutions while ensuring adherence ...

Senior GenAI Engineer - Scalable Data Platform
Scale • New York, NY, United States
Full-time
A leading AI company based in New York is seeking a seasoned Software Engineer to design and build scalable, robust systems across the stack. The ideal candidate will have 5+ years of software engin...

Senior Data Engineer
(EDO) Entertainment Data Oracle, Inc. • New York, NY, United States
Full-time
EDO is the TV outcomes company. Our leading measurement platform connects convergent TV airings to the ad-driven consumer behaviors most predictive of future sales. EDO empowers the advertising indus...

Senior Research Engineer
PATHOS • New York, NY, United States
Full-time
Drug development shouldn't be guesswork, not when patients are waiting. Pathos is building a next-generation biotech with AI at the core. Not as a feature, but as the operating system for how medicin...

Data Engineer: ETL & R&D Data Ingestion (Equity)
Uncountable Inc. • New York, NY, United States
Full-time
A tech startup in data engineering seeks recent graduates in New York for a Data Engineer role. Key responsibilities include structuring and ingesting datasets, writing Python scripts for data manip...

Senior Engineer
Tata Consultancy Services • New York, NY, United States
Full-time
Must Have Technical/Functional Skills. Hands-on experience in building ETL using Databricks SaaS infrastructure. Experience in developing data pipeline solutions to ingest and exploit new and existin...

Senior Data Engineer - Vice President
PowerToFly • New York, NY, United States
Full-time
In the Technology division, we leverage innovation to build the connections and capabilities that power our Firm, enabling our clients and colleagues to redefine markets and shape the future of our...

Senior Staff Engineer, Data Management
Regeneron Pharmaceuticals • Tarrytown, NY, United States
Full-time
The Data Enablement and Analytics (DEA) team, within the PAPD (Product, Analytics and Process Development) organization, is a multi-functional team that drives PAPD's digitalization efforts by maki...