Design and deploy scalable ETL/ELT pipelines to ingest, transform, and load clinical data from diverse sources (EMRs, labs, IoT devices, data lakes, FHIR/HL7 APIs) into Azure and Snowflake.
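As a rough illustration of the ingestion side of this responsibility, the sketch below pulls Patient resources from a FHIR R4 API and flattens them into a tabular batch that could then be staged for loading into Snowflake or Azure storage. The endpoint URL, field selection, and function name are illustrative assumptions, not details from this posting.

```python
# Minimal ingestion sketch (assumed endpoint and fields): fetch FHIR Patient
# resources and flatten them into a DataFrame ready for staging and loading.
import requests
import pandas as pd

FHIR_BASE = "https://fhir.example-hospital.org/r4"  # hypothetical FHIR server

def fetch_patients(page_size: int = 100) -> pd.DataFrame:
    resp = requests.get(f"{FHIR_BASE}/Patient", params={"_count": page_size}, timeout=30)
    resp.raise_for_status()
    bundle = resp.json()
    rows = []
    for entry in bundle.get("entry", []):
        patient = entry["resource"]
        rows.append({
            "patient_id": patient.get("id"),
            "gender": patient.get("gender"),
            "birth_date": patient.get("birthDate"),
        })
    return pd.DataFrame(rows)

if __name__ == "__main__":
    batch = fetch_patients()
    # In a real pipeline this batch would be written to a cloud stage and loaded
    # into Snowflake with COPY INTO or Snowpipe.
    print(batch.head())
```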
Architect and optimize Microsoft Azure and Snowflake environments for clinical data storage, ETL/ELT workloads, machine learning operations (MLOps), performance tuning, cost management, and secure data sharing.
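Two of the environment tasks named above, cost management and secure data sharing, often reduce to a handful of administrative SQL statements. The sketch below runs them through the Snowflake Python connector; the account, warehouse, database, and share names are placeholders.

```python
# Hedged sketch: warehouse cost controls and a secure data share, driven from
# Python. All object names and credentials are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="myorg-myaccount",     # placeholder account identifier
    user="CLINICAL_ETL_SVC",
    password="...",                # prefer key-pair auth / a secrets manager in practice
    role="SYSADMIN",
)
cur = conn.cursor()

# Cost management: cap warehouse size and suspend quickly when idle.
cur.execute("ALTER WAREHOUSE ANALYTICS_WH SET WAREHOUSE_SIZE = 'SMALL' AUTO_SUSPEND = 60")

# Secure data sharing: expose a curated, de-identified table to a partner account.
cur.execute("CREATE SHARE IF NOT EXISTS DEID_CLINICAL_SHARE")
cur.execute("GRANT USAGE ON DATABASE CLINICAL_DB TO SHARE DEID_CLINICAL_SHARE")
cur.execute("GRANT USAGE ON SCHEMA CLINICAL_DB.DEID TO SHARE DEID_CLINICAL_SHARE")
cur.execute("GRANT SELECT ON TABLE CLINICAL_DB.DEID.PATIENT TO SHARE DEID_CLINICAL_SHARE")

cur.close()
conn.close()
```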
Ensure compliance with healthcare regulations (HIPAA, GDPR) by implementing data anonymization, encryption, and audit trails.
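One common anonymization tactic behind this bullet is salted hashing of direct identifiers before data leaves the ingestion layer. The sketch below shows that idea on a pandas frame; the column names (mrn, name, address) and the salt handling are assumptions for illustration only.

```python
# De-identification sketch (assumed columns): replace the MRN with a salted
# one-way hash and drop other direct identifiers.
import hashlib
import os
import pandas as pd

SALT = os.environ.get("PSEUDONYM_SALT", "change-me")  # keep real salts in a vault

def pseudonymize(value: str) -> str:
    """Return a stable one-way pseudonym for an identifier such as an MRN."""
    return hashlib.sha256((SALT + value).encode("utf-8")).hexdigest()

def deidentify(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    out["patient_key"] = out["mrn"].astype(str).map(pseudonymize)
    # Drop direct identifiers; keep only fields needed downstream.
    return out.drop(columns=["mrn", "name", "address"], errors="ignore")
```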
Collaborate with clinical stakeholders to translate business requirements into technical solutions for analytics and reporting.
Develop and maintain data governance frameworks, including metadata management, data lineage, and quality checks (e.g., validation of lab results and patient demographics).
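The quality checks mentioned in this bullet might look like the small pandas sketch below, which counts obviously implausible lab results and demographic values. Column names and thresholds are assumptions; real ranges would come from clinical reference data.

```python
# Illustrative quality checks for lab results and demographics (assumed schema).
import pandas as pd

def check_lab_results(labs: pd.DataFrame) -> dict:
    return {
        "missing_patient_id": int(labs["patient_id"].isna().sum()),
        "negative_results": int((labs["result_value"] < 0).sum()),
        "implausible_potassium": int(
            ((labs["test_code"] == "K") & ~labs["result_value"].between(1.5, 8.0)).sum()
        ),
    }

def check_demographics(patients: pd.DataFrame) -> dict:
    return {
        "unknown_gender_codes": int(
            (~patients["gender"].isin(["male", "female", "other", "unknown"])).sum()
        ),
        "future_birth_dates": int(
            (pd.to_datetime(patients["birth_date"]) > pd.Timestamp.today()).sum()
        ),
    }
```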
Automate data pipelines using orchestration tools (e.g., Apache Airflow, Prefect) and integrate real-time streaming solutions (e.g., Kafka) where applicable.
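As one possible shape of that orchestration, the sketch below wires an extract step and a load step into a daily Apache Airflow DAG. The DAG id, schedule, and task callables are placeholders, and the schedule argument name varies slightly across Airflow 2.x versions.

```python
# Minimal Airflow 2.x DAG sketch: extract from the FHIR API, then load to Snowflake.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_fhir(**context):
    ...  # pull new FHIR resources (see the ingestion sketch earlier)

def load_to_snowflake(**context):
    ...  # stage the batch and run COPY INTO / transformations

with DAG(
    dag_id="clinical_fhir_daily",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",   # 'schedule_interval' on Airflow releases before 2.4
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_fhir", python_callable=extract_fhir)
    load = PythonOperator(task_id="load_to_snowflake", python_callable=load_to_snowflake)
    extract >> load
```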
Build and maintain documentation for data models, pipelines, and processes to ensure reproducibility and transparency.
Advanced proficiency in Snowflake (Snowpipe, Time Travel, Zero-Copy Cloning) and SQL for complex transformations.
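For readers unfamiliar with the Snowflake features named in this bullet, the sketch below exercises Time Travel, Zero-Copy Cloning, and a Snowpipe definition through the Python connector. Database, schema, stage, and pipe names are placeholders.

```python
# Hedged demo of Time Travel, Zero-Copy Cloning, and Snowpipe (placeholder names).
import snowflake.connector

conn = snowflake.connector.connect(account="myorg-myaccount", user="CLINICAL_ETL_SVC", password="...")
cur = conn.cursor()

# Time Travel: query the table as it looked an hour ago (e.g., before a bad load).
cur.execute("SELECT COUNT(*) FROM CLINICAL_DB.RAW.LAB_RESULTS AT(OFFSET => -3600)")
print(cur.fetchone())

# Zero-Copy Cloning: instant, storage-free copy for testing a change safely.
cur.execute("CREATE OR REPLACE TABLE CLINICAL_DB.DEV.LAB_RESULTS_CLONE CLONE CLINICAL_DB.RAW.LAB_RESULTS")

# Snowpipe: continuous loading from an external stage via auto-ingest notifications.
cur.execute("""
    CREATE PIPE IF NOT EXISTS CLINICAL_DB.RAW.LAB_RESULTS_PIPE AUTO_INGEST = TRUE AS
    COPY INTO CLINICAL_DB.RAW.LAB_RESULTS
    FROM @CLINICAL_DB.RAW.LAB_STAGE
    FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)
""")

cur.close()
conn.close()
```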
Hands-on experience with ETL/ELT tools (Apache Spark, AWS Glue, Azure Data Factory) and cloud platforms (AWS, Azure, GCP).
Strong programming skills in Python/Scala (Pandas, PySpark) for data scripting and automation.
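A representative PySpark task for the two skill bullets above is a batch job that standardizes raw lab extracts before loading. The storage paths, schema, and column names below are assumptions for illustration.

```python
# PySpark sketch: clean and de-duplicate a raw lab extract (assumed paths/columns).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("standardize_labs").getOrCreate()

raw = spark.read.parquet("abfss://raw@clinicaldatalake.dfs.core.windows.net/labs/")  # placeholder path

clean = (
    raw.withColumn("result_value", F.col("result_value").cast("double"))
       .withColumn("collected_at", F.to_timestamp("collected_at"))
       .filter(F.col("patient_id").isNotNull())
       .dropDuplicates(["patient_id", "test_code", "collected_at"])
)

clean.write.mode("overwrite").parquet("abfss://curated@clinicaldatalake.dfs.core.windows.net/labs/")
```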
Familiarity with healthcare data formats (OMOP, FHIR, HL7, DICOM) and clinical workflows.
Expertise in federated learning and running large jobs on high-performance computing servers is a plus.
Data Governance: Ability to implement data quality frameworks (e.g., Great Expectations) and metadata management tools.
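To make the data-quality skill concrete, the sketch below encodes checks like the earlier lab validations as expectations, using the classic pandas-dataset API of Great Expectations (newer releases use a context-based API instead). The file path, columns, and value sets are assumptions.

```python
# Great Expectations sketch using the classic pandas-dataset API (assumed schema).
import great_expectations as ge
import pandas as pd

labs = pd.read_parquet("curated/labs.parquet")  # placeholder path
gdf = ge.from_pandas(labs)

gdf.expect_column_values_to_not_be_null("patient_id")
gdf.expect_column_values_to_be_between("result_value", min_value=0)
gdf.expect_column_values_to_be_in_set("units", ["mmol/L", "mg/dL", "IU/L"])

result = gdf.validate()
print("all checks passed:", result.success)
```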
Regulatory Compliance: Proven experience securing PHI/PII data and adhering to HIPAA/GDPR requirements.
Problem-Solving: Ability to troubleshoot pipeline failures, optimize query performance, and resolve data discrepancies.
Collaboration: Strong communication skills to work with cross-functional teams (clinicians, analysts, IT).
Requirements
Snowflake SnowPro Core Certification (or higher).
AWS/Azure/GCP data engineering certification (e.g., AWS Certified Data Analytics, Azure Data Engineer Associate).
Experience running jobs on high-performance computing servers.
Healthcare-specific certifications (e.g., HL7 FHIR Certification, Certified Health Data Analyst (CHDA)).
Security certifications (e.g., CISSP, CIPP) for handling sensitive clinical data.
3+ years of experience in data engineering, with 2+ years focused on healthcare/clinical data (e.g., hospitals, EMR systems, clinical trials).
2+ years of hands-on experience with Snowflake in production environments.
Proven track record of building ETL pipelines for large-scale clinical datasets.
Experience with OMOP CDM, Epic/Cerner EHR systems, or clinical data lakes.
Exposure to DevOps practices (CI/CD, Terraform) and Agile methodologies.
Some front-end development experience is preferred.
Data Engineer • Boston, Massachusetts, United States