Talent.com
Member of Technical Staff - Large Model Data

Black Forest Labs • San Francisco, CA, United States

What if the bottleneck to better generative models isn't architecture or compute, but the quality and scale of the data we train on?

We're the ~50-person team behind Stable Diffusion, Stable Video Diffusion, and FLUX.1, models with 400M+ downloads. But here's what we've learned: breakthrough models require breakthrough datasets. Not just big datasets, but carefully curated, properly processed, deeply understood datasets that push models toward capabilities they couldn't achieve otherwise. That's the infrastructure you'll build.

What You'll Pioneer

You'll create the data systems that make frontier research possible. This isn't traditional data engineering; it's building infrastructure at a scale where billion-image datasets are normal, where video processing pipelines need to run across thousands of GPUs, and where understanding what's in your data is as important as collecting it.

You'll be the person who:

  • Develops and maintains scalable infrastructure for acquiring massive-scale image and video datasets, the kind where "large" means billions of assets, not millions
  • Manages and coordinates data transfers from licensing partners, turning heterogeneous sources into training-ready pipelines
  • Implements and deploys state-of-the-art ML models for data cleaning, processing, and preparation, because at our scale, manual curation isn't an option
  • Builds scalable tools to visualize, cluster, and deeply understand what's actually in our datasets (because you can't fix what you can't see)
  • Optimizes and parallelizes data processing workflows to handle billion-scale datasets efficiently across both CPUs and GPUs
  • Ensures data quality, diversity, and proper annotation, including captioning systems that make training datasets actually useful
  • Transforms user preference data and alternative sources into formats that models can learn from
  • Works directly in the model development loop, updating datasets as training trajectories reveal what we're missing

Questions We're Wrestling With

  • How do you deduplicate billions of images without accidentally removing the edge cases that make models interesting?
  • What does "data quality" actually mean when you're training generative models, and how do you measure it at scale?
  • How do you caption video data in ways that capture temporal dynamics, not just individual frames?
  • Where are the hidden biases in our datasets, and how do we surface them before they become model biases?
  • When does adding more data help, and when does it just add noise?
  • How do we build data pipelines that adapt as model requirements change mid-training?

Who Thrives Here

You understand that data engineering at research scale is fundamentally different from traditional data engineering. You've built pipelines that broke, debugged them at scale, and emerged with opinions about what works. You know the difference between data that looks good and data that actually trains well.

You likely have:

  • Strong proficiency in Python and experience with various file systems for data-intensive manipulation and analysis
  • Hands-on familiarity with cloud platforms (AWS, GCP, or Azure) and Slurm / HPC environments for distributed data processing
  • Experience with image and video processing libraries (OpenCV, FFmpeg, etc.) and an understanding of their performance characteristics
  • Demonstrated ability to optimize and parallelize data workflows across both CPUs and GPUs, because at our scale, inefficient code is unusable code
  • Familiarity with data annotation and captioning processes for ML training datasets
  • Knowledge of machine learning techniques for data cleaning and preprocessing (because heuristics only get you so far)

We'd be especially excited if you:

  • Have built or contributed to large-scale data acquisition systems and understand the operational challenges
  • Bring experience with NLP techniques for image / video captioning
  • Have implemented data deduplication at billion-record scale and understand the tradeoffs
  • Know your way around big data frameworks like Apache Spark or Hadoop
  • Have been part of shipping a state-of-the-art model and understand how data decisions impact training outcomes
  • Think deeply about ethical considerations in data collection and usage

What We're Building Toward

We're not just processing data; we're building the foundation that determines what our models can learn. Every pipeline optimization makes training faster. Every data quality improvement makes models better. Every new data source opens new possibilities. If that sounds more compelling than maintaining existing systems, we should talk.

Base Annual Salary: $180,000-$300,000 USD

We're based in Europe and value depth over noise, collaboration over hero culture, and honest technical conversations over hype. Our models have been downloaded hundreds of millions of times, but we're still a ~50-person team learning what's possible at the edge of generative AI.
