Talent.com
Member of Technical Staff - Large Model Data
Black Forest Labs • San Francisco, CA, United States
Full-time

What if the bottleneck to better generative models isn't architecture or compute, but the quality and scale of the data we train on?

We're the ~50-person team behind Stable Diffusion, Stable Video Diffusion, and FLUX.1 (models with 400M+ downloads). But here's what we've learned: breakthrough models require breakthrough datasets. Not just big datasets, but carefully curated, properly processed, deeply understood datasets that push models toward capabilities they couldn't achieve otherwise. That's the infrastructure you'll build.

What You'll Pioneer

You'll create the data systems that make frontier research possible. This isn't traditional data engineering; it's building infrastructure at a scale where billion-image datasets are normal, where video processing pipelines need to run across thousands of GPUs, and where understanding what's in your data is as important as collecting it.

You'll be the person who:

  • Develops and maintains scalable infrastructure for acquiring massive-scale image and video datasets, the kind where "large" means billions of assets, not millions
  • Manages and coordinates data transfers from licensing partners, turning heterogeneous sources into training-ready pipelines
  • Implements and deploys state-of-the-art ML models for data cleaning, processing, and preparation, because at our scale, manual curation isn't an option
  • Builds scalable tools to visualize, cluster, and deeply understand what's actually in our datasets (because you can't fix what you can't see)
  • Optimizes and parallelizes data processing workflows to handle billion-scale datasets efficiently across both CPUs and GPUs
  • Ensures data quality, diversity, and proper annotation, including captioning systems that make training datasets actually useful
  • Transforms user preference data and alternative sources into formats that models can learn from
  • Works directly in the model development loop, updating datasets as training trajectories reveal what we're missing

Questions We're Wrestling With

  • How do you deduplicate billions of images without accidentally removing the edge cases that make models interesting?
  • What does "data quality" actually mean when you're training generative models, and how do you measure it at scale?
  • How do you caption video data in ways that capture temporal dynamics, not just individual frames?
  • Where are the hidden biases in our datasets, and how do we surface them before they become model biases?
  • When does adding more data help, and when does it just add noise?
  • How do we build data pipelines that adapt as model requirements change mid-training?

Who Thrives Here

You understand that data engineering at research scale is fundamentally different from traditional data engineering. You've built pipelines that broke, debugged them at scale, and emerged with opinions about what works. You know the difference between data that looks good and data that actually trains well.

You likely have:

  • Strong proficiency in Python and experience with various file systems for data-intensive manipulation and analysis
  • Hands-on familiarity with cloud platforms (AWS, GCP, or Azure) and Slurm / HPC environments for distributed data processing
  • Experience with image and video processing libraries (OpenCV, FFmpeg, etc.) and an understanding of their performance characteristics
  • Demonstrated ability to optimize and parallelize data workflows across both CPUs and GPUs, because at our scale, inefficient code is unusable code
  • Familiarity with data annotation and captioning processes for ML training datasets
  • Knowledge of machine learning techniques for data cleaning and preprocessing (because heuristics only get you so far)

We'd be especially excited if you:

  • Have built or contributed to large-scale data acquisition systems and understand the operational challenges
  • Bring experience with NLP techniques for image / video captioning
  • Have implemented data deduplication at billion-record scale and understand the tradeoffs
  • Know your way around big data frameworks like Apache Spark or Hadoop
  • Have been part of shipping a state-of-the-art model and understand how data decisions impact training outcomes
  • Think deeply about ethical considerations in data collection and usage

What We're Building Toward

We're not just processing data; we're building the foundation that determines what our models can learn. Every pipeline optimization makes training faster. Every data quality improvement makes models better. Every new data source opens new possibilities. If that sounds more compelling than maintaining existing systems, we should talk.

Base Annual Salary: $180,000 to $300,000 USD

We're based in Europe and value depth over noise, collaboration over hero culture, and honest technical conversations over hype. Our models have been downloaded hundreds of millions of times, but we're still a ~50-person team learning what's possible at the edge of generative AI.

Similar Jobs

Member of Technical Staff
Attention Engineering • San Francisco, CA, United States
Full-time
We are an applied AI lab building truly personal intelligence. Our team is small and talent-dense, well funded by the best, and looking for great Engineers and Researchers to join us in California.

Member of the Technical Staff — Full Stack
Stuut, Inc. • San Francisco, CA, United States
Full-time
Stuut is transforming accounts receivable for B2B companies—making collections smarter and faster for companies that have historically relied on manual processes that are labor intensive and costly...

Member of Technical Staff
Recruiting From Scratch • San Francisco, CA, United States
Full-time
Who is Recruiting from Scratch: Recruiting from Scratch is a talent firm that focuses on placing the best candidate for our clients. Our team is 100% remote and we work with teams across North Ameri...

Member of Technical Staff - Machine Learning
Quantix Search • San Francisco, CA, United States
Full-time
Member of Technical Staff – Machine Learning. I’m partnering with a rapidly scaling healthtech startup that has just raised a $40M Series A to expand its engineering team. Their AI-powered platform i...

[Omni] Member of Technical Staff, World Model
xAI • San Francisco, CA, United States
Full-time
[Omni] Member of Technical Staff, World Model. xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly ...

Member of Technical Staff, Fullstack
Envoy Inc. • San Francisco, CA, United States
Full-time
Envoy builds workspace management technology that makes it simple to run secure, compliant, and connected workplaces across every location. Over 16,000 workplaces and properties around the world rel...

Member of Technical Staff
Listen Labs • San Francisco, CA, United States
Full-time
We are seeing strong market demand and an aggressive 6‑month product roadmap, so we are expanding our engineering team. We're looking for someone highly technical (our current team includes 3 IOI me...

Member of Technical Staff (applied)
Anthrogen • San Francisco, CA, United States
Full-time
This role is provided by Anthrogen. Your actual pay will be based on your skills and experience – talk with your recruiter to learn more...

Member of Technical Staff - GPU Infrastructure
Prime Intellect • San Francisco, CA, United States
Full-time
Member of Technical Staff - GPU Infrastructure. Building the Future of Decentralized AI Development. Pri...

Member of Technical Staff - Platform
PetsApp • San Francisco, CA, United States
Full-time
We are looking for an exceptional mid-level to senior engineer to join our team. You, alongside the team, will own the platform that runs our benchmarks. This spans everything needed to evaluate LLMs...

Member of Technical Staff, Model Efficiency
Cohere • San Francisco, CA, United States
Full-time
Member of Technical Staff, Model Efficiency. Our mission is to scale intelligence to serve humanity. We’re training and deploying frontier models for developers and enterprises who are building AI sy...

[Omni] Member of Technical Staff, World Model
Pantera Capital • San Francisco, CA, United States
Full-time
AI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excelle...

Member of Technical Staff
Amigos • San Francisco, CA, United States
Full-time
This range is provided by Amigos. Your actual pay will be based on your skills and experience — talk with your recruiter to learn more. Be at the absolute fr...

Member of Technical Staff - Large scale data infrastructure
Black Forest Labs • San Francisco, CA, United States
Full-time
What if the ability to continually train improved models is just the capability to retrieve and process all our data? Our founding team pioneered Latent Diffusion and Stable Diffusion - breakthroug...

Member of Technical Staff, Kernel Engineering
Inferact • San Francisco, CA, United States
Full-time
Inferact's mission is to grow vLLM as the world's AI inference engine and accelerate AI progress by making inference cheaper and faster. Founded by the creators and core maintainers of vLLM, we sit ...

Member of Technical Staff
Goliath Partners • San Francisco, CA, United States
Full-time
Goliath Partners has exclusively partnered with a fast-growing, venture-backed multimodal GenAI company operating at serious scale. They’re looking for a Machine Learning Engineer (Multimodal / Gene...

Member of Technical Staff Applied ML
FitNext Co. • San Francisco, CA, United States
Full-time
As a Machine Learning Research Engineer, you'll drive research that teaches models what great feels like across domains such as model personality and behavior, UI design, multi-modal generation, an...

Member of Technical Staff
OpenBlock Labs Inc. • San Francisco, CA, United States
Full-time
San Francisco, California / Hybrid or Remote • $200K-320K + Equity. We're building the next generation of AI-powered development tools. Our CLI coding agent is designed to be a true collaborator for ...