Founded by Arun Gupta (former CEO of Grailed, which sold to GOAT Group in 2022) and Bob Whitney (Anthropic, NYT Games), we're on a mission to create safe, hands‑on AI experiences that fuel kids' imaginations rather than replace them.
Our first product, Stickerbox, is the world’s first voice‑to‑sticker printer: a device that instantly transforms a child’s spoken ideas into printable, colorable stickers. We sold out our first production run shipping for the holidays, and it’s already being called "one of the first products to make AI feel magical for kids and grounded for parents."
We raised a $7M funding round led by Maveron (backers of Lovevery), Serena Ventures, and Ai2 (The Allen Institute for AI). Stickerbox is bringing imagination to life for kids nationwide!
Why are we hiring?
The technical challenge is real.
We’re running real‑time audio transcription, proprietary content safety systems, and custom image generation, all serving thousands of concurrent users with sub‑second latency. We’re training our own models from scratch, optimizing for kid‑friendly aesthetics, and building safety guardrails that actually work. We need a Data Scientist to own data quality, evaluation, and ML optimization across this entire pipeline. You’ll work with the team to define what to train on, how to measure success, and how to make our models better every day.
What you’ll do
As our first Data Science hire, you’ll collaborate with us on:
Model Training & Data
Build and curate large‑scale image datasets for training custom models
Design annotation pipelines and data quality processes (a basic quality-check sketch follows this list)
Analyze training runs and model outputs to guide iteration
Work with our team to define what to train on and how to evaluate it
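To make the data-quality work above concrete, here is a minimal, illustrative sketch (not Stickerbox code) of a basic image-dataset quality pass: it drops undersized images and near-duplicates found via perceptual hashing. The directory name, resolution threshold, and choice of the imagehash library are assumptions for the example.

```python
# Illustrative sketch: a basic image-dataset quality pass that drops tiny
# images and near-duplicates via perceptual hashing. Paths and thresholds
# are made up for the example.
from pathlib import Path
from PIL import Image
import imagehash

MIN_SIDE = 256              # assumed minimum acceptable resolution
seen_hashes = set()
kept, dropped = [], []

for path in Path("raw_images").glob("*.png"):   # hypothetical directory
    img = Image.open(path)
    if min(img.size) < MIN_SIDE:
        dropped.append((path, "too small"))
        continue
    h = imagehash.phash(img)
    if h in seen_hashes:                        # identical phash => near-duplicate
        dropped.append((path, "duplicate"))
        continue
    seen_hashes.add(h)
    kept.append(path)

print(f"kept {len(kept)}, dropped {len(dropped)}")
```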
ML Pipeline Optimization
Optimize our transcription pipeline for accuracy and latency
Improve image generation quality, prompt adherence, and consistency
Identify bottlenecks and failure modes across the pipeline
Run experiments and A/B tests to measure improvements (a small example of this kind of comparison follows this list)
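As one example of the experimentation mentioned above, here is a minimal sketch of comparing two pipeline variants on a binary success metric with a two-proportion z-test. The metric, counts, and significance threshold are hypothetical, not real Stickerbox numbers.

```python
# Illustrative sketch: compare two pipeline variants on a binary metric
# (e.g. "sticker accepted by the user") with a two-proportion z-test.
# The counts below are made up for the example.
from statsmodels.stats.proportion import proportions_ztest

successes = [412, 463]      # variant A, variant B: accepted stickers
trials = [1000, 1000]       # sessions served per variant

z_stat, p_value = proportions_ztest(count=successes, nobs=trials)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Difference is statistically significant at the 5% level.")
```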
Safety & Content Moderation
Refine content safety systems for child‑appropriate outputs, and develop new ones
Build on our evaluation datasets for safety edge cases
Analyze moderation performance and reduce false positives and false negatives (see the sketch after this list)
Stay current on best practices for AI safety in generative systems
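For the moderation analysis above, here is a minimal sketch of measuring false positives and false negatives against a hand-labeled evaluation set. The labels and predictions below are placeholders, not real moderation data.

```python
# Illustrative sketch: score a moderation filter against human labels.
# 1 = "should be blocked"; the arrays are placeholder data.
from sklearn.metrics import confusion_matrix, precision_score, recall_score

y_true = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]   # human labels
y_pred = [1, 0, 1, 1, 0, 0, 0, 0, 1, 0]   # moderation system decisions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"false positives: {fp}  (safe content blocked)")
print(f"false negatives: {fn}  (unsafe content allowed)")
print(f"precision: {precision_score(y_true, y_pred):.2f}")
print(f"recall:    {recall_score(y_true, y_pred):.2f}")
```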
Evaluation & Metrics
Build evaluation frameworks to measure model performance at scale
Define metrics that correlate with user satisfaction (aesthetic quality, relevance, safety)
Develop automated evaluation pipelines (LLM‑as‑judge, CLIP scores, human eval), as sketched below
Track experiments and communicate findings to the team
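As one example of the automated evaluation mentioned above, here is an illustrative sketch of a CLIP-based image–text similarity score for generated stickers. The model checkpoint, file name, and prompt are assumptions for the example, not a description of our production evaluation.

```python
# Illustrative sketch: score prompt adherence of a generated sticker with
# CLIP image-text cosine similarity (a common "CLIP score" variant).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_score(image_path: str, prompt: str) -> float:
    """Cosine similarity between CLIP embeddings of an image and its prompt."""
    inputs = processor(text=[prompt], images=Image.open(image_path),
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    image_emb = outputs.image_embeds / outputs.image_embeds.norm(dim=-1, keepdim=True)
    text_emb = outputs.text_embeds / outputs.text_embeds.norm(dim=-1, keepdim=True)
    return float((image_emb @ text_emb.T).item())

# Hypothetical usage: higher scores suggest closer prompt adherence.
# print(clip_score("sticker_0001.png", "a friendly dinosaur wearing a party hat"))
```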
Prompt Engineering
Optimize prompts for transcription accuracy and image generation quality
Develop systematic approaches to prompt testing and iteration
Build prompt templates and guidelines for different use cases (a minimal template sketch follows)
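To show what a prompt template might look like in practice, here is a minimal sketch. The template wording, field names, and style hint are hypothetical, not our production prompts.

```python
# Illustrative sketch: a versioned prompt template for sticker generation.
# The wording and fields are placeholders for the example.
STICKER_PROMPT_V1 = (
    "A simple, bold-outlined coloring-book sticker of {subject}, "
    "kid-friendly, white background, no text, {style_hint}"
)

def build_prompt(transcript: str, style_hint: str = "thick clean lines") -> str:
    """Turn a transcribed request into an image-generation prompt."""
    subject = transcript.strip().rstrip(".!?").lower()
    return STICKER_PROMPT_V1.format(subject=subject, style_hint=style_hint)

# Example usage with a made-up transcription:
print(build_prompt("A dragon eating spaghetti!"))
```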
What we're looking for
5+ years in data science or applied ML
Experience optimizing production ML systems
Strong statistical and analytical skills
Familiarity with LLMs and image generation models
Python proficiency; comfortable with PyTorch
Experience building evaluation frameworks
Track record of improving ML system performance through data and experimentation
Nice to have
Experience with content moderation or trust & safety
Background in speech / audio ML or computer vision
Experience with human annotation pipelines (Label Studio, Scale AI)
Familiarity with prompt engineering techniques and LLM‑based evaluation
Location: NYC only, on‑site in our Brooklyn‑based office, close to most major train lines (flexible on WFH, but we like to be in the office the majority of the week).
Salary range: $150k-$250k base + equity and benefits