Primer helps B2B products break out of the B2C-centric marketing box. Our platform turns consumer ad channels, data streams, and emerging AI workflows into measurable growth engines for go-to-market teams. We ingest billions of rows from first- and third-party sources, map them to rich company context, and surface hyper-targeted audiences and real-time performance alerts—all without vendor lock-in.
That only works if the lights stay on, queries stay fast, and incidents stay rare. That’s where you come in.
As our first dedicated Site Reliability Engineer, you’ll be the force multiplier who designs, builds, and operates the infrastructure that powers everything: petabyte-scale data pipelines, LLM-backed services, and the APIs our customers (and engineers!) rely on every day. You’ll pair hard-won ops experience with a mentor’s mindset, levelling up the whole team while keeping us four steps ahead of failure.
YOUR MISSION
Own reliability from design to customer.
Define and uphold SLOs / SLIs, manage error budgets, and lead blameless post-mortems.
Automate toil out of existence—CI / CD, infra-as-code, capacity planning, and chaos testing.
Drive incident response end-to-end: detection, mitigation, root-cause analysis, and long-term fixes.
Scale multi-cloud data pipelines (Prefect, ClickHouse, Iceberg) and GPU / LLM workloads.
Teach best practices, review designs, and coach engineers so reliability becomes a team sport.
WHAT YOU’LL DO
Design, implement, and tune distributed systems that handle high-throughput B2B traffic.
Harden our AWS stack with IaC (e.g. Terraform).
Instrument everything—logs, traces, metrics, and AI-powered anomaly detection.
Champion security, cost optimization, and disaster-recovery strategies.
Jump into the weeds when something breaks, fix it fast, then automate it away.
WHAT YOU’LL BRING
Must-Haves
5+ years owning production systems at meaningful scale (sub-second latency, “four-nines” targets).
Mastery of SRE fundamentals: SLO / SLI design, error budgets, incident playbooks.
Deep hands-on with Linux, networking, containers / K8s, and at least one major cloud (AWS / GCP / Azure).
Proven track record automating infra with Terraform, Helm, or similar IaC tooling.
Fluency in at least one systems / scripting language (Go, Python, Rust, etc.).
Experience operating complex data pipelines (Prefect, Airflow, Temporal) or real-time streaming systems.
History of mentoring engineers and embedding reliability culture across teams.
Pragmatic decision-maker—balances uptime, velocity, and cost for startup reality.
Curiosity for AI-augmented ops (LLM chat-ops, anomaly detection, self-healing).
Nice-to-Haves
Managed GPU clusters and ML inference workloads.
Operated data lakes / lakehouses at scale (Iceberg, Delta, etc.).
Meaningful open-source contributions in SRE, DevOps, or data-infra projects.
WHY PRIMER
Mission with impact – We’re unlocking new growth channels for thousands of B2B marketers.
High-trust, low-ego culture – Fully distributed team, meeting-light weeks, Friday focus days.
Work & life, balanced – Five weeks PTO, generous parental leave, and flexibility for families.
Career rocket-fuel – Small team, huge problems, real ownership. Shape the future with bold innovators, driving impact that redefines industries.
Diverse & global – Teammates span six countries and counting.
INTERVIEW PROCESS
Intro Call with Engineering Manager – 30 min
System Design – 60 min
Operational Excellence Drill-down – 60 min
Strategic Pragmatism Chat with CTO – 45 min
Technical Coding / Systems Deep Dive – 30 min
Culture & Values with CEO – 45 min
We typically make a decision within 24-48 hours of the final conversation.
READY TO LEVEL UP B2B MARKETING INFRASTRUCTURE?
Email careers@sayprimer.com with your résumé, LinkedIn, GitHub, or anything that showcases your reliability superpowers. Let’s build the future, without the fire-drills.
Site Reliability Engineer • San Francisco, California, United States