Talent.com
Software Engineer, Site Reliability

Roblox • San Mateo, California, United States
Posted 30 days ago
Job type: Full-time

Job description

As a Software Engineer on the Infra Reliability team, you will drive the evolution of our systems, ensuring they meet the highest standards of performance, reliability, and efficiency. You’ll collaborate with cross-functional teams to build robust infrastructure that supports our growth. If you have a track record of solving complex technical challenges, we want to hear from you. Join us in shaping the future of our platform and delivering unparalleled value to our users.


You Will:



  • Create libraries that promote fault tolerance and resilience, such as retries, circuit breakers, and adaptive concurrency limits.

  • Build, automate, and standardize processes to create a "golden path" of tooling and platform support that powers the fundamental Roblox ecosystem.

  • Create tooling that provides production guardrails, for example, evaluating release candidate capacity with load-testing tooling before deploying to production.

  • Create performance monitoring and observability services for understanding capacity issues and platform degradations.

  • Create tooling that monitors production services and their changes, like generalized canarying services with alerting.


You Have:



  • Experience: You have a BS degree (or equivalent professional experience) in Computer Science or a related engineering field and 1-3 years of experience. Experience in the Site Reliability space, in SRE or Software Engineering, is an added advantage.

  • Passion for systems: You have experience and good habits around building software and tools and getting them adopted. Your systems focus informs your view that code needs to be deeply reliable.

  • A Partner: You know that the best tools integrate broadly with the tooling ecosystem. You approach partners and processes with curiosity and seek to understand a problem deeply before you start coding.

  • A Coder: You have experience writing code in common programming languages (Go, C#, Java…).

  • Self-organized: You're excited about getting in front of complex problems, organizing your work by any means possible, overcoming emergent issues, and contributing to long-running projects as part of the team.

  • Problem Solver: you ask the right questions to solve issues within your expertise and you use data to test your theories.

  • Planner: You have experience with large project lifecycles. You have experience working in sprints, breaking down complex tasks into milestones, and reporting status to keep project scheduling accurate.
