A company is looking for a Site Reliability Engineer to enhance observability and reliability practices within a distributed environment.
Key Responsibilities
Own and evolve the observability stack using various monitoring tools and AWS services
Design and maintain SLIs, SLOs, and error budgets to improve system reliability
Support incident investigations and keep observability costs efficient
Required Qualifications
Hands-on experience with production observability systems such as Prometheus and Grafana
Experience with Thanos or large-scale metrics systems
Strong understanding of SLIs, SLOs, and incident response workflows
Solid experience with Kubernetes and Infrastructure as Code (Terraform preferred)
Proficiency in scripting or programming (Go, Python, or Bash)
Site Reliability Engineer • Baton Rouge, Louisiana, United States