Top Skills:
-System Admin/DevOps background
-Internal/on-prem cloud experience
-Scripting: Python, Bash, PowerShell (any language)
-Ops support (great communication)
-Willing to work on call one Saturday every 8 weeks
BofA Cloud Site Reliability Engineer (SRE) for Internal Cloud. Candidates must have 5+ years of experience working with Unix/Linux server platforms. Must be extremely proficient in Terraform, shell scripting, Java, Python, and Ansible development. Must have experience with the whole lifecycle of cloud services, from inception and design through deployment, operation, and support.
Maintain services once they are live by measuring and monitoring availability, latency and overall system health.
Troubleshoot issues across the entire stack: hardware, software, application, and network.
Perform deep dives into both systemic and latent reliability issues; partner with engineering and operation teams across the organization to produce and roll out fixes.
Drive standardization efforts across multiple disciplines and services in conjunction with embedded SREs throughout the organization.
Identify and drive opportunities to improve automation for the cloud services.
Scope and create automation for the deployment, management, and visibility of our services.