Automation: Automate infrastructure provisioning, deployment, and scaling processes using Infrastructure as Code (IaC) methodologies.
Observability: Develop and maintain observability systems to monitor the health and performance of applications and to proactively identify and resolve issues.
Kubernetes Expertise: Leverage your deep understanding of Kubernetes architecture to design and optimize the deployment and orchestration of AI services in containerized environments.
Scale and Performance Optimization: Tune and optimize the scale and performance of AI services so they remain efficient and responsive under large volumes of requests.
Security Compliance: Ensure that AI services and APIs meet security and compliance standards, collaborating closely with security teams to implement the necessary measures.
On-Call: Participate in the SRE on-call rotation to respond to and resolve system issues swiftly, ensuring the performance and reliability of AI services around the clock.

Basic Skills:
- Expertise in infrastructure automation tools such as Terraform and Ansible (configuration management)
- Strong experience with GCP cloud services and container orchestration tools (Kubernetes, Docker)
- Proficiency in programming languages such as Python and Go
- Deep understanding of networking (VPC / Shared VPC / PSA), security, and database technologies
Senior DevOps Lead • Houston, Texas, United States