Talent.com
Infrastructure Engineer (Hybrid Cloud & Platform)
Aldea Inc • San Francisco, California, United States, 94102
No longer accepting applications.

Location: US Remote / Bay Area

Job Type: Full-time

Level: Mid-Level / Senior

About Aldea

Aldea is a multi-modal foundational AI company reimagining the scaling laws of intelligence. We believe today's architectures create unnecessary bottlenecks for the evolution of software. Our mission is to build the next generation of foundational models that power a more expressive, contextual, and intelligent human–machine interface.

The Mission

We are seeking an Infrastructure Engineer to bridge the gap between complex hybrid infrastructure and developer velocity. You will architect a unified platform spanning AWS and bare-metal Kubernetes.

At this level, you bring technical direction and expertise. You will help plan and architect resilient infrastructure, drive cross-team initiatives, and mentor other engineers while remaining deeply hands-on. Your ultimate goal is to build a "Golden Path" for engineering: automated releases, deep observability, and a platform experience that feels invisible to the end user.

Key Responsibilities

1. Hybrid Infrastructure & Bare Metal (AWS + K8s)

  • Unified IaC Strategy: Architect and maintain the Terraform codebase for both AWS services (EKS, RDS, VPC) and bare-metal clusters. You will treat physical infrastructure as code, using tools such as Cluster API, Metal3, or Tinkerbell to manage hardware lifecycles.
  • Bare Metal Mastery: Manage multiple production clusters on bare metal with clear separation between environments. You will solve complex challenges in networking (BGP, ECMP), load balancing (MetalLB/kube-vip), and storage orchestration (CSI/Rook-Ceph) for stateful workloads.
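In BGP mode, MetalLB advertises a service IP from several nodes and relies on the upstream router's ECMP support to spread traffic across them. As a rough illustration of that ECMP step, here is a minimal Python sketch that hashes a flow's 5-tuple to pick one of several equal-cost next hops. The `Flow` type, the `ecmp_next_hop` function, and all addresses are hypothetical, and CRC32 merely stands in for whatever hash a real router uses.

```python
import zlib
from typing import NamedTuple


class Flow(NamedTuple):
    """Hypothetical 5-tuple identifying one TCP/UDP flow."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str


def ecmp_next_hop(flow: Flow, next_hops: list[str]) -> str:
    """Pick one of several equal-cost next hops by hashing the flow's 5-tuple.

    Hashing the whole tuple keeps every packet of a flow on the same node
    (so TCP connections are not split), while different flows spread out.
    CRC32 is an illustrative stand-in for a router's vendor-specific hash.
    """
    if not next_hops:
        raise ValueError("no next hops available")
    key = "|".join(str(field) for field in flow).encode()
    return next_hops[zlib.crc32(key) % len(next_hops)]


# Hypothetical node addresses, each advertising the same service IP over BGP.
nodes = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]
flow = Flow("203.0.113.7", "192.0.2.10", 51512, 443, "tcp")
print(ecmp_next_hop(flow, nodes))
```

Because the hash covers the whole 5-tuple, packets of one connection stay on one node. The trade-off of plain modulo hashing is that adding or removing a node changes `len(next_hops)` and can remap existing flows.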

2. Observability & AI Monitoring

  • Full-Stack Visibility: Help build our observability stack (Prometheus, Grafana, ELK/Loki) to monitor both EKS and bare-metal clusters.
  • AI/GPU Telemetry: Build specialized dashboards for AI workloads, tracking GPU metrics, CPU saturation, and memory pressure to ensure efficient resource utilization.
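GPU telemetry dashboards largely come down to questions like "which GPUs are sitting idle?" and "which are close to running out of memory?". The Python sketch below answers both under stated assumptions: the `GpuSample` fields, sample values, and thresholds are all hypothetical, and in practice the inputs would more likely come from a DCGM-based exporter scraped by Prometheus than from hard-coded values.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class GpuSample:
    """One hypothetical scrape of a single GPU's metrics."""
    gpu_id: str
    utilization_pct: float  # compute utilization, 0-100
    mem_used_mib: float
    mem_total_mib: float


def summarize(samples: list[GpuSample], idle_below_pct: float = 20.0,
              mem_pressure_above: float = 0.9) -> dict[str, list[str]]:
    """Flag GPUs that look idle (low mean utilization) or memory-pressured.

    Memory pressure uses the peak used/total fraction across samples.
    The default thresholds are illustrative, not recommendations.
    """
    by_gpu: dict[str, list[GpuSample]] = {}
    for s in samples:
        by_gpu.setdefault(s.gpu_id, []).append(s)

    idle, pressured = [], []
    for gpu_id, gpu_samples in sorted(by_gpu.items()):
        if mean(s.utilization_pct for s in gpu_samples) < idle_below_pct:
            idle.append(gpu_id)
        peak_mem = max(s.mem_used_mib / s.mem_total_mib for s in gpu_samples)
        if peak_mem > mem_pressure_above:
            pressured.append(gpu_id)
    return {"idle": idle, "memory_pressure": pressured}


samples = [
    GpuSample("gpu0", 95.0, 78_000, 81_920),
    GpuSample("gpu0", 97.0, 79_500, 81_920),
    GpuSample("gpu1", 5.0, 2_000, 81_920),
    GpuSample("gpu1", 8.0, 2_100, 81_920),
]
print(summarize(samples))  # → {'idle': ['gpu1'], 'memory_pressure': ['gpu0']}
```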
4. CI/CD & Release Architecture

  • CI/CD at Scale: Architect resilient, multi-region pipelines using GitHub Actions, and automate application delivery with ArgoCD. You will build and manage a fleet of self-hosted runners to control costs and accelerate feedback loops.
  • Secure Release Engineering: Implement end-to-end workflows: Docker image build → Helm chart release → deployment (GitHub Actions + ArgoCD). Apply semantic versioning, manage artifacts in centralized registries, and integrate vulnerability scanning.
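The release workflow above (image build, Helm chart release, deployment, with semantic versioning and a vulnerability scan) can be sketched in Python as follows. The `bump` and `release` helpers, the `app` image name, and the step strings are hypothetical; a real pipeline would run these stages as GitHub Actions jobs and let ArgoCD perform the deployment, and the scan result is simply passed in here.

```python
import re
from typing import Literal

Part = Literal["major", "minor", "patch"]

# MAJOR.MINOR.PATCH core only: no leading zeros, no pre-release/build suffixes.
SEMVER_CORE = re.compile(r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def bump(version: str, part: Part) -> str:
    """Return the next version; bumping a part resets the parts to its right."""
    match = SEMVER_CORE.fullmatch(version)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = map(int, match.groups())
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


def release(current: str, part: Part, critical_vulns: int) -> list[str]:
    """Hypothetical flow: build image -> scan gate -> Helm chart -> deploy.

    Returns the steps taken, in order, as human-readable strings.
    """
    version = bump(current, part)
    steps = [f"build image app:{version}"]
    if critical_vulns > 0:
        # Gate: nothing is packaged or deployed if the scan finds criticals.
        steps.append(f"blocked: {critical_vulns} critical vulnerabilities")
        return steps
    steps.append(f"package Helm chart {version}")
    steps.append(f"ArgoCD deploys {version}")
    return steps


print(bump("1.4.2", "minor"))  # → 1.5.0
print(release("1.4.2", "patch", critical_vulns=0))
# → ['build image app:1.4.3', 'package Helm chart 1.4.3', 'ArgoCD deploys 1.4.3']
```

Placing the scan gate before chart packaging means a vulnerable image never gets a published chart version pointing at it.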
5. Leadership & Collaboration

  • Technical Direction: Lead design reviews and drive platform roadmaps that balance reliability, cost, and developer productivity.
  • Cross-Functional Partnership: Partner with product, security, and application teams to translate business needs into robust platform capabilities.
Requirements

  • Experience: A background in infrastructure, DevOps, or SRE roles, with primary ownership of production systems on AWS and bare-metal Kubernetes.
  • Technical Arsenal: Expert fluency in Terraform, Linux, Bash or Python scripting, GitHub Actions, and ArgoCD.
  • Bare Metal & K8s: Proven experience operating Kubernetes in production, including hybrid setups (EKS + on-prem). You understand networking (CNI, BGP), storage (CSI), and cluster lifecycle management.
  • Observability Depth: You have moved beyond "out-of-the-box" dashboards. You understand high-cardinality metrics, log retention strategies, and how to debug distributed systems.
  • Platform Mindset: You don't just build servers; you build products for developers.
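High cardinality follows from how Prometheus stores data: each unique combination of metric name and label values is its own time series, so a metric's worst-case series count is the product of its labels' distinct values. A small Python sketch with hypothetical label counts shows why an unbounded label such as a user ID is costly; `series_count` and `worst_case_series` are illustrative helpers, not part of any Prometheus API.

```python
import math
from itertools import product


def series_count(observed: list[dict[str, str]]) -> int:
    """Count distinct time series: each unique label combination is one series."""
    return len({frozenset(labels.items()) for labels in observed})


def worst_case_series(distinct_values: dict[str, int]) -> int:
    """Upper bound for one metric: the product of each label's distinct values."""
    return math.prod(distinct_values.values())


# Hypothetical label value counts for a single request metric.
base = {"cluster": 3, "service": 50, "status_code": 5}
print(worst_case_series(base))  # → 750

# Adding an unbounded label such as a user ID multiplies the bound.
print(worst_case_series({**base, "user_id": 100_000}))  # → 75000000

# When every combination is actually observed, the count reaches the bound.
observed = [
    {"cluster": c, "status_code": s}
    for c, s in product(["eks-a", "metal-1", "metal-2"], ["200", "404", "500"])
]
print(series_count(observed))  # → 9
```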
Bonus

  • Experience with OpenTelemetry (OTEL) for unified tracing.
  • Understanding of eBPF.
  • Experience configuring NVIDIA DCGM for GPU monitoring and handling AI training / inference workloads.
Aldea is proud to be an equal-opportunity employer. We are committed to building a diverse and inclusive culture that celebrates authenticity to win as one. We do not discriminate on the basis of race, religion, color, national origin, gender, gender identity, sexual orientation, age, marital status, disability, protected veteran status, citizenship or immigration status, or any other legally protected characteristics.

Aldea uses E-Verify to confirm employment eligibility in compliance with federal law. For more information, please visit: https://www.e-verify.gov.

Please note: We do not accept unsolicited resumes from recruiters or employment agencies and will not be responsible for any fees related to unsolicited resumes.
