Our mission at TensorWave Cloud is to build seamless, secure, reliable, and resilient AI infrastructure at scale, eliminating barriers and challenging the status quo to empower builders and support AI innovation.
About the role
TensorWave is seeking an experienced Technical Program Manager to drive the execution of complex AI infrastructure initiatives that power cutting-edge machine learning workloads.
In this role, you'll be the connective tissue between engineering, product, and business teams, ensuring our AMD-powered AI platform delivers exceptional performance and reliability at scale.
You'll own the end-to-end program lifecycle for critical infrastructure projects, from initial scoping through deployment and iteration, working at the intersection of hardware optimization, distributed systems, and ML operations.
Responsibilities
- Lead cross-functional programs spanning hardware deployment, software infrastructure, and ML platform development, ensuring alignment across engineering, product, and operations teams
- Define program scope, objectives, and success metrics for AI infrastructure initiatives, from GPU cluster buildouts to inference optimization projects
- Drive cross-functional roadmap planning and prioritization, balancing immediate customer needs with long-term platform scalability
- Manage program timelines, dependencies, and resource allocation across multiple concurrent initiatives
- Translate complex technical tradeoffs into clear business implications for executive leadership and external partners
- Communicate program status, risks, and blockers through regular updates, maintaining transparency across the organization
- Identify and mitigate technical and operational risks before they impact delivery timelines or system performance
- Drive postmortems and retrospectives to capture learnings and continuously improve execution velocity
Required Experience
- Bachelor's degree in Computer Science, Engineering, Information Systems, or a related technical field, or equivalent practical experience
- 3+ years of technical program management experience in infrastructure, cloud platforms, or ML / AI systems
- Strong technical background with ability to engage in architecture discussions around distributed systems, GPU computing, or ML frameworks
- Proven track record of delivering complex, multi-quarter programs involving hardware and software components
- Experience managing cross-functional initiatives with engineering, product, and business stakeholders
- Experience with Jira and implementing Jira workflows to match team processes
- Excellent written and verbal communication skills, with ability to tailor messaging for technical and non-technical audiences
Preferred Experience
- Experience with AMD GPU architectures (ROCm, Instinct GPUs) or competitive platforms (NVIDIA CUDA, Google TPUs)
- Background in ML infrastructure, model training / inference pipelines, or MLOps platforms
- Prior experience at a high-growth startup or in a fast-paced infrastructure organization
- Hands-on technical experience as a software engineer or systems engineer earlier in career
- Familiarity with Kubernetes, distributed training frameworks (PyTorch, JAX), or AI workload orchestration
What We Bring
- Mission driven company
- Competitive Salary
- Stock Options
- 100% paid Medical, Dental, and Vision insurance
- Flexible PTO
- Paid Holidays
- 401(k)
- Parental Leave
- Flexible Spending Account
- Short Term Disability Insurance
- Life and Voluntary Supplemental Insurance
- Mental Health Benefits through Spring Health
We're looking for resilient, adaptable people to join our team, people who believe in the mission and think at massive scale. The solutions that worked on a handful of devices will not work at Exascale. Be prepared to be pushed daily, to learn a lot, and literally build the future.
TensorWave is an equal opportunity employer, committed to fostering an inclusive and supportive workplace. All qualified applicants and candidates will receive consideration for employment without regard to race, color, religion, sex, disability, age, national origin, or veteran status.