A company is looking for a Distributed Systems Framework Engineer to build a robust framework for managing jobs across on-prem and cloud compute environments.
Key Responsibilities
Build a framework to manage jobs across on-prem and cloud compute
Implement job orchestration to allocate compute nodes, load LLMs, process queries, and deliver results
Design fault-tolerant execution with restart and recovery mechanisms
Required Qualifications
2-3 years of software engineering experience
Proficiency in Python
Experience with LLM inference libraries (vLLM, Transformers, or Nemotron)
Experience with Kubernetes and distributed container orchestration
Experience with AWS or GCP
System Engineer • Sterling Heights, Michigan, United States