A company is looking for an LLM Inference Deployment Engineer to optimize and deploy large language models for high-performance inference.
Key Responsibilities
Deploy and optimize post-trained LLMs from model hubs such as Hugging Face
Utilize inference runtimes (e.g., ONNX Runtime) for efficient model execution
Develop and maintain high-performance inference pipelines using Docker and Kubernetes
Required Qualifications
Bachelor's or Master's degree in Computer Science, Electrical Engineering, or related field
Experience in LLM inference deployment and model optimization
Expertise in LLM inference frameworks and runtimes such as PyTorch and ONNX Runtime
In-depth knowledge of Python for model integration and performance tuning
Experience with containerized AI deployments and LLM memory optimization strategies
Deployment Engineer • Charleston, South Carolina, United States