Machine Learning Engineer, LLM Fine-Tuning
We are actively hiring for a Machine Learning Engineer focused on LLM fine-tuning for Verilog / RTL applications.
Location: San Jose, CA (Onsite)
Skills: LLM fine-tuning, Verilog / RTL, AWS, Bedrock, SageMaker
Responsibilities
Own the technical roadmap for Verilog / RTL-focused LLM capabilities, from model selection and adaptation to evaluation, deployment, and continuous improvement.
Lead a hands-on team of applied scientists / engineers: set direction, remove technical blockers, review designs / code, and raise the bar on experimentation velocity and reliability.
Fine-tune and customize models using state-of-the-art techniques (LoRA / QLoRA, PEFT, instruction tuning, preference optimization / RLAIF) with robust HDL-specific evals:
Compile- / lint- / simulate-based pass rates, pass@k for code generation, constrained decoding to enforce syntax, and "does-it-synthesize" checks.
Design privacy-first ML pipelines on AWS:
Training / customization and hosting using Amazon Bedrock and SageMaker (or EKS + KServe / Triton / DJL) for bespoke training needs.
Artifacts in S3 with KMS CMKs; isolated VPC subnets & PrivateLink (including Bedrock VPC endpoints), IAM least-privilege, CloudTrail auditing, and Secrets Manager for credentials.
Enforce encryption in transit / at rest, data minimization, and no public egress for customer / RTL corpora.
Stand up dependable model serving: Bedrock model invocation where it fits, and / or low-latency self-hosted inference (vLLM / TensorRT-LLM), with autoscaling and canary / blue-green rollouts.
Build an evaluation culture: automatic regression suites that run HDL compilers / simulators, measure behavioral fidelity, and detect hallucinations / constraint violations; model cards and experiment tracking (MLflow / Weights & Biases).
Partner deeply with hardware design, CAD / EDA, Security, and Legal to source / prepare datasets (anonymization, redaction, licensing), define acceptance gates, and meet compliance requirements.
Drive productization: integrate LLMs with internal developer tools (IDEs / plug-ins, code review bots, CI), retrieval (RAG) over internal HDL repos / specs, and safe tool-use / function-calling.
Mentor & uplevel: coach ICs on LLM best practices, reproducible training, critical paper reading, and building secure-by-default systems.
Qualifications
10+ years total engineering experience with 5+ years in ML / AI or large-scale distributed systems; 3+ years working directly with transformers / LLMs.
Proven track record shipping LLM-powered features in production and leading ambiguous, cross-functional initiatives at Staff level.
Deep hands-on skill with PyTorch, Hugging Face Transformers / PEFT / TRL, distributed training (DeepSpeed / FSDP), parameter-efficient and quantized fine-tuning (LoRA / QLoRA), and constrained / grammar-guided decoding.
AWS expertise to design and defend secure enterprise deployments: Bedrock, SageMaker, S3, EC2 / EKS / ECR, VPC / Subnets / Security Groups, IAM, KMS, PrivateLink, CloudWatch / CloudTrail, Step Functions, Batch, Secrets Manager.
Strong software engineering fundamentals: testing, CI / CD, observability, performance tuning; Python is a must (bonus for Go / Java / C++).
Demonstrated ability to set technical vision and influence across teams; excellent written and verbal communication for execs and engineers.
Seniority Level
Mid-Senior level
Employment Type
Full-time
Job Function
Engineering and Information Technology
Industries
IT Services and IT Consulting