Staff Machine Learning Engineer / Principal ML Engineer (San Jose)

SRS Consulting Inc • San Jose, CA, United States

Job type : Full-time

Role : Staff Machine Learning Engineer

Location : San Jose, CA (Onsite, local candidates)

Duration : Long-term

Mode of Interview : Virtual, with a final in-person round

Why this role exists

We're building privacy-preserving LLM capabilities that help hardware design teams reason over Verilog / SystemVerilog and RTL artifacts: code generation, refactoring, lint explanation, constraint translation, and spec-to-RTL assistance. We're looking for a Staff-level engineer to technically lead a small, high-leverage team that fine-tunes and productizes LLMs for these workflows in a strict enterprise data-privacy environment.

You don't need to be a Verilog / RTL expert to start; curiosity, drive, and deep LLM craftsmanship matter most. Any HDL / EDA fluency is a strong plus.

What you'll do (Responsibilities)

Own the technical roadmap for Verilog / RTL-focused LLM capabilities, from model selection and adaptation to evaluation, deployment, and continuous improvement.

Lead a hands-on team of applied scientists / engineers : set direction, unblock technically, review designs / code, and raise the bar on experimentation velocity and reliability.

Fine-tune and customize models using state-of-the-art techniques (LoRA / QLoRA, PEFT, instruction tuning, preference optimization / RLAIF) with robust HDL-specific evals :

o Compile / lint / simulate-based pass rates, pass@k for code generation, constrained decoding to enforce syntax, and does-it-synthesize checks.
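For readers less familiar with pass@k, the sketch below shows the commonly used unbiased estimator: given n generated samples for a task, of which c pass the check (here, compiling, linting clean, or simulating correctly), it estimates the chance that at least one of k randomly chosen samples passes. The function name and argument names are illustrative assumptions, not part of any harness described in this posting.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n samples of which c passed.

    Computes 1 - C(n - c, k) / C(n, k): one minus the probability that
    k samples drawn without replacement are all failures.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures exist, so any k samples include a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # 10 generations for one RTL task, 3 of which compile and simulate.
    print(round(pass_at_k(10, 3, 1), 4))
    print(round(pass_at_k(10, 3, 5), 4))
```

In practice the per-task estimates would be averaged over the whole task suite; what counts as a "pass" is whatever gate the eval defines.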

Design privacy-first ML pipelines on AWS :

o Training / customization and hosting using Amazon Bedrock (including Anthropic models) where appropriate; SageMaker (or EKS + KServe / Triton / DJL) for bespoke training needs.

o Artifacts in S3 with KMS CMKs; isolated VPC subnets & PrivateLink (including Bedrock VPC endpoints), IAM least-privilege, CloudTrail auditing, and Secrets Manager for credentials.

o Enforce encryption in transit / at rest, data minimization, and no public egress for customer / RTL corpora.
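As one illustration of what "encryption in transit / at rest" and "no public egress" can look like for an artifact bucket, the sketch below builds an S3 bucket policy as a plain Python dictionary. The helper name and its arguments (bucket name, VPC endpoint ID, KMS key ARN) are hypothetical placeholders; the condition keys (`aws:SecureTransport`, `aws:SourceVpce`, `s3:x-amz-server-side-encryption-aws-kms-key-id`) are standard AWS keys. It is a starting point under those assumptions, not a reviewed production policy.

```python
import json


def private_bucket_policy(bucket: str, vpce_id: str, kms_key_arn: str) -> dict:
    """Build a deny-by-default style bucket policy (illustrative only)."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    objects_arn = f"{bucket_arn}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Reject any request that is not sent over TLS.
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [bucket_arn, objects_arn],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
            {
                # Reject requests that do not arrive via the named VPC endpoint.
                "Sid": "DenyOutsideVpcEndpoint",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [bucket_arn, objects_arn],
                "Condition": {"StringNotEquals": {"aws:SourceVpce": vpce_id}},
            },
            {
                # Reject uploads whose request does not name the expected KMS key.
                "Sid": "DenyWrongKmsKey",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": objects_arn,
                "Condition": {
                    "StringNotEquals": {
                        "s3:x-amz-server-side-encryption-aws-kms-key-id": kms_key_arn
                    }
                },
            },
        ],
    }


if __name__ == "__main__":
    # All identifiers below are made-up placeholders.
    policy = private_bucket_policy(
        "example-rtl-artifacts",
        "vpce-0123456789abcdef0",
        "arn:aws:kms:us-west-2:111122223333:key/example-key-id",
    )
    print(json.dumps(policy, indent=2))
```

A blanket deny on `aws:SourceVpce` also blocks console and administrator access from outside the endpoint, so a real policy would usually carve out an audited break-glass role.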

Stand up dependable model serving : Bedrock model invocation where it fits, and / or low-latency self-hosted inference (vLLM / TensorRT-LLM), autoscaling, and canary / blue-green rollouts.

Build an evaluation culture : automatic regression suites that run HDL compilers / simulators, measure behavioral fidelity, and detect hallucinations / constraint violations; model cards and experiment tracking (MLflow / Weights & Biases).

Partner deeply with hardware design, CAD / EDA, Security, and Legal to source / prepare datasets (anonymization, redaction, licensing), define acceptance gates, and meet compliance requirements.

Drive productization : integrate LLMs with internal developer tools (IDEs / plugins, code review bots, CI), retrieval (RAG) over internal HDL repos / specs, and safe tool-use / function-calling.

Mentor & uplevel : coach ICs on LLM best practices, reproducible training, critical paper reading, and building secure-by-default systems.

What you'll bring (Minimum qualifications)

10+ years total engineering experience with 5+ years in ML / AI or large-scale distributed systems; 3+ years working directly with transformers / LLMs.

Proven track record shipping LLM-powered features in production and leading ambiguous, cross-functional initiatives at Staff level.

Deep hands-on skill with PyTorch, Hugging Face Transformers / PEFT / TRL, distributed training (DeepSpeed / FSDP), quantization-aware fine-tuning (LoRA / QLoRA), and constrained / grammar-guided decoding.

AWS expertise to design and defend secure enterprise deployments, including :

o Amazon Bedrock (model selection, Anthropic model usage, model customization, Guardrails, Knowledge Bases, Bedrock runtime APIs, VPC endpoints)

o SageMaker (Training, Inference, Pipelines), S3, EC2 / EKS / ECR, VPC / Subnets / Security Groups, IAM, KMS, PrivateLink, CloudWatch / CloudTrail, Step Functions, Batch, Secrets Manager.

Strong software engineering fundamentals : testing, CI / CD, observability, performance tuning; Python a must (bonus for Go / Java / C++).

Demonstrated ability to set technical vision and influence across teams; excellent written and verbal communication for execs and engineers.

Nice to have (Preferred qualifications)

Familiarity with Verilog / SystemVerilog / RTL workflows : lint, synthesis, timing closure, simulation, formal, test benches, and EDA tools (Synopsys / Cadence / Mentor).

Experience integrating static analysis / AST-aware tokenization for code models or grammar-constrained decoding.

RAG at scale over code / specs (vector stores, chunking strategies), tool-use / function-calling for code transformation.

Inference optimization : TensorRT-LLM, KV-cache optimization, speculative decoding; throughput / latency trade-offs at batch and token levels.

Model governance / safety in the enterprise : model cards, red-teaming, secure eval data handling; exposure to SOC 2 / ISO 27001 / NIST frameworks.

Data anonymization, DLP scanning, and code de-identification to protect IP.

What success looks like

90 days

Baseline an HDL-aware eval harness that compiles / simulates; establish secure AWS training & serving environments (VPC-only, KMS-backed, no public egress).

Ship an initial fine-tuned / customized model with measurable gains vs. base (e.g., +X% compile-pass rate, -Y% lint findings per K LOC generated).
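To make the compile-pass rate concrete, here is a minimal sketch of how such a number could be measured: each generated candidate is written to a temporary file, an external compiler command is run on it, and the fraction of zero exit codes is reported. The function names and the `compiler_cmd` parameter are assumptions for illustration; the actual harness, toolchain, and pass criteria are for the person in this role to define.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Sequence


def compiles(source: str, compiler_cmd: Sequence[str], timeout_s: float = 30.0) -> bool:
    """Return True if the compiler exits with status 0 on the given source."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "candidate.v"
        path.write_text(source, encoding="utf-8")
        try:
            result = subprocess.run(
                [*compiler_cmd, str(path)],
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # A hung compile counts as a failure.
            return False
        return result.returncode == 0


def compile_pass_rate(candidates: Sequence[str], compiler_cmd: Sequence[str]) -> float:
    """Fraction of candidate sources that the compiler accepts."""
    if not candidates:
        raise ValueError("no candidates to evaluate")
    passed = sum(compiles(src, compiler_cmd) for src in candidates)
    return passed / len(candidates)
```

With Icarus Verilog installed, `compiler_cmd` could be something like `["iverilog", "-t", "null"]`, which checks syntax without generating output; a commercial simulator or linter would slot in the same way. Comparing the rate for the base model against the fine-tuned model on the same prompts gives the "+X%" figure.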

180 days

Expand customization / training coverage (Bedrock for managed FMs including Anthropic; SageMaker / EKS for bespoke / open models).

Add constrained decoding + retrieval over internal design specs; productionize inference with SLOs (p95 latency, availability) and audited rollout to pilot hardware teams.

12 months

Demonstrably reduce review / iteration cycles for RTL tasks with clear metrics (defect reduction, time-to-lint-clean, % auto-fix suggestions accepted), and a stable MLOps path for continuous improvement.

How we work (Security & privacy by design)

Customer and internal design data remain within private AWS VPCs; access via IAM roles and audited by CloudTrail; all artifacts encrypted with KMS.

No public internet calls for sensitive workloads; Bedrock access via VPC interface endpoints / PrivateLink with endpoint policies; SageMaker and / or EKS run in private subnets.

Data pipelines enforce minimization, tagging, retention windows, and reproducibility; DLP scanning and redaction are first-class steps.

We produce model cards, data lineage, and evaluation artifacts for every release.

Tech you'll touch

Modeling : PyTorch, HF Transformers / PEFT / TRL, DeepSpeed / FSDP, vLLM, TensorRT-LLM

AWS & MLOps : Amazon Bedrock (Anthropic and other FMs, Guardrails, Knowledge Bases, Runtime APIs), SageMaker (Training / Inference / Pipelines), MLflow / W&B, ECR, EKS / KServe / Triton, Step Functions

Platform / Security : S3 + KMS, IAM, VPC / PrivateLink (incl. Bedrock), CloudWatch / CloudTrail, Secrets Manager

Tooling (nice to have) : HDL toolchains for compile / simulate / lint, vector stores (pgvector / OpenSearch), GitHub / GitLab CI
