Talent.com
Principal Software Engineer, Managed AI
Crusoe • Sunnyvale, CA, US
Posted 30 days ago
Job type: Full-time

Job Description

Crusoe's mission is to accelerate the abundance of energy and intelligence. We’re crafting the engine that powers a world where people can create ambitiously with AI — without sacrificing scale, speed, or sustainability.

Be a part of the AI revolution with sustainable technology at Crusoe. Here, you'll drive meaningful innovation, make a tangible impact, and join a team that’s setting the pace for responsible, transformative cloud infrastructure.

About This Role:

As a Principal Software Engineer on the Managed AI team at Crusoe, you'll have a pivotal role in shaping the architecture and scalability of our next-generation AI inference platform. You will lead the design and implementation of core systems for our AI services, including resilient, fault-tolerant queues, model catalogs, and scheduling mechanisms optimized for cost and performance. This role gives you the opportunity to build and scale infrastructure capable of handling millions of API requests per second across thousands of customers.

From day one, you'll own critical subsystems for managed AI inference, helping to serve large language models (LLMs) to a global audience. As part of a dynamic, fast-growing team, you’ll collaborate cross-functionally, influence the long-term vision of the platform, and contribute to cutting-edge AI technologies. This is a unique opportunity to build a high-performance AI product that will be central to Crusoe's business growth.

What You’ll Be Working On:

Design and Development:

Lead the design and implementation of core AI services, including:

Resilient, fault-tolerant queues for efficient task distribution.

Model catalogs for managing and versioning AI models.

Scheduling mechanisms optimized for cost and performance.

High-performance APIs for serving AI models to customers.

Scalability and Performance:

Build and scale infrastructure to handle millions of API requests per second.

Optimize AI inference performance on GPU-based systems.

Implement robust monitoring and alerting to ensure system health and availability.

Collaboration and Innovation:

Collaborate closely with product management, business strategy, and other engineering teams.

Influence the long-term vision and architectural decisions of the AI platform.

Contribute to open-source AI frameworks and participate in the AI community.

Prototype and iterate on new features and technologies.

What You’ll Bring to the Team:

Strong Engineering Fundamentals:

Advanced degree in Computer Science, Engineering, or a related field.

Demonstrable experience in distributed systems design and implementation.

Proven track record of delivering early-stage projects under tight deadlines.

Expertise in using cloud-based services such as elastic compute, object storage, virtual private networks, and managed databases.

AI/ML Expertise:

Experience in Generative AI (Large Language Models, Multimodal).

Familiarity with AI infrastructure, including training, inference, and ETL pipelines.

Software Engineering Skills:

Experience with container orchestration (e.g., Kubernetes) and microservices architectures.

Experience using REST APIs and common communication protocols, such as gRPC.

Demonstrated experience in the software development cycle and familiarity with CI/CD tools.

Preferred Qualifications:

Proficiency in Golang or Python for large-scale, production-level services.

Contributions to open-source AI projects such as vLLM or similar frameworks.

Performance optimizations on GPU systems and inference frameworks.

Personal Attributes:

Proactive and collaborative approach with the ability to work autonomously.

Strong communication and interpersonal skills.

Passion for building cutting-edge AI products and solving challenging technical problems.

Benefits:

Industry-competitive pay

Restricted Stock Units in a fast-growing, well-funded technology company

Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents

Employer contributions to HSA accounts

Paid Parental Leave

Paid life insurance, short-term and long-term disability

Teladoc

401(k) with a 100% match up to 4% of salary

Generous paid time off and holiday schedule

Cell phone reimbursement

Tuition reimbursement

Subscription to the Calm app

MetLife Legal

Company-paid Commuter FSA benefit of $300 per month

Compensation:

Compensation will be paid in the range of $256,000 - $320,000 a year + Bonus. Restricted Stock Units are included in all offers. Compensation will be determined by the applicant's knowledge, education, and abilities, as well as internal equity and alignment with market data.

Crusoe is an Equal Opportunity Employer. Employment decisions are made without regard to race, color, religion, disability, genetic information, pregnancy, citizenship, marital status, sex/gender, sexual preference/orientation, gender identity, age, veteran status, national origin, or any other status protected by law or regulation.
