Talent.com
Senior Resiliency and Safety Architect
NVIDIA • Santa Clara, CA, US
Job type: Full-time

We are now seeking a Senior Resiliency and Safety Architect!

NVIDIA is a learning machine that constantly evolves by seeking exciting opportunities that matter to the world, and that only we can solve. We attract the world's best people, so we can achieve our highest aim: building a company that lets us do our life's work, at the highest level of our craft. NVIDIA is looking for a Resiliency and Safety Architect to support the development of GPU (graphics processing unit) and Tegra SoC hardware and software resiliency and safety features. In this role, you will be a key member of a team of innovators, challenging the status quo and pushing beyond boundaries. You will have the opportunity to shape the industry's leading GPUs and SoCs, which power product lines ranging from consumer graphics to self-driving cars and the growing field of artificial intelligence.

What you'll be doing:

Collaborate with the Software and Hardware teams to architect new safety and resiliency features and guide future development.

Optimize hardware & software features to improve system robustness, performance, and security.

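Model and analyze RAS (reliability, availability, and serviceability) metrics such as Failures in Time (FIT) and Availability, and safety metrics such as Diagnostic Coverage and PMHF (Probabilistic Metric for random Hardware Failures).

Run simulations to analyze the Architectural Vulnerability Factor (AVF) and liveness of on-die memory.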

Develop diagnostics software components for Resiliency and Safety to run on NVIDIA GPUs.

Participate in testing new and existing resiliency and safety hardware and software features.

Work on compliance of products with functional safety standards (ISO 26262 and Automotive SPICE (ASPICE)). This includes defining requirements, architecture, and design with end-to-end traceability; performing safety analyses (FMEA, DFA, and FTA); and ensuring that software complies with the MISRA and CERT C standards.

What we need to see:

Master's or PhD degree in Computer Science, Computer Engineering, Electrical Engineering, or a closely related field, or equivalent experience.

At least 5 years of relevant experience.

Familiarity with computer system architecture, microprocessors, and microcontroller fundamentals (caches, buses, direct memory access, etc.).

Proficiency in C / C++.

Scripting and automation with Python or similar.

Understanding of the software development process, from requirements to testing closure and maintenance.

Experience with resiliency and / or functional safety.

Excellent interpersonal skills and ability to collaborate with on-site and remote teams.

Strong debugging and analytical skills.

Self-driven and results-oriented.

Ways to stand out from the crowd:

Familiarity with general hardware concepts, Verilog RTL coding, simulation and debug, GPU and SoC architectures, and machine learning / deep learning concepts.

Programming with CUDA.

Experience in embedded software development.

NVIDIA's invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI, the next era of computing, with the GPU acting as the brain of computers, robots, and self-driving cars that can perceive and understand the world. Today, we are increasingly known as "the AI computing company". Do you love the challenge of crafting the highest-performance silicon possible? If so, we want to hear from you! Come, join our Accelerated and Resilient Compute Systems team and help build the real-time, cost-effective computing platform driving our success in this exciting and quickly growing field.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until July 29, 2025.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
