Talent.com
Senior Software Engineer, Profiling Services
Nvidia Corporation • Santa Clara, CA, United States
Job type: Full-time

Overview

Are you ready to innovate in GPU performance analysis for Machine Learning workloads? Join our Developer Tools Always-On Profiling (AON) team as a Senior Software Architect, where you'll be pivotal in designing, implementing, and leading our Always-On Profiling service. This role demands deep technical expertise, a proven track record of solving ambiguous challenges, and strong technical leadership skills.

Responsibilities

  • Architect and Build Scalable Systems: Drive the design and implementation of the AON profiling service's core systems. Master inter-process communication (IPC), memory management, and low-overhead architectures to handle profiling data from complex multi-node, multi-process, multi-GPU, and cluster environments.
  • Elevate Software Engineering Excellence: Promote high standards in software development, including design patterns, concurrency, parallelism, and advanced debugging for asynchronous systems. Commit to code quality and robust testing to ensure a reliable profiling service.
  • Lead, Mentor, and Innovate: Guide and mentor engineers, provide impactful code reviews, and shape technical roadmaps. Proactively identify complex technical issues within the AON project, break them down, and craft innovative solutions. Problem-solving prowess is crucial for AON's success with ML workloads.
  • Architect and Build High-Performance Platforms: Transform user needs into clear requirements and design documents. Explore diverse approaches to problems, make well-reasoned recommendations, and lead end-to-end feature development, from planning and prototyping to implementation, testing, and customer evaluation. Do hands-on development across user applications, drivers, performance counter libraries, and lower-level platform/hardware abstraction layers.
  • Collaborate Across Boundaries: Partner effectively with diverse internal and external teams. Exceptional communication and collaboration skills are key to integrating AON seamlessly into the broader profiling and ML ecosystem.

Qualifications

  • BS or MS degree, or equivalent experience, in Computer Engineering, Computer Science, or a related field.
  • 6+ years of meaningful software development experience in C, C++, and Python.
  • 6+ years in system software design, operating systems fundamentals, computer architectures, performance analysis, and delivering production-quality software.
  • Strong interpersonal, verbal, and written communication, demonstrating the ability to build cross-organizational partnerships and lead technical teams through complex challenges.
  • Profiling & Performance Tools Expert: Extensive knowledge of profiling technologies (sampling, tracing), overhead analysis, and diverse profiling data (CPU/GPU events, performance counters, API traces, event correlation). Familiarity with existing profiling ecosystems and their limitations is a plus.
  • GPU & CUDA Proficiency: In-depth knowledge of CUDA APIs, runtime, streams, kernels, and GPU architecture.
  • ML Ecosystem & Performance Analysis: Familiarity with ML frameworks such as PyTorch and JAX, and knowledge of performance analysis for AI training/inference applications.
  • Large-Scale System Development & Debugging: Experience developing and debugging across complex multi-layered software systems, including user mode and kernel drivers, with a proven ability to contribute to and extend substantial codebases (hundreds of millions of lines).
  • Proficiency in Designing APIs and Interfaces for Profiling Tools: Ability to design robust, flexible APIs and interfaces that enable seamless integration of profiling tools with various frameworks and custom code.
  • Mastery of Problem Simplification: A history of breaking down ill-defined problems in complex technical domains, designing effective solutions, and leading teams to implement them.

Ways to Stand Out

  • Pioneering Low-Overhead Profiling Systems: A track record of designing and implementing profiling systems with minimal performance impact on target workloads, especially in complex multi-process and distributed environments.
  • Deep Understanding of PyTorch Internals & CUDA Usage: A comprehensive grasp of how PyTorch uses CUDA, including tensor memory, operations, and distributed training functionalities.
  • GPU Performance Analysis & Optimization Acuity: The ability to analyze profiling data and translate it into concrete, actionable insights, particularly within CUDA and ML frameworks like PyTorch.
  • Translating Customer Needs: Skilled at redefining customer requests into actionable use cases and requirements.
  • Strong understanding of system security principles.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 148,000 USD - 235,750 USD for Level 3, and 184,000 USD - 287,500 USD for Level 4.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until November 10, 2025.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
