  • Senior AI and HPC

    NVIDIA (Santa Clara, CA)
    AI/HPC Infrastructure team and lead the design of groundbreaking GPU compute clusters for demanding AI, HPC, and compute-intensive workloads. In this ... role, you'll lead and operate substantial AI GPU Clusters at unprecedented scale. You'll...fast, distributed storage systems like Lustre and GPFS for AI/HPC workloads + Familiarity with deep learning…
    NVIDIA (11/08/24)
  • Senior AI - HPC Cluster…

    NVIDIA (Santa Clara, CA)
    …intelligence. Make the choice to join us today! As a member of the GPU AI/HPC Infrastructure team, you will provide leadership in the design and implementation ... of groundbreaking GPU compute clusters that run demanding deep learning, high performance computing,...Experience analyzing and tuning performance for a variety of AI/HPC workloads. + Working knowledge of cluster…
    NVIDIA (11/06/24)
  • Senior AI - HPC Storage…

    NVIDIA (Santa Clara, CA)
    …intelligence. Make the choice to join us today! As a member of the GPU AI/HPC Infrastructure team, you will provide leadership in the design and implementation ... changes and/or completely new approaches for our GPU Compute Clusters fast storage. As an expert, you will help...of distributed storage services. + Design, implement an on-prem AI/HPC infrastructure supplemented with cloud computing to…
    NVIDIA (11/06/24)
  • Senior Site Reliability Engineer…

    NVIDIA (Santa Clara, CA)
    …large scale automation solutions. You will also be maintaining and building deep learning AI - HPC GPU clusters at scale and supporting our researchers to ... diverse team today! As a member of the GPU AI/HPC Infrastructure team, you will provide leadership...the design and implementation of groundbreaking GPU compute clusters that power all AI research across…
    NVIDIA (09/25/24)
  • Senior Product Architect, HPC

    NVIDIA (Santa Clara, CA)
    …infrastructure expertise to create reference designs for the world's most powerful AI clusters. As an AI/HPC Product Architect at NVIDIA, you'll be the ... experience with benchmarking systems and analyzing performance bottlenecks in large-scale AI/HPC infrastructure + Exceptional communication skills, with the…
    NVIDIA (10/25/24)
  • Senior Solution Architect, HPC

    NVIDIA (Santa Clara, CA)
    …infrastructure for customers + Support operational and reliability aspects of large-scale AI clusters, focusing on performance at scale, training stability, ... be part of the team that brings Artificial Intelligence (AI) emerging technology to the field? We are looking...a hardworking Solution Architect (SA) to join the NVIDIA AI Enterprise (NVAIE) SA Segment Team. The mission of…
    NVIDIA (10/10/24)
  • Senior SRE Engineering Leader - AI

    NVIDIA (Santa Clara, CA)
    …journey as Senior SRE Engineering Leader. Lead our globally distributed clusters, ensuring seamless operations and delivering AI services that drive ... What you'll be doing: + Manage distributed, multi-location GPU clusters for AI research. + Lead a...Go). + Expertise in managing large-scale distributed systems and AI/HPC environments. + Leadership experience, mentoring, and…
    NVIDIA (10/08/24)
  • Principal Observability Architect, AI

    NVIDIA (Santa Clara, CA)
    …leader to define a vision and roadmap for distributed observability systems for large-scale AI and HPC clusters and workloads and guide implementation ... NVIDIA's Hardware Infrastructure organization is seeking a Senior or Principal Data and Observability Architect....visualization to spectacularly improve efficiency, performance, and productivity of AI and HPC workloads. You will lead…
    NVIDIA (11/02/24)
  • Senior Software Architect - Deep Learning…

    NVIDIA (Santa Clara, CA)
    …like NCCL, NVSHMEM, and UCX that are crucial for scaling Deep Learning and HPC. We're seeking a Senior Software Architect to help co-design next-gen data ... + Design and implement new communication technologies to accelerate AI and HPC workloads. + Explore innovative...+ Use simulation to explore performance of large GPU clusters (think scales of 100s of 1000s of GPUs)…
    NVIDIA (08/24/24)
  • Senior Software Engineer - HPC

    NVIDIA (Santa Clara, CA)
    …doing: + Design highly available and scalable systems to meet the demands of our HPC clusters + Evaluate new and innovative technologies as the landscape evolves ... parallel computing. More recently, GPU deep learning ignited modern AI and enabled the next era of computing. NVIDIA...from the crowd: + Prior experience building solutions for HPC clusters based on Slurm or Kubernetes…
    NVIDIA (10/24/24)
  • Senior HPC Architect

    NVIDIA (Santa Clara, CA)
    …team today! We are looking for an outstanding hands-on architect/engineer for a Senior HPC architect role to support deployment and bringup of large-scale ... parallel computing. More recently, GPU deep learning ignited modern AI - the next era of computing. NVIDIA is...GPU compute clusters. Be a key player to enable the most…
    NVIDIA (08/24/24)
  • Senior Software Developer, HPC

    NVIDIA (Santa Clara, CA)
    …fueled by great technology-and amazing people. Today, we're tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU ... Cluster Manager is used to power thousands of Linux clusters around the world, varying from a few nodes...a few nodes to several thousands of nodes. Bright clusters can run on-premises, completely in the cloud, or…
    NVIDIA (10/15/24)
  • Senior Technical Program Manager - GPU…

    NVIDIA (Santa Clara, CA)
    …externally with senior management and partner teams to scale the clusters operations charter. They will develop and standardize planning, reporting and execution ... Hardware Infrastructure is seeking a Senior Technical Program Manager to lead the strategy...large scale distributed computing + Experience managing large scale HPC and/or AI Infrastructure deployments that stretch…
    NVIDIA (11/01/24)
  • Senior System Software Engineer, NCCL…

    NVIDIA (Santa Clara, CA)
    …the crowd: + Experience conducting performance benchmarking and developing infrastructure on HPC clusters. Prior system administration experience, esp for large ... guide our key partners and customers with NCCL. Most DL/HPC applications run on large clusters with...Experience working with engineering or academic research community supporting HPC or AI + Practical experience with…
    NVIDIA (10/22/24)
  • Senior GPU Supercomputer Scheduler Engineer

    NVIDIA (Santa Clara, CA)
    …graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI - the next era of computing. NVIDIA is a "learning machine" that ... intelligence. Join us today! As a member of the GPU/HPC Infrastructure team, you will provide leadership in the...in the design and implementation of groundbreaking GPU compute clusters that run demanding deep learning, high performance computing,…
    NVIDIA (11/05/24)
  • Senior Solutions Architect, NPN

    NVIDIA (Santa Clara, CA)
    …Science, or a related field or equivalent experience. + Established track record working with AI and HPC clusters, both on-premises and cloud based. + 12+ ... with experience in designing, building, and maintaining large scale HPC and AI hybrid computing solutions to...Kubernetes, InfiniBand, Ethernet, or other areas related to high-performance clusters and hybrid cloud solutions. + Exhibit hands on…
    NVIDIA (09/18/24)
  • Senior MLOps Engineer, GenAI Framework

    NVIDIA (Santa Clara, CA)
    …(PyTorch, JAX, Tensorflow) + Cluster/cloud technologies, eg: SLURM, Lustre, k8s + Experience with HPC hardware systems such as compute clusters and HPC ... NVIDIA is looking for a dedicated and motivated senior build and continuous integration (CI/CD) engineer for...on Large Language Models (LLM), Multimodal (MM), and Speech AI. NeMo provides end-to-end model training, including data curation,…
    NVIDIA (10/08/24)
  • Senior Technical Program Manager…

    NVIDIA (Santa Clara, CA)
    …span across multiple teams and engineers (100+) + Experience managing large scale HPC and/or AI Infrastructure deployments that stretch across hardware and ... Hardware Infrastructure is seeking a Senior Technical Program Manager to lead the strategy...capacity forecasting, planning, allocation and management across our internal clusters. The GPU infrastructure we build and operate enables…
    NVIDIA (11/10/24)
  • Senior Site Reliability Engineer, Omniverse…

    NVIDIA (Santa Clara, CA)
    …rendering technologies into existing software tools and simulation workflows for building AI systems. Come join the Omniverse Cloud team, serving as a foundational ... and SaaS offering! We are seeking a highly motivated Senior Site Reliability Engineer to join our Omniverse Infrastructure...will architect solutions to run our ever-growing number of clusters around the world to ensure Omniverse Cloud's mission…
    NVIDIA (11/04/24)
  • Senior MLOps Engineer, Deep Learning…

    NVIDIA (Santa Clara, CA)
    …with large-scale distributed computing systems and cloud platforms. + Experience with HPC based compute clusters and scheduling solutions like Slurm The ... and releasing NVIDIA Deep Learning Frameworks on the most powerful, enterprise-grade GPU clusters capable of hundreds of Peta FLOPS. Are you ready for this…
    NVIDIA (09/12/24)