• Senior AI - HPC Cluster…

    NVIDIA (Santa Clara, CA)
    …intelligence. Make the choice to join us today! As a member of the GPU AI / HPC Infrastructure team, you will provide leadership in the design and implementation ... including developing scalable automation solutions + Build and maintain AI and ML heterogeneous clusters on-premises and...and operating large scale compute infrastructure + Experience with AI / HPC advanced job schedulers, such as Slurm,… more
    NVIDIA (03/11/25)
  • Senior HPC AI Cluster…

    NVIDIA (Santa Clara, CA)
    …verification HPC / AI Infrastructure team. We are building supercomputers and HPC clusters based on groundbreaking technologies. We are looking for an ... outstanding architect for a senior HPC, be a key player to...be doing: + Designing, implementing and maintaining large scale HPC / AI clusters with monitoring, logging… more
    NVIDIA (01/27/25)
  • Senior Site Reliability Engineer…

    NVIDIA (Santa Clara, CA)
    …large scale automation solutions. You will also be maintaining and building deep learning AI - HPC GPU clusters at scale and supporting our researchers to ... diverse team today! As a member of the GPU AI / HPC Infrastructure team, you will provide leadership...the design and implementation of ground breaking GPU compute clusters that powers all AI research across… more
    NVIDIA (12/25/24)
  • Senior Observability Engineer, AI

    NVIDIA (Santa Clara, CA)
    … Observability Engineer to help architect and implement our distributed observability systems for AI and HPC clusters. We serve and collaborate directly with ... and research teams to deliver observability solutions that meet their needs in AI / HPC clusters. + Develop, test, and deploy data collectors, pipelines,… more
    NVIDIA (01/31/25)
  • Senior AI - HPC Storage…

    NVIDIA (Santa Clara, CA)
    …intelligence. Make the choice to join us today! As a member of the GPU AI / HPC Infrastructure team, you will provide leadership in the design and implementation ... and implementation of distributed storage services. + Design, implement an on-prem AI / HPC infrastructure supplemented with cloud computing to support the growing… more
    NVIDIA (02/05/25)
  • Senior Observability Architect, AI

    NVIDIA (Santa Clara, CA)
    …leader to define a vision and roadmap for distributed observability systems for large-scale AI and HPC clusters and workloads and guide implementation ... NVIDIA's Hardware Infrastructure organization is seeking a Senior or Principal Data and Observability Architect....visualization to spectacularly improve efficiency, performance, and productivity of AI and HPC workloads. You will lead… more
    NVIDIA (02/13/25)
  • Senior HPC and AI Networking…

    NVIDIA (Santa Clara, CA)
    …to hear from you! NVIDIA is seeking a Senior High Performance Computing (HPC) and AI Networking Performance Research and Analysis Engineer to join our ... In this exciting role, you will profile and analyze AI workloads on large GPUs and CPUs scale ...AI workloads on large GPUs and CPUs scale clusters for distributed Deep Learning LLM training focused on… more
    NVIDIA (03/11/25)
  • Senior Software Architect - Deep Learning…

    NVIDIA (Santa Clara, CA)
    …like NCCL, NVSHMEM, and UCX that are crucial for scaling Deep Learning and HPC. We're seeking a Senior Software Architect to help co-design next-gen data ... + Design and implement new communication technologies to accelerate AI and HPC workloads. + Explore innovative...+ Use simulation to explore performance of large GPU clusters (think scales of 100s of 1000s of GPUs)… more
    NVIDIA (02/22/25)
  • Senior Software Engineer - HPC

    NVIDIA (Santa Clara, CA)
    …doing: + Design highly available and scalable systems to meet the demands of our HPC clusters + Evaluate new and innovative technologies as the landscape evolves ... parallel computing. More recently, GPU deep learning ignited modern AI and enabled the next era of computing. NVIDIA...from the crowd: + Prior experience building solutions for HPC clusters based on Slurm or Kubernetes… more
    NVIDIA (03/06/25)
  • Sr. Software Development Engineer, HPC/ML…

    Amazon (Cupertino, CA)
    …on features for the largest clusters, with the largest customers, for the largest AI models. The org you would be joining is Annapurna Labs, an integral part of ... are seeking an experienced engineer to work on distributed AI/ML systems. This role involves working on collective operations...systems is valued, and experience with high-speed networking or HPC interconnects is valued highly. If you like solving… more
    Amazon (02/12/25)
  • Senior Software Developer, HPC

    NVIDIA (Santa Clara, CA)
    …fueled by great technology-and amazing people. Today, we're tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU ... Cluster Manager is used to power thousands of Linux clusters around the world, varying from a few nodes...a few nodes to several thousands of nodes. Bright clusters can run on-premises, completely in the cloud, or… more
    NVIDIA (01/14/25)
  • Senior Technical Marketing Engineer…

    NVIDIA (Santa Clara, CA)
    …team and see how you can make a lasting impact on the world. As a Senior Technical Marketing Engineer for AI Infrastructure, you will join a dedicated team that ... doing: + Evaluate and run multi-node jobs on large clusters to assess performance and developer experience in distributed...of experience. + Proficiency in Python and C++ for AI and HPC applications. + Experience using… more
    NVIDIA (01/29/25)
  • Senior System Software Engineer, NCCL…

    NVIDIA (Santa Clara, CA)
    …the crowd: + Experience conducting performance benchmarking and developing infrastructure on HPC clusters. Prior system administration experience, esp for large ... guide our key partners and customers with NCCL. Most DL/HPC applications run on large clusters with...Experience working with engineering or academic research community supporting HPC or AI + Practical experience with… more
    NVIDIA (01/21/25)
  • Senior Research Engineer, Foundation Model…

    NVIDIA (Santa Clara, CA)
    …C++ for efficient system development. + Strong experience with large-scale GPU clusters, HPC environments, and job scheduling/orchestration tools (e.g., SLURM, ... NVIDIA is searching for a senior or principal engineer who specializes in building...works on multimodal foundation models, large-scale robot learning, embodied AI, and physics simulation. Our past projects include Eureka… more
    NVIDIA (03/08/25)
  • Senior Research Engineer for Reinforcement…

    NVIDIA (Santa Clara, CA)
    …domain randomization, curriculum learning; + Strong experience with large-scale GPU clusters, HPC environments, and job scheduling/orchestration tools (e.g., ... NVIDIA is searching for a senior or principal engineer who specializes in large-scale...works on multimodal foundation models, large-scale robot learning, embodied AI, and physics simulation. Our past projects include Eureka… more
    NVIDIA (03/08/25)
  • Sr Staff Engineer, ML Infrastructure…

    LinkedIn (Mountain View, CA)
    …in LinkedIn's Sunnyvale, CA campus. About the Role We are seeking a Senior Staff Engineer to design, build, and maintain our large-scale GPU infrastructure for ... machine learning (ML) and AI workloads. In this role, you will be the...of experience designing and managing large-scale, distributed systems or HPC environments, with at least 3+ years focused on… more
    LinkedIn (03/04/25)