  • Senior GPU Cluster

    NVIDIA (Santa Clara, CA)
    …working with distributed system software architecture + Basic understanding of HPC GPU cluster, Slurm + Basic understanding of machine learning concepts and ... experience for customer as well as engineers supporting the cluster. Much of our software development focuses...running and instrumenting distributed LLM training on a multi-GPU HPC cluster + Knowledge of LLM… more
    NVIDIA (08/13/24)
  • Senior GPU Cluster Tools…

    NVIDIA (Santa Clara, CA)
    … team with high standards! This software engineering role involves developing tools for GPU Cluster users and admins. As a member of the software ... work with users from different departments like Architecture teams, Software teams. Our work brings the users intuitive, rich...+ Build debugging tools for commonly encountered problems in GPU cluster + Work with our users… more
    NVIDIA (10/01/24)
  • Senior High Performance Computing…

    NVIDIA (Santa Clara, CA)
    …for a deeply technical HPC cluster administrator to lead a diverse cluster of GPU-accelerated systems and provide architectural mentorship to product teams ... team, you will provide leadership in the design and implementation of groundbreaking GPU compute cluster that runs demanding deep learning, high performance… more
    NVIDIA (09/24/24)
  • Senior Software Test Development…

    NVIDIA (Santa Clara, CA)
    We are looking for a highly experienced AI Senior Software Test development engineer in NVIDIA's Deep Learning SWQA team. The position is in NVIDIA Deep Learning ... to validate robustness and measure the performance of NVIDIA's Deep Learning software and GPU Infrastructure for autonomous driving, healthcare, speech… more
    NVIDIA (09/06/24)
  • Senior Software Test Development…

    NVIDIA (Santa Clara, CA)
    …to validate robustness and measure the performance of NVIDIA's Deep Learning software and GPU Infrastructure for autonomous driving, healthcare, speech ... We are looking for a Software Test development engineer in NVIDIA's Deep Learning...improve test automation. + Experience in validating Data Center GPU based infrastructure (multi-GPUs, multi-nodes, cluster). +… more
    NVIDIA (09/05/24)
  • Senior Site Reliability Engineer - AI…

    NVIDIA (Santa Clara, CA)
    …scenarios while working with internal & external partners + Building automation for AI-HPC GPU Cluster bring up and scaled up operation + Write and review ... logging and alerting. Additional responsibilities include: + Design and implement state-of-the-art GPU compute clusters + Optimize cluster operations for maximum… more
    NVIDIA (09/25/24)
  • Senior Software Architect - Data…

    NVIDIA (Santa Clara, CA)
    software and firmware stack for these systems. We are looking for a Senior Software Architect who has deep expertise in designing server platforms and has ... We are building innovative server systems for GPU accelerated applications, such as Deep Learning. Data...customers. What you'll be doing: + You will lead software activities for NVIDIA's deep learning server platforms, from… more
    NVIDIA (07/16/24)
  • Senior Software QA Test Development…

    NVIDIA (Santa Clara, CA)
    NVIDIA is the world leader in GPU Computing. We are passionate about markets including gaming, automotive, vision, HPC, datacenters and networking in addition to our ... Computing Company', and NVIDIA GPUs are the brains powering Deep Learning software frameworks, analytics, data centers, and driving autonomous vehicles. We have some… more
    NVIDIA (09/05/24)
  • Senior Cloud Services Software

    NVIDIA (Santa Clara, CA)
    …seeking a distributed software engineer to join our team! As a Senior engineer, you'll be instrumental in developing and optimizing AI infrastructure services to ... resiliency for DGX Cloud. Your expertise in cloud services software architecture that drives the full resilience stack that...that allows the framework to be integrated with the cluster scheduler visibly to the users + Strong understanding… more
    NVIDIA (09/18/24)
  • Senior AI-HPC Storage Engineer

    NVIDIA (Santa Clara, CA)
    …and models + Familiarity with InfiniBand with IBOP and RDMA + Background with Software Defined Networking and AI/HPC cluster networking + Familiarity with deep ... reinvented itself over two decades. Our invention of the GPU in 1999 sparked the growth of the PC...[AWS, Azure or GCP] + Experience with AI/HPC cluster job schedulers such as SLURM, LSF + In… more
    NVIDIA (09/18/24)
  • Senior Firmware Engineer - Data Center…

    NVIDIA (Santa Clara, CA)
    NVIDIA's invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing. More ... recently, GPU deep learning ignited modern deep learning - the...and implemented in the right way with each firmware and software module + Collaborate with other leads to design… more
    NVIDIA (10/03/24)
  • Senior Solutions Architect, NPN

    NVIDIA (Santa Clara, CA)
    …end-to-end Machine Learning and Deep Learning solutions, using NVIDIA's compute, networking, and software stacks. Don't think this is a high-level slideshow job - we ... on-premises and cloud based. + 12+ years of proven experience with cluster management and related tools, including Docker Containers, Slurm, Kubernetes, and Ansible.… more
    NVIDIA (09/18/24)
  • Senior Linux Systems Engineer

    NVIDIA (Santa Clara, CA)
    …container runtimes, drivers+containers, and containerization of various high performance computing cluster software elements within a variety of environments. + ... next era of computing. An era in which our GPU acts as the brains of computers, robots, and...impact on the world. We are looking for a Senior Linux Software Engineer to join the… more
    NVIDIA (08/24/24)
  • Systems Software Engineer - NIM Factory…

    NVIDIA (Santa Clara, CA)
    …efforts + Experience working with hardware clusters, distributed system, networking, GPU interconnects (PCIe, NVLink), node and cluster interconnect (InfiniBand) ... new AI-powered application is built. We are seeking a senior engineer to design and build factory automation for...all the way through deployment in heterogeneous hardware and software environments. You will influence and drive technical advances… more
    NVIDIA (10/03/24)
  • Data Center Test Development Architect

    NVIDIA (Santa Clara, CA)
    We are seeking a highly skilled and hard-working Senior Test Developer Architect to join our multifaceted Enterprise Software QA team. This role offers an ... field, or equivalent experience. + 4+ years of hands-on experience in cluster management and related tools, including Docker Containers, Slurm, Kubernetes, and… more
    NVIDIA (09/25/24)
  • Research Associate - Neutrino

    SLAC National Accelerator Laboratory (Menlo Park, CA)
    …computer science departments. Computing resources available for this work include local GPU clusters with NVIDIA GPUs (28 A100, 280 RTX 2080Ti), current allocation ... at the NERSC Perlmutter (A100 cluster), and other potential HPC centers where we apply...considered. + Knowledge in statistics, data analysis, algorithms and software development will be required. Strong background in AI/ML,… more
    SLAC National Accelerator Laboratory (08/26/24)