  • Senior GPU Cluster Software…

    NVIDIA (Santa Clara, CA)
    …working with distributed system software architecture + Basic understanding of HPC GPU cluster, Slurm + Basic understanding of Machine learning concepts ... alerting capabilities with promised uptime + Build internal profiling tools for real world ML/DL applications running on HPC...running and instrumenting distributed LLM training on a multi-GPU HPC cluster + Knowledge of LLM… more
    NVIDIA (08/13/24)
  • Senior GPU Cluster

    NVIDIA (Santa Clara, CA)
    …join a multifaceted software team with high standards! This software engineering role involves developing tools for GPU Cluster users and admins. As a member ... debugging tools for common encountered problems in GPU cluster + Work with our users...the crowd: + Proven experience in GPU cluster scale continuous profiling & analysis tools/platforms… more
    NVIDIA (10/01/24)
  • Senior High Performance Computing…

    NVIDIA (Santa Clara, CA)
    …for a deeply technical HPC cluster administrator to lead a diverse cluster of GPU-accelerated systems and provide architectural mentorship to product teams ... team, you will provide leadership in the design and implementation of groundbreaking GPU compute cluster that runs demanding deep learning, high performance… more
    NVIDIA (09/24/24)
  • Senior Site Reliability Engineer - Internal…

    NVIDIA (Santa Clara, CA)
    …+ Finding and fixing problems before they occur + Building automation for AI-HPC GPU Cluster bring up and scaled up operation + Improving Operational Excellence ... reinvented itself over two decades. Our invention of the GPU in 1999 sparked the growth of the PC...including workflows that uses MPI + Working knowledge of cluster configuration management tools such as BCM,… more
    NVIDIA (09/25/24)
  • Senior Infrastructure Software Engineer

    NVIDIA (Santa Clara, CA)
    …+ Work with various teams at NVIDIA to incorporate and influence the latest tools for operating GPU clusters + Collaborate with users and system administrators ... We are now looking for a Senior Infrastructure Software Engineer! NVIDIA's Deep Learning Architecture...software stack for our next generation test and development cluster, the core infrastructure that provides a foundation for… more
    NVIDIA (09/23/24)
  • Senior DevOps Engineer - DGX Cloud

    NVIDIA (Santa Clara, CA)
    …of GPU assets. You will be harnessing multiple data streams, ranging from GPU hardware diagnostics to cluster and network telemetry. + Working with teams ... science of computer graphics. With the invention of the GPU - the engine of modern visual computing -...large-scale production systems. Experience with the aforementioned DevOps/SRE principles, tools and techniques. + You possess a BS in… more
    NVIDIA (08/29/24)
  • Senior Software Test Development Engineer…

    NVIDIA (Santa Clara, CA)
    …CUDA libraries for Deep Learning. + Experience in validating Data Center GPU based infrastructure (multi-GPUs, multi-nodes, cluster) + Background in validating ... We are looking for a highly experienced AI Senior Software Test development engineer in NVIDIA's Deep Learning SWQA team. The position is in NVIDIA Deep Learning and… more
    NVIDIA (09/06/24)
  • Senior Software Test Development Engineer…

    NVIDIA (Santa Clara, CA)
    …infrastructure to improve test automation. + Experience in validating Data Center GPU based infrastructure (multi-GPUs, multi-nodes, cluster). + Experience in ... measure the performance of NVIDIA's Deep Learning software and GPU Infrastructure for autonomous driving, healthcare, speech recognition, natural...VectorCAST, Bullseye, Gcov, or Coverity tools. The base salary range is 164,000 USD -… more
    NVIDIA (09/05/24)
  • Senior ASIC Physical Design Engineer,…

    NVIDIA (Santa Clara, CA)
    NVIDIA has continuously reinvented itself over two decades. Our invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer ... graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI - the next...high-frequency and low-power CPUs, GPUs, SoCs at block level, cluster level, and/or full chip level, with a focus… more
    NVIDIA (09/25/24)
  • Senior ASIC Timing Engineer

    NVIDIA (Santa Clara, CA)
    NVIDIA has continuously reinvented itself over two decades. Our invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer ... graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI - the next...Nvidia's GPUs, CPUs, DPUs and SoCs at block level, cluster level, and/or full chip level. + Work with… more
    NVIDIA (09/20/24)
  • Senior System Reliability Engineer

    NVIDIA (Santa Clara, CA)
    NVIDIA has continuously reinvented itself over two decades. Our invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer ... graphics, and revolutionized parallel computing - with the GPU acting as the brains of computers, robots, and self-driving cars that can perceive and understand the… more
    NVIDIA (08/31/24)
  • Senior Solutions Architect, NPN

    NVIDIA (Santa Clara, CA)
    …both on-premises and cloud based. + 12+ years of proven experience with cluster management and related tools, including Docker Containers, Slurm, Kubernetes, and ... job - we are the voice of experience, using Kubernetes, SaaS, infrastructure-as-code tools, network debugging, and problem solving skills to help build modern AI… more
    NVIDIA (09/18/24)
  • Senior Site Reliability Engineer - Storage

    NVIDIA (Santa Clara, CA)
    …etc. Familiarity with newer and emerging monitoring products. + Prior Experience with HPC cluster management tools such as Slurm, PBS, LSF, etc. + Experience ... groundbreaking developments in Artificial Intelligence, High-Performance Computing, and Visualization. The GPU, our invention, serves as the visual cortex of modern… more
    NVIDIA (08/30/24)
  • Senior Cloud Services Software Engineer

    NVIDIA (Santa Clara, CA)
    …and resiliency of ML workloads, as well as developing scalable AI infrastructure tools and services. Our objective is to deliver a stable, scalable environment for ... distributed software engineer to join our team! As a Senior engineer, you'll be instrumental in developing and optimizing...that allows the framework to be integrated with the cluster scheduler visibly to the users + Strong understanding… more
    NVIDIA (09/18/24)
  • Senior Software Development Engineer…

    NVIDIA (Santa Clara, CA)
    …release efforts, gather automation requirements, and drive the development of automation tools and infrastructure. + Ensure the delivery of high-quality software by ... focusing on code coverage and maintaining automation tools and infrastructure. + Contribute to the automation of...team. + Be responsible for testing cloud services, new GPU/system bring-up, Security Products and CUDA releases. + Enhance… more
    NVIDIA (10/01/24)
  • Data Center Test Development Architect

    NVIDIA (Santa Clara, CA)
    …field, or equivalent experience. + 4+ years of hands-on experience in cluster management and related tools, including Docker Containers, Slurm, Kubernetes, ... We are seeking a highly skilled and hard-working Senior Test Developer Architect to join our multifaceted Enterprise Software QA team. This role offers an… more
    NVIDIA (09/25/24)