- NVIDIA (Santa Clara, CA)
- …working with distributed system software architecture + Basic understanding of HPC GPU cluster, Slurm + Basic understanding of machine learning concepts ... alerting capabilities with promised uptime + Build internal profiling tools for real-world ML/DL applications running on HPC...running and instrumenting distributed LLM training on a multi-GPU HPC cluster + Knowledge of LLM…
- NVIDIA (Santa Clara, CA)
- …join a multifaceted software team with high standards! This software engineering role involves developing tools for GPU Cluster users and admins. As a member ... debugging tools for commonly encountered problems in GPU cluster + Work with our users...the crowd: + Proven experience in GPU cluster scale continuous profiling & analysis tools/platforms…
- NVIDIA (Santa Clara, CA)
- …for a deeply technical HPC cluster administrator to lead a diverse cluster of GPU-accelerated systems and provide architectural mentorship to product teams ... team, you will provide leadership in the design and implementation of groundbreaking GPU compute cluster that runs demanding deep learning, high performance…
- NVIDIA (Santa Clara, CA)
- …scenarios while working with internal & external partners + Building automation for AI-HPC GPU Cluster bring up and scaled up operation + Write and review ... logging and alerting. Additional responsibilities include: + Design and implement state-of-the-art GPU compute clusters + Optimize cluster operations for maximum…
- NVIDIA (Santa Clara, CA)
- …CUDA libraries for Deep Learning. + Experience in validating Data Center GPU based infrastructure (multi-GPUs, multi-nodes, cluster) + Background in validating ... We are looking for a highly experienced AI Senior Software Test development engineer in NVIDIA's Deep Learning SWQA team. The position is in NVIDIA Deep Learning and…
- NVIDIA (Santa Clara, CA)
- …infrastructure to improve test automation. + Experience in validating Data Center GPU based infrastructure (multi-GPUs, multi-nodes, cluster). + Experience in ... measure the performance of NVIDIA's Deep Learning software and GPU Infrastructure for autonomous driving, healthcare, speech recognition, natural...VectorCAST, Bullseye, Gcov, or Coverity tools. The base salary range is 164,000 USD -…
- NVIDIA (Santa Clara, CA)
- NVIDIA has continuously reinvented itself over two decades. Our invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer ... graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI - the next...high-frequency and low-power CPUs, GPUs, SoCs at block level, cluster level, and/or full chip level, with a focus…
- NVIDIA (Santa Clara, CA)
- NVIDIA has continuously reinvented itself over two decades. Our invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer ... graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI - the next...NVIDIA's GPUs, CPUs, DPUs and SoCs at block level, cluster level, and/or full chip level. + Work with…
- NVIDIA (Santa Clara, CA)
- NVIDIA has continuously reinvented itself over two decades. Our invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer ... graphics, and revolutionized parallel computing - with the GPU acting as the brains of computers, robots, and self-driving cars that can perceive and understand the…
- NVIDIA (Santa Clara, CA)
- NVIDIA's invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing. More ... recently, GPU deep learning ignited modern deep learning - the...a data center viewpoint. + Work closely with cluster bring up team and resolve issues at Speed…
- NVIDIA (Santa Clara, CA)
- …both on-premises and cloud-based. + 12+ years of proven experience with cluster management and related tools, including Docker Containers, Slurm, Kubernetes, and ... job - we are the voice of experience, using Kubernetes, SaaS, infrastructure-as-code tools, network debugging, and problem solving skills to help build modern AI…
- NVIDIA (Santa Clara, CA)
- …etc. Familiarity with newer and emerging monitoring products. + Prior experience with HPC cluster management tools such as Slurm, PBS, LSF, etc. + Experience ... groundbreaking developments in Artificial Intelligence, High-Performance Computing, and Visualization. The GPU, our invention, serves as the visual cortex of modern…
- NVIDIA (Santa Clara, CA)
- …and resiliency of ML workloads, as well as developing scalable AI infrastructure tools and services. Our objective is to deliver a stable, scalable environment for ... distributed software engineer to join our team! As a Senior engineer, you'll be instrumental in developing and optimizing...that allows the framework to be integrated with the cluster scheduler visibly to the users + Strong understanding…
- NVIDIA (Santa Clara, CA)
- …field, or equivalent experience. + 4+ years of hands-on experience in cluster management and related tools, including Docker Containers, Slurm, Kubernetes, ... We are seeking a highly skilled and hard-working Senior Test Developer Architect to join our multifaceted Enterprise Software QA team. This role offers an…