- NVIDIA (Santa Clara, CA)
- …working with distributed system software architecture + Basic understanding of HPC GPU cluster, Slurm + Basic understanding of machine learning concepts and ... experience for customer as well as engineers supporting the cluster. Much of our software development focuses...running and instrumenting distributed LLM training on a multi-GPU HPC cluster + Knowledge of LLM… more
- NVIDIA (Santa Clara, CA)
- … team with high standards! This software engineering role involves developing tools for GPU Cluster users and admins. As a member of the software ... work with users from different departments like Architecture teams, Software teams. Our work brings the users intuitive, rich...+ Build debugging tools for common encountered problems in GPU cluster + Work with our users… more
- NVIDIA (Santa Clara, CA)
- …hardware integration and bare-metal provisioning related functionality in our Linux-based cluster management software environment. NVIDIA's Bright Cluster ... next era of computing. An era in which our GPU acts as the brains of computers, robots, and...Work on adding features to our Ansible collections for Cluster Installation and Management. + Assist our support team… more
- NVIDIA (Santa Clara, CA)
- …for a deeply technical HPC cluster administrator to lead a diverse cluster of GPU-accelerated systems and provide architectural mentorship to product teams ... team, you will provide leadership in the design and implementation of groundbreaking GPU compute cluster that runs demanding deep learning, high performance… more
- NVIDIA (Santa Clara, CA)
- Hardware Infrastructure is seeking a Senior Technical Program Manager to lead the strategy and execution of programs to support the bringup, operations and ... automation of GPU infrastructure. The GPU infrastructure we build...a fast paced and evolving landscape that requires a senior TPM leader to guide engineering roadmaps to be… more
- NVIDIA (Santa Clara, CA)
- …scale up its AI Infrastructure. We expect you to have significant software engineering experience with Kubernetes including cluster operations, operator ... variety of AI workloads. This includes working on custom software related to scheduling GPU resources on...You will be harnessing multiple data streams, ranging from GPU hardware diagnostics to cluster and network… more
- NVIDIA (Santa Clara, CA)
- We are looking for a highly experienced AI Senior Software Test development engineer in NVIDIA's Deep Learning SWQA team. The position is in NVIDIA Deep Learning ... to validate robustness and measure the performance of NVIDIA's Deep Learning software and GPU Infrastructure for autonomous driving, healthcare, speech… more
- NVIDIA (Santa Clara, CA)
- …to validate robustness and measure the performance of NVIDIA's Deep Learning software and GPU Infrastructure for autonomous driving, healthcare, speech ... We are looking for a Software Test development engineer in NVIDIA's Deep Learning...improve test automation. + Experience in validating Data Center GPU based infrastructure (multi-GPU, multi-node, cluster). +… more
- NVIDIA (Santa Clara, CA)
- …and alerting. Additional responsibilities include: + Design and implement state-of-the-art GPU compute clusters. + Optimize cluster operations for maximum ... complex systems in the world. + Implement remediations across software and hardware stack according to plan, while keeping...environments with operational experience of at least 5K GPUs cluster. + Deep understanding of GPU computing… more
- NVIDIA (Santa Clara, CA)
- … software and firmware stack for these systems. We are looking for a Senior Software Architect who has deep expertise in designing server platforms and has ... We are building innovative server systems for GPU accelerated applications, such as Deep Learning. Data...customers. What you'll be doing: + You will lead software activities for NVIDIA's deep learning server platforms, from… more
- NVIDIA (Santa Clara, CA)
- … developers for the development of cloud related functionality in our Linux-based cluster software environment. NVIDIA's Base Command Manager is used to power ... next era of computing. An era in which our GPU acts as the brains of computers, robots, and...take pride in producing extremely clean code. + Our cluster management software is based on Linux.… more
- NVIDIA (Santa Clara, CA)
- NVIDIA is the world leader in GPU Computing. We are passionate about markets include gaming, automotive, vision, HPC, datacenters and networking in addition to our ... Computing Company', and NVIDIA GPUs are the brains powering Deep Learning software frameworks, analytics, data centers, and driving autonomous vehicles. We have some… more
- NVIDIA (Santa Clara, CA)
- …Group is growing our team of AI focused Infrastructure Engineers who run our internal cluster for accelerated AI and software development. As part of this team, ... you will help to manage a diverse cluster of GPU-accelerated systems. Your contributions will...in computing technology. Join our technically diverse team of GPU architects, software engineers and infrastructure experts… more
- NVIDIA (Santa Clara, CA)
- …and models + Familiarity with InfiniBand with IBOP and RDMA + Background with Software Defined Networking and AI/HPC cluster networking + Familiarity with deep ... reinvented itself over two decades. Our invention of the GPU in 1999 sparked the growth of the PC...[ AWS, Azure or GCP] + Experience with AI/HPC cluster job schedulers such as SLURM, LSF + In… more
- NVIDIA (Santa Clara, CA)
- NVIDIA is looking for a dedicated and motivated senior build and continuous integration (CI/CD) engineer for its GenAI Frameworks (NeMo, ... Building upon modern DevOps tools, your work will enable GenAI framework software engineers and deep learning algorithm engineers to work efficiently with a… more
- NVIDIA (Santa Clara, CA)
- …end-to-end Machine Learning and Deep Learning solutions, using NVIDIA's compute, networking, and software stacks. Don't think this is a high-level slideshow job - we ... on-premises and cloud based. + 12+ years of proven experience with cluster management and related tools, including Docker Containers, Slurm, Kubernetes, and Ansible.… more
- NVIDIA (Santa Clara, CA)
- …container runtimes, drivers+containers, and containerization of various high performance computing cluster software elements within a variety of environments. + ... next era of computing. An era in which our GPU acts as the brains of computers, robots, and...impact on the world. We are looking for a Senior Linux Software Engineer to join the… more
- NVIDIA (Santa Clara, CA)
- …efforts + Experience working with hardware clusters, distributed system, networking, GPU interconnects (PCIe, NVLink), node and cluster interconnect (InfiniBand) ... new AI-powered application is built. We are seeking a senior engineer to design and build factory automation for...all the way through deployment in heterogeneous hardware and software environments. You will influence and drive technical advances… more
- NVIDIA (Santa Clara, CA)
- …and Deep Learning Software Stack + Good knowledge of container and cluster technologies like Slurm, Kubernetes, Jenkins, GitLab CI, and Zabbix + Experience with ... the next wave of NVIDIA's highest performing deep learning software stacks. Your role spans multiple products such as...GPU computing systems + Track record of identifying useful… more
- NVIDIA (Santa Clara, CA)
- …+ Background in computer science, machine learning, deep learning, open-source software, infrastructure technologies, and GPU technology. + Prior experience ... NVIDIA's Hardware Infrastructure organization is seeking a Senior or Principal Data and Observability Architect....teams to define a vision and roadmap for AI/HPC cluster observability. + Architect and lead teams to d… more