- NVIDIA (Santa Clara, CA)
- …working with distributed system software architecture + Basic understanding of HPC GPU clusters and Slurm + Basic understanding of machine learning concepts and ... experience for customers as well as engineers supporting the cluster. Much of our software development focuses...running and instrumenting distributed LLM training on a multi-GPU HPC cluster + Knowledge of LLM…
- NVIDIA (Santa Clara, CA)
- …hardware integration and bare-metal provisioning related functionality in our Linux-based cluster management software environment. NVIDIA's Bright Cluster ... next era of computing. An era in which our GPU acts as the brains of computers, robots, and...Work on adding features to our Ansible collections for Cluster Installation and Management. + Assist our support team…
- NVIDIA (Santa Clara, CA)
- … team with high standards! This software engineering role involves developing tools for GPU cluster users and admins. As a member of the software ... work with users from different departments, such as Architecture and Software teams. Our work brings the users intuitive, rich...+ Build debugging tools for commonly encountered problems in GPU clusters + Work with our users…
- NVIDIA (Santa Clara, CA)
- …for a deeply technical HPC cluster administrator to lead a diverse cluster of GPU-accelerated systems and provide architectural mentorship to product teams ... team, you will provide leadership in the design and implementation of groundbreaking GPU compute clusters that run demanding deep learning, high-performance…
- Lenovo (Morrisville, NC)
- Senior Solution Architect - AI cluster direction... cluster topology + Design and develop test software solutions for GPU cluster architecture ... our StoryHub (https://news.lenovo.com/). **Description and Requirements** We are hiring a **Senior Solution Architect** with a software background to prove…
- NVIDIA (Santa Clara, CA)
- Hardware Infrastructure is seeking a Senior Technical Program Manager to lead the strategy and execution of programs to support the bring-up, operations and ... automation of GPU infrastructure. The GPU infrastructure we build...a fast-paced and evolving landscape that requires a senior TPM leader to guide engineering roadmaps to be…
- NVIDIA (Santa Clara, CA)
- …scale up its AI infrastructure. We expect you to have significant software engineering experience with Kubernetes, including cluster operations, operator ... variety of AI workloads. This includes working on custom software related to scheduling GPU resources on...You will be harnessing multiple data streams, ranging from GPU hardware diagnostics to cluster and network…
- NVIDIA (Santa Clara, CA)
- We are looking for a highly experienced Senior AI Software Test Development Engineer in NVIDIA's Deep Learning SWQA team. The position is in NVIDIA Deep Learning ... to validate robustness and measure the performance of NVIDIA's Deep Learning software and GPU infrastructure for autonomous driving, healthcare, speech…
- NVIDIA (Santa Clara, CA)
- …to validate robustness and measure the performance of NVIDIA's Deep Learning software and GPU infrastructure for autonomous driving, healthcare, speech ... We are looking for a Software Test Development Engineer in NVIDIA's Deep Learning...improve test automation. + Experience in validating data center GPU-based infrastructure (multi-GPU, multi-node, cluster). +…
- NVIDIA (Santa Clara, CA)
- …and alerting. Additional responsibilities include: + Design and implement state-of-the-art GPU compute clusters. + Optimize cluster operations for maximum ... complex systems in the world. + Implement remediations across the software and hardware stack according to plan, while keeping...environments with operational experience of clusters with at least 5K GPUs. + Deep understanding of GPU computing…
- NVIDIA (Santa Clara, CA)
- … software and firmware stack for these systems. We are looking for a Senior Software Architect who has deep expertise in designing server platforms and has ... We are building innovative server systems for GPU-accelerated applications, such as deep learning. Data...customers. What you'll be doing: + You will lead software activities for NVIDIA's deep learning server platforms, from…
- NVIDIA (Santa Clara, CA)
- … developers for the development of cloud-related functionality in our Linux-based cluster software environment. NVIDIA's Base Command Manager is used to power ... next era of computing. An era in which our GPU acts as the brains of computers, robots, and...take pride in producing extremely clean code. + Our cluster management software is based on Linux.…
- NVIDIA (Champaign, IL)
- NVIDIA's products, hardware and software, are world leaders for performance and efficiency. We are continually innovating in creative ways to improve our ability to ... deliver outstanding solutions across a wide range of sectors. We are seeking System Software Engineers who are passionate about what they do and are committed to…
- University of Massachusetts Amherst (Amherst, MA)
- Senior Research Fellow - Research Software & Research Computing Facilitation ... (CDS) at the University of Massachusetts Amherst (UMass) is hiring a Senior Research Fellow in research software and research computing facilitation.…
- NVIDIA (Santa Clara, CA)
- NVIDIA is the world leader in GPU Computing. We are passionate about markets include gaming, automotive, vision, HPC, datacenters and networking in addition to our ... Computing Company', and NVIDIA GPUs are the brains powering Deep Learning software frameworks, analytics, data centers, and driving autonomous vehicles. We have some…
- Bloomberg (New York, NY)
- …runtimes. + Experience with Kubeflow/KServe, MLflow, SageMaker. + Experience working with GPU compute software and hardware. + Ability to identify and ... problems such as scalable model deployment, low-latency/high-throughput inference, GPU resource optimizations and autoscaling. + Automate operation and improve…
- NVIDIA (Santa Clara, CA)
- …Group is growing our team of AI-focused Infrastructure Engineers who run our internal cluster for accelerated AI and software development. As part of this team, ... you will help to manage a diverse cluster of GPU-accelerated systems. Your contributions will...in computing technology. Join our technically diverse team of GPU architects, software engineers and infrastructure experts…
- NVIDIA (Santa Clara, CA)
- …and models + Familiarity with InfiniBand with IBOP and RDMA + Background in Software Defined Networking and AI/HPC cluster networking + Familiarity with deep ... reinvented itself over two decades. Our invention of the GPU in 1999 sparked the growth of the PC...[AWS, Azure or GCP] + Experience with AI/HPC cluster job schedulers such as SLURM, LSF + In…
- NVIDIA (Santa Clara, CA)
- NVIDIA is looking for a dedicated and motivated senior build and continuous integration (CI/CD) engineer for its GenAI Frameworks (NeMo, ... Building upon modern DevOps tools, your work will enable GenAI framework software engineers and deep learning algorithm engineers to work efficiently with a…
- Cisco (Research Triangle Park, NC)
- …AI/ML workloads. * Experience with infrastructure monitoring and solving at both hardware (GPU, CPU, memory) and software (AI/ML models, applications) levels. * ... We are seeking a highly skilled and experienced Senior Engineer to join our team, focusing on the...multiple projects simultaneously. * Experience with NVIDIA NGC (NVIDIA GPU Cloud) and DGX OS software stack…