- NVIDIA (Santa Clara, CA)
- …make a lasting impact on the world. We are seeking a highly skilled and experienced HPC Cluster Engineer to design, deploy, and operate GPU Compute Clusters ... + Provide leadership and strategic mentorship on the management of large-scale HPC systems including the deployment of compute, networking, and storage. + Develop… more
- NVIDIA (Santa Clara, CA)
- …+ Provide leadership and strategic mentorship on the management of large-scale HPC systems including the deployment of compute, networking, and storage. + Develop ... and operating large scale compute infrastructure. + Experience with AI/ HPC job schedulers and orchestrators, such as Slurm, K8s...such as Slurm, K8s or LSF. Applied experience with AI/ HPC workflows that use MPI and NCCL. + Proficient… more
- NVIDIA (Santa Clara, CA)
- …Make the choice to join us today! As a member of the GPU AI/ HPC Infrastructure team, you will provide leadership in the design and implementation of ground ... + Provide leadership and strategic guidance on the management of large-scale HPC systems including the deployment of compute, networking, and storage. + Develop… more
- NVIDIA (Westford, MA)
- …that will be integrated with diverse quantum computing platforms. The lead HPC Engineer will offer technical mentorship, system administration, optimizing ... As the HPC Operations Engineer for the new...As the HPC Operations Engineer for the new Accelerated Quantum Center (https://www.nvidia.com/en-us/solutions/quantum-computing/accelerated-quantum-center/)… more
- Texas A&M University System (College Station, TX)
- Job Title Senior HPC Engineer Agency Texas A&M University Department Technology Services - IT Enterprise Operations Proposed Minimum Salary Commensurate Job ... members' faculty and staff providing cutting-edge research and super computing needs. As a Senior High Performance Computing Engineer ( HPC ), you will provide… more
- NVIDIA (Santa Clara, CA)
- …and planning abilities. Experience working with High Performance Computing ( HPC ), GPUs, and high-performance networking (RDMA, Infiniband, RoCE) are strongly ... will be harnessing multiple data streams, ranging from GPU hardware diagnostics to cluster and network telemetry. + Work on software that manages NVLINK topography… more
- NVIDIA (Santa Clara, CA)
- Join the NVIDIA Deep Learning Frameworks Infrastructure team as a Senior Systems Engineer focusing on High-Performance AI & Networking Applications, committed to ... for internal teams and external partners on standard methodologies in HPC networking deployments. + Share insights on improving networking strategies for… more
- NVIDIA (Santa Clara, CA)
- …artificial intelligence. Join our team at NVIDIA as a Senior Site reliability engineer focused on HPC storage and play a crucial role in designing, ... software + Experience with RDMA (InfiniBand or RoCE) fabrics + Background with HPC cluster management tools such as Slurm, PBS, LSF, etc. + Passionate and… more
- Oracle (Boise, ID)
- …Description** The AI2NE Org strives to be global leaders in the RDMA cluster networking domain and enable seamless, accelerated High-Performance Compute ( HPC ), ... of state-of-the-art RDMA clusters tailored specifically for AI, ML, HPC workloads. We strive to be the go-to experts...We strive to be the go-to experts in RDMA cluster architecture, leveraging our deep understanding of the unique… more
- NVIDIA (Santa Clara, CA)
- We are now looking for a Senior Software Engineer for AI Resiliency. At NVIDIA, we are pushing the boundaries of what's possible in AI. We are currently seeking ... a Senior Software Engineer to lead the development...GPUs. Your expertise will be crucial in driving down cluster downtime towards zero, ensuring that our AI systems… more
- NVIDIA (Santa Clara, CA)
- NVIDIA is searching for a senior or principal engineer who specializes in building cutting-edge infrastructure for large-scale foundation model training in the ... to support multi-modal foundation models for robotics. + Optimize GPU and cluster utilization for efficient model training and fine-tuning on massive datasets. +… more
- NVIDIA (Santa Clara, CA)
- …The data center platforms like GB200 NVL72 by NVIDIA are redefining AI, HPC , and cloud computing. To accommodate leading workloads globally, our diagnostic systems ... hardware technologies. We're in search of a visionary technical leader to engineer and propel innovation in diagnostics for NVIDIA's partner ecosystem. This role… more
- NVIDIA (Santa Clara, CA)
- NVIDIA is seeking a Senior Firmware Engineer to join our CSP Engagements team, focusing on system software for Datacenter products such as GB200. This role ... see: + Deep expertise in data center server architectures, HPC systems, and hardware-software co-design. + Deep expertise in...out from the crowd: + Knowledge of cloud and cluster level deployment and management systems. + Experience with… more
- NVIDIA (Santa Clara, CA)
- …for AVs capable of running on thousands of GPUs; + Optimize GPU and cluster utilization for efficient model training and fine-tuning on massive datasets; + Implement ... curriculum learning. + Deep understanding of GPU acceleration, CUDA programming, and cluster management tools like Kubernetes. + Strong programming skills in Python… more
- NVIDIA (Santa Clara, CA)
- …power some of the world's most advanced computing workloads. We are seeking a Software Engineer to join our MARS team at NVIDIA. In this role, you will help design, ... experience developing and operating large-scale distributed systems, infrastructure platforms, or HPC environments. + Strong programming skills in C++, Python, or… more
- NVIDIA (Santa Clara, CA)
- …GPU Computing. We are passionate about markets include gaming, automotive, vision, HPC , datacenters and networking in addition to our traditional OEM business. ... Linux experience, reliability testing with various telemetries, scale out cluster , test plan development, track record in developing AI...are rapidly growing. If you're a creative and autonomous engineer with a real passion for technology, we want… more
- NVIDIA (CO)
- …server architecture. In-depth understanding of the different deployment models for GPUs (eg, HPC , AI cluster , single- or multi-GPU servers). + Experience in Data ... NVIDIA is searching for a highly motivated, creative engineer with experience in system software security to join the Data Center Systems Software team. In this… more
- Mount Sinai Health System (New York, NY)
- …and implements backup policies. + Assist in the management and maintenance of HPC cluster and data center work, including troubleshooting for resolving system ... data warehouse team and a research data services team. The **_Senior Systems Administrator/ Engineer ,_** as a member of the Scientific Computing and Data group, is… more
- Honeywell (Phoenix, AZ)
- You will report directly to the Senior Engineering Manager and you'll work at our Plymouth, MN location on a Hybrid work schedule. (Other allowed Honeywell Aerospace ... **KEY RESPONSIBILITIES** + Work with IC Design EDA Applications, High Performance Compute cluster staff, and IC Design engineers to craft and maintain optimized EDA… more
- NVIDIA (NY)
- …other Engineering fields (or equivalent experience) + 12+ years experience as an ML/Software Engineer with a proven track record in writing code in Python, C++ + ... models at scale on public cloud computing and/or on-prem HPC clusters in production Ways To Stand Out From...of MLOps technologies such as containers, data center deployments, cluster management software, etc. + Experience working with enterprise… more