- NVIDIA (Santa Clara, CA)
- …make a lasting impact on the world. We are seeking a highly skilled and experienced HPC Cluster Engineer to design, deploy, and operate GPU Compute Clusters ... + Provide leadership and strategic mentorship on the management of large-scale HPC systems including the deployment of compute, networking, and storage. + Develop…
- NVIDIA (Santa Clara, CA)
- …+ Provide leadership and strategic mentorship on the management of large-scale HPC systems including the deployment of compute, networking, and storage. + Develop ... and operating large-scale compute infrastructure. + Experience with AI/HPC job schedulers and orchestrators, such as Slurm, K8s or LSF. Applied experience with AI/HPC workflows that use MPI and NCCL. + Proficient…
- NVIDIA (Santa Clara, CA)
- …Make the choice to join us today! As a member of the GPU AI/HPC Infrastructure team, you will provide leadership in the design and implementation of ground ... + Provide leadership and strategic guidance on the management of large-scale HPC systems including the deployment of compute, networking, and storage. + Develop…
- Texas A&M University System (College Station, TX)
- Job Title: Senior HPC Engineer; Agency: Texas A&M University; Department: Technology Services - IT Enterprise Operations; Proposed Minimum Salary: Commensurate; Job ... members' faculty and staff providing cutting-edge research and supercomputing needs. As a Senior High Performance Computing (HPC) Engineer, you will provide…
- NVIDIA (Santa Clara, CA)
- …and planning abilities. Experience working with High Performance Computing (HPC), GPUs, and high-performance networking (RDMA, InfiniBand, RoCE) is strongly ... will be harnessing multiple data streams, ranging from GPU hardware diagnostics to cluster and network telemetry. + Work on software that manages NVLink topology…
- NVIDIA (Santa Clara, CA)
- Join the NVIDIA Deep Learning Frameworks Infrastructure team as a Senior Systems Engineer focusing on High-Performance AI & Networking Applications, committed to ... for internal teams and external partners on standard methodologies in HPC networking deployments. + Share insights on improving networking strategies for…
- NVIDIA (Santa Clara, CA)
- …artificial intelligence. Join our team at NVIDIA as a Senior Site Reliability Engineer focused on HPC storage and play a crucial role in designing, ... software + Experience with RDMA (InfiniBand or RoCE) fabrics + Background with HPC cluster management tools such as Slurm, PBS, LSF, etc. + Passionate and…
- Oracle (Des Moines, IA)
- …Description: The AI2NE Org strives to be global leaders in the RDMA cluster networking domain and enable seamless, accelerated High-Performance Compute (HPC), ... of state-of-the-art RDMA clusters tailored specifically for AI, ML, HPC workloads. We strive to be the go-to experts in RDMA cluster architecture, leveraging our deep understanding of the unique…
- NVIDIA (Santa Clara, CA)
- We are now looking for a Senior Software Engineer for AI Resiliency. At NVIDIA, we are pushing the boundaries of what's possible in AI. We are currently seeking ... a Senior Software Engineer to lead the development...GPUs. Your expertise will be crucial in driving down cluster downtime towards zero, ensuring that our AI systems…
- NVIDIA (Santa Clara, CA)
- NVIDIA is searching for a senior or principal engineer who specializes in building cutting-edge infrastructure for large-scale foundation model training in the ... to support multi-modal foundation models for robotics. + Optimize GPU and cluster utilization for efficient model training and fine-tuning on massive datasets. +…
- NVIDIA (Santa Clara, CA)
- NVIDIA is seeking a Senior Firmware Engineer to join our CSP Engagements team, focusing on system software for Datacenter products such as GB200. This role ... see: + Deep expertise in data center server architectures, HPC systems, and hardware-software co-design. + Deep expertise in...out from the crowd: + Knowledge of cloud and cluster level deployment and management systems. + Experience with…
- NVIDIA (Santa Clara, CA)
- …for AVs capable of running on thousands of GPUs; + Optimize GPU and cluster utilization for efficient model training and fine-tuning on massive datasets; + Implement ... curriculum learning. + Deep understanding of GPU acceleration, CUDA programming, and cluster management tools like Kubernetes. + Strong programming skills in Python…
- NVIDIA (Santa Clara, CA)
- …power some of the world's most advanced computing workloads. We are seeking a Software Engineer to join our MARS team at NVIDIA. In this role, you will help design, ... experience developing and operating large-scale distributed systems, infrastructure platforms, or HPC environments. + Strong programming skills in C++, Python, or…
- NVIDIA (Santa Clara, CA)
- …GPU Computing. We are passionate about markets including gaming, automotive, vision, HPC, datacenters and networking in addition to our traditional OEM business. ... Linux experience, reliability testing with various telemetries, scale-out cluster, test plan development, track record in developing AI...are rapidly growing. If you're a creative and autonomous engineer with a real passion for technology, we want…
- NVIDIA (CO)
- …server architecture. In-depth understanding of the different deployment models for GPUs (e.g., HPC, AI cluster, single- or multi-GPU servers). + Experience in Data ... NVIDIA is searching for a highly motivated, creative engineer with experience in system software security to join the Data Center Systems Software team. In this…
- Mount Sinai Health System (New York, NY)
- …and implements backup policies. + Assist in the management and maintenance of HPC cluster and data center work, including troubleshooting for resolving system ... data warehouse team and a research data services team. The **_Senior Systems Administrator/Engineer_**, as a member of the Scientific Computing and Data group, is…
- Honeywell (Phoenix, AZ)
- You will report directly to the Senior Engineering Manager and you'll work at our Plymouth, MN location on a Hybrid work schedule. (Other allowed Honeywell Aerospace ... **KEY RESPONSIBILITIES** + Work with IC Design EDA Applications, High Performance Compute cluster staff, and IC Design engineers to craft and maintain optimized EDA…
- NVIDIA (NY)
- …other Engineering fields (or equivalent experience) + 12+ years of experience as an ML/Software Engineer with a proven track record in writing code in Python, C++ + ... models at scale on public cloud computing and/or on-prem HPC clusters in production Ways To Stand Out From...of MLOps technologies such as containers, data center deployments, cluster management software, etc. + Experience working with enterprise…
- Stanford University (Stanford, CA)
- …full-stack applications + Optimizing Slurm scripts for effective utilization of cluster resources + Automated web scraping + Crowdsourcing pipelines In addition ... resources. **Solutions Development:** * Formulate innovative technical strategies and engineer them to completion to achieve unique research objectives, using…