  • Senior AI - HPC

    NVIDIA (Santa Clara, CA)
    …of experience crafting and operating large scale compute infrastructure. + Experience with AI/HPC job schedulers and orchestrators, such as Slurm, K8s or LSF. ... Applied experience with AI/HPC workflows that use MPI and NCCL. + Proficient in using Linux including CentOS/RHEL and/or Ubuntu Linux distributions. A solid…
    NVIDIA (10/30/25)
  • Senior AI and ML HPC

    NVIDIA (Santa Clara, CA)
    …intelligence. Make the choice to join us today! As a member of the GPU AI/HPC Infrastructure team, you will provide leadership in the design and implementation ... years of experience designing and operating large scale compute infrastructure + Experience with AI/HPC advanced job schedulers, such as Slurm, K8s, PBS, RTDA or…
    NVIDIA (10/19/25)
  • Senior HPC Engineer

    Texas A&M University System (College Station, TX)
    Job Title: Senior HPC Engineer. Agency: Texas A&M University. Department: Technology Services - IT Enterprise Operations. Proposed Minimum Salary: Commensurate. Job ... sensitive requiring US Citizenship. Opportunities to Contribute: * Manage large-scale HPC cluster operations, including OS upgrades, firmware patching, and…
    Texas A&M University System (10/03/25)
  • Senior Solutions Architect, Cluster

    NVIDIA (Santa Clara, CA)
    …and reference material Ways to stand out from the crowd: + Experience leading large-scale AI Factory or HPC cluster bring-ups or builds + Hands-on experience ... world's most groundbreaking and innovative accelerated computing platforms for AI and HPC. Because of our work,...world's fastest supercomputers. We are seeking a highly motivated Senior Solutions Architect to join the Cluster…
    NVIDIA (12/04/25)
  • Senior GPU and HPC Infrastructure…

    NVIDIA (Santa Clara, CA)
    NVIDIA is hiring engineers to scale up its AI Infrastructure. We expect you to have a strong programming background, knowledge of datacenter hardware, operations, ... and planning abilities. Experience working with High Performance Computing (HPC), GPUs, and high-performance networking (RDMA, InfiniBand, RoCE) are strongly…
    NVIDIA (10/09/25)
  • HPC Sr. Systems Administrator (IT@JH…

    Johns Hopkins University (Baltimore, MD)
    …the direction of senior engineers. The position collaborates closely with senior HPC staff to deliver stable, efficient, and well-documented systems that ... scheduler queues. + Contribute to automation efforts and continuous improvement of cluster operations under guidance from senior engineers. + Support compliance…
    Johns Hopkins University (12/09/25)
  • Senior Systems Engineer - High-Performance…

    NVIDIA (Santa Clara, CA)
    Join the NVIDIA Deep Learning Frameworks Infrastructure team as a Senior Systems Engineer focusing on High-Performance AI & Networking Applications, committed to ... equivalent experience. + 8+ years of proven experience in AI/HPC Infrastructure. + Familiarity with AI...NCCL, NIXL, NVSHMEM, UCX. + Experience developing or maintaining cluster management and monitoring tools Ex: Ansible for infrastructure…
    NVIDIA (11/11/25)
  • Senior Software Engineer, AI

    NVIDIA (Santa Clara, CA)
    We are now looking for a Senior Software Engineer for AI Resiliency. At NVIDIA, we are pushing the boundaries of what's possible in AI. We are currently ... Senior Software Engineer to lead the development of AI software resiliency for the most powerful AI...GPUs. Your expertise will be crucial in driving down cluster downtime towards zero, ensuring that our AI…
    NVIDIA (10/15/25)
  • Senior Manager, Network Development

    Oracle (Lansing, MI)
    …force, driving the development and design of state-of-the-art RDMA clusters tailored specifically for AI, ML, HPC workloads. We strive to be the go-to experts in ... leveraging our deep understanding of the unique demands of AI/ML and HPC applications. By staying at...& operating the network stack required to run distributed AI workloads across a cluster spanning thousands…
    Oracle (11/25/25)
  • Senior/Principal - Artificial Intelligence…

    Sandia National Laboratories (Albuquerque, NM)
    …Integrate exascale HPC systems with elastic cloud resources and specialized AI accelerator clusters (on-prem and in-cloud) + Deploy ruggedized edge servers and ... models for risk-shared governance + Manage enterprise licensing, token agreements, and software audits for AI and HPC frameworks + Manage the full lifecycle of the…
    Sandia National Laboratories (11/14/25)
  • Senior Site Reliability Engineer - Storage

    NVIDIA (Santa Clara, CA)
    …supporting software + Experience with RDMA (InfiniBand or RoCE) fabrics + Background with HPC cluster management tools such as Slurm, PBS, LSF, etc. + Passionate ... artificial intelligence. Join our team at NVIDIA as a Senior Site reliability engineer focused on HPC...To Stand Out Of The Crowd: + Knowledge of HPC and AI solution technologies from CPU's…
    NVIDIA (11/19/25)
  • Senior Solutions Architect, NVIDIA Cloud…

    NVIDIA (Santa Clara, CA)
    …with NVIDIA hardware (such as GPUs, ETH/IB networking components, storage, etc.) within extensive AI and HPC cluster settings. + Practical knowledge of ... expertise in data center design, development and execution for AI and HPC. + Efficient time management...AI benchmarking, and more. + Practical involvement in cluster administration and coordination (SLURM, K8s, etc.). We have…
    NVIDIA (12/02/25)
  • Senior Solutions Architect, NPN

    NVIDIA (Durham, NC)
    …a hardworking Solution Architect with experience in designing, building, and maintaining large scale HPC and AI hybrid computing solutions to join our team at ... (or equivalent experience). + Established track record working with AI and HPC clusters, both on-premises and...based. + 4 plus years of proven experience with cluster management and related tools, including Docker Containers, Slurm,…
    NVIDIA (10/19/25)
  • Senior Network Development Engineer

    Oracle (Des Moines, IA)
    …force, driving the development and design of state-of-the-art RDMA clusters tailored specifically for AI, ML, HPC workloads. We strive to be the go-to experts in ... Org strives to be global leaders in the RDMA cluster networking domain and enable seamless, accelerated High-Performance Compute...leveraging our deep understanding of the unique demands of AI/ML and HPC applications. By staying at…
    Oracle (11/25/25)
  • Senior Research Engineer, Foundation Model…

    NVIDIA (Santa Clara, CA)
    NVIDIA is searching for a senior or principal engineer who specializes in building cutting-edge infrastructure for large-scale foundation model training in the ... works on multimodal foundation models, large-scale robot learning, embodied AI, and physics simulation. Our past projects include Eureka…
    NVIDIA (12/05/25)
  • Senior Manager, CSP Engagements - System…

    NVIDIA (Santa Clara, CA)
    …and telemetry frameworks. + Familiarity with GPU computing (CUDA), large-scale AI/HPC workloads, NVLink, Grace, and cluster-level deployment/management. ... NVIDIA is seeking a Senior Manager to lead our System Software SWAT...with at least 5 years in data center or HPC software environments. + Bachelor's degree or equivalent experience.…
    NVIDIA (11/04/25)
  • Senior Research Engineer - Autonomous…

    NVIDIA (Santa Clara, CA)
    …training deep learning models at scale, and a good mathematical foundation to analyze new AI algorithms. We focus on AI models for autonomous driving such as ... agent behavior models, end-to-end AV architectures, AI safety, closed-loop training approaches, and AV foundation models...running on thousands of GPUs; + Optimize GPU and cluster utilization for efficient model training and fine-tuning on…
    NVIDIA (10/08/25)
  • Senior Software Engineer - Storage

    NVIDIA (Santa Clara, CA)
    …computing, and artificial intelligence. Our technology powers everything from generative AI to autonomous systems, and we continue to shape the future ... through innovation and collaboration. Within this mission, our team, Managed AI Research Superclusters (MARS), builds and scales the infrastructure, platforms, and…
    NVIDIA (12/02/25)
  • Senior Principal System Solution Architect

    Microsoft Corporation (Redmond, WA)
    …the hardware development lifecycle. + Proficient understanding of state of the art of AI/HPC physical infrastructure. + Ability to analyze solutions from a full ... that will manage and optimize the Cloud infrastructure. We are looking for a Senior Principal System Solution Architect to join the System Design team. The System…
    Microsoft Corporation (12/06/25)
  • Senior Solutions Architect, Financial…

    NVIDIA (NY)
    …Capital Markets and Exchange firms to accelerate High-Performance Computing and AI workloads across various use cases. We're seeking an inquisitive, hard-working, ... models at scale on public cloud computing and/or on-prem HPC clusters in production Ways To Stand Out From...of MLOps technologies such as containers, data center deployments, cluster management software, etc. + Experience working with enterprise…
    NVIDIA (10/15/25)