  • AI and ML HPC Cluster

    NVIDIA (Santa Clara, CA)
    …that power some of the world's most advanced computing workloads. NVIDIA is looking for an AI / ML HPC Cluster Engineer to join our MARS team. You ... including performance analysis and optimizations + Analyze and optimize cluster efficiency, job fragmentation, and GPU waste to meet...ahead of emerging technologies and effective approaches in the HPC and AI / ML infrastructure fields.… more
    NVIDIA (01/03/26)
  • Senior AI and ML HPC

    NVIDIA (Santa Clara, CA)
    …for continual learning and staying ahead of emerging technologies and effective approaches in the HPC and AI / ML infrastructure fields. Ways to stand out from ... join us today! As a member of the GPU AI / HPC Infrastructure team, you will provide leadership...including developing scalable automation solutions + Build and maintain AI and ML heterogeneous clusters on-premises and… more
    NVIDIA (10/19/25)
  • PCIe QA Engineer

    Broadcom (San Jose, CA)
    …with L2/L3 protocols especially RoCE (RDMA over Converged Ethernet) protocol & use cases in AI / ML, HPC cluster is a plus + Having Knowledge of deep ... PCI-E-based designs, and hands-on experience in Python programming. Good understanding of AI / ML clusters, Deep learning models, and GPU Micro benchmarks is a… more
    Broadcom (11/06/25)
  • Senior AI - HPC Cluster

    NVIDIA (Santa Clara, CA)
    …for continual learning and staying ahead of new technologies and effective approaches in the HPC and AI / ML infrastructure fields. Ways to stand out from the ... experience crafting and operating large scale compute infrastructure. + Experience with AI / HPC job schedulers and orchestrators, such as Slurm, K8s or LSF.… more
    NVIDIA (10/30/25)
  • Performance Benchmarking Engineer - Cluster

    Oracle (Seattle, WA)
    …be the go-to experts on RDMA cluster architecture and its relationship to AI / ML / HPC performance. We apply our deep understanding of these unique workload ... so our customers can push the cutting edge in AI / ML and other areas of HPC...+ Troubleshoot performance problems on RDMA clusters and perform cluster performance validation, including on very novel and not… more
    Oracle (11/25/25)
  • HPC Sr. Scientific Software Engineer (IT@JH…

    Johns Hopkins University (Baltimore, MD)
    …and Design** + Develop and refine deployment strategies for scientific software on HPC and AI systems. + Design computational workflows, selecting optimal ... _Performance Optimization_ + Analyze and optimize the performance of AI models and HPC applications, focusing on...findings and sharing best practices. + Integrate and support AI / ML frameworks, scientific libraries, and workflow engines… more
    Johns Hopkins University (11/21/25)
  • Senior HPC Systems Engineer

    Massachusetts Institute of Technology (Cambridge, MA)
    …and optimizing HPC clusters, storage systems, and networking for AI / ML workloads. Join a collaborative, fast-paced team delivering critical infrastructure ... scripting in Python or Bash; and container orchestration tools like Docker and Kubernetes; and experience in cloud-based HPC or AI / ML workloads. 12/3/2025 more
    Massachusetts Institute of Technology (12/04/25)
  • Technical Program Manager, AI Network Infra

    Meta (Menlo Park, CA)
    …stack, Network Hardware (NICs, Optics & Switches) 20. Experience Developing & Delivering AI Cluster Solutions for training & inference use cases **Preferred ... AI product introductions and AI operations initiatives supporting Meta's growing AI / HPC infrastructure for our Family of Apps. They will be responsible… more
    Meta (12/20/25)
  • Product Manager, AI Platform Kernels…

    NVIDIA (Santa Clara, CA)
    NVIDIA's AI Software Platforms team seeks a technical product manager to accelerate next-generation inference deployments through innovative libraries, communication ... on the NVIDIA Platform, and push the boundaries of what is possible with their AI deployments! For Inference, we are the champions inside NVIDIA for AI more
    NVIDIA (12/10/25)
  • High Performance Computing Engineer - Mid-level

    General Dynamics Information Technology (Chantilly, VA)
    …* Provide day to day systems administration duties for Nvidia GPUs, Commodity Cluster Systems and Cray HPC environments * Perform system monitoring, software ... operate across 50 countries worldwide, offering leading capabilities in digital modernization, AI / ML , Cloud, Cyber and application development. Together with our… more
    General Dynamics Information Technology (12/29/25)
  • Solutions Architect - NVIDIA Cloud Partners

    NVIDIA (Santa Clara, CA)
    …with NVIDIA hardware (such as GPUs, ETH/IB networking components, storage, etc.) within extensive AI and HPC cluster settings. + Practical knowledge of ... bridge the gap between design and deployment of large-scale AI and HPC GPU infrastructure. Do you...to be part of the team that brings GenAI, AI, ML, etc. hardware and software technologies… more
    NVIDIA (12/16/25)
  • Senior Network Development Engineer

    Oracle (Annapolis, MD)
    …force, driving the development and design of state-of-the-art RDMA clusters tailored specifically for AI, ML, HPC workloads. We strive to be the go-to ... Org strives to be global leaders in the RDMA cluster networking domain and enable seamless, accelerated High-Performance Compute...leveraging our deep understanding of the unique demands of AI / ML and HPC applications. By… more
    Oracle (12/13/25)
  • Network Development Engineer

    Oracle (Richmond, VA)
    …force, driving the development and design of state-of-the-art RDMA clusters tailored specifically for AI, ML, HPC workloads. We strive to be the go-to ... Org strives to be global leaders in the RDMA cluster networking domain and enable seamless, accelerated High-Performance Compute...leveraging our deep understanding of the unique demands of AI / ML and HPC applications. By… more
    Oracle (11/25/25)
  • Senior Solutions Architect, Financial Services…

    NVIDIA (NY)
    …Capital Markets and Exchange firms to accelerate High-Performance Computing and AI workloads across various use cases. We're seeking an inquisitive, hard-working, ... and next-generation GPU architectures. + Work directly with client ML researchers and developers/engineers on business-impacting workflows, projects, and issues… more
    NVIDIA (10/15/25)
  • Senior Software Engineer - Storage

    NVIDIA (Santa Clara, CA)
    …and tools that enable researchers and engineers to develop the next generation of AI / ML systems. By joining us, you'll help design solutions that power some ... of GPUs and petabytes of storage in multi-region clusters. + Collaborate with AI / ML research teams to understand their requirements and translate them into… more
    NVIDIA (12/02/25)
  • Sr. Research Analytics Scientist

    Stanford University (Stanford, CA)
    …submissions and basic debugging techniques. Working knowledge of at least one mainstream ML / AI framework and how to execute efficiently in an advanced computing ... full-stack applications + Optimizing Slurm scripts for effective utilization of cluster resources + Automated web scraping + Crowdsourcing pipelines In addition… more
    Stanford University (10/16/25)
  • Postdoctoral Fellow-Msh-32910-026

    Mount Sinai Health System (New York, NY)
    …AIRMS (AI-ready Mount Sinai Integrated Data and Analytics Platform), the Minerva HPC cluster, and eHive, a digital platform for wearable and real-world data ... Proficiency in Python and PyTorch. + Demonstrated publication record in ML / AI or computational health. + Strong communication and collaboration skills.… more
    Mount Sinai Health System (01/03/26)
  • Software Developer 4

    Oracle (Santa Clara, CA)
    …building a cutting-edge, ultra-high-performance GPU cluster based Data Centers designed to support AI / ML / HPC workloads. This is your chance to be part of ... the AI revolution, creating systems that allow customers to scale...thousands of GPUs without compromising performance. We are the AI Infrastructure Delivery Engineering org at OCI. The OCI… more
    Oracle (11/25/25)
  • Software Developer 4

    Oracle (Santa Clara, CA)
    …(OCI) Cluster Networking team is building an ultra-high-performance network to support AI / ML / HPC workloads. Join us to design systems that scale from ... MPI and GPU frameworks like CUDA and ROCm. + 2+ years of experience with ML training frameworks like PyTorch, TensorFlow + Proficient at programming in any two out… more
    Oracle (12/16/25)