  • AI and ML HPC Cluster

    NVIDIA (Santa Clara, CA)
    …that power some of the world's most advanced computing workloads. NVIDIA is looking for an AI / ML HPC Cluster Engineer to join our MARS team. You ... including performance analysis and optimizations + Analyze and optimize cluster efficiency, job fragmentation, and GPU waste to meet...ahead of emerging technologies and effective approaches in the HPC and AI / ML infrastructure fields.… more
    NVIDIA (01/03/26)
  • Senior AI and ML HPC

    NVIDIA (Santa Clara, CA)
    …for continual learning and staying ahead of emerging technologies and effective approaches in the HPC and AI / ML infrastructure fields. Ways to stand out from ... join us today! As a member of the GPU AI / HPC Infrastructure team, you will provide leadership...including developing scalable automation solutions + Build and maintain AI and ML heterogeneous clusters on-premises and… more
    NVIDIA (10/19/25)
  • PCIe QA Engineer

    Broadcom (San Jose, CA)
    …with L2/L3 protocols especially RoCE (RDMA over Converged Ethernet) protocol & use cases in AI / ML, HPC cluster is a plus + Having Knowledge of deep ... PCI-E-based designs, and hands-on experience in Python programming. Good understanding of AI / ML clusters, Deep learning models, and GPU Micro benchmarks is a… more
    Broadcom (11/06/25)
  • Senior AI - HPC Cluster

    NVIDIA (Santa Clara, CA)
    …for continual learning and staying ahead of new technologies and effective approaches in the HPC and AI / ML infrastructure fields. Ways to stand out from the ... experience crafting and operating large scale compute infrastructure. + Experience with AI / HPC job schedulers and orchestrators, such as Slurm, K8s or LSF.… more
    NVIDIA (10/30/25)
  • Technical Program Manager, AI Network Infra

    Meta (Menlo Park, CA)
    …stack, Network Hardware (NICs, Optics & Switches) 20. Experience Developing & Delivering AI Cluster Solutions for training & inference use cases Preferred ... AI product introductions and AI operations initiatives supporting Meta's growing AI / HPC infrastructure for our Family of Apps. They will be responsible… more
    Meta (12/20/25)
  • Product Manager, AI Platform Kernels…

    NVIDIA (Santa Clara, CA)
    NVIDIA's AI Software Platforms team seeks a technical product manager to accelerate next-generation inference deployments through innovative libraries, communication ... on the NVIDIA Platform, and push the boundaries of what is possible with their AI deployments! For Inference, we are the champions inside NVIDIA for AI… more
    NVIDIA (12/10/25)
  • Sr. Research Analytics Scientist

    Stanford University (Stanford, CA)
    …submissions and basic debugging techniques. Working knowledge of at least one mainstream ML / AI framework and how to execute efficiently in an advanced computing ... full-stack applications + Optimizing Slurm scripts for effective utilization of cluster resources + Automated web scraping + Crowdsourcing pipelines In addition… more
    Stanford University (10/16/25)
  • Software Developer 4

    Oracle (Santa Clara, CA)
    …building a cutting-edge, ultra-high-performance GPU cluster based Data Centers designed to support AI / ML / HPC workloads. This is your chance to be part of ... the AI revolution, creating systems that allow customers to scale...thousands of GPUs without compromising performance. We are the AI Infrastructure Delivery Engineering org at OCI. The OCI… more
    Oracle (11/25/25)
  • Software Developer 4

    Oracle (Santa Clara, CA)
    …(OCI) Cluster Networking team is building an ultra-high-performance network to support AI / ML / HPC workloads. Join us to design systems that scale from ... MPI and GPU frameworks like CUDA and ROCm. + 2+ years of experience with ML training frameworks like PyTorch, TensorFlow + Proficient at programming in any two out… more
    Oracle (12/16/25)