- Meta (Austin, TX)
- …fabric and host networking, communications lib and scheduling infrastructure. Required Skills: AI/HPC System Performance Engineer Responsibilities: Lead ... a loss-less fabric interconnect with minimal latency. To improve performance of these systems we constantly look...with teamwork and close collaboration Responsible for the overall performance of the communication system, including …
- Microsoft Corporation (Redmond, WA)
- …of high-performance physical networking systems that underpin large-scale AI and high-performance computing (HPC) environments. Network Deployment: ... Overview The HPC/AI (High performance Computing...scalable, fault-tolerant physical networks for distributed computing environments (e.g., AI/ML clusters, HPC systems) Proficiency…
- Johns Hopkins University (Baltimore, MD)
- …Deployment and Design Develop and refine deployment strategies for scientific software on HPC and AI systems. Design computational workflows, selecting ... Utilize CUDA, DNN, TensorRT, and Intel Compilers to enhance system performance. HPC Scientific Software...Ensure compliance with security and regulatory standards for all HPC and AI systems. In…
- Google (Atlanta, GA)
- …building tools, architecting and developing software for scalable, distributed systems, including data platform, AI/ML, and infrastructure. Experience ... products, and different customer segments/use cases of the emerging AI Compute tech stack. About the job The Google...of our customers and helping shape the future of HPC. As the Senior Manager in High Performance…
- Pfizer (Groton, CT)
- …and requires a hands-on approach to designing and delivering robust High Performance Computing (HPC) solutions supporting computational workloads across the ... you will design, implement, operate, and own robust and dependable infrastructure for HPC and ML/AI workloads in a cloud environment (AWS/GCP). Lead…
- Micron Technology, Inc. (Richardson, TX)
- …designing and optimizing High Bandwidth Memory (HBM) products for AI/ML, high-performance computing (HPC), and data-centric systems, collaborating across ... ever. Micron's Heterogeneous Integration Group (HIG) is shaping the future of AI and accelerated computing by developing sophisticated memory solutions! The team…
- Oracle (Indianapolis, IN)
- …possible. Responsibilities Lead architecture, system design, and implementation for high-performance RDMA solutions across OCI's AI/HPC platforms, ... If you thrive at the intersection of large-scale distributed systems, high-speed networking, and AI workloads, this...and performance tuning at scale. Familiarity with AI/HPC stacks and workloads: NCCL/RCCL/MPI, Slurm or…
- Oracle (Nashville, TN)
- …the forefront of building a cutting-edge, ultra-high-performance GPU platform designed to support AI/ML/HPC workloads. This is your chance to be part of the ... AI revolution, creating systems that allow customers...and diagnostic services. These are essential for running distributed AI/ML/HPC workloads across thousands of GPUs, leveraging…
- Oracle (Austin, TX)
- …the forefront of building a cutting-edge, ultra-high-performance GPU platform designed to support AI/ML/HPC workloads. This is your chance to be part of the ... AI revolution, creating systems that allow customers...and diagnostic services. These are essential for running distributed AI/ML/HPC workloads across thousands of GPUs, leveraging…
- Oracle (Salt Lake City, UT)
- …the forefront of building a cutting-edge, ultra-high-performance GPU platform designed to support AI/ML/HPC workloads. This is your chance to be part of the ... AI revolution, creating systems that allow customers...and diagnostic services. These are essential for running distributed AI/ML/HPC workloads across thousands of GPUs, leveraging…
- Microsoft Corporation (Mountain View, CA)
- …the boundaries of scale, performance, and deployment, creating frontier AI systems that power transformative experiences across Microsoft. The Multimodal ... data libraries (Pandas, NumPy, etc.) OR equivalent experience. Experience with large-scale AI systems - design and deployment of distributed architectures,…
- Oracle (Washington, DC)
- …the forefront of building a cutting-edge, ultra-high-performance GPU platform designed to support AI/ML/HPC workloads. This is your chance to be part of the ... AI revolution, working with systems that allow...scale from tens to thousands of GPUs without compromising performance. Our team is responsible for designing and developing…
- Bosch (Pittsburgh, PA)
- …or PyCharm) and experience with experiment tracking tools (MLFlow). Familiarity with high-performance computing (HPC) systems and job schedulers (Slurm, ... 125 years. The Research and Technology Center North America provides technologies and system solutions for various Bosch business fields, primarily in the field of…
- Oracle (Sacramento, CA)
- …metrics, logs, eBPF/perf, chaos/failure testing, and SLO-driven operations. Knowledge of AI/HPC workload patterns and their implications for storage, query ... Job Description OCI (Oracle Cloud) AI Infrastructure Innovation team is inventing the next...If you thrive at the intersection of large-scale distributed systems, database internals, and cloud platforms, this role offers…
- General Dynamics Information Technology (Falls Church, VA)
- …Family: IT Infrastructure and Operations Skills: High Performance Computing (HPC), High-Performance Computing (HPC) Systems, Scientific Research ... Experience working directly with researchers for support and troubleshooting Experience with HPC systems and tooling like SLURM, GPFS, Globus Experience with…
- TE Connectivity (Richmond, VA)
- …architectures to meet future performance, power, and density demands of AI, ML, and hyperscale workloads. Architect CPO/NPO systems across optical, ... Principal Optical System Architect (Remote) Posting Start Date: 7/8/25 At...optical transceivers, near/co-package optical transceivers, optical interconnects for advanced AI/HPC environment, compute, storage, and networking hardware…
- Meta (Menlo Park, CA)
- …following machine learning/deep learning domains: Distributed ML Training, GPU architecture, ML systems, AI infrastructure, high performance computing, ... large-scale GPU training and inference fleet through an observable, reliable and high-performance distributed AI/GPU communication stack. Currently, one of the…
- Intel (Santa Clara, CA)
- …ensuring tailored innovation for diverse needs across general-purpose compute, web services, HPC, and AI-accelerated systems. Our charter encompasses ... potential of announced and projected future platform products, including features, performance, etc. Prior experience with system configuration, testing and…
- Sandia National Laboratories (Albuquerque, NM)
- …independent project, thesis, or dissertation). Experience in developing software and AI systems for enterprise and national security applications. Demonstrated ... with distributed training frameworks (MPI, Horovod, Ray), hyperparameter tuning, and HPC systems. Hands-on experience with model optimization techniques…
- Broadcom (San Jose, CA)
- …you apply. Job Description: Ethernet NIC product portfolio is designed for high performance computing and networking applications including AI and ML. This is ... of the next generation of Ethernet NIC solutions for AI/ML and High performance computing applications. We...networking is an added advantage. Experience analyzing and tuning performance for a variety of HPC workloads.…