- Meta (Seattle, WA)
- …Communications Library), which enables multi-GPU and multi-node data communication through HPC-style collectives. NCCL has been integrated into PyTorch and is on ... (e.g., Large-Scale GenAI/LLM training) from the trainer down to the inter-GPU and network communication layer. We are seeking engineers to work on the…
- Meta (Bellevue, WA)
- …Meta's global data center networks. Our work covers the entire network lifecycle, including hardware development, capacity planning, distributed and centralized ... control systems, modeling/provisioning/automation, monitoring/troubleshooting/analytics, and simulation/design/failure analysis. We are actively seeking Software…
- Amazon (Seattle, WA)
- …cloud offerings that enable high performance and scalability in AI/ML and HPC workloads. AWS Infrastructure Services owns the design, planning, delivery, and ... to help. You'll join a diverse team of software, hardware, and network engineers, supply chain specialists, security experts, operations managers, and other vital…
- Amazon (Seattle, WA)
- …SRE (Site Reliability Engineering), or Resilience Engineering - 5+ years of SysDE (Systems Development Engineer) or equivalent experience - 5+ years of server ... that enable high performance and scalability in AI/ML and HPC workloads. You are intrigued by the continuous release ... have tremendous interest in cloud scale and are curious how systems and software decisions impact the user. You insist…