- Meta (Menlo Park, CA)
- …our network engineering teams is for you! **Required Skills:** Network Engineer, HPC Systems Network Strategy Responsibilities: 1. Design, ... you will be responsible for conceiving, developing, and deploying software, hardware and network systems and tools that improve reliability and efficiency in our…
- Meta (Menlo Park, CA)
- …fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI/HPC Systems Performance Engineer Responsibilities: 1. Active ... daily basis. We need to build and evolve our network infrastructure that connects myriads of training accelerators like ... a loss-less fabric interconnect. To improve performance of these systems we constantly look for opportunities across stack: …
- Meta (Menlo Park, CA)
- …fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI/HPC Systems Performance Engineer Responsibilities: 1. Lead ... deal with on a daily basis. We need to build and evolve our network infrastructure that connects myriads of training accelerators like GPUs together. In addition, we…
- Meta (Menlo Park, CA)
- **Summary:** Meta is seeking an experienced Production Systems Engineer to join our Release to Production (RTP) team. Our servers and data centers are the ... and life cycle of servers in production. **Required Skills:** Production Systems Engineer, Sustaining Responsibilities: 1. Develop robust, industry leading…
- Meta (Menlo Park, CA)
- …Communications Library), which enables multi-GPU and multi-node data communication through HPC-style collectives. NCCL has been integrated into PyTorch and is on ... (e.g., Large-Scale GenAI/LLM training) from the trainer down to the inter-GPU and network communication layer. And we are seeking engineers to work on the…
- Meta (Menlo Park, CA)
- …Meta's global data center networks. Our work covers the entire network lifecycle, including hardware development, capacity planning, distributed and centralized ... control systems, modeling/provisioning/automation, monitoring/troubleshooting/analytics, and simulation/design/failure analysis. We are actively seeking Software…
- General Motors (Mountain View, CA)
- …This role will involve working across various areas, from enhancing underlying HPC infrastructure to optimizing Kubernetes and Kubeflow setups, as well as refining ... teams to understand requirements and implement solutions. + Troubleshoot complex HPC infrastructure issues and implement effective resolutions with partner team. +…
- Cisco (San Jose, CA)
- …servers and AI systems. What you'll do: As a Technical Marketing Engineer (TME) you will collaborate with engineering teams on product development and ... us as a highly motivated and driven Technical Marketing Engineer to define, validate, and drive compute & AI ... Microsoft Hyper-V, and KVM. You have knowledge of storage systems, including SAN, NAS, and NVMe performance, and experience…
- Cisco (San Jose, CA)
- …if you consider yourself a technologist at heart and with: * Experience with Network Operating Systems and in System and Software Qualification * Diligent with ... and traffic generators (commercial & open-source) * Exposure to network operating systems, preferably SONiC * Knowledge ... VXLAN, Segment Routing and/or MPLS * Exposure to RDMA, HPC networks * Knowledge of RoCE and InfiniBand…