- Meta (Menlo Park, CA)
- …fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI / HPC Systems Performance Engineer Responsibilities: 1. ... **Summary:** Meta's AI Training and Inference Infrastructure is growing exponentially...a loss-less fabric interconnect with minimal latency. To improve performance of these systems we constantly look… more
- Meta (Menlo Park, CA)
- …fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI / HPC Systems Performance Engineer Responsibilities: 1. ... **Summary:** Meta's AI Training and Inference Infrastructure is growing exponentially...workloads that expects a loss-less fabric interconnect. To improve performance of these systems we constantly look… more
- Meta (Menlo Park, CA)
- …hardware and software components, co-design 15. Experience in developing or debugging AI / HPC systems, performance optimizations, including familiarity ... or supporting production hardware at scale 9. Experience in deploying and productionizing AI / HPC systems and/or related components at scale 10. Experience in… more
- Meta (Menlo Park, CA)
- …Meta and externally. **Required Skills:** Research Scientist, Systems ML and HPC - SW/HW Co-Design Responsibilities: 1. Apply High-Performance Computing ( ... Performance team is dedicated to maximizing training performance of Generative AI and recommendation models...HPC) algorithms and techniques to optimize large-scale AI workloads 2. Analyze, benchmark, and optimize large-scale workloads… more
- Meta (Menlo Park, CA)
- …our research, visit https://ai.facebook.com. **Required Skills:** Research Scientist Intern, Systems ML and HPC - SW/HW Co-Design Responsibilities: 1. ... team's mission is to explore, develop and help productize high-performance software and hardware technologies for AI ...infrastructure. Meta is seeking Research Scientist Interns to join our AI & Systems Co-Design Training team to… more
- Meta (Menlo Park, CA)
- … AI product introductions and AI operations initiatives supporting Meta's growing AI / HPC infrastructure for our Family of Apps. They will be responsible ... ground-breaking solutions & technologies. The ideal candidate will have experience in AI / HPC product development and operations, strong understanding of the… more
- Meta (Menlo Park, CA)
- …following machine learning/deep learning domains: Distributed ML Training, GPU architecture, ML systems, AI infrastructure, high performance computing, ... large-scale GPU training and inference fleet through an observable, reliable and high-performance distributed AI/GPU communication stack. Currently, one of the… more
- Cisco (San Jose, CA)
- …team engaged in the design, development and execution of tests to qualify network performance for AI/ML capability. In this role you'll have opportunity to: * ... to build the next generation infrastructure to meet the needs of AI/ML workloads and continuously increasing internet users and application. We are uniquely… more
- General Motors (Mountain View, CA)
- …This role will involve working across various areas, from enhancing underlying HPC infrastructure to optimizing Kubernetes and Kubeflow setups, as well as refining ... teams to understand requirements and implement solutions. + Troubleshoot complex HPC infrastructure issues and implement effective resolutions with partner team. +… more
- Meta (Menlo Park, CA)
- …and suited for the hardware infrastructure. Meta is seeking Research Scientist Interns to join our AI & Systems Co-Design HPC & Inference team to drive the ... definition of our next-generation AI Systems Inference and Training architectures. The... performance - Model, SW, System, Accelerator - Performance modeling and simulations - HPC Software Optimizations -… more
- Meta (Menlo Park, CA)
- …computer science or related field. 8. Research or industry experience in compilers, ML systems, ML accelerators, HPC, GPU performance, and similar. 9. ... Our work is open-source, cutting-edge, and industry-leading. **Required Skills:** Software Engineer, Systems ML - PyTorch Compiler / ML Framework / Performance … more
- Meta (Menlo Park, CA)
- …years of experience with one subset of the following AI systems: Accelerator (GPU/ASIC), Kernel development, Performance optimization (e.g., NVIDIA, AMD, ... developing and productizing high-performance software and hardware technologies for AI at datacenter scale. Hardware Systems Engineer in RTP work closely… more
- Microsoft Corporation (Mountain View, CA)
- …and external, and operate at the intersection of AI algorithmic innovation, purpose-built AI hardware, systems, and software. We are a team of highly capable ... The Artificial Intelligence (AI) Frameworks team at Microsoft develops AI...+ Speeding up/reducing complexity of key components/pipelines to improve performance and/or efficiency of our systems +… more
- Cisco (San Jose, CA)
- …an expert in hardware optimization and performance tuning, especially for servers and AI systems. What you'll do: As a Technical Marketing Engineer (TME) ... and performance tuning, especially for servers and AI systems * Experience with Linux to...products and solutions * Understand high-performance computing (HPC), GPU workloads, and other AI infrastructure… more
- Meta (Menlo Park, CA)
- …in Python, C++ or CUDA programming. 10. Research or industry experience in ML systems, ML accelerators, HPC, GPU performance, and similar. 11. Currently ... PyTorch. 14. Expert knowledge in GPU performance and writing high-performance communication libraries and fault tolerance distributed systems. 15. Proven… more