- Meta (Menlo Park, CA)
- …libraries and scheduling infrastructure. **Required Skills:** Sr. Technical Lead Manager - AI / HPC Systems Performance Responsibilities: 1. Support ... requirements of large-scale training and inference workloads. To improve performance of these systems we constantly look...on monitoring, benchmarking and looking for opportunities to improve performance of AI Training and Inference. 2.…
- Meta (Menlo Park, CA)
- …fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI / HPC Systems Performance Engineer Responsibilities: 1. ... **Summary:** Meta's AI Training and Inference Infrastructure is growing exponentially...workloads that expect a loss-less fabric interconnect. To improve performance of these systems we constantly look…
- Meta (Menlo Park, CA)
- …testing with focus on automation. 22. Experience in developing or debugging AI / HPC systems, performance optimizations, including familiarity with ... and/or similar languages. **Preferred Qualifications:** 16. Proficiency in High-Performance Computing (HPC) or AI system…
- Meta (Menlo Park, CA)
- …hardware and software components, co-design 15. Experience in developing or debugging AI / HPC systems, performance optimizations, including familiarity ... or supporting production hardware at scale 9. Experience in deploying and productionizing AI / HPC systems and/or related components at scale 10. Experience in…
- NVIDIA (Santa Clara, CA)
- …to work effectively with diverse teams and individuals. + Experience analyzing and tuning performance for a variety of AI / HPC workloads. + Passion for ... GPU compute clusters that run demanding deep learning, high performance computing, and computationally intensive workloads. We seek a...storage systems like Lustre and GPFS for AI / HPC workloads + Familiarity with deep learning…
- NVIDIA (Santa Clara, CA)
- …designing and operating large scale storage infrastructure. + Experience analyzing and tuning performance for a variety of AI / HPC workloads. + Experience ... join us today! As a member of the GPU AI / HPC Infrastructure team, you will provide leadership...solutions to enable runs of demanding deep learning, high performance computing, and computationally intensive workloads. We seek an…
- NVIDIA (Santa Clara, CA)
- …looking for a technical leader to define a vision and roadmap for distributed observability systems for large-scale AI and HPC clusters and workloads and ... and visualization to spectacularly improve efficiency, performance, and productivity of AI and HPC workloads. You will lead technical teams to develop,…
- NVIDIA (Santa Clara, CA)
- …seeking a Senior Observability Engineer to help architect and implement our distributed observability systems for AI and HPC clusters. We serve and ... be working with a team of dedicated engineers on systems for data collection, aggregation, enrichment, storage, retrieval, and...spectacularly improve efficiency, performance, and productivity of AI and HPC workloads. You will develop,…
- Meta (Menlo Park, CA)
- …of RDMA workloads that expect a loss-less fabric interconnect. To enhance the performance of these systems, we continuously seek opportunities for improvement ... host networking, communication libraries, and scheduling infrastructure. **Required Skills:** AI / HPC Network Engineer Responsibilities: 1. Design, develop, test…
- Meta (Menlo Park, CA)
- …Performance Compute organization, you will define and develop the compute, storage, and AI systems that are deployed across our global data center fleet and ... and requirements needed for future hardware platforms **Required Skills:** Product Manager, AI / HPC Responsibilities: 1. Establish a shared vision and strategy…
- Meta (Menlo Park, CA)
- …requirements of RDMA workloads that expect a loss-less fabric interconnect. To improve performance of these systems we constantly look for opportunities across ... fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI / HPC Network Engineer Responsibilities: 1. Design, develop, test and…
- Meta (Menlo Park, CA)
- …Meta and externally. **Required Skills:** Research Scientist, Systems ML and HPC - SW/HW Co-Design Responsibilities: 1. Apply High-Performance Computing ( ... Performance team is dedicated to maximizing training performance of Generative AI and recommendation models...HPC) algorithms and techniques to optimize large-scale AI workloads 2. Analyze, benchmark, and optimize large-scale workloads…
- NVIDIA (Santa Clara, CA)
- …for a Performance Engineer Intern focused on Deep Learning (DL) & High-Performance Computing (HPC) applications to join our diverse team. NVIDIA builds the ... What you'll be doing: + Plan and execute GPU performance benchmarking across a wide range of HPC...quality, and want to be at the forefront of AI & HPC, we would love for…
- NVIDIA (Santa Clara, CA)
- …vision? What you will be doing: + Investigate opportunities to improve communication performance by identifying bottlenecks in today's systems. + Design and ... implement new communication technologies to accelerate AI and HPC workloads. + Explore innovative solutions in HW and SW for our next generation platforms as…
- Amazon (Cupertino, CA)
- Description We are seeking an experienced engineer to work on distributed AI/ML systems. This role involves working on collective operations - the fundamental ... operations that enable AI to scale across multiple accelerators & servers. Most...building networking solutions for Machine Learning (ML) and High-Performance Computing (HPC) workloads on AWS. We…
- NVIDIA (Santa Clara, CA)
- …long term maintenance strategy. What you'll be doing: + Design highly available and scalable systems to meet the demands of our HPC clusters + Evaluate new and ... graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI and enabled the next era of computing. NVIDIA is a "learning…
- Meta (Menlo Park, CA)
- …end-to-end system validation strategy (hardware and software), with a focus on various AI / HPC hardware systems in datacenter applications. 2. Lead the ... algorithms, and OOP). **Preferred Qualifications:** 17. Proficiency in High-Performance Computing (HPC) or AI system architecture…
- NVIDIA (Santa Clara, CA)
- …infrastructure. + Passion for solving complex technical challenges and optimizing system performance. + Experience with AI / HPC advanced job schedulers, and ... support operational and reliability aspects of large scale distributed systems with focus on performance at scale,...storage systems like Lustre and GPFS for AI / HPC workloads. + Familiarity with deep learning…
- NVIDIA (Santa Clara, CA)
- …+ Hands-on involvement in the entire lifecycle, from design to deployment, of large-scale High-Performance Computing (HPC) systems. + Experience in ... architecture or related fields, with a deep understanding of AI-optimized systems. + Excellent and proven ability...hands-on experience in software development on high-complexity projects involving HPC or AI. Ways to Stand Out…
- Amazon (Cupertino, CA)
- …and operating AWS cloud offerings that enable high performance and scalability in AI/ML and HPC workloads. You are intrigued by the continuous release of ... Want to do industry leading work delivering continuous price performance improvements in the cloud for AI ...have tremendous interest in cloud scale and curious how systems and software decisions impact the user. You insist…