- Meta (Menlo Park, CA)
- …fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI / HPC Systems Performance Engineer Responsibilities: 1. ... **Summary:** Meta's AI Training and Inference Infrastructure is growing exponentially...a loss-less fabric interconnect with minimal latency. To improve performance of these systems we constantly look… more
- Meta (Menlo Park, CA)
- …fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI / HPC Systems Performance Engineer Responsibilities: 1. ... **Summary:** Meta's AI Training and Inference Infrastructure is growing exponentially...workloads that expects a loss-less fabric interconnect. To improve performance of these systems we constantly look… more
- Meta (Menlo Park, CA)
- …hardware and software components, co-design 15. Experience in developing or debugging AI / HPC systems, performance optimizations, including familiarity ... or supporting production hardware at scale 9. Experience in deploying and productionizing AI / HPC systems and/or related components at scale 10. Experience in… more
- NVIDIA (Santa Clara, CA)
- …designing and operating large scale storage infrastructure. + Experience analyzing and tuning performance for a variety of AI / HPC workloads. + Experience ... join us today! As a member of the GPU AI / HPC Infrastructure team, you will provide leadership...solutions to enable runs of demanding deep learning, high performance computing, and computationally intensive workloads. We seek an… more
- NVIDIA (Santa Clara, CA)
- …topologies + Extensive experience with benchmarking systems and analyzing performance bottlenecks in large-scale AI / HPC infrastructure + Exceptional ... harness your infrastructure expertise to create reference designs for the world's most powerful AI clusters. As an AI / HPC Product Architect at NVIDIA, you'll… more
- NVIDIA (Santa Clara, CA)
- …designing and operating large scale compute infrastructure. + Experience analyzing and tuning performance for a variety of AI / HPC workloads. + Working ... GPU compute clusters that run demanding deep learning, high performance computing, and computationally intensive workloads. We seek an...storage systems like Lustre and GPFS for AI / HPC workloads + Familiarity with deep learning… more
- NVIDIA (Santa Clara, CA)
- …Understanding of fast, distributed storage systems like Lustre and GPFS for AI / HPC workloads + Familiarity with deep learning frameworks like PyTorch and ... parallel computing. Now, GPU deep learning is driving modern AI forward. Join our GPU AI / HPC...identify bottlenecks and opportunities for optimization, continuously improving the performance and cost-effectiveness of our AI computing… more
- NVIDIA (Santa Clara, CA)
- …looking for a technical leader to define a vision and roadmap for distributed observability systems for large-scale AI and HPC clusters and workloads and ... and visualization to spectacularly improve efficiency, performance, and productivity of AI and HPC workloads. You will lead technical teams to develop,… more
- NVIDIA (CA)
- …a lasting impact on the world. NVIDIA Infrastructure Specialists team seeks an HPC / AI Infiniband Network Engineer to help customers realize next-generation data ... doing: + Primary responsibilities will include building and validating AI / HPC infrastructure for new and existing customers....customers. + Support operational and reliability aspects of large-scale AI clusters with a focus on performance … more
- NVIDIA (Santa Clara, CA)
- …group at NVIDIA has openings for software architects in the field of AI and high-performance networking and system software. We research, develop, and ... be doing + Creating proofs-of-concept to evaluate and motivate extensions in AI Frameworks (PyTorch/NEMO), HPC programming models (MPI, OpenSHMEM, PGAS), new… more
- NVIDIA (Santa Clara, CA)
- …Be Doing: + Primary responsibilities will include building and enabling robust AI / HPC infrastructure for customers + Support operational and reliability aspects ... of large-scale AI clusters, focusing on performance at scale,...in working with customers + Expertise with parallel file systems (eg Lustre, GPFS, BeeGFS, WekaIO) and high-speed interconnects… more
- The MITRE Corporation (McLean, VA)
- …+ Provide Linux systems administration support for MITRE's HPC systems to ensure the availability, performance, and security of systems. ... Computing to MITRE research organizations. Job Description: We are seeking an experienced Linux HPC Systems engineer to join our team! This is an exciting… more
- University Corporation for Atmospheric Research (WY)
- Job Description Summary: UCAR is excited to announce the job opening for an HPC Systems Engineer III role. This position is responsible for providing system ... on a routine, daily basis to ensure proper and efficient operations. Alerts other HPC Systems Group staff, vendor representatives, and/or NWSC staff of abnormal… more
- General Dynamics Information Technology (Fairfax, VA)
- …Regular **Clearance Level Must Be Able to Obtain:** None **Job Family:** Systems Engineering **Skills:** High-Performance Computing (HPC) Systems ... are our differentiator. Our work depends on a Senior HPC Systems Engineer joining our team to...some travel required. WCOSS provides NOAA the operational High Performance Computing (HPC) resources essential to process… more
- Meta (Bellevue, WA)
- …Meta and externally. **Required Skills:** Research Scientist, Systems ML and HPC - SW/HW Co-Design Responsibilities: 1. Apply High-Performance Computing ( ... Performance team is dedicated to maximizing training performance of Generative AI and recommendation models...HPC) algorithms and techniques to optimize large-scale AI workloads 2. Analyze, benchmark, and optimize large-scale workloads… more
- NYU Rory Meyers College of Nursing (New York, NY)
- Position Summary The Senior HPC Specialist supports New York University's High-Performance Computing (HPC) and Research Cloud services by partnering with ... HPC policies, while working closely with the HPC and Research Cloud Systems team on...protocols, tools and utilities. Experience with monitoring and improving HPC application performance. Demonstrated experience in diagnosing… more
- General Dynamics Information Technology (Phoenix, AZ)
- …Regular **Clearance Level Must Be Able to Obtain:** None **Job Family:** Systems Administration **Skills:** High-Performance Computing (HPC) Systems ... our differentiator. Our work depends on an On Site HPC Systems Admin joining our team to...the Phoenix, AZ. WCOSS provides NOAA the operational High Performance Computing (HPC) resources essential to process… more
- NVIDIA (Santa Clara, CA)
- …the world. We are looking for an outstanding engineer for a Senior HPC Systems Engineer role for at scale AI system performance and datacenter ... develop new, leading differentiated solutions. You will interact with HPC, OS, CPU and GPU compute, and systems...debugging and resolving critical software issues for the best AI workload performance at scale. + Specific… more
- Meta (Menlo Park, CA)
- …our research, visit https://ai.facebook.com. **Required Skills:** Research Scientist Intern, Systems ML and HPC - SW/HW Co-Design Responsibilities: 1. ... team's mission is to explore, develop and help productize high-performance software and hardware technologies for AI ...infrastructure. Meta is seeking Research Scientist Interns to join our AI & Systems Co-Design Training team to… more
- Lockheed Martin (PR)
- …Database Engines, Middleware Splunk), Storage, Data Center/Hardware, High Performance Computing (Simulation, AI/ML), Governance, Commercial Cloud ... This Full Stack Engineer role is for the High Performance Computing (HPC) Delivery Team. Engineer responsibilities...responsibilities include: * Support the design and development of HPC and utility systems (computation, network, and… more