- Meta (New York, NY)
- …fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI/HPC Systems Performance Engineer Responsibilities: 1. ... **Summary:** Meta's AI Training and Inference Infrastructure is growing exponentially...workloads that expect a loss-less fabric interconnect. To improve performance of these systems we constantly look…
- Amazon (New York, NY)
- …computing and its potential to overcome some of the biggest challenges in High Performance Computing (HPC)? Do you have a unique combination of deep technical ... C++, Python, CUDA, Bash - Deep GPU knowledge in HPC and/or AI/ML frameworks. Preferred Qualifications -...life sciences or related discipline. - Working knowledge of HPC schedulers and distributed/parallel file systems, underlying…
- NYU Rory Meyers College of Nursing (New York, NY)
- Position Summary The Senior HPC Specialist supports New York University's High-Performance Computing (HPC) and Research Cloud services by partnering with ... HPC policies, while working closely with the HPC and Research Cloud Systems team on...protocols, tools and utilities. Experience with monitoring and improving HPC application performance. Demonstrated experience in diagnosing…
- Meta (New York, NY)
- …following machine learning/deep learning domains: Distributed ML Training, GPU architecture, ML systems, AI infrastructure, high performance computing, ... large-scale GPU training and inference fleet through an observable, reliable and high-performance distributed AI/GPU communication stack. Currently, one of the…
- Mount Sinai Health System (New York, NY)
- …Computing and Data's computational and data science ecosystem. This ecosystem includes high-performance computing and data systems, Data Ark data commons, ... translational science research. To achieve these aims, we support a high-performance computing and data ecosystem along with MD/PhD-level support for researchers.…
- Lenovo (New York, NY)
- …architectural approach to a wide variety of Data Center solution spaces: HyperConverged, HPC, BigData, IoT, AI, Virtualization, Storage, etc. + Work in ... world's largest PC company with a full-stack portfolio of AI-enabled, AI-ready, and AI-optimized devices...Server, Nutanix, systems management products, and x86 systems + Knowledge of OpenStack, SAP, HPC,…
- Amazon (New York, NY)
- …modernizing customer requirements to the cloud - Practical experience in High Performance Computing (HPC) and/or distributed training, performance profiling ... Description Are you passionate about Generative AI (GenAI)? Do you want to help define...services to power their businesses. We're continuously raising our performance bar as we strive to become Earth's Best…
- Amazon (New York, NY)
- …5+ years of technology domain experience in High Performance Computing, AI/ML, Math, Quantum Information Systems and Technologies, or similar accelerated ... to helping Global Financial Services scale quickly, cost and performance effectively, with the best resiliency and security on...with a focus on Amazon's Accelerated Computing portfolio (i.e., HPC, AIML, big data, Batch), among others. You…
- Microsoft Corporation (New York, NY)
- …direct-to-chip liquid cooling systems and immersion cooling tanks for enhanced performance. + Drive innovation in AI data center sustainability, focusing on ... technology deployments, or related roles, with a focus on AI or high-performance computing (HPC)...+ Familiarity with data center management platforms optimizing liquid-cooled systems for AI workloads is a plus.…
- Bloomberg (New York, NY)
- …or familiarity in MLOps platforms & Machine Learning toolkits + Applied experience optimizing HPC workloads for AI & ML + Experience with capacity planning, ... to shape and execute the vision, and roadmap for the next-generation Bloomberg Generative AI platform. As a Technical Product Manager, you will have ownership over a…
- Meta (New York, NY)
- …in Python, C++ or CUDA programming. 10. Research or industry experience in ML systems, ML accelerators, HPC, GPU performance, and similar. 11. Currently ... PyTorch. 14. Expert knowledge in GPU performance and writing high-performance communication libraries and fault tolerance distributed systems. 15. Proven…
- TEKsystems (Piscataway, NJ)
- …platforms * Knowledge and experience supporting enterprise data centers or high performance computing systems * Knowledge of high availability system ... fiber optic cable, troubleshooting, cabling, data center operations, rack and stack, HPC, High Performance Computing, network fabric, Linux Top Skills Details:…