• AI / HPC Systems

    Meta (New York, NY)
    …fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI / HPC Systems Performance Engineer Responsibilities: 1. ... **Summary:** Meta's AI Training and Inference Infrastructure is growing exponentially...workloads that expect a lossless fabric interconnect. To improve performance of these systems we constantly look…
    Meta (10/25/24)
  • Senior Software Engineer - AI Hardware

    Bloomberg (New York, NY)
    …maintaining system software that enables communication between GPUs, CPUs, and storage in scale-out AI and HPC systems. This role will also be responsible ... overseeing the ongoing monitoring, support, and maintenance of our HPC / AI clusters, ensuring peak performance ...
    Bloomberg (01/11/25)
  • Software Engineer, SystemML - Scaling…

    Meta (New York, NY)
    …following machine learning/deep learning domains: Distributed ML Training, GPU architecture, ML systems, AI infrastructure, high performance computing, ... large-scale GPU training and inference fleet through an observable, reliable and high-performance distributed AI/GPU communication stack. Currently, one of the…
    Meta (01/07/25)
  • Senior GenAI Specialist Solutions Architect,…

    Amazon (New York, NY)
    …modernizing customer requirements to the cloud - Practical experience in High Performance Computing (HPC) and/or distributed training, performance profiling ... Description Are you passionate about Generative AI (GenAI)? Do you want to help define...services to power their businesses. We're continuously raising our performance bar as we strive to become Earth's Best…
    Amazon (11/16/24)
  • Sr Worldwide Specialist Solutions Architect,…

    Amazon (New York, NY)
    …experience - 5+ years building or optimizing computational applications for large-scale HPC systems (e.g., physics-based simulations) to take advantage of high ... of Go to Market (GTM) at AWS using generative AI (GenAI)? AWS Sales, Marketing, and Global Services (SMGS)...
    Amazon (11/20/24)
  • Technical Product Manager, Large Language Model…

    Bloomberg (New York, NY)
    …or familiarity in MLOps platforms & Machine Learning toolkits + Applied experience optimizing HPC workloads for AI & ML + Experience with capacity planning, ... to shape and execute the vision and roadmap for the next-generation Bloomberg Generative AI platform. As a Technical Product Manager, you will have ownership over a…
    Bloomberg (11/28/24)