  • AI / HPC Systems

    Meta (Menlo Park, CA)
    …fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI / HPC Systems Performance Engineer Responsibilities: 1. ... **Summary:** Meta's AI Training and Inference Infrastructure is growing exponentially...a loss-less fabric interconnect with minimal latency. To improve performance of these systems we constantly look…
    Meta (10/27/24)
  • AI / HPC Systems

    Meta (Menlo Park, CA)
    …fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI / HPC Systems Performance Engineer Responsibilities: 1. ... **Summary:** Meta's AI Training and Inference Infrastructure is growing exponentially...workloads that expect a loss-less fabric interconnect. To improve performance of these systems we constantly look…
    Meta (10/31/24)
  • Production Systems Engineer, Sustaining

    Meta (Menlo Park, CA)
    …hardware and software components, co-design 15. Experience in developing or debugging AI / HPC systems, performance optimizations, including familiarity ... or supporting production hardware at scale 9. Experience in deploying and productionizing AI / HPC systems and/or related components at scale 10. Experience in…
    Meta (10/24/24)
  • Research Scientist, Systems ML…

    Meta (Menlo Park, CA)
    …Meta and externally. **Required Skills:** Research Scientist, Systems ML and HPC - SW/HW Co-Design Responsibilities: 1. Apply High-Performance Computing ( ... Performance team is dedicated to maximizing training performance of Generative AI and recommendation models...HPC) algorithms and techniques to optimize large-scale AI workloads 2. Analyze, benchmark, and optimize large-scale workloads…
    Meta (09/21/24)
  • Research Scientist Intern, Systems ML…

    Meta (Menlo Park, CA)
    …our research, visit https://ai.facebook.com. **Required Skills:** Research Scientist Intern, Systems ML and HPC - SW/HW Co-Design Responsibilities: 1. ... team's mission is to explore, develop and help productize high-performance software and hardware technologies for AI ...infrastructure. Meta is seeking Research Scientist Interns to join our AI & Systems Co-Design Training team to…
    Meta (10/12/24)
  • Software Engineer, SystemML - AI Networking

    Meta (Menlo Park, CA)
    …following machine learning/deep learning domains: Distributed ML Training, GPU architecture, ML systems, AI infrastructure, high performance computing, ... large-scale GPU training and inference fleet through an observable, reliable and high-performance distributed AI/GPU communication stack. Currently, one of the…
    Meta (10/18/24)
  • Research Scientist Intern, Systems ML…

    Meta (Menlo Park, CA)
    …and suited for the hardware infrastructure. Meta is seeking Research Scientist Interns to join our AI & Systems Co-Design HPC & Inference team to drive the ... definition of our next-generation AI Systems Inference and Training architectures. The... performance - Model, SW, System, Accelerator - Performance modeling and simulations - HPC Software Optimizations -…
    Meta (10/11/24)
  • Software Engineer, Systems ML - PyTorch…

    Meta (Menlo Park, CA)
    …computer science or related field. 8. Research or industry experience in compilers, ML systems, ML accelerators, HPC, GPU performance, and similar. 9. ... Our work is open-source, cutting-edge, and industry-leading. **Required Skills:** Software Engineer, Systems ML - PyTorch Compiler / ML Framework / Performance…
    Meta (11/07/24)
  • Hardware Systems Engineer, RAS

    Meta (Menlo Park, CA)
    …years of experience with one subset of the following AI systems: Accelerator (GPU/ASIC), Kernel development, Performance optimization (e.g., NVIDIA, AMD, ... developing and productizing high-performance software and hardware technologies for AI at datacenter scale. Hardware Systems Engineer in RTP work closely…
    Meta (11/02/24)
  • Senior GenAI Specialist Solutions Architect,…

    Amazon (San Francisco, CA)
    …modernizing customer requirements to the cloud - Practical experience in High Performance Computing (HPC) and/or distributed training, performance profiling ... Description Are you passionate about Generative AI (GenAI)? Do you want to help define...services to power their businesses. We're continuously raising our performance bar as we strive to become Earth's Best…
    Amazon (09/07/24)
  • Senior Director, New Technology Deployments (Data…

    Microsoft Corporation (San Francisco, CA)
    …direct-to-chip liquid cooling systems and immersion cooling tanks for enhanced performance. + Drive innovation in AI data center sustainability, focusing on ... technology deployments, or related roles, with a focus on AI or high-performance computing (HPC)...+ Familiarity with data center management platforms optimizing liquid-cooled systems for AI workloads is a plus.…
    Microsoft Corporation (11/01/24)
  • Research Engineer, Pytorch Distributed (PhD)

    Meta (Menlo Park, CA)
    …in Python, C++ or CUDA programming. 10. Research or industry experience in ML systems, ML accelerators, HPC, GPU performance, and similar. 11. Currently ... PyTorch. 14. Expert knowledge in GPU performance and writing high-performance communication libraries and fault tolerance distributed systems. 15. Proven…
    Meta (10/24/24)
  • Sr. GTM Specialist, Accelerated Compute, Startups

    Amazon (San Francisco, CA)
    …5+ years of technology domain experience in High Performance Computing, AI/ML, Math, Quantum Information Systems and Technologies, or similar accelerated ... with a focus on Amazon's Accelerated Computing portfolio (i.e., HPC, AIML, big data), among others. You will...businesses. Mentorship & Career Growth: We're continuously raising our performance bar as we strive to become Earth's Best…
    Amazon (09/21/24)