  • Sr. Technical Lead Manager - AI

    Meta (Menlo Park, CA)
    …libraries and scheduling infrastructure. **Required Skills:** Sr. Technical Lead Manager - AI / HPC Systems Performance Responsibilities: 1. Support ... requirements of large-scale training and inference workloads. To improve performance of these systems we constantly look ... on monitoring, benchmarking and looking for opportunities to improve performance of AI Training and Inference. 2.…
    Meta (02/05/25)
  • AI / HPC Systems

    Meta (Menlo Park, CA)
    …fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI / HPC Systems Performance Engineer Responsibilities: 1. ... **Summary:** Meta's AI Training and Inference Infrastructure is growing exponentially ... workloads that expect a loss-less fabric interconnect. To improve performance of these systems we constantly look…
    Meta (03/05/25)
  • Hardware Systems Engineer, NPI AI

    Meta (Menlo Park, CA)
    …testing with focus on automation. 22. Experience in developing or debugging AI / HPC systems, performance optimizations, including familiarity with ... and/or similar languages. **Preferred Qualifications:** 16. Proficiency in High-Performance Computing (HPC) or AI system…
    Meta (01/24/25)
  • Production Systems Engineer, Sustaining

    Meta (Menlo Park, CA)
    …hardware and software components, co-design 15. Experience in developing or debugging AI / HPC systems, performance optimizations, including familiarity ... or supporting production hardware at scale 9. Experience in deploying and productionizing AI / HPC systems and/or related components at scale 10. Experience in…
    Meta (01/23/25)
  • AI / HPC Network Engineer

    Meta (Menlo Park, CA)
    …of RDMA workloads that expect a loss-less fabric interconnect. To enhance the performance of these systems, we continuously seek opportunities for improvement ... host networking, communication libraries, and scheduling infrastructure. **Required Skills:** AI / HPC Network Engineer Responsibilities: 1. Design, develop, test…
    Meta (03/04/25)
  • AI / HPC Network Engineer

    Meta (Menlo Park, CA)
    …requirements of RDMA workloads that expect a loss-less fabric interconnect. To improve performance of these systems we constantly look for opportunities across ... fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI / HPC Network Engineer Responsibilities: 1. Design, develop, test and…
    Meta (02/06/25)
  • Hardware Systems Engineer, AI NPI

    Meta (Menlo Park, CA)
    …end-to-end system validation strategy (hardware and software), with a focus on various AI / HPC hardware systems in datacenter applications. 2. Lead the ... algorithms, and OOP). **Preferred Qualifications:** 17. Proficiency in High-Performance Computing (HPC) or AI system architecture…
    Meta (02/05/25)
  • Technical Program Manager, AI Network Infra

    Meta (Menlo Park, CA)
    AI product introductions and AI ops initiatives supporting Meta's growing AI / HPC infrastructure to enable AI product development for our Family of ... innovative and ground-breaking solutions and technologies. You will have experience in AI / HPC product development and operations with demonstrated experience in…
    Meta (03/05/25)
  • Software Engineer, AI Networking (PhD)

    Meta (Menlo Park, CA)
    …domains: High-speed networking (RDMA), Distributed ML Training, GPU architecture, ML systems, AI infrastructure, high performance computing, performance ... GPU training and inference fleet through an observable, reliable and high-performance distributed AI/GPU communication stack. We are seeking engineers…
    Meta (03/11/25)
  • Engineering Manager, PyTorch - AI

    Meta (Menlo Park, CA)
    …in high-performance computation. **Required Skills:** Engineering Manager, PyTorch - AI Acceleration Responsibilities: 1. Grow a team of domain experts within ... **Summary:** AI Acceleration is an org within PyTorch. It's ... candidate should have technical skills - GPU / ML Systems knowledge is preferred, though not required. We work…
    Meta (03/15/25)
  • AI Applications Engineer

    quadric.io, Inc (Burlingame, CA)
    …wide variety of edge and endpoint devices, ranging from battery-operated smart-sensor systems to high-performance automotive or autonomous vehicle systems. ... Candidates must demonstrate deep technical mastery of Quadric's product ecosystem including HPC Hardware (IP, Chips, Boards), SDK, and various algorithms (NN, DSP,…
    quadric.io, Inc (03/14/25)
  • Software Engineer, Accelerator Systems

    Meta (Menlo Park, CA)
    …11. Full-stack experience and understanding of AI / HPC systems, from HW/infrastructure through the application layer, performance optimizations, including ... learning domains: hardware accelerators, AI Infrastructure, and/or high performance computing (HPC), particularly pertaining to interconnect and collective.…
    Meta (01/30/25)
  • Software Engineer, Accelerator Solutions…

    Meta (Menlo Park, CA)
    …**Preferred Qualifications:** 15. Full-stack experience and understanding of AI / HPC systems, from hardware and infrastructure ... ML domains: hardware accelerators, AI Infrastructure, and/or high performance compute (HPC), particularly pertaining to interconnect and collective.…
    Meta (01/30/25)
  • Software Engineer, Systems ML - PyTorch…

    Meta (Menlo Park, CA)
    …computer science or related field. 8. Research or industry experience in compilers, ML systems, ML accelerators, HPC, GPU performance, and similar. 9. ... Our work is open-source, cutting-edge, and industry-leading. **Required Skills:** Software Engineer, Systems ML - PyTorch Compiler / ML Framework / Performance…
    Meta (03/15/25)
  • Sr Staff Engineer, ML Infrastructure…

    LinkedIn (Mountain View, CA)
    …parallel file systems, object storage, NVMe over Fabric) to meet performance and capacity requirements for ML workloads. Collaborate with network and storage ... our large-scale GPU infrastructure for machine learning (ML) and AI workloads. In this role, you will be the ... 8+ years of experience designing and managing large-scale, distributed systems or HPC environments, with at least…
    LinkedIn (03/04/25)
  • Software Engineer, SystemML - Scaling…

    Meta (Menlo Park, CA)
    …following machine learning/deep learning domains: Distributed ML Training, GPU architecture, ML systems, AI infrastructure, high performance computing, ... large-scale GPU training and inference fleet through an observable, reliable and high-performance distributed AI/GPU communication stack. Currently, one of the…
    Meta (01/17/25)
  • Hardware Systems Engineer, NPI

    Meta (Menlo Park, CA)
    …years of experience with one subset of the following AI systems: Accelerator (GPU/ASIC), Kernel development, Performance optimization (e.g., NVIDIA, AMD, ... productizing high-performance software and hardware technologies for AI at datacenter scale. Hardware Systems Engineer ... Intel, or other misc accelerator), computer architecture, HPC communication libraries (e.g., NCCL, MPI), performance…
    Meta (01/17/25)
  • Software Engineer, Accelerator Solutions…

    Meta (Menlo Park, CA)
    …training **Preferred Qualifications:** 15. Full-stack experience and understanding of AI / HPC systems, with a focus on the ... of Meta's accelerators collective communications software library and optimizing distributed AI/ML workloads' performance. This is an opportunity to work…
    Meta (02/01/25)
  • Senior GenAI Specialist Solutions Architect,…

    Amazon (San Francisco, CA)
    …modernizing customer requirements to the cloud - Practical experience in High Performance Computing (HPC) and/or distributed training, performance profiling ... Description Are you passionate about Generative AI (GenAI)? Do you want to help define ... services to power their businesses. We're continuously raising our performance bar as we strive to become Earth's Best…
    Amazon (02/15/25)
  • Sr Worldwide Specialist Solutions Architect…

    Amazon (San Francisco, CA)
    …experience - 5+ years building or optimizing computational applications for large-scale HPC systems (e.g., physics-based simulations) to take advantage of high ... of Go to Market (GTM) at AWS using generative AI (GenAI)? AWS Sales, Marketing, and Global Services (SMGS) ... years building or optimizing computational applications for large-scale HPC systems (e.g., physics-based simulations) to…
    Amazon (02/19/25)