  • Postdoctoral Researcher, AI / HPC

    Meta (Menlo Park, CA)
    …networking, comms lib and scheduling infrastructure. **Required Skills:** Postdoctoral Researcher, AI / HPC Systems Performance (PhD) Responsibilities: ... **Summary:** Meta's AI Training and Inference Infrastructure is growing exponentially ... workloads that expect a lossless fabric interconnect. To improve performance of these systems we constantly look…
    Meta (07/19/24)
  • AI / HPC Systems

    Meta (Menlo Park, CA)
    …fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI / HPC Systems Performance Engineer Responsibilities: 1. ... **Summary:** Meta's AI Training and Inference Infrastructure is growing exponentially ... workloads that expect a lossless fabric interconnect. To improve performance of these systems we constantly look…
    Meta (08/01/24)
  • Production Systems Engineer, Sustaining

    Meta (Menlo Park, CA)
    …hardware and software components, co-design 15. Experience in developing or debugging AI / HPC systems, performance optimizations, including familiarity ... or supporting production hardware at scale 9. Experience in deploying and productionizing AI / HPC systems and/or related components at scale 10. Experience in…
    Meta (07/19/24)
  • Software Engineer, Systems ML - HPC

    Meta (Burlingame, CA)
    …in multiple locations. **Required Skills:** Software Engineer, Systems ML - HPC Responsibilities: 1. Apply relevant AI and machine learning techniques to ... **Summary:** Meta is seeking an AI Software Engineer to join our Research & ... on the web. Some aspects of this role as an HPC specialist will include using lower-precision numeric formats…
    Meta (07/19/24)
  • Distinguished Engineer, Generative AI

    Capital One (San Francisco, CA)
    …strategies, in our public cloud. + Design and implement benchmarks to measure the performance of software systems within AI capabilities and make ... San Francisco, United States of America, San Francisco, California Distinguished Engineer, Generative AI Systems (Remote Eligible) Our mission at Capital One is…
    Capital One (09/08/24)
  • Hardware Systems Engineer, NPI AI

    Meta (Menlo Park, CA)
    …of issues. RTP team also helps in exploring, developing and productizing high-performance software and hardware technologies for AI at datacenter scale. RTP ... validation, supporting customer deployment, production issue triage. **Required Skills:** Hardware Systems Engineer, NPI AI Responsibilities: 1. Lead and execute…
    Meta (07/19/24)
  • Software Engineer, SystemML - AI Networking

    Meta (Menlo Park, CA)
    …following machine learning/deep learning domains: Distributed ML Training, GPU architecture, ML systems, AI infrastructure, high-performance computing, ... large-scale GPU training and inference fleet through an observable, reliable and high-performance distributed AI/GPU communication stack. Currently, one of the…
    Meta (09/04/24)
  • Senior AI Hardware Architect

    Microsoft Corporation (Mountain View, CA)
    …+ Analyse Hardware Architecture for AI workloads. + Architecting large-scale systems which support breakthrough performance AI workloads to shape Azure's ... or related field. + 3+ years of experience in Computer Architecture or AI Systems. **Other Requirements** + Ability to meet Microsoft, customer and/or…
    Microsoft Corporation (09/17/24)
  • Engineering Manager, PyTorch - AI

    Meta (Menlo Park, CA)
    …in high-performance computation. **Required Skills:** Engineering Manager, PyTorch - AI Acceleration Responsibilities: 1. Grow a team of domain experts within ... **Summary:** AI Acceleration is an org within PyTorch. It's ... should have strong technical skills - GPU / ML Systems knowledge is preferred, though not required. We work…
    Meta (07/17/24)
  • Principal Software Engineer, AI Platform…

    General Motors (Mountain View, CA)
    …This role will involve working across various areas, from enhancing underlying HPC infrastructure to optimizing Kubernetes and Kubeflow setups, as well as refining ... teams to understand requirements and implement solutions. + Troubleshoot complex HPC infrastructure issues and implement effective resolutions with partner team. +…
    General Motors (07/12/24)
  • Software Engineer, Systems ML - PyTorch…

    Meta (Menlo Park, CA)
    …computer science or related field. 8. Research or industry experience in compilers, ML systems, ML accelerators, HPC, GPU performance, and similar. 9. ... Our work is open-source, cutting-edge, and industry-leading. **Required Skills:** Software Engineer, Systems ML - PyTorch Compiler / ML Framework / Performance
    Meta (08/08/24)
  • Hardware Systems Engineer, RAS

    Meta (Menlo Park, CA)
    …years of experience with one subset of the following AI systems: Accelerator (GPU/ASIC), Kernel development, Performance optimization (e.g., NVIDIA, AMD, ... productizing high-performance software and hardware technologies for AI at datacenter scale. Hardware Systems Engineer ... Intel, or other misc accelerator), computer architecture, HPC communication libraries (e.g., NCCL, MPI), performance
    Meta (08/21/24)
  • Manager, Production Engineering (Network)

    Meta (Menlo Park, CA)
    …1. Support and lead engineers who are responsible for reliably scaling Meta's AI / HPC networking operations. 2. Partner with teams across Meta's AI ... **Summary:** AI Training and Inference is a core pillar ... production issues through the entire stack and building software systems to ensure that operations can be scaled appropriately.…
    Meta (07/19/24)
  • Senior GenAI Specialist Solutions Architect,…

    Amazon (San Francisco, CA)
    …modernizing customer requirements to the cloud - Practical experience in High Performance Computing (HPC) and/or distributed training, performance profiling ... Description Are you passionate about Generative AI (GenAI)? Do you want to help define ... specific technology domain areas like software development, cloud computing, systems engineering, infrastructure, security, networking, data and analytics -…
    Amazon (09/07/24)
  • Head of Accounting (Mountain View)

    Lightmatter (Mountain View, CA)
    …processors at the speed of light in extreme-scale data centers for the most advanced AI and HPC workloads. As Lightmatter enters a crucial phase of accelerated ... Lightmatter is leading the revolution in AI data center infrastructure and enabling the next ... values exceptional talent and kindness. We have created a high-performance, collaborative, and open culture where our employees are…
    Lightmatter (09/14/24)
  • Staff Supplier Quality Engineer

    Lightmatter (Mountain View, CA)
    …of processors at the speed of light in extreme-scale data centers for the most advanced AI and HPC workloads. We are seeking a Supplier Quality Engineer who will ... Lightmatter is leading the revolution in networking for AI and enabling the next giant leaps in ... and internal stakeholders to monitor, analyze, and enhance supplier performance. The SQE will play a critical role in…
    Lightmatter (09/01/24)