• AI / HPC Systems

    Meta (Menlo Park, CA)
    …fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI / HPC Systems Performance Engineer Responsibilities: 1. ... **Summary:** Meta's AI Training and Inference Infrastructure is growing exponentially ... workloads that expects a lossless fabric interconnect. To improve performance of these systems we constantly look…
    Meta (08/01/24)
  • Production Systems Engineer, Sustaining

    Meta (Menlo Park, CA)
    …hardware and software components, co-design 15. Experience in developing or debugging AI / HPC systems, performance optimizations, including familiarity ... or supporting production hardware at scale 9. Experience in deploying and productionizing AI / HPC systems and/or related components at scale 10. Experience in…
    Meta (07/19/24)
  • Research Scientist, Systems ML…

    Meta (Menlo Park, CA)
    …Meta and externally. **Required Skills:** Research Scientist, Systems ML and HPC - SW/HW Co-Design Responsibilities: 1. Apply High-Performance Computing ( ... Performance team is dedicated to maximizing training performance of Generative AI and recommendation models ... HPC) algorithms and techniques to optimize large-scale AI workloads 2. Analyze, benchmark, and optimize large-scale workloads…
    Meta (09/21/24)
  • Software Engineer, Systems ML - HPC

    Meta (Burlingame, CA)
    …in multiple locations. **Required Skills:** Software Engineer, Systems ML - HPC Responsibilities: 1. Apply relevant AI and machine learning techniques to ... **Summary:** Meta is seeking an AI Software Engineer to join our Research & ... on the web. Some aspects of this role as an HPC specialist will include using lower precision numeric formats…
    Meta (07/19/24)
  • Distinguished Engineer, Generative AI

    Capital One (San Francisco, CA)
    …strategies, in our public cloud. + Design and implement benchmarks to measure the performance of software systems within AI capabilities and make ... San Francisco, United States of America, San Francisco, California Distinguished Engineer, Generative AI Systems (Remote Eligible) Our mission at Capital One is…
    Capital One (09/08/24)
  • Software Engineer, SystemML - AI Networking

    Meta (Menlo Park, CA)
    …following machine learning/deep learning domains: Distributed ML Training, GPU architecture, ML systems, AI infrastructure, high performance computing, ... large-scale GPU training and inference fleet through an observable, reliable and high-performance distributed AI/GPU communication stack. Currently, one of the…
    Meta (09/04/24)
  • Engineering Manager, PyTorch - AI

    Meta (San Francisco, CA)
    …in high-performance computation. **Required Skills:** Engineering Manager, PyTorch - AI Acceleration Responsibilities: 1. Grow a team of domain experts within ... **Summary:** AI Acceleration is an org within PyTorch. It's ... should have strong technical skills - GPU / ML Systems knowledge is preferred, though not required. We work…
    Meta (07/17/24)
  • Technical Lead/Manager - AI/ML…

    Cisco (San Jose, CA)
    …team engaged in the design, development and execution of tests to qualify network performance for AI/ML capability. In this role you'll have opportunity to: * ... to build the next generation infrastructure to meet the needs of AI/ML workloads and continuously increasing internet users and applications. We are uniquely…
    Cisco (09/12/24)
  • Principal Software Engineer, AI Platform…

    General Motors (Mountain View, CA)
    …This role will involve working across various areas, from enhancing underlying HPC infrastructure to optimizing Kubernetes and Kubeflow setups, as well as refining ... teams to understand requirements and implement solutions. + Troubleshoot complex HPC infrastructure issues and implement effective resolutions with partner team. +…
    General Motors (07/12/24)
  • Software Engineer, Systems ML - PyTorch…

    Meta (Menlo Park, CA)
    …computer science or related field. 8. Research or industry experience in compilers, ML systems, ML accelerators, HPC, GPU performance, and similar. 9. ... Our work is open-source, cutting-edge, and industry-leading. **Required Skills:** Software Engineer, Systems ML - PyTorch Compiler / ML Framework / Performance…
    Meta (08/08/24)
  • Hardware Systems Engineer, RAS

    Meta (Menlo Park, CA)
    …years of experience with one subset of the following AI systems: Accelerator (GPU/ASIC), Kernel development, Performance optimization (e.g., NVIDIA, AMD, ... productizing high-performance software and hardware technologies for AI at datacenter scale. Hardware Systems Engineer ... Intel, or other misc accelerator), computer architecture, HPC communication libraries (e.g., NCCL, MPI), performance…
    Meta (08/21/24)
  • Manager, Production Engineering (Network)

    Meta (Menlo Park, CA)
    …1. Support and lead engineers who are responsible for reliably scaling Meta's AI / HPC networking operations. 2. Partner with teams across Meta's AI ... **Summary:** AI Training and Inference is a core pillar ... production issues through the entire stack and building software systems to ensure that operations can be scaled appropriately.…
    Meta (07/19/24)
  • Senior GenAI Specialist Solutions Architect,…

    Amazon (San Francisco, CA)
    …modernizing customer requirements to the cloud - Practical experience in High Performance Computing (HPC) and/or distributed training, performance profiling ... Description Are you passionate about Generative AI (GenAI)? Do you want to help define ... specific technology domain areas like software development, cloud computing, systems engineering, infrastructure, security, networking, data and analytics -…
    Amazon (09/07/24)
  • Head of Accounting (Mountain View)

    Lightmatter (Mountain View, CA)
    …processors at the speed of light in extreme-scale data centers for the most advanced AI and HPC workloads. As Lightmatter enters a crucial phase of accelerated ... Lightmatter is leading the revolution in AI data center infrastructure and enabling the next ... values exceptional talent and kindness. We have created a high-performance, collaborative, and open culture where our employees are…
    Lightmatter (09/14/24)
  • Staff Supplier Quality Engineer

    Lightmatter (Mountain View, CA)
    …of processors at the speed of light in extreme-scale data centers for the most advanced AI and HPC workloads. We are seeking a Supplier Quality Engineer who will ... Lightmatter is leading the revolution in networking for AI and enabling the next giant leaps in ... and internal stakeholders to monitor, analyze, and enhance supplier performance. The SQE will play a critical role in…
    Lightmatter (09/01/24)
  • Sr. GTM Specialist Solutions Architect,…

    Amazon (San Francisco, CA)
    …- 5+ years of technical experience in High Performance Computing, AI/ML, Math, Quantum Information Systems and Technologies, or similar accelerated computing ... that helps Startups adopt AWS' Accelerated Computing portfolio (i.e., HPC, AIML, big data), among others. You will 1/Be ... businesses. Mentorship & Career Growth: We're continuously raising our performance bar as we strive to become Earth's Best…
    Amazon (09/21/24)
  • Sr. GTM Specialist, Accelerated Compute, Startups

    Amazon (San Francisco, CA)
    …5+ years of technology domain experience in High Performance Computing, AI/ML, Math, Quantum Information Systems and Technologies, or similar accelerated ... with a focus on Amazon's Accelerated Computing portfolio (i.e., HPC, AIML, big data), among others. You will ... businesses. Mentorship & Career Growth: We're continuously raising our performance bar as we strive to become Earth's Best…
    Amazon (09/21/24)