  • Postdoctoral Researcher, AI / HPC

    Meta (Menlo Park, CA)
    …networking, comms lib and scheduling infrastructure. **Required Skills:** Postdoctoral Researcher, AI / HPC Systems Performance (PhD) Responsibilities: ... **Summary:** Meta's AI Training and Inference Infrastructure is growing exponentially...workloads that expects a loss-less fabric interconnect. To improve performance of these systems we constantly look… more
    Meta (07/19/24)
  • AI / HPC Systems

    Meta (Menlo Park, CA)
    …fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI / HPC Systems Performance Engineer Responsibilities: 1. ... **Summary:** Meta's AI Training and Inference Infrastructure is growing exponentially...workloads that expects a loss-less fabric interconnect. To improve performance of these systems we constantly look… more
    Meta (08/01/24)
  • Production Systems Engineer, Sustaining

    Meta (Menlo Park, CA)
    …hardware and software components, co-design 15. Experience in developing or debugging AI / HPC systems, performance optimizations, including familiarity ... or supporting production hardware at scale 9. Experience in deploying and productionizing AI / HPC systems and/or related components at scale 10. Experience in… more
    Meta (07/19/24)
  • Senior AI - HPC Storage Engineer

    NVIDIA (Santa Clara, CA)
    …designing and operating large scale storage infrastructure. + Experience analyzing and tuning performance for a variety of AI / HPC workloads. + Experience ... join us today! As a member of the GPU AI / HPC Infrastructure team, you will provide leadership...solutions to enable runs of demanding deep learning, high performance computing, and computationally intensive workloads. We seek an… more
    NVIDIA (06/19/24)
  • Senior AI - HPC Cluster Engineer

    NVIDIA (Santa Clara, CA)
    …designing and operating large scale compute infrastructure. + Experience analyzing and tuning performance for a variety of AI / HPC workloads. + Working ... GPU compute clusters that run demanding deep learning, high performance computing, and computationally intensive workloads. We seek an...storage systems like Lustre and GPFS for AI / HPC workloads + Familiarity with deep learning… more
    NVIDIA (06/30/24)
  • Senior Software Architect, AI

    NVIDIA (Santa Clara, CA)
    …group at NVIDIA has openings for software architects in the field of AI and high-performance networking and system software. We research, develop, and ... be doing + Creating proofs-of-concept to evaluate and motivate extensions in AI Frameworks (PyTorch/NEMO), HPC programming models (MPI, OpenSHMEM, PGAS), new… more
    NVIDIA (07/26/24)
  • Solutions Architect - AI and HPC

    NVIDIA (Santa Clara, CA)
    …doing: + Work with NVIDIA Product Teams to understand new product requirements including HPC and AI/ML Products. + Finding Optimum Solutions to deploy these ... hosts a heterogeneous mix of machines and devices with various operating systems (Windows/Linux/Android), a multitude of hardware platforms both NVIDIA GPUs and… more
    NVIDIA (06/19/24)
  • Senior HPC Systems Engineer

    NVIDIA (Santa Clara, CA)
    …the world. We are looking for an outstanding engineer for a Senior HPC Systems Engineer role for at scale AI system performance and datacenter ... develop new, leading differentiated solutions. You will interact with HPC, OS, CPU and GPU compute, and systems...debugging and resolving critical software issues for the best AI workload performance at scale. + Specific… more
    NVIDIA (09/04/24)
  • Software Engineer, Systems ML - HPC

    Meta (Burlingame, CA)
    …in multiple locations. **Required Skills:** Software Engineer, Systems ML - HPC Responsibilities: 1. Apply relevant AI and machine learning techniques to ... **Summary:** Meta is seeking an AI Software Engineer to join our Research &...on the web. Some aspects of this role as an HPC specialist will include using lower precision numeric formats… more
    Meta (07/19/24)
  • Senior Software Architect - Deep Learning…

    NVIDIA (Santa Clara, CA)
    …vision? What you will be doing: + Investigate opportunities to improve communication performance by identifying bottlenecks in today's systems. + Design and ... implement new communication technologies to accelerate AI and HPC workloads. + Explore innovative solutions in HW and SW for our next generation platforms as… more
    NVIDIA (08/24/24)
  • Sr. Software Development Engineer, HPC/ML…

    Amazon (Cupertino, CA)
    HPC network fabric or machine learning accelerator cluster systems. Also applicable is experience high-frequency trading networking, high-speed wireless ... or interconnect expertise to optimize customer experience by designing systems that enable scaling network-intensive workloads over thousands of...and TPUs. This role is on the forefront of AI/ML, we spend a good deal of the day… more
    Amazon (07/08/24)
  • Senior HPC Architect

    NVIDIA (Santa Clara, CA)
    …improved workflows and develop new, leading differentiated solutions. You will interact with HPC, OS, GPU compute, and systems specialist to architect, develop ... parallel computing. More recently, GPU deep learning ignited modern AI - the next era of computing. NVIDIA is...looking for an outstanding hands-on architect/engineer for a Senior HPC architect role to support deployment and bringup of… more
    NVIDIA (08/24/24)
  • Senior Software Engineer - HPC

    NVIDIA (Santa Clara, CA)
    …long term maintenance strategy. What you'll be doing: + Design highly available and scalable systems to meet the demands of our HPC clusters + Evaluate new and ... graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI and enabled the next era of computing. NVIDIA is a "learning… more
    NVIDIA (09/05/24)
  • Hardware Systems Engineer, NPI AI

    Meta (Menlo Park, CA)
    …of issues. RTP team also helps in exploring, developing and productizing high-performance software and hardware technologies for AI at datacenter scale. RTP ... validation, supporting customer deployment, production issue triage. **Required Skills:** Hardware Systems Engineer, NPI AI Responsibilities: 1. Lead and execute… more
    Meta (07/19/24)
  • Principal Engineer for AI Software…

    NVIDIA (Santa Clara, CA)
    …expertise will be crucial in driving down cluster downtime towards zero, ensuring that our AI systems remain robust and reliable at all times. What You'll Be ... to embed AI resilience features into their AI frameworks, ensuring seamless integration and optimal performance...or related fields, with a deep understanding of distributed systems and large-scale AI infrastructure. + At… more
    NVIDIA (08/24/24)
  • Senior Deep Learning Systems Software…

    NVIDIA (Santa Clara, CA)
    …experience in performance optimization and benchmarking on large-scale distributed systems + Hands-on experience with NVIDIA GPUs, HPC storage, networking, ... NVIDIA is an industry leader with groundbreaking developments in High-Performance Computing, Artificial Intelligence and Visualization. The GPU, our invention,… more
    NVIDIA (09/04/24)
  • Software Engineer, SystemML - AI Networking

    Meta (Menlo Park, CA)
    …following machine learning/deep learning domains: Distributed ML Training, GPU architecture, ML systems, AI infrastructure, high performance computing, ... large-scale GPU training and inference fleet through an observable, reliable and high-performance distributed AI/GPU communication stack. Currently, one of the… more
    Meta (09/04/24)
  • Senior AI Hardware Architect

    Microsoft Corporation (Mountain View, CA)
    …+ Analyse Hardware Architecture for AI workloads. + Architecting large scale systems which support breakthrough performance AI workloads to shape Azure's ... or related field. + 3+ years of experience in Computer Architecture or AI Systems. **Other Requirements** + Ability to meet Microsoft, customer and/or… more
    Microsoft Corporation (08/22/24)
  • Senior Platform Software Engineer, AI

    NVIDIA (Santa Clara, CA)
    …GH200 superchip provides performance and productivity required for strong scaling for HPC and generative AI workload. Scale out is inherent to design of this ... the world. Today, we are increasingly known as "the AI computing company." We're looking to grow our company...issue closure. + Identify new technologies, features to improve performance, functionality, uptime of GPU systems to… more
    NVIDIA (09/05/24)
  • Engineering Manager, PyTorch - AI

    Meta (Menlo Park, CA)
    …in high-performance computation. **Required Skills:** Engineering Manager, PyTorch - AI Acceleration Responsibilities: 1. Grow a team of domain experts within ... **Summary:** AI Acceleration is an org within PyTorch. It's...should have strong technical skills - GPU / ML Systems knowledge is preferred, though not required. We work… more
    Meta (07/17/24)