  • AI / HPC Systems

    Meta (Menlo Park, CA)
    …fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI / HPC Systems Performance Engineer Responsibilities: 1. ... **Summary:** Meta's AI Training and Inference Infrastructure is growing exponentially...a loss-less fabric interconnect with minimal latency. To improve performance of these systems we constantly look… more
    Meta (10/27/24)
  • AI / HPC Systems

    Meta (Menlo Park, CA)
    …fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI / HPC Systems Performance Engineer Responsibilities: 1. ... **Summary:** Meta's AI Training and Inference Infrastructure is growing exponentially...workloads that expects a loss-less fabric interconnect. To improve performance of these systems we constantly look… more
    Meta (10/31/24)
  • Production Systems Engineer, Sustaining

    Meta (Menlo Park, CA)
    …hardware and software components, co-design 15. Experience in developing or debugging AI / HPC systems, performance optimizations, including familiarity ... or supporting production hardware at scale 9. Experience in deploying and productionizing AI / HPC systems and/or related components at scale 10. Experience in… more
    Meta (10/24/24)
  • Senior AI - HPC Storage Engineer

    NVIDIA (Santa Clara, CA)
    …designing and operating large scale storage infrastructure. + Experience analyzing and tuning performance for a variety of AI / HPC workloads. + Experience ... join us today! As a member of the GPU AI / HPC Infrastructure team, you will provide leadership...solutions to enable runs of demanding deep learning, high performance computing, and computationally intensive workloads. We seek an… more
    NVIDIA (11/06/24)
  • Senior Product Architect, HPC and AI

    NVIDIA (Santa Clara, CA)
    …topologies + Extensive experience with benchmarking systems and analyzing performance bottlenecks in large-scale AI / HPC infrastructure + Exceptional ... harness your infrastructure expertise to create reference designs for the world's most powerful AI clusters. As an AI / HPC Product Architect at NVIDIA, you'll… more
    NVIDIA (10/25/24)
  • Senior AI and HPC Clusters Lead

    NVIDIA (Santa Clara, CA)
    …Understanding of fast, distributed storage systems like Lustre and GPFS for AI / HPC workloads + Familiarity with deep learning frameworks like PyTorch and ... parallel computing. Now, GPU deep learning is driving modern AI forward. Join our GPU AI / HPC...identify bottlenecks and opportunities for optimization, continuously improving the performance and cost-effectiveness of our AI computing… more
    NVIDIA (11/08/24)
  • Senior HPC and AI Networking…

    NVIDIA (Santa Clara, CA)
    …fit for you, we'd love to hear from you! NVIDIA is seeking a Senior High Performance Computing (HPC) and AI Networking Performance Research and Analysis ... In this exciting role, you will profile and analyze AI workloads on large GPUs and CPUs scale clusters...and platforms, such as HCAs, Switches, CPUs, GPUs, and Systems. You will develop performance analysis tools… more
    NVIDIA (12/07/24)
  • Senior Software Architect, AI

    NVIDIA (Santa Clara, CA)
    …group at NVIDIA has openings for software architects in the field of AI and high-performance networking and system software. We research, develop, and ... be doing + Creating proofs-of-concept to evaluate and motivate extensions in AI Frameworks (PyTorch/NEMO), HPC programming models (MPI, OpenSHMEM, PGAS), new… more
    NVIDIA (10/29/24)
  • AI / HPC Network Engineer

    Meta (Menlo Park, CA)
    …of RDMA workloads that expects a loss-less fabric interconnect. To enhance the performance of these systems, we continuously seek opportunities for improvement ... host networking, communication libraries, and scheduling infrastructure. **Required Skills:** AI / HPC Network Engineer Responsibilities: 1. Design, develop, test… more
    Meta (12/03/24)
  • Product Manager, AI / HPC

    Meta (Menlo Park, CA)
    Performance Compute organization, you will define and develop the compute, storage, and AI systems that are deployed across our global data center fleet and ... and requirements needed for future hardware platforms **Required Skills:** Product Manager, AI / HPC Responsibilities: 1. Establish a shared vision and strategy… more
    Meta (12/05/24)
  • Research Scientist, Systems ML…

    Meta (Menlo Park, CA)
    …Meta and externally. **Required Skills:** Research Scientist, Systems ML and HPC - SW/HW Co-Design Responsibilities: 1. Apply High-Performance Computing ( ... Performance team is dedicated to maximizing training performance of Generative AI and recommendation models...HPC) algorithms and techniques to optimize large-scale AI workloads 2. Analyze, benchmark, and optimize large-scale workloads… more
    Meta (12/03/24)
  • Senior Solutions Architect, HPC

    NVIDIA (Santa Clara, CA)
    …Machine Learning ecosystems. You'll be called on to help architect and scale high-performance, distributed AI infrastructure on-prem or in the cloud built with ... profilers/performance analysis tools (NSys). + Familiarity with NVIDIA systems/SDKs (eg CUDA), NVIDIA Networking technologies (eg, RoCE, InfiniBand), Switch… more
    NVIDIA (12/11/24)
  • Senior HPC Systems Engineer

    NVIDIA (Santa Clara, CA)
    …the world. We are looking for an outstanding engineer for a Senior HPC Systems Engineer role for at scale AI system performance and datacenter ... develop new, leading differentiated solutions. You will interact with HPC, OS, CPU and GPU compute, and systems...debugging and resolving critical software issues for the best AI workload performance at scale. + Specific… more
    NVIDIA (12/04/24)
  • Research Scientist Intern, Systems ML…

    Meta (Menlo Park, CA)
    …our research, visit https://ai.facebook.com. **Required Skills:** Research Scientist Intern, Systems ML and HPC - SW/HW Co-Design Responsibilities: 1. ... team's mission is to explore, develop and help productize high-performance software and hardware technologies for AI ...infrastructure.Meta is seeking Research Scientist Interns to join our AI & Systems Co-Design Training team to… more
    Meta (10/12/24)
  • Senior Software Architect - Deep Learning…

    NVIDIA (Santa Clara, CA)
    …vision? What you will be doing: + Investigate opportunities to improve communication performance by identifying bottlenecks in today's systems. + Design and ... implement new communication technologies to accelerate AI and HPC workloads. + Explore innovative solutions in HW and SW for our next generation platforms as… more
    NVIDIA (11/23/24)
  • Sr. Software Development Engineer, HPC/ML…

    Amazon (Cupertino, CA)
    HPC network fabric or machine learning accelerator cluster systems. Also applicable is experience high-frequency trading networking, high-speed wireless ... team focuses on building networking solutions that for Machine Learning (ML) and High-Performance Computing (HPC) workloads on AWS. Working at Annapurna Labs… more
    Amazon (12/20/24)
  • Senior HPC Architect

    NVIDIA (Santa Clara, CA)
    …improved workflows and develop new, leading differentiated solutions. You will interact with HPC, OS, GPU compute, and systems specialist to architect, develop ... parallel computing. More recently, GPU deep learning ignited modern AI - the next era of computing. NVIDIA is...looking for an outstanding hands-on architect/engineer for a Senior HPC architect role to support deployment and bringup of… more
    NVIDIA (11/23/24)
  • Senior Software Engineer - HPC

    NVIDIA (Santa Clara, CA)
    …long term maintenance strategy. What you'll be doing: + Design highly available and scalable systems to meet the demands of our HPC clusters + Evaluate new and ... graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI and enabled the next era of computing. NVIDIA is a "learning… more
    NVIDIA (12/05/24)
  • Senior Site Reliability Engineer - AI

    NVIDIA (Santa Clara, CA)
    …infrastructure. + Passion for solving complex technical challenges and optimizing system performance. + Experience with AI / HPC advanced job schedulers, and ... support operational and reliability aspects of large scale distributed systems with focus on performance at scale,...storage systems like Lustre and GPFS for AI / HPC workloads. + Familiarity with deep learning… more
    NVIDIA (12/25/24)
  • Distinguished Engineer, AI Resiliency Lead

    NVIDIA (Santa Clara, CA)
    …+ Hands-on involvement in the entire lifecycle-from design to deployment-of large-scale High-Performance Computing (HPC) systems. + Experience in ... architecture or related fields, with a deep understanding of AI-optimized systems. + Excellent and proven ability...hands-on experience in software development on high-complexity projects involving HPC or AI. Ways to Stand Out… more
    NVIDIA (10/23/24)