• Senior DGX Cloud

    NVIDIA (Santa Clara, CA)
    … analysis, optimization, and modeling to define the architecture and design of NVIDIA's DGX Cloud clusters. The ideal candidate will have a deep understanding of ... the methodology to conduct end-to-end performance analysis of critical AI applications running on large ... will work closely with the multi-functional teams to define DGX Cloud cluster architecture for different CSPs, …
    NVIDIA (11/07/24)
  • Senior Software Engineer, Kubernetes…

    NVIDIA (Santa Clara, CA)
    …GPU deep learning. What you will be doing: + You will be part of a DGX Cloud team responsible for production systems that enable large, scalable GPU clusters to ... to ensure production AI clusters run reliably and consistently with maximum performance. Evaluating system failures and improving services based on a well-defined…
    NVIDIA (11/23/24)
  • Senior Site Reliability Engineer…

    NVIDIA (Santa Clara, CA)
    …database, capacity management, continuous delivery and deployment, and open-source cloud-enabling technologies like Kubernetes and OpenStack. SRE at NVIDIA ensures ... that our internal and external-facing GPU cloud services run with maximum reliability and uptime as promised ... planning while keeping an eye on capacity, latency, and performance. SRE is also a mindset and a set…
    NVIDIA (11/23/24)
  • Senior Software Engineer, Reliability…

    NVIDIA (Santa Clara, CA)
    DGXC SRE at NVIDIA ensures that our internal and external-facing GPU cloud services run with maximum reliability and uptime as promised to users, and at the same time ... planning while keeping an eye on capacity, latency, and performance. We are looking for systems and software engineers ... deploy, and run internal tooling built on top of cloud infrastructure to provide foundations for operational excellence. + …
    NVIDIA (09/25/24)
  • Senior System Software Engineer,…

    NVIDIA (Santa Clara, CA)
    …that automates GPU asset provisioning, configuration, and lifecycle management across cloud providers. + Design, develop, test, debug, and optimize creative ... solid understanding of data structures and algorithms. + Understanding of performance, security, and reliability in complex distributed systems. Familiarity with…
    NVIDIA (10/22/24)
  • Site Reliability Engineer

    Cisco (Research Triangle Park, NC)
    …hardware degradation or failure. * Develop solutions for monitoring AI/ML model performance across DGX clusters, integrating logging and monitoring for model ... projects simultaneously. * Experience with NVIDIA NGC (NVIDIA GPU Cloud) and the DGX OS software stack for ... of NVIDIA Deep Learning frameworks (TensorFlow, PyTorch) and their performance optimization on DGX infrastructure. * Experience…
    Cisco (11/14/24)
  • Senior System Software Engineer…

    NVIDIA (Santa Clara, CA)
    …System Software Engineer to help us build out our scientific computing platform on NVIDIA DGX Cloud. We are building a cloud-based accelerated scientific ... computing platform as a service on NVIDIA DGX Cloud. This DGX scientific ... and actively engaged with operations to increase overall system performance; it spans across the stack, e.g., deep understanding…
    NVIDIA (09/10/24)
  • Senior Solutions Architect, NPN

    NVIDIA (Santa Clara, CA)
    …involving GPU workloads, Kubernetes, InfiniBand, Ethernet, or other areas related to high-performance clusters and hybrid cloud solutions. + Exhibit hands-on ... NVIDIA Partner Network team, we are actively helping NVIDIA DGX and DGX SuperPOD solutions bring the ... help bring NVIDIA's premier technologies to life in the cloud and in the datacenter. + Our mission is…
    NVIDIA (09/18/24)
  • Senior Offensive Security Engineer - Data…

    NVIDIA (Santa Clara, CA)
    …on offensive security efforts for our Data Center Systems, such as NVIDIA HGX, DGX, and MGX. What you'll be doing: + Identify vulnerabilities in our Data Center ... is leading the way in groundbreaking developments in Artificial Intelligence, High-Performance Computing and Visualization. The GPU, our invention, serves as the…
    NVIDIA (11/24/24)
  • Senior PCIe and Validation Engineer

    NVIDIA (Santa Clara, CA)
    …crafting NVIDIA's GPUs and SoCs into groundbreaking platforms for autonomous machines, Cloud and Data Centers, Deep learning, High-Performance Computing, Gaming, ... improve the silicon validation process, which will help meet upcoming performance, adaptability, and safety industry standards. + Ensure interoperability with…
    NVIDIA (11/12/24)
  • Senior CUDA Compute Systems Software…

    NVIDIA (Santa Clara, CA)
    …supports NVIDIA's kernel-level drivers for supporting CUDA, especially on our AI, Cloud and Data Center product line-ups. This product line-up includes full system ... products such as the DGX platform, modular components such as MGX and stand-alone ... Windows-based operating systems (current and future), ensuring optimum performance and feature set + Focusing on cross-platform…
    NVIDIA (11/15/24)
  • Manager, Software Technical Program Management…

    NVIDIA (Santa Clara, CA)
    NVIDIA data center systems, such as DGX, MGX and HGX, have become core to NVIDIA's rapidly growing enterprise and cloud provider businesses. These platforms ... be the cross-section between execution and strategy, leading a team of Senior TPMs driving impactful programs and delivering measurable results across many functions…
    NVIDIA (11/14/24)