- NVIDIA (Santa Clara, CA)
- …analysis, optimization, and modeling to define the architecture and design of NVIDIA's DGX Cloud clusters. The ideal candidate will have a deep understanding of ... the methodology to conduct end-to-end performance analysis of critical AI applications running on large ... will work closely with the multi-functional teams to define DGX Cloud cluster architecture for different CSPs,…
- NVIDIA (Santa Clara, CA)
- …GPU deep learning. What you will be doing: + You will be part of a DGX Cloud team responsible for production systems that enable large, scalable GPU clusters to ... to ensure production AI clusters run reliably and consistently with maximum performance. Evaluating system failures and improving services based on a well-defined…
- NVIDIA (Santa Clara, CA)
- …database, capacity management, continuous delivery and deployment and open source cloud enabling technologies like Kubernetes and OpenStack. SRE at NVIDIA ensures ... that our internal and external-facing GPU cloud services run with maximum reliability and uptime as promised ... planning while keeping an eye on capacity, latency and performance. SRE is also a mindset and a set…
- NVIDIA (Santa Clara, CA)
- DGXC SRE at NVIDIA ensures that our internal and external-facing GPU cloud services run with maximum reliability and uptime as promised to the users and at the same time ... planning while keeping an eye on capacity, latency and performance. We are looking for systems and software engineers ... deploy, and run internal tooling built on top of cloud infrastructure to provide foundations for operational excellence. +…
- NVIDIA (Santa Clara, CA)
- …that automates GPU asset provisioning, configuration, and lifecycle management across cloud providers. + Design, develop, test, debug, and optimize creative ... solid understanding of Data Structure and Algorithms. + Understanding of performance, security and reliability in complex distributed systems. Familiarity with…
- Cisco (Research Triangle Park, NC)
- …hardware degradation or failure. * Develop solutions for supervising AI/ML model performance across DGX clusters, integrating logging and supervising for model ... projects simultaneously. * Experience with NVIDIA NGC (NVIDIA GPU Cloud) and DGX OS software stack for ... of NVIDIA Deep Learning frameworks (TensorFlow, PyTorch) and their performance optimization on DGX infrastructure. * Experience…
- NVIDIA (Santa Clara, CA)
- …System Software Engineer to help us build out our scientific computing platform on NVIDIA DGX Cloud. We are building a cloud-based accelerated scientific ... computing platform as a service on the NVIDIA DGX Cloud. This DGX scientific ... and actively engaged with operations to increase overall system performance. It spans the stack, e.g., deep understanding…
- NVIDIA (Santa Clara, CA)
- …involving GPU workloads, Kubernetes, InfiniBand, Ethernet, or other areas related to high-performance clusters and hybrid cloud solutions. + Exhibit hands-on ... NVIDIA Partner Network team, we are actively helping NVIDIA DGX and DGX SuperPOD solutions bring the ... help bring NVIDIA's premier technologies to life in the cloud and in the datacenter. + Our mission is…
- NVIDIA (Santa Clara, CA)
- …on offensive security efforts for our Data Center Systems, such as NVIDIA HGX, DGX, and MGX. What you'll be doing: + Identify vulnerabilities in our Data Center ... is leading the way in groundbreaking developments in Artificial Intelligence, High-Performance Computing and Visualization. The GPU, our invention, serves as the…
- NVIDIA (Santa Clara, CA)
- …crafting NVIDIA's GPUs and SoCs into groundbreaking platforms for autonomous machines, Cloud and Data Centers, Deep learning, High-Performance Computing, Gaming, ... improve the silicon validation process, which will help meet upcoming performance, adaptability, and safety industry standards. + Ensure interoperability with…
- NVIDIA (Santa Clara, CA)
- …supports NVIDIA's kernel-level drivers for supporting CUDA, especially on our AI, Cloud and Data Center product line-ups. This product line-up includes full system ... products such as the DGX platform, modular components such as MGX and stand-alone ... Windows-based operating systems (current and future) ensuring optimum performance and feature set + Focusing on cross-platform…
- NVIDIA (Santa Clara, CA)
- NVIDIA data center systems, such as DGX, MGX and HGX, have become core to NVIDIA's rapidly growing enterprise and cloud provider businesses. These platforms ... be the cross-section between execution and strategy, leading a team of Senior TPMs driving impactful programs and delivering measurable results across many functions…