- NVIDIA (Santa Clara, CA)
- …fueled by great technology and amazing people. Today, we're tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU ... are responsible for the big picture of how our systems relate to each other, we use a breadth ... to both product quality and interesting dynamic day-to-day work. SRE's culture of diversity, intellectual curiosity, problem solving and…
- NVIDIA (Santa Clara, CA)
- …executives. Ways to Stand Out from the Crowd: + Experience with GPU-accelerated compute, HPC systems, or large-scale AI clusters. + Knowledge of Kubernetes ... at scale. If you are motivated by building foundational systems that enable large AI clusters to ... enablement, and release readiness. + Track trends in observability, SRE practices, distributed systems, and automated operations…
- Microsoft Corporation (Redmond, WA)
- …our infrastructure team. In this role, you'll blend software engineering and systems engineering to keep our large-scale distributed AI infrastructure reliable ... + **Reliability & Availability**: Ensure uptime, resiliency, and fault tolerance of AI model training and inference systems. + **Observability**: Design and…
- NVIDIA (Santa Clara, CA)
- …improved workflows and develop new, leading differentiated solutions. You will interact with HPC, OS, GPU compute, and systems specialists to architect, develop ... graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI - the next era of computing. NVIDIA is a "learning machine" that…
- NVIDIA (TX)
- …AppArmor, or SELinux). Ways To Stand Out from the Crowd: + HPC/AI Security: Experience securing high-performance computing environments, RDMA-based networks, or ... NVIDIA DGX Cloud is the AI supercomputing-as-a-service substrate designed to power the next ... massive-scale GPU clusters. You will design automated, resilient security systems that help ensure the integrity of our omni-cloud…
- NVIDIA (Santa Clara, CA)
- …into large‑scale telemetry systems. + Deep knowledge of AI/ML infrastructure, high‑performance computing (HPC), networking, and cloud technologies ... NVIDIA has become the platform upon which every new AI-powered application is built. From healthcare research applications to autonomous vehicles, or…