Performance Benchmarking Engineer Cluster Jobs

Performance Benchmarking…

Oracle (Seattle, WA)

…Design and code solutions for performance benchmarking. + Troubleshoot performance problems on RDMA clusters and perform cluster performance ... team strives to be the go-to experts on RDMA cluster architecture and its relationship to AI/ML/HPC performance...with 5+ years of relevant experience + Experience with benchmarking and troubleshooting or optimizing performance of… more

Oracle (11/25/25)

Senior HPC Cluster Engineer - EDA

NVIDIA (Santa Clara, CA)

…lasting impact on the world. We are seeking a highly skilled and experienced HPC Cluster Engineer to design, deploy, and operate GPU Compute Clusters for EDA and ... high-performance computing workloads used across multiple teams and projects....experience crafting and operating large scale compute infrastructure, including cluster configuration management tools such as BCM or Ansible.… more

NVIDIA (12/10/25)

Senior AI and ML HPC Cluster…

NVIDIA (Santa Clara, CA)

…breaking GPU compute clusters that run demanding deep learning, high performance computing, and computationally intensive workloads. We seek a technical leader ... compute, networking, and storage design for large scale, high performance workloads, effective resource utilization in a heterogeneous compute environment,… more

NVIDIA (10/19/25)

Senior AI-HPC Cluster Engineer…

NVIDIA (Santa Clara, CA)

…graphics. Design and implement GPU compute clusters for deep learning and high-performance computing. What you'll be doing: + Provide leadership and strategic ... user needs. + Support our researchers to run their workloads including performance analysis and optimizations. + Conduct root cause analysis and suggest corrective… more

NVIDIA (10/30/25)

Senior DGX Cloud Performance…

NVIDIA (Santa Clara, CA)

…performance and AI workloads on large scale systems + Experience with performance modeling and benchmarking at scale + Strong background in Computer ... seeking highly skilled Parallel and Distributed Systems engineers to drive the performance analysis, optimization, and modeling to define the architecture and design… more

NVIDIA (11/21/25)

Principal, Software Engineer - Cloud…

Walmart (Sunnyvale, CA)

…atop for advanced debugging. + Perform deep analysis of OSD, MON, MDS, RGW performance and optimize cluster parameters. + Debug network congestion, packet loss, ... hardware (NVMe SSDs, RDMA NICs, high-density HDDs) and their impact on storage performance. + Evaluate next-gen server SKUs, perform benchmarking, and compare… more

Walmart (11/20/25)

Senior DGX Cloud Performance…

NVIDIA (Santa Clara, CA)

…performance and AI workloads on large scale systems + Experience with performance modeling and benchmarking at scale + Strong background in Computer ... seeking highly skilled Parallel and Distributed Systems engineers to drive the performance analysis, optimization, and modeling to define the architecture and design… more

NVIDIA (10/22/25)

Senior MLOps Engineer, GenAI Framework

NVIDIA (Santa Clara, CA)

…cloud compute technologies, e.g., SLURM, Lustre, k8s + Software and hardware benchmarking on high-performance computing systems. #LI-Hybrid Your base salary will ... dedicated and motivated senior build and continuous integration (CI/CD) engineer for its GenAI Frameworks (Megatron-LM (https://github.com/NVIDIA/Megatron-LM) and NeMo… more

NVIDIA (10/15/25)

Principal / Sr. Principal HPC Network…

Northrop Grumman (Jessup, MD)

…making history. We are looking for you to join our team as a High-Performance Computing (HPC) Network Engineer based out of Annapolis Junction ... Responsibilities + Monitor and maintain performance of network within a high-performance compute cluster + Contribute to design of new high-… more

Northrop Grumman (12/05/25)

Senior Software Engineer - NIM Factory…

NVIDIA (Santa Clara, CA)

…Kubernetes deployment patterns for NIMs, including GPU scheduling, autoscaling, and multi-cluster rollouts. + Optimize container performance: layer layout, ... understanding different inference backends (vLLM, SGLang, TRT-LLM) + Background in benchmarking and optimizing inference container performance and startup… more

NVIDIA (12/11/25)

Senior Software Engineer - NIM Factory…

NVIDIA (Santa Clara, CA)

…Kubernetes deployment patterns for NIMs, including GPU scheduling, autoscaling, and multi-cluster rollouts. + Optimize container performance: layer layout, ... understanding different inference backends (vLLM, SGLang, TRT-LLM) + Background in benchmarking and optimizing inference container performance and startup… more

NVIDIA (09/19/25)

Staff Software Engineer, Level 6

Snap Inc. (Seattle, WA)

…coalescing, and slot-aware load balancing. + Implement robust failover, replication, and cluster topology management and optimize CPU performance, memory usage, ... privacy at the forefront. We're looking for a Software Engineer to join Snap Inc. on our Core Infrastructure...layers or custom client lib). + Develop and maintain high-performance caching proxies or client-side libraries for request… more

Snap Inc. (09/12/25)

Sr Manager, Cloud Infrastructure Engineer…

Pfizer (South San Francisco, CA)

…of production computing platforms. + Perform troubleshooting, system analysis, and benchmarking to resolve issues and maintain a high-performance environment. ... a hands-on approach to designing and delivering robust High Performance Computing (HPC) solutions supporting computational workloads across the organization.… more

Pfizer (12/03/25)

Member of Technical Staff, AI Networking

Microsoft Corporation (Mountain View, CA)

…Broadcom, and in-house silicon/network co-design teams + AI training + inference cluster bring-up, performance benchmarking, and root-cause analysis + ... Staff, AI Networking to design and scale the world's most advanced high-performance networks powering Copilot and next-generation AI systems. Join the team building… more

Microsoft Corporation (12/11/25)