Senior HPC Cluster Systems Jobs

51 jobs (page 1)

Senior HPC Cluster…

Lawrence Berkeley National Laboratory (Berkeley, CA)

…Lab's (LBNL) Information Technology Division (IT) has an opening for a Senior HPC Cluster Systems Administrator to join their ScienceIT Team! In ... by building, integrating, and maintaining Linux-based resources, high-performance computing cluster systems, and Kubernetes clusters. This role provides… more

job goal (01/13/26)
Senior HPC Cluster Engineer…

NVIDIA Corporation (Santa Clara, CA)

Senior AI-HPC EDA Cluster ... leadership and strategic mentorship on the management of large-scale HPC systems including the deployment of compute, ... Senior AI-HPC EDA Cluster Engineer locations: US, CA, Santa Clara: US, TX, Austin: ... Experience analyzing and tuning performance for a variety of AI/HPC workloads. Excellent problem-solving to analyze complex systems… more

job goal (01/13/26)
Senior Solutions Architect, Cluster…

NVIDIA Corporation (Santa Clara, CA)

…to stand out from the crowd: Experience leading large-scale AI Factory or HPC cluster bring-ups or builds * Hands-on experience with NVIDIA networking products ... Senior Solutions Architect, Cluster Design and ... validation and troubleshooting * Proven expertise in designing large-scale distributed systems, AI clusters, or HPC infrastructure * Ability… more

job goal (01/13/26)
Senior Cluster Site Reliability…

The Voleon Group (Berkeley, CA)

…multibillion‑dollar asset manager, and we have ambitious goals for the future. As a Senior Cluster Site Reliability Engineer (SRE), you will help scale our ... research compute cluster to meet our growing needs, and you will ... in SRE or DevOps roles, preferably working as a senior engineer or tech lead. Knowledge of HPC… more

job goal (01/12/26)
Senior Solutions Architect, NVIDIA Cloud…

NVIDIA Corporation (Santa Clara, CA)

…ETH/IB networking components, storage, etc.) within extensive AI and HPC cluster settings. * Practical knowledge of NVIDIA systems technology such as NCCL, ... Senior Solutions Architect, NVIDIA Cloud Partners page is ... with partners and customers. * Experience crafting and deploying large-scale cluster environments. * Practical expertise in data center design, development… more

job goal (01/13/26)
Senior Engineering Manager - Accelerated…

Ring Inc (San Francisco, CA)

…networking, observability, security, disaster recovery, and cost management. Familiarity with HPC cluster management software such as Slurm. Familiarity with ... and retrieval workloads. Previous success managing engineering teams delivering production-grade, HPC-scale RAG systems. Deep understanding of infra domains:… more

job goal (01/12/26)
Senior Engineering Manager - Accelerated…

Ring Inc (Washington, DC)

…networking, observability, security, disaster recovery, and cost management. Familiarity with HPC cluster management software such as Slurm. Familiarity with ... and retrieval workloads. Previous success managing engineering teams delivering production‑grade, HPC‑scale RAG systems. Deep understanding of infra domains:… more

job goal (01/12/26)
Senior Software Architect - Data Center…

NVIDIA Corporation (Santa Clara, CA)

…disability status or any other characteristic protected by law. Similar Jobs (5) Senior Systems Software Engineer, Data Center locations 2 Locations time type ... Senior Software Architect - Data Center Systems ... systems, particularly at the SW/HW interface. Understanding of HPC or Deep learning workloads and use of accelerated… more

job goal (01/12/26)
Senior/Staff Backend Engineer…

Zettabyte (Palo Alto, CA)

…mindset-comfortable with ambiguity and rapid iteration Bonus qualifications GPU or HPC cluster management experience Understanding of ML/AI workload patterns ... world. Why this role exists We need a Backend Engineer to build the systems that orchestrate GPU clusters for AI workloads. You'll create APIs that handle GPU… more

job goal (01/13/26)
Storage Engineer

Slope (Miami, FL)

…8+ years of progressive, hands‑on experience designing and implementing high-performance storage systems for compute clusters in HPC, AI, or bare‑metal cloud ... lead the architecture, development, and deployment of our next-generation AI/HPC storage platform. The role: As a Storage Engineer, ... Lustre, Spectrum Scale, or similar) supporting GPU or AI cluster workloads. Solid foundation in Linux systems… more

job goal (01/13/26)
Storage Engineer

Hydra Host, Inc. (Miami, FL)

…8+ years of progressive, hands-on experience designing and implementing high-performance storage systems for compute clusters in HPC, AI, or bare-metal cloud ... (WekaIO, BeeGFS, Lustre, Spectrum Scale, or similar) supporting GPU or AI cluster workloads. Solid foundation in Linux systems engineering, automation, and… more

job goal (01/13/26)
Senior Inference Platform Engineer - Data…

Hamilton Barnes Associates Limited (San Francisco, CA)

…systems. Requirements 5+ years' experience building large-scale, fault-tolerant distributed systems (ML inference, HPC, or similar). Proficiency in Python, ... multi-cluster environments. Contributions to open-source ML or inference systems projects. Proven track record of cost optimisation in high-performance compute… more

job goal (01/13/26)
Senior / Staff Site Reliability Engineer

Fluidstack (San Francisco, CA)

…infrastructure. We treat our customers' outcomes as our own, taking pride in the systems we build and the trust we earn. If you're motivated by purpose, obsessed ... join us in building what's next. About the Role Senior / Staff SREs at Fluidstack sit at the ... networking, platform engineering, and data center operations to build systems that scale with the demands of AI workloads.… more

job goal (01/12/26)
Senior Cloud Services Software Engineer

Promote Project (Santa Clara, CA)

…are seeking a distributed software engineer to join our team! As a Senior engineer, you'll be instrumental in developing and optimizing AI infrastructure services to ... on: Developing solutions at the intersection of machine learning, distributed systems, and high-performance computing, supplying to the advancement of AI… more

job goal (01/12/26)
Senior Software SDET Test Development…

NVIDIA Corporation (Santa Clara, CA)

…GPU Computing. We are passionate about markets include gaming, automotive, vision, HPC, datacenters and networking in addition to our traditional OEM business. ... integration, strong Linux experience, reliability testing with various telemetries, scale out cluster, test plan development, track record in developing AI tools and… more

job goal (01/12/26)
Site Reliability Engineer, AI/ML Infrastructure

Boson AI (Palo Alto, CA)

…technologies as we continue to scale. Responsibilities Manage and optimize HPC cluster operations Deploy and maintain infrastructure‑as‑code solutions Support ... About The Role We're looking for a Senior Site Reliability Engineer to help us run ... Minimum Qualifications 5+ years of experience in SRE or HPC operations. Proficiency in Linux systems administration… more

job goal (01/13/26)
Software Engineer (C++ Systems)

Recruiting From Scratch (San Francisco, CA)

…to oversubscription, checkpointing, or distributed compute scheduling. Background in HPC, storage systems, virtualization, or cloud infrastructure. Experience ... and candidates. https://www.recruitingfromscratch.com/ Title of Role: Software Engineer (C++ Systems) Location: San Francisco, CA (On-site) Company Stage of Funding:… more

job goal (01/13/26)
Director, IT - EDA Compute Platforms, Intelligent…

Qualcomm (San Diego, CA)

…strategy and end‑to‑end operation of our global EDA engineering compute estate, covering HPC grids, intelligent job & license scheduling, utilization analytics, ... to reduce denials, increase throughput and fairness, and simplify cross‑cluster feature management. Telemetry & observability. Build a converged metrics/logs/traces… more

job goal (01/12/26)
System Engineer, GPU Fleet

Fluidstack (Seattle, WA)

…architecture, CUDA toolkit, GPU drivers, monitoring tools (nvidia-smi, DCGM) Experience with HPC cluster management, job schedulers (Slurm, PBS, LSF), and ... customers' outcomes as our own, taking pride in the systems we build and the trust we earn. If ... practical experience) 3+ years (System Engineer) or 5+ years (Senior System Engineer) in Linux system administration, datacenter operations,… more

job goal (01/13/26)
Staff Network Validation Engineer

Support Revolution (San Jose, CA)

…for Data Center, Cloud Computing, Enterprise IT, Hadoop/Big Data, Hyperscale, HPC and IoT/Embedded customers worldwide. We are the #5 fastest growing company ... us. Job Summary: Supermicro is looking for a passionate senior network validation engineer in our San Jose office ... work with cutting-edge technology, providing expert guidance on AI cluster networking. This role supports internal lab and rack… more

job goal (01/12/26)