AI HPC Infrastructure Engineer Jobs | Juju

HPC Reliability Engineer…

xAI (Palo Alto, CA)

…team. In this role, you will ensure the reliability and performance of HPC infrastructure while collaborating with cross-functional teams to support AI ... A cutting-edge AI company in Palo Alto is seeking a Software Engineer to join its SuperComputing… more

job goal (01/14/26)
Sr Principal Software Engineer, Networking…

Oracle (Seattle, WA)

Sr Principal Software Engineer, Networking - AI Infrastructure Innovation OCI (Oracle Cloud) AI Infrastructure Innovation team is pioneering the ... creation of next‑generation AI/HPC networking for GPU superclusters at massive scale. Our mission is to design and deliver state‑of‑the‑art RDMA‑based… more

job goal (01/14/26)
Senior Principal Software Engineer…

Oracle (Seattle, WA)

Senior Principal Software Engineer, Storage - AI Infrastructure Innovation Join to apply for the Senior Principal Software Engineer, Storage - AI ... Infrastructure Innovation role at Oracle. Job Description OCI (Oracle Cloud) AI Infrastructure Innovation team is inventing the next generation of storage… more

job goal (01/14/26)
Site Reliability Engineer (SRE) - AI…

Hamilton Barnes Associates Limited (San Francisco, CA)

…(Prometheus, Grafana, Loki) and incident response frameworks. Familiarity with high‑performance computing (HPC) or AI/ML training infrastructure at scale. ... This is a rare opportunity to work at the intersection of hyperscale infrastructure and AI, shaping the operational backbone of one of the largest GPU clusters… more

job goal (01/14/26)
Tech Lead, AI Compute Infrastructure

HeyGen (San Francisco, CA)

…a highly motivated engineer with deep experience operating and optimizing AI infrastructure at scale. Bachelor's degree in Computer Science, Engineering, or ... low‑latency video generation. Responsibilities You will be the core engineer responsible for building the robust, efficient, and scalable...5+ years of full‑time industry experience in large‑scale MLOps, AI infrastructure, or HPC systems… more

job goal (01/14/26)
Staff Site Reliability Engineer, Compute

Crusoe (San Francisco, CA)

…-first Cloud infrastructure company. We're pioneering vertically integrated, purpose-built AI infrastructure solutions trusted by Fortune 500 companies to ... most advanced AI applications. Crusoe is redefining AI cloud infrastructure, with a mission to...compute platforms, ensuring performance, security, and scale for modern AI and HPC workloads. What You'll Be… more

job goal (01/14/26)
Staff Site Reliability Engineer, Compute

Crusoe (San Francisco, CA)

…. About This Role At Crusoe, we are building the most sustainable, AI-first cloud infrastructure, and our Compute-focused Site Reliability Engineers are the ... Staff Site Reliability Engineer, Compute Join to apply for the Staff...compute platforms, ensuring performance, security, and scale for modern AI and HPC workloads. What You'll Be… more

job goal (01/14/26)
Site Reliability Engineer

Hamilton Barnes (San Francisco, CA)

…(Prometheus, Grafana, Loki) and incident response frameworks. Familiarity with high-performance computing (HPC) or AI/ML training infrastructure at scale. ... largest GPU clusters in private deployment. If you want to build and operate infrastructure for frontier AI workloads, automate systems at petascale, and be part… more

job goal (01/14/26)
Machine Learning Systems Engineer

Menlo Ventures (Berkeley, CA)

…they choose to live. We bring together engineers who love building core infrastructure, obsess over developer experience, and want to make complex systems scalable, ... observable, and reliable. Machine Learning Systems Engineer Location: Remote (San Francisco Bay Area / North...(CC‑0 Licensed) focused on democratizing access to cutting‑edge LLM infrastructure that combines training and inference in a unified… more

job goal (01/14/26)
Physical Design Engineer

OpenAI (San Francisco, CA)

…flows for physical synthesis, PNR, LEC and power estimation Bonus Experience with AI or HPC‑focused chips Experience with optimizing PPA for high‑performance ... Join to apply for the Physical Design Engineer role at OpenAI About The Team OpenAI's...the custom silicon that powers the world's most advanced AI systems. From system‑level architecture to custom circuit implementations,… more

job goal (01/14/26)
Information Security Engineer, Cloud

Fluidstack (San Francisco, CA)

Information Security Engineer, Cloud Fluidstack is building the infrastructure for abundant intelligence, partnering with top AI labs, governments, and ... in securing containerised workloads (Docker, Kubernetes). Experience securing high‑performance computing (HPC) or AI/ML workloads. Experience with a modern SIEM… more

job goal (01/14/26)
Physical Design Engineer

OpenAI (San Francisco, CA)

…flows for physical synthesis, PNR, LEC and power estimation Bonus: Experience with AI or HPC‑focused chips Experience with optimizing PPA for high performance ... team designs the custom silicon that powers the world's most advanced AI systems. From system‑level architecture to custom circuit implementations, we partner… more

job goal (01/14/26)
Senior Software Engineer

Crusoe (San Francisco, CA)

…Experience implementing disaster recovery strategies at scale Familiarity with GPUs, HPC clusters, or large-scale AI/ML workloads Benefits Industry competitive ... powers a world where people can create ambitiously with AI - without sacrificing scale, speed, or sustainability. Be...team that's setting the pace for responsible, transformative cloud infrastructure. About This Role: We are looking for a… more

job goal (01/14/26)
Training Performance Engineer

OpenAI (San Francisco, CA)

…experiments and products). About the Role As a Training Performance Engineer, you'll drive efficiency improvements across our distributed training stack. You'll ... new model architectures scale efficiently during pre‑training. Contribute to infrastructure decisions that improve reliability and efficiency of large training… more

job goal (01/14/26)
Machine Learning Engineer (Inference)

Acceler8 Talent (San Francisco, CA)

…Learning Engineer to join a Stanford spin out scale up building a foundational infrastructure layer for AI inference. The team were founded on the back of a ... Direct message the job poster from Acceler8 Talent Compilers, Kernels, Performance. Systems Software & HPC Recruiter We are seeking an Inference focussed Machine… more

job goal (01/14/26)
Staff Analog IC Design Engineer…

Lightmatter (Boston, MA)

Lightmatter is leading the revolution in AI data center infrastructure, enabling the next giant leaps in human progress. The company invented the world's first ... at the speed of light in extreme-scale data centers for the most advanced AI and HPC workloads. Lightmatter raised $400 million in its Series D round, reaching a… more

job goal (01/14/26)
Hardware Engineer I (Co-op) - United States

Cisco Systems (San Francisco, CA)

…software, manufacturing, etc.). Why Cisco? At Cisco, we're revolutionizing how data and infrastructure connect and protect organizations in the AI era - and ... data, collaboration, web, Internet of Things, routing, switching, IPv6, data center, HPC, Telepresence and many more. Your work will impact billions globally. Supply… more

job goal (01/14/26)
Systems Integrity Security Architect

Anthropic (San Francisco, CA)

…framework and software supply chain security standards Experience securing large‑scale HPC or cloud infrastructure Contributions to open‑source security projects ... technical designs, operational leadership, and vendor collaboration Previous work with AI/ML infrastructure security Deadline to apply: None. Applications will… more

job goal (01/14/26)
AI and ML HPC Cluster…

NVIDIA (Santa Clara, CA)

…that power some of the world's most advanced computing workloads. NVIDIA is looking for an AI/ML HPC Cluster Engineer to join our MARS team. You will provide ... + Minimum 2 years of experience administering multi-node compute infrastructure + Background in managing AI/HPC job schedulers like Slurm, K8s, PBS, RTDA,… more

NVIDIA (01/10/26)