- xAI (Palo Alto, CA)
- …team. In this role, you will ensure the reliability and performance of HPC infrastructure while collaborating with cross-functional teams to support AI ... A cutting-edge AI company in Palo Alto is seeking a Software Engineer to join its SuperComputing…
- Oracle (Seattle, WA)
- Sr Principal Software Engineer, Networking - AI Infrastructure Innovation. OCI (Oracle Cloud) AI Infrastructure Innovation team is pioneering the ... creation of next‑generation AI/HPC networking for GPU superclusters at massive scale. Our mission is to design and deliver state‑of‑the‑art RDMA‑based…
- Oracle (Seattle, WA)
- Senior Principal Software Engineer, Storage - AI Infrastructure Innovation. OCI (Oracle Cloud) AI Infrastructure Innovation team is inventing the next generation of storage…
- Hamilton Barnes Associates Limited (San Francisco, CA)
- …(Prometheus, Grafana, Loki) and incident response frameworks. Familiarity with high‑performance computing (HPC) or AI/ML training infrastructure at scale. ... This is a rare opportunity to work at the intersection of hyperscale infrastructure and AI, shaping the operational backbone of one of the largest GPU clusters…
- HeyGen (San Francisco, CA)
- …a highly motivated engineer with deep experience operating and optimizing AI infrastructure at scale. Bachelor's degree in Computer Science, Engineering, or ... low‑latency video generation. Responsibilities: You will be the core engineer responsible for building the robust, efficient, and scalable ... 5+ years of full‑time industry experience in large‑scale MLOps, AI infrastructure, or HPC systems…
- Crusoe (San Francisco, CA)
- …-first cloud infrastructure company. We're pioneering vertically integrated, purpose-built AI infrastructure solutions trusted by Fortune 500 companies to ... most advanced AI applications. Crusoe is redefining AI cloud infrastructure, with a mission to ... compute platforms, ensuring performance, security, and scale for modern AI and HPC workloads. What You'll Be…
- Crusoe (San Francisco, CA)
- …About This Role: At Crusoe, we are building the most sustainable, AI-first cloud infrastructure, and our Compute-focused Site Reliability Engineers are the ... Staff Site Reliability Engineer, Compute ... compute platforms, ensuring performance, security, and scale for modern AI and HPC workloads. What You'll Be…
- Hamilton Barnes (San Francisco, CA)
- …(Prometheus, Grafana, Loki) and incident response frameworks. Familiarity with high-performance computing (HPC) or AI/ML training infrastructure at scale. ... largest GPU clusters in private deployment. If you want to build and operate infrastructure for frontier AI workloads, automate systems at petascale, and be part…
- Menlo Ventures (Berkeley, CA)
- …they choose to live. We bring together engineers who love building core infrastructure, obsess over developer experience, and want to make complex systems scalable, ... observable, and reliable. Machine Learning Systems Engineer. Location: Remote (San Francisco Bay Area / North ... (CC‑0 Licensed) focused on democratizing access to cutting‑edge LLM infrastructure that combines training and inference in a unified…
- OpenAI (San Francisco, CA)
- …flows for physical synthesis, PNR, LEC and power estimation. Bonus: Experience with AI or HPC‑focused chips. Experience with optimizing PPA for high‑performance ... Physical Design Engineer role at OpenAI. About the Team: OpenAI's ... the custom silicon that powers the world's most advanced AI systems. From system‑level architecture to custom circuit implementations,…
- Fluidstack (San Francisco, CA)
- Information Security Engineer, Cloud. Fluidstack is building the infrastructure for abundant intelligence, partnering with top AI labs, governments, and ... in securing containerised workloads (Docker, Kubernetes). Experience securing high‑performance computing (HPC) or AI/ML workloads. Experience with a modern SIEM…
- OpenAI (San Francisco, CA)
- …flows for physical synthesis, PNR, LEC and power estimation. Bonus: Experience with AI or HPC‑focused chips. Experience with optimizing PPA for high‑performance ... team designs the custom silicon that powers the world's most advanced AI systems. From system‑level architecture to custom circuit implementations, we partner…
- Crusoe (San Francisco, CA)
- …Experience implementing disaster recovery strategies at scale. Familiarity with GPUs, HPC clusters, or large-scale AI/ML workloads. Benefits: Industry competitive ... powers a world where people can create ambitiously with AI - without sacrificing scale, speed, or sustainability. Be ... team that's setting the pace for responsible, transformative cloud infrastructure. About This Role: We are looking for a…
- OpenAI (San Francisco, CA)
- …experiments and products). About the Role: As a Training Performance Engineer, you'll drive efficiency improvements across our distributed training stack. You'll ... new model architectures scale efficiently during pre‑training. Contribute to infrastructure decisions that improve reliability and efficiency of large training…
- Acceler8 Talent (San Francisco, CA)
- …Learning Engineer to join a Stanford spin-out scale-up building a foundational infrastructure layer for AI inference. The team were founded on the back of a ... We are seeking an inference-focused Machine…
- Lightmatter (Boston, MA)
- Lightmatter is leading the revolution in AI data center infrastructure, enabling the next giant leaps in human progress. The company invented the world's first ... at the speed of light in extreme-scale data centers for the most advanced AI and HPC workloads. Lightmatter raised $400 million in its Series D round, reaching a…
- Cisco Systems (San Francisco, CA)
- …software, manufacturing, etc.). Why Cisco? At Cisco, we're revolutionizing how data and infrastructure connect and protect organizations in the AI era - and ... data, collaboration, web, Internet of Things, routing, switching, IPv6, data center, HPC, Telepresence and many more. Your work will impact billions globally. Supply…
- Anthropic (San Francisco, CA)
- …framework and software supply chain security standards. Experience securing large‑scale HPC or cloud infrastructure. Contributions to open‑source security projects ... technical designs, operational leadership, and vendor collaboration. Previous work with AI/ML infrastructure security. Deadline to apply: None. Applications will…
- NVIDIA (Santa Clara, CA)
- …that power some of the world's most advanced computing workloads. NVIDIA is looking for an AI/ML HPC Cluster Engineer to join our MARS team. You will provide ... Minimum 2 years of experience administering multi-node compute infrastructure. Background in managing AI/HPC job schedulers like Slurm, K8s, PBS, RTDA,…