- Jobright.ai (San Francisco, CA)
- Site Reliability Engineer - Inference role at Jobright.ai ... for building, testing, and deploying AI products at scale. The Site Reliability Engineer - Inference will work on developing a large-scale platform for… more
- Hamilton Barnes (San Francisco, CA)
- …to go for experimentation, full-scale model training, or inference. As a Platform Engineer/Senior Site Reliability Engineer, you'll own the ... and maintain large-scale GPU clusters (H100/H200/B200) for training and inference workloads. Build automation pipelines for provisioning, scaling, and monitoring… more
- Sierra Business Solution (San Francisco, CA)
- Software Engineer, Site Reliability (SRE) at Sierra Business Solution. About Us We are an in‑person ... and best practices. What You'll Bring 5+ years of hands‑on experience in Site Reliability or infrastructure engineering for complex SaaS or cloud‑based systems.… more
- Sierra (San Francisco, CA)
- …led the product and design teams for Google Workspace. What you'll do As a Software Engineer on our Site Reliability team at Sierra, you will be responsible ... the engineering org. What you'll bring 5+ years of hands‑on experience in Site Reliability or Infrastructure engineering roles for complex SaaS or cloud‑based… more
- Primer (San Francisco, CA)
- …, and incidents stay rare. That's where you come in. As our first dedicated Site Reliability Engineer, you'll be the force multiplier who designs, builds, ... whole team while keeping us four steps ahead of failure. YOUR MISSION Own reliability from design to customer. Define and uphold SLOs/SLIs, manage error budgets,… more
- xAI (San Francisco, CA)
- …able to concisely and accurately share knowledge with their teammates. About the role As a Site Reliability Storage Engineer, you will play a pivotal role in ... to manage our cutting-edge AI research data with unparalleled scalability and reliability across multiple regions. This role's core responsibility is to make sure… more
- Hamilton Barnes Associates Limited (San Francisco, CA)
- …supports everything from rapid experimentation to full-scale model training and inference, with flexible orchestration via Slurm, Kubernetes, or direct SSH access. ... and maintain large-scale GPU clusters (H100/H200/B200) for training and inference workloads. Build automation pipelines for provisioning, scaling, and monitoring… more
- Baseten (San Francisco, CA)
- …we're scaling our team to meet accelerating customer demand. The Role As a Site Reliability Engineer, you'll envision and build robust systems and ... About Baseten Baseten powers inference for the world's most dynamic AI companies, ... machine learning models. Establish standards and best practices for reliability and performance across the infrastructure. Automate processes when… more
- Chef Robotics (San Francisco, CA)
- …tech leaders from leading companies. About The Role As a Senior Software Engineer, Backend specializing in database architecture and AI systems, you will lead the ... and architecture for real-time robotics operations. As a senior engineer, you will mentor team members and drive technical ... data storage and retrieval systems for training datasets and inference results Design and implement systems to collect and… more
- Zyphra Technologies Inc. (Palo Alto, CA)
- …company based in Palo Alto, California. The Role: As a Machine Learning Data Engineer - Systems & Retrieval, you will build and optimize the data infrastructure ... role in architecting retrieval systems for LLMs and enabling scalable training and inference with clean, accessible, and secure data. You'll have an impact across… more
- SproutsAI (San Francisco, CA)
- Senior Software Engineer - Chennai (On-site) About the Role We're a fast-moving, top-tier ... We thrive in ambiguity, ship fast, and care deeply about reliability, security, and craftsmanship. You'll work end-to-end, from whiteboard to production, owning… more
- Amazon (San Francisco, CA)
- Machine Learning Engineer, Amazon General Intelligence (AGI) Job ID: 3003288 | Amazon.com Services LLC The Artificial General Intelligence (AGI) team is looking for ... a passionate, talented, and inventive Machine Learning Engineer (MLE) to play a pivotal role in the ... We leverage our hyper‑scalable, general‑purpose large model training and inference systems to develop and deploy cutting‑edge sensory AI… more
- NLP PEOPLE (San Francisco, CA)
- Machine Learning Engineer, Amazon General Intelligence (AGI) The Artificial General Intelligence (AGI) team is looking for a passionate, talented, and inventive ... Machine Learning Engineer (MLE) to play a pivotal role in the ... We leverage our hyper-scalable, general-purpose large model training and inference systems to develop and deploy cutting-edge sensory AI… more
- Blue Origin LLC (Seattle, WA)
- Senior Principal Software Engineer, Factory Automation AI. Locations: Seattle, WA: Space ... and maintain the infrastructure required for efficient data processing, model training, and inference at scale. Set up and manage monitoring systems to ensure the… more
- Cisco (San Francisco, CA)
- Machine Learning Engineer PhD (Full Time) - United States. Please note this posting is to advertise ... distillation, and generative adversarial networks (GANs). Performance, scalability, and reliability are front and center as models are trained, fine‑tuned,… more
- Expedia, Inc. (San Jose, CA)
- …career journey. We're building a more open world. Join us. Machine Learning Engineer III Introduction to the Team: Expedia Technology teams partner with our Product ... that drive loyalty and traveler satisfaction. We are seeking a Machine Learning Engineer III to join our Universal Messaging Platform team. This role will focus… more
- Blue Origin (Seattle, WA)
- …and maintain the infrastructure required for efficient data processing, model training, and inference at scale. Set up and manage monitoring systems to ensure the ... performance, reliability, and scalability of AI/ML models in production. Automate ... asylum. Compensation Range for: WA applicants is $237,413.00‑$332,377.50 Other site ranges may differ Culture Statement Don't meet all… more
- Cisco Systems (San Francisco, CA)
- …distillation, and generative adversarial networks (GANs). Performance, scalability, and reliability are front and center as models are trained, fine‑tuned, ... academic or professional projects. Preferred Qualifications Experience working with inference engines (e.g., vLLM, Triton, TorchServe). Knowledge of GPU architecture… more
- Amazon (Seattle, WA)
- …with popular ML frameworks like PyTorch and JAX enabling unparalleled ML inference and training performance. The Inference Enablement and Acceleration team ... scale large language models like the Llama family, DeepSeek and beyond. The Inference Enablement and Acceleration team works side by side with compiler engineers and… more