- NVIDIA (Santa Clara, CA)
- …streamlined deployment strategies with open-sourced inference frameworks. Seeking a Senior Deep Learning Algorithms Engineer to improve innovative generative ... and diffusion models. In this role, you will design, implement, and productionize model optimization algorithms for inference and deployment on NVIDIA's latest… more
- Amazon (Seattle, WA)
- …with popular ML frameworks like PyTorch and JAX enabling unparalleled ML inference and training performance. The Inference Enablement and Acceleration team ... culture. The team works closely with customers on their model enablement, providing direct support and optimization expertise to...models like the Llama family, DeepSeek and beyond. The Inference Enablement and Acceleration team works side by side… more
- Amazon (Cupertino, CA)
- …with popular ML frameworks like PyTorch and JAX enabling unparalleled ML inference and training performance. The Inference Enablement and Acceleration team ... culture. The team works closely with customers on their model enablement, providing direct support and optimization expertise to...models like the Llama family, DeepSeek and beyond. The Inference Enablement and Acceleration team works side by side… more
- MongoDB (Palo Alto, CA)
- **About the Role** We're looking for a Senior Engineer to help build the next-generation inference platform that supports embedding models used for semantic ... with Atlas and designed for developer-first experiences. As a Senior Engineer, you'll focus on building core systems and services that power model inference at scale. You'll own key… more
- NVIDIA (CA)
- …how you can make a lasting impact on the world. We are now looking for a Senior System Software Engineer to work on user facing tools for Dynamo Inference ... data scientists. What you'll be doing: + Build and maintain distributed model management systems, including Rust-based runtime components, for large-scale AI … more
- Red Hat (Boston, MA)
- …bring the power of open-source LLMs and vLLM to every enterprise. Red Hat Inference team accelerates AI for the enterprise and brings operational simplicity to GenAI ... the vLLM and LLM-D projects, and inventors of state-of-the-art techniques for model quantization and sparsification, our team provides a stable platform for… more
- NVIDIA (Santa Clara, CA)
- …leads the AI revolution. What you will be doing: + Implement language and multimodal model inference as part of NVIDIA Inference Microservices (NIMs). + ... We are now looking for a Senior DL Algorithms Engineer! NVIDIA is...bugs and deliver production code to TRT-LLM, NVIDIA's open-source inference serving library. + Profile and analyze bottlenecks across… more
- NVIDIA (Santa Clara, CA)
- NVIDIA seeks a Senior Software Engineer specializing in Deep Learning Inference for our growing team. As a key contributor, you will help design, build, and ... frameworks, which are at the forefront of efficient large-scale model serving and inference. You will play...are growing fast. If you're a creative and autonomous engineer with a genuine passion for technology, we want… more
- NVIDIA (Santa Clara, CA)
- NVIDIA seeks a Senior Software Engineer specializing in Deep Learning Inference for our growing team. As a key contributor, you will help design, build, and ... SGLang and vLLM, which are at the forefront of efficient large-scale model serving and inference. You will play a central role in improving these platforms,… more
- Bank of America (Addison, TX)
- Senior Engineer -AI Inference Addison, Texas; Plano, Texas; Newark, Delaware; Charlotte, North Carolina; Kennesaw, Georgia **To proceed with your application, ... must be at least 18 years of age.** Acknowledge (https://ghr.wd1.myworkdayjobs.com/Lateral-US/job/Addison/Senior-Engineer-AI-Inference\_25029879) **Job Description:** At Bank… more
- Red Hat (Boston, MA)
- …for enterprises to build, optimize, and scale LLM deployments. We are seeking an experienced Senior ML Ops engineer to work closely with our product and research ... open-source LLMs and vLLM to every enterprise. Red Hat Inference team accelerates AI for the enterprise and brings...deep learning products and software. As an ML Ops engineer, you will work closely with our technical and… more
- Red Hat (Boston, MA)
- …bring the power of open-source LLMs and vLLM to every enterprise. Red Hat Inference team accelerates AI for the enterprise and brings operational simplicity to GenAI ... maintainers of the vLLM project, and inventors of state-of-the-art techniques for model quantization and sparsification, our team provides a stable platform for… more
- quadric.io, Inc (Burlingame, CA)
- …model deployment for efficient inference; [3] profile and benchmark the model performance. This senior technical role demands deep knowledge of AI ... conventional C++ DSP and control code. Role: The AI Inference Engineer in Quadric is the key...Electric Engineering. + 5+ years of experience in AI/LLM model inference and deployment frameworks/tools + experience… more
- Amazon (Seattle, WA)
- …The Neuron Inference Technology team works side by side with the Inference Model Enablement, compiler runtime engineers to create, build and tune ... that use them. This role is for a software engineer in the Machine Learning Applications (ML Apps) team...and performance tunes building blocks for all key ML model families, including Llama3, GPT OSS, Qwen3, DeepSeek and… more
- Amazon (Seattle, WA)
- …and the Trn1 and Inf1 servers that use them. This role is for a software engineer in the Machine Learning Applications (ML Apps) team for AWS Neuron. This role is ... and performance tuning of a wide variety of ML model families, including massive scale large language models like...and runtime engineers to create, build and tune distributed inference solutions with Trn1. Experience optimizing inference … more
- NVIDIA (Durham, NC)
- …Senior Deep Learning Inference Performance Architect! NVIDIA is seeking a Senior Performance Architect - a creative engineer who loves to squeeze out ... to extend the state of the art in AI Inference performance and efficiency + Model, analyze and prototype key deep learning algorithms and applications… more
- Amazon (Seattle, WA)
- …cloud-scale machine learning accelerators. This role is for a senior software engineer in the Machine Learning Inference Applications team. This role is ... for development and performance optimization of core building blocks of LLM Inference - Attention, MLP, Quantization, Speculative Decoding, Mixture of Experts, etc.… more
- Red Hat (Raleigh, NC)
- …serve models, and deliver innovative apps. The OpenShift AI team seeks a Software Engineer with Kubernetes and Model Inference Runtimes experience to join ... packaging, such as PyPI libraries + Solid understanding of the fundamentals of model inference architectures + Experience with Jenkins, Git, shell scripting, and… more
- NVIDIA (Santa Clara, CA)
- …and model post-training. + Deep understanding of distributed systems for large-scale model inference and serving. Your base salary will be determined based ... We are now looking for a Senior DL Algorithms Engineer! We are...programming skills in Python and C++. + Experience with model quantization and modern inference optimization techniques… more
- Google (Sunnyvale, CA)
- …like PyTorch or JAX. + 3 years of experience in software development for machine learning model inference or machine learning model training, and 1 year of ... Senior Software Engineer, Machine Learning, Kernel...experience with ML model inference and training optimization on modern GPU/TPU architectures. **Preferred… more