- NVIDIA (Santa Clara, CA)
- We are now looking for a TensorRT-LLM Software Development Engineer! NVIDIA is hiring software engineers for its TensorRT-LLM team. Academic and ... core backend software for LLM inference. + Improve the usability of the TensorRT-LLM library and build systems (CMake) What we need to see: + Masters or…
- NVIDIA (Santa Clara, CA)
- We are now looking for a Senior Deep Learning Software Engineer, LLM Performance! NVIDIA is seeking an experienced Deep Learning Engineer passionate ... learning community to implement the latest algorithms for public release in TensorRT LLM, vLLM, SGLang and LLM benchmarks. Identify performance opportunities…
- NVIDIA (Santa Clara, CA)
- …deployment of cutting-edge LLM workloads. We are seeking a Principal Systems Engineer to define the vision and roadmap for memory management of large-scale ... large-scale LLM inference. + Architect and implement deep integrations with leading LLM serving engines (such as vLLM, SGLang, TensorRT-LLM), with a…
- Amazon (Cupertino, CA)
- …responsible for development, enablement and performance tuning of a wide variety of LLM model families, including massive scale large language models like the Llama ... we're building an environment that celebrates knowledge-sharing and mentorship. Our senior members enjoy one-on-one mentoring and thorough, but kind, code reviews.…
- NVIDIA (Santa Clara, CA)
- …and streamlined deployment strategies with open-sourced inference frameworks. Seeking a Senior Deep Learning Algorithms Engineer to improve innovative LLMs, ... (TensorRT Model Optimizer, Megatron-LM, Megatron-Bridge, Nvidia-NeMo, NeMo-AutoModel, TensorRT-LLM) and open-source frameworks (PyTorch, Hugging Face, vLLM,…
- NVIDIA (Santa Clara, CA)
- …on pre-optimized inference engines from NVIDIA and the community, including NVIDIA TensorRT and TensorRT-LLM, NIM microservices optimize response latency ... The NeMo Retriever team is looking for an AI Engineer to join our team, focusing on the intersection...deployments, etc. + Familiarity with ML libraries, especially PyTorch, TensorRT, or TensorRT-LLM. + Excellent…
- NVIDIA (Santa Clara, CA)
- We are now looking for a Senior DL Algorithms Engineer! We are seeking a highly skilled Deep Learning Algorithms Engineer with hands-on experience optimizing ... deploy, and optimize models for efficient inference using frameworks such as TensorRT, TensorRT-LLM, vLLM, and SGLang. + Understand, analyze, profile, and…
- NVIDIA (Santa Clara, CA)
- We are now looking for a Senior DL Algorithms Engineer! We are seeking a highly skilled Deep Learning Algorithms Engineer with hands-on experience optimizing ... inference. + Convert and deploy models using frameworks such as TensorRT and TensorRT-LLM + Understand, analyze, profile, and optimize performance of…
- NVIDIA (Santa Clara, CA)
- …Today, we are increasingly known as "the AI computing company." We are seeking a Senior Staff Machine Learning Engineer to join our Enterprise AI team and build ... frameworks such as PyTorch or TensorFlow; familiarity with CUDA-accelerated libraries (e.g., TensorRT-LLM) is a plus. + Proven track record to take a significant…
- Red Hat (Boston, MA)
- **About the Job** The Red Hat Performance and Scale Engineering team is seeking a Senior Performance Engineer to join our PSAP (Performance and Scale for AI ... for example. This is a dynamic role for a seasoned engineer with a growth mindset who handles and adapts...PyTorch Profiler, among others + Hands-on experience with modern LLM inference server stacks (e.g., vLLM, TensorRT -…
- Cadence Design Systems, Inc. (San Jose, CA)
- …quantization, distillation, and using high-performance serving frameworks (e.g., vLLM, TGI, TensorRT-LLM) to maximize inference throughput and minimize latency. + ... implementing CI/CD pipelines for AI model development. + Advanced LLM Deployment & Optimization: Lead the deployment, serving, and...AI infrastructure. Proven track record as a Principal or Senior Staff Engineer. + Expert-level knowledge of…
- NVIDIA (Santa Clara, CA)
- …and see how you can make a lasting impact on the world. NVIDIA is seeking a Senior Software Engineer to serve as a Tech Lead, driving the design and delivery of ... Experience developing for GPU platforms and familiarity with NVIDIA technologies (e.g., CUDA, TensorRT, Triton, NeMo) and LLM serving frameworks (e.g., Dynamo, vLLM,…
- NVIDIA (Santa Clara, CA)
- …agents, or ML algorithms. + Experience with NVIDIA AI platforms: NeMo, NIMs, TensorRT-LLM, RAPIDS. NVIDIA offers competitive salaries and a generous benefits ... KGMON (Kaggle Grandmasters of NVIDIA) team is seeking a Senior Applied Research Scientist passionate about AI agents with...crowd: + Kaggle Grandmaster status with gold medals in NLP/LLM contests, or several top-10 placements in prominent ML…
- NVIDIA (CA)
- …you can make a lasting impact on the world. We are now looking for a Senior System Software Engineer to work on user-facing tools for Dynamo Inference Server! ... the crowd: + Experience with inference-serving frameworks (e.g., Dynamo Inference Server, TensorRT, ONNX Runtime) and deploying/managing LLM inference pipelines at…
- Amazon (Seattle, WA)
- …responsible for development, enablement and performance tuning of a wide variety of LLM model families, including massive scale large language models like the Llama ... will: * work with state-of-the-art LLMs, open-source and internal LLM families, large-scale performance and benchmark evaluations, etc., * develop and performance…
- Microsoft Corporation (Redmond, WA)
- …develop novel quantization and numerics kernels to enable efficient deployment of LLM inference and training in Microsoft's Azure production environments. + Drive ... of quantized models. + Analyze performance bottlenecks in quantized state-of-the-art LLM architectures and drive performance improvements. + Prototype and evaluate…
- Palo Alto Networks (Santa Clara, CA)
- …while ensuring a formidable security posture from development through runtime. As a Senior Principal Machine Learning Engineer, you will drive research on ... mechanisms and related knowledge is a plus. + Demonstrated expertise with modern LLM inference engines (e.g., vLLM, SGLang, TensorRT-LLM) is required.…
- NVIDIA (Santa Clara, CA)
- …full-stack software ecosystem to power AI at scale. We are looking for a Senior Technical Marketing Engineer to join our growing accelerated computing product ... TensorFlow, JAX), and inference-specific frameworks & optimizations (Triton Inference Server, TensorRT-LLM, vLLM, SGLang). + Market Awareness - Experience…
- Steampunk (McLean, VA)
- …to design, implement, and maintain production-grade large-language-model (LLM) pipelines, deployment architectures, and monitoring systems across enterprise ... environments. The Senior LLMOps Engineer will play a critical...models using frameworks such as Hugging Face Transformers, vLLM, TensorRT-LLM, or similar. + Proficiency in Python…
- CAE USA INC (Arlington, TX)
- …and having fun! Summary We are seeking a highly skilled and experienced Machine Learning Engineer to join our growing AI & Data Science team in R&D. This role is ... scalable and efficient model serving infrastructure using tools like ONNX, TensorRT, DeepSpeed, or vLLM. + Implement retrieval-augmented generation (RAG) pipelines…