LLM Distributed Training Engineer Jobs | Juju

Software Engineer III - Search…

JPMorgan Chase (Chicago, IL)

…contribute your skills in enabling enterprise-wide content discovery. As a Software Engineer III at JPMorgan Chase within the Employee Platforms, Enterprise Search ... and emerging technologies **Required qualifications, capabilities, and skills** + Formal training or certification on software engineering concepts and 3+ years… more

JPMorgan Chase (01/10/26)

Software Engineer II - AI/ML, AWS Neuron,…

Amazon (Cupertino, CA)

…ML frameworks like PyTorch and JAX enabling unparalleled ML inference and training performance. The Inference Enablement and Acceleration team is at the forefront ... to work at the intersection of machine learning, high-performance computing, and distributed architectures, where you'll help shape the future of AI acceleration… more

Amazon (11/27/25)

Lead Software Engineer - Full…

JPMorgan Chase (Plano, TX)

**Lead Software Engineer - Python/Java/AWS/Cloud - 603** **Organization Description** Our Consumer & Community Banking division serves our Chase customers through a ... an accommodation. We are seeking a highly skilled and innovative Lead Software Engineer with a strong focus on automation and AI solutioning. The ideal candidate… more

JPMorgan Chase (12/12/25)

Senior Principal Machine Learning Engineer…

Red Hat (Boston, MA)

…optimize, and scale LLM deployments. As a Machine Learning Engineer focused on distributed vLLM (https://github.com/vllm-project/) infrastructure in the ... to GenAI deployments. As leading developers, maintainers of the vLLM and LLM-D projects, and inventors of state-of-the-art techniques for model quantization and… more

Red Hat (01/08/26)

Sr. Software Engineer - AI/ML, AWS Neuron…

Amazon (Seattle, WA)

…as well as Stable Diffusion, Vision Transformers (ViT) and many more. The ML Distributed Training team works side by side with chip architects, compiler ... accelerators. This role is for a Senior Machine Learning Engineer in the Distributed Training team for ... engineers and runtime engineers to create, build and tune distributed training solutions with Trainium instances. Experience… more

Amazon (12/31/25)

Sr. Software Engineer - AI/ML, AWS Neuron…

Amazon (Cupertino, CA)

…as well as Stable Diffusion, Vision Transformers (ViT) and many more. The ML Distributed Training team works side by side with chip architects, compiler ... accelerators. This role is for a Senior Machine Learning Engineer in the Distributed Training team for ... engineers and runtime engineers to create, build and tune distributed training solutions with Trainium instances. Experience… more

Amazon (12/19/25)

Forward Deployed Engineer, AI Inference…

Red Hat (Sacramento, CA)

…directly with the engineering teams at our customer to deploy, optimize, and scale distributed Large Language Model (LLM) inference systems. You will solve " ... developer to join our team as a **Forward Deployed Engineer**. In this role, you will not just...-D engineering team. **What You Will Do** + **Orchestrate Distributed Inference**: Deploy and configure LLM-D… more

Red Hat (01/08/26)

Software Engineer, SystemML - Scaling…

Meta (Menlo Park, CA)

…to improve the full-stack distributed ML reliability and performance (e.g., large-scale GenAI/LLM training) from the trainer down to the inter-GPU and network ... NCCL has been integrated into PyTorch and is on the critical path of multi-GPU distributed training. In other words, nearly every distributed GPU-based ML… more

Meta (12/20/25)

Senior Research Engineer, Foundation Model…

NVIDIA (Santa Clara, CA)

…product roadmaps. What you will be doing: + Design and maintain large-scale distributed training systems to support multi-modal foundation models for robotics. + ... and AI infrastructure; + Proven experience designing and optimizing distributed training systems with frameworks like PyTorch, ... conception to deployment; + Strong experience at building large-scale LLM and multimodal LLM training… more

NVIDIA (12/05/25)

Senior Software Engineer, AI Platform

LinkedIn (Mountain View, CA)

…and resolve issues in popular libraries like Huggingface, Horovod and PyTorch, enable distributed training over 100s of billions of parameter models, debug and ... Online Learning and Serving performance optimizations across billions of user queries. Model Training Infrastructure: As an engineer on the AI Training… more

LinkedIn (12/05/25)

Software Engineer, AI Platform

LinkedIn (Mountain View, CA)

…and resolve issues in popular libraries like Huggingface, Horovod and PyTorch, enable distributed training over 100s of billions of parameter models, debug and ... Serving performance optimizations across billions of user queries. Model Training Infrastructure: As an engineer on the ... performance applications serving very large & complex models across LLM and Personalization models. As an engineer,… more

LinkedIn (10/21/25)

Senior Performance Engineer - AI Platforms

Red Hat (Boston, MA)

…The Red Hat Performance and Scale Engineering team is seeking a Senior Performance Engineer to join our PSAP (Performance and Scale for AI Platforms) team. In this ... role, you will drive the performance and scalability of distributed inference for Large Language Models (LLMs) as part ... for example. This is a dynamic role for a seasoned engineer with a growth mindset who handles and adapts… more

Red Hat (01/05/26)

Principal Staff Software Engineer, AI…

LinkedIn (Mountain View, CA)

…GNNs, Incremental Learning, Online Learning, and advanced LLM Agents work for Training infrastructure. As a Principal Staff Software Engineer on the AI ... problems. + Designing, implementing, and optimizing the performance of large-scale distributed training for personalized recommendation as well as large… more

LinkedIn (12/25/25)

Sr. Staff Software Engineer, AI Infra

LinkedIn (Mountain View, CA)

…and resolve issues in popular libraries like Huggingface, Horovod and PyTorch, enable distributed training over 100s of billions of parameter models, debug and ... Serving performance optimizations across billions of user queries. Model Training Infrastructure: As an engineer on the ... performance applications serving very large & complex models across LLM and Personalization models. As an engineer,… more

LinkedIn (12/27/25)

Senior Research Engineer/Scientist

ServiceNow, Inc. (Santa Clara, CA)

…and developing LLM-based features + Experience with methods of training and fine-tuning large language models, such as distillation, supervised fine-tuning and ... sunny San Diego, California in 2004 when a visionary engineer, Fred Luddy, saw the potential to transform how ... phases of Large Language Models development, including data curation, training, and evaluation. Our goal is to consistently enhance… more

ServiceNow, Inc. (12/17/25)

Research Engineer, Language - Generative…

Meta (Topeka, KS)

…is seeking a Research Engineer to join our Large Language Model (LLM) Research team. We conduct focused research and engineering to build state-of-the-art LLMs, ... experience in areas like language model evaluation; data processing for pre-training and fine-tuning; responsible LLMs; LLM alignment; reinforcement learning… more

Meta (12/20/25)

Senior HPC and AI Networking Performance Research…

NVIDIA (Santa Clara, CA)

…from the crowd: + In-depth knowledge and experience with AI workloads and benchmarking for distributed LLM training. + Knowledge in CUDA, and NCCL libraries. ... Computing (HPC) and AI Networking Performance Research and Analysis Engineer to join our Performance group. In this exciting ... workloads on large GPUs and CPUs scale clusters for distributed Deep Learning LLM training… more

NVIDIA (12/03/25)

Software Engineer III, Mobile, Android,…

Google (Mountain View, CA)

Software Engineer III, Mobile, Android, Google Maps Platform. Google, Mountain View, CA, USA **Mid** Experience driving progress, solving ... to SDKs or APIs for developers. + Experience working with Large Language Models (LLMs) or applied AI. + Experience with mapping technologies (e.g., Google Maps SDK,… more

Google (12/18/25)

Generative AI Engineer

The MITRE Corporation (McLean, VA)

…for expertise in areas such as LLMs, machine learning, model training and deployment, model evaluation, retrieval augmented generation (RAG) systems, GraphRAG, ... The position involves researching, developing, evaluating, and integrating GenAI and LLM capabilities. Specific responsibilities will include: + Work in and provide… more

The MITRE Corporation (01/12/26)

Senior ML Engineer (Internet Security)

Palo Alto Networks (Santa Clara, CA)

…**Your Career** You will build machine learning models and develop big data and distributed systems that use the models to analyze and categorize an enormous amount ... a security-sensitive environment. + Own the end-to-end lifecycle of ML and LLM components, from problem formulation and model development to production deployment,… more

Palo Alto Networks (01/10/26)