- JPMorgan Chase (Chicago, IL)
- …contribute your skills in enabling enterprise-wide content discovery. As a Software Engineer III at JPMorgan Chase within the Employee Platforms, Enterprise Search ... and emerging technologies **Required qualifications, capabilities, and skills** + Formal training or certification on software engineering concepts and 3+ years… more
- Amazon (Cupertino, CA)
- …ML frameworks like PyTorch and JAX enabling unparalleled ML inference and training performance. The Inference Enablement and Acceleration team is at the forefront ... to work at the intersection of machine learning, high-performance computing, and distributed architectures, where you'll help shape the future of AI acceleration… more
- JPMorgan Chase (Plano, TX)
- **Lead Software Engineer - Python/Java/AWS/Cloud - 603** **Organization Description** Our Consumer & Community Banking division serves our Chase customers through a ... an accommodation. We are seeking a highly skilled and innovative Lead Software Engineer with a strong focus on automation and AI solutioning. The ideal candidate… more
- Red Hat (Boston, MA)
- …optimize, and scale LLM deployments. As a Machine Learning Engineer focused on distributed vLLM (https://github.com/vllm-project/) infrastructure in the ... to GenAI deployments. As leading developers, maintainers of the vLLM and LLM-D projects, and inventors of state-of-the-art techniques for model quantization and… more
- Amazon (Seattle, WA)
- …as well as Stable Diffusion, Vision Transformers (ViT) and many more. The ML Distributed Training team works side by side with chip architects, compiler ... accelerators. This role is for a Senior Machine Learning Engineer in the Distributed Training team for ... engineers and runtime engineers to create, build and tune distributed training solutions with Trainium instances. Experience… more
- Amazon (Cupertino, CA)
- …as well as Stable Diffusion, Vision Transformers (ViT) and many more. The ML Distributed Training team works side by side with chip architects, compiler ... accelerators. This role is for a Senior Machine Learning Engineer in the Distributed Training team for ... engineers and runtime engineers to create, build and tune distributed training solutions with Trainium instances. Experience… more
- Red Hat (Sacramento, CA)
- …directly with the engineering teams at our customer to deploy, optimize, and scale distributed Large Language Model (LLM) inference systems. You will solve " ... developer to join our team as a **Forward Deployed Engineer**. In this role, you will not just...-D engineering team. **What You Will Do** + **Orchestrate Distributed Inference**: Deploy and configure LLM-D… more
- Meta (Menlo Park, CA)
- …to improve the full-stack distributed ML reliability and performance (e.g., large-scale GenAI/LLM training) from the trainer down to the inter-GPU and network ... NCCL has been integrated into PyTorch and is on the critical path of multi-GPU distributed training. In other words, nearly every distributed GPU-based ML… more
- NVIDIA (Santa Clara, CA)
- …product roadmaps. What you will be doing: + Design and maintain large-scale distributed training systems to support multi-modal foundation models for robotics. + ... and AI infrastructure; + Proven experience designing and optimizing distributed training systems with frameworks like PyTorch, ... conception to deployment; + Strong experience at building large-scale LLM and multimodal LLM training… more
- LinkedIn (Mountain View, CA)
- …and resolve issues in popular libraries like Huggingface, Horovod and PyTorch, enable distributed training of models with hundreds of billions of parameters, debug and ... Online Learning and Serving performance optimizations across billions of user queries. Model Training Infrastructure: As an engineer on the AI Training… more
- LinkedIn (Mountain View, CA)
- …and resolve issues in popular libraries like Huggingface, Horovod and PyTorch, enable distributed training of models with hundreds of billions of parameters, debug and ... Serving performance optimizations across billions of user queries. Model Training Infrastructure: As an engineer on the ... performance applications serving very large & complex models across LLM and Personalization models. As an engineer,… more
- Red Hat (Boston, MA)
- …The Red Hat Performance and Scale Engineering team is seeking a Senior Performance Engineer to join our PSAP (Performance and Scale for AI Platforms) team. In this ... role, you will drive the performance and scalability of distributed inference for Large Language Models (LLMs) as part ... for example. This is a dynamic role for a seasoned engineer with a growth mindset who handles and adapts… more
- LinkedIn (Mountain View, CA)
- …GNNs, Incremental Learning, Online Learning, and advanced LLM Agents work for Training infrastructure. As a Principal Staff Software Engineer on the AI ... problems. + Designing, implementing, and optimizing the performance of large-scale distributed training for personalized recommendation as well as large… more
- LinkedIn (Mountain View, CA)
- …and resolve issues in popular libraries like Huggingface, Horovod and PyTorch, enable distributed training of models with hundreds of billions of parameters, debug and ... Serving performance optimizations across billions of user queries. Model Training Infrastructure: As an engineer on the ... performance applications serving very large & complex models across LLM and Personalization models. As an engineer,… more
- ServiceNow, Inc. (Santa Clara, CA)
- …and developing LLM-based features + Experience with methods of training and fine-tuning large language models, such as distillation, supervised fine-tuning and ... sunny San Diego, California in 2004 when a visionary engineer, Fred Luddy, saw the potential to transform how ... phases of Large Language Models development, including data curation, training, and evaluation. Our goal is to consistently enhance… more
- Meta (Topeka, KS)
- …is seeking a Research Engineer to join our Large Language Model (LLM) Research team. We conduct focused research and engineering to build state-of-the-art LLMs, ... experience in areas like language model evaluation; data processing for pre-training and fine-tuning; responsible LLMs; LLM alignment; reinforcement learning… more
- NVIDIA (Santa Clara, CA)
- …from the crowd: + In-depth knowledge and experience with AI workloads and benchmarking for distributed LLM training. + Knowledge of CUDA and NCCL libraries. ... Computing (HPC) and AI Networking Performance Research and Analysis Engineer to join our Performance group. In this exciting ... workloads on large GPUs and CPUs scale clusters for distributed Deep Learning LLM training… more
- Google (Mountain View, CA)
- Software Engineer III, Mobile, Android, Google Maps Platform, Google, Mountain View, CA, USA **Mid** Experience driving progress, solving ... to SDKs or APIs for developers. + Experience working with Large Language Models (LLMs) or applied AI. + Experience with mapping technologies (e.g., Google Maps SDK,… more
- The MITRE Corporation (Mclean, VA)
- …for expertise in areas such as LLMs, machine learning, model training and deployment, model evaluation, retrieval augmented generation (RAG) systems, GraphRAG, ... The position involves researching, developing, evaluating, and integrating GenAI and LLM capabilities. Specific responsibilities will include: + Work in and provide… more
- Palo Alto Networks (Santa Clara, CA)
- …**Your Career** You will build machine learning models and develop big data and distributed systems that use the models to analyze and categorize an enormous amount ... a security-sensitive environment. + Own the end-to-end lifecycle of ML and LLM components, from problem formulation and model development to production deployment,… more