- NVIDIA (Santa Clara, CA)
- …that power some of the world's most advanced computing workloads. NVIDIA is looking for an AI/ML HPC Cluster Engineer to join our MARS team. You will provide ... be doing: + Support day-to-day operations of production on-premises and multi-cloud AI/HPC clusters, ensuring system health, user satisfaction, and efficient…
- NVIDIA (Santa Clara, CA)
- …foundational improvements and automation to improve engineers' productivity. As a Site Reliability Engineer, you are responsible for the big picture of how ... fueled by great technology and amazing people. Today, we're tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU…
- Johns Hopkins University (Baltimore, MD)
- …and Design + Develop and refine deployment strategies for scientific software on HPC and AI systems. + Design computational workflows, selecting optimal ... AI Agents). _Performance Optimization_ + Analyze and optimize the performance of AI models and HPC applications, focusing on GPU-enabled computing. +…
- Johns Hopkins University (Baltimore, MD)
- …and Design + Develop and refine deployment strategies for scientific software on HPC and AI systems. + Design computational workflows, selecting optimal ... _Performance Optimization_ + Analyze and optimize the performance of AI models and HPC applications, focusing on...fields, with advanced training in scientific computing. Classified Title: HPC Scientific Software Engineer Job Posting Title…
- Lilly (Indianapolis, IN)
- …Bold - You will bring high learning agility and infrastructure availability and reliability engineering skills to help us enable the Lilly Technology strategy, ... the world. Come help us unlock the power of HPC and AI based POGPU and Accelerated...Additionally, you would advise with our senior Linux platform engineer directing the global Linux strategy for on-premises private…
- Google (Kirkland, WA)
Staff Software Engineer, HPC Solutions, Google, Kirkland, WA, USA **Advanced** Experience owning outcomes and decision making, solving ... future of scientific computing by leading the convergence of AI and HPC. The AI ...Google customers with breakthrough capabilities and insights by delivering AI and Infrastructure at unparalleled scale, efficiency, reliability…
- NVIDIA (Westford, MA)
- …how you can make a lasting impact on the world. We are seeking a Senior HPC & Quantum Systems Engineer to help architect, deploy, and operate a first-of-its-kind ... people. Today, we're tapping into the unlimited potential of AI to define the next era of computing. An...is not a pure research role nor a traditional HPC admin role; it is a systems engineering position dedicated…
- Google (Kirkland, WA)
Software Engineer, HPC, Platform Readiness, Workload Performance, Google, Kirkland, WA, USA **Advanced** Experience owning outcomes and ... on and is growing every day. As a software engineer, you will work on a specific project critical...Google customers with breakthrough capabilities and insights by delivering AI and Infrastructure at unparalleled scale, efficiency, reliability…
- Micron Technology, Inc. (Richardson, TX)
- …intelligence, inspiring the world to learn, communicate and advance faster than ever. As an HPC Staff Engineer at Micron, you will join a diverse team of ... You will play a key part in maintaining the reliability and efficiency of Micron's data environment. **Responsibilities** +...from candidates as consideration for their employment with Micron. AI alert: Candidates are encouraged to use…
- NVIDIA (Santa Clara, CA)
NVIDIA is hiring engineers to scale up its AI Infrastructure. We expect you to have a strong programming background, knowledge of datacenter hardware, operations, ... and planning abilities. Experience working with High Performance Computing (HPC), GPUs, and high-performance networking (RDMA, InfiniBand, RoCE) is strongly…
- Google (Sunnyvale, CA)
- …architecture and its integration within AI/ML-driven systems. As a Quality and Reliability Engineer for Google Cloud, you will lead the development of ... Staff Quality and Reliability Engineer, Google Cloud, Google...Google customers with breakthrough capabilities and insights by delivering AI and Infrastructure at unparalleled scale, efficiency, reliability…
- Bloomberg (New York, NY)
- …for overseeing the ongoing monitoring, support, and maintenance of our HPC/AI clusters, ensuring peak performance and reliability. **We'll trust you to:** ... Senior Software Engineer - AI Hardware Location New York…
- Microsoft Corporation (Redmond, WA)
- …so that everyone can realize its benefits. We're looking for an experienced **Site Reliability Engineer (SRE)** to join our infrastructure team. In this role, ... **Overview** As Microsoft continues to push the boundaries of AI, we are on the lookout for passionate individuals to work with us on the most interesting and…
- Dell Technologies (Austin, TX)
**Principal Mechanical Reliability Engineer** Mechanical Engineering leads and delivers the development of innovative and compliant mechanical design solutions, ... make a profound social impact as a **Principal Mechanical Reliability Engineer** on our Mechanical **Engineering** Team...be instrumental in delivering advanced liquid cooling solutions for AI, HPC, and enterprise server markets. Your…
- Oracle (Cheyenne, WY)
- …AI Infrastructure Innovation team is pioneering the creation of next-generation AI/HPC networking for GPU superclusters at massive scale. Our mission is ... system design, and implementation for high-performance RDMA solutions across OCI's AI/HPC platforms, including frontend and backend fabrics. + Innovate…
- Oracle (Lincoln, NE)
- …solutions across Oracle's enterprise customers. We are seeking a highly skilled **AI/ML Infrastructure Engineer** to design, build, and support the systems, ... troubleshooting, and best practices. + Stay current with emerging trends in AI infrastructure, agent frameworks, HPC systems, and cloud-native technologies;…
- NVIDIA (Santa Clara, CA)
- …a passionate engineer who will solve networking problems for scalable AI clusters. This is a hands-on network engineering position focused on the architecture, ... and deployment of global-scale DC interconnects and fabric for HPC, AI, and GPU computing clusters. +...reliability. + Partner with system, OS, GPU, and HPC teams to deliver scalable, highly available networks for…
- Oracle (Springfield, IL)
- …Forward Deployed Engineer (FDE) team is hiring a Senior Principal Software Development Engineer - AI Data Platform to help global customers unlock the full ... to streamline the adoption of Oracle AI Data Platform and Gen AI services. + Optimize performance, scalability, and reliability of distributed data/AI…
- Oracle (Santa Clara, CA)
- …and debug software programs for databases, applications, tools, networks, etc. As an AI/ML Infrastructure Engineer on the GPU Strategic Customers Engineering team, ... or Scala + Proven experience designing, implementing, and managing infrastructure for AI/ML or HPC workloads. + Understanding machine learning frameworks and…
- Oracle (Austin, TX)
- …automation, and diagnostic services. These are essential for running distributed AI/ML/HPC workloads across thousands of GPUs, leveraging technologies like ... looking for a highly skilled and motivated distributed systems engineer who can architect solutions to scale and optimize...to scale and optimize Monitoring and Repair solutions for AI infrastructure components like GPU control plane and GPU…