• Senior AI Cluster

    NVIDIA (Santa Clara, CA)
    …be doing: + Build internal perf/power profiling and analysis tools and platform for AI workloads at cluster scale + Build debugging tools for common ... frameworks like PyTorch, TensorFlow, etc. + Knowledge of AI cluster job scheduling, storage management and...GPU cluster scale continuous profiling & analysis tools/platforms + Solid experience in large AI… more
    NVIDIA (12/19/24)
  • Senior Site Reliability Engineer…

    NVIDIA (Santa Clara, CA)
    …5K GPUs cluster. + Deep understanding of GPU computing and AI infrastructure. + Passion for solving complex technical challenges and optimizing system ... NVIDIA is the leader in AI, machine learning and datacenter acceleration. NVIDIA is...Solid experience with GPU clusters, and working knowledge of cluster configuration management tools such as BCM… more
    NVIDIA (12/25/24)
  • Senior AI Infrastructure Engineer

    NVIDIA (Santa Clara, CA)
    We are now seeking a Senior AI Infrastructure Engineer! NVIDIA's Compute Architecture Group is growing our team of AI focused Infrastructure Engineers who ... What you'll be doing: + Administer an NVIDIA Internal AI cluster composed of Linux systems ranging...updates, and maintenance of system availability using modern DevOps tools (Ansible, GitLab, etc.) + Plan and maintain new… more
    NVIDIA (11/06/24)
  • Senior SRE Engineering Leader - AI

    NVIDIA (Santa Clara, CA)
    NVIDIA is leading the way in the AI revolution, revolutionizing industries with our brand-new GPU technology. Our GPUs drive groundbreaking innovations, from ... in computer vision, speech recognition, and more. As "the AI computing company," we constantly push the limits of...leaders to join us on an exciting journey as Senior SRE Engineering Leader. Lead our globally distributed clusters,… more
    NVIDIA (10/08/24)
  • Senior Solutions Architect, HPC…

    NVIDIA (Santa Clara, CA)
    …Primary responsibilities will be to validate and debug customer cluster performance issues, functional bottlenecks and drive customer technical engagements ... ecosystems. You'll be called on to help architect and scale high-performance, distributed AI infrastructure on-prem or in the cloud built with the latest NVIDIA GPU… more
    NVIDIA (12/11/24)
  • Senior Software Engineer, Kubernetes - DGX…

    NVIDIA (Santa Clara, CA)
    …experienced software engineers with Kubernetes experience to help scale up its AI Infrastructure. We expect you to have significant software engineering experience ... with Kubernetes including cluster operations, operator development, node health monitoring and working...deploy leading infrastructure solutions for a broad range of AI-based applications. If you're creative, passionate about Kubernetes and… more
    NVIDIA (11/23/24)
  • Senior Solutions Architect, NPN

    NVIDIA (Santa Clara, CA)
    …both on-premises and cloud based. + 12+ years of proven experience with cluster management and related tools, including Docker Containers, Slurm, Kubernetes, and ... part of a team that's revolutionizing the field of AI with data center scale solutions? We are looking...are the voice of experience, using Kubernetes, SaaS, infrastructure-as-code tools, network debugging, and problem solving skills to help… more
    NVIDIA (12/17/24)
  • Senior Research Engineer, Foundation Model…

    NVIDIA (Santa Clara, CA)
    …JAX, or TensorFlow. + Deep understanding of GPU acceleration, CUDA programming, and cluster management tools like Kubernetes. + Strong programming skills in ... NVIDIA is searching for a senior or principal engineer who specializes in building...works on multimodal foundation models, large-scale robot learning, embodied AI, and physics simulation. Our past projects include Eureka… more
    NVIDIA (12/07/24)
  • CephFS Senior Software Engineer

    IBM (San Jose, CA)
    …talk. Your Role and Responsibilities IBM's Ceph[1] engineering organization is looking for a senior software engineer to join the CephFS team. In this role you will ... to higher-level APIs for integrating with other systems (OpenStack, OpenShift, an NFS-Ganesha cluster, Samba, etc.). As a member of the CephFS engineering team, you… more
    IBM (10/25/24)
  • Senior Software Test Development Engineer…

    NVIDIA (Santa Clara, CA)
    We are looking for a highly experienced AI Senior Software Test development engineer in NVIDIA's Deep Learning SWQA team. The position is in NVIDIA Deep Learning ... and AI Software Quality Assurance team that defines, develops and...in validating Data Center GPU based infrastructure (multi-GPUs, multi-nodes, cluster) + Background in validating fault tolerance infrastructure +… more
    NVIDIA (12/06/24)
  • Senior Technical Program Manager - GPU…

    NVIDIA (Santa Clara, CA)
    Hardware Infrastructure is seeking a Senior Technical Program Manager to lead the strategy and execution of programs to support the bringup, operations and ... infrastructure we build and operate enables NVIDIA's most advanced AI and hardware researchers and engineers to create the...a fast-paced and evolving landscape that requires a senior TPM leader to guide engineering roadmaps to be… more
    NVIDIA (12/03/24)
  • Senior ASIC Physical Design Engineer,…

    NVIDIA (Santa Clara, CA)
    …graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI - the next era of computing. NVIDIA is a "learning machine" that ... parallel computing! More recently, GPU deep learning ignited modern AI - the next era of computing. NVIDIA is...high-frequency and low-power CPUs, GPUs, SoCs at block level, cluster level, and/or full chip level, with a focus… more
    NVIDIA (12/25/24)
  • Senior ASIC Timing Engineer

    NVIDIA (Santa Clara, CA)
    …graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI - the next era of computing. NVIDIA is a "learning machine" that ... parallel computing! More recently, GPU deep learning ignited modern AI - the next era of computing. NVIDIA is...NVIDIA's GPUs, CPUs, DPUs and SoCs at block level, cluster level, and/or full chip level. + Work with… more
    NVIDIA (12/03/24)
  • Senior Software Engineer, Languages (Rust,…

    LinkedIn (Mountain View, CA)
    …This spans multiple areas including but not limited to: Providing tools and infrastructure that creates delightful development experience; Provide and support ... member facing products; Provide actionable insights that leverages data and metrics; Provide tools and data that help LinkedIn teams listen to their customers; and… more
    LinkedIn (12/21/24)
  • Senior Scientist, Pathology Data Science

    Merck (South San Francisco, CA)
    …well as comfort with image analysis commercial software and/or open-source tools is highly desirable. **Key Responsibilities:** + Analyze, visualize, and summarize ... to analyze high-dimensional datasets + Build end to end data visualization tools for non-programmer that help understand and interpret the complex high-dimensional… more
    Merck (12/14/24)
  • Senior System Reliability Engineer

    NVIDIA (Santa Clara, CA)
    …can perceive and understand the world. Today, we are increasingly known as "the AI computing company." We're looking to grow our company and build our teams with ... Hardware Reliability Engineering for Electronics/Server Systems (graphics cards, server, rack, cluster) from Concept to End-of-Life phase. + Establish, deliver and… more
    NVIDIA (11/30/24)
  • Principal Infrastructure SRE - Storage

    NVIDIA (Santa Clara, CA)
    …crowd: + Deep understanding of other infrastructure components like DNS, LDAP, NIS, Security Tools etc. + Experience with HPC cluster management tools such ... people. Today, we're tapping into the unlimited potential of AI to define the next era of computing. An...a focus on infrastructure automation. + Develop and maintain tools for collecting, analyzing, and visualizing data for reporting,… more
    NVIDIA (10/27/24)