• Production Systems Engineer

    Meta (Menlo Park, CA)
    production issue triage, rolling out new features in FW/Driver. **Required Skills:** Production Systems Engineer, AI Systems Responsibilities: ... **Summary:** Meta is seeking a Systems Engineer to join our Release to Production (RTP) team working on AI/ML initiatives supporting large scale AI…
    Meta (03/06/25)
  • Production Systems Engineer

    Meta (Menlo Park, CA)
    production issue triage, rolling out new features in FW/Driver. **Required Skills:** Production Systems Engineer, AI Systems Responsibilities: ... **Summary:** Meta is seeking a Systems Engineer to join our Release to Production (RTP) team working on AI/ML initiatives supporting large scale AI…
    Meta (01/20/25)
  • Production Systems Engineer

    Meta (Menlo Park, CA)
    …platforms, all the way to mass production and deployment. **Required Skills:** Production Systems Engineer, AI Systems Responsibilities: ... **Summary:** Meta is seeking a Systems Engineer to join our Release to Production...Inference Accelerator (MTIA) program as a part of the AI/ML initiatives supporting large scale AI Training…
    Meta (01/25/25)
  • Production Systems Engineer

    Meta (Menlo Park, CA)
    …health and lifecycle of servers in production. **Required Skills:** Production Systems Engineer, Fleet AI Systems Responsibilities: 1. Interface ... **Summary:** Meta is seeking a Production Systems Engineer to...systems issues. 15. 2+ years of experience supporting AI or HPC systems and/or related…
    Meta (03/15/25)
  • Production Systems Engineer

    Meta (Menlo Park, CA)
    …health and lifecycle of servers in production. **Required Skills:** Production Systems Engineer, Fleet AI Systems Responsibilities: 1. Interface ... **Summary:** Meta is seeking an experienced Production Systems Engineer to... hyperscale environments, engineering varying solutions to wide-reaching, at-scale systems issues. 21. Experience supporting AI/HPC…
    Meta (01/20/25)
  • Hardware Systems Engineer, NPI…

    Meta (Menlo Park, CA)
    **Summary:** Meta is seeking a Systems Engineer to join our Release to Production (RTP) team working on AI/ML initiatives supporting large scale AI ... services, and data center operations teams to enable new systems that will be deployed in our production...Silicon hyperscalar bring up and validation. **Required Skills:** Hardware Systems Engineer, NPI AI Responsibilities:…
    Meta (01/24/25)
  • Hardware Systems Engineer

    Meta (Menlo Park, CA)
    … based approach to the new product introduction (NPI) phase. **Required Skills:** Hardware Systems Engineer, AI NPI Responsibilities: 1. Drive and execute ... services, and data center operations teams to enable new systems that will be deployed in our production...strategy (hardware and software), with a focus on various AI/HPC hardware systems in datacenter applications. 2.…
    Meta (02/05/25)
  • AI/HPC Systems Performance…

    Meta (Menlo Park, CA)
    …fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI/HPC Systems Performance Engineer Responsibilities: 1. Active ... **Summary:** Meta's AI Training and Inference Infrastructure is growing exponentially...a loss-less fabric interconnect. To improve performance of these systems we constantly look for opportunities across stack: network…
    Meta (03/05/25)
  • Partner Engineer, Generative AI

    Meta (Menlo Park, CA)
    **Summary:** Meta is seeking a Partner Engineer to join Meta's AI Partner Engineering team, a highly technical team that works with strategic partners, machine ... evangelize Meta's AI design patterns and best practices. **Required Skills:** Partner Engineer, Generative AI Responsibilities: 1. Apply relevant AI and…
    Meta (01/30/25)
  • Senior Software Engineer, AI

    NVIDIA (Santa Clara, CA)
    …expertise will be crucial in driving down cluster downtime towards zero, ensuring that our AI systems remain robust and reliable at all times. What You'll Be ... We are now looking for a Senior Software Engineer for AI Resiliency. At NVIDIA,...+ Hands-On Coding & Optimization: Contribute to large-scale distributed systems with high-quality, production-level C++ and Python…
    NVIDIA (03/19/25)
  • Senior Site Reliability Engineer

    NVIDIA (Santa Clara, CA)
    …and blameless postmortems + Be part of an on call rotation to support production systems + Write and review code, develop documentation and capacity plans, ... automation to improve researchers productivity. As a Site Reliability Engineer, you are responsible for the big picture of...Deployment, BCM, Terraform. + Understanding of fast, distributed storage systems like Lustre and GPFS for AI/HPC…
    NVIDIA (12/25/24)
  • Principal Software Engineer - Enterprise…

    NVIDIA (Santa Clara, CA)
    …can connect to enterprise data sources and power search, chatbots and other gen AI applications + Develop platform and systems enabling unified experience across ... people. Today, we're tapping into the unlimited potential of AI to define the next era of computing. An...and products that improve business efficiency and productivity. This engineer is expected to be familiar with concepts of…
    NVIDIA (01/08/25)
  • Principal Staff Machine Learning Engineer

    LinkedIn (Sunnyvale, CA)
    …Machine Learning and Artificial Intelligence Preferred Qualifications: Experience in bringing large scale AI systems to production. PhD in Computer Science, ... within FAIT and across the company to realize these AI innovations. As a Principal Staff Engineer...define the bar for quality and efficiency of software systems while balancing business impact, operational impact and cost…
    LinkedIn (01/19/25)
  • Machine Learning Engineer, AI

    Cisco (San Jose, CA)
    …learning technologies. The ideal candidate will help build and maintain scalable AI systems while ensuring robust deployment and operational excellence. ... part of our journey! **Role** As the Machine Learning Engineer, AI Platform in the Splunk ...Engineers and Applied Scientists to build efficient model serving systems + Monitor system performance and implement improvements for…
    Cisco (03/21/25)
  • Senior Principal Machine Learning Engineer

    Cisco (San Jose, CA)
    …(LLM). + Experience developing large-scale, complex models and deploying them in production systems. + Experience large-scale data processing and parallel ... and executing the technical roadmap for the team, as we develop the core AI/ML capabilities to power the entire Splunk product portfolio and help our customers to…
    Cisco (03/14/25)
  • Human-Assisted AI Research Engineer

    Bosch (Sunnyvale, CA)
    …+ Experience with container orchestration platforms like Kubernetes. + Hands-on experience in the production of AI systems. + Good communication and teamwork ... services in application areas such as automated driving, advanced driver assistance systems (ADAS), robotics, smart manufacturing, enterprise AI, health care,…
    Bosch (03/04/25)
  • Production Systems Engineer

    Meta (Menlo Park, CA)
    **Summary:** Meta is seeking an experienced Production Systems Engineer to join our Release to Production (RTP) team. Our servers and data centers are ... cycle of servers in production. **Required Skills:** Production Systems Engineer, Sustaining Responsibilities:...hardware at scale 9. Experience in deploying and productionizing AI/HPC systems and/or related components at scale…
    Meta (01/23/25)
  • Senior Observability Engineer, AI

    NVIDIA (Santa Clara, CA)
    …Observability Engineer to help architect and implement our distributed observability systems for AI and HPC clusters. We serve and collaborate directly ... with NVIDIA's rapidly growing AI, HW, and SW engineering and research teams across...be working with a team of dedicated engineers on systems for data collection, aggregation, enrichment, storage, retrieval, and…
    NVIDIA (01/31/25)
  • AI/HPC Network Engineer

    Meta (Menlo Park, CA)
    …fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI/HPC Network Engineer Responsibilities: 1. Design, develop, test and ... operate networking systems to support large scale AI training...more. 5. Be oncall to learn from real world production challenges and take the lessons to improve current…
    Meta (02/06/25)
  • Production Systems Engineer

    Meta (Menlo Park, CA)
    …validation, supporting customer deployment, production issue triage. **Required Skills:** Production Systems Engineer, Cooling & Power Responsibilities: ... **Summary:** Meta is seeking a Systems Engineer to join our Release to Production...scaling and deployment challenges requires us to take a systems based approach to AI system bring…
    Meta (01/19/25)