- Meta (Menlo Park, CA)
- …platforms, all the way to mass production and deployment. **Required Skills:** Production Systems Engineer, AI Systems Responsibilities: ... **Summary:** Meta is seeking a Systems Engineer to join our Release to Production...Inference Accelerator (MTIA) program as a part of the AI/ML initiatives supporting large scale AI Training…
- Meta (Menlo Park, CA)
- …health and lifecycle of servers in production. **Required Skills:** Production Systems Engineer, Fleet AI Systems (NetZero) Responsibilities: 1. ... **Summary:** Meta is seeking a Production Systems Engineer to...full system technologies, full system lifecycle 16. Experience supporting AI/HPC systems and/or related components at scale.…
- Cadence Design Systems, Inc. (San Jose, CA)
- …who want to make an impact on the world of technology. Cadence Design Systems is a world leader in providing computational software for all aspects of intelligent ... R&D team working on the emerging boundary of scientific computing and machine learning/AI, with specific emphasis in the analog electronic circuit analysis area. The…
- Meta (Menlo Park, CA)
- …fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI/HPC Systems Performance Engineer Responsibilities: 1. Lead ... **Summary:** Meta's AI Training and Inference Infrastructure is growing exponentially...interconnect with minimal latency. To improve performance of these systems we constantly look for opportunities across stack: network…
- Meta (Menlo Park, CA)
- …fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI/HPC Systems Performance Engineer Responsibilities: 1. Active ... **Summary:** Meta's AI Training and Inference Infrastructure is growing exponentially...a loss-less fabric interconnect. To improve performance of these systems we constantly look for opportunities across stack: network…
- Capital One (San Francisco, CA)
- …to improve the performance - scalability, cost, latency, throughput - of large scale production AI systems. + Contribute to the technical vision and ... (61049), United States of America, San Francisco, California Distinguished AI Engineer Overview: At Capital One, we...latest AI research and AI systems, and judiciously apply novel techniques in production…
- Meta (Menlo Park, CA)
- **Summary:** Meta is seeking a Partner Engineer to join Meta's AI Partner Engineering team, a highly technical team that works with strategic partners, machine ... evangelize Meta's AI design patterns and best practices. **Required Skills:** Partner Engineer, Generative AI Responsibilities: 1. Apply relevant AI and…
- NVIDIA (Santa Clara, CA)
- …and blameless postmortems + Be part of an on call rotation to support production systems + Write and review code, develop documentation and capacity plans, ... automation to improve researchers productivity. As a Site Reliability Engineer, you are responsible for the big picture of...Deployment, BCM, Terraform. + Understanding of fast, distributed storage systems like Lustre and GPFS for AI/HPC…
- LinkedIn (Sunnyvale, CA)
- …Machine Learning and Artificial Intelligence Preferred Qualifications: Experience in bringing large scale AI systems to production. PhD in Computer Science, ... within FAIT and across the company to realize these AI innovations. As a Principal Staff Engineer...define the bar for quality and efficiency of software systems while balancing business impact, operational impact and cost…
- NVIDIA (Santa Clara, CA)
- We are now looking for a Senior High-Performance AI Training Engineer: NVIDIA is seeking senior engineers who are obsessed with performance analysis and ... help us squeeze every last clock cycle out of AI training, the workload driving the design and construction...and construction of the largest and most powerful compute systems in the world. This role offers the opportunity…
- eightfold.ai (Santa Clara, CA)
- …for Machine Learning & AI + Implement best practices for building AI-enabled products + Develop AI-based systems for Natural Language Processing ... is NOT a remote position) About Eightfold Eightfold AI is the industry leader in AI-powered...frameworks (scikit-learn, tensorflow, torch, etc.) + Experience with implementing production machine learning systems and working with…
- Google (Mountain View, CA)
- …and operability, and follow production principles in maintaining customer-facing production systems. + Influence, mentor, and coach a distributed team ... + Experience building LLM or ML Infrastructure, and working with Generative AI/ML technologies or similar. Preferred qualifications: + Master's degree or PhD in…
- NVIDIA (Santa Clara, CA)
- …can perceive and understand the world. Today, we are increasingly known as "the AI computing company." We're looking to grow our company and establish teams with the ... productivity required for strong scaling for HPC and generative AI workload. Scale out is inherent to design of this...We are looking for a strong technical platform software engineer focused on PCIe firmware, you will own PCIe…
- NVIDIA (Santa Clara, CA)
- …highly valued + Hands-on experience identifying vulnerabilities and implementing security measures for AI systems in production environments + Experience in ... parallel computing. More recently, GPU deep learning ignited modern AI - the next era of computing - with...+ Proficiency in working with scalable, high-availability, and low-latency systems (Kubernetes and Docker) Ways to stand out from…
- NVIDIA (Santa Clara, CA)
- …generative AI platform and products + Develop platform and systems enabling unified experience across applications and driving insights for end-to-end user ... people. Today, we're tapping into the unlimited potential of AI to define the next era of computing. An...shaping the architecture, development, and scaling of our software systems. This role will give an opportunity to collaborate…
- Meta (Menlo Park, CA)
- …space of GenAI/LLM scaling reliability and performance. **Required Skills:** Software Engineer, SystemML - AI Networking Responsibilities: 1. Enabling reliable ... learning/deep learning domains: Distributed ML Training, GPU architecture, ML systems, AI infrastructure, high performance computing, performance optimizations,…
- Meta (Menlo Park, CA)
- **Summary:** Meta is seeking an experienced Production Systems Engineer to join our Release to Production (RTP) team. Our servers and data centers are ... cycle of servers in production. **Required Skills:** Production Systems Engineer, Sustaining Responsibilities:...hardware at scale 9. Experience in deploying and productionizing AI/HPC systems and/or related components at scale…
- Cisco (San Jose, CA)
- …window is expected to close on November 15, 2024. Who We Are The Cisco Security AI team delivers AI products and platform for all Cisco Secure products and ... customers secure by simplifying security with zero compromise using AI and Machine Learning. Who You Are You are...Who You Are You are a passionate Machine Learning Engineer who is building their career through successfully building,…
- Tarana Wireless (Milpitas, CA)
- …and help us to consistently deliver high-quality, high-reliability, cost-effective machine learning and AI systems as part of various products. You will guide ... preferred + 5-12 years of experience building large scale ML/AI models and systems Knowledge, Skills and...markets, using either licensed or unlicensed spectrum. G1 started production in mid 2021 and has now been installed…
- Google (Sunnyvale, CA)
- …Experience in architecting and developing software or infrastructure for scalable, distributed systems. + Experience in data and information management as it relates ... that drive differentiation, customer acquisition, and business acceleration. As a Customer Engineer, you will partner with technical Sales teams as a subject matter…