- Bosch (Sunnyvale, CA)
- …journals such as CVPR, ICRA, IROS, RSS, NeurIPS and CoRL. **Job Description** As the Distributed Embodied AI Systems intern, you will perform research on ... that take advantage of technologies in the field of reliable distributed computing. We work with internal ... future prediction for latency mitigation in distributed embodied AI systems. A…
- Microsoft Corporation (Redmond, WA)
- …healthcare, economics, and the environment. Are you passionate about building the future of reliable, large-scale cloud and AI systems? The ** Systems ... Interns to tackle cutting-edge challenges at the intersection of distributed systems, AI systems ... letter. **Preferred Qualifications** + Experience of building scalable and reliable systems. + Demonstrated ability to develop…
- NVIDIA (Santa Clara, CA)
- …design, or enterprise platform engineering. + Deep expertise in architecting large-scale distributed systems with a focus on reliability, performance, and ... record of publishing technical papers, architecture patterns, or thought leadership in AI systems. + Knowledge of observability tools, telemetry dashboards, and…
- Cisco (San Jose, CA)
- …platforms, such as AWS, Azure, or Google Cloud. + Understanding of distributed systems concepts, including scalability, reliability, fault tolerance, and data ... Team** Our dedicated team members are building the future of Cisco's AI-driven platforms and data infrastructure, supporting innovation across the globe. You will…
- NVIDIA (Santa Clara, CA)
- …you will work with internal teams and external partners to integrate distributed systems, manage large-scale data pipelines, and operationalize next-generation ... pipelines using Go, Python, Bash, and Bazel to ensure reproducibility, efficiency, and reliable distributed execution. + Integrate simulation and drive logs (e.g.…
- Oracle (Nashville, TN)
- …Work closely with a collaborative and experienced global team. - Expand your knowledge in AI, cloud computing, and distributed systems. - Contribute to one ... tools to operationalize Large Language Models (LLMs) and agentic AI systems. Our goal is to empower ... will contribute to the design and implementation of scalable, distributed systems that serve LLMs and support…
- NVIDIA (Santa Clara, CA)
- …and inference more reliable, scalable, and efficient. If you're passionate about AI, distributed systems, and high-performance computing, we want to hear ... driving down cluster downtime towards zero, ensuring that our AI systems remain robust and reliable ... detection. + Hands-On Coding & Optimization: Contribute to large-scale distributed systems with high-quality, production-level C++ and…
- Oracle (San Juan, PR)
- …Work closely with a collaborative and experienced global team. - Expand your knowledge in AI, cloud computing, and distributed systems. - Contribute to one ... tools to operationalize Large Language Models (LLMs) and agentic AI systems. Our goal is to empower ... will contribute to the design and implementation of scalable, distributed systems that serve LLMs and support…
- Oracle (Columbus, OH)
- …. This is a highly technical, hands-on role where you'll build large-scale distributed systems, optimize AI/ML workflows, and collaborate with ... observability, CI/CD pipelines, and operational excellence. Troubleshoot complex issues in distributed systems and participate in on-call rotations as needed.…
- Oracle (Redwood City, CA)
- …data architectures (data mesh, lakehouse, etc.). + Expertise in **data modeling, distributed systems, and performance optimization.** + Proven ability to ... you ready to shape the future of intelligent data systems? We're seeking an ** AI and Data ... Collaborate with engineers, product teams, and researchers to build systems that are **reliable, scalable, and production-ready.**…
- Walmart (Sunnyvale, CA)
- …build dynamic, context-aware systems. 2. **Architecture & Scalability:** + Architect scalable, distributed AI systems with a focus on performance, fault ... to lead the design, development, and deployment of advanced AI systems. This role involves architecting scalable ... Walmart GTP, you will be building highly scalable and reliable APIs, services and applications which will drive the…
- Amazon (Redmond, WA)
- …for a Data Engineering Manager who will design, implement, and operate globally distributed systems that enable Leo to achieve low single-digit-second query ... real-time analytics layer or lakehouse, and to support agentic AI capabilities on top. You'll build these systems ... user experience in real time. We combine expertise in distributed systems, data lakehouse architectures, and applied…
- Amazon (Redmond, WA)
- …is for a Data Engineer who will design, implement, and operate globally distributed systems that enable Leo to achieve low single-digit-second query responses ... real-time analytics layer or lakehouse, and to support agentic AI capabilities on top. You'll build these systems ... user experience in real time. We combine expertise in distributed systems, data lakehouse architectures, and applied…
- Charles Schwab (San Francisco, CA)
- …+ Champion reliability, monitoring, observability, and operational best practices for AI systems and data pipelines. + Collaborate with cross-functional ... in the development process. You will ensure that the systems we build are robust, reliable, and ... troubleshoot complex problems with ambiguous or incomplete data in distributed systems. + Curiosity about new technologies…
- NVIDIA (Austin, TX)
- …from the crowd: + Technical competency in managing and automating large-scale distributed systems independent of cloud providers. Advanced hands-on experience ... part of a DGX Cloud team responsible for production systems that enable large scalable GPU clusters to be ... Bright Cluster Manager) + Proven operational excellence in maintaining reliable and performant AI infrastructure. NVIDIA is…
- GE Vernova (Niskayuna, NY)
- …neural network architectures (e.g., CNNs, RNNs, Transformers). + Expertise in designing scalable, distributed architectures for AI systems. + Strong experience ... Azure, GCP) and containerization (Kubernetes, Docker). + Familiarity with large-scale distributed systems and database technologies. + Experience in creating…
- Microsoft Corporation (Redmond, WA)
- …AI incident response; researching the quickly evolving threat landscape; red teaming AI systems for failures; and empowering Microsoft with this knowledge. We ... We are seeking an experienced **Senior Software Engineer - AI Safety and Security** to join a high impact ... building, and operating scalable, highly available cloud services or distributed systems on platforms such as Azure,…
- Oracle (Nashville, TN)
- …+ Stay updated with industry trends, emerging technologies, and best practices in distributed systems and AI infrastructure management. **Qualifications & ... automation, and diagnostic services. These are essential for running distributed AI/ML/HPC workloads across thousands of GPUs, ... We are looking for a highly skilled and motivated distributed systems engineer who can architect solutions…
- Paycom Online (Oklahoma City, OK)
- …give and receive concrete feedback.** + **Experience in deploying and scaling containerized, distributed software and AI systems using tools such as ... in "traditional" NLP tools** + **Experience in SOA, Modular Monolith Architecture, and distributed systems for AI training and inference** + **Familiarity…
- Zebra Technologies (Holtsville, NY)
- …Knowledge of NLP, computer vision, or reinforcement learning. + Familiarity with multi-agent systems and distributed AI frameworks. + Strong communication ... processing (NLP), or computer vision. + Build, test, deploy, and maintain software and AI systems, ensuring high performance, security, and scalability. + Own a…