- NVIDIA (Santa Clara, CA)
- …and providing related services to support the overall stability and performance of the production systems. SRE at NVIDIA ensures that our internal and external ... Site Reliability Engineering (SRE) is an engineering discipline that involves designing, building, and maintaining large-scale production systems with high…
- Intuit (Mountain View, CA)
- Overview Come join the Intuit FinTech Payments Platform as a Software Engineer. Intuit Fintech is your trusted financial expert empowering financial prosperity for ... will lead + Be the first level of support and handle and investigate incidents, production issues, and alerts + Identify, design and build tools that are focused on…
- LinkedIn (Sunnyvale, CA)
- …Join us to challenge yourself with work that matters. LinkedIn is looking for a Senior Software Engineer to join our Edge Infrastructure team. We focus on all ... from the early stages of design all the way through identifying and resolving production issues. The ideal candidate will be passionate about this unique niche of…
- NVIDIA (Santa Clara, CA)
- …decision-making culture? If so, we have a great opportunity for you! NVIDIA is seeking a Senior Site Reliability Engineer (SRE) for the Data Science & ML ... and availability of the platform, as well as applying SRE principles to improve production systems and...curiosity, problem solving, and openness are essential. As a Senior SRE at NVIDIA, you will have…
- NVIDIA (Santa Clara, CA)
- Site Reliability Engineering (SRE) at NVIDIA is an engineering discipline to design, build and maintain large-scale production systems with high efficiency and ... and deployment and open source cloud enabling technologies like Kubernetes and OpenStack. SRE at NVIDIA ensures that our internal and external facing GPU cloud…
- NVIDIA (Santa Clara, CA)
- …and geographies. + 5+ years in similar role and experience on large-scale production systems. Experience with the aforementioned DevOps/SRE principles, tools and ... principles and techniques including reliability assessments, incident management processes, production system observability, monitoring and alerting, automated deployments and…
- Platform9 Systems (San Jose, CA)
- Senior DevOps Engineer Location: REMOTE IN THE USA or HYBRID IN SAN JOSE, CA About Platform9 Cloud native is a strategic priority for technology leaders at ... Role We are seeking a highly motivated and experienced Senior DevOps Engineer to join our growing...to automate deployments, manage infrastructure as code, and troubleshoot production issues. This is a unique opportunity to work…
- NVIDIA (Santa Clara, CA)
- …troubleshooting of compute hardware and networking equipment. As a software engineer, you will work with other software engineers, product architects, and product ... code - from development to commit to test to production, including operational support. We expect you...communication protocols (mutual-TLS, IPsec, or similar). + Knowledge of SRE principles (observability, SLOs, logging, etc.) Ways to stand…
- Palo Alto Networks (Santa Clara, CA)
- …and operate reliable, secure Cloud infrastructure + Ensure that applications are production-ready, scalable, and reliable + Develop tools and automation frameworks + ... + Orchestrate end-to-end monitoring and alerting + Participate with SRE and Dev teams in the on-call rotation +...+ Lead root cause analysis of critical business and production issues + Participate in design reviews **Your Experience**…
- Walmart (Sunnyvale, CA)
- …large-scale services + You have experience in handling and triaging complex production issues + You have solid understanding of building and integrating with ... databases and big data technologies + You innately apply SRE practices, including operational architectures, observability, reliability, availability and scalability…