- NVIDIA (Santa Clara, CA)
- As a Sr Manager in Site Reliability Engineering (SRE), you will lead a team dedicated to the design, construction, and maintenance of expansive ... What We Need To See: + Extensive experience in a senior-level role within Site Reliability Engineering, particularly in managing storage infrastructure. +…
- NVIDIA (Santa Clara, CA)
- Site Reliability Engineering (SRE) is an engineering discipline that involves designing, building, and maintaining large-scale production systems ... and availability. It encompasses various areas, including software and systems engineering practices, storage, data management, and services. SRE professionals…
- Google (Sunnyvale, CA)
- …in Computer Science or Engineering. + 1 year of people management experience. Site Reliability Engineering (SRE) combines software and systems ... to learn and grow. To learn more: check out our books on Site Reliability Engineering (https://landing.google.com/sre/book.html) or read a career profile…
- Google (Sunnyvale, CA)
- …SDN). Preferred qualifications: + Master's degree in Computer Science or Engineering. Site Reliability Engineering (SRE) combines software and ... to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Google Cloud's services, both our internally critical and our…
- NVIDIA (Santa Clara, CA)
- Site Reliability Engineering (SRE) at NVIDIA is an engineering discipline to design, build and maintain large-scale production systems with high ... efficiency and availability using the combination of software and systems engineering practices. This is a highly specialized discipline which demands knowledge…
- NVIDIA (Santa Clara, CA)
- …full-stack cloud environment to craft, develop, deploy, and run industrial Omniverse applications. Site Reliability Engineering (SRE) focuses on ... SaaS offering! We are seeking a highly motivated Senior Site Reliability Engineer to join our Omniverse...it does so by defining and developing deep software engineering solutions and practices, which simplify the operating environment…
- Palo Alto Networks (Santa Clara, CA)
- …Alerts Management - Clear understanding of incident and alerts management in Site Reliability Engineering + DevOps/SRE Expertise - 5+ years of experience ... of this role, you will collaborate closely with our engineering teams to develop innovative solutions that provide clear...performance and health. **Your Impact** As a Senior Staff SRE with the Cortex Observability team, you will: +…
- Google (Sunnyvale, CA)
- …ACLs, DNS, DHCP, SSH, etc.). + Experience with server hardware, storage, and networking. Site Reliability Engineering (SRE) or IT production ... + Excellent investigative, problem-solving, communication, and presentation skills. Systems Development Engineering (SDE) at Google is a role where you manage…
- NVIDIA (Santa Clara, CA)
- …equivalent experience. + Minimum of 8 years of industry experience in network site reliability engineering, network operations, or related areas. Experience ... This role demands a unique blend of hands-on expertise in network operations, engineering, and observability. A proficient Network SRE is dedicated to enhancing…
- NVIDIA (Santa Clara, CA)
- …field, or equivalent experience. + 8+ years of industry experience in wireless site reliability engineering, wireless network operations, or related areas. ... This role demands a unique blend of hands-on expertise in network operations, engineering, and observability. A proficient Network SRE is dedicated to enhancing…
- EPAM Systems (San Jose, CA)
- As a **SRE Lead - Toil Analysis**, you will be...+ Previous experience in a leadership role within a Site Reliability Engineering team + Proven ... implementing initiatives to reduce toil and improve overall system reliability. You will work closely with cross-functional teams to...to date on industry trends and best practices in SRE and automation **Requirements** + 10+ years of experience…
- Amazon (Cupertino, CA)
- …tempo and quality. - 5+ years or more in software development, systems development, SRE (Site Reliability Engineering), or Resilience Engineering ... join us - we are looking for builders like you. The AWS Hardware Engineering team creates server designs for Amazon's innovative web services. Our designs are…
- General Motors (Mountain View, CA)
- …BS/MS/PhD in Computer Science/Engineering + 8+ years of experience engineering reliability + Experience building and operating enterprise cloud applications ... experiences to life. As a key member of our SRE team, you'll have the opportunity to shape the...and ensuring secure and compliant infrastructure as part of reliability engineering + Ability to lead technical…
- Microsoft Corporation (Mountain View, CA)
- …team. + Stay current with industry trends, emerging technologies, and best practices in site reliability engineering and cloud computing. Embody our Culture ... Microsoft is looking for a Senior Site Reliability Engineer (SRE)...ability to work effectively in a cross-functional team environment. Site Reliability Engineering IC4 -…
- NVIDIA (Santa Clara, CA)
- …with the necessary resources and scale to foster innovation. We are seeking a Senior Site Reliability Engineer (SRE) to join our team. You'll be instrumental ... training and inferencing. The responsibilities include implementing software and systems engineering practices to ensure high efficiency and availability of the…
- LinkedIn (Mountain View, CA)
- …expectation of participating in an on-call ~1x per month. Come join the Software Engineering SRE team responsible for maintaining one of the largest Streaming ... is a combination development and operational role ensuring the reliability for centralized Pubsub systems at LinkedIn. There will...working closely with our customers across all of LinkedIn Engineering. Additionally, as an embedded SRE, you…
- Palo Alto Networks (Santa Clara, CA)
- …is the market leader in this space. We are seeking development-heavy Site Reliability Engineers to design, build, maintain, and scale production services ... architecture to improve scalability in networking like BGP, OSPF, service reliability, capacity, and performance + Collaborate with development teams to ensure…
- Palo Alto Networks (Santa Clara, CA)
- …management with a framework such as Ansible, Terraform, Helm + Experience in Site Reliability Engineering, Production Engineering, or DevOps ... This includes automation, architecture, performance, observability, troubleshooting, security, and reliability. Our Infrastructure Platform stack includes Terraform, Kubernetes, GitLab…
- NVIDIA (Santa Clara, CA)
- …designing and operating large-scale compute infrastructure + Proven experience in site reliability engineering for high-performance computing environments ... and drive foundational improvements and automation to improve researchers' productivity. As a Site Reliability Engineer, you are responsible for the big picture…
- Netflix (Los Gatos, CA)
- …pipeline team and day-to-day live-streaming operations for Netflix. As a Live Streaming Pipeline SRE, you will be responsible for the reliability of our live ... the world is a hard challenge, demanding exceptional levels of stability and reliability from dozens of services and systems between camera and device screens. About…