- NVIDIA (Santa Clara, CA)
- In this role, the Senior SRE manager will be leading the SRE functions with a diverse team of systems engineers in close collaboration with SRE teams to ... Lead and grow a team of systems engineers and SRE developing and maintaining key infrastructure services used across...across NVIDIA. + Transform teams of systems engineers into SRE teams. + Build roadmaps for the next generation… more
- Deloitte (San Jose, CA)
- …Job Summary: We are seeking a highly skilled and experienced Site Reliability Engineer (SRE) to join our dynamic team. The ideal candidate will have a strong ... you will do: + Monitoring & Performance Management using SRE principles: + Set up and manage monitoring tools...for your service. + Reduce MTTD & MTTR using SRE principles. + Scripting & Automation: + Develop and… more
- LiveRamp (San Francisco, CA)
- …to build and maintain products operational documentation and setting up product SRE practices + Support Security and Compliance governance support in production ... environments + Work in close collaboration with SRE team members and Engineering organizations based in California,...and 5+ years of experience in the fields of SRE, DevOps or production engineering + Experience in Infrastructure… more
- LinkedIn (Mountain View, CA)
- …We are seeking a strategic, hands-on Director to lead the Grid and Streaming SRE team. In this leadership role, you will collaborate closely with development teams ... and monitoring platforms. Lead, mentor, and develop a high-performing team of SRE engineers specialized in data processing, compute and storage for LinkedIn's… more
- Palo Alto Networks (Santa Clara, CA)
- …experience, and work location. For candidates who receive an offer at the posted level, the starting base salary (for non-sales roles) or base salary + commission ... target (for sales/commissioned roles) is expected to be between $147000 - $237500/YR. The offered compensation may also include restricted stock units and a bonus. A description of our employee benefits may be found here (http://benefits.paloaltonetworks.com/)… more
- Palo Alto Networks (Santa Clara, CA)
- …all win with precision. **Your Career** We're looking for a Technical SRE Leader who has experience supporting large-scale distributed systems. Technology stack ... mindset to operations, monitoring, alerting and remediation. As an SRE Technical Leader, you will be responsible for the...SASE as well as Cloud-NGFW managed services and their SRE operations teams. You will be expected to lead… more
- Abbott (Pleasanton, CA)
- …solutions for our customers. You will be responsible for implementing SRE improvement processes, procedures and influencing change within the organization. You ... environment and have DevOps or formal test automation, load testing or SRE experience. You will need extensive technical knowledge in the development, delivery,… more
- NVIDIA (Santa Clara, CA)
- Site Reliability Engineering (SRE) is an engineering discipline that involves designing, building, and maintaining large-scale production systems with high ... software and systems engineering practices, storage, data management, and services. SRE professionals are highly specialized and possess expertise in different… more
- Google (Mountain View, CA)
- …analyzing, and troubleshooting large-scale distributed systems. Site Reliability Engineering (SRE) combines software and systems engineering to build and run ... large-scale, massively distributed, fault-tolerant systems. SRE ensures that Google Cloud's services, both our internally critical...customer's needs and a fast rate of improvement. Additionally, SREs will keep an ever-watchful eye on our systems… more
- NVIDIA (Santa Clara, CA)
- As a Sr Manager in Site Reliability Engineering (SRE), you will lead a team dedicated to the design, construction, and maintenance of expansive production systems, ... software and systems engineering, cloud-scale storage, data management, and services. SRE Senior Managers bring specialized expertise in areas such as systems,… more
- Google (San Francisco, CA)
- …+ Master's degree in Computer Science or Engineering. Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, ... massively distributed, fault-tolerant systems. SRE ensures that Google Cloud's services, both our internally critical...customer's needs and a fast rate of improvement. Additionally, SREs will keep an ever-watchful eye on our systems… more
- McAfee, Inc. (San Jose, CA)
- …monitoring, logging, and alerting solutions to maintain system health and security. **SRE Leadership:** + Drive SRE practices by implementing strategies that ... and guide junior engineers in cloud architecture, DevOps, and SRE best practices. + Act as a subject matter...subject matter expert on AWS cloud solutions, DevOps, and SRE practices within the organization. **Documentation & Reporting:** +… more
- NVIDIA (Santa Clara, CA)
- Site Reliability Engineering (SRE) at NVIDIA is an engineering discipline to design, build and maintain large scale production systems with high efficiency and ... open source cloud enabling technologies like Kubernetes and OpenStack. SRE at NVIDIA ensures that our internal and external...while keeping an eye on capacity, latency and performance. SRE is also a mindset and a set of… more
- LiveRamp (San Francisco, CA)
- …to build and maintain products operational documentation and setting up product SRE practices + Support Security and Compliance governance support in production ... environments + Work in close collaboration with SRE team members and Engineering organizations based in California,...+ 3+ years of experience in the fields of SRE, DevOps or production engineering + Experience in Infrastructure… more
- Federal Reserve Bank (San Francisco, CA)
- …scaling and operational consistency + Implement/leverage observability, monitoring, and SRE principles (e.g., error budgets, proactive incident management) to enhance ... + Guide engineering teams, fostering standard processes in cloud engineering, SRE, and automation + Adopt security standard processes within cloud infrastructure… more
- Palo Alto Networks (Santa Clara, CA)
- …ones needed for this role. **Your Impact** + Contribute to the success of SRE and DevOps + Develop expertise in new technologies + Work with developers, researchers, ... + Orchestrate end-to-end monitoring and alerting + Participate with SRE and Dev teams in the on-call rotation +...critical business and production issues + Mentor and champion SRE culture + Participate in design reviews **Your Experience**… more
- Palo Alto Networks (Santa Clara, CA)
- …assist in escalations. While this role is similar to a Site Reliability Engineer (SRE) and lives in the same organization, here you will provide more opportunities ... in engineering troubleshooting roles in fields like Support, QA, Dev and SRE for an Enterprise-sized product delivery + Knowledge/Understanding in scripting and… more
- Palo Alto Networks (Santa Clara, CA)
- …insights into our systems' performance and health. **Your Impact** As a Senior Staff SRE with the Cortex Cloud Security Posture Management team, you will: + Cloud ... incident and alerts management in Site Reliability Engineering + DevOps/SRE Expertise - 5+ years of experience as a DevOps/SRE engineer with a passion for technology and a… more
- NVIDIA (Santa Clara, CA)
- …and scalability across global public and private clouds. + Implement SRE fundamentals, including incident management, monitoring, and performance optimization, while ... or related field, or equivalent experience with 12+ years in Software Development, SRE, or Production Engineering. + Proficiency in Python and at least one other… more
- NVIDIA (Santa Clara, CA)
- …There is an excellent opportunity to architect and drive advancements in SRE automation on the largest NVIDIA GPU clusters in the cloud! Please apply ... doing: + As part of the Maglev AI infrastructure and SRE team, you will propose and craft new ways...crowd: + Previous experience with building sophisticated tooling and SRE automation on large 100+ node GPU/CPU clusters… more