- NVIDIA (Santa Clara, CA)
- Site Reliability Engineering (SRE) is an engineering discipline that involves designing, building, and maintaining large-scale production systems with high ... software and systems engineering practices, storage, data management, and services. SRE professionals are highly specialized and possess expertise in different…
- NVIDIA (Santa Clara, CA)
- As a Sr Manager in Site Reliability Engineering (SRE), you will lead a team dedicated to the design, construction, and maintenance of expansive production systems, ... software and systems engineering, cloud-scale storage, data management, and services. SRE Senior Managers bring specialized expertise in areas such as systems,…
- NVIDIA (Santa Clara, CA)
- …the platform upon which every new AI-powered application is built. We are seeking an SRE Manager to build and manage SREs who monitor and operate both the factory ... floor opportunity to form a team and define the SRE role in the NIM program. Your team will...services. + You will partner with internal and external SRE team leadership to provide the best experience for…
- NVIDIA (Santa Clara, CA)
- NVIDIA is looking for a seasoned SRE to join its complex and fast-paced Infrastructure, Planning and Processes organization where you will be working as a Senior ... SRE Engineer. The position will be part of a...cater to their infrastructure & systems needs. As an SRE, you'll also be working in conjunction with various…
- EPAM Systems (San Jose, CA)
- As an **SRE Lead - Toil Analysis**, you will...to date on industry trends and best practices in SRE and automation **Requirements** + 10+ years of experience ... and reducing operational overhead through automation + Strong understanding of SRE principles and best practices + Proficiency in programming and scripting…
- NVIDIA (Santa Clara, CA)
- NVIDIA is looking for a seasoned SRE to join its multifaceted and fast-paced Infrastructure, Planning and Processes organization where you will be working as a ... Senior SRE Engineer. The position will be part of a...cater to their infrastructure & systems needs. As an SRE, you'll also be working in conjunction with various…
- Insight Global (Mountain View, CA)
- …worked their way into L2/L3 support, doing extensive scripting as they moved into the SRE function. This role is the right fit for someone who loves to be ... with monitoring tools such as Grafana or Prometheus -Experience working in an SRE or L2/L3 support (on-call environment) -Extremely motivated, excited to be part of…
- General Motors (Sunnyvale, CA)
- …ecosystem. Additionally, this team works closely with the Engineering, Architecture and SRE teams to develop and standardize new patterns, libraries and services ... that are needed as part of GM's digital product transformation. **What You'll Do:** - Defines and leads corporate software strategy for new technology, highly complex features, or significant enhancements for current, new, or major programs. - Provide,…
- Cisco (Milpitas, CA)
- …incident handling and reporting. Minimum Qualifications * Prior experience in SRE or DevOps * Experience administration, virtualization technologies, HTTP/HTTPS * ... Experience building and working with cloud-based distributed systems, preferably public cloud such as AWS, Azure, or GCP * Experience working with Containers and Containerization tools such as Docker, Kubernetes * Experience working with GitOps tools such as…
- LinkedIn (Sunnyvale, CA)
- …* Serve as a primary point responsible for the overall health, performance, and capacity of one or more of our Internet-facing services * Gain deep knowledge of our ... complex applications * Assist in the roll-out and ramp up of new product features and technologies to facilitate our rapid iteration and constant growth * Develop tools to improve our ability to rapidly deploy and effectively monitor custom applications in a…
- Intuit (Mountain View, CA)
- …Our Senior Manager is an engineering leader who works with the engineering staff to innovate and build new engineering solutions, improve, and enhance existing ... solutions as well as leverage engineering solutions to solve critical operational problems. You will also be responsible for growing and developing a high performing engineering team, who are driven and autonomous in nature. You will support the development of…
- Intuit (Mountain View, CA)
- …Intuit Fintech is your trusted financial expert empowering financial prosperity for businesses and consumers through a convenient, powerful, AI-native fintech ... platform providing fast and easy access to funds at the time of need. We process millions of transactions every day across various payment methods. Millions of customers and merchants send billions of dollars moving at light-speed through our systems annually.…
- NVIDIA (Santa Clara, CA)
- …It is a unique legacy of innovation that's fueled by great technology and amazing people. Today, we're tapping into the unlimited potential of AI to define the next ... era of computing. An era in which our GPU acts as the brains of computers, generative AI, robots, and self-driving cars that can understand the world. Doing what's never been done before takes vision, innovation, and the world's best talent. As an NVIDIAN,…
- NVIDIA (Santa Clara, CA)
- …A world where the network learns from past events to recommend actions to users. Or better yet, a network that proactively prevents actions with high probability of ... causing disruption. This network is advanced and intelligent where disruptions are minimized and emerging technology is easily integrated to maintain a first-class service for our business. If that sounds exciting, NVIDIA is looking for you to develop a smart…
- Google (Sunnyvale, CA)
- …Engineering. + 1 year of people management experience. Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, ... massively distributed, fault-tolerant systems. SRE ensures that Google's services-both our internally critical and...users' needs and a fast rate of improvement. Additionally, SREs will keep an ever-watchful eye on our systems…
- NVIDIA (Santa Clara, CA)
- …a great opportunity for you! NVIDIA is seeking a Senior Site Reliability Engineer (SRE) for the Data Science & ML Platform(s) team. The role involves designing, ... high efficiency and availability of the platform, as well as applying SRE principles to improve production systems and optimize service SLOs. Additionally,…
- TEKsystems (Cupertino, CA)
- …in proprietary tool boxes and it's predicted to reach 100% in 3 months, SRE would help detect that and procure and provision additional hardware. Another example is ... when the C* SRE communicates to the dev team that the storage of...the Workflow Platform. In the case of K8s workloads, SRE will work with Platform team to automate the…
- Google (Sunnyvale, CA)
- …+ Master's degree in Computer Science or Engineering. Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, ... massively distributed, fault-tolerant systems. SRE ensures that Google Cloud's services-both our internally critical...customer's needs and a fast rate of improvement. Additionally, SREs will keep an ever-watchful eye on our systems…
- McAfee, Inc. (San Jose, CA)
- …monitoring, logging, and alerting solutions to maintain system health and security. **SRE Leadership:** + Drive SRE practices by implementing strategies that ... and guide junior engineers in cloud architecture, DevOps, and SRE best practices. + Act as a subject matter...subject matter expert on AWS cloud solutions, DevOps, and SRE practices within the organization. **Documentation & Reporting:** +…