- NVIDIA (Santa Clara, CA)
- …the first people to make them operational in production? We are seeking a dedicated Cluster Deployment Operations Engineer to support product deployments ... team, acting as the link between engineering and the NVIS field team for cluster deployment and management solutions! We bridge the gap between product roadmaps… more
- Bloomberg (New York, NY)
- Senior Software Engineer - Market Data Platform, Cluster Management Location New York Business Area Engineering and CTO Ref # 10046371 **Description & ... in it for you:** As a Market Data Platform engineer, you will: + Get hands-on experience working on...for monitoring load and latency. Our platform enables self-service operations and supports Incident Response. + Design - We… more
- Northrop Grumman (Jessup, MD)
- …+ Oversee design, deployment, and lifecycle operation of a high-performance compute cluster + Lead team of HPC Systems Administrators + Assess and respond to ... and risks by performing trade studies of technological function, value proposition, and deployment timeline + Assess and report on cluster operational risks and… more
- Walmart (Sunnyvale, CA)
- …remediation. **Automation & Observability** + Build and standardize automation for cluster deployment, expansion, and monitoring using Ansible, Terraform, and ... **Position Summary** We are seeking a highly skilled Principal Engineer (Ceph/Scale-Out Storage) with 10+ years of deep technical experience in distributed storage… more
- TP-Link North America, Inc. (Irvine, CA)
- …lifestyle. OVERVIEW We are looking for an experienced Senior Cloud Engineer specializing in Kubernetes development and enhancing underlying system capabilities to ... the underlying architecture, integrating opensource solutions, and improving Kubernetes cluster capabilities. You will directly contribute to the development of… more
- Mastercard (O'Fallon, MO)
- …governments realize their greatest potential._ **Title and Summary** Senior Enterprise Cloud Operations Engineer Job Summary We are seeking an experienced Senior ... Cloud Engineer to join our Cloud Operations team....candidate should be able to automate daily activities, support deployment pipelines, develop observability for the cloud platforms. This… more
- Panasonic Avionics Corporation (Beaverton, OR)
- …**Responsibilities** **The Position:** The K3s Network Engineer will focus on **networking for K3s ... ARM, accelerators). The role involves designing, implementing, and maintaining cluster networking that integrates with external systems. This includes **writing… more
- Amazon (Mesa, AZ)
- …Regional Chief Engineer (CE) to join our Data Center Engineering Operations (DCEO) Team. This committed group works to maintain the critical physical ... software, hardware, and network engineers, supply chain specialists, security experts, operations managers, and other vital roles. You'll collaborate with people… more
- SpaceX (Hawthorne, CA)
- Site Reliability Engineer, GNC (Falcon) Hawthorne, CA SpaceX was founded under the belief that a future where humanity is out exploring the stars is ... goal of enabling human life on Mars. SITE RELIABILITY ENGINEER, GNC (FALCON) SpaceX is looking for a Site...and vehicle simulation and participates in recurring mission-critical launch operations. This position will work with the GNC team… more
- Truist (Charlotte, NC)
- …introducing new capabilities. This includes improving/developing automation for cluster installation, system upgrades, patch management/compliance, and monitoring ... of the overall environment. Other important aspects of this role will be cluster capacity management and providing level two operational support. CAAS support … more
- Cognizant (Bridgewater, NJ)
- …Develop automation scripts in Shell/Python + Build internal tools to streamline cluster operations and observability. **Work model:** At Cognizant, we strive to ... As a **DevOps Engineer** you will make an impact by administering...Retail team. **In this role, you will:** + Perform cluster lifecycle operations including upgrades, patching, node… more
- Cadence Design Systems, Inc. (San Jose, CA)
- …world of technology. We are seeking a highly skilled and experienced AI Systems Engineer to join our team. This is a hands-on, senior individual contributor role ... that will be pivotal in leading the development, operations, and support of our entire AI infrastructure. You...services on both GCP and Azure. + Hands-on GPU Cluster Management: Take a leadership role in the configuration,… more
- NVIDIA (Santa Clara, CA)
- …operations, and networking, familiarity with software testing and deployment, familiarity with distributed systems, and excellent communication and planning ... management systems (Kubernetes, SLURM). Hands-on experience in Machine Learning Operations. Hands-on experience with Bright Cluster Manager. + Hands-on… more
- V2X (Springfield, VA)
- …Google GKE, with demonstrated proficiency in the following container areas: cluster management, deployment and automation, monitoring and logging, security,… more
- CACI International (Fort Bragg, NC)
- …experience. Extensive experience with Kubernetes: You should be comfortable with cluster management, deployment strategies, and general Kubernetes concepts. ... Platform System Engineer (Kubernetes) Job Category: Engineering Time Type: Full...systems. You'll work at the intersection of development and operations, focusing on automation and tooling to improve the… more
- Citigroup (Irving, TX)
- …that have completed the development stage and are running in the daily operations of the firm. + Manages, maintains and supports applications and their operating ... requirements. + Participate in application releases, from development, testing and deployment into production. + Engages in post implementation analysis to ensure… more
- Mastercard (O'Fallon, MO)
- …to design, build, implement, and support technology services. A business operations engineer will ensure operational criteria like system availability, ... Operations (BizOps) team is seeking a Business Operations Site Reliability Engineer (SRE). The role...capacity, performance, monitoring, self-healing, and deployment automation are implemented throughout the… more
- NVIDIA (Santa Clara, CA)
- …Artifactory, Jira) in hybrid on-premise and cloud environments. + Assist with cluster operations and system administration (managing: servers, team accounts, ... dedicated and motivated senior build and continuous integration (CI/CD) engineer for its GenAI Frameworks (Megatron-LM (https://github.com/NVIDIA/Megatron-LM) and NeMo… more
- NVIDIA (Santa Clara, CA)
- We are looking for Senior Software Development Engineer in Test (SDET) to join our New GPU Integration (NPI) team for NVIDIA's Enterprise Compute SWQA team. Are you ... to have your skills on the team! As an engineer on this New Platform GPU Integration team, you...tools to significantly enhance our testing capabilities and streamlining operations for more efficient and accurate results. + Improve… more
- CGI Technologies and Solutions, Inc. (Knoxville, TN)
- …ActiveIQ, and StorageGRID. Experience with racking, stacking, cabling, and hardware deployment for various NetApp storage cluster models and switches. ... **Storage Engineer** **Category:** Infrastructure/Cloud **Main location:** United States, Tennessee, Knoxville **Position ID:** J1125-0619 **Employment Type:** Full… more