As a senior site reliability engineer, you will work in our global organization to provide operational support... and collaborate on technical developments. About You: Experienced Site Reliability Engineer with 6+ years of experience in DevOps...
services. You will blend software engineering and systems engineering to build and run large-scale, fault-tolerant, distributed... systems, focusing on performance, capacity, availability, and security. You’ll own service reliability across the stack...
: Lead resolution of high-severity/complex incidents across hybrid infrastructure. Architect and implement automation... frameworks, self-healing workflows, and AI-driven ops. Define SRE best practices, reliability SLIs/SLOs/SLAs, and operational...
. Mandatory Skills & Experience 7+ years of experience in Site Reliability Engineering (SRE). 3+ years of hands-on experience... and development Lead the design, deployment, and operations of large-scale systems on AWS (EKS) or Azure (AKS), ensuring reliability...
+ overall years of experience in Site Reliability Engineering, DevOps, or a similar role, with at least 5 years in a leadership..., develop, and inspire a team of Site Reliability Engineers, fostering a strong culture of collaboration, ownership...
is building a next-generation Site Reliability Engineering team, and we're looking for talented, motivated engineers who thrive... excellence. Pythian, a multinational company, was founded in 1997 and started by ensuring the reliability and performance...
Job Summary We are seeking highly motivated and detail-oriented Site Reliability Engineers (SRE/Sr SRE..., or equivalent practical experience. 2–4 years of experience in SRE, L3 Technical Support, Reliability Engineering, or similar hands...
field. 7+ years of experience in site reliability engineering, infrastructure engineering, or a similar role. Proven.... Collaborate with engineering teams to ensure new products and features are designed with reliability and scalability in mind...
Reliability and Platform Engineering team is at the heart of building scalable, distributed, and fault-tolerant systems.... We integrate Software Engineering and Systems Engineering to drive exceptional system performance, capacity, and reliability...
, focusing on service excellence and live site reliability for AI workloads. - Research & Innovation: Stay informed on emerging...- Reliability: Ensure the reliability, scalability, and security of AI infrastructure supporting HPC & AI workloads...
. Have you got what it takes? Must have 5+ years of experience in Site Reliability Engineering Excellent technical, analytical... Reliability team to ensure we continue to offer exemplary service to our customers. Our Site Reliability team is responsible...
field. 7+ years of experience in site reliability engineering, infrastructure engineering, or a similar role. * Proven.... Collaborate with engineering teams to ensure new products and features are designed with reliability and scalability in mind...
contributor to the design, automation, and reliability of our cloud infrastructure. You will lead efforts to build and maintain.... Containerize applications using Docker and deploy them to AKS. Monitor system performance and reliability using Grafana...
, you will be a key member of the CFL Platform Engineering and Operations team. You will lead reliability engineering for AI-powered... for latency, throughput, and availability Lead high-severity incident response, root cause analysis, and system recovery...
Career Category Engineering Job Description Position Overview The GCF5 Track Lead is the senior technical leader...) Enablement. They define and socialize standards and patterns, lead multi‑team delivery, and mentor GCF4 engineers. They translate...
, and best practices that raise reliability across hundreds of services and 30+ Kubernetes clusters. Lead global incident management... and error budgets with product and engineering teams; coach teams on using error budgets for release decisions and reliability...
/ Production Engineering / Platform Engineering (reliability-focused) Strong Go (mandatory): ability to read, debug, and ship... Why join Build a new function with real impact on reliability and engineering culture Work across the full production surface...
. This team is focused on influencing and scaling global incident management capabilities, actively empowering engineering teams... continuously develops processes, culture, and our collective system reliability. Your Impact As an SRE with an Incident...
Based Testing) eligible flows, develop CFBT tests and train the team on how to write and maintain them. Lead post.... Operations and Design Consultation for driving high reliability. Emergency Incident Response with action-oriented postmortem/RCA...