Job Description
Key Responsibilities
Cross-Cluster Standardization: define and enforce incident management practices, standardize alerting, monitoring, and request handling, align workflows across ServiceNow and Jira, ensure consistency across all clusters.
Reliability Engineering: define SLO, SLA, MTTR, and MTRS standards, identify systemic reliability gaps, drive incident reduction and prevention strategies, establish reliability as a measurable discipline.
Automation Strategy: identify cross-cluster automation opportunities, define reusable automation patterns and frameworks, eliminate duplicated operational solutions, drive reduction of manual toil.
Architecture Alignment: partner with Solution Architects across clusters, ensure operability is built into system design, align monitoring, alerting, and failover strategies, prevent conflicting tooling or architectural decisions.
Governance and Reviews: lead cross-cluster SRE reviews, track adoption of standards, dr...