Job Description

The Senior Site Reliability Engineer (SRE) ensures the reliability, availability, performance, and operability of production systems across our platforms. The role applies software engineering practices to operations, with a focus on automation, observability, and incident response.

Responsibilities: 

Own and improve the reliability, availability, and performance of production services in Google Cloud (GCP).

Participate in incident management, including detection, triage, mitigation, escalation, and recovery.

Use and improve incident workflows and tooling (e.g., ServiceNow) to ensure clear ownership and timely communication.

Design, implement, and operate observability solutions including monitoring, logging, tracing, synthetics, and dashboards (e.g., Splunk Observability, OpenTelemetry).

Reduce operational toil through automation and engineering-led solutions, proactively introducing a...

Senior Site Reliability Engineer (R-19383)
