Job Description

L1 Site Reliability Engineer responsible for monitoring, triaging, and executing standard operational tasks across enterprise applications
Supports Kubernetes, APIs, WAF, databases, API gateways (Gloo, Apigee), Kafka, and multi-cloud environments (AWS/Azure/GCP)
First line of defense for incident detection, troubleshooting, and escalation using runbooks and automation

Key Responsibilities

Monitoring & Infrastructure

Monitor systems using Grafana, Datadog, Splunk, Prometheus, and AIOps tools
Detect anomalies and follow alert workflows for resolution or escalation 
Validate Kubernetes issues using monitoring dashboards and logs 

Runbook Execution

Follow predefined runbooks for incident resolution
Restart services, validate system health, and escalate when procedures fail
Ensure adherence to operational standards 
Perform initial incident triage and severity classificatio...
            


Site Reliability Engineer (Winnipeg)
