Job Description

Become a key member of the L1 Site Reliability Engineering team, which focuses on monitoring and automating operational tasks across enterprise applications. You will apply your skills with Kubernetes, APIs, and multi-cloud environments to keep these applications running smoothly. This L1 Site Reliability Engineer role calls for up to five years of experience in IT operations, NOC, or SRE roles.
You will monitor systems using Grafana, Splunk, and Prometheus, triage incidents, and follow standard runbooks to resolve them quickly. Your automation skills in Python or Bash will help streamline processes and improve operational workflows.
Key Responsibilities
Monitor systems with Grafana, Datadog, and AIOps tools
Execute predefined runbooks for quick incident resolution 
Validate Kubernetes performance using dashboard metrics 
Collect and analyze logs for proactive issue detection 
Communicate effectively with stakeholders throughout incident...

Site Reliability Engineer H/F (IT) (Winnipeg)