Job Description

Become a key player as an L1 Site Reliability Engineer, focusing on operational tasks across enterprise applications. Your expertise in Kubernetes, APIs, and multi-cloud environments is essential for incident management and resolution.

In this role, you will monitor systems, triage incidents, and execute operational tasks using tools like Grafana and Datadog. With 2-5 years of experience in IT operations or DevOps, you’ll support automation and improve incident response processes while ensuring systems are healthy and operational standards are met.

Key Responsibilities:
• Monitor systems with Grafana and Datadog for anomalies
• Execute predefined runbooks for incident resolution
• Collect logs and system data for analysis
• Troubleshoot issues using kubectl and automation scripts
• Document incident resolution steps and improvements

Requirements:
• 2-5 years in IT operations or SRE roles
• Proficient in Linux and Kubernetes fundamentals
• Familiarity with AWS, Azure, or GC...

