Full-time, posted June 05, 2026

Job Description

Become a key player as an L1 Site Reliability Engineer, focusing on operational tasks across enterprise applications. Your expertise in Kubernetes, APIs, and multi-cloud environments is essential for incident management and resolution.

In this role, you will monitor systems, triage incidents, and execute operational tasks using tools such as Grafana and Datadog. With 2-5 years of experience in IT operations or DevOps, you’ll support automation and improve incident response processes while keeping systems healthy and meeting operational standards.

Key Responsibilities:
• Monitor systems with Grafana and Datadog for anomalies
• Execute predefined runbooks for incident resolution
• Collect logs and system data for analysis
• Troubleshoot issues using kubectl and automation scripts
• Document incident resolution steps and improvements

Requirements:
• 2-5 years in IT operations or SRE roles
• Proficient in Linux and Kubernetes fundamentals
• Familiarity with AWS, Azure, or GC...
