Full-time Posted June 05, 2026

Job Description

Join the L1 Site Reliability Engineering team, which monitors enterprise applications and automates operational tasks across them. You will apply your skills with Kubernetes, APIs, and multi-cloud environments to keep those applications performing reliably. This L1 Site Reliability Engineer role calls for up to five years of experience in IT operations, NOC, or SRE roles.

You will monitor systems using Grafana, Splunk, and Prometheus, triage incidents, and follow standard runbooks to resolve them quickly. You will also use Python or Bash to automate routine operational tasks and streamline workflows.

Key Responsibilities

  • Monitor systems with Grafana, Datadog, and AIOps tools
  • Execute predefined runbooks for quick incident resolution
  • Validate Kubernetes performance using dashboard metrics
  • Collect and analyze logs for proactive issue detection
  • Communicate effectively with stakeholders throughout incident...
