Job Description

Key Responsibilities:  
Incident Management and Reliability:  Lead the incident management process, ensuring high availability and performance of the applications. Develop and implement SRE practices to improve system reliability and resilience. 
Monitoring and Observability:  Utilize Dynatrace, Splunk, and Grafana to monitor system health, detect anomalies, and provide actionable insights for performance optimization. 
Root Cause Analysis:  Conduct thorough root cause analysis of incidents and outages, developing long-term solutions to prevent recurrence. 
DevOps Practices:  Collaborate with development and operations teams to streamline CI/CD pipelines, automate workflows, and implement infrastructure as code (IaC) for efficient service deployment and management. 
Networking Expertise:  Pro...
            


Site Reliability Engineer
