Job Description
Job Overview
We are seeking a Site Reliability Engineer (SRE) to support large-scale, distributed, fault-tolerant systems in a global technology environment. This role combines software engineering and systems operations to improve system reliability, scalability, automation, and performance.
What You Will Do:
- Design, build, and maintain scalable and highly available infrastructure systems.
- Develop automation tools and scripts to improve operational efficiency.
- Monitor system performance and troubleshoot infrastructure issues proactively.
- Implement monitoring, alerting, SLIs, SLOs, and SLA tracking.
- Participate in 24/7 on-call rotations and incident response activities.
- Conduct root cause analysis and support post-mortem reviews.
- Collaborate with engineering and cross-functional teams on system improvements.
- Ensure infrastructure security, ...