Job Description
A Singapore-based software firm specializing in cloud-native applications and digital solutions. The company focuses on building scalable, customized platforms and helping businesses drive digital transformation through modern technologies and agile delivery.
Key Responsibilities
- Drive reliability initiatives to enhance system stability, scalability, and overall performance
- Build and optimize automation solutions, including deployment safeguards and self-recovery mechanisms
- Establish and manage SLOs/SLIs, ensuring they support business priorities and service expectations
- Take ownership of major incidents, lead resolution efforts, and implement long-term preventive fixes
- Coach and mentor junior team members while fostering close collaboration with engineering and DevOps teams
- Create and maintain clear operational documentation and runbooks for consistent knowledge sharing
- Support capacity planning, scal...