Job Description
**Responsibilities**:
- Identify technical and process gaps and implement improvements that increase operational reliability and efficiency and promote stability through automation
- Support the build and configuration of Kubernetes clusters and set up a monitoring framework
- Help teams perform post-incident reviews to prevent incidents from recurring
- Help to meet performance and stability requirements by working with the team to implement load tests, tracing, monitoring, etc.
- Manage and maintain the release pipelines, and help with manual and automated deployments
- Perform regular security monitoring to identify any possible intrusions
- Perform and validate scheduled backup operations, ensuring that all required file systems and system data are successfully backed up to the appropriate media
- Create and manage (change and delete) user accounts as needed
- Repair and recover from hardware or software failures as needed
Coordi...