Full-time · Posted June 10, 2026

Job Description

Responsibilities

  • Design and enhance Kubernetes provider platforms and supporting infrastructure to improve scalability, reliability, and developer experience.

  • Automate and simplify Kubernetes clusters lifecycle management, upgrades, and observability workflows.

  • Implement monitoring and alerting systems using tools such as Prometheus, Grafana, or Elastic Observability to meet service-level objectives (SLOs).

  • Collaborate with security teams to integrate and enforce security controls and compliance requirements within the container platform.

  • Work with application teams to improve platform usability, streamline onboarding, and reduce operational toil.

  • Respond to incidents and perform post-incident reviews, driving continuous improvement and operational excellence.

  • Contribute to a reliability engineering culture that fosters shared responsibility for system availability and performance.
