Full-time Posted June 28, 2026

Job Description

Responsibilities:


  • Design, implement, and evolve internal platform capabilities that make AI Efficiency services easier to build, ship, observe, secure, and operate

  • Build and maintain self-service workflows, reusable platform abstractions, and golden paths that improve developer productivity while preserving reliability, security, and governance

  • Improve platform reliability through better monitoring, alerting, observability, deployment safety, release practices, and incident readiness

  • Define and operationalize service health indicators, SLIs, SLOs, and related reliability metrics that help teams make informed tradeoffs between reliability, velocity, and cost

  • Build automation that reduces operational toil and improves mean time to detect, respond to, and recover from incidents

  • Partner with engineers throughout the software development lifecycle to embed operability, production readiness, and maintainabil...