Full-time Posted June 09, 2026

Job Description

Responsibilities

  • Navigate, troubleshoot, and recover dynamic infrastructure and long-running processes in real time using command-line tools.
  • Manage highly containerized environments, including orchestrating Dockerized sandboxes and CI/CD workflows.
  • Build, maintain, and optimize systems for AI model training and high-throughput compute environments.
  • Respond quickly to system errors, replanning and recovering mid-operation as conditions change.
  • Collaborate with engineering and AI teams to ensure seamless integration, reliability, and performance.
  • Document system architectures, incident responses, and recovery protocols with meticulous clarity.

Requirements

  • Have demonstrated expert proficiency working in terminal environments for system builds, server administration, and infrastructure management.
  • Possess advanced problem-solving skills for multi-step troubleshooting, f...
