Full-time Posted June 04, 2026

Job Description

ROLE SUMMARY

The HPC Operations Support Engineer provides day-to-day operational support for production HPC environments, ensuring system stability, availability, and SLA adherence.

KEY RESPONSIBILITIES
  • Monitor HPC cluster health, job execution, and resource utilization.
  • Provide L1/L2 user support for HPC incidents and service requests.
  • Perform routine operational checks, maintenance, and housekeeping tasks.
  • Support system upgrades, patches, and scheduled maintenance activities.
  • Manage HPC user onboarding, access, and quota administration.
  • Escalate complex issues to HPC Engineers as required.
  • Maintain operational documentation, SOPs, and runbooks.
  • Ensure adherence to SLAs and operational procedures.

TECHNICAL SKILLS & TOOLS
  • Linux administration (basic to intermediate)
  • Scheduler Operations: Slurm
  • Monitor...
