Full-time Posted June 13, 2026

Job Description

Key Responsibilities
Design and develop compute cluster architectures optimized for performance, reliability, scalability, and serviceability within KLA systems.
Define and validate server hardware configurations, including CPUs, GPUs, memory subsystems, storage, networking, and specialized accelerators.
Analyze and optimize system-level performance across hardware and software layers, including CPU/GPU utilization, memory bandwidth, PCIe topology, NUMA architecture, and I/O performance.
Collaborate with hardware, software, firmware, and systems engineering teams to ensure seamless integration of compute clusters into broader system architectures.
Support server bring-up, hardware integration, diagnostics, benchmarking, stress testing, and root-cause analysis activities.
Manage and troubleshoot enterprise server platforms, including BIOS/firmware configuration, BMC/IPMI management, thermal and power optimization, and hardware health monitoring.
Participate in architecture reviews, i...
