Full-time | Posted June 05, 2026

Job Description

About our client

A technology group establishing a new AI Centre of Excellence in Singapore is looking for an engineer to own the distributed training infrastructure for large-scale AIGC model development.

What you'll work on

- Design and build distributed training toolchains supporting ultra-large-scale model training
- Optimise across compute, communication, and storage layers
- Diagnose and resolve training bottlenecks; improve stability and throughput
- Track and apply frontier distributed training techniques end-to-end

What we're looking for

- Master's or above in CS or related field
- 2+ years of relevant experience
- Deep hands-on experience with distributed training paradigms: Data / Pipeline / Tensor / Expert Parallelism
- Proficient in PyTorch, DeepSpeed, Megatron-LM
- Familiar with GPU architecture and CUDA programming; experience in CUDA kernel development and NCCL/cuDNN
- Understanding of AIGC pre-training, Transformer architectures, and Diffusion models (Stable Diffusion, Flux)
