Job Description

Who We're Looking For

A year ago, reliably working agentic systems and sub‑second multimodal inference at scale barely existed. Nobody has a decade of experience here. So we're not screening for a resume template; we're looking for strong people from varied backgrounds who learn fast, thrive in ambiguity, and can show us what they've built, broken, and understood.

Experience We Find Useful

Inference Optimization. Deep understanding of modern serving frameworks, such as vLLM or TRT‑LLM, and the techniques behind them.
Model Acceleration. Hands‑on experience with quantization, distillation, caching strategies, continuous batching, paged attention, and speculative decoding.
High‑Performance Systems.  Proficiency in C++, CUDA, Rust, or highly optimised Python. You know how to profile code and squeeze every ounce of performance out of NVIDIA GPUs. 
Distributed Systems & Scaling. Experi...

Staff / Principal Machine Learning Engineer, Serving - Switzerland
