Job Description

We’re looking for an experienced Machine Learning Platform Engineer who brings focus and subject-matter expertise in designing and implementing machine learning infrastructure and automation tools (MLOps and DevOps). This is a unique opportunity to grow in the world of machine learning infrastructure and work with a team of passionate individuals committed to the mission of bringing ML to the enterprise.

Responsibilities

Deploy and operate the GenAI platform across OpenShift/Kubernetes.
Manage large language model deployments (Cohere Command, Llama, Mistral) on GPU infrastructure (NVIDIA A100/H100) and configure RAG pipelines with serving frameworks such as vLLM, NVIDIA NIM, and TensorRT-LLM.
Monitor GPU utilization, model performance metrics, and resource allocation across the platform.
Implement observability stacks (Prometheus, Grafana, Pushgateway, and structured logging pipelines) to surface platf...
