Full-time | Posted June 04, 2026

Job Description

Data Scientist

  • Design and implement end-to-end evaluation frameworks to assess performance, reliability, and safety of multi-agent AI systems
  • Lead experimentation and A/B testing efforts to systematically test hypotheses, validate model improvements, and track performance across agent iterations
  • Curate and maintain high-quality ground truth datasets to enable accurate, reproducible evaluation of multi-agent outputs
  • Identify and address reliability and accuracy gaps in agent workflows, including failure modes and edge cases, in production-like environments
  • Stay current on emerging research in agentic AI, LLM evaluation, and multi-agent coordination to continuously improve framework design

Technical Skills

  • Proficiency in Python and ML frameworks
  • Hands-on experience with LLM APIs and agentic frameworks (LangChain, LlamaIndex, Semantic Kernel)
  • Familiarity with evaluation tooling (Ragas, DeepEval, L...
