Full-time
Posted June 19, 2026

Job Description

Position: SwarmBench Task Engineer Knowledge / Research
Type: Short-Term Contract (4 weeks)
Compensation: $15 per hour
Location: Remote
Commitment: 8 hours per day with 4 hours overlap with PST

Role Responsibilities

- Build multi-agent benchmark tasks requiring deep reading, analysis, and synthesis of large document collections
- Curate real-world research datasets (academic papers, case studies, technical reports) for AI evaluation
- Design complex research-driven questions requiring cross-document reasoning and synthesis
- Create structured ground-truth outputs (JSON) with precise, verifiable answers
- Develop LLM judge prompts to evaluate outputs against defined schemas and oracles
- Design decomposition strategies to split research tasks across multiple parallel agents
- Analyze model outputs and ensure correctness, completeness, and factual grounding
- Work with agentic frameworks and evaluation pipelines for AI benchmarking

Requirements

- Strong experience in research (academic or industry) across...
