About The Role

We're looking for someone who can design realistic, structured evaluation scenarios for LLM-based agents. ... You'll create test cases that simulate human-performed tasks and define gold-standard behavior to compare agent actions against. You'll work to ensure each scenario is clearly defined, well‑scored, and easy to execute and reuse.

- ... like JSON or YAML for scenario description
- Can define expected agent behaviors (gold paths) and scoring logic
- Basic...
This remote, part-time role involves creating test cases, analyzing agent behavior, and ensuring clear documentation. A Bachelor...