for scenario description. Can define expected agent behaviors (gold paths) and scoring logic. Basic experience with Python and JS... looking for someone who can design realistic and structured evaluation scenarios for LLM-based agents. You’ll create test cases that simulate...