Job Description:
Design and implement agent evaluation pipelines that benchmark AI capabilities across real-world enterprise use cases.
Build domain-specific benchmarks for product support, engineering ops, GTM insights, and other verticals relevant to mode...