Virtual Paper Review – AI Agent Benchmarks

Start: 2026-01-14T18:00:00-06:00
End: 2026-01-14T19:30:00-06:00

January 14 @ 6:00 pm – 7:30 pm

For our first paper review of 2026, we will have Tom Plunkett lead us through papers that define benchmarks used to evaluate Agentic AI.

This will be an hour-long deep dive into one Agentic AI benchmark, the Tau benchmark. We’ll start with the 2024 Tau benchmark paper, then cover the 2025 Tau2 benchmark paper. Finally, we’ll look at the tau2-bench GitHub repository and at using the tau2 benchmark with example agents from the Retail, Telecom, and Airline domains.

Links:

τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains: Shunyu Yao, Noah Shinn, Pedram Razavi, Karthik Narasimhan https://arxiv.org/abs/2406.12045
τ2-Bench: Evaluating Conversational Agents in a Dual-Control Environment: Victor Barres, Honghua Dong, Soham Ray, Xujie Si, Karthik Narasimhan https://arxiv.org/abs/2506.07982

Details:

Date – 01/14/2026
Time – 6-7:30pm
Location – VIRTUAL
Google Meet – https://meet.google.com/jfo-rrvd-aqv
Phone: (US) +1 402-864-0236 PIN: 332 674 634#
