|
For our first paper review of 2026, we will have Tom Plunkett lead us through papers that define benchmarks used to evaluate Agentic AI.
This will be an hour-long deep dive into one Agentic AI benchmark, the Tau benchmark. We’ll start with the 2024 Tau benchmark paper, then cover the 2025 Tau2 benchmark paper. Finally, we’ll look at the tau2-bench GitHub repository and at using the tau2 benchmark with example agents from the Retail, Telecom, and Airline domains.
|