Virtual Paper Review – AI Agent Benchmarks
For our first paper review of 2026, Tom Plunkett will lead us through papers that define benchmarks used to evaluate Agentic AI. This will be an hour-long deep dive into one such benchmark, the Tau benchmark. We’ll start with the 2024 Tau benchmark paper, then cover the 2025 Tau2 benchmark paper. […]
