by sierra-research · LLM Plugin · ★ 1.1k
Indexed by AgentSkillsHub · Auto-synced every 8h
τ-Bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains. 🚀 τ³-bench is here! From text-only to multimodal, knowledge-aware agent evaluation. Voice full-duplex · Knowledge ret
| Field | Value |
| --- | --- |
| Stars | 1,091 |
| Forks | 278 |
| Language | Python |
| Category | LLM Plugin |
| License | MIT |
| Quality Score | 58.678/100 |
| Open Issues | 100 |
| Last Updated | 2026-04-30 |
| Created | 2025-06-09 |
| Platforms | python |
| Est. Tokens | ~20018k |
Looking for a tau2-bench alternative? If you're comparing tau2-bench with other LLM plugin tools, these 6 projects are the closest alternatives on Agent Skills Hub, ranked by topic overlap, star count, and community traction.
A full-stack AI Red Teaming platform securing AI ecosystems via OpenClaw Security Scan, Agent Scan, Skills Sca
LLM Benchmark for Throughput via Ollama (Local LLMs)
Awesome-LLM-Eval: a curated list of tools, datasets/benchmark, demos, leaderboard, papers, docs and models, ma
ClawProBench is a live-first benchmark harness for evaluating LLM agents in the OpenClaw runtime with determ
[ICLR'25] BigCodeBench: Benchmarking Code Generation Towards AGI
OpenClawProBench is a live-first benchmark harness for evaluating LLM agents in the OpenClaw runtime with de
tau2-bench describes itself as "τ-Bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains". It is categorized as an LLM Plugin and has 1.1k GitHub stars.
tau2-bench is primarily written in Python. It covers topics such as ai, benchmark, conversational-agents.
You can find installation instructions and usage details in the tau2-bench GitHub repository at github.com/sierra-research/tau2-bench. The project has 1.1k stars and 278 forks, which suggests an active community.
tau2-bench is released under the MIT license, making it free to use and modify according to the license terms.
The top alternatives to tau2-bench on Agent Skills Hub include AI-Infra-Guard, ollama-benchmark, and Awesome-LLM-Eval. Each offers a different approach to the same problem space; compare them side by side by stars, quality score, and community activity.