by sierra-research · LLM Plugin · ★ 1.3k
τ-Bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains
🚀 τ³-bench is here! From text-only to multimodal, knowledge-aware agent evaluation. Voice full-duplex · Knowledge ret…
| Metric | Value |
|---|---|
| Stars | 1,333 |
| Forks | 343 |
| Language | Python |
| Category | LLM Plugin |
| License | MIT |
| Quality Score | 58.678/100 |
| Open Issues | 120 |
| Last Updated | 2026-06-10 |
| Created | 2025-06-09 |
| Platforms | Python |
| Est. Tokens | ~22,491k |
Looking for a tau2-bench alternative? If you're comparing tau2-bench with other LLM plugin tools, these 6 projects are the closest alternatives on Agent Skills Hub, ranked by topic overlap, star count, and community traction.
AI-Infra-Guard: A full-stack AI Red Teaming platform securing AI ecosystems via OpenClaw Security Scan, Agent Scan, Skills Sca…
ollama-benchmark: LLM benchmark for throughput via Ollama (local LLMs)
Klavis AI: MCP integration platforms that let AI agents use tools reliably at any scale
Adversary simulation and red-teaming platform with AI
Lemonade helps users discover and run local AI apps by serving optimized LLMs right from their own GPUs and NP…
A Model Context Protocol server for Excel file manipulation
tau2-bench is the repository for τ-Bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains. It is categorized as an LLM plugin and has 1.3k GitHub stars.
tau2-bench is primarily written in Python. Its GitHub topics include ai, benchmark, and conversational-agents.
Installation instructions and usage details are in the tau2-bench GitHub repository at github.com/sierra-research/tau2-bench. The project has 1.3k stars, 343 forks, and 120 open issues.
tau2-bench is released under the MIT license, making it free to use and modify according to the license terms.
The top alternatives to tau2-bench on Agent Skills Hub include AI-Infra-Guard, ollama-benchmark, and klavis. Each takes a different approach to the same problem space; compare them side by side by stars, quality score, and community activity.