tau2-bench — LLM Plugin by sierra-research

by sierra-research · LLM Plugin · ★ 1.1k

Indexed by AgentSkillsHub · Auto-synced every 8h

About tau2-bench

τ-Bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains. 🚀 τ³-bench is here! From text-only to multimodal, knowledge-aware agent evaluation. Voice full-duplex · Knowledge ret…

Topics: ai, benchmark, conversational-agents, language-model-agent, llm

Quick Facts

Stars: 1,091
Forks: 278
Language: Python
Category: LLM Plugin
License: MIT
Quality Score: 58.678/100
Open Issues: 100
Last Updated: 2026-04-30
Created: 2025-06-09
Platforms: python
Est. Tokens: ~20018k

Compatible Skills

These tools pair well with tau2-bench for enhanced workflows:

  • mcpmark — semantic(0.42)+complementary+rare_topics+same_lang+similar_pop+shared_platform (64%)
  • MCPBench — semantic(0.27)+complementary+rare_topics+same_lang+similar_pop+shared_platform (59%)
  • MLLM-Tool — semantic(0.35)+complementary+same_lang+similar_pop+shared_platform (57%)
  • Toucan — semantic(0.23)+complementary+same_lang+similar_pop+shared_platform (53%)
  • mcp-bench — semantic(0.20)+complementary+same_lang+similar_pop+shared_platform (52%)

tau2-bench alternative? Top 6 similar tools

Looking for a tau2-bench alternative? If you're comparing tau2-bench with other LLM Plugin tools, these 6 projects are the closest alternatives on Agent Skills Hub, ranked by topic overlap, star count, and community traction.

  • AI-Infra-Guard by Tencent · ⭐ 3.4k

    A full-stack AI Red Teaming platform securing AI ecosystems via OpenClaw Security Scan, Agent Scan, Skills Sca

  • ollama-benchmark by aidatatools · ⭐ 345

    LLM Benchmark for Throughput via Ollama (Local LLMs)

  • Awesome-LLM-Eval by onejune2018 · ⭐ 615

    Awesome-LLM-Eval: a curated list of tools, datasets/benchmark, demos, leaderboard, papers, docs and models, ma

  • ClawProBench by suyoumo · ⭐ 576

    ClawProBench is a live-first benchmark harness for evaluating LLM agents in the OpenClaw runtime with determ

  • bigcodebench by bigcode-project · ⭐ 485

    [ICLR'25] BigCodeBench: Benchmarking Code Generation Towards AGI

  • OpenClawProBench by suyoumo · ⭐ 340

    OpenClawProBench is a live-first benchmark harness for evaluating LLM agents in the OpenClaw runtime with de


Frequently Asked Questions

What is tau2-bench?

tau2-bench is τ-Bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains. It is categorized as an LLM Plugin with 1.1k GitHub stars.

What programming language is tau2-bench written in?

tau2-bench is primarily written in Python. It covers topics such as ai, benchmark, and conversational-agents.

How do I install or use tau2-bench?

You can find installation instructions and usage details in the tau2-bench GitHub repository at github.com/sierra-research/tau2-bench. The project has 1.1k stars and 278 forks, indicating an active community.

What license does tau2-bench use?

tau2-bench is released under the MIT license, making it free to use and modify according to the license terms.

What are the best alternatives to tau2-bench?

The top alternatives to tau2-bench on Agent Skills Hub include AI-Infra-Guard, ollama-benchmark, and Awesome-LLM-Eval. Each offers a different approach to the same problem space; compare them side-by-side by stars, quality score, and community activity.

View on GitHub → Browse LLM Plugin tools