by InternLM · Codex Skill · ★ 301
# WildClawBench

Hard, practical, end-to-end evaluation for AI agents, in the wild.

WildClawBench is an agent benchmark that tests what actually matters: can an AI agent do real work, end-to-end, without hand-holding? We drop agents into a live OpenClaw environment (the same open-source personal AI assistant that real users rely on daily) and throw 60 original tasks at them: clipping goal highlights from a football match, negotiating meeting times over multi-round emails, hunting down contradictions in search results, writing inference scripts for undocumented codebases, catching privacy leaks before they happen. Useful things. Hard things. Hard enough that every frontier model we tested scores below 0.55 (top overall: 0.52). That makes scores mean something.

## Why WildClawBench?

Most agent benchmarks test isolated capabilities: calling a function, parsing JSON, following a single instruction.
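As a rough illustration of how an overall number like 0.52 might be produced, here is a minimal sketch that averages per-task scores. The task names and score values below are invented for illustration; this is not WildClawBench's actual scoring code, which defines its own 60 tasks and grading rules.

```python
# Hypothetical aggregation: unweighted mean of per-task scores in [0, 1].
# Task names and values are invented; the real harness has 60 tasks.
task_scores = {
    "clip_goal_highlights": 0.60,
    "negotiate_meeting_time": 0.45,
    "find_search_contradictions": 0.50,
}

def overall_score(scores: dict[str, float]) -> float:
    """Unweighted mean over all tasks, rounded to two decimals."""
    return round(sum(scores.values()) / len(scores), 2)

print(overall_score(task_scores))  # → 0.52
```

A real harness would likely weight tasks or report per-category breakdowns as well; the point is only that a single headline score compresses many heterogeneous task results.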
| Stars | 301 |
| Forks | 15 |
| Language | Python |
| Category | Codex Skill |
| License | MIT |
| Quality Score | 58.358/100 |
| Last Updated | 2026-04-21 |
| Created | 2026-03-23 |
| Platforms | python |
| Est. Tokens | ~679k |
Looking for a WildClawBench alternative? If you're comparing WildClawBench with other codex skill tools, this is the closest alternative on Agent Skills Hub, ranked by topic overlap, star count, and community traction:
TUI and CLI for browsing AI models, benchmarks, coding agents, and statuses for AI providers.
WildClawBench is an in-the-wild benchmark for AI agents in the OpenClaw environment. It is categorized as a Codex Skill with 301 GitHub stars.
WildClawBench is primarily written in Python. It covers topics such as agentic-ai, agentic-evaluation, agents.
You can find installation instructions and usage details in the WildClawBench GitHub repository at github.com/InternLM/WildClawBench. The project has 301 stars and 15 forks, indicating an active community.
WildClawBench is released under the MIT license, making it free to use and modify according to the license terms.
The top alternative to WildClawBench on Agent Skills Hub is `models`, the TUI/CLI listed above. Each tool offers a different approach to the same problem space; compare them side-by-side by stars, quality score, and community activity.