by suyoumo · Codex Skill · ★ 576
ClawProBench

A transparent, live-first benchmark harness for evaluating model capability inside the OpenClaw runtime: 102 active scenarios, 162 catalog scenarios, deterministic grading, and OpenClaw-native coverage.

ClawProBench focuses on real OpenClaw execution with deterministic grading, structured reports, and benchmark-profile selection. The default ranking path is a single benchmark profile; broader active coverage remains available through the additional profiles. The current worktree inventory reports 102 active scenarios and 162 total catalog scenarios.

Leaderboard: browse the public leaderboard and benchmark cases at suyoumo.github.io/bench.
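The page above does not show ClawProBench's internals, so as a purely illustrative sketch (every name here is hypothetical, not the project's actual API), "deterministic grading" generally means each scenario defines machine-checkable assertions, so grading the same run transcript always yields the same score, with no judge-model calls or sampling involved:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    """One machine-checkable assertion a scenario defines (hypothetical)."""
    field: str
    expected: str


def grade(transcript: dict[str, str], checks: list[Check]) -> float:
    # Deterministic: the score depends only on the transcript content,
    # never on randomness, wall-clock time, or an LLM judge.
    passed = sum(1 for c in checks if transcript.get(c.field) == c.expected)
    return passed / len(checks)


checks = [Check("final_answer", "42"), Check("tool_used", "calculator")]
score = grade({"final_answer": "42", "tool_used": "search"}, checks)
print(score)  # 0.5 (one of two checks passed)
```

Because the grader is a pure function of the transcript, re-grading archived runs reproduces leaderboard scores exactly, which is what makes a "transparent" benchmark auditable.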
| Field | Value |
| --- | --- |
| Stars | 576 |
| Forks | 49 |
| Language | Python |
| Category | Codex Skill |
| License | Apache-2.0 |
| Quality Score | 53.296/100 |
| Last Updated | 2026-04-30 |
| Created | 2025-03-02 |
| Platforms | python |
| Est. Tokens | ~199k |
Looking for a ClawProBench alternative? If you're comparing ClawProBench with other codex skill tools, these 6 projects are the closest alternatives on Agent Skills Hub — ranked by topic overlap, star count, and community traction.
Awesome-LLM-Eval: a curated list of tools, datasets/benchmarks, demos, leaderboards, papers, docs and models…
Claw-Eval is an evaluation harness for evaluating LLM as agents. All tasks verified by humans.
A personal AI assistant that remembers and grows; available on desktop, in the cloud, or via IM, and reachable remotely from a phone.
A persistent, unified memory layer for all your AI agents (e.g. Claude Code, Codex), backed by Markdown and Mi…
AI handles execution, humans own the direction, and every run becomes an inspectable research artifact on disk
OpenJudge: A Unified Framework for Holistic Evaluation and Quality Rewards
ClawProBench is a live-first benchmark harness for evaluating LLM agents in the OpenClaw runtime, with deterministic grading and repeated-trial reliability. It is categorized as a Codex Skill and has 576 GitHub stars.
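"Repeated-trial reliability" is not elaborated on this page; a minimal sketch of the usual idea, assuming a hypothetical aggregation step (none of these names come from ClawProBench), is to run each scenario several times and report a pass rate alongside a flakiness signal rather than a single pass/fail:

```python
import statistics


def trial_stats(results: list[bool]) -> dict[str, float]:
    # Aggregate repeated trials of one scenario: the pass rate is the
    # headline number, and the population std of the 0/1 outcomes is a
    # simple flakiness signal (0.0 means perfectly consistent trials).
    xs = [1.0 if r else 0.0 for r in results]
    return {
        "pass_rate": sum(xs) / len(xs),
        "flakiness": statistics.pstdev(xs),
    }


stats = trial_stats([True, True, False, True])
print(stats["pass_rate"])  # 0.75
```

Reporting both numbers lets a leaderboard distinguish a model that reliably fails a scenario from one that passes it intermittently, which single-trial benchmarks cannot do.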
ClawProBench is primarily written in Python and is tagged with topics such as agent, benchmark, and evaluation.
You can find installation instructions and usage details in the ClawProBench GitHub repository at github.com/suyoumo/ClawProBench. The project has 576 stars and 49 forks, indicating an active community.
ClawProBench is released under the Apache-2.0 license, making it free to use and modify according to the license terms.
The top alternatives to ClawProBench on Agent Skills Hub include Awesome-LLM-Eval, claw-eval, hope-agent. Each offers a different approach to the same problem space — compare them side-by-side by stars, quality score, and community activity.