OpenClawProBench — Codex Skill by suyoumo

by suyoumo · Codex Skill · ★ 340

Indexed by AgentSkillsHub · Auto-synced every 8h

About OpenClawProBench

OpenClawProBench is a transparent, live-first benchmark harness for evaluating model capability inside the OpenClaw runtime: 102 active scenarios, 162 catalog scenarios, deterministic grading, and OpenClaw-native coverage. It focuses on real OpenClaw execution with deterministic grading, structured reports, and benchmark-profile selection. The default ranking path is the benchmark profile; broader active coverage remains available through additional profiles. The current worktree inventory reports 102 active scenarios and 162 total catalog scenarios (some incubating).

Leaderboard: Browse the public leaderboard and benchmark cases at suyoumo.github.io/bench.
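The "deterministic grading" idea can be sketched in a few lines. This is an illustrative toy, not code from the OpenClawProBench repo; every name in it (`Scenario`, `grade`) is hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """Hypothetical scenario record: an id, a prompt, and an exact expected output."""
    scenario_id: str
    prompt: str
    expected: str


def grade(scenario: Scenario, transcript: str) -> bool:
    """Deterministic grader: the verdict is a pure function of its inputs.

    No LLM-as-judge and no randomness, so grading the same transcript
    twice can never disagree.
    """
    return scenario.expected in transcript


s = Scenario("fs-001", "List the files in /tmp/demo", "demo.txt")
# Re-grading an identical transcript must agree -- that's the determinism property.
assert grade(s, "I found demo.txt in the directory.") == grade(s, "I found demo.txt in the directory.")
assert grade(s, "The directory is empty.") is False
```

A substring check is the simplest deterministic grader; a real harness would likely use structured comparisons, but the property being claimed is the same: identical inputs, identical verdict.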

Tags: agent · benchmark · evaluation · harness · leaderboard · llm · openclaw

Quick Facts

Stars: 340
Forks: 26
Language: Python
Category: Codex Skill
License: Apache-2.0
Quality Score: 53.296/100
Last Updated: 2026-04-11
Created: 2025-03-02
Platforms: python
Est. Tokens: ~104k

Compatible Skills

These tools work well together with OpenClawProBench for enhanced workflows:

  • claw-eval — 67% match (semantic 0.49; complementary, rare topics, same language, similar popularity, shared platform)
  • tau2-bench — 63% match (semantic 0.24; complementary, rare topics, same language, similar popularity, shared platform)
  • MCPBench — 60% match (semantic 0.31; complementary, rare topics, same language, similar popularity, shared platform)
  • ollama-benchmark — 59% match (semantic 0.28; complementary, rare topics, same language, similar popularity, shared platform)
  • WildClawBench — 58% match (semantic 0.38; complementary, same language, similar popularity, shared platform)
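Each pairing above blends a semantic-similarity score with several boolean signals into a match percentage. A toy sketch of such a blend — the weights and the formula are my assumptions for illustration, not Agent Skills Hub's actual ranking:

```python
def match_score(semantic: float, flags: set[str]) -> int:
    """Blend a semantic-similarity score (0..1) with boolean pairing signals.

    Hypothetical weights; the hub's real formula is not published.
    """
    weights = {
        "complementary": 0.10,
        "rare_topics": 0.08,
        "same_lang": 0.05,
        "similar_pop": 0.05,
        "shared_platform": 0.05,
    }
    score = 0.5 * semantic + sum(weights[f] for f in flags if f in weights)
    return round(min(score, 1.0) * 100)


# claw-eval-style input: semantic 0.49 plus all five boolean signals.
print(match_score(0.49, {"complementary", "rare_topics", "same_lang",
                         "similar_pop", "shared_platform"}))
```

The point of the sketch is only the shape of the computation: a continuous similarity term plus small fixed bonuses for each shared attribute, capped at 100.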

OpenClawProBench alternative? Top 6 similar tools

Looking for an OpenClawProBench alternative? If you're comparing OpenClawProBench with other Codex Skill tools, these 6 projects are the closest alternatives on Agent Skills Hub — ranked by topic overlap, star count, and community traction.

  • ResearchClawBench by InternScience · ⭐ 107

    🦞 ResearchClawBench: Evaluating AI Agents for Automated Research from Re-Discovery to New-Discovery

  • Awesome-LLM-Eval by onejune2018 · ⭐ 615

    Awesome-LLM-Eval: a curated list of tools, datasets/benchmarks, demos, leaderboards, papers, docs, and models.

  • claw-eval by claw-eval · ⭐ 514

    Claw-Eval is an evaluation harness for evaluating LLM as agents. All tasks verified by humans.

  • hope-agent by shiwenwen · ⭐ 318

    A personal AI assistant that remembers and grows · on call across desktop, cloud, and IM, and reachable remotely from your phone.

  • memsearch by zilliztech · ⭐ 1.6k

    A persistent, unified memory layer for all your AI agents (e.g. Claude Code, Codex), backed by Markdown and Milvus.

  • AutoR by AutoX-AI-Labs · ⭐ 633

    AI handles execution, humans own the direction, and every run becomes an inspectable research artifact on disk

More Codex Skill Tools

Explore other popular Codex Skill tools:

View all Codex Skill tools →

Frequently Asked Questions

What is OpenClawProBench?

OpenClawProBench is a live-first benchmark harness for evaluating LLM agents in the OpenClaw runtime with deterministic grading and repeated-trial reliability. It is categorized as a Codex Skill with 340 GitHub stars.
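"Repeated-trial reliability" generally means running each scenario several times and reporting an aggregate pass rate rather than trusting a single run. A minimal sketch, with the trial-runner interface and trial count purely hypothetical:

```python
from typing import Callable


def reliability(run_trial: Callable[[], bool], trials: int = 5) -> float:
    """Run a scenario `trials` times and return the fraction of passing runs."""
    passes = sum(1 for _ in range(trials) if run_trial())
    return passes / trials


# Deterministic stand-in for a real agent run: a fixed sequence of outcomes.
outcomes = iter([True, True, False, True, True])
rate = reliability(lambda: next(outcomes), trials=5)
print(f"pass rate: {rate:.0%}")  # 4 of 5 trials pass -> "pass rate: 80%"
```

Reporting a rate over N trials surfaces flaky scenarios that a single pass/fail run would hide.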

What programming language is OpenClawProBench written in?

OpenClawProBench is primarily written in Python. It covers topics such as agent, benchmark, evaluation.

How do I install or use OpenClawProBench?

You can find installation instructions and usage details in the OpenClawProBench GitHub repository at github.com/suyoumo/OpenClawProBench. The project has 340 stars and 26 forks, indicating an active community.

What license does OpenClawProBench use?

OpenClawProBench is released under the Apache-2.0 license, making it free to use and modify according to the license terms.

What are the best alternatives to OpenClawProBench?

The top alternatives to OpenClawProBench on Agent Skills Hub include ResearchClawBench, Awesome-LLM-Eval, and claw-eval. Each offers a different approach to the same problem space — compare them side-by-side by stars, quality score, and community activity.

View on GitHub → Browse Codex Skill tools