ClawProBench — Codex Skill by suyoumo

by suyoumo · Codex Skill · ★ 809

Indexed by AgentSkillsHub · Auto-synced every 8h

About ClawProBench

ClawProBench is a transparent, live-first benchmark harness for evaluating model capability inside the OpenClaw runtime. It provides 102 active scenarios, 162 catalog scenarios, deterministic grading, and OpenClaw-native coverage. It focuses on real OpenClaw execution with structured reports and benchmark-profile selection. The default ranking path is a single benchmark profile; broader active coverage remains available through four alternatives. The current worktree inventory reports 102 active scenarios and 162 total catalog scenarios, including those still incubating.

Leaderboard: Browse the public leaderboard and benchmark cases at suyoumo.github.io/bench.

agent · benchmark · evaluation · harness · leaderboard · llm · openclaw

Quick Facts

Stars: 809
Forks: 52
Language: Python
Category: Codex Skill
License: Apache-2.0
Quality Score: 69.238284222667/100
Last Updated: 2026-06-28
Created: 2025-03-02
Platforms: python
Est. Tokens: ~15k

ClawProBench alternative? Top 6 similar tools

Comparing ClawProBench with other Codex Skill tools? These 6 projects are the closest alternatives on Agent Skills Hub, ranked by topic overlap, star count, and community traction.

  • AI-Infra-Guard by Tencent · ⭐ 3.4k

    A full-stack AI Red Teaming platform securing AI ecosystems via OpenClaw Security Scan, Agent Scan, Skills Sca…

  • Awesome-LLM-Eval by onejune2018 · ⭐ 615

    Awesome-LLM-Eval: a curated list of tools, datasets/benchmark, demos, leaderboard, papers, docs and models, ma…

  • claw-eval by claw-eval · ⭐ 568

    Claw-Eval is a harness for evaluating LLMs as agents. All tasks are verified by humans.

  • memsearch by zilliztech · ⭐ 2.2k

    A persistent, unified memory layer for all your AI agents (e.g. Claude Code, Codex), backed by Markdown and Mi…

  • trpc-agent-go by trpc-group · ⭐ 1.5k

    A Go framework for building production agent systems with graph workflows, tools, memory, A2A, AG-UI, MCP, eva…

  • OpenJudge by agentscope-ai · ⭐ 673

    OpenJudge: A Unified Framework for Holistic Evaluation and Quality Rewards

Frequently Asked Questions

What is ClawProBench?

ClawProBench is a live-first benchmark harness for evaluating LLM agents in the OpenClaw runtime with deterministic grading and repeated-trial reliability. It is categorized as a Codex Skill and has 809 GitHub stars.
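
To make "deterministic grading" and "repeated-trial reliability" concrete, here is a minimal Python sketch of the general idea. It is not ClawProBench's actual code or API: the function names `grade` and `reliability`, the exact-match rule, and the sample outputs are all assumptions chosen for illustration.

```python
def grade(output: str, expected: str) -> bool:
    """Hypothetical deterministic grader: the verdict depends only on
    the two strings, so the same inputs always yield the same result."""
    return output.strip() == expected.strip()


def reliability(trial_outputs: list[str], expected: str) -> tuple[float, bool]:
    """Summarize repeated trials of one scenario (illustrative only).

    Returns (pass_rate, all_passed): the fraction of trials that passed,
    and whether every single trial passed.
    """
    if not trial_outputs:
        raise ValueError("at least one trial is required")
    verdicts = [grade(out, expected) for out in trial_outputs]
    pass_rate = sum(verdicts) / len(verdicts)
    return pass_rate, all(verdicts)


# Made-up example: three trials of one scenario whose expected answer is "42".
pass_rate, all_passed = reliability(["42\n", "41", " 42"], "42")
print(f"pass rate: {pass_rate:.2f}, all trials passed: {all_passed}")
```

Reporting both numbers separates an agent that usually succeeds from one that succeeds on every run, which is the distinction repeated trials are meant to surface.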

What programming language is ClawProBench written in?

ClawProBench is primarily written in Python. It covers topics such as agent, benchmark, and evaluation.

How do I install or use ClawProBench?

You can find installation instructions and usage details in the ClawProBench GitHub repository at github.com/suyoumo/ClawProBench. The project has 809 stars and 52 forks, indicating an active community.

What license does ClawProBench use?

ClawProBench is released under the Apache-2.0 license, making it free to use and modify according to the license terms.

What are the best alternatives to ClawProBench?

The top alternatives to ClawProBench on Agent Skills Hub include AI-Infra-Guard, Awesome-LLM-Eval, and claw-eval. Each takes a different approach to the same problem space; compare them side by side by stars, quality score, and community activity.
