by agentscope-ai · Codex Skill · ★ 72
Indexed by AgentSkillsHub · Auto-synced every 8h
🐾 PawBench: a Model × Harness co-evaluation benchmark for agentic AI (150 agent tasks · 9 models · 3 harnesses · task slices · diagnostic traces). The same model can behave very differently once it is placed inside a real agent runtime. A failure may come from model reasoning, missing tools, weak skill discovery, poor workspace awareness, brittle web access, or a completion check that is too loose. A single final pass rate cannot separate these causes. PawBench is built around one c
| Field | Value |
|---|---|
| Stars | 72 |
| Forks | 5 |
| Language | Python |
| Category | Codex Skill |
| License | Apache-2.0 |
| Quality Score | 66.79/100 |
| Open Issues | 2 |
| Last Updated | 2026-06-25 |
| Created | 2026-05-15 |
| Platforms | python |
| Est. Tokens | ~16k |
Looking for a PawBench alternative? If you're comparing PawBench with other Codex Skill tools, these 6 projects are the closest alternatives on Agent Skills Hub, ranked by topic overlap, star count, and community traction.
- The Runtime Security Layer for OpenClaw/Hermes-agent, the essential safety harness for PII & sensitive credent
- Open Python agent harness for production AI apps: tools, MCP, memory, workspace, telemetry, subagents, backgro
- A benchmark list for evaluation of large language models.
- Traffic light for AI Agents and TypeScript/Node multi-agent orchestrator with shared state, guardrails, and ad
- LLM Benchmark for Throughput via Ollama (Local LLMs)
- Kindly Web Search MCP Server: Web search + robust content retrieval for AI coding tools (Claude Code, Codex, C
PawBench is a benchmark for evaluating LLM × harness performance. It is categorized as a Codex Skill with 72 GitHub stars.
PawBench is primarily written in Python. It covers topics such as agent, benchmark, and harness.
You can find installation instructions and usage details in the PawBench GitHub repository at github.com/agentscope-ai/PawBench. The project has 72 stars and 5 forks.
PawBench is released under the Apache-2.0 license, making it free to use and modify according to the license terms.
The top alternatives to PawBench on Agent Skills Hub include clawshell, omnicoreagent, and LLM-Agent-Benchmark-List. Each takes a different approach to the same problem space; you can compare them side by side by stars, quality score, and community activity.