WildClawBench — Codex Skill by InternLM

by InternLM · Codex Skill · ★ 452

Indexed by AgentSkillsHub · Auto-synced every 8h

About WildClawBench

Hard, practical, end-to-end evaluation for AI agents, in the wild.

WildClawBench is an agent benchmark that tests what actually matters: can an AI agent do real work, end-to-end, without hand-holding? We drop agents into a live OpenClaw environment (the same open-source personal AI assistant that real users rely on daily) and throw 60 original tasks at them: clipping goal highlights from a football match, negotiating meeting times over multi-round emails, hunting down contradictions in search results, writing inference scripts for undocumented codebases, catching privacy leaks before they happen.

Useful things. Hard things. Hard enough that every frontier model we tested scores below 0.55 (top overall 0.52). That makes scores mean something.

Why WildClawBench?

Most agent benchmarks test isolated capabilities: calling a function, parsing JSON, following a sing

Topics: agentic-ai · agentic-evaluation · agents · benchmarks · openclaw

Quick Facts

Stars: 452
Forks: 45
Language: Python
Category: Codex Skill
License: MIT
Quality Score: 71.24/100
Open Issues: 5
Last Updated: 2026-06-25
Created: 2026-03-23
Platforms: python
Est. Tokens: ~18k

Compatible Skills

These tools work well together with WildClawBench for enhanced workflows:

  • team-tasks (56% match): semantic similarity 0.16, complementary, same language, similar popularity, shared platform
  • AEnvironment (56% match): semantic similarity 0.30, complementary, same language, similar popularity, shared platform
  • get-physics-done (55% match): semantic similarity 0.16, complementary, same language, similar popularity, shared platform
  • agent-builder (55% match): semantic similarity 0.15, complementary, same language, similar popularity, shared platform

WildClawBench alternative? Top 6 similar tools

Comparing WildClawBench with other Codex Skill tools? These 6 projects are the closest alternatives on Agent Skills Hub, ranked by topic overlap, star count, and community traction.

  • NagaAgent by RTGS2017 · ⭐ 1.5k

    A simple yet powerful agent framework for personal assistants, designed to enable intelligent interaction, mul

  • tools by strands-agents · ⭐ 1.1k

    A set of tools that gives agents powerful capabilities.

  • awesome-openclaw by SamurAIGPT · ⭐ 957

    A curated list of OpenClaw resources, tools, skills, tutorials & articles. OpenClaw (formerly Moltbot / Clawdb

  • DeepMCPAgent by cryxnet · ⭐ 806

    Model-agnostic plug-n-play LangChain/LangGraph agents powered entirely by MCP tools over HTTP/SSE.

  • mcp-gateway-registry by agentic-community · ⭐ 762

    Enterprise-ready MCP Gateway & Registry that centralizes AI development tools with secure OAuth authentication

  • ai-maestro by 23blocks-OS · ⭐ 709

    AI Agent Orchestrator with Skills System - Give AI Agents superpowers: memory search, code graph queries, agen

Frequently Asked Questions

What is WildClawBench?

WildClawBench is an in-the-wild benchmark for AI agents in the OpenClaw environment. It is categorized as a Codex Skill and has 452 GitHub stars.

What programming language is WildClawBench written in?

WildClawBench is primarily written in Python. It covers topics such as agentic-ai, agentic-evaluation, agents.

How do I install or use WildClawBench?

You can find installation instructions and usage details in the WildClawBench GitHub repository at github.com/InternLM/WildClawBench. The project has 452 stars and 45 forks, indicating an active community.

What license does WildClawBench use?

WildClawBench is released under the MIT license, making it free to use and modify according to the license terms.

What are the best alternatives to WildClawBench?

The top alternatives to WildClawBench on Agent Skills Hub include NagaAgent, tools, and awesome-openclaw. Each takes a different approach to the same problem space; compare them side by side by stars, quality score, and community activity.
