by zhangxjohn · Agent Tool · ★ 164
A benchmark list for the evaluation of large language models.
| Field | Value |
| --- | --- |
| Stars | 164 |
| Forks | 11 |
| Category | Agent Tool |
| License | Apache-2.0 |
| Quality Score | 42.7/100 |
| Open Issues | 3 |
| Last Updated | 2026-04-16 |
| Created | 2024-01-29 |
| Est. Tokens | ~12k |
Looking for an LLM-Agent-Benchmark-List alternative? If you're comparing LLM-Agent-Benchmark-List with other agent tools, these 6 projects are the closest alternatives on Agent Skills Hub, ranked by topic overlap, star count, and community traction.
[ICLR'25] BigCodeBench: Benchmarking Code Generation Towards AGI
ClawProBench is a live-first benchmark harness for evaluating LLM agents in the OpenClaw runtime with determ…
This is the repository for the Tool Learning survey.
OpenClawProBench is a live-first benchmark harness for evaluating LLM agents in the OpenClaw runtime with de…
[ICLR'25] OpenRCA: Can Large Language Models Locate the Root Cause of Software Failures?
TPAMI 2026 | This repository collects surveys, resources, and papers on lifelong learning for LLM agents.
Explore other popular agent tools:
LLM-Agent-Benchmark-List is a benchmark list for the evaluation of large language models. It is categorized as an Agent Tool with 164 GitHub stars.
You can find installation instructions and usage details in the LLM-Agent-Benchmark-List GitHub repository at github.com/zhangxjohn/LLM-Agent-Benchmark-List. The project has 164 stars and 11 forks, indicating an active community.
LLM-Agent-Benchmark-List is released under the Apache-2.0 license, making it free to use and modify according to the license terms.
The top alternatives to LLM-Agent-Benchmark-List on Agent Skills Hub include bigcodebench, ClawProBench, and LLM-Tool-Survey. Each offers a different approach to the same problem space; compare them side-by-side by stars, quality score, and community activity.