by zhangxjohn · Agent Tool · ★ 167
Indexed by AgentSkillsHub · Auto-synced every 8h
LLM-Agent-Benchmark-List

🤗 We greatly appreciate any contributions via PRs, issues, emails, or other methods.

⏳ Continuous update...

:book: Introduction

In the swiftly evolving landscape of artificial intelligence, Large Language Models (LLMs) have emerged as a pivotal cornerstone, revolutionizing how we interact with and harness the power of natural language processing. However, as LLMs gain widespread application in both research and industry, the priority shifts towards evaluating their efficacy rather than perpetuating a cycle of unbridled performance iterations. This shift raises three critical questions: i) what to evaluate? ii) where to evaluate? iii) how to evaluate? Diverse research efforts have proposed varying interpretations and methodologies in response to these questions. The aim of this work is to methodically review and organize benchmarks for both LLMs and LLM-powered agents, thereby providing a streamlined resource for those journeying towards Artificial General Intelligence (AGI).

:dizzy: List

Survey

[2023/07] A Survey on Evaluation of Large Language Models. Yupeng Chang (Jilin University) et al. arXiv.
| Field | Value |
| --- | --- |
| Stars | 167 |
| Forks | 11 |
| Category | Agent Tool |
| License | Apache-2.0 |
| Quality Score | 42.7/100 |
| Open Issues | 3 |
| Last Updated | 2026-05-12 |
| Created | 2024-01-29 |
| Est. Tokens | ~12k |
Looking for an LLM-Agent-Benchmark-List alternative? If you're comparing LLM-Agent-Benchmark-List with other agent tools, these 6 projects are the closest alternatives on Agent Skills Hub, ranked by topic overlap, star count, and community traction.
[ICLR'25] BigCodeBench: Benchmarking Code Generation Towards AGI
This is the repository for the Tool Learning survey.
[ICLR'25] OpenRCA: Can Large Language Models Locate the Root Cause of Software Failures?
TPAMI 2026 | This repository collects awesome surveys, resources, and papers for lifelong learning LLM agents
Awesome LLM Papers and repos on very comprehensive topics.
GitHub page for "Large Language Model-Brained GUI Agents: A Survey"
LLM-Agent-Benchmark-List is a benchmark list for the evaluation of large language models. It is categorized as an Agent Tool with 167 GitHub stars.
You can find installation instructions and usage details in the LLM-Agent-Benchmark-List GitHub repository at github.com/zhangxjohn/LLM-Agent-Benchmark-List. The project has 167 stars, 11 forks, and 3 open issues.
LLM-Agent-Benchmark-List is released under the Apache-2.0 license, making it free to use and modify according to the license terms.
The top alternatives to LLM-Agent-Benchmark-List on Agent Skills Hub include bigcodebench, LLM-Tool-Survey, and OpenRCA. Each offers a different approach to the same problem space; compare them side by side by stars, quality score, and community activity.