by onejune2018 · Agent Tool · ★ 615
Indexed by AgentSkillsHub · Auto-synced every 8h
Awesome-LLM-Eval: a curated list of tools, datasets/benchmarks, demos, leaderboards, papers, docs, and models, mainly for the evaluation of foundation LLMs, aiming to explore the technical boundaries of generative AI.
| Stars | 615 |
| Forks | 51 |
| Category | Agent Tool |
| License | MIT |
| Quality Score | 39.5/100 |
| Open Issues | 9 |
| Last Updated | 2025-11-24 |
| Created | 2023-04-26 |
| Platforms | aws |
| Est. Tokens | ~1132k |
Looking for an Awesome-LLM-Eval alternative? If you're comparing Awesome-LLM-Eval with other agent tools, these 6 projects are the closest alternatives on Agent Skills Hub, ranked by topic overlap, star count, and community traction.
🦙 Integrating LLMs into structured NLP pipelines
A curated list of resources dedicated to open source GitHub repositories related to ChatGPT, OpenAI API, and C
ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and po
A set of tools that gives agents powerful capabilities.
🤖 Automatically collected AI repos, tools, websites, papers & tutorials. A practical AI toolbox 💎
♾️ Private Agent Fleet with Spec Coding. Each agent gets their own GPU-accelerated desktop. Run Claude, Codex,
Awesome-LLM-Eval is a curated list of tools, datasets/benchmarks, demos, leaderboards, papers, docs, and models, mainly for the evaluation of foundation LLMs, aiming to explore the technical boundaries of generative AI. It is categorized as an Agent Tool with 615 GitHub stars.
You can find installation instructions and usage details in the Awesome-LLM-Eval GitHub repository at github.com/onejune2018/Awesome-LLM-Eval. The project has 615 stars and 51 forks, indicating an active community.
Awesome-LLM-Eval is released under the MIT license, making it free to use and modify according to the license terms.
The top alternatives to Awesome-LLM-Eval on Agent Skills Hub include spacy-llm, awesome-ChatGPT-repositories, and ExtractThinker. Each offers a different approach to the same problem space; compare them side-by-side by stars, quality score, and community activity.