by ai-twinkle · LLM Plugin · ★ 94
High-performance LLM evaluation framework with parallel API calls — up to 17× faster than sequential tools. Supports box, math, and logit-based evaluation.
| Field | Value |
| --- | --- |
| Stars | 94 |
| Forks | 16 |
| Language | Python |
| Category | LLM Plugin |
| License | MIT |
| Quality Score | 41.75/100 |
| Open Issues | 18 |
| Last Updated | 2026-04-11 |
| Created | 2025-03-31 |
| Platforms | python |
| Est. Tokens | ~202k |
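The headline claim is that parallelizing API calls makes evaluation much faster than issuing them one at a time. Eval's actual interface lives in its repository; the snippet below is only a generic sketch of that pattern, with a hypothetical `query_model` coroutine standing in for a real API request.

```python
import asyncio
import time

async def query_model(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM API call;
    # the sleep simulates ~0.1 s of network latency.
    await asyncio.sleep(0.1)
    return f"answer:{prompt}"

async def evaluate_parallel(prompts: list[str]) -> list[str]:
    # Fire all requests concurrently instead of awaiting each in turn.
    return await asyncio.gather(*(query_model(p) for p in prompts))

prompts = [f"q{i}" for i in range(10)]
start = time.perf_counter()
results = asyncio.run(evaluate_parallel(prompts))
elapsed = time.perf_counter() - start
# Ten 0.1 s calls overlap, so the batch finishes in roughly 0.1 s
# rather than the ~1 s a sequential loop would take.
print(len(results), elapsed)
```

In a real evaluation run the speedup depends on API rate limits and per-request latency, which is why the project advertises "up to" 17× rather than a fixed factor.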
The following tools pair well with Eval in combined workflows:
Looking for an Eval alternative? If you're comparing Eval with other LLM plugin tools, these 6 projects are the closest alternatives on Agent Skills Hub, ranked by topic overlap, star count, and community traction.
- Benchmark, evaluate, and optimize skills to ensure reliable performance across all LLMs
- OpenClawProBench is a live-first benchmark harness for evaluating LLM agents in the OpenClaw runtime with de…
- Bayesian Optimization as a Coverage Tool for Evaluating LLMs. Accurate evaluation (benchmarking) that's 10 tim…
- A-RAG: Agentic Retrieval-Augmented Generation via Hierarchical Retrieval Interfaces. State-of-the-art RAG fram…
- Catch MCP server issues before your agents do.
- Dialectical reasoning architecture for LLMs (Thesis → Antithesis → Synthesis)
Explore other popular LLM plugin tools:
Eval is a high-performance LLM evaluation framework that parallelizes API calls, running up to 17× faster than sequential tools, and supports box, math, and logit-based evaluation. It is categorized as an LLM Plugin with 94 GitHub stars.
Eval is primarily written in Python. It covers topics such as eval, evaluation, llm.
You can find installation instructions and usage details in the Eval GitHub repository at github.com/ai-twinkle/Eval. The project has 94 stars and 16 forks, indicating an active community.
Eval is released under the MIT license, making it free to use and modify according to the license terms.
The top alternatives to Eval on Agent Skills Hub include skill-optimizer, OpenClawProBench, and bocoel. Each offers a different approach to the same problem space; compare them side-by-side by stars, quality score, and community activity.