Eval — LLM Plugin by ai-twinkle

Last updated: 2026-04-11 · Indexed by AgentSkillsHub · Auto-synced every 8h

About Eval

High-performance LLM evaluation framework with parallel API calls — up to 17× faster than sequential tools. Supports box, math, and logit-based evaluation.
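The speedup claimed above comes from issuing API requests concurrently rather than one at a time. The following is a minimal sketch of that general idea, not Eval's actual API: `fake_model_call`, `evaluate_sequential`, and `evaluate_parallel` are hypothetical names, and the model request is simulated with a short sleep.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_model_call(prompt: str) -> str:
    """Hypothetical stand-in for a network-bound LLM API request."""
    time.sleep(0.05)  # simulate request latency
    return prompt.upper()


def evaluate_sequential(prompts: list[str]) -> list[str]:
    # One request at a time: total time grows linearly with the prompt count.
    return [fake_model_call(p) for p in prompts]


def evaluate_parallel(prompts: list[str], max_workers: int = 8) -> list[str]:
    # Requests overlap while waiting on I/O; pool.map preserves input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_model_call, prompts))


if __name__ == "__main__":
    prompts = [f"question {i}" for i in range(16)]

    start = time.perf_counter()
    sequential = evaluate_sequential(prompts)
    sequential_s = time.perf_counter() - start

    start = time.perf_counter()
    parallel = evaluate_parallel(prompts)
    parallel_s = time.perf_counter() - start

    assert sequential == parallel  # same answers, same order
    print(f"sequential: {sequential_s:.2f}s, parallel: {parallel_s:.2f}s")
```

The actual speedup depends on request latency, worker count, and provider rate limits, so the 17× figure should be read as the project's own upper bound rather than something this sketch reproduces.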

Topics: eval, evaluation, llm

Quick Facts

Stars	94
Forks	16
Language	Python
Category	LLM Plugin
License	MIT
Quality Score	41.75/100
Open Issues	18
Last Updated	2026-04-11
Created	2025-03-31
Platforms	python
Est. Tokens	~202k

Compatible Skills

These tools work well together with Eval for enhanced workflows:

mcp-interviewer: 56% match (semantic similarity 0.17; complementary, rare topics, same language, similar popularity, shared platform)
eval-view: 56% match (semantic similarity 0.17; complementary, rare topics, same language, similar popularity, shared platform)
just-eval: 55% match (semantic similarity 0.58; rare topics, same language, similar popularity, shared platform)
toolbench: 53% match (semantic similarity 0.22; complementary, same language, similar popularity, shared platform)

Eval alternative? Top 6 similar tools

Looking for an Eval alternative? If you're comparing Eval with other LLM Plugin tools, these 6 projects are the closest alternatives on Agent Skills Hub, ranked by topic overlap, star count, and community traction.

skill-optimizer by fastxyz · ⭐ 50
Benchmark, evaluate, and optimize skills to ensure reliable performance across all LLMs
OpenClawProBench by suyoumo · ⭐ 340
OpenClawProBench is a live-first benchmark harness for evaluating LLM agents in the OpenClaw runtime with de
bocoel by rentruewang · ⭐ 289
Bayesian Optimization as a Coverage Tool for Evaluating LLMs. Accurate evaluation (benchmarking) that's 10 tim
arag by Ayanami0730 · ⭐ 187
A-RAG: Agentic Retrieval-Augmented Generation via Hierarchical Retrieval Interfaces. State-of-the-art RAG fram
mcp-interviewer by microsoft · ⭐ 147
Catch MCP server issues before your agents do.
Hegelion by Hmbown · ⭐ 137
Dialectical reasoning architecture for LLMs (Thesis → Antithesis → Synthesis)

More LLM Plugin Tools

Explore other popular LLM Plugin tools:

stagehand ⭐ 22.6k
promptfoo ⭐ 21.2k
gorilla ⭐ 12.8k
llm ⭐ 11.8k
llm-engineer-toolkit ⭐ 10.1k
phoenix ⭐ 9.6k
code2prompt ⭐ 7.3k
superagent ⭐ 6.5k
ai-cookbook ⭐ 4.0k
DecryptPrompt ⭐ 3.4k

Popular Python Agent Tools

AutoGPT ⭐ 184.2k · Agent Tool
langflow ⭐ 148.0k · Agent Tool
hermes-agent ⭐ 146.9k · Codex Skill
open-webui ⭐ 136.8k · MCP Server
langchain ⭐ 136.6k · Agent Tool

Frequently Asked Questions

What is Eval?

Eval is a high-performance LLM evaluation framework that makes API calls in parallel, which its description says is up to 17× faster than sequential tools. It supports box, math, and logit-based evaluation. It is categorized as an LLM Plugin and has 94 GitHub stars.

What programming language is Eval written in?

Eval is primarily written in Python. It covers the topics eval, evaluation, and llm.

How do I install or use Eval?

Installation instructions and usage details are in the Eval GitHub repository at github.com/ai-twinkle/Eval. The project has 94 stars and 16 forks.

What license does Eval use?

Eval is released under the MIT license, making it free to use and modify according to the license terms.

What are the best alternatives to Eval?

The top alternatives to Eval on Agent Skills Hub include skill-optimizer, OpenClawProBench, and bocoel. Each takes a different approach to the same problem space; compare them side by side by stars, quality score, and community activity.
