Eval — LLM Plugin by ai-twinkle

by ai-twinkle · LLM Plugin · ★ 94

Indexed by AgentSkillsHub · Auto-synced every 8h

About Eval

High-performance LLM evaluation framework with parallel API calls — up to 17× faster than sequential tools. Supports box, math, and logit-based evaluation.
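The "parallel API calls" claim refers to issuing many evaluation requests concurrently instead of one at a time. The project's actual API is not shown here, but the idea can be sketched with Python's `asyncio` using a bounded-concurrency pattern; `evaluate_sample` and the 50 ms latency are hypothetical stand-ins for a real model API call:

```python
import asyncio
import time

async def evaluate_sample(sample: str) -> dict:
    # Hypothetical stand-in for a real model API call (~50 ms latency).
    await asyncio.sleep(0.05)
    return {"sample": sample, "score": 1.0}

async def evaluate_parallel(samples, max_concurrency: int = 8):
    # A semaphore bounds in-flight requests so provider rate limits are respected.
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(s):
        async with sem:
            return await evaluate_sample(s)

    # gather() runs all bounded calls concurrently and preserves input order.
    return await asyncio.gather(*(bounded(s) for s in samples))

samples = [f"question-{i}" for i in range(16)]
start = time.perf_counter()
results = asyncio.run(evaluate_parallel(samples))
elapsed = time.perf_counter() - start
# 16 samples at 50 ms each take ~0.8 s sequentially; with 8 concurrent
# slots they finish in roughly two batches (~0.1 s).
print(len(results), elapsed)
```

The speedup scales with the concurrency limit up to the provider's rate cap, which is where multipliers like the advertised 17× come from when the workload is I/O-bound.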

Tags: eval · evaluation · llm

Quick Facts

Stars: 94
Forks: 16
Language: Python
Category: LLM Plugin
License: MIT
Quality Score: 41.75/100
Open Issues: 18
Last Updated: 2026-04-11
Created: 2025-03-31
Platforms: python
Est. Tokens: ~202k

Compatible Skills

These tools pair well with Eval in combined workflows:

  • mcp-interviewer — semantic (0.17) + complementary + rare_topics + same_lang + similar_pop + shared_platform (56%)
  • eval-view — semantic (0.17) + complementary + rare_topics + same_lang + similar_pop + shared_platform (56%)
  • just-eval — semantic (0.58) + rare_topics + same_lang + similar_pop + shared_platform (55%)
  • toolbench — semantic (0.22) + complementary + same_lang + similar_pop + shared_platform (53%)

Eval alternative? Top 6 similar tools

Looking for an Eval alternative? If you're comparing Eval with other LLM plugin tools, these six projects are the closest alternatives on Agent Skills Hub, ranked by topic overlap, star count, and community traction.

  • skill-optimizer by fastxyz · ⭐ 50

    Benchmark, evaluate, and optimize skills to ensure reliable performance across all LLMs

  • OpenClawProBench by suyoumo · ⭐ 340

    OpenClawProBench is a live-first benchmark harness for evaluating LLM agents in the OpenClaw runtime with de

  • bocoel by rentruewang · ⭐ 289

    Bayesian Optimization as a Coverage Tool for Evaluating LLMs. Accurate evaluation (benchmarking) that's 10 tim

  • arag by Ayanami0730 · ⭐ 187

    A-RAG: Agentic Retrieval-Augmented Generation via Hierarchical Retrieval Interfaces. State-of-the-art RAG fram

  • mcp-interviewer by microsoft · ⭐ 147

    Catch MCP server issues before your agents do.

  • Hegelion by Hmbown · ⭐ 137

    Dialectical reasoning architecture for LLMs (Thesis → Antithesis → Synthesis)


Frequently Asked Questions

What is Eval?

Eval is a high-performance LLM evaluation framework with parallel API calls, up to 17× faster than sequential tools, supporting box, math, and logit-based evaluation. It is categorized as an LLM Plugin with 94 GitHub stars.

What programming language is Eval written in?

Eval is primarily written in Python. It covers topics such as eval, evaluation, and llm.

How do I install or use Eval?

You can find installation instructions and usage details in the Eval GitHub repository at github.com/ai-twinkle/Eval. The project has 94 stars and 16 forks, indicating an active community.

What license does Eval use?

Eval is released under the MIT license, making it free to use and modify according to the license terms.

What are the best alternatives to Eval?

The top alternatives to Eval on Agent Skills Hub include skill-optimizer, OpenClawProBench, and bocoel. Each offers a different approach to the same problem space; compare them side by side by stars, quality score, and community activity.

View on GitHub → Browse LLM Plugin tools