Eval — LLM Plugin by ai-twinkle

by ai-twinkle · LLM Plugin · ★ 94

Indexed by AgentSkillsHub · Auto-synced every 8h

About Eval

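Twinkle Eval: High-Performance LLM Evaluation Framework

Twinkle Eval is an LLM evaluation framework built around parallel API requests. It supports many evaluation types, including multiple choice, mathematical reasoning, instruction following, function calling, long-context understanding, RAG, and Text-to-SQL. It calls deployed model endpoints through an OpenAI-compatible API, so a complete evaluation run can be carried out on a single machine.

Why Twinkle Eval

In 2025, reasoning models appeared in large numbers, and the response time of each API call grew substantially. Traditional evaluation frameworks call the API synchronously, one question at a time, so a single benchmark can easily take hours. Twinkle Eval sends requests in parallel and, in measured tests, runs 9 to 17 times faster than iKala/ievals.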
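The contrast drawn above between question-by-question synchronous calls and parallel requests can be illustrated with a minimal Python sketch. This is not Twinkle Eval's actual implementation: `ask_model` is a hypothetical stand-in that simulates a slow endpoint with `time.sleep`, and the thread pool is just one common way to overlap blocking I/O calls.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def ask_model(question: str, latency: float = 0.05) -> str:
    """Hypothetical stand-in for one blocking call to a model endpoint."""
    time.sleep(latency)  # simulate network and generation time
    return f"answer to {question}"


def run_sequential(questions):
    # One request at a time: total time grows with len(questions) * latency.
    return [ask_model(q) for q in questions]


def run_parallel(questions, max_workers=8):
    # Overlap the waiting time of many requests with a thread pool.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the same order as the inputs.
        return list(pool.map(ask_model, questions))


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    questions = [f"Q{i}" for i in range(16)]
    seq, t_seq = timed(run_sequential, questions)
    par, t_par = timed(run_parallel, questions)
    print(f"same answers: {seq == par}")
    print(f"sequential: {t_seq:.2f}s, parallel: {t_par:.2f}s")
```

With a real endpoint, the achievable speedup would depend on the server's rate limits and capacity rather than on the worker count alone, so the numbers from this simulation should not be read as a benchmark.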

Topics: eval · evaluation · llm

Quick Facts

Stars: 94
Forks: 16
Language: Python
Category: LLM Plugin
License: MIT
Quality Score: 64.5/100
Open Issues: 18
Last Updated: 2026-04-11
Created: 2025-03-31
Platforms: python
Est. Tokens: ~202k

Compatible Skills

These tools work well together with Eval for enhanced workflows:

  • mcp-interviewer — semantic(0.17)+complementary+rare_topics+same_lang+similar_pop+shared_platform (56%)
  • eval-view — semantic(0.17)+complementary+rare_topics+same_lang+similar_pop+shared_platform (56%)
  • just-eval — semantic(0.58)+rare_topics+same_lang+similar_pop+shared_platform (55%)
  • toolbench — semantic(0.22)+complementary+same_lang+similar_pop+shared_platform (53%)

Eval alternative? Top 6 similar tools

Looking for an Eval alternative? If you're comparing Eval with other LLM Plugin tools, these 6 projects are the closest alternatives on Agent Skills Hub, ranked by topic overlap, star count, and community traction.

  • skill-optimizer by fastxyz · ⭐ 58

    Benchmark, evaluate, and optimize skills to ensure reliable performance across all LLMs

  • bocoel by rentruewang · ⭐ 289

    Bayesian Optimization as a Coverage Tool for Evaluating LLMs. Accurate evaluation (benchmarking) that's 10 tim

  • arag by Ayanami0730 · ⭐ 187

    A-RAG: Agentic Retrieval-Augmented Generation via Hierarchical Retrieval Interfaces. State-of-the-art RAG fram

  • mcp-interviewer by microsoft · ⭐ 151

    Catch MCP server issues before your agents do.

  • Hegelion by Hmbown · ⭐ 137

    Dialectical reasoning architecture for LLMs (Thesis → Antithesis → Synthesis)

  • eval-view by hidai25 · ⭐ 114

    Regression testing for AI agents. Snapshot behavior, diff tool calls, catch regressions in CI. Works with LangGr

Frequently Asked Questions

What is Eval?

Eval is a high-performance LLM evaluation framework with parallel API calls, up to 17× faster than sequential tools. It supports box, math, and logit-based evaluation. It is categorized as an LLM Plugin and has 94 GitHub stars.

What programming language is Eval written in?

Eval is primarily written in Python. It covers topics such as eval, evaluation, llm.

How do I install or use Eval?

You can find installation instructions and usage details in the Eval GitHub repository at github.com/ai-twinkle/Eval. The project has 94 stars and 16 forks, indicating an active community.

What license does Eval use?

Eval is released under the MIT license, making it free to use and modify according to the license terms.

What are the best alternatives to Eval?

The top alternatives to Eval on Agent Skills Hub include skill-optimizer, bocoel, and arag. Each takes a different approach to the same problem space; compare them side by side by stars, quality score, and community activity.
