by ai-twinkle · LLM Plugin · ★ 94
Indexed by AgentSkillsHub · Auto-synced every 8h
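Twinkle Eval -- High-Performance LLM Evaluation Framework
Twinkle Eval is an LLM evaluation framework built around parallel API requests. It supports many evaluation types, including multiple-choice questions, mathematical reasoning, instruction following, function calling, long-context understanding, RAG, and Text-to-SQL. It calls already-deployed model endpoints through an OpenAI-compatible API, so a complete evaluation run can be done on a single machine.
Contents: Why Twinkle Eval · Supported Evaluation Datasets · Evaluation Methods at a Glance · Installation · Quick Start · CLI Reference · Configuration File · Output Format · Leaderboard · Using a Coding Agent · Contributing · Contributors · License · Citation · Acknowledgments
Why Twinkle Eval
Reasoning models proliferated in 2025, and the response time of each API call rose sharply. Traditional evaluation frameworks call the model synchronously, one question at a time, so a single benchmark can take hours. Twinkle Eval sends requests in parallel; in measured runs it is 9 to 17 times faster than iKala/ievals.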
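To illustrate why parallel requests help when each response is slow, here is a minimal sketch using Python's `asyncio`. It is not Twinkle Eval's code: `fake_chat_completion` is a hypothetical stand-in that sleeps instead of calling a real OpenAI-compatible endpoint, and the concurrency limit of 8 is an arbitrary assumption. With a fixed per-request latency, the sequential loop takes roughly `n * latency`, while the bounded parallel version takes roughly `ceil(n / max_concurrency) * latency`.

```python
import asyncio
import time


# Hypothetical stand-in for one request to an OpenAI-compatible endpoint.
# A real client would send a chat-completions request over HTTP; here we
# only sleep to simulate a slow (e.g. reasoning-model) response.
async def fake_chat_completion(question: str, latency: float) -> str:
    await asyncio.sleep(latency)
    return f"answer to {question}"


async def evaluate_sequential(questions, latency):
    # One question at a time: total time grows linearly with len(questions).
    results = []
    for q in questions:
        results.append(await fake_chat_completion(q, latency))
    return results


async def evaluate_parallel(questions, latency, max_concurrency=8):
    # The semaphore caps in-flight requests so the endpoint is not flooded.
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(q):
        async with sem:
            return await fake_chat_completion(q, latency)

    # gather() returns results in the same order as the input questions.
    return await asyncio.gather(*(bounded(q) for q in questions))


def timed(coro):
    # Run a coroutine to completion and report wall-clock seconds.
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    questions = [f"q{i}" for i in range(16)]
    seq, t_seq = timed(evaluate_sequential(questions, 0.05))
    par, t_par = timed(evaluate_parallel(questions, 0.05, max_concurrency=8))
    assert seq == par
    print(f"sequential: {t_seq:.2f}s, parallel: {t_par:.2f}s")
```

A semaphore rather than an unbounded `gather` is a deliberate choice in this sketch: real endpoints usually enforce rate limits, so the concurrency cap is the knob you would tune.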
| Metric | Value |
|---|---|
| Stars | 94 |
| Forks | 16 |
| Language | Python |
| Category | LLM Plugin |
| License | MIT |
| Quality Score | 64.5/100 |
| Open Issues | 18 |
| Last Updated | 2026-04-11 |
| Created | 2025-03-31 |
| Platforms | python |
| Est. Tokens | ~202k |
Looking for an Eval alternative? If you're comparing Eval with other LLM plugin tools, these 6 projects are the closest alternatives on Agent Skills Hub, ranked by topic overlap, star count, and community traction.
Benchmark, evaluate, and optimize skills to ensure reliable performance across all LLMs
Bayesian Optimization as a Coverage Tool for Evaluating LLMs. Accurate evaluation (benchmarking) that's 10 tim
A-RAG: Agentic Retrieval-Augmented Generation via Hierarchical Retrieval Interfaces. State-of-the-art RAG fram
Catch MCP server issues before your agents do.
Dialectical reasoning architecture for LLMs (Thesis → Antithesis → Synthesis)
Regression testing for AI agents. Snapshot behavior, diff tool calls, catch regressions in CI. Works with LangGr
Eval is a high-performance LLM evaluation framework that issues API calls in parallel, making it up to 17× faster than sequential tools. It supports box, math, and logit-based evaluation. It is categorized as an LLM Plugin and has 94 GitHub stars.
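The listing does not define "box" or "logit-based" evaluation. A minimal sketch, assuming "box" means extracting the final answer from a LaTeX `\boxed{...}` span and "logit-based" means picking the multiple-choice option with the highest log-probability, might look like the following. The function names `extract_boxed`, `box_correct`, and `logit_choice` are hypothetical and do not reflect Twinkle Eval's actual API.

```python
from typing import Dict, Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, handling nested braces."""
    marker = r"\boxed{"
    start = text.rfind(marker)  # assume the final boxed answer is authoritative
    if start == -1:
        return None
    depth = 1
    out = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(out).strip()
        out.append(ch)
    return None  # unbalanced braces: treat as no answer


def box_correct(response: str, gold: str) -> bool:
    # Exact string match after stripping whitespace; a real scorer would
    # likely normalize math expressions further.
    pred = extract_boxed(response)
    return pred is not None and pred == gold.strip()


def logit_choice(option_logprobs: Dict[str, float]) -> str:
    # Hypothetical logit-based scoring: choose the option label (e.g. "A")
    # whose log-probability is highest.
    return max(option_logprobs, key=option_logprobs.get)
```

For example, `extract_boxed(r"Thus \boxed{\frac{1}{2}}.")` yields `\frac{1}{2}`, keeping the inner braces intact.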
Eval is primarily written in Python. Its topics include eval, evaluation, and llm.
You can find installation instructions and usage details in the Eval GitHub repository at github.com/ai-twinkle/Eval. The project has 94 stars and 16 forks.
Eval is released under the MIT license, making it free to use and modify according to the license terms.
The top alternatives to Eval on Agent Skills Hub include skill-optimizer, bocoel, and arag. Each takes a different approach to the same problem space; compare them side by side by stars, quality score, and community activity.