by arthur-ai · Agent Tool · ★ 428
Indexed by AgentSkillsHub · Auto-synced every 8h
Bench

Bench is a tool for evaluating LLMs for production use cases. Whether you are comparing different LLMs, considering different prompts, or testing generation hyperparameters like temperature and number of tokens, Bench provides one touch point for all your LLM performance evaluation.

If you have encountered a need for any of the following in your LLM work, then Bench can help with your evaluation:

- to standardize the workflow of LLM evaluation with a common interface across tasks and use cases
- to test whether open source LLMs can do as well as the top closed-source LLM API providers on your specific data
- to translate the rankings on LLM leaderboards and benchmarks into scores that you care about for your actual use case

Join the bench community on Discord. For bug fixes and feature requests, please file a GitHub issue.
| Field | Value |
| --- | --- |
| Stars | 428 |
| Forks | 42 |
| Language | TypeScript |
| Category | Agent Tool |
| License | MIT |
| Quality Score | 61.037885384596/100 |
| Open Issues | 1 |
| Last Updated | 2024-05-10 |
| Created | 2023-07-07 |
| Platforms | node |
| Est. Tokens | ~738k |
Looking for a bench alternative? If you're comparing bench with other agent tools, these 6 projects are the closest alternatives on Agent Skills Hub, ranked by topic overlap, star count, and community traction.
Smoothly Manage Multiple LLMs (OpenAI, Anthropic, Azure) and Image Models (Dall-E, SDXL), Speed Up Responses,
A curated, comprehensive collection of open-source AI tools, frameworks, datasets, courses, and seminal papers
An orchestration runtime for multi-agent AI systems. Declare agents, tools, and policies as YAML; Orloj schedu
A Unified MCP Server Management App (MCP Manager).
Open source implementation and extension of Google Research’s PaperBanana for automated academic figures, diag
An MCP client for Neovim that seamlessly integrates MCP servers into your editing workflow with an intuitive i
bench is a tool for evaluating LLMs. It is categorized as an Agent Tool with 428 GitHub stars.
bench is primarily written in TypeScript. It covers topics such as llm, mlops.
You can find installation instructions and usage details in the bench GitHub repository at github.com/arthur-ai/bench. The project has 428 stars and 42 forks, indicating an active community.
bench is released under the MIT license, making it free to use and modify according to the license terms.
The top alternatives to bench on Agent Skills Hub include GPTRouter, awesome-AI-toolkit, and orloj. Each offers a different approach to the same problem space; compare them side by side by stars, quality score, and community activity.