by xingyaoww · Agent Tool · ★ 133
MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback

Official Repo for paper MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback by Xingyao Wang*, Zihan Wang*, Jiateng Liu, Yangyi Chen, Lifan Yuan, Hao Peng and Heng Ji.

MINT benchmark aims to evaluate LLMs' ability to solve tasks with multi-turn interactions by (1) using tools and (2) leveraging natural language feedback.

Please visit our website for the leaderboard.

WARNING: Evaluation of LLMs requires executing untrusted model-generated code. Users are strongly encouraged to sandbox the code execution so that it does not perform destructive actions on their host or network. We highly recommend using the provided docker image for isolated execution.

Quick Start

Environment Setup

You can choose to use docker (recommended) or local setup as follows.

Docker Setup (Recommended)

You only need to ensure that you have docker installed on your local computer following [the official guide](https://docs.docker.co
| Field | Value |
| --- | --- |
| Stars | 133 |
| Forks | 8 |
| Language | Python |
| Category | Agent Tool |
| License | Apache-2.0 |
| Quality Score | 33.3/100 |
| Last Updated | 2024-06-04 |
| Created | 2023-09-18 |
| Platforms | python |
| Est. Tokens | ~4821k |
Looking for a mint-bench alternative? If you're comparing mint-bench with other agent tools, these 6 projects are the closest alternatives on Agent Skills Hub, ranked by topic overlap, star count, and community traction.
A collection of Agent skills and Claude Code plugins for HashiCorp products.
A collection of standardized Agent Skills to teach GitHub Copilot, Claude, Gemini and Cursor about modern Andr
Claude Code Skill Factory — A powerful open-source toolkit for building and deploying production-ready Claude
Lightweight registry to discover, install, and manage all public Claude plugins and agent skills for your favo
Claude Code Skills for software engineering workflows - Git automation, testing, and code review
A Claude Code skill that turns PDFs, docs, and codebases into Obsidian study vaults
mint-bench is the official repo for the ICLR 2024 paper MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback by Xingyao Wang*, Zihan Wang*, Jiateng Liu, Yangyi Chen, Lifan Yuan, Hao Peng and Heng Ji. It is categorized as an Agent Tool with 133 GitHub stars.
mint-bench is primarily written in Python.
You can find installation instructions and usage details in the mint-bench GitHub repository at github.com/xingyaoww/mint-bench. The project has 133 stars and 8 forks.
mint-bench is released under the Apache-2.0 license, making it free to use and modify according to the license terms.
The top alternatives to mint-bench on Agent Skills Hub include agent-skills, awesome-android-agent-skills, and claude-code-skill-factory. Each offers a different approach to the same problem space; compare them side-by-side by stars, quality score, and community activity.