mint-bench — Agent Tool by xingyaoww

by xingyaoww · Agent Tool · ★ 133

Indexed by AgentSkillsHub · Auto-synced every 8h

About mint-bench

MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback

Official Repo for paper MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback by Xingyao Wang*, Zihan Wang*, Jiateng Liu, Yangyi Chen, Lifan Yuan, Hao Peng and Heng Ji.

MINT benchmark aims to evaluate LLMs' ability to solve tasks with multi-turn interactions by (1) using tools and (2) leveraging natural language feedback.

Please visit our website for the leaderboard.

WARNING: Evaluation of LLMs requires executing untrusted model-generated code. Users are strongly encouraged to sandbox the code execution so that it does not perform destructive actions on their host or network. We highly recommend using the provided docker image for isolated execution.

Quick Start

Environment Setup

You can choose to use docker (recommended) or local setup as follows.

Docker Setup (Recommended)

You only need to ensure that you have docker installed on your local computer following [the official guide](https://docs.docker.co
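As a rough illustration of the sandboxing advice in the warning above, the sketch below builds a `docker run` command that would execute a snippet of model-generated code in a throwaway container with no network access. It is not taken from the mint-bench repository: the helper name `build_sandboxed_command`, the `python:3.11-slim` image, and the memory limit are assumptions chosen for illustration, and the project's own provided docker image should be preferred where available.

```python
import shlex


def build_sandboxed_command(code, image="python:3.11-slim", memory="512m"):
    """Return a `docker run` argv that executes `code` in an isolated container.

    Hypothetical helper: the image and limits are placeholders, not the
    image or settings that mint-bench itself ships.
    """
    return [
        "docker", "run",
        "--rm",               # delete the container when it exits
        "--network", "none",  # no network access from inside the container
        "--memory", memory,   # cap memory usage
        "--read-only",        # mount the container's root filesystem read-only
        image,
        "python", "-c", code,
    ]


# Build (but do not run) the command for a trivial snippet.
cmd = build_sandboxed_command("print(1 + 1)")
print(shlex.join(cmd))
```

On a machine with Docker installed, the resulting list could be passed to `subprocess.run(cmd, timeout=...)` so that a runaway snippet is also bounded in time.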

Quick Facts

Stars: 133
Forks: 8
Language: Python
Category: Agent Tool
License: Apache-2.0
Quality Score: 33.3/100
Last Updated: 2024-06-04
Created: 2023-09-18
Platforms: python
Est. Tokens: ~4821k

mint-bench alternative? Top 6 similar tools

Looking for a mint-bench alternative? If you're comparing mint-bench with other agent tools, these 6 projects are the closest alternatives on Agent Skills Hub, ranked by topic overlap, star count, and community traction.

  • agent-skills by hashicorp · ⭐ 639

    A collection of Agent skills and Claude Code plugins for HashiCorp products.

  • awesome-android-agent-skills by new-silvermoon · ⭐ 588

    A collection of standardized Agent Skills to teach GitHub Copilot, Claude, Gemini and Cursor about modern Andr

  • claude-code-skill-factory by alirezarezvani · ⭐ 571

    Claude Code Skill Factory — A powerful open-source toolkit for building and deploying production-ready Claude

  • claude-plugins by Kamalnrf · ⭐ 517

    Lightweight registry to discover, install, and manage all public Claude plugins and agent skills for your favo

  • claude-skills-marketplace by mhattingpete · ⭐ 442

    Claude Code Skills for software engineering workflows - Git automation, testing, and code review

  • tutor-skills by RoundTable02 · ⭐ 400

    A Claude Code skill that turns PDFs, docs, and codebases into Obsidian study vaults

Frequently Asked Questions

What is mint-bench?

mint-bench is the official repository for the ICLR 2024 paper MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback by Xingyao Wang*, Zihan Wang*, Jiateng Liu, Yangyi Chen, Lifan Yuan, Hao Peng and Heng Ji. It is categorized as an Agent Tool and has 133 GitHub stars.

What programming language is mint-bench written in?

mint-bench is primarily written in Python.

How do I install or use mint-bench?

Installation instructions and usage details are in the mint-bench GitHub repository at github.com/xingyaoww/mint-bench. The README offers a Docker setup (recommended) or a local setup. The project has 133 stars and 8 forks, and was last updated on 2024-06-04.

What license does mint-bench use?

mint-bench is released under the Apache-2.0 license, making it free to use and modify according to the license terms.

What are the best alternatives to mint-bench?

The top alternatives to mint-bench on Agent Skills Hub include agent-skills, awesome-android-agent-skills, and claude-code-skill-factory. They are ranked by topic overlap, star count, and community traction, and can be compared side by side by stars, quality score, and community activity.
