by night-chen · Agent Tool · ★ 286
Indexed by AgentSkillsHub · Auto-synced every 8h
🛠️ ToolQA 🛠️

The official repository for the code and data of the ToolQA dataset. ToolQA is an open-source dataset designed specifically for evaluating tool-augmented large language models (LLMs). This repo provides the dataset, the corresponding data generation code, and implementations of baselines on the dataset.

Features

- Our questions are selected so that LLMs have little chance of having memorized them and answering correctly from their internal knowledge alone.
- Most questions in ToolQA require the compositional use of multiple tools.
- Based on the length of the required toolchains, we offer the dataset at two difficulty levels: Easy and Hard.
- Our paper provides a thorough diagnosis and analysis of in-context tool-augmented LLMs.
- ToolQA is created through collaboration between humans and AI, and its automated construction can be adapted to new data and questions.

Dataset Statistics

ToolQA consists of data from 8 distinct domains. Each instance is a tuple (question, answer, reference corpora, tools). The reference corpora are external knowledge sources that can be queried; a corpus can be a text corpus, a tabular database, or a graph.
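To make the (question, answer, reference corpora, tools) tuple and the idea of a toolchain more concrete, here is a minimal sketch in Python. It is not ToolQA's actual data format or tool API: the `ToolQAInstance` class, the `filter_rows` and `get_value` tools, and the rows in `flights` are all hypothetical, made-up illustrations. The example answers a question over a small tabular corpus by chaining three tool calls, which mirrors the compositional tool use the dataset is built to test.

```python
from dataclasses import dataclass, field
from typing import Callable, Union

# Hypothetical corpus shapes: a reference corpus may be text, a table, or a graph.
TextCorpus = list[str]
TableCorpus = list[dict[str, object]]
GraphCorpus = dict[str, set[str]]  # adjacency list: node -> neighbours
Corpus = Union[TextCorpus, TableCorpus, GraphCorpus]


@dataclass
class ToolQAInstance:
    """Hypothetical container for one (question, answer, corpora, tools) tuple."""
    question: str
    answer: str
    corpora: dict[str, Corpus]
    tools: dict[str, Callable] = field(default_factory=dict)


def filter_rows(table: TableCorpus, column: str, value: object) -> TableCorpus:
    """Hypothetical tool: keep only the rows whose `column` equals `value`."""
    return [row for row in table if row.get(column) == value]


def get_value(rows: TableCorpus, column: str) -> object:
    """Hypothetical tool: read `column` from exactly one remaining row."""
    if len(rows) != 1:
        raise ValueError(f"expected exactly one row, got {len(rows)}")
    return rows[0][column]


# Made-up toy rows for illustration only; these are not ToolQA data.
flights: TableCorpus = [
    {"flight": "demo-1", "date": "2022-01-05", "dep_delay_min": 12},
    {"flight": "demo-1", "date": "2022-01-06", "dep_delay_min": 3},
    {"flight": "demo-2", "date": "2022-01-05", "dep_delay_min": 0},
]

inst = ToolQAInstance(
    question="What was the departure delay of flight demo-1 on 2022-01-05?",
    answer="12",
    corpora={"flights": flights},
    tools={"filter_rows": filter_rows, "get_value": get_value},
)


def run_chain(instance: ToolQAInstance) -> str:
    """Answer the question by composing three tool calls (a toolchain of length 3)."""
    table = instance.corpora["flights"]
    rows = instance.tools["filter_rows"](table, "flight", "demo-1")
    rows = instance.tools["filter_rows"](rows, "date", "2022-01-05")
    return str(instance.tools["get_value"](rows, "dep_delay_min"))


if __name__ == "__main__":
    print(run_chain(inst) == inst.answer)  # True
```

Neither filter alone narrows the table to one row, so `get_value` only succeeds after both filters run. That dependency on a chain of calls is the property that separates multi-tool questions from single-lookup ones.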
| Metric | Value |
|---|---|
| Stars | 286 |
| Forks | 14 |
| Language | Jupyter Notebook |
| Category | Agent Tool |
| License | Apache-2.0 |
| Quality Score | 67.7/100 |
| Open Issues | 5 |
| Last Updated | 2023-08-19 |
| Created | 2023-06-06 |
| Est. Tokens | ~24k |
Looking for a ToolQA alternative? If you're comparing ToolQA with other agent tools, these 6 projects are the closest alternatives on Agent Skills Hub, ranked by topic overlap, star count, and community traction:
- rubra: Open-weight, tool-calling LLMs
- spacy-llm: 🦙 Integrating LLMs into structured NLP pipelines
- jupyter-mcp-server: 🪐 🔧 Model Context Protocol (MCP) Server for Jupyter
- A database of SDKs, frameworks, libraries, and tools for creating, monitoring, debugging and deploying autonom…
- Ship RAG-based LLM web apps in seconds.
- 🔍 LangKit: An open-source toolkit for monitoring Large Language Models (LLMs). 📚 Extracts signals from promp…
ToolQA is a new dataset for evaluating how well LLMs answer challenging questions using external tools. It offers two difficulty levels (easy and hard) across eight real-life scenarios. It is categorized as an Agent Tool and has 286 GitHub stars.
ToolQA is primarily written in Jupyter Notebook. It covers topics such as large-language-models, natural-language-understanding, natural-lauguage-processing.
You can find installation instructions and usage details in the ToolQA GitHub repository at github.com/night-chen/ToolQA. At the time of indexing, the project had 286 stars and 14 forks.
ToolQA is released under the Apache-2.0 license, making it free to use and modify according to the license terms.
The top alternatives to ToolQA on Agent Skills Hub include rubra, spacy-llm, and jupyter-mcp-server. Each takes a different approach to the same problem space; you can compare them side by side by stars, quality score, and community activity.