by night-chen · Agent Tool · ★ 286
ToolQA is a new dataset for evaluating the ability of LLMs to answer challenging questions with external tools. It offers two difficulty levels (easy/hard) across eight real-life scenarios.
| Stars | 286 |
| Forks | 14 |
| Language | Jupyter Notebook |
| Category | Agent Tool |
| License | Apache-2.0 |
| Quality Score | 48.2/100 |
| Open Issues | 5 |
| Last Updated | 2023-08-19 |
| Created | 2023-06-06 |
| Est. Tokens | ~24k |
Looking for a ToolQA alternative? If you're comparing ToolQA with other agent tools, these 6 projects are the closest alternatives on Agent Skills Hub, ranked by topic overlap, star count, and community traction.
Open Weight, tool-calling LLMs
🦙 Integrating LLMs into structured NLP pipelines
Large Language Model based Multi-Agents: A Survey of Progress and Challenges (In IJCAI 2024)
A database of SDKs, frameworks, libraries, and tools for creating, monitoring, debugging and deploying autonom
Observal is an Observability and Evaluation platform for human-in-the-loop agents
🪐 🔧 Model Context Protocol (MCP) Server for Jupyter.
ToolQA is a new dataset for evaluating the ability of LLMs to answer challenging questions with external tools. It offers two difficulty levels (easy/hard) across eight real-life scenarios. It is categorized as an Agent Tool and has 286 GitHub stars.
ToolQA is primarily written in Jupyter Notebook. It covers topics such as large-language-models, natural-language-understanding, natural-lauguage-processing.
You can find installation instructions and usage details in the ToolQA GitHub repository at github.com/night-chen/ToolQA. The project has 286 stars and 14 forks.
ToolQA is released under the Apache-2.0 license, making it free to use and modify according to the license terms.
The top alternatives to ToolQA on Agent Skills Hub include rubra, spacy-llm, and LLM_MultiAgents_Survey_Papers. Each takes a different approach to the same problem space; compare them side by side by stars, quality score, and community activity.