by TheAgentArk · MCP Server · ★ 229
Last updated: · Indexed by AgentSkillsHub · Auto-synced every 8h
🦤 Toucan-1.5M: Toucan-1.5M is the largest fully synthetic tool-agent dataset to date, designed to advance tool use in agentic LLMs. It comprises over 1.5 million trajectories synthesized from 495 real-world Model Context Protocols (MCPs) spanning 2,000+ tools. By leveraging authentic MCP environments, Toucan-1.5M generates diverse, realistic, and challenging tasks requires using multiple tools, with trajectories involving real tool executions across multi-round, multi-turn, sequential, and parallel tool calls. Models fine-tuned on Toucan-1.5M outperform much larger closed-source counterparts on the BFCL V3 benchmark and extend the Pareto frontier on the MCP-Universe benchmark. 📄 Technical Report - Technical details behind Toucan-1.5M 💾 Github Repo - Pipeline to produce Toucan-1.5M 🤗 HF Dataset - Full dataset 🚚 Installation 📝 Data Synthesis Please refer to folder for details. 📚 Citation If you find the data or code useful, please cite: @misc{xu2025toucan, title={TOUCAN: Synthesizing 1.5M Tool-Agentic Dat
| Stars | 229 |
| Forks | 11 |
| Language | Python |
| Category | MCP Server |
| License | MIT |
| Quality Score | 29.25/100 |
| Open Issues | 4 |
| Last Updated | 2025-12-16 |
| Created | 2025-09-30 |
| Platforms | mcp, python |
| Est. Tokens | ~1625k |
These tools work well together with Toucan for enhanced workflows:
Explore other popular mcp server tools:
Toucan is Official repo of Toucan: Synthesizing 1.5M Tool-Agentic Data from Real-World MCP Environments. It is categorized as a MCP Server with 229 GitHub stars.
Toucan is primarily written in Python.
You can find installation instructions and usage details in the Toucan GitHub repository at github.com/TheAgentArk/Toucan. The project has 229 stars and 11 forks, indicating an active community.
Toucan is released under the MIT license, making it free to use and modify according to the license terms.