by ZhuLinsen · Agent Tool · ★ 180
Last updated: · Indexed by AgentSkillsHub · Auto-synced every 8h
FastDatasets 🚀 一个强大的工具,用于为大语言模型(LLM)创建高质量的训练数据集 | Switch to English 🎯 在线体验 🚀 立即体验 FastDatasets,无需安装! 上传你的文档,一键生成 Alpaca 格式训练数据集 - 完全免费,无需配置环境! 主要功能 基于自由文档生成数据集 智能文档处理:支持多种格式文档的智能分割 问题生成:基于文档内容自动生成相关问题 答案生成:使用 LLM 生成高质量答案 异步处理:支持大规模文档的异步处理 多种导出格式:支持多种数据集格式导出(Alpaca、ShareGPT等) 直接SFT就绪输出:生成适用于监督微调的数据集 数据蒸馏与优化 知识蒸馏:从大模型中提取知识到训练数据集 指令扩增:自动生成指令变体,扩充训练数据 质量优化:使用 LLM 优化和提升数据质量 多格式支持:支持从多种格式的数据集进行蒸馏 快速开始 环境要求 Python 3.8+ 依赖包:见 安装
| Stars | 180 |
| Forks | 27 |
| Language | Python |
| Category | Agent Tool |
| License | Apache-2.0 |
| Quality Score | 66.7791591557286/100 |
| Last Updated | 2025-08-31 |
| Created | 2025-04-25 |
| Platforms | python |
| Est. Tokens | ~198k |
These tools work well together with FastDatasets for enhanced workflows:
Looking for a FastDatasets alternative? If you're comparing FastDatasets with other agent tool tools, these 6 projects are the closest alternatives on Agent Skills Hub — ranked by topic overlap, star count, and community traction.
2X faster ASGI web framework for python, offering high-level development, low-level performance.
Official python implementation of UTCP. UTCP is an open standard that lets AI agents call any API directly, wi
🌐Web Agent Protocol (WAP) - Record and replay user interactions in the browser with MCP support
Your own Claude Code UI, sandbox, in-browser VS Code, terminal, multi-provider support (Anthropic, OpenAI, Git
The World's Most Comprehensive, Authoritative, and Structured Open Source Data Source Knowledge Base
A Model Context Protocol (MCP) server that implements the Zettelkasten knowledge management methodology, allow
Explore other popular agent tool tools:
FastDatasets is A powerful tool for creating high-quality training datasets for Large Language Models (LLMs)(一个快速生成高质量LLM微调训练数据集的工具). It is categorized as a Agent Tool with 180 GitHub stars.
FastDatasets is primarily written in Python. It covers topics such as asyncio, dataset-generation, datasets.
You can find installation instructions and usage details in the FastDatasets GitHub repository at github.com/ZhuLinsen/FastDatasets. The project has 180 stars and 27 forks, indicating an active community.
FastDatasets is released under the Apache-2.0 license, making it free to use and modify according to the license terms.
The top alternatives to FastDatasets on Agent Skills Hub include lihil, python-utcp, web-agent-protocol. Each offers a different approach to the same problem space — compare them side-by-side by stars, quality score, and community activity.