Best AI Agent Skills for Document Parsing

Discover tools for parsing PDFs, Word documents, and spreadsheets, and for extracting structured data from unstructured files.

Top 10 Document Parsing Tools

1. ExtractThinker by enoch3712
★ 1.5k · Python · Agent Tool

ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows.

2. pdfmux by NameetP
★ 47 · Python · MCP Server

PDF extraction that checks its own work. Claims #2 in reading-order accuracy, with zero AI, zero GPU, and zero cost.

3. Skill_Seekers by yusufkaraaslan
★ 11.4k · Python · MCP Server

Converts documentation websites, GitHub repositories, and PDFs into Claude AI skills, with automatic conflict detection.

4. claude-office-skills by tfriedel
★ 348 · Python · Claude Skill

Office document creation and editing skills for Claude Code - PPTX, DOCX, XLSX, and PDF workflows with automation support

5. claude-code-polished-documents-skills by promptadvisers
★ 51 · Python · Claude Skill

A comprehensive collection of Claude Code skills for document generation, styling, and manipulation. Includes Document Polisher with 10 premium brand themes (McKinsey, Deloitte, Stripe, Apple, Notion, etc.) plus docx, pdf, xlsx, pptx skills.

6. pdf-reader-mcp by SylphxAI
★ 580 · TypeScript · MCP Server

📄 Production-ready MCP server for PDF processing - 5-10x faster with parallel processing and 94%+ test coverage

7. markdown-exporter by bowenliang123
★ 189 · Python · Agent Tool

An Agent Skill and Dify plugin that converts Markdown into DOCX, PPTX, XLSX, PNG, PDF, Mermaid, HTML, MD, CSV, JSON, and XML files.

8. ragflow by infiniflow
★ 76.5k · Python · MCP Server

RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs.

9. codebase_to_text by QaisarRajput
★ 102 · Python · Agent Tool

For GenAI and LLM usage. This package converts a codebase (a folder structure with files) into a single text file or a Microsoft Word document (.docx), preserving the folder structure and file contents. It extracts content from various file types, including text files and documents, while retaining their formatting for readability.

10. siyuan by siyuan-note
★ 42.2k · TypeScript · Codex Skill

Privacy-first, self-hosted, fully open-source personal knowledge management software, written in TypeScript and Go.


Comparison

Tool | Stars | Language | License | Score
ExtractThinker | ★ 1.5k | Python | Apache-2.0 | 31
pdfmux | ★ 47 | Python | MIT | 42
Skill_Seekers | ★ 11.4k | Python | MIT | 51
claude-office-skills | ★ 348 | Python | (none listed) | 32
claude-code-polished-documents-skills | ★ 51 | Python | (none listed) | 33
pdf-reader-mcp | ★ 580 | TypeScript | MIT | 46
markdown-exporter | ★ 189 | Python | Apache-2.0 | 41
ragflow | ★ 76.5k | Python | Apache-2.0 | 53
codebase_to_text | ★ 102 | Python | Apache-2.0 | 30
siyuan | ★ 42.2k | TypeScript | AGPL-3.0 | 50

Frequently Asked Questions

What are the best AI tools for document parsing?

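The list above leads with ExtractThinker, pdfmux, and Skill_Seekers. Each tool also carries a composite score based on GitHub stars, community activity, and code quality; by that score, the highest-rated entries in the comparison table are ragflow (53), Skill_Seekers (51), and siyuan (50).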

Are these document parsing tools free to use?

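Most tools listed here are open source: 8 of the 10 list an explicit open-source license (MIT, Apache-2.0, or AGPL-3.0), which makes them free to use and modify under that license's terms. The other two, claude-office-skills and claude-code-polished-documents-skills, show no license in the comparison table, so check their repositories before reusing them.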

How do I choose the right document parsing tool?

Consider your tech stack (eight of these tools are written in Python and two in TypeScript), project scale (stars are a rough signal of community trust), licensing, and the specific features you need. Use the comparison table above to evaluate the tools side by side.
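
One way to do that side-by-side evaluation is to load the comparison table into a small script and filter it. The sketch below is illustrative only: the `Tool` record, the `shortlist` helper, and its `language` and `licenses` parameters are hypothetical names, not part of any listed tool. The data is copied from the comparison table, with `None` standing in for the two rows that show no license.

```python
from typing import NamedTuple, Optional


class Tool(NamedTuple):
    name: str
    language: str
    license: Optional[str]  # None where the comparison table shows no license
    score: int              # composite score as shown in the table


# Rows copied from the comparison table above.
TOOLS = [
    Tool("ExtractThinker", "Python", "Apache-2.0", 31),
    Tool("pdfmux", "Python", "MIT", 42),
    Tool("Skill_Seekers", "Python", "MIT", 51),
    Tool("claude-office-skills", "Python", None, 32),
    Tool("claude-code-polished-documents-skills", "Python", None, 33),
    Tool("pdf-reader-mcp", "TypeScript", "MIT", 46),
    Tool("markdown-exporter", "Python", "Apache-2.0", 41),
    Tool("ragflow", "Python", "Apache-2.0", 53),
    Tool("codebase_to_text", "Python", "Apache-2.0", 30),
    Tool("siyuan", "TypeScript", "AGPL-3.0", 50),
]


def shortlist(tools, language=None, licenses=None):
    """Hypothetical helper: filter by language and license, best score first.

    Passing None for a filter means "do not filter on this field".
    """
    picked = [
        t for t in tools
        if (language is None or t.language == language)
        and (licenses is None or t.license in licenses)
    ]
    return sorted(picked, key=lambda t: t.score, reverse=True)


if __name__ == "__main__":
    # Example: Python tools under a permissive license, highest score first.
    for tool in shortlist(TOOLS, language="Python", licenses={"MIT", "Apache-2.0"}):
        print(f"{tool.score:>3}  {tool.name}  ({tool.license})")
```

Filtering on license before sorting by score keeps tools without a listed license out of the shortlist, which matters if you plan to modify or redistribute the code.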