by NameetP · MCP Server · ★ 71
Indexed by AgentSkillsHub · Auto-synced every 8h
pdfmux

Self-healing PDF extraction with per-page confidence scoring. Open-source LlamaParse alternative for RAG pipelines, MCP server for Claude Desktop, LangChain + LlamaIndex loaders. Ranked #2 on opendataloader-bench (0.900).

The only PDF extractor that audits its own output. Catches blank pages, scrambled columns, and broken tables, then re-extracts them with a stronger backend. So your LLM gets clean data, not silent garbage. Routes each page to the best of 5 rule-based backends + BYOK LLM fallback (Gemini / Claude / GPT-4o / Ollama). One CLI. One API. Zero config.

PDF ── pdfmux router ── best extractor per page ── audit ── re-extract failures ── Markdown / JSON / chunks
  |
  ├─ PyMuPDF (digital text, 0.01s/page)
  ├─ OpenDataLoader (complex layouts, 0.05s/page)
  ├─ RapidOCR (scanned pages, CPU-only)
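The route, audit, and re-extract loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not pdfmux's actual API or scoring: the names `PageResult`, `score`, `extract`, `fast_backend`, and `ocr_backend` are all hypothetical, the confidence heuristic is a toy, and pages are plain strings standing in for real PDF pages.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical backend type: takes a page (a plain string here) and returns text.
Backend = Callable[[str], str]


@dataclass
class PageResult:
    page: int
    text: str
    confidence: float
    backend: str


def score(text: str) -> float:
    """Toy confidence heuristic (an assumption, not pdfmux's scoring):
    share of characters that are alphanumeric or whitespace; blank text scores 0."""
    stripped = text.strip()
    if not stripped:
        return 0.0
    clean = sum(ch.isalnum() or ch.isspace() for ch in stripped)
    return clean / len(stripped)


def extract(
    pages: Sequence[str],
    backends: Sequence[Tuple[str, Backend]],
    threshold: float = 0.8,
) -> List[PageResult]:
    """Try backends in order (cheapest first) for each page, audit the output,
    and fall through to the next backend when confidence is below threshold."""
    results = []
    for index, page in enumerate(pages):
        best = PageResult(index, "", 0.0, "none")
        for name, backend in backends:
            text = backend(page)
            confidence = score(text)
            if confidence > best.confidence:
                best = PageResult(index, text, confidence, name)
            if confidence >= threshold:
                break  # audit passed, no need for a stronger backend
        results.append(best)
    return results


# Stand-in backends: the fast one returns nothing for "scanned" pages.
def fast_backend(page: str) -> str:
    return "" if page.startswith("scan:") else page


def ocr_backend(page: str) -> str:
    return page[len("scan:"):] if page.startswith("scan:") else page


pages = ["Hello world", "scan:Invoice 42"]
results = extract(pages, [("fast", fast_backend), ("ocr", ocr_backend)])
for result in results:
    print(result.page, result.backend, round(result.confidence, 2))
```

Ordering the backends from cheapest to most expensive means the stronger backend only runs on pages that fail the audit, which is the behavior the description attributes to the router.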
| Metric | Value |
| --- | --- |
| Stars | 71 |
| Forks | 11 |
| Language | Python |
| Category | MCP Server |
| License | MIT |
| Quality Score | 67.87/100 |
| Open Issues | 5 |
| Last Updated | 2026-06-24 |
| Created | 2026-03-03 |
| Platforms | cli, mcp, python |
| Est. Tokens | ~19k |
Looking for a pdfmux alternative? If you're comparing pdfmux with other MCP server tools, these 6 projects are the closest alternatives on Agent Skills Hub, ranked by topic overlap, star count, and community traction.
An MCP server that lets Claude Code and other AI agents work through large PDFs without overflowing their cont
AI-Native document parser: PDF, Office & images → clean Markdown with LaTeX, tables & OCR. Zero-dependency CLI
MCP server for AI agent for cybersecurity: automate assessment of documents, questionnaires & reports. Multi-f
All-in-one MCP server that can connect your AI agents to any native endpoint, powered by UTCP
A self-healing web scraper built for hostile sites: selectors repair themselves, browser rendering kicks in wh
AI Research assistant plugin for Zotero 9. Chat with your library, run federated scholarly search, RAG, OCR, s
pdfmux is PDF extraction that checks its own work: #2 in reading order accuracy, with zero AI, zero GPU, and zero cost. It is categorized as an MCP Server with 71 GitHub stars.
pdfmux is primarily written in Python. It covers topics such as ai-agent, docling, document-parsing.
You can find installation instructions and usage details in the pdfmux GitHub repository at github.com/NameetP/pdfmux. The project has 71 stars and 11 forks, indicating an active community.
pdfmux is released under the MIT license, making it free to use and modify according to the license terms.
The top alternatives to pdfmux on Agent Skills Hub include pdf-mcp, MinerU-Skill, and DocSentinel. Each offers a different approach to the same problem space; compare them side-by-side by stars, quality score, and community activity.