MinerU — High-accuracy document parsing engine for LLM · RAG · Agent workflows
Converts PDF · Word · PPT · Images · Web pages into structured Markdown / JSON · VLM+OCR dual engine · 109 languages
MCP Server · LangChain / Dify / FastGPT native integration · 10+ domestic AI chip support
🔍 Core Parsing Capabilities
- Formulas → LaTeX · Tables → HTML, accurate layout reconstruction
- Supports scanned docs, handwriting, multi-column layouts, cross-page table merging
- Output follows human reading order with automatic header/footer removal
- VLM + OCR dual engine, 109-language OCR recognition
🔌 Integration
| Use Case | Solution |
|---|---|
| AI Coding Tools | MCP Server — Cursor · Claude Desktop · Windsurf |
| RAG Frameworks | LangChain · LlamaIndex · RAGFlow · RAG-Anything · Flowise · Dify · FastGPT |
| Development | Python / Go / TypeScript SDK · CLI · REST API · Docker |
| No-Code | mineru.net online · Gradio WebUI · Desktop client |
🖥️ Deployment (Private · Fully Offline)
| Inference Backend | Best For |
|---|---|
| pipeline | Fast & stable, no hallucination, runs on CPU or GPU |
| vlm-engine | High accuracy, supports vLLM / LMDeploy / mlx ecosystem |
| hybrid-engine | High accuracy, native text extraction, low hallucination |
Domestic AI chips: Ascend · Cambricon · Enflame · MetaX · Moore Threads · Kunlunxin · Iluvatar · Hygon · Biren · T-Head
Changelog
- 2026/03/29 3.0.0 Released

  This release delivers a systematic upgrade centered on parsing capability, system architecture, and engineering usability. The main updates include:
  - Native DOCX parsing
    - Official support for native DOCX parsing, delivering high-precision results without hallucinations.
    - Compared with the traditional workflow of first converting DOCX to PDF and then parsing it, end-to-end speed is improved by tens of times, making it better suited for scenarios with high requirements for both accuracy and throughput.
  - pipeline backend upgrade
    - The pipeline backend achieves a score of 86.2 on OmniDocBench (v1.5), surpassing the accuracy of the previous-generation mainstream VLM MinerU2.0-2505-0.9B.
    - Added support for parsing images/formulas inside tables, seal text recognition, vertical text support, and interline formula numbering recognition, continuously improving parsing quality for complex document scenarios.
    - While maintaining high accuracy, it keeps resource usage extremely low and continues to support inference in pure CPU environments.
  - API / CLI / Router orchestration upgrade
    - mineru now runs as an orchestration client based on mineru-api; when --api-url is not provided, it will automatically start a local temporary service.
    - mineru-api adds a new asynchronous task endpoint POST /tasks, supporting task submission, status querying, and result retrieval; meanwhile, it retains the synchronous parsing endpoint POST /file_parse for compatibility with legacy plugins.
    - Added mineru-router, designed for unified entry deployment and task routing across multiple services and multiple GPUs; its interfaces are fully compatible with mineru-api and support automatic task load balancing.
  - Deployment and usability improvements
    - Resolved compatibility issues with torch >= 2.8; the base image has been upgraded to vllm0.11.2 + torch2.9.0, unifying installation paths across different Compute Capabilities.
    - Optimized the parsing pipeline with a sliding-window mechanism, significantly reducing peak memory usage in long-document scenarios, so documents with tens of thousands of pages no longer need to be split manually.
    - Batch inference in pipeline now supports streaming writes to disk, allowing completed parsing results to be written out in time and further improving the experience for long-running tasks.
    - Completed thread-safety optimization and now fully supports multi-threaded concurrent inference; together with mineru-router, this enables one-click multi-GPU deployment and makes it easy to build high-concurrency, high-throughput parsing systems.
    - Completely removed the use of two AGPLv3 models (doclayoutyolo and mfd_yolov8) and one CC-BY-NC-SA 4.0 model (layoutreader).

  This update is not just a set of feature enhancements, but a key leap forward in MinerU's overall system capabilities. We specifically addressed the peak memory usage issue in long-document parsing. Through optimizations such as sliding windows and streaming writes to disk, ultra-long document parsing has moved from “requiring manual splitting and careful handling” to being “stable, scalable, and ready for production workloads.” At the same time, we completed thread-safety optimization and fully enabled multi-threaded concurrent inference, further improving single-machine resource utilization and runtime stability under high-concurrency workloads. On top of this, with mineru-router and the new API / CLI orchestration framework, MinerU now supports one-click multi-GPU deployment, unified access across multiple services, and automatic task load balancing, significantly reducing the difficulty of large-scale deployment. As a result, MinerU is evolving from a standalone data production tool into a large-scale document parsing foundation for high-concurrency and high-throughput scenarios, providing enterprise-grade document data processing with infrastructure that is more stable, more efficient, and easier to scale.
📝 View the complete Changelog for more historical version information
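The sliding-window and streaming-write optimizations mentioned in the 3.0.0 notes can be pictured with a small, generic sketch. This is not MinerU's implementation: the release notes do not say how its windows are defined, so the example assumes fixed-size, non-overlapping page windows, and `iter_windows`, `parse_streaming`, and `fake_parse_page` are hypothetical names made up for illustration.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, Iterator


def iter_windows(total_pages: int, window_size: int) -> Iterator[range]:
    """Yield consecutive, non-overlapping page ranges of at most window_size pages."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    for start in range(0, total_pages, window_size):
        yield range(start, min(start + window_size, total_pages))


def parse_streaming(total_pages: int, window_size: int,
                    parse_page: Callable[[int], Dict], out_path: Path) -> int:
    """Parse one window at a time and append its results to a JSON Lines file.

    Returns the largest number of page results held in memory at once.
    """
    peak = 0
    with out_path.open("w", encoding="utf-8") as fh:
        for window in iter_windows(total_pages, window_size):
            results = [parse_page(i) for i in window]  # only this window is in memory
            peak = max(peak, len(results))
            for result in results:
                fh.write(json.dumps(result) + "\n")
            fh.flush()  # hand finished pages to the OS before the next window starts
    return peak


def fake_parse_page(index: int) -> Dict:
    # Hypothetical stand-in for a real per-page parser.
    return {"page": index, "text": f"content of page {index}"}


if __name__ == "__main__":
    out = Path(tempfile.mkdtemp()) / "pages.jsonl"
    peak = parse_streaming(10, 4, fake_parse_page, out)
    print(f"wrote 10 pages to {out}, peak pages in memory: {peak}")
```

The point of the sketch is that peak memory depends on the window size rather than on the document length, which is why very long documents would no longer need manual splitting under this kind of scheme.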
Project Introduction
MinerU is a document parsing tool that converts PDF, image, and DOCX inputs into machine-readable formats such as Markdown and JSON for downstream retrieval, extraction, and processing.
MinerU was born during the pre-training process of InternLM. We focus on solving symbol conversion issues in scientific literature and hope to contribute to technological development in the era of large models.
Compared to well-known commercial products, MinerU is still young. If you encounter any issues or if the results are not as expected, please submit an issue and attach the relevant document or sample file.
Key Features
- Support PDF, image, and DOCX inputs.
- Remove headers, footers, footnotes, page numbers, etc., to ensure semantic coherence.
- Output text in human-readable order, suitable for single-column, multi-column, and complex layouts.
- Preserve the structure of the original document, including headings, paragraphs, lists, etc.
- Extract images, image descriptions, tables, table titles, and footnotes.
- Automatically recognize and convert formulas in the document to LaTeX format.
- Automatically recognize and convert tables in the document to HTML format.
- Automatically detect scanned PDFs and garbled PDFs and enable OCR functionality.
- OCR supports detection and recognition of 109 languages.
- Supports multiple output formats, such as multimodal and NLP Markdown, JSON sorted by reading order, and rich intermediate formats.
- Supports various visualization results, including layout visualization and span visualization, for efficient confirmation of output quality.
- Built-in CLI, FastAPI, and Gradio WebUI for local orchestration and multi-service deployment.
- Supports running in a pure CPU environment, and also supports GPU (CUDA) / NPU (CANN) / MPS acceleration.
- Compatible with Windows, Linux, and Mac platforms.
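Because tables are emitted as HTML, downstream code often needs to turn them back into rows. The following is a minimal sketch using only Python's standard `html.parser`. The `SAMPLE` string is a hypothetical table written for this example, not actual MinerU output, and the `TableRows` and `table_rows` names are assumptions made for illustration. Nested tables and `colspan`/`rowspan` are not handled.

```python
from html.parser import HTMLParser
from typing import List, Optional


class TableRows(HTMLParser):
    """Collect the cell text of every <tr> in the HTML fed to the parser."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._cell: Optional[List[str]] = None  # text pieces of the open cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None


def table_rows(html: str) -> List[List[str]]:
    """Return the table as a list of rows, each a list of cell strings."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical sample, written for this example only.
SAMPLE = (
    "<table><tr><th>Item</th><th>Qty</th></tr>"
    "<tr><td>Widget</td><td>3</td></tr></table>"
)

if __name__ == "__main__":
    for row in table_rows(SAMPLE):
        print(row)
```

For production use, a dedicated HTML library is likely a better fit; the standard-library parser is used here only to keep the sketch dependency-free.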
Quick Start
If you encounter any installation issues, please first consult the FAQ.
If the parsing results are not as expected, refer to the Known Issues.
Online Experience
Official online web application
The official online version has the same functionality as the client, with a beautiful interface and rich features. Login is required.
Gradio-based online demo
A Gradio-based WebUI with a simple interface that offers only the core parsing functionality. No login is required.
Local Deployment
Warning
Pre-installation Notice—Hardware and Software Environment Support
To ensure the stability and reliability of the project, we only optimize and test for specific hardware and software environments during development. This ensures that users deploying and running the project on recommended system configurations will get the best performance with the fewest compatibility issues.
By focusing resources on the mainline environment, our team can more efficiently resolve potential bugs and develop new features.
In non-mainline environments, due to the diversity of hardware and software configurations, as well as third-party dependency compatibility issues, we cannot guarantee 100% project availability. Therefore, for users who wish to use this project in non-recommended environments, we suggest carefully reading the documentation and FAQ first. Most issues already have corresponding solutions in the FAQ. We also encourage community feedback to help us gradually expand support.
| Parsing Backend | pipeline | *-auto-engine | *-http-client | ||
|---|---|---|---|---|---|
| hybrid | vlm | hybrid | vlm | ||
| Backend Features | Good Compatibility | High Hardware Requirements | For OpenAI Compatible Servers² | ||
| Accuracy¹ | 86+ | 90+ | |||
| Operating System | Linux³ / Windows⁴ / macOS⁵ | ||||
| Pure CPU Support | ✅ | ❌ | ✅ | ||
| GPU Acceleration | Volta and later architecture GPUs or Apple Silicon | Not Required | |||
| Min VRAM | 4GB | 8GB | 8GB | 2GB | |
| RAM | Min 16GB, Recommended 32GB or more | Min 16GB | |||
| Disk Space | Min 20GB, SSD Recommended | Min 2GB | |||
| Python Version | 3.10-3.13 | ||||
¹ Accuracy metrics are the End-to-End Evaluation Overall scores from OmniDocBench (v1.5), based on the latest version of MinerU.
² Servers compatible with the OpenAI API, such as local model servers or remote model services deployed via inference frameworks like vLLM/SGLang/LMDeploy.
³ On Linux, only distributions released in 2019 or later are supported.
⁴ Since the key dependency ray does not support Python 3.13 on Windows, only versions 3.10~3.12 are supported.
⁵ macOS requires version 14.0 or later.
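Before installing, it can help to check the interpreter and OS against the ranges listed above. The sketch below encodes only what the table and footnotes state (Python 3.10-3.13, 3.10-3.12 on Windows, macOS 14.0 or later). The function names are hypothetical, and the check does not cover GPU, VRAM, RAM, or the Linux distribution age.

```python
import platform
import sys
from typing import Tuple


def supported_python(system: str, version: Tuple[int, int]) -> bool:
    """Check (major, minor) against the documented Python range.

    3.10-3.13 in general; 3.10-3.12 on Windows (footnote 4).
    """
    upper = (3, 12) if system == "Windows" else (3, 13)
    return (3, 10) <= version <= upper


def supported_macos(release: str) -> bool:
    """macOS must be 14.0 or later (footnote 5); release looks like "14.5"."""
    major = release.split(".")[0]
    return major.isdigit() and int(major) >= 14


if __name__ == "__main__":
    system = platform.system()  # "Linux", "Windows", or "Darwin"
    ok = supported_python(system, tuple(sys.version_info[:2]))
    if system == "Darwin":
        ok = ok and supported_macos(platform.mac_ver()[0])
    print("environment looks supported" if ok
          else "environment is outside the tested range")
```

A failed check does not necessarily mean MinerU will not run; it only means the environment falls outside the configurations described as tested.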
Install MinerU
Install MinerU using pip or uv
pip install --upgrade pip
pip install uv
uv pip install -U "mineru[all]"
Install MinerU from source code
git clone https://github.com/opendatalab/MinerU.git
cd MinerU
uv pip install -e ".[all]"
Tip
mineru[all] includes all core features, compatible with Windows / Linux / macOS systems, suitable for most users.
If you need to specify the inference framework for the VLM model, or only intend to install a lightweight client on an edge device, please refer to the documentation Extension Modules Installation Guide.
Deploy MinerU using Docker
MinerU provides a convenient Docker deployment method, which helps quickly set up the environment and solve some tricky environment compatibility issues. You can get the Docker Deployment Instructions in the documentation.
Using MinerU
If your device meets the GPU acceleration requirements in the table above, you can use a simple command line for document parsing:
mineru -p <input_path> -o <output_path>
If your device does not meet the GPU acceleration requirements, you can specify the backend as pipeline to run in a pure CPU environment:
mineru -p <input_path> -o <output_path> -b pipeline
mineru currently supports local PDF, image, and DOCX file or directory inputs, and can be used for document parsing through the CLI, API, WebUI, and mineru-router. For detailed instructions, please refer to the Usage Guide.
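When calling the CLI from a script, the two invocations above can be assembled programmatically. This is a minimal sketch that uses only the options named in this README (`-p`, `-o`, `-b pipeline`, and `--api-url` from the 3.0.0 notes). The helper name `build_mineru_command` and the example URL are assumptions, and the exact value format expected by `--api-url` should be confirmed in the Usage Guide.

```python
import shlex
from typing import List, Optional


def build_mineru_command(input_path: str, output_path: str,
                         gpu_ok: bool = True,
                         api_url: Optional[str] = None) -> List[str]:
    """Build the argument list for the mineru CLI.

    gpu_ok=False adds "-b pipeline" so parsing can run on CPU only.
    api_url, if given, is passed through "--api-url" (value format assumed).
    """
    cmd = ["mineru", "-p", input_path, "-o", output_path]
    if not gpu_ok:
        cmd += ["-b", "pipeline"]
    if api_url:
        cmd += ["--api-url", api_url]
    return cmd


if __name__ == "__main__":
    cmd = build_mineru_command("my doc.pdf", "out", gpu_ok=False)
    # shlex.join quotes arguments so the line can be pasted into a POSIX shell.
    print(shlex.join(cmd))
    # To actually run it (requires MinerU to be installed):
    #   import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list, rather than as one formatted string, avoids quoting problems with paths that contain spaces.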
TODO
- Reading order based on the model
- Recognition of index and list in the main text
- Table recognition
- Heading Classification
- Handwritten Text Recognition
- Vertical Text Recognition
- Latin Accent Mark Recognition
- Code block recognition in the main text
- Chemical formula recognition (mineru.net)
- Geometric shape recognition
Known Issues
- Reading order is determined by the model based on the spatial distribution of readable content, and may be out of order in some areas under extremely complex layouts.
- Limited support for vertical text.
- Tables of contents and lists are recognized through rules, and some uncommon list formats may not be recognized.
- Code blocks are not yet supported in the layout model.
- Comic books, art albums, primary school textbooks, and exercises cannot be parsed well.
- Table recognition may result in row/column recognition errors in complex tables.
- OCR recognition may produce inaccurate characters in PDFs of lesser-known languages (e.g., diacritical marks in Latin script, easily confused characters in Arabic script).
- Some formulas may not render correctly in Markdown.
FAQ
- If you encounter any issues during usage, you can first check the FAQ for solutions.
- If your issue remains unresolved, you may also use DeepWiki to interact with an AI assistant, which can address most common problems.
- If you still cannot resolve the issue, you are welcome to join our community via Discord or WeChat to discuss with other users and developers.
All Thanks To Our Contributors
License Information
The source code in this repository is licensed under AGPLv3.
Acknowledgments
- UniMERNet
- TableStructureRec
- PaddleOCR
- PaddleOCR2Pytorch
- fast-langdetect
- pypdfium2
- pdftext
- pdfminer.six
- pypdf
- magika
- vLLM
- LMDeploy
Citation
@article{dong2026minerudiffusion,
  title={MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding},
  author={Dong, Hejun and Niu, Junbo and Wang, Bin and Zeng, Weijun and Zhang, Wentao and He, Conghui},
  journal={arXiv preprint arXiv:2603.22458},
  year={2026}
}

@article{niu2025mineru2,
  title={Mineru2.5: A decoupled vision-language model for efficient high-resolution document parsing},
  author={Niu, Junbo and Liu, Zheng and Gu, Zhuangcheng and Wang, Bin and Ouyang, Linke and Zhao, Zhiyuan and Chu, Tao and He, Tianyao and Wu, Fan and Zhang, Qintong and others},
  journal={arXiv preprint arXiv:2509.22186},
  year={2025}
}

@article{wang2024mineru,
  title={Mineru: An open-source solution for precise document content extraction},
  author={Wang, Bin and Xu, Chao and Zhao, Xiaomeng and Ouyang, Linke and Wu, Fan and Zhao, Zhiyuan and Xu, Rui and Liu, Kaiwen and Qu, Yuan and Shang, Fukai and others},
  journal={arXiv preprint arXiv:2409.18839},
  year={2024}
}

@article{he2024opendatalab,
  title={Opendatalab: Empowering general artificial intelligence with open datasets},
  author={He, Conghui and Li, Wei and Jin, Zhenjiang and Xu, Chao and Wang, Bin and Lin, Dahua},
  journal={arXiv preprint arXiv:2407.13773},
  year={2024}
}
Links
- MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding
- Easy Data Preparation with latest LLMs-based Operators and Pipelines
- Vis3 (OSS browser based on s3)
- LabelU (A Lightweight Multi-modal Data Annotation Tool)
- LabelLLM (An Open-source LLM Dialogue Annotation Platform)
- PDF-Extract-Kit (A Comprehensive Toolkit for High-Quality PDF Content Extraction)
- OmniDocBench (A Comprehensive Benchmark for Document Parsing and Evaluation)
- Magic-HTML (Mixed web page extraction tool)
- Magic-Doc (Fast speed ppt/pptx/doc/docx/pdf extraction tool)
- Dingo: A Comprehensive AI Data Quality Evaluation Tool