md-pdf-md
Bidirectional Markdown ↔ PDF converter with AI-powered vision
Convert Markdown to beautiful PDFs AND extract Markdown from PDFs using local AI vision. Zero configuration, completely private, and open source.
✨ Features
Markdown → PDF
- 🎨 4 Beautiful Themes - GitHub, GitHub Dark, Academic, Minimal
- 💎 VS Code Syntax Highlighting - Powered by Shiki
- 📄 Smart Page Breaks - No orphaned headings or broken code blocks
- 📊 Auto Table of Contents - With page numbers
- 🚀 2-3 Second Generation - Fast and efficient
- ⚙️ Zero Configuration - Works out of the box
PDF → Markdown (NEW!)
- 🤖 AI-Powered Vision - Uses LLaVA to understand document structure
- 🔒 100% Private - Runs locally via Ollama (no cloud APIs)
- 📝 Structure Preservation - Maintains headings, lists, code blocks, tables
- 💰 Free Forever - No API costs, completely open source
🚀 Quick Start
    # Install
    npm install -g md-pdf-md
    # Convert Markdown to PDF
    md-pdf-md README.md
    # Convert PDF to Markdown (requires Ollama + LLaVA)
    md-pdf-md document.pdf
That's it! The tool auto-detects file type and converts appropriately.
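As a rough illustration of what auto-detection could look like, the sketch below picks a conversion direction from the file extension. The `detectDirection` helper is hypothetical and extension-based detection is an assumption; the README does not say how md-pdf-md actually decides. The return values reuse the names of the explicit `md2pdf` and `pdf2md` commands.

```javascript
const path = require("node:path");

// Hypothetical helper: choose a conversion direction from the file extension.
// This is an assumed approach, not md-pdf-md's actual detection logic.
function detectDirection(filePath) {
  const ext = path.extname(filePath).toLowerCase();
  if (ext === ".md") return "md2pdf";
  if (ext === ".pdf") return "pdf2md";
  throw new Error(`Unsupported file type: ${ext || "(none)"}`);
}

console.log(detectDirection("README.md"));    // md2pdf
console.log(detectDirection("document.PDF")); // pdf2md
```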
📦 Installation
Basic (MD→PDF only)
    npm install -g md-pdf-md
Full Setup (MD↔PDF bidirectional)
    # 1. Install the package
    npm install -g md-pdf-md
    # 2. Install Ollama (for PDF→MD)
    # Visit: https://ollama.ai
    # 3. Pull LLaVA model (~4.7GB)
    ollama pull llava
    # 4. Verify setup
    md-pdf-md check
💡 Usage
Smart Auto-Detection
    # Just pass any file!
    md-pdf-md README.md                        # → Converts to PDF
    md-pdf-md document.pdf                     # → Converts to Markdown
    md-pdf-md slides.md --theme github-dark
With Options
    # Markdown to PDF
    md-pdf-md docs.md -o output.pdf --theme academic --format Letter
    # PDF to Markdown
    md-pdf-md report.pdf -o report.md --model llava --quality 300
Explicit Commands (for power users)
    md-pdf-md md2pdf input.md    # Explicit MD→PDF
    md-pdf-md pdf2md input.pdf   # Explicit PDF→MD
    md-pdf-md themes             # List available themes
    md-pdf-md check              # Verify Ollama setup
🎨 Themes
| Theme | Description | Best For |
|---|---|---|
| github | Clean light theme | General docs |
| github-dark | Dark with syntax highlighting | Code-heavy docs |
| academic | Formal serif fonts | Papers & reports |
| minimal | Simple & clean | Minimalist design |
Preview: md-pdf-md themes
🔧 Options
Markdown → PDF
    -o, --output <path>          Output PDF path
    -t, --theme <name>           Theme (default: github)
    --toc / --no-toc             Table of contents (default: true)
    --page-numbers               Page numbers (default: true)
    -f, --format <format>        A4, Letter, or Legal (default: A4)
    --css <path>                 Custom CSS file
    --highlight-theme <theme>    Syntax highlight theme
PDF → Markdown
    -o, --output <path>    Output markdown path
    -m, --model <name>     Ollama model (default: llava)
    --host <url>           Ollama server URL
    -q, --quality <dpi>    Image quality (default: 200)
    --debug                Debug mode
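When scripting the CLI from Node, it can help to build the argument list from a plain object. The sketch below is one possible way to do that. The flag names are the ones listed above; the `buildArgs` helper and its option keys are hypothetical and not part of md-pdf-md.

```javascript
// Hypothetical helper: turn an options object into md-pdf-md CLI arguments.
// Flag names come from the option lists above; the object keys are assumptions.
function buildArgs(input, options = {}) {
  const args = [input];
  if (options.output) args.push("--output", options.output);
  if (options.theme) args.push("--theme", options.theme);
  if (options.format) args.push("--format", options.format);
  if (options.toc === false) args.push("--no-toc");
  if (options.model) args.push("--model", options.model);
  if (options.quality) args.push("--quality", String(options.quality));
  return args;
}

const args = buildArgs("docs.md", { theme: "academic", format: "Letter" });
console.log(["md-pdf-md", ...args].join(" "));
// md-pdf-md docs.md --theme academic --format Letter
```

The resulting array can be passed to `execFile("md-pdf-md", args)` from `node:child_process`, which avoids shell quoting problems with paths that contain spaces.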
📝 Programmatic API
    // Top-level await requires an ES module (for example, a .mjs file)
    import { convertMarkdownToPdf, convertPdfToMarkdown } from 'md-pdf-md';
    // Markdown → PDF
    const pdfResult = await convertMarkdownToPdf({
      input: 'README.md',
      output: 'README.pdf',
      theme: 'github-dark',
      toc: true,
      pageNumbers: true
    });
    // PDF → Markdown (with progress)
    const markdownResult = await convertPdfToMarkdown(
      { input: 'document.pdf', output: 'document.md', model: 'llava' },
      (progress) => {
        console.log(`Page ${progress.currentPage}/${progress.totalPages}`);
      }
    );
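For long PDFs, a percentage can be easier to read than a raw page count. The sketch below formats the progress object seen in the example above. Only the `currentPage` and `totalPages` fields come from that example; the `formatProgress` helper and its output format are illustrative assumptions.

```javascript
// Hypothetical progress formatter. The currentPage and totalPages fields are
// taken from the API example; the percentage output is an illustrative addition.
function formatProgress({ currentPage, totalPages }) {
  const percent = totalPages > 0 ? Math.round((currentPage / totalPages) * 100) : 0;
  return `Page ${currentPage}/${totalPages} (${percent}%)`;
}

console.log(formatProgress({ currentPage: 3, totalPages: 12 })); // Page 3/12 (25%)
```

It could be used as the second argument to `convertPdfToMarkdown`, for example `(progress) => console.log(formatProgress(progress))`.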
🤖 How PDF→MD Works
Traditional PDF extractors just dump text blindly. md-pdf-md uses LLaVA vision AI to:
- Understand structure - Identifies H1, H2, H3 correctly
- Preserve formatting - Maintains lists, code blocks, tables
- Detect code - Recognizes programming languages
- Keep hierarchy - Preserves document organization
All processing happens locally on your machine - no cloud APIs, no data leaving your computer.
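To make the local-only flow concrete, here is a minimal sketch of sending one rendered page image to a local Ollama server through its `/api/generate` endpoint. The prompt text, the function names, and the injectable `fetchImpl` parameter are assumptions for illustration; md-pdf-md's internal prompts and request handling may differ. The default `fetch` requires Node 18 or later.

```javascript
// Sketch of one page-to-Markdown request against Ollama's /api/generate endpoint.
// The prompt text and function names are assumptions, not md-pdf-md internals.
function buildVisionRequest(model, pageImage) {
  return {
    model,
    prompt: "Transcribe this page as Markdown, preserving headings, lists, code blocks, and tables.",
    images: [pageImage.toString("base64")], // Ollama expects base64-encoded images
    stream: false, // ask for a single JSON response instead of a stream
  };
}

// pageImage is a Buffer holding one rendered page (for example, PNG bytes).
async function pageToMarkdown(pageImage, options = {}) {
  const { model = "llava", host = "http://localhost:11434", fetchImpl = fetch } = options;
  const res = await fetchImpl(`${host}/api/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildVisionRequest(model, pageImage)),
  });
  if (!res.ok) throw new Error(`Ollama request failed: ${res.status}`);
  const data = await res.json();
  return data.response; // the generated text for this page
}
```

Because the request goes to `localhost`, the page image never leaves the machine, which is the privacy property described above.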
🆚 Comparison
| Feature | md-pdf-md | pandoc | md-to-pdf | pdf2md |
|---|---|---|---|---|
| MD→PDF Beautiful | ✅ | ❌ | | |
| PDF→MD AI-powered | ✅ | ❌ | ❌ | |
| Zero config | ✅ | ❌ | ❌ | ✅ |
| 100% Private | ✅ | ✅ | ✅ | ✅ |
| Free | ✅ | ✅ | ✅ | ✅ |
💡 Use Cases
Developers: Beautiful README PDFs with syntax highlighting
    md-pdf-md README.md --theme github-dark
Enterprises: Professional reports and documentation
    md-pdf-md quarterly-report.md --theme academic --format Letter
Writers: Edit PDFs by converting to Markdown
    md-pdf-md document.pdf
    # Edit the .md, then convert back!
    md-pdf-md document.md
Students: Format papers and extract notes from PDFs
    md-pdf-md thesis.md --theme academic
    md-pdf-md lecture-slides.pdf
🐛 Troubleshooting
"Ollama is not running"
    ollama serve         # Start Ollama
    ollama pull llava    # Install model
    md-pdf-md check      # Verify
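If the check still fails, it can be useful to inspect what Ollama itself reports. Ollama lists installed models at `GET /api/tags` (by default on `http://localhost:11434`). The sketch below tests whether a model appears in that JSON. The `hasModel` helper is hypothetical, and whether `md-pdf-md check` works this way is an assumption.

```javascript
// Hypothetical check: does the JSON returned by Ollama's GET /api/tags list a model?
// Ollama reports names with a tag suffix (for example "llava:latest").
function hasModel(tagsResponse, modelName) {
  const models = tagsResponse.models ?? [];
  return models.some(({ name }) => name === modelName || name.startsWith(`${modelName}:`));
}

// Sample payload shaped like an /api/tags response (trimmed to the name field).
const sample = { models: [{ name: "llava:latest" }] };
console.log(hasModel(sample, "llava"));           // true
console.log(hasModel(sample, "llama3.2-vision")); // false
```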
Poor PDF→MD results
    md-pdf-md doc.pdf --quality 300             # Higher quality
    md-pdf-md doc.pdf --model llama3.2-vision   # Different model
    md-pdf-md doc.pdf --debug                   # Debug mode
Memory issues
    NODE_OPTIONS="--max-old-space-size=4096" md-pdf-md large.pdf
📊 Performance
- MD→PDF: 2-3 seconds for typical documents
- PDF→MD: ~5-10 seconds per page (CPU), ~2-5 seconds per page (GPU)
- Accuracy: 90%+ structure preservation
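The per-page figures translate into a rough time budget for a whole document. The sketch below multiplies a page count by the ranges quoted above. The `estimateSeconds` helper is hypothetical, and the results are back-of-the-envelope estimates rather than measurements.

```javascript
// Per-page PDF→MD timing ranges quoted above, in seconds (rough figures).
const SECONDS_PER_PAGE = { cpu: [5, 10], gpu: [2, 5] };

// Hypothetical helper: estimate a PDF→MD duration range for a page count.
function estimateSeconds(pages, hardware = "cpu") {
  const [low, high] = SECONDS_PER_PAGE[hardware];
  return [pages * low, pages * high];
}

console.log(estimateSeconds(40, "cpu")); // [ 200, 400 ]
console.log(estimateSeconds(40, "gpu")); // [ 80, 200 ]
```

So a 40-page PDF would take roughly 3 to 7 minutes on CPU and about 1.5 to 3.5 minutes on GPU under these figures.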
🛠️ Requirements
- Node.js ≥ 16.0.0
- Ollama (PDF→MD only) - ollama.ai
- LLaVA model (PDF→MD only) - ollama pull llava
🤝 Contributing
Contributions welcome! Please feel free to submit a Pull Request.
📄 License
MIT License - see LICENSE file for details.
🙏 Built With
- Puppeteer - PDF generation
- Ollama - Local AI runtime
- LLaVA - Vision language model
- Shiki - Syntax highlighting
- Marked - Markdown parsing
Made with ❤️ by josharsh
⭐ Star this repo if you find it useful!