md-pdf-md

Bidirectional Markdown ↔ PDF converter with AI-powered vision

Convert Markdown to beautiful PDFs AND extract Markdown from PDFs using local AI vision. Zero configuration, completely private, and open source.

✨ Features

Markdown → PDF

🎨 4 Beautiful Themes - GitHub, GitHub Dark, Academic, Minimal
💎 VS Code Syntax Highlighting - Powered by Shiki
📄 Smart Page Breaks - No orphaned headings or broken code blocks
📊 Auto Table of Contents - With page numbers
🚀 2-3 Second Generation - Fast and efficient
⚙️ Zero Configuration - Works out of the box

PDF → Markdown (NEW!)

🤖 AI-Powered Vision - Uses LLaVA to understand document structure
🔒 100% Private - Runs locally via Ollama (no cloud APIs)
📝 Structure Preservation - Maintains headings, lists, code blocks, tables
💰 Free Forever - No API costs, completely open source

🚀 Quick Start

# Install
npm install -g md-pdf-md

# Convert Markdown to PDF
md-pdf-md README.md

# Convert PDF to Markdown (requires Ollama + LLaVA)
md-pdf-md document.pdf

That's it! The tool auto-detects file type and converts appropriately.
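
To make the auto-detection concrete, here is a minimal sketch of how a conversion direction could be chosen from the file extension. The helper name `detectDirection` is hypothetical and this is not md-pdf-md's actual detection code; the return values simply reuse the names of the explicit `md2pdf` and `pdf2md` commands.

```javascript
// Hypothetical helper: picks a conversion direction from the file extension.
// Illustrative sketch only, not md-pdf-md's real implementation.
function detectDirection(filePath) {
  const dot = filePath.lastIndexOf('.');
  const ext = dot === -1 ? '' : filePath.slice(dot + 1).toLowerCase();
  if (ext === 'md' || ext === 'markdown') return 'md2pdf';
  if (ext === 'pdf') return 'pdf2md';
  throw new Error(`Unsupported file type: ${filePath}`);
}

console.log(detectDirection('README.md'));    // md2pdf
console.log(detectDirection('document.pdf')); // pdf2md
```

Anything that is neither Markdown nor PDF throws in this sketch, so a mistyped path fails early instead of starting a conversion.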

📦 Installation

Basic (MD→PDF only)

npm install -g md-pdf-md

Full Setup (MD↔PDF bidirectional)

# 1. Install the package
npm install -g md-pdf-md

# 2. Install Ollama (for PDF→MD)
# Visit: https://ollama.ai

# 3. Pull LLaVA model (~4.7GB)
ollama pull llava

# 4. Verify setup
md-pdf-md check

💡 Usage

Smart Auto-Detection

# Just pass any file!
md-pdf-md README.md        # → Converts to PDF
md-pdf-md document.pdf     # → Converts to Markdown
md-pdf-md slides.md --theme github-dark

With Options

# Markdown to PDF
md-pdf-md docs.md -o output.pdf --theme academic --format Letter

# PDF to Markdown
md-pdf-md report.pdf -o report.md --model llava --quality 300

Explicit Commands (for power users)

md-pdf-md md2pdf input.md       # Explicit MD→PDF
md-pdf-md pdf2md input.pdf      # Explicit PDF→MD
md-pdf-md themes                # List available themes
md-pdf-md check                 # Verify Ollama setup

🎨 Themes

Theme	Description	Best For
`github`	Clean light theme	General docs
`github-dark`	Dark with syntax highlighting	Code-heavy docs
`academic`	Formal serif fonts	Papers & reports
`minimal`	Simple & clean	Minimalist design

List available themes: md-pdf-md themes

🔧 Options

Markdown → PDF

-o, --output <path>          Output PDF path
-t, --theme <name>           Theme (default: github)
--toc / --no-toc             Table of contents (default: true)
--page-numbers               Page numbers (default: true)
-f, --format <format>        A4, Letter, or Legal (default: A4)
--css <path>                 Custom CSS file
--highlight-theme <theme>    Syntax highlight theme

PDF → Markdown

-o, --output <path>          Output markdown path
-m, --model <name>           Ollama model (default: llava)
--host <url>                 Ollama server URL
-q, --quality <dpi>          Image quality (default: 200)
--debug                      Debug mode

📝 Programmatic API

import { convertMarkdownToPdf, convertPdfToMarkdown } from 'md-pdf-md';

// Markdown → PDF
const result = await convertMarkdownToPdf({
  input: 'README.md',
  output: 'README.pdf',
  theme: 'github-dark',
  toc: true,
  pageNumbers: true
});

// PDF → Markdown (with progress)
const mdResult = await convertPdfToMarkdown({
  input: 'document.pdf',
  output: 'document.md',
  model: 'llava'
}, (progress) => {
  console.log(`Page ${progress.currentPage}/${progress.totalPages}`);
});
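
For longer PDFs a percentage is often easier to read than a raw page count. The helper below is a small sketch of a progress formatter. `formatProgress` is a hypothetical name, and it assumes only the two fields used in the example above, `currentPage` and `totalPages`.

```javascript
// Hypothetical helper: turns a progress event into a one-line status string.
// Assumes the event carries currentPage and totalPages, as in the example above.
function formatProgress(progress) {
  const { currentPage, totalPages } = progress;
  const percent = totalPages > 0 ? Math.round((currentPage / totalPages) * 100) : 0;
  return `Page ${currentPage}/${totalPages} (${percent}%)`;
}

console.log(formatProgress({ currentPage: 3, totalPages: 12 })); // Page 3/12 (25%)
```

It can be passed as the second argument to `convertPdfToMarkdown` in the form `(progress) => console.log(formatProgress(progress))`.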

🤖 How PDF→MD Works

Traditional PDF extractors dump the raw text layer without regard to layout. md-pdf-md instead uses the LLaVA vision model to:

Understand structure - Identifies H1, H2, H3 correctly
Preserve formatting - Maintains lists, code blocks, tables
Detect code - Recognizes programming languages
Keep hierarchy - Preserves document organization

All processing happens locally on your machine - no cloud APIs, no data leaving your computer.
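
As a rough illustration of the local round trip, the sketch below sends one rendered page image to Ollama's `/api/generate` endpoint and reads back the model's text. The endpoint, the `images` field, and the default port 11434 are part of Ollama's HTTP API. The function names and the prompt wording are made up for this example, and md-pdf-md's internal prompt and pipeline may well differ.

```javascript
// Builds the JSON body for Ollama's /api/generate endpoint.
// The prompt text is a made-up example, not the prompt md-pdf-md uses.
function buildPageRequest(pageImageBase64, model = 'llava') {
  return {
    model,
    prompt: 'Transcribe this page as Markdown. Keep headings, lists, tables, and code blocks.',
    images: [pageImageBase64], // Ollama expects base64-encoded image data
    stream: false,             // ask for one JSON response instead of a stream
  };
}

// Sends one page image to a local Ollama server and returns the generated text.
// Uses the built-in fetch, which requires Node.js 18 or newer.
async function pageToMarkdown(pageImageBase64, host = 'http://localhost:11434') {
  const res = await fetch(`${host}/api/generate`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildPageRequest(pageImageBase64)),
  });
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  const data = await res.json();
  return data.response; // the generated text lives in the "response" field
}
```

Because the request only ever targets `localhost`, the page image never leaves the machine, which is the privacy property described above.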

🆚 Comparison

Feature	md-pdf-md	pandoc	md-to-pdf	pdf2md
MD→PDF Beautiful	✅	⚠️ Complex	⚠️ Basic	❌
PDF→MD AI-powered	✅	❌	❌	⚠️ Poor
Zero config	✅	❌	❌	✅
100% Private	✅	✅	✅	✅
Free	✅	✅	✅	✅

💡 Use Cases

Developers: Beautiful README PDFs with syntax highlighting

md-pdf-md README.md --theme github-dark

Enterprises: Professional reports and documentation

md-pdf-md quarterly-report.md --theme academic --format Letter

Writers: Edit PDFs by converting to Markdown

md-pdf-md document.pdf    # Edit the .md, then convert back!
md-pdf-md document.md

Students: Format papers and extract notes from PDFs

md-pdf-md thesis.md --theme academic
md-pdf-md lecture-slides.pdf

🐛 Troubleshooting

"Ollama is not running"

ollama serve              # Start Ollama
ollama pull llava         # Install model
md-pdf-md check           # Verify

Poor PDF→MD results

md-pdf-md doc.pdf --quality 300              # Higher quality
md-pdf-md doc.pdf --model llama3.2-vision    # Different model
md-pdf-md doc.pdf --debug                     # Debug mode
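
To see why a higher `--quality` value can help (and why it costs more memory and time), it is useful to work out the pixel size of a rendered page: pixels = inches × DPI. The sketch below assumes the tool rasterizes each full page at the given DPI, which is an assumption about its internals. The page sizes are the standard paper dimensions and `renderedPixels` is a hypothetical helper.

```javascript
// Standard paper sizes in inches (width, height).
const PAGE_INCHES = {
  A4: [8.27, 11.69],
  Letter: [8.5, 11],
  Legal: [8.5, 14],
};

// Pixel dimensions of a page rasterized at the given DPI.
function renderedPixels(format, dpi) {
  const [widthIn, heightIn] = PAGE_INCHES[format];
  return { width: Math.round(widthIn * dpi), height: Math.round(heightIn * dpi) };
}

console.log(renderedPixels('Letter', 200)); // { width: 1700, height: 2200 }
console.log(renderedPixels('Letter', 300)); // { width: 2550, height: 3300 }
```

Going from 200 to 300 DPI multiplies the pixel count by 2.25, so small text becomes more legible to the model while each page image gets correspondingly larger.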

Memory issues

NODE_OPTIONS="--max-old-space-size=4096" md-pdf-md large.pdf

📊 Performance

MD→PDF: 2-3 seconds for typical documents
PDF→MD: ~5-10 seconds per page (CPU), ~2-5 seconds per page (GPU)
Accuracy: 90%+ structure preservation
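
The per-page figures above scale linearly with document length, so a quick estimate is just page count times the per-page range. The helper below is a sketch of that arithmetic. `estimatePdfToMdSeconds` is a hypothetical name, and the numbers are the approximate ranges quoted in this section, not measurements.

```javascript
// Approximate seconds per page, taken from the ranges quoted in this section.
const SECONDS_PER_PAGE = { cpu: [5, 10], gpu: [2, 5] };

// Returns a rough min/max duration in seconds for a PDF → Markdown run.
function estimatePdfToMdSeconds(pageCount, hardware = 'cpu') {
  const [min, max] = SECONDS_PER_PAGE[hardware];
  return { min: pageCount * min, max: pageCount * max };
}

console.log(estimatePdfToMdSeconds(40, 'cpu')); // { min: 200, max: 400 }
console.log(estimatePdfToMdSeconds(40, 'gpu')); // { min: 80, max: 200 }
```

By this estimate a 40-page PDF would take roughly 3 to 7 minutes on CPU, or 1.5 to 3.5 minutes on GPU.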

🛠️ Requirements

Node.js ≥ 16.0.0
Ollama (PDF→MD only) - ollama.ai
LLaVA model (PDF→MD only) - ollama pull llava

🤝 Contributing

Contributions welcome! Please feel free to submit a Pull Request.

📄 License

MIT License - see LICENSE file for details.

🙏 Built With

Puppeteer - PDF generation
Ollama - Local AI runtime
LLaVA - Vision language model
Shiki - Syntax highlighting
Marked - Markdown parsing

Made with ❤️ by josharsh

⭐ Star this repo if you find it useful!