Show HN: Side-by-side PDF parser comparison for RAG pipelines

github.com

2 points by 2dogsanerd 17 days ago · 1 comment · 1 min read

A simple tool to compare how different PDF parsers handle your documents.

Shows naive parsing (pypdf) vs layout-aware parsing (Docling) side-by-side.

Helps spot issues with scans, tables, and multi-column layouts before they cause problems in your RAG system.

Parsers are easy to swap if you want to try alternatives.
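
For anyone wondering what "easy to swap" could look like, here is a minimal sketch, not the tool's actual code. It assumes each parser is just a function from a file path to text. The names `PARSERS`, `compare`, and `side_by_side` are hypothetical. The pypdf and Docling calls are those libraries' public entry points, imported lazily so the rest runs without them installed.

```python
from itertools import zip_longest
from typing import Callable, Dict

# Hypothetical convention: a parser takes a PDF path and returns plain text.
Parser = Callable[[str], str]


def parse_with_pypdf(path: str) -> str:
    """Naive extraction: concatenate the text of each page."""
    from pypdf import PdfReader  # lazy import; requires `pip install pypdf`

    reader = PdfReader(path)
    return "\n".join((page.extract_text() or "") for page in reader.pages)


def parse_with_docling(path: str) -> str:
    """Layout-aware extraction, exported as Markdown."""
    from docling.document_converter import DocumentConverter  # lazy import

    result = DocumentConverter().convert(path)
    return result.document.export_to_markdown()


# Swapping a parser means changing one entry in this (hypothetical) registry.
PARSERS: Dict[str, Parser] = {
    "pypdf": parse_with_pypdf,
    "docling": parse_with_docling,
}


def compare(path: str, parsers: Dict[str, Parser]) -> Dict[str, str]:
    """Run every registered parser on the same file."""
    return {name: parse(path) for name, parse in parsers.items()}


def side_by_side(left: str, right: str, width: int = 40) -> str:
    """Render two texts in two columns, truncating lines to `width` chars."""
    rows = zip_longest(left.splitlines(), right.splitlines(), fillvalue="")
    return "\n".join(f"{l[:width]:<{width}} | {r[:width]}".rstrip() for l, r in rows)
```

Usage would be along the lines of `out = compare("report.pdf", PARSERS)` followed by `print(side_by_side(out["pypdf"], out["docling"]))`, where `report.pdf` is a placeholder for your own document.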

final_aeon 17 days ago

Poppler is also good
