Open-source tool helps you convert PDF documents, web pages, etc., into Markdown

github.com

61 points by Moon_Y a year ago · 4 comments

h-jones a year ago

Anyone know how this compares to GROBID [1]? I'm looking at alternatives to GROBID as I'm not super pleased with its outputs. GROBID has a lot of great features for journal papers (reference extraction / parsing), but I'm only interested in cleanly extracting the body. Also considering nougat [2] but I haven't tried it yet.

[1] https://github.com/kermitt2/grobid

[2] https://github.com/facebookresearch/nougat

  • xk_id a year ago

    Right, I'm in a similar situation. I'm trying to read journal papers in the terminal. I previously considered using pdf2htmlEX [0] to generate a layout-preserving HTML5 + CSS version of the PDF, then rendering it in the terminal with browsh [1] (unfortunately, terminal browsers like w3m don't support HTML5 + CSS). Compared with that, nougat and MinerU both seem like better options.

    [0] https://pdf2htmlex.github.io/pdf2htmlEX/

    [1] https://www.brow.sh/

oliverkwebb a year ago

Nice tool. I've been using html2md [1] and similar tools. It's written in Python and in beta, so it's probably not the best for processing static sites and such, but it's still useful.

[1]: https://github.com/suntong/html2md
