Ask HN: What's the best document parsing tool/SDK that you've heard of?

1 point by voiceclonr 7 years ago · 1 comment · 1 min read


I am looking to parse various documents (docx, ppt, pdf, pst, etc.) and extract metadata, text, etc. for search. I'm looking into Apache Tika, but my gut tells me a native Windows tool may be better long term. Can anyone recommend tools/SDKs they've used or heard are successful?

mindcrime 7 years ago

Tika is what we use. It's not perfect, but it works pretty well for our purposes.
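For a rough sense of the work a parser like Tika does for each format, consider .docx. It is a ZIP package of XML parts, with body text in runs (<w:t>) inside paragraphs (<w:p>) and core metadata stored as Dublin Core elements. The minimal Python standard-library sketch below illustrates this for one format only. It is not Tika, not a replacement for it, and handles none of the other formats mentioned. It assumes the conventional part names word/document.xml and docProps/core.xml instead of resolving them through _rels/.rels. The names extract_docx and make_sample_docx are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used inside a .docx (Office Open XML) package.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
DC_NS = "http://purl.org/dc/elements/1.1/"
CP_NS = "http://schemas.openxmlformats.org/package/2006/metadata/core-properties"


def extract_docx(data: bytes) -> dict:
    """Return plain text and a few core metadata fields from .docx bytes.

    Hypothetical helper, minimal sketch: assumes the main part is at
    word/document.xml and ignores headers, footnotes, comments and
    embedded objects.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        body = ET.fromstring(zf.read("word/document.xml"))
        paragraphs = []
        for p in body.iter(f"{{{W_NS}}}p"):
            # Join the text of every run (<w:t>) inside this paragraph.
            paragraphs.append("".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t")))

        metadata = {}
        if "docProps/core.xml" in zf.namelist():
            core = ET.fromstring(zf.read("docProps/core.xml"))
            # Dublin Core fields are direct children of <cp:coreProperties>.
            for field in ("title", "creator"):
                el = core.find(f"{{{DC_NS}}}{field}")
                if el is not None and el.text:
                    metadata[field] = el.text
    return {"text": "\n".join(paragraphs), "metadata": metadata}


def make_sample_docx() -> bytes:
    """Build a tiny in-memory .docx-like package for demonstration (hypothetical)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(
            "word/document.xml",
            f'<w:document xmlns:w="{W_NS}"><w:body>'
            "<w:p><w:r><w:t>Hello</w:t></w:r><w:r><w:t> world</w:t></w:r></w:p>"
            "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
            "</w:body></w:document>",
        )
        zf.writestr(
            "docProps/core.xml",
            f'<cp:coreProperties xmlns:cp="{CP_NS}" xmlns:dc="{DC_NS}">'
            "<dc:title>Demo</dc:title></cp:coreProperties>",
        )
    return buf.getvalue()


if __name__ == "__main__":
    print(extract_docx(make_sample_docx()))
```

Multiply this by every format and its quirks (pdf, ppt, pst, ...) and the appeal of a single detector-plus-parser toolkit like Tika becomes clear.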
