Ask HN: What's the best document parsing tool/SDK that you've heard of?
I'm looking to parse various document types (docx, ppt, pdf, pst, etc.) and extract metadata and text for search. I'm looking into Apache Tika, but my gut tells me a native Windows tool may be better long term. Can anyone recommend tools/SDKs they've used or heard good things about?

Tika is what we use. It's not perfect, but it works pretty well for our purposes.