Ask HN: Is there no good OCR available?

2 points by leokster 2 years ago · 3 comments · 1 min read

I'm wondering whether tools like Tesseract are still the open-source (and offline) gold standard. All the large cloud providers now offer document intelligence services, but there still doesn't seem to be a usable AI model that does good OCR on arbitrary images (not necessarily scans) to text. Do you know of any active projects or resources in this field?
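For reference, the offline Tesseract baseline in question can be sketched roughly as below. This assumes the `pytesseract` wrapper and Pillow are installed and the `tesseract` binary is on the PATH; `normalize_ocr_text` and `ocr_image` are hypothetical helper names for illustration, not part of any library.

```python
import re


def normalize_ocr_text(raw: str) -> str:
    """Collapse runs of spaces/tabs and drop blank lines from raw OCR output."""
    lines = (re.sub(r"[ \t]+", " ", line).strip() for line in raw.splitlines())
    return "\n".join(line for line in lines if line)


def ocr_image(path: str, lang: str = "eng") -> str:
    """Run Tesseract on one image file and return cleaned-up text."""
    # Imported lazily so the helper above works without the OCR dependencies.
    # Assumption: pytesseract and Pillow are installed, tesseract is on PATH.
    from PIL import Image
    import pytesseract

    with Image.open(path) as img:
        return normalize_ocr_text(pytesseract.image_to_string(img, lang=lang))
```

Calling `ocr_image("photo.png")` on a local file (the file name is only an example) would return whatever text Tesseract finds. On photos, as opposed to clean scans, results tend to depend heavily on preprocessing.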

latexr 2 years ago

Apple’s operating systems have been doing stellar OCR since 2019. When the feature was announced I was uninterested, but now I’m surprised how much I use it. It works out of the box in Preview, Safari, and other apps. You can call it programmatically via Shortcuts or the Vision APIs.

https://developer.apple.com/documentation/vision/recognizing...

solardev 2 years ago

(Edit: Never mind, sorry. I misread your question. I think you're mainly interested in free offline apps.)

Does it have to be an "AI" model in the modern sense of the term (LLMs, etc.)?

In the past, I found Google's Cloud Vision API to be pretty good for this sort of thing (text in images): https://cloud.google.com/vision?hl=en#demo

AFAIK Tesseract was never state of the art; it was just free and cheap. The commercial offerings (in my limited experience) were usually much more accurate.
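One way to put a number on "more accurate" when comparing engines is character error rate (CER): the edit distance between the OCR output and a hand-typed reference, divided by the reference length. The following is a minimal sketch using only the standard library; the function names are my own, and a real comparison would need a labeled set of test images.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by reference length; lower is better."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Running each engine over the same images and averaging `character_error_rate(ground_truth, engine_output)` gives a rough like-for-like score. Note that CER can exceed 1.0 when the output is much longer than the reference.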

verdverm 2 years ago

Seconding Google's offering, which can read my chicken scratch reasonably well.
