Ask HN: Is there no good OCR available?
I'm wondering if tools like Tesseract are still the open-source (and offline) gold standard. All the large cloud providers now offer document intelligence services, but there still isn't really a usable AI model capable of good OCR (images, not necessarily scans, to text). Do you know of any active projects or resources in that field?

Apple’s operating systems have been doing stellar OCR since 2019. When the feature was announced I was uninterested, but now I’m surprised how much I use it. It works without any extra setup in Preview, Safari, and other apps. You can call it programmatically via Shortcuts or the Vision APIs. https://developer.apple.com/documentation/vision/recognizing... (Edit: Never mind, sorry. I misread your question. I think you're mainly interested in free offline apps.)

Does it have to be an "AI" model in the modern sense of the term (LLMs, etc.)? In the past, I found Google's Cloud Vision API to be pretty good for this sort of thing (text in images): https://cloud.google.com/vision?hl=en#demo

AFAIK Tesseract was never state of the art; it was just free and cheap. The commercial offerings (in my limited experience) were usually much more accurate.

Seconding Google's offering, which can reasonably read my chicken scratch.
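
For the offline, open-source route the question asks about, here is a minimal sketch of calling the Tesseract command-line tool from Python. It assumes the `tesseract` binary (version 4 or later, for the `--psm` flag) is installed and on PATH. The function names `build_tesseract_command` and `ocr_image` are made up for illustration and are not part of any library.

```python
import shutil
import subprocess
from typing import List, Optional


def build_tesseract_command(image_path: str, lang: str = "eng", psm: int = 3) -> List[str]:
    """Build the argument list for a Tesseract CLI call that prints text to stdout.

    Hypothetical helper for illustration only.
    """
    # Passing "stdout" as the output base makes tesseract print the text
    # instead of writing it to a .txt file.
    # --psm selects the page segmentation mode (3 = fully automatic, the default).
    return ["tesseract", image_path, "stdout", "-l", lang, "--psm", str(psm)]


def ocr_image(image_path: str, lang: str = "eng", psm: int = 3) -> Optional[str]:
    """Return the recognized text, or None if tesseract is not installed.

    Assumes the tesseract binary is on PATH; raises CalledProcessError
    if tesseract itself fails (for example, on an unreadable image).
    """
    if shutil.which("tesseract") is None:
        return None
    result = subprocess.run(
        build_tesseract_command(image_path, lang, psm),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Something like `ocr_image("receipt.png")` would return the recognized text, or `None` when Tesseract is not installed. For cropped snippets, page segmentation mode 6 (a single uniform block of text) may be worth trying instead of the default.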