Tesseract's WASM file too big to process


Describe the problem and steps to reproduce it:

Hello, I'm writing the what-to-click extension. I've added the tesseract.js library locally for OCR functionality, which works fine, but it causes the Firefox Add-ons linter to fail.

What happened?

Tesseract uses WebAssembly to speed up image analysis. This comes with a file size overhead so large that I can no longer upload my extension to the Developer Hub:

[Two screenshots, not preserved here]

The linter's suggestion is valid, and I would very much like to split the file into smaller ones (the enormous file size comes from a blob included in it). However, I don't see a way to do this because of how the file is handled: it is loaded automatically by tesseract, not by the extension code, so import/export directives don't work and I also don't have the browser object available. The blob also has to stay in the file because, due to this issue, importScripts is not available.

What did you expect to happen?

I expected to be able to submit the next version of my extension as a Firefox addon.

Anything else we should know?

The simplest solution would be to raise the per-file size limit to 5 MB, as the problematic file is 4.8 MB, and such an increase shouldn't overload the linter servers. However, if you see any way to reduce the file size, I'm certainly open to it.
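
For reference, a quick way to see which bundled files would exceed a given per-file limit before uploading is to walk the extension directory and compare sizes. This is only a minimal sketch in Python, not the linter's own logic: the helper name `find_oversized_files`, the directory path, and the choice of 1 MB = 1024 * 1024 bytes are all assumptions made for illustration.

```python
import os
from typing import List, Tuple

# Assumption: megabytes are counted as 1024 * 1024 bytes; the linter may count differently.
MB = 1024 * 1024


def find_oversized_files(root: str, limit_bytes: int) -> List[Tuple[str, int]]:
    """Return (path relative to root, size in bytes) for each file larger than limit_bytes."""
    oversized = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            # Only files strictly above the limit are reported.
            if size > limit_bytes:
                oversized.append((os.path.relpath(path, root), size))
    # Largest files first, so the worst offender is at the top.
    return sorted(oversized, key=lambda item: item[1], reverse=True)
```

Calling `find_oversized_files("path/to/extension", 5 * MB)` (the path is hypothetical) would return an empty list if the 4.8 MB file is the largest one in the package, which is why the proposed 5 MB limit would be enough in my case.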
