Chardetng: A More Compact Character Encoding Detector for the Legacy Web
hsivonen.fi
About a year ago I had a webpage that was interpreted in the wrong encoding, and I was taken aback to find that Chrome no longer lets you override a page's encoding.
I think it’s interesting how far we have come with UTF-8 adoption: that was the first time I had reached for that menu in probably nearly a decade.
Fantastic write up.
I regularly use the Python port of the original chardet (https://pypi.org/project/chardet/). In fact, most Python devs do, since it comes with requests.
This post is full of gems. E.g., I learned that it's important for your meta charset to be in the first 1024 bytes of your HTML :)
FWIW, Firefox issues a warning if it finds your charset declaration late, outside the first 1024 bytes. Long copyright or license headers can cause this problem, annoyingly.
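For anyone who wants to check their own pages, here is a rough Python sketch of that 1024-byte rule. It is only an approximation: the regex is a simplification of what browsers actually do when scanning for a declaration (for example, it would be fooled by a `<meta>` tag inside a comment), and the names `find_meta_charset`, `declared_within_prescan`, and `PRESCAN_LIMIT` are made up for illustration.

```python
import re

# Simplified pattern (an assumption for illustration): matches both
# <meta charset="..."> and <meta http-equiv=... content="...; charset=...">.
# Real browsers use a more careful byte-level scan than this regex.
META_CHARSET_RE = re.compile(
    rb"<meta[^>]*charset\s*=\s*[\"']?\s*([A-Za-z0-9_.:-]+)",
    re.IGNORECASE,
)

# The limit mentioned in the thread: the first 1024 bytes of the document.
PRESCAN_LIMIT = 1024


def find_meta_charset(html: bytes):
    """Return (label, end_offset) for the first meta charset found, else None."""
    match = META_CHARSET_RE.search(html)
    if match is None:
        return None
    label = match.group(1).decode("ascii").lower()
    return label, match.end()


def declared_within_prescan(html: bytes, limit: int = PRESCAN_LIMIT) -> bool:
    """True if a charset declaration ends within the first `limit` bytes."""
    found = find_meta_charset(html)
    return found is not None and found[1] <= limit


if __name__ == "__main__":
    early = b'<!DOCTYPE html><html><head><meta charset="utf-8"><title>x</title>'
    # A long license comment pushes the declaration past the limit.
    late = b"<!--" + b"x" * 2000 + b"-->" + b'<meta charset="utf-8">'
    print(declared_within_prescan(early))
    print(declared_within_prescan(late))
```

Running this against the raw bytes of a page (not the decoded text) tells you whether a long header comment has pushed the declaration too far down.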
This is super cool and interesting. Great write-up, thanks.