Mutually Assured Recursion
kylehovey.github.io
> The researchers were able to perform text classification using only text compression and a clustering algorithm (kNN).
Wasn't this result disproved, because the positive results were due to a flawed kNN implementation? I recall reading something like this, but I can't find the exact post or article...
Interesting, I'd love to see a link to that if you know of it. Here's the original paper: https://aclanthology.org/2023.findings-acl.426.pdf
In my own work I've successfully classified emergent behavior in cellular automata using a similar technique, and the technique has also been used elsewhere with success: https://www.nature.com/articles/s41598-022-12826-w
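For anyone unfamiliar with the technique under discussion, the general idea can be sketched roughly as below. This is a minimal illustration that pairs a normalized compression distance (NCD) computed with gzip with a plain majority-vote kNN. It is not the paper's code: the function names (`compressed_len`, `ncd`, `classify`) and the toy training data are made up for the example, and the voting rule is just one reasonable choice.

```python
import gzip
from collections import Counter


def compressed_len(text: str) -> int:
    """Length in bytes of the gzip-compressed UTF-8 encoding of text."""
    return len(gzip.compress(text.encode("utf-8")))


def ncd(a: str, b: str) -> float:
    """Normalized compression distance between two strings.

    Intuition: if a and b share structure, compressing them together
    costs little more than compressing the larger one alone.
    """
    ca, cb = compressed_len(a), compressed_len(b)
    cab = compressed_len(a + " " + b)
    return (cab - min(ca, cb)) / max(ca, cb)


def classify(query: str, train: list[tuple[str, str]], k: int = 1) -> str:
    """Predict a label for query by majority vote among its k nearest
    training texts under NCD.

    train is a list of (text, label) pairs. Neighbors are visited from
    nearest to farthest, and Counter.most_common keeps first-seen order
    among equal counts, so a tied vote goes to the label whose first
    vote came from the nearer neighbor.
    """
    ranked = sorted(train, key=lambda pair: ncd(query, pair[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data, only to show the call shape.
train = [
    ("the striker scored a late goal in the cup final", "sports"),
    ("the goalkeeper saved a penalty in the second half", "sports"),
    ("the central bank raised interest rates again", "finance"),
    ("stock markets fell after the inflation report", "finance"),
]
prediction = classify("the midfielder scored a goal in the first half", train, k=3)
print(prediction)
```

Note that how ties are broken and how a "correct" prediction is scored are exactly the kind of details that can change reported accuracy, so any evaluation built on a sketch like this should state them explicitly.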
This took me an unreasonable amount of time to find, but here it is:
https://kenschutte.com/gzip-knn-paper2/
The moral: the methodology is cool, but implementation details matter, I guess...
Thank you for this, I appreciate it! That's unfortunate to hear. I may have to swap out the example I used in this article, and maybe also include a note that this technique has limitations. I think that using compression/Kolmogorov complexity metrics for classification is a fruitful endeavor and that the philosophy behind efforts like the Hutter Prize is sound, but the kNN + gzip example looks like it has some problems.
For anyone else following along, I think the GitHub Issue discussion on the paper's repo is really interesting: https://github.com/bazingagin/npc_gzip/issues/3