Feature Extraction with KNN
davpinto.github.io

This reminds me of the talk by Leland McInnes (employed by a branch of the Canadian NSA and creator of UMAP) on rethinking dimension reduction:
Rethinking Unsupervised Learning https://youtu.be/v_jcu7rUqqE
He boiled all dimensionality reduction down to two approaches: matrix decomposition and k-nearest neighbors.
I'm seeing the newest contenders for top algorithms leaning harder into the KNN dynamics and less into the more complex maths of UMAP (PaCMAP[1] and LocalMAP[2], which run an attractive-repulsive force simulation over a high-dimensional KNN graph).
Based on the end of his talk, Leland seems to generally agree that this is where the next innovations are: KNNs.
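If you want to poke at that KNN-graph-plus-force-simulation idea directly, the pacmap package on PyPI exposes it pretty plainly. A minimal sketch (the parameter values and toy data here are my own illustration, not anything from the post):

    import numpy as np
    import pacmap  # pip install pacmap

    # Toy high-dimensional data standing in for a real dataset.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 50)).astype(np.float32)

    # PaCMAP builds a KNN graph in the original space, then runs an
    # attractive-repulsive optimization over near, mid-near, and
    # further pairs to produce the low-dimensional layout.
    reducer = pacmap.PaCMAP(n_components=2, n_neighbors=10)
    X_2d = reducer.fit_transform(X, init="pca")
    print(X_2d.shape)  # (1000, 2)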
Some interesting provocations are coming out:
Assessing and improving reliability of neighbor embedding methods: a map-continuity perspective (Apr 2025) https://arxiv.org/html/2410.16608v2
[1]: https://youtu.be/sD-uDZ8zXkc
[2]: https://youtu.be/vmSUQ-4LT4Y
EDIT: on reviewing it further, that last paper is not as relevant as I recalled. So the point is mainly the take-away from Leland's expertise in his video.
I'm not sure if I'm understanding correctly, but it reminds me of the kernel trick. The distances between the training samples and a target sample are computed, the distances are scaled through a kernel function, and the scaled distances are used as features.
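Roughly what I have in mind, as a sketch (the Gaussian kernel and all the names/values below are my own illustration, not necessarily what the post actually does):

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(200, 5))   # training samples
    X_new = rng.normal(size=(10, 5))      # target samples to featurize

    k, gamma = 8, 0.5  # number of neighbors and kernel bandwidth, both illustrative

    # 1. Distances from each target sample to its k nearest training samples.
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    distances, _ = nn.kneighbors(X_new)        # shape (10, k)

    # 2. Scale the distances through a kernel function (Gaussian/RBF here).
    kernel_feats = np.exp(-gamma * distances**2)

    # 3. Use the scaled distances as extra features for a downstream model.
    X_new_aug = np.hstack([X_new, kernel_feats])  # shape (10, 5 + k)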
Fast? The author should remind us of the computational complexity of calculating the kNN...
I agree, they should have said it, and I had to go looking too. But it may be that they're just leaving this to others and talking to people who already think about KNNs?
http://ann-benchmarks.com/index.html
But yeah, my own research (not just that link) says HNSWs get KNN complexity down to roughly O(log n) per search and O(n log n) for building the whole index.
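For anyone who hasn't used it, here's a minimal hnswlib sketch of that build-once / query-cheaply pattern (the parameters are illustrative defaults, not tuned):

    import numpy as np
    import hnswlib  # pip install hnswlib

    dim, n = 128, 100_000
    data = np.random.rand(n, dim).astype(np.float32)

    # Build the HNSW index once: roughly O(n log n) work up front.
    index = hnswlib.Index(space="l2", dim=dim)
    index.init_index(max_elements=n, ef_construction=200, M=16)
    index.add_items(data)

    # Each query is then roughly O(log n) for approximate k nearest neighbors.
    index.set_ef(50)  # search-time accuracy/speed trade-off
    labels, distances = index.knn_query(data[:5], k=10)
    print(labels.shape)  # (5, 10)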