Ask HN: Why are query, key, and value in LLMs so hard to explain?
I have watched so many YouTube videos on this and no one seems to be able to explain it properly.
The explanations also differ dramatically from one another.
I feel like it's another infamously difficult-to-explain topic, like "monads".
I am desperately waiting for a 3Blue1Brown video on transformers to hopefully resolve this ambiguity.
I am looking for a visual intuition, and something that tries to answer common questions and ambiguities that arise, and explains the history and why we do things this way.
The best approach I have found so far is Serrano.Academy: https://www.youtube.com/watch?v=UPtG_38Oq8o&pp=ygUUdHJhbnNmb3JtZXIgbmV0d29ya3M%3D. They try to visualize things in two dimensions with examples and show the linear transformations.
Karpathy had a unique way of conceptualizing it as a directed graph with a "communication phase", which confused me further.
For such a historic topic, I think we need a better explanation!
Try HeduAI on YouTube.
Already came across her. She was very good.