Your explanation is excellent. One item: you use the word "distance" for the dot product. It is better to say the dot product is a measure of how much correlation there is between two words, i.e. how much information one word carries about another. The aim of the transformer is to "deform" the embeddings so that they reflect the actual correlations. Otherwise this was by far the best explanation.
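For anyone who wants to see that concretely, here is a rough numpy sketch; the words and vector values are made-up toy placeholders, not anything from the video:

```python
import numpy as np

# Toy, hand-picked 3-d "embeddings" (purely illustrative values).
cat   = np.array([0.9, 0.1, 0.0])
tiger = np.array([0.8, 0.2, 0.1])
piano = np.array([0.0, 0.1, 0.9])

# The dot product is larger when two vectors "agree" dimension by dimension,
# so it acts as an (unnormalized) correlation / similarity score.
print(np.dot(cat, tiger))  # 0.74 -> related words, high score
print(np.dot(cat, piano))  # 0.01 -> unrelated words, low score
```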
Hi brother! This is some next-level work you have pulled off. The part about Wq, Wk, and Wv is incredible; the motivation you set up before introducing them is outstanding!
Hands down the best explanation of what Q, K, and V are. Amazing work!
I have watched around 5 videos on Self-Attention, but this is the best video. Other videos just explain the formula and matrix calculations.
2:38 The issue wasn't just that RNNs used word embeddings, but that they used static, non-contextual word embeddings, unlike Transformers, which use contextual embeddings that adapt based on surrounding words.
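A tiny sketch of what "static" means here; the words and numbers are hypothetical toy values, not taken from the video:

```python
import numpy as np

# A static embedding table: one fixed vector per word, regardless of context.
embeddings = {
    "bank":  np.array([0.3, 0.7]),
    "river": np.array([0.9, 0.1]),
    "money": np.array([0.1, 0.9]),
}

sent1 = ["river", "bank"]
sent2 = ["money", "bank"]

vec1 = [embeddings[w] for w in sent1][1]   # vector of "bank" in "river bank"
vec2 = [embeddings[w] for w in sent2][1]   # vector of "bank" in "money bank"

# "bank" gets exactly the same vector in both sentences, even though it means
# different things; a Transformer's attention layers then adapt it to context.
print(np.array_equal(vec1, vec2))          # True: static, context is ignored
```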
This is the best intuition and explanation for query, key and values. Most other videos were highly sub-optimal in this regard.
Thank you so much, your explanation is very clear and understandable. Thank you once again.
One of the very best explanations I have ever seen in my life.
I watched lots of videos on this topic, but fully grasped it now. Please make more...
Wow!! This is the video where I finally got the intuition behind the Q, K, and V matrices. Thank you!
That was awesome, hardly anyone breaks it down this clearly. Thanks a lot!
Seriously, you have worked a lot on this. Hats off. Keep it up...
Amazing, you're a genius! I haven't watched such a clear and simple explanation before.
Commenting cuz I want you to know the significance and the impact you have made. I was looking through videos explaining self-attention, and before watching yours, the best understanding I had was from StatQuest's. I understood that the intuition behind multiplying Q and K is to calculate similarity, but I just couldn't grasp why the V matrix exists and what its purpose is. That idea has been made clear to me by this video of yours. I didn't watch your transformers series sequentially (pun intended xD) as you intended, but I have gone through plenty, will eventually finish all of it, and I have to say you have done amazing work. I first got to know your channel when I came across your video because I needed to understand what residual connections are doing in LLMs. I hope this comment means something to you and motivates you to keep doing the great work! :D
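If it helps anyone else who was stuck on the same point, here is a minimal numpy sketch of scaled dot-product attention with made-up dimensions and random stand-in weights (not the exact setup from the video), just to show where Q·K and V each come in:

```python
import numpy as np
rng = np.random.default_rng(0)

n, d = 3, 4                              # 3 tokens, toy embedding size 4
X = rng.normal(size=(n, d))              # input embeddings (random stand-ins)
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

Q, K, V = X @ Wq, X @ Wk, X @ Wv

# Q @ K.T measures how similar each query is to each key: "where to look".
scores = Q @ K.T / np.sqrt(d)
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)

# V holds the content that actually gets passed along: each output token is
# a weighted mix of the value vectors, using the similarity weights above.
output = weights @ V
print(weights.round(2))                  # each row of weights sums to 1
print(output.shape)                      # (3, 4): one contextual vector per token
```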
Finally. Understood attention clearly. Thank you
Clear-cut explanation!! Couldn't ask for more.
One of the best explanations of Self-Attention.
Brilliant Work Jay! Excellent explanation
Amazing video. I was stuck with this concept, but you made it very easy to understand.