@sriramr3045

Your explanation on query, key and value is bang on! Very well done.

@arunsrinivasa8643

Your explanation is excellent. One item: you use the word "distance" for the dot product. It is better to say "the dot product is a measure of how much correlation there is between two words," i.e. how much information one word has about another. The aim of the transformer is to "deform" the embeddings so that they reflect the actual correlations. Otherwise, this was by far the best explanation.
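(A minimal NumPy sketch of the point above, using made-up 4-dimensional toy embeddings rather than anything from the video: the dot product is large when two word vectors point in similar directions, i.e. when one word carries information about the other.)

```python
import numpy as np

# Toy 4-dimensional embeddings, invented for illustration only.
cat   = np.array([0.9, 0.1, 0.8, 0.0])
tiger = np.array([0.8, 0.2, 0.7, 0.1])
piano = np.array([0.0, 0.9, 0.1, 0.8])

# The dot product grows when two vectors point in similar directions,
# i.e. when the words are correlated / informative about each other.
print(np.dot(cat, tiger))  # relatively large -> related words
print(np.dot(cat, piano))  # near zero        -> unrelated words
```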

@akshatpandey8571

Hi bhai! This is some next-level work that you have pulled off. The part on Wq, Wk, Wv is incredible; the motivation you set up before introducing it is outstanding!

@patrickb277

Hands down the best explanation of what Q, K and V are. Amazing work!

@PrashanthVangari-p6h

I have watched around 5 videos on self-attention, but this is the best one. Other videos just explain the formula and matrix calculations.

@unstbl

2:38 The issue wasn't just that RNNs used word embeddings, but that they used static/non-contextual word embeddings, unlike Transformers, which use contextual embeddings that adapt based on surrounding words.
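(A minimal sketch of that distinction, with a toy vocabulary and a single unlearned attention pass invented for illustration: a static lookup returns the same vector for "bank" in both sentences, while a contextual layer mixes in the neighbouring words, so the two outputs differ.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy static embedding table: one fixed vector per word, regardless of context.
vocab = {w: rng.normal(size=4) for w in ["river", "bank", "money"]}

def static_embed(sentence):
    return np.stack([vocab[w] for w in sentence])

def contextual_embed(sentence):
    # One self-attention pass (no learned projections, just the idea):
    # each word's output becomes a context-dependent mixture of all words.
    x = static_embed(sentence)
    scores = x @ x.T / np.sqrt(x.shape[1])
    weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    return weights @ x

s1 = ["river", "bank"]
s2 = ["money", "bank"]

# Static: identical vector for "bank" in both sentences.
print(np.allclose(static_embed(s1)[1], static_embed(s2)[1]))        # True
# Contextual: "bank" now differs depending on its neighbours.
print(np.allclose(contextual_embed(s1)[1], contextual_embed(s2)[1]))  # False
```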

@ravi8908

This is the best intuition and explanation for query, key and values. Most other videos were highly sub-optimal in this regard.

@vigneshwars7483

Thank you so much, your explanation is very clear and understandable. Thank you once again!

@RobertDonaldDuck

One of the very best explanations I have ever seen in my life.

@kheersagarpatel3195

I watched lots of videos on this topic, but only now have I fully grasped it. Please make more...

@sumeet_skr761

Wow!! This is the video where I finally got the intuition behind the Q, K and V matrices. Thank you!

@sourabhhegde

That was awesome, hardly anyone breaks it down this clearly. Thanks a lot!

@kumarjitgupta4160

Seriously, you have worked a lot on this. Hats off. Keep it up...

@Tofipie

Amazing, you're a genius! I haven't watched such a clear and simple explanation before.

@tsuyaosone1535

Commenting because I want you to know the significance and the impact you have made. I was looking through videos explaining self-attention, and the best understanding I had before watching yours came from StatQuest's. I understood that the intuition behind multiplying Q and K is to calculate similarity, but I just couldn't grasp why the V matrix exists and what its purpose is. That idea has been made clear to me by this video of yours. I didn't watch your transformer series sequentially (pun intended xD) as you intended, but I have gone through plenty of it and will eventually finish all of it, and I have to say you have done amazing work. I first got to know your channel and came across your video because I needed to understand what residual connections are doing in LLMs. I hope this comment means something to you and motivates you to keep doing the great work! :D
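(A minimal NumPy sketch of that intuition, with random toy matrices standing in for learned weights: Q·Kᵀ only produces the similarity scores, and it is V that supplies the actual content that gets mixed together according to those scores.)

```python
import numpy as np

rng = np.random.default_rng(1)

# 3 tokens with 4-dimensional embeddings (toy numbers, not from the video).
x = rng.normal(size=(3, 4))
Wq, Wk, Wv = (rng.normal(size=(4, 4)) for _ in range(3))

Q, K, V = x @ Wq, x @ Wk, x @ Wv

# Q.K^T gives similarity scores only; softmax turns them into mixing weights.
scores = Q @ K.T / np.sqrt(K.shape[1])
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)

# V is the content that actually gets blended: each output row is a
# weighted average of the value vectors, weighted by those similarities.
output = weights @ V
print(output.shape)  # (3, 4)
```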

@purposeoriented6094

Finally. Understood attention clearly. Thank you

@Johad-z5j

Clear-cut explanation!! Couldn't ask for more.

@Jeevan_prakash

One of the best explanations of self-attention.

@rufaidakashif8260

Brilliant Work Jay! Excellent explanation

@madhuryaramesh4210

Amazing video. I was stuck with this concept, but you made it very easy to understand.