
Keys, Queries, and Values: The celestial mechanics of attention

The attention mechanism is what makes Large Language Models like ChatGPT or DeepSeek talk well. But how does it work? One can see it as a mechanism that uses similarity to figure out which parts of the text to pay more or less attention to. For this, we use word embeddings.
I like to picture word embeddings as words flying around in the universe, like planets and stars. In this picture, the attention mechanism (the Keys, Queries, and Values matrices) defines the fabric of this universe and its laws of gravity, laws that resemble (yet in some ways are very different from) the laws of gravity that rule our own universe.
Come join me in this celestial adventure in the universe of language!
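For the curious, here is a minimal NumPy sketch of the similarity-based attention described above (scaled dot-product attention). The function and variable names, and the self-attention toy example at the end, are illustrative assumptions, not taken from the video.

import numpy as np

def attention(queries, keys, values):
    # Similarity of every query to every key, via dot products.
    scores = queries @ keys.T
    # Scale by sqrt(dimension) to keep the softmax well-behaved.
    scores = scores / np.sqrt(keys.shape[-1])
    # Softmax turns similarity scores into attention weights that sum to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output is a weighted average of the value vectors.
    return weights @ values

# Toy self-attention example: 3 words, each a 4-dimensional embedding.
embeddings = np.random.randn(3, 4)
print(attention(embeddings, embeddings, embeddings).shape)  # (3, 4)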

See other videos in this LLM series
The attention mechanism in LLMs: • The Attention Mechanism in Large Language ...
The math behind attention mechanisms: • The math behind Attention: Keys, Queries, ...
Transformer models: • What are Transformer Models and how do the...

Get the Grokking Machine Learning book!
https://manning.com/books/grokking-ma...
Discount code (40%): serranoyt
(Use the discount code at checkout)

01:55 Similarity
02:12 Embeddings
04:56 Attention
07:14 Dot product
09:29 Cosine similarity
11:10 The Keys and Queries matrices
14:19 Compressing and stretching dimensions
18:50 Combining dimensions
23:14 Asymmetric pull
40:57 Multi-head attention
45:14 The Value matrix
49:24 Summary
