In this video, we dive into the mechanics of a GPU and learn how it performs matrix multiplication, the core computation powering deep neural networks and large language models. By the end of the video you'll understand an efficient formulation of matrix multiplication and how matrix multiplication is computed with tiling and kernel fusion.
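As a rough, illustrative sketch of the naive formulation covered at 03:24 (not the code from the video), a CUDA kernel in which each thread computes one element of C = A x B for square N x N row-major matrices could look like this:

__global__ void matmul_naive(const float* A, const float* B, float* C, int N) {
    // One thread per output element C[row][col]; matrices are row-major.
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < N && col < N) {
        float acc = 0.0f;
        for (int k = 0; k < N; ++k) {
            // Every thread re-reads a full row of A and column of B from global
            // memory, which is the memory-thrashing problem discussed at 05:34.
            acc += A[row * N + k] * B[k * N + col];
        }
        C[row * N + col] = acc;
    }
}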
GEMM basics: docs.nvidia.com/deeplearning/performance/dl-perfor…
CUDA linear algebra: developer.nvidia.com/blog/cutlass-linear-algebra-c…
A100 specifications: developer.nvidia.com/blog/nvidia-ampere-architectu…
00:00 - Introduction
02:40 - GEMM basics
03:24 - Naive implementation of matmul
04:19 - GPU memory hierarchy
05:34 - Memory thrashing of GPUs
06:00 - Memory efficient implementation of matmul
06:33 - Matmul with tiling (illustrative sketch after the chapter list)
08:17 - GPU execution hierarchy
09:25 - Magic of powers of 2
10:15 - Tile quantization
11:14 - Kernel fusion
12:24 - Conclusion
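For reference, here is a minimal sketch of the tiling idea from 06:33 (an assumed illustration, not the kernel shown in the video; it takes square N x N row-major matrices with N a multiple of TILE). Each thread block stages a TILE x TILE tile of A and of B in shared memory, so every element fetched from global memory is reused TILE times:

#define TILE 16

__global__ void matmul_tiled(const float* A, const float* B, float* C, int N) {
    // Shared-memory staging buffers for one tile of A and one tile of B.
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    // Slide the pair of tiles along the shared K dimension.
    for (int t = 0; t < N / TILE; ++t) {
        // Cooperative load: each thread brings in one element of each tile.
        As[threadIdx.y][threadIdx.x] = A[row * N + t * TILE + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * N + col];
        __syncthreads();

        // Inner product over the tile, served from fast shared memory.
        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();
    }
    C[row * N + col] = acc;
}

A 16 x 16 thread block per output tile keeps block dimensions at powers of 2, which ties into the tile-quantization discussion at 10:15; applying a bias or activation inside the same kernel instead of launching a second one is the kernel-fusion idea from 11:14.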