Umar Jamil

Titans: Learning to Memorize at Test Time (56:15)
Paper: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (1:19:37)
Umar Jamil Live Stream
Flash Attention derived and coded from first principles with Triton (Python) (7:38:18)
Coding a Multimodal (Vision) Language Model from scratch in PyTorch with full explanation (5:46:05)
ML Interpretability: feature visualization, adversarial example, interp. for language models (1:00:15)
Kolmogorov-Arnold Networks: MLP vs KAN, Math, B-Splines, Universal Approximation Theorem (1:15:39)
Direct Preference Optimization (DPO) explained: Bradley-Terry model, log probabilities, math (48:46)
Reinforcement Learning from Human Feedback explained with math derivations and the PyTorch code (2:15:13)
Mamba and S4 Explained: Architecture, Parallel Scan, Kernel Fusion, Recurrent, Convolution, Math (1:14:29)
Mistral / Mixtral Explained: Sliding Window Attention, Sparse Mixture of Experts, Rolling Buffer (1:26:21)
Distributed Training with PyTorch: complete tutorial with cloud infrastructure and code (1:12:53)
Quantization explained with PyTorch - Post-Training Quantization, Quantization-Aware Training (50:55)
Retrieval Augmented Generation (RAG) Explained: Embedding, Sentence BERT, Vector Database (HNSW) (49:24)
BERT explained: Training, Inference, BERT vs GPT/LLaMA, Fine tuning, [CLS] token (54:52)
Coding Stable Diffusion from scratch in PyTorch (5:03:32)
Coding LLaMA 2 from scratch in PyTorch - KV Cache, Grouped Query Attention, Rotary PE, RMSNorm (3:04:11)
LLaMA explained: KV-Cache, Rotary Positional Embedding, RMS Norm, Grouped Query Attention, SwiGLU (1:10:55)
Segment Anything - Model explanation with code (42:53)
LoRA: Low-Rank Adaptation of Large Language Models - Explained visually + PyTorch code from scratch (26:55)
LongNet: Scaling Transformers to 1,000,000,000 tokens: Python Code + Explanation (29:58)
How diffusion models work - explanation and code! (21:12)
Variational Autoencoder - Model, ELBO, loss function and maths explained easily! (27:12)
Attention is all you need (Transformer) - Model explanation (including math), Inference and Training (58:04)
Coding a Transformer from scratch on PyTorch, with full explanation, training and inference (2:59:24)
CLIP - Paper explanation (training and inference) (14:01)
Wav2Lip (generate talking avatar videos) - Paper reading and explanation (6:58)