Umar Jamil

Titans: Learning to Memorize at Test Time (56:15)
Paper: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (1:19:37)
Umar Jamil Live Stream
Flash Attention derived and coded from first principles with Triton (Python) (7:38:18)
Coding a Multimodal (Vision) Language Model from scratch in PyTorch with full explanation (5:46:05)
ML Interpretability: feature visualization, adversarial example, interp. for language models (1:00:15)
Kolmogorov-Arnold Networks: MLP vs KAN, Math, B-Splines, Universal Approximation Theorem (1:15:39)
Direct Preference Optimization (DPO) explained: Bradley-Terry model, log probabilities, math (48:46)
Reinforcement Learning from Human Feedback explained with math derivations and the PyTorch code. (2:15:13)
Mamba and S4 Explained: Architecture, Parallel Scan, Kernel Fusion, Recurrent, Convolution, Math (1:14:29)
Mistral / Mixtral Explained: Sliding Window Attention, Sparse Mixture of Experts, Rolling Buffer (1:26:21)
Distributed Training with PyTorch: complete tutorial with cloud infrastructure and code (1:12:53)
Quantization explained with PyTorch - Post-Training Quantization, Quantization-Aware Training (50:55)
Retrieval Augmented Generation (RAG) Explained: Embedding, Sentence BERT, Vector Database (HNSW) (49:24)
BERT explained: Training, Inference, BERT vs GPT/LLamA, Fine tuning, [CLS] token (54:52)
Coding Stable Diffusion from scratch in PyTorch (5:03:32)
Coding LLaMA 2 from scratch in PyTorch - KV Cache, Grouped Query Attention, Rotary PE, RMSNorm (3:04:11)
LLaMA explained: KV-Cache, Rotary Positional Embedding, RMS Norm, Grouped Query Attention, SwiGLU (1:10:55)
Segment Anything - Model explanation with code (42:53)
LoRA: Low-Rank Adaptation of Large Language Models - Explained visually + PyTorch code from scratch (26:55)
LongNet: Scaling Transformers to 1,000,000,000 tokens: Python Code + Explanation (29:58)
How diffusion models work - explanation and code! (21:12)
Variational Autoencoder - Model, ELBO, loss function and maths explained easily! (27:12)
Attention is all you need (Transformer) - Model explanation (including math), Inference and Training (58:04)
Coding a Transformer from scratch on PyTorch, with full explanation, training and inference. (2:59:24)
CLIP - Paper explanation (training and inference) (14:01)
Wav2Lip (generate talking avatar videos) - Paper reading and explanation (6:58)