Let's train a PyTorch model on multiple B200 GPUs (multi-GPU training) · william falcon · 9:28 (see the DDP sketch after this list)
Let's deploy a custom AI model container as an autoscaling API on your private cloud in 10 minutes · william falcon · 8:36
Let's finetune and deploy DeepSeek R1 (8B) for under $10 · william falcon · 9:52
Let's code on cloud GPUs with VSCode and Jupyter notebooks · william falcon · 21:53
Let's pretrain a 3B LLM from scratch: on 16+ H100 GPUs, no detail skipped. · william falcon · 1:31:01
I explain Fully Sharded Data Parallel (FSDP) and pipeline parallelism in 3D with Vision Pro · william falcon · 18:11 (see the FSDP sketch after this list)
Round 2 - I use CodeLlama 70B vs Mixtral MoE to write code to finetune a model on 16 GPUs 🤯🤯 · william falcon · 29:02
Round 1 - Codellama70B vs Mixtral MoE vs Mistral 7B for coding · william falcon · 37:02
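
The code from the multi-GPU B200 video is not included in this listing. As a rough illustration of the topic only, here is a minimal PyTorch DistributedDataParallel (DDP) sketch; the toy model, dataset, and hyperparameters are assumptions made for the example, not taken from the video.

```python
# Minimal multi-GPU training sketch with PyTorch DistributedDataParallel (DDP).
# Launch with: torchrun --nproc_per_node=8 train.py
# The model, data, and hyperparameters below are illustrative placeholders.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

def main():
    dist.init_process_group(backend="nccl")        # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])     # set by torchrun
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(128, 10).cuda(local_rank)   # toy model
    model = DDP(model, device_ids=[local_rank])          # wrap for gradient sync
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

    dataset = TensorDataset(torch.randn(4096, 128), torch.randint(0, 10, (4096,)))
    sampler = DistributedSampler(dataset)                # shard data across ranks
    loader = DataLoader(dataset, batch_size=64, sampler=sampler)

    for epoch in range(2):
        sampler.set_epoch(epoch)                         # reshuffle each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            loss = torch.nn.functional.cross_entropy(model(x), y)
            optimizer.zero_grad()
            loss.backward()                              # DDP all-reduces gradients here
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```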
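
Likewise, the FSDP video's material is not reproduced here. As a hedged sketch of what Fully Sharded Data Parallel means in code, the snippet below shards parameters, gradients, and optimizer state across GPUs instead of replicating them as DDP does; the model and sizes are placeholders chosen for the example.

```python
# Minimal FSDP sketch: shard model state across GPUs.
# Launch with: torchrun --nproc_per_node=8 fsdp_train.py
# The model and training loop are illustrative placeholders.
import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Sequential(
        torch.nn.Linear(1024, 4096),
        torch.nn.GELU(),
        torch.nn.Linear(4096, 1024),
    ).cuda(local_rank)

    model = FSDP(model)                                  # shard parameters across ranks
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for _ in range(10):                                  # dummy training steps
        x = torch.randn(8, 1024, device=local_rank)
        loss = model(x).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()                                  # gradients are reduce-scattered
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```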