Let's train a PyTorch model on multiple B200 GPUs (multi-GPU training) · william falcon · 9:28 (see the DDP sketch after this list)
Let's deploy a custom AI model container as an autoscaling API on your private cloud in 10 minutes · william falcon · 8:36
Let's finetune and deploy DeepSeek R1 (8B) for under $10 · william falcon · 9:52
Let's code on cloud GPUs with VSCode and Jupyter notebooks · william falcon · 21:53
Let's pretrain a 3B LLM from scratch: on 16+ H100 GPUs, no detail skipped. · william falcon · 1:31:01
I explain Fully Sharded Data Parallel (FSDP) and pipeline parallelism in 3D with Vision Pro · william falcon · 18:11 (see the FSDP sketch after this list)
Round 2 - I use CodeLlama 70B vs Mixtral MoE to write code to finetune a model on 16 GPUs 🤯🤯 · william falcon · 29:02
Round 1 - Codellama70B vs Mixtral MoE vs Mistral 7B for coding · william falcon · 37:02
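
The code from the multi-GPU B200 video is not included in this listing. As a rough illustration of the topic only, here is a minimal PyTorch DistributedDataParallel (DDP) sketch; the toy model, dataset, and hyperparameters are assumptions made for the example, not taken from the video.

```python
# Minimal multi-GPU training sketch with PyTorch DistributedDataParallel (DDP).
# Launch with: torchrun --nproc_per_node=8 train.py
# The model, data, and hyperparameters below are illustrative placeholders.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

def main():
    dist.init_process_group(backend="nccl")        # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])     # set by torchrun
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(128, 10).cuda(local_rank)   # toy model
    model = DDP(model, device_ids=[local_rank])          # wrap for gradient sync
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

    dataset = TensorDataset(torch.randn(4096, 128), torch.randint(0, 10, (4096,)))
    sampler = DistributedSampler(dataset)                # shard data across ranks
    loader = DataLoader(dataset, batch_size=64, sampler=sampler)

    for epoch in range(2):
        sampler.set_epoch(epoch)                         # reshuffle each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            loss = torch.nn.functional.cross_entropy(model(x), y)
            optimizer.zero_grad()
            loss.backward()                              # DDP all-reduces gradients here
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```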
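
Likewise, the FSDP video's material is not reproduced here. As a hedged sketch of what Fully Sharded Data Parallel means in code, the snippet below shards parameters, gradients, and optimizer state across GPUs instead of replicating them as DDP does; the model and sizes are placeholders chosen for the example.

```python
# Minimal FSDP sketch: shard model state across GPUs.
# Launch with: torchrun --nproc_per_node=8 fsdp_train.py
# The model and training loop are illustrative placeholders.
import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Sequential(
        torch.nn.Linear(1024, 4096),
        torch.nn.GELU(),
        torch.nn.Linear(4096, 1024),
    ).cuda(local_rank)

    model = FSDP(model)                                  # shard parameters across ranks
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for _ in range(10):                                  # dummy training steps
        x = torch.randn(8, 1024, device=local_rank)
        loss = model(x).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()                                  # gradients are reduce-scattered
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```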