@outliier

Since these videos take an enormous amount of time (this one took about 300 hours), would you also like to see paper explanations in the style of Yannic Kilcher (https://www.youtube.com/@YannicKilcher)? I could cover papers very quickly after they are released and also cover topics I wouldn’t make an animated video for. Let me know what you think :)

@olivercaufield669

Best intro to Diffusion ever.
I bet this video is better than 80% of the introductory courses in college.

@amimaster

Wow, finally someone who makes EVERY math step explicit, however simple it might be. In my career I have rarely found someone who can make math so accessible; I often get lost after the first one or two "implied" steps.

@Cyan-g2g

Wow! I did not expect this video to go this deep. But this is awesome! Please make more in-depth explanations like this. It’s clear a lot of hard work went into it, and the animation is sooo elegant.

@pavanpreetgandhi6763

This video was absolutely fantastic—I feel like I’ve finally learned about diffusion models the right way! I really appreciated how you started from the basics, gradually building up concepts and intuition, while clearly explaining the math at every step. It took me a few hours to get through the entire video, but the length and pace were perfect—there’s nothing I would change. Everything was covered so thoroughly. Thank you for the effort you put into this, and I’m excited to see more videos from you in the future!

@huytruonguic

Love your mathematical explanations and visualizations; no fancy transitions were needed, just slow, simple, and clear English phrases.

@venkatbalachandra5965

I absolutely love how you started from scratch, as in explaining what the underlying PDF is. I'm working on a project on diffusion models and I don't know anything about them, and every resource available is catered towards those with prerequisites I don't have yet, except this one. I haven't yet watched the whole thing, but I'm going to keep coming back to it until I understand everything in this video. Cheers mate!

@novantha1

Your videos are somehow simultaneously timely and timeless. Your content is absolutely appreciated and I wish you the best in your endeavors.

@arpanpoudel

I used Score-SDE in my thesis and I have my defense next week :D what timing!

@kirill2848

For people wondering why it's OK to use the first-order Taylor expansion at 32:26:

The full expansion is: √(1 − β_t) = 1 − (1/2)β_t − (1/8)(β_t)^2 − ...
So if β_t is small, then all the higher-order terms are much smaller and become almost 0. The β_t range from the original DDPM paper was 10^-4 to 0.02, so the first-order approximation dominates.
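A quick numerical sanity check of this (a minimal Python sketch; the only inputs assumed are the first-order approximation 1 − β_t/2 and the β_t range quoted above from the DDPM paper):

import numpy as np

# Compare sqrt(1 - beta_t) against its first-order approximation 1 - beta_t/2
# over the beta range from the original DDPM paper (1e-4 to 0.02).
betas = np.linspace(1e-4, 0.02, 1000)
exact = np.sqrt(1.0 - betas)
first_order = 1.0 - 0.5 * betas

# The largest neglected term is (1/8) * beta_t^2, i.e. at most 0.02^2 / 8 = 5e-5.
print(np.max(np.abs(exact - first_order)))  # ~5e-05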

@JieXia-n4d

This is my first time commenting, and I just want to say this is the best diffusion model video I've ever seen. Thank you so much!

@UmbrabbitMagnolia

I have watched this video three times, and may watch it again. Thank you.

@nocomment000

Thank you so much for walking through the full derivation. It is a huge help in understanding the score matching objective.

@jimlbeaver

It was like a lifetime of mathematics in 38 minutes! You clearly have an understanding of it and a talent for explaining it. I’d love to see you keep going on the diffusion stuff. It seems like a really important technology, and the more people that understand it and use it, the better.

Thanks, great job.

@shivamshukla3374

Well explained video, shout out to your hard work man, you are doing fabulous work. Keep it up; we definitely want more videos like this on diffusion models, explaining the in-depth concepts.

@chocobelly

The mathematical derivations and explanations are such a lifesaver. I also never really understood the underlying meaning when reading about diffusion models, but now everything has clicked. Thank you so much for the video, I really enjoyed it. Please make more videos like this. Liked and subscribed :).

@mrp8686

Hey, I really enjoyed the video and learned a lot! One small comment: at 8:10 you state that we aim to minimize both summands and, therefore, that we want the gradient of s to be 0 at data points. From my understanding, we want to maximize p(x) at data points x. Therefore, the gradient of p(x) should be 0 at x (which minimizes the first summand), and the gradient of s (which corresponds to the Hessian of log p(x)) should actually be negative.
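A small numerical illustration of this point (a minimal sketch, assuming the objective at 8:10 is the standard Hyvärinen score-matching loss E[tr(∇x s(x)) + ½||s(x)||²], and using a 1-D Gaussian as a stand-in for p(x)):

import numpy as np

# 1-D Gaussian p(x): the score is s(x) = d/dx log p(x) = -(x - mu) / sigma^2,
# and its derivative (the Hessian of log p) is the constant -1/sigma^2 < 0.
mu, sigma = 0.0, 1.0
x = mu  # a data point sitting at the mode of p

score = -(x - mu) / sigma**2   # zero at the mode: the ||s(x)||^2 summand is minimized
score_grad = -1.0 / sigma**2   # negative: the tr(grad s) summand is driven down
print(score, score_grad)       # -0.0 -1.0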

@איילתדמור

Amazing video, thank you. I learned most of it a year ago at university, but this was a great refresher which also provided me with new insights into some of the material. I really liked the conclusion of the Denoising Score Matching part, very beautiful.

@angtrinh6495

Beautiful and clear explanation. Thank you, sir!

@outliier

32:38 To correct myself here: the paper does give an explanation of how to derive the sampler. I personally just find that approach much harder to understand, and generally the papers don’t go into too much detail in their derivations.