@EasewithData: You have explained the performance tuning concepts really clearly. Thank you!
Very nice explanation of the concept. Good work, Subham!
The most awaited video 😊 Thank you
Thanks a lot! Does bucketing work with Hive? And how should bucketing be done if I need to join on several columns?
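To make the second question concrete, here is roughly what I have in mind (just a sketch; the table and column names are made-up examples):

from pyspark.sql import SparkSession

# bucketBy only works with saveAsTable, so a metastore is needed
# (enabling Hive support is one way to get it).
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

df = spark.table("orders")  # hypothetical source table

# Bucket on both join keys so a join on (customer_id, order_date)
# can avoid the shuffle on this side.
(df.write
   .bucketBy(16, "customer_id", "order_date")
   .sortBy("customer_id", "order_date")
   .mode("overwrite")
   .saveAsTable("orders_bucketed"))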
Hi, at around the 24-minute mark you mentioned that the number of tasks is 4 because 4 buckets were created. 1. In the case of an extremely large table, will it still hold that if the data is organized in buckets, the number of tasks will always equal the number of buckets? 2. And if the two tables have different numbers of buckets, will the task count be set by the larger bucket count, or by the bucket count of the larger table?
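A sketch of how question 2 could be tested (the session, DataFrames, and table names here are hypothetical):

# Write the same join key with different bucket counts, then join
# and inspect the task count of the join stage in the Spark UI.
(df_large.write.bucketBy(8, "id").sortBy("id")
    .mode("overwrite").saveAsTable("t_large_8"))
(df_small.write.bucketBy(4, "id").sortBy("id")
    .mode("overwrite").saveAsTable("t_small_4"))

joined = spark.table("t_large_8").join(spark.table("t_small_4"), "id")
joined.explain()  # check whether an Exchange appears on either side
joined.count()    # then compare the join stage's task count with 4 vs 8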
No words. Thank you very much, Sir!
Hi Subham, you are doing a wonderful job; no words are enough to express it. One thing I wanted to say: when I am deeply and fully immersed in your lecture with full attention, all of a sudden a 2-minute unskippable ad pops up and it really breaks the flow. Once again, thank you for all the hard work. Really appreciated.
Bucketing is not explained well; it's confusing. Can you please take proper tables with columns and explain hashing and buckets as well?
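To illustrate what I mean by explaining the hashing part, something like this conceptual sketch would help (plain Python, not Spark's actual Murmur3 hash):

# Each row is assigned a bucket by hashing the bucket column and
# taking the remainder; equal keys always land in the same bucket,
# which is why matching keys line up across two bucketed tables.
num_buckets = 4
rows = [("alice", 10), ("bob", 20), ("alice", 30), ("carol", 40)]
for key, value in rows:
    bucket_id = hash(key) % num_buckets  # Spark uses Murmur3, not hash()
    print(f"{key} -> bucket {bucket_id}")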
First of all, a big kudos! Fun fact: I added up the times on my cluster. The two bucketed writes took 7s and 11s, the unbucketed join took 40s, and the bucketed join took 15s. So 7 + 11 + 15 = 33s, which is less than 40s. It looks like it pays off to bucket the data first, right?
Amazingly explained
PySpark Coding Interview Questions and Answers from Top Companies https://www.youtube.com/playlist?list=PLqGLh1jt697zXpQy8WyyDr194qoCLNg_0
Thanks so much for the video. I have a follow-up question for you: can bucketing be used on high-cardinality columns? Thanks in advance.
@23:03, only 4 tasks are shown here. Usually it would come up with 16 tasks per the actual cluster config, but only 4 tasks are used because the data was bucketed before reading. Is that correct?
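For reference, a sketch of how I tried to confirm this (table names are made up; assumes spark.sql.sources.bucketing.enabled is true and both tables are bucketed into 4 buckets on the join key):

a = spark.table("tbl_a_4buckets")
b = spark.table("tbl_b_4buckets")

joined = a.join(b, "id")
joined.explain()  # no Exchange expected on either side of the join
# In the Spark UI, the join stage should run 4 tasks (one per bucket),
# rather than the default parallelism from the cluster config.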
truly an amazing video
Great work
Hi Subham, one quick question: can we un-broadcast a broadcasted DataFrame? We can uncache a cached dataset, right? In the same way, can we do un-broadcasting?
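For broadcast variables at least, there is an explicit release API (a sketch; whether anything similar exists for the DataFrame broadcast() hint is exactly my question):

from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.getOrCreate()

# RDD-level broadcast variables can be released, much like uncaching:
bv = spark.sparkContext.broadcast({"IN": "India", "US": "USA"})
bv.unpersist()  # drop the copies on executors (re-broadcast on next use)
bv.destroy()    # release everything; bv cannot be used after this

# The SQL broadcast() hint applies per query plan; I don't know of
# any "un-broadcast" call for it:
# small_df = broadcast(small_df)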
High cardinality -> bucketing, and low cardinality -> partitioning?
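If that's right, a sketch of how each would look (paths, table, and column names are hypothetical):

# Low-cardinality column -> partitioning (one directory per value):
(df.write.partitionBy("country")
   .mode("overwrite")
   .parquet("/tmp/events_by_country"))

# High-cardinality column -> bucketing (fixed file count via hashing):
(df.write.bucketBy(32, "user_id").sortBy("user_id")
   .mode("overwrite")
   .saveAsTable("events_bucketed"))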
Bucketing can't be applied when the data resides in a Delta Lake table, right?
@easewithdata What is the painting app you use for your videos? It's very intuitive.