@EasewithData: You have explained the performance tuning concepts really clearly. Thank you!
Very nice explanation of the concept. Good work, Subham!
The most awaited video 😊 Thank you
Thanks a lot! Does bucketing work with Hive? And how should bucketing be done if I need to join on several columns?
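To make the second question concrete, here is roughly what I have in mind (just a sketch; the table and column names are made-up examples):

from pyspark.sql import SparkSession

# bucketBy only works with saveAsTable, so a metastore is needed
# (enabling Hive support is one way to get it).
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

df = spark.table("orders")  # hypothetical source table

# Bucket on both join keys so a join on (customer_id, order_date)
# can avoid the shuffle on this side.
(df.write
   .bucketBy(16, "customer_id", "order_date")
   .sortBy("customer_id", "order_date")
   .mode("overwrite")
   .saveAsTable("orders_bucketed"))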
Hi, at around the 24-minute mark you mentioned that the number of tasks is 4 because 4 buckets were created. 1. In the case of an extremely large table, will it still hold that if the data is organized in buckets, the number of tasks will always equal the number of buckets? 2. And if the two tables have different numbers of buckets, will the task count be set by the larger bucket count, or by the bucket count of the larger table?
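A sketch of how question 2 could be tested (the session, DataFrames, and table names here are hypothetical):

# Write the same join key with different bucket counts, then join
# and inspect the task count of the join stage in the Spark UI.
(df_large.write.bucketBy(8, "id").sortBy("id")
    .mode("overwrite").saveAsTable("t_large_8"))
(df_small.write.bucketBy(4, "id").sortBy("id")
    .mode("overwrite").saveAsTable("t_small_4"))

joined = spark.table("t_large_8").join(spark.table("t_small_4"), "id")
joined.explain()  # check whether an Exchange appears on either side
joined.count()    # then compare the join stage's task count with 4 vs 8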
No words. Thank you very much, Sir!
Hi Subham, you are doing a wonderful job; no words are enough to express it. One thing I wanted to say: when I am deeply and fully immersed in your lecture with full attention, all of a sudden a 2-minute unskippable ad pops up and it really breaks the flow. Once again, thank you for all the hard work. Really appreciated.
Bucketing is not explained well; it's confusing. Can you please take proper tables with columns and explain hashing and buckets as well?
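To illustrate what I mean by explaining the hashing part, something like this conceptual sketch would help (plain Python, not Spark's actual Murmur3 hash):

# Each row is assigned a bucket by hashing the bucket column and
# taking the remainder; equal keys always land in the same bucket,
# which is why matching keys line up across two bucketed tables.
num_buckets = 4
rows = [("alice", 10), ("bob", 20), ("alice", 30), ("carol", 40)]
for key, value in rows:
    bucket_id = hash(key) % num_buckets  # Spark uses Murmur3, not hash()
    print(f"{key} -> bucket {bucket_id}")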
First of all, a big kudos! Fun fact: I added up the times on my cluster. The two bucketed writes took 7s and 11s, the unbucketed join took 40s, and the bucketed join took 15s. So 7 + 11 + 15 = 33s, which is less than 40s. It looks like it pays off to bucket the data first, right?
Amazingly explained
PySpark Coding Interview Questions and Answers from Top Companies https://www.youtube.com/playlist?list=PLqGLh1jt697zXpQy8WyyDr194qoCLNg_0
Thanks so much for the video. I have a follow-up question for you: can bucketing be used on high-cardinality columns? Thanks in advance.
@23:03, only 4 tasks are shown here. Usually it would come up with 16 tasks per the actual cluster config, but only 4 tasks are used because the data was bucketed before reading. Is that correct?
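For reference, a sketch of how I tried to confirm this (table names are made up; assumes spark.sql.sources.bucketing.enabled is true and both tables are bucketed into 4 buckets on the join key):

a = spark.table("tbl_a_4buckets")
b = spark.table("tbl_b_4buckets")

joined = a.join(b, "id")
joined.explain()  # no Exchange expected on either side of the join
# In the Spark UI, the join stage should run 4 tasks (one per bucket),
# rather than the default parallelism from the cluster config.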
truly an amazing video
Great work
Hi Subham, one quick question: can we un-broadcast a broadcasted DataFrame? We can uncache a cached dataset, right? In the same way, can we do un-broadcasting?
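For broadcast variables at least, there is an explicit release API (a sketch; whether anything similar exists for the DataFrame broadcast() hint is exactly my question):

from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.getOrCreate()

# RDD-level broadcast variables can be released, much like uncaching:
bv = spark.sparkContext.broadcast({"IN": "India", "US": "USA"})
bv.unpersist()  # drop the copies on executors (re-broadcast on next use)
bv.destroy()    # release everything; bv cannot be used after this

# The SQL broadcast() hint applies per query plan; I don't know of
# any "un-broadcast" call for it:
# small_df = broadcast(small_df)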
High cardinality -> bucketing, and low cardinality -> partitioning?
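If that's right, a sketch of how each would look (paths, table, and column names are hypothetical):

# Low-cardinality column -> partitioning (one directory per value):
(df.write.partitionBy("country")
   .mode("overwrite")
   .parquet("/tmp/events_by_country"))

# High-cardinality column -> bucketing (fixed file count via hashing):
(df.write.bucketBy(32, "user_id").sortBy("user_id")
   .mode("overwrite")
   .saveAsTable("events_bucketed"))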
Bucketing can't be applied when the data resides in a Delta Lake table, right?
@easewithdata What is the painting app you use for your videos? It's very intuitive.