Thank you, I spent hours searching tutorials for this model and none of them worked for me.
Exceptional video, thank you so much ❤
Good stuff, ty!
Thanks for the awesome video. Your dependent variable ViolentCrimePerPop has a skewed distribution, but you used the default loss function, which I think assumes a normal distribution. I think a negative log-likelihood objective, which is the loss function mainly used for XGBoost classification, could improve your RMSE. I wonder if you could answer my comment.
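One way to act on this comment: for a skewed, non-negative target, the objective can be swapped from the default squared error to a Tweedie or gamma objective, both of which `xgboost` supports. A minimal sketch on toy data (the skewed outcome here is just a stand-in for the video's target, not the real dataset):

```r
library(xgboost)

# Toy data with a right-skewed, non-negative target
set.seed(42)
X <- matrix(rnorm(200), ncol = 2)
y <- rexp(100, rate = 1)  # skewed, positive outcome

dtrain <- xgb.DMatrix(data = X, label = y)

# "reg:squarederror" is the default objective; "reg:tweedie" (or "reg:gamma",
# which requires strictly positive labels) can fit skewed targets better
model <- xgb.train(
  params = list(objective = "reg:tweedie", tweedie_variance_power = 1.5),
  data = dtrain,
  nrounds = 50
)

pred <- predict(model, X)
```

Whether this actually lowers RMSE depends on the data; it is worth comparing both objectives on a held-out set.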
Thank you for this video. It helped me in immeasurable ways. Please, how can I get the R2?
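R² can be computed directly from the model's predictions and the held-out labels; a quick sketch (the `pred` and `y_test` vectors below are hypothetical stand-ins for the video's test-set output):

```r
# Hypothetical held-out labels and model predictions
set.seed(1)
y_test <- rnorm(50)
pred   <- y_test + rnorm(50, sd = 0.3)

# R^2 = 1 - SS_residual / SS_total
r2 <- 1 - sum((y_test - pred)^2) / sum((y_test - mean(y_test))^2)
r2
```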
Good tutorial. Thanks for sharing. I have a question on the feature importance. How can we get Feature importance from XGBoost? Can you add that xgb.importance object for our reference?
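For reference, `xgboost`'s own `xgb.importance()` returns per-feature Gain, Cover, and Frequency from a fitted booster; a minimal sketch on toy data:

```r
library(xgboost)

# Toy regression where f1 drives the outcome
set.seed(7)
X <- matrix(rnorm(300), ncol = 3, dimnames = list(NULL, c("f1", "f2", "f3")))
y <- 2 * X[, 1] + rnorm(100, sd = 0.1)

model <- xgboost(data = X, label = y, nrounds = 25,
                 objective = "reg:squarederror", verbose = 0)

imp <- xgb.importance(model = model)  # data.table: Feature, Gain, Cover, Frequency
print(imp)
# xgb.plot.importance(imp)  # optional bar chart
```

Here `f1` should come out on top by Gain, since it is the only informative feature.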
Hi Spencer, would it be possible for you to do a model with the AdaBoost algorithm for a continuous quantitative variable? Is it necessary to transform a numeric target variable into categories to apply this algorithm? Thanks, your content is wonderful for us!
Hey Spencer! Sorry to be commenting so long after this post haha. This is very impressive; your walkthrough was so much more in depth than other xgboost-for-regression guides I've found. I've got a question for you: I have 14 predictor variables, most are binary, some are continuous, and then there are three categorical. I'm worried about one of those categorical variables because it has 14 levels. Will including that variable (after I've changed all categorical variables to numeric) mess anything up with the model? I was thinking I should just not include it, but it looks like your data has multiple categorical variables that also have more than a few levels. For background, I'm only using the xgboost decision tree for variable selection and an insight into variable importance. I will be plugging the recommended variables into a LR model for interpretability purposes. Let me know what you think! Great content! I'm glad I found your page!
Fantastic video. BTW, how do I calculate the AUC of this model?
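Worth noting that AUC is defined for classification, not for a regression model like the one in the video; it only applies if the outcome is (or is binarized into) two classes. Under that assumption, a sketch with the `pROC` package (assuming it is installed; the scores and labels below are simulated):

```r
library(pROC)

# Simulated binary labels correlated with a model score
set.seed(3)
score <- runif(200)
label <- rbinom(200, 1, prob = score)

roc_obj <- roc(response = label, predictor = score, quiet = TRUE)
auc_value <- as.numeric(auc(roc_obj))
auc_value
```

For the regression model itself, RMSE or R² are the natural metrics instead.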
Hey Spencer, Thanks a lot for the video. Really liked it. I wanted to point out, though, that you encoded the 'county' and 'state' columns as numeric in your data pre-processing stage. This seems like an incorrect way to encode these columns, as XGBoost will see them as ordinal values rather than nominal data. This can result in a brittle model that over-fits easily. Hope this helps. Please keep creating more content; the work is much appreciated!
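One standard alternative to integer-encoding nominal columns is one-hot encoding via base R's `model.matrix()`; a sketch on a hypothetical frame (these are not the video's actual columns, just illustrative names):

```r
# Hypothetical data frame with nominal columns like 'state' and 'county'
df <- data.frame(
  state  = factor(c("NJ", "NY", "NJ", "PA")),
  county = factor(c("A", "B", "C", "A")),
  pop    = c(1.2, 3.4, 2.2, 0.9)
)

# One-hot encode the factors instead of casting them to integers,
# so XGBoost does not treat them as ordered values.
# (~ . - 1 drops the intercept; later factors still use k-1 contrasts.)
X <- model.matrix(~ . - 1, data = df)
colnames(X)
```

The resulting numeric matrix can be passed straight into `xgb.DMatrix()`.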
Hi Spencer Pao: Thank you for your video. I have two questions. 1. How can I know which independent attributes are important in the regression? 2. Why do other people use the following code for GBDT, and why is yours so different from theirs? bst_model <- xgb.train(params = xgb_params, data = train_matrix, nrounds = 1000, watchlist = watchlist, eta = 0.001, max.depth = 6, gamma = 0, subsample = 1, colsample_bytree = 1, missing = NA).
Hi, I tried running your code; however, when I ran xgb_tune I got this error: "Error: Please make sure that the outcome column is a factor or numeric. The class(es) of the column: 'tbl_df', 'tbl', 'data.frame'". What do I do now?
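That error message suggests the outcome was passed as a one-column tibble rather than a plain vector, which happens when a tibble is subset with `[, "col"]`. A sketch of the difference (the column name `y` is hypothetical):

```r
library(tibble)

dat <- tibble(y = c(1.5, 2.0, 3.2), x = c(0.1, 0.4, 0.9))

# dat[, "y"] keeps the tibble class ('tbl_df', 'tbl', 'data.frame'),
# which is exactly what the error message complains about
still_tibble <- dat[, "y"]

# dat[["y"]] (or dplyr::pull(dat, y)) returns a plain numeric vector,
# which tuning helpers accept as an outcome
outcome_vec <- dat[["y"]]

class(outcome_vec)  # "numeric"
```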
Can you also show how to get Gini for the model?
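A sketch of one common way to compute it: the normalized Gini coefficient used for model ranking (for a binary outcome it equals 2·AUC − 1). The function names here are my own, not from the video:

```r
# Normalized Gini: rank observations by predicted score and compare the
# cumulative share of the actual outcome against the diagonal
gini <- function(actual, pred) {
  ord <- order(pred, decreasing = TRUE)
  actual <- actual[ord]
  n <- length(actual)
  cum_share <- cumsum(actual) / sum(actual)
  sum(cum_share) / n - (n + 1) / (2 * n)
}
normalized_gini <- function(actual, pred) gini(actual, pred) / gini(actual, actual)

# Example: scores that perfectly separate the two classes give Gini = 1
set.seed(9)
a <- rbinom(100, 1, 0.3)
p <- a * 0.7 + runif(100) * 0.3
normalized_gini(a, p)
```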
mark
@navronaman