Download the Allstate Claims Severity training data file from Kaggle.
Classification
1. First, generate a discrete label for each sample from its continuous loss value: if the sample's loss is greater than the median of all losses, set the label to 1; otherwise set it to 0. Use the first half of the training data (rows) to train a logistic regression model that predicts this 0/1 label, and then test it on the second half of the training data. What classification error do you obtain on the test half? What is the test AUC? Plot the ROC curve.
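The labeling, split, and evaluation steps in (1) can be sketched in plain Python as below. This is only an illustration under stated assumptions, not a solution to the exercise: the loss values and the test scores are made-up toy numbers, the helper names (binarize_by_median, split_halves, classification_error, roc_points, auc) are hypothetical, and the scores are assumed to be predicted probabilities of label 1 from whatever logistic regression model you fit. The 0.5 decision threshold is also an assumption. Both classes must appear in the test half, otherwise the ROC rates are undefined.

```python
from statistics import median


def binarize_by_median(losses):
    """Return 1 where loss > median of all losses, else 0."""
    cutoff = median(losses)
    return [1 if loss > cutoff else 0 for loss in losses]


def split_halves(rows):
    """Return (first half, second half); an odd extra row goes to the second half."""
    mid = len(rows) // 2
    return rows[:mid], rows[mid:]


def classification_error(y_true, y_pred):
    """Fraction of samples whose predicted label differs from the true label."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_points(y_true, scores):
    """(FPR, TPR) points obtained by lowering the decision threshold step by step."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points, tp, fp = [(0.0, 0.0)], 0, 0
    for i, (score, label) in enumerate(ranked):
        tp += label
        fp += 1 - label
        # Emit a point only once a group of tied scores is fully counted.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy numbers standing in for the real loss column and model scores.
losses = [120.0, 80.0, 300.0, 45.0, 910.0, 60.0, 150.0, 500.0]
labels = binarize_by_median(losses)
train_labels, test_labels = split_halves(labels)
test_scores = [0.9, 0.2, 0.4, 0.7]  # hypothetical P(label = 1) on the test half
test_pred = [1 if s > 0.5 else 0 for s in test_scores]
print("error:", classification_error(test_labels, test_pred))
print("AUC:", auc(roc_points(test_labels, test_scores)))
```

The list returned by roc_points can be passed to any plotting tool to draw the ROC curve, with false positive rate on the x-axis and true positive rate on the y-axis.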
2. Conduct 5-fold cross-validation to estimate the classification error. Is the estimate larger or smaller than the classification error you obtained in (1)? Which of the two do you think is closer to the true classification error of your classifier?
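The 5-fold procedure in (2) can be sketched as follows. This is a minimal illustration, not the expected solution: kfold_indices and cross_val_error are hypothetical helper names, the folds are contiguous blocks of rows (which assumes the row order carries no structure; shuffle first otherwise), and the toy threshold classifier merely stands in for logistic regression so the example runs without extra libraries.

```python
def kfold_indices(n, k=5):
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most 1."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_val_error(X, y, fit, predict, k=5):
    """Return (mean error over folds, per-fold errors)."""
    errors = []
    for test_idx in kfold_indices(len(y), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        # Train on the other k-1 folds, then score the held-out fold.
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = predict(model, [X[i] for i in test_idx])
        wrong = sum(p != y[i] for p, i in zip(preds, test_idx))
        errors.append(wrong / len(test_idx))
    return sum(errors) / k, errors


def fit_threshold(X, y):
    """Toy stand-in classifier: midpoint between the two class means of feature 0."""
    mean0 = sum(x[0] for x, t in zip(X, y) if t == 0) / y.count(0)
    mean1 = sum(x[0] for x, t in zip(X, y) if t == 1) / y.count(1)
    return (mean0 + mean1) / 2


def predict_threshold(threshold, X):
    """Predict 1 when feature 0 exceeds the learned threshold."""
    return [1 if x[0] > threshold else 0 for x in X]


# Toy data: one integer feature, with one deliberately noisy label at index 0.
X = [[i] for i in range(10)]
y = [1, 0, 0, 0, 0, 0, 1, 1, 1, 1]
mean_error, fold_errors = cross_val_error(X, y, fit_threshold, predict_threshold)
print("per-fold errors:", fold_errors)
print("mean CV error:", mean_error)
```

Every row is held out exactly once, so the averaged error uses all of the data for testing while each model is still trained on four fifths of it, in contrast to the single half-and-half split in (1).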
Regression
3. Use the lm() function in R to build a multiple linear regression model with all variables as predictors, namely, loss ~ . Apply the trained model to the test data and make your first submission to Kaggle.