6.4 Predicting Prices of Used Cars. The dataset mlba::ToyotaCorolla contains data on
used cars (Toyota Corolla) on sale during late summer of 2004 in the Netherlands. It
has 1436 records containing details on 38 variables, including Price, Age, Kilometers,
HP, and other specifications. The goal is to predict the price of a used Toyota Corolla
based on its specifications. (The example in Section 6.3 is a subset of this dataset.)
Split the data into training (60%) and holdout (40%) datasets.
Run a multiple linear regression with the outcome variable Price and predictor
variables Age_08_04, KM, Fuel_Type, HP, Automatic, Doors, Quarterly_Tax,
Mfr_Guarantee, Guarantee_Period, Airco, Automatic_airco, CD_Player,
Powered_Windows, Sport_Model, and Tow_Bar.
a. What appear to be the three or four most important car specifications for predicting
the car's price?
b. Using metrics you consider useful, assess the performance of the model in
predicting prices.
c. Train lasso and ridge regression models, and compare their performance to the
multiple linear regression model. Tune the $\lambda$ parameter using 5-fold
cross-validation.
d. Repeat (c) using the full dataset. Compare the training, cross-validation, and
holdout performance for all three models. You will need to use cross-validation for the
linear regression model as well. Which model would you select based on the results
and why?
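Notation reminder for parts (b) through (d). The symbols below are assumed here for
illustration and are not taken from the exercise: $y_i$ is the actual price of record
$i$, $\hat{y}_i$ its predicted price, $n$ the number of records being scored, $x_{ij}$
the value of predictor $j$ for record $i$, and $\beta_j$ the corresponding coefficient
($j = 1, \ldots, p$). Three commonly used accuracy metrics, offered as candidates rather
than as the required answer to (b), are:

$$
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}, \qquad
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|, \qquad
\mathrm{MAPE} = \frac{100}{n}\sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|
$$

The role of $\lambda$ in part (c) can be seen from the penalized least-squares criteria
that ridge and lasso minimize. These are written up to scaling conventions, which differ
between software packages:

$$
\text{Ridge:}\quad \min_{\beta_0, \beta}\; \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^2 + \lambda \sum_{j=1}^{p} \beta_j^2
$$

$$
\text{Lasso:}\quad \min_{\beta_0, \beta}\; \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^2 + \lambda \sum_{j=1}^{p} \left| \beta_j \right|
$$

Setting $\lambda = 0$ recovers ordinary least squares in both cases. Larger values shrink
the coefficients more strongly, and the lasso penalty can set some of them exactly to
zero.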