16. Model Selection
So far, when fitting models, we have relied on a few guidelines to decide which features to include:
assess model fit with residual plots
connect the statistical model to a physical model
keep the model simple
compare improvements in the standard deviation of the residuals and in the MSE between increasingly complex models
For example, when we examined the one-variable model of upward mobility in Chapter 15, we found curvature in the residual plot. Adding a second variable greatly improved the fit in terms of average loss (MSE and, relatedly, multiple-\(R^2\)), but some curvature remained in the residuals. A seven-variable model reduced the MSE only slightly compared with the two-variable model, so we opted for the simpler two-variable model even though its residuals still showed some patterns.
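The comparison of MSE across increasingly complex models can be sketched in a few lines. The example below is only an illustration and does not use the upward mobility data: the synthetic data, the seed, and the helper `training_mse` are all assumptions made for this sketch. It fits nested least squares models with one, two, and seven features and reports the MSE on the data used for fitting.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Synthetic data (an assumption for illustration, not the mobility data):
# the response depends on the first two of seven candidate features.
X = rng.normal(size=(n, 7))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=1.0, size=n)


def training_mse(X_sub, y):
    """Fit least squares with an intercept; return the MSE on the same data."""
    design = np.column_stack([np.ones(len(y)), X_sub])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    return np.mean(residuals ** 2)


# Nested models: the first feature, the first two, and all seven.
mse_by_size = {k: training_mse(X[:, :k], y) for k in (1, 2, 7)}

for k, mse in mse_by_size.items():
    print(f"{k} feature(s): training MSE = {mse:.3f}")
```

Because the models are nested, the MSE computed on the fitting data cannot increase as features are added, even when the extra features are unrelated to the response. A small decrease in MSE is therefore weak evidence on its own that a larger model is better.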
As another example, when we model the weight of a donkey in Chapter 18, we take guidance from a physical model. We ignore the donkey’s appendages and draw on the similarity between a barrel and a donkey’s body to begin fitting a model that explains the donkey’s weight by its body length and girth (comparable to a barrel’s height and circumference). We then continue to adjust that model by adding categorical features related to the donkey’s physical condition and age, collapsing categories, and excluding other possible features to keep the model simple.
The decisions we make in building these models are based on judgment calls, and in this chapter we augment that judgment with more formal criteria. To begin, we provide an example that shows why it’s typically not a good idea to include too many features in a model. This phenomenon, called overfitting, often leads to models that follow the data too closely and capture some of the noise in the data. Then, when new observations come along, the predictions are worse than those from a simpler model. The remainder of the chapter provides techniques, such as the train-test split, cross-validation, and regularization, for limiting the impact of overfitting. These techniques are especially helpful when there are many potential features to include in a model. We also provide a synthetic example, where we know the true model, to explain the concepts of model variance and bias and how they relate to over- and under-fitting.
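As a rough preview of overfitting and of a held-out split, the sketch below fits polynomials of increasing degree to synthetic data and compares the MSE on the data used for fitting with the MSE on data set aside beforehand. Everything here is an assumption made for illustration: the quadratic signal, the noise level, the seed, the split sizes, and the choice of degrees. It is not the example developed later in the chapter.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(42)

# Synthetic data (assumed for illustration): a quadratic signal plus noise.
n = 60
x = rng.uniform(-1, 1, size=n)
y = 1 + 2 * x - 3 * x**2 + rng.normal(scale=0.5, size=n)

# Set aside a random third of the observations; fit on the rest.
idx = rng.permutation(n)
test_idx, train_idx = idx[:20], idx[20:]
x_train, y_train = x[train_idx], y[train_idx]
x_test, y_test = x[test_idx], y[test_idx]


def mse(model, x, y):
    """Average squared error of a fitted polynomial on (x, y)."""
    return np.mean((y - model(x)) ** 2)


# Map each degree to (MSE on the fitting data, MSE on the held-out data).
results = {}
for degree in (1, 2, 12):
    model = Polynomial.fit(x_train, y_train, deg=degree)
    results[degree] = (mse(model, x_train, y_train), mse(model, x_test, y_test))

for degree, (train_mse, test_mse) in results.items():
    print(f"degree {degree:>2}: train MSE = {train_mse:.3f}, test MSE = {test_mse:.3f}")
```

The MSE on the fitting data can only go down as the degree increases, since each polynomial contains the lower-degree ones as special cases. The held-out MSE has no such guarantee: once the model is flexible enough to capture the signal, further flexibility tends to fit noise, and the held-out error often stops improving or gets worse.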