Problem 1 (Training Error vs. Test Error, ESL 2.9)
In this problem, we use the least squares estimator to illustrate that the training error is, in expectation, an underestimate of the prediction error (test error). Consider a linear regression model with $p$ parameters:
$y = X\beta + \varepsilon$, where $\varepsilon_i \sim N(0, \sigma^2)$.
We fit the model by least squares to a set of training data $(x_1, y_1), \dots, (x_N, y_N)$ drawn independently from the population. Let $\hat{\beta}$ be the least squares estimate obtained from the training data.
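Suppose we have some test data $(\tilde{x}_1, \tilde{y}_1), \dots, (\tilde{x}_M, \tilde{y}_M)$, with $N \ge M > p$, drawn at random from the same population as the training data. If $R_{\mathrm{tr}}(\beta) = \frac{1}{N}\sum_{i=1}^{N}(y_i - \beta^T x_i)^2$ and $R_{\mathrm{te}}(\beta) = \frac{1}{M}\sum_{i=1}^{M}(\tilde{y}_i - \beta^T \tilde{x}_i)^2$, prove that $E[R_{\mathrm{tr}}(\hat{\beta})] \le E[R_{\mathrm{te}}(\hat{\beta})]$, where the expectations are over all that is random in each expression.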
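As a numerical sanity check (not a proof), the sketch below estimates both expectations by Monte Carlo: it repeatedly draws a training set, fits beta_hat by least squares on it alone, draws an independent test set, and averages R_tr(beta_hat) and R_te(beta_hat) over many repetitions. It goes beyond what the problem states by assuming that inputs are drawn from N(0, I_p), that the true coefficient vector is fixed (drawn once per call), and that sigma = 1. The function name simulate_risks and all parameter values are hypothetical choices for illustration.

```python
import numpy as np


def simulate_risks(n_train, n_test, p, sigma=1.0, n_trials=2000, seed=0):
    """Monte Carlo estimates of E[R_tr(beta_hat)] and E[R_te(beta_hat)].

    Assumptions (not from the problem statement): rows of X are i.i.d.
    N(0, I_p), the true beta is drawn once from N(0, I_p) and then held
    fixed, and the noise is i.i.d. N(0, sigma^2).
    """
    rng = np.random.default_rng(seed)
    beta = rng.standard_normal(p)  # hypothetical true coefficients
    r_tr = np.empty(n_trials)
    r_te = np.empty(n_trials)
    for t in range(n_trials):
        # Training sample from y = X beta + eps
        X = rng.standard_normal((n_train, p))
        y = X @ beta + sigma * rng.standard_normal(n_train)
        # Least squares fit using the training data only
        beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
        # Independent test sample from the same population
        X_te = rng.standard_normal((n_test, p))
        y_te = X_te @ beta + sigma * rng.standard_normal(n_test)
        # Empirical risks R_tr(beta_hat) and R_te(beta_hat)
        r_tr[t] = np.mean((y - X @ beta_hat) ** 2)
        r_te[t] = np.mean((y_te - X_te @ beta_hat) ** 2)
    # Averages over trials approximate the two expectations
    return r_tr.mean(), r_te.mean()


if __name__ == "__main__":
    # Sizes chosen to satisfy N >= M > p
    e_tr, e_te = simulate_risks(n_train=50, n_test=30, p=5)
    print(f"E[R_tr] ~ {e_tr:.3f}, E[R_te] ~ {e_te:.3f}")
```

With these sizes the averaged training error should come out smaller than the averaged test error. Changing the seed or the sizes (keeping N >= M > p) is an easy way to see that the gap is not an artifact of one particular draw; the simulation only illustrates the inequality under the stated Gaussian assumptions and does not replace the proof.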