Question

Problem 1 (Training Error vs. Test Error, ESL 2.9)

In this problem, we use the least squares estimator to illustrate that the training error is generally an underestimate of the prediction error (or test error).

Consider a linear regression model with p parameters,

    y = Xβ + ε, where ε_i ~ iid N(0, σ^2).

We fit the model by least squares to a set of training data (x_1, y_1), ..., (x_N, y_N) drawn independently from a population. Let β̂ be the least squares estimate obtained from the training data. Suppose we have some test data (x̃_1, ỹ_1), ..., (x̃_M, ỹ_M) (N ≥ M > p) drawn at random from the same population as the training data. If

    R_tr(β) = (1/N) Σ_{i=1}^{N} (y_i - β^T x_i)^2 and R_te(β) = (1/M) Σ_{i=1}^{M} (ỹ_i - β^T x̃_i)^2,

prove that

    E[R_tr(β̂)] ≤ E[R_te(β̂)],

where the expectations are over all that is random in each expression.
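Before attempting the proof, it can help to check the inequality numerically. The following is a minimal Monte Carlo sketch, not part of the original problem. It assumes the simplest case p = 1 (a single slope, no intercept), standard normal inputs, and illustrative values for N, M, the true slope, and the noise level; the function name `simulate` and all of its parameters are hypothetical choices made for this example.

```python
import random


def simulate(n_train=20, n_test=10, beta=2.0, sigma=1.0, n_reps=2000, seed=0):
    """Monte Carlo estimates of E[R_tr(beta_hat)] and E[R_te(beta_hat)] for p = 1.

    Assumptions (not from the problem statement): inputs are N(0, 1),
    the model has one slope and no intercept, and N = 20 >= M = 10 > p = 1.
    """
    rng = random.Random(seed)
    total_tr = 0.0
    total_te = 0.0
    for _ in range(n_reps):
        # Draw training and test sets from the same population.
        x_tr = [rng.gauss(0.0, 1.0) for _ in range(n_train)]
        y_tr = [beta * x + rng.gauss(0.0, sigma) for x in x_tr]
        x_te = [rng.gauss(0.0, 1.0) for _ in range(n_test)]
        y_te = [beta * x + rng.gauss(0.0, sigma) for x in x_te]

        # Least squares through the origin: beta_hat = sum(x*y) / sum(x^2).
        sxy = sum(x * y for x, y in zip(x_tr, y_tr))
        sxx = sum(x * x for x in x_tr)
        beta_hat = sxy / sxx

        # Average squared residuals on the training and test sets.
        total_tr += sum((y - beta_hat * x) ** 2 for x, y in zip(x_tr, y_tr)) / n_train
        total_te += sum((y - beta_hat * x) ** 2 for x, y in zip(x_te, y_te)) / n_test

    # Average over replications to approximate the two expectations.
    return total_tr / n_reps, total_te / n_reps


mean_tr, mean_te = simulate()
print(f"estimated E[R_tr] = {mean_tr:.3f}")
print(f"estimated E[R_te] = {mean_te:.3f}")
```

Under these assumptions the averaged training error should come out below the averaged test error. A simulation only suggests that the inequality holds; it does not replace the proof the problem asks for.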

Added by Travis F.
