Ex. 2.9 Consider a linear regression model with \( p \) parameters, fit by least squares to a set of training data \( (x_{1}, y_{1}), \ldots, (x_{N}, y_{N}) \) drawn at random from a population. Let \( \hat{\beta} \) be the least squares estimate. Suppose we have some test data \( (\tilde{x}_{1}, \tilde{y}_{1}), \ldots, (\tilde{x}_{M}, \tilde{y}_{M}) \) drawn at random from the same population as the training data. If \( R_{tr}(\beta)=\frac{1}{N} \sum_{i=1}^{N}(y_{i}-\beta^{T} x_{i})^{2} \) and \( R_{te}(\beta)=\frac{1}{M} \sum_{i=1}^{M}(\tilde{y}_{i}-\beta^{T} \tilde{x}_{i})^{2} \), prove that
\[
E\left[R_{tr}(\hat{\beta})\right] \leq E\left[R_{te}(\hat{\beta})\right]
\]
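Before attempting the proof, it can help to see the inequality numerically. The following is a minimal Monte Carlo sketch, not a proof, and it rests on assumptions that are not part of the exercise: a synthetic population with \( y = 1 + 2x + \varepsilon \), standard normal \( x \) and \( \varepsilon \), a model with \( p = 2 \) parameters (intercept and slope), and the sample sizes \( N = 10 \), \( M = 5 \). The helper names `fit_line`, `mse`, `draw`, and `simulate` are hypothetical and chosen only for illustration.

```python
import random


def fit_line(xs, ys):
    """Least squares fit of y = b0 + b1 * x (p = 2 parameters)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1


def mse(beta, xs, ys):
    """Mean squared residual of the fitted line on the given sample."""
    b0, b1 = beta
    return sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys)) / len(xs)


def draw(n, rng):
    """Draw n points from the assumed population y = 1 + 2x + noise."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [1.0 + 2.0 * x + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


def simulate(n_train=10, n_test=5, reps=20000, seed=0):
    """Average R_tr(beta_hat) and R_te(beta_hat) over many independent draws."""
    rng = random.Random(seed)
    tr_total = 0.0
    te_total = 0.0
    for _ in range(reps):
        xtr, ytr = draw(n_train, rng)
        xte, yte = draw(n_test, rng)
        beta_hat = fit_line(xtr, ytr)  # fitted on the training data only
        tr_total += mse(beta_hat, xtr, ytr)
        te_total += mse(beta_hat, xte, yte)
    return tr_total / reps, te_total / reps


avg_train, avg_test = simulate()
print("average training error:", avg_train)
print("average test error:    ", avg_test)
```

Under these assumptions the averaged training error should come out below the averaged test error, in line with the claim. The simulation only estimates the two expectations for one particular population, so it illustrates the statement rather than establishing it.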