4. Let $M \subset \{1, \dots, p\}$. Assume that $\beta_j = 0$ for all $j \in M^c$. Show that $X_M \hat{\beta}_M = X_M (X_M^T X_M)^{-1} X_M^T Y$ is an unbiased predictor of $Y^*$.
5. Consider a simple linear regression model $Y_i = \beta_0 + x_{i1} \beta_1 + \epsilon_i$ for $i = 1, \dots, n$. Suppose that the observed sample satisfies
$\sum_{i=1}^n Y_i = \sum_{i=1}^n x_{i1} = 0$
$\sum_{i=1}^n Y_i^2 = \sum_{i=1}^n x_{i1}^2 = n$
$\frac{1}{n} \sum_{i=1}^n Y_i x_{i1} = r$
for some $0 \le r \le 1$. We will select one of two models, $M_0 = \emptyset$ (that is, the model with only an intercept) and $M_1 = \{1\}$ (that is, the model with both the intercept and the predictor), using Mallows's $C_p$.
(a) Express Mallows's $C_p$ of $M_0$ and $M_1$ in terms of $n$ and $r$.
(b) Comparing the two values of Mallows's $C_p$, we select $M_1$ when $r > r_0$ and $M_0$ when $r < r_0$. Express the threshold $r_0$ in terms of $n$.
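
A numerical sanity check for Problem 5 may help before plugging into the $C_p$ formula. The sketch below is not part of the original problem: it builds one synthetic sample that satisfies the three constraints exactly (the helper names `make_sample` and `rss`, and the values `n = 50`, `r = 0.3`, are illustrative assumptions) and computes the residual sum of squares of each model by least squares. Under the stated constraints these should agree with $\mathrm{RSS}(M_0) = n$ and $\mathrm{RSS}(M_1) = n(1 - r^2)$; whichever form of Mallows's $C_p$ the course uses can then be evaluated from these two quantities.

```python
import numpy as np


def make_sample(n, r, seed=0):
    """Hypothetical helper: build (x, y) with sum(x) = sum(y) = 0,
    sum(x^2) = sum(y^2) = n and mean(x * y) = r. Requires n >= 3."""
    rng = np.random.default_rng(seed)

    # Centered predictor scaled so that sum(x^2) = n.
    x = rng.standard_normal(n)
    x -= x.mean()
    x *= np.sqrt(n) / np.linalg.norm(x)

    # Centered direction orthogonal to x, also scaled to squared norm n.
    z = rng.standard_normal(n)
    z -= z.mean()
    z -= (z @ x) / n * x  # uses x @ x = n
    z *= np.sqrt(n) / np.linalg.norm(z)

    # Mix the two directions so that mean(x * y) = r and sum(y^2) = n.
    y = r * x + np.sqrt(1.0 - r**2) * z
    return x, y


def rss(design, y):
    """Residual sum of squares of the least-squares fit of y on `design`."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)


n, r = 50, 0.3  # illustrative values, not taken from the problem
x, y = make_sample(n, r)
ones = np.ones(n)

rss0 = rss(ones[:, None], y)                # M_0: intercept only
rss1 = rss(np.column_stack([ones, x]), y)   # M_1: intercept and predictor

print("RSS(M_0):", rss0, "vs n:", n)
print("RSS(M_1):", rss1, "vs n(1 - r^2):", n * (1 - r**2))
```

Changing `n`, `r`, or `seed` should leave both comparisons in agreement up to floating-point error, since the two identities depend only on the constraints and not on the particular sample.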