Question

10 Generalized Linear Regression
In the problems of this section
$\mathbf{x}^T \boldsymbol{\beta} = \beta_0 + \sum_{i=1}^p \beta_i x_i$
Problem 10.2.
In binary logistic regression the probability mass function of $Y|X = \mathbf{x}$ is
$P(y|\mathbf{x}; \boldsymbol{\beta}) = \sigma(y \mathbf{x}^T \boldsymbol{\beta})$,
where $\sigma(x) = \frac{1}{1 + e^{-x}}$ and $y \in \{-1, 1\}$. A training set is
$\mathcal{D}_{tr} = \{ (\mathbf{x}_1, y_1), \dots, (\mathbf{x}_n, y_n) \}$
where $y_i \in \{-1, 1\}$. The likelihood function of $\boldsymbol{\beta}$ is
$L(\boldsymbol{\beta}) \stackrel{def}{=} \prod_{i=1}^n P(y_i | \mathbf{x}_i; \boldsymbol{\beta})$
a) The negative log-likelihood is
$-l(\boldsymbol{\beta}) \stackrel{def}{=} -\ln L(\boldsymbol{\beta})$.
Show that
$-l(\boldsymbol{\beta}) = \sum_{i=1}^n \ln \left[ 1 + e^{-y_i \mathbf{x}_i^T \boldsymbol{\beta}} \right]$
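One possible route for part a), offered as a sketch rather than the intended solution: take the logarithm of the product and use $-\ln \sigma(z) = \ln \left[ 1 + e^{-z} \right]$ with $z = y_i \mathbf{x}_i^T \boldsymbol{\beta}$.
$-l(\boldsymbol{\beta}) = -\ln \prod_{i=1}^n \sigma(y_i \mathbf{x}_i^T \boldsymbol{\beta}) = -\sum_{i=1}^n \ln \frac{1}{1 + e^{-y_i \mathbf{x}_i^T \boldsymbol{\beta}}} = \sum_{i=1}^n \ln \left[ 1 + e^{-y_i \mathbf{x}_i^T \boldsymbol{\beta}} \right]$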
b) With $\boldsymbol{\beta} = (\beta_0, \beta_1, \beta_2, \dots, \beta_p)$ and $\mathbf{x}_i^T \boldsymbol{\beta} = \beta_0 + \sum_{j=1}^p \beta_j x_{ij}$, show that
$\frac{\partial}{\partial \beta_0} \ln \left[ 1 + e^{-y_i \mathbf{x}_i^T \boldsymbol{\beta}} \right] = -y_i (1 - P(y_i | \mathbf{x}_i; \boldsymbol{\beta}))$
c) With $\boldsymbol{\beta} = (\beta_0, \beta_1, \beta_2, \dots, \beta_p)$ and $\mathbf{x}_i^T \boldsymbol{\beta} = \beta_0 + \sum_{j=1}^p \beta_j x_{ij}$, show for $k = 1, \dots, p$ that
$\frac{\partial}{\partial \beta_k} \ln \left[ 1 + e^{-y_i \mathbf{x}_i^T \boldsymbol{\beta}} \right] = -y_i x_{ik} (1 - P(y_i | \mathbf{x}_i; \boldsymbol{\beta}))$
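Parts b) and c) can be handled together with the chain rule. The following is a sketch, assuming the convention $x_{i0} = 1$ (introduced here for convenience, not part of the problem statement), so that $\mathbf{x}_i^T \boldsymbol{\beta} = \sum_{j=0}^p \beta_j x_{ij}$, and writing $z_i = y_i \mathbf{x}_i^T \boldsymbol{\beta}$:
$\frac{\partial}{\partial \beta_k} \ln \left[ 1 + e^{-z_i} \right] = \frac{-e^{-z_i}}{1 + e^{-z_i}} \cdot \frac{\partial z_i}{\partial \beta_k} = -\left( 1 - \sigma(z_i) \right) y_i x_{ik} = -y_i x_{ik} (1 - P(y_i | \mathbf{x}_i; \boldsymbol{\beta}))$
The middle step uses $1 - \sigma(z) = \frac{e^{-z}}{1 + e^{-z}}$ and $\frac{\partial z_i}{\partial \beta_k} = y_i x_{ik}$. Setting $k = 0$ gives the result in b), and $k = 1, \dots, p$ gives c).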
d) One wants to find the maximum likelihood estimate $\hat{\boldsymbol{\beta}}_{ML}$ of $\boldsymbol{\beta}$ using $\mathcal{D}_{tr}$. We have
$\hat{\boldsymbol{\beta}}_{ML} = \operatorname{argmin} [-l(\boldsymbol{\beta})]$.
We assume that $\mathcal{D}_{tr}$ is such that a unique $\hat{\boldsymbol{\beta}}_{ML}$ exists. Give a stochastic gradient descent algorithm for $\hat{\boldsymbol{\beta}}_{ML}$.
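One way part d) might be approached is sketched below in Python with NumPy. It is an illustration under stated assumptions, not the expected answer: each step picks one training pair and moves $\boldsymbol{\beta}$ against the per-sample gradient from b) and c). The function names, the decaying step size `step / sqrt(t)`, the number of epochs, and the zero starting point are all choices made here for illustration; the problem statement does not specify them.

```python
import numpy as np


def sigmoid(z):
    """Logistic function sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


def neg_log_likelihood(beta, X, y):
    """-l(beta) = sum_i ln(1 + exp(-y_i x_i^T beta)); beta[0] is the intercept."""
    z = y * (beta[0] + X @ beta[1:])
    # logaddexp(0, -z) evaluates ln(1 + exp(-z)) without overflow
    return np.sum(np.logaddexp(0.0, -z))


def sgd_logistic(X, y, step=0.1, epochs=50, seed=0):
    """Hypothetical SGD sketch for labels y_i in {-1, +1}.

    X has shape (n, p) without an intercept column; a column of ones
    is prepended so that beta = (beta_0, beta_1, ..., beta_p).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    Xa = np.hstack([np.ones((n, 1)), X])  # x_{i0} = 1 handles beta_0
    beta = np.zeros(p + 1)                # assumed starting point
    t = 0
    for _ in range(epochs):
        # visit the samples in a fresh random order each epoch
        for i in rng.permutation(n):
            t += 1
            p_i = sigmoid(y[i] * (Xa[i] @ beta))    # P(y_i | x_i; beta)
            grad_i = -y[i] * (1.0 - p_i) * Xa[i]    # gradient from b) and c)
            beta -= (step / np.sqrt(t)) * grad_i    # assumed decaying step
    return beta
```

Calling `sgd_logistic(X, y)` with an `(n, p)` array `X` and a label vector `y` in $\{-1, 1\}$ returns a vector of length $p + 1$. Comparing `neg_log_likelihood` before and after the run is a simple sanity check that the iterates are reducing $-l(\boldsymbol{\beta})$.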

Added by Edward G.
