Problem #3:
(1 point) Logistic regression is a widely used algorithm for binary classification problems.
Consider a binary classification problem where the labels $y_i \in \{0, 1\}$ for $i = 1, \dots, n$ and
the feature vectors are $\{x_i\}_{i=1}^n$. The logistic regression model estimates the probability
that $y_i = 1$ given $x_i$ as:
$P(y_i = 1 \mid x_i) = \sigma(w^T x_i + b) = \frac{1}{1 + e^{-(w^T x_i + b)}}$,
where $\sigma(z)$ is the sigmoid function.
⢠Derive the logistic loss function (negative log-likelihood) for a single training
example $(x_i, y_i)$.
⢠Extend this to derive the empirical risk for the entire training dataset.