Question

Given data $\{(x_i, y_i) \in \mathbb{R}^d \times \{\pm 1\} : i \in [1, n]\}$, logistic regression is another popular classification method in Machine Learning, which amounts to the following minimization problem:
$$\min_{w \in \mathbb{R}^d} \Big\{ f(w) := \frac{1}{n}\sum_{i=1}^n \log\big(1 + e^{-y_i\langle w, x_i\rangle}\big) + \frac{\lambda}{2}\|w\|^2 \Big\},$$
where $\lambda > 0$ is a regularization parameter.
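A minimal NumPy sketch of the objective, assuming the data are stacked into an (n, d) array `X` whose rows are the $x_i$ and a length-n array `y` of ±1 labels. The function name `logistic_objective` and the argument `lam` (standing for the regularization parameter λ) are hypothetical. The term $\log(1 + e^{-m})$ is evaluated with `np.logaddexp(0, -m)` so that large negative margins do not overflow.

```python
import numpy as np

def logistic_objective(w, X, y, lam):
    """Regularized logistic loss:
    f(w) = (1/n) * sum_i log(1 + exp(-y_i <w, x_i>)) + (lam / 2) * ||w||^2

    X: (n, d) array whose rows are the x_i; y: (n,) array of labels in {-1, +1};
    lam: the regularization parameter (lambda > 0).
    """
    margins = y * (X @ w)                     # y_i <w, x_i> for every sample
    # np.logaddexp(0, -m) = log(e^0 + e^{-m}) = log(1 + e^{-m}), evaluated without overflow
    losses = np.logaddexp(0.0, -margins)
    return np.mean(losses) + 0.5 * lam * np.dot(w, w)
```

For example, at $w = 0$ every loss term equals $\log 2$ and the regularizer vanishes, so `logistic_objective(np.zeros(d), X, y, lam)` returns $\log 2 \approx 0.693$ for any data and any `lam`.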
• (15 points) Work out the gradient function $\nabla f(w)$ and the Hessian function $\nabla^2 f(w)$. Show that its gradient function $\nabla f(w)$ is Lipschitz continuous with constant $L \le \frac{1}{n}\sum_{i=1}^n \|x_i\|^2 + \lambda$, i.e.
$$\|\nabla f(w) - \nabla f(w')\| \le \Big(\frac{1}{n}\sum_{i=1}^n \|x_i\|^2 + \lambda\Big)\|w - w'\|, \quad \forall\, w, w' \in \mathbb{R}^d.$$
(Recall that in Homework 3 we have shown that the objective function $f$ of logistic regression is convex.)
• (15 points) Write down the pseudo-code of Stochastic gradient descent (SGD) for logistic regression.
• (Bonus question) Consider the average of iterates, i.e. $\bar{w}_T = \frac{1}{T}\sum_{t=1}^T w_t$, where $\{w_t : t \in [1, T]\}$ is generated by SGD with step sizes $\eta_t = \frac{1}{\lambda t}$. Prove the following convergence rate for SGD in logistic regression:
$$\mathbb{E}\big[f(\bar{w}_T)\big] - \min_{w \in \mathbb{R}^d} f(w) = O\Big(\frac{\log T}{T}\Big).$$
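The gradient, Hessian, and SGD parts can be illustrated with a minimal NumPy sketch. It assumes the data are an (n, d) array `X` whose rows are the $x_i$, labels `y` in {−1, +1}, one index sampled uniformly per step, and the starting point $w_1 = 0$; all function names (`sigmoid`, `gradient`, `hessian`, `lipschitz_bound`, `sgd_averaged`) are hypothetical. The comments use the standard chain-rule results $\nabla f(w) = -\frac{1}{n}\sum_i y_i\,\sigma(-y_i\langle w, x_i\rangle)\,x_i + \lambda w$ and $\nabla^2 f(w) = \frac{1}{n}\sum_i \sigma(y_i\langle w, x_i\rangle)\,\sigma(-y_i\langle w, x_i\rangle)\,x_i x_i^\top + \lambda I$ with $\sigma(z) = 1/(1 + e^{-z})$. Because $\sigma(z)\sigma(-z) \le 1/4$, the Hessian's spectral norm is at most $\frac{1}{4n}\sum_i \|x_i\|^2 + \lambda$, which by the mean value theorem already implies the Lipschitz bound asked for. The code only allows a numerical check of these facts; it is not a proof of the bonus rate.

```python
import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^{-z}), written as exp(-log(1 + e^{-z})) to avoid overflow
    return np.exp(-np.logaddexp(0.0, -z))

def gradient(w, X, y, lam):
    # grad f(w) = -(1/n) sum_i y_i * sigma(-y_i <w, x_i>) * x_i + lam * w
    margins = y * (X @ w)
    return X.T @ (-y * sigmoid(-margins)) / X.shape[0] + lam * w

def hessian(w, X, y, lam):
    # hess f(w) = (1/n) sum_i sigma(m_i) * sigma(-m_i) * x_i x_i^T + lam * I,
    # where m_i = y_i <w, x_i>
    margins = y * (X @ w)
    weights = sigmoid(margins) * sigmoid(-margins)   # each weight is at most 1/4
    n, d = X.shape
    return (X.T * weights) @ X / n + lam * np.eye(d)

def lipschitz_bound(X, lam):
    # The constant from the exercise: (1/n) sum_i ||x_i||^2 + lam
    return np.mean(np.sum(X ** 2, axis=1)) + lam

def sgd_averaged(X, y, lam, T, seed=0):
    # SGD with step sizes eta_t = 1 / (lam * t), starting from w_1 = 0;
    # returns the averaged iterate (1/T) * (w_1 + ... + w_T).
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    w_sum = np.zeros(d)
    for t in range(1, T + 1):
        w_sum += w                                # accumulate w_t before updating
        i = rng.integers(n)                       # index drawn uniformly from {0, ..., n-1}
        margin = y[i] * (X[i] @ w)
        # gradient of the i-th loss term plus the regularizer; its mean over i is grad f(w)
        g = -y[i] * sigmoid(-margin) * X[i] + lam * w
        w = w - g / (lam * t)
    return w_sum / T
```

Starting from $w_1 = 0$ and averaging $w_1, \dots, w_T$ matches the definition of $\bar{w}_T$ in the bonus question; other conventions (for example averaging $w_2, \dots, w_{T+1}$) are also common. Watching `np.linalg.norm(gradient(w_bar, X, y, lam))` shrink as `T` grows gives a rough empirical view of convergence.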

$Given data (xi, yi) ? ?^d % #177;1} : i ? [1, n]}, logistic regression is another popular classification method in Machine Learning, which amounts to the following minimization problem: minw ? ?^d f(w) := %1/n ?{i=1}^n log(1 + e^{-yi?w, xi?}) + ?/2 ||w||^2 }, where ? > 0 is a regularization parameter. • (15 points) Work out the gradient function ?f(w) and the Hessian function ?^2 f(w). Show that its gradient function ?f(w) is Lipschitz continuous with constant L ? 1/n ?i=1^n ||xi||^2 + ?, i.e. ||?f(w) - ?f(w')|| ? (1/n ?i=1^n ||xi||^2 + ?) ||w - w'||, ? w, w' ? ?^d. (Recall that in Homework 3 we have shown the objective function f of the logistic regression is convex) • (15 points) Write down the pseudo-code of Stochastic gradient descent (SGD) for logistic regression. • (Bonus question) Consider the average of iterates, i.e. w?T = 1/T ?t=1^T wt where wt : t ? [1, T] is generated by SGD with step sizes ?t = 1/(?t). Prove that the following convergence rate for SGD in logistic regression: ?(f2(w?T)) - minw ? ?^d f2(w) = O(log T / T).$

Added by Jessica M.

Question

Please give Ace some feedback