Regularization" is a popular method used to improve supervised learning. Specifically, for linear regression, the loss function is changed to:
J(w) = Σ(i=1 to n) (w^T * x_i - y_i)^2 + λ * ||w||^2
where the added term λ * ||w||^2 is called the regularization term, ||w|| is the length of the vector w, and λ is a parameter pre-set by the user.
Derive the formula for the gradient ∇J(w).
Assume we have a training dataset with two records: x_1 = (1, 2, 3), y_1 = 0.1, and x_2 = (1, 4, 5), y_2 = 0.2. Let w_0 = (1, 1, 1) and λ set to 0.1.
Compute the gradient ∇J(w_0).