Given a dataset D = {(Xi, Yi), i = 1, 2, ..., N}, where Xi and Yi are real numbers, the problem is to find the function f(x) = a0 + a1 x^13 + a2 cos(x^2) + a3 x^3 that minimizes the mean squared error (MSE) over D.
a) Derive a stochastic gradient descent (SGD) step for updating the parameters a = (a0, a1, a2, a3).
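A minimal sketch of one possible SGD step, assuming the per-sample loss is (f(Xi) - Yi)^2 and that one sample is drawn at a time. Writing f(x) = a0*phi0(x) + a1*phi1(x) + a2*phi2(x) + a3*phi3(x) with basis functions phi = (1, x^13, cos(x^2), x^3), the gradient of the per-sample loss is 2*(f(Xi) - Yi)*phi_j(Xi), so each step is a_j <- a_j - lr * 2*(f(Xi) - Yi)*phi_j(Xi). The names `features`, `sgd_step`, `train`, the learning rate `lr`, and the synthetic data are illustrative assumptions, not part of the problem.

```python
import math
import random


def features(x):
    # Basis functions of f(x) = a0 + a1*x^13 + a2*cos(x^2) + a3*x^3
    return [1.0, x ** 13, math.cos(x ** 2), x ** 3]


def predict(a, x):
    # f(x) is a dot product between the parameters and the basis functions
    return sum(aj * phij for aj, phij in zip(a, features(x)))


def mse(a, data):
    # Mean squared error over the whole dataset
    return sum((predict(a, x) - y) ** 2 for x, y in data) / len(data)


def sgd_step(a, x_i, y_i, lr):
    # Per-sample loss: (f(x_i) - y_i)^2
    # Gradient w.r.t. a_j: 2 * (f(x_i) - y_i) * phi_j(x_i)
    residual = predict(a, x_i) - y_i
    return [aj - lr * 2.0 * residual * phij
            for aj, phij in zip(a, features(x_i))]


def train(data, lr=0.05, epochs=200, seed=0):
    # Hypothetical training loop: visit samples in a shuffled order each epoch
    rng = random.Random(seed)
    a = [0.0, 0.0, 0.0, 0.0]
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x_i, y_i = data[i]
            a = sgd_step(a, x_i, y_i, lr)
    return a


if __name__ == "__main__":
    # Synthetic, noise-free data on [-1, 1] (an assumption for illustration)
    rng = random.Random(1)
    a_true = [1.0, -2.0, 0.5, 3.0]
    data = [(x, predict(a_true, x)) for x in (rng.uniform(-1, 1) for _ in range(50))]
    print("MSE at a = 0:  ", mse([0.0] * 4, data))
    print("MSE after SGD: ", mse(train(data), data))
```

Keeping x in [-1, 1] bounds every feature by 1 in magnitude, which keeps the fixed step size stable; with larger |x| the x^13 term would call for a much smaller learning rate or feature scaling.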
b) Is it possible for the gradient descent procedure that minimizes the mean squared error of f(x) to get stuck in a local minimum? Explain.
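One way to reason about part b), sketched here as a derivation: although f is nonlinear in x, it is linear in the parameters a, so the MSE is a quadratic function of a. The design matrix Phi (N rows, one per sample; columns j = 0..3 for the basis functions) is notation introduced for this sketch.

```latex
\phi(x) = \begin{pmatrix} 1 & x^{13} & \cos(x^2) & x^3 \end{pmatrix}^\top,
\qquad f(x) = a^\top \phi(x),
\qquad \Phi_{ij} = \phi_j(X_i)

L(a) = \frac{1}{N}\sum_{i=1}^{N}\bigl(a^\top\phi(X_i) - Y_i\bigr)^2
     = \frac{1}{N}\lVert \Phi a - Y \rVert^2

\nabla L(a) = \frac{2}{N}\Phi^\top(\Phi a - Y),
\qquad \nabla^2 L(a) = \frac{2}{N}\Phi^\top\Phi

v^\top \nabla^2 L(a)\, v = \frac{2}{N}\lVert \Phi v \rVert^2 \ge 0
\quad \text{for all } v \in \mathbb{R}^4
```

Because the Hessian is positive semidefinite everywhere, L is convex in a, so every point where the gradient vanishes is a global minimum and there are no non-global local minima to get stuck in. If Phi^T Phi is singular the minimizer is not unique, but all minimizers give the same MSE. With a constant step size, SGD may still fluctuate around the minimum because of sampling noise, which is a separate issue from local minima.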