Question 4 (40 points): For a sequential input x = {x1, ..., xt} and target output y = {y1, ..., yt}, a recurrent
neuron is defined as follows:
$z_i = \tanh(W_x x_i + W_h z_{i-1} + b_h)$
$a_i = \text{sigmoid}(W_y z_i + b_y)$
where $W_x$, $W_h$ and $W_y$ are weight matrices, $b_h$ and $b_y$ are bias vectors, $z_i$ is the hidden
state and $a_i$ is the output at time step i, and the error is computed as the difference between $a_i$ and $y_i$.
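As an illustration of the recurrence defined above, here is a minimal Python sketch of a single time step and of its repeated application over a sequence. It makes several assumptions that are not in the question: the helper names (`matvec`, `add`, `rnn_step`, `run_sequence`) are hypothetical, vectors and matrices are plain Python lists, and the initial hidden state `z0` must be supplied by the caller (a zero vector is a common choice, but the question does not specify one).

```python
import math

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def add(*vectors):
    """Element-wise sum of equally sized vectors."""
    return [sum(vals) for vals in zip(*vectors)]

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def rnn_step(x_i, z_prev, Wx, Wh, Wy, bh, by):
    """One time step: returns (z_i, a_i) following the two equations above."""
    # Hidden state: tanh(Wx x_i + Wh z_{i-1} + bh)
    z_i = [math.tanh(u) for u in add(matvec(Wx, x_i), matvec(Wh, z_prev), bh)]
    # Output: sigmoid(Wy z_i + by)
    a_i = [sigmoid(u) for u in add(matvec(Wy, z_i), by)]
    return z_i, a_i

def run_sequence(xs, z0, Wx, Wh, Wy, bh, by):
    """Apply rnn_step over the sequence, reusing the same weights at every step."""
    z, outputs = z0, []  # z0 is an assumed initial hidden state
    for x_i in xs:
        z, a = rnn_step(x_i, z, Wx, Wh, Wy, bh, by)
        outputs.append(a)
    return outputs
```

The point the sketch is meant to show is that the same `Wx`, `Wh`, `Wy`, `bh` and `by` are reused at every step, and that each hidden state depends on the previous one.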
4.1) For an input sequence x1, x2, x3 and an output sequence y1, y2, y3, write out the
forward propagation.
4.2) To compute the backpropagation gradients, you need to compute three error values: loss(y1,
a1), loss(y2, a2) and loss(y3, a3). You may assume the loss function is the mean squared error.
Write the backpropagation chain rule in terms of the weight matrices.