Problem 1: Computational Graphs and Backpropagation (25 marks)
Given a sequential dataset $D = \{(x_1, y_1), \ldots, (x_T, y_T)\}$, consider a vanilla RNN model, which computes the loss function at the $t$-th time step using the following formulas:
$a_t = b + W h_{t-1} + U x_t$
$h_t = \tanh(a_t)$
$o_t = c + v^T h_t$
$L_t = (o_t - y_t)^2,$
where $T$ is the length of the sequence. For simplicity, assume the input is three-dimensional (i.e., $x_t \in \mathbb{R}^3$), the hidden size is 2 (i.e., $h_t \in \mathbb{R}^2$), and $T = 1$ in this problem.
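The four formulas above can be sketched as a single forward step. This is a minimal illustration in plain Python, not part of the original problem: the function name `rnn_step` and its argument order are hypothetical, vectors are assumed to be lists of floats, and matrices are assumed to be lists of rows.

```python
import math

def rnn_step(x, h_prev, b, W, U, c, v, y):
    """One forward step of the vanilla RNN; returns (a, h, o, loss).

    Hypothetical helper: the name and signature are illustrative only.
    """
    # a_t = b + W h_{t-1} + U x_t
    a = [
        b[i]
        + sum(W[i][j] * h_prev[j] for j in range(len(h_prev)))
        + sum(U[i][k] * x[k] for k in range(len(x)))
        for i in range(len(b))
    ]
    # h_t = tanh(a_t), applied elementwise
    h = [math.tanh(a_i) for a_i in a]
    # o_t = c + v^T h_t (a scalar)
    o = c + sum(v_i * h_i for v_i, h_i in zip(v, h))
    # L_t = (o_t - y_t)^2
    loss = (o - y) ** 2
    return a, h, o, loss
```

With the dimensions assumed in this problem, `x` has 3 entries, `h_prev`, `b`, and `v` have 2 entries each, `W` is 2x2, `U` is 2x3, and `c` and `y` are scalars.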
Q1. Construct the computational graph for $L_1$.
Q2. What are the model parameters to be learned?
Q3. Given the following model parameters and data, compute the loss in the forward pass and the gradients with respect to all model parameters in the backward pass. We follow the common practice of using a zero initial hidden state, i.e., $h_0 = [0, 0]^T$.
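$b = [0.1, 0]^T$
$W = \begin{bmatrix} 0.5 & 0.4 \\ -0.3 & 0.7 \end{bmatrix}$
$U = \begin{bmatrix} 0.8 & -0.3 & 0.5 \\ 1 & 0.1 & 0.9 \end{bmatrix}$
$c = 0.7$
$v = [0.5, 0.8]^T$
$D = \{([1, -1, 1], 1.5)\}$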
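One way to check a hand-computed answer to Q3 is to run the forward and backward passes numerically. The sketch below is an illustration under stated assumptions rather than a reference solution: it reads $W$ as a 2x2 matrix and $U$ as a 2x3 matrix with second row $(1, 0.1, 0.9)$, as the values appear in the problem, and all variable names (`forward`, `d_o`, `d_a`, and so on) are hypothetical.

```python
import math

# Parameters and data, as read from the problem statement (assumed layout).
b = [0.1, 0.0]
W = [[0.5, 0.4], [-0.3, 0.7]]
U = [[0.8, -0.3, 0.5], [1.0, 0.1, 0.9]]
c = 0.7
v = [0.5, 0.8]
x, y = [1.0, -1.0, 1.0], 1.5
h0 = [0.0, 0.0]  # zero initial hidden state

def forward(b, W, U, c, v):
    """Forward pass for T = 1; returns (a, h, o, loss)."""
    a = [
        b[i]
        + sum(W[i][j] * h0[j] for j in range(2))
        + sum(U[i][k] * x[k] for k in range(3))
        for i in range(2)
    ]
    h = [math.tanh(a_i) for a_i in a]
    o = c + sum(v_i * h_i for v_i, h_i in zip(v, h))
    return a, h, o, (o - y) ** 2

a, h, o, loss = forward(b, W, U, c, v)  # a is approximately [1.7, 1.8]

# Backward pass: apply the chain rule from the loss back to each parameter.
d_o = 2.0 * (o - y)                                   # dL/do
d_c = d_o                                             # dL/dc
d_v = [d_o * h_i for h_i in h]                        # dL/dv = dL/do * h
d_h = [d_o * v_i for v_i in v]                        # dL/dh = dL/do * v
d_a = [d_h[i] * (1.0 - h[i] ** 2) for i in range(2)]  # tanh'(a) = 1 - tanh(a)^2
d_b = list(d_a)                                       # dL/db = dL/da
d_W = [[d_a[i] * h0[j] for j in range(2)] for i in range(2)]  # outer(dL/da, h0)
d_U = [[d_a[i] * x[k] for k in range(3)] for i in range(2)]   # outer(dL/da, x)

print("loss:", loss)
print("dL/dc:", d_c, "dL/dv:", d_v, "dL/db:", d_b)
print("dL/dW:", d_W)
print("dL/dU:", d_U)
```

Because $h_0$ is the zero vector and $T = 1$, the gradient with respect to $W$ is zero in every entry. $W$ is still a model parameter, but this single-step example carries no learning signal for it.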