Question

Consider the neural network shown below, with two inputs $x_1, x_2$, one output $\hat{y}$, and one hidden layer with three neurons. The weights are as shown in the diagram. Assume that all biases are zero and that the sigmoid function is used as the activation throughout.
(a) By representing the weight parameters in matrix form, write down matrix-vector expressions for $a^{(1)}$, the output of the hidden layer, and $\hat{y}$, the network output.
(b) Using the multivariate chain rule, find expressions for the gradients $\frac{\partial \mathcal{L}}{\partial W^{(1)}}$ and $\frac{\partial \mathcal{L}}{\partial W^{(2)}}$, where $W^{(1)}$ holds the weights between the inputs and the hidden layer, and $W^{(2)}$ holds the weights between the hidden layer and the output.
(c) Let $W^{(1)} = \begin{pmatrix} 0.4 & 0.5\\ 0.3 & 0.2\\ 0.7 & 0.1 \end{pmatrix}$ and $W^{(2)} = \begin{pmatrix} 0.8 & 0.6 & 0.9 \end{pmatrix}$. Calculate the output values at each node in the hidden layer and at the output $\hat{y}$ for input values $x_1 = 0, x_2 = 1$.
(d) The mean-squared error loss function for $N$ examples is defined as $\mathcal{L} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$, where $\hat{y}_i$ is the network output for example $i$ and $y_i$ is the target output (label) for that example. For input $x_1 = 0, x_2 = 1$ and target output $y = 1$, compute the updated network weights by performing one step of gradient descent. Show all steps in your calculation.
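
A hand calculation for parts (c) and (d) can be checked numerically. The following is a minimal sketch, not a model solution. It assumes a single example ($N = 1$) and stores $W^{(2)}$ as a flat list. The question does not state a learning rate, so `LEARNING_RATE = 0.1` is an arbitrary placeholder, and the function names `forward` and `gradient_step` are illustrative only.

```python
import math


def sigmoid(z):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(W1, W2, x):
    """Return hidden activations a1 = sigmoid(W1 x) and output sigmoid(W2 a1)."""
    z1 = [sum(w * xi for w, xi in zip(row, x)) for row in W1]
    a1 = [sigmoid(z) for z in z1]
    z2 = sum(w * a for w, a in zip(W2, a1))
    return a1, sigmoid(z2)


def gradient_step(W1, W2, x, y, lr):
    """One gradient-descent step on the squared error (y - y_hat)^2, N = 1."""
    a1, y_hat = forward(W1, W2, x)
    # Error signal at the output: dL/dy_hat times sigmoid'(z2).
    delta2 = -2.0 * (y - y_hat) * y_hat * (1.0 - y_hat)
    # dL/dW2 = delta2 * a1^T
    grad_W2 = [delta2 * a for a in a1]
    # Back-propagated signal: (W2^T delta2) times sigmoid'(z1), elementwise.
    delta1 = [w * delta2 * a * (1.0 - a) for w, a in zip(W2, a1)]
    # dL/dW1 = delta1 x^T (outer product)
    grad_W1 = [[d * xi for xi in x] for d in delta1]
    new_W1 = [[w - lr * g for w, g in zip(w_row, g_row)]
              for w_row, g_row in zip(W1, grad_W1)]
    new_W2 = [w - lr * g for w, g in zip(W2, grad_W2)]
    return new_W1, new_W2


# Values given in parts (c) and (d) of the question.
W1 = [[0.4, 0.5], [0.3, 0.2], [0.7, 0.1]]
W2 = [0.8, 0.6, 0.9]
x, y = [0.0, 1.0], 1.0
LEARNING_RATE = 0.1  # assumption: not specified in the question

a1, y_hat = forward(W1, W2, x)
new_W1, new_W2 = gradient_step(W1, W2, x, y, LEARNING_RATE)
_, y_hat_new = forward(new_W1, new_W2, x)

print("hidden activations:", [round(a, 4) for a in a1])
print("output before step:", round(y_hat, 4))
print("updated W1:", [[round(w, 4) for w in row] for row in new_W1])
print("updated W2:", [round(w, 4) for w in new_W2])
print("output after step:", round(y_hat_new, 4))
```

Because $x_1 = 0$, every gradient entry in the first column of $W^{(1)}$ is zero, so only the second column of $W^{(1)}$ and the entries of $W^{(2)}$ change in this step.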



Added by Renee B.
