Texts: X is the last digit of your student ID, and Y is the second-to-last digit. (For example, if your student ID is 7890456, then X=6 and Y=5.)
Question 4: (5 points)
Calculate the cosine distance between the vectors [1, 2+√X] and [2, 3+√2].
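A minimal sketch for checking a hand calculation, assuming the common convention that cosine distance is 1 minus cosine similarity (some texts define it differently, so confirm against the course definition). The function names and the parameter x, which stands for the digit X, are illustrative and not part of the question.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cos(angle between u and v); assumes both vectors are nonzero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def question_vectors(x):
    """Build the two vectors from the question for a given digit x (0..9)."""
    return [1.0, 2.0 + math.sqrt(x)], [2.0, 3.0 + math.sqrt(2.0)]

if __name__ == "__main__":
    # Example digit taken from the student ID 7890456 used in the text (X=6).
    u, v = question_vectors(6)
    print(cosine_distance(u, v))
```

Both vectors lie in the first quadrant, so the distance falls between 0 and 1 for every digit X.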
Question 5: (5 points)
Which of the following statements are TRUE about autoencoders and variational autoencoders?
(i) Variational autoencoders enforce latent space distribution priors while autoencoders do not.
(ii) Autoencoders can be trained for denoising while variational autoencoders cannot.
(iii) Both autoencoders and variational autoencoders use KL-divergence as the regularization term.
Question 6: (5 points)
Which of the following statements are TRUE about recurrent neural networks?
(i) An Elman network is less likely to suffer from "vanishing gradient" or "exploding gradient" issues than an LSTM network.
(ii) For an RNN model, if the sequence length is increased, the number of parameters increases.
(iii) RNNs can be stacked to create multiple layers.
Question 7: (15 points)
Consider a vision transformer model that classifies an image. The image size is 512x512. The image is split into a total of 8x8=64 patches. Each patch is then vectorized and mapped to an embedding vector. The embedding dimension is 12. In the transformer, the number of heads is 1, the number of blocks is 1, and the output dimension is 2.
(a) What is the size of each patch?
(b) What is the dimension of the embedding matrix?
(c) What is the dimension of the query matrix?
(d) What is the dimension of the attention matrix?
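The shape bookkeeping behind parts (a) to (d) can be sketched as plain arithmetic. This is an illustration under stated assumptions, not an official solution: the question does not give the number of image channels or say whether a class token is prepended, so both are exposed as hypothetical parameters (channels, use_cls_token). "Embedding matrix" can mean either the projection weights or the resulting token embeddings, so both shapes are reported.

```python
def vit_shapes(image_size=512, grid=8, embed_dim=12, channels=1, use_cls_token=False):
    """Return the main tensor shapes of a single-head, single-block ViT.

    channels and use_cls_token are assumptions; the question does not state them.
    """
    patch_side = image_size // grid                      # pixels per patch side
    num_patches = grid * grid                            # 8 x 8 = 64 in the question
    patch_vector_len = patch_side * patch_side * channels
    num_tokens = num_patches + (1 if use_cls_token else 0)
    return {
        "patch_size": (patch_side, patch_side),
        # Weights of the linear map from a flattened patch to its embedding.
        "projection_weight": (patch_vector_len, embed_dim),
        # One embedding vector per token.
        "token_embeddings": (num_tokens, embed_dim),
        # With one head, the per-head dimension equals the embedding dimension.
        "query": (num_tokens, embed_dim),
        # Attention scores relate every token to every other token.
        "attention": (num_tokens, num_tokens),
    }

if __name__ == "__main__":
    for name, shape in vit_shapes().items():
        print(name, shape)
```

Setting use_cls_token=True adds one row to the token, query, and attention shapes. Setting channels=3 triples the length of each flattened patch.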
Question 8: (5 points)
Consider an object detector model that predicts both bounding boxes and class labels. What could be the loss function for such a model? Explain.
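One common pattern, sketched here for a single predicted box, is a weighted sum of a classification term and a box regression term. The choice of cross-entropy plus smooth L1 and the weight box_weight are assumptions made for illustration. Other answers, such as IoU-based box losses or focal loss, are equally valid.

```python
import math

def cross_entropy(logits, target):
    """Negative log-softmax probability of the target class (numerically stable)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def smooth_l1(pred_box, true_box, beta=1.0):
    """Smooth L1 summed over box coordinates: quadratic near zero, linear beyond beta."""
    total = 0.0
    for p, t in zip(pred_box, true_box):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total

def detection_loss(logits, target, pred_box, true_box, box_weight=1.0):
    """Classification loss plus weighted localization loss for one box."""
    return cross_entropy(logits, target) + box_weight * smooth_l1(pred_box, true_box)

if __name__ == "__main__":
    # Hypothetical values: 3 classes, boxes given as (x, y, w, h).
    print(detection_loss([2.0, 0.5, -1.0], 0, [0.4, 0.5, 0.2, 0.3], [0.5, 0.5, 0.25, 0.3]))
```

The weight trades off getting the class right against placing the box accurately. It is usually tuned as a hyperparameter.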
Question 9: (5 points)
What could be the loss function for an image segmentation network, such as U-Net? Explain.
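Two losses often used for segmentation are per-pixel cross-entropy and Dice loss. The sketch below covers the binary case on flattened masks and is only an illustration: the function names, the eps smoothing constant, and the restriction to two classes are assumptions, not part of the question.

```python
import math

def pixel_bce(probs, mask, eps=1e-7):
    """Mean binary cross-entropy over pixels; probs and mask are flat lists."""
    total = 0.0
    for p, g in zip(probs, mask):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(g * math.log(p) + (1 - g) * math.log(1.0 - p))
    return total / len(probs)

def dice_loss(probs, mask, eps=1e-7):
    """1 - Dice coefficient, measuring overlap between prediction and ground truth."""
    intersection = sum(p * g for p, g in zip(probs, mask))
    denominator = sum(probs) + sum(mask)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)

if __name__ == "__main__":
    # Hypothetical 2x2 prediction and mask, flattened row by row.
    probs = [0.9, 0.2, 0.8, 0.1]
    mask = [1, 0, 1, 0]
    print(pixel_bce(probs, mask), dice_loss(probs, mask))
```

Cross-entropy treats each pixel independently, while Dice loss scores overlap over the whole mask. That makes Dice less sensitive to a large background class, and the two are frequently summed.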