Problem 4. (50 points) Consider a multivariate linear regression problem of mapping $\mathbb{R}^d$ to $\mathbb{R}$, with two different objective functions. The first objective function is the sum of squared errors, as presented in class; i.e., $\sum_{i=1}^n e_i^2$, where $e_i = w_0 + \sum_{j=1}^d w_j x_{ij} - y_i$. The second objective function is the sum of squared Euclidean distances to the hyperplane; i.e., $\sum_{i=1}^n r_i^2$, where $r_i$ is the Euclidean distance from the point $(x_i, y_i)$ to the hyperplane $f(x) = w_0 + \sum_{j=1}^d w_j x_j$.
a) (10 points) Derive a gradient descent algorithm to find the parameters of the model that minimizes the sum of squared errors.
b) (20 points) Derive a gradient descent algorithm to find the parameters of the model that minimizes the sum of squared distances.
c) (20 points) Implement both algorithms and test them on 3 different datasets. Datasets can be randomly generated, as in class, or obtained from resources such as the UCI Machine Learning Repository. Compare the solutions to the closed-form (maximum likelihood) solution derived in class and find $R^2$ in all cases on the same dataset used to fit the parameters; i.e., do not implement cross-validation. Briefly describe the data you use and discuss your results.
Added by Lisa T.
Step 1
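We want to minimize the sum of squared errors $J(w) = \sum_{i=1}^n e_i^2$. By the chain rule, the gradient of $J$ with respect to $w$ is $\nabla J(w) = 2\sum_{i=1}^n e_i \nabla e_i$. The gradient of $e_i$ with respect to $w = (w_0, w_1, \dots, w_d)$ is $\nabla e_i = (1, x_{i1}, \dots, x_{id})$; that is, the $i$-th input vector $x_i$ with a leading 1 for the intercept $w_0$.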
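The gradient above can be turned into a small batch gradient descent routine. The following is a minimal sketch, not the class solution: the function names, the synthetic dataset, the learning rate, and the iteration count are all assumptions chosen for illustration. For the second objective it assumes the point-to-hyperplane distance formula $r_i^2 = e_i^2 / (1 + \sum_{j=1}^d w_j^2)$, which follows from treating $w_0 + \sum_j w_j x_j - y = 0$ as a hyperplane in $\mathbb{R}^{d+1}$ with normal vector $(w_1, \dots, w_d, -1)$, and differentiates it with the quotient rule. Each step uses the gradient divided by $n$ so that one learning rate works across dataset sizes.

```python
import numpy as np

def add_intercept(X):
    """Prepend a column of ones so that w[0] plays the role of w_0."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

def sse_gradient(w, X1, y):
    """Gradient of sum_i e_i^2, i.e. 2 * sum_i e_i * (1, x_i)."""
    e = X1 @ w - y
    return 2.0 * (X1.T @ e)

def distance_loss(w, X1, y):
    """Sum of squared distances: sum_i e_i^2 / (1 + sum_{j>=1} w_j^2)."""
    e = X1 @ w - y
    return (e @ e) / (1.0 + w[1:] @ w[1:])

def distance_gradient(w, X1, y):
    """Gradient of distance_loss via the quotient rule (w_0 is not in the denominator)."""
    e = X1 @ w - y
    s = 1.0 + w[1:] @ w[1:]
    grad = 2.0 * (X1.T @ e) / s
    grad[1:] -= 2.0 * (e @ e) * w[1:] / s**2
    return grad

def gradient_descent(grad_fn, X1, y, lr=0.1, n_iter=5000):
    """Batch gradient descent from w = 0; lr and n_iter are illustrative defaults."""
    w = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        w -= lr * grad_fn(w, X1, y) / len(y)  # step along the mean gradient
    return w

def r_squared(w, X1, y):
    """Coefficient of determination on the data used for fitting."""
    residual = y - X1 @ w
    return 1.0 - (residual @ residual) / np.sum((y - y.mean()) ** 2)

# Hypothetical synthetic dataset: 200 points, 3 features, small Gaussian noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 1.5 + X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=200)
X1 = add_intercept(X)

w_closed = np.linalg.lstsq(X1, y, rcond=None)[0]  # closed-form least squares
w_sse = gradient_descent(sse_gradient, X1, y)
w_dist = gradient_descent(distance_gradient, X1, y)

for name, w in [("closed form", w_closed), ("GD, squared errors", w_sse),
                ("GD, squared distances", w_dist)]:
    print(f"{name:>22}: w = {np.round(w, 4)}, R^2 = {r_squared(w, X1, y):.4f}")
```

On well-scaled data like this, the squared-error run should agree with the closed-form solution, while the squared-distance run minimizes a different objective and will generally return slightly different weights. Features on very different scales would likely need standardization or a smaller learning rate.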
Recommended Videos
Problem 3. Support Vector Machines (25 points) Given a linearly separable data set where each sample is from either class $y_i = 1$ or class $y_i = -1$, a linear SVM can always find the separating hyperplane with the maximum margin. Assume we know that the two nearest training samples from different classes are $(x_j, y_j = 1)$ and $(x_k, y_k = -1)$, and that the parameters of the linear SVM are $w$ and $b$. The linear SVM tries to maximize the margin: $\arg\max_{w,b} d = d_+ - d_- = \frac{w^T x_j}{\|w\|_2} - \frac{w^T x_k}{\|w\|_2}$ (1), where $d_+$ ($d_-$) denotes the distance of $x_j$ ($x_k$) to the separating hyperplane. 1. (15 points) Formulate the optimization problem (objective function and constraints) of the "hard-margin" linear SVM (i.e., the SVM without slack variables). 2. (10 points) In practice, data points often cannot be well separated by a "hard margin". Can you provide a solution to this problem?
Shyam P.
Problem 1: Linear Regression and Gradient Learning [30 points] In class, we derived linear regression and various learning algorithms based on gradient descent. In addition to the least squares objective, we also learned its probabilistic perspective, where each observation is assumed to have Gaussian noise. The noise of each example is an independent and identically distributed sample from a normal distribution. In this problem, you are supposed to deal with the following regression model that includes two linear features and one quadratic feature: $y = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1^2 + \epsilon$, where $\epsilon \sim N(0, \sigma^2)$. Your goal is to develop a gradient descent learning algorithm that will estimate the best parameters $\theta = \{\theta_0, \theta_1, \theta_2, \theta_3\}$. Given the definition of noise, derive the corresponding mean and variance parameters of the normal distribution for $y \mid x_1, x_2; \theta$. Also, write down its probability density function. You are provided with training observations $D = \{(x_1^{(i)}, x_2^{(i)}, y^{(i)}) \mid 1 \le i \le m\}$. Derive the conditional log-likelihood that will later be maximized to make $D$ most likely. If you omit the constant term that does not relate to the parameters, what will be the objective function $J(\theta)$ on which you are going to perform Maximum Likelihood Estimation? Does it look similar to the least squares objective for this problem? Compute the gradient of $J(\theta)$ with respect to each parameter. (Hint: You should evaluate the partial derivatives of $J(\theta)$ with respect to each $\theta_j$, for $0 \le j \le 3$.) [Coding] Develop two learning algorithms from scratch: batch and stochastic gradient descent for this problem on the Auto dataset given in the Problem in Homework. Compare and contrast the performance of your batch gradient descent, your stochastic gradient descent, and R's built-in function lm. Are the two best input features for predicting the output mpg the same across the different algorithms?
(Hint: At a minimum, your stochastic gradient algorithm must learn parameters comparable to the result of calling R's built-in function. Otherwise, try tuning the learning rate.)
Sri K.
Problem 2 (20 points) You collect a set of data ($n = 100$ observations) containing a single predictor and a quantitative response. You then fit a linear regression model to the data, as well as a separate cubic regression, i.e. $Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3 + \epsilon$. (a) (5 points) Suppose that the true relationship between $X$ and $Y$ is linear, i.e. $Y = \beta_0 + \beta_1 X + \epsilon$. Consider the training residual sum of squares (RSS) for the linear regression, and also the training RSS for the cubic regression. Would we expect one to be lower than the other, would we expect them to be the same, or is there not enough information to tell? Justify your answer. (b) (5 points) Answer (a) using test rather than training RSS. (c) (5 points) Suppose that the true relationship between $X$ and $Y$ is not linear, but we don't know how far it is from linear. Consider the training RSS for the linear regression, and also the training RSS for the cubic regression. Would we expect one to be lower than the other, would we expect them to be the same, or is there not enough information to tell? Justify your answer. (d) (5 points) Answer (c) using test rather than training RSS.
Paul A.