Measuring distance between observations based on their predictor values underlies many machine learning methods. Compare different approaches to measuring distances between observations.
Added by Lori J.
Step 1
Common distance metrics include Euclidean distance, Manhattan distance, Minkowski distance, and Hamming distance.
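To make the comparison concrete, here is a minimal sketch of the four metrics named above, written in plain Python. The function names and the example vectors are illustrative assumptions, not part of the original answer. Euclidean and Manhattan distance are the Minkowski distance with p = 2 and p = 1 respectively, while Hamming distance counts mismatched positions and therefore suits categorical predictors.

```python
import math

def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length numeric vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def manhattan(a, b):
    """Manhattan (city-block) distance: sum of absolute differences (Minkowski, p = 1)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    """Euclidean distance: square root of summed squared differences (Minkowski, p = 2)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hamming(a, b):
    """Hamming distance: number of positions at which the two sequences differ."""
    return sum(x != y for x, y in zip(a, b))

# Illustrative vectors (made up for this sketch).
u, v = (0.0, 0.0), (3.0, 4.0)
print(euclidean(u, v), manhattan(u, v), minkowski(u, v, 3))
print(hamming(("IT", "yes"), ("Stat", "yes")))
```

Because Euclidean, Manhattan, and Minkowski distances add up raw differences, predictors measured on large scales dominate the result unless the data are normalized first.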
Recommended Videos
7.1 Calculating Distance with Categorical Predictors. This exercise with a tiny dataset illustrates the calculation of Euclidean distance and the creation of binary dummies. The online education company Statistics.com segments its customers and prospects into three main categories: IT professionals (IT), statisticians (Stat), and other (Other). It also tracks, for each customer, the number of years since first contact (years). Consider the following customers; information about whether they have taken a course or not (the outcome to be predicted) is included:
Customer 1: Stat, 1 year, did not take the course
Customer 2: Other, 1.1 years, took the course
a. Consider now the following new prospect:
Prospect 1: IT, 1 year
Using the above information on the two customers and one prospect, create one dataset for all three with the categorical predictor variable transformed into 2 binaries, and a similar dataset with the categorical predictor variable transformed into 3 binaries.
b. For each derived dataset, calculate the Euclidean distance between the prospect and each of the other two customers. (Note: while it is typical to normalize data for k-NN, this is not an iron-clad rule and you may proceed here without normalization.)
c. Using k-NN with k = 1, classify the prospect as taking or not taking a course using each of the two derived datasets. Does it make a difference whether you use 2 or 3 dummies?
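The calculation in exercise 7.1 can be sketched as follows. This is one possible reading, not an official solution: the helper names (`encode`, `classify_1nn`) are hypothetical, and the 2-dummy version assumes that Other is the dropped (baseline) level, which the exercise does not specify.

```python
import math

def encode(category, years, levels):
    """Dummy-code `category` against `levels`, then append the numeric predictor."""
    return [1.0 if category == lvl else 0.0 for lvl in levels] + [years]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Data taken from the exercise text.
customers = [
    ("Stat", 1.0, "did not take the course"),   # Customer 1
    ("Other", 1.1, "took the course"),          # Customer 2
]
prospect = ("IT", 1.0)                          # Prospect 1

def classify_1nn(levels):
    """Return (predicted outcome, distances to each customer) for k = 1."""
    p = encode(prospect[0], prospect[1], levels)
    dists = [euclidean(p, encode(cat, yrs, levels)) for cat, yrs, _ in customers]
    nearest = min(range(len(customers)), key=lambda i: dists[i])
    return customers[nearest][2], dists

# Three dummies: one indicator per category.
print(classify_1nn(["IT", "Stat", "Other"]))
# Two dummies: ASSUMPTION that "Other" is the omitted baseline level.
print(classify_1nn(["IT", "Stat"]))
```

With three dummies the distances are sqrt(2) and sqrt(2.01), so Customer 1 is nearest and the prospect is classified as not taking the course. With two dummies and Other as the baseline they are sqrt(2) and sqrt(1.01), so Customer 2 is nearest and the classification flips. The 2-dummy result depends on the baseline: dropping IT instead gives distances 1 and sqrt(1.01), which again favors Customer 1.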
Sri K.
The table below provides a training dataset containing six observations, three predictors, and one qualitative response variable. Suppose we wish to use this dataset to make predictions for Y when X1 = X2 = X3 = 0 using K-nearest neighbors.
(a) Compute the Euclidean distance between each observation and the test point, X1 = X2 = X3 = 0.
(b) What is our prediction with K = 1? Why?
(c) What is our prediction with K = 3? Why?
Areen D.
The table below provides a training data set containing six observations, three predictors, and one qualitative response variable. Suppose we wish to use this data set to make a prediction for Y when X1 = X2 = X3 = 0 using K-nearest neighbors.
(a) Compute the Euclidean distance between each observation and the test point, X1 = X2 = X3 = 0.
(b) What is our prediction with K = 1? Why?
(c) What is our prediction with K = 3? Why?
(d) If the Bayes decision boundary in this problem is highly non-linear, then would we expect the best value for K to be large or small? Why?
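The procedure in parts (a) through (c) can be sketched generically. The exercise's table is not reproduced on this page, so the rows below are made-up placeholders chosen only to show the mechanics; `knn_predict` is a hypothetical helper name, and `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k training rows closest to `query` (Euclidean distance)."""
    ranked = sorted(train, key=lambda row: math.dist(row[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# PLACEHOLDER data, NOT the table from the exercise: ((X1, X2, X3), Y).
train = [
    ((1, 0, 0), "A"),   # distance 1 from the origin
    ((0, 2, 0), "B"),   # distance 2
    ((0, 0, 3), "B"),   # distance 3
    ((2, 2, 2), "A"),   # distance sqrt(12)
]
query = (0, 0, 0)

print(knn_predict(train, query, k=1))
print(knn_predict(train, query, k=3))
```

On these placeholder rows, K = 1 returns the label of the single closest point, while K = 3 returns the majority label among the three closest, which shows how the prediction can change with K.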