Explain how to work this out, especially D2().
Exercise 4: Solution
We first normalize all numerical variables of all instances. The new table is the following:
          gen      age    bmi     city         ill
x(1)      male     0.24   1       Bristol      no
x(2)      female   0.48   0.524   London       no
x(3)      female   0.94   0.286   Edinburgh    yes
x(4)      male     0      0.410   London       yes
x(5)      male     1      0       Birmingham   no
x(6)      female   0.12   0.924   Birmingham   yes
x(new)    female   0.1    0.162   Birmingham
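The solution does not say which normalization was used. Each numerical column contains both a 0 and a 1, which suggests min-max scaling. A minimal sketch under that assumption follows; the raw ages are made up for illustration, since the un-normalized table is not shown here, and the function name is hypothetical.

```python
def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical raw ages, for illustration only (not the exercise's data).
raw_ages = [20, 30, 70]
print(min_max_normalize(raw_ages))  # [0.0, 0.2, 1.0]
```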
We next find the squared distance between x(new) and each of the other instances, for each variable separately.
2023/2024
Exercise 4: Solution (continued)
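The squared distances for each variable are given below. For a categorical variable (gen, city) the squared distance is 0 if the two values match and 1 otherwise. For a numerical variable (age, bmi) it is the squared difference of the normalized values, e.g. (0.24 - 0.1)^2 = 0.0196 for the age of x(1). D2() is the Euclidean distance: the square root of the sum of the four squared distances in the row. For x(1), D2 = sqrt(1 + 0.0196 + 0.70 + 1) = sqrt(2.72) = 1.65.

                       gen   age      bmi     city   D2()   y(i)
Dist(x(new), x(1))     1     0.0196   0.70    1      1.65   no
Dist(x(new), x(2))     0     0.1444   0.13    1      1.13   no
Dist(x(new), x(3))     0     0.7056   0.015   1      1.31   yes
Dist(x(new), x(4))     1     0.01     0.06    1      1.44   yes
Dist(x(new), x(5))     1     0.81     0.026   0      1.36   no
Dist(x(new), x(6))     0     0.0004   0.58    0      0.76   yes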
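The D2() column can be reproduced with a short script. This is a sketch that assumes D2 is the Euclidean distance, i.e. the square root of the summed per-variable squared distances, with a mismatch on a categorical variable contributing 1 and a match contributing 0. That reading agrees with all six rows of the table. The function names are illustrative, not part of the exercise.

```python
import math

# Normalized training instances: (gen, age, bmi, city, ill).
data = {
    1: ("male", 0.24, 1.0, "Bristol", "no"),
    2: ("female", 0.48, 0.524, "London", "no"),
    3: ("female", 0.94, 0.286, "Edinburgh", "yes"),
    4: ("male", 0.0, 0.410, "London", "yes"),
    5: ("male", 1.0, 0.0, "Birmingham", "no"),
    6: ("female", 0.12, 0.924, "Birmingham", "yes"),
}
x_new = ("female", 0.1, 0.162, "Birmingham")


def squared_distances(a, b):
    """Per-variable squared distances, in the order gen, age, bmi, city."""
    result = []
    for u, v in zip(a, b):
        if isinstance(u, str):
            # Categorical variable: 0 if the values match, 1 otherwise.
            result.append(0.0 if u == v else 1.0)
        else:
            # Numerical variable: squared difference of normalized values.
            result.append((u - v) ** 2)
    return result


def d2(a, b):
    """Euclidean distance: square root of the summed squared distances."""
    return math.sqrt(sum(squared_distances(a, b)))


# Drop the label (last field) before comparing with x_new.
distances = {i: d2(x_new, row[:4]) for i, row in data.items()}
for i, dist in distances.items():
    print(f"x({i}): D2 = {dist:.2f}")
```

Rounded to two decimals, the printed values match the D2() column: 1.65, 1.13, 1.31, 1.44, 1.36, and 0.76.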
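The 3-nearest neighbours of x(new) are the instances with the smallest D2() values: x(6) (0.76), x(2) (1.13), and x(3) (1.31).
Two of these three neighbours, x(3) and x(6), have ill = yes, so by majority vote y(new) is predicted to be yes.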
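The final step can be sketched as follows: sort the instances by their D2() value, keep the k = 3 closest, and take a majority vote over their labels. The distances and labels are copied from the table above. The majority vote is the usual k-NN rule and is assumed here, since the solution does not name it.

```python
from collections import Counter

# (D2 distance to x(new), label) for x(1)..x(6), copied from the table.
neighbours = {
    1: (1.65, "no"),
    2: (1.13, "no"),
    3: (1.31, "yes"),
    4: (1.44, "yes"),
    5: (1.36, "no"),
    6: (0.76, "yes"),
}
k = 3

# Indices of the k instances with the smallest distance.
nearest = sorted(neighbours, key=lambda i: neighbours[i][0])[:k]

# Majority vote over the labels of those k instances.
votes = Counter(neighbours[i][1] for i in nearest)
prediction = votes.most_common(1)[0][0]

print(nearest, prediction)  # [6, 2, 3] yes
```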