The questions involve the data set for asking prices of Richmond
townhouses obtained on 2014.11.03.
For your subset, the response variable is the asking price divided
by 10000:
askpr=c(41.99, 46.8, 60.8, 55.8, 56.88, 56.8, 53.9, 73.8, 25.9,
50.8, 61.5, 48.5, 54.98, 68.8, 45.99, 58.39, 52.4, 51.99, 48.8,
58.8, 33.7, 49.9, 58.68, 57.5, 79.99, 62.8888, 73.9, 50.5, 65.8,
40.8, 71.99, 78.8, 59.8, 81.9, 47.8, 62.9, 68.8, 54.8, 55.2, 47.9,
50.8, 65.99, 86.8, 68.5, 54.8, 53.8, 79.8, 57.8, 51.68, 77.8)
The explanatory variables are:
(i) finished floor area divided by 100
ffarea=c(12.9, 16.2, 13.2, 13.06, 15.78, 15.5, 11.84, 17.54, 6.1,
12.27, 14.5, 14.8, 13.06, 16.9, 16.01, 15.09, 16.22, 12.09, 14.8,
17.37, 12, 15.6, 13.96, 13.46, 22, 15.77, 15.15, 12.26, 13.45, 14,
15.05, 19.48, 17.63, 20.95, 13.34, 14, 15.95, 11.26, 15.3, 12.1,
16.6, 22.78, 15.08, 15.76, 15.46, 12.22, 15.25, 13.84, 15.1,
16.5)
(ii) age
age=c(44, 30, 3, 0, 17, 23, 15, 9, 11, 17, 7, 24, 1, 8, 25, 8, 25,
7, 50, 26, 28, 20, 9, 10, 20, 6, 0, 3, 1, 38, 8, 11, 26, 19, 32, 5,
18, 0, 9, 7, 23, 35, 1, 4, 41, 9, 3, 10, 20, 3)
(iii) monthly maintenance fee divided by 10
mfee=c(23.2, 16, 18.9, 18.6, 17.3, 17.4, 21, 18.2, 17.1, 25.2,
18.7, 16.1, 19.6, 19.4, 33.7, 20.3, 36.4, 18.1, 25, 31, 25.9, 27,
22, 22.1, 26.7, 35.7, 22.2, 18, 18.2, 23, 22.3, 20.4, 32, 34.8,
24.5, 19.6, 23.6, 24.8, 16.9, 18, 19.9, 57.4, 48.8, 22.1, 31, 18.5,
35, 16, 24.5, 25.4)
(iv) number of bedrooms
beds=c(3, 4, 3, 3, 4, 3, 2, 4, 1, 2, 3, 3, 3, 4, 3, 4, 3, 3, 3, 3,
2, 3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 3, 5, 1, 3, 3, 3, 2, 3, 3, 4, 2,
3, 4, 3, 3, 2, 3, 3, 4)
You are to make a prediction of the response variable when
ffarea=17, age=12, mfee=28, beds=3.
You are to fit three multiple regression models with the response
variable askpr:
(i) 2 explanatory variables ffarea, age
(ii) 3 explanatory variables ffarea, age, mfee
(iii) 4 explanatory variables ffarea, age, mfee, beds
After you have copied the above R vectors into your R session, you
can get a dataframe with
richmondtownh=data.frame(cbind(askpr,ffarea,age,mfee,beds))
Please give all non-integer answers below to 3 decimal places.
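The three fits listed above can be sketched as follows. This is only a minimal illustration, assuming the data frame has the columns askpr, ffarea, age, mfee and beds; the helper name fit_three_models is hypothetical and not part of the assignment.

```r
# Hypothetical helper: fits the three models named in the text to a data
# frame with columns askpr, ffarea, age, mfee, beds.
fit_three_models <- function(d) {
  list(
    two   = lm(askpr ~ ffarea + age, data = d),
    three = lm(askpr ~ ffarea + age + mfee, data = d),
    four  = lm(askpr ~ ffarea + age + mfee + beds, data = d)
  )
}

# Runs only once the vectors have been pasted and the data frame built.
if (exists("richmondtownh")) {
  fits <- fit_three_models(richmondtownh)
  print(lapply(fits, coef))
}
```

Keeping the fits in a named list makes it easy to apply the same summary step to all three models later.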
Part a)
The values of adjusted $R^2$ for the above models with 2,
3 and 4 explanatory variables are respectively:
2 explanatory:
3 explanatory:
4 explanatory:
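As a rough check on Part a, adjusted $R^2$ can be read from summary() or recomputed from the residual and total sums of squares. The sketch below assumes a model with an intercept; the function name adj_r2_manual is hypothetical.

```r
# Hypothetical helper: adjusted R^2 computed by hand for an lm fit that
# includes an intercept: 1 - (SSE / residual df) / (SST / (n - 1)).
adj_r2_manual <- function(fit) {
  y   <- model.response(model.frame(fit))
  n   <- length(y)
  sse <- sum(residuals(fit)^2)      # residual sum of squares
  sst <- sum((y - mean(y))^2)       # total sum of squares
  1 - (sse / df.residual(fit)) / (sst / (n - 1))
}

# Runs only if the data frame from the text exists in the session.
if (exists("richmondtownh")) {
  fit2 <- lm(askpr ~ ffarea + age, data = richmondtownh)
  print(round(c(manual  = adj_r2_manual(fit2),
                builtin = summary(fit2)$adj.r.squared), 3))
}
```

The same call works for the 3- and 4-variable fits; only the model formula changes.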
Part b)
For the best of these 3 models based on adjusted $R^2$, the
number of explanatory variables is:
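One possible way to pick the model for Part b is to compare the adjusted $R^2$ values across a named list of fits. This is a sketch only; best_by_adj_r2 is a hypothetical name, and the list names used in the usage lines are assumptions.

```r
# Hypothetical helper: returns the name of the fit with the largest
# adjusted R^2 in a named list of lm objects.
best_by_adj_r2 <- function(fits) {
  adj <- sapply(fits, function(f) summary(f)$adj.r.squared)
  names(which.max(adj))
}

# Runs only if the data frame from the text exists in the session.
if (exists("richmondtownh")) {
  fits <- list(
    two   = lm(askpr ~ ffarea + age, data = richmondtownh),
    three = lm(askpr ~ ffarea + age + mfee, data = richmondtownh),
    four  = lm(askpr ~ ffarea + age + mfee + beds, data = richmondtownh)
  )
  print(best_by_adj_r2(fits))
}
```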
Part c)
For the best of these 3 models based on adjusted $R^2$, the
least squares coefficient for ffarea is
and a 95% confidence interval for $\beta_{ffarea}$ is
to
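For Part c, the coefficient and its interval can be pulled from a fitted model with coef() and confint(). The sketch below is illustrative; coef_with_ci is a hypothetical helper, and the two-variable model in the usage lines is only a stand-in for whichever model Part b selects.

```r
# Hypothetical helper: estimate and confidence interval for one term of
# an lm fit, using the t-based interval that confint() returns.
coef_with_ci <- function(fit, term, level = 0.95) {
  est <- coef(fit)[[term]]
  ci  <- confint(fit, parm = term, level = level)
  c(estimate = est, lower = unname(ci[1, 1]), upper = unname(ci[1, 2]))
}

# Runs only if the data frame from the text exists in the session.
# The model below is a stand-in; substitute the model chosen in Part b.
if (exists("richmondtownh")) {
  fit <- lm(askpr ~ ffarea + age, data = richmondtownh)
  print(round(coef_with_ci(fit, "ffarea"), 3))
}
```

The interval is the estimate plus or minus the t critical value (on the residual degrees of freedom) times the coefficient's standard error.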
Part d)
For the best of these 3 models based on adjusted $R^2$, get the
prediction, SE and 95% prediction interval when the future values
of the explanatory variables are: ffarea=17, age=12, mfee=28,
beds=3.
Note: the SE can be solved for from the prediction interval, or
manually using the complicated equation (with the big square root!)
from lecture.
prediction:
SE of the prediction:
upper endpoint of the 95% prediction interval:
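For Part d, the note about solving for the SE from the prediction interval can be sketched with predict(). The helper name predict_with_se is hypothetical, and the model in the usage lines is only a stand-in for whichever model Part b selects.

```r
# Hypothetical helper: point prediction, prediction interval, and the
# prediction SE recovered as (upper - fit) / t critical value.
predict_with_se <- function(fit, newdata, level = 0.95) {
  pint  <- predict(fit, newdata = newdata,
                   interval = "prediction", level = level)
  tcrit <- qt(1 - (1 - level) / 2, df = df.residual(fit))
  c(prediction = unname(pint[1, "fit"]),
    se         = unname((pint[1, "upr"] - pint[1, "fit"]) / tcrit),
    lower      = unname(pint[1, "lwr"]),
    upper      = unname(pint[1, "upr"]))
}

# Runs only if the data frame from the text exists in the session.
# The model below is a stand-in; substitute the model chosen in Part b.
if (exists("richmondtownh")) {
  fit <- lm(askpr ~ ffarea + age, data = richmondtownh)
  new <- data.frame(ffarea = 17, age = 12, mfee = 28, beds = 3)
  print(round(predict_with_se(fit, new), 3))
}
```

The SE recovered this way should agree with sqrt(s^2 + se.fit^2), where s is the residual standard error and se.fit is the standard error of the fitted mean from predict(..., se.fit = TRUE).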