4. Consider the "sale.txt" data posted (at onQ, below this file). A company conducted a survey of a new product to develop a marketing strategy. The response Y is the amount (in thousand dollars) that each individual can spend on the new product. The possible explanatory variable x is the yearly income (in hundred dollars). There are 21 participants in the study.
(a) Fit a simple linear regression for Y and x. Define this model clearly in mathematical form. Assess the model fit by the 3 types of residual plots introduced in Section 5.2. Do you find any problems with the constant variance assumption?
(b) Use the Box-Cox transformation on Y to improve the model. Which transformation do you choose? Define the new model in mathematical form based on this transformation. Fit the model, and assess its performance using a suitable residual plot or plots. Is it a good remedy for the problem identified in (a)?
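As a rough illustration of the mechanics in (b), the sketch below runs a Box-Cox profile on simulated data, not on sale.txt. It assumes the MASS package is available; the variable names (x_sim, y_sim, lambda_hat) and the simulated model are made up for this example only.

```r
library(MASS)  # provides boxcox()

# Simulated stand-in data (assumption: NOT the sale.txt data)
set.seed(1)
x_sim <- runif(21, 10, 100)
y_sim <- (2 + 0.05 * x_sim + rnorm(21, sd = 0.3))^2  # positive response

# Fit the straight-line model, then profile the Box-Cox log-likelihood
fit_sim <- lm(y_sim ~ x_sim)
bc <- boxcox(fit_sim, lambda = seq(-2, 2, by = 0.05), plotit = FALSE)

# Grid value of lambda with the largest profile log-likelihood
lambda_hat <- bc$x[which.max(bc$y)]

# Refit on the transformed scale (log transform when lambda is near 0)
if (abs(lambda_hat) < 1e-8) {
  y_trans <- log(y_sim)
} else {
  y_trans <- (y_sim^lambda_hat - 1) / lambda_hat
}
fit_trans <- lm(y_trans ~ x_sim)

# Residuals versus fitted values for the transformed model
plot(fitted(fit_trans), resid(fit_trans),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```

In practice one usually rounds the estimated lambda to a convenient value (for example 0, 0.5 or 1) that lies inside the confidence interval shown by boxcox() when plotit = TRUE.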
(c) Now consider a model without intercept,
$Y_i = \beta x_i + \varepsilon_i$,
where $Y_i$ and $x_i$ are, respectively, the amount individual $i$ can spend on the new product and his/her yearly income. The error terms $\varepsilon_i$ are assumed to be i.i.d. $N(0, \sigma^2)$. Fit this model to the data. Does it fit the data better than the model in (a)? Repeat the residual analysis (and plots) for this model as in (a) and comment.
(d) Suppose the true model is
$Y_i = \beta x_i + \varepsilon_i$,
where the error terms $\varepsilon_i$ are independent with $N(0, \sigma^2 x_i)$ distributions. That is, $\mathrm{Var}(Y_i) = \mathrm{Var}(\varepsilon_i) = \sigma^2 x_i$. Notice that this is a model without an intercept, for heteroscedastic data! A possible remedy for heteroscedasticity is to consider the model
$Y_i^* = \beta x_i^* + \varepsilon_i^*$,
where $Y_i^* = Y_i/\sqrt{x_i}$, $x_i^* = \sqrt{x_i}$ and $\varepsilon_i^* = \varepsilon_i/\sqrt{x_i}$. In theory, do the $\varepsilon_i^*$'s have equal variances now? Fit this model to the data, and assess the model performance through a relevant residual plot or plots. Does it fix the non-constant variance problem for the model in (c)?
Remark: Part (d) is really applying a weighted least squares method to the heteroscedastic data. We can also apply it directly to the original data through the lm() function with weights. Try this to see if you get exactly the same output as from the model you fit for (d).
> fitwt <- lm(Y ~ X - 1, weights = 1/X)  # Original data attached.
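One possible way to set up the comparison suggested in the remark is sketched below on simulated data, not on sale.txt. The column names X and Y mirror the lm() call above; the simulated slope and error scale, and the names Xstar, Ystar, fit_star, fit_wt and cmp, are assumptions made for illustration only.

```r
# Simulated stand-in data (assumption: NOT the sale.txt data),
# generated so that the error variance is proportional to X
set.seed(42)
n <- 21
X <- runif(n, 50, 500)
Y <- 0.02 * X + rnorm(n, sd = 0.05 * sqrt(X))

# Transformed variables as defined in part (d)
Ystar <- Y / sqrt(X)
Xstar <- sqrt(X)

# Ordinary least squares on the transformed scale, no intercept
fit_star <- lm(Ystar ~ Xstar - 1)

# Weighted least squares on the original scale, weights 1/X
fit_wt <- lm(Y ~ X - 1, weights = 1 / X)

# Put the two coefficient rows side by side for inspection
cmp <- rbind(transformed = coef(summary(fit_star))[1, ],
             weighted    = coef(summary(fit_wt))[1, ])
print(cmp)
```

Running the same lines with the attached sale.txt variables in place of the simulated X and Y gives the table to inspect for this question.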