(8x5 = 40 points) Suppose that X1, X2, and X3 are potential predictor variables in a model used to predict Y. Use the given SSE values in the following table to answer the questions below.
SSE(X1) = 510
SSE(X1, X2) = 450
SSE(X1, X2, X3) = 330
SSE(X2) = 905
SSE(X1, X3) = 400
SSE(X3) = 720
SSE(X2, X3) = 640
The notation SSE(X1, X2) means the error sum of squares for a multiple regression model that includes X1 and X2 as predictors (and does not include X3).
(a) Calculate SSR(X3|X1). Show your work.
(b) Explain in words what is measured by the quantity calculated in the previous part.
(c) Calculate SSR(X1|X2, X3). Show your work.
(d) Explain in words what is measured by the quantity calculated in the previous part.
(e) Consider the "full" model, Yi = β0 + β1 Xi,1 + β2 Xi,2 + β3 Xi,3 + εi. What is the "reduced" model associated with the null hypothesis H0: β2 = β3 = 0?
(f) Suppose that n = 70. Calculate the value of an F-statistic for testing H0: β2 = β3 = 0 for the model Yi = β0 + β1 Xi,1 + β2 Xi,2 + β3 Xi,3 + εi. It is not necessary to carry out the test – just calculate the value of F. Show your work. (Hint: Use the general linear F-test.)
(g) Calculate the value of the coefficient of partial determination R^2_{Y,2|1}.
(h) Write a sentence that interprets the value calculated in the previous part.
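As a way to check hand calculations of this kind, the extra sum of squares, the general linear F-statistic, and the coefficient of partial determination can all be computed from the SSE table. The following is a minimal sketch, not a prescribed method: the function names (`extra_ss`, `general_linear_f`, `partial_r2`) are hypothetical helpers introduced here, and the example deliberately uses a hypothesis that is not one of the questions (H0: β3 = 0 in the full model).

```python
# SSE values copied from the table above; each key is the set of predictors in the model.
SSE = {
    frozenset({"X1"}): 510,
    frozenset({"X2"}): 905,
    frozenset({"X3"}): 720,
    frozenset({"X1", "X2"}): 450,
    frozenset({"X1", "X3"}): 400,
    frozenset({"X2", "X3"}): 640,
    frozenset({"X1", "X2", "X3"}): 330,
}


def extra_ss(added, given):
    """Extra sum of squares: SSR(added | given) = SSE(given) - SSE(given and added)."""
    given = frozenset(given)
    return SSE[given] - SSE[given | frozenset(added)]


def general_linear_f(sse_reduced, df_reduced, sse_full, df_full):
    """General linear F: [(SSE(R) - SSE(F)) / (df_R - df_F)] / [SSE(F) / df_F]."""
    numerator = (sse_reduced - sse_full) / (df_reduced - df_full)
    denominator = sse_full / df_full
    return numerator / denominator


def partial_r2(added, given):
    """Coefficient of partial determination: SSR(added | given) / SSE(given)."""
    return extra_ss(added, given) / SSE[frozenset(given)]


n = 70  # sample size stated in the problem

# Illustration only: does X3 add anything once X1 and X2 are in the model?
ssr_x3_given_x1_x2 = extra_ss(["X3"], ["X1", "X2"])  # 450 - 330
f_beta3 = general_linear_f(
    SSE[frozenset({"X1", "X2"})], n - 3,        # reduced model: 3 parameters
    SSE[frozenset({"X1", "X2", "X3"})], n - 4,  # full model: 4 parameters
)
r2_y3_given_12 = partial_r2(["X3"], ["X1", "X2"])

print(ssr_x3_given_x1_x2, f_beta3, round(r2_y3_given_12, 4))
```

The error degrees of freedom are n minus the number of regression parameters (including the intercept), so the reduced model with X1 and X2 has n - 3 and the full model has n - 4.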
(6+6+5+9+5+3 = 34 points) Use the "Hospital Infections" dataset. Preliminary data analyses have revealed that the variable Y = InfctRsk could be related to the variables X1 = Stay, X2 = Culture, X3 = Xray, X4 = Beds, X5 = Census, and X6 = Nurses.
(a) Fit a multiple linear regression model that relates InfctRsk to the predictor variables X1-X6. Perform a hypothesis test at significance level 0.05 to determine if at least one of the predictors in this model is useful in predicting Y. State your null and alternative hypotheses in terms of the regression coefficients (β's), the test statistic value with calculations shown (i.e., how the relevant number in the ANOVA table is calculated from other numbers in the ANOVA table), the decision rule and the conclusion.
(b) Use a partial F-test to determine if the predictor variables X5 = Census and X6 = Nurses can be deleted from the model while retaining the four remaining variables X1 = Stay, X2 = Culture, X3 = Xray, and X4 = Beds. Again, state your null and alternative hypotheses in terms of regression coefficients, show your work in calculating the test statistic (i.e., using sequential sums of squares), state the decision rule and the conclusion.
(c) Confirm the value of the partial F-statistic from part (b) by calculating the F-statistic using the general linear F-test formula. State the full and reduced models and show your work in calculating the test statistic.
(d) Perform a hypothesis test to determine if X4 = Beds can be dropped from a model with the four predictors, X1 = Stay, X2 = Culture, X3 = Xray, and X4 = Beds, by using:
(i) a t-statistic
(ii) an F-statistic
In each case, state your null and alternative hypotheses in terms of the regression coefficients, the test statistic value, the decision rule and the conclusion.
Is there any relationship between the two test statistics in (i) and (ii) above?
(e) Calculate the value of the coefficient of partial determination R^2_{Y,4|1,2,3} and explain in words what it measures.
(f) Write down the fitted regression equation based on your conclusion in part (d).
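The partial F-test based on sequential sums of squares and the general linear F-test for dropping Census and Nurses can be written side by side. This is a sketch in standard notation, under two assumptions not stated above: n denotes the number of hospitals in the dataset, and the sequential sums of squares come from a fit in which X5 and X6 are entered last, in that order.

```latex
\begin{aligned}
\mathrm{SSR}(X_5, X_6 \mid X_1,\dots,X_4)
  &= \mathrm{SSR}(X_5 \mid X_1,\dots,X_4) + \mathrm{SSR}(X_6 \mid X_1,\dots,X_5) \\
  &= \mathrm{SSE}(X_1,\dots,X_4) - \mathrm{SSE}(X_1,\dots,X_6), \\[4pt]
F^* &= \frac{\mathrm{SSR}(X_5, X_6 \mid X_1,\dots,X_4)\,/\,2}
            {\mathrm{SSE}(X_1,\dots,X_6)\,/\,(n-7)} .
\end{aligned}
```

The numerator degrees of freedom equal the number of coefficients set to zero under H0 (here 2), and the denominator uses the error degrees of freedom of the full model, n - 7, because the full model has seven parameters including the intercept.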
(4x5 = 20 points) Consider the following two regression outputs:
Regression Analysis: weight versus trunk, width, length
Analysis of Variance
Source      DF   Seq SS   Seq MS  F-Value  P-Value
Regression   3   8208.9  2736.31    39.38    0.000
  trunk      1   5453.4  5453.42    78.48    0.000
  width      1   2551.7  2551.70    36.72    0.000
  length     1    203.8   203.82     2.93    0.090
Error       93   6462.6    69.49
Total       96  14671.5
Coefficients
Term        Coef  SE Coef  T-Value  P-Value
Constant  -15.71     4.60    -3.42    0.001
trunk      2.638    0.522     5.05    0.000
width     0.5108   0.0842     6.07    0.000
length    0.0106  0.00620     1.71    0.090
Regression Equation
weight = -15.71 + 2.638 trunk + 0.5108 width + 0.01062 length
Regression Analysis: weight versus length
Analysis of Variance
Source         DF  Seq SS  Seq MS  F-Value  P-Value
Regression      1    2103  2103.5    15.90    0.000
  Length        1    2103  2103.5    15.90    0.000
Error          95   12568   132.3
  Lack-of-Fit  84   11363   135.3     1.23    0.370
  Pure Error   11    1205   109.6
Total          96   14672
Coefficients
Term         Coef  SE Coef  T-Value  P-Value
Constant    14.73     1.93     7.61    0.000
Length    0.03046  0.00764     3.99    0.000
Regression Equation
weight = 14.73 + 0.03046 length
Let Y = weight, X1 = trunk, X2 = width, X3 = length.
(a) Test H0: β2 = 0 vs. Ha: β2 ≠ 0 in the model E(Y) = β0 + β1 X1 + β2 X2 + β3 X3. What is the value of the test statistic? What is the p-value and conclusion?
(b) Calculate the value of an F-statistic for testing H0: β1 = β2 = 0 in the model E(Y) = β0 + β1 X1 + β2 X2 + β3 X3, where the X variables are defined in the order given above. [Hint: Calculate the numerator of the F-statistic from the difference in SSE between the full and reduced models.]
(c) Propose a model that would let you test H0 from part (b) with a partial F-statistic based on sequential sums of squares. Remember to specify the order of the predictors in your model. [Hint: For a partial F-statistic we would calculate the numerator of the F-statistic from the sum of two appropriate sequential sums of squares.]
(d) Write an interpretation of the significance of the test results given for the predictor variable length within the three-predictor model (the first output). Be careful – you might also look at what happens in the one-predictor model in which length is the only predictor.
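When reading sequential ANOVA output like the first table, it can help to verify how its entries relate to each other. The sketch below uses only numbers copied from that output and checks three things: the sequential sums of squares add up to the regression sum of squares, MSE equals SSE divided by its degrees of freedom, and the sequential F for the last-entered predictor (length) matches the square of its t-statistic up to rounding. The helper name `partial_f` is hypothetical, and the example uses H0: β3 = 0 rather than any hypothesis asked about above.

```python
# Numbers copied from the first output (weight versus trunk, width, length).
seq_ss = {"trunk": 5453.4, "width": 2551.7, "length": 203.8}  # in order of entry
ss_regression = 8208.9
sse_full, df_error = 6462.6, 93
t_length = 1.71  # reported T-Value for length


def partial_f(seq_ss_terms, mse):
    """Partial F: (sum of sequential SS for the tested terms / number of terms) / MSE.

    Valid only when the tested terms are the LAST ones entered into the model.
    """
    return (sum(seq_ss_terms) / len(seq_ss_terms)) / mse


# 1. Sequential sums of squares decompose the regression sum of squares.
total_seq = sum(seq_ss.values())

# 2. Mean squared error of the full model.
mse = sse_full / df_error

# 3. length is entered last, so its sequential SS is adjusted for trunk and width,
#    which is the same adjustment the t-test for length uses; hence F is about t^2.
f_length = partial_f([seq_ss["length"]], mse)

print(round(total_seq, 1), round(mse, 2), round(f_length, 2), round(t_length ** 2, 2))
```

The same identity does not hold in general for trunk or width in this output, because their sequential sums of squares are not adjusted for the predictors entered after them.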
(3x2 = 6 points) For a multiple regression model with five predictor variables (X1,…, X5), what is the form of the F-statistic for each of the following scenarios? In each case, you may assume that the full model contains all 5 predictors. [Hint: Each answer should be expressed as a ratio of mean squares. You need to specify the correct mean square for the numerator and denominator. You should also specify the appropriate formulas for each mean square. In other words, each answer should be written in the form F* = MS1/MS2 = (SS1/df1)/ (SS2/df2), where you need to specify MS1, MS2, SS1, SS2, df1, and df2.]