a. Construct and examine the correlation matrix. Is multicollinearity a potential problem?
b. Suggest an appropriate set of independent variables that predict the number of wins by examining the correlation matrix.
c. Find the best multiple regression model for predicting the number of wins having only significant independent variables. How good is your model? Does it use the same variables you thought were appropriate in part b?
Added by Joshua C.
Step 1
- Examine the correlation coefficients:
  - Look for very strong correlations (absolute value close to 1) among the independent variables. High pairwise correlations suggest potential multicollinearity.
  - Note the correlation between each candidate independent …
Key Concepts
Recommended Videos
The accompanying data set contains Major League Baseball statistics for one season. Use the data to build a multiple regression model that predicts the number of wins. Complete parts a through c.
a. Construct and examine the correlation matrix. Is multicollinearity a potential problem?
$$ \begin{array}{l|cccccc}
 & \text{Won} & \text{Runs} & \text{Hits} & \text{Earned Run Average} & \text{Strike Outs} & \text{Walks} \\ \hline
\text{Won} & \underline{\qquad} & & & & & \\
\text{Runs} & \underline{\qquad} & \underline{\qquad} & & & & \\
\text{Hits} & \underline{\qquad} & \underline{\qquad} & \underline{\qquad} & & & \\
\text{Earned Run Average} & \underline{\qquad} & \underline{\qquad} & \underline{\qquad} & \underline{\qquad} & & \\
\text{Strike Outs} & \underline{\qquad} & \underline{\qquad} & \underline{\qquad} & \underline{\qquad} & \underline{\qquad} & \\
\text{Walks} & \underline{\qquad} & \underline{\qquad} & \underline{\qquad} & \underline{\qquad} & \underline{\qquad} & \underline{\qquad}
\end{array} $$
c. Find the best multiple regression model for predicting the number of wins having only significant independent variables. How good is your model? Does it use the same variables you thought were appropriate in part b? Use a level of significance of 0.05.
Determine the best multiple regression model. Let X1 represent Runs, X2 represent Hits, X3 represent Earned Run Average, X4 represent Strike Outs, and X5 represent Walks. Enter the terms of the equation so that the X subscripts are in ascending order.
Won = ______ + (______)X___ + (______)X___ + (______)X___
The model's R² value is ______. This means that ______% of the variation in the number of wins is explained by these variables. This model (does/does not) use the same variables as part b.
(Round the R² value to three decimal places as needed. Round the percentage to one decimal place as needed.)
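For part c, one common way to keep only significant predictors is backward elimination: fit the full model, drop the predictor whose t statistic has the smallest absolute value if it is not significant, refit, and repeat. The sketch below assumes that approach (the textbook or its software may instead use stepwise or best-subsets selection), and `fit_ols`, `backward_eliminate`, and `_invert` are hypothetical names. Because the Python standard library has no t distribution, the caller supplies `t_crit(df)`, the two-tailed critical value at α = 0.05, for example from a t table or from `scipy.stats.t.ppf(0.975, df)` if SciPy is available. The returned `r2` is R² = 1 - SSE/SST, so 100·R² is the percentage of the variation in wins explained.

```python
import math
from typing import Callable


def _invert(matrix: list[list[float]]) -> list[list[float]]:
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular: some predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_ols(columns: dict[str, list[float]], y: list[float]) -> dict:
    """Least-squares fit of y on an intercept plus the given columns (needs n > k + 1)."""
    names = list(columns)
    n = len(y)
    rows = [[1.0] + [columns[name][i] for name in names] for i in range(n)]
    p = len(rows[0])  # number of coefficients, including the intercept
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(p)]
    inv = _invert(xtx)
    coef = [sum(inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    y_bar = sum(y) / n
    sst = sum((yi - y_bar) ** 2 for yi in y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    mse = sse / (n - p)
    # Standard error of each coefficient: sqrt(MSE * diagonal of (X'X)^-1).
    se = [math.sqrt(max(mse * inv[a][a], 0.0)) for a in range(p)]
    t = [c / s if s > 0 else math.inf for c, s in zip(coef, se)]
    return {"names": ["Intercept"] + names, "coef": coef, "se": se, "t": t,
            "r2": 1 - sse / sst, "df": n - p}


def backward_eliminate(columns: dict[str, list[float]], y: list[float],
                       t_crit: Callable[[int], float]) -> dict:
    """Drop the least significant predictor until every |t| >= t_crit(df)."""
    current = dict(columns)
    while True:
        model = fit_ols(current, y)
        if not current:
            return model  # intercept-only model
        # Index 0 is the intercept, which is never dropped.
        stats = list(zip(model["names"][1:], model["t"][1:]))
        weakest, weakest_t = min(stats, key=lambda item: abs(item[1]))
        if abs(weakest_t) >= t_crit(model["df"]):
            return model
        del current[weakest]
```

For the baseball data, `columns` would hold X1 (Runs) through X5 (Walks) and `y` would be Won; the surviving `names` and `coef` fill in the equation and `r2` fills in the R² blank. Solving the normal equations directly is fine for a data set this small; for larger or ill-conditioned problems a QR-based solver is numerically safer.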
Sri K.
A baseball analytics specialist wants to determine which variables are important in predicting a team's wins in a given season. He has collected data related to wins, earned run average (ERA), and runs scored for the 2011 season (stored in BB2011). Develop a model to predict the number of wins based on ERA and runs scored.
a. State the multiple regression equation.
b. Interpret the meaning of the slopes in this equation.
c. Predict the number of wins for a team that has an ERA of 4.50 and has scored 750 runs.
d. Perform a residual analysis on the results and determine whether the regression assumptions are valid.
e. Is there a significant relationship between the number of wins and the two independent variables (ERA and runs scored) at the 0.05 level of significance?
f. Determine the p-value in (e) and interpret its meaning.
g. Interpret the meaning of the coefficient of multiple determination in this problem.
h. Determine the adjusted R-squared.
i. At the 0.05 level of significance, determine whether each independent variable makes a significant contribution to the regression model. Indicate the most appropriate regression model for this set of data.
j. Determine the p-values in (i) and interpret their meaning.
k. Construct a 95% confidence interval estimate of the population slope between wins and ERA.
l. Compute and interpret the coefficients of partial determination.
m. Which is more important in predicting wins: pitching, as measured by ERA, or offense, as measured by runs scored? Explain.
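Several parts of this problem reduce to short formulas once the regression output (coefficients, standard errors, sums of squares) is available. The sketch below assumes those values are read from software output; the function names are hypothetical and no BB2011 results are reproduced. It uses y-hat = b0 + b1X1 + b2X2 for the prediction in part c, adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1) for part h, b ± t·S_b for the slope interval in part k, and r²_{Y1.2} = SSR(X1 | X2) / [SST - SSR(X1 and X2) + SSR(X1 | X2)] for part l, where SSR(X1 | X2) = SSR(X1 and X2) - SSR(X2).

```python
def predict(intercept: float, slopes: list[float], xs: list[float]) -> float:
    """Point prediction y-hat = b0 + b1*X1 + ... + bk*Xk."""
    return intercept + sum(b * x for b, x in zip(slopes, xs))


def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def slope_interval(b: float, se_b: float, t_crit: float) -> tuple[float, float]:
    """Confidence interval b +/- t_crit * S_b; t_crit uses n - k - 1 degrees of freedom."""
    margin = t_crit * se_b
    return (b - margin, b + margin)


def partial_determination(ssr_full: float, ssr_other: float, sst: float) -> float:
    """Coefficient of partial determination for X1, holding X2 constant.

    ssr_full:  SSR(X1 and X2) from the two-variable model
    ssr_other: SSR(X2) from the model with X2 alone
    """
    ssr_given = ssr_full - ssr_other  # SSR(X1 | X2)
    return ssr_given / (sst - ssr_full + ssr_given)
```

For part c, pass the fitted intercept and the ERA and runs-scored slopes together with `xs=[4.50, 750]`, keeping `slopes` and `xs` in the same order. For part l, swap the roles of the two variables to get the second coefficient.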
Dominador T.
For the following data set:
$$ \begin{array}{cllc} x_{1} & x_{2} & x_{3} & y \\ \hline 0.8 & 2.8 & 2.5 & 11.0 \\ \hline 3.9 & 2.6 & 5.7 & 10.8 \\ \hline 1.8 & 2.4 & 7.8 & 10.6 \\ \hline 5.1 & 2.3 & 7.1 & 10.3 \\ \hline 4.9 & 2.5 & 5.9 & 10.3 \\ \hline 8.4 & 2.1 & 8.6 & 10.3 \\ \hline 12.9 & 2.3 & 9.2 & 10.0 \\ \hline 6.0 & 2.0 & 1.2 & 9.4 \\ \hline 14.6 & 2.2 & 3.7 & 8.7 \\ \hline 93 & 11 & 55 & 87 \end{array} $$
(a) Construct a correlation matrix between $x_{1}, x_{2}, x_{3},$ and $y$. Is there any evidence that multicollinearity exists? Why?
(b) Determine the multiple regression line with $x_{1}, x_{2},$ and $x_{3}$ as the explanatory variables.
(c) Assuming that the requirements of the model are satisfied, test $H_{0}: \beta_{1}=\beta_{2}=\beta_{3}=0$ versus $H_{1}:$ at least one of the $\beta_{i}$ is different from zero at the $\alpha=0.05$ level of significance.
(d) Assuming that the requirements of the model are satisfied, test $H_{0}: \beta_{i}=0$ versus $H_{1}: \beta_{i} \neq 0$ for $i=1,2,3$ at the $\alpha=0.05$ level of significance.
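Parts (c) and (d) use the overall F test and the individual t tests. With $n$ observations and $k = 3$ explanatory variables, the standard test statistics and decision rules at $\alpha = 0.05$ are as follows (critical values come from F and t tables; equivalently, reject when the corresponding P-value is below 0.05):

$$ F_0 = \frac{SSR/k}{SSE/(n-k-1)} = \frac{MSR}{MSE}, \qquad \text{reject } H_0: \beta_1=\beta_2=\beta_3=0 \text{ if } F_0 > F_{0.05,\,k,\,n-k-1} $$

$$ t_0 = \frac{b_i}{s_{b_i}}, \qquad \text{reject } H_0: \beta_i = 0 \text{ if } |t_0| > t_{0.025,\,n-k-1}, \quad i = 1, 2, 3 $$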
Inference on the Least-Squares Regression Model and Multiple Regression
Introduction to Multiple Regression