Statistics The Art and Science of Learning from Data

Alan Agresti, Christine A. Franklin, Bernhard Klingenberg

Chapter 13

Multiple Regression

Section 1

Using Several Variables to Predict a Response

Problem 1

Predicting weight. For a study of female college athletes, the prediction equation relating $y=$ total body weight (in pounds) to $x_{1}=$ height (in inches) and $x_{2}=$ percent body fat is $\hat{y}=-121+3.50 x_{1}+1.35 x_{2}$.
a. Find the predicted total body weight for a female athlete at the mean values of 66 and 18 for $x_{1}$ and $x_{2}$.
b. An athlete with $x_{1}=66$ and $x_{2}=18$ has actual weight $y=115$ pounds. Find the residual and interpret it.
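
As a rough check on parts a and b, the sketch below plugs the stated values into the prediction equation and computes the residual as observed minus predicted. The function name `predict_weight` is hypothetical and introduced only for illustration.

```python
def predict_weight(height_in, body_fat_pct):
    """Predicted total body weight (pounds) from the stated equation."""
    return -121 + 3.50 * height_in + 1.35 * body_fat_pct

# Part a: prediction at x1 = 66 inches and x2 = 18 percent body fat.
predicted = predict_weight(66, 18)

# Part b: residual = observed y minus predicted y-hat.
observed = 115
residual = observed - predicted

print(f"predicted weight: {predicted:.1f} pounds")
print(f"residual: {residual:.1f} pounds")
```

A negative residual means the athlete weighs less than the equation predicts for her height and percent body fat.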

Lucas Finney
Numerade Educator

Problem 2

Does study help GPA? For the Georgia Student Survey file on the book's website, the prediction equation relating $y=$ college GPA to $x_{1}=$ high school GPA and $x_{2}=$ study time (hours per day) is $\hat{y}=1.13+0.643 x_{1}+0.0078 x_{2}$.
a. Find the predicted college GPA of a student who has a high school GPA of 3.5 and who studies three hours a day.
b. For students with fixed study time, what is the change in predicted college GPA when high school GPA increases from 3.0 to 4.0?

Victor Salazar
Numerade Educator

Problem 3

Predicting visitor satisfaction. For all the restaurants in a city, the prediction equation relating $y=$ average monthly visitor satisfaction rating (range 0 to 4.0, where $0=$ very poor and $4=$ very good) to $x_{1}=$ the monthly food quality score given by the food inspection authority (range 0 to 4.0, where $0=$ very poor and $4=$ very good) and $x_{2}=$ the number of visitors in a month is $\hat{y}=0.35+0.55 x_{1}+0.0015 x_{2}$.
a. Find the predicted average monthly visitor satisfaction rating for a restaurant having (i) a monthly food quality score of 4.0 and 800 visitors in a month and (ii) a monthly food quality score of 2.0 and 200 visitors in a month.
b. For restaurants with $x_{2}=500$, show that $\hat{y}=1.10+0.55 x_{1}$.
c. For restaurants with $x_{2}=600$, show that $\hat{y}=1.25+0.55 x_{1}$. Thus, compared to part b, the slope for $x_{1}$ is still 0.55, and increasing $x_{2}$ by 100 (from 500 to 600) shifts the intercept upward by $100 \times (\text{slope for } x_{2})=100(0.0015)=0.15$ units.
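
The intercept shift described in part c can be sketched numerically. The snippet below fixes $x_{2}$ and folds its contribution into the intercept, leaving a line in $x_{1}$ alone. The names `predict_satisfaction` and `line_given_visitors` are hypothetical helpers, not part of the exercise.

```python
INTERCEPT = 0.35
SLOPE_X1 = 0.55    # food quality score
SLOPE_X2 = 0.0015  # number of visitors in a month

def predict_satisfaction(food_quality, visitors):
    """Predicted satisfaction rating from the stated equation."""
    return INTERCEPT + SLOPE_X1 * food_quality + SLOPE_X2 * visitors

def line_given_visitors(visitors):
    """Return (intercept, slope) of y-hat as a function of x1 with x2 held fixed."""
    return INTERCEPT + SLOPE_X2 * visitors, SLOPE_X1

for visitors in (500, 600):
    intercept, slope = line_given_visitors(visitors)
    print(f"x2 = {visitors}: y-hat = {intercept:.2f} + {slope:.2f} x1")

# Difference between the two intercepts, which should equal 100 * SLOPE_X2.
shift = line_given_visitors(600)[0] - line_given_visitors(500)[0]
print(f"intercept shift: {shift:.2f}")
```

Only the intercept depends on the fixed value of $x_{2}$, which is why the lines for different visitor counts are parallel.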

Dominador Tan
Numerade Educator

Problem 4

Interpreting slopes on average monthly visitor satisfaction. Refer to the previous exercise.
a. Explain why setting $x_{2}$ at a variety of values yields a collection of parallel lines relating $\hat{y}$ to $x_{1}$. What is the value of the slope for those parallel lines?
b. Since the slope 0.55 for $x_{1}$ is larger than the slope 0.0015 for $x_{2},$ does this imply that $x_{1}$ has a larger effect than $x_{2}$ on $y$ in this sample? Explain.

Adriano Chikande
Numerade Educator

Problem 5

Does more education cause more crime? The FL Crime data file on the book's website has data for the 67 counties in Florida on
$y=$ crime rate: annual number of crimes in county per 1000 population,
$x_{1}=$ education: percentage of adults in county with at least a high school education,
$x_{2}=$ urbanization: percentage in county living in an urban environment.
The figure shows a scatterplot matrix. MINITAB multiple regression results are also displayed.
a. Find the predicted crime rate for a county that has 0% in an urban environment and (i) a 70% high school graduation rate and (ii) an 80% high school graduation rate.
b. Use results from part a to explain how education affects the crime rate, controlling for urbanization, interpreting the slope coefficient $-0.58$ of education.
c. Using the prediction equation, show that the equation relating crime rate and education when urbanization is fixed at (i) 0, (ii) 50, and (iii) 100 is as follows:
d. The scatterplot matrix shows that education has a positive association with crime rate, but the multiple regression equation shows that the association is negative when we keep $x_{2}=$ urbanization fixed. Consider the hypothetical figure that follows. Sketch lines that represent (i) the prediction equation from a simple regression model using only education and ignoring the information on urbanization and (ii) the prediction equation from the multiple regression model for counties having urbanization $=50$. Use these lines to explain the difference in the interpretation of the slope for education in simple and multiple regression models with regard to ignoring or controlling for urbanization. (Note: The reversal in the association between crime rate and education is an example of Simpson's paradox; see Example 16 in Sec. 3.4 and Example 18 in Sec. 10.5.)

Tyler Moulton
Numerade Educator

Problem 6

Crime rate and income. Refer to the previous exercise. MINITAB reports the following results for the multiple regression of $y=$ crime rate on $x_{1}=$ median income (in thousands of dollars) and $x_{2}=$ urbanization.
a. Report the prediction equations relating crime rate to income at urbanization levels of (i) 0 and (ii) 100. Interpret.
b. For the simple regression model relating $y=$ crime rate to $x=$ income, MINITAB reports
$\text{crime}=-11.6+2.61\,\text{income}$.
Interpret the effect of income, according to the sign of its slope. How does this effect differ from the effect of income in the multiple regression equation?
c. Use the estimated slope for income in the simple and multiple regression models to explain the difference in the interpretation of the slope when (i) ignoring urbanization and (ii) controlling for urbanization. (Note: The reversal in the association between crime rate and income is an example of Simpson's paradox.)

Vaidik Stats
Numerade Educator

Problem 7

The economics of golf. The earnings of a PGA Tour golfer are determined by performance in tournaments. A study analyzed tour data to determine the financial return for certain skills of professional golfers. The sample consisted of 393 golfers competing in one or both of the 2002 and 2008 seasons. The most significant factors that contribute to earnings were the percent of attempts in which a player was able to hit the green in regulation (GIR), the number of times that a golfer made par or better after hitting a bunker divided by the number of bunkers that were hit (SS), the average number of putts after reaching the green (AvePutt), and the number of PGA events entered (Events). The resulting coefficients from multiple regression to predict yearly earnings (in $\$$) are:
a. State the prediction equation for a PGA Tour golfer's yearly earnings.
b. Explain how to interpret the coefficient for AvePutt.
c. Find the predicted earnings for a golfer who had a GIR score of 60, an SS score of 50, an AvePutt of 1.5, and participated in 20 events.

Dominador Tan
Numerade Educator

Problem 8

Comparable number of bedrooms and house size effects. In Example 2, the prediction equation between $y=$ selling price and $x_{1}=$ house size and $x_{2}=$ number of bedrooms was $\hat{y}=60{,}102+63.0 x_{1}+15{,}170 x_{2}$.
a. For fixed number of bedrooms, how much is the house selling price predicted to increase for each square foot increase in house size? Why?
b. For a fixed house size of 2000 square feet, how does the predicted selling price change for two, three, and four bedrooms?
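
A short sketch of the arithmetic behind both parts, assuming $x_{1}$ is measured in square feet as the wording of part a suggests. The function name `predict_price` is hypothetical.

```python
def predict_price(size_sqft, bedrooms):
    """Predicted selling price from the stated equation."""
    return 60_102 + 63.0 * size_sqft + 15_170 * bedrooms

# Part a: change in predicted price for one extra square foot, bedrooms fixed.
per_sqft = predict_price(2001, 3) - predict_price(2000, 3)

# Part b: predicted prices at 2000 square feet for two, three, and four bedrooms.
prices = {b: predict_price(2000, b) for b in (2, 3, 4)}
per_bedroom = prices[3] - prices[2]

print(f"change per square foot: {per_sqft:,.0f}")
for bedrooms, price in prices.items():
    print(f"{bedrooms} bedrooms: {price:,.0f}")
print(f"change per bedroom: {per_bedroom:,.0f}")
```

The per-unit changes equal the two slope coefficients because each difference holds the other predictor fixed.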

Jennifer Stoner
Numerade Educator

Problem 9

Controlling has an effect. The slope of $x_{1}$ is not the same for multiple linear regression of $y$ on $x_{1}$ and $x_{2}$ as compared to simple linear regression of $y$ on $x_{1}$, where $x_{1}$ is the only predictor. Explain why you would expect this to be true. Does the statement change when $x_{1}$ and $x_{2}$ are uncorrelated?
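
One way to explore the question is with a small made-up data set. The sketch below uses hypothetical values generated exactly from $y=1+2x_{1}+3x_{2}$ (no noise), not data from the book, and compares the least-squares slope for $x_{1}$ with and without $x_{2}$ in the model. It relies on the standard two-predictor formula $b_{1}=(S_{22}S_{1y}-S_{12}S_{2y})/(S_{11}S_{22}-S_{12}^{2})$, where each $S$ is a sum of products of deviations from the means. All function names are illustrative.

```python
def centered_sums(x1, x2, y):
    """Sums of squares and cross-products of deviations from the means."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    return s11, s22, s12, s1y, s2y

def simple_slope(x1, x2, y):
    """Slope of x1 when it is the only predictor (x2 is ignored)."""
    s11, _, _, s1y, _ = centered_sums(x1, x2, y)
    return s1y / s11

def multiple_slope(x1, x2, y):
    """Slope of x1 in the regression of y on both x1 and x2."""
    s11, s22, s12, s1y, s2y = centered_sums(x1, x2, y)
    return (s22 * s1y - s12 * s2y) / (s11 * s22 - s12 ** 2)

def make_y(x1, x2):
    """Hypothetical response: y = 1 + 2*x1 + 3*x2 with no noise."""
    return [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]

# Hypothetical design where x1 and x2 are positively correlated.
x1_corr, x2_corr = [0, 0, 1, 1, 0, 1], [0, 0, 1, 1, 1, 0]
# Hypothetical balanced design where x1 and x2 are uncorrelated.
x1_unc, x2_unc = [0, 0, 1, 1], [0, 1, 0, 1]

for label, a, b in (("correlated", x1_corr, x2_corr), ("uncorrelated", x1_unc, x2_unc)):
    y = make_y(a, b)
    print(label, "simple:", simple_slope(a, b, y), "multiple:", multiple_slope(a, b, y))
```

In the formula, setting $S_{12}=0$ reduces $b_{1}$ to $S_{1y}/S_{11}$, the simple regression slope, which is the algebraic reason the two slopes can differ only when the predictors are correlated in the sample.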

Victor Salazar
Numerade Educator

Problem 10

Controlling has an effect. The slope of $x_{1}$ is not the same for multiple linear regression of $y$ on $x_{1}$ and $x_{2}$ as compared to simple linear regression of $y$ on $x_{1}$, where $x_{1}$ is the only predictor. Explain why you would expect this to be true. Does the statement change when $x_{1}$ and $x_{2}$ are uncorrelated?

Victor Salazar
Numerade Educator

Problem 11

Used cars. The following data (also available from the book's website) is from a random sample of campus newspaper ads on used cars for sale. Consider the age and horsepower (HP) of a car to predict its selling price. (The variable Type stands for whether the car is from the United States, coded as 1, or a foreign car, coded as 0. This variable will be considered in Exercise 13.48.)
a. Construct a scatterplot matrix (or separate scatterplots) to investigate the relationship among price, age, and horsepower and interpret.
b. Find the multiple regression prediction equation for the selling price in terms of age and horsepower of a car. What is the predicted price, rounded to the nearest hundred, for a car that has a horsepower of 80 and is (i) 8 years old, (ii) 10 years old?
c. Based on this multiple regression, can you predict the price difference between a car with 60 HP and a car with 80 HP without knowing the ages of the two cars? Explain.

Robin Corrigan
Numerade Educator