Essential Statistics: Exploring the World through Data

Robert Gould, Colleen Ryan, Rebecca Wong

Chapter 4

Regression Analysis: Exploring Associations between Variables - all with Video Answers


Chapter Questions

Problem 1

The scatterplots show SAT scores and GPA in college for a sample of students. The top graph uses the critical reading SAT score to predict GPA in college and the bottom graph shows math SAT to predict GPA. Which is the better predictor of GPA for these students, critical reading SAT or math SAT? Explain your answer.

Nick Johnson
Numerade Educator

Problem 2

The first graph shows the years a person was employed before working at the company and the salary at the company. The second graph shows the years employed at the company and the salary. Which graph shows a stronger relationship and could do a better job predicting salary at the company? (Source: Minitab 14)

Nick Johnson
Numerade Educator

Problem 3

The scatterplot below shows data on the ages of a sample of students and the number of college credits attained. Comment on the strength, direction, and shape of the trend.

Nick Johnson
Numerade Educator

Problem 4

The scatterplot shows data on age and GPA for a sample of college students. Comment on the trend of the scatterplot. Is the trend positive, negative, or near zero?

Nick Johnson
Numerade Educator

Problem 5

The scatterplot shows data on credits attained and GPA for a sample of college students. Comment on the trend of the scatterplot. Is the trend positive, negative, or near zero?

Nick Johnson
Numerade Educator

Problem 6

The scatterplot shows data on salary and years of education for a sample of workers. Comment on the trend of the scatterplot. Is the trend positive, negative, or near zero?

Nick Johnson
Numerade Educator

Problem 7

The scatterplot shows the numbers of brothers and sisters for a large number of students. Do you think the trend is somewhat positive or somewhat negative? What does the direction (positive or negative) of the trend mean? Does the direction make sense in this context?

Nick Johnson
Numerade Educator

Problem 8

Describe the trend in the scatterplot of house price and area for some houses. State which point appears to be an outlier that does not fit the rest of the data.

Jerelyn Nevil
Numerade Educator

Problem 9

The scatterplot shows the number of work hours and the number of TV hours per week for some college students who work. There is a very slight trend. Is the trend positive or negative? What does the direction of the trend mean in this context? Identify any unusual points.

Nick Johnson
Numerade Educator

Problem 10

The scatterplot shows the number of hours of work per week and the number of hours of sleep per night for some college students. Does the graph show a strong increasing trend, a strong decreasing trend, or very little trend? Explain.

Nick Johnson
Numerade Educator

Problem 11

The scatterplot shows the age and number of hours of sleep "last night" for some students. Do you think the trend is slightly positive or slightly negative? What does that mean?

Nick Johnson
Numerade Educator

Problem 12

The figure shows a scatterplot of the heights and weights of some women taking statistics. Describe what you see. Is the trend positive, negative, or near zero? Explain.

Nick Johnson
Numerade Educator

Problem 13

a. The first scatterplot shows the college tuition and percentage acceptance at some colleges in Massachusetts. Would it make sense to find the correlation using this data set? Why or why not?
b. The second scatterplot shows the composite grade on the ACT (American College Testing) exam and the English grade on the same exam. Would it make sense to find the correlation using this data set? Why or why not?

Nick Johnson
Numerade Educator

Problem 14

The figure shows a scatterplot of birthrate (live births per 1000 women) and the age of the mother in the United States. Would it make sense to find the correlation for this data set? Explain. According to this graph, at approximately what age does the highest fertility rate occur? (Source: Wendel and Wendel (eds.), Vital statistics of the United States: Births, life expectancy, deaths, and selected health data, 2nd ed. [Lanham, MD: Bernan Press, 2006])

Nick Johnson
Numerade Educator

Problem 15

The scatterplot shows the LSAT (Law School Aptitude Test) scores for a sample of law schools and the percent of students who were employed immediately after law school graduation. Do you think the correlation coefficient among these variables is positive, negative, or near zero? Give a reason for your choice. (Source: Internet Legal Research Group)

Nick Johnson
Numerade Educator

Problem 16

The scatterplot shows the acceptance rate and selectivity index for a sample of medical schools. The acceptance rate is the percentage of applicants who were accepted into the medical school. The selectivity index is a measure based on GPA, test scores, and acceptance rates. A higher index indicates a more selective school. Do you think the correlation coefficient among these variables is positive, negative, or near zero? Give a reason for your choice. (Source: Accepted.com)

Nick Johnson
Numerade Educator

Problem 17

Pick the letter of the graph that goes with each numerical value listed below for the correlation. Correlations:
0.767 _________
0.299 _________
-0.980 _________

Nick Johnson
Numerade Educator

Problem 18

Pick the letter of the graph that goes with each numerical value listed below for the correlation. Correlations:
-0.903 _________
0.374 _________
0.777 _________

Nick Johnson
Numerade Educator

Problem 19

Match each of the following correlations with the corresponding graph.
0.87 _________
-0.47 _________
0.67 _________

Nick Johnson
Numerade Educator

Problem 20

Match each of the following correlations with the corresponding graph.
-0.51 _________
0.98 _________
0.18 _________

Nick Johnson
Numerade Educator

Problem 21

The distance (in kilometers) and price (in dollars) for one-way airline tickets from San Francisco to several cities are shown in the table.
$$
\begin{array}{|lcc|}
\hline \text { Destination } & \text { Distance }(\mathbf{k m}) & \text { Price (\$) } \\
\hline \text { Chicago } & 2960 & 229 \\
\hline \text { New York City } & 4139 & 299 \\
\hline \text { Seattle } & 1094 & 146 \\
\hline \text { Austin } & 2420 & 127 \\
\hline \text { Atlanta } & 3440 & 152 \\
\hline
\end{array}
$$
a. Find the correlation coefficient for this data using a computer or statistical calculator. Use distance as the $x$-variable and price as the $y$-variable.
b. Recalculate the correlation coefficient for this data using price as the $x$-variable and distance as the $y$-variable. What effect does this have on the correlation coefficient?
c. Suppose a $\$50$ security fee was added to the price of each ticket. What effect would this have on the correlation coefficient?
d. Suppose the airline held an incredible sale, where travelers got a round-trip ticket for the price of a one-way ticket. This means that the distances would be doubled while the ticket price remained the same. What effect would this have on the correlation coefficient?

Nick Johnson
Numerade Educator
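
Parts a through d of Problem 21 can be checked numerically. The following is a minimal Python sketch, not the textbook's method: `pearson_r` is a helper name introduced here for illustration, and it implements the usual sample correlation formula by hand. The data are the five distance and price pairs from the table.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation (hypothetical helper for this sketch)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Data from the Problem 21 table.
distance_km = [2960, 4139, 1094, 2420, 3440]
price_usd = [229, 299, 146, 127, 152]

r_a = pearson_r(distance_km, price_usd)                    # part a
r_b = pearson_r(price_usd, distance_km)                    # part b: swap x and y
r_c = pearson_r(distance_km, [p + 50 for p in price_usd])  # part c: add a $50 fee
r_d = pearson_r([2 * d for d in distance_km], price_usd)   # part d: double distances

for label, r in [("a", r_a), ("b", r_b), ("c", r_c), ("d", r_d)]:
    print(f"part {label}: r = {r:.3f}")
```

The four printed values should agree up to floating-point rounding, because swapping the variables, adding a constant, or multiplying by a positive constant leaves the correlation unchanged.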

Problem 22

The table for part (a) shows distances between selected cities and the cost of a business class train ticket for travel between these cities.
a. Calculate the correlation coefficient for the data shown in the table by using a computer or statistical calculator.
$$
\begin{array}{|c|c|}
\hline \text { Distance (in miles) } & \text { Cost (in \$) } \\
\hline 439 & 281 \\
\hline 102 & 152 \\
\hline 215 & 144 \\
\hline 310 & 293 \\
\hline 406 & 281 \\
\hline
\end{array}
$$
b. The table for part (b) shows the same information, except that the distance was converted to kilometers by multiplying the number of miles by $1.609$. What happens to the correlation when the numbers are multiplied by a constant?
$$
\begin{array}{|c|c|}
\hline \text { Distance (in kilometers) } & \text { Cost } \\
\hline 706 & 281 \\
\hline 164 & 152 \\
\hline 346 & 144 \\
\hline 499 & 293 \\
\hline 653 & 281 \\
\hline
\end{array}
$$
c. Suppose a surcharge is added to every train ticket to fund track maintenance. A fee of $\$20$ is added to each ticket, no matter how long the trip is. The following table shows the new data. What happens to the correlation coefficient when a constant is added to each number?
$$
\begin{array}{|c|c|}
\hline \text { Distance (in miles) } & \text { Cost (in \$) } \\
\hline 439 & 301 \\
\hline 102 & 172 \\
\hline 215 & 164 \\
\hline 310 & 313 \\
\hline 406 & 301 \\
\hline
\end{array}
$$

Nick Johnson
Numerade Educator
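
Parts b and c of Problem 22 can be reasoned about without recomputing anything. As a sketch, write the correlation in terms of standardized values (one common textbook form, assumed here), and let $k>0$ and $c$ be arbitrary constants introduced for illustration:

$$
r=\frac{1}{n-1} \sum z_{x} z_{y}, \quad z_{x}=\frac{x-\bar{x}}{s_{x}}
$$

Replacing every $x$ by $x^{\prime}=k x+c$ gives $\bar{x}^{\prime}=k \bar{x}+c$ and $s_{x^{\prime}}=k s_{x}$, so

$$
z_{x^{\prime}}=\frac{(k x+c)-(k \bar{x}+c)}{k s_{x}}=\frac{x-\bar{x}}{s_{x}}=z_{x}
$$

The standardized values, and therefore $r$, are unchanged. Part b corresponds to $k=1.609$, $c=0$ applied to distance; part c corresponds to $k=1$, $c=20$ applied to cost, where the same argument holds for $y$. Because the kilometer column in the table is rounded to whole numbers, a value of $r$ computed from it may differ from the miles version in the last few decimal places.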

Problem 23

In Exercise 4.1 there is a graph of the relationship between SAT score and college GPA. SAT score was the predictor and college GPA was the response variable. If you reverse the variables so that college GPA was the predictor and SAT score was the response variable, what effect would this have on the numerical value of the correlation coefficient?

Nick Johnson
Numerade Educator

Problem 24

The correlation between house price (in dollars) and area of the house (in square feet) for some houses is 0.91. If you found the correlation between house price in thousands of dollars and area in square feet for the same houses, what would the correlation be?

Nick Johnson
Numerade Educator

Problem 25

Seth Wagerman, a former professor at California Lutheran University, went to the website RateMyProfessors.com and looked up the quality rating and also the "easiness" of the six full-time professors in one department. The ratings are 1 (lowest quality) to 5 (highest quality) and 1 (hardest) to 5 (easiest). The numbers given are averages for each professor. Assume the trend is linear, find the correlation, and comment on what it means.
$$
\begin{array}{|c|c|}
\hline \text { Quality } & \text { Easiness } \\
\hline 4.8 & 3.8 \\
\hline 4.6 & 3.1 \\
\hline 4.3 & 3.4 \\
\hline 4.2 & 2.6 \\
\hline 3.9 & 1.9 \\
\hline 3.6 & 2.0 \\
\hline
\end{array}
$$

Nick Johnson
Numerade Educator

Problem 26

Five people were asked how many female first cousins they had and how many male first cousins. The data are shown in the table. Assume the trend is linear, find the correlation, and comment on what it means.
$$
\begin{array}{|c|c|}
\hline \text { Female } & \text { Male } \\
\hline 2 & 4 \\
\hline 1 & 0 \\
\hline 3 & 2 \\
\hline 5 & 8 \\
\hline 2 & 2 \\
\hline
\end{array}
$$

Nick Johnson
Numerade Educator
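
For Problem 26 the data set is small enough to work by hand. The following is a sketch using the deviation form of the correlation formula (an assumption about which form is wanted), with $x$ as the number of female cousins and $y$ as the number of male cousins:

$$
\bar{x}=\frac{2+1+3+5+2}{5}=2.6, \quad \bar{y}=\frac{4+0+2+8+2}{5}=3.2
$$

$$
\sum(x-\bar{x})(y-\bar{y})=16.4, \quad \sum(x-\bar{x})^{2}=9.2, \quad \sum(y-\bar{y})^{2}=36.8
$$

$$
r=\frac{16.4}{\sqrt{9.2 \times 36.8}}=\frac{16.4}{18.4} \approx 0.891
$$

A value this close to 1 suggests that, in this sample, people with more female first cousins also tend to have more male first cousins. With only five observations the estimate should be read cautiously.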

Problem 27

USA Today College published an article with the headline "Positive Correlation Found between Gym Usage and GPA." Explain what a positive correlation means in the context of this headline.

Nick Johnson
Numerade Educator

Problem 28

United Press International published an article with the headline "Study Finds Correlation between Education, Life Expectancy." Would you expect this correlation to be negative or positive? Explain your reasoning in the context of this headline.

Nick Johnson
Numerade Educator

Problem 29

The scatterplot shows the median starting salaries and the median mid-career salaries for graduates at a selection of colleges. (Source: The Wall Street Journal, Salary increase by salary type, http://online.wsj.com/public/resources/documents/info-Salaries_for_Colleges_by_Typesort.html. Accessed via StatCrunch. Owner: Webster West)
a. As the data are graphed, which is the independent and which the dependent variable?
b. Why do you suppose median salary at a school is used instead of the mean?
c. Using the graph, estimate the median mid-career salary for a median starting salary of $\$60,000$.
d. Use the equation to predict the median mid-career salary for a median starting salary of $\$60,000$.
e. What other factors besides starting salary might influence mid-career salary?

Nick Johnson
Numerade Educator

Problem 30

The graph shows the heights of mothers and daughters. (Source: StatCrunch: Mother and Daughter Heights.xls. Owner: craig_slinkman)
a. As the data are graphed, which is the independent variable and which the dependent variable?
b. From the graph, approximate the predicted height of the daughter of a mother who is 60 inches (5 feet) tall.
c. From the equation, determine the predicted height of the daughter of a mother who is 60 inches tall.
d. Interpret the slope.
e. What other factors besides mother's height might influence the daughter's height?

James Kiss
Numerade Educator

Problem 31

The scatterplot shows the median weekly earnings (by quarter) for men and women in the United States for the years from 2005 through 2017. The correlation is 0.974. (Source: Bureau of Labor Statistics)
a. Use the scatterplot to estimate the median weekly income for women in a quarter in which the median pay for men is about $\$850$.
b. Use the regression equation shown above the graph to get a more precise estimate of the median pay for women in a quarter in which the median pay for men is $\$850$.
c. What is the slope of the regression equation? Interpret the slope of the regression equation.
d. What is the $y$-intercept of the regression equation? Interpret the $y$-intercept of the regression equation or explain why it would be inappropriate to do so.

Nick Johnson
Numerade Educator

Problem 32

The scatterplot shows the size (in square feet) and selling prices for homes in a certain zip code in California. (Source: realtor.com)
a. Use the graph to estimate the selling price of a home with 2000 square feet.
b. Use the equation to predict the selling price for a home with 2000 square feet.
c. What is the slope of the regression equation? Interpret the slope of the regression equation.
d. What is the $y$-intercept of the regression equation? Interpret the $y$-intercept of the regression equation or explain why it would be inappropriate to do so.

James Kiss
Numerade Educator

Problem 33

TI-84 output from a linear model for predicting arm span (in centimeters) from height (in inches) is given in the figure. Summary statistics are also provided.
$$
\begin{array}{|lrc|}
\hline & \text { Mean } & \text { Standard Deviation } \\
\hline \text { Height, } x & 63.59 & 3.41 \\
\hline \text { Arm span, } y & 159.86 & 8.10 \\
\hline
\end{array}
$$
To do parts a through c, assume that the association between arm span and height is linear.
a. Report the regression equation, using the words height and arm span, not $x$ and $y$, employing the output given.
b. Verify the slope by using the formula $b=r \frac{s_{y}}{s_{x}}$.
c. Verify the $y$-intercept using $a=\bar{y}-b \bar{x}$.
d. Using the regression equation, predict the arm span (in centimeters) for someone 64 inches tall.

Nick Johnson
Numerade Educator
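
The two formulas in parts b and c of Problem 33 translate directly into code. The following Python sketch is illustrative only: `line_from_summary` is a hypothetical helper name, and because the correlation $r$ appears only in the TI-84 figure (not in the text), the value `r_placeholder` below is a made-up stand-in that must be replaced with the value from the output.

```python
def line_from_summary(r, mean_x, sd_x, mean_y, sd_y):
    """Return (slope b, intercept a) using b = r * s_y / s_x and a = y_bar - b * x_bar."""
    b = r * sd_y / sd_x
    a = mean_y - b * mean_x
    return b, a

# Summary statistics from the Problem 33 table.
MEAN_HEIGHT_IN, SD_HEIGHT_IN = 63.59, 3.41
MEAN_ARMSPAN_CM, SD_ARMSPAN_CM = 159.86, 8.10

# ASSUMPTION: placeholder only; read the real r from the TI-84 output.
r_placeholder = 0.9

slope, intercept = line_from_summary(
    r_placeholder, MEAN_HEIGHT_IN, SD_HEIGHT_IN, MEAN_ARMSPAN_CM, SD_ARMSPAN_CM
)

# Part d: predicted arm span (cm) for someone 64 inches tall, under the placeholder r.
predicted_at_64 = intercept + slope * 64
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, prediction = {predicted_at_64:.2f}")
```

Whatever value of $r$ is used, the resulting line passes through the point of means $(\bar{x}, \bar{y})$, which is a quick way to check the arithmetic.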

Problem 34

The computer output shown below is for predicting foot length from hand length (in centimeters) for a group of women. Assume the trend is linear. Summary statistics for the data are shown in the table below.
$$
\begin{array}{|l|l|c|}
\hline & \text { Mean } & \text { Standard Deviation } \\
\hline \text { Hand, } x & 17.682 & 1.168 \\
\hline \text { Foot, } y & 23.318 & 1.230 \\
\hline
\end{array}
$$

Nick Johnson
Numerade Educator

Problem 35

Height and Arm Span for Men (Example 5). Measurements were made for a sample of adult men. A regression line was fit to predict the men's arm span from their height. The output from several different statistical technologies is provided. The scatterplot confirms that the association between arm span and height is linear.
a. Report the equation for predicting arm span from height. Use words such as arm span, not just $x$ and $y$.
b. Report the slope and the intercept from each technology, using all the digits given.

Nick Johnson
Numerade Educator

Problem 36

Hand Length and Foot Length for Men. Measurements were made for a sample of adult men. Assume that the association between their hand length and foot length is linear. Output for predicting foot length from hand length is provided from several different statistical technologies.
a. Report the equation for predicting foot length from hand length. Use the variable names Foot $L$ and Hand $L$ in the equation, rather than $x$ and $y$.
b. Report the slope and the intercept from each technology, using all the digits given.

Nick Johnson
Numerade Educator

Problem 37

The correlation between height and arm span in a sample of adult women was found to be $r=0.948$. The correlation between arm span and height in a sample of adult men was found to be $r=0.868$. Assuming both associations are linear, which association is stronger: the one between height and arm span for women, or the one between height and arm span for men? Explain.

Nick Johnson
Numerade Educator

Problem 38

The scatterplot shows a solid blue line for predicting weight from age of men; the dotted red line is for predicting weight from age of women. The data were collected from a large statistics class.
a. Which line is higher and what does that mean?
b. Which line has a steeper slope and what does that mean?

Nick Johnson
Numerade Educator

Problem 39

The following graph shows the winning percentages in singles matches and doubles matches for a sample of male professional tennis players. (Source: tennis.com)
a. Based on this scatterplot, would you say there is a strong linear association between these two variables?
b. Would the numerical value of the correlation between these two variables be close to negative one, positive one, or zero? Give a reason for your answer.
c. Based on this graph, do you think one can accurately predict a professional tennis player's doubles winning percentage based on his singles winning percentage?

Nick Johnson
Numerade Educator

Problem 40

The figure shows a scatterplot of the height of the left seat of a seesaw and the height of the right seat of the same seesaw. Estimate the numerical value of the correlation, and explain the reason for your estimate.

Nick Johnson
Numerade Educator

Problem 41

Indicate which variable you think should be the predictor $(x)$ and which variable should be the response $(y)$. Explain your choices.
a. You have collected data on used cars for sale. The variables are price and odometer readings of the cars.
b. Research is conducted on monthly household expenses. Variables are monthly water bill and household size.
c. A personal trainer gathers data on the weights and time spent in the gym for each of her clients.

Nick Johnson
Numerade Educator

Problem 42

Indicate which variable you think should be the predictor $(x)$ and which variable should be the response $(y)$. Explain your choices.
a. A researcher measures subjects' stress levels and blood pressures.
b. Workers who commute by car record the length of their commutes (in miles) and the amount spent monthly on gasoline purchases.
c. Amusement parks record the heights and maximum speeds of roller coasters.

Nick Johnson
Numerade Educator

Problem 43

The following figure shows a scatterplot with the regression line. The data are for the 50 states. The predictor is the percentage of smoke-free homes. The response is the percentage of high school students who smoke. The data came from the Centers for Disease Control and Prevention.
a. Explain what the trend shows.
b. Use the regression equation to predict the percentage of students in high school who smoke, assuming that there are $70 \%$ smoke-free homes in the state. Use 70, not $0.70$.

Nick Johnson
Numerade Educator

Problem 44

The following figure shows a scatterplot with a regression line. The data are for the 50 states. The predictor is the percentage of adults who smoke. The response is the percentage of high school students who smoke. (The point in the lower left is Utah.)
a. Explain what the trend shows.
b. Use the regression equation to predict the percentage of high school students who smoke, assuming that $25 \%$ of adults in the state smoke. Use 25, not $0.25$.

Nick Johnson
Numerade Educator

Problem 45

The following graph shows the average car insurance premium for a sample of ages. (Source: valuepenguin.com)
a. Explain what the graph tells us about insurance rates for drivers at different ages. Explain why insurance rates might follow this trend.
b. Would it be appropriate to do a linear regression analysis on these data? Why or why not?

Nick Johnson
Numerade Educator

Problem 46

The graph shows the monthly premiums for a 10-year $\$250,000$ male life insurance policy by age of purchase. For example, a 20-year-old male could purchase such a policy for about $\$10$ per month, while a 50-year-old male would pay about $\$24$ per month for the same policy.
a. Explain what the graph tells us about life insurance rates for males at different ages. Explain why life insurance rates might follow this trend.
b. Would it be appropriate to do a linear regression analysis on these data? Why or why not?

Nick Johnson
Numerade Educator

Problem 47

The following table gives the distance from Boston to each city (in thousands of miles) and gives the time for one randomly chosen, commercial airplane to make that flight. Do a complete regression analysis that includes a scatterplot with the line, interprets the slope and intercept, and predicts how much time a nonstop flight from Boston to Seattle would take. The distance from Boston to Seattle is 3000 miles. See page 209 for guidance.
$$
\begin{array}{|lcc|}
\hline \text { City } & \begin{array}{c}
\text { Distance } \\
\text { (1000s of miles) }
\end{array} & \text { Time (hours) } \\
\hline \text { St. Louis } & 1.141 & 2.83 \\
\hline \text { Los Angeles } & 2.979 & 6.00 \\
\hline \text { Paris } & 3.346 & 7.25 \\
\hline \text { Denver } & 1.748 & 4.25 \\
\hline \text { Salt Lake City } & 2.343 & 5.00 \\
\hline \text { Houston } & 1.804 & 4.25 \\
\hline \text { New York } & 0.218 & 1.25 \\
\hline
\end{array}
$$

Nick Johnson
Numerade Educator
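
The numerical part of the regression analysis in Problem 47 can be sketched in Python. This is a hand-written least-squares fit under the assumption that a straight line is appropriate; `least_squares` is a helper name introduced here, and the scatterplot and the written interpretation are left to the reader.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Data from the Problem 47 table: distance in thousands of miles, time in hours.
distance_kmi = [1.141, 2.979, 3.346, 1.748, 2.343, 1.804, 0.218]
time_hours = [2.83, 6.00, 7.25, 4.25, 5.00, 4.25, 1.25]

slope, intercept = least_squares(distance_kmi, time_hours)

# Boston to Seattle is 3000 miles, i.e. 3.0 in the table's units.
seattle_hours = intercept + slope * 3.0
print(f"Time = {intercept:.2f} + {slope:.2f} * Distance")
print(f"Predicted Boston-Seattle time: {seattle_hours:.2f} hours")
```

Note the units: because distance is recorded in thousands of miles, the slope is in hours per thousand miles, and the prediction must use 3.0 rather than 3000.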

Problem 48

The following table gives the distance from Boston to each city and the cost of a train ticket from Boston to that city for a certain date.
$$
\begin{array}{lcc}
\hline \text { City } & \text { Distance (in miles) } & \text { Ticket Price (in \$) } \\
\hline \text { Washington, D.C. } & 439 & 181 \\
\hline \text { Hartford } & 102 & 73 \\
\hline \text { New York } & 215 & 79 \\
\hline \text { Philadelphia } & 310 & 293 \\
\hline \text { Baltimore } & 406 & 175 \\
\hline \text { Charlotte } & 847 & 288 \\
\hline \text { Miami } & 1499 & 340 \\
\hline \text { Roanoke } & 680 & 219 \\
\hline \text { Atlanta } & 1086 & 310 \\
\hline \text { Tampa } & 1349 & 370 \\
\hline \text { Montgomery } & 1247 & 373 \\
\hline \text { Columbus } & 776 & 164 \\
\hline \text { Indianapolis } & 950 & 245 \\
\hline \text { Detroit } & 707 & 189 \\
\hline \text { Nashville } & 1105 & 245 \\
\hline
\end{array}
$$
a. Use technology to produce a scatterplot. Based on your scatterplot do you think there is a strong linear relationship between these two variables? Explain.
b. Compute $r$ and write the equation of the regression line. Use the words "Ticket Price" and "Distance" in your equation. Round off to two decimal places.
c. Provide an interpretation of the slope of the regression line.
d. Provide an interpretation of the $y$-intercept of the regression line or explain why it would not be appropriate to do so.
e. Use the regression equation to predict the cost of a train ticket from Boston to Pittsburgh, a distance of 572 miles.

Nick Johnson
Numerade Educator

Problem 49

The following table gives the number of millionaires (in thousands) and the population (in hundreds of thousands) for the states in the northeastern region of the United States in 2008. The numbers of millionaires come from Forbes Magazine in March 2007.
a. Without doing any calculations, predict whether the correlation and slope will be positive or negative. Explain your prediction.
b. Make a scatterplot with the population (in hundreds of thousands) on the $x$-axis and the number of millionaires (in thousands) on the $y$-axis. Was your prediction correct?
c. Find the numerical value for the correlation.
d. Find the value of the slope and explain what it means in context. Be careful with the units.
e. Explain why interpreting the value for the intercept does not make sense in this situation.
$$
\begin{array}{|l|c|r|}
\hline \text { State } & \text { Millionaires } & \text { Population } \\
\hline \text { Connecticut } & 86 & 35 \\
\hline \text { Delaware } & 18 & 8 \\
\hline \text { Maine } & 22 & 13 \\
\hline \text { Massachusetts } & 141 & 64 \\
\hline \text { New Hampshire } & 26 & 13 \\
\hline \text { New Jersey } & 207 & 87 \\
\hline \text { New York } & 368 & 193 \\
\hline \text { Pennsylvania } & 228 & 124 \\
\hline \text { Rhode Island } & 20 & 11 \\
\hline \text { Vermont } & 11 & 6 \\
\hline
\end{array}
$$

Vaidik Stats
Numerade Educator

Problem 50

The following table gives the Rotten Tomatoes and Metacritic scores for several movies produced in 2017. Both of these ratings systems give movies a score using a scale from 0 to 100. (Source: vox.com)
a. Use technology to make a scatterplot using Rotten Tomatoes as the independent variable and Metacritic as the dependent variable. Based on your scatterplot, do you think there is a strong linear association between these variables?
b. Compute the correlation coefficient, $r$, and write the equation of the regression line. Use the words "Rotten Tomatoes" and "Metacritic" in your equation. Round to two decimal places.
c. Provide an interpretation of the slope of the regression line.
d. Provide an interpretation of the $y$-intercept of the regression line or explain why it would not be appropriate to do so.

Shu Naito
Numerade Educator

Problem 51

Pitchers. The table shows the Earned Run Average (ERA) and WHIP rating (walks plus hits per inning) for the top 40 Major League Baseball pitchers in the 2017 season. Top pitchers will tend to have low ERA and WHIP ratings. (Source: ESPN.com)
a. Make a scatterplot of the data and state the sign of the slope from the scatterplot. Use WHIP to predict ERA.
b. Use linear regression to find the equation of the best-fit line. Show the line on the scatterplot using technology or by hand.
c. Interpret the slope.
d. Interpret the $y$-intercept or explain why it would be inappropriate to do so.
$$
\begin{aligned}
&\begin{array}{|ll|}
\hline \text { WHIP } & \text { ERA } \\
\hline 0.87 & 2.25 \\
\hline 0.95 & 2.31 \\
\hline 0.9 & 2.51 \\
\hline 1.02 & 2.52 \\
\hline 1.15 & 2.89 \\
\hline 0.97 & 2.9 \\
\hline
\end{array}\\
&\begin{array}{|c|c|}
\hline \text { WHIP } & \text { ERA } \\
\hline 1.21 & 3.55 \\
\hline 1.22 & 3.64 \\
\hline 1.22 & 3.66 \\
\hline 1.27 & 3.82 \\
\hline 1.15 & 3.83 \\
\hline 1.16 & 3.86 \\
\hline
\end{array}
\end{aligned}
$$

Lucas Finney
Numerade Educator

Problem 52

The following table shows the number of text messages sent and received by some people in one day. (Source: StatCrunch: Responses to survey How often do you text? Owner: Webster West. A subset was used.)
a. Make a scatterplot of the data, and state the sign of the slope from the scatterplot. Use the number sent as the independent variable.
b. Use linear regression to find the equation of the best-fit line. Graph the line with technology or by hand.
c. Interpret the slope.
d. Interpret the intercept.
$$
\begin{aligned}
&\begin{array}{|c|c|}
\hline \text { Sent } & \text { Received } \\
\hline 1 & 2 \\
\hline 1 & 1 \\
\hline 0 & 0 \\
\hline 5 & 5 \\
\hline 5 & 1 \\
\hline 50 & 75 \\
\hline 6 & 8 \\
\hline 5 & 7 \\
\hline 300 & 300 \\
\hline 30 & 40 \\
\hline
\end{array}\\
&\begin{array}{|r|r|}
\hline \text { Sent } & \text { Received } \\
\hline 10 & 10 \\
\hline 3 & 5 \\
\hline 2 & 2 \\
\hline 5 & 5 \\
\hline 0 & 0 \\
\hline 2 & 2 \\
\hline 200 & 200 \\
\hline 1 & 1 \\
\hline 100 & 100 \\
\hline 50 & 50 \\
\hline
\end{array}
\end{aligned}
$$

Sheryl Ezze
Numerade Educator

Problem 53

Answer the questions using complete sentences.
a. What is an influential point? How should influential points be treated when doing a regression analysis?
b. What is the coefficient of determination and what does it measure?
c. What is extrapolation? Should extrapolation ever be used?

Nick Johnson
Numerade Educator

Problem 54

Answer the questions using complete sentences.
a. An economist noted the correlation between consumer confidence and monthly personal savings was negative. As consumer confidence increases, would we expect monthly personal savings to increase, decrease, or remain constant?
b. A study found a correlation between higher education and lower death rates. Does this mean that one can live longer by going to college? Why or why not?

Sandile Ndlovu
Numerade Educator

Problem 55

If there is a positive correlation between number of years studying math and shoe size (for children), does that prove that larger shoes cause more studying of math or vice versa? Can you think of a confounding variable that might be influencing both of the other variables?

Nick Johnson
Numerade Educator

Problem 56

Suppose that the growth rate of children looks like a straight line if the height of a child is observed at the ages of 24 months, 28 months, 32 months, and 36 months. If you use the regression obtained from these ages and predict the height of the child at 21 years, you might find that the predicted height is 20 feet. What is wrong with the prediction and the process used?

Nick Johnson
Numerade Educator

Problem 57

If the correlation between height and weight of a large group of people is $0.67$, find the coefficient of determination (as a percentage) and explain what it means. Assume that height is the predictor and weight is the response, and assume that the association between height and weight is linear.

Nick Johnson
Numerade Educator
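
As a worked step for Problem 57, the coefficient of determination is the square of the correlation:

$$
r^{2}=(0.67)^{2}=0.4489 \approx 44.9 \%
$$

Under the stated linearity assumption, this is usually read as: about 45% of the variation in weight is explained by the linear association with height.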

Problem 58

Does a correlation of $-0.70$ or $+0.50$ give a larger coefficient of determination? We say that the linear relationship that has the larger coefficient of determination is more strongly correlated. Which of the values shows a stronger correlation?

Nick Johnson
Numerade Educator
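
For Problem 58, squaring each correlation makes the comparison direct:

$$
(-0.70)^{2}=0.49, \quad(+0.50)^{2}=0.25
$$

Since $0.49>0.25$, the correlation of $-0.70$ gives the larger coefficient of determination. The sign indicates only the direction of the trend, not its strength.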

Problem 59

Some investors use a technique called the "Dogs of the Dow" to invest. They pick several stocks that are performing poorly from the Dow Jones group (which is a composite of 30 well-known stocks) and invest in these. Explain why these stocks will probably do better than they have done before.

Nick Johnson
Numerade Educator

Problem 60

Suppose a doctor telephones those patients who are in the highest $10 \%$ with regard to their recently recorded blood pressure and asks them to return for a clinical review. When she retakes their blood pressures, will those new blood pressures, as a group (that is, on average), tend to be higher than, lower than, or the same as the earlier blood pressures, and why?

Nick Johnson
Numerade Educator
01:03

Problem 61

The equation for the regression line relating the salary and the year first employed is given above the figure.
a. Report the slope and explain what it means.
b. Either interpret the intercept $(4,255,000)$ or explain why it is not appropriate to interpret the intercept.

Nick Johnson
Numerade Educator
01:05

Problem 62

The following figure shows the relationship between the number of miles per gallon on the highway and that in the city for some cars.
a. Report the slope and explain what it means.
b. Either interpret the intercept $(7.792)$ or explain why it is not appropriate to interpret the intercept.

Nick Johnson
Numerade Educator
02:09

Problem 63

The following table shows the weights and prices of some turkeys at different supermarkets.
a. Make a scatterplot with weight on the $x$-axis and cost on the $y$-axis. Include the regression line on your scatterplot.
b. Find the numerical value for the correlation between weight and price. Explain what the sign of the correlation shows.
c. Report the equation of the best-fit straight line, using weight as the predictor $(x)$ and cost as the response $(y)$.
d. Report the slope and intercept of the regression line, and explain what they show. If the intercept is not appropriate to report, explain why.
e. Add a new point to your data: a 30-pound turkey that is free. Give the new value for $r$ and the new regression equation. Explain what the negative correlation implies. What happened?
f. Find and interpret the coefficient of determination using the original data.
$$
\begin{array}{|c|c|}
\hline \text { Weight (pounds) } & \text { Price } \\
\hline 12.3 & \$ 17.10 \\
\hline 18.5 & \$ 23.87 \\
\hline 20.1 & \$ 26.73 \\
\hline 16.7 & \$ 19.87 \\
\hline 15.6 & \$ 23.24 \\
\hline 10.2 & \$ 9.08 \\
\hline
\end{array}
$$
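
The calculations asked for in parts (b) through (e) can be sketched in plain Python, without a statistics package. This is a minimal illustration under stated assumptions, not the textbook's own method: the helper name `correlation_and_line` is hypothetical, and the data are copied from the table above.

```python
from math import sqrt


def correlation_and_line(xs, ys):
    """Return (r, slope, intercept) for the least-squares line y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    r = sxy / sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return r, slope, intercept


# Weights (pounds) and prices (dollars) from the table.
weights = [12.3, 18.5, 20.1, 16.7, 15.6, 10.2]
prices = [17.10, 23.87, 26.73, 19.87, 23.24, 9.08]

r, slope, intercept = correlation_and_line(weights, prices)
print(f"original data: r = {r:.2f}, price = {intercept:.2f} + {slope:.2f} * weight")

# Part (e): add the free 30-pound turkey and refit.
r_new, slope_new, intercept_new = correlation_and_line(weights + [30], prices + [0])
print(f"with added point: r = {r_new:.2f}, price = {intercept_new:.2f} + {slope_new:.2f} * weight")
```

Refitting with the extra point shows how much a single influential observation can move both $r$ and the fitted line.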

Tyler Moulton
Numerade Educator
10:08

Problem 64

The table shows the calories in a five-ounce serving and the $\%$ alcohol content for a sample of wines. (Source: healthalicious.com)
$$
\begin{array}{|c|c|}
\hline \text { Calories } & \% \text { alcohol } \\
\hline 122 & 10.6 \\
\hline 119 & 10.1 \\
\hline 121 & 10.1 \\
\hline 123 & 8.8 \\
\hline 129 & 11.1 \\
\hline 236 & 15.5 \\
\hline
\end{array}
$$
a. Make a scatterplot using $\%$ alcohol as the independent variable and calories as the dependent variable. Include the regression line on your scatterplot. Based on your scatterplot do you think there is a strong linear relationship between these variables?
b. Find the numerical value of the correlation between $\%$ alcohol and calories. Explain what the sign of the correlation means in the context of this problem.
c. Report the equation of the regression line and interpret the slope of the regression line in the context of this problem. Use the words calories and $\%$ alcohol in your equation. Round to two decimal places.
d. Find and interpret the value of the coefficient of determination.
e. Add a new point to your data: a wine that is $20 \%$ alcohol that contains 0 calories. Find $r$ and the regression equation after including this new data point. What was the effect of this one data point on the value of $r$ and the slope of the regression equation?

Jerrah Biggerstaff
Numerade Educator
01:48

Problem 65

The scatterplot shows the average teacher pay and the per pupil expenditure for each of the 50 states and the District of Columbia. The regression equation is also shown. (Source: The 2017 World Almanac and Book of Facts).
a. From the scatterplot is the correlation between average teacher pay and per pupil expenditure positive, negative, or near zero?
b. What is the slope of the regression equation? Interpret the slope in the context of the problem.
c. What is the $y$-intercept of the regression equation? Interpret the $y$-intercept or explain why it would be inappropriate to do so for this problem.
d. Use the regression equation to estimate the per pupil expenditure for a state with an average teacher pay of $\$ 60,000$.

Nick Johnson
Numerade Educator
03:00

Problem 66

The scatterplot shows the average teacher pay and high school graduation percentage rate for each of the 50 states and the District of Columbia. The regression equation is also shown. (Source: 2017 World Almanac Book of Facts and higheredinfo.org)
a. Based on the scatterplot is the correlation between average teacher pay and high school graduation rate positive, negative, or near zero?
b. Should the regression equation be used to predict the high school graduation rate for a state with an average teacher salary of $\$ 60,000$? If so, predict the graduation rate. If not, explain why the regression equation should not be used to make this prediction.

Nick Johnson
Numerade Educator
01:50

Problem 67

Grades on a political science test and the number of hours of paid work in the week before the test were recorded. The instructor was trying to predict the grade on a test from the hours of work. The following figure shows a scatterplot and the regression line for these data.
a. Referring to the figure, state whether you think the correlation is positive or negative, and explain your prediction.
b. Interpret the slope.
c. Interpret the intercept.

Nick Johnson
Numerade Educator
03:41

Problem 68

Data were collected that included information on the weight of the trash (in pounds) on the street for one week and the number of people who live in the house. The following figure shows a scatterplot with the regression line.
a. Is the trend positive or negative? What does that mean?
b. Now calculate the correlation between the weight of trash and the number of people. (Use $r$-squared from the figure and take the square root of it.)
c. Report the slope. For each additional person in the house, there are, on average, how many additional pounds of trash?
d. Either report the intercept or explain why it is not appropriate to interpret it.

Nick Johnson
Numerade Educator
01:17

Problem 69

Data on the number of home runs, strikeouts, and batting averages for a sample of 50 Major League Baseball players were obtained. Regression analyses were conducted on the relationships between home runs and strikeouts and between home runs and batting averages. The StatCrunch results are shown below. (Source: mlb.com)
Simple linear regression results:
Dependent Variable: Home Runs
Independent Variable: Strikeouts
Home Runs $= 0.092770565 + 0.22866236$ Strikeouts
Sample size: 50
R (correlation coefficient) $= 0.63591835$
R-sq $= 0.40439215$
Estimate of error standard deviation: $8.7661607$

Simple linear regression results:
Dependent Variable: Home Runs
Independent Variable: Batting Average
Home Runs $= 45.463921 - 71.232795$ Batting Average
Sample size: 50
R (correlation coefficient) $= -0.093683651$
R-sq $= 0.0087766264$
Estimate of error standard deviation: $11.30876$

Based on this sample, is there a stronger association between home runs and strikeouts or home runs and batting average? Provide a reason for your choice based on the StatCrunch results provided.

Nick Johnson
Numerade Educator
01:43

Problem 70

Data on the 3-point percentage, field-goal percentage, and free-throw percentage for a sample of 50 professional basketball players were obtained. Regression analyses were conducted on the relationships between 3-point percentage and field-goal percentage and between 3-point percentage and free-throw percentage. The StatCrunch results are shown below. (Source: nba.com)
Simple linear regression results:
Dependent Variable: 3 Point %
Independent Variable: Field Goal %
3 Point % $= 40.090108 - 0.091032596$ Field Goal %
Sample size: 50
R (correlation coefficient) $= -0.048875984$
R-sq $= 0.0023888618$
Estimate of error standard deviation: $7.7329785$

Simple linear regression results:
Dependent Variable: 3 Point %
Independent Variable: Free Throw %
3 Point % $= -8.2347225 + 0.54224127$ Free Throw %
Sample size: 50
R (correlation coefficient) $= 0.57040364$
R-sq $= 0.32536031$
Estimate of error standard deviation: $6.3591944$

Based on this sample, is there a stronger association between 3-point percentage and field-goal percentage or 3-point percentage and free-throw percentage? Provide a reason for your choice based on the StatCrunch results provided.

Nick Johnson
Numerade Educator
06:02

Problem 71

The data shown in the table are the 4th-grade reading and math scores for a sample of states from the National Assessment of Educational Progress. The scores represent the percentage of 4th-graders in each state who scored at or above basic level in reading and math. A scatterplot of the data suggests a linear trend. (Source: nationsreportcard.gov)
$$
\begin{array}{|c|c|c|c|}
\hline \begin{array}{c}
\text { 4th-Grade } \\
\text { Reading } \\
\text { Scores }
\end{array} & \begin{array}{c}
\text { 4th-Grade } \\
\text { Math Scores }
\end{array} & \begin{array}{c}
\text { 4th-Grade } \\
\text { Reading } \\
\text { Scores }
\end{array} & \begin{array}{c}
\text { 4th-Grade } \\
\text { Math Scores }
\end{array} \\
\hline 65 & 75 & 68 & 78 \\
\hline 61 & 78 & 61 & 79 \\
\hline 62 & 79 & 69 & 80 \\
\hline 65 & 79 & 68 & 77 \\
\hline 59 & 72 & 75 & 89 \\
\hline 72 & 82 & 71 & 84 \\
\hline 74 & 81 & 68 & 83 \\
\hline 70 & 82 & 75 & 84 \\
\hline 56 & 69 & 63 & 78 \\
\hline 75 & 85 & 71 & 85 \\
\hline
\end{array}
$$
a. Find and report the value for the correlation coefficient and the regression equation for predicting the math score from the reading score. Use the words Reading and Math in your regression equation and round off to two decimal places. Then find the predicted math score for a state with a reading score of 70.
b. Find and report the value of the correlation coefficient and the regression equation for predicting the reading score from the math score. Then find the predicted reading score for a state with a math score of 70.
c. Discuss the effect of changing the choice of dependent and independent variable on the value of $r$ and on the regression equation.
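
The effect described in part (c) can be explored numerically. The following sketch (the helper name `fit` is an assumption of this example; the data are the twenty reading/math pairs from the table above, read across both column pairs) fits the line in both directions. It relies on two standard facts: $r$ is symmetric in the two variables, and the product of the two slopes equals $r^2$.

```python
from math import sqrt


def fit(xs, ys):
    """Return (r, slope, intercept) for predicting ys from xs by least squares."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    slope = sxy / sxx
    return sxy / sqrt(sxx * syy), slope, y_bar - slope * x_bar


# Percentage at or above basic level, one pair per state.
reading = [65, 61, 62, 65, 59, 72, 74, 70, 56, 75,
           68, 61, 69, 68, 75, 71, 68, 75, 63, 71]
math_scores = [75, 78, 79, 79, 72, 82, 81, 82, 69, 85,
               78, 79, 80, 77, 89, 84, 83, 84, 78, 85]

# Predict Math from Reading, then Reading from Math.
r_a, slope_a, intercept_a = fit(reading, math_scores)
r_b, slope_b, intercept_b = fit(math_scores, reading)

print(f"Math = {intercept_a:.2f} + {slope_a:.2f} Reading (r = {r_a:.2f})")
print(f"Reading = {intercept_b:.2f} + {slope_b:.2f} Math (r = {r_b:.2f})")
```

The two printed equations differ, while the two values of $r$ agree.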

Jameson Kuper
Numerade Educator
01:12

Problem 72

The following table shows the average SAT Math and Critical Reading scores for students in a sample of states. A scatterplot for these two variables suggests a linear trend. (Source: qsleap.com)
$$
\begin{aligned}
&\begin{array}{|c|c|}
\hline \begin{array}{c}
\text { SAT Math } \\
\text { Score }
\end{array} & \begin{array}{c}
\text { SAT Critical } \\
\text { Reading Score }
\end{array} \\
\hline 463 & 450 \\
\hline 494 & 494 \\
\hline 488 & 487 \\
\hline 592 & 597 \\
\hline 581 & 574 \\
\hline 470 & 486 \\
\hline 579 & 575 \\
\hline 523 & 524 \\
\hline 518 & 516 \\
\hline 414 & 388 \\
\hline 502 & 510 \\
\hline 509 & 497 \\
\hline 591 & 605 \\
\hline 589 & 586 \\
\hline
\end{array}\\
&\begin{array}{|c|c|}
\hline \begin{array}{c}
\text { SAT Math } \\
\text { Score }
\end{array} & \begin{array}{c}
\text { SAT Critical } \\
\text { Reading Score }
\end{array} \\
\hline 580 & 563 \\
\hline 596 & 599 \\
\hline 561 & 556 \\
\hline 589 & 590 \\
\hline 494 & 494 \\
\hline 525 & 530 \\
\hline 500 & 521 \\
\hline 551 & 544 \\
\hline 489 & 502 \\
\hline 498 & 504 \\
\hline 597 & 608 \\
\hline 557 & 563 \\
\hline 576 & 569 \\
\hline 523 & 521 \\
\hline 499 & 504 \\
\hline
\end{array}
\end{aligned}
$$
a. Find and report the value for the correlation coefficient and the regression equation for predicting the math score from the critical reading score, rounding off to two decimal places. Then find the predicted math score for a state with a critical reading score of 600.
b. Find and report the value of the correlation coefficient and the regression equation for predicting the critical reading score from the math score. Then find the predicted reading score for a state with a math score of 600.
c. Discuss the effect of changing the choice of dependent and independent variable on the value of $r$ and on the regression equation.

Tyler Moulton
Numerade Educator
01:51

Problem 73

Assume that in a political science class, the teacher gives a midterm exam and a final exam. Assume that the association between midterm and final scores is linear. The summary statistics have been simplified for clarity (see Guidance on page 209).
Midterm: Mean $=75$, Standard deviation $=10$
Final: Mean $=75$, Standard deviation $=10$
Also, $r=0.7$ and $n=20$.
According to the regression equation, for a student who gets a 95 on the midterm, what is the predicted final exam grade? What phenomenon from the chapter does this demonstrate? Explain. See page 209 for guidance.
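
When only summary statistics are available, the regression line can be built from the standard relations slope $= r \cdot s_y / s_x$ and intercept $= \bar{y} - \text{slope} \cdot \bar{x}$. The sketch below applies them to the numbers given above; the function name `line_from_summary` is a hypothetical helper for illustration.

```python
def line_from_summary(mean_x, sd_x, mean_y, sd_y, r):
    """Return (slope, intercept) of the least-squares line from summary statistics."""
    slope = r * sd_y / sd_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Midterm is the predictor (x); final is the response (y).
slope, intercept = line_from_summary(mean_x=75, sd_x=10, mean_y=75, sd_y=10, r=0.7)

midterm = 95
predicted_final = intercept + slope * midterm
print(f"Final = {intercept:.1f} + {slope:.1f} * Midterm")
print(f"Predicted final for a midterm of {midterm}: {predicted_final:.1f}")
```

The prediction lands between the midterm score and the class mean, which is the pattern the question asks you to name.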

Nick Johnson
Numerade Educator
02:41

Problem 74

Assume that in a sociology class, the teacher gives a midterm exam and a final exam. Assume that the association between midterm and final scores is linear. Here are the summary statistics:
Midterm: Mean $=72$, Standard deviation $=8$
Final: Mean $=72$, Standard deviation $=8$
Also, $r=0.75$ and $n=28$.
a. Find and report the equation of the regression line to predict the final exam score from the midterm score.
b. For a student who gets 55 on the midterm, predict the final exam score.
c. Your answer to part (b) should be higher than 55. Why?
d. Consider a student who gets a 100 on the midterm. Without doing any calculations, state whether the predicted score on the final exam would be higher, lower, or the same as 100.

Nick Johnson
Numerade Educator
03:52

Problem 75

The following table shows the heights and weights of some people. The scatterplot shows that the association is linear enough to proceed.
$$
\begin{array}{|c|c|}
\hline \text { Height (inches) } & \text { Weight (pounds) } \\
\hline 60 & 105 \\
\hline 66 & 140 \\
\hline 72 & 185 \\
\hline 70 & 145 \\
\hline 63 & 120 \\
\hline
\end{array}
$$
a. Calculate the correlation, and find and report the equation of the regression line, using height as the predictor and weight as the response.
b. Change the height to centimeters by multiplying each height in inches by $2.54$. Find the weight in kilograms by dividing the weight in pounds by $2.205$. Retain at least six digits in each number so there will be no errors due to rounding.
c. Report the correlation between height in centimeters and weight in kilograms, and compare it with the correlation between the height in inches and weight in pounds.
d. Find the equation of the regression line for predicting weight from height, using height in centimeters and weight in kilograms. Is the equation for weight in pounds and height in inches the same as or different from the equation for weight in kilograms and height in centimeters?
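
The unit conversion in parts (b) through (d) is easy to check by machine. The sketch below is one possible way to do it, assuming the conversion factors stated in part (b); the helper name `correlation_and_line` is illustrative and not from the text.

```python
from math import sqrt


def correlation_and_line(xs, ys):
    """Return (r, slope, intercept) for the least-squares line predicting ys from xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    slope = sxy / sxx
    return sxy / sqrt(sxx * syy), slope, y_bar - slope * x_bar


# Data from the table: heights in inches, weights in pounds.
heights_in = [60, 66, 72, 70, 63]
weights_lb = [105, 140, 185, 145, 120]

# Convert units as described in part (b).
heights_cm = [h * 2.54 for h in heights_in]
weights_kg = [w / 2.205 for w in weights_lb]

r_us, slope_us, intercept_us = correlation_and_line(heights_in, weights_lb)
r_metric, slope_metric, intercept_metric = correlation_and_line(heights_cm, weights_kg)

print(f"inches/pounds: r = {r_us:.4f}, weight = {intercept_us:.2f} + {slope_us:.3f} * height")
print(f"cm/kg:         r = {r_metric:.4f}, weight = {intercept_metric:.2f} + {slope_metric:.3f} * height")
```

Rescaling either variable by a positive constant leaves $r$ unchanged, while the slope and intercept take on the new units.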

Erin Kearney
Numerade Educator
03:52

Problem 76

The table shows the heights (in inches) and weights (in pounds) of 14 college men. The scatterplot shows that the association is linear enough to proceed.
$$
\begin{aligned}
&\begin{array}{|c|c|}
\hline \begin{array}{c}
\text { Height } \\
\text { (inches) }
\end{array} & \begin{array}{c}
\text { Weight } \\
\text { (pounds) }
\end{array} \\
\hline 68 & 205 \\
\hline 68 & 168 \\
\hline 74 & 230 \\
\hline 68 & 190 \\
\hline 67 & 185 \\
\hline 69 & 190 \\
\hline 68 & 165 \\
\hline
\end{array}\\
&\begin{array}{|c|c|}
\hline \begin{array}{c}
\text { Height } \\
\text { (inches) }
\end{array} & \begin{array}{c}
\text { Weight } \\
\text { (pounds) }
\end{array} \\
\hline 70 & 200 \\
\hline 69 & 175 \\
\hline 72 & 210 \\
\hline 72 & 205 \\
\hline 72 & 185 \\
\hline 71 & 200 \\
\hline 73 & 195 \\
\hline
\end{array}
\end{aligned}
$$
a. Find the equation for the regression line with weight (in pounds) as the response and height (in inches) as the predictor. Report the slope and the intercept of the regression line, and explain what they show. If the intercept is not appropriate to report, explain why.
b. Find the correlation between weight (in pounds) and height (in inches).
c. Find the coefficient of determination and interpret it.
d. If you changed each height to centimeters by multiplying heights in inches by $2.54$, what would the new correlation be? Explain.
e. Find the equation with weight (in pounds) as the response and height (in inches) as the predictor, and interpret the slope.
f. Summarize what you found: Does changing units change the correlation? Does changing units change the regression equation?

Erin Kearney
Numerade Educator
08:36

Problem 77

The data show the number of calories, carbohydrates (in grams), and sugar (in grams) found in a selection of menu items at McDonald's. Scatterplots suggest the relationship between calories and both carbs and sugars is linear. The data are also available on this text's website. (Source: shapefit.com)
$$
\begin{array}{|c|c|c|}
\hline \text { Calories } & \text { Carbs (in grams) } & \text { Sugars (in grams) } \\
\hline 530 & 47 & 9 \\
\hline 520 & 42 & 10 \\
\hline 720 & 52 & 14 \\
\hline 610 & 47 & 10 \\
\hline 600 & 48 & 12 \\
\hline 540 & 45 & 9 \\
\hline 740 & 43 & 10 \\
\hline 240 & 32 & 6 \\
\hline 290 & 33 & 7 \\
\hline 340 & 37 & 7 \\
\hline 300 & 32 & 6 \\
\hline 430 & 35 & 7 \\
\hline 380 & 34 & 7 \\
\hline 430 & 35 & 6 \\
\hline 440 & 35 & 7 \\
\hline 430 & 34 & 7 \\
\hline 750 & 65 & 16 \\
\hline 590 & 51 & 14 \\
\hline 510 & 55 & 10 \\
\hline 350 & 42 & 8 \\
\hline
\end{array}
$$
$$
\begin{array}{|c|c|c|}
\hline \text { Calories } & \text { Carbs (in grams) } & \text { Sugars (in grams) } \\
\hline 670 & 58 & 11 \\
\hline 510 & 44 & 9 \\
\hline 610 & 57 & 11 \\
\hline 450 & 43 & 9 \\
\hline 360 & 40 & 5 \\
\hline 360 & 40 & 5 \\
\hline 430 & 41 & 6 \\
\hline 480 & 43 & 6 \\
\hline 430 & 43 & 7 \\
\hline 390 & 39 & 5 \\
\hline 500 & 44 & 11 \\
\hline 670 & 68 & 12 \\
\hline 510 & 54 & 10 \\
\hline 630 & 56 & 7 \\
\hline 480 & 42 & 6 \\
\hline 610 & 56 & 8 \\
\hline 450 & 42 & 6 \\
\hline 540 & 61 & 14 \\
\hline 380 & 47 & 12 \\
\hline 340 & 37 & 8 \\
\hline 260 & 30 & 7 \\
\hline 340 & 34 & 5 \\
\hline 260 & 27 & 4 \\
\hline 360 & 32 & 3 \\
\hline 280 & 25 & 2 \\
\hline 330 & 26 & 3 \\
\hline 190 & 12 & 0 \\
\hline 750 & 65 & 16 \\
\hline
\end{array}
$$
a. Calculate the correlation coefficient and report the equation of the regression line using carbs as the predictor and calories as the response variable. Report the slope and interpret it in the context of this problem. Then use your regression equation to predict the number of calories in a menu item containing 55 grams of carbohydrates.
b. Calculate the correlation coefficient and report the equation of the regression line using sugar as the predictor and calories as the response variable. Report the slope and interpret it in the context of this problem. Then use your regression equation to predict the number of calories in a menu item containing 10 grams of sugars.
c. Based on your answers to parts (a) and (b), which is a better predictor of calories for these data: carbs or sugars? Explain your choice using appropriate statistics.

Vaidik Stats
Numerade Educator
03:15

Problem 78

The following table shows the fat content (in grams) and calories for a sample of granola bars. (Source: calorielab.com)
$$
\begin{array}{|c|l|}
\hline \text { Fat (in grams) } & \text { Calories } \\
\hline 7.6 & 370 \\
\hline 3.3 & 106.1 \\
\hline 18.7 & 312.4 \\
\hline
\end{array}
$$
$$
\begin{array}{|c|c|}
\hline \text { Fat (in grams) } & \text { Calories } \\
\hline 3.8 & 113.1 \\
\hline 5 & 117.8 \\
\hline 5.5 & 131.9 \\
\hline 7.2 & 140.6 \\
\hline 6.1 & 118.8 \\
\hline 4.6 & 124.4 \\
\hline 3.9 & 105.1 \\
\hline 6.1 & 136 \\
\hline 4.8 & 124 \\
\hline 4.4 & 119.3 \\
\hline 7.7 & 192.6 \\
\hline
\end{array}
$$
a. Use technology to make a scatterplot of the data. Use fat as the independent variable $(x)$ and calories as the dependent variable $(y)$. Does there seem to be a linear trend to the data?
b. Compute the correlation coefficient and the regression equation, using fat as the independent variable and calories as the dependent variable.
c. What is the slope of the regression equation? Interpret the slope in the context of this problem.
d. What is the $y$-intercept of the regression equation? Interpret the $y$-intercept in the context of this problem or explain why it would be inappropriate to do so.
e. Find and interpret the coefficient of determination.
f. Use the regression equation to predict the calories in a granola bar containing 7 grams of fat.
g. Would it be appropriate to use the regression equation to predict the calories in a granola bar containing 25 grams of fat? If so, predict the number of calories in such a bar. If not, explain why it would be inappropriate to do so.
h. Looking at the scatterplot there is a granola bar in the sample that has an extremely high number of calories given the moderate amount of fat it contains. Remove its data from the sample and recalculate the correlation coefficient and regression equation. How did removing this unusual point change the value of $r$ and the regression equation?

Trent Speier
Numerade Educator
01:51

Problem 79

The scatterplot shows the shoe size and height for some men (M) and women (F).
a. Why did we not extend the red line (for the women) all the way to 74 inches, instead stopping at 69 inches?
b. How do we interpret the fact that the blue line is above the red line?
c. How do we interpret the fact that the two lines are (nearly) parallel?

James Kiss
Numerade Educator
01:23

Problem 80

The following scatterplot shows the age in years and the number of hours of sleep for some men (M) and women (F).
a. How do we interpret the fact that both lines have a negative slope?
b. How do we interpret the fact that the slopes are the same for both lines?
c. How do we interpret the fact that the lines are nearly the same?
d. Why is the line for the men shorter than the line for the women?

Nick Johnson
Numerade Educator
00:24

Problem 81

The following scatterplot shows the age and weight for some women. Some of them exercised regularly, and some did not. Explain what it means that the blue line (for those who did not exercise) is a bit steeper than the red line (for those who did exercise). (Source: StatCrunch: 2012 Women's final. Owner: molly7son@yahoo.com)

Nick Johnson
Numerade Educator
01:14

Problem 82

a. The following figure shows hypothetical data for a group of children. By looking at the figure, state whether the correlation between height and test score is positive, negative, or near zero.
b. The shape and color of each marker show what grade these children were in at the time they took the test. Look at the six different groupings (for grades $1,2,3,4,5$, and 6) and decide whether the correlation (the answer to part [a]) would stay the same if you controlled for grade (that is, if you looked only within specific grades).
c. Suppose a school principal looked at this scatterplot and said, "This means that taller students get better test scores, so we should give more assistance to shorter students." Do the data support this conclusion? Explain. If yes, say why. If no, give another cause for the association.

Nick Johnson
Numerade Educator
02:24

Problem 83

The acceptance rate for a sample of law schools and the percentage of students employed at graduation are on this text's website. A low acceptance rate means the law school is highly selective in admitting students. (Source: Internet Legal Research Group)
a. Do more selective law schools have better job-placement success? Make a graph that could address this question, report whether the trend is linear, and write a sentence answering the research question. If the trend is not linear, comment on what it shows and do not go on to part (b).
b. If the trend is linear, do the following:
I. Write the regression equation.
II. Interpret the slope of the regression equation.
III. Interpret the $y$-intercept of the regression equation or explain why it would be inappropriate to do so.
IV. Find and interpret the value of the coefficient of determination.
V. Use the regression equation to predict the percentage of students employed at graduation for a law school with a $50 \%$ acceptance rate.

Nick Johnson
Numerade Educator
03:01

Problem 84

The LSAT is a standardized test required for entrance to most law schools. The high LSAT score for admitted students and the percentage of students passing the bar exam immediately after law school graduation for a sample of law schools is found on this text's website. (Source: Internet Legal Research Group)
a. Do law schools that have high LSAT-scoring students also have a higher rate of students who pass the bar exam? Make a graph and report whether the trend is linear. Interpret the association if any. If the trend is not linear, comment on what it shows and do not go on to part (b).
b. If the trend is linear, do the following:
I. Write the regression equation.
II. Interpret the slope of the regression equation.
III. Interpret the $y$-intercept of the regression equation or explain why it would be inappropriate to do so.
IV. Find and interpret the value of the coefficient of determination.
V. Use the regression equation to predict the percent of students employed at graduation for a law school with a high LSAT score for admitted students of 150 .

Heena Haldankar
Numerade Educator
01:25

Problem 85

Data on the fat, carbohydrate, and calorie content for a sample of popular snack foods are found on this text's website. Use the data to determine which is a better predictor of the number of calories in these snack foods: fat or carbohydrates?

Victor Salazar
Numerade Educator
01:26

Problem 86

Does education pay? The salary per year in dollars, the number of years employed (YrsEm), and the number of years of education after high school (Educ) for the employees of a company were recorded. Determine whether number of years employed or number of years of education after high school is a better predictor of salary. Explain your thinking. Data are at this text's website. (Source: Minitab File)

Nick Johnson
Numerade Educator
02:45

Problem 87

Movie studios try to predict how much money their movies will make. One possible predictor is the amount of money spent on the production of the movie. The table shows the budget (amount spent making the movie) and gross (amount earned by ticket sales) for a sample of movies made in 2017. Make a scatterplot of the data and comment on what you see. If appropriate, do a complete linear regression analysis. If it is not appropriate to do so, explain why not. (Source: IMDB)
$$
\begin{array}{|lcc|}
\hline \text { Film } & \begin{array}{c}
\text { Gross (in \$ } \\
\text { millions) }
\end{array} & \begin{array}{c}
\text { Budget (in \$ } \\
\text { millions) }
\end{array} \\
\hline \text { Wonder Woman } & 412.6 & 149 \\
\hline \text { Beauty and the Beast } & 504 & 160 \\
\hline \text { Guardians of the Galaxy Vol. } 2 & 389.8 & 200 \\
\hline \text { Spider-Man: Homecoming } & 334.2 & 175 \\
\hline \text { It } & 327.5 & 35 \\
\hline \text { Despicable Me 3 } & 264.6 & 80 \\
\hline \text { Logan } & 226.3 & 97 \\
\hline \text { The Fate of the Furious } & 225.8 & 250 \\
\hline \text { Dunkirk } & 188 & 100 \\
\hline \text { The LEGO Batman Movie } & 175.8 & 80 \\
\hline \text { Thor Ragnarok } & 310.7 & 180 \\
\hline \text { Get Out } & 175.5 & 5 \\
\hline \text { Dead Men Tell No Tales } & 172.6 & 230 \\
\hline \text { Cars } 3 & 152 & 175 \\
\hline
\end{array}
$$

Nick Johnson
Numerade Educator
03:41

Problem 88

The following table gives the number of miles per gallon in the city and on the highway for some of the most fuel efficient cars according to Consumer Reports. Make a scatterplot of the data using city mileage as the predictor variable. Find the regression equation and use it to predict the highway mileage for a fuel-efficient car that gets 40 miles per gallon in city driving. Would it be appropriate to use the regression equation to predict the highway mileage for a fuel-efficient car that got 60 miles per gallon in city driving? If so, make the prediction. If not, explain why it would be inappropriate to do so.
$$
\begin{array}{|lcc|}
\hline \text { Model } & \text { City Mileage } & \text { Highway Mileage } \\
\hline \text { Toyota Prius 3 } & 43 & 59 \\
\hline \text { Hyundai Ioniq } & 42 & 60 \\
\hline \text { Toyota Prius Prime } & 38 & 62 \\
\hline \text { Kia Niro } & 33 & 52 \\
\hline \text { Toyota Prius C } & 37 & 48 \\
\hline \text { Chevrolet Malibu } & 33 & 49 \\
\hline
\end{array}
$$
$$
\begin{array}{|lcc|}
\hline \text { Model } & \text { City Mileage } & \text { Highway Mileage } \\
\hline \text { Ford Fusion } & 35 & 41 \\
\hline \text { Hyundai Sonata } & 31 & 46 \\
\hline \text { Toyota Camry } & 32 & 43 \\
\hline \text { Ford C-Max } & 35 & 38 \\
\hline
\end{array}
$$
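
The extrapolation question in this problem comes down to checking whether a new $x$ value lies inside the range of the observed predictor values before trusting the fitted line. The sketch below shows one way to encode that check; the helper names `least_squares` and `in_observed_range` are assumptions of this example, and the data are copied from the two table fragments above.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line predicting ys from xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def in_observed_range(x, xs):
    """True when x lies between the smallest and largest observed predictor values."""
    return min(xs) <= x <= max(xs)


# City and highway mileage (miles per gallon), in table order.
city = [43, 42, 38, 33, 37, 33, 35, 31, 32, 35]
highway = [59, 60, 62, 52, 48, 49, 41, 46, 43, 38]

slope, intercept = least_squares(city, highway)

for x in (40, 60):
    if in_observed_range(x, city):
        print(f"city = {x}: predicted highway = {intercept + slope * x:.1f} mpg")
    else:
        print(f"city = {x}: outside the observed range {min(city)} to {max(city)}; prediction not advised")
```

A range check like this is only a first guard; it does not confirm that the trend is linear inside the range.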

Sheryl Ezze
Numerade Educator
03:05

Problem 89

The following scatterplot shows information about the world's tallest 169 buildings. Stories means floors.
a. What does the trend tell us about the relationship between stories and height (feet)?
b. The regression line for predicting the height (in feet) from the number of stories is shown above the graph. What height would you predict for a building with 100 stories?
c. Interpret the slope.
d. What, if anything, do we learn from the intercept?
e. Interpret the coefficient of determination.
(This data set is available at this text's website, and it contains several other variables. You might want to check to see whether the year the building was constructed is related to its height, for example.)

James Kiss
Numerade Educator
01:59

Problem 90

Poverty rates and high school graduation rates for the 50 states and the District of Columbia are graphed below. (Source: 2017 World Almanac and Book of Facts)
a. What does the trend tell us about the relationship between poverty and high school graduation rates?
b. Interpret the slope of the regression equation.
c. The value of the coefficient of determination for this data set is $61.1 \%$. Explain what this means in the context of the problem.
d. Data on bachelor's degree attainment and advanced degree attainment are available on this text's website. Which level of education (high school, bachelor's degree, or advanced degree) is most closely associated with the state poverty level?

Nick Johnson
Numerade Educator
00:32

Problem 91

Construct a small set of numbers with at least three points with a perfect positive correlation of $1.00$. Show your points in a rough scatterplot and give the coordinates of the points.

Nick Johnson
Numerade Educator
00:31

Problem 92

Construct a small set of numbers with at least three points with a perfect negative correlation of $-1.00$. Show your points in a rough scatterplot and give the coordinates of the points.

Nick Johnson
Numerade Educator
01:08

Problem 93

Construct a set of numbers (with at least three points) with a strong negative correlation. Then add one point (an influential point) that changes the correlation to positive. Report the data and give the correlation of each set. Show your points in a rough scatterplot and give the coordinates of the points.
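
A candidate data set for this kind of construction can be checked by computing $r$ before and after adding the extra point. The points below are made up for illustration (they are not from the text), and the function name `correlation` is likewise an assumption of this sketch.

```python
from math import sqrt


def correlation(points):
    """Return the correlation coefficient r for a list of (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    n = len(points)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical points with a strong downward trend.
base = [(1, 8), (2, 6), (3, 5), (4, 2)]

# One far-away point placed high and to the right acts as an influential point.
with_influential = base + [(20, 20)]

r_before = correlation(base)
r_after = correlation(with_influential)
print(f"r without the extra point: {r_before:.2f}")
print(f"r with the extra point:    {r_after:.2f}")
```

The same check works for the reverse construction by starting from an upward trend and placing the extra point far to the lower right.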

Nick Johnson
Numerade Educator
00:36

Problem 94

Construct a set of numbers (with at least three points) with a strong positive correlation. Then add one point (an influential point) that changes the correlation to negative. Report the data and give the correlation of each set. Show your points in a rough scatterplot and give the coordinates of the points.

Nick Johnson
Numerade Educator
00:44

Problem 95

The following figure shows a scatterplot of the educational level of twins. Describe the scatterplot. Explain the trend and mention any unusual points. (Source: www.stat.ucla.edu)

Nick Johnson
Numerade Educator
00:14

Problem 96

The figure shows a scatterplot of the wages and educational level of some people. Describe what you see. Explain the trend and mention any unusual points. (Source: www.stat.ucla.edu)

Nick Johnson
Numerade Educator
00:24

Problem 97

Do Students Taking More Units Study More Hours? The following figure shows the number of units that students were enrolled in and the number of hours (per week) that they reported studying. Do you think there is a positive trend, a negative trend, or no noticeable trend? Explain what this means about the students.

Nick Johnson
Numerade Educator
00:13

Problem 98

Hours of Exercise and Hours of Homework: The following scatterplot shows the number of hours of exercise per week and the number of hours of homework per week for some students. Explain what it shows.

Nick Johnson
Numerade Educator
00:24

Problem 99

The following figure shows information about the ages and heights of several children. Why would it not make sense to find the correlation or to perform linear regression with this data set? Explain.

Nick Johnson
Numerade Educator
00:21

Problem 100

The following figure shows the amount of money won by people playing blackjack and the amount of tips they gave to the dealer (who was a statistics student), in dollars. Would it make sense to find a correlation for this data set? Explain.

Nick Johnson
Numerade Educator
00:23

Problem 101

A doctor is studying cholesterol readings in his patients. After reviewing the cholesterol readings, he calls the patients with the highest cholesterol readings (the top $5 \%$ of readings in his office) and asks them to come back to discuss cholesterol-lowering methods. When he tests these patients a second time, the average cholesterol readings tend to have gone down somewhat. Explain what statistical phenomenon might have been partly responsible for this lowering of the readings.
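
The phenomenon behind this problem can be explored with a small simulation. The sketch below is only an illustration under assumed numbers: each patient is given a stable "true" cholesterol level (mean 200, standard deviation 30) plus independent measurement noise (standard deviation 15) at each visit, and no treatment of any kind is modeled. None of these values come from the textbook.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

n_patients = 10_000
# Hypothetical model: stable true level plus independent noise per visit.
true_levels = [random.gauss(200, 30) for _ in range(n_patients)]
first = [t + random.gauss(0, 15) for t in true_levels]
second = [t + random.gauss(0, 15) for t in true_levels]

# Select the patients whose FIRST reading falls in the top 5%.
cutoff = sorted(first)[int(0.95 * n_patients)]
selected = [i for i in range(n_patients) if first[i] >= cutoff]

mean_first = statistics.mean(first[i] for i in selected)
mean_second = statistics.mean(second[i] for i in selected)

print(f"patients selected: {len(selected)}")
print(f"mean first reading of selected group:  {mean_first:.1f}")
print(f"mean second reading of selected group: {mean_second:.1f}")
```

In this model the selected group's second average comes out lower than its first even though nothing was done to the patients, because an extreme first reading is partly the result of chance noise that does not repeat on the second visit. The group still stays above the overall average, since its true levels really are higher than typical.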

Nick Johnson
Numerade Educator
00:22

Problem 102

Suppose that students who scored much lower than the mean on their first statistics test were given special tutoring in the subject. Suppose that they tended to show some improvement on the next test. Explain what might cause the rise in grades other than the tutoring program itself.

Nick Johnson
Numerade Educator