Essentials of Statistics

Mario F. Triola

Chapter 11

Chi-Square and Analysis of Variance

Section 1

Goodness-of-Fit

Problem 1

The table below lists leading digits of 317 inter-arrival Internet traffic times for a computer, along with the frequencies of leading digits expected with Benford's law (from Table 11-1 in the Chapter Problem).
a. Identify the notation used for observed and expected values.
b. Identify the observed and expected values for the leading digit of 2.
c. Use the results from part (b) to find the contribution to the $\chi^{2}$ test statistic from the category representing the leading digit of 2.
$$
\begin{array}{l|c|c|c|c|c|c|c|c|c|}
\hline \text { Leading Digit } & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\
\hline \text { Benford's Law } & 30.1 \% & 17.6 \% & 12.5 \% & 9.7 \% & 7.9 \% & 6.7 \% & 5.8 \% & 5.1 \% & 4.6 \% \\
\hline \begin{array}{l}
\text { Leading Digits of } \\
\text { Inter-Arrival Traffic Times }
\end{array} & 76 & 62 & 29 & 33 & 19 & 27 & 28 & 21 & 22 \\
\hline
\end{array}
$$
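
Part (c) asks for a single term of the goodness-of-fit sum, $(O-E)^{2} / E$. The following is a minimal Python sketch of that arithmetic, assuming the expected count for a digit is its Benford proportion times the 317 observations. The variable and function names are illustrative and do not come from the text.

```python
# Benford proportions and observed counts copied from the table above.
benford = [0.301, 0.176, 0.125, 0.097, 0.079, 0.067, 0.058, 0.051, 0.046]
observed = [76, 62, 29, 33, 19, 27, 28, 21, 22]

n = sum(observed)  # total number of traffic times


def contribution(o, e):
    """One category's term of the chi-square statistic: (O - E)^2 / E."""
    return (o - e) ** 2 / e


# Leading digit 2 is stored at index 1 (digit 1 is at index 0).
o_2 = observed[1]
e_2 = n * benford[1]  # assumption: E = n * p for each category
term_2 = contribution(o_2, e_2)

print(f"O = {o_2}, E = {e_2:.3f}, contribution = {term_2:.4f}")
```

Summing `contribution(o, n * p)` over all nine digits would give the full test statistic.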

Sheryl Ezze

Numerade Educator

Problem 2

When using the data from Exercise 1 to test for goodness-of-fit with the distribution described by Benford's law, identify the null and alternative hypotheses.

Evelyn Cunningham

Numerade Educator

Problem 3

The accompanying Statdisk results shown in the margin are obtained from the data given in Exercise 1. What should be concluded when testing the claim that the leading digits have a distribution that fits well with Benford's law?

Evelyn Cunningham

Numerade Educator

Problem 4

What do the results from the preceding exercises suggest about the possibility that the computer has been hacked? Is there any corrective action that should be taken?

Evelyn Cunningham

Numerade Educator

Problem 5

The author purchased a slot machine (Bally Model 809) and tested it by playing it 1197 times. There are 10 different categories of outcomes, including no win, win jackpot, win with three bells, and so on. When testing the claim that the observed outcomes agree with the expected frequencies, the author obtained a test statistic of $\chi^{2}=8.185$. Use a $0.05$ significance level to test the claim that the actual outcomes agree with the expected frequencies. Does the slot machine appear to be functioning as expected?
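
Because the test statistic is already given, the remaining work is a comparison against a right-tail critical value. Here is a small sketch of that decision rule, assuming $k-1$ degrees of freedom for $k$ categories and critical values copied from a standard chi-square table. The dictionary and variable names are illustrative.

```python
# Right-tail chi-square critical values at alpha = 0.05 from a standard
# table. Only a few degrees of freedom are listed here.
CRITICAL_05 = {3: 7.815, 4: 9.488, 5: 11.070, 9: 16.919}

categories = 10          # outcome categories stated in the exercise
test_statistic = 8.185   # chi-square value stated in the exercise

df = categories - 1      # assumption: no parameters estimated from the data
critical = CRITICAL_05[df]

# A goodness-of-fit test is right-tailed: reject only for large statistics.
reject_null = test_statistic > critical

print(f"df = {df}, critical value = {critical}, reject H0: {reject_null}")
```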

Sheryl Ezze

Numerade Educator

Problem 6

A classic story involves four carpooling students who missed a test and gave as an excuse a flat tire. On the makeup test, the instructor asked the students to identify the particular tire that went flat. If they really didn't have a flat tire, would they be able to identify the same tire? The author asked 41 other students to identify the tire they would select. The results are listed in the following table (except for one student who selected the spare). Use a $0.05$ significance level to test the author's claim that the results fit a uniform distribution. What does the result suggest about the likelihood of four students identifying the same tire when they really didn't have a flat?
$$
\begin{array}{l|c|c|c|c}
\hline \text { Tire } & \text { Left Front } & \text { Right Front } & \text { Left Rear } & \text { Right Rear } \\
\hline \text { Number Selected } & 11 & 15 & 8 & 6 \\
\hline
\end{array}
$$
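
For a claim of a uniform distribution, every category shares the same expected count, $E=n / k$. A minimal sketch of the test on the tire data, assuming the student who chose the spare is excluded as the exercise indicates. The function name is illustrative, and the critical value comes from a standard table.

```python
def chi_square_uniform(observed):
    """Chi-square statistic when all categories are equally likely."""
    n = sum(observed)
    expected = n / len(observed)  # same expected count for every category
    return sum((o - expected) ** 2 / expected for o in observed)


# Left front, right front, left rear, right rear (from the table above).
tires = [11, 15, 8, 6]

stat = chi_square_uniform(tires)
df = len(tires) - 1
critical = 7.815  # table value for df = 3, alpha = 0.05

print(f"chi-square = {stat:.3f}, df = {df}, reject H0: {stat > critical}")
```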

Sheryl Ezze

Numerade Educator

Problem 7

The author drilled a hole in a die and filled it with a lead weight, then proceeded to roll it 200 times. Here are the observed frequencies for the outcomes of $1,2,3,4,5$, and 6, respectively: $27,31,42,40,28$, and 32. Use a $0.05$ significance level to test the claim that the outcomes are not equally likely. Does it appear that the loaded die behaves differently than a fair die?

Sheryl Ezze

Numerade Educator

Problem 8

Researchers investigated the issue of race and equality of access to clinical trials. The following table shows the population distribution and the numbers of participants in clinical trials involving lung cancer (based on data from "Participation in Cancer Clinical Trials," by Murthy, Krumholz, and Gross, Journal of the American Medical Association, Vol. 291, No. 22). Use a $0.01$ significance level to test the claim that the distribution of clinical trial participants fits well with the population distribution. Is there a race/ethnic group that appears to be very underrepresented?
$$
\begin{array}{l|c|c|c|c|c}
\hline \text { Race/ethnicity } & \begin{array}{c}
\text { White } \\
\text { non-Hispanic }
\end{array} & \text { Hispanic } & \text { Black } & \begin{array}{c}
\text { Asian / Pacific } \\
\text { Islander }
\end{array} & \begin{array}{c}
\text { American Indian / } \\
\text { Alaskan Native }
\end{array} \\
\hline \begin{array}{l}
\text { Distribution of } \\
\text { Population }
\end{array} & 75.6 \% & 9.1 \% & 10.8 \% & 3.8 \% & 0.7 \% \\
\hline \begin{array}{l}
\text { Number in Lung } \\
\text { Cancer Clinical Trials }
\end{array} & 3855 & 60 & 316 & 54 & 12 \\
\hline
\end{array}
$$

Sheryl Ezze

Numerade Educator

Problem 9

Experiments are conducted with hybrids of two types of peas. If the offspring follow Mendel's theory of inheritance, the seeds that are produced are yellow smooth, green smooth, yellow wrinkled, and green wrinkled, and they should occur in the ratio of $9: 3: 3: 1$, respectively. An experiment is designed to test Mendel's theory, with the result that the offspring seeds consist of 307 that are yellow smooth, 77 that are green smooth, 98 that are yellow wrinkled, and 18 that are green wrinkled. Use a $0.05$ significance level to test the claim that the results contradict Mendel's theory.
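
When the claimed distribution is stated as a ratio, each expected count is the sample size times that category's share of the ratio total. The sketch below applies this to the pea counts. The names are illustrative, and the critical value is the standard table entry for 3 degrees of freedom.

```python
# Observed seed counts in the order given in the exercise:
# yellow smooth, green smooth, yellow wrinkled, green wrinkled.
observed = [307, 77, 98, 18]
ratio = [9, 3, 3, 1]  # Mendel's claimed ratio

n = sum(observed)
ratio_total = sum(ratio)

# Expected count = n * (ratio part / ratio total).
expected = [n * r / ratio_total for r in ratio]

stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
critical = 7.815  # table value for df = 3, alpha = 0.05

print("expected:", expected)
print(f"chi-square = {stat:.3f}, reject H0 (fit to 9:3:3:1): {stat > critical}")
```

Note that the null hypothesis here is that the counts fit the 9:3:3:1 ratio, so rejecting it is what would support the claim that the results contradict Mendel's theory.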

Sheryl Ezze

Numerade Educator

Problem 10

In analyzing hits by V-1 buzz bombs in World War II, South London was subdivided into regions, each with an area of $0.25 \mathrm{~km}^{2}$. Shown below is a table of actual frequencies of hits and the frequencies expected with the Poisson distribution. (The Poisson distribution is described in Section 5-3.) Use the values listed and a $0.05$ significance level to test the claim that the actual frequencies fit a Poisson distribution. Does the result prove that the data conform to the Poisson distribution?
$$
\begin{array}{l|c|c|c|c|c}
\hline \text { Number of Bomb Hits } & 0 & 1 & 2 & 3 & 4 \text { or more } \\
\hline \text { Actual Number of Regions } & 229 & 211 & 93 & 35 & 8 \\
\hline \begin{array}{l}
\text { Expected Number of Regions } \\
\text { (from Poisson Distribution) }
\end{array} & 227.5 & 211.4 & 97.9 & 30.5 & 8.7 \\
\hline
\end{array}
$$
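
Here the expected frequencies are supplied directly, so the statistic is a plain sum over the five columns. A minimal sketch follows, assuming $k-1=4$ degrees of freedom because the exercise says to use the values as listed. If the Poisson mean were treated as estimated from the data, one fewer degree of freedom would be the usual adjustment. The names are illustrative.

```python
# Columns: 0, 1, 2, 3, and "4 or more" hits (from the table above).
actual = [229, 211, 93, 35, 8]
expected = [227.5, 211.4, 97.9, 30.5, 8.7]

# Per-category contributions (O - E)^2 / E, kept for inspection.
terms = [(o - e) ** 2 / e for o, e in zip(actual, expected)]
stat = sum(terms)

df = len(actual) - 1  # assumption: expected values taken as given
critical = 9.488      # table value for df = 4, alpha = 0.05

print(f"chi-square = {stat:.3f}, df = {df}, reject H0: {stat > critical}")
```

A statistic below the critical value means only that the data do not provide evidence against the Poisson model, which is weaker than a proof of conformity.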

Sheryl Ezze

Numerade Educator

Problem 11

The police department in Madison, Connecticut, released the following numbers of calls for the different days of the week during a February that had 28 days: Monday (114); Tuesday (152); Wednesday (160); Thursday (164); Friday (179); Saturday (196); Sunday (130). Use a $0.01$ significance level to test the claim that the different days of the week have the same frequencies of police calls. Is there anything notable about the observed frequencies?

Sheryl Ezze

Numerade Educator

Problem 12

Repeat Exercise 11 using these observed frequencies for police calls received during the month of March: Monday (208); Tuesday (224); Wednesday (246); Thursday (173); Friday (210); Saturday (236); Sunday (154). What is a fundamental error with this analysis?

Sheryl Ezze

Numerade Educator

Problem 13

The table below lists the frequency of wins for different post positions through the 141st running of the Kentucky Derby horse race. A post position of 1 is closest to the inside rail, so the horse in that position has the shortest distance to run. (Because the number of horses varies from year to year, only the first 10 post positions are included.) Use a $0.05$ significance level to test the claim that the likelihood of winning is the same for the different post positions. Based on the result, should bettors consider the post position of a horse racing in the Kentucky Derby?
$$
\begin{array}{|l|r|r|r|r|r|r|r|r|r|r|}
\hline \text { Post Position } & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \\
\hline \text { Wins } & 19 & 14 & 11 & 15 & 15 & 7 & 8 & 12 & 5 & 11 \\
\hline
\end{array}
$$

Sheryl Ezze

Numerade Educator

Problem 14

The author recorded all digits selected in California's Daily 4 Lottery for the 60 days preceding the time that this exercise was created. The frequencies of the digits from 0 through 9 are $21,30,31,33,19,23,21,16,24$, and 22. Use a $0.05$ significance level to test the claim of lottery officials that the digits are selected in a way that they are equally likely.

Sheryl Ezze

Numerade Educator

Problem 15

The table below lists the numbers of games played in 105 Major League Baseball (MLB) World Series. This table also includes the expected proportions for the numbers of games in a World Series, assuming that in each series, both teams have about the same chance of winning. Use a $0.05$ significance level to test the claim that the actual numbers of games fit the distribution indicated by the expected proportions.
$$
\begin{array}{|l|c|c|c|c|}
\hline \text { Games Played } & 4 & 5 & 6 & 7 \\
\hline \text { World Series Contests } & 21 & 23 & 23 & 38 \\
\hline \text { Expected Proportion } & 2 / 16 & 4 / 16 & 5 / 16 & 5 / 16 \\
\hline
\end{array}
$$
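
With unequal expected proportions, each expected count is $E=n p$ for that category. The sketch below keeps the proportions as exact fractions so the expected counts carry no rounding error. The names are illustrative, and the critical value is the standard table entry for 3 degrees of freedom.

```python
from fractions import Fraction

# Games played: 4, 5, 6, 7 (from the table above).
contests = [21, 23, 23, 38]
proportions = [Fraction(2, 16), Fraction(4, 16), Fraction(5, 16), Fraction(5, 16)]

n = sum(contests)

# Expected count for each category: E = n * p, kept exact.
expected = [n * p for p in proportions]

stat = float(sum((o - e) ** 2 / e for o, e in zip(contests, expected)))
critical = 7.815  # table value for df = 3, alpha = 0.05

print("expected:", [float(e) for e in expected])
print(f"chi-square = {stat:.3f}, reject H0: {stat > critical}")
```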

Sheryl Ezze

Numerade Educator

Problem 16

In his book Outliers, author Malcolm Gladwell argues that more baseball players have birth dates in the months immediately following July 31 because that was the age cutoff date for nonschool baseball leagues. Here is a sample of frequency counts of months of birth dates of American-born Major League Baseball players starting with January: $387,329,366,344,336,313,313,503,421,434,398,371$. Using a $0.05$ significance level, is there sufficient evidence to warrant rejection of the claim that American-born Major League Baseball players are born in different months with the same frequency? Do the sample values appear to support Gladwell's claim?

Sheryl Ezze

Numerade Educator

Problem 17

Data Set 4 "Births" includes the days of the week that prospective mothers were admitted to a hospital to give birth. A physician claims that because many births are induced or involve cesarean section, they are scheduled for days other than Saturday or Sunday, so births do not occur on the seven different days of the week with equal frequency. Use a $0.01$ significance level to test that claim.

Sheryl Ezze

Numerade Educator

Problem 18

Data Set 4 "Births" includes the days of the week that newborn babies were discharged from the hospital. A hospital administrator claims that such discharges occur on the seven different days of the week with equal frequency. Use a $0.01$ significance level to test that claim.

Sheryl Ezze

Numerade Educator

Problem 19

Mars, Inc. claims that its M\&M plain candies are distributed with the following color percentages: $16 \%$ green, $20 \%$ orange, $14 \%$ yellow, $24 \%$ blue, $13 \%$ red, and $13 \%$ brown. Refer to Data Set 27 "M\&M Weights" in Appendix B and use the sample data to test the claim that the color distribution is as claimed by Mars, Inc. Use a $0.05$ significance level.

Victor Salazar

Numerade Educator

Problem 20

Data Set 1 "Body Data" in Appendix B includes weights (kg) of 300 subjects. Use a $0.05$ significance level to test the claim that the sample is from a population of weights in which the last digits do not occur with the same frequency. When people report their weights instead of being measured, they tend to round so that the last digits do not occur with the same frequency. Do the results suggest that the weights were reported?

Victor Salazar

Numerade Educator

Problem 21

When working for the Brooklyn district attorney, investigator Robert Burton analyzed the leading digits of the amounts from 784 checks issued by seven suspect companies. The frequencies were found to be $0,15,0,76,479,183,8,23$, and 0, and those digits correspond to the leading digits of $1,2,3,4,5,6,7,8$, and 9, respectively. If the observed frequencies are substantially different from the frequencies expected with Benford's law, the check amounts appear to result from fraud. Use a $0.01$ significance level to test for goodness-of-fit with Benford's law. Does it appear that the checks are the result of fraud?
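
A full Benford goodness-of-fit test sums the per-digit terms over all nine leading digits. The sketch below is one way to set it up, assuming the Benford percentages from the table in Problem 1 and a critical value from a standard table. The function name is illustrative. Observed counts of zero are valid inputs, since only the expected count appears in the denominator.

```python
# Benford percentages for leading digits 1 through 9 (table in Problem 1).
BENFORD = [0.301, 0.176, 0.125, 0.097, 0.079, 0.067, 0.058, 0.051, 0.046]


def benford_chi_square(observed):
    """Chi-square goodness-of-fit statistic against Benford's law."""
    n = sum(observed)
    expected = [n * p for p in BENFORD]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Leading-digit counts for the 784 checks, digits 1 through 9.
checks = [0, 15, 0, 76, 479, 183, 8, 23, 0]

stat = benford_chi_square(checks)
df = len(checks) - 1
critical = 20.090  # table value for df = 8, alpha = 0.01

print(f"chi-square = {stat:.1f}, df = {df}, reject H0: {stat > critical}")
```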

Sheryl Ezze

Numerade Educator

Problem 22

Exercise 21 lists the observed frequencies of leading digits from amounts on checks from seven suspect companies. Here are the observed frequencies of the leading digits from the amounts on the most recent checks written by the author at the time this exercise was created: $83,58,27,21,21,21,6,4,9$. (Those observed frequencies correspond to the leading digits of $1,2,3,4,5,6,7,8$, and 9, respectively.) Using a $0.01$ significance level, test the claim that these leading digits are from a population of leading digits that conform to Benford's law. Does the conclusion change if the significance level is $0.05$?

Sheryl Ezze

Numerade Educator

Problem 23

Frequencies of leading digits from IRS tax files are $152,89,63,48,39,40,28,25$, and 27 (corresponding to the leading digits of $1,2,3,4,5,6,7,8$, and 9, respectively, based on data from Mark Nigrini, who provides software for Benford data analysis). Using a $0.05$ significance level, test for goodness-of-fit with Benford's law. Does it appear that the tax entries are legitimate?

Sheryl Ezze

Numerade Educator

Problem 24

The author recorded the leading digits of the sizes of the electronic document files for the current edition of this book. The leading digits have frequencies of $55,25,17,24,18,12,12,3$, and 4 (corresponding to the leading digits of $1,2,3,4,5,6,7,8$, and 9, respectively). Using a $0.05$ significance level, test for goodness-of-fit with Benford's law.

Sheryl Ezze

Numerade Educator

Problem 25

Refer to Data Set 1 "Body Data" in Appendix B for the heights of females.
$$
\begin{array}{l|l|l|l|l|}
\hline \text { Height }(\mathrm{cm}) & \text { Less than } 155.45 & 155.45-162.05 & 162.05-168.65 & \text { Greater than } 168.65 \\
\hline \text { Frequency } & & & & \\
\hline
\end{array}
$$
a. Enter the observed frequencies in the table above.
b. Assuming a normal distribution with mean and standard deviation given by the sample mean and standard deviation, use the methods of Chapter 6 to find the probability of a randomly selected height belonging to each class.
c. Using the probabilities found in part (b), find the expected frequency for each category.
d. Use a $0.01$ significance level to test the claim that the heights were randomly selected from a normally distributed population. Does the goodness-of-fit test suggest that the data are from a normally distributed population?
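
Parts (b) and (c) amount to differencing normal cumulative probabilities at the class boundaries and scaling by the sample size. The sketch below shows that step with Python's `statistics.NormalDist`. The mean, standard deviation, and sample size are placeholders chosen for illustration, not values from Data Set 1, and should be replaced with the actual sample statistics.

```python
from statistics import NormalDist

# Class boundaries (cm) from the table above.
boundaries = [155.45, 162.05, 168.65]


def class_probabilities(mean, sd, cuts):
    """Probability of each class: below the first cut, between cuts, above the last."""
    dist = NormalDist(mu=mean, sigma=sd)
    cdf_values = [0.0] + [dist.cdf(c) for c in cuts] + [1.0]
    return [hi - lo for lo, hi in zip(cdf_values, cdf_values[1:])]


def expected_frequencies(probs, n):
    """Expected count for each class: E = n * p."""
    return [n * p for p in probs]


# PLACEHOLDERS ONLY: hypothetical values, NOT taken from Data Set 1.
placeholder_mean = 162.05
placeholder_sd = 6.6
placeholder_n = 100

probs = class_probabilities(placeholder_mean, placeholder_sd, boundaries)
expected = expected_frequencies(probs, placeholder_n)

for p, e in zip(probs, expected):
    print(f"p = {p:.4f}, E = {e:.2f}")
```

With these placeholder values the boundaries fall at the mean and one standard deviation on either side, so the class probabilities are about 0.159, 0.341, 0.341, and 0.159, which makes the output easy to check by hand.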

Victor Salazar

Numerade Educator