Introductory Econometrics

Jeffrey M. Wooldridge

Chapter 3

Multiple Regression Analysis: Estimation

Chapter Questions

Problem 1

A problem of interest to health officials (and others) is to determine the effects of smoking during pregnancy on infant health. One measure of infant health is birth weight; a birth weight that is too low can put an infant at risk for contracting various illnesses. Since factors other than cigarette smoking that affect birth weight are likely to be correlated with smoking, we should take those factors into account. For example, higher income generally results in access to better prenatal care, as well as better nutrition for the mother. An equation that recognizes this is
$$\text{bwght}=\beta_{0}+\beta_{1} \text{cigs}+\beta_{2} \text{faminc}+u$$
(i) What is the most likely sign for $\beta_{2}$?
(ii) Do you think cigs and faminc are likely to be correlated? Explain why the correlation might be positive or negative.
(iii) Now, estimate the equation with and without faminc, using the data in BWGHT. Report the results in equation form, including the sample size and $R$-squared. Discuss your results, focusing on whether adding faminc substantially changes the estimated effect of cigs on bwght.

Omer Ceyhan
Numerade Educator

Problem 2

Use the data in HPRICE1 to estimate the model
$$\text{price}=\beta_{0}+\beta_{1} \text{sqrft}+\beta_{2} \text{bdrms}+u$$
where price is the house price measured in thousands of dollars.
(i) Write out the results in equation form.
(ii) What is the estimated increase in price for a house with one more bedroom, holding square footage constant?
(iii) What is the estimated increase in price for a house with an additional bedroom that is 140 square feet in size? Compare this to your answer in part (ii).
(iv) What percentage of the variation in price is explained by square footage and number of bedrooms?
(v) The first house in the sample has sqrft $=2{,}438$ and bdrms $=4$. Find the predicted selling price for this house from the OLS regression line.
(vi) The actual selling price of the first house in the sample was $\$300{,}000$ (so price $=300$). Find the residual for this house. Does it suggest that the buyer underpaid or overpaid for the house?

Omer Ceyhan
Numerade Educator

Problem 3

The file CEOSAL2 contains data on 177 chief executive officers and can be used to examine the effects of firm performance on CEO salary.
(i) Estimate a model relating annual salary to firm sales and market value. Make the model of the constant elasticity variety for both independent variables. Write the results out in equation form.
(ii) Add profits to the model from part (i). Why can this variable not be included in logarithmic form? Would you say that these firm performance variables explain most of the variation in CEO salaries?
(iii) Add the variable ceoten to the model in part (ii). What is the estimated percentage return for another year of CEO tenure, holding other factors fixed?
(iv) Find the sample correlation coefficient between the variables log(mktval) and profits. Are these variables highly correlated? What does this say about the OLS estimators?

Omer Ceyhan
Numerade Educator

Problem 4

Use the data in ATTEND for this exercise.
(i) Obtain the minimum, maximum, and average values for the variables atndrte, priGPA, and ACT.
(ii) Estimate the model
$$\text{atndrte}=\beta_{0}+\beta_{1} \text{priGPA}+\beta_{2} \text{ACT}+u$$
and write the results in equation form. Interpret the intercept. Does it have a useful meaning?
(iii) Discuss the estimated slope coefficients. Are there any surprises?
(iv) What is the predicted atndrte if priGPA $=3.65$ and ACT $=20$? What do you make of this result? Are there any students in the sample with these values of the explanatory variables?
(v) If Student A has priGPA $=3.1$ and ACT $=21$ and Student B has priGPA $=2.1$ and ACT $=26$, what is the predicted difference in their attendance rates?

Omer Ceyhan
Numerade Educator

Problem 5

Confirm the partialling out interpretation of the OLS estimates by explicitly doing the partialling out for Example 3.2. This first requires regressing educ on exper and tenure and saving the residuals, $\hat{r}_{1}$. Then, regress log(wage) on $\hat{r}_{1}$. Compare the coefficient on $\hat{r}_{1}$ with the coefficient on educ in the regression of log(wage) on educ, exper, and tenure.
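
The two-step procedure described above can be sketched in Python. This is only an illustration under stated assumptions: the data are synthetic stand-ins generated with arbitrary coefficients (not the data behind Example 3.2), and `ols` is a hypothetical helper written here with the standard library only. In practice a statistics package would do the regressions.

```python
import random


def ols(y, columns):
    """OLS coefficients [intercept, slope_1, ...] from the normal equations."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented system [X'X | X'y].
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(k)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for p in range(k):
        best = max(range(p, k), key=lambda r: abs(A[r][p]))
        A[p], A[best] = A[best], A[p]
        for r in range(k):
            if r != p:
                f = A[r][p] / A[p][p]
                A[r] = [a - f * b for a, b in zip(A[r], A[p])]
    return [A[r][k] / A[r][r] for r in range(k)]


# Synthetic stand-ins for the variables (assumption: NOT the real data set).
random.seed(0)
n = 200
educ = [random.gauss(12, 2) for _ in range(n)]
exper = [random.gauss(10, 4) - 0.5 * (e - 12) for e in educ]
tenure = [abs(random.gauss(5, 3)) for _ in range(n)]
lwage = [0.3 + 0.09 * e + 0.004 * x + 0.02 * t + random.gauss(0, 0.4)
         for e, x, t in zip(educ, exper, tenure)]

# Step 1: regress educ on exper and tenure; keep the residuals r1.
g0, g1, g2 = ols(educ, [exper, tenure])
r1 = [e - (g0 + g1 * x + g2 * t) for e, x, t in zip(educ, exper, tenure)]

# Step 2: simple regression of log(wage) on r1.
b_partial = ols(lwage, [r1])[1]

# Benchmark: coefficient on educ in the full multiple regression.
b_full = ols(lwage, [educ, exper, tenure])[1]

print(f"partialled-out slope:      {b_partial:.6f}")
print(f"multiple-regression slope: {b_full:.6f}")
```

The two printed slopes should agree up to floating-point error, which is the partialling out result the problem asks you to confirm with the actual data.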

Heather Duong
Numerade Educator

Problem 6

Use the data set in WAGE2 for this problem. As usual, be sure all of the following regressions contain an intercept.
(i) Run a simple regression of IQ on educ to obtain the slope coefficient, say, $\tilde{\delta}_{1}$.
(ii) Run the simple regression of log(wage) on educ, and obtain the slope coefficient, $\tilde{\beta}_{1}$.
(iii) Run the multiple regression of log(wage) on educ and IQ, and obtain the slope coefficients, $\hat{\beta}_{1}$ and $\hat{\beta}_{2}$, respectively.
(iv) Verify that $\tilde{\beta}_{1}=\hat{\beta}_{1}+\hat{\beta}_{2} \tilde{\delta}_{1}$.
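
The identity in part (iv) holds in any sample, so it can be checked on made-up numbers before turning to WAGE2. The sketch below assumes synthetic data with arbitrary coefficients (not the WAGE2 file) and a hypothetical standard-library helper `ols`; it is meant only to show the three regressions and the comparison.

```python
import random


def ols(y, columns):
    """OLS coefficients [intercept, slope_1, ...] from the normal equations."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented system [X'X | X'y].
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(k)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for p in range(k):
        best = max(range(p, k), key=lambda r: abs(A[r][p]))
        A[p], A[best] = A[best], A[p]
        for r in range(k):
            if r != p:
                f = A[r][p] / A[p][p]
                A[r] = [a - f * b for a, b in zip(A[r], A[p])]
    return [A[r][k] / A[r][r] for r in range(k)]


# Synthetic stand-ins (assumption: NOT the WAGE2 data).
random.seed(42)
n = 300
educ = [random.gauss(13, 2) for _ in range(n)]
iq = [100 + 3.5 * (e - 13) + random.gauss(0, 12) for e in educ]
lwage = [5.5 + 0.04 * e + 0.006 * q + random.gauss(0, 0.4)
         for e, q in zip(educ, iq)]

delta_tilde = ols(iq, [educ])[1]            # (i)   IQ on educ
beta_tilde = ols(lwage, [educ])[1]          # (ii)  log(wage) on educ
_, beta1_hat, beta2_hat = ols(lwage, [educ, iq])  # (iii) log(wage) on educ, IQ

# (iv) Compare the simple-regression slope with beta1_hat + beta2_hat * delta_tilde.
rhs = beta1_hat + beta2_hat * delta_tilde
print(f"beta_tilde = {beta_tilde:.6f}")
print(f"beta1_hat + beta2_hat * delta_tilde = {rhs:.6f}")
```

The two printed values should match up to rounding error; with the real data the same three regressions give the numbers needed for part (iv).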

Omer Ceyhan
Numerade Educator

Problem 7

Use the data in MEAP93 to answer this question.
(i) Estimate the model
$$\text{math10}=\beta_{0}+\beta_{1} \log(\text{expend})+\beta_{2} \text{lnchprg}+u$$
and report the results in the usual form, including the sample size and $R$-squared. Are the signs of the slope coefficients what you expected? Explain.
(ii) What do you make of the intercept you estimated in part (i)? In particular, does it make sense to set the two explanatory variables to zero? [Hint: Recall that $\log(1)=0$.]
(iii) Now run the simple regression of math10 on log(expend), and compare the slope coefficient with the estimate obtained in part (i). Is the estimated spending effect now larger or smaller than in part (i)?
(iv) Find the correlation between lexpend $=\log(\text{expend})$ and lnchprg. Does its sign make sense to you?
(v) Use part (iv) to explain your findings in part (iii).

Omer Ceyhan
Numerade Educator

Problem 8

Use the data in DISCRIM to answer this question. These are ZIP code-level data on prices for various items at fast-food restaurants, along with characteristics of the ZIP code population, in New Jersey and Pennsylvania. The idea is to see whether fast-food restaurants charge higher prices in areas with a larger concentration of blacks.
(i) Find the average values of prpblck and income in the sample, along with their standard deviations. What are the units of measurement of prpblck and income?
(ii) Consider a model to explain the price of soda, psoda, in terms of the proportion of the population that is black and median income:
$$\text{psoda}=\beta_{0}+\beta_{1} \text{prpblck}+\beta_{2} \text{income}+u$$
Estimate this model by OLS and report the results in equation form, including the sample size and $R$-squared. (Do not use scientific notation when reporting the estimates.) Interpret the coefficient on prpblck. Do you think it is economically large?
(iii) Compare the estimate from part (ii) with the simple regression estimate from psoda on prpblck. Is the discrimination effect larger or smaller when you control for income?
(iv) A model with a constant price elasticity with respect to income may be more appropriate. Report estimates of the model
$$\log(\text{psoda})=\beta_{0}+\beta_{1} \text{prpblck}+\beta_{2} \log(\text{income})+u$$
If prpblck increases by .20 (20 percentage points), what is the estimated percentage change in psoda? (Hint: The answer is 2.xx, where you fill in the "xx.")
(v) Now add the variable prppov to the regression in part (iv). What happens to $\hat{\beta}_{\text{prpblck}}$?
(vi) Find the correlation between log(income) and prppov. Is it roughly what you expected?
(vii) Evaluate the following statement: "Because log(income) and prppov are so highly correlated, they have no business being in the same regression."
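
For part (iv), the usual approximation for a model with a logged dependent variable and a regressor in levels can be sketched as follows. It is an approximation that works best for small changes, and $\hat{\beta}_{1}$ here stands for whatever estimate the regression in part (iv) produces:
$$\% \Delta \widehat{\text{psoda}} \approx 100 \cdot \hat{\beta}_{1} \cdot \Delta \text{prpblck}=100 \cdot \hat{\beta}_{1} \cdot(0.20)=20 \hat{\beta}_{1}$$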

Omer Ceyhan
Numerade Educator

Problem 9

Use the data in CHARITY to answer the following questions:
(i) Estimate the equation
$$\text{gift}=\beta_{0}+\beta_{1} \text{mailsyear}+\beta_{2} \text{giftlast}+\beta_{3} \text{propresp}+u$$
by OLS and report the results in the usual way, including the sample size and $R$-squared. How does the $R$-squared compare with that from the simple regression that omits giftlast and propresp?
(ii) Interpret the coefficient on mailsyear. Is it bigger or smaller than the corresponding simple regression coefficient?
(iii) Interpret the coefficient on propresp. Be careful to notice the units of measurement of propresp.
(iv) Now add the variable avggift to the equation. What happens to the estimated effect of mailsyear?
(v) In the equation from part (iv), what has happened to the coefficient on giftlast? What do you think is happening?

Heather Duong
Numerade Educator

Problem 10

Use the data in HTV to answer this question. The data set includes information on wages, education, parents' education, and several other variables for 1,230 working men in 1991.
(i) What is the range of the educ variable in the sample? What percentage of men completed twelfth grade but no higher grade? Do the men or their parents have, on average, higher levels of education?
(ii) Estimate the regression model
$$\text{educ}=\beta_{0}+\beta_{1} \text{motheduc}+\beta_{2} \text{fatheduc}+u$$
by OLS and report the results in the usual form. How much sample variation in educ is explained by parents' education? Interpret the coefficient on motheduc.
(iii) Add the variable abil (a measure of cognitive ability) to the regression from part (ii), and report the results in equation form. Does "ability" help to explain variations in education, even after controlling for parents' education? Explain.
(iv) (Requires calculus) Now estimate an equation where abil appears in quadratic form:
$$\text{educ}=\beta_{0}+\beta_{1} \text{motheduc}+\beta_{2} \text{fatheduc}+\beta_{3} \text{abil}+\beta_{4} \text{abil}^{2}+u$$
Using the estimates $\hat{\beta}_{3}$ and $\hat{\beta}_{4}$, use calculus to find the value of abil, call it $\text{abil}^{*}$, where educ is minimized. (The other coefficients and values of parents' education variables have no effect; we are holding parents' education fixed.) Notice that abil is measured so that negative values are permissible. You might also verify that the second derivative is positive so that you do indeed have a minimum.
(v) Argue that only a small fraction of men in the sample have "ability" less than the value calculated in part (iv). Why is this important?
(vi) If you have access to a statistical program that includes graphing capabilities, use the estimates in part (iv) to graph the relationship between the predicted education and abil. Set motheduc and fatheduc at their average values in the sample, 12.18 and 12.45, respectively.
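
The calculus step in part (iv) can be sketched as follows, holding motheduc and fatheduc fixed and writing $\text{abil}^{*}$ for the turning point. This is the standard argument for a quadratic, stated in terms of the estimates rather than any particular numbers:
$$\frac{\partial\, \widehat{\text{educ}}}{\partial\, \text{abil}}=\hat{\beta}_{3}+2 \hat{\beta}_{4}\, \text{abil}=0 \quad \Longrightarrow \quad \text{abil}^{*}=-\frac{\hat{\beta}_{3}}{2 \hat{\beta}_{4}}, \qquad \frac{\partial^{2}\, \widehat{\text{educ}}}{\partial\, \text{abil}^{2}}=2 \hat{\beta}_{4}$$
The turning point is a minimum only if $\hat{\beta}_{4}>0$, which is the second-derivative check mentioned in the problem.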

Heather Duong
Numerade Educator

Problem 11

Use the data in MEAPSINGLE to study the effects of single-parent households on student math performance. These data are for a subset of schools in southeast Michigan for the year 2000. The socioeconomic variables are obtained at the ZIP code level (where ZIP code is assigned to schools based on their mailing addresses).
(i) Run the simple regression of math4 on pctsgle and report the results in the usual format. Interpret the slope coefficient. Does the effect of single parenthood seem large or small?
(ii) Add the variables lmedinc and free to the equation. What happens to the coefficient on pctsgle? Explain what is happening.
(iii) Find the sample correlation between lmedinc and free. Does it have the sign you expect?
(iv) Does the substantial correlation between lmedinc and free mean that you should drop one from the regression to better estimate the causal effect of single parenthood on student performance? Explain.
(v) Find the variance inflation factors (VIFs) for each of the explanatory variables appearing in the regression in part (ii). Which variable has the largest VIF? Does this knowledge affect the model you would use to study the causal effect of single parenthood on math performance?
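
For part (v), the VIF of regressor $j$ is $1/(1-R_{j}^{2})$, where $R_{j}^{2}$ comes from regressing that regressor on the other explanatory variables (with an intercept). A minimal sketch of the computation follows, assuming synthetic stand-in data with arbitrary coefficients (not the MEAPSINGLE file) and hypothetical standard-library helpers `ols`, `r_squared`, and `vif`:

```python
import random


def ols(y, columns):
    """OLS coefficients [intercept, slope_1, ...] from the normal equations."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented system [X'X | X'y].
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(k)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for p in range(k):
        best = max(range(p, k), key=lambda r: abs(A[r][p]))
        A[p], A[best] = A[best], A[p]
        for r in range(k):
            if r != p:
                f = A[r][p] / A[p][p]
                A[r] = [a - f * b for a, b in zip(A[r], A[p])]
    return [A[r][k] / A[r][r] for r in range(k)]


def r_squared(y, columns):
    """R-squared from regressing y on the given columns plus an intercept."""
    b = ols(y, columns)
    fitted = [b[0] + sum(bj * col[i] for bj, col in zip(b[1:], columns))
              for i in range(len(y))]
    ybar = sum(y) / len(y)
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1 - ssr / sst


def vif(columns, j):
    """Variance inflation factor for regressor j: 1 / (1 - R_j^2)."""
    others = [c for i, c in enumerate(columns) if i != j]
    return 1 / (1 - r_squared(columns[j], others))


# Synthetic stand-ins (assumption: NOT the MEAPSINGLE data).
random.seed(1)
n = 200
lmedinc = [random.gauss(10.8, 0.4) for _ in range(n)]
free = [40 - 60 * (m - 10.8) + random.gauss(0, 10) for m in lmedinc]
pctsgle = [20 + 0.3 * (f - 40) + random.gauss(0, 5) for f in free]

names = ["pctsgle", "lmedinc", "free"]
regressors = [pctsgle, lmedinc, free]
vifs = [vif(regressors, j) for j in range(len(regressors))]
for name, v in zip(names, vifs):
    print(f"VIF({name}) = {v:.2f}")
```

With the real data, the same calculation is applied to the three regressors from part (ii); most statistical packages also report VIFs directly.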

Heather Duong
Numerade Educator

Problem 12

The data in ECONMATH contain grade point averages and standardized test scores, along with performance in an introductory economics course, for students at a large public university. The variable to be explained is score, the final score in the course measured as a percentage.
(i) How many students received a perfect score for the course? What was the average score? Find the means and standard deviations of actmth and acteng, and discuss how they compare.
(ii) Estimate a linear equation relating score to colgpa, actmth, and acteng, where colgpa is measured at the beginning of the term. Report the results in the usual form.
(iii) Would you say the math or English ACT score is a better predictor of performance in the economics course? Explain.
(iv) Discuss the size of the $R$-squared in the regression.

Priyam Verma
Numerade Educator