The data set HAPPINESS contains independently pooled cross sections for the even years from 1994 through 2006, obtained from the General Social Survey. The dependent variable for this problem is a measure of "happiness," vhappy, which is a binary variable equal to one if the person reports being "very happy" (as opposed to just "pretty happy" or "not too happy").

(i) Which year has the largest number of observations? Which has the smallest? What is the percentage of people in the sample reporting they are "very happy"?

(ii) Regress vhappy on all of the year dummies, leaving out y94 so that 1994 is the base year. Compute a heteroskedasticity-robust statistic for the null hypothesis that the proportion of very happy people has not changed over time. What is the $p$-value of the test?

(iii) To the regression in part (ii), add the dummy variables occattend and regattend. Interpret their coefficients. (Remember, the coefficients are interpreted relative to a base group.) How would you summarize the effects of church attendance on happiness?

(iv) Define a variable, say highinc, equal to one if family income is above $\$25,000$. (Unfortunately, the same threshold is used in each year, and so inflation is not accounted for. Also, $\$25,000$ is hardly what one would consider "high income.") Include highinc, unem10, educ, and teens in the regression in part (iii). Is the coefficient on regattend affected much? What about its statistical significance?

(v) Discuss the signs, magnitudes, and statistical significance of the four new variables in part (iv). Do the estimates make sense?

(vi) Controlling for the factors in part (iv), do there appear to be differences in happiness by gender or race? Justify your answer.
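
The robust joint test asked for in part (ii) can be sketched in Python as follows. This is only an illustration under stated assumptions. It uses synthetic data in place of the HAPPINESS file. It uses the simplest heteroskedasticity-robust covariance matrix (often labeled HC0) and the chi-square form of the Wald statistic, so a regression package that applies a degrees-of-freedom correction or reports an F version will give slightly different numbers. The helper names `robust_wald` and `chi2_sf_even_df` are hypothetical, not part of any library.

```python
import math
import numpy as np

def robust_wald(y, X, test_cols):
    """OLS estimates plus an HC0-robust Wald statistic for H0: beta[test_cols] = 0."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # "Meat" of the sandwich: sum over i of resid_i^2 * x_i x_i'
    meat = (X * resid[:, None] ** 2).T @ X
    V = XtX_inv @ meat @ XtX_inv
    b = beta[test_cols]
    Vb = V[np.ix_(test_cols, test_cols)]
    W = float(b @ np.linalg.solve(Vb, b))
    return beta, W

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid only for even df."""
    k = df // 2
    return math.exp(-x / 2) * sum((x / 2) ** i / math.factorial(i) for i in range(k))

# Synthetic stand-in for the pooled cross sections (NOT the real data).
rng = np.random.default_rng(0)
n = 5000
years = np.arange(1994, 2008, 2)          # 1994, 1996, ..., 2006
year = rng.choice(years, size=n)
vhappy = (rng.random(n) < 0.30).astype(float)

# Year dummies with 1994 omitted as the base year.
dummies = np.column_stack([(year == yr).astype(float) for yr in years[1:]])
X = np.column_stack([np.ones(n), dummies])

test_cols = list(range(1, X.shape[1]))    # the six year-dummy coefficients
beta, W = robust_wald(vhappy, X, test_cols)
p_value = chi2_sf_even_df(W, len(test_cols))
print("robust Wald statistic:", W, "p-value:", p_value)
```

Because the regression contains only an intercept and a full set of year dummies, the intercept equals the 1994 sample proportion of very happy people, and each dummy coefficient is the difference between that year's proportion and the 1994 proportion. The null hypothesis in part (ii) is that all six differences are zero.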