Use the data in CATHOLIC to answer this question. The model of interest is
$$\text{math12}=\beta_{0}+\beta_{1}\,\text{cathhs}+\beta_{2}\,\text{faminc}+\beta_{3}\,\text{motheduc}+\beta_{4}\,\text{fatheduc}+u$$
where cathhs is a binary indicator for whether a student attends a Catholic high school.

(i) How many students are in the sample? What percentage of these students attend a Catholic high school?

(ii) Estimate the above equation by OLS. What is the estimate of $\beta_{1}$? What is its 95% confidence interval?

(iii) Using parcath as an instrument for cathhs, estimate the reduced form for cathhs. What is the $t$ statistic for parcath? Is there evidence of a weak instrument problem?

(iv) Estimate the above equation by IV, using parcath as an IV for cathhs. How do the estimate and 95% CI compare with the OLS quantities?

(v) Test the null hypothesis that cathhs is exogenous. What is the $p$-value of the test?

(vi) Suppose you add the interaction cathhs $\cdot$ motheduc to the above model. Why is it generally endogenous? Why is pareduc $\cdot$ motheduc a good IV candidate for cathhs $\cdot$ motheduc?

(vii) Before you create the interactions in part (vi), first find the sample average of motheduc and create cathhs $\cdot($motheduc $-\overline{\text{motheduc}})$ and parcath $\cdot($motheduc $-\overline{\text{motheduc}})$. Add the first interaction to the model and use the second as an IV. Of course, cathhs is also instrumented. Is the interaction term statistically significant?

(viii) Compare the coefficient on cathhs in (vii) to that in part (iv). Is including the interaction important for estimating the average partial effect?

(i) 7,430 observations; 6.1% attend a Catholic high school (ii) see video; 95% CI = (0.658, 2.297) (iii) t = 25.7; no evidence of a weak instrument (iv) see video (v) p-value of about 0.06, so the null that cathhs is exogenous is rejected at the 10% level but not at the 5% level (vi) see video (vii) the interaction term is highly significant

Chapter 15

Instrumental Variables Estimation and Two Stage Least Squares

Part (i). The size of the sample is 7,430 students. The percentage of students attending a Catholic high school is 6.1 percent. In R you can get this number from the summary function applied to the data set, catholic, followed by a dollar sign and the variable of interest, cathhs; the figure is the mean of the variable. To see exactly how many students attend a Catholic high school, you can either multiply 6.1% by the total number of students or use the table generated by the function table. You will find that only 452 students go to Catholic high schools.

Part (ii). This is the regression result using OLS. The estimate of beta one, the coefficient on Catholic high school attendance, is 1.48 with a standard error of 0.42, so this estimate is highly significant. To find the 95% confidence interval, you can either use the formula, which means you have to look up the critical value, or in R you can use the function confint and put the name of the regression in it. This function returns the 95% confidence interval for every variable in the regression. The interval runs from 0.658 to 2.297.

Part (iii). We use the variable parcath as an instrument for cathhs. The variable parcath takes a value of one if a parent reports being Catholic. The problem asks you to estimate the reduced form, which means you regress the endogenous variable, cathhs, on the instrument and the other exogenous variables on the right-hand side of the structural equation. There are three other exogenous variables: mother's education, father's education, and family income, so you should have four variables on the right-hand side of the reduced form equation. The estimate on parcath is 0.14 and the t value is 25.7. So the coefficient is large and statistically significant, and the instrument is relevant.
So it would be a good instrument, assuming that this variable is not correlated with the error term in the structural equation.

Part (iv). We go back to the structural equation and estimate it again, this time by instrumental variables. This is the result, and this is beta one. All the estimates are significant. Compared with the OLS result in part (ii), the IV approach produces an estimate of beta one that is considerably larger: the midpoint of the interval reported next is about 3.62, roughly 2.4 times the OLS estimate of 1.48. The 95% confidence interval now has a lower limit of 0.244 and an upper limit of 6.991. This is a wider range, which makes sense because the standard error in the IV approach is larger, and the IV interval contains the entire OLS interval.

Part (v). You test the null hypothesis that cathhs is exogenous, and what you can do is follow Section 15.5 in the textbook. First, obtain the residuals from the reduced form regression. Then include the residuals in the structural equation, meaning you regress math12 on cathhs (the suspected endogenous variable), the other exogenous variables (but not the IV), and the residuals. I get the coefficient on the residual to be -2.875 with a standard error of 1.526. That gives a t value of -1.885 and a p-value of 0.06. At the 5% significance level we are unable to reject the null hypothesis that the coefficient on the residual is zero. This null hypothesis is equivalent to cathhs being exogenous. So at the 5% level we cannot reject exogeneity of cathhs; the evidence of endogeneity is significant only at the 10% level.

Part (vi). You add the interaction between cathhs and motheduc to the above model.
It is in general endogenous because cathhs is endogenous, so any interaction term involving it is likely to be endogenous as well. The next question probably contains a typo: there is no parent education variable in the data set, and pareduc is not mentioned anywhere else in the problem. The correct variable should be parcath. Because parcath is a good instrument for cathhs, we expect the interaction between parcath and motheduc to be a good instrument for the interaction between cathhs and motheduc.

Part (vii). We add the interaction term to the equation, and I find the estimate on it to be -4.881 with a standard error of 1.084. That gives a t value of -4.5, so this term is highly significant.

Part (viii). I find that including the interaction term is important for estimating the average partial effect. In R, the average partial effect of a variable can be found with the function margins, and to show the standard error of this marginal estimate along with the p-value, you need the summary function wrapped around the margins call. I find the average partial effect of cathhs in the new regression to be 7.448 with a standard error of 1.888. The t value is 3.94 and the p-value is almost zero. So the effect of attending a Catholic high school is stronger and more significant than in part (iv).
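
The estimation steps in parts (iii) to (v) (reduced form, IV estimate, and the residual-based exogeneity test) can be sketched in code. The video works in R; the sketch below instead uses Python with NumPy on simulated data, so none of its numbers correspond to the CATHOLIC results. The data-generating process, the coefficient values, and the helper names ols and tsls are all assumptions made for illustration, and the standard errors use the plain homoskedastic formula.

```python
import numpy as np

def ols(y, X):
    """OLS coefficients, homoskedastic standard errors, and residuals."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    return b, np.sqrt(sigma2 * np.diag(XtX_inv)), resid

def tsls(y, X, Z):
    """2SLS: X holds the structural regressors, Z the instrument plus exogenous regressors."""
    X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)  # first-stage fitted values
    XhXh_inv = np.linalg.inv(X_hat.T @ X_hat)
    b = XhXh_inv @ X_hat.T @ y
    resid = y - X @ b                              # residuals use the actual X, not X_hat
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    return b, np.sqrt(sigma2 * np.diag(XhXh_inv))

# Simulated data with hypothetical coefficients (NOT the CATHOLIC data set).
rng = np.random.default_rng(0)
n = 20_000
parcath = rng.binomial(1, 0.3, n).astype(float)
motheduc = rng.normal(13.0, 2.0, n)
v = rng.normal(size=n)  # unobservable that drives both school choice and test scores
cathhs = (0.8 * parcath + 0.5 * v + rng.normal(size=n) > 1.2).astype(float)
math12 = 40.0 + 3.0 * cathhs + 0.5 * motheduc + 2.0 * v + rng.normal(size=n)

const = np.ones(n)
X = np.column_stack([const, cathhs, motheduc])   # structural regressors
Z = np.column_stack([const, parcath, motheduc])  # instrument + exogenous regressors

# OLS on the structural equation (biased here because cathhs depends on v).
b_ols, se_ols, _ = ols(math12, X)

# Reduced form for cathhs and the t statistic on the instrument.
pi_hat, se_pi, v_hat = ols(cathhs, Z)
t_first = pi_hat[1] / se_pi[1]

# IV (2SLS) estimate and a 95% confidence interval for the cathhs coefficient.
b_iv, se_iv = tsls(math12, X, Z)
ci_iv = (b_iv[1] - 1.96 * se_iv[1], b_iv[1] + 1.96 * se_iv[1])

# Exogeneity test: add the reduced-form residuals to the structural equation
# and look at the t statistic on them.
b_cf, se_cf, _ = ols(math12, np.column_stack([X, v_hat]))
t_resid = b_cf[3] / se_cf[3]

print("OLS estimate on cathhs:", b_ols[1])
print("reduced-form t on parcath:", t_first)
print("IV estimate on cathhs:", b_iv[1], "95% CI:", ci_iv)
print("t on reduced-form residual:", t_resid)
```

In this simulation the true coefficient on cathhs is set to 3, and the unobservable v pushes the OLS estimate upward, which is why the residual term comes out significant. The coefficient on cathhs in the last regression equals the 2SLS coefficient, which is a standard property of the residual-inclusion approach in a linear model.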

