The data set in CATHOLIC includes test score information on over 7,000 students in the United States who were in eighth grade in 1988. The variables $math12$ and $read12$ are scores on twelfth grade standardized math and reading tests, respectively.

(i) How many students are in the sample? Find the means and standard deviations of $math12$ and $read12$.

(ii) Run the simple regression of $math12$ on $read12$ to obtain the OLS intercept and slope estimates. Report the results in the form

$$\widehat{math12}=\hat{\beta}_{0}+\hat{\beta}_{1}\, read12$$

$$n=?, \quad R^{2}=?$$

where you fill in the values for $\hat{\beta}_{0}$ and $\hat{\beta}_{1}$ and also replace the question marks.

(iii) Does the intercept reported in part (ii) have a meaningful interpretation? Explain.

(iv) Are you surprised by the $\hat{\beta}_{1}$ that you found? What about $R^{2}$?

(v) Suppose that you present your findings to a superintendent of a school district, and the superintendent says, "Your findings show that to improve math scores we just need to improve reading scores, so we should hire more reading tutors." How would you respond to this comment? (Hint: If you instead run the regression of $read12$ on $math12$, what would you expect to find?)

Chapter 2

The Simple Regression Model

H?Ng N.

May 2, 2021

where can I get the data set

We're concluding Chapter 2 with computer exercise number 10, where we have to use the data set called CATHOLIC, which includes test score information on over 7,000 students in the United States who were in eighth grade in 1988. Here the variables math12 and read12 are scores on twelfth grade standardized math and reading tests, respectively.

The first question asks us to find out how many students are in the sample, and what the means and standard deviations of math12 and read12 are. So first, we're going to go ahead and describe our data set. As we can see, we have 7,430 students, which is the number of observations. Every observation corresponds to a different student, who has been assigned a different ID, with binary variables for whether the student is female, Asian, Hispanic, or black, and the corresponding reading and math test scores. So the answer is 7,430.

Now, for the descriptive statistics, we summarize the two variables and see that indeed we have no missing observations. The average score on the reading test is 51.77, with a standard deviation of 9.41. On the math test, the average score is 52.13, with a slightly higher standard deviation of 9.46. The minimum values are comparable, and the maximums are 68 and 71.3, respectively. So in this sample it looks like the reading test is, on average, more difficult: it is harder to get a high grade, and there is slightly less variation compared to the math test. If the mean is lower and the variation is smaller, then it seems that the reading test is more difficult for the students in the sample.

In part (ii), we run a simple regression of math12 on read12, obtain the OLS intercept and slope estimates, and report the results like we always do. It is a very simple regression: the dependent variable is the math score and the independent variable is the reading score.
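The transcript does not say which software produced the output, so the following is only a minimal Python sketch of the two steps just described (summary statistics, then the regression of math12 on read12). It uses synthetic stand-in data, not the CATHOLIC sample, and `simple_ols` is a hypothetical helper name introduced for illustration.

```python
import random
import statistics


def simple_ols(x, y):
    """Return (intercept, slope, r_squared) for the simple regression of y on x."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx                      # OLS slope estimate
    intercept = y_bar - slope * x_bar      # line passes through the point of means
    r_squared = sxy ** 2 / (sxx * syy)     # squared sample correlation
    return intercept, slope, r_squared


# Synthetic stand-in scores (ASSUMPTION: made-up data, not the CATHOLIC sample).
rng = random.Random(0)
read12 = [rng.gauss(50, 10) for _ in range(500)]
math12 = [15 + 0.7 * r + rng.gauss(0, 7) for r in read12]

# Part (i): sample size, means, standard deviations.
n = len(read12)
print("n =", n)
print("read12 mean, sd:", statistics.mean(read12), statistics.stdev(read12))
print("math12 mean, sd:", statistics.mean(math12), statistics.stdev(math12))

# Part (ii): regress math12 on read12.
b0, b1, r2 = simple_ols(read12, math12)
print("intercept, slope, R^2:", b0, b1, r2)
```

With the real data, the two lists would simply be replaced by the `read12` and `math12` columns of the data set.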
In the regression output we have the full number of observations and an extremely statistically significant joint F test for the coefficients, which means that the model does a lot better than a simple intercept-only model at explaining the variation in math scores. We also get a very high R-squared: more than 50% of the variation in math scores is explained by the reading scores in this regression. The points of interest are the constant term, which is 15.15, and the estimate of the slope coefficient, 0.7143. I have reported the equation right here.

Now, part (iii): does the intercept reported in part (ii) have a meaningful interpretation? Explain. Well, let's think about what the intercept means. As we remember, this is a linear equation, so it is a line on the x-y plane, and the intercept is where the line crosses the y axis. So the interpretation is: if the reading score is zero, then the corresponding math score will be about 15. Is this meaningful? Not really. It would mean that someone who has no idea at all about reading, the worst possible reader, is predicted to score 15 in math on average. Maybe that could be the case, but it does not make a lot of sense as a statement to draw from the analysis.

In part (iv), we're asked whether we're surprised by the beta-one hat that we found, and what about the R-squared. Well, the beta-one hat is 0.7143, and that is hardly surprising. It means that if the reading score goes up by one point, the predicted math score goes up by 0.71 points: close to one point, but less than one. And before, we saw that the reading test is on average more difficult than the math test, so that's not really surprising. Not at all, I would say. The R-squared of 0.505, on the other hand, is surprising: there might be thousands of variables that play a role in explaining the variation in math scores.
And yet reading scores alone explain more than 50% of that variation. This is surprising; amazing, I would say. But we'll talk about it a little bit later.

In part (v), suppose that you present your findings to a superintendent of a school district, and the superintendent says, "Your findings show that to improve math scores we just need to improve reading scores, so we should hire more reading tutors." How should we respond to this comment? And the hint says: if you instead run the regression of read12 on math12, what would you expect to find? Let's answer this first. I haven't run the regression yet, but I would expect to get, not identical, but extremely similar results. And indeed, let's do that. Let me just flip the order of the variables, so now we're trying to explain reading scores with math scores. Look at that: an identical R-squared, of course, and, not surprisingly, coefficients that are very, very close. I don't think the difference between the two sets of results is statistically significant, so we get practically the same thing.

This does not surprise us at all, because what the superintendent is doing is trying to impose a chain of causality. The superintendent reads these results as "reading scores cause math scores," but we're not talking about causality here. This is a very common mistake. Economic theory, or theory in general, should dictate whether one variable causes the other; regression analysis does not. Here we're talking about statistical association. Statistical association does not imply causality, and correlation is not causality. So what we know is that an increase in the reading score by one point is associated with a corresponding increase of 0.71 in the math score, or the other way around, from math to reading.
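The "practically the same thing" result for the reverse regression can be anticipated without rerunning anything. In a simple regression, the slope of y on x times the slope of x on y equals R-squared, and R-squared is the same in both directions. The sketch below applies that identity to the two figures reported above (0.7143 and 0.505); the variable names are hypothetical, and the implied reverse slope is a back-of-the-envelope value, not output quoted from the video.

```python
# Reported estimates from the regression of math12 on read12.
slope_math_on_read = 0.7143
r_squared = 0.505

# Identity for simple regression: slope(y on x) * slope(x on y) = R^2.
# So the slope from regressing read12 on math12 is implied by the two numbers above.
slope_read_on_math = r_squared / slope_math_on_read

print(round(slope_read_on_math, 3))  # → 0.707
```

An implied slope of roughly 0.707 is consistent with the observation that the two sets of coefficients are very close, which is the point of the hint: the regression is just as happy to "explain" reading with math as math with reading.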
So the common interpretation is that there might be many, many variables affecting both performances: for example IQ, family income level, the mother's and father's education, race, maybe religion, you name it. Those two scores tend to move together. And if we compute the correlation, I would expect to find something very similar to our slope coefficient. Yeah, look at that: it's 0.71, almost identical to the slope coefficient. So the superintendent's statement is false. We're talking about statistical association, not causality. Just be very careful with these kinds of statements, because regression analysis, especially bivariate regression analysis, does not imply causality in any form, but it does imply statistical correlation.
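As a rough cross-check, the numbers quoted in this walkthrough can be tested against each other using only standard simple-regression identities: the correlation is the square root of R-squared (with the sign of the slope), the slope equals the correlation times the ratio of standard deviations, and the intercept equals the mean of y minus the slope times the mean of x. The sketch below uses only the rounded figures reported above, so small rounding differences are expected; the variable names are hypothetical.

```python
import math

# Rounded figures reported in the walkthrough.
mean_read, sd_read = 51.77, 9.41
mean_math, sd_math = 52.13, 9.46
slope, r_squared = 0.7143, 0.505

# Correlation recovered two ways.
r_from_r2 = math.sqrt(r_squared)             # positive root, since the slope is positive
r_from_slope = slope * sd_read / sd_math     # slope = r * sd_y / sd_x, solved for r

# Intercept implied by the means and the slope.
implied_intercept = mean_math - slope * mean_read

print(round(r_from_r2, 2), round(r_from_slope, 2), round(implied_intercept, 2))
# → 0.71 0.71 15.15
```

Because the two standard deviations are nearly equal (9.41 versus 9.46), the ratio in the slope formula is close to one, which is why the correlation and the slope coefficient come out almost identical here.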

