Use the data in DISCRIM to answer this question. (See also Computer Exercise C8 in Chapter 3.)

(i) Use OLS to estimate the model $\log(psoda)=\beta_{0}+\beta_{1} prpblck+\beta_{2} \log(income)+\beta_{3} prppov+u$ and report the results in the usual form. Is $\hat{\beta}_{1}$ statistically different from zero at the 5% level against a two-sided alternative? What about at the 1% level?

(ii) What is the correlation between log(income) and prppov? Is each variable statistically significant in any case? Report the two-sided $p$-values.

(iii) To the regression in part (i), add the variable log(hseval). Interpret its coefficient and report the two-sided $p$-value for $\mathrm{H}_{0}: \beta_{\log(hseval)}=0$.

(iv) In the regression in part (iii), what happens to the individual statistical significance of log(income) and prppov? Are these variables jointly significant? (Compute a $p$-value.) What do you make of your answers?

(v) Given the results of the previous regressions, which one would you report as most reliable in determining whether the racial makeup of a zip code influences local fast-food prices?

Chapter 4

Multiple Regression Analysis: Inference

Hi, everyone. A quick refresher before we start: we're still in the statistical inference part of the book. This problem asks whether the racial makeup of a zip code influences local fast-food prices, where the fast-food price we look at is the price of a soda. We'll go through several parts, looking at how adding control variables (explanatory variables other than the race variable of interest) changes our estimates and the significance of the variables we're interested in.

Part (i) asks you to estimate a model in which the log of the price of a soda, log(psoda), is the dependent variable. I'll write it in function notation to keep it short. The first explanatory variable is prpblck, the proportion of Black residents in the zip code; that's the race variable we're interested in. The second is log(income), which serves as a control variable: holding income in a zip code constant, does the racial makeup influence fast-food prices, as seen in soda prices? The last explanatory variable is prppov. The "prp" stands for proportion, which is similar to a percentage, so this is the proportion of people in poverty in the zip code. Those are the three variables: prpblck is the one we're interested in, and you can think of the other two as controls. Once you estimate this equation, I'll do my usual thing and write out each coefficient estimate with its standard error underneath.
These are the numbers you should get for the proportion of Black residents in the zip code. The coefficient is positive, which is the first interesting result, and the standard error is less than half as large as the coefficient, so it looks statistically significant. For log(income), you should get a coefficient of 0.137 with a much smaller standard error, so that is definitely statistically significant. For the proportion in poverty, you should get a coefficient of about 0.38 with a standard error of 0.13.

Once you've estimated this equation, the problem asks whether the estimated coefficient on prpblck is statistically different from zero at the 5% level against a two-sided alternative. Whatever statistical software you use, you should get a p-value for this coefficient of about 0.018. Since 0.018 is below 0.05, the coefficient is significant at the 5% level. It doesn't quite make the cutoff at the 1% level, because 0.018 is above 0.01. That wraps up part (i).

Part (ii) asks for the correlation between two of the variables in the regression, log(income) and prppov. Before computing it, think about what sign to expect: positive or negative? As the income of a zip code rises, what do we think?
What happens to the proportion of people living in poverty? We would typically expect a negative relationship, probably a strong one. Once you type this into your software, you should get a correlation of about -0.8385, which is what I got. That number makes sense: it's negative, as we expected, and a correlation of about -0.84 is very strong.

Part (ii) also asks whether each variable is statistically significant and what the two-sided p-values are. The p-value for log(income) is less than 0.001, and for prppov it is about 0.0045, so both variables are statistically significant at the 1% level. That's part (ii).

Part (iii) asks you to take the regression from part (i) and add a new variable, log(hseval), the log of the median housing value in the zip code. Once you've estimated that regression, the problem asks about the coefficient on log(hseval). The estimate is 0.121, and the problem asks for its interpretation. Remember that hseval is the median housing value in the zip code, and we are taking its log.
Remember also that our dependent variable is the log of the price of a medium soda, so we have a log-log interpretation. It goes like this: a 1% higher median housing value in the zip code increases the predicted price of a medium soda by about 0.12%.

We're not quite done with part (iii). The last piece is the two-sided p-value for this coefficient. It's very small, less than 0.0001, so the coefficient is highly statistically significant, well under the 1% significance threshold. That's the end of part (iii).

A quick recap, since this is a longer problem. We started with three explanatory variables, all of which were statistically significant. We found that two of them, log(income) and prppov, are highly correlated and also have low p-values, so their coefficients are highly significant. Then we added a fourth explanatory variable, log(hseval), which also turned out to be highly statistically significant.
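The log-log reading above can be checked with a little arithmetic. The following is a minimal Python sketch, assuming only the coefficient of 0.121 quoted in the walkthrough; the function name `pct_change_in_price` is made up for illustration. It compares the usual approximation (the coefficient times the percentage change) with the exact change implied by a log-log model.

```python
def pct_change_in_price(elasticity: float, pct_change_x: float) -> float:
    """Exact % change in y implied by log(y) = ... + elasticity * log(x)."""
    ratio = (1.0 + pct_change_x / 100.0) ** elasticity
    return (ratio - 1.0) * 100.0


# Coefficient on log(hseval) as quoted in the walkthrough.
beta_hseval = 0.121

# Approximation: a 1% change in x changes y by roughly beta percent.
approx = beta_hseval * 1.0
# Exact change for a 1% higher median housing value.
exact = pct_change_in_price(beta_hseval, 1.0)

print(f"approximate: {approx:.4f}%  exact: {exact:.4f}%")
```

For a change as small as 1%, the two figures are nearly identical, which is why the coefficient can be read directly as an elasticity of roughly 0.12.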
Part (iv) asks what happens to the individual statistical significance of log(income) and prppov. I may have said "the regression in part (ii)" earlier, but the comparison is between the regression from part (i) and the regression from part (iii), where we added hseval. Let me write out a little chart, where lincome stands for log(income) and the second variable is prppov.

We already found the significance in part (i): both coefficients were significant at the 1% level. This time I'll look at the t statistics rather than the p-values. In the part (i) regression, the t statistic for log(income) was 5.12 and for prppov it was 2.86. To be clear, those are t statistics from the individual tests, not p-values, and both coefficients are definitely statistically significant.

Now compare that with part (iii), after adding hseval. What happens to log(income) and prppov? You should get a negative coefficient on log(income), and a coefficient on prppov that is positive but not statistically significant.
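To connect the t statistics of 5.12 and 2.86 quoted above to significance levels, here is a small Python sketch. It uses the standard normal approximation to the t distribution, which is an assumption made to keep the example in the standard library; with a sample of about 400 the difference is small, though exact t-based p-values would be slightly larger. The helper names are illustrative.

```python
import math


def two_sided_p_normal(t_stat: float) -> float:
    """Two-sided p-value for a t statistic, using the normal approximation."""
    # erfc(|t| / sqrt(2)) equals 2 * (1 - Phi(|t|)) for the standard normal CDF Phi.
    return math.erfc(abs(t_stat) / math.sqrt(2.0))


def significant_at(p_value: float, level: float) -> bool:
    """True if the null hypothesis is rejected at the given significance level."""
    return p_value < level


# t statistics from the part (i) regression, as quoted in the walkthrough.
t_stats = {"log(income)": 5.12, "prppov": 2.86}
p_values = {name: two_sided_p_normal(t) for name, t in t_stats.items()}

for name, p in p_values.items():
    print(
        f"{name}: p = {p:.2g}, "
        f"significant at 5%: {significant_at(p, 0.05)}, "
        f"at 1%: {significant_at(p, 0.01)}"
    )
```

Both variables clear the 1% threshold under this approximation.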
So when we add hseval, the median housing value, the coefficient on log(income) switches sign, which is a little concerning because negative is not the sign we would expect. The coefficient on prppov also becomes less positive and less statistically significant.

To wrap up part (iv), the problem asks whether these variables are jointly significant, and we need an F test for that. I've gone over F tests in a couple of earlier problems, so I'll just write out how to compute it. You run the unrestricted regression and the restricted regression and fill in the two sums of squared residuals. I got 2.628 for the restricted regression and 2.349 for the unrestricted regression. There are two restrictions, which takes care of the numerator: the difference between the two sums of squared residuals, divided by 2. In the denominator we have the sum of squared residuals from the unrestricted regression divided by its degrees of freedom: the sample size, which is 401 (you can check that), minus the number of explanatory variables in the unrestricted regression plus one, that is, 401 - 5 = 396. The F statistic ends up being about 23. That's a very high statistic, and it tells us that log(income) and prppov are jointly significant.

What do we make of this conclusion? Even though log(income) and prppov are individually insignificant in the full regression with hseval, they are jointly significant in that regression.
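The F computation described above can be sketched in a few lines of Python. This is an illustration using the two sums of squared residuals quoted in the walkthrough, taking the larger one (2.628) as the restricted value, since dropping regressors can never lower the sum of squared residuals. The count of four explanatory variables in the unrestricted model (prpblck, log(income), prppov, log(hseval)) is an assumption read off the problem statement, and the function name is made up. The restricted regression should drop only the variables being tested, here log(income) and prppov.

```python
def f_statistic(
    ssr_restricted: float,
    ssr_unrestricted: float,
    num_restrictions: int,
    df_unrestricted: int,
) -> float:
    """F = [(SSR_r - SSR_ur) / q] / [SSR_ur / (n - k - 1)]."""
    numerator = (ssr_restricted - ssr_unrestricted) / num_restrictions
    denominator = ssr_unrestricted / df_unrestricted
    return numerator / denominator


ssr_r = 2.628   # restricted regression (value quoted in the walkthrough)
ssr_ur = 2.349  # unrestricted regression (value quoted in the walkthrough)
q = 2           # two exclusion restrictions: log(income) and prppov
n, k = 401, 4   # sample size; k = 4 regressors is an assumption from the problem

f_stat = f_statistic(ssr_r, ssr_ur, q, n - k - 1)
print(f"F({q}, {n - k - 1}) = {f_stat:.1f}")
```

With these inputs the statistic lands in the low twenties, in line with the figure of about 23 given in the walkthrough.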
We also know from part (ii) that they're highly correlated. Now bring in our third variable, hseval. We can conclude that all three of these variables are probably highly correlated with each other. We already knew that log(income) and prppov were highly negatively correlated, and median housing value, hseval, probably correlates highly with one or both of the other two. That likely explains why the other two variables become statistically insignificant when we include hseval in the regression: there is a lot of correlation going on.

We haven't talked about its coefficient much, but if you're interested in prpblck, the proportion of Black residents in the zip code, you might not want to include all three of these highly correlated explanatory variables. Doing so can muddy the waters, or mislead you by putting overlapping explanatory power into your model. Another way to think about these three variables being highly correlated is in terms of how they capture the variation in soda prices across zip codes. They are probably all capturing similar variation in soda prices across the zip codes in our sample, so including all three might be overdoing it a bit.

That brings us to part (v), which asks: given the results of parts (i) through (iv), which regression would you report as most reliable in determining whether the racial makeup of a zip code influences local fast-food prices? I'll draw a rough schematic here.
Our main question, from part (i), is whether the proportion of Black residents in a zip code influences local fast-food prices, which we represent by the price of soda. That's our basic question. Of course, you want to control for something representing the income of the neighborhood or the proportion in poverty, because we want to disentangle the correlation of a zip code's racial makeup from its poverty level or income level. So we control for income, for poverty, and maybe for housing prices. Which of the regressions we've run is most reliable for determining this basic relationship?

Remember that back in part (ii) we found log(income) and prppov to be very highly negatively correlated. Then in parts (iii) and (iv), when we added the median housing value of the zip code, those two variables, which had been highly negatively correlated and significant, suddenly lost their individual significance. We said these variables are probably all highly correlated, so you might not want to include all three. Given that, it might be smart to use the regression from part (i) as the most reliable one. That's how I would answer this.

If you're newer to econometrics or statistical inference, you might be inclined to include as many variables as you can in your regression if they seem relevant. But you should always think about whether they're highly correlated.
That is, think about whether they're explaining the same kind of variation in your outcome of interest. If there's good reason or good evidence to believe they are explaining the same variation, you probably don't want to include all of them. So we'll conclude by saying that the simpler regression from part (i) is probably the most reliable.
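One way to put a number on the collinearity concern raised in the walkthrough is the variance inflation factor (VIF). The sketch below is an illustration that is not part of the original solution. It assumes a simplified setting with only the two regressors log(income) and prppov, where the VIF reduces to 1 / (1 - r^2), and it uses the correlation of -0.8385 quoted for part (ii). With additional regressors in the model, the actual VIF would be at least this large.

```python
def vif_two_regressors(corr: float) -> float:
    """Variance inflation factor when a regressor has one correlated companion."""
    return 1.0 / (1.0 - corr ** 2)


# Correlation between log(income) and prppov quoted in the walkthrough.
r = -0.8385

vif = vif_two_regressors(r)
# Standard errors scale with the square root of the variance inflation.
se_inflation = vif ** 0.5

print(f"VIF = {vif:.2f}; standard errors inflated by a factor of about {se_inflation:.2f}")
```

A correlation this strong already inflates the sampling variance of each coefficient several times over, which is consistent with coefficients losing individual significance once another closely related control is added.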
