You need to use two data sets for this exercise, JTRAIN2 and JTRAIN3. The former is the outcome of a job training experiment. The file JTRAIN3 contains observational data, where individuals themselves largely determine whether they participate in job training. The data sets cover the same time period.

(i) In the data set JTRAIN2, what fraction of the men received job training? What is the fraction in JTRAIN3? Why do you think there is such a big difference?

(ii) Using JTRAIN2, run a simple regression of $re78$ on $train$. What is the estimated effect of participating in job training on real earnings?

(iii) Now add as controls to the regression in part (ii) the variables $re74$, $re75$, $educ$, $age$, $black$, and $hisp$. Does the estimated effect of job training on $re78$ change much? How come? (Hint: Remember that these are experimental data.)

(iv) Do the regressions in parts (ii) and (iii) using the data in JTRAIN3, reporting only the estimated coefficients on $train$, along with their $t$ statistics. What is the effect now of controlling for the extra factors, and why?

(v) Define $avgre = (re74 + re75)/2$. Find the sample averages, standard deviations, and minimum and maximum values in the two data sets. Are these data sets representative of the same populations in 1978?

(vi) Almost 96% of men in the data set JTRAIN2 have $avgre$ less than $\$10{,}000$. Using only these men, run the regression and report the training estimate and its $t$ statistic. Run the same regression for JTRAIN3, using only men with $avgre \leq 10$. For the subsample of low-income men, how do the estimated training effects compare across the experimental and nonexperimental data sets?

(vii) Now use each data set to run the simple regression $re78$ on $train$, but only for men who were unemployed in 1974 and 1975. How do the training estimates compare now?

(viii) Using your findings from the previous regressions, discuss the potential importance of having comparable populations underlying comparisons of experimental and nonexperimental estimates.

(i) The fraction of men receiving job training differs so much between JTRAIN2.RAW and JTRAIN3.RAW because JTRAIN3.RAW contains a much larger number of men in the sample.

(ii) The estimated effect of participating in job training on real earnings is given by the coefficient on train, which is 1.794343. Since earnings are recorded in thousands of dollars, this indicates that real earnings would increase by about $1,794.34 if the men participate in job training.

(iii) The change is not much. Real earnings still depend significantly on job training, and among the added explanatory variables re74, re75, educ, age, black, and hisp, all except educ are individually statistically insignificant at the 5% level of significance.

(iv) Of the 2,675 men in the observational data, only 6.915% received job training. This suggests that when individuals themselves largely determine whether they participate in job training, they tend not to participate, as it is not an important determining factor of real earnings.

(v) The maximum value of avgre is 146.901 and the minimum value of avgre is 0. Given the descriptive statistics of the two data sets, it is evident that they are not representative of the same population in 1978.

Chapter 9

More on Specification and Data Issues

All right. Hello, everybody. This is a pretty long problem. We're going to be comparing two different data sets that measure essentially the same thing with some slight differences, and we're going to be discussing the importance of having comparable populations. So again, it's a pretty long one, so let's buckle in and get straight into it.

The first thing we're going to need is the wooldridge package, which contains the data sets. I'm also going to be using the dplyr package. This package lets me count the observations in a data set, which just makes my life a little bit easier. So we load the wooldridge library and the dplyr library, and then the two data sets we need, JTRAIN2 and JTRAIN3.

Okay, let's get into it. First we want to find the fraction of members of the first data set that actually received job training. The way we're going to do this is to call count on the subset of JTRAIN2 where the condition is train equals 1, so whoever received job training, and divide that by the number of members of JTRAIN2. We get about 41.6%. We can do the exact same thing for JTRAIN3: same function, same line of code, just changing the variable. Here only 6.92% of the members received job training.

This difference is really important. The first one is an experiment: we've run an experiment to find the effect of job training on real earnings. The second one is just observational; there are a lot more members, and very few of them actually went through job training. So that's an important difference.
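The two fractions computed above can be sketched in a few lines of R. This is a minimal sketch rather than the code typed in the video: it assumes the CRAN package wooldridge is installed and exposes the data frames `jtrain2` and `jtrain3`, and it takes the mean of the 0/1 indicator instead of the count-and-subset approach the speaker describes. The names `frac2` and `frac3` are illustrative.

```r
# Assumes the CRAN package 'wooldridge' is installed; it provides jtrain2 and jtrain3.
library(wooldridge)
data("jtrain2")
data("jtrain3")

# train is a 0/1 indicator, so the mean of (train == 1) is the share of trained men.
frac2 <- mean(jtrain2$train == 1)
frac3 <- mean(jtrain3$train == 1)

# Sample size, number of trainees, and share trained for each data set.
rbind(
  jtrain2 = c(n = nrow(jtrain2), trained = sum(jtrain2$train == 1), share = frac2),
  jtrain3 = c(n = nrow(jtrain3), trained = sum(jtrain3$train == 1), share = frac3)
)
```

If the packaged data match the files used in the video, the two shares should line up with the 41.6% and 6.92% quoted above.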
In fact, if we look at this, we can see that in JTRAIN2 there are only 445 people in the experiment, but in the observational data there are 2,675 people. So that's a pretty important difference.

Okay, now we're going to run a regression on JTRAIN2 to calculate the effect of training on real earnings in 1978. Pretty simple: our JTRAIN2 regression is a linear model of re78 on train, and our data is JTRAIN2. This is the very standard linear model function. If we take the summary, we find that, first of all, the effect of training is significant at the 5% significance level. In the experimental study, if you receive training, your real earnings go up by about $1,794.30. So that's our estimated effect of participating in job training.

Now we're going to add some controls. Our JTRAIN2 controlled model is the same formula, but we also add re74 plus re75 plus educ plus age plus black plus hisp. When we take the summary of this, we see that the training estimate didn't change that much, and it's still significant at the 5% level. It's still pretty close: before it was 1.7943 and here it is 1.6801.

It doesn't change much because this is an experiment. Training was assigned as part of the experiment rather than chosen by the men themselves, so these other factors should be roughly balanced between the trained and untrained groups, and adding them to the regression barely moves the training coefficient. It wouldn't be a good experiment if all these other factors influenced who got trained. So that's an important difference between something experimental and something nonexperimental.
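The two JTRAIN2 regressions described above might look like the following in R. This is a sketch under the assumption that the wooldridge package's `jtrain2` data frame carries the columns `re78`, `train`, `re74`, `re75`, `educ`, `age`, `black`, and `hisp`; the object names `jtrain2_reg`, `jtrain2_ctrl`, and `diff_means` are illustrative, not taken from the video.

```r
# Assumes the CRAN package 'wooldridge' is installed.
library(wooldridge)
data("jtrain2")

# Part (ii): simple regression of 1978 real earnings (in $1000s) on the training dummy.
jtrain2_reg <- lm(re78 ~ train, data = jtrain2)

# Part (iii): the same regression with the listed controls added.
jtrain2_ctrl <- lm(re78 ~ train + re74 + re75 + educ + age + black + hisp,
                   data = jtrain2)

# Coefficient tables: estimate, standard error, t statistic, p-value.
summary(jtrain2_reg)$coefficients
summary(jtrain2_ctrl)$coefficients

# With a single 0/1 regressor, the OLS slope equals the difference in group means.
diff_means <- mean(jtrain2$re78[jtrain2$train == 1]) -
  mean(jtrain2$re78[jtrain2$train == 0])
all.equal(unname(coef(jtrain2_reg)["train"]), diff_means)
```

The last check is a useful way to read the simple regression: the coefficient on train is just the gap in average 1978 earnings between trained and untrained men.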
We'll check that out right now by trying the exact same thing with the JTRAIN3 data set. Same function, but this is the JTRAIN3 regression. Keep in mind this one is observational, not experimental. Same thing for the controlled version. So we have a JTRAIN3 regression and a JTRAIN3 controlled regression.

If we take the summary of these two, we see that in the regular regression there's actually a negative estimated impact of training on real earnings if you don't control for other factors. When we do control for other factors, we find a positive impact, but it's very small. Recall that only 6.915% of this sample received job training, and those men largely chose it themselves, so the uncontrolled comparison also picks up differences between the men who sign up for training and the men who don't.

If we look at the test statistics, we also find that training isn't statistically significant at all in the controlled regression. It's not even close: the p-value is 0.803, compared to the 0.05 you'd need to be significant at the 5% level. So controlling for the other factors shows that those factors, rather than training, are what really determine real earnings in 1978 in this data set; training doesn't have a large estimated effect here.

Okay, cool. So that's a good comparison to make. Now we're going to add a new variable to both data sets. We'll call it avgre, average real earnings, and define it as the average of earnings in 1974 and 1975. The formula is just re74 plus re75, divided by 2. We do the same thing for JTRAIN3. Awesome. And just to show you, if I view the data and scroll all the way to the end, you'll see the averages there.
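The JTRAIN3 regressions and the new avgre column described above could be written roughly as follows. This is a sketch, assuming the wooldridge package's `jtrain3` data frame has the same column names as `jtrain2`; `train_row` is a hypothetical helper introduced here to pull out only the row for train, as part (iv) asks.

```r
# Assumes the CRAN package 'wooldridge' is installed.
library(wooldridge)
data("jtrain3")

# Part (iv): the same two regressions, now on the observational data.
jtrain3_reg <- lm(re78 ~ train, data = jtrain3)
jtrain3_ctrl <- lm(re78 ~ train + re74 + re75 + educ + age + black + hisp,
                   data = jtrain3)

# Hypothetical helper: keep only the estimate, t statistic, and p-value for train.
train_row <- function(model) {
  summary(model)$coefficients["train", c("Estimate", "t value", "Pr(>|t|)")]
}
rbind(simple = train_row(jtrain3_reg), controlled = train_row(jtrain3_ctrl))

# Part (v): average of 1974 and 1975 real earnings (in $1000s).
jtrain3$avgre <- (jtrain3$re74 + jtrain3$re75) / 2
tail(jtrain3[, c("re74", "re75", "avgre")])
```

Printing the t statistic and the p-value side by side makes it easier to keep the two apart when judging significance at the 5% level.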
Again, most of these are zero because those men didn't have any real earnings in that period, but here you can see the average coming through.

Now we want the sample average, standard deviation, minimum, and maximum for this variable. Pretty simply, we take the summary of JTRAIN2's average real earnings. We get a minimum of 0, a mean of 1.740, and a maximum of 24.376. To find the standard deviation, we just call sd on that same variable and get 3.900.

We can then do the same thing for JTRAIN3. Again the minimum is 0, but the mean, instead of 1.74, is now about 18, and the maximum, instead of 24.376, is 146.901. You'll also notice the standard deviation went from 3.9 to 13.3. That's a pretty big change. So it's evident that these data sets aren't representative of the same population, because these are huge differences: 24.376 is nowhere close to 146.901. Because the two samples come from different populations, we're looking at some very different things, especially considering the difference between experimental and nonexperimental data.

All right, that's the populations handled. Now we're told that almost 96% of the men in JTRAIN2 have average real earnings of less than $10,000. Using only those men, we want to run the regression. I was going to assume we run it without the controlling variables, but no, it is with the controlling variables. My bad. So the first thing we're going to do is take a subset of JTRAIN2; this is going to be our low-income subset.
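The summary statistics for avgre quoted above can be collected in one small table. This is a sketch under the assumption that the wooldridge package's `jtrain2` and `jtrain3` data frames contain `re74` and `re75` with no missing values; `describe` is a hypothetical helper defined here, not a function from the video.

```r
# Assumes the CRAN package 'wooldridge' is installed.
library(wooldridge)
data("jtrain2")
data("jtrain3")

# Average of 1974 and 1975 real earnings, in thousands of dollars.
jtrain2$avgre <- (jtrain2$re74 + jtrain2$re75) / 2
jtrain3$avgre <- (jtrain3$re74 + jtrain3$re75) / 2

# Hypothetical helper: the four statistics part (v) asks for.
describe <- function(x) c(mean = mean(x), sd = sd(x), min = min(x), max = max(x))

avgre_stats <- rbind(jtrain2 = describe(jtrain2$avgre),
                     jtrain3 = describe(jtrain3$avgre))
round(avgre_stats, 3)

# Share of JTRAIN2 men with avgre below 10, i.e. below $10,000.
share_low <- mean(jtrain2$avgre < 10)
share_low
```

Putting both data sets in one table makes the gap in means, spreads, and maxima easy to see at a glance.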
For the low-income subset we use the subset function, where the criterion is that average real earnings is less than 10. Real earnings are recorded in thousands, so we use 10. We do the same thing for JTRAIN3. Okay, cool. So now we have our low-income subsets.

Using only those, we run our same regression with the controlling variables. I don't want to type it all out again, so we just go back up and edit the earlier line. This one uses the JTRAIN2 low-income data and gives us our JTRAIN2 low-income model, and we do the same thing for JTRAIN3. If you have similar lines of code to write, it's honestly beneficial to go back and edit an earlier line slightly instead of writing the whole thing out.

Okay, so now we take the summary of the JTRAIN2 low-income model, and we find that training still has a pretty solid effect on real earnings. It's still pretty similar, in the 1.58 to 1.6 range, and it is still significant at the 5% level. If I now take the summary of the JTRAIN3 low-income model, we see that job training has a much greater estimated impact on low-income members than it does in the observational data set as a whole, and training is now statistically significant at the 5% level for real earnings. So that's something to note: training might show less effect on everyone as a whole, but it still shows a significant effect for the lower-income sample.

Okay, that one's done. Again, it is a long one, so let's keep going. Now we're going to run the regression of re78 on train, the simple regression, but only for people who were unemployed in 1974 and 1975. So we have to subset again; we'll call this one unem.
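The low-income regressions described above might be set up like this. It is a sketch with two stated assumptions: the wooldridge package's data frames are used, and the cutoff is written as `avgre <= 10`, following the exercise's wording for JTRAIN3, whereas the video describes a strict "less than 10". The object names are illustrative.

```r
# Assumes the CRAN package 'wooldridge' is installed.
library(wooldridge)
data("jtrain2")
data("jtrain3")

jtrain2$avgre <- (jtrain2$re74 + jtrain2$re75) / 2
jtrain3$avgre <- (jtrain3$re74 + jtrain3$re75) / 2

# Keep only men whose average 1974-75 earnings are at most $10,000 (avgre is in $1000s).
jtrain2_low <- subset(jtrain2, avgre <= 10)
jtrain3_low <- subset(jtrain3, avgre <= 10)

# Store the controlled specification once and reuse it for both subsets.
controlled <- re78 ~ train + re74 + re75 + educ + age + black + hisp
jtrain2_low_fit <- lm(controlled, data = jtrain2_low)
jtrain3_low_fit <- lm(controlled, data = jtrain3_low)

# Training estimate, t statistic, and p-value in each low-income subsample.
cols <- c("Estimate", "t value", "Pr(>|t|)")
rbind(jtrain2 = summary(jtrain2_low_fit)$coefficients["train", cols],
      jtrain3 = summary(jtrain3_low_fit)$coefficients["train", cols])
```

Storing the formula in one object is a small alternative to re-editing the earlier line by hand, and it guarantees both subsets are fitted with exactly the same specification.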
This is going to be a subset of JTRAIN2 as a whole, not of our low-income subset. Actually, hold up, let me show you the data set so you understand what we're looking at. We have the variables unem74 and unem75. If unem74 is 1, it means the person was not employed in 1974; it's a Boolean, true or false. If unem75 is 1, it means they weren't employed in 1975. So our JTRAIN2 unemployed subset is the subset of JTRAIN2 where unem74 equals 1 and unem75 equals 1. That way we make sure the people in this subset were unemployed in both of those years. We do the exact same thing for JTRAIN3.

Now we run our simple linear regression. The unem model for JTRAIN2 is lm of re78 on train, where the data set is now the JTRAIN2 unemployed subset, and for JTRAIN3 it's the same thing with the data swapped.

If we take a summary of these two: for the JTRAIN2 unemployed subset, this is again a pretty consistent estimate of the effect of training on real earnings, which makes sense, because in an experiment factors like this would have been controlled for. In JTRAIN3, however, we find a huge boost in the estimated effect of job training. So what we're finding is that if you were unemployed in 1974 and 1975, job training gives a very significant boost to your real earnings. From an intuitive perspective, if you were unemployed for those two years, you probably started working relatively recently by 1978, so training is definitely going to help your efficiency and efficacy at your job, which in turn helps your earnings. Awesome.
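The unemployed-subsample regressions described above can be sketched as follows, assuming the wooldridge package's `jtrain2` and `jtrain3` data frames both carry the 0/1 indicators `unem74` and `unem75`. The object names are illustrative rather than the ones used in the video.

```r
# Assumes the CRAN package 'wooldridge' is installed.
library(wooldridge)
data("jtrain2")
data("jtrain3")

# Keep only men flagged as unemployed in both 1974 and 1975.
jtrain2_unem <- subset(jtrain2, unem74 == 1 & unem75 == 1)
jtrain3_unem <- subset(jtrain3, unem74 == 1 & unem75 == 1)

# Part (vii): simple regression of re78 on train within each subsample.
jtrain2_unem_fit <- lm(re78 ~ train, data = jtrain2_unem)
jtrain3_unem_fit <- lm(re78 ~ train, data = jtrain3_unem)

# Subsample sizes alongside the training estimates and t statistics.
cols <- c("Estimate", "t value")
rbind(
  jtrain2 = c(n = nrow(jtrain2_unem),
              summary(jtrain2_unem_fit)$coefficients["train", cols]),
  jtrain3 = c(n = nrow(jtrain3_unem),
              summary(jtrain3_unem_fit)$coefficients["train", cols])
)
```

Reporting the subsample sizes next to the estimates shows how much of each data set survives the filter, which matters when comparing the two estimates.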
Our last question is to discuss the potential importance of having comparable populations underlying these comparisons. The populations need to be similar and comparable so that we can compare experimental and nonexperimental estimates. An experiment is a bit like your null and alternative hypotheses: you need a baseline, a null hypothesis. So you have to have a comparable population behind your nonexperimental estimates. Otherwise you're not really going to get a good result, and you won't be able to understand how it differs from your experiment.

All right, cool. So that is our fairly long problem. I hope you guys understood it. In general, this was just running basic econometrics. The hard part here was understanding the value and importance of these estimates and what they mean beyond just the numbers, which is of course a consistent theme throughout all of the content. That's us for today. Thank you very much, and have a good one.
