00:01
Hello, everybody, and welcome to an econometrics tutorial using RStudio.
00:06
So in this video, we're going to be looking into some endogenous variables and how you can try to use instrumental variables to fix that problem.
00:17
So let's begin.
00:20
First of all, you need to download the data package that we're going to be using, which is called wooldridge.
00:27
So go ahead and open up that dataset.
00:31
Open up the data for k401ksubs.
00:35
That is the data we're going to be using and it is data about retirement funds, age, income, and that kind of thing.
00:49
If you open it up in help, you can see what all the variable names mean in the k401ksubs data file.
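For anyone following along, loading the data and its help page might look something like this in R. This is only a sketch: it assumes the wooldridge package is already installed from CRAN, and the video itself does not show the exact commands.

```r
# Assumes the wooldridge package is installed: install.packages("wooldridge")
library(wooldridge)

# Load the 401(k) data set into the workspace
data("k401ksubs")

# Open the help page that describes each variable
help("k401ksubs")

# Quick look at the columns and their types
str(k401ksubs)
```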
00:56
So for this example, we're going to want to compare what effect a bunch of these variables have on pira, which is whether or not you have an individual retirement arrangement.
01:10
We'll be comparing it to income, age, and then whether or not you have a 401k, which is a retirement savings plan that some firms offer their employees.
01:23
We're first going to start with an ordinary least squares model to come up with some inferences about the data.
01:35
So you're going to want to do a linear regression on this formula here.
01:44
So we're going to be looking at p401k and what effect that has on pira, what effect income has on it, plus income squared, age, and age squared.
01:55
And then of course an error term.
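Written out, the formula described here would look roughly like the following. The coefficient names and the error term u are notation chosen for this sketch, and inc stands for income, following the variable names in k401ksubs.

```latex
\[
\text{pira} = \beta_0 + \beta_1\,\text{p401k} + \beta_2\,\text{inc} + \beta_3\,\text{inc}^2
            + \beta_4\,\text{age} + \beta_5\,\text{age}^2 + u
\]
```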
01:57
So let's get to that, shall we? Oh, actually, before we get to that.
02:02
Of course, it is also common practice.
02:04
As you see in here, quite a few of these are factors.
02:06
So you have, you know, marriage is a factor zero or one.
02:11
E401k is a factor.
02:12
Male or female is a factor.
02:16
That's not a factor.
02:18
P401k is a factor, and pira is also a factor.
02:20
So it is pretty good practice to convert the data for these variables into actual factors in R.
02:29
Because right now, as you can see down here, pira is not being recognized as a factor.
02:34
Neither is male, marriage.
02:38
E401k, or p401k.
02:40
So it is good practice to do that.
02:44
If you have factors, if you have binary variables, go ahead and switch them.
02:50
Again, you don't technically need to do this.
02:53
The conclusions you can draw from the model are more or less the same.
03:03
So it's just good practice to do this.
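One way to do the conversion is sketched below. The copy name k401_fct and the list of columns are choices made for this example, not something shown in the video. Working on a copy keeps the original 0/1 columns available, which matters because lm() expects a numeric outcome.

```r
# Assumes the wooldridge package is installed
library(wooldridge)
data("k401ksubs")

# Binary (0/1) columns mentioned in the video
binary_vars <- c("e401k", "marr", "male", "p401k", "pira")

# Convert them to factors in a copy, leaving the original data untouched
k401_fct <- k401ksubs
k401_fct[binary_vars] <- lapply(k401_fct[binary_vars], factor)

# Check that the columns are now recognized as factors
str(k401_fct[binary_vars])
```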
03:07
All right, so now is a good time to run the regression.
03:10
So let's go ahead and name it, model 1.
03:16
Very original.
03:28
All right.
03:29
So, let's run the model and see what happens.
03:32
Boom.
03:34
Creating the factors didn't seem to work.
03:36
So yeah, it is good to create factors, but it's not imperative.
03:39
So we will avoid it here.
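A minimal sketch of the regression on the original numeric 0/1 columns is shown here. The object name model1 follows the name used in the video; writing the squared terms with I() is one common choice and may differ from what was typed on screen.

```r
# Assumes the wooldridge package is installed
library(wooldridge)
data("k401ksubs")

# OLS (linear probability model) of IRA ownership on 401(k) participation,
# income, income squared, age, and age squared
model1 <- lm(pira ~ p401k + inc + I(inc^2) + age + I(age^2),
             data = k401ksubs)

# Coefficient table with standard errors and p-values
summary(model1)
```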
03:42
All right.
03:43
So, here we have our model, and for this example, we're going to be mainly focusing on the effect of p401k on pira.
03:51
So that's right over here.
03:54
We have the estimate.
03:55
It is what appears to be 5%.
03:59
P value is very small.
04:00
So this is really significant.
04:01
This looks really good.
04:03
Also, because both pira and p401k are factors.
04:08
The number here, the b1, is actually represented by this equation here.
04:16
So you have pira hat, pira 1 hat, and pira 0 hat.
04:19
So this would be the probability that you have an IRA given that you also have a 401k account.
04:33
And then this is the probability that you have an IRA given that you don't have a 401k account.
04:37
And that's what your b1 is...
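The equation being pointed to on screen is not captured in the transcript. Based on the description, it is presumably along these lines, where conditioning on the other regressors x (the income and age terms) is an assumption added here for precision:

```latex
\[
\hat{\beta}_1
  = \widehat{\text{pira}}_1 - \widehat{\text{pira}}_0
  = \widehat{P}(\text{pira} = 1 \mid \text{p401k} = 1,\, x)
  - \widehat{P}(\text{pira} = 1 \mid \text{p401k} = 0,\, x)
\]
```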