00:02
Once again, welcome to a new problem.
00:07
This time we're dealing with correlation.
00:12
If you think about inferential statistics, we have two types of inferential statistics that we can deal with.
00:25
We have hypothesis testing.
00:31
And, of course, we also have confidence intervals. So we have hypothesis testing and we have confidence intervals. Under hypothesis testing we could have numerical variables, and we could also have categorical variables. So these are the different types of hypothesis tests that we can deal with.
01:15
Under numerical variables we have regression.
01:19
We have regression and regression involves relationships.
01:26
Regression involves relationships between independent and dependent variables.
01:36
So we have relationships between the independent and dependent variables, where the independent variable is the x and the dependent variable is the y. We could call the independent variable the explanatory variable, and the y is the response variable. So we have the explanatory variable and we also have the response variable.

In this particular problem there's a state investigation that's looking at the relationship between salary, which is our x, the independent variable, and the number of absences of state employees. So we're looking at that relationship.
02:52
This one, the number of absences, is the y.
02:53
This is the dependent variable, obviously, and the salary is the independent variable.
03:01
So we want to see how salaries affect the number of absences.
03:08
And in the table we're going to have the salary, and this is in K, thousands of dollars.
03:17
K, thousands, would be 2,000 or 3,000 or 20,000 and so on and so forth.
03:24
And so that's what you have over there.
03:32
And then we have the number of absences.
03:37
And on this one we have the y.
03:41
So here is the table. Just recall that the salary is the x and the number of absences is the y.

Salary, x ($ thousands)    Absences, y
20.0                       2.3
22.5                       2.0
25.0                       2.0
27.5                       1.8
30.0                       2.2
32.5                       1.5
35.0                       1.2
37.5                       1.3
40.0                       0.6

In part (a), draw a scatter plot relating the x and y variables. In part (b), we want to estimate a simple linear regression line, explaining the parameters.
05:47
In part (c), determine the average number of absences for employees making $29,000.
06:14
We want to see what's the average for that.
06:18
In part (d), at alpha equal to 5%, is there a relationship between x and y? In part (e), determine the 95% confidence interval for beta. Remember, beta is the slope parameter, so we want to see the 95% confidence interval for that. In part (f), determine the coefficient of determination, and then of course the final thing, part (g), is to determine the correlation coefficient. The coefficient of determination is your R square, and the correlation coefficient is your r.

So I just want to jump into the problem. The first thing is the scatter plot, and this means that we're going to have the x and the y axes. The x-axis is the salary and the y-axis is the absences of state employees. And we have a scale. On the x-axis we could start at 10, then we have 15, 20, 25, 30, 35, 40, and then of course 45. On the y-axis we have 0.5, 1, 1.5, 2, and 2.5, and we could stop there.

The next step is to plug in the numbers from the table. At 20 we have about 2.3, right there. At 30 we have 2.2, somewhere there. At 35 we have 1.2, somewhere there, and so on for the rest of the points. So all you're doing is plotting the points of the scatter plot.

After that we could draw the line of best fit, which estimates the position of the line through these points. This is your typical line of best fit, you could see it right there, and it's an estimate. So there's a negative relationship.
10:49
There's a negative relationship between salary and state employee absences.
11:05
So we could see the negative relationship.
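The negative relationship seen in the scatter plot can also be checked numerically. The following is a minimal sketch, assuming the salary and absence values exactly as read out in the table, that computes the least squares slope as the sum of cross deviations divided by the sum of squared x deviations. The function and variable names are illustrative and are not part of the lecture.

```python
# Salary in thousands of dollars (x) and number of absences (y),
# taken from the table as read out in the lecture.
salary = [20.0, 22.5, 25.0, 27.5, 30.0, 32.5, 35.0, 37.5, 40.0]
absences = [2.3, 2.0, 2.0, 1.8, 2.2, 1.5, 1.2, 1.3, 0.6]


def least_squares_slope(x, y):
    """Return the least squares slope estimate S_xy / S_xx."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross deviations of x and y around their means.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Sum of squared deviations of x around its mean.
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx


slope = least_squares_slope(salary, absences)
print(f"estimated slope: {slope:.3f} absences per $1,000 of salary")
```

With these values the slope comes out at about -0.072, so each additional $1,000 of salary goes with slightly fewer absences, which agrees with the downward line of best fit.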
11:08
The next step is that we want to run a regression analysis.
11:15
And in this case you could use Excel. If you use the Excel software and you run an analysis, you're going to get all the specific numbers. Remember, going back, we wanted to figure out the simple linear regression line and its parameters. So with the Excel Data Analysis ToolPak you run a regression on x and y, and you end up with an ANOVA table. We have the source, which is regression, residual, and total, and then the sum of squares, degrees of freedom, mean square, the F ratio, and the p-value.

Source        Sum of squares    Degrees of freedom    Mean square    F ratio    P-value
Regression    1.944             1                     1.944          43.06      0.003
Residual      0.316             7                     0.045
Total         2.260             8

Remember, to get the mean square you take the sum of squares over the degrees of freedom. So 1.944 over 1 gives you the mean square for the regression, and 0.316 over 7 gives you the mean square for the residual. The F ratio is the mean square for the regression over the mean square for the residual, and that's how we get the F ratio number, 43.06. On the F distribution for the regression, this point right here is F equal to 43.06, the area beyond it represents the p-value, and that p-value is 0.003.

Of course we still need the output for the regression equation. There are certain summary values. For example, the coefficient of determination, R square, we get that to be 0.86, and then the r value is negative 0.927.
14:13
Of course the standard error ends up being 0.212.
14:18
The sample size is 9, k is 1, and so those are the numbers you're looking at.
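The summary numbers above all follow from the two sums of squares, the sample size, and k. The following is a small sketch, assuming only the values quoted from the Excel output (1.944, 0.316, n = 9, k = 1), of how the mean squares, the F ratio, R square, r, and the standard error are derived. The variable names are illustrative, and the sign of r is taken from the negative relationship seen in the scatter plot.

```python
import math

# Values quoted from the Excel regression output in the lecture.
ss_regression = 1.944  # sum of squares, regression
ss_residual = 0.316    # sum of squares, residual
n = 9                  # sample size
k = 1                  # number of explanatory variables

# Degrees of freedom: k for regression, n - k - 1 for residual.
df_regression = k
df_residual = n - k - 1
ss_total = ss_regression + ss_residual

# Mean square = sum of squares / degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# F ratio = mean square regression / mean square residual.
f_ratio = ms_regression / ms_residual

# Coefficient of determination and correlation coefficient.
r_squared = ss_regression / ss_total
r = -math.sqrt(r_squared)  # negative because the relationship is negative

# Standard error of the estimate = square root of mean square residual.
std_error = math.sqrt(ms_residual)

print(f"F = {f_ratio:.2f}, R^2 = {r_squared:.2f}, "
      f"r = {r:.3f}, standard error = {std_error:.3f}")
```

Working from the sums of squares rather than the raw table keeps the sketch tied to the quoted output, so the results can be compared directly with the F ratio, R square, r, and standard error stated above.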
14:26
In terms of regression outputs, there are certain things that are going to capture...