00:01
Once again, welcome to a new problem.
00:05
This time we're dealing with inferential statistics.
00:09
And when it comes to inferential statistics, we have two branches.
00:16
We have categorical variables.
00:22
And we also have numerical variables.
00:26
And when it comes to numerical variables, we can have regression analysis as one of the options, where we're using the independent variable, which is sometimes called the explanatory variable, and we designate this as x, and then we are also using the dependent variable.
01:06
This is also called the response variable, and we designate it as y.
01:14
The regression model involves a scatter plot for quantitative variables.
01:22
This could be age, for example, and this could be income.
01:26
And so once you have the scatter plot, you introduce a line of best fit, y hat equals a plus b x, which seeks to estimate these points. Take a single point, x comma y: the gap between the observed value y and the predicted value y hat is called the residual, or the residual error. This line is called the least squares line of best fit, and it gives us a relationship where the total sum of squares is the same as the regression sum of squares plus the error sum of squares.
02:43
So the total sum of squares is the same as the regression sum of squares combined with the sum of squares due to error, which is also called the error sum of squares.
03:22
So that's the relationship that you use to get the line of best fit.
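Written out in symbols, the relationship just described can be sketched as follows. The index i and the abbreviations SST, SSR, and SSE are notation assumed here for convenience; the speaker only names the three quantities in words.

```latex
\begin{aligned}
\text{SST} &= \sum_{i=1}^{n} (y_i - \bar{y})^2
  && \text{(total sum of squares)} \\
\text{SSR} &= \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2
  && \text{(regression sum of squares)} \\
\text{SSE} &= \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
  && \text{(error sum of squares)} \\
\text{SST} &= \text{SSR} + \text{SSE},
  && \text{where } \hat{y}_i = a + b\,x_i
\end{aligned}
```

The identity in the last line holds when a and b are the least squares coefficients.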
03:26
It's the line that makes the sum of the squared residuals as small as possible.
03:32
So the points above the line are balanced by the points below the line.
03:51
And this is the sole purpose of the line of best fit.
03:54
So now we have a new problem and in this problem we have data.
04:05
And the data shows a bivariate relationship between the x and y variables.
04:21
Remember the x is the independent and the y is the dependent.
04:30
And so we have the data table: this is x, this is y. For x we have the numbers 22.3,
04:38
24.3, 26.3, 27.8, and 29.7, and for y we have 29.5, 30.1, 24.7, 25.4, and 21.1.
04:53
These are the x and the y values.
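As a rough sketch, the least squares line for this table can be computed with the standard textbook formulas for the slope and intercept. The transcript does not state the coefficients at this point, so the function name `least_squares_line` and the printed values are an illustration, not the speaker's own working.

```python
# Data from the table read out in the transcript.
x = [22.3, 24.3, 26.3, 27.8, 29.7]
y = [29.5, 30.1, 24.7, 25.4, 21.1]


def least_squares_line(xs, ys):
    """Return (a, b) for y_hat = a + b*x using the standard formulas."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    s_xx = sum((xi - x_bar) ** 2 for xi in xs)
    b = s_xy / s_xx          # slope
    a = y_bar - b * x_bar    # intercept
    return a, b


a, b = least_squares_line(x, y)

# Residual = observed y minus predicted y_hat for each point.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

print(f"y_hat = {a:.3f} + ({b:.3f}) * x")
print("sum of residuals:", round(sum(residuals), 10))
```

The residuals add up to zero (up to floating-point rounding), which is the sense in which points above the line are balanced by points below it.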
04:57
And then on this side, we have y minus y bar.
05:01
Remember, y bar is the average for y.
05:05
And if we wanted to get the average for y, we would have to take
05:09
the entire sum of the y column divided by the sample size.
05:13
The sample size n, as you can count, is one, two, three, four, five.
05:18
We have five points.
05:20
So whatever sum you get, dividing it by five gives us y bar.
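A minimal sketch of that averaging step, using the y column from the table; the variable names are chosen here for illustration.

```python
# y column from the data table in the transcript.
y = [29.5, 30.1, 24.7, 25.4, 21.1]

n = len(y)            # sample size
y_bar = sum(y) / n    # sum of the column divided by the sample size

print(n, round(y_bar, 2))  # prints: 5 26.16
```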
05:22
So we take that column and square it, y minus y bar squared, and the sum of that squared column is the total sum of squares.
05:30
So total sum of squares, that's the first column.
05:38
And then in the second column we have the regression sum of squares, and the formula for that is y hat minus y bar, squared; summing that column gives us the regression sum of squares. And then the final column gives us the error sum of squares, which is y minus y hat, squared...
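The three columns described here can be sketched in a few lines. This assumes y hat comes from the least squares line fitted to the same five points with the standard slope and intercept formulas; the transcript has not given the fitted coefficients, so treat the numbers as an illustration of the bookkeeping rather than the speaker's final answer.

```python
# Data from the table in the transcript.
x = [22.3, 24.3, 26.3, 27.8, 29.7]
y = [29.5, 30.1, 24.7, 25.4, 21.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares slope b and intercept a (assumed fitting method).
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
a = y_bar - b * x_bar
y_hat = [a + b * xi for xi in x]

# One squared-deviation column per sum of squares.
total_col = [(yi - y_bar) ** 2 for yi in y]                  # (y - y_bar)^2
regression_col = [(yh - y_bar) ** 2 for yh in y_hat]         # (y_hat - y_bar)^2
error_col = [(yi - yh) ** 2 for yi, yh in zip(y, y_hat)]     # (y - y_hat)^2

sst = sum(total_col)
ssr = sum(regression_col)
sse = sum(error_col)

print(f"{'x':>6} {'y':>6} {'total':>9} {'regression':>11} {'error':>9}")
for xi, yi, t, r, e in zip(x, y, total_col, regression_col, error_col):
    print(f"{xi:>6.1f} {yi:>6.1f} {t:>9.4f} {r:>11.4f} {e:>9.4f}")
print(f"SST = {sst:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}")
print("SST equals SSR + SSE:", abs(sst - (ssr + sse)) < 1e-9)
```

Each printed row corresponds to one data point, and the column totals are the three sums of squares, so the last line checks the decomposition directly.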