00:01
For this problem, in part a, we'd have that our null hypothesis is that level of education and health are independent.
00:11
When we're doing a test to see if there's a relationship between two categorical variables, our null hypothesis is always that there is no relationship or that the two categorical variables are independent.
00:23
The alternative hypothesis for a test like this is going to be that the two variables are dependent, or that there is a relationship.
00:37
For part b, or actually, we're technically still in the same part of the question, we're asked to calculate the test statistic.
00:44
Our test statistic here will be a chi-squared statistic, calculated by taking the sum over each row and column, so we'll just write that as the sum over i and j, of the observed frequency in row i, column j, minus the expected frequency in row i, column j, squared, divided by the expected frequency in row i, column j. We find the expected frequency in row i, column j by taking the row total for row i times the column total for column j, divided by the grand total, which we call n. Now, for actually doing the calculations, I'm going to jump over into Excel. As you can see here, I've copied all the values down into Excel. We'll want to find our column totals and our row totals as well.
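Written out in symbols, the formula just described looks like the following. The letters O, E, R, and C are shorthand chosen here for the observed frequency, expected frequency, row total, and column total; only n comes from the explanation itself.

```latex
\chi^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{R_i \, C_j}{n}
```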
01:36
So the sum of the first column is 289, and then we can drag the formula across if you're using Excel.
01:42
We also want those row totals, so we take the sum from B2 through to E2.
01:49
Then drag that down.
01:53
Then we'll want the expected frequencies for each cell, which we calculate, as I said, by taking the row total, so $F2, times the corresponding column total, so B$6, divided by the grand total, which we can see is 1,510.
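If you'd rather not use a spreadsheet, the same steps can be sketched in Python. This is only a rough sketch: the function names are made up for illustration, and the small 2x2 table at the bottom is invented example data, not the table from this problem.

```python
def expected_frequencies(observed):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)  # grand total
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_squared(observed):
    """Sum of (observed - expected)^2 / expected over every cell."""
    expected = expected_frequencies(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Invented 2x2 counts, purely for illustration (NOT the data from the problem).
example = [[10, 20], [30, 40]]
example_expected = expected_frequencies(example)
example_statistic = chi_squared(example)
print(example_expected)
print(example_statistic)
```

The row-total-times-column-total step here plays the same role as the spreadsheet formula with the dollar signs: each cell picks up its own row total and its own column total.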
02:14
Dragging that across and down, you can see all of our expected frequencies. We then want to find those squared residuals, (O minus E) squared over E...