00:01
For this problem, we are told that when analyzing large data sets with many variables, researchers often encounter the problem of missing data or nonresponse.
00:09
Typically, an imputation method will be used to substitute in reasonable values or the mean of the variable for the missing data.
00:17
An imputation method that uses nearest neighbors as substitutes for the missing data was evaluated.
00:23
In Data and Knowledge Engineering, the two quantitative assessment measures of the imputation algorithm are normalized root mean square error (NRMSE) and classification bias.
00:33
The researchers applied the imputation method to a sample of 3,600 data sets with missing values and determined the NRMSE and classification bias for each data set.
00:44
The correlation coefficient between the two variables was reported as r = 0.2838.
00:50
In part a, we are asked to conduct a test to determine if the true population correlation coefficient relating NRMSE and bias is positive, and to interpret this result practically. So to begin, let's select our level of significance. Let's say alpha equals 0.001. In that case, the one-tailed critical value, our t alpha value, with n minus 2 equals 3,598 degrees of freedom, would be equal to 3.0925. Now, calculating our actual t value based on our statistic, our r value: that would be equal to r, so 0.2838, times the square root of n minus 2, so that would be 3,598, divided by the square root of 1 minus r squared, so 1 minus 0.2838 squared, which will give a result of t equals, one second here, 17.7532.
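The arithmetic just described can be checked with a short sketch. It assumes the test is H0: rho = 0 against Ha: rho > 0 and takes the critical value 3.0925 as quoted above rather than recomputing it. The p-value line uses a standard normal approximation to the t distribution, which is an assumption made here to keep the example in the standard library; with 3,598 degrees of freedom it only indicates the order of magnitude. The function and variable names are illustrative.

```python
import math


def correlation_t_statistic(r: float, n: int) -> float:
    """Test statistic for H0: rho = 0, t = r * sqrt(n - 2) / sqrt(1 - r^2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r = 0.2838           # sample correlation reported in the problem
n = 3600             # number of data sets in the sample
t_critical = 3.0925  # one-tailed critical value quoted for alpha = 0.001

t_stat = correlation_t_statistic(r, n)

# Upper-tailed test: reject H0 when the statistic exceeds the critical value.
reject_null = t_stat > t_critical

# Assumption: normal approximation to the upper tail of the t distribution.
# erfc is used so the tiny tail probability does not round to zero.
p_approx = 0.5 * math.erfc(t_stat / math.sqrt(2))

print(f"t = {t_stat:.4f}")
print(f"reject H0 at alpha = 0.001: {reject_null}")
print(f"approximate one-sided p-value: {p_approx:.3e}")
```

Because the computed statistic is far above the critical value, the comparison alone is enough to decide the test; the approximate p-value is shown only to illustrate how small the tail area is.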
02:02
The p-value corresponding to this will be equal to 3.0...
02:09
Or, excuse me, one moment here.
02:11
Excuse me...