00:01
Well, for this question, I can guide you on how to structure your report based on the question.
00:15
Yes.
00:16
So here is an outline for structuring this report.
00:26
So the first part is data exploration and preprocessing.
00:32
First, we need to load the dataset into suitable statistical software.
00:40
Then explore the dataset to understand the distribution and characteristics of each variable.
00:51
Then check for missing values and handle them appropriately.
00:57
For example, by imputation or removal.
01:03
Then we can convert categorical variables, like day id, bank id, or international.
01:49
This is international, yeah.
01:56
This is a categorical variable.
02:00
So we can convert it into dummy variables if necessary.
02:08
And we need to normalize or standardize the continuous variables.
02:17
Like non-day open, like value.
02:42
Like value, this one is a continuous variable.
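The preprocessing steps described so far could be sketched as follows. This is a minimal illustration assuming Python with pandas; the tiny data frame is made up, and the column names (`bank_id`, `international`, `value`, `fraud`) are guesses based on the variables mentioned in the session, not the real dataset's schema.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the real dataset; column names are assumptions.
df = pd.DataFrame({
    "bank_id": ["A", "B", "A", "C", "B", "C"],
    "international": ["yes", "no", "no", "yes", "no", "no"],
    "value": [120.0, np.nan, 75.5, 3000.0, 42.0, 410.0],
    "fraud": [0, 0, 0, 1, 0, 1],
})

# Explore: summary statistics and missing-value counts per column.
print(df.describe(include="all"))
print(df.isna().sum())

# Handle missing values: here the continuous column is imputed with its median.
# Dropping rows with df.dropna() would be the removal alternative.
df["value"] = df["value"].fillna(df["value"].median())

# Convert categorical columns to dummy variables, dropping one level of each
# to avoid perfect collinearity in a regression model.
df = pd.get_dummies(
    df, columns=["bank_id", "international"], drop_first=True, dtype=int
)

# Standardize the continuous column to zero mean and unit variance.
df["value_std"] = (df["value"] - df["value"].mean()) / df["value"].std()

print(df.head())
```

Whether to impute or remove depends on how much data is missing and why, so the median imputation here is only one reasonable default.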
03:06
Then the second step is model selection.
03:13
So we need to choose a suitable model for binary classification, since the target variable is fraud.
03:22
It is binary.
03:32
So this one is binary.
03:48
Then logistic regression is a common choice for this type of problem.
03:54
And you can consider other models like decision trees, random forests, or gradient boosting if logistic regression does not perform well.
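One way to judge whether logistic regression "performs well" relative to the alternatives is to compare cross-validated scores. The sketch below assumes Python with scikit-learn and uses synthetic data from `make_classification` in place of the real dataset; the choice of ROC AUC as the metric and the hyperparameters are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced binary data standing in for the fraud dataset.
X, y = make_classification(
    n_samples=400, n_features=6, weights=[0.9, 0.1], random_state=0
)

# Candidate models named in the session; settings are illustrative only.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# 5-fold cross-validated ROC AUC for each candidate.
scores = {}
for name, model in candidates.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    scores[name] = auc.mean()
    print(f"{name}: mean ROC AUC = {scores[name]:.3f}")
```

Logistic regression has the advantage of giving interpretable coefficients and p-values, so a more complex model is usually worth adopting only if its score is clearly better.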
04:06
Then the third part is model training.
04:13
We split the dataset into training and testing sets to evaluate the model's performance.
04:22
We need to train the logistic regression model using the training data.
04:29
So with fraud as the dependent variable and the other variables as independent variables.
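The split-then-train step might look like this in Python with scikit-learn. The data is simulated, the predictor names (`value_std`, `international_yes`) are hypothetical, and the 75/25 split is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Simulated data; column names are assumptions, not the real schema.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "value_std": rng.normal(size=n),
    "international_yes": rng.integers(0, 2, size=n),
})
log_odds = -2.0 + 1.0 * df["value_std"] + 1.0 * df["international_yes"]
df["fraud"] = (rng.random(n) < 1 / (1 + np.exp(-log_odds))).astype(int)

# fraud is the dependent variable; every other column is a predictor.
X = df.drop(columns="fraud")
y = df["fraud"]

# Hold out 25% for testing; stratify keeps the fraud rate similar in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Fit on the training data only.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Predicted fraud probabilities for the held-out rows.
test_proba = model.predict_proba(X_test)[:, 1]
print(test_proba[:5])
```

The test set is not touched during fitting, so it can be used later to evaluate the model.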
04:39
Right.
04:39
And in the next step, we need to do statistical significance testing.
04:45
After fitting the model, examine the coefficients and the p-values to determine which variables are statistically significant.
04:57
So variables with p-values less than a chosen significance level, like 0.05,
05:20
are considered statistically significant.
05:23
So consider transformations of variables if they improve model fit or significance.
05:30
Like a log transformation of value.
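A log transformation is typically tried when a variable such as a transaction amount is strongly right-skewed. The sketch below, assuming Python with NumPy and pandas and a simulated `value` column, shows the transformation and its effect on skewness. Whether it improves model fit or significance would still need to be checked by refitting the model with the transformed column.

```python
import numpy as np
import pandas as pd

# Simulated right-skewed amounts standing in for the real "value" column.
rng = np.random.default_rng(2)
value = pd.Series(rng.lognormal(mean=3.0, sigma=1.0, size=1000), name="value")

# log1p computes log(1 + x), which stays defined when a value is exactly 0.
log_value = np.log1p(value)

# Skewness near 0 indicates a roughly symmetric distribution.
skew_raw = value.skew()
skew_log = log_value.skew()
print(f"skewness before: {skew_raw:.2f}, after log1p: {skew_log:.2f}")
```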
05:37
Then the next step is model evaluation...