This assignment is based on the case study "Breaking Barriers: Micro-Mortgage Analytics", which can be purchased from Harvard Business Publishing. Please refer to the syllabus for the link to access this case study. Please answer the following questions in as much detail as possible. If you want, you can use Excel to answer these questions.

(1) Examine the decision tree in Exhibit 8 and come up with a strategy that Ajay and his team should adopt to identify creditworthy customers early in the assessment phase.

(2) Using the classification table in Exhibit 10, answer the questions below:
(a) Given 100 loan applicants who have been approved for a loan, how many of them would the model correctly predict as sanctions?
(b) Calculate the proportion of actual rejects that have been incorrectly classified as sanctions.

(3) Apply the CHAID decision tree to the 100 data points provided in Exhibit 11 and construct the confusion matrix (assume the cut-off is 0.8). Is it similar to the one provided in Exhibit 10? What are your conclusions regarding the CHAID decision tree results?

(4) Using Keerthana's model in Exhibit 12, assess the statistical significance of a candidate's ability to repay the loan. Hint: Assume IAR to be a good proxy for a candidate's ability to repay the loan.

(5) Siddharth had a look at the regression model built by Keerthana and believes that there could be a better model. Is Siddharth correct? Justify your answer.

(6) Using Siddharth's model in Exhibit 12:
(a) Compute the odds ratio for a sanction associated with a 5% increase in LTV, after adjusting for the other factors.
(b) Jamuna currently lives in rented accommodation in the city of Rewari and has applied to Shubham for a home loan. What would her chances of getting a loan sanctioned be if she were not living in rented accommodation, keeping all other factors constant?

(7) Use the data for the two applicants in Exhibit 14 and derive the probability of sanction for each of them by applying Siddharth's model.

(8) Exhibit 15 contains the confusion matrix for the training dataset at different cut-off values.
(a) Examine the table and suggest a suitable cut-off point for Siddharth's model.
(b) Using the optimal cut-off point obtained above, predict the decisions that Shubham would take on the applications in question (7).

(9) Using Siddharth's model, calculate the profit/cost for correct and incorrect decisions during the application processing stage.
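Several of these questions come down to a handful of standard calculations: rates read off a confusion matrix, an odds ratio from a logistic regression coefficient, and a predicted probability compared with a cut-off. The following is a minimal Python sketch of those calculations as an alternative to Excel. The exhibits are not reproduced here, so the function names are illustrative and every number in the usage lines is a hypothetical placeholder, not a value from the case.

```python
import math

def sensitivity(true_pos, false_neg):
    """Share of actual sanctions that the model predicts as sanctions."""
    return true_pos / (true_pos + false_neg)

def false_positive_rate(false_pos, true_neg):
    """Share of actual rejects that the model wrongly predicts as sanctions."""
    return false_pos / (false_pos + true_neg)

def odds_ratio(coef, delta=1.0):
    """Multiplicative change in the odds for a delta-unit rise in one predictor,
    holding the other predictors fixed."""
    return math.exp(coef * delta)

def sanction_probability(intercept, coefs, values):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum of b_i * x_i)))."""
    z = intercept + sum(coefs[name] * values[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-z))

def classify(probability, cutoff):
    """Predict 'sanction' when the probability reaches the cut-off."""
    return "sanction" if probability >= cutoff else "reject"

# Hypothetical placeholders only; replace with the exhibit values.
example_coefs = {"LTV": -0.02, "RENTED": -0.5}
example_applicant = {"LTV": 60.0, "RENTED": 1.0}
p = sanction_probability(2.0, example_coefs, example_applicant)
print(classify(p, cutoff=0.8))
print(odds_ratio(example_coefs["LTV"], delta=5))
```

For part (6a), the right `delta` depends on how LTV is coded in the exhibit: 5 if it is recorded in percentage points, 0.05 if it is recorded as a fraction.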
Added by David R.
Recommended Videos
A copy company uses ten photocopy machines: machines 1 to 5 are model A and machines 6 to 10 are model B. To determine which model to purchase in the future, office managers tabulated the average number of errors per day that each machine made last month, resulting in the following data. You are hired as a statistical consultant on the decision. What would your advice be on the future purchase of a new copy machine? Conduct a formal significance test and specify the four steps of hypothesis testing. Hint: you can use Excel to obtain the pertinent statistics, but you will also need to lay out the formulas.

12. Seven women were asked their opinions about the ideal number of children in a family. The ideal numbers of children before and after they got married are shown in the following data table. Conduct a formal significance test and specify the four steps of hypothesis testing. Hint: you can use Excel to obtain the pertinent statistics, but you will also need to lay out the formulas. Extra credit: calculate the observed t-value and 95% confidence interval using the data, and confirm your results with the Excel output you chose.
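The before/after design in the second question is a paired comparison, so the observed t-value is based on the per-person differences. The sketch below lays out that formula in Python. The data table is not reproduced here, so the seven values in each list are hypothetical placeholders, and the critical value 2.447 (two-sided 95%, 6 degrees of freedom) is taken from a standard t table.

```python
import math
import statistics

def paired_t(before, after):
    """Return (t, df, mean difference, standard error) for paired samples."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # statistics.stdev uses the sample (n - 1) denominator
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff / std_err, n - 1, mean_diff, std_err

def confidence_interval(mean_diff, std_err, t_crit):
    """Mean difference plus or minus t_crit standard errors."""
    margin = t_crit * std_err
    return mean_diff - margin, mean_diff + margin

# Hypothetical placeholder data for seven respondents.
before = [3, 2, 4, 3, 2, 3, 4]
after = [2, 2, 3, 2, 2, 2, 3]

t_obs, df, mean_diff, std_err = paired_t(before, after)
low, high = confidence_interval(mean_diff, std_err, t_crit=2.447)
print(f"t = {t_obs:.3f} on {df} df, 95% CI = ({low:.3f}, {high:.3f})")
```

The copy-machine question compares two separate groups of machines rather than the same units measured twice, so it would call for a two-sample test instead of this paired formula.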
Dominador T.
Senior management at Humber bakery requested a new analysis based on adjusting the selling price and the number of units produced under each production plan. Initial probability estimates are also updated. Resulting gross profits ($) and state of nature probabilities are given in the following payoff table.

| | Low Demand | Medium Demand | High Demand |
| --- | --- | --- | --- |
| Light Production | 55,550 | 85,000 | 85,000 |
| Moderate Production | 43,100 | 102,000 | 102,000 |
| Heavy Production | 5,750 | 64,650 | 123,550 |
| Probability | 0.2 | 0.5 | 0.3 |

a) [8 marks] What is the optimal decision using the minimax regret approach? Show your work.

The new analysis also necessitated updating the offer made to Bramptinos under the heavy production plan. The probability that Bramptinos will accept the new offer is 26% and the associated gross profit is determined to be $112,500. Again, if Bramptinos declines the offer, the loaves will still sell based on current demand conditions (low, medium, or high).

b) [8 marks] Using the decision tree you selected from Part B along with the payoffs and probabilities provided in this section, construct a decision tree for the problem. (You can draw manually or use software. Marks will be given for presentation.) What is the optimal decision in this case? Why?

Before making a final decision on the production plan to adopt, the bakery's manager decides to contact Professor Leung in the Math Department to conduct a market research survey. The results of the survey will indicate either favourable or unfavourable market conditions for premium breads. In the past, when there was Low Demand, Professor Leung's predictions were unfavourable 80% of the time. The professor's predictions have also been favourable given Medium Demand 88% of the time, and unfavourable given High Demand 10% of the time.

c) [12 marks] Calculate posterior (revised) probabilities (round to 3 decimal places; do not round intermediate results). Show calculations or tables.

d) [25 marks] Construct a multistage decision tree (based on part b) with the additional information from Professor Leung.

e) [2 marks] What is the value of the sample information (EVSI) provided by Professor Leung?

f) [3 marks] State the optimal decision strategy if Professor Leung's consulting fees were $500.

g) [2 marks] Does the strategy change if Professor Leung's consulting fees were $1500? If yes, state the new optimal strategy. If no, explain.
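Parts a) and c) are mechanical once the payoff table and the survey reliabilities are written down, so a short Python sketch may help to check hand calculations. It uses the payoffs and probabilities stated in the problem; the function and variable names are illustrative, and the favourable-survey likelihoods for Low and High Demand are obtained as complements of the stated unfavourable rates.

```python
# Gross profits by production plan, ordered as (Low, Medium, High) demand.
payoffs = {
    "Light": [55550, 85000, 85000],
    "Moderate": [43100, 102000, 102000],
    "Heavy": [5750, 64650, 123550],
}
priors = [0.2, 0.5, 0.3]

def minimax_regret(table):
    """Return the plan with the smallest worst-case regret, and all worst cases."""
    n_states = len(next(iter(table.values())))
    best = [max(row[j] for row in table.values()) for j in range(n_states)]
    worst = {
        plan: max(best[j] - row[j] for j in range(n_states))
        for plan, row in table.items()
    }
    return min(worst, key=worst.get), worst

def posterior(prior, likelihood):
    """Bayes' rule: return P(signal) and P(state | signal) for each state."""
    joint = [p * l for p, l in zip(prior, likelihood)]
    total = sum(joint)
    return total, [j / total for j in joint]

# P(favourable | Low, Medium, High); 0.20 and 0.90 are complements of the
# stated unfavourable rates (80% and 10%).
p_fav = [0.20, 0.88, 0.90]
p_unfav = [1 - p for p in p_fav]

choice, worst_regret = minimax_regret(payoffs)
p_f, post_f = posterior(priors, p_fav)
p_u, post_u = posterior(priors, p_unfav)

print(choice, worst_regret)
print(round(p_f, 3), [round(x, 3) for x in post_f])
print(round(p_u, 3), [round(x, 3) for x in post_u])
```

Rounding happens only at print time, which matches the instruction not to round intermediate results.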
The director of the MBA program at Salterdine University wants to develop a procedure to determine which applicants to admit to the MBA program. The director believes that an applicant's undergraduate grade point average (GPA) and score on the GMAT exam are helpful in predicting which applicants will be good students. To assist in this endeavor, the director asked a committee of faculty members to classify 70 of the recent students in the MBA program into two groups: (1) good students and (2) weak students. The file MBAStudents.xlsm summarizes these ratings, along with the GPA and GMAT scores for 70 students. Use Minitab or any other statistics software that you have available to answer the following questions.

1. Use discriminant analysis to create a classifier for this data. How accurate (percentage) is this procedure?

2. Use logistic regression to create a classifier for this data. How accurate is this procedure? Minitab does not automatically provide the accuracy percentage, so you have to calculate it. First, you need to obtain the predicted "event probabilities". This is done by selecting "Storage" on the binary logistic regression box and then checking "Event Probability". To determine if an observation was classified in the correct group, you need to compare the probability (column EPRO1) and the Rating columns. If the probability is less than 0.5, it means that our model predicts that the observation belongs to group 1. If the probability is greater than 0.5, it means that our model predicts that the observation belongs to group 2. This way you will determine the number and percentage of misclassified observations in each group. You need to complete the following table:

| True Group | Put into Group | 1 | 2 | Total N |
| --- | --- | --- | --- | --- |
| 35 | N correct | Proportion | | |

Note that this is the same table that you automatically obtain in the discriminant analysis report, so by having the table of each procedure, we can determine which one is more accurate, which is required in part d.

3. Interpret the coefficients of the logistic regression. Here is part of Minitab's logistic regression report:

Logistic Regression Table

| Predictor | Coef | SE Coef | Z | P | Odds Ratio | Lower 95% CI | Upper 95% CI |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Constant | 19.5430 | 4.73543 | 4.13 | 0.000 | | | |
| GPA | -3.03903 | 0.880393 | -3.45 | 0.001 | 0.05 | 0.01 | 0.27 |
| GMAT | -0.0189050 | 0.0062161 | -3.04 | 0.002 | 0.98 | 0.97 | 0.99 |

The coefficients that you are required to interpret are those contained in the "Coef" column. Basically, you need to determine how the variables GPA and GMAT affect the odds of being in the groups. Week 8 folder contains "logistic regression slides" which provide a good explanation on how to interpret the coefficients. In particular, look at slides 22 and 23.

4. Compare the discriminant analysis and the logistic regression. Which one is more accurate?
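The "Odds Ratio" and confidence-interval columns in the Minitab report follow directly from the "Coef" and "SE Coef" columns, and the accuracy calculation in question 2 is a simple threshold rule. A minimal Python sketch of both is given below. The coefficients are copied from the report above; the function names are illustrative, the 1.96 multiplier assumes a normal-approximation 95% interval, and sending a probability of exactly 0.5 to group 2 is an assumption because the problem does not say how ties are handled.

```python
import math

# (Coef, SE Coef) pairs copied from the Minitab report.
report = {
    "GPA": (-3.03903, 0.880393),
    "GMAT": (-0.0189050, 0.0062161),
}

def odds_ratio_with_ci(coef, se, z=1.96):
    """Odds ratio exp(coef) with an approximate 95% interval exp(coef -/+ z*se)."""
    return math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se)

def accuracy(event_probs, ratings, threshold=0.5):
    """Share of observations whose predicted group matches the rating.

    Below the threshold predicts group 1, otherwise group 2
    (ties go to group 2 by assumption).
    """
    predicted = [1 if p < threshold else 2 for p in event_probs]
    correct = sum(1 for pred, actual in zip(predicted, ratings) if pred == actual)
    return correct / len(ratings)

for name, (coef, se) in report.items():
    ratio, low, high = odds_ratio_with_ci(coef, se)
    print(f"{name}: odds ratio {ratio:.2f}, 95% CI ({low:.2f}, {high:.2f})")
```

An odds ratio below 1 means that a one-unit increase in the predictor multiplies the odds of the modelled event by that factor, so the odds fall as GPA or GMAT rises. Which group counts as the event depends on how Rating is coded in Minitab.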
Sri K.
Recommended Textbooks
Elementary Statistics a Step by Step Approach
The Practice of Statistics for AP
Introductory Statistics