2. (50 points) In addition to the Gini index in Lecture 11-1, entropy is also commonly used as an impurity metric for learning decision trees with the greedy search algorithm. Specifically, the smaller the entropy of the partition induced by a feature, the less impure that partition is, and so the better the feature is for splitting a node of the decision tree.
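For reference, the entropy criterion described above is usually written as follows. This is a sketch under two assumptions not stated in the problem: the logarithm is base 2, and the entropy of a partition is the size-weighted average of the entropies of its subsets. The symbols $S$, $S_v$, $p_c$, and $A$ are introduced here for illustration; if Lecture 11-1 uses a different convention, follow the lecture.

```latex
% Entropy of a sample set S with binary labels, where p_c is the
% fraction of samples in S with label c, and 0 \log_2 0 is taken as 0.
H(S) = -\sum_{c \in \{\text{yes},\,\text{no}\}} p_c \log_2 p_c

% Entropy of the partition induced by a binary feature A, where S_v is
% the subset of S in which A takes the value v.
H(S \mid A) = \sum_{v \in \{\text{yes},\,\text{no}\}} \frac{|S_v|}{|S|}\, H(S_v)
```

Under this convention, a subset whose samples all share one label has entropy 0, and a subset split evenly between the two labels has entropy 1.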
Table 1 contains six training samples with three binary features: chest pain, gender, and smoking status. You are required to apply the greedy search algorithm, with entropy as the metric, to learn the optimal binary decision tree for predicting whether a patient has a heart attack. The label is given in the rightmost column.
Table 1: Patient information for heart attack prediction
| Patient ID | Chest Pain | Male | Smokes | Heart Attack |
|---|---|---|---|---|
| 1 | yes | yes | no | yes |
| 2 | yes | yes | yes | yes |
| 3 | no | no | yes | yes |
| 4 | no | yes | no | no |
| 5 | yes | no | yes | yes |
| 6 | no | yes | yes | no |
Hints and requirements:
(a) Please follow the greedy search algorithm in Lecture 11-1. In this problem, you need to replace the Gini index with entropy and derive the optimal decision tree for predicting heart attack.
(b) For each candidate feature, please calculate the entropy of the partition it induces, and choose the feature with the lowest entropy as the optimal feature.
(c) Please draw your learned decision tree.
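The per-feature calculation in (b) can be checked against a short script. The following is a minimal sketch, not the lecture's reference implementation: it assumes base-2 logarithms and a size-weighted average of subset entropies, and the names `ROWS`, `entropy`, `split_entropy`, and `best_feature` are hypothetical helpers introduced here. The data is transcribed from Table 1.

```python
from collections import Counter
from math import log2

# Table 1, transcribed row by row (True = "yes", False = "no").
ROWS = [
    {"chest_pain": True,  "male": True,  "smokes": False, "heart_attack": True},
    {"chest_pain": True,  "male": True,  "smokes": True,  "heart_attack": True},
    {"chest_pain": False, "male": False, "smokes": True,  "heart_attack": True},
    {"chest_pain": False, "male": True,  "smokes": False, "heart_attack": False},
    {"chest_pain": True,  "male": False, "smokes": True,  "heart_attack": True},
    {"chest_pain": False, "male": True,  "smokes": True,  "heart_attack": False},
]
FEATURES = ["chest_pain", "male", "smokes"]


def entropy(labels):
    """Base-2 entropy of a list of labels (assumed convention)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    # Written as p * log2(1/p) so that a pure subset yields exactly 0.0.
    return sum((c / n) * log2(n / c) for c in counts.values())


def split_entropy(rows, feature, label="heart_attack"):
    """Size-weighted average entropy of the two subsets induced by a binary feature."""
    n = len(rows)
    total = 0.0
    for value in (True, False):
        subset = [r[label] for r in rows if r[feature] == value]
        total += len(subset) / n * entropy(subset)
    return total


def best_feature(rows, features, label="heart_attack"):
    """Greedy choice: the feature whose partition has the lowest entropy."""
    return min(features, key=lambda f: split_entropy(rows, f, label))


if __name__ == "__main__":
    for f in FEATURES:
        print(f, round(split_entropy(ROWS, f), 4))
    print("root split:", best_feature(ROWS, FEATURES))
```

To grow the full tree, the same selection would be repeated on each impure child node, using only the samples that reach that node and the features not yet used on the path to it.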