Your task for this short project is to compare the efficiency of the ID3 and logistic regression classifiers. You may use the sample code available in the class code section, but it is preferable to implement the classifiers yourself rather than calling a Python library API (that is, without sklearn). Use the Heart Disease UCI and Forest Cover Type datasets, one for binary classification and the other for multiclass classification. Provide code for both datasets.
Added by Austin B.
Step 1: First, import the libraries needed for data manipulation and visualization, such as pandas, NumPy, and matplotlib.
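Since the question asks for classifiers written without sklearn, here is a minimal sketch of binary logistic regression trained by batch gradient descent, using only the Python standard library. The function names (`train_logistic`, `predict`), the learning rate, the epoch count, and the toy data are all assumptions made for illustration; the toy data is synthetic and is not the Heart Disease UCI dataset, which would need to be loaded and scaled separately.

```python
import math
import random

def sigmoid(z):
    # Numerically safe logistic function (avoids overflow for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights and a bias by batch gradient descent on the log loss.

    lr and epochs are illustrative defaults, not tuned values.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one sample: sigmoid(w.x + b) - y
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Step against the average gradient.
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(X, w, b):
    # Label 1 when the estimated probability is at least 0.5.
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
            for xi in X]

# Synthetic two-cluster data, made up for this sketch only.
random.seed(0)
X = [[random.gauss(-2, 1), random.gauss(-2, 1)] for _ in range(50)] + \
    [[random.gauss(2, 1), random.gauss(2, 1)] for _ in range(50)]
y = [0] * 50 + [1] * 50

w, b = train_logistic(X, y)
preds = predict(X, w, b)
accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
print(f"training accuracy: {accuracy:.2f}")
```

For the multiclass Forest Cover Type data, one common extension is to train one such binary model per class (one-vs-rest) and pick the class with the highest probability; that is a design choice, not something the question prescribes.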
Recommended Videos
Using scikit-learn, use any of the datasets that come with the package (excluding the IRIS data) and apply four different classifiers. Use 10-fold cross-validation. Try to improve on the classification accuracy using sampling and any other approaches you think might work. Submit the following: (1) the documented source code (with comments); (2) a 300-word write-up of your results (explain your results, and discuss what worked, what did not, and why); (3) a screenshot of the confusion matrix, showing the best results you can achieve.
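To make the 10-fold cross-validation step concrete, the sketch below hand-rolls the fold splitting and scoring loop in plain Python. It is only an illustration of what the procedure does and is not scikit-learn's API; the helper names (`k_fold_indices`, `cross_validate`), the majority-class baseline, and the toy labels are all hypothetical.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

def cross_validate(X, y, fit, predict, k=10):
    """Return the accuracy on each held-out fold."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = predict(model, [X[i] for i in test_idx])
        correct = sum(p == y[i] for p, i in zip(preds, test_idx))
        scores.append(correct / len(test_idx))
    return scores

# Hypothetical baseline "classifier": always predict the majority training label.
def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)

def predict_majority(model, X_test):
    return [model] * len(X_test)

# Made-up data: 30 samples of class 0 and 20 of class 1.
X = [[i] for i in range(50)]
y = [0] * 30 + [1] * 20

scores = cross_validate(X, y, fit_majority, predict_majority)
print(sum(scores) / len(scores))  # mean accuracy across the 10 folds
```

Any classifier that can be wrapped in a `fit`/`predict` pair of this shape can be passed in place of the baseline, which is one way to compare four classifiers on identical folds.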
Akash M.
For the following data set, apply ID3 separately and show all steps of the derivation (computation, reasoning, intermediate and final decision trees, and rules).

#  color  shape   size   class
1  red    square  big    +
2  blue   square  big    +
3  red    round   small  -
4  green  square  small  -
5  red    round   big    +
6  green  round   big    -

Entropy(t) = - Σ_j p(j|t) log2 p(j|t)

Here, class is the target attribute and has two values (+ and -), so this is a binary classification problem. For a binary classification problem:
- If all examples are positive or all are negative, the entropy is zero, i.e., low.
- If half of the examples are positive and half are negative, the entropy is one, i.e., high.

1. Calculating the initial entropy: Out of 6 instances, 3 are + and 3 are -, so p(+) = p(-) = 3/6.
Term for +: - (3/6) * log2 (3/6) = 0.5
Term for -: - (3/6) * log2 (3/6) = 0.5
Entropy(t) = E(t) = 0.5 + 0.5 = 1

Note: An entropy of 1 indicates that the set is maximally impure. That is the case here, as there are equal numbers of observations with target class + and -.
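The entropy calculation above, and the information-gain step that ID3 performs next to choose the root attribute, can be checked with a short standard-library sketch. The function names (`entropy`, `information_gain`) and the dictionary layout of the rows are assumptions chosen for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(t) = - sum_j p(j|t) * log2 p(j|t) over the class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr, target="class"):
    """Parent entropy minus the size-weighted entropy of each split on attr."""
    n = len(rows)
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy([r[target] for r in rows]) - remainder

# The six training examples from the table above.
rows = [
    {"color": "red",   "shape": "square", "size": "big",   "class": "+"},
    {"color": "blue",  "shape": "square", "size": "big",   "class": "+"},
    {"color": "red",   "shape": "round",  "size": "small", "class": "-"},
    {"color": "green", "shape": "square", "size": "small", "class": "-"},
    {"color": "red",   "shape": "round",  "size": "big",   "class": "+"},
    {"color": "green", "shape": "round",  "size": "big",   "class": "-"},
]

root_entropy = entropy([r["class"] for r in rows])
gains = {a: information_gain(rows, a) for a in ("color", "shape", "size")}
best = max(gains, key=gains.get)  # attribute ID3 would split on first

print(root_entropy)
for attr, g in gains.items():
    print(attr, round(g, 4))
print("root split:", best)
```

On this table the sketch gives a root entropy of 1 and gains of roughly 0.54 for color, 0.08 for shape, and 0.46 for size, so ID3 would split on color first and then recurse on the mixed red branch.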
Consider the following table showing the results of a binary classification problem on validation data:

Actual Class:     0 1 0 1 1 0 1 1
Predicted Class:  0 1 1 1 0 0 1 0

Build the confusion matrix. Compute the classifier accuracy, precision, recall, and F-score for Class 1 based on the above data. [2+0.5+0.5+0.5+0.5 = 4 marks]
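As a way to check a hand-computed answer, the sketch below counts the four confusion-matrix cells for Class 1 and derives the metrics from them. The helper name `confusion_counts` is hypothetical, and the F-score is assumed to mean F1 (the harmonic mean of precision and recall).

```python
actual    = [0, 1, 0, 1, 1, 0, 1, 1]
predicted = [0, 1, 1, 1, 0, 0, 1, 0]

def confusion_counts(actual, predicted, positive=1):
    """Return (TP, FP, FN, TN), treating `positive` as the class of interest."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    return tp, fp, fn, tn

tp, fp, fn, tn = confusion_counts(actual, predicted)

accuracy = (tp + tn) / len(actual)
precision = tp / (tp + fp)   # of the predicted 1s, how many are truly 1
recall = tp / (tp + fn)      # of the true 1s, how many were found
f1 = 2 * precision * recall / (precision + recall)

print("              pred 0  pred 1")
print(f"actual 0      {tn:6d}  {fp:6d}")
print(f"actual 1      {fn:6d}  {tp:6d}")
print(accuracy, precision, recall, round(f1, 4))
```

For this data the counts are TP = 3, FP = 1, FN = 2, TN = 2, which gives accuracy 0.625, precision 0.75, recall 0.6, and an F1 of about 0.667.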
Sri K.