The following data shows a sample of tweets about COVID-19. The data shows the count of different words in a tweet and whether the tweet is positive or negative. Suppose you use the first 5 tweets for training and the last 2 tweets for testing. Compute both the training error and the testing error of the data when using Naïve Bayes to classify positive vs. negative tweets. Show the details of your solution.

Document#  Sick  Depressed  Recovered  Beach  Polarity
1          1     0          1          1      +
2          0     1          0          0      -
3          0     0          0          1      +
4          1     1          0          1      -
5          1     0          1          0      +
6          1     1          0          0      -
7          1     1          1          1      +
Added by Ricardo B.
Step 1
First, we need to calculate the probability of each word given the polarity (positive or negative). We can do this by counting the number of times each word appears in positive and negative tweets and dividing by the total number of positive and negative tweets, respectively.
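The counting step described above can be sketched in Python. This is a minimal illustration rather than the posted solution: it assumes each word is treated as a present/absent (Bernoulli) feature, and it assumes add-one (Laplace) smoothing through a hypothetical `alpha` parameter, which the question does not specify. The names `fit`, `scores`, and `error_rate` are made up for this example.

```python
from fractions import Fraction

# Rows from the question's table: (Sick, Depressed, Recovered, Beach), polarity
TWEETS = [
    ((1, 0, 1, 1), "+"),
    ((0, 1, 0, 0), "-"),
    ((0, 0, 0, 1), "+"),
    ((1, 1, 0, 1), "-"),
    ((1, 0, 1, 0), "+"),
    ((1, 1, 0, 0), "-"),
    ((1, 1, 1, 1), "+"),
]
TRAIN, TEST = TWEETS[:5], TWEETS[5:]


def fit(rows, alpha=1):
    """Estimate class priors and P(word present | class).

    alpha is an assumed smoothing count; alpha=0 gives the raw ratios.
    """
    model = {}
    for label in ("+", "-"):
        docs = [x for x, y in rows if y == label]
        prior = Fraction(len(docs), len(rows))
        present = [
            Fraction(sum(d[j] for d in docs) + alpha, len(docs) + 2 * alpha)
            for j in range(4)
        ]
        model[label] = (prior, present)
    return model


def scores(model, x):
    """Unnormalised posterior: prior times the product of per-word likelihoods."""
    out = {}
    for label, (prior, present) in model.items():
        s = prior
        for xj, p in zip(x, present):
            s *= p if xj else 1 - p
        out[label] = s
    return out


def error_rate(model, rows):
    """Fraction of rows whose highest-scoring class differs from the true label."""
    wrong = 0
    for x, y in rows:
        s = scores(model, x)
        wrong += max(s, key=s.get) != y
    return Fraction(wrong, len(rows))


model = fit(TRAIN)
print("training error:", error_rate(model, TRAIN))
print("testing error:", error_rate(model, TEST))
```

Under these assumptions every training and test tweet is classified correctly, so both errors come out to 0. Without smoothing (`alpha=0`), tweet 7 scores exactly zero for both classes, because "Depressed" never appears in a positive training tweet and "Recovered" never appears in a negative one. Its prediction then depends only on how the tie is broken.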
Recommended Videos
Problem 1. Naïve Bayes Classifier (25 points) From the training data set shown in Table 1, we train a Naïve Bayes classifier. Each row refers to an apple instance with three categorical features (size, color and shape) and one class label (whether the apple is good or not).

Table 1: Training Data for Naïve Bayes Classifier

RID  Size   Color  Shape      Class: good_apple
1    Small  Green  Irregular  No
2    Large  Red    Irregular  Yes
3    Large  Red    Circle     Yes
4    Large  Green  Circle     No
5    Large  Green  Irregular  No
6    Small  Red    Circle     Yes
7    Large  Green  Irregular  No
8    Small  Red    Irregular  No
9    Small  Green  Circle     No
10   Large  Red    Circle     Yes
Adi S.
Use the concepts discussed in the slides for handwritten dataset. Divide the dataset into train and test set (try 70-30, 80-20 train-test split & see if accuracy varies). Train the following models using the following import statements: (10 points)
a. from sklearn.linear_model import LogisticRegression
b. from sklearn.tree import DecisionTreeClassifier
c. from sklearn.neighbors import KNeighborsClassifier
d. from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
e. from sklearn.naive_bayes import GaussianNB
f. from sklearn.svm import SVC
Test the accuracy of each model for the same test data. Generate the heatmap (see below) for the confusion matrix of predictions for each model. (15 points)
Find which model has the highest accuracy and what is model accuracy achieved. (5)
Try 10 fold cross validation for the models & create a boxplot that shows the accuracy for each model. (20)
Dominador T.
Bayes' rule can be used to identify and filter spam emails and text messages. This question refers to a large collection of real SMS text messages from participating cellphone users. In this collection, 747 of the 5574 total messages (13.40%) are identified as spam. The word "free" is contained in 4.75% of all messages, and 3.57% of all messages both contain the word "free" and are marked as spam. Use this information to answer the following questions. Part 1 (a) What is the probability that a message contains the word "free", given that it is spam? Round your answer to three decimal places. P(Free if Spam) =
Audrey F.
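For Part 1 (a) of the spam question above, the requested value follows from the definition of conditional probability, P(Free | Spam) = P(Free and Spam) / P(Spam). A short Python sketch of that arithmetic, using only the percentages quoted in the question (the variable names are chosen for this example):

```python
# Figures quoted in the question, as fractions of all messages
p_spam = 0.1340           # P(Spam): 13.40% of messages are spam
p_free_and_spam = 0.0357  # P(Free and Spam): 3.57% contain "free" and are spam

# Conditional probability: P(Free | Spam) = P(Free and Spam) / P(Spam)
p_free_given_spam = p_free_and_spam / p_spam

print(round(p_free_given_spam, 3))  # 0.266
```

The 4.75% figure for messages containing "free" is not needed for this part; it would only enter when conditioning the other way, on P(Spam | Free).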