What is the likelihood in Naïve Bayes classification?
1) \(\mathcal{L}(\theta) = \prod_{i=1}^{n} p(y_i|x_i;\theta)p(x_i)\)
2) \(\mathcal{L}(\theta) = \prod_{i=1}^{n} p(x_i|y_i;\theta)p(y_i)\)
3) \(\mathcal{L}(\theta) = \prod_{i=1}^{n} p(y_i|x_i;\theta)p(x_i)\)
4) \(\mathcal{L}(\theta) = \prod_{i=1}^{n} p(x_i|y_i;\theta)p(y_i)\)
Added by Barbara H.
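Step 1
The likelihood in Naïve Bayes classification is the probability of observing the training data given the parameters θ, viewed as a function of θ. Naïve Bayes is a generative model, so each labeled example \((x_i, y_i)\) contributes its joint probability \(p(x_i, y_i; \theta) = p(x_i|y_i;\theta)p(y_i)\), giving \(\mathcal{L}(\theta) = \prod_{i=1}^{n} p(x_i|y_i;\theta)p(y_i)\), the form listed in options 2 and 4. The "naïve" conditional-independence assumption further factorizes the class-conditional term over the features: \(p(x_i|y_i;\theta) = \prod_{j} p(x_{ij}|y_i;\theta)\).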
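As a minimal sketch, assuming binary (Bernoulli) features and a small hypothetical toy data set that is not part of the original question, the Python below estimates p(y) and p(x_j | y) by counting and then evaluates the logarithm of the likelihood in options 2 and 4, \(\log \mathcal{L}(\theta) = \sum_i [\log p(x_i|y_i;\theta) + \log p(y_i)]\), using the naïve per-feature factorization. The names `DATA`, `fit_bernoulli_nb`, and `log_likelihood` are illustrative only.

```python
import math
from collections import Counter

# Hypothetical toy data (not from the question): (binary feature vector x_i, label y_i).
DATA = [
    ((1, 0, 1), "+"),
    ((1, 1, 1), "+"),
    ((0, 1, 0), "-"),
    ((0, 0, 1), "-"),
    ((1, 0, 0), "-"),
]


def fit_bernoulli_nb(data):
    """Maximum-likelihood estimates of p(y) and p(x_j = 1 | y) by counting."""
    n = len(data)
    class_counts = Counter(y for _, y in data)
    prior = {c: k / n for c, k in class_counts.items()}
    n_features = len(data[0][0])
    cond = {}
    for c, k in class_counts.items():
        # Fraction of class-c examples whose j-th feature equals 1.
        cond[c] = [sum(x[j] for x, y in data if y == c) / k for j in range(n_features)]
    return prior, cond


def log_likelihood(data, prior, cond):
    """log L(theta) = sum_i [log p(x_i | y_i) + log p(y_i)],
    with the naive factorization p(x_i | y_i) = prod_j p(x_ij | y_i)."""
    total = 0.0
    for x, y in data:
        total += math.log(prior[y])
        for j, xj in enumerate(x):
            p1 = cond[y][j]  # p(x_j = 1 | y)
            total += math.log(p1 if xj == 1 else 1.0 - p1)
    return total


prior, cond = fit_bernoulli_nb(DATA)
print(log_likelihood(DATA, prior, cond))
```

Working in log space avoids underflow when many small probabilities are multiplied. With counting (maximum-likelihood) estimates, every training example has nonzero probability under the fitted model, so no logarithm of zero occurs here; scoring unseen data would usually need smoothing.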
Recommended Videos
Problem 2. Bayes' Theorem & Naïve Bayes Classifier
1) Consider a study to determine the effectiveness of a new drug against an infectious disease. There were 10,000 test subjects, some of whom were given the real drug while the rest were given a placebo. At the end of the study, 65% of the test subjects had recovered from the disease, and half of them had taken the real drug. Among the test subjects who did not recover, more than half (55%) had taken the real drug. Based on this information, will taking the drug help a patient recover from the disease? Also, find the proportion of test subjects who were given the real drug. Show your steps clearly.
2) Consider a training set with 3 features, X1, X2 and X3, for a binary classification problem. The distribution of the data set is shown in the table below.
a) Based on the information above, determine whether X1 and X2 are independent of each other.
b) Determine whether X1 and X2 are conditionally independent of each other given the class.
c) Compute the class-conditional probabilities P(X1 = 1 | +), P(X1 = 1 | -), P(X2 = 1 | +), P(X2 = 1 | -), P(X3 = 1 | +), and P(X3 = 1 | -).
d) Use the class-conditional probabilities computed in the previous part to predict the class label of each example in the training set above. Use your results to compute the training error of the naïve Bayes classifier.
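The arithmetic for part 1 can be checked with a short Python sketch; the variable names are illustrative. It reads "half of them" as P(drug | recovered) = 0.5, uses the law of total probability for P(drug), and applies Bayes' theorem to get the recovery rate in each group.

```python
# Part 1: Bayes' theorem on the drug study (percentages from the problem statement).
p_recover = 0.65                 # P(R): fraction of subjects who recovered
p_drug_given_recover = 0.50      # P(D | R)
p_drug_given_not_recover = 0.55  # P(D | not R)

# Law of total probability: proportion of subjects given the real drug.
p_drug = p_drug_given_recover * p_recover + p_drug_given_not_recover * (1 - p_recover)

# Bayes' theorem: recovery rate among drug takers and among placebo takers.
p_recover_given_drug = p_drug_given_recover * p_recover / p_drug
p_recover_given_placebo = (1 - p_drug_given_recover) * p_recover / (1 - p_drug)

print(f"P(drug)             = {p_drug:.4f}")
print(f"P(recover | drug)    = {p_recover_given_drug:.3f}")
print(f"P(recover | placebo) = {p_recover_given_placebo:.3f}")
```

Under these numbers about 51.75% of subjects received the real drug, and the recovery rate is lower with the drug (about 62.8%) than with the placebo (about 67.4%), so the data do not suggest that the drug helps.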
Supreeta N.
Problem 4. Classification. Recall that in classification we assume that each data point is an i.i.d. sample from a distribution P(X = x, Y = y). In this question, we consider a specific data distribution P and evaluate the performance of logistic regression and the Bayes optimal classifier on data generated from P. In the following, we assume \(x \in \mathbb{R}\) and \(y \in \{-1, 1\}\), i.e. the data is one-dimensional and the label is binary. Write \(P(X = x, Y = y) = P(Y = y)P(X = x \mid Y = y)\). We let
\(P(Y = +1) = P(Y = -1) = \tfrac{1}{2}\),
\(P(X = x \mid Y = +1) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(x-5)^2}{2}\right)\),
\(P(X = x \mid Y = -1) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(x+5)^2}{2}\right)\).
1. Start from \(P(X = x, Y = y) = P(Y = y)P(X = x \mid Y = y)\) and show that \(P(X = x, Y = y) = \frac{1}{2\sqrt{2\pi}} \exp\left(-\frac{(x-5y)^2}{2}\right)\). (This is a simple one-line derivation.)
2. Plot the conditional distributions \(P(X = x \mid Y = +1)\) and \(P(X = x \mid Y = -1)\) in one figure, i.e., plot two Gaussian PDFs in the same figure.
3. Write the Bayes optimal classification rule for the distribution P above and simplify it (hint: you should end up with a very simple rule that classifies an input x according to whether its value is greater than a threshold).
4. Compute the probability of classification error for the Bayes optimal classifier.
5. Now consider logistic regression (this part and the next can be answered independently of the previous parts). Given training data \((x_1, y_1), \ldots, (x_n, y_n)\), briefly explain the main steps of training a logistic regression model: What quantities/probabilities does logistic regression estimate? What parametric model does it use? How are the model's parameters optimized?
6. Returning to the data distribution P above, logistic regression needs to find the values of two parameters \(\beta_0\) and \(\beta_1\) using training data \(\{(x_i, y_i)\}_{i=1,\ldots,n}\) generated according to P. Assume the number of training points is very large (\(n \to \infty\)). What will the parameters \(\beta_0\) and \(\beta_1\) be in this case?
(Hint: Start by deriving the exact form of the conditional distribution \(P(Y = y \mid X = x)\).)
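A minimal numerical check of parts 1, 3, 4 and 6, written in Python as a sketch. It assumes the usual logistic model \(P(Y = +1 \mid X = x) = \sigma(\beta_0 + \beta_1 x)\), which the problem does not spell out. By Bayes' rule, \(P(Y = +1 \mid x) = 1/(1 + e^{((x-5)^2 - (x+5)^2)/2}) = 1/(1 + e^{-10x})\), so the Bayes rule predicts +1 exactly when x > 0, its error is \(\Phi(-5)\), and with n → ∞ logistic regression (whose model is well specified here) recovers \(\beta_0 = 0\), \(\beta_1 = 10\). The helper names are hypothetical.

```python
import math


def gaussian_pdf(x, mean):
    """Unit-variance Gaussian density N(mean, 1), as in the class conditionals."""
    return math.exp(-((x - mean) ** 2) / 2) / math.sqrt(2 * math.pi)


def joint(x, y):
    """Part 1: P(X = x, Y = y) = (1/2) * N(x; 5y, 1) for y in {-1, +1}."""
    return 0.5 * gaussian_pdf(x, 5 * y)


def posterior_plus(x):
    """P(Y = +1 | X = x) by Bayes' rule."""
    return joint(x, 1) / (joint(x, 1) + joint(x, -1))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)


def bayes_classifier(x):
    """Part 3: predict +1 when P(Y = +1 | x) > 1/2, which simplifies to x > 0."""
    return 1 if x > 0 else -1


# Part 4: (1/2) P(X < 0 | Y = +1) + (1/2) P(X > 0 | Y = -1) = Phi(-5).
bayes_error = 0.5 * math.erfc(5 / math.sqrt(2))

# Part 6 (assumed model sigma(beta0 + beta1 * x)): posterior equals sigma(10 x).
beta0, beta1 = 0.0, 10.0
for x in (-1.0, -0.1, 0.0, 0.2, 2.0):
    print(x, posterior_plus(x), sigmoid(beta0 + beta1 * x))
print("Bayes error:", bayes_error)
```

Because the two class means are 10 standard deviations apart, the Bayes error is tiny, about 2.9 × 10⁻⁷.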
Shyam P.
Kirsty G.