a. Which one is subjectively most interesting?
   i. An association rule that has reasonably high support but low confidence.
   ii. A rule that has low support and low confidence.
   iii. A rule that has low support and high confidence.
b. Suppose the confidence of the rules A → B and B → C is larger than some threshold, minconf. Is it possible that A → C has a confidence less than minconf?
   i. Yes
   ii. No
c. The training error on an unpruned decision tree is 80%, while the error rate on each bootstrap sample is close to 60%. What is the error estimate of the 0.632 bootstrap?
d. Choose the incorrect option. A continuous attribute could be discretized using:
   i. Binning
   ii. Clustering
   iii. Rule discovery
e. Which one has the highest entropy?
   i. A fair coin
   ii. An unfair coin
   iii. A fair die
f. Based on your answer to question (e), write down the entropy values to corroborate your answer.
Added by Nuria P.
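For parts (e) and (f), the entropy values can be checked numerically. The following is a minimal sketch using the Shannon entropy in bits; the helper name `entropy` and the 0.9/0.1 bias chosen for the unfair coin are illustrative assumptions, since the question does not specify a bias.

```python
import math


def entropy(probs):
    """Shannon entropy in bits of a discrete distribution (hypothetical helper)."""
    # Outcomes with zero probability contribute nothing to the sum.
    return -sum(p * math.log2(p) for p in probs if p > 0)


fair_coin = entropy([0.5, 0.5])
# Assumed bias for illustration only; any bias other than 0.5 gives less than 1 bit.
unfair_coin = entropy([0.9, 0.1])
fair_die = entropy([1 / 6] * 6)

print(f"fair coin:   {fair_coin:.3f} bits")
print(f"unfair coin: {unfair_coin:.3f} bits")
print(f"fair die:    {fair_die:.3f} bits")
```

A uniform distribution over more outcomes has higher entropy, so the fair die (log2 6, about 2.585 bits) exceeds the fair coin (1 bit), which in turn exceeds any unfair coin.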
Step 1
a. A rule that has low support and high confidence is subjectively most interesting: high confidence shows a strong relationship between the items, and low support means the pattern occurs infrequently, so it is less likely to be obvious already. So, the answer is: $$\textbf{iii. A rule that has low support and high confidence}$$
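To make the distinction in part (a) concrete, here is a small sketch that computes support and confidence on a made-up transaction list. The item names, the counts, and the helper functions `support` and `confidence` are all hypothetical and chosen only to produce one rule of each kind.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) over the transactions."""
    ante = support(transactions, antecedent)
    both = support(transactions, set(antecedent) | set(consequent))
    return both / ante if ante > 0 else 0.0


# Toy data (assumed): 20 transactions in total.
transactions = (
    [{"bread", "milk"}] * 4
    + [{"bread"}] * 10
    + [{"milk"}] * 5
    + [{"caviar", "champagne"}] * 1
)

# Higher support, low confidence: bread -> milk
print(support(transactions, {"bread", "milk"}), confidence(transactions, {"bread"}, {"milk"}))

# Low support, high confidence: caviar -> champagne
print(support(transactions, {"caviar", "champagne"}), confidence(transactions, {"caviar"}, {"champagne"}))
```

In this toy data the bread → milk rule covers more transactions but holds for fewer than a third of the baskets containing bread, while caviar → champagne is rare but holds every time caviar appears.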
Recommended Videos
Suppose we have a random dataset, i.e. the attribute values are generated independently of the class label, and it contains data points that belong to either the POSITIVE or the NEGATIVE class. We need to build a classifier for this dataset, using half of the dataset for training and the remaining half for testing. Please answer the following questions and provide a brief explanation for each answer:
1) Suppose there are an equal number of positive and negative records in the data and the decision tree classifier predicts every test record to be positive. What is the expected error rate of the classifier on the test data?
2) Repeat the previous analysis assuming that the classifier predicts each test record to be in the positive class with probability 0.8 and in the negative class with probability 0.2.
3) Suppose two-thirds of the data belong to the positive class and the remaining one-third belong to the negative class. What is the expected error of a classifier that predicts every test record to be positive?
4) Repeat the previous analysis assuming that the classifier predicts each test record to be in the positive class with probability 2/3 and in the negative class with probability 1/3.
Madhur L.
A poll of n voters is to be taken in an attempt to predict the outcome of a by-election in a certain riding. Specifically, you are interested in the proportion of voters who will vote for a certain candidate, Candidate A. n = 472 voters have been randomly chosen, and each has indicated which candidate they will vote for. You are to count the number, out of 472, who say they will vote for Candidate A. This count is measured by the random variable X. You find X = 436.
(a) Find a 98% confidence interval for p, the proportion of all voters who will vote for Candidate A. Use the Z distribution to create your confidence interval. Use at least four decimal places for your lower and upper bounds. To avoid rounding errors, you should use R-Studio and not tables. Lower Bound = 0.8905 Upper Bound = 0.9495
(b) Interpret the meaning of the interval you found in part (a). The proportion of all voters that will vote for Candidate A is somewhere between 89.05 % and 94.95 %. (Enter your answer to at least two decimals.)
(c) Find a 98% confidence interval for p, the proportion of all voters who will vote for Candidate A, by bootstrapping 1000 samples. Use the seed 5911 to ensure that R-Studio "randomly" samples the same "random" samples that this question expects. You can do this by copying the following code into R-Studio to bootstrap your samples: RNGkind(sample.kind = "Rejection"); set.seed(5911); B=do(1000) * mean(resample(c(rep(1,436),rep(0,472-436)), 472)); Find a 98% confidence interval. Use at least four decimal places for your lower and upper bounds. Lower Bound = 0.8954 Upper Bound = 0.9522
Jerelyn N.
Q1) Section 2.1: Specifying Random Experiments. Problem 2.5 A desk drawer contains six pens, four of which are dry. (a) The pens are selected at random one by one until a good pen is found. The sequence of test results is noted. What is the sample space? (b) Suppose that only the number, and not the sequence, of pens tested in part a is noted. Specify the sample space. (c) Suppose that the pens are selected one by one and tested until both good pens have been identified, and the sequence of test results is noted. What is the sample space? (d) Specify the sample space in part c if only the number of pens tested is noted. Q2) Section 2.2: The Axioms of Probability Problem 2.34 A number x is selected at random in the interval [-1, 2]. Let the events A = {x < 0}, B = {|x - 0.5| < 0.5}, and C = {x > 0.75}. (a) Find the probabilities of A, B, A ∩ B, and A ∩ C. (b) Find the probabilities of A ∪ B, A ∪ C, and A ∪ B ∪ C, first, by directly evaluating the sets and then their probabilities, and second, by using the appropriate axioms or corollaries.
Lucas F.