In this question we will do support counting using a hash tree. We have a set of 9 items: {1, 2, 3, 4, 5, 6, 7, 8, 9}, and 20 candidate itemsets of length 3: {124}, {134}, {137}, {156}, {159}, {178}, {237}, {245}, {246}, {346}, {357}, {368}, {389}, {457}, {467}, {469}, {568}, {569}, {578}, {679}. We assume that each itemset keeps its items in increasing order. The hash tree is defined as follows: the hash function is h(p) = p mod 2, and the maximum leaf size is 4. With this hash tree, how many comparisons/matches do we need to make in order to count the itemsets (among the 20 candidates above) that are supported by the transaction (1, 5, 6, 7, 9)? Hint: It's less than 20. Show all your calculations.
Added by Victor Manuel R.
Step 1
Apply the hash function h(p) = p mod 2 to each item of the transaction (1, 5, 6, 7, 9):
1 mod 2 = 1
5 mod 2 = 1
6 mod 2 = 0
7 mod 2 = 1
9 mod 2 = 1
So the items fall into two buckets: bucket 0 contains {6}, and bucket 1 contains {1, 5, 7, 9}.
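One way to carry the counting through is to build the hash tree and walk it with the transaction. The Python sketch below is a minimal illustration under stated assumptions, not the official solution: the names `Node`, `insert`, and `count_support` are hypothetical, a leaf splits on its next item position once it holds more than 4 itemsets, each leaf is compared at most once, and one "comparison" means checking one candidate in a visited leaf against the transaction.

```python
def h(p):
    """Hash function from the question: h(p) = p mod 2."""
    return p % 2

class Node:
    """Hash tree node: a leaf while children is None, interior otherwise."""
    def __init__(self, depth):
        self.depth = depth      # which item position this node hashes on
        self.children = None    # dict {hash value: Node} once split
        self.itemsets = []      # candidates stored while the node is a leaf

def insert(node, itemset, k, max_leaf):
    if node.children is not None:
        # Interior node: route by the hash of the item at this depth.
        key = h(itemset[node.depth])
        if key not in node.children:
            node.children[key] = Node(node.depth + 1)
        insert(node.children[key], itemset, k, max_leaf)
        return
    node.itemsets.append(itemset)
    # Assumed split rule: split an overfull leaf while items remain to hash on.
    if len(node.itemsets) > max_leaf and node.depth < k:
        pending, node.itemsets = node.itemsets, []
        node.children = {}
        for s in pending:
            insert(node, s, k, max_leaf)

def count_support(root, transaction, k):
    t = sorted(transaction)
    tset = set(t)
    visited = set()             # leaves already compared (assumption: once each)
    matches = []
    comparisons = 0

    def visit(node, start):
        nonlocal comparisons
        if node.children is None:
            if id(node) in visited:
                return
            visited.add(id(node))
            for s in node.itemsets:
                comparisons += 1
                if tset.issuperset(s):
                    matches.append(s)
            return
        # Leave enough items after position j to complete a k-itemset.
        for j in range(start, len(t) - (k - node.depth) + 1):
            child = node.children.get(h(t[j]))
            if child is not None:
                visit(child, j + 1)

    visit(root, 0)
    return sorted(matches), comparisons

candidates = [(1, 2, 4), (1, 3, 4), (1, 3, 7), (1, 5, 6), (1, 5, 9),
              (1, 7, 8), (2, 3, 7), (2, 4, 5), (2, 4, 6), (3, 4, 6),
              (3, 5, 7), (3, 6, 8), (3, 8, 9), (4, 5, 7), (4, 6, 7),
              (4, 6, 9), (5, 6, 8), (5, 6, 9), (5, 7, 8), (6, 7, 9)]

root = Node(0)
for c in candidates:
    insert(root, c, k=3, max_leaf=4)

matches, comparisons = count_support(root, (1, 5, 6, 7, 9), k=3)
print(matches)       # supported candidates
print(comparisons)   # candidates compared in visited leaves
```

Under these assumptions the walk reaches four of the six leaves and compares 12 candidates, finding {156}, {159}, {569}, and {679}. A different counting convention (for example, counting repeated leaf visits or hash lookups) would give a different number, so check it against the convention used in your course.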
Recommended Videos
In the Apriori algorithm, we can use a hash tree data structure to efficiently count the support of candidate itemsets. Consider the hash tree for candidate 3-itemsets shown in Figure 6.32. (a) Based on this figure, how many candidate 3-itemsets are there in total? (b) Given a transaction that contains items {1, 2, 5, 6, 8}, which of the hash tree leaf nodes will be visited when finding the candidate 3-itemsets contained in the transaction?
Aarya B.
Suppose that you are manually running the Apriori algorithm to find the frequent itemsets in a given transaction database. Currently, you have determined the candidate set of 2-itemsets, C2, and the corresponding support count of each candidate 2-itemset. Assuming that the minimum support count is 2, which candidate 2-itemsets in C2 would be frequent 2-itemsets? Select all that apply.
{I1, I2}, support count = 3
{I1, I3}, support count = 1
{I1, I4}, support count = 2
{I2, I3}, support count = 1
{I2, I4}, support count = 4
{I3, I4}, support count = 3
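The frequency test in this question is a simple threshold on the support count. A minimal Python sketch, with the variable names `c2_counts` and `min_support` chosen here for illustration:

```python
# Support counts of the candidate 2-itemsets, as listed in the question.
c2_counts = {
    ("I1", "I2"): 3,
    ("I1", "I3"): 1,
    ("I1", "I4"): 2,
    ("I2", "I3"): 1,
    ("I2", "I4"): 4,
    ("I3", "I4"): 3,
}
min_support = 2

# A candidate is frequent when its support count is at least min_support.
frequent = [items for items, count in c2_counts.items() if count >= min_support]
print(frequent)
```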
6.4 Let c be a candidate itemset in C_k generated by the Apriori algorithm. How many length-(k-1) subsets do we need to check in the prune step? Per your previous answer, can you give an improved version of procedure has_infrequent_subset in Figure 6.4?

Algorithm: Apriori. Find frequent itemsets using an iterative level-wise approach based on candidate generation.
Input: D, a database of transactions; min_sup, the minimum support count threshold.
Output: L, frequent itemsets in D.
Method:
    L_1 = find_frequent_1-itemsets(D);
    for (k = 2; L_{k-1} ≠ ∅; k++) {
        C_k = apriori_gen(L_{k-1});
        for each transaction t ∈ D {    // scan D for counts
            C_t = subset(C_k, t);       // get the subsets of t that are candidates
            for each candidate c ∈ C_t
                c.count++;
        }
        L_k = {c ∈ C_k | c.count ≥ min_sup}
    }
    return L = ∪_k L_k;

procedure apriori_gen(L_{k-1}: frequent (k-1)-itemsets)
    for each itemset l1 ∈ L_{k-1}
        for each itemset l2 ∈ L_{k-1}
            if (l1[1] = l2[1]) ∧ (l1[2] = l2[2]) ∧ ... ∧ (l1[k-2] = l2[k-2]) ∧ (l1[k-1] < l2[k-1]) then {
                c = l1 ⋈ l2;    // join step: generate candidates
                if has_infrequent_subset(c, L_{k-1}) then
                    delete c;   // prune step: remove unfruitful candidate
                else add c to C_k;
            }
    return C_k;

procedure has_infrequent_subset(c: candidate k-itemset; L_{k-1}: frequent (k-1)-itemsets);    // use prior knowledge
    for each (k-1)-subset s of c
        if s ∉ L_{k-1} then
            return TRUE;
    return FALSE;

Figure 6.4 Apriori algorithm for discovering frequent itemsets for mining Boolean association rules.
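The join and prune steps of Figure 6.4 can be sketched in Python as follows. This is a direct, unoptimized translation under the assumption that itemsets are represented as sorted tuples; it mirrors the figure's has_infrequent_subset rather than an improved version.

```python
from itertools import combinations

def has_infrequent_subset(candidate, prev_frequent):
    """Return True if any (k-1)-subset of the candidate is not frequent."""
    k = len(candidate)
    return any(s not in prev_frequent for s in combinations(candidate, k - 1))

def apriori_gen(prev_frequent):
    """Generate candidate k-itemsets from the frequent (k-1)-itemsets."""
    prev = sorted(prev_frequent)
    prev_set = set(prev)
    candidates = []
    for l1 in prev:
        for l2 in prev:
            # Join step: same first k-2 items, and l1's last item < l2's last item.
            if l1[:-1] == l2[:-1] and l1[-1] < l2[-1]:
                c = l1 + (l2[-1],)
                # Prune step: keep c only if all its (k-1)-subsets are frequent.
                if not has_infrequent_subset(c, prev_set):
                    candidates.append(c)
    return candidates

# Small made-up example: (1, 2, 4) and (1, 3, 4) are pruned because
# (2, 4) and (3, 4) are not frequent.
print(apriori_gen({(1, 2), (1, 3), (1, 4), (2, 3)}))
```

Note that this version checks all k subsets of length k-1, including l1 and l2 themselves, which are already known to be in L_{k-1} because c was built by joining them.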
Madhur L.