March 30, 2005 9:7 WSPC/Lecture Notes Series: 9in x 6in 40 heg05a M. Hegland

Here we can apply the previous theorem to get a lower bound for $C_k$:

$$|C_k| \geq b^{(k)}(m_k, \ldots, m_s, s+p-1).$$

This, however, contradicts the higher upper bound we obtained previously, and so we must have

$$|C_{k+p}| \leq b^{(k+p)}(m_j, \ldots, m_r).$$

As a simple consequence one also gets tightness:

Corollary 17: For any $m$ and $k$ there exists a $C_k$ with $|C_k| = m = b^{(k+p)}(m_k, \ldots, m_{s+1})$ such that $|C_{k+p}| = b^{(k+p)}(m_k, \ldots, m_{s+1})$.

Proof: The $C_k$ consists of the first $m$ $k$-itemsets in the colexicographic ordering.

In practice one would know not only the size but also the contents of any $C_k$, and from that one can get a much better bound than the one provided by the theory. A consequence of the theorem is that for $L_k$ with $|L_k| \leq \binom{m_k}{k}$ one has $|C_{k+p}| \leq \binom{m_k}{k+p}$. In particular, one has $C_{k+p} = \emptyset$ for $k > m_k - p$.

4. Extensions

4.1. Apriori Tid

One variant of the apriori algorithm discussed above computes the supports of itemsets by intersecting columns. Some of these intersections are repeated over time and, in particular, entries of the Boolean matrix that have no impact on the support are revisited. The Apriori TID [?] algorithm provides a solution to some of these problems. To compute the supports of larger itemsets it does not revisit the original table but transforms the table as it goes along. The new columns correspond to the candidate itemsets. In this way each new candidate itemset requires only the intersection of two old ones.

The following example, adapted from [?], demonstrates how this works. In the first row the itemsets from $C_k$ are depicted. The minimal support is 50 percent or 2 rows. The initial matrix of the tid