• Equivalence Classes: Another way to envision the traversal is to first partition the lattice into
disjoint groups of nodes (or equivalence classes). A frequent itemset generation algorithm searches
for frequent itemsets within a particular equivalence class first before moving to another
equivalence class. As an example, the level-wise strategy used in the Apriori algorithm can be
considered to be partitioning the lattice on the basis of itemset sizes.
• Breadth-First versus Depth-First: The Apriori algorithm traverses the lattice in a breadth-first
manner, as shown in Figure 4.21(a). It first discovers all the frequent 1-itemsets, followed by the
frequent 2-itemsets, and so on, until no new frequent itemsets are generated.
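The idea of partitioning the lattice into disjoint equivalence classes can be sketched in a few lines of Python. This is only an illustration under assumptions: the four-item universe and the helper names (all_itemsets, partition_by_size, partition_by_prefix) are hypothetical and do not come from the text. It groups the same lattice once by itemset size (the level-wise view) and once by a one-item prefix.

```python
from collections import defaultdict
from itertools import combinations

def all_itemsets(items):
    """Enumerate every non-empty itemset over `items` as a sorted tuple."""
    items = sorted(items)
    return [c for k in range(1, len(items) + 1) for c in combinations(items, k)]

def partition_by_size(itemsets):
    """Equivalence classes keyed by itemset size (the level-wise view)."""
    classes = defaultdict(list)
    for s in itemsets:
        classes[len(s)].append(s)
    return dict(classes)

def partition_by_prefix(itemsets, k=1):
    """Equivalence classes keyed by the first k items of each itemset."""
    classes = defaultdict(list)
    for s in itemsets:
        classes[s[:k]].append(s)
    return dict(classes)

# Hypothetical item universe, chosen only for illustration.
lattice = all_itemsets(["a", "b", "c", "d"])
by_size = partition_by_size(lattice)
by_prefix = partition_by_prefix(lattice)
```

In both groupings every itemset lands in exactly one class, so a search can finish one class before moving to the next.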
Figure 4.20. Equivalence classes based on prefix and suffix labels of itemsets.
Figure 4.21. Breadth-first and depth-first traversal.
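The difference between the two traversal orders can be made concrete with a small sketch. The transactions, the minimum support count, and the function names are assumptions made for illustration. The candidate step in breadth_first is deliberately simplified (it extends each frequent itemset with later items) and is not Apriori's full candidate generation and pruning procedure.

```python
def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)

def breadth_first(items, transactions, minsup):
    """Level-wise search: all frequent k-itemsets before any (k+1)-itemset."""
    items = sorted(items)
    level = [(i,) for i in items if support((i,), transactions) >= minsup]
    found = []
    while level:  # stop once a level produces no frequent itemsets
        found.extend(level)
        # Simplified candidate step: append items that sort after the last one.
        candidates = [s + (i,) for s in level for i in items if i > s[-1]]
        level = [c for c in candidates if support(c, transactions) >= minsup]
    return found

def depth_first(items, transactions, minsup):
    """Extend one prefix as far as it stays frequent before trying the next."""
    items = sorted(items)
    found = []

    def grow(prefix, start):
        for idx in range(start, len(items)):
            cand = prefix + (items[idx],)
            if support(cand, transactions) >= minsup:
                found.append(cand)
                grow(cand, idx + 1)

    grow((), 0)
    return found

# Hypothetical market-basket data and threshold.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
bfs_order = breadth_first("abc", transactions, minsup=3)
dfs_order = depth_first("abc", transactions, minsup=3)
```

Both functions return the same frequent itemsets; only the order of discovery differs. The breadth-first order lists every 1-itemset before any 2-itemset, while the depth-first order visits ('a',), ('a', 'b'), ('a', 'c') before it reaches ('b',).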
Representation of Transaction Data Set: There are many ways to represent a transaction data set. The
choice of representation can affect the I/O costs incurred when computing the support of candidate
itemsets. Figure 4.23 shows two different ways of representing market basket transactions. The
representation on the left is called a horizontal data layout, which is adopted by many association
rule mining algorithms, including Apriori. Another possibility is to store the list of transaction
identifiers (TID-list) associated with each item. Such a representation is known as the vertical data
layout. The support for each candidate itemset is obtained by intersecting the TID-lists of its subset
items. The length of the TID-lists shrinks as we progress to larger-sized itemsets.
Figure 4.23. Horizontal and vertical data format.
However, one problem with this approach is that the initial set of TID-lists may be too large to fit
into main memory, thus requiring more sophisticated techniques to compress the TID-lists. We
describe another effective approach to representing the data in the next section.
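To make the horizontal and vertical layouts described above concrete, the following sketch converts a small horizontal data set into TID-lists and computes support by intersection. The five transactions and the helper names (to_vertical, tid_list, support_horizontal) are hypothetical, and Python sets stand in for TID-lists without any compression.

```python
from collections import defaultdict

# Hypothetical horizontal layout: one row per transaction (TID -> items).
horizontal = {
    1: {"a", "b", "e"},
    2: {"b", "c", "d"},
    3: {"c", "e"},
    4: {"a", "c", "d"},
    5: {"a", "b", "c", "d"},
}

def to_vertical(horizontal):
    """Build the vertical layout: item -> set of TIDs that contain it."""
    vertical = defaultdict(set)
    for tid, items in horizontal.items():
        for item in items:
            vertical[item].add(tid)
    return dict(vertical)

def tid_list(itemset, vertical):
    """Intersect the TID-lists of the items in `itemset`."""
    lists = [vertical.get(item, set()) for item in itemset]
    # Simplification: an empty itemset is given an empty TID-list here.
    return set.intersection(*lists) if lists else set()

def support_horizontal(itemset, horizontal):
    """Support computed by scanning every transaction (horizontal layout)."""
    return sum(1 for items in horizontal.values() if set(itemset) <= items)

vertical = to_vertical(horizontal)
tids_cd = tid_list(("c", "d"), vertical)
tids_acd = tid_list(("a", "c", "d"), vertical)
```

The size of an intersected TID-list equals the support count obtained from a full scan of the horizontal layout, and the list for ("a", "c", "d") is no longer than the list for ("c", "d"), which illustrates how TID-lists shrink for larger itemsets.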