4.9 Practical Issues of Mining Association Rules

In this section, we will look at the different ways to address some of the practical issues in mining association rules, such as incorporating non-binary data or product hierarchies and allowing the use of different minimum support thresholds.

4.9.1 Multiple Minimum Support

The association rule mining framework we have described so far uses only a single minimum support threshold. In practice, this may not be desirable because each item has its own frequency of occurrence. If the minimum support threshold is set too high, we may miss interesting rules involving the rare items in the data set. Conversely, if it is set too low, the problem becomes computationally intractable and tends to produce too many uninteresting rules. This section describes the challenges encountered when applying multiple minimum support thresholds to different itemsets. The discussion presented here closely follows the method proposed by Liu et al. [31]. First, we assume that each item i has its own minimum support threshold, µ(i). For example, suppose µ(Milk) = 5%, µ(Coke) = 3%, µ(Broccoli) = 0.1%, and µ(Salmon) = 0.5%. The minimum support for any itemset X = {x1, x2, ..., xk} is then given by min(µ(x1), µ(x2), ..., µ(xk)). For example, µ(Milk, Broccoli) = min(µ(Milk), µ(Broccoli)) = 0.1%. The problem with having multiple minimum support thresholds is that support is no longer an anti-monotone measure. For example, suppose the support for milk and Coke is 1.5% while the support for milk, Coke, and broccoli is 0.15%. In this case, the itemset {milk, Coke} is infrequent whereas its superset {milk, Coke, broccoli} is frequent, thus violating the anti-monotonicity property of the support measure.
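To make the itemset-level threshold and the loss of anti-monotonicity concrete, here is a minimal sketch in Python, using the hypothetical thresholds and supports from the text:

```python
# Sketch of itemset-level minimum support under multiple thresholds.
# Per-item thresholds follow the example above (values illustrative).
item_minsup = {"Milk": 0.05, "Coke": 0.03, "Broccoli": 0.001, "Salmon": 0.005}

def itemset_minsup(itemset):
    """Minimum support of an itemset = min over its items' thresholds."""
    return min(item_minsup[i] for i in itemset)

# Observed (hypothetical) supports illustrating the loss of anti-monotonicity:
support = {
    frozenset({"Milk", "Coke"}): 0.015,
    frozenset({"Milk", "Coke", "Broccoli"}): 0.0015,
}

def is_frequent(itemset):
    return support[frozenset(itemset)] >= itemset_minsup(itemset)

# {Milk, Coke} needs 3% support -> infrequent; its superset needs only 0.1% -> frequent.
print(is_frequent({"Milk", "Coke"}))              # False
print(is_frequent({"Milk", "Coke", "Broccoli"}))  # True
```

The second call returning True for a superset of an infrequent itemset is exactly the violation of anti-monotonicity described above.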
Liu et al. [31] proposed a method to resolve this problem by ordering the items, in ascending order, according to their minimum support thresholds. Given the thresholds defined above, the following item ordering can be adopted: Broccoli, Salmon, Coke, Milk, as illustrated in Figure 4.20. In this example, broccoli has the lowest minimum support threshold. Furthermore, given the support counts shown in this figure, only broccoli and milk satisfy their minimum support thresholds. The set of singleton items can be divided into three sets: (1) F, which contains the frequent items broccoli and milk, (2) L, which contains the infrequent item Coke, and (3) N, which contains the infrequent item salmon. The difference between L and N is that the set L can still be used to generate larger-sized candidate itemsets even though its support count is below its minimum support threshold. This is because such items could still be part of a larger frequent itemset. On the other hand, the set N is guaranteed to be infrequent since its support is lower than the minimum support of the least-ordered item, i.e., broccoli. Any itemsets involving items in N can be pruned immediately. Furthermore, we do not have to generate candidate itemsets that begin with items in L since they can only be extended with items that have larger supports. For example, the itemset {Coke, milk} is not generated at all since Coke is known to be infrequent and the support of milk is much higher than that of Coke.

Figure 4.20. Frequent itemset generation with multiple minimum supports. The items, their thresholds, and their observed supports are:

  Item          Minimum Support    Support
  Broccoli (B)  0.10%              0.20%
  Salmon   (S)  0.50%              0.05%
  Coke     (C)  3.00%              2.50%
  Milk     (M)  5.00%              10.00%

A modified version of the Apriori algorithm is proposed in [31] to incorporate items that have multiple minimum support thresholds.
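The partitioning of the singleton items into the sets F, L, and N described above can be sketched as follows (the thresholds and supports are the illustrative values from Figure 4.20):

```python
# Sketch of the F/L/N partition of singleton items under multiple
# minimum supports. Values follow Figure 4.20 (illustrative).
minsup  = {"Broccoli": 0.001, "Salmon": 0.005, "Coke": 0.03, "Milk": 0.05}
support = {"Broccoli": 0.002, "Salmon": 0.0005, "Coke": 0.025, "Milk": 0.10}

# Order items by ascending minimum support threshold.
order = sorted(minsup, key=minsup.get)   # Broccoli, Salmon, Coke, Milk
lowest = minsup[order[0]]                # the least-ordered item's threshold

F = [i for i in order if support[i] >= minsup[i]]    # frequent items
N = [i for i in order if support[i] < lowest]        # prunable immediately
L = [i for i in order if i not in F and i not in N]  # kept for candidate generation

print(F, L, N)  # ['Broccoli', 'Milk'] ['Coke'] ['Salmon']
```

Coke lands in L because, although it misses its own 3% threshold, its 2.5% support still exceeds broccoli's 0.1% threshold, so it may participate in a larger frequent itemset; salmon falls below even that and lands in N.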
In traditional Apriori, a candidate (k + 1)-itemset is generated by merging two frequent itemsets of size k. The candidate is then pruned if any one of its subsets is found to be infrequent. To handle multiple minimum support thresholds, the pruning step must be modified. Specifically, a candidate itemset is pruned only if at least one of its subsets that contain the first item is infrequent. For example, suppose the candidate {broccoli, Coke, milk} is generated by merging the frequent itemsets {broccoli, Coke} and {broccoli, milk}. Even if the support of {Coke, milk} is lower than the minimum support threshold of Coke, the candidate itemset is not pruned, because the infrequent subset {Coke, milk} does not contain the first item, broccoli. Thus, the candidate itemset {broccoli, Coke, milk} could still be frequent at size three or larger.

4.9.2 Continuous and Categorical Attributes

So far, we have described the problem of mining association rules in the context of binary attributes. What if the data set contains continuous and categorical attributes? Can we still apply the standard association rule mining technique to such data sets? Examples of association rules containing non-binary attribute types include:

Example 20 (Salary ∈ [40K, 70K]) and (Married = No) =⇒ (NumHouses = 1)

Example 21 (Temperature > 90°F) and (Pressure > 2.0 atm) =⇒ (Valve = Closed)

Srikant et al. [49] have presented an approach to generalize the notion of association rules to include both continuous and categorical attributes. Categorical attributes are handled by introducing as many new fields as the number of attribute values. For example, if the attribute Valve has three possible values, say Closed, Released, and Broken, the attribute will be replaced by three binary attributes, Valve = Closed, Valve = Released, and Valve = Broken, in the transaction database.
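The replacement of a categorical attribute by one binary attribute per value amounts to a one-hot encoding. A minimal sketch (the records are hypothetical):

```python
# Sketch: replacing the categorical attribute Valve with one binary
# attribute per value, as described above. Records are illustrative.
records = [{"Valve": "Closed"}, {"Valve": "Released"}, {"Valve": "Broken"}]
values = ["Closed", "Released", "Broken"]

# One binary field "Valve=<value>" per possible attribute value.
binary = [
    {f"Valve={v}": int(r["Valve"] == v) for v in values}
    for r in records
]
print(binary[0])  # {'Valve=Closed': 1, 'Valve=Released': 0, 'Valve=Broken': 0}
```

Exactly one of the new binary attributes is 1 in each record, so the transformed data can be fed to a standard binary association rule miner.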
A straightforward way to handle continuous attributes is to partition their range of values into discrete intervals. Typical discretization strategies include equal-width partitioning, where the range is divided into equal-sized partitions, and percentile-based (equal-depth) partitioning, where each interval contains the same number of points. Other discretization methods have also been proposed. For example, Zhang et al. [56] suggested the use of clustering techniques to improve the quality of partitioning quantitative attributes.

Definition 7 A quantitative association rule is an implication of the form X =⇒ Y, where X ⊂ I_R, Y ⊂ I_R, and X ∩ Y = ∅. The problem of mining quantitative association rules is to discover all association rules that have support and confidence greater than the user-specified minimum support and minimum confidence.

How does discretization affect the quality of an association rule? Srikant et al. [49] described two major difficulties of mining association rules for continuous attributes:

1. The support of each newly created item can be low if the number of discretized intervals is large. In other words, if the granularity is too fine, rules involving this attribute may fail the minimum support threshold.

2. If the granularity is too coarse, some rules may fail the minimum confidence threshold. This is also known as the information loss problem [49]. For instance, even though the rule Salary ∈ [0k, 15k] −→ OwnCar = No may have a 95% confidence, the rule Salary ∈ [0k, 100k] −→ OwnCar = No may have a much lower confidence.

To alleviate the first problem, adjacent intervals are merged together to obtain coarser intervals. However, this approach could still suffer from the information loss problem. Srikant et al. [49] defined a concept called the partial completeness measure to determine the amount of information loss that arises due to discretization.
This measure is useful because it can help us determine the right number of intervals needed for a given upper bound on the information loss. Once the range of a continuous attribute has been partitioned, the algorithm can proceed to generate frequent itemsets in the same manner as Apriori. One potential problem with merging adjacent intervals is that some of the discovered rules could be redundant. For example, the rules R1: Salary ∈ [30k, 50k] −→ OwnCar = No and R2: Salary ∈ [20k, 50k] −→ OwnCar = No may have approximately the same confidence even though the support of R2 is larger. In this case, R1 is a redundant rule and should be eliminated. In [49], a rule r is pruned if its support or confidence is within ρ times the expected support or confidence of r, where ρ is the minimum interest threshold. The expected support and confidence for r can be computed based on the support and confidence of its closest ancestor rule r', which is a generalized version of r with a wider discretized interval. For example, suppose r1: AB −→ C and r2: A'B' −→ C' is its closest ancestor rule. The expected support for r1 is computed as:

    E(s(r1)) = s(r2) × [s(A)/s(A')] × [s(B)/s(B')] × [s(C)/s(C')]

and its expected confidence is:

    E(c(r1)) = c(r2) × [s(A)/s(A')] × [s(B)/s(B')] × [s(C)/s(C')].

4.9.3 Item Taxonomy

Figure 4.21. Example of an item taxonomy. Food divides into Bread (Wheat, White) and Milk (Skim, 2%, with brands Foremost and Kemps); Electronics divides into Computers (Desktop, Laptop, Accessory, with Printer and Scanner under Accessory) and Home (TV, DVD).

There are several reasons why it is useful to incorporate concept hierarchies, such as item taxonomy, into the association rule discovery problem:

1. Users may be interested in finding rules that span multiple levels of the taxonomy.

2. Rules at lower levels of the item taxonomy may not have sufficient support.

3. Item taxonomy can be used to summarize the discovered rules.
Note that one cannot directly infer rules involving items at higher levels of the taxonomy based on the rules discovered for lower-level items. Let us investigate how the support and confidence measures change as we traverse the item taxonomy level-wise, in a top-down or bottom-up fashion. Let G denote a directed acyclic graph representing the set of item taxonomies (Figure 4.21). If there is an edge in G from p to c, we call p a parent of c and c a child of p. For example, milk is the parent of "skim milk" because there is a directed edge from the node milk to the node skim milk in G. Furthermore, we say X̂ is an ancestor of X if there is a path from X̂ to X in G. The properties of support and confidence at different levels of the taxonomy are as follows:

• If X̂ is the parent of both X1 and X2, then σ(X̂) may not be equal to σ(X1) + σ(X2).

• If σ(X ∪ Y) ≥ s, then σ(X̂ ∪ Y) ≥ s, σ(X ∪ Ŷ) ≥ s, and σ(X̂ ∪ Ŷ) ≥ s. In other words, support never decreases as you go up the item taxonomy.

• If conf(X =⇒ Y) > α, then only conf(X =⇒ Ŷ) is guaranteed to be larger than α. The same cannot be said about the confidence of the rules X̂ =⇒ Y and X̂ =⇒ Ŷ.

There are several difficulties in applying the standard Apriori-like technique directly to this problem:

1. Items at high levels of the taxonomy naturally have larger support counts than those at lower levels. Therefore, it may not be sufficient to apply a single minimum support threshold to all items in the taxonomy. However, as noted in the earlier section, using multiple minimum support thresholds may lose the anti-monotonicity property of the support measure.

2. By introducing item taxonomy, we have increased the dimensionality of the feature space. This increases the complexity of the problem as well as the number of rules that are generated.
The latter problem must be addressed for efficiency reasons since many of these rules are often redundant, e.g., the rule 2% Kemps Milk =⇒ Food. Han et al. [18] and Srikant et al. [48] have independently proposed two different algorithms for mining association rules with item taxonomy. In [18], the algorithm progressively discovers itemsets at different levels of the hierarchy using a top-down, level-wise approach. Initially, all frequent itemsets of size 1 at the top level of the taxonomy are generated. These itemsets are denoted as L(1, 1). Next, using L(1, 1), the algorithm derives all the frequent itemsets of size 2 at level 1, denoted L(1, 2). The algorithm for generating these itemsets is basically the same as Apriori. This step is repeated until all the frequent itemsets of top-level items, L(1, k) (k > 1), are found. The algorithm then moves on to the next level of the taxonomy. Only the descendants of large itemsets in L(1, 1) are considered as candidates for L(2, 1). The algorithm then proceeds to generate L(2, 2), L(2, 3), ... sequentially until all level-2 large itemsets are discovered. Subsequently, it progresses to the next level, and so on, until it terminates at the deepest level requested by the user. [18] used a special encoding scheme to represent the transactions. Basically, each item in the database is represented by a unique identifier based on the overall item taxonomy. For example, the item 2% Foremost milk can be encoded as 1322, where the leftmost digit '1' represents the food category; followed by the digit '3' representing the product type, milk; the digit '2' representing the product content, i.e., '2% milk'; and finally, the digit '2' for the product brand, 'Foremost'. Using this encoding, all level-1 items are represented as 'd***', level-2 items as 'dd**', and so on.
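The encoding scheme above can be sketched as follows; the digit assignments are the illustrative ones from the text, and the helper names are hypothetical:

```python
# Sketch of the hierarchical encoding scheme described above: each digit
# of an item's code identifies its ancestor at that level of the taxonomy.
# '1322' = food / milk / 2% / Foremost (digit assignments illustrative).

def ancestor_code(code, level):
    """Generalize an item code to a given level, e.g. 'd***' at level 1."""
    return code[:level] + "*" * (len(code) - level)

def is_descendant(code, pattern):
    """True if an item code matches a (possibly generalized) ancestor pattern."""
    return all(p == "*" or p == c for p, c in zip(pattern, code))

milk_2pct_foremost = "1322"
print(ancestor_code(milk_2pct_foremost, 1))       # '1***'  (the food category)
print(ancestor_code(milk_2pct_foremost, 2))       # '13**'  (milk)
print(is_descendant(milk_2pct_foremost, "13**"))  # True
```

With this representation, checking whether an item belongs to a taxonomy subtree reduces to a cheap prefix comparison, which is what makes the level-wise filtering efficient.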
Furthermore, each time L(k, 1) is generated, the transaction database is filtered to remove all level-k items that are infrequent and all transactions that contain only infrequent items. This is possible because the support for lower-level items is always smaller than that of their ancestors. Nevertheless, the algorithm proposed in [18] discovers only rules involving items that reside at the same level of the hierarchy. Furthermore, the algorithm becomes incomplete, i.e., it may miss some frequent itemsets, if multiple minimum support thresholds are used. Srikant et al. [48] introduced a more general framework for mining association rules with item taxonomy. In this framework, association rules may involve items from any level of the taxonomy. Their algorithm works in the following way: initially, each transaction t is replaced by an extended transaction t' which contains all the items in t as well as the ancestors of the items in t. For example, the transaction t = {scanner, wheat bread} is extended to t' = {scanner, wheat bread, accessories, computers, electronics, bread, food}, where accessories, computers, and electronics are the ancestors of scanner while bread and food are the ancestors of wheat bread. We then apply the standard Apriori algorithm on the extended database to obtain rules that span multiple levels of the taxonomy. However, this algorithm can be rather slow due to the additional number of features. Srikant et al. [48] have introduced various optimization methods that improve the performance of the algorithm by removing some of the redundant candidate itemsets. For example, we can prune away any itemset containing an item, X, along with its ancestor, X̂. This is because the support of the itemset remains unchanged if we remove X̂ from it.

4.10 Generalization of Association Patterns

4.10.1 Closed Itemsets

The concept of closed itemsets was proposed independently by Zaki et al. [54] and Pasquier et al. [38].
The idea is to find a minimal generating set, called the closed itemsets, from which all association rules and frequent itemsets can be inferred. To illustrate the advantages of using closed itemsets, consider the database shown in Table 4.13. The database contains 10 transactions and 30 items. The items can be divided into three different groups: (A1, A2, ..., A10), (B1, B2, ..., B10), and (C1, C2, ..., C10). Items in each group are perfectly associated with other items in the same group, but are not associated with items from a different group. If the minimum support threshold is less than 30%, a standard association rule mining algorithm would generate 3 × Σ_{k=1}^{10} (10 choose k) = 3 × (2^10 − 1) frequent itemsets. In contrast, only 3 frequent closed itemsets are generated: {A1, A2, ..., A10}, {B1, B2, ..., B10}, and {C1, C2, ..., C10}.

Table 4.13. A transaction database for mining closed itemsets.

  TID   A1 ... A10   B1 ... B10   C1 ... C10
   1     1  ...  1    0  ...  0    0  ...  0
   2     1  ...  1    0  ...  0    0  ...  0
   3     1  ...  1    0  ...  0    0  ...  0
   4     0  ...  0    1  ...  1    0  ...  0
   5     0  ...  0    1  ...  1    0  ...  0
   6     0  ...  0    1  ...  1    0  ...  0
   7     0  ...  0    0  ...  0    1  ...  1
   8     0  ...  0    0  ...  0    1  ...  1
   9     0  ...  0    0  ...  0    1  ...  1
  10     0  ...  0    0  ...  0    1  ...  1

The definition of a closed itemset is given below.

Definition 8 An itemset X is a closed itemset if there exists no itemset X' such that (1) X' is a proper superset of X, and (2) all the transactions containing X also contain X'. In other words, an itemset is closed if none of its supersets has the same support count as the itemset itself. Furthermore, if a closed itemset is frequent, it is called a frequent closed itemset. Figure 4.22 illustrates an example of frequent closed itemsets derived for the transaction database depicted in the same figure.
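The closure test in Definition 8 can be sketched directly: an itemset is closed exactly when adding any further item shrinks the set of transactions containing it. The sketch below uses the small five-transaction database of Figure 4.22:

```python
# Sketch of the closure test in Definition 8, on the transaction
# database of Figure 4.22 (TID 1: ABC, 2: ABCD, 3: BCE, 4: ACDE, 5: DE).
transactions = [set("ABC"), set("ABCD"), set("BCE"), set("ACDE"), set("DE")]
items = set().union(*transactions)

def cover(itemset):
    """IDs of the transactions that contain the itemset."""
    return {tid for tid, t in enumerate(transactions) if itemset <= t}

def is_closed(itemset):
    c = cover(itemset)
    # Closed iff extending by any remaining item changes the cover.
    return all(cover(itemset | {x}) != c for x in items - itemset)

print(is_closed({"B", "C"}))  # True: no superset occurs in the same transactions
print(is_closed({"B"}))       # False: BC occurs in exactly the same transactions as B
```

A closed itemset is frequent-closed when its cover size also meets the minimum support count.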
The minimum support count is assumed to be two, i.e., all itemsets whose support count is at least two are denoted as frequent. Each node in the graph is associated with a list of transaction IDs. For example, the node BC belongs to transactions 1, 2, and 3. Thus, it is a frequent itemset. BC is also closed because none of its supersets, ABC, BCD, and BCE, has the same support count as BC. On the other hand, its subset B is not closed because BC has the same support count as B. Finally, it is important to note that all maximal frequent itemsets are contained within the set of frequent closed itemsets, but not vice versa.

Figure 4.22. Example of frequent closed itemsets with minimum support count equal to 2, for the transaction database: TID 1: ABC, TID 2: ABCD, TID 3: BCE, TID 4: ACDE, TID 5: DE. Each node of the itemset lattice is annotated with the list of transactions that contain it.

4.10.2 Dependence Rules

Association rules, as we have seen so far, use the support and confidence measures to quantify and filter relationships. A rule of the form X =⇒ y is interesting if it provides us two pieces of information: (a) X and y occur together in a sufficiently large number of transactions (support), so that their co-occurrence can be considered statistically significant, and (b) whenever X is present in a transaction, y is also present in it with high probability (confidence). The confidence of this rule can simply be viewed as the conditional probability of the presence of y given the presence of X. Now, consider an example to critically evaluate the quality of this information. In a computer store's transaction data, out of 200 buyers of operating systems, 160 bought the Windows operating system and 100 bought the Linux operating system. Of the Linux buyers, 60 also bought the Windows operating system. The rule Linux −→ Windows then has sufficiently large support (30%) and also sufficiently high confidence (60%).
So, looking at this rule, one might conclude that whenever Linux is sold, Windows is also sold with a 60% chance. Now, look at it from a different angle. If we knew nothing about a customer, we would expect her to buy Windows with 80% probability. However, the moment we know that a customer has bought Linux, her probability of purchasing Windows goes down from 80% to 60%. This implies that although the rule Linux −→ Windows has sufficient confidence, it is misleading. Here is the reason. Suppose the store owner decides to push Windows sales up by 200 units by sending ads to customers. If he were to follow just the prior probability of Windows, he would need to mail ads to 200/0.8 = 250 customers. Whereas, if he were to base his decision on the Linux −→ Windows rule, he would need to identify and attract 200/0.6 ≈ 333 more Linux customers. Moreover, following the first option, the store owner needs to identify the profile of his store's generic customer, whereas for the second option, he has to identify the profiles of Linux customers, which may be more difficult. On the other hand, if the confidence of the rule Linux −→ Windows were higher than the prior probability of Windows, then following this rule would have made more sense than merely relying on the prior probability. This example stresses the need for more information than the mere conditional probability (confidence) of a rule.

Figure 4.23. A naive approach for mining negative associations. The original transactions (TID 1: {A,B}, 2: {A,B,C}, 3: {C}, 4: {B,C}, 5: {B,D}) are augmented with negative items:

  TID   A  Ā  B  B̄  C  C̄  D  D̄
   1    1  0  1  0  0  1  0  1
   2    1  0  1  0  1  0  0  1
   3    0  1  0  1  1  0  0  1
   4    0  1  1  0  1  0  0  1
   5    0  1  1  0  0  1  1  0
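Using the numbers from the store example, the mismatch between the rule's confidence and the prior probability can be computed directly (a minimal sketch):

```python
# Sketch: checking the store example numerically. Out of 200 customers,
# 160 bought Windows, 100 bought Linux, and 60 bought both.
n, n_win, n_linux, n_both = 200, 160, 100, 60

confidence = n_both / n_linux  # P(Windows | Linux) = 0.60
prior = n_win / n              # P(Windows)         = 0.80

# Dependence ratio P(x, y) / (P(x) * P(y)); below 1 indicates that
# buying Linux makes buying Windows LESS likely than the prior suggests.
ratio = (n_both / n) / ((n_linux / n) * prior)

print(confidence, prior, round(ratio, 3))  # 0.6 0.8 0.75
```

A ratio of 0.75 < 1 exposes the negative dependence that the 60% confidence alone conceals.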
The impact of conditioning the presence of one item on the presence of others can be judged by looking at whether the conditional probability is higher than, lower than, or the same as the prior probability of that item. This notion is precisely captured by statistical dependence. In statistics, a pair of events {x, y} is said to be independent if their joint probability is the product of their individual probabilities, i.e., P(x, y) = P(x)P(y). In the example above, the events were Product Sold = Linux and Product Sold = Windows. If the events are not independent, we can look at the ratio P(x, y)/(P(x)P(y)). If the ratio is more than 1, the events are said to have positive dependence, which means that conditioning the presence of either event on the presence of the other increases its probability of occurrence relative to the prior. This can be seen by rewriting the inequality P(x, y)/(P(x)P(y)) > 1 as P(x, y)/P(x) > P(y). The left-hand side of the last inequality is the confidence, P(y|x), of the rule x → y, which conditions the presence of y on the presence of x. The right-hand side is the prior probability of y. Similarly, negative dependence means that P(x, y)/(P(x)P(y)) < 1, and can be interpreted as a decrease in the prior probability of either event after conditioning its presence on the other's presence. Thus, statistical dependence yields possibly more useful information about two events than merely looking at the confidence of a rule. Essentially, statistical dependence combines the prior probabilities and conditional probabilities into a single measure.

4.10.3 Negative Associations

This section describes different techniques for mining negative associations in data. A naive approach is to augment the original transaction data with new variables representing the absence of each item from a transaction, as shown in Figure 4.23. This allows existing algorithms such as Apriori to be applied to the modified data set.
However, such a simple approach is computationally intractable for sparse transaction data for the following reasons:

1. Suppose there are d items in the database. The number of possible itemsets, which is equal to 2^d, can be exponentially large even for moderate values of d. In fact, many of the positive itemsets do not appear in the transaction data set even when the data set contains millions of transactions. While many of the positive item combinations are absent, the number of negative itemsets could be exponentially large.

2. Transaction width increases when negative items are introduced into the data set. Suppose there are d items available, and the transaction width in the original data set is equal to k, where k is much smaller than d. As a result, the maximum size of frequent (positive) itemsets one can obtain is k. When negative items are included, the width of every transaction increases to d, because each item is either present or absent (but not both), and the maximum size of frequent itemsets one can obtain is also equal to d. Since the complexity of many frequent itemset generation algorithms depends on the maximum size of frequent itemsets, these algorithms would break down for such data sets.

3. One may argue that the complexity can be improved if we impose a higher minimum support threshold, tf, on the modified data set. However, we are mostly interested in itemsets that contain mixtures of positive and negative items. If tf is set too high, we could end up finding itemsets involving negative items only. Therefore, in order to extract itemsets that contain both positive and negative items, the threshold tf must be reasonably low.

In [8], Brin et al. suggested using the χ² test to discover interesting positive and negative associations. Applying the χ² test to frequent itemsets can be computationally expensive.
Although the χ² statistic always increases with the size of an itemset, the statistical significance, or p-value, of the test statistic is not a monotonic function, since the number of degrees of freedom also increases with the size of an itemset. (The number of degrees of freedom for a k-itemset is 2^k − k − 1 [15].) A recent study by Boulicaut et al. [7] introduced several alternative solutions to the negative association problem. First, it uses a variant of the inclusion-exclusion principle from Boolean algebra to determine the support of negative or mixed itemsets. For example, the support of a mixed itemset {A, B̄}, where B̄ denotes the absence of every item in the set B = {b1, ..., bn}, is given by:

    P(A ∪ B̄) = P(A) + Σ_{i=1}^{n} Σ_{C ⊆ B, |C| = i} (−1)^i × P(A ∪ C)    (4.4)

Equation 4.4 is useful only if the supports of A together with the subsets of B are known. This requires that the minimum support threshold be low enough that A ∪ B is frequent. Otherwise, one must rely on an approximate technique in which the supports of B and subsets of B are ignored when using Equation 4.4.

Figure 4.24. Savasere's approach for mining negative associations: given an item taxonomy and a frequent itemset such as {C, G}, the expected support of itemsets involving the siblings of C and G can be computed from the taxonomy.

In [44], Savasere et al. proposed using domain knowledge, in the form of an item taxonomy, to decide what constitutes an interesting negative association. Their basic assumption was that items from the same product family are expected to have similar types of interaction with other items. For example, since Coke and Pepsi belong to the same product category, we expect the association between Coke and Ruffles to be somewhat similar to the association between Pepsi and Ruffles. Specifically, Savasere's approach uses the item taxonomy to determine the expected support of an itemset (Figure 4.24).
For example, if Coke and Ruffles are frequent, then we can compute the expected support between Pepsi and Ruffles using the equations shown in Figure 4.24. If the actual support for Pepsi and Ruffles is considerably lower than the expected value, then Savasere's approach suggests that there is an interesting negative association between Pepsi and Ruffles. Although this technique is intuitively appealing, it has several limitations. First, it assumes that an item taxonomy is available, which makes it difficult to apply the technique beyond the market-basket domain. Second, it discovers negative associations by computing the expected support of itemsets using the immediate parent-child or sibling relationships in the item taxonomy. It does not infer the expected support for itemsets that are not related by immediate parent-child or sibling relationships. In summary, mining negative associations is a challenging problem, especially for sparse transaction data. The existing techniques for mining negative associations are either too expensive, produce too many uninteresting patterns, or require the use of domain knowledge to extract interesting patterns.

4.10.4 Indirect Associations

Figure 4.25. Indirect association between a pair of items: two items a and b, each highly dependent on the items y1, y2, ..., yk of a mediator set Y, but rarely occurring together themselves.

Association rule mining discovers rules involving items that occur frequently together in the transaction database. Any itemset that fails the support threshold criterion is automatically removed. This action is justifiable if we are only interested in finding items that are directly associated with each other. However, under certain situations, some of the low-support itemsets can be interesting due to the presence of higher-order dependencies among the items. (Direct associations between items are considered to be first-order dependencies.) Consider a pair of items, (a, b), having a low joint support value (Figure 4.25). Let Y be a set that contains items that are highly dependent on both a and b.
This figure illustrates an indirect association between a and b via a mediator set, Y. Unlike a direct association, an indirect relationship is characterized not only by the two participants, a and b, but also by the items mediating the interaction. One can think of many potential applications for indirect association. In the market-basket scenario, a and b may represent competing products. Analysts may be interested in finding the common products (Y) bought together by customers who purchased the two competing products. With this information, they can then perform an in-depth analysis of why customers prefer one brand over the other, based on the strength of the dependencies between Y and each of a and b. Another potential application for indirect association is in textual data mining. For text documents, indirectly associated words may represent synonyms, antonyms, or terms that appear in different contexts of another word. For example, in a collection of news articles, the words growth and decline may be indirectly associated via the word economy. As another example, the pair soviet and worker may refer to the different contexts in which the word union is used. For stock market data, a and b may refer to disjoint events that partition an event Y. For example, the event Y = Microsoft-Up can be partitioned by a = Redhat-Down and b = Intel-Up, where a denotes the event in which a competitor's stock value is down, whereas b may represent the event in which most of the technology stocks of large corporations are up. [50] proposed a general framework for defining indirect association in various application domains. Their formal definition of indirect association is:

Definition 9 The pair (a, b) is indirectly associated via a mediator Y if the following conditions hold:

1. σ(a, b) < ts (Itempair Support condition).

2.
∃Y ≠ ∅ such that ∀yi ∈ Y:

(a) σ(a, yi) ≥ tf and σ(b, yi) ≥ tf (Mediator Support condition).

(b) d(a, yi) ≥ td and d(b, yi) ≥ td, where d(p, q) is a measure of the dependence between p and q (Dependence condition).

In order to discover indirectly associated items, one must address the following questions:

1. What is a good measure of dependence between items?

2. What constitutes the elements of the mediator?

3. How should the overall indirect association measure between a and b be defined?

The first question is related to the issue of finding good interestingness measures for itemsets (or association rules), which was described in Section 4.8. The second question deals with the problem of defining the mediating elements as single items, single itemsets, or multiple itemsets. The last question allows one to rank the discovered itempairs according to the strength of their indirect association. [50] described several ways to tackle these issues and presented a simple algorithm for mining indirect itempairs.

4.10.5 Frequent Subgraphs

As the field of association rule mining continues to mature, there is an increasing need to apply similar techniques to other disciplines, such as Web mining, bioinformatics, and computational chemistry. The data in these domains is often available in a graph-based format, as described in Chapter 2. For example, in the Web mining domain, the vertices would correspond to Web pages and the edges would correspond to hyperlinks traversed by Web users. In the computational chemistry domain, the vertices would correspond to atoms or ions while the edges would correspond to their chemical bonds. Mining associations from these kinds of data is more challenging for the following reasons:

1. Support and confidence thresholds are not the only requirements the patterns must satisfy. The structure of the data may restrict the type of patterns that can be generated by the mining algorithm.
Specifically, additional constraints can be imposed regarding the connectivity of the subgraphs; i.e., the frequent subgraphs must be connected graphs.

2. In standard market-basket analysis, each entity (item or discretized attribute) is treated as a binary variable. (Although the quantity of an item bought in a transaction can be larger than one, it is often recorded as 1 in the record-based format.) For graph-based data, an entity (vertex, event, or item) can appear more than once within the same data object. These entities cannot be combined into a single entity without losing the structural information. The multiplicity of these entities poses additional challenges to the mining problem.

As a result, the standard Apriori or FP-tree algorithms must be modified in order to mine frequent substructures efficiently from a collection of tree or graph objects. In this section, we briefly describe the modifications needed and how they can be implemented effectively.

Problem Formulation

Let V = {v1, v2, · · · , vk} be the set of vertices and E = {eij = (vi, vj) | vi, vj ∈ V} be the set of edges connecting pairs of vertices. Each vertex v is associated with a vertex label, denoted as l(v). We will use the notation L(V) = {l(v) | v ∈ V} to represent the set of all vertex labels in the data set. Similarly, each edge e is associated with an edge label, denoted as l(e), and L(E) = {l(e) | e ∈ E} represents the set of all edge labels in the data set. A labeled graph G is defined as a collection of vertices and edges with labels attached to each of them, i.e., G = (V, E, L(V), L(E)). A graph G′ = (V′, E′, L(V′), L(E′)) is a subgraph of G if its vertices V′ and edges E′ are subsets of V and E, respectively. G′ is called an induced subgraph of G if it satisfies the subgraph requirements along with the additional condition that all the edges of G connecting the vertices in V′ are also present in E′.
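The subgraph and induced-subgraph definitions above can be made concrete with a small sketch. This is not from the text: the dict-based representation and names such as `is_induced_subgraph` are illustrative choices, with a graph stored as a vertex-label map plus an edge-label map.

```python
def make_graph(vertex_labels, edge_labels):
    """vertex_labels: {vertex: label}; edge_labels: {(u, v): label}."""
    return {"V": dict(vertex_labels),
            "E": {tuple(sorted(e)): l for e, l in edge_labels.items()}}

def is_subgraph(g, G):
    # V' ⊆ V and E' ⊆ E, with matching vertex and edge labels.
    return (set(g["V"]) <= set(G["V"])
            and all(G["V"].get(v) == l for v, l in g["V"].items())
            and all(G["E"].get(e) == l for e, l in g["E"].items()))

def is_induced_subgraph(g, G):
    # Extra condition: every edge of G between vertices of g is also in g.
    if not is_subgraph(g, G):
        return False
    return all(e in g["E"] for e in G["E"] if set(e) <= set(g["V"]))
```

For instance, a triangle G with edge (1,3) present contains the path 1-2-3 as a subgraph, but not as an induced subgraph, because the edge (1,3) is missing from the path.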
Figure 4.26(a) illustrates an example of a labeled graph G that contains 6 vertices and 11 edges, with vertex labels drawn from the set L(V) = {a, b, c} and edge labels drawn from the set L(E) = {p, q, r, s, t}. A subgraph of G that contains only 4 vertices and 4 edges is shown in Figure 4.26(b). It is not an induced subgraph because it does not include all the edges of G involving the subgraph vertices. The proper induced subgraph of G for these vertices is shown in Figure 4.26(c).

Figure 4.26. Graph, subgraph, and induced subgraph definitions.

There are many data sets that can be modelled by using the graph representation. For instance, a market-basket transaction can be represented as a graph, with vertices that correspond to the items and edges that correspond to the relationships between items. Furthermore, each transaction is essentially a clique because every item is associated with every other item in the same transaction. The problem of mining frequent itemsets is thus equivalent to finding frequent subgraphs among a collection of cliques. Although the graph representation is more expressive, it is much easier to work with the standard market-basket representation because we do not have to worry about the structural constraints.

If the graphs are not cliques, we can still map the problem of finding frequent subgraphs into the standard market-basket representation as long as all the edge and vertex labels within a graph are unique. In this case, each graph can be regarded as a transaction and each triplet of vertex and edge labels t = (l(vi), l(vj), l(eij)) corresponds to an item, as illustrated in Figure 4.27. Unfortunately, the graphs in most real applications do not have unique labels, thus requiring more advanced methods to obtain the frequent subgraphs.
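As a hypothetical sketch of the mapping in Figure 4.27, the following function (the name and dict-based graph representation are invented for illustration) converts a graph with unique labels into a market-basket transaction whose items are (vertex label, vertex label, edge label) triplets:

```python
def graph_to_transaction(vertex_labels, edge_labels):
    """vertex_labels: {vertex: label}; edge_labels: {(u, v): label}."""
    items = set()
    for (u, v), el in edge_labels.items():
        # Sort the two vertex labels so each edge maps to one canonical item.
        lu, lv = sorted((vertex_labels[u], vertex_labels[v]))
        items.add((lu, lv, el))  # e.g. ('a', 'b', 'p')
    return items
```

Once every graph in the collection is reduced to a set of such items, any standard frequent itemset algorithm (Apriori, FP-growth) can be run on the resulting transactions unchanged.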
Figure 4.27. Mapping a collection of graph structures into market-basket transactions.

Given a set of graphs G, the support of a subgraph g is defined as:

s(g) = |{Gi | g ⊆s Gi ∧ Gi ∈ G}| / |G|    (4.5)

where ⊆s denotes the subgraph relation. This definition of the support measure also satisfies the anti-monotonicity property. The goal of frequent subgraph mining is to find all subgraphs whose support is greater than some minimum support threshold, minsup. In some cases, we may also be interested in finding association rules for the frequent subgraphs. The confidence of a graph association rule g1 −→ g2 can be defined as:

c(g1 −→ g2) = |{G | g1 ∪ g2 ⊆s G ∧ G ∈ G}| / |{G′ | g1 ⊆s G′ ∧ G′ ∈ G}|    (4.6)

where g1 ∪ g2 denotes any graph containing both g1 and g2 as its subgraphs.

Mining Frequent Subgraphs

The idea of using frequent pattern discovery algorithms such as Apriori for mining frequent subgraphs was initially proposed by Inokuchi et al. in [22]. There are several issues one has to consider when mining frequent subgraphs:

Subgraph growing: Most of the standard frequent itemset generation algorithms work in a bottom-up fashion, where the frequent itemsets of size k are used to enumerate the frequent itemsets of size k + 1. During the frequent itemset generation step, the itemset lattice (search space) can be traversed in a breadth-first or depth-first manner. The same approach can also be applied to the frequent subgraph generation problem. However, unlike frequent itemset mining, k may refer to the number of vertices or the number of edges in the subgraphs. If the algorithm enumerates frequent subgraphs one vertex at a time, the approach is called vertex growing.
On the other hand, if the algorithm enumerates the frequent subgraphs one edge at a time, the approach is called edge growing.

1. Vertex growing has a meaningful interpretation because each iteration is equivalent to increasing the dimension of the corresponding k × k adjacency matrix by one, as illustrated in Figure 4.28, where G1 and G2 are two graphs whose adjacency matrices are given by M(G1) and M(G2), respectively. This approach was used by the AGM algorithm proposed by Inokuchi et al. in [22]. Because the subgraphs are grown one vertex at a time, the algorithm finds only the frequent induced subgraphs. The drawback of this approach is that it can be computationally expensive due to the large number of candidate subgraphs enumerated by the algorithm. For example, in Figure 4.28, when the graphs G1 and G2 are joined together, the resulting graph G3 has a new edge connecting the vertices d and e. However, the label of this new edge can be arbitrary, which increases the number of candidate subgraphs dramatically.

Figure 4.28. Vertex growing approach.

2. Edge growing was initially proposed by Kuramochi et al. [28] to enable a more efficient candidate generation strategy. At each iteration, a new edge is added to the subgraph, which may or may not increase the number of vertices of the initial subgraph, as illustrated in Figure 4.29. The number of candidate subgraphs generated by this approach is much smaller than the number generated by the vertex-growing approach. Kuramochi et al. [28] also used the more efficient sparse graph representation instead of the adjacency matrix representation. In addition, the edge-growing approach tends to produce more general patterns because it can discover both frequent subgraphs and frequent induced subgraphs.
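Both growing strategies repeatedly count the support of candidate subgraphs via Equation 4.5. A brute-force sketch of that computation follows; this is illustrative only (graphs are (vertex_labels, edge_labels) pairs, and `contains` tries every injective mapping of g's vertices into G's, so it is practical only for tiny examples — real miners use far more efficient subgraph isomorphism machinery):

```python
from itertools import permutations

def contains(G, g):
    """True if g is (isomorphic to) a subgraph of G, respecting labels."""
    GV, GE = G
    gV, gE = g
    for image in permutations(GV, len(gV)):
        m = dict(zip(gV, image))  # candidate vertex mapping g -> G
        if all(GV[m[v]] == l for v, l in gV.items()) and \
           all(GE.get(tuple(sorted((m[u], m[v])))) == l
               for (u, v), l in gE.items()):
            return True
    return False

def support(g, graphs):
    # Equation 4.5: fraction of database graphs containing g.
    return sum(contains(G, g) for G in graphs) / len(graphs)

def confidence(g1, g2, graphs):
    # Equation 4.6: of the graphs containing g1, the fraction also containing g2.
    has_g1 = [G for G in graphs if contains(G, g1)]
    return sum(contains(G, g2) for G in has_g1) / len(has_g1)
```

The `contains` routine is exactly the (sub)graph isomorphism test discussed next, which is why that problem dominates the cost of frequent subgraph mining.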
Figure 4.29. Edge growing approach.

Figure 4.30. Graph isomorphism.

Graph isomorphism: A key task in frequent itemset discovery is to determine whether an itemset is a subset of a transaction. Likewise, in frequent subgraph mining, an important task is to determine whether a graph is a subgraph of another graph. This requires a graph matching routine that can determine whether a subgraph is topologically identical (isomorphic) to another graph. For example, Figure 4.30 illustrates two graphs that are isomorphic to each other because there is a one-to-one mapping between the vertices of the first graph and the vertices of the second graph that preserves the connections between the vertices.

Graph isomorphism is a complicated problem, especially when many of the candidate subgraphs are topologically equivalent. If two graphs are isomorphic to each other, their adjacency matrices are simply row and/or column permutations of each other. If the labels of the vertices are unique, the graph isomorphism problem can be solved in polynomial time. However, if the labels of the edges and vertices are not unique, the complexity of the graph isomorphism problem is yet to be determined (i.e., it is still an open problem whether graph isomorphism is in P or is NP-complete).

In Apriori-like algorithms such as AGM [22] and FSG [28], graph isomorphism presents the biggest challenge to frequent subgraph discovery because it is needed for both the candidate generation and candidate counting steps:

1. It is needed during the join operation to check whether a candidate subgraph has already been generated.

2. It is needed during candidate pruning to check whether each of the candidate's (k − 1)-subgraphs matches one of the previously discovered frequent subgraphs.

3.
It is needed during candidate counting to check whether a candidate subgraph is contained within another graph.

Figure 4.31. Adjacency matrix and string encoding of a graph. (Two vertex orderings of the same graph yield the codes 1100111000010010010100001011 and 1011010010100000100110001110.)

Most frequent subgraph discovery algorithms handle the graph isomorphism problem by using a technique known as canonical labeling, where each graph or subgraph is mapped into an ordered string representation (otherwise known as its code) in such a way that two isomorphic graphs are mapped into the same canonical encoding. If a graph has more than one representation, its canonical encoding is usually chosen to be the lowest-precedence code. For example, Figure 4.31 illustrates two different ways to represent the same graph. (The graph has more than one representation because the vertex labels are not unique.) The code for each graph is obtained by doing a column-wise concatenation of the elements in the upper triangular portion of the adjacency matrix. The canonical labeling of the graph is obtained by selecting the code that has the lowest precedence; e.g., the code at the bottom of Figure 4.31 has a lower precedence than the code at the top of the figure.

Multiplicity of Candidate Joining: In Apriori-like algorithms, candidate patterns are generated by joining together frequent patterns found in the earlier passes of the algorithm.
For example, in frequent itemset discovery, a candidate k-itemset is generated by joining two frequent itemsets of size k − 1 that have k − 2 items in common. Similarly, in frequent subgraph mining, a candidate k-subgraph is generated by joining together two frequent subgraphs of size k − 1 (where k may correspond to the number of vertices or edges of the graph). New candidates are created as long as both (k − 1)-subgraphs contain a common (k − 2)-subgraph, known as the core. However, unlike frequent itemset discovery, each join operation in frequent subgraph mining may produce more than one candidate subgraph, as illustrated in Figures 4.28 and 4.29. For the edge-growing approach, Kuramochi et al. [28] identified three different scenarios in which multiple candidates can be generated:

1. When joining two subgraphs with identical vertex labels, as illustrated in Figure 4.32(a). Multiple candidates are generated because there is more than one way to arrange the edges and vertices of the combined subgraph.

2. When joining two subgraphs whose core contains identical vertex labels, as illustrated in Figure 4.32(b). In this example, the core contains vertices whose labels are equal to “a”.

3. When the joined subgraphs contain more than one core, as illustrated in Figure 4.32(c). The shaded vertices in this figure correspond to the vertices that form the core of the join operation.

As a result, the number of candidate subgraphs generated can be prohibitively large. Numerous algorithms have been proposed to reduce the number of candidate subgraphs generated by frequent subgraph mining algorithms. For example, the FSG algorithm [28] uses various vertex invariants, which are structural constraints imposed on the candidate subgraphs, to restrict the number of admissible candidates. A more recent algorithm, called gSpan [52], was proposed by Yan and Han to generate frequent subgraphs without enumerating any infrequent candidates.
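The canonical labeling idea around Figure 4.31 can be sketched by brute force for small unlabeled graphs: enumerate vertex orderings, build each upper-triangular adjacency code, and keep the lexicographically smallest. This is illustrative only — vertex labels would also need to be folded into the code, and practical algorithms prune the factorial permutation space (e.g. with vertex invariants):

```python
from itertools import permutations

def code(order, edges):
    # Column-wise concatenation of the upper triangle of the adjacency
    # matrix induced by this vertex ordering.
    bits = []
    for j in range(1, len(order)):
        for i in range(j):
            e = tuple(sorted((order[i], order[j])))
            bits.append('1' if e in edges else '0')
    return ''.join(bits)

def canonical_code(vertices, edges):
    edges = {tuple(sorted(e)) for e in edges}
    # Smallest code over all vertex orderings = canonical encoding.
    return min(code(p, edges) for p in permutations(vertices))
```

Because two isomorphic graphs receive the same canonical code regardless of how their vertices happen to be numbered, duplicate candidate subgraphs can be detected by simple string comparison.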
The algorithm uses a canonical labeling scheme called minimum DFS code to represent the subgraphs. By encoding the graphs according to their minimum DFS codes, the authors showed that the frequent subgraph mining problem can be turned into a sequential pattern mining problem, for which an efficient algorithm that does not require candidate generation can be developed [39].

Figure 4.32. Multiplicity of Candidate Joining.

Bibliography

[1] R. C. Agarwal, C. Aggarwal, and V. V. V. Prasad. A tree projection algorithm for generation of frequent itemsets. Journal of Parallel and Distributed Computing (Special Issue on High Performance Data Mining), (accepted for publication), 2000.

[2] C. Aggarwal and P. Yu. A new framework for itemset generation. In Proc. of the 17th Symposium on Principles of Database Systems, pages 18–24, Seattle, WA, June 1998.

[3] R. Agrawal, T. Imielinski, and A. Swami. Database mining: a performance perspective. IEEE Transactions on Knowledge and Data Engineering, 5:914–925, 1993.

[4] R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. In Proc. ACM SIGMOD Intl. Conf. Management of Data, pages 207–216, Washington, D.C., USA, 1993.

[5] R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In Proc. of the 20th VLDB Conference, pages 487–499, Santiago, Chile, 1994.

[6] A. Amir, R. Feldman, and R. Kashi. A new and versatile method for association generation. In H. J. Komorowski and J. M. Zytkow, editors, Proc. of Principles of Data Mining and Knowledge Discovery, First European Symposium (PKDD’97), volume 1263, pages 221–231. Springer-Verlag, Trondheim, Norway, June 1997.

[7] J.-F. Boulicaut, A. Bykowski, and B. Jeudy.
Towards the tractable discovery of association rules with negations. In Proc. of the Fourth Int’l Conf. on Flexible Query Answering Systems (FQAS’00), pages 425–434, Warsaw, Poland, October 2000.

[8] S. Brin, R. Motwani, and C. Silverstein. Beyond market baskets: Generalizing association rules to correlations. In Proc. ACM SIGMOD Intl. Conf. Management of Data, pages 265–276, Tucson, AZ, 1997.

[9] S. Brin, R. Motwani, J. Ullman, and S. Tsur. Dynamic itemset counting and implication rules for market basket data. In Proc. of 1997 ACM-SIGMOD Int. Conf. on Management of Data, pages 255–264, Tucson, AZ, June 1997.

[10] V. Cherkassky and F. Mulier. Learning From Data: Concepts, Theory, and Methods. Wiley Interscience, 1998.

[11] P. Clark and T. Niblett. The CN2 induction algorithm. Machine Learning, 3(4):261–283, 1989.

[12] R. Cooley, P. Tan, and J. Srivastava. Discovery of interesting usage patterns from web data. In M. Spiliopoulou and B. Masand, editors, Advances in Web Usage Analysis and User Profiling, volume 1836, pages 163–182. Lecture Notes in Computer Science, 2000.

[13] P. Domingos. The role of Occam’s razor in knowledge discovery. Data Mining and Knowledge Discovery, 3(4):409–425, 1999.

[14] G. Dong and J. Li. Interestingness of discovered association rules in terms of neighborhood-based unexpectedness. In Proc. of the Second Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD’98), pages 72–86, Melbourne, Australia, April 1998.

[15] W. DuMouchel and D. Pregibon. Empirical Bayes screening for multi-item associations. In Proc. of the Seventh Int’l Conference on Knowledge Discovery and Data Mining, pages 67–76, San Francisco, CA, August 2001.

[16] A. George and W. Liu. Computer Solution of Large Sparse Positive Definite Systems. Prentice-Hall Series in Computational Mathematics, 1981.

[17] D. Hand, H. Mannila, and P. Smyth. Principles of Data Mining. MIT Press, 2001.

[18] J. Han and Y. Fu.
Discovery of multiple-level association rules from large databases. In Proc. of the 21st VLDB Conference, pages 420–431, Zurich, Switzerland, 1995.

[19] J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. In Proc. ACM-SIGMOD Int. Conf. on Management of Data (SIGMOD’00), Dallas, TX, May 2000.

[20] R. Hilderman and H. Hamilton. Evaluation of interestingness measures for ranking discovered knowledge. In Proc. of the 5th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD’01), pages 247–259, Hong Kong, April 2001.

[21] E. Hunt, J. Marin, and P. Stone. Experiments in Induction. Academic Press, New York, 1966.

[22] A. Inokuchi, T. Washio, and H. Motoda. An Apriori-based algorithm for mining frequent substructures from graph data. In Proc. of the Fourth European Conference on Principles and Practice of Knowledge Discovery in Databases, Lyon, France, 2000.

[23] D. Jensen and P. Cohen. Multiple comparisons in induction algorithms. Machine Learning, 38(3):309–338, March 2000.

[24] G. John, R. Kohavi, and K. Pfleger. Irrelevant features and the subset selection problem. In Machine Learning: Proc. of the Eleventh International Conference, pages 121–129, San Francisco, CA, 1994.

[25] M. Kamber and R. Shinghal. Evaluating the interestingness of characteristic rules. In Proc. of the Second Int’l Conference on Knowledge Discovery and Data Mining, pages 263–266, Portland, Oregon, 1996.

[26] M. Klemettinen, H. Mannila, P. Ronkainen, H. Toivonen, and A. Verkamo. Finding interesting rules from large sets of discovered association rules. In Proc. 3rd Int. Conf. Information and Knowledge Management, pages 401–408, Gaithersburg, Maryland, Nov 1994.

[27] R. Kohavi. A study on cross-validation and bootstrap for accuracy estimation and model selection. In Proc. of the Int’l Joint Conference on Artificial Intelligence (IJCAI’95), 1995.

[28] M. Kuramochi and G. Karypis. Frequent subgraph discovery. In Proc.
of the 2001 IEEE International Conference on Data Mining, pages 313–320, San Jose, CA, November 2001.

[29] D. Lin and Z. Kedem. Pincer-search: An efficient algorithm for discovering the maximum frequent set. IEEE Transactions on Knowledge and Data Engineering, 14(3):553–566, 2002.

[30] B. Liu, W. Hsu, and S. Chen. Using general impressions to analyze discovered classification rules. In Proc. of the Third Int’l Conference on Knowledge Discovery and Data Mining, pages 31–36, 1997.

[31] B. Liu, W. Hsu, and Y. Ma. Mining association rules with multiple minimum supports. In Proc. of the Fifth Int’l Conference on Knowledge Discovery and Data Mining, pages 125–134, 1999.

[32] B. Liu, W. Hsu, and Y. Ma. Pruning and summarizing the discovered associations. In Proc. of the Fifth Int’l Conference on Knowledge Discovery and Data Mining, pages 125–134, San Diego, CA, 1999.

[33] J. Major and J. Mangano. Selecting among rules induced from a hurricane database. In Proc. of the AAAI Workshop on Knowledge Discovery in Databases, pages 30–31, 1993.

[34] T. Mitchell. Machine Learning. McGraw-Hill, 1997.

[35] F. Mosteller. Association and estimation in contingency tables. Journal of the American Statistical Association, 63:1–28, 1968.

[36] A. Mueller. Fast sequential and parallel algorithms for association rule mining: A comparison. Technical Report CS-TR-3515, University of Maryland, August 1995.

[37] J. Park, M. Chen, and P. Yu. An effective hash-based algorithm for mining association rules. SIGMOD Record, 25(2):175–186, 1995.

[38] N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering frequent closed itemsets for association rules. In 7th Int’l Conf. on Database Theory, Jan 1999.

[39] J. Pei, J. Han, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M. Hsu. PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth. In Proc. of the Int’l Conf. on Data Engineering (ICDE’01), Heidelberg, Germany, April 2001.

[40] G. Piatetsky-Shapiro and C.
Matheus. The interestingness of deviations. In Proc. of the AAAI’94 Workshop on Knowledge Discovery in Databases (KDD’94), pages 25–36, July 1994.

[41] G. Piatetsky-Shapiro. Discovery, analysis and presentation of strong rules. In G. Piatetsky-Shapiro and W. Frawley, editors, Knowledge Discovery in Databases, pages 229–248. MIT Press, Cambridge, MA, 1991.

[42] J. Quinlan. C4.5: Programs for Machine Learning. Morgan-Kaufmann Publishers, 1993.

[43] A. Savasere, E. Omiecinski, and S. Navathe. An efficient algorithm for mining association rules in large databases. In Proc. of the 21st Int. Conf. on Very Large Databases (VLDB’95), Zurich, Switzerland, Sept 1995.

[44] A. Savasere, E. Omiecinski, and S. Navathe. Mining for strong negative associations in a large database of customer transactions. In Proc. of the 14th International Conference on Data Engineering, pages 494–502, Orlando, Florida, February 1998.

[45] P. Shenoy, J. Haritsa, S. Sudarshan, G. Bhalotia, M. Bawa, and D. Shah. Turbocharging vertical mining of large databases. In Proc. of 2000 ACM-SIGMOD Int. Conf. on Management of Data, Dallas, TX, May 2000.

[46] A. Silberschatz and A. Tuzhilin. What makes patterns interesting in knowledge discovery systems. IEEE Trans. on Knowledge and Data Engineering, 8(6):970–974, 1996.

[47] C. Silverstein, S. Brin, and R. Motwani. Beyond market baskets: Generalizing association rules to dependence rules. Data Mining and Knowledge Discovery, 2(1):39–68, 1998.

[48] R. Srikant and R. Agrawal. Mining generalized association rules. In Proc. of the 21st VLDB Conference, pages 407–419, Zurich, Switzerland, 1995.

[49] R. Srikant and R. Agrawal. Mining quantitative association rules in large relational tables. In Proc. of 1996 ACM-SIGMOD Int. Conf. on Management of Data, pages 1–12, Montreal, Canada, 1996.

[50] P. Tan, V. Kumar, and J. Srivastava. Indirect association: Mining higher order dependencies in data. In Proc.
of the 4th European Conference on Principles and Practice of Knowledge Discovery in Databases, pages 632–637, Lyon, France, 2000.

[51] H. Toivonen. Sampling large databases for association rules. In Proc. of the 22nd VLDB Conference, pages 134–145, Bombay, India, 1996.

[52] X. Yan and J. Han. gSpan: Graph-based substructure pattern mining. In Proc. of the IEEE Int’l Conf. on Data Mining (ICDM’02), Maebashi City, Japan, December 2002.

[53] M. Zaki and K. Gouda. Fast vertical mining using diffsets. Technical Report TR-01-1, Department of Computer Science, Rensselaer Polytechnic Institute, 2001.

[54] M. Zaki and M. Ogihara. Theoretical foundations of association rules. In Proc. of 3rd SIGMOD’98 Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD’98), Seattle, WA, June 1998.

[55] M. Zaki, S. Parthasarathy, M. Ogihara, and W. Li. New algorithms for fast discovery of association rules. In Proc. of the Third Int’l Conference on Knowledge Discovery and Data Mining, 1997.

[56] Z. Zhang, Y. Lu, and B. Zhang. An effective partitioning-combining algorithm for discovering quantitative association rules. In Proc. of the First Pacific-Asia Conference on Knowledge Discovery and Data Mining, 1997.