4.9 Practical Issues of Mining Association Rules
In this section, we look at different ways to address some of the practical issues in mining association rules, such as incorporating non-binary data or product hierarchies and allowing the use of different minimum support thresholds.
4.9.1 Multiple Minimum Support
The association rule mining framework we have described so far in this section uses
only a single minimum support threshold. In practice, this may not be desirable
because each item has its own frequency of occurrence. If the minimum support
threshold is set too high, we may miss some interesting rules involving the rare
items in the data set. Conversely, if the minimum support threshold is set too
low, the problem becomes computationally intractable and tends to produce too
many uninteresting rules. This section describes the challenges encountered when
applying multiple minimum support thresholds to different itemsets. The discussion
presented here follows closely the method proposed by Liu et al. in [31].
First, we assume that each item i has its own minimum support threshold, µi. For example, suppose µ(Milk) = 5%, µ(Coke) = 3%, µ(Broccoli) = 0.1%, and µ(Salmon) = 0.5%. In addition, the minimum support for any itemset X = {x1, x2, · · ·, xk} is given by min(µ(x1), µ(x2), · · ·, µ(xk)). For example, µ(Milk, Broccoli) = min(µ(Milk), µ(Broccoli)) = 0.1%.
The problem of having multiple minimum support thresholds is that support is
no longer an anti-monotone measure. For example, suppose the support for milk and
Coke is 1.5% while the support for milk, Coke, and broccoli is 0.15%. In this case,
the itemset {milk, Coke} is infrequent whereas its superset {milk, Coke, broccoli}
is frequent, thus violating the anti-monotonicity property of the support measure.
Liu et al. [31] proposed a method to resolve this problem by ordering the items, in ascending order, according to their minimum support thresholds. Given the thresholds defined above, the following item ordering can be adopted: Broccoli, Salmon, Coke, Milk, as illustrated in Figure 4.20. In this example, broccoli has the lowest minimum support threshold. Furthermore, given the support counts shown in this figure, only broccoli and milk satisfy their minimum support thresholds. The set of singleton items can be divided into three sets: (1) F, which contains the frequent items broccoli and milk, (2) L, which contains the infrequent item Coke, and (3) N, which contains the infrequent item salmon. The difference between L and N is that the items in L can still be used to generate larger-sized candidate itemsets, even though their support counts are below their own minimum support thresholds, because such items could still be part of a larger frequent itemset. On the other hand, any itemset involving an item in N is guaranteed to be infrequent, since that item's support is lower than the minimum support threshold of the lowest-ordered item, i.e., broccoli; such itemsets can be pruned immediately. Furthermore, we do not have to generate candidate itemsets that begin with items in L, since such itemsets can only be extended with items whose minimum support thresholds are at least as large. For example, the itemset {Coke, milk} is not generated at all: its minimum support equals µ(Coke), and Coke is already known to be infrequent with respect to this threshold.
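To make the grouping concrete, the following sketch (a toy illustration in Python, not the algorithm of [31] itself) classifies the singleton items of Figure 4.20 into the sets F, L, and N, and computes the minimum support of an itemset; the thresholds and supports are the example values used above.

# Group singleton items under multiple minimum supports (toy values from Figure 4.20).
min_sup = {'Broccoli': 0.001, 'Salmon': 0.005, 'Coke': 0.03, 'Milk': 0.05}
support = {'Broccoli': 0.002, 'Salmon': 0.0005, 'Coke': 0.025, 'Milk': 0.10}

# Items sorted in ascending order of their minimum support thresholds.
order = sorted(min_sup, key=min_sup.get)        # ['Broccoli', 'Salmon', 'Coke', 'Milk']
lowest_threshold = min_sup[order[0]]            # broccoli's threshold, 0.1%

F = [i for i in order if support[i] >= min_sup[i]]        # frequent items
N = [i for i in order if support[i] < lowest_threshold]   # can never appear in a frequent itemset
L = [i for i in order if i not in F and i not in N]       # infrequent, but still usable

def itemset_min_sup(itemset):
    # The minimum support of an itemset is the smallest threshold among its items.
    return min(min_sup[i] for i in itemset)

print(F, L, N)                                  # ['Broccoli', 'Milk'] ['Coke'] ['Salmon']
print(itemset_min_sup({'Milk', 'Broccoli'}))    # 0.001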
Item          Minimum support   Support
Broccoli (B)  0.10%             0.20%
Salmon (S)    0.50%             0.05%
Coke (C)      3.00%             2.50%
Milk (M)      5.00%             10.00%

Figure 4.20. Frequent itemset generation with multiple minimum supports.
A modified version of the Apriori algorithm is proposed in [31] to incorporate items that have multiple minimum support thresholds. In traditional Apriori, a candidate (k + 1)-itemset is generated by merging two frequent itemsets of size k. The candidate is then pruned if any one of its subsets is found to be infrequent. To handle multiple minimum support thresholds, the pruning step must be modified. Specifically, a candidate itemset is pruned only if at least one of the subsets containing its first item (the item with the lowest minimum support threshold) is infrequent. For example, suppose the candidate {broccoli, Coke, milk} is generated by merging the frequent itemsets {broccoli, Coke} and {broccoli, milk}. Even if the support of {Coke, milk} is lower than the minimum support threshold of Coke, the candidate itemset is not pruned, because the infrequent subset {Coke, milk} does not contain the first item, broccoli. Thus, the candidate itemset {broccoli, Coke, milk} could still turn out to be frequent.
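The following sketch illustrates the modified pruning test, under the simplifying assumption that a candidate is stored as a tuple whose items are already ordered by ascending minimum support; it illustrates only the rule described above, not the full algorithm of [31].

from itertools import combinations

def prune(candidate, frequent_sets):
    # candidate: tuple of items ordered by ascending minimum support threshold.
    # frequent_sets: frozensets found frequent in earlier passes.
    first = candidate[0]
    for subset in combinations(candidate, len(candidate) - 1):
        if first not in subset:
            continue                     # subsets without the first item are not used for pruning
        if frozenset(subset) not in frequent_sets:
            return True                  # an infrequent subset containing the first item: prune
    return False

# {Broccoli, Coke, Milk}: the subset {Coke, Milk} is infrequent, but since it does
# not contain Broccoli, the candidate is not pruned.
frequent = {frozenset({'Broccoli', 'Coke'}), frozenset({'Broccoli', 'Milk'})}
print(prune(('Broccoli', 'Coke', 'Milk'), frequent))    # False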
4.9.2 Continuous and Categorical Attributes
So far, we have described the problem of mining association rules in the context of binary attributes. What if the data set contains continuous and categorical attributes? Can we still apply the standard association rule mining techniques to such data sets? Examples of association rules containing non-binary attributes include:
Example 20 (Salary ∈ [40K, 70K]) and (Married = No) =⇒ (NumHouses = 1)
or
Example 21 (Temperature > 90°F) and (Pressure > 2.0 atm) =⇒ (Valve = Closed)
Srikant et al. [49] presented an approach that generalizes the notion of association rules to include both continuous and categorical attributes. Categorical attributes are handled by introducing as many new fields as the number of attribute values. For example, if the attribute Valve has three possible values, say Closed, Released, and Broken, the attribute is replaced by three binary attributes, Valve = Closed, Valve = Released, and Valve = Broken, in the transaction database. A straightforward way to handle continuous attributes is to partition their range of values into discrete intervals. Typical discretization strategies include equal-width partitioning, where the range is divided into equal-sized partitions, and percentile-based (equal-frequency) partitioning, where each interval contains the same number of points. Other discretization methods have also been proposed; for example, Zhang et al. [56] suggested the use of clustering techniques to improve the quality of partitioning quantitative attributes.
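As a rough illustration of these two strategies, the sketch below bins a hypothetical list of salary values using equal-width and equal-frequency (percentile-based) partitioning; the values and the number of intervals are arbitrary, and each resulting (attribute, interval) pair would then be turned into a binary item.

def equal_width(values, k):
    # Divide the range [min, max] into k intervals of equal length.
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [min(int((v - lo) / width), k - 1) for v in values]

def equal_frequency(values, k):
    # Assign roughly the same number of values to each interval.
    ranked = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(ranked):
        bins[i] = min(rank * k // len(values), k - 1)
    return bins

salaries = [22, 25, 31, 44, 58, 61, 75, 90, 120, 150]
print(equal_width(salaries, 4))      # [0, 0, 0, 0, 1, 1, 1, 2, 3, 3]
print(equal_frequency(salaries, 4))  # [0, 0, 0, 1, 1, 2, 2, 2, 3, 3]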
Definition 7 A quantitative association rule is an implication of the form X =⇒ Y, where X ⊂ I_R, Y ⊂ I_R, and X ∩ Y = ∅. The problem of mining quantitative association rules is to discover all association rules that have support and confidence greater than the user-specified minimum support and minimum confidence.
How does discretization affect the quality of an association rule? Srikant et al. [49] described two major difficulties of mining association rules for continuous attributes:
1. If the number of discretized intervals is large, the support of each newly created item can be low. In other words, if the granularity is too fine, rules involving this attribute may fail the minimum support threshold and be missed.
2. If the granularity is too coarse, some rules may fail the minimum confidence threshold. This is also known as the information loss problem [49]. For instance, even though the rule Salary ∈ [0k, 15k] −→ OwnCar = No may have a 95% confidence, the rule Salary ∈ [0k, 100k] −→ OwnCar = No may have a much lower confidence.
To alleviate the first problem, adjacent intervals are merged together to obtain coarser intervals. However, this approach could still suffer from the information loss problem. Srikant et al. [49] defined a concept called the partial completeness measure to quantify the amount of information loss that arises due to discretization. This measure is useful because it can help us determine the right number of intervals needed for a given upper bound on the information loss. Once the range of a continuous attribute has been partitioned, the algorithm can proceed to generate frequent itemsets in the same manner as Apriori.
One potential problem with merging adjacent intervals is that some of the discovered rules could be redundant. For example, the rules R1: Salary ∈ [30k, 50k] −→ OwnCar = No and R2: Salary ∈ [20k, 50k] −→ OwnCar = No may have approximately the same confidence even though the support of R2 is larger. In this case, R1 is a redundant rule and should be eliminated. In [49], a rule r is pruned if
its support or confidence is within ρ times the expected support or confidence of r, where ρ is the minimum interest threshold. The expected support and confidence for r can be computed based on the support and confidence of its closest ancestor rule r′, which is a generalized version of r with a wider discretized interval. For example, suppose r1: AB −→ C and r2: A′B′ −→ C′ is its closest ancestor rule. The expected support for r1 is computed as:

E(s(r1)) = s(r2) × [s(A)/s(A′)] × [s(B)/s(B′)] × [s(C)/s(C′)]

and its expected confidence is:

E(c(r1)) = c(r2) × [s(A)/s(A′)] × [s(B)/s(B′)] × [s(C)/s(C′)].
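The sketch below illustrates this pruning test with made-up supports; it assumes r2 is the closest ancestor of r1 and simply follows the expected-value formulas given above.

# Hypothetical supports for the items of r1 (A, B, C) and of its ancestor r2 (A', B', C').
s = {'A': 0.30, "A'": 0.60, 'B': 0.40, "B'": 0.50, 'C': 0.45, "C'": 0.50}
s_r2, c_r2 = 0.10, 0.70          # support and confidence of the ancestor rule r2 (assumed)
s_r1, c_r1 = 0.035, 0.26         # observed support and confidence of r1 (assumed)
rho = 1.1                        # minimum interest threshold (assumed)

ratio = (s['A'] / s["A'"]) * (s['B'] / s["B'"]) * (s['C'] / s["C'"])
expected_support = s_r2 * ratio
expected_confidence = c_r2 * ratio

# r1 is kept only if its support or confidence exceeds rho times the expected value.
interesting = s_r1 >= rho * expected_support or c_r1 >= rho * expected_confidence
print(expected_support, expected_confidence, not interesting)   # about 0.036, 0.252, True (r1 is pruned)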
4.9.3 Item Taxonomy

Figure 4.21. Example of an item taxonomy (food and electronics products organized into product categories, types, and brands).

There are several reasons why it is useful to incorporate concept hierarchies, such as an item taxonomy, into the association rule discovery problem:
1. Users may be interested in finding rules that span over multiple levels of the
taxonomy.
2. Rules at lower levels of the item taxonomy may not have sufficient support.
3. Item taxonomy can be used to summarize the discovered rules.
Note that one cannot directly infer rules involving items at higher levels of the
taxonomy based on the rules discovered for lower level items. Let us investigate
how the support and confidence measures change as we traverse the item taxonomy
level-wise, in a top-down or bottom-up fashion.
Let G denote a directed acyclic graph representing the set of item taxonomies (Figure 4.21). If there is an edge in G from p to c, we call p a parent of c and c a child of p. For example, milk is the parent of skim milk because there is a directed edge from the node milk to the node skim milk in G. Furthermore, we say X̂ is an ancestor of X if there is a path from X̂ to X in G. The properties of support and confidence at different levels of the taxonomy are as follows:
• If X̂ is the parent of both X1 and X2, then σ(X̂) may not be equal to σ(X1) + σ(X2).
• If σ(X ∪ Y) ≥ s, then σ(X̂ ∪ Y) ≥ s, σ(X ∪ Ŷ) ≥ s, and σ(X̂ ∪ Ŷ) ≥ s. In other words, support never decreases as we move up the item taxonomy.
• If conf (X =⇒ Y ) > α, then only conf (X =⇒ Ŷ ) is guaranteed to be larger
than α. The same cannot be said about the confidence of the rules X̂ =⇒ Y
and X̂ =⇒ Ŷ .
There are several difficulties in applying the standard Apriori-like technique directly to this problem:
1. Items at high levels of the taxonomy would naturally have larger support
counts compared to those at lower levels. Therefore, it may not be sufficient to
apply only a single minimum support threshold for all items in the taxonomy.
However, as noted in the earlier section, using multiple minimum support
thresholds may lose the anti-monotonicity property of the support measure.
2. By introducing item taxonomy, we have increased the dimensionality of the
feature space. This will increase the complexity of the problem as well as
the number of rules that are being generated. The latter problem must be
addressed for efficiency reasons since many of these rules are often redundant,
e.g., the rule 2% Kemps Milk =⇒ Food.
Han et al. [18] and Srikant et al. [48] independently proposed two different algorithms for mining association rules with an item taxonomy. In [18], the algorithm progressively discovers itemsets at different levels of the hierarchy using a top-down, level-wise approach. Initially, all frequent itemsets of size 1 at the top level of the taxonomy are generated. These itemsets are denoted as L(1, 1). Next, using L(1, 1), the algorithm derives all the frequent itemsets of size 2 at level 1 (L(1, 2)). The algorithm for generating these itemsets is basically the same as Apriori. This step is repeated until all the frequent itemsets at the top level, L(1, k) (k > 1), are found. The algorithm then moves on to the next level of the taxonomy. Only the descendants of large itemsets in L(1, 1) are considered as candidates for L(2, 1). The algorithm then proceeds to generate L(2, 2), L(2, 3), · · · sequentially until all level-2 large itemsets are discovered. Subsequently, it progresses to the next level, and so on, until it terminates at the deepest level requested
by the user. The algorithm in [18] uses a special encoding scheme to represent the transactions. Basically, each item in the database is represented by a unique identifier based on the overall item taxonomy. For example, the item 2% Foremost milk can be encoded as 1322, where the leftmost digit '1' represents the food category; the digit '3' represents the product type, milk; the next digit '2' represents the product content, i.e., '2% milk'; and the final digit '2' represents the product brand, 'Foremost'. Using this encoding, all level-1 items are represented as 'd***', level-2 items as 'dd**', and so on. Furthermore, each time L(k, 1) is generated, the transaction database is filtered to remove all level-k items that are infrequent and all transactions that contain only infrequent items. This is possible because the support of a lower-level item can never be larger than that of its ancestors. Nevertheless, the algorithm proposed in [18] discovers only rules involving items that reside at the same level of the hierarchy. Furthermore, the algorithm becomes incomplete, i.e., it may miss some frequent itemsets, if multiple minimum support thresholds are used.
Srikant et al. [48] introduced a more general framework for mining association rules with an item taxonomy. In this framework, association rules may involve items from any level of the taxonomy. Their algorithm works in the following way: initially, each transaction t is replaced by an extended transaction t′, which contains all the items in t as well as the ancestors of the items in t. For example, the transaction t = {scanner, wheat bread} is extended to t′ = {scanner, wheat bread, accessories, computers, electronics, bread, food}, where accessories, computers, and electronics are the ancestors of scanner, while bread and food are the ancestors of wheat bread. We then apply the standard Apriori algorithm to the extended database to obtain rules that span multiple levels of the taxonomy. However, the performance of this algorithm can be rather slow due to the additional number of features. Srikant et al. [48] introduced various optimization methods to improve the performance of the algorithm by removing some of the redundant candidate itemsets. For example, we can prune away any itemset containing an item X along with its ancestor X̂, because the support of the itemset remains unchanged if we remove X̂ from it.
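A minimal sketch of the transaction-extension step is shown below; the parent pointers encode a hypothetical fragment of the taxonomy in Figure 4.21.

parent = {                       # child -> parent edges of the taxonomy (assumed fragment)
    'wheat bread': 'bread', 'bread': 'food',
    'scanner': 'accessories', 'accessories': 'computers', 'computers': 'electronics',
}

def extend(transaction):
    extended = set(transaction)
    for item in transaction:
        while item in parent:    # walk up to the root, adding every ancestor
            item = parent[item]
            extended.add(item)
    return extended

t = {'scanner', 'wheat bread'}
print(extend(t))
# {'scanner', 'accessories', 'computers', 'electronics', 'wheat bread', 'bread', 'food'}
# (printed in arbitrary set order)

# Candidate itemsets containing both an item and one of its ancestors can be pruned,
# since removing the ancestor leaves the itemset's support unchanged.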
4.10 Generalization of Association Patterns
4.10.1 Closed Itemsets
The concept of closed itemsets was proposed independently by Zaki et al. [54] and Pasquier et al. [38]. The idea is to find a minimal generating set, called the set of closed itemsets, from which all association rules and frequent itemsets can be inferred. To illustrate the advantages of using closed itemsets, consider the database shown in Table 4.13. The database contains 10 transactions and 30 items. The items can be divided into three different groups: (A1, A2, · · ·, A10), (B1, B2, · · ·, B10), and (C1, C2, · · ·, C10). Items in each group are perfectly associated with other items in the same group, but are not associated with items from a different group.
If the minimum support threshold is less than 30%, a standard association rule mining algorithm would generate 3 × Σ_{k=1}^{10} (10 choose k) = 3 × (2^10 − 1) = 3069 frequent itemsets. In contrast, there are only 3 frequent closed itemsets: {A1, A2, · · ·, A10}, {B1, B2, · · ·, B10}, and {C1, C2, · · ·, C10}.
Table 4.13. A transaction database for mining closed itemsets.

TID   A1  A2  ...  A10   B1  B2  ...  B10   C1  C2  ...  C10
 1     1   1  ...    1    0   0  ...    0    0   0  ...    0
 2     1   1  ...    1    0   0  ...    0    0   0  ...    0
 3     1   1  ...    1    0   0  ...    0    0   0  ...    0
 4     0   0  ...    0    1   1  ...    1    0   0  ...    0
 5     0   0  ...    0    1   1  ...    1    0   0  ...    0
 6     0   0  ...    0    1   1  ...    1    0   0  ...    0
 7     0   0  ...    0    0   0  ...    0    1   1  ...    1
 8     0   0  ...    0    0   0  ...    0    1   1  ...    1
 9     0   0  ...    0    0   0  ...    0    1   1  ...    1
10     0   0  ...    0    0   0  ...    0    1   1  ...    1
The definition of a closed itemset is given below.
Definition 8 An itemset X is a closed itemset if there exists no itemset X′ such that (1) X′ is a proper superset of X, and (2) all the transactions containing X also contain X′.
In other words, an itemset is closed if none of its supersets has the same support count as the itemset itself. Furthermore, if a closed itemset is frequent, it is called a frequent closed itemset.

TID   Items
 1    {A, B, C}
 2    {A, B, C, D}
 3    {B, C, E}
 4    {A, C, D, E}
 5    {D, E}

Figure 4.22. Example of frequent closed itemsets with a minimum support count equal to 2.
Figure 4.22 illustrates the frequent closed itemsets derived for the transaction database depicted in the same figure. The minimum support count is assumed to be two, i.e., all itemsets whose support count is at least two are denoted as frequent. Each itemset can be associated with the list of transactions that contain it. For example, the itemset BC appears in transactions 1, 2, and 3; thus, it is a frequent itemset. BC is also closed because none of its supersets, ABC, BCD, and BCE, has the same support count as BC. On the other hand, its subset B is not closed because BC has the same support count as B.
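For the small database of Figure 4.22, the frequent closed itemsets can be obtained by the brute-force check sketched below (this enumerates every itemset and is meant only as an illustration of the definition, not as a practical algorithm).

from itertools import combinations

transactions = [{'A','B','C'}, {'A','B','C','D'}, {'B','C','E'},
                {'A','C','D','E'}, {'D','E'}]
minsup_count = 2
items = sorted(set().union(*transactions))

def count(itemset):
    # Support count: number of transactions containing the itemset.
    return sum(1 for t in transactions if itemset <= t)

frequent = [frozenset(c) for k in range(1, len(items) + 1)
            for c in combinations(items, k) if count(set(c)) >= minsup_count]

# A frequent itemset is closed if no proper superset has the same support count.
closed = [x for x in frequent
          if not any(x < y and count(y) == count(x) for y in frequent)]
print(sorted(map(sorted, closed)))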
Finally, it is important to note that all maximal frequent itemsets are contained
within the set of frequent closed itemsets, but not vice-versa.
4.10.2 Dependence Rules
Association rules, as we have seen so far, use support and confidence measures to quantify and filter relationships. A rule of the form X =⇒ y is interesting if it provides two pieces of information: (a) X and y occur together in a sufficiently large number of transactions (support), so that their co-occurrence can be considered statistically significant, and (b) whenever X is present in a transaction, y is also present
in it with high probability (confidence). The confidence of this rule can simply be viewed as the conditional probability of the presence of y given the presence of X. Now, consider an example to critically evaluate the quality of this information.
In a computer store's transaction data, out of 200 buyers of operating systems, 160 bought the Windows operating system and 100 bought the Linux operating system. Of the Linux buyers, 60 also bought the Windows operating system. Now, the rule Linux −→ Windows has a sufficiently large support (30%) and also a sufficiently high confidence (60%). Looking at this rule, one might conclude that whenever Linux is sold, Windows is also sold with a 60% chance. Now, look at it from a different angle. If we knew nothing about a customer, we would expect her to buy Windows with 80% probability. However, the moment we know that a customer has bought Linux, her probability of purchasing Windows goes down from 80% to 60%. This implies that although the rule Linux −→ Windows has sufficient confidence, it is misleading. Here is the reason. Suppose the store owner decides to push Windows sales up by 200 units by sending ads to customers. If he were to follow just the prior probability of Windows, then he would need to mail ads to 200/0.8 = 250 customers. Whereas, if he were to base his decision on the Linux −→ Windows rule, then he would need to increase Linux sales by identifying and attracting 200/0.6 ≈ 333 more Linux customers. Moreover, following the first option, the store owner only needs to identify the profile of his store's generic customers, whereas for the second option he has to identify the profiles of Linux customers, which may be more difficult. On the other hand, if the confidence of the rule Linux −→ Windows were higher than the prior probability of Windows, then following this rule would have made more sense than merely relying on the prior probability.
TID   Items      A   ¬A   B   ¬B   C   ¬C   D   ¬D
1     {A,B}      1    0   1    0   0    1   0    1
2     {A,B,C}    1    0   1    0   1    0   0    1
3     {C}        0    1   0    1   1    0   0    1
4     {B,C}      0    1   1    0   1    0   0    1
5     {B,D}      0    1   1    0   0    1   1    0

Figure 4.23. A naive approach for mining negative associations: the original transactions (left columns) and the transactions augmented with negative items (right columns).
This example stresses the need for more information than the mere conditional probability (confidence) of a rule. The impact of conditioning the presence of some item on the presence of others can be judged by looking at whether the conditional probability is higher than, lower than, or the same as the prior probability of that item. This notion is precisely captured by statistical dependence. In statistics, a set of events {x, y} is said to be independent if their joint probability is the product of their individual probabilities, i.e., P(x, y) = P(x)P(y). In the example above, the events were Product Sold = Linux and Product Sold = Windows. If the events are not independent, we can look at the ratio P(x, y)/(P(x)P(y)). If the ratio is greater than 1, the events are said to have positive dependence, which means that conditioning the presence of either event (x or y) on the presence of the other increases the prior probability of occurrence of that event. This can be seen by rewriting the inequality P(x, y)/(P(x)P(y)) > 1 as P(x, y)/P(x) > P(y). The left-hand side of the last inequality is the confidence (P(y|x)) of the rule x → y, which conditions the presence of y on the presence of x. The right-hand side is the prior probability of y. Similarly, negative dependence means that P(x, y)/(P(x)P(y)) < 1, and can be interpreted as a decrease in the prior probability of either event after conditioning its presence on the other's presence.
Thus, statistical dependence yields possibly more useful information about two events than merely looking at the confidence of a rule. Essentially, statistical dependence combines the prior probabilities and the conditional probabilities into a single measure.
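Restated in code, the numbers from the computer store example give:

n = 200                       # buyers of operating systems
p_windows = 160 / n           # prior probability of Windows, 0.8
p_linux = 100 / n             # prior probability of Linux, 0.5
p_both = 60 / n               # joint probability, 0.3

confidence = p_both / p_linux                   # 0.6
dependence = p_both / (p_linux * p_windows)     # 0.75 < 1: negative dependence
print(confidence, dependence)

# The rule Linux -> Windows has 60% confidence, yet the dependence ratio is below 1,
# so knowing that a customer bought Linux lowers the probability of a Windows purchase.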
4.10.3 Negative Associations
This section describes the different techniques for mining negative associations in
data. A naive approach is to augment the original transaction data with new variables representing the absence of each item from a transaction, as shown in Figure
4.23. This allows existing algorithms such as Apriori to be applied on the modified
data set.
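A sketch of this augmentation for the transactions of Figure 4.23 is shown below; every transaction is rewritten over the positive items and their negations.

items = ['A', 'B', 'C', 'D']
transactions = [{'A','B'}, {'A','B','C'}, {'C'}, {'B','C'}, {'B','D'}]

augmented = []
for t in transactions:
    # Keep the items that occur and add a negated item for every absent one.
    augmented.append({i for i in items if i in t} |
                     {'not-' + i for i in items if i not in t})

for row in augmented:
    print(sorted(row))
# Apriori can now be applied directly to the augmented transactions.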
However, such a simple approach is computationally intractable for sparse transaction data due to the following reasons:
1. Suppose there are d items in the database. The number of possible itemsets, which is equal to 2^d, can be exponentially large even for moderate values of d.
In fact, many of the positive itemsets do not appear in the transaction data
set even though the data set contains millions of transactions. While many
of the positive item combinations are absent, the number of negative itemsets
could be exponentially large.
2. The transaction width increases when negative items are introduced into the data set. Suppose there are d items available, and the transaction width in the original data set is at most k, where k ≪ d. As a result, the maximum size of the frequent (positive) itemsets one can obtain is k. When negative items are included, the width of every transaction increases to d, because an item is either present in or absent from a transaction, but not both, and the maximum size of the frequent itemsets one can obtain is also equal to d. Since the complexity of many frequent itemset generation algorithms depends on the maximum size of the frequent itemsets, these algorithms would break down for such data sets.
3. One may argue that the complexity can be improved if we impose a higher
minimum support threshold, tf , on the modified data set. However, we are
mostly interested in itemsets that contain mixtures of positive and negative
items. If tf is set too high, we could end up finding itemsets involving negative
items only. Therefore, in order to extract itemsets that contain positive and
negative items, the threshold tf must be reasonably low.
In [8], Brin et al. suggested using the χ2 test to discover interesting positive and negative associations. However, applying the χ2 test to frequent itemsets can be computationally expensive. Although the χ2 statistic always increases with the size of an itemset, the statistical significance (p-value) of the test statistic is not a monotonic function, since the number of degrees of freedom also increases with the size of the itemset (the number of degrees of freedom for a k-itemset is 2^k − k − 1 [15]).
A recent study by Boulicaut et al. [7] introduced several alternative solutions to the negative association problem. First, it uses a variant of the inclusion-exclusion principle from Boolean algebra to determine the support of negative or mixed itemsets. For example, the support of a mixed itemset {A, ¬B}, where A is a set of positive items and ¬B denotes the negation of every item in B, is given by:

P(A ∪ ¬B) = P(A) + Σ_{i=1}^{n} Σ_{C ⊆ B, |C| = i} (−1)^i × P(A ∪ C),   where n = |B|        (4.4)
Equation 4.4 is useful only if the supports of A together with the subsets of B are known. This requires that the minimum support threshold be low enough so that A ∪ B is frequent. Otherwise, one must rely on an approximate technique in which the supports of B and of the subsets of B are ignored when using Equation 4.4.
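The sketch below evaluates Equation 4.4 by brute force on the small database of Figure 4.23; the supports are computed directly from the transactions, so it only serves to illustrate the inclusion-exclusion expansion.

from itertools import combinations

transactions = [{'A','B'}, {'A','B','C'}, {'C'}, {'B','C'}, {'B','D'}]

def support(itemset):
    return sum(1 for t in transactions if set(itemset) <= t) / len(transactions)

def mixed_support(A, B):
    # Support of transactions that contain every item of A and no item of B.
    total = 0.0
    for k in range(len(B) + 1):
        for C in combinations(B, k):
            total += (-1) ** k * support(set(A) | set(C))
    return total

print(mixed_support({'A'}, {'C'}))   # P(A and not C) = 0.2
print(mixed_support({'B'}, {'A'}))   # P(B and not A) = 0.4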
Figure 4.24. Savasere's approach for mining negative associations: given an item taxonomy, if {C, G} is a frequent itemset, the expected support for itemsets related to C and G can be computed from the taxonomy.
In [44], Savasere et al. proposed using domain knowledge, in the form of item
taxonomy, to decide what constitutes an interesting negative association. Their
basic assumption was that items from the same product family are expected to
have similar types of interaction with other items. For example, since Coke and
Pepsi belong to the same product category, we expect the association between Coke
and Ruffles to be somewhat similar to the association between Pepsi and Ruffles.
Specifically, Savasere’s approach uses the item taxonomy to determine the expected
support of an itemset (Figure 4.24). For example, if Coke and Ruffles are frequent,
then we can compute the expected support between Pepsi and Ruffles using the
second equation shown in Figure 4.24. If the actual support for Pepsi and Ruffles
is considerably lower than their expected value, then Savasere’s approach suggests
that there is an interesting negative association between Pepsi and Ruffles. Although
this technique is intuitively appealing, it has several limitations. First, it assumes
that an item taxonomy is available, which makes it difficult to apply the technique
beyond the market-basket domain. Second, it discovers negative associations by
computing the expected support of itemsets using the immediate parent-child or
sibling relationships in the item taxonomy. It does not infer the expected support
for itemsets that are not related by immediate parent-child or sibling relationships.
In summary, mining negative associations is a challenging problem especially for
sparse transaction data. The existing techniques for mining negative associations
are either too expensive, produce too many uninteresting patterns, or require the
use of domain knowledge to extract interesting patterns.
Figure 4.25. Indirect association between a pair of items a and b via a mediator set Y = {y1, y2, . . . , yk}.
4.10.4 Indirect Associations
Association rule mining discovers rules involving items that occur frequently together in the transaction database. Any itemset that fails the support threshold criterion is automatically removed. This is justifiable when we are only interested in finding items that are directly associated with each other. However, under certain situations, some of the low-support itemsets can be interesting due to the presence of higher-order dependencies among the items (direct associations between items can be considered first-order dependencies). Consider a pair of items, (a, b), having a low joint support value (Figure 4.25). Let Y be a set containing items that are highly dependent on both a and b. The figure illustrates an indirect association between a and b via a mediator set, Y. Unlike a direct association, an indirect relationship is characterized not only by the two participants, a and b, but also by the items mediating this interaction.
One can think of many potential applications for indirect association. In the market-basket scenario, a and b may represent competing products. Analysts may be interested in finding the common products (Y) bought together by customers who purchased the two competing products. With this information, they can then perform an in-depth analysis of why customers prefer one brand over the other, based on the strength of the dependencies of Y with a and b. Another potential application for indirect association is in textual data mining. For text documents, indirectly associated words may represent synonyms, antonyms, or terms that appear in different contexts of another word. For example, in a collection of news articles, the words growth and decline may be indirectly associated via the word economy. As another example, the pair soviet and worker may refer to the different contexts in which the word union is being used. For stock market data, a and b may refer to disjoint events that partition an event Y. For example, the event Y = Microsoft-Up can be partitioned into a = Redhat-Down and b = Intel-Up,
where a denotes the event where the competitor’s stock value is down whereas b may
represent the event in which most of the technology stocks for large corporations
are up.
Tan et al. [50] proposed a general framework for defining indirect associations in various application domains. Their formal definition of indirect association is:
Definition 9 The pair (a, b) is indirectly associated via a mediator Y if the following conditions hold:
1. σ(a, b) < ts (Itempair Support condition).
2. ∃ Y ≠ ∅ such that ∀ yi ∈ Y:
(a) sup(a, yi) ≥ tf and sup(b, yi) ≥ tf (Mediator Support condition).
(b) d(a, yi) ≥ td and d(b, yi) ≥ td, where d(p, q) is a measure of the dependence between p and q (Dependence condition).
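The three conditions of Definition 9 can be checked as in the sketch below; the support and dependence functions, the thresholds, and the toy values are placeholders rather than part of the definition.

def is_indirect(a, b, Y, sup_pair, dep, ts, tf, td):
    # sup_pair(x, y): joint support of x and y; dep(x, y): dependence measure.
    if sup_pair(a, b) >= ts:          # itempair support condition fails
        return False
    if not Y:                         # the mediator set must be non-empty
        return False
    return all(sup_pair(a, y) >= tf and sup_pair(b, y) >= tf and
               dep(a, y) >= td and dep(b, y) >= td for y in Y)

# Toy check with hand-set values (purely illustrative).
sup = {('a', 'b'): 0.01, ('a', 'y1'): 0.20, ('b', 'y1'): 0.25}
dep = {('a', 'y1'): 0.8, ('b', 'y1'): 0.9}
print(is_indirect('a', 'b', ['y1'],
                  lambda x, y: sup[(x, y)], lambda x, y: dep[(x, y)],
                  ts=0.05, tf=0.1, td=0.5))    # True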
In order to discover indirectly associated items, one must address the following questions:
1. What is a good measure of dependence between items?
2. What constitutes the elements of the mediator?
3. How to define the overall indirect association measure between a and b?
The first question is related to the issue of finding good interestingness measures
for itemsets (or association rules) which was described in Section 4.8. The second
question deals with the problem of defining mediating elements either as single
items, single itemsets or multiple itemsets. The last question allows one to rank
the discovered itempairs according to the strength of their indirect association. [50]
described several ways to tackle these issues and presented a simple algorithm for
mining indirect itempairs.
4.10.5 Frequent Subgraphs
As the field of association rule mining continues to mature, there is an increasing need to apply similar techniques to other disciplines, such as Web mining, bioinformatics, and computational chemistry. The data for these domains is often available in a graph-based format, as described in Chapter 2. For example, in the Web mining domain, the vertices would correspond to Web pages and the edges would correspond to hyperlinks traversed by Web users. In the computational chemistry domain, the vertices would correspond to atoms or ions while the edges would correspond to their chemical bonds. Mining associations from these kinds of data is more challenging for the following reasons:
1. Support and confidence thresholds are not the only requirements the patterns
must satisfy. The structure of the data may restrict the type of patterns that
can be generated by the mining algorithm. Specifically, additional constraints
can be imposed regarding the connectivity of the subgraph, i.e., the frequent
subgraphs must be connected graphs.
2. In the standard market-basket analysis, each entity (item or discretized attribute) is treated as a binary variable. (Although the quantity of an item
bought in a transaction can be larger than one, it is often recorded as 1 in
the record-based format.) For graph-based data, an entity (vertex, event, or
item) can appear more than once within the same data object. These entities
cannot be combined into a single entity without losing the structure information. The multiplicity of these entities poses additional challenges to the
mining problem.
As a result, the standard Apriori or FP-tree algorithms must be modified in order to mine frequent substructures efficiently from a collection of tree or graph objects. In this section, we briefly describe the modifications needed and how they can be implemented effectively.
Problem Formulation
Let V = {v1, v2, · · ·, vk} be the set of vertices and E = {eij = (vi, vj) | vi, vj ∈ V} be the set of edges connecting pairs of vertices. Each vertex v is associated with a vertex label denoted by l(v). We will use the notation L(V) = {l(v) | v ∈ V} to represent the set of all vertex labels in the data set. Similarly, each edge e is associated with an edge label denoted by l(e). We will use the notation L(E) = {l(e) | e ∈ E} to represent the set of all edge labels in the data set.
A labeled graph G is defined as a collection of vertices and edges with labels attached to each of them, i.e., G = (V, E, L(V), L(E)). A graph G′ = (V′, E′, L(V′), L(E′)) is a subgraph of G if its vertices V′ and edges E′ are subsets of V and E, respectively. G′ is called an induced subgraph of G if it satisfies the subgraph requirements along with the additional condition that all the edges of G connecting vertices in V′ are also present in E′. Figure 4.26(a) illustrates an example of a labeled graph G that contains 6 vertices and 11 edges, with vertex labels drawn from the set L(V) = {a, b, c} and edge labels drawn from the set L(E) = {p, q, r, s, t}. A subgraph of G that contains only 4 vertices and 4 edges is shown in Figure 4.26(b). It is not an induced subgraph because it does not include all the edges of G involving the subgraph vertices. Instead, the proper induced subgraph of G on these vertices is shown in Figure 4.26(c).

Figure 4.26. Graph, subgraph, and induced subgraph definitions.
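The sketch below shows one possible representation of a labeled graph and the extraction of an induced subgraph; the vertex and edge labels are made up and do not reproduce Figure 4.26.

vertex_label = {1: 'a', 2: 'a', 3: 'b', 4: 'c'}
edge_label = {(1, 2): 'p', (1, 3): 'q', (2, 3): 'r', (3, 4): 'p', (2, 4): 's'}

def induced_subgraph(vertices):
    # Keep the chosen vertices and every edge of G that joins two of them.
    vs = set(vertices)
    v_lab = {v: vertex_label[v] for v in vs}
    e_lab = {e: l for e, l in edge_label.items() if e[0] in vs and e[1] in vs}
    return v_lab, e_lab

print(induced_subgraph({1, 2, 3}))
# ({1: 'a', 2: 'a', 3: 'b'}, {(1, 2): 'p', (1, 3): 'q', (2, 3): 'r'})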
There are many data sets that can be modelled by using the graph representation. For instance, a market-basket transaction can be represented as a graph, with
vertices that correspond to each item and edges that correspond to the relationships
between items. Furthermore, each transaction is essentially a clique because every
item is associated with every other item in the same transaction. The problem of
4.10.5 Frequent Subgraphs
mining frequent itemsets is then equivalent to finding frequent subgraphs among the collection of cliques. Although the graph representation is more expressive, it is much easier to work with the standard market-basket representation because we do not have to worry about the structural constraints.
If the graphs are not cliques, we can still map the problem of finding frequent subgraphs into the standard market-basket representation as long as all the edge and vertex labels within a graph are unique. In this case, each graph can be regarded as a transaction and each triplet of vertex and edge labels t = (l(vi), l(vj), l(eij)) corresponds to an item, as illustrated in Figure 4.27. Unfortunately, most real applications do not contain graphs with unique labels, thus requiring more advanced methods to obtain the frequent subgraphs.
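Assuming unique vertex labels within each graph, the mapping can be sketched as follows; the two toy graphs below are illustrative and are not the graphs of Figure 4.27.

graphs = [
    {('a', 'b'): 'p', ('b', 'c'): 'r', ('a', 'e'): 'q'},   # G1 (made-up edges)
    {('a', 'b'): 'p', ('c', 'd'): 'r'},                    # G2
]

def to_transaction(graph):
    items = set()
    for (u, v), label in graph.items():
        u, v = sorted((u, v))          # order the endpoints so identical triplets match
        items.add((u, v, label))
    return items

transactions = [to_transaction(g) for g in graphs]
print(transactions)
# Standard frequent itemset mining can now be applied to these transactions.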
      (a,b,p)  (a,b,q)  (a,b,r)  (b,c,p)  (b,c,q)  (b,c,r)  · · ·  (d,e,r)
G1       1        0        0        0        0        1     · · ·     0
G2       1        0        0        0        0        0     · · ·     0
G3       0        0        1        1        0        0     · · ·     0

Figure 4.27. Mapping a collection of graph structures into market-basket transactions.
Given a set of graphs G, the support of a subgraph g is defined as:

s(g) = |{Gi | g ⊆s Gi ∧ Gi ∈ G}| / |G|        (4.5)

where ⊆s denotes the subgraph relation. This definition of the support measure also satisfies the anti-monotonicity property. The goal of frequent subgraph mining is to find all subgraphs whose support is greater than some minimum support threshold, minsup. In some cases, we may also be interested in finding association rules for the frequent subgraphs. The confidence of a graph association rule g1 −→ g2 can be defined as:

c(g1 −→ g2) = |{G | g1 ∪ g2 ⊆s G ∧ G ∈ G}| / |{G′ | g1 ⊆s G′ ∧ G′ ∈ G}|        (4.6)
where g1 ∪ g2 denotes any graph containing both g1 and g2 as its subgraphs.
Mining Frequent Subgraphs
The idea of using frequent pattern discovery algorithms such as Apriori for mining frequent subgraphs was initially proposed by Inokuchi et al. in [22]. There are several issues one has to consider when mining frequent subgraphs:
Subgraph growing: Most of the standard frequent itemset generation algorithms work in a bottom-up fashion, where the frequent itemsets of size k are used to enumerate the frequent itemsets of size k + 1. During the frequent itemset generation step, the itemset lattice (search space) can be traversed in a breadth-first or depth-first manner. The same approach can also be applied to the frequent subgraph generation problem. However, unlike frequent itemset mining, k may refer to either the number of vertices or the number of edges in the subgraphs. If the algorithm enumerates frequent subgraphs one vertex at a time, the approach is called vertex growing. On the other hand, if the algorithm enumerates the frequent subgraphs one edge at a time, the approach is called edge growing:
1. Vertex growing has a meaningful interpretation because each iteration is equivalent to increasing the dimension of the corresponding k × k adjacency matrix by one, as illustrated in Figure 4.28. G1 and G2 are two graphs whose adjacency matrices are given by M(G1) and M(G2), respectively. This approach was used by the AGM algorithm proposed by Inokuchi et al. in [22]. Since the subgraphs are grown one vertex at a time, the algorithm finds only the frequent induced subgraphs. The drawback of this approach is that it can be computationally expensive due to the large number of candidate subgraphs enumerated by the algorithm. For example, in Figure 4.28, when the graphs G1 and G2 are joined together, the resulting graph G3 has a new edge connecting the vertices d and e. However, the label of this new edge can be arbitrary, which increases the number of candidate subgraphs dramatically.
Figure 4.28. Vertex growing approach: G3 = join(G1, G2).
2. Edge growing was initially proposed by Kuramochi et al. [28] to enable a more efficient candidate generation strategy. At each iteration, a new edge is added to the subgraph, which may or may not increase the number of vertices in the initial subgraph, as illustrated in Figure 4.29. The number of candidate subgraphs generated using this approach is much smaller than the number generated using the vertex-growing approach. Kuramochi et al. [28] also used the more efficient sparse graph representation instead of the adjacency matrix representation. In addition, the edge-growing approach tends to produce more general patterns because it can discover both frequent subgraphs and frequent induced subgraphs.
Figure 4.29. Edge growing approach: joining G1 and G2 can produce more than one candidate, G3 = join(G1, G2) and G4 = join(G1, G2).
Figure 4.30. Graph isomorphism: two topologically identical graphs with vertex labels A and B.
Graph isomorphism: A key task in frequent itemset discovery is to determine
whether an itemset is a subset of a transaction. Likewise, in frequent subgraph
mining, an important task is to determine whether a graph is a subgraph of
another graph. This requires a graph matching routine that can determine
whether a subgraph is topologically identical (isomorphic) to another graph.
For example, Figure 4.30 illustrates two graphs that are isomorphic to each other because there is a one-to-one mapping from the vertices of the first graph to the vertices of the second graph that preserves the connections between the vertices.
Graph isomorphism is a complicated problem, especially when many of the candidate subgraphs are topologically equivalent. If two graphs are isomorphic to each other, their adjacency matrices are simply row and/or column permutations of each other. If the labels on the vertices are unique, the graph isomorphism problem can be solved in polynomial time. However, if the labels of the edges and vertices are not unique, the complexity of the graph isomorphism problem is yet to be determined (i.e., it is still an open problem whether graph isomorphism is NP-complete or solvable in polynomial time). In Apriori-like algorithms such as AGM [22] and FSG [28], graph isomorphism presents the biggest challenge to frequent subgraph discovery because it is needed in both the candidate generation and candidate counting steps:
1. It is needed during the join operation to check whether a candidate subgraph has already been generated.
2. It is needed during candidate pruning to check whether each of its (k − 1)-subgraphs matches one of the previously discovered frequent subgraphs.
3. It is needed during candidate counting to check whether a candidate subgraph is contained within another graph.
Figure 4.31. Adjacency matrix and string encoding of a graph: two different vertex orderings of the same graph yield the codes 1100111000010010010100001011 and 1011010010100000100110001110.
Most frequent subgraph discovery algorithms handle the graph isomorphism problem by using a technique known as canonical labeling, where each graph or subgraph is mapped into an ordered string representation (otherwise known as its code) in such a way that two isomorphic graphs are mapped into the same canonical encoding. If a graph can have more than one representation, its canonical encoding is usually chosen to be the code with the lowest precedence. For example, Figure 4.31 illustrates two different ways to represent the same graph. (The graph has more than one representation because its vertex labels are not unique.) The code for each representation is obtained by a column-wise concatenation of the elements in the upper triangular portion of the adjacency matrix. The canonical labeling of the graph is obtained by selecting the code that has the lowest precedence; e.g., the second code in Figure 4.31 has a lower precedence than the first.
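The sketch below builds such a code for a small graph by brute force, taking the lexicographically smallest string over all vertex orderings as the canonical form; practical canonical labeling schemes avoid enumerating all n! orderings, so this is only meant to illustrate the upper-triangle encoding.

from itertools import permutations

n = 4
edges = {(0, 1), (1, 2), (2, 3), (0, 3)}     # a 4-cycle (made-up example)
adj = [[1 if (i, j) in edges or (j, i) in edges else 0 for j in range(n)]
       for i in range(n)]

def code(order):
    # Column-wise concatenation of the upper triangular part under the given vertex order.
    return ''.join(str(adj[order[i]][order[j]])
                   for j in range(n) for i in range(j))

canonical = min(code(p) for p in permutations(range(n)))
print(canonical)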
Multiplicity of Candidate Joining: In Apriori-like algorithms, candidate patterns are generated by joining together frequent patterns found in the earlier passes of the algorithm. For example, in frequent itemset discovery, a candidate k-itemset is generated by joining two frequent itemsets of size k − 1 that have k − 2 items in common. Similarly, in frequent subgraph mining, a candidate k-subgraph is generated by joining together two frequent subgraphs of size k − 1 (where k may correspond to the number of vertices or edges of the graph). New candidates are created as long as both (k − 1)-subgraphs contain a common (k − 2)-subgraph, known as their core. However, unlike frequent itemset discovery, each join operation in frequent subgraph mining may produce more than one candidate subgraph, as illustrated in Figures 4.28 and 4.29. For the edge-growing approach, Kuramochi et al. [28] identified three different scenarios in which multiple candidates can be generated:
1. When joining two subgraphs with identical vertex labels, as illustrated in Figure 4.32(a). Multiple candidates are generated because there is more than one way to arrange the edges and vertices of the combined subgraph.
2. When joining two subgraphs whose core contains identical vertex labels,
as illustrated in Figure 4.32(b). In this example, the core contains vertices
whose labels are equal to “a”.
3. When the joined subgraphs contain more than one core, as illustrated in
Figure 4.32(c). The shaded vertices in this figure correspond to vertices
that form the core of the join operation.
As a result, the number of candidate subgraphs generated can be prohibitively large. Numerous algorithms have been proposed to reduce the number of candidate subgraphs generated by frequent subgraph mining algorithms. For example, the FSG algorithm [28] uses various vertex invariants, which are structural constraints imposed on the candidate subgraphs, to restrict the number of admissible candidates. A more recent algorithm, called gSpan [52], was proposed by Yan et al. to generate frequent subgraphs without enumerating any infrequent candidates. The algorithm uses a canonical labeling scheme called the minimum DFS code to represent the subgraphs. By encoding the graphs according to their minimum DFS codes, the authors showed that the frequent subgraph mining problem can be turned into a sequential pattern mining problem, for which an efficient algorithm that does not require candidate generation can be developed [39].
4.10.5 Frequent Subgraphs
a
a
a
+
e
b
c
e
b
c
e
b
a
c
e
e
b
c
(a)
b
c
a
a
a
a
a
b
a
a
a
a
+
c
a
a
b
a
a
a
c
a
a
a
c
a
a
a
b
(b)
a
b
a
b
a
a
a
a
a
b
a
a
b
a
a
b
a
a
+
a
b
a
a
a
b
a
a
(c)
Figure 4.32. Multiplicity of Candidate Joining.
186
Bibliography
[1] R. C. Agarwal, C. Aggarwal, and V. V. V. Prasad. A tree projection algorithm for generation of frequent itemsets. Journal of Parallel and Distributed
Computing (Special Issue on High Performance Data Mining), (Accepted for
Publication) 2000.
[2] C. Aggarwal and P. Yu. A new framework for itemset generation. In Proc. of
the 17th Symposium on Principles of Database Systems, pages 18–24, Seattle,
WA, June 1998.
[3] R. Agrawal, T. Imielinski, and A. Swami. Database mining: a performance
perspective. IEEE Transactions on Knowledge and Data Engineering, 5:914–
925, 1993.
[4] R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets
of items in large databases. In Proc. ACM SIGMOD Intl. Conf. Management
of Data, pages 207–216, Washington D.C., USA, 1993.
[5] R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In
Proc. of the 20th VLDB Conference, pages 487–499, Santiago, Chile, 1994.
[6] A. Amir, R. Feldman, and R. Kashi. A new and versatile method for association generation. In H. J. Komorowski and J. M. Zytkow, editors, Proc of
Principles of Data Mining and Knowledge Discovery, First European Symposium (PKDD’97), volume 1263, pages 221–231. Springer-Verlag, Trondheim,
Norway, June 1997.
[7] J.-F. Boulicaut, A. Bykowski, and B. Jeudy. Towards the tractable discovery
of association rules with negations. In Proc of the Fourth Int’l Conf on Flexible
Query Answering Systems FQAS’00, pages 425–434, Warsaw, Poland, October
2000.
[8] S. Brin, R. Motwani, and C. Silverstein. Beyond market baskets: Generalizing
association rules to correlations. In Proc. ACM SIGMOD Intl. Conf. Management of Data, pages 265–276, Tucson, AZ, 1997.
[9] S. Brin, R. Motwani, J. Ullman, and S. Tsur. Dynamic itemset counting and
implication rules for market basket data. In Proc. of 1997 ACM-SIGMOD Int.
Conf. on Management of Data, pages 255–264, Tucson, AZ, June 1997.
[10] V. Cherkassky and F. Mulier. Learning From Data: Concepts, Theory, and
Methods. Wiley Interscience, 1998.
[11] P. Clark and T. Niblett. The CN2 induction algorithm. Machine Learning,
3(4):261–283, 1989.
[12] R. Cooley, P. Tan, and J. Srivastava. Discovery of interesting usage patterns
from web data. In M. Spiliopoulou and B. Masand, editors, Advances in Web
Usage Analysis and User Profiling, volume 1836, pages 163–182. Lecture Notes
in Computer Science, 2000.
[13] P. Domingos. The role of Occam's razor in knowledge discovery. Data Mining
and Knowledge Discovery, 3(4):409–425, 1999.
[14] G. Dong and J. Li. Interestingness of discovered association rules in terms of
neighborhood-based unexpectedness. In Proceedings of the Second Pacific-Asia
Conference on Knowledge Discovery and Data Mining (PAKDD’98), pages 72–
86, Melbourne, Australia, April 1998.
[15] W. duMouchel and D. Pregibon. Empirical Bayes screening for multi-item
associations. In Proc. of the Seventh Int’l Conference on Knowledge Discovery
and Data Mining, pages 67–76, San Francisco, CA, August 2001.
[16] A. George and W. Liu. Computer Solution of Large Sparse Positive Definite
Systems. Prentice-Hall series in computational mathematics, 1981.
[17] D. Hand, H. Mannila, and P. Smyth. Principles of Data Mining. MIT Press,
2001.
[18] J. Han and Y. Fu. Discovery of multiple-level association rules from large
databases. In VLDB95, pages 420–431, Zurich, Switzerland, 1995.
[19] J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. In Proc. ACM-SIGMOD Int. Conf. on Management of Data (SIGMOD’00), Dallas, TX, May 2000.
[20] R. Hilderman and H. Hamilton. Evaluation of interestingness measures for
ranking discovered knowledge. In Proc. of the 5th Pacific-Asia Conference
on Knowledge Discovery and Data Mining (PAKDD’01), pages 247–259, Hong
Kong, April 2001.
[21] E. Hunt, J. Marin, and P. Stone. Experiments in Induction. Academic Press,
New York, 1966.
[22] A. Inokuchi, T. Washio, and H. Motoda. An apriori-based algorithm for mining
frequent substructures from graph data. In Proc. of The Fourth European
Conference on Principles and Practice of Knowledge Discovery in Databases,
Lyon, France, 2000.
[23] D. Jensen and P. Cohen. Multiple comparisons in induction algorithms. Machine Learning, 38(3):309–338, March 2000.
[24] G. John, R. Kohavi, and K. Pfleger. Irrelevant features and the subset selection
problem. In Machine Learning: Proc. of the Eleventh International Conference,
pages 121–129, San Francisco, CA, 1994.
[25] M. Kamber and R. Shinghal. Evaluating the interestingness of characteristic
rules. In Proc. of the Second Int’l Conference on Knowledge Discovery and
Data Mining, pages 263–266, Portland, Oregon, 1996.
[26] M. Klemettinen, H. Mannila, P. Ronkainen, T. Toivonen, and A. Verkamo.
Finding interesting rules from large sets of discovered association rules. In
Proc. 3rd Int. Conf. Information and Knowledge Management, pages 401–408,
Gaithersburg, Maryland, Nov 1994.
[27] R. Kohavi. A study on cross-validation and bootstrap for accuracy estimation and model selection. In Proc of the Int’l Joint Conference on Artificial
Intelligence (IJCAI’95), 1995.
[28] M. Kuramochi and G. Karypis. Frequent subgraph discovery. In Proc. of the
2001 IEEE International Conference on Data Mining, pages 313–320, San Jose,
CA, November 2001.
[29] D. Lin and Z. Kedem. Pincer-search: An efficient algorithm for discovering the
maximum frequent set. IEEE Transactions on Knowledge and Data Engineering, 14(3):553–566, 2002.
[30] B. Liu, W. Hsu, and S. Chen. Using general impressions to analyze discovered classification rules. In Proc. of the Third Int’l Conference on Knowledge
Discovery and Data Mining, pages 31–36, 1997.
[31] B. Liu, W. Hsu, and Y. Ma. Mining association rules with multiple minimum
supports. In Proc. of the Fifth Int’l Conference on Knowledge Discovery and
Data Mining, pages 125–134, 1999.
[32] B. Liu, W. Hsu, and Y. Ma. Pruning and summarizing the discovered associations. In Proc. of the Fifth Int’l Conference on Knowledge Discovery and Data
Mining, pages 125–134, San Diego, CA, 1999.
[33] J. Major and J. Mangano. Selecting among rules induced from a hurricane
database. In Proceedings of AAAI Workshop on Knowledge Discovery in
Databases, pages 30–31, 1993.
[34] T. Mitchell. Machine Learning. McGraw-Hill, 1997.
[35] F. Mosteller. Association and estimation in contingency tables. Journal of the
American Statistical Association, 63:1–28, 1968.
[36] A. Mueller. Fast sequential and parallel algorithms for association rule mining:
A comparison. Technical Report CS-TR-3515, University of Maryland, August
1995.
[37] J. Park, M. Chen, and P. Yu. An effective hash-based algorithm for mining
association rules. SIGMOD Record, 25(2):175–186, 1995.
[38] N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering frequent closed
itemsets for association rules. In 7th Int’l Conf. on Database Theory, Jan 1999.
[39] J. Pei, J. Han, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M. Hsu. PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth.
In Proc of Int’l Conf. on Data Engineering (ICDE’01), Heidelberg, Germany,
April 2001.
[40] G. Piatetsky-Shapiro and C. Matheus. The interestingness of deviations. pages
25–36, July 1994.
[41] G. Piatetsky-Shapiro. Discovery, analysis and presentation of strong rules.
In G. Piatetsky-Shapiro and W. Frawley, editors, Knowledge Discovery in
Databases, pages 229–248. MIT Press, Cambridge, MA, 1991.
[42] J. Quinlan. C4.5: Programs for Machine Learning. Morgan-Kaufmann Publishers, 1993.
[43] A. Savasere, E. Omiecinski, and S. Navathe. An efficient algorithm for mining
association rules in large databases. In Proc. of the 21st Int. Conf. on Very
Large Databases (VLDB‘95), Zurich, Switzerland, Sept 1995.
[44] A. Savasere, E. Omiecinski, and S. Navathe. Mining for strong negative associations in a large database of customer transactions. In Proc. of the 14th International Conference on Data Engineering, pages 494–502, Orlando, Florida,
February 1998.
[45] P. Shenoy, J. Haritsa, S. Sudarshan, G. Bhalotia, M. Bawa, and D. Shah. Turbocharging vertical mining of large databases. In Proc. of 2000 ACM-SIGMOD
Int. Conf. on Management of Data, Dallas, TX, May 2000.
[46] A. Silberschatz and A. Tuzhilin. What makes patterns interesting in knowledge
discovery systems. IEEE Trans. on Knowledge and Data Engineering, 8(6):970–
974, 1996.
[47] C. Silverstein, S. Brin, and R. Motwani. Beyond market baskets: Generalizing
association rules to dependence rules. Data Mining and Knowledge Discovery,
2(1):39–68, 1998.
[48] R. Srikant and R. Agrawal. Mining generalized association rules. In 21st Very
Large Database Conference, pages 407–419, Zurich, Switzerland, 1995.
[49] R. Srikant and R. Agrawal. Mining quantitative association rules in large relational tables. In Proc. of 1996 ACM-SIGMOD Int. Conf. on Management of
Data, pages 1–12, Montreal, Canada, 1996.
[50] P. Tan, V. Kumar, and J. Srivastava. Indirect association: Mining higher order
dependencies in data. In Proc. of the 4th European Conference on Principles and Practice of Knowledge Discovery in Databases, pages 632–637, Lyon,
France, 2000.
[51] H. Toivonen. Sampling large databases for association rules. In Proc. of the
22nd VLDB Conference, pages 134–145, Bombay, India, 1996.
[52] X. Yan and J. Han. gSpan: Graph-based substructure pattern mining. In Proc
of the IEEE Int’l Conf. on Data Mining (ICDM’02), Maebashi City, Japan,
December 2002.
[53] M. Zaki and K. Gouda. Fast vertical mining using diffsets. Technical Report
TR-01-1, Department of Computer Science, Rensselaer Polytechnic Institute,
2001.
[54] M. Zaki and M. Ogihara. Theoretical foundations of association rules. In
Proc. of 3rd SIGMOD’98 Workshop on Research Issues in Data Mining and
Knowledge Discovery (DMKD’98), Seattle, WA, June 1998.
[55] M. Zaki, S. Parthasarathy, M. Ogihara, and W. Li. New algorithms for fast discovery of association rules. In Proc. of the Third Int’l Conference on Knowledge
Discovery and Data Mining, 1997.
[56] Z. Zhang, Y. Lu, and B. Zhang. An effective partitioning-combining algorithm
for discovering quantitative association rules. In Proc. of the First Pacific-Asia
Conference on Knowledge Discovery and Data Mining, 1997.