Proceedings of the First Workshop on Knowledge Economy and Electronic Commerce
An Algorithm To Discover Time-Interval Sequential Patterns
In Sequence Databases
Y.L. Chen, M.C. Chiang and M. T. Kao
Department of Information Management
National Central University
E-mail:[email protected]
Abstract
Sequential pattern mining discovers frequently occurring subsequences hidden in a
sequence database. Although conventional sequential patterns tell us the order of the
items, they do not tell us how much time elapses before the next item occurs; that is, they
do not provide the time intervals between successive items. This paper therefore studies the
mining of sequential patterns that take time intervals into consideration, which we call
time-interval sequential patterns. An efficient algorithm is developed to mine
time-interval sequential patterns by modifying the traditional Apriori algorithm.
Keywords: Sequential patterns; Sequence data; Data mining; Time interval
1. Introduction
Data mining extracts implicit, previously unknown and potentially useful information from
databases [5, 9]. The discovered knowledge is useful for various applications, such as
market analysis, decision support, flaw detection and business management. Because of its importance,
many approaches have been proposed to extract such information, and mining sequential patterns is one of
the most important. This approach was introduced in the mid-1990s, and its goal is to discover
frequently occurring patterns in a sequence database [4, 11]. A typical sequential
pattern is that a customer who has bought a computer comes back later to buy a scanner and
a microphone. Although this kind of pattern tells us the order of the items, it does not
tell us how much time elapses before the next item occurs; that is, it does not
provide the time intervals between successive items. This paper therefore addresses a new problem: finding
sequential patterns with time intervals. We propose a new kind of pattern, named "the time-interval sequential
pattern," which tells us not only the order of the items but also the time intervals between
successive items. Examples of time-interval sequential patterns are: (a) having bought
a laser printer, a customer comes back to buy a scanner in three months and then a CD burner in six
months; (b) a customer revisits website A within a week; (c) after operation X, the patient is very
likely to be infected by virus Y in two weeks.
The time-interval sequential pattern provides more valuable information than the
traditional sequential pattern. Take the retailing business as an example: with the help of
time-interval sequential patterns, the retailer learns not only the habits, interests and needs of his
customers but also the timing of their purchases. As a result, the retailer can mail the most suitable
catalogues to different types of customers at the right time, or determine when to place orders,
for which products and in what quantities, so that future demand can be satisfied.
The time-interval sequential pattern thus allows a business to provide the right products and
services to the right customers at the right time.
Beyond retailing data, time-interval sequential
patterns can also be mined from many other kinds of data, such as the criminal records of a police
department, the traveler records of a travel agency, the diagnosis records of a hospital, and other
business records. In all these applications, the discovered time-interval sequential patterns can provide
useful information for decision-making.
The goal of this paper is to propose an algorithm for mining time-interval sequential patterns from
sequence data. In Section 2, we review the previous research results concerning mining sequential
patterns, and we also discuss some of its potential applications. In Section 3, we formally define the
problem and define what the time-interval sequential pattern is. Following that, Section 4 gives an
algorithm to find time-interval sequential patterns by modifying the traditional Apriori algorithm.
Finally, our conclusion is given in Section 5.
2. Previous Research
The problem of mining sequential patterns was first introduced by Agrawal and Srikant [4] and
can be described as follows. We are given a sequence database, which is a set of
data sequences. Each data sequence is a series of transactions ordered by transaction time.
The goal is to find all subsequences whose frequency of occurrence reaches the minimum support
threshold. After Agrawal and Srikant [4], a number of researchers proposed more efficient algorithms
for this problem [14, 19, 20, 22]. Others extended sequential pattern mining to analyze
other time-related patterns. Important work in this category includes finding frequent episodes
in event sequences [17], finding cyclic patterns in time-stamped transaction databases [12, 13, 16],
finding similar patterns in time-series databases [2, 8, 15], finding traversal patterns in web logs [6, 7,
18], and finding sequential alarm patterns in a telecommunication database [21].
Previous research has dealt with time intervals in one of two ways: (a)
the time-window approach, or (b) ignoring the time interval completely. In the time-window
approach, the length of the time window must be specified in advance. A sequential pattern
mined from the database is then a sequence of windows, where each window contains a set of items.
Items that appear in the same time window occur in the same time period. For
example, [17] requires the window width (win) to be specified in order to find the episodes that are
frequent in event sequences. That work can find serial episodes, in which, within the time range win, a
happens, then b follows, and c happens last; or parallel episodes, in which, within the time
range win, a, b and c happen in no specific order. A serial episode only indicates that, within win,
a, b and c happen in a given order, but the time intervals between them are unknown. A
parallel episode only indicates that a, b and c happen within win, but we do not know in what order and
time interval they happen. The work in [21] specifies the referent urgent window δ
beforehand, meaning that, in a sequential alarm pattern, the time interval between adjacent
alarm events is within δ. For example, if δ is six hours, the sequential alarm
pattern (a, b, c) indicates that a, b and c happen in order and that the time intervals between a and b
and between b and c are each less than six hours. However, this approach cannot discover other possible
time-interval patterns between the events. Lastly, [20] requires the maximum
time interval (max-interval), the minimum time interval (min-interval) and the sliding time window
size (window-size) to be specified. Suppose max-interval is thirty days, min-interval is zero days and
window-size is seven days. If the pattern found is ((a, b) (c, d)), then we know that a and b happen
within seven days of each other, that c and d also happen within seven days of each other, and that the
time interval between (a, b) and (c, d) is at most thirty days. Nevertheless, this pattern does not tell us
the order of a and b or that of c and d. Moreover, the time-interval constraint between (a, b) and (c, d)
is fixed in advance, whereas in reality other time intervals may exist between them, and this
approach cannot find patterns with other time intervals. The second way of dealing with time is
taken by [4, 14, 19], which mine traditional sequential patterns that capture only
the temporal order of the items. For example, the sequential pattern (d, e, f) only tells us that d
happens, then e, and lastly f.
From the above discussion, we see that none of the previous work makes explicit the time intervals
between successive items. This paper therefore proposes a new approach that shows these
intervals clearly. With it, we learn not only the order in which the items happen but also
the time intervals between them. We believe that mining sequential patterns
with time intervals can find more meaningful patterns and provide more valuable
information.
Because previous work could not find patterns that include the time intervals between
successive items, this paper develops an efficient method to do so by modifying the traditional Apriori
approach.
3. Problem Definition
Traditionally, the data sequence of a customer is represented as an ordered list of itemsets, where
each itemset is attached to a transaction time. This paper represents a sequence in another
way: a sequence A is represented as ((a_1, t_1), (a_2, t_2), (a_3, t_3), …, (a_n, t_n)), where a_j is an
item and t_j is the time at which a_j happens for 1 ≤ j ≤ n, and t_{j-1} ≤ t_j for 2 ≤ j ≤ n. Items in the
sequence that occur at the same time are listed in alphabetical order.
This representation is equivalent to the traditional one.
A sequence in the traditional format can be transformed into the new format by sorting all items first by
time and then alphabetically. Likewise, a sequence in the new format can be transformed
into the traditional format by first combining the items that occur at the same time into an itemset and
then sorting these itemsets by time.
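The equivalence of the two representations can be illustrated with a minimal Python sketch. The function names to_flat and to_itemsets and the (time, itemset) input layout are assumptions made for this illustration only; the sample data is transaction 10 of Fig. 1.

```python
from itertools import groupby

def to_flat(itemsets):
    """[(time, {items})] -> [(item, time)], sorted by time, then alphabetically."""
    pairs = ((item, t) for t, items in itemsets for item in items)
    return sorted(pairs, key=lambda pair: (pair[1], pair[0]))

def to_itemsets(flat):
    """[(item, time)] -> [(time, [items])], one itemset per distinct time."""
    ordered = sorted(flat, key=lambda pair: (pair[1], pair[0]))
    return [(t, [item for item, _ in group])
            for t, group in groupby(ordered, key=lambda pair: pair[1])]

# Transaction 10 of Fig. 1, written in the traditional itemset format.
traditional = [(1, {"a"}), (3, {"c"}), (4, {"b", "a"}), (6, {"e", "a"}), (10, {"c"})]

flat = to_flat(traditional)
print(flat)               # the (item, time) representation used in this paper
print(to_itemsets(flat))  # back to one itemset per transaction time
```

Sorting by the key (time, item) is what enforces the alphabetical order among items that share a time stamp.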
In addition, let t denote the length of the time interval between two successive items, and let T_k be
given constants for 1 ≤ k ≤ r-1, with 0 < T_1 < T_2 < … < T_{r-1}. We divide the range of t into r+1
intervals:
- I_0 stands for the time interval t satisfying t = 0;
- I_1 stands for the time interval t satisfying 0 < t ≤ T_1;
- I_j stands for the time interval t satisfying T_{j-1} < t ≤ T_j for 1 < j ≤ r-1;
- I_r stands for the time interval t satisfying T_{r-1} < t < ∞.
Let the set of time intervals be denoted by TI = {I_0, I_1, I_2, …, I_r}. We can then define the
time-interval sequential pattern as follows.
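As a small illustration of this partition, the following Python sketch maps a time gap t to the index j of the interval I_j it belongs to. The function name interval_index and the list encoding of the constants T_1, …, T_{r-1} are assumptions made for this example.

```python
from bisect import bisect_left

def interval_index(t, boundaries):
    """Return j such that the gap t lies in I_j.

    boundaries is the increasing list [T_1, ..., T_{r-1}].
    """
    if t < 0:
        raise ValueError("time gaps must be non-negative")
    if t == 0:
        return 0                              # I_0: t = 0
    # bisect_left counts the boundaries strictly below t, so a gap equal to
    # T_j still falls in I_j (the intervals are closed on the right).
    return bisect_left(boundaries, t) + 1

boundaries = [3, 6]                           # T_1 = 3, T_2 = 6, hence I_0 .. I_3
print([interval_index(t, boundaries) for t in (0, 2, 3, 4, 6, 7)])
```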
Definition 1. Let I = {i_1, i_2, …, i_m} be the set of all items and TI = {I_0, I_1, I_2, …, I_r} be the set
of time intervals. A sequence B = (b_1, &_1, b_2, &_2, …, b_{s-1}, &_{s-1}, b_s) is a time-interval sequence
if b_i ∈ I for 1 ≤ i ≤ s and &_i ∈ TI for 1 ≤ i ≤ s-1.
Definition 2. For a sequence A = ((a_1, t_1), (a_2, t_2), (a_3, t_3), …, (a_n, t_n)) and a time-interval
sequence B = (b_1, &_1, b_2, &_2, …, b_{s-1}, &_{s-1}, b_s), we say that B is contained in A, or that B is a
time-interval subsequence of A, if there exist integers 1 ≤ j_1 < j_2 < … < j_s ≤ n such that
1. b_1 = a_{j_1}, b_2 = a_{j_2}, …, b_s = a_{j_s};
2. t_{j_i} - t_{j_{i-1}} satisfies the condition of interval &_{i-1} for 2 ≤ i ≤ s.
We denote a transaction by <sid, s>, where sid is the identifier of the transaction and s is a
sequence. A sequence database S is a set of records <sid, s>. For
a given time-interval sequence α, its support count in database S is defined as follows.
Definition 3. support_count_S(α) = |{(sid, s) | (sid, s) ∈ S ∧ α is contained in s}|
A time-interval sequence α is called a time-interval sequential pattern, or a frequent
time-interval sequence, if the percentage of transactions in S containing α is greater than or equal to the
user-specified minimum support (called min_sup). In other words, α is a time-interval sequential
pattern in S if support_count_S(α) ≥ |S| × min_sup. The total number of items in a time-interval
sequence is referred to as the length of the sequence. A time-interval sequence of length k is
referred to as a k-time-interval sequence. Similarly, a time-interval sequential pattern of length k is
referred to as a k-time-interval sequential pattern.
Given a sequence database and min_sup, the goal of time-interval sequential pattern mining is to
find in the sequence database all the time-interval sequences whose supports are greater than
or equal to min_sup.
Example 1. Consider the sequence database shown in Fig. 1 with TI = {I_0, I_1, I_2, I_3}, where I_0: t = 0,
I_1: 0 < t ≤ 3, I_2: 3 < t ≤ 6 and I_3: 6 < t < ∞. The time-interval sequence (b, I_1, e, I_2, c) includes three
items, so its length is 3 and we call it a 3-time-interval sequence. It is a time-interval subsequence of
transaction 40, i.e., of the sequence ((b, 15), (f, 17), (e, 18), (b, 22), (c, 22)), where it is matched by
(b, 15), (e, 18) and (c, 22). It is also contained in transactions 10, 20 and 30; in transaction 20, for
instance, it is matched by (b, 7), (e, 9) and (c, 14). Its support is therefore 100%. If we set
min_sup = 50%, then (b, I_1, e, I_2, c) is a time-interval sequential pattern in the
database.
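Definitions 2 and 3 can be checked mechanically against the database of Fig. 1. The Python sketch below is one possible reading of the definitions, not the counting method developed later in the paper: it searches the positions j_1 < j_2 < … < j_s by simple backtracking. The names contains and support_count, and the encoding of interval I_j by its index j, are assumptions made for this illustration.

```python
from bisect import bisect_left

BOUNDARIES = [3, 6]   # T_1 = 3, T_2 = 6, i.e. the intervals I_0 .. I_3 of Example 1

DB = {
    10: [("a", 1), ("c", 3), ("a", 4), ("b", 4), ("a", 6), ("e", 6), ("c", 10)],
    20: [("d", 5), ("a", 7), ("b", 7), ("e", 7), ("d", 9), ("e", 9), ("c", 14), ("d", 14)],
    30: [("a", 8), ("b", 8), ("e", 11), ("d", 13), ("b", 16), ("c", 16), ("c", 20)],
    40: [("b", 15), ("f", 17), ("e", 18), ("b", 22), ("c", 22)],
}

def interval_index(gap, boundaries):
    """Index j of the interval I_j containing a non-negative time gap."""
    return 0 if gap == 0 else bisect_left(boundaries, gap) + 1

def contains(seq, pattern, boundaries):
    """Definition 2: is pattern = (b_1, k_1, b_2, ..., b_s) contained in seq?"""
    items, gaps = pattern[0::2], pattern[1::2]

    def extend(pos, matched):
        # seq[pos] matches items[matched - 1]; try to place the remaining items.
        if matched == len(items):
            return True
        for j in range(pos + 1, len(seq)):
            item, t = seq[j]
            if (item == items[matched]
                    and interval_index(t - seq[pos][1], boundaries) == gaps[matched - 1]
                    and extend(j, matched + 1)):
                return True
        return False

    return any(extend(j, 1) for j, (item, _) in enumerate(seq) if item == items[0])

def support_count(db, pattern, boundaries):
    """Definition 3: number of sequences in db that contain pattern."""
    return sum(contains(seq, pattern, boundaries) for seq in db.values())

pattern = ("b", 1, "e", 2, "c")               # (b, I_1, e, I_2, c)
count = support_count(DB, pattern, BOUNDARIES)
print(count, count / len(DB))                 # support count and support ratio
```

Backtracking is used rather than a greedy left-to-right match because the first occurrence of an item may have the wrong gap while a later occurrence has the right one.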
Sid   Sequence
10    ((a, 1), (c, 3), (a, 4), (b, 4), (a, 6), (e, 6), (c, 10))
20    ((d, 5), (a, 7), (b, 7), (e, 7), (d, 9), (e, 9), (c, 14), (d, 14))
30    ((a, 8), (b, 8), (e, 11), (d, 13), (b, 16), (c, 16), (c, 20))
40    ((b, 15), (f, 17), (e, 18), (b, 22), (c, 22))
Fig. 1. A sequence database.
4. The Algorithm
This section develops an algorithm for mining time-interval sequential patterns
from databases. The algorithm is obtained by modifying the well-known Apriori algorithm, and we
introduce it below.
4.1 The I-Apriori algorithm
The algorithm is listed in Fig. 2. In the algorithm, L_k denotes the set of all frequent k-time-interval
sequences and C_k the set of candidate k-time-interval sequences. The algorithm proceeds in phases,
where the k-th phase finds L_k from C_k. In the first phase, we find L_1 from C_1. C_1
is generated by listing all distinct items in the database. Then, by scanning the database sequentially,
we get the supports of all time-interval sequences in C_1. Removing the infrequent
ones gives L_1. The k-th phase, for k ≥ 2, produces L_k. To this end,
we first derive C_k by the function apriori_gen(L_{k-1}, TI). Having obtained C_k, we then produce L_k by
scanning the database to compute the supports of the candidates and deleting all infrequent
time-interval sequences.
L_1 = find_1-frequent_item(S);
for (k = 2; L_{k-1} ≠ ∅; k++) {
    C_k = apriori_gen(L_{k-1}, TI);
    for each sequence s ∈ S {            // scan S to compute support counts
        C_s = subseq(C_k, s);            // the candidates in C_k that are contained in s
        for each candidate c ∈ C_s
            c.count++;
    }
    L_k = { c ∈ C_k | c.count ≥ |S| × min_sup };
}
return ∪_k L_k;
Fig. 2. The I-Apriori algorithm.
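The level-wise loop of Fig. 2 can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: supports are counted by a brute-force containment test instead of the candidate tree described later, intervals are encoded by their index j, and the names i_apriori, contains and interval_index are chosen for this example. The database is the one of Fig. 1.

```python
from bisect import bisect_left

def interval_index(gap, boundaries):
    """Index j of the interval I_j containing a non-negative time gap."""
    return 0 if gap == 0 else bisect_left(boundaries, gap) + 1

def contains(seq, pattern, boundaries):
    """True if pattern (b_1, k_1, b_2, ..., b_s) is a time-interval subsequence of seq."""
    items, gaps = pattern[0::2], pattern[1::2]

    def extend(pos, matched):
        if matched == len(items):
            return True
        return any(
            item == items[matched]
            and interval_index(t - seq[pos][1], boundaries) == gaps[matched - 1]
            and extend(j, matched + 1)
            for j, (item, t) in enumerate(seq) if j > pos
        )

    return any(extend(j, 1) for j, (item, _) in enumerate(seq) if item == items[0])

def apriori_gen(prev, k, n_intervals):
    """Candidates C_k from L_{k-1}: L_1 x TI x L_1 for k = 2, overlap join otherwise."""
    if k == 2:
        return {a + (i,) + b for a in prev for b in prev for i in range(n_intervals)}
    return {a + b[-2:] for a in prev for b in prev if a[2:] == b[:-2]}

def i_apriori(db, boundaries, min_sup):
    """Return {pattern: support count} for all frequent time-interval sequences."""
    threshold = min_sup * len(db)
    n_intervals = len(boundaries) + 2      # I_0 .. I_r for r - 1 boundaries
    candidates = {(item,) for seq in db for item, _ in seq}   # C_1
    frequent, k = {}, 1
    while candidates:
        counts = {c: sum(contains(s, c, boundaries) for s in db) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= threshold}   # L_k
        frequent.update(level)
        k += 1
        candidates = apriori_gen(set(level), k, n_intervals)
    return frequent

DB = [
    [("a", 1), ("c", 3), ("a", 4), ("b", 4), ("a", 6), ("e", 6), ("c", 10)],
    [("d", 5), ("a", 7), ("b", 7), ("e", 7), ("d", 9), ("e", 9), ("c", 14), ("d", 14)],
    [("a", 8), ("b", 8), ("e", 11), ("d", 13), ("b", 16), ("c", 16), ("c", 20)],
    [("b", 15), ("f", 17), ("e", 18), ("b", 22), ("c", 22)],
]

frequent = i_apriori(DB, boundaries=[3, 6], min_sup=0.5)
for pattern in sorted(frequent, key=lambda p: (len(p), str(p))):
    print(pattern, frequent[pattern])
```

The loop stops as soon as a level yields no frequent sequence, because apriori_gen then returns an empty candidate set.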
Our algorithm differs from the traditional Apriori algorithm in two major ways: (1) the method of
generating C_k, and (2) the method of computing the supports of candidate sequences. We introduce them in
turn.
We first discuss how to generate C_2 and then how to generate C_k for k > 2.
Traditionally, C_2 can be obtained by joining L_1 with L_1 directly. However, since the first item and
the second item of a sequence in C_2, say b and c, may have different time-interval relations, we need to
generate the pairs for all possible time-interval relations. Let us use an example for explanation.
Suppose (b) and (c) belong to L_1 and TI = {I_0, I_1, I_2}. Then we have the following candidate
time-interval sequences in C_2: (b, I_0, b), (b, I_1, b), (b, I_2, b), (b, I_0, c), (b, I_1, c), (b, I_2, c),
(c, I_0, b), (c, I_1, b), (c, I_2, b), (c, I_0, c), (c, I_1, c) and (c, I_2, c).
In short, we can generate C_2 by L_1 × TI × L_1, where × denotes join.
Next, we consider how to generate C_k for k > 2. Let (e_1 &_1 e_2 &_2 … &_{k-1} e_k) be a k-time-interval
sequence in L_k. Then the (k-1)-time-interval sequences (e_1 &_1 e_2 &_2 … &_{k-2} e_{k-1}) and
(e_2 &_2 … e_{k-1} &_{k-1} e_k) must also be frequent, because every transaction containing
(e_1 &_1 e_2 &_2 … &_{k-1} e_k) has these two sequences as subsequences. Therefore, whenever the
time-interval sequences (e_1 &_1 e_2 &_2 … &_{k-2} e_{k-1}) and (e_2 &_2 … e_{k-1} &_{k-1} e_k) both exist in
L_{k-1}, we place (e_1 &_1 e_2 &_2 … &_{k-1} e_k) in C_k. By
joining the time-interval sequences in L_{k-1} this way, we generate all the time-interval sequences in
C_k. For example, if (A, I_0, B) and (B, I_2, D) are in L_2, then (A, I_0, B, I_2, D) must be in C_3.
Similarly, if (A, I_0, B, I_2, D) and (B, I_2, D, I_3, C) are in L_3, then (A, I_0, B, I_2, D, I_3, C) must be
in C_4. Finally, the algorithm for producing the candidate set C_k from L_{k-1} is shown in Fig. 3.
Procedure apriori_gen(L_{k-1}, TI)
C_k = ∅;
for each sequence l_1 ∈ L_{k-1} {
    for each sequence l_2 ∈ L_{k-1} {
        if (k = 2) then {
            for each time interval i ∈ TI {
                c = l_1 × i × l_2;
                add c to C_k;
            }
        }
        else {
            // l[p] is the p-th element of l, counting items and intervals alike
            if (l_1[3] = l_2[1] ∧ l_1[4] = l_2[2] ∧ … ∧ l_1[2k-3] = l_2[2k-5]) then {
                c = l_1 × l_2;
                add c to C_k;
            }
        }
    }
}
return C_k;
Fig. 3. The program for generating the candidate set.
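The candidate generation step can be tried on the examples given in the text with a short Python sketch. Representing a time-interval sequence as a tuple that alternates items and interval labels is an assumption of this illustration, as is the function signature.

```python
def apriori_gen(prev, k, intervals):
    """Build C_k from L_{k-1}; sequences are tuples (e_1, &_1, e_2, ..., e_{k-1})."""
    if k == 2:
        # L_1 x TI x L_1: every ordered pair of frequent items with every interval.
        return {a + (i,) + b for a in prev for i in intervals for b in prev}
    # Join l_1 and l_2 when l_1 without its first item and interval equals
    # l_2 without its last interval and item; append the tail of l_2 to l_1.
    return {a + b[-2:] for a in prev for b in prev if a[2:] == b[:-2]}

L1 = {("b",), ("c",)}
C2 = apriori_gen(L1, 2, ["I0", "I1", "I2"])
print(len(C2))                                  # 2 items x 3 intervals x 2 items

L2 = {("A", "I0", "B"), ("B", "I2", "D")}
C3 = apriori_gen(L2, 3, ["I0", "I1", "I2", "I3"])
print(C3)

L3 = {("A", "I0", "B", "I2", "D"), ("B", "I2", "D", "I3", "C")}
C4 = apriori_gen(L3, 4, ["I0", "I1", "I2", "I3"])
print(C4)
```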
After finding the candidate set of k-time-interval sequences, the next problem is how to
compute their support counts. For this we use a tree structure, called the candidate tree.
The candidate tree is similar to the hash tree [3] adopted by previous
work. The major difference is that the traditional approach labels each tree branch with an item
name, whereas we label it with two components: the item name and the time interval.
Apart from this difference, building the tree, traversing the tree and computing support counts are all
similar.
First, let us see how to build a candidate tree by example. Suppose the candidate set has three
4-time-interval sequences: (a I_0 b I_1 e I_2 b), (a I_0 b I_1 e I_2 c) and (b I_1 e I_2 b I_0 c). Then the
constructed candidate tree looks like the one shown in Fig. 4.
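root
+-- a -> (a)
|   +-- I_0 b -> (a I_0 b)
|       +-- I_1 e -> (a I_0 b I_1 e)
|           +-- I_2 b -> (a I_0 b I_1 e I_2 b)
|           +-- I_2 c -> (a I_0 b I_1 e I_2 c)
+-- b -> (b)
    +-- I_1 e -> (b I_1 e)
        +-- I_2 b -> (b I_1 e I_2 b)
            +-- I_0 c -> (b I_1 e I_2 b I_0 c)
Fig. 4. An example of a candidate tree. Each arc label is followed by the node it leads to.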
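One simple way to hold such a tree in memory is a nested dictionary, sketched below in Python. The layout is an assumption made for illustration: first-level arcs are keyed by an item, deeper arcs by an (interval, item) pair, and a leaf is an empty dictionary. The paper itself only states that the structure resembles a hash tree.

```python
def build_candidate_tree(candidates):
    """Insert each candidate (e_1, &_1, e_2, ..., e_k) as a root-to-leaf path."""
    root = {}
    for cand in candidates:
        node = root.setdefault(cand[0], {})               # first arc: item only
        for interval, item in zip(cand[1::2], cand[2::2]):
            node = node.setdefault((interval, item), {})  # later arcs: (interval, item)
    return root

candidates = [
    ("a", "I0", "b", "I1", "e", "I2", "b"),
    ("a", "I0", "b", "I1", "e", "I2", "c"),
    ("b", "I1", "e", "I2", "b", "I0", "c"),
]
tree = build_candidate_tree(candidates)
print(tree)
```

Because the first two candidates share the prefix (a I_0 b I_1 e), they share one path in the tree and split only at the last arc.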
Having constructed the candidate tree, the next job is to scan every transaction in the database,
and for every transaction we need to traverse the tree and compute the supports of those candidates.
The procedure Traverse shown in Fig. 5 is used to finish this job.
Subroutine Traverse(u, T, i, level)
Parameters:
    u: the node in the candidate tree where we are currently located
    T: the sequence that we are dealing with
    i: the position in the sequence that the preceding traversal matches
    level: the level of node u in the candidate tree
    item(u, v): the item of arc (u, v)
    interval(u, v): the time interval of arc (u, v)
    item(T(j)): the item of T(j)
    interval(time(T(j)) - time(T(i))): the interval to which the time difference
        between T(j) and T(i) belongs
    T_length: the length of T
Method:
    if u is a leaf node then
        add 1 to the counter of the leaf node
        return
    else
        for (j = i+1; j ≤ T_length; j++)
            for each child v of u
                if level = 0 then
                    if item(u, v) = item(T(j)) then
                        Traverse(v, T, j, level+1)
                else
                    if item(u, v) = item(T(j)) and
                       interval(u, v) = interval(time(T(j)) - time(T(i))) then
                        Traverse(v, T, j, level+1)
        return
Fig. 5. Procedure Traverse.
Example 2. Suppose the transaction T = ((b, 15), (f, 17), (e, 18), (b, 22), (c, 22)) is to traverse the
candidate tree in Fig. 4, using the time intervals of Example 1. The entire search is started by calling
Traverse(root, T, 0, 0), which calls Traverse((b), T, 1, 1) and Traverse((b), T, 4, 1) because item b
occurs at the first and the fourth positions in the transaction. Executing Traverse((b), T, 1, 1)
triggers a series of calls: first Traverse((b I_1 e), T, 3, 2), then Traverse((b I_1 e I_2 b), T, 4, 3) and
finally Traverse((b I_1 e I_2 b I_0 c), T, 5, 4). Continuing this way until the whole procedure stops,
we find that T contains the candidate (b I_1 e I_2 b I_0 c).
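The traversal of Example 2 can be reproduced with the following Python sketch. It is an illustration under several assumptions rather than the authors' code: positions are 0-based (so the initial call uses i = -1), intervals are encoded by index with boundaries T_1 = 3 and T_2 = 6 as in Example 1, children are looked up by arc label in a dictionary instead of being scanned one by one, and matched leaves are collected in a set so that a candidate reached through several embeddings is counted once per transaction.

```python
from bisect import bisect_left

class Node:
    def __init__(self):
        self.children = {}      # arc label -> child Node
        self.candidate = None   # set on leaves only
        self.count = 0          # support counter of a leaf

def build_tree(candidates):
    root = Node()
    for cand in candidates:
        node = root.children.setdefault(cand[0], Node())
        for label in zip(cand[1::2], cand[2::2]):       # (interval index, item)
            node = node.children.setdefault(label, Node())
        node.candidate = cand
    return root

def interval_index(gap, boundaries):
    return 0 if gap == 0 else bisect_left(boundaries, gap) + 1

def traverse(u, T, i, level, boundaries, hits):
    """Collect in hits every leaf reachable from u when T(i) was the last match."""
    if not u.children:                                   # u is a leaf
        hits.add(u)
        return
    for j in range(i + 1, len(T)):
        item, t = T[j]
        if level == 0:
            label = item                                 # root arcs carry an item only
        else:
            label = (interval_index(t - T[i][1], boundaries), item)
        child = u.children.get(label)
        if child is not None:
            traverse(child, T, j, level + 1, boundaries, hits)

candidates = [
    ("a", 0, "b", 1, "e", 2, "b"),
    ("a", 0, "b", 1, "e", 2, "c"),
    ("b", 1, "e", 2, "b", 0, "c"),
]
root = build_tree(candidates)
T = [("b", 15), ("f", 17), ("e", 18), ("b", 22), ("c", 22)]

hits = set()
traverse(root, T, -1, 0, [3, 6], hits)
for leaf in hits:
    leaf.count += 1                                      # one increment per transaction
matched = {leaf.candidate for leaf in hits}
print(matched)
```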
5. Conclusion
In this paper, we raise the new problem of mining time-interval sequential patterns. A
time-interval sequential pattern provides knowledge about not only the order of the items (as
the traditional sequential pattern does) but also the time interval between every two successive items.
Applications of the time-interval sequential pattern can be found in daily life, such as analyzing the
purchasing behavior of customers, the exploration of a website and the diagnosis of a disease, to name
just a few. Such information can be used to design appropriate policies that
strengthen the competitiveness of corporations and benefit individuals.
In this paper, we assume that the time axis can be partitioned into a set of fixed time intervals
beforehand. In fact, deciding how to partition it and where to place the boundaries is by no means
easy. If the specified time intervals are too wide, many interesting patterns with narrow
time intervals will be missed, for they are engulfed by other patterns with wider time intervals.
Conversely, if they are too narrow, many interesting time-interval patterns will be missed because their
supports are not large enough. How to balance these two extremes so that all interesting time-interval
sequential patterns can be found is a valuable but difficult issue. In addition, fuzzy set theory or
rough set theory could be applied to partition the intervals so that the boundary of an interval is no
longer fixed but flexible. This extension would mitigate the sharp-boundary problem and provide a smooth
transition between members and non-members of a set. Finally, a taxonomy could be built on the time
intervals so that a bigger interval is formed by a set of smaller intervals. With this extension, we could
find multiple-level time-interval sequential patterns from sequence data: not only sequential
patterns with time intervals at the same level but also those with time intervals across different levels.
References
[1] R.C. Agarwal, C. Aggarwal, V.V.V. Prasad, A tree projection algorithm for generation of
frequent itemsets, Journal of Parallel and Distributed Computing (2000).
[2] R. Agrawal, C. Faloutsos, A. Swami, Efficient similarity search in sequence databases,
Conference on Foundations of Data Organization and Algorithms, Chicago, Illinois, 1993, pp.
69-84.
[3] R. Agrawal, R. Srikant, Fast algorithms for mining association rules, Proc. 1994 Int. Conf. Very
Large Data Bases (VLDB'94), Santiago, Chile, 1994, pp. 487-499.
[4] R. Agrawal, R. Srikant, Mining sequential patterns, Proc. 1995 Int. Conf. Data Engineering
(ICDE'95), Taipei, Taiwan, 1995, pp. 3-14.
[5] M.S. Chen, J. Han, P.S. Yu, Data mining: an overview from a database perspective, IEEE
Transactions on Knowledge and Data Engineering, 8 (6) (1996) 866-883.
[6] M.S. Chen, J.S. Park, P.S. Yu, Efficient data mining for path traversal patterns, IEEE
Transactions on Knowledge and Data Engineering 10 (2) (1998) 209-221.
[7] R. Cooley, B. Mobasher, J. Srivastava, Data preparation for mining world wide web browsing
patterns, Journal of Knowledge and Information Systems, 1 (1) (1999) 5-32.
[8] C. Faloutsos, M. Ranganathan, Y. Manolopoulos, Fast subsequence matching in time-series
databases, Proceedings of the 1994 ACM SIGMOD International Conference on Management of
Data, Minneapolis, Minnesota, 1994, pp. 419-429.
[9] W. J. Frawley, G. Piatetsky-Shapiro, C. J. Matheus, Knowledge Discovery in Databases: An
Overview, AAAI/MIT press, 1991.
[10] V. Guralnik, N. Garg, G. Karypis, Parallel tree projection algorithm for sequence mining, 7th
International European Conference on Parallel Processing, Manchester, UK, 2001, pp. 310-320.
[11] J. Han, M. Kamber, Data Mining: Concepts and Techniques, Academic Press, 2001.
[12] J. Han, G. Dong, Y. Yin, Efficient mining of partial periodic patterns in time series database, Proc.
1999 Int. Conf. on Data Engineering (ICDE'99), Sydney, Australia, March 1999, pp. 106-115.
[13] J. Han, W. Gong, Y. Yin, Mining segment-wise periodic patterns in time-related databases. Proc.
of 1998 Int. Conf. on Knowledge Discovery and Data Mining (KDD'98), New York City, NY,
1998, pp. 214-218.
[14] J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, M.-C. Hsu, FreeSpan: frequent
pattern-projected sequential pattern mining, Proc. 2000 Int. Conf. on Knowledge Discovery
and Data Mining (KDD'00), Boston, MA, 2000, pp. 355-359.
[15] C. Li, P.S. Yu, V. Castelli, Hierarchyscan: a hierarchical similarity search algorithm for databases
of long sequences, Proceedings of the 12th International Conference on Data Engineering, New
Orleans, Louisiana, 1996, pp. 546-553.
[16] S. Ma, and J. L. Hellerstein, Mining partially periodic event patterns with unknown periods, Proc.
17th Int. Conf. Data Engineering (ICDE'01), Heidelberg, Germany, 2001, pp. 205-214.
[17] H. Mannila, H. Toivonen, A. Inkeri Verkamo, Discovery of frequent episodes in event sequences,
Data Mining and Knowledge Discovery 1(3) (1997) 259 –289.
[18] J. Pei, J. Han, B. Mortazavi-Asl, H. Zhu, Mining access patterns efficiently from web logs, Proc.
2000 Pacific-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD'00), Kyoto, Japan,
2000, pp. 396-407.
[19] J. Pei, J. Han, H. Pinto, Q. Chen, U. Dayal, M.-C. Hsu, PrefixSpan: mining sequential patterns
efficiently by prefix-projected pattern growth, Proc. 2001 Int. Conf. on Data Engineering
(ICDE'01), Heidelberg, Germany, 2001.
[20] R. Srikant, R. Agrawal, Mining sequential patterns: generalizations and performance
improvements, Proc. of the Fifth Int'l Conference on Extending Database Technology
(EDBT’96), Avignon, France, 1996.
[21] P.-H. Wu, W.-C. Peng, M.-S. Chen, Mining sequential alarm patterns in a telecommunication
database, Workshop on Databases in Telecommunications (VLDB 2001), 2001.
[22] M. J. Zaki, SPADE: an efficient algorithm for mining frequent sequences, Proceedings of
Machine Learning Journal, special issue on Unsupervised Learning, 2001, 42, pp. 31-60.