International Journal of Data Warehousing and Mining, 10(2), 18-38, April-June 2014
An Efficient Pruning and Filtering Strategy to Mine Partial Periodic Patterns from a Sequence of Event Sets
Kung-Jiuan Yang, Department of Information Management, Fortune Institute of Technology, Daliao District, Kaohsiung, Taiwan
Tzung-Pei Hong, Department of Computer Science and Information Engineering, National University of Kaohsiung, Nan-Tzu District, Kaohsiung, Taiwan
Yuh-Min Chen, Institute of Manufacturing Information and Systems, National Cheng Kung University, Tainan City, Taiwan
Guo-Cheng Lan, Computational Intelligence Technology Center, Industrial Technology Research Institute, Hsinchu County, Taiwan
ABSTRACT
Partial periodic patterns are commonly seen in real-world applications. The major problem in mining partial
periodic patterns is efficiency, because the set of partial periodic candidates is huge. Although some
efficient algorithms have been developed to tackle the problem, their performance drops significantly
when the mining parameters are set low. In the past, the authors adopted a projection-based
approach to discover partial periodic patterns from single-event time series. In this paper, the authors
extend it to mine partial periodic patterns from a sequence of event sets, in which multiple events can
occur concurrently at the same time stamp. In addition, an efficient pruning and filtering strategy is proposed
to speed up the mining process. Finally, experimental results on a synthetic dataset and a real oil price
dataset show the good performance of the proposed approach.
Keywords: Data Mining, Partial Periodic Pattern, Projection, Sequential Pattern
DOI: 10.4018/ijdwm.2014040102
Copyright © 2014, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
INTRODUCTION
Periodic pattern mining has long been an important topic in sequential pattern mining because it
uncovers regularities in time series and event sequences. Periodic pattern mining techniques have been
widely applied in business applications, such as analyzing daily web usage, tracing weekly regularities
in a company’s stock price, analyzing monthly product re-order points, and tracing seasonal product
sales volume (Aref et al., 2004; Elfeky et al., 2004). However, mining frequent full periodic patterns
imposes a strict constraint: every event in a full periodic pattern must be known, each event must occupy
a fixed position in the pattern, and the pattern must appear frequently in the long-term time-series data
with a specific period length. To relax this constraint, Han et al. (1998) proposed another kind of
periodic pattern mining called partial periodic pattern mining, which considers some but not all points
in a period as contributing to an approximate cyclic behavior of a time series. For example, “The price
of the stock goes up on Monday” is a partial periodicity because it only considers Mondays and says
nothing about the price fluctuations on the other days. In this example, the period length is one week
(seven days), and “the stock price goes up” can be defined as an “event”. To find such patterns
effectively, Han et al. subsequently proposed a partial periodic pattern mining algorithm called the
max-subpattern hit set algorithm (Han et al., 1999), which uses a max-subpattern tree structure to store
hit counts and to represent candidate patterns; all partial periodic patterns can then be found from the
built tree. Afterward, most existing studies on partial periodic pattern mining directly applied
Han et al.’s max-subpattern hit set algorithm to various data applications.
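To make the notion of a partial periodic pattern concrete, the following Python sketch checks how often a pattern holds across the period segments of a sequence of event sets. It is only an illustration under stated assumptions, not the algorithm of any cited work: the names `split_into_segments`, `matches`, and `match_ratio`, the use of `None` for positions the pattern says nothing about, and the toy data are all hypothetical.

```python
from typing import List, Optional, Set

EventSet = Set[str]
# A pattern has one entry per position in the period.
# None marks a position the pattern says nothing about (assumed notation).
Pattern = List[Optional[Set[str]]]


def split_into_segments(sequence: List[EventSet], period: int) -> List[List[EventSet]]:
    """Cut the sequence into consecutive full segments of length `period`."""
    n_full = len(sequence) // period  # a trailing incomplete segment is dropped
    return [sequence[i * period:(i + 1) * period] for i in range(n_full)]


def matches(segment: List[EventSet], pattern: Pattern) -> bool:
    """A segment matches if every constrained position contains the required events."""
    return all(required is None or required <= events
               for events, required in zip(segment, pattern))


def match_ratio(sequence: List[EventSet], pattern: Pattern) -> float:
    """Fraction of full period segments in which the pattern holds."""
    segments = split_into_segments(sequence, len(pattern))
    if not segments:
        return 0.0
    hits = sum(matches(segment, pattern) for segment in segments)
    return hits / len(segments)


# Hypothetical toy sequence with period length 3 (four full segments).
sequence = [
    {"a"}, {"b", "c"}, {"d"},
    {"a"}, {"b"}, {"e"},
    {"a", "f"}, {"c"}, {"d"},
    {"g"}, {"b"}, {"d"},
]

print(match_ratio(sequence, [{"a"}, None, None]))   # 0.75
print(match_ratio(sequence, [{"a"}, {"b"}, None]))  # 0.5
```

In this sketch the pattern `[{"a"}, None, None]` plays the role of “the stock price goes up on Monday”: it constrains one position of the period and leaves the others open. A pattern would be reported only if its ratio reached a user-chosen threshold, which is omitted here.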
To find partial periodic patterns efficiently, Yang et al. (2013) proposed a projection-based partial
periodic pattern mining algorithm (abbreviated as PPA), which considers only a single event at each
time stamp. Chen et al. (2011) applied the FP-tree concept (Han et al., 2000) to propose the PFP-growth
algorithm, which mines partial periodic patterns with multiple minimum supports. Both Yang et al.’s and
Chen et al.’s algorithms adopted the ‘cycle’ encoding method used by Han et al. (1998) to re-encode
each event into a new representation based on its position in the corresponding periodic segment.
However, when multiple events occur at a time stamp, a large number of unpromising candidates are still
generated during mining. It is thus desirable to handle partial periodic pattern mining with multiple
events in a time stamp effectively.
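The position-based re-encoding mentioned above can be sketched as follows. This excerpt does not spell out the exact encoding, so the sketch assumes one simple form: each event becomes a pair of its offset within the period segment and its label. The function names `encode_by_position` and `count_encoded_events` and the toy data are hypothetical.

```python
from collections import Counter
from typing import List, Set, Tuple

EncodedEvent = Tuple[int, str]  # (offset within the period segment, event label)


def encode_by_position(sequence: List[Set[str]], period: int) -> List[Set[EncodedEvent]]:
    """Turn each full period segment into a set of (offset, event) pairs."""
    n_full = len(sequence) // period  # a trailing incomplete segment is dropped
    encoded = []
    for s in range(n_full):
        items = set()
        for offset in range(period):
            for event in sequence[s * period + offset]:
                items.add((offset, event))
        encoded.append(items)
    return encoded


def count_encoded_events(encoded: List[Set[EncodedEvent]]) -> Counter:
    """Count in how many segments each (offset, event) pair appears."""
    counts: Counter = Counter()
    for items in encoded:
        counts.update(items)
    return counts


# Hypothetical toy sequence of event sets with period length 2 (three segments).
sequence = [{"a"}, {"b", "c"}, {"a"}, {"b"}, {"d"}, {"b"}]
encoded = encode_by_position(sequence, 2)
counts = count_encoded_events(encoded)
print(counts[(1, "b")])  # 3
print(counts[(0, "a")])  # 2
```

After this step each period segment looks like an ordinary set of items, so per-item counts can be taken directly. With several events allowed at each time stamp, the number of distinct encoded items per segment grows, which is one way to see why candidate reduction matters in this setting.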
For these reasons, this work presents an efficient projection-based candidate reduction algorithm
(abbreviated as PRA) to find partial periodic patterns in a sequence of event sets (instead of a
sequence of single events). Unlike the PPA algorithm of Yang et al. (2013), PRA uses two strategies,
pruning and filtering, to reduce unpromising candidates during mining. With the help of these two
strategies, the proposed PRA algorithm can efficiently discover all partial periodic patterns from the
set of period sub-sequences. Finally, experimental results on synthetic and real datasets show that the
proposed PRA algorithm outperforms the compared algorithms, namely the traditional max-subpattern
algorithm and the PFP-growth algorithm, in terms of execution efficiency.
The remainder of this paper is organized as follows. Related work is reviewed in Section 2. The problem
to be solved and the related definitions are described in Section 3. The detailed steps of the proposed
PRA algorithm, which considers a specific period length and multiple events in a time stamp, are
described in Section 4. A simple example illustrating the execution of the proposed PRA algorithm is
given in Section 5. The experimental results on generated datasets are reported in Section 6, and
conclusions and suggestions for future work are given in Section 7.