International Journal of Data Warehousing and Mining, 10(2), 18-38, April-June 2014

An Efficient Pruning and Filtering Strategy to Mine Partial Periodic Patterns from a Sequence of Event Sets

Kung-Jiuan Yang, Department of Information Management, Fortune Institute of Technology, Daliao District, Kaohsiung, Taiwan
Tzung-Pei Hong, Department of Computer Science and Information Engineering, National University of Kaohsiung, Nan-Tzu District, Kaohsiung, Taiwan
Yuh-Min Chen, Institute of Manufacturing Information and Systems, National Cheng Kung University, Tainan City, Taiwan
Guo-Cheng Lan, Computational Intelligence Technology Center, Industrial Technology Research Institute, Hsinchu County, Taiwan

ABSTRACT

Partial periodic patterns are common in real-world applications. The main difficulty in mining them is efficiency, because the set of candidate patterns is huge. Although efficient algorithms have been developed to tackle the problem, their performance drops significantly when the mining parameters are set low. The authors previously adopted a projection-based approach to discover partial periodic patterns from single-event time series. In this paper, they extend it to mine partial periodic patterns from a sequence of event sets, in which multiple events may occur at the same time stamp. An efficient pruning and filtering strategy is also proposed to speed up the mining process. Experimental results on a synthetic dataset and a real oil price dataset show the good performance of the proposed approach.

Keywords: Data Mining, Partial Periodic Pattern, Projection, Sequential Pattern

DOI: 10.4018/ijdwm.2014040102

Copyright © 2014, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
INTRODUCTION

Periodic pattern mining has long been an important topic in sequential pattern mining because it reveals regularities in time series and event sequences. Periodic pattern mining techniques have been widely applied in business, for example to analyze daily web usage, trace the weekly regularity of a company’s stock price, analyze monthly product re-order points, and trace seasonal product sales volume (Aref et al., 2004; Elfeky et al., 2004). However, mining frequent full periodic patterns imposes a strict constraint: every event in a full periodic pattern must be specified, and each must recur frequently at a fixed position in the pattern across long-term time-series data with a specific period length.

To relax this constraint, Han et al. (1998) proposed another kind of periodic pattern mining, called partial periodic pattern mining, in which some but not all points in a period contribute to an approximate cyclic behavior of a time series. For example, “The price of the stock goes up on Monday” is a partial periodicity because it considers only Mondays and says nothing about price fluctuations on the other days. In this example, the period length is one week (seven days), and “the stock price goes up” can be defined as an “event”.

To find such patterns effectively, Han et al. subsequently proposed a partial periodic mining algorithm called max-subpattern hit set (Han et al., 1999). It uses a max-subpattern tree structure to store hit counts and to represent candidate patterns, and all partial periodic patterns can then be derived from the built tree. Afterward, most existing studies on partial periodic pattern mining directly applied Han et al.’s max-subpattern hit set algorithm to various kinds of data.
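The stock example above can be made concrete with a minimal sketch. It is not the max-subpattern hit set algorithm; it only splits a series into period segments and measures the fraction of segments in which a pattern with "don't care" positions holds. The names `period_segments`, `matches`, and `pattern_confidence`, the `"*"` wildcard symbol, and the toy price movements are all assumptions made for illustration.

```python
from typing import List, Sequence

WILDCARD = "*"  # assumed symbol for a position the pattern says nothing about


def period_segments(series: Sequence[str], period: int) -> List[Sequence[str]]:
    """Split the series into consecutive, complete segments of length `period`."""
    n_full = len(series) // period  # a trailing incomplete segment is ignored
    return [series[i * period:(i + 1) * period] for i in range(n_full)]


def matches(pattern: Sequence[str], segment: Sequence[str]) -> bool:
    """A segment matches if every non-wildcard position agrees with the pattern."""
    return all(p == WILDCARD or p == s for p, s in zip(pattern, segment))


def pattern_confidence(series: Sequence[str], pattern: Sequence[str]) -> float:
    """Fraction of complete period segments in which the pattern holds."""
    segments = period_segments(series, len(pattern))
    if not segments:
        return 0.0
    hits = sum(matches(pattern, seg) for seg in segments)
    return hits / len(segments)


# Toy data (hypothetical): three weeks of daily price movements, Monday first.
weeks = [
    ["up", "down", "flat", "up", "down", "flat", "flat"],
    ["up", "up", "down", "flat", "flat", "down", "up"],
    ["down", "flat", "up", "up", "down", "flat", "flat"],
]
series = [event for week in weeks for event in week]

# "The price goes up on Monday": only position 0 is fixed, the rest are wildcards.
monday_up = ["up"] + [WILDCARD] * 6
confidence = pattern_confidence(series, monday_up)
print(confidence)  # the pattern holds in 2 of the 3 weeks
```

A pattern would then be reported as a partial periodic pattern when this fraction reaches a user-chosen minimum threshold.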
To find partial periodic patterns efficiently, Yang et al. (2013) proposed a projection-based partial periodic pattern mining algorithm (abbreviated as PPA) that considers only a single event per time stamp. Chen et al. (2011) applied the FP-tree concept (Han et al., 2000) to propose PFP-growth, a partial periodic pattern mining algorithm with multiple minimum supports. Both algorithms adopted the ‘cycle’ encoding method of Han et al. (1998), which re-encodes each event into a new representation according to its position in the corresponding period segment. However, when multiple events per time stamp are considered, a great many unpromising candidates are still generated during mining. It is therefore desirable to handle partial periodic pattern mining with multiple events per time stamp effectively.

For these reasons, this work presents an efficient projection-based candidate reduction algorithm (abbreviated as PRA) to find partial periodic patterns in a sequence of event sets (instead of single events). Unlike the PPA algorithm of Yang et al. (2013), PRA uses two strategies, pruning and filtering, to reduce unpromising candidates during mining. With the help of these two strategies, the proposed PRA algorithm can efficiently find all partial periodic patterns from the set of period sub-sequences. Finally, experimental results on synthetic and real datasets show that the proposed PRA algorithm outperforms the compared algorithms, namely the traditional max-subpattern algorithm and the PFP-growth algorithm, in execution efficiency.

The remainder of this paper is organized as follows. Related work is reviewed in Section 2. The problem to be solved and the related definitions are described in Section 3.
The detailed steps of the proposed PRA algorithm, for a specific period length and multiple events per time stamp, are described in Section 4. A simple example illustrating the execution of the proposed PRA algorithm is given in Section 5. Experimental results on generated datasets are presented in Section 6, and conclusions and suggestions for future work are given in Section 7.
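The position-based (‘cycle’) re-encoding mentioned in the introduction can be illustrated with a small sketch for a sequence of event sets. The exact encoding used in the cited studies is not reproduced here; this version assumes each event is tagged with its offset inside its period segment, and then applies a plain minimum-count filter to the encoded items. It is not the PRA pruning and filtering strategy, and the names `encode_segments` and `frequent_items` and the toy sequence are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

EncodedItem = Tuple[str, int]  # (event, position within the period); assumed form


def encode_segments(
    event_sets: Sequence[Set[str]], period: int
) -> List[FrozenSet[EncodedItem]]:
    """Cut the sequence into complete period segments and tag each event
    with its position (0 .. period-1) inside its segment."""
    n_full = len(event_sets) // period  # a trailing incomplete segment is ignored
    segments = []
    for k in range(n_full):
        items = {
            (event, pos)
            for pos in range(period)
            for event in event_sets[k * period + pos]
        }
        segments.append(frozenset(items))
    return segments


def frequent_items(
    segments: Sequence[FrozenSet[EncodedItem]], min_count: int
) -> Dict[EncodedItem, int]:
    """Count in how many segments each encoded item occurs and keep those
    that reach `min_count`."""
    counts = Counter(item for seg in segments for item in seg)
    return {item: c for item, c in counts.items() if c >= min_count}


# Toy sequence of event sets (hypothetical), period length 3.
sequence = [
    {"a", "b"}, {"c"}, {"d"},
    {"a"}, {"c", "e"}, {"b"},
    {"a", "b"}, {"e"}, {"d"},
]
segments = encode_segments(sequence, period=3)
frequent = frequent_items(segments, min_count=2)
print(sorted(frequent.items()))
```

Because each segment is a set of (event, position) pairs, an event that occurs at different positions, such as `b` at positions 0 and 2, is counted as two distinct items, which is what lets position-specific regularities be mined with ordinary frequent-item counting.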