International Journal of Software Engineering and Its Applications
Vol.8, No.1 (2014), pp.21-32
http://dx.doi.org/10.14257/ijseia.2014.8.1.02
ISSN: 1738-9984 IJSEIA
Copyright ⓒ 2014 SERSC

Fast Determination of Items Support Technique from Enhanced Tree Data Structure

Zailani Abdullah1, Tutut Herawan2, A. Noraziah3 and Mustafa Mat Deris4

1 Department of Computer Science, Universiti Malaysia Terengganu, 21030 Kuala Terengganu, Terengganu, Malaysia
2 Department of Mathematics Education, Universitas Ahmad Dahlan, Jalan Prof Dr Soepomo, 55166 Yogyakarta, Indonesia
3 Faculty of Computer Systems and Software Engineering, Universiti Malaysia Pahang, Lebuhraya Tun Razak, 26300 Kuantan, Pahang, Malaysia
4 Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Parit Raja, 86400 Batu Pahat, Johor, Malaysia

Abstract

The Frequent Pattern Tree (FP-Tree) is one of the best-known data structures for storing frequent itemsets. However, when the content of a transactional database is modified, the FP-Tree must be reconstructed because the patterns and item supports change. To date, most frequent pattern mining techniques determine item supports from the original database rather than from their own recommended tree data structures. In this paper, we therefore propose a technique called Fast Determination of Item Support Technique (F-DIST), which captures item supports from our proposed Disorder Support Trie Itemset (DOSTrieIT) data structure. Experiments with UCI datasets show that determining item supports with F-DIST from DOSTrieIT is faster than with the classical FP-Tree technique. Furthermore, constructing a complete DOSTrieIT takes less time than constructing the benchmark CanTree data structure.

Keywords: Association Rules; Frequent Pattern; Tree Structure; Fast Technique

1. Introduction

Over the past decades, frequent pattern mining has received a great deal of attention and exploration [1, 2, 3, 20, 22-27]. It was first introduced by Agrawal et al. [4] and remains an active research topic in the data mining community. Hundreds of research papers have since been published, including new algorithms and modifications of existing ones. In general, the main problem in frequent pattern mining is how to manage huge amounts of data efficiently in the computer's memory. As a result, the frequent pattern tree (FP-Tree) [1] was proposed and became one of the alternative data structures for storing vast transactional data in a compressed manner. Since then, several variations for constructing or updating the FP-Tree have been proposed and discussed in the literature [1, 5-11]. However, two major drawbacks remain in past studies. First, when the existing transactions in the database are updated, the current FP-Tree must be rebuilt from scratch. Second, to reconstruct the FP-Tree, all item supports must be recounted from the original database because the set of frequent items changes.

Therefore, to reduce the processing time for capturing item supports, we propose an enhanced tree data structure called Disorder Support Trie Itemset (DOSTrieIT) together with the Fast Determination of Item Support Technique (F-DIST), and we evaluate them on four datasets from the Frequent Itemset Mining Dataset Repository [21]. We compare F-DIST with the benchmark FP-Tree technique in determining item supports. In summary, this work makes three main contributions. First, we propose DOSTrieIT, a novel, complete and incremental pattern tree data structure that can hold the entire transactional database.
Second, we embed a feature called Single Item Without Extension (SIWE) in DOSTrieIT to speed up the capturing of item supports. Lastly, we propose F-DIST as a technique for efficiently determining item supports from DOSTrieIT.

The rest of the paper is organized as follows. Section 2 reviews related work. Section 3 presents the basic concepts and terminology of association rules. Section 4 elaborates the proposed methods. The experiments are discussed in detail in Section 5. Finally, Section 6 concludes the paper.

2. Related Works

Since its introduction by Agrawal et al. in 1993 [4], frequent pattern mining has received a great deal of attention from data mining researchers [1, 3]. Hundreds of papers have been published in an attempt to improve its efficiency and scalability. In general, algorithms for mining frequent itemsets can be classified into three groups: Apriori-like algorithms, frequent pattern-based algorithms, and algorithms that use the vertical data format. Because Apriori [12] incurs two nontrivial costs, namely generating candidate itemsets and repeatedly scanning the database, frequent pattern-based algorithms that avoid candidate generation were proposed. These methods build a compact data structure known as the FP-Tree [1] from the original transaction database. Before a transaction is inserted as a prefix path in the FP-Tree, the items that do not satisfy the minimum support threshold are removed and the remaining items are sorted in support-descending order. In addition, the FP-Tree is constructed offline. Since the FP-Tree was introduced, a large body of research has followed, such as H-Mine [13], PatriciaMine [14], FPgrowth* [15], SOTrieIT [16], AFOPT [5], AFPIM [6], EFPIM [7], CATS-Tree [8], CanTree [17], FUFP-Tree [9], CP-Tree [18], BSM [19] and BIT [11].
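The FP-Tree construction just described (count supports, discard items below the minimum support, sort the remaining items in support-descending order, then insert each transaction as a prefix path) can be sketched in Python as follows. This is a minimal illustration, not the code of [1]: the names `FPNode`, `count_supports`, `fp_order` and `build_fp_tree`, the tie-breaking by item id, and the minimum support of 2 are assumptions; the transactions are those of Example 1 in Section 4.1.

```python
from collections import Counter


class FPNode:
    """FP-Tree node: an item, the number of transactions sharing this prefix, children by item."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


def count_supports(transactions):
    """First database scan: the support of every single item."""
    return Counter(item for t in transactions for item in set(t))


def fp_order(transaction, support, min_support):
    """Drop infrequent items and sort the rest by descending support.

    Ties are broken by item id; this is an assumption, since the FP-Tree only
    needs some fixed total order among equally frequent items.
    """
    frequent = [i for i in set(transaction) if support[i] >= min_support]
    return sorted(frequent, key=lambda i: (-support[i], i))


def build_fp_tree(transactions, min_support):
    """Second database scan: insert each ordered transaction as a prefix path."""
    support = count_supports(transactions)
    root = FPNode()
    for t in transactions:
        node = root
        for item in fp_order(t, support, min_support):
            node = node.children.setdefault(item, FPNode(item))
            node.count += 1
    return root, support


# The nine transactions of Example 1, with a hypothetical minimum support of 2.
T = [[1, 2, 5], [2, 4], [2, 3], [1, 2, 4], [1, 3],
     [2, 3, 6], [1, 3], [1, 2, 3, 5], [1, 2, 3]]
root, support = build_fp_tree(T, min_support=2)
print(fp_order([1, 2, 3, 5], support, 2))  # [2, 1, 3, 5]
print(fp_order([2, 3, 6], support, 2))     # [2, 3]  (item 6 has support 1)
```

Because the item order depends on global supports, any update to the database can change `support` and therefore the order of items along every path, which is why the FP-Tree has to be rebuilt after updates, as noted in Section 1.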
Because of the limitations of the FP-Growth algorithm, H-Mine [13], PatriciaMine [14] and FPgrowth* [15] were proposed. H-Mine and FPgrowth* use array-based techniques to speed up the mining process. The AFPIM [6], EFPIM [7] and FUFP-Tree [9] algorithms use a compact data structure to perform incremental mining: the tree for the updated database is obtained by adjusting the FP-Tree according to the latest changes in the transactions. However, these approaches still require two database scans, one to construct the FP-Tree and another to update it. The limitations of AFPIM and EFPIM are addressed by CATS-Tree [8], which requires only one database scan; indeed, CATS-Tree scans only the updated portion of the database rather than the whole updated database. However, its tree construction process is very complicated and it is only suitable for static databases. The drawbacks of CATS-Tree are addressed by CanTree [17], which requires only a simple tree construction and uses divide-and-conquer to mine the frequent patterns. However, CanTree is not as compact as the FP-Tree because its items are not stored in support-descending order. CP-Tree [18] and BSM [19] were proposed in an attempt to mitigate this limitation of CanTree. Their compact prefix-tree structure is constructed with one database scan, and the items in the tree are arranged as in the FP-Tree. However, the construction process of CP-Tree is quite complicated, and its efficiency is questionable because of the overhead of restructuring the tree whenever a portion of the database is updated. The BIT [11] algorithm merges two small FP-Trees built over consecutive durations to obtain the final FP-Tree before continuing the mining process. However, merging two sets of FP-Trees is very complex and its memory consumption is unclear.
Furthermore, BIT is only suitable for batch processing and not for incremental mining.

3. Association Rules

Throughout this section, the set $I = \{i_1, i_2, \ldots, i_{|A|}\}$, for $|A| > 0$, refers to the set of literals called the set of items, and the set $D = \{t_1, t_2, \ldots, t_{|U|}\}$, for $|U| > 0$, refers to the data set of transactions, where each transaction $t \in D$ is a list of distinct items $t = \{i_1, i_2, \ldots, i_{|M|}\}$, $1 \le |M| \le |A|$, and each transaction can be identified by a distinct identifier TID.

Definition 1. A set $X \subseteq I$ is called an itemset. An itemset with $k$ items is called a $k$-itemset.

Definition 2. The support of an itemset $X \subseteq I$, denoted $\mathrm{supp}(X)$, is defined as the number of transactions that contain $X$.

Definition 3. Let $X, Y \subseteq I$ be itemsets. An association rule between sets $X$ and $Y$ is an implication of the form $X \Rightarrow Y$, where $X \cap Y = \emptyset$. The sets $X$ and $Y$ are called the antecedent and consequent, respectively.

Definition 4. The support for an association rule $X \Rightarrow Y$, denoted $\mathrm{supp}(X \Rightarrow Y)$, is defined as the number of transactions in $D$ that contain $X \cup Y$.

Definition 5. The confidence for an association rule $X \Rightarrow Y$, denoted $\mathrm{conf}(X \Rightarrow Y)$, is defined as the ratio of the number of transactions in $D$ that contain $X \cup Y$ to the number of transactions in $D$ that contain $X$. Thus
$$\mathrm{conf}(X \Rightarrow Y) = \frac{\mathrm{supp}(X \Rightarrow Y)}{\mathrm{supp}(X)}.$$

Definition 6. An itemset $X$ is called a frequent item if $\mathrm{supp}(X) \ge \beta$, where $\beta$ is the minimum support. The set of frequent items is denoted Frequent Items, where
$$\text{Frequent Items} = \{X \subseteq I \mid \mathrm{supp}(X) \ge \beta\}.$$

4. Proposed Model

4.1. Definition

To make the whole process of DOSTrieIT easier to comprehend, the required definitions are presented together with a sample transactional data set.

Definition 7. Disorder Support Trie Itemset (DOSTrieIT) is defined as a complete tree data structure in canonical order of itemsets; the order of itemsets is not based on support-descending order. DOSTrieIT contains n levels of tree nodes (items) and their supports.
Moreover, DOSTrieIT is constructed online for the purpose of incremental pattern mining.

Example 1. Let $T = \{\{1,2,5\}, \{2,4\}, \{2,3\}, \{1,2,4\}, \{1,3\}, \{2,3,6\}, \{1,3\}, \{1,2,3,5\}, \{1,2,3\}\}$. The step-by-step construction of DOSTrieIT is explained in the next section. Graphically, an item is represented as a node and its support appears next to the respective node. The complete DOSTrieIT is shown in Figure 1.

Figure 1. DOSTrieIT and SIWE Path Arranged in Support Descending Order

Definition 8. Single Item without Extension (SIWE) is a prefix path in the tree that contains only one item or node. SIWE is constructed upon receiving a new transaction and serves as a mechanism for quickly searching the support of a single item. It is employed during the tree transformation process, but it is not physically transferred into the other tree.

Example 2. From Example 1, the transactions have 6 unique items, and they are not sorted in any order. In Figure 1, the SIWE for DOSTrieIT is $\mathrm{SIWE} = \{2, 1, 3, 4, 5, 6\}$.

Proposition 1 (Instant Support of Single Items Property). For any item $a_i$, its support is instantly obtained from the 1-level of DOSTrieIT. These items or nodes have no extension and are also known as SIWE.

Justification. Let $\{a_1, a_2, \ldots, a_n\}$ be the single items. From Definition 8, the Single Item without Extension (SIWE) $\{a_1, a_2, \ldots, a_n\}$ is a prefix path in the tree that contains only one item or node, and it is constructed upon receiving a new transaction. Hence, the process of updating and/or searching the support of $a_1, a_2, \ldots, a_n$ can be accelerated: the trie traversal for examining these supports is truncated once it reaches the last single item without extension. SIWE is employed during the tree transformation process, but it is not physically transferred into the FP-Tree.

Example 3. Let us examine the sample transactions from Example 1.
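To make this example concrete, the Python below is a hypothetical reading of Definitions 7 and 8 and Proposition 1, not the authors' C# implementation. The class `DOSTrieIT`, the helpers `f_dist` and `scan_supports`, the ascending-id canonical order, and the choice to hold SIWE nodes in a dictionary under the root are all assumptions made for illustration. `f_dist` reads item supports from the SIWE entries alone, whereas `scan_supports` recounts them from the nine transactions, which is the route the text attributes to CanTree and CATS-Tree.

```python
from collections import Counter


class Node:
    """Trie node: an item, the support of the prefix path ending here, children by item."""

    def __init__(self, item=None):
        self.item = item
        self.support = 0
        self.children = {}


class DOSTrieIT:
    """Hypothetical DOSTrieIT-like trie with SIWE entries (a sketch, not the authors' code).

    Assumptions:
      * items are inserted in a canonical order (ascending item id), not by support;
      * SIWE nodes (1-level nodes without extension) are kept in a dict under the
        root, item -> support, so they stay separate from the first node of longer paths.
    """

    def __init__(self):
        self.root = Node()
        self.siwe = {}

    def insert(self, transaction):
        """Add one transaction online; no rebuild and no database rescan is needed."""
        node = self.root
        for item in sorted(set(transaction)):
            node = node.children.setdefault(item, Node(item))
            node.support += 1
            # Update, or create, the single item without extension for this item.
            self.siwe[item] = self.siwe.get(item, 0) + 1


def f_dist(tree):
    """F-DIST-style lookup: read every item support from the SIWE entries only."""
    return dict(tree.siwe)


def scan_supports(transactions):
    """Baseline: recount item supports by scanning every transaction."""
    return dict(Counter(item for t in transactions for item in set(t)))


T = [[1, 2, 5], [2, 4], [2, 3], [1, 2, 4], [1, 3],
     [2, 3, 6], [1, 3], [1, 2, 3, 5], [1, 2, 3]]
tree = DOSTrieIT()
for t in T:
    tree.insert(t)

supports = f_dist(tree)
assert supports == scan_supports(T)
# Support-descending order, ties broken by item id:
print(sorted(supports.items(), key=lambda kv: (-kv[1], kv[0])))
# [(2, 7), (1, 6), (3, 6), (4, 2), (5, 2), (6, 1)]
```

Appending a further transaction, say `tree.insert([3, 6])`, raises the SIWE supports of items 3 and 6 by one and leaves the other entries untouched, which is the incremental behaviour Definition 7 aims at.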
The supports of items 2, 1, 3, 4, 5 and 6 are 7, 6, 6, 2, 2 and 1, respectively. For CanTree and CATS-Tree, this support information can only be captured after scanning all 9 transactions. However, the same information can be easily determined from DOSTrieIT via SIWE, as shown in Figure 1: the item supports are obtained by traversing the trie and stopping immediately once no more single items without extension are found.

4.2. Activity Diagrams

Activity diagrams are employed to visualize the detailed processes of F-DIST and of constructing DOSTrieIT. The activity diagram is one of the prominent diagrams in the Unified Modeling Language (UML) for graphically representing stepwise workflows, with support for choice (condition), iteration (loop) and concurrency. Figure 2 and Figure 3 show the activity diagrams for F-DIST and for constructing the DOSTrieIT data structure, respectively.

Figure 2. An Activity Diagram for F-DIST

Figure 3. An Activity Diagram for DOSTrieIT

4.3. Pseudocode Development

Pseudocode is an informal high-level description of the operating principle of a computer program or other algorithm. Here, its main purpose is to make the detailed processes of coding F-DIST and DOSTrieIT in the C# programming language easier to comprehend. Figures 4 and 5 depict the pseudocode for F-DIST and for constructing DOSTrieIT, respectively.

F-DIST Pseudocode
    Read DOSTrieIT
    Dowhile (prefixPath ∈ DOSTrieIT != eof)
        If SIWE != eof Then
            Get SIWE
            Display SIWE
        Endif
    EndDo

Figure 4. Pseudocode for F-DIST

DOSTrieIT Pseudocode
    Read Transaction
    If DOSTrieIT != null Then
        Read DOSTrieIT
    Else
        Initialize DOSTrieIT
    Endif
    Dowhile (line ∈ Transaction != eof)
        Dowhile (prefixPath ∈ DOSTrieIT != eof)
            If line != prefixPath Then
                Insert new SIWE
                Insert new prefixPath in DOSTrieIT
            Else If new prefixPath current prefixPath Then
                Update current SIWE
            Else If new prefixPath current prefixPath Then
                Update support in current prefixPath
                Update current SIWE
            Else If new prefixPath current prefixPath Then
                Update support in current prefixPath
                Update current SIWE
                Insert new SIWE
                Insert new prefixPath in DOSTrieIT
            Endif
            Endif
            Endif
            Endif
        Enddo
    Enddo

Figure 5. Pseudocode for DOSTrieIT

5. Experimental Setup

In this section, we compare F-DIST with the benchmark FP-Tree technique. The performance analysis compares the computational time required to read and extract the item supports. The experiments were performed on an Intel® Core™ 2 Quad CPU at 2.33 GHz with 4 GB of main memory, running Microsoft Windows Vista. All code was developed in C#. Four benchmark datasets from the Frequent Itemset Mining Dataset Repository [21] were employed. The first dataset, Retail, contains retail market basket data from an anonymous Belgian retail store. The second is the synthetic dataset T10I4D100K, a sparse dataset in which the frequent itemsets are short and not abundant. The third benchmark dataset, Mushroom, is a dense dataset describing 23 species of gilled mushrooms in the Agaricus and Lepiota family. The fourth and last benchmark dataset was Chess.
The Chess dataset contains different game configurations in which a pawn on a7 is one square away from queening; the task is to determine whether White can win or not. The fundamental characteristics of the datasets are given in Table 1.

Table 1. Fundamental Characteristics of Datasets

Dataset       Size       #Trans    #Items   Average length
Retail        4.153 MB   88,136    16,471   10
T10I4D100K    3.83 MB    100,000   1,000    10
Mushroom      0.54 MB    8,124     119      23
Chess         0.33 MB    3,196     76       37

Figure 6 compares F-DIST and the FP-Tree technique in terms of the duration (processing time) taken to capture the item supports. Overall, determining the item supports with F-DIST took less time than with the FP-Tree technique. For the Retail dataset, F-DIST was 6.24 times (83.97%) faster than the FP-Tree technique. For T10I4D100K, F-DIST was 258.37 times (99.61%) faster. For Mushroom, F-DIST was 5.75 times (82.60%) faster. Finally, for the last dataset (Chess), F-DIST was 13.88 times (92.79%) faster. In summary, F-DIST determined the item supports on average 71.06 times (89.74%) faster than the FP-Tree technique.

Figure 6. Performance Analysis for Determining the Items Support using Different Datasets

In a further experiment, the performance of constructing DOSTrieIT and CanTree on the different datasets was measured and is presented collectively. Durations are measured in milliseconds and plotted on a logarithmic scale. Figure 7 depicts the performance of both data structures on the four benchmark datasets.

Figure 7. Performance Analysis in Constructing Different Tree Data Structures Against Different Datasets

Overall, constructing DOSTrieIT required less time than constructing CanTree. For the Retail dataset, DOSTrieIT was constructed 1.63 times (38.74%) faster than CanTree. For T10I4D100K, the construction times of the two trees were similar, but DOSTrieIT was still 1.01 times (1.25%) faster than CanTree. For Mushroom, constructing DOSTrieIT was 4.13 times (75.78%) faster than CanTree. Finally, for Chess, constructing DOSTrieIT was 6.35 times (84.25%) faster than CanTree. In summary, across the four datasets, DOSTrieIT was constructed on average 3.28 times (50.01%) faster than CanTree.

6. Conclusion

The FP-Tree is a crucial and compact data structure for generating frequent itemsets. However, for incremental pattern mining, the latest item supports must be recalculated before the FP-Tree can be reconstructed, because the supports of items and patterns change. At present, most tree-based techniques still depend on the original dataset rather than on their own tree data structures to obtain item supports. It is therefore necessary to incorporate single-item supports into the tree. In this paper, we proposed a technique called F-DIST to determine item supports from our proposed DOSTrieIT data structure. Experiments with several datasets from the Frequent Itemset Mining Dataset Repository [21] show that the proposed technique is on average 71.06 times (89.74%) faster than the benchmark FP-Tree technique. Moreover, constructing a complete DOSTrieIT data structure is on average 3.28 times (50.01%) faster than constructing CanTree.
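The speed-up figures in Section 5 and in this conclusion pair a ratio k with a percentage. Both appear consistent with the percentage being the share of processing time saved, 100 * (1 - 1/k), and with each summary being the plain arithmetic mean of the four per-dataset values. The short Python check below reproduces this from the reported ratios only; the function and variable names are hypothetical, and no timings beyond those reported are used.

```python
def pct_time_saved(speedup):
    """Percentage of processing time saved when a method is `speedup` times faster."""
    return 100.0 * (1.0 - 1.0 / speedup)


def summarize(ratios):
    """Arithmetic means, over the datasets, of the ratios and of the implied savings."""
    n = len(ratios)
    mean_ratio = sum(ratios.values()) / n
    mean_pct = sum(pct_time_saved(r) for r in ratios.values()) / n
    return mean_ratio, mean_pct


# Speed-up ratios as reported in Section 5 (rounded to two decimals there).
F_DIST_VS_FP_TREE = {"Retail": 6.24, "T10I4D100K": 258.37, "Mushroom": 5.75, "Chess": 13.88}
DOSTRIEIT_VS_CANTREE = {"Retail": 1.63, "T10I4D100K": 1.01, "Mushroom": 4.13, "Chess": 6.35}

for label, ratios in [("F-DIST vs FP-Tree", F_DIST_VS_FP_TREE),
                      ("DOSTrieIT vs CanTree", DOSTRIEIT_VS_CANTREE)]:
    mean_ratio, mean_pct = summarize(ratios)
    print(f"{label}: mean {mean_ratio:.2f} times, mean time saved about {mean_pct:.2f}%")
```

Run as is, the mean ratios come out at 71.06 and 3.28, matching the summary figures. The mean savings land close to, but not exactly on, 89.74% and 50.01%, presumably because each reported percentage was derived from the unrounded timings (a ratio of 1.01, for instance, implies about 0.99% rather than 1.25%).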
References

[1] J. Han, J. Pei and Y. Yin, “Mining Frequent Patterns without Candidate Generation”, Proceedings of the 2000 ACM SIGMOD, (2000), pp. 1-12.
[2] Z. Zheng, R. Kohavi and L. Mason, “Real World Performance of Association Rule Algorithms”, Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, ACM Press, (2001), pp. 401-406.
[3] J. Han and J. Pei, “Mining Frequent Pattern without Candidate Itemset Generation: A Frequent Pattern Tree Approach”, Data Mining and Knowledge Discovery, vol. 8, (2004), pp. 53-87.
[4] R. Agrawal, T. Imielinski and A. Swami, “Database Mining: A Performance Perspective”, IEEE Transactions on Knowledge and Data Engineering, vol. 5, no. 6, (1993), pp. 914-925.
[5] G. Liu, H. Lu, W. Lou, Xu and J. X. Yu, “Efficient Mining of Frequent Patterns using Ascending Frequency Ordered Prefix-Tree”, Data Mining and Knowledge Discovery, vol. 9, (2004), pp. 249-274.
[6] J.-L. Koh and S.-F. Shieh, “An Efficient Approach for Maintenance Association Rules Based on Adjusting FP-Tree Structure”, Proceedings of the 2004 International Conference on Database Systems for Advanced Applications, (2004), pp. 417-424.
[7] X. Li, X. Deng and S. Tang, “A Fast Algorithm for Maintenance of Association Rules in Incremental Databases”, Proceedings of the International Conference on Advanced Data Mining and Applications, (2006), pp. 56-63.
[8] W. Cheung and O. R. Zaïane, “Incremental Mining of Frequent Patterns without Candidate Generation or Support Constraint”, Proceedings of the 7th International Database Engineering and Applications Symposium (IDEAS’2003), (2003).
[9] T.-P. Hong, J.-W. Lin and Y.-L. We, “Incrementally Fast Updated Frequent Pattern Trees”, Expert Systems with Applications, vol. 34, no. 4, (2008), pp. 2424-2435.
[10] S. K. Tanbeer, C. F. Ahmed, B. S. Jeong and Y. K. Lee, “Efficient Single-Pass Frequent Pattern Mining Using a Prefix-Tree”, Information Sciences, vol. 279, pp. 559-583.
[11] S. G. Totad, R. B. Geeta and P. P. Reddy, “Batch Processing for Incremental FP-Tree Construction”, International Journal of Computer Applications, vol. 5, no. 5, (2010), pp. 28-32.
[12] R. Agrawal and J. Shafer, “Parallel Mining of Association Rules: Design, Implementation, and Experience”, IEEE Transactions on Knowledge and Data Engineering, vol. 8, (1996), pp. 962-969.
[13] J. Pei, J. Han, H. Lu, S. Nishio, S. Tang and D. Yang, “H-Mine: Hyper-Structure Mining of Frequent Patterns in Large Databases”, Proceedings of the IEEE International Conference on Data Mining, (2001), pp. 441-448.
[14] P. and D. Zandolin, “Mining Frequent Itemsets Using Patricia Tries”, Proceedings of the ICDM’03, (2003).
[15] G. Grahne and J. Zhu, “Efficiently Using Prefix-Trees in Mining Frequent Itemsets”, Proceedings of FIMI’03, (2003).
[16] Y. K. Woon, W. K. Ng and E. P. Lim, “A Support Order Trie for Fast Frequent Itemset Discovery”, IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 7, (2004), pp. 875-879.
[17] C. K.-S. Leung, Q. I. Khan, Z. Li and T. Hoque, “CanTree: A Canonical-Order Tree for Incremental Frequent-Pattern Mining”, Knowledge and Information Systems, vol. 11, no. 3, (2007), pp. 287-311.
[18] S. K. Tanbeer, F. A. Chowdhury, B. S. Jeong and Y.-K. Lee, “CP-Tree: A Tree Structure for Single-Pass Frequent Pattern Mining”, in T. Washio et al. (Eds.): PAKDD’08, Lecture Notes in Artificial Intelligence, vol. 5012, Springer, Heidelberg, (2008), pp. 1022-1027.
[19] S. K. Tanbeer, C. F. Ahmed, B.-S. Jeong and Y.-K. Lee, “Sliding Window-based Frequent Pattern Mining Over Data Streams”, Information Sciences, vol. 179, (2009), pp. 3843-3865.
[20] R. Ivancsy and I. Vajk, “Fast Discovery of Frequent Itemsets: a Cubic Structure-Based Approach”, Informatica (Slovenia), vol. 29, no. 1, (2005), pp. 71-78.
[21] Frequent Itemset Mining Dataset Repository, http://fimi.ua.ac.be/data/.
[22] T. Herawan and M. M. Deris, “A Soft Set Approach for Association Rules Mining”, Knowledge-Based Systems, vol. 24, no. 1, (2011), pp. 186-195.
[23] T. Herawan, Z. Abdullah, A. Noraziah, M. M. Deris and J. H. Abawajy, “EFP-M2: Efficient Model for Mining Frequent Patterns in Transactional Database”, in N.T. Nguyen et al. (Eds.): ICCCI 2012, Lecture Notes in Computer Science, vol. 7654, Springer-Verlag, (2012), pp. 29-38.
[24] T. Herawan, Z. Abdullah, A. Noraziah, M. M. Deris and J. H. Abawajy, “IPMA: Indirect Patterns Mining Algorithm”, in N.T. Nguyen et al. (Eds.): ICCCI 2012, Advanced Methods for Computational Collective Intelligence, Studies in Computational Intelligence, vol. 457, Springer-Verlag, (2013), pp. 187-196.
[25] T. Herawan and Z. Abdullah, “CNAR-M: A Model for Mining Critical Negative Association Rules”, in Zhihua Cai et al. (Eds.): ISICA 2012, Communications in Computer and Information Science, vol. 316, Springer-Verlag, (2012), pp. 170-179.
[26] T. Herawan, P. Vitasari and Z. Abdullah, “Mining Critical Least Association Rules of Student Suffering Language and Social Anxieties”, International Journal of Continuing Engineering Education and Life-Long Learning, vol. 23, no. 2, (2013), pp. 128-146.
[27] Z. Abdullah, T. Herawan and M. M. Deris, “Tracing Significant Information using Critical Least Association Rules Model”, International Journal of Innovative Computing and Applications, vol. 5, no. 1, (2013), pp. 3-17.

Authors

Zailani Abdullah received his Ph.D. from Universiti Tun Hussein Onn Malaysia (UTHM) in 2012. He has published more than 40 research papers in journals and conference proceedings.
He has served as Co-Chair of the 4th Malaysian Software Engineering Conference 2008 (MySEC 2008), Committee Member of the 16th Asia-Pacific Software Engineering Conference (APSEC 2009), Committee Member of the 2nd KnowledgeGrid Malaysia Forum 2009 and Program Committee Member of the 3rd International Conference on Computer Systems and Software Engineering (ICSECS 2013). He is also a Microsoft® Certified Technology Specialist (MCTS): .NET Framework 3.5, ASP.NET Applications, and an Oracle Database 11g Administrator Certified Associate. He has been appointed as an editorial board member of the Journal of Computational Intelligence and Electronic Systems (JCIES) and as a reviewer for World Applied Science Journal, Aceh International Journal of Science and Technology (AIJST), Global Perspective on Engineering Management (GPEM) and Mosharaka for Researches and Studies. His research interests include databases, data mining and web-based applications.

Tutut Herawan received a B.Ed. degree in 2002 and an M.Sc. degree in 2006, both in Mathematics, from Universitas Ahmad Dahlan and Universitas Gadjah Mada, Yogyakarta, Indonesia, respectively. He obtained a Ph.D. in Theoretical Data Mining from Universiti Tun Hussein Onn Malaysia in 2010. Currently, he is a lecturer in the Department of Mathematics Education, Universitas Ahmad Dahlan, Indonesia. He currently supervises four Ph.D. students, has successfully co-supervised two Ph.D. students, and has published more than 120 papers in various international journals and conference proceedings. He has been appointed as an editorial board member for IJDTA, TELKOMNIKA, IJNCAA, IJDCA and IJDIWC. He has also been appointed as a reviewer for several international journals, such as Knowledge-Based Systems, Information Sciences, European Journal of Operational Research and Applied Mathematics Letters, and as guest editor for several special issues of international journals.
He has served as a program committee member and co-organizer for numerous international conferences and workshops, including Soft Computing and Data Engineering (SCDE 2010-2011 in Korea, SCDE 2012 in Brazil), ADMTA 2012 in Vietnam, DTA 2011-2012 in Korea, DICTAP 2012 in Thailand, ICDIPC 2012 in Lithuania, DEIS 2012 in the Czech Republic, NDT 2012 in Bahrain, ICoCSIM 2012 in Indonesia, ICSDE’2013 in Malaysia, ICSECS 2013 in Malaysia and SCKDD 2013 in Vietnam, among many others. His research areas include Knowledge Discovery in Databases, Educational Data Mining, Decision Support in Information Systems, and Rough and Soft Set Theory.

Noraziah Ahmad received her Ph.D. in Distributed Databases from Universiti Malaysia Terengganu (UMT) in 2007. She has published more than 150 papers in journals and conference proceedings. She is currently an associate professor at the Faculty of Computer Systems & Software Engineering, Universiti Malaysia Pahang. In addition to serving as an international program committee member and reviewer for many conferences, she is currently an editorial board member of the International Journal of Engineering and Technology (IJET), International Journal of Web Application (IJWA) and Journal of Emerging Technologies in Web Intelligence (JETWI); a member of the IEEE Computer Society, International Association of Engineers (IAENG), World Academy of Science, Engineering and Technology (WASET) and Malaysian National Computer Confederation (MNCC); and a Senior Member of the International Association of Computer Science and Information Technology (IACSIT).

Mustafa Mat Deris received his Ph.D. from Universiti Putra Malaysia in 2002. He is a professor of computer science in the Faculty of Computer Science and Information Technology, UTHM, Malaysia.
He has successfully supervised ten Ph.D. students, is currently supervising six Ph.D. students, and has published more than 170 papers in journals and conference proceedings. He has been appointed as an editorial board member of the Journal of Next Generation Information Technology (JNIT, Korea) and the Encyclopedia on Mobile Computing and Commerce (Idea Group, USA); guest editor of the International Journal of BioMedical Soft Computing and Human Science for the Special Issue on “Soft Computing Methodologies and Its Applications”; and a reviewer for several international journals, such as IEEE Transaction on Parallel and Distributed Computing, Journal of Parallel and Distributed Databases, Journal of Future Generation on Computer Systems, Journal of Information Sciences (Elsevier), Journal of Cluster Computing (Kluwer) and Journal of Computer Mathematics (Taylor & Francis, UK). He has served as a program committee member and co-organizer for numerous international conferences and workshops, including Grid and Peer-to-Peer Computing (GP2P 2005, 2006), Autonomic Distributed Data and Storage Systems Management (ADSM 2005, 2006, 2007) and Grid Pervasive Computing Security, and as an organizer of workshops on Rough and Soft Sets Theories and Applications (RSAA 2010, Fukuoka, Japan) and Soft Computing and Data Engineering (SCDE 2010 and 2011 in Korea, SCDE 2012 in Brazil). His research interests include distributed databases, data grids, data mining and soft computing.