Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
國立雲林科技大學 National Yunlin University of Science and Technology Efficient Mining of Intertransaction Association Rules Advisor:Dr. Hsu Graduate: Keng-Wei Chang Author:Anthony K.H. Tung Hongjun Lu Jiawei Han Ling Feng IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING VOL. 15, NO. 1, JANUARY/FEBRUARY 2003 Intelligent Database Systems Lab N.Y.U.S.T. I.M. Outline Motivation Objective Introduction PROBLEM DESCRIPTION PRINCIPLES OF INTERTRANSACTION ASSOCIATION MINNING METHODS THE FITI ALGORITHM PERFORMANCE STUDY CONCLUSION PERSONAL OPINION Intelligent Database Systems Lab Intratransaction v.s. Intertransaction Intratransaction same transaction same customer same day Intertransaction different transaction Intelligent Database Systems Lab Motivation Most of the previous studies on mining association rules are on mining intratransaction associations. same transaction same customer same day Intelligent Database Systems Lab Objective we break the barrier of transactions and extend the scope of mining association rules from traditional single-dimensional, intratransaction associations to multidimensional, intertransaction associations. Intelligent Database Systems Lab Introduction Most of the pervious studies… There is an important form of association rules which are useful but could not discovered with existing association rule mining framework. Intelligent Database Systems Lab Introduction Example:stock R1: When the prices of IBM and SUN go up, at 80 percent of probability the price of Microsoft goes up on the same day. R2: If the prices of IBM and SUN go up, Microsoft’s will most likely(80 percent of probability) go up the next day and then drop four days later. Intelligent Database Systems Lab Introduction It is different from mining sequential patterns in transaction data. Contribution an association mining problem that is more general than what has been discussed in the literature. Developed an efficient algorithm for mining intertransaction association rule from large databases. (FITI) Intelligent Database Systems Lab PROBLEM DESCRIPTION Definition Let e1 , e2 ..., eu be a set of literals, called items. Let D be an attribute and Dom(D) be the domain of D. A transaction database is a database containing transactions in the form of (d, E), where d Dom ( D) and E Definition 1. 2. A sliding window W in a transaction database T is a block of w continuous intervals along domain D, starting from interval d 0 such that T contains a transaction at interval d 0 . Each interval d j in W is called a subwindow of W denoted as W[j], where j d j d 0 . We call j the subwindow number of d j within W. Intelligent Database Systems Lab PROBLEM DESCRIPTION Definition 3. Let W be a sliding window with w intervals and u be the number of literals in ∑ . We define a megatransaction M contained within W to be To distinguish the items in a megatransaction from the items in a traditional transaction, the items in a megatransaction are called extended-items. Intelligent Database Systems Lab PROBLEM DESCRIPTION Definition 4. intratransaction itemset is a set of items A An intertransaction itemset is a set of extended items B such that ei (0) B, 1 i u Definition 5. An intertransaction association rule is an implication of the form X Y , where Intelligent Database Systems Lab PROBLEM DESCRIPTION Definition 6. Let S be the number of transactions in the transaction database. Let Txy be the set of megatransactions that contain a set of extended-items X Y and Tx be the set of megatransactions that contain X. Then, the support and confidence of an intertransaction association rule X Y are defined as Intelligent Database Systems Lab PRINCIPLES OF INTERTRANSACTION ASSOCIATION MINNING METHODS Decomposed into two subproblems: Intelligent Database Systems Lab THE FITI ALGORITHM FITI consists of the following three phasses. Intelligent Database Systems Lab Phase 1 : Mining and Storing Frequent IntraTransaction Itemsets Frequent intratransaction itemsets are first mined using the Apriori algorithm Frequent-Itemsets Linked Table, FILT ItemSet Hash Table 1. Lookup Links 2. Generator and Extension Links 3. Subset Links 4. Descendant Links Intelligent Database Systems Lab Phase 1 : Mining and Storing Frequent IntraTransaction Itemsets Intelligent Database Systems Lab Phase 2 : Database Transformation FITI is to transform the database into a set of Encoded Frequent-Itemset Tables, FIT Intelligent Database Systems Lab Intelligent Database Systems Lab Phase 3 : Mining frequent Intertrasaction Itemsets Intelligent Database Systems Lab PERFORMANCE STUDY Generation of Synthetic Data Relative Performance Effect of Increasing Maxspan Effect of Increasing the Number of Transactions Effect of Increasing the Average Transaction Size Experiment on Real-Life Data Intelligent Database Systems Lab Generation of Synthetic Data TABLE4 Parameter Settings Intelligent Database Systems Lab Relative Performance Compare the performance of EH-Apriori and FITI vary the support level on data sets one and two maxspan equal to four and six Intelligent Database Systems Lab Relative Performance Intelligent Database Systems Lab Effect of Increasing Maxspan vary maxspan from two to eight fro both data sets Intelligent Database Systems Lab Effect of Increasing the Number of Transactions Vary the number of transactions in dataset one from 10k to 1,000k increase linearly Intelligent Database Systems Lab Effect of Increasing the Average Transaction Size Increase the average transaction size of data set one from five to 20 Intelligent Database Systems Lab Experiment on Real-Life Data Intelligent Database Systems Lab CONCLUSION Intorduce a new problem of mining intertransaction association rules and provided an extensive study of it. EH-Apriori and FITI FITI proves to be much faster than EH-Apriori Performance is acceptable for real life application Useful in providing prediction capability along a single dimension. Intelligent Database Systems Lab