Constructing a Large Node Chow-Liu Tree Based on Frequent Itemsets

Kaizhu Huang, Irwin King, Michael R. Lyu
Multimedia Information Processing Laboratory
The Chinese University of Hong Kong, Shatin, NT, Hong Kong
{kzhuang, king, lyu}@cse.cuhk.edu.hk

ICONIP 2002, November 19, 2002, Orchid Country Club, Singapore

Outline
- Background: probabilistic classifiers, Chow-Liu tree
- Motivation
- Large Node Chow-Liu Tree
- Experimental Results
- Conclusion

A Typical Classification Problem
- Given a set of symptoms, one wants to find out whether these symptoms give rise to a particular disease.

Background: Probabilistic Classifiers
- The classification function is defined as
  $c^* = \arg\max_c P(c \mid a_1, \dots, a_n) = \arg\max_c \frac{P(c)\, P(a_1, \dots, a_n \mid c)}{P(a_1, \dots, a_n)}$,
  where the denominator $P(a_1, \dots, a_n)$ is a constant for a given instance of $A_1, A_2, \dots, A_n$.
- The joint probability $P(a_1, \dots, a_n \mid c)$ is not easily estimated from the dataset; thus an assumption about the distribution has to be made: a dependence or independence relationship among the variables.

Background: Chow-Liu Tree (CLT)
- Assumption: a dependence tree exists among the attributes, given the class variable C.
- [Figure: a tree dependence structure under the class variable.]
- Under a tree dependence structure, the joint factorizes as
  $P(a_1, \dots, a_n \mid c) = \prod_{i=1}^{n} P(a_i \mid a_{pa(i)}, c)$,
  where $pa(i)$ is the parent of attribute $i$ in the tree.

Background: Chow-Liu Tree, Advantages and Disadvantages
- Advantages:
  - Comparable with some of the state-of-the-art classifiers.
  - The tree structure gives it resistance to the over-fitting problem and a decomposition characteristic.
- Disadvantage:
  - It cannot model non-tree dependence relationships among attributes or variables.

Motivation
- [Figures (a) and (b): a non-tree structure and a large node tree, both under the class variable.]
1. Fig. (b) can represent the same independence relationship as Fig. (a): given B and E, there is an independence relationship among A, C, and D.
2. Fig. (b) is still a tree structure, which inherits the advantages of a tree.
3. By combining several nodes, a large node tree structure can represent a non-tree structure. This motivates our Large Node Chow-Liu Tree approach.

Overview of Large Node Chow-Liu Tree (LNCLT)
- [Figure: an underlying non-tree structure and an LNCLT expressing the same independence relationship.]
- Step 1: Draft the Chow-Liu tree. Draft the CL-tree of the dataset according to the CLT algorithm.
- Step 2: Refine the Chow-Liu tree. Refine it into a large node Chow-Liu tree based on some combination rules.

Combination Rules
- Bounded cardinality: the cardinality of each large node should not be greater than a bound k.
- Frequent itemsets: each large node should be a frequent itemset.
- Father-son or sibling relationship: the nodes in a large node should stand in a father-son or sibling relationship.

Combination Rules (1): Bounded Cardinality
- The cardinality of each large node (the number of nodes in it) should not be greater than a bound k.
- For example, if we set k to the number of attributes, the LNCLT becomes a "one large node tree", which loses all the merits of a tree.

Combination Rules (2): Frequent Itemsets
- Food store example: in a food store, if you buy {bread}, it is highly likely that you will also buy {butter}; thus {bread, butter} is called a frequent itemset.
- A frequent itemset is a set of attributes that occur with each other frequently.
- Frequent itemsets are possible "large nodes", since the attributes in a frequent itemset act just like one "attribute": they frequently occur together at the same time. (Two implementation sketches follow this slide.)
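The paper mines these itemsets with Apriori [AS94]. As a minimal illustration only (brute-force counting rather than Apriori's level-wise pruning; names such as `frequent_itemsets` and `min_support` are hypothetical, not the authors' code), the sketch below enumerates all itemsets up to size k and keeps those above a support threshold:

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(transactions, k=2, min_support=0.5):
    """Return {itemset: support} for itemsets of size 2..k whose
    relative frequency reaches min_support (illustrative sketch)."""
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for size in range(2, k + 1):        # itemsets of size 2..k
            for combo in combinations(items, size):
                counts[combo] += 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}

# The food-store example from the slide: {bread, butter} co-occur often.
baskets = [{"bread", "butter"}, {"bread", "butter", "milk"}, {"milk"}]
print(frequent_itemsets(baskets))  # {('bread', 'butter'): 0.666...}
```

Apriori reaches the same answer far more efficiently by growing only those candidate itemsets all of whose subsets are already frequent.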
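The construction procedure on the next slides also relies on drafting the CL-tree (step 1): Chow and Liu's algorithm weights every attribute pair by its mutual information and keeps a maximum-weight spanning tree. The sketch below is a simplified, unconditioned variant (the classifier version would condition the mutual information on the class variable C); `chow_liu_tree` and the Kruskal-style union-find are illustrative choices, not the authors' implementation, which the slides note was written in Visual C++:

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(data, i, j):
    """Empirical mutual information between attribute columns i and j."""
    n = len(data)
    pi = Counter(row[i] for row in data)
    pj = Counter(row[j] for row in data)
    pij = Counter((row[i], row[j]) for row in data)
    # sum over joint values: p(a,b) * log( p(a,b) / (p(a) p(b)) )
    return sum((c / n) * math.log(c * n / (pi[a] * pj[b]))
               for (a, b), c in pij.items())

def chow_liu_tree(data, num_vars):
    """Maximum-weight spanning tree over pairwise MI (Kruskal)."""
    edges = sorted(((mutual_information(data, i, j), i, j)
                    for i, j in combinations(range(num_vars), 2)),
                   reverse=True)              # heaviest MI first
    parent = list(range(num_vars))            # union-find forest
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]     # path halving
            x = parent[x]
        return x
    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                          # skip cycle-closing edges
            parent[ri] = rj
            tree.append((i, j))
    return tree                               # num_vars - 1 edges

rows = [(0, 0, 1), (0, 0, 0), (1, 1, 1), (1, 1, 0)]
print(chow_liu_tree(rows, 3))  # [(0, 1), (1, 2)]: columns 0, 1 correlate
```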
Combination Rules (3): Father-Son or Sibling Relationship
- Combining father-son or sibling nodes increases the data fitness of the tree structure on the datasets (proved in the paper).
- Combining father-son or sibling nodes maintains the graphical structure as a "tree structure"; combining non-father-son, non-sibling nodes may result in a non-tree structure.
- [Figures: a father-son combination and a sibling combination.]

Constructing the Large Node Chow-Liu Tree
1. Generate the frequent itemsets: call Apriori [AS94] to generate the frequent itemsets of size no greater than k, and record all of them together with their frequencies in a list L.
2. Draft the Chow-Liu tree: draft the CL-tree of the dataset according to the CLT algorithm.
3. Combine nodes based on the combination rules: iteratively combine the frequent itemset with the maximum frequency that satisfies the combination condition (a father-son or sibling relationship), until L is empty.

Example: Constructing the LNCLT
- Assume k = 2. After step 1 we get the frequent itemsets {A,B}, {A,C}, {B,C}, {B,E}, {B,D}, {D,E}, with f({B,C}) > f({A,B}) > f({B,E}) > f({B,D}) > f({D,E}), where f(·) denotes the frequency of a frequent itemset. (b) is the CLT drafted in step 2.
1. {A,C} does not satisfy the combination condition; filter out {A,C}.
2. f({B,C}) is the biggest and {B,C} satisfies the combination condition; combine its nodes into (c).
3. Filter out the frequent itemsets that overlap with {B,C}; {D,E} is left.
4. {D,E} satisfies the combination condition; combine its nodes into (d).

Experimental Setup
- Dataset: the MNIST handwritten digit database (28×28 gray-level bitmaps); training set size: 60,000; testing set size: 10,000.
- Environment: platform Windows 2000; developing tool Visual C++ 6.0.

Experiments: Data Fitness Comparison
- [Charts: minus log-likelihood (bits/digit) for digits 0-9 on the training and testing datasets, comparing LNCLT with CLT.]

Experiments: Recognition Rate
- [Chart: recognition rate comparison.]

Future Work
- Evaluate our algorithm extensively on other benchmark datasets.
- Examine other combination rules.

Conclusion
- A novel Large Node Chow-Liu Tree is constructed based on frequent itemsets.
- The LNCLT can partially overcome the disadvantages of the CLT, i.e., its inability to represent non-tree structures.
- We demonstrate, both theoretically and experimentally, that our LNCLT model has better data fitness and better prediction accuracy.

Main References
[AS94] Agrawal, R. and Srikant, R. (1994). Fast algorithms for mining association rules. In Proc. VLDB-94.
[Chow & Liu 1968] Chow, C.K. and Liu, C.N. (1968). Approximating discrete probability distributions with dependence trees. IEEE Trans. on Information Theory, 14, pp. 462-467.
[Friedman 1997] Friedman, N., Geiger, D. and Goldszmidt, M. (1997). Bayesian network classifiers. Machine Learning, 29, pp. 131-161.
[Cheng 1997] Cheng, J., Bell, D.A. and Liu, W. (1997). Learning belief networks from data: an information theory based approach. In Proceedings of ACM CIKM'97.
[Cheng 2001] Cheng, J. and Greiner, R. (2001). Learning Bayesian belief network classifiers: algorithms and system. In E. Stroulia and S. Matwin (Eds.), AI 2001, LNAI 2056, pp. 141-151.

Q&A
Thanks.