LTS: Discriminative Subgraph Mining by Learning from Search History
Ning Jin, Wei Wang
ICDE 2011

Outline
  Motivation
  Objectives
  Methodology
  Experiments
  Conclusions

Motivation
Complex structures in many scientific applications can be represented by graphs. Many data mining and database problems on graph databases, such as graph indexing and graph classification, need discriminative subgraph patterns.
Discriminative subgraphs can be used to characterize complex graphs, construct graph classifiers, and generate graph indices.

Objectives
Discriminative subgraph pattern mining is solved in one of two ways: a greedy approach or a branch-and-bound approach.
The greedy approach attempts to reach a locally optimal subgraph as fast as possible.
The branch-and-bound approach prunes the search space using an estimated upper bound of the scores.
The LTS (Learn To Search) algorithm integrates both approaches with novel probing and pruning techniques.

Methodology
  Fast-probe algorithm
  Upper-bound estimation algorithm
  LTS algorithm

Definitions
Graph: a graph is denoted as G = (V, E), where V is a set of nodes and E is a set of edges.
The input consists of a positive graph set and a negative graph set. We assume that the positive set is the interesting set and the negative set is the decoy.

Subgraph isomorphism: every node and every edge carries a label. For two graphs, if there is an injection from the nodes of the first to the nodes of the second that preserves node labels, edges, and edge labels, then the first graph is a subgraph of the second, and the second is a supergraph of the first, or supports it.

Frequency: the frequency of a subgraph pattern is defined with respect to a given graph set [formula missing].

Discrimination score: the more discriminative the pattern, the larger its discrimination score [formula missing].
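
The notions of support and frequency can be made concrete with a small sketch. It assumes that frequency means the fraction of graphs in a set that support the pattern, which is the usual convention but is not confirmed by the slides. The graph representation, the helper `edge`, and the functions `supports` and `frequency` are hypothetical names chosen for illustration. The matcher is a plain backtracking search, adequate only for very small graphs.

```python
# Hypothetical representation: a graph is a pair (nodes, edges), where
#   nodes maps node id -> label
#   edges maps frozenset({u, v}) -> label (undirected, no self-loops,
#   labels are assumed never to be None)


def edge(u, v):
    """Key for an undirected edge between two distinct nodes."""
    return frozenset((u, v))


def supports(graph, pattern):
    """True if some label-preserving injection maps `pattern` into `graph`."""
    g_nodes, g_edges = graph
    p_nodes, p_edges = pattern
    order = list(p_nodes)  # fixed order in which pattern nodes are mapped

    def edges_ok(mapping, u):
        # Check every pattern edge that touches u and has both ends mapped.
        for key, label in p_edges.items():
            if u in key and all(x in mapping for x in key):
                a, b = key
                if g_edges.get(edge(mapping[a], mapping[b])) != label:
                    return False
        return True

    def extend(mapping):
        if len(mapping) == len(order):
            return True  # every pattern node has an image
        u = order[len(mapping)]
        for v, v_label in g_nodes.items():
            # Node labels must match and the mapping must stay injective.
            if v_label != p_nodes[u] or v in mapping.values():
                continue
            mapping[u] = v
            if edges_ok(mapping, u) and extend(mapping):
                return True
            del mapping[u]  # backtrack
        return False

    return extend({})


def frequency(pattern, graph_set):
    """Assumed definition: fraction of graphs in a non-empty set that support the pattern."""
    return sum(supports(g, pattern) for g in graph_set) / len(graph_set)


# Two toy graphs and a single-edge pattern (C and O joined by a "single" edge).
g1 = ({1: "C", 2: "O", 3: "N"}, {edge(1, 2): "single", edge(1, 3): "single"})
g2 = ({1: "C", 2: "O"}, {edge(1, 2): "double"})
p = ({"a": "C", "b": "O"}, {edge("a", "b"): "single"})

print(supports(g1, p), supports(g2, p))
print(frequency(p, [g1, g2]))
```

In this toy data only `g1` supports the pattern, because the C and O nodes in `g2` are joined by an edge with a different label.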
Lineage: the lineage of a pattern is a sequence of patterns in which each pattern can be directly extended from the one before it.
Score record: the score record of a pattern is the sequence of scores of the patterns in its lineage.

Fast-probe algorithm
Fast-probe maintains a list of candidate subgraph patterns. Its goal is to generate a good sample of discriminative subgraphs that facilitates the subsequent branch-and-bound search.
The candidate list is initialized with all single-edge subgraph patterns in the input graphs.
For each graph in the input set, the optimal pattern and optimal score recorded for that graph are updated.
Each candidate is then extended by one more edge and the step is repeated. The algorithm terminates when the candidate list becomes empty.
The result is the optimal pattern for each graph.

Upper-bound estimation algorithm
The discriminative subgraph mining process generates many score records, which can be organized into a prefix tree called the prediction tree.
The root node is labelled 0.0. Each tree node is also associated with the maximum score in the subtree rooted at that node.
The maximum score at each tree node is an estimated upper bound in the search space.

LTS algorithm
LTS first uses fast-probe to collect score records and generate the search history, which includes a prediction tree of score records and a [...].
LTS uses a vector F to keep track of the optimal pattern for each positive graph: each entry of F stores the optimal pattern for one positive graph.
The candidate list starts from the optimal pattern that fast-probe found for each positive graph.
LTS updates an entry of F when the corresponding positive graph supports the current pattern and the pattern's score is greater than the score of the stored pattern.
The search terminates when the candidate list becomes empty and returns the optimal patterns in F.

Experiments

Conclusions
In this paper, we investigate the feasibility of estimating upper bounds for the discrimination scores of subgraph patterns in discriminative subgraph mining by learning from search history.
On the more complex protein datasets, LTS can significantly improve classification accuracy through the branch-and-bound search that follows fast-probe.
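
The prediction tree from the upper-bound estimation slide can be sketched as a small prefix tree over score records. This is a minimal illustration under stated assumptions, not the authors' implementation. The class and method names (`PredictionNode`, `PredictionTree`, `insert`, `estimate`) are hypothetical, scores are used directly as exact keys, and a prefix that was never observed simply yields `None`. How the paper matches unseen prefixes is not stated in the slides.

```python
class PredictionNode:
    """One node of the prefix tree: a score label plus the subtree maximum."""

    def __init__(self, score):
        self.score = score
        self.max_score = score  # maximum score in the subtree rooted here
        self.children = {}      # child score -> PredictionNode


class PredictionTree:
    def __init__(self):
        # The root is labelled 0.0, as stated on the slide.
        self.root = PredictionNode(0.0)

    def insert(self, record):
        """Add one score record (the scores along a pattern's lineage)."""
        path = [self.root]
        node = self.root
        for score in record:
            child = node.children.get(score)
            if child is None:
                child = PredictionNode(score)
                node.children[score] = child
            node = child
            path.append(node)
        # Walk back up so every node on the path knows its subtree maximum.
        running = float("-inf")
        for n in reversed(path):
            running = max(running, n.score)
            n.max_score = max(n.max_score, running)

    def estimate(self, prefix):
        """Estimated upper bound for patterns whose score record starts with
        `prefix`, or None if this prefix was never seen (an assumption)."""
        node = self.root
        for score in prefix:
            node = node.children.get(score)
            if node is None:
                return None
        return node.max_score


# Toy search history made of three score records.
tree = PredictionTree()
for record in ([0.2, 0.5, 0.3], [0.2, 0.1], [0.4, 0.9]):
    tree.insert(record)

print(tree.estimate([0.2]))  # best score seen below the prefix [0.2]
print(tree.estimate([0.4]))
print(tree.estimate([0.7]))  # unseen prefix
```

A branch-and-bound search could compare such an estimate with the best score found so far and skip a branch when the estimate is not higher. Because the bound is learned from history rather than proven, pruning of this kind is heuristic.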