Download link

Towards Graph Containment Search and Indexing Chen Chen 1, Xifeng Yan 2, Philip S. Yu 2, Jiawei Han 1, Dong-Qing Zhang 3, Xiaohui Gu 2 1 University of Illinois at Urbana-Champaign 2IBM T.J. Watson Research Center 3Thomson Research VLDB ’07, September 23-28, 2007, Vienna, Austria 2007. 12. 28 Summarized by Dongjoo Lee, IDS Lab., Seoul National University Presented by Dongjoo Lee, IDS Lab., Seoul National University Contents  Graph Search  cIndex Basic Framework  Indexing Features  Indexing Model  cIndex-Basic  Hierarchical Indexing Models – cIndex-BottomUp, cIndex-TopDown  Index Maintenances  Experiments  Discussion Copyright  2007 by CEBT 2 Graph Search gc Subgraph  Traditional Graph Search  q Supergraph  Graph Containment Search  gc  q Find all supergraphs of a query graph Find all subgraphs of a query graph Subgraph isomorphism SCAN Answers of Graph Search Answers of Graph Containment Search Sample Database q Sample Query Copyright  2007 by CEBT 3 Subgraph Indexing & Pruning f1 q1 ga gb f2 q2 f3 q3 gc f4 Sample Database  Feature-Graph Matrix Indexed Subgraphs SCAN Queries Traditional Graph Search - Inclusion Logic Given a query graph q and a database graph g  D, if a feature f  q and f  g, then q  g . That is, if feature f is in q then the graphs not having f are pruned.  Graph Containment Search - Exclusion Logic If a feature f  q and f  g, then g  q. That is, if feature f is not in q then the graphs having f are pruned. Copyright  2007 by CEBT 4 Basic Search Framework 1. 2. Off-line index construction  Generate and select a feature set F from the graph database D  f ∈ F, Df = {g | f ⊆ g, g ∈D} Search  3. Test indexed features in F against the query q which returns all f  q, and compute the candidate query answer set, Cq = D − fDf (f  q, f ∈ F). Verification  Check each graph g in the database set Cq to see whether g is really a subgraph of q Search Time  Let’s reduce these Copyright  2007 by CEBT 5 Redundancy-Aware Feature Selection   Select minimal features that cover many graphs that other features didn’t cover from frequent features Expected number of reduced subgraph isomorphism tests    X X X X X q1 f2 X q2 Jf = np(1-p’) -1 – n : |D| – p : frequency of f in D – p’ : probability that a query graph having f f3 q3 p  p’=> Maximum at p = ½ Frequent subgraph mining algorithm  f1 X f4 FSG, GASTON, gSpan Copyright  2007 by CEBT 6 Maximum Coverage With Cost Definition 6. (Maximum Coverage With Cost). Given a set of subsets S = {S1, S2, …, Sm} of the universal set U = {1, 2, … , n} and a cost parameter  associated with any Si ∈ S, find a subset T of S such that |∪Si∈TSi|−|T| is maximized. U = {1, 2, 3, 4, 5, 6, 7, 8, 9} S = {{4,5,6}, {1,2,4,5},{4,5,7,8}, {1,4,7}} f1 f2 f3 f4  = 3 Set Cover Problem http://en.wikipedia.org/wiki/Set_cover_problem 1 2 3 4 5 6 Contrast Graph Matrix 7 8 9 This is NP-complete problem The greedy feature selection process can approximate the optimal index with K features within a ratio of 1 − 1/e. [D. S. Hochbaum, editor. “Approximation Algorithms for NP-Hard Problems”, 1997] Copyright  2007 by CEBT 7 cIndex-Basic 1  2 3 4 5 6 7 8 9 Time complexity : O(|F0||D||L|) ----> O(|F0||Ds||L|) Reduced by sampling  Space usage : O(|F0||D||L|) ----> O(|F0||D| + |F0||L|) Reduced by virtualization Copyright  2007 by CEBT 8 Hierarchical Indexing Models  cIndex-BottomUp  cIndex-TopDown Copyright  2007 by CEBT 9 Index Maintenance  “ostrich” strategy  Stick with the same set of selected features and the same hierarchical structure built  Sampling strategy  Periodically take small samples (Fs, Ds, Ls) and construct sample index Is – if performance of legacy index I is much worse than Is then reconstruct index – else take ostrich strategy Copyright  2007 by CEBT 10 Experiments (1)  Chemical Descriptor Search   a model graph database usually includes a set of fundamental substructures, called descriptors. These descriptors, shared by specific groups of known molecules, often indicate particular chemical and physical properties. Given a molecule, fast searching for its “descriptor” substructures can help researchers to quickly predict its attributes. Comparison  SCAN  FB –  Extract features using gIndex and apply features as Basic Framework cIndex-Basic, cIndex-BottomUp, cIndex-TopDown Copyright  2007 by CEBT 11 Experiments (2) Subgraph Isomorphism Test Numbers Effectiveness of Data Space Reduction Copyright  2007 by CEBT Query Processing Time Index Maintenance 12 Experiments (3) Performance of Hierarchical Indices Scalability of Hierarchical Indices Index Size Copyright  2007 by CEBT 13

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download link