Download Swinburne Marketing Strategy

Processing XML Keyword Search by Constructing Effective Structured Queries Jianxin Li, Chengfei Liu, Rui Zhou and Bo Ning Swinburne University of Technology, Australia Outline  Motivation of Keyword Search in XML  Brief Review of Related Work  Existing Problems  Construct Structured Query Templates  Ranking Function  Processing Algorithms  Conclusions Motivation of XML Keyword Search  Keyword search is easy-to-use  Users don’t need to know the structure of XML data and specific query languages.  The XML data with different structures can be searched equivalently by a keyword query because it doesn’t specify the structures of the retrieved results. Brief Review of Related Work  We focus on 4 references using label and term as keyword query format:  [YunyaoLi2004VLDB] Schema-Free XQuery.  [DanielaFlorescu2002ComputerNetworks] Integrating keyword search into XML query processing.  [SaraCohen2003VLDB] XSEarch: A semantic search engine for XML.  [WeidongYang2007CIT] Schema-aware keyword search over xml streams.  Other relevant work can be found in our paper. Brief Review of Related Work  All the four work utilized label and term as keyword query format.  The difference: the first three work shared the similar basic strategy that first retrieves the relevant keyword lists and then merges them into the results; while the last one first generate a big template that covers all the kinds of results w.r.t. XML schema and then cache the possible results over xml streams. Template-based strategy can obtain better performance [WeidongYang2007CIT]! Existing Problems  [WeidongYang2007CIT] was used to query over XML streams, which is not enough because of the challenges:  Different templates may exist in one XML data repository.  Users prefer to see part of the results, e.g., top k results.  Domain knowledge can be helped to process the labels with the same meaning.  Therefore, it is required to study the problem of applying template-based keyword search strategy to XML data repository. Construct Structured Query Templates  Example: There are two data sources that conform to t1 and t2 respectively. Schema t1 Schema t2 Keyword query – (year:2006, title:xml, author:philip) Construct Structured Query Templates  Identifying context of keywords  Determine master entities using labels in keyword query and XML schema.  Generate FOR clause for each entity.  Judge the occurrences of every label under each master entity.  Once a time – Generate WHERE clauses  More than once – First cluster and then generate WHERE clauses.  Step 1: determine master entity and its corresponding label set  Q1 = “For $b in bibliography/books/book”  Q2 = “For $a in bibliography/articles/article”  Schema t1 Step 2: only one occurrence of each label in each master entity.  Q1 += “Where $b/year=‘2006’ and $b/title.contains(xml) and $b/author.contains(philip)”  Q2 += “Where $a/year=‘2006’ and $a/title.contains(xml) and $a/author.contains(philip)” Keyword query – (year:2006, title:xml, author:philip)  Step 1: determine master entity and its corresponding label set  Q = “For $bi in bibliography/bib” Schema t2  Step 2: only two occurrences of each label in the master entity. Cluster title and author using book and article respectively  Q1 += Q + “For $bo in $bi/book”  Q2 += Q + “For $a in $bi/article”  Step 3: only one occurrence of each label in each cluster.  Q1 += “Where $bi/year=‘2006’ and $bo/title.contains(xml) and $bo/author.contains(philip)”  Q2 … Keyword query – (year:2006, title:xml, author:philip) Construct Structured Query Templates  Identifying returned nodes  Step1: If the cardinality of a master entity satisfies “*” and no cluster operation is activated, we take the master entity as a return node in constructed queries;  Step 2: If the cardinality of a master entity satisfies “*” and clusters are generated, we first check the root node of each cluster in a recursive procedure (back to step 1);  Step 3: If the cardinality of a master entity does not satisfy “*”, we will probe its ancestor nodes one by one until this kind of node exists or the root of the xml schema. Schema t1  Schema t2 Master entities are the returned nodes.  Q1 += “$b”  Q2 += “$a”  Roots of clusters are the returned nodes.  Q1 += “$bo”  Q2 += “$a” The constructed queries can be read in our paper! Keyword query – (year:2006, title:xml, author:philip) Ranking Function  (vi , ti ) 1 n Score( A, qi )    n i 1 LengthOfPath(vi , vm )  vm is the master entity nodes;  ω(vi, ti) is calculated by using tf*idf weight model. Feature of the function: The Score() consists of two parts ContextScore() and tf*idf weight, and the former is the upper bound of the score of the results. Processing Strategy  Algorithm 1 is used to generate structured queries with their corresponding context score.  Algorithm 2 is used to schedule the query plan according to the conditions:  Users’ requirements, e.g., number of results;  Context scores of all generated queries;  And the intermediate results. Experiments  Dataset:  Sigmod record  three variant of DBLP  Keyword Queries:  q1 (author:David, title:XML)  q2 (year:2002, title:XML) Experimental Results q1 q1(k = 10) q2 q2(k = 20) Conclusions  XBridge is proposed to process keyword query over XML data repository, which can efficiently find the top k results by evaluating generated structured queries.  A precise ranking function is provided to evaluate the relevance of the results.  Limitation of this work:  We take XML schema as tree patterns;  We didn’t consider reference relationships of XML data.

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Swinburne Marketing Strategy