Download Swinburne Marketing Strategy

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Cluster analysis wikipedia , lookup

K-means clustering wikipedia , lookup

Transcript
Processing XML Keyword Search by
Constructing Effective Structured Queries
Jianxin Li, Chengfei Liu, Rui Zhou and Bo Ning
Swinburne University of Technology, Australia
Outline

Motivation of Keyword Search in XML

Brief Review of Related Work

Existing Problems

Construct Structured Query Templates

Ranking Function

Processing Algorithms

Conclusions
Motivation of XML Keyword Search

Keyword search is easy-to-use
 Users don’t need to know the structure of XML data and
specific query languages.
 The XML data with different structures can be searched
equivalently by a keyword query because it doesn’t specify
the structures of the retrieved results.
Brief Review of Related Work

We focus on 4 references using label and term as
keyword query format:
 [YunyaoLi2004VLDB] Schema-Free XQuery.
 [DanielaFlorescu2002ComputerNetworks] Integrating
keyword search into XML query processing.
 [SaraCohen2003VLDB] XSEarch: A semantic search engine
for XML.
 [WeidongYang2007CIT] Schema-aware keyword search
over xml streams.

Other relevant work can be found in our paper.
Brief Review of Related Work

All the four work utilized label and term as keyword
query format.

The difference: the first three work shared the similar
basic strategy that first retrieves the relevant keyword
lists and then merges them into the results; while the
last one first generate a big template that covers all the
kinds of results w.r.t. XML schema and then cache the
possible results over xml streams.
Template-based strategy can obtain better
performance [WeidongYang2007CIT]!
Existing Problems

[WeidongYang2007CIT] was
used to query over XML streams,
which is not enough because of the challenges:
 Different templates may exist in one XML data repository.
 Users prefer to see part of the results, e.g., top k results.
 Domain knowledge can be helped to process the labels with
the same meaning.

Therefore, it is required to study the problem of applying
template-based keyword search strategy to XML data
repository.
Construct Structured Query Templates

Example: There are two data sources that conform to t1 and
t2 respectively.
Schema t1
Schema t2
Keyword query – (year:2006, title:xml, author:philip)
Construct Structured Query Templates

Identifying context of keywords
 Determine master entities using labels in keyword query and XML
schema.
 Generate FOR clause for each entity.
 Judge the occurrences of every label under each master entity.
 Once a time – Generate WHERE clauses
 More than once – First cluster and then generate WHERE clauses.

Step 1: determine master entity and its
corresponding label set
 Q1 = “For $b in bibliography/books/book”
 Q2 = “For $a in bibliography/articles/article”

Schema t1
Step 2: only one occurrence of each label in
each master entity.
 Q1 += “Where $b/year=‘2006’ and
$b/title.contains(xml) and
$b/author.contains(philip)”
 Q2 += “Where $a/year=‘2006’ and
$a/title.contains(xml) and
$a/author.contains(philip)”
Keyword query – (year:2006, title:xml, author:philip)

Step 1: determine master entity and its
corresponding label set
 Q = “For $bi in bibliography/bib”
Schema t2

Step 2: only two occurrences of each label in
the master entity. Cluster title and author using
book and article respectively
 Q1 += Q + “For $bo in $bi/book”
 Q2 += Q + “For $a in $bi/article”

Step 3: only one occurrence of each label in
each cluster.
 Q1 += “Where $bi/year=‘2006’ and
$bo/title.contains(xml) and
$bo/author.contains(philip)”
 Q2 …
Keyword query – (year:2006, title:xml, author:philip)
Construct Structured Query Templates

Identifying returned nodes
 Step1: If the cardinality of a master entity satisfies “*” and no
cluster operation is activated, we take the master entity as a
return node in constructed queries;
 Step 2: If the cardinality of a master entity satisfies “*” and
clusters are generated, we first check the root node of each
cluster in a recursive procedure (back to step 1);
 Step 3: If the cardinality of a master entity does not satisfy
“*”, we will probe its ancestor nodes one by one until this
kind of node exists or the root of the xml schema.
Schema t1

Schema t2
Master entities are the returned nodes.
 Q1 += “$b”
 Q2 += “$a”

Roots of clusters are the returned nodes.
 Q1 += “$bo”
 Q2 += “$a”
The constructed queries can be read in our paper!
Keyword query – (year:2006, title:xml, author:philip)
Ranking Function
 (vi , ti )
1 n
Score( A, qi )   
n i 1 LengthOfPath(vi , vm )
 vm is the master entity nodes;
 ω(vi, ti) is calculated by using tf*idf weight model.
Feature of the function: The Score() consists of two parts
ContextScore() and tf*idf weight, and the former is the upper
bound of the score of the results.
Processing Strategy

Algorithm 1 is used to generate structured queries with
their corresponding context score.

Algorithm 2 is used to schedule the query plan
according to the conditions:
 Users’ requirements, e.g., number of results;
 Context scores of all generated queries;
 And the intermediate results.
Experiments

Dataset:
 Sigmod record
 three variant of DBLP

Keyword Queries:
 q1 (author:David, title:XML)
 q2 (year:2002, title:XML)
Experimental Results
q1
q1(k = 10)
q2
q2(k = 20)
Conclusions

XBridge is proposed to process keyword query over
XML data repository, which can efficiently find the top k
results by evaluating generated structured queries.

A precise ranking function is provided to evaluate the
relevance of the results.

Limitation of this work:
 We take XML schema as tree patterns;
 We didn’t consider reference relationships of XML data.