Keyword Proximity Search on XML Graphs
Vagelis Hristidis
Yannis Papakonstantinou
Andrey Balmin
@UCSD
Presenter: Feng Shao
Outline
Introduction
Proximity Keyword Query Semantics
Architecture
XML Decompositions
Execution
Experiment
Conclusion
Introduction
Keyword search is easy to use
No need to know the structure or the query language
XML: a labeled graph representing semistructured, self-describing data
Feb. 10: 5th birthday of XML (from www.w3c.org)
Problem: keyword proximity query
Input: a set of keywords
Results: trees of XML fragments (called target objects) that together contain all the keywords, ranked by size
Assumes the existence of a schema, which facilitates the presentation of the results and is used to optimize the performance of the system
Name[John] - person - supplier - lineitem - linepart - product - descr[set of VCR and DVD], size 6
Name[John] - person - supplier - lineitem - linepart - part - subpart - part - name[VCR], size 8
Challenges
Presentation of result graphs:
Semantically meaningful
Avoid a huge number of trivial results
Providing fast response time:
Efficient storage of data
On-demand execution, guided by the user's navigation
Semantics
XML graph: a labeled graph
Node v: id id(v), label λ(v), value val(v)
Edge: containment or reference
Schema graph: a directed graph
Node vs: label λ(vs), content type type(vs) (all or choice)
Edge es: containment or reference, annotated with a maximum occurrence occ(es)
An XML graph conforms to a schema graph
[Figure: schema graph and XML graph]
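The conformance condition can be sketched in code. This is a minimal illustration, not the authors' implementation: it assumes each schema node is identified by its label (real schemas may reuse a label such as "name"), it checks only edge kinds and maximum occurrences, and it ignores the all/choice content type. The names `SchemaEdge` and `conforms` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaEdge:
    src: str        # label of the source schema node
    dst: str        # label of the destination schema node
    kind: str       # "containment" or "reference"
    max_occ: float  # occ(es); use float("inf") for unbounded


def conforms(nodes, edges, schema_edges):
    """nodes: {node_id: label}; edges: [(src_id, dst_id, kind)].

    Assumption: labels identify schema nodes; content types are not checked.
    """
    allowed = {(e.src, e.dst, e.kind): e.max_occ for e in schema_edges}
    counts = {}
    for src, dst, kind in edges:
        key = (nodes[src], nodes[dst], kind)
        if key not in allowed:
            return False  # the schema graph has no such edge
        counts[(src, key)] = counts.get((src, key), 0) + 1
        if counts[(src, key)] > allowed[key]:
            return False  # maximum occurrence exceeded for this source node
    return True
```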
Query semantics
Result: the set of all possible Minimal Total Target Object Networks (MTTONs)
What is an MTTON?
Node network j: an uncycled subgraph of G, such that each edge in j is an edge in G
Total node network j of keywords {k1,…,km}: a node network where every keyword is contained in at least one node n of j
Minimal Total Node Network (MTNN): a total node network j from which no node can be removed such that j is still a total node network. Score: number of edges
Target object of node n: a segment of the XML graph, large enough to be meaningful and to semantically identify the node n, and as small as possible
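The definitions of total and minimal total node networks can be checked mechanically. The sketch below assumes a node network is a connected tree given as undirected node pairs, so only leaves can be removed without disconnecting it. The function names and the `node_keywords` mapping are hypothetical.

```python
def is_total(node_keywords, nodes, keywords):
    """Every keyword is contained in at least one node of the network."""
    return all(any(k in node_keywords.get(n, ()) for n in nodes) for k in keywords)


def is_tree(nodes, edges):
    """Uncycled and connected (assumption): |E| = |V| - 1 and all nodes reachable."""
    nodes = set(nodes)
    if not nodes or len(edges) != len(nodes) - 1:
        return False
    adj = {n: set() for n in nodes}
    for a, b in edges:
        if a not in adj or b not in adj:
            return False
        adj[a].add(b)
        adj[b].add(a)
    seen, stack = set(), [next(iter(nodes))]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(adj[n] - seen)
    return seen == nodes


def is_mtnn(node_keywords, nodes, edges, keywords):
    """Total node network from which no leaf can be dropped while staying total."""
    nodes = set(nodes)
    if not (is_tree(nodes, edges) and is_total(node_keywords, nodes, keywords)):
        return False
    degree = {n: 0 for n in nodes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    for n in nodes:
        if degree[n] <= 1 and len(nodes) > 1:
            if is_total(node_keywords, nodes - {n}, keywords):
                return False  # leaf n is redundant, so the network is not minimal
    return True


def score(edges):
    """Score of a network: its number of edges (smaller is better)."""
    return len(edges)
```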
MTTON (cont.)
Given an MTNN j with nodes v1, . . . , vn, there is a corresponding MTTON t, which is a tree whose nodes are a minimal set of target objects {t1, . . . , tm} such that for every node nk ∈ j there is a tl ∈ t with target(nk) = tl.
There is an edge from a target object ti to a target object tj if there is an edge (or a path) from a node that belongs to ti to a node that belongs to tj.
The score of an MTTON t is the score of its corresponding MTNN.
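The mapping from an MTNN to its MTTON can be sketched as follows. This is an illustration under assumptions: the MTNN is given as node ids and directed node pairs, and `target` is a hypothetical dictionary from each node to the id of its target object.

```python
def mtton_from_mtnn(nodes, edges, target):
    """Collapse an MTNN onto target objects.

    nodes:  node ids of the MTNN
    edges:  (src, dst) node pairs of the MTNN
    target: {node_id: target_object_id} (hypothetical lookup)
    Returns (target objects, edges between distinct target objects, score).
    """
    to_nodes = {target[n] for n in nodes}
    to_edges = {
        (target[a], target[b]) for a, b in edges if target[a] != target[b]
    }
    # The MTTON inherits the score of its MTNN: the number of MTNN edges.
    return to_nodes, to_edges, len(edges)
```

Edges whose endpoints fall in the same target object disappear in the MTTON, which is why several MTNNs can collapse to similar-looking results.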
MTNN: name
MTNN: name - person - nation
MTNN & MTTON
Name[John] - person - supplier - lineitem - linepart - part - subpart - part - name[VCR]
Target object
Defined by an administrator using the Target Schema Segment (TSS) graph
TSS graph: based on a partial mapping of the nodes of the schema graph G to TSS nodes
A node tS is created in GTSS for each set S = {s1, . . . , sw} of nodes of G that are mapped to tS.
An edge (tS, tS') is created in GTSS if the schema graph has nodes s ∈ S and s' ∈ S' that are connected directly through an edge (s, s') or indirectly through a path of dummy schema nodes.
Target decomposition: given the TSS graph, decompose the XML graph into target objects that are connected to each other
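The construction of the TSS graph edges can be sketched as below. Two assumptions are made that the slides do not state: schema nodes missing from the mapping are treated as dummy nodes, and edges between two nodes of the same segment are skipped. The function name and the example labels are hypothetical.

```python
def build_tss_edges(schema_edges, tss_of):
    """Derive TSS graph edges from a schema graph.

    schema_edges: iterable of directed (s, s2) schema edges
    tss_of:       partial mapping {schema_node: tss_node}; unmapped schema
                  nodes are treated as dummy nodes (assumption)
    """
    adj = {}
    for s, s2 in schema_edges:
        adj.setdefault(s, []).append(s2)

    tss_edges = set()
    for start, t_start in tss_of.items():
        stack = list(adj.get(start, []))
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            if node in tss_of:
                # Reached a mapped node: record the edge, skipping
                # pairs inside the same segment (assumption).
                if tss_of[node] != t_start:
                    tss_edges.add((t_start, tss_of[node]))
            else:
                # Dummy node: keep following the path through it.
                stack.extend(adj.get(node, []))
    return tss_edges
```

For example, if a hypothetical dummy node `link` sits between `supplier` and `person`, the two segments are still connected by one TSS edge.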
Example
MTNN & MTTON
Name[John] - person - supplier - lineitem - linepart - part - subpart - part - name[VCR]
Presentation Graph
Naïve method: multiple threads evaluate various plans for producing MTTONs and output results as they come.
Pro: fast response time
Con: many trivial results
Interactive interface: allows navigation and hides the trivial results
Architecture
Load Stage
Keyword index entries: keyword, <TO_id, node_id, schema_node>
Statistics such as the number of nodes of each type
A decomposition of the TSS graph into fragments, which correspond to connection relations that allow efficient retrieval of MTTONs
Target object storage: given an object id, instantly return the whole target object
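The keyword entries built at load time behave like an inverted index from each keyword to <TO_id, node_id, schema_node> triples. A minimal in-memory sketch follows; it assumes lowercase whitespace tokenization, whereas the system described here stores its data in a relational database. The function name is hypothetical.

```python
from collections import defaultdict


def build_keyword_index(nodes):
    """nodes: iterable of (to_id, node_id, schema_node, value) tuples.

    Returns {keyword: [(to_id, node_id, schema_node), ...]}.
    Assumption: keywords are lowercase whitespace-separated tokens.
    """
    index = defaultdict(list)
    for to_id, node_id, schema_node, value in nodes:
        for word in set(value.lower().split()):
            index[word].append((to_id, node_id, schema_node))
    return index
```

A lookup such as `index["vcr"]` then yields the target objects and schema nodes from which candidate networks can be generated.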
Example of decomposition
Query processing
[Figure: query processing for the keywords TV, VCR. Labels: keyword lookup returning <TO_id, node_id, schema_node> entries; Schema graph and TSS graph; Candidate Network; Candidate TSS Network; Connection relations schema; Execution Plan; Connection relations]
XML Decomposition
Decompose TSS graph into fragments
Determines how the connections are stored in the database
Can dramatically change the performance
Example: [figure]
Decomposition Tradeoff
Number of fragments vs. performance
Minimal decomposition
A fragment is built for each edge of the TSS graph
A candidate TSS network C of size S requires S-1 joins
Maximal decomposition
A fragment F is built for every possible candidate TSS network C
C requires zero joins
Not feasible in practice
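The two extremes can be related by a small calculation. The sketch below assumes a path-shaped candidate TSS network and assumes that every path of up to `max_fragment_size` edges is available as a fragment; neither assumption comes from the slides, and the function name is hypothetical.

```python
import math


def joins_for_path(size, max_fragment_size):
    """Joins needed to evaluate a path-shaped candidate TSS network.

    size:              number of edges S in the network
    max_fragment_size: largest fragment, in edges, that the decomposition stores
    Assumption: any path of up to max_fragment_size edges exists as a fragment.
    """
    if size < 1 or max_fragment_size < 1:
        raise ValueError("sizes must be positive")
    pieces = math.ceil(size / max_fragment_size)  # fragments covering the path
    return pieces - 1                             # joins between adjacent pieces
```

With single-edge fragments this gives S-1 joins, and with a fragment as large as the whole network it gives zero, matching the minimal and maximal decompositions.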
Tradeoff (cont.)
Clustering and indexing are critical
Maximal decomp.: multi-attribute indices
Non-maximal decomp.: a connection relation R is clustered on the direction in which R is used
Classify TSS graph fragments based on the storage redundancy in the corresponding connection relations
Example: 4NF, inlined (non-MVD, non-4NF)
Decomposition Algorithm
See paper
Execution
Goal: fast response time
Web search engine-like presentation
Use inlined decomposition
Use a thread pool
Use nested-loop joins
Example:
Outermost loop: over TSS part^VCR, name
Optimization: store partial results
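The combination of nested-loop joins and stored partial results can be sketched as follows. This is an illustration, not the system's executor: each join step is modeled as a hypothetical lookup function from an id to the ids it connects to, and the cache remembers the result of each lookup so that repeated ids are probed only once.

```python
def nested_loop_top_k(outer_rows, lookup_steps, k):
    """Nested-loop evaluation that stops after k results.

    outer_rows:   ids produced by the outermost loop
    lookup_steps: list of functions id -> list of connected ids (hypothetical)
    k:            number of results wanted
    """
    caches = [dict() for _ in lookup_steps]  # stored partial results per step
    results = []

    def expand(prefix, depth):
        if len(results) >= k:
            return
        if depth == len(lookup_steps):
            results.append(tuple(prefix))
            return
        key = prefix[-1]
        cache = caches[depth]
        if key not in cache:
            cache[key] = lookup_steps[depth](key)  # one probe per distinct id
        for nxt in cache[key]:
            expand(prefix + [nxt], depth + 1)
            if len(results) >= k:
                return

    for row in outer_rows:
        expand([row], 0)
        if len(results) >= k:
            break
    return results
```

Stopping at k results is what keeps the response time low for a search engine-like presentation, since the first page can be shown before the full result set is computed.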
Execution
Presentation graphs (on-demand)
Initially, the XKeyword decomposition is used to retrieve the top result of each CN.
Then a combination of decompositions is used to find the minimal connection of the expanded nodes.
Experiments
Measure various decompositions for top-K and full results
Evaluate the performance of the algorithms for the search engine-like presentation method and the on-demand expansion method
Data: DBLP XML database, 2 keywords
Maximum size of CTSSN: M = 6
Maximum size of fragments: L = 2
Decompositions
Execution algorithm
Speedup: optimized algorithm relative to the naïve, non-caching algorithm
Execution algorithm
Keyword queries: the names of two authors, k1 and k2
Candidate Network: Author^k1 - Paper - Author^k2
Time measured: average time to expand a Paper node
Conclusion
XKeyword is built on a relational database and, hence, can accommodate very large graphs.
Presents keyword proximity search semantics, extended to capture the novel result presentation method.
Presents an architecture that allows choosing which connections will be precomputed.
Addresses the on-demand performance requirement.
Demo: http://www.db.ucsd.edu/Xkeyword