The Complexity of XPath Evaluation
Paper By:
Georg Gottlob
Christoph Koch
Reinhard Pichler
Presented By:
Royi Ronen
Introduction
• At the time of the paper, all major XPath
evaluation algorithms ran in exponential time
in the worst case.
• Paper’s main goals:
– Prove that the “XPath problem” is P-complete.
– Prove that other related problems are
LOGCFL-complete.
XPath – Quick Reminder
• XPath is a query language for XML
documents.
• Navigating through a document:
/descendant::a/child::b selects nodes
named “b” whose parent is named “a”.
• Testing nodes:
/descendant::a/child::b[@c=3] requires
that b’s attribute c equals 3.
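The two queries above can be tried out with a minimal sketch in Python's standard xml.etree.ElementTree module. This is an illustration only: ElementTree implements a limited XPath subset without the unabbreviated axis syntax, so the abbreviated form .//a/b stands in for /descendant::a/child::b. The sample document and the function names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, not taken from the slides.
XML = (
    "<root>"
    "<a><b c='3'>x</b><b c='4'>y</b></a>"
    "<d><b c='3'>z</b></d>"
    "</root>"
)
root = ET.fromstring(XML)


def select_b_under_a(tree_root):
    """Navigation: all 'b' elements whose parent is an 'a' element."""
    return [e.text for e in tree_root.findall(".//a/b")]


def select_b_with_c3(tree_root):
    """Node test: as above, but 'b' must also carry attribute c = 3."""
    return [e.text for e in tree_root.findall(".//a/b[@c='3']")]


print(select_b_under_a(root))
print(select_b_with_c3(root))
```

The element containing "z" is never selected because its parent is named "d", not "a".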
Sketch: How P-Completeness is
proven
• In order to prove P-Completeness of a
problem, we have to prove:
– Membership in P;
– P-Hardness;
[Figure: P-Complete is the intersection of P and P-Hard.]
XPath is P-Complete
• Sketch:
1. Membership of XPath in P has already been
proven (by the same authors).
2. P-Hardness of XPath will be proven by
reduction from the monotone circuit
problem (which is known to be P-Complete)
to Core XPath (a subset of XPath with its
main features). Why is it enough? Because
every Core XPath query is also an XPath
query, hardness of the subset carries over
to full XPath.
Monotone Boolean Circuit Problem
• A Monotone Boolean circuit is a circuit with
many inputs and one output that uses the
following Boolean gates only:
– AND
– OR
– DUMMY
• Given a circuit and its inputs, solving the
problem means determining the output.
• The problem is P-Complete.
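To make the problem statement concrete, here is a minimal sketch of a monotone circuit evaluator in Python. The encoding is an assumption made for the example: gates are numbered from 1, inputs come first, and every gate lists only lower-numbered gates as its inputs. A DUMMY gate is assumed to forward its single input unchanged.

```python
def eval_monotone_circuit(inputs, gates):
    """Evaluate a monotone circuit and return the value of its last gate.

    inputs: list of 0/1 values for the input gates G_1..G_M.
    gates:  list of (kind, preds) for G_{M+1}..G_{M+N}, where kind is
            "AND", "OR" or "DUMMY" and preds are 1-based gate numbers
            that must be smaller than the number of the gate itself.
    """
    values = [bool(v) for v in inputs]
    for kind, preds in gates:
        own_number = len(values) + 1
        if not preds or any(p < 1 or p >= own_number for p in preds):
            raise ValueError("gate inputs must come from earlier gates")
        pred_values = [values[p - 1] for p in preds]
        if kind == "AND":
            values.append(all(pred_values))
        elif kind == "OR":
            values.append(any(pred_values))
        elif kind == "DUMMY":
            # Assumption: a dummy gate just passes its one input through.
            values.append(pred_values[0])
        else:
            raise ValueError(f"unknown gate kind: {kind}")
    return values[-1]
```

A single left-to-right pass suffices because of the ordering of the gates, which is why the problem is in P.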
A Monotone Boolean Circuit
• Item 3 in the handout:
Core XPath - Definition
XPath has many features and is inconvenient for
theoretical treatment. Therefore Core XPath, a subset of
XPath with its main features, is defined by the following
grammar (Item 1 in the handout):
locpath ::= ‘/’ locpath | locpath ‘/’ locpath |
locpath ‘|’ locpath | locstep.
locstep ::= axis ‘::’ ntst ‘[’ bexpr ‘]’ . . . ‘[’ bexpr ‘]’.
bexpr ::= bexpr ‘and’ bexpr | bexpr ‘or’ bexpr |
‘not(’ bexpr ‘)’ | locpath.
axis ::= ‘self’ | ‘child’ | ‘parent’ |
‘descendant’ | ‘descendant-or-self’ |
‘ancestor’ | ‘ancestor-or-self’ |
‘following’ | ‘following-sibling’ |
‘preceding’ | ‘preceding-sibling’.
The Corresponding Languages
• The paper shows direct reductions
between the problems.
• We will show the same reduction, but
between the corresponding languages,
since it is the methodology used in the
Technion Computability course.
• The proofs are equivalent.
The Corresponding Languages
• L-Core XPath:
{(Q,D) | Q is a Core XPath query, D is a
valid document and Q yields a
non-empty result when run on D}
• L-Monotone Circuit:
{(C,I) | C is a monotone circuit, I is a set of
inputs to C and C evaluates to 1
when run on I}

The Reduction
• Reduction is our tool to prove that one
language is at least as hard as another.
• Here we will show: L-Circuit is reducible to
L-Core XPath. It proves that L-Core XPath
is at least as hard as L-Circuit, therefore
P-Hard.
• We have to build (Q,D) that yields a
nonempty result iff (C,I) evaluates to 1.
The circuit layered
• An equivalent
monotone circuit, in
which only one non-dummy
gate exists in every layer
(Item 4 in the handout).
• The gates are
ordered, data can
flow from lower to
higher indexed gates
only.
Q and D
• D is built as follows:
M inputs, here M=4.
N non-input gates, here N=5.
A root V0 with children V_1, …, V_{M+N}; each V_i has one child V’_i.
Total of 2(M+N)+1 nodes.
Nodes are tagged from the alphabet {0, 1, G, R, I_i, O_i},
where i is from {1, 2, …, N}.
Tagging Rules
• V_1, …, V_M are each tagged with their input value, i.e.
0 or 1.
• Every V_i (1 ≤ i ≤ M+N) is tagged G; V_{M+N} is additionally tagged R.
• If gate G_i is an input to gate G_{M+k} (i < M+k), I_k is
added to V_i and O_k is added to V_{M+k}.
• V’_1, …, V’_M are tagged I_i and O_i for every i in {1, …, N}.
• V’_{M+i} is tagged I_k and O_k for every k in {i, …, N}.
These tags will be used by the query.
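The tagging rules can be sketched as a small document builder in Python. This is a hedged illustration, not the authors' code: the dictionary representation (node name mapped to its parent and tag set), the function name build_document and the circuit encoding (1-based gate numbers, inputs first) are all assumptions made for the example. The tree shape assumed is a root V0 with children V1..V(M+N), each having one child V'i.

```python
def build_document(inputs, gates):
    """Build the tagged tree D for a circuit.

    inputs: list of 0/1 values for G_1..G_M.
    gates:  list of (kind, preds) for G_{M+1}..G_{M+N}; preds are 1-based
            gate numbers. The gate kind is not needed for D, only for Q.
    Returns a dict: node name -> (parent name or None, set of tags).
    """
    m, n = len(inputs), len(gates)
    doc = {"V0": (None, set())}
    # Every V_i is tagged G; input nodes also carry their value.
    for i in range(1, m + n + 1):
        doc[f"V{i}"] = ("V0", {"G"})
    for i, bit in enumerate(inputs, start=1):
        doc[f"V{i}"][1].add(str(bit))
    # The last gate is the output gate and is tagged R.
    doc[f"V{m + n}"][1].add("R")
    # Wiring: I_k marks the inputs of gate G_{M+k}, O_k marks the gate.
    for k, (_kind, preds) in enumerate(gates, start=1):
        doc[f"V{m + k}"][1].add(f"O{k}")
        for i in preds:
            if not 1 <= i < m + k:
                raise ValueError("gate inputs must come from earlier gates")
            doc[f"V{i}"][1].add(f"I{k}")
    # V'_i gets I_k and O_k for all k (inputs) or for k >= i - M (gates).
    for i in range(1, m + n + 1):
        first = 1 if i <= m else i - m
        tags = {f"{t}{k}" for k in range(first, n + 1) for t in ("I", "O")}
        doc[f"V'{i}"] = (f"V{i}", tags)
    return doc
```

For a circuit with two inputs and one gate this yields 2(2+1)+1 = 7 nodes.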
A Simple Example
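[Figure: the circuit C has two inputs, 1 and 0, feeding a single gate G1. The document D is the following tagged tree.]
V0 (root, no tags)
  V1: tags 1, G, I1; child V’1: tags I1, O1
  V2: tags 0, G, I1; child V’2: tags I1, O1
  V3: tags G, O1, R; child V’3: tags I1, O1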
The Query
• The query in the output of the reduction is:
/descendant-or-self::*[T(R) and φ_N]
φ_k := descendant-or-self::*[T(O_k) and parent::*[ψ_k]]
ψ_k := not(child::*[T(I_k) and not(π_k)])   (if G_{M+k} is an AND gate)
ψ_k := child::*[T(I_k) and π_k]   (if G_{M+k} is an OR/DUMMY gate)
π_k := ancestor-or-self::*[T(G) and φ_{k-1}]
φ_0 := T(1)   (end of recursion)
Here T(l) is true at a node iff the node is tagged l.
• ψ_k evaluates gate k (G_{M+k}) by selecting V0 iff all (one of)
the gate’s inputs are (is) 1 and the gate is “AND” (“OR”).
• π_k pushes results down.
The reduction can be achieved in logarithmic space
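The recursive definition of the query can be unfolded mechanically. Below is a minimal Python sketch that builds the query text for a given list of gate kinds. The names φ, ψ and π for the three sub-queries follow the paper's notation as far as it can be recovered here, T(l) is kept as a symbolic tag test, and the function name build_query is made up for the example.

```python
def build_query(gate_kinds):
    """Unfold the query for gates G_{M+1}..G_{M+N}.

    gate_kinds: list of "AND", "OR" or "DUMMY", one entry per non-input
    gate, in gate order. T(...) stays symbolic in the output.
    """
    phi = "T(1)"  # phi_0, the end of the recursion
    for k, kind in enumerate(gate_kinds, start=1):
        # pi_k: push the results of the previous round down the tree.
        pi = f"ancestor-or-self::*[T(G) and {phi}]"
        # psi_k: evaluate gate k at the parent of its input nodes.
        if kind == "AND":
            psi = f"not(child::*[T(I{k}) and not({pi})])"
        elif kind in ("OR", "DUMMY"):
            psi = f"child::*[T(I{k}) and {pi}]"
        else:
            raise ValueError(f"unknown gate kind: {kind}")
        # phi_k: mark the node of gate k if psi_k holds at its parent.
        phi = f"descendant-or-self::*[T(O{k}) and parent::*[{psi}]]"
    return f"/descendant-or-self::*[T(R) and {phi}]"


print(build_query(["AND"]))
```

Each round embeds the previous sub-query exactly once, so the query text grows linearly with the number of gates.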
Sub-queries Meaning
π_k := ancestor-or-self::*[T(G) and φ_{k-1}]
Returns the nodes of the previous iteration and their tagged children, i.e. pushes
“down” results by including the children.
ψ_k := not(child::*[T(I_k) and not(π_k)])
Returns the root iff all the inputs to gate k are true, in an AND gate.
ψ_k := child::*[T(I_k) and π_k]
Returns the root iff at least one of the inputs to gate k is true, in an OR gate. In
both cases, returns the nodes that represent gates that were previously evaluated
to true.
φ_k := descendant-or-self::*[T(O_k) and parent::*[ψ_k]]
Includes the node of gate k (V_{M+k}, the node tagged O_k) iff the root was returned by the previous sub-query.
/descendant-or-self::*[T(R) and φ_N]
Returns the rightmost node iff the output gate is evaluated to true. (No other gate
is tagged R).
The Query - Example
/descendant-or-self::*[T(R) and φ_1]
φ_1 := descendant-or-self::*[T(O1) and parent::*[ψ_1]]
ψ_1 := not(child::*[T(I1) and not(π_1)])
π_1 := ancestor-or-self::*[T(G) and T(1)]
[Figure: the tagged document tree D from “A Simple Example”, with root V0, children V1, V2, V3 and their children V’1, V’2, V’3.]
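The example can be checked by evaluating the sub-queries directly on the seven-node tree. The following Python sketch is a hedged illustration under stated assumptions: the three functions phi, psi and pi are hypothetical names for the three mutually recursive sub-queries, T(l) is read as "the node carries tag l", a condition that is a location path is read as "the path selects at least one node", and the dictionary encoding of the tree is made up for the example.

```python
def example_document(in1, in2):
    """The 7-node tree of the simple example: node -> (parent, tags)."""
    return {
        "V0": (None, set()),
        "V1": ("V0", {str(in1), "G", "I1"}),
        "V2": ("V0", {str(in2), "G", "I1"}),
        "V3": ("V0", {"G", "O1", "R"}),
        "V'1": ("V1", {"I1", "O1"}),
        "V'2": ("V2", {"I1", "O1"}),
        "V'3": ("V3", {"I1", "O1"}),
    }


def children(doc, node):
    return [m for m, (parent, _) in doc.items() if parent == node]


def descendants_or_self(doc, node):
    yield node
    for child in children(doc, node):
        yield from descendants_or_self(doc, child)


def ancestors_or_self(doc, node):
    while node is not None:
        yield node
        node = doc[node][0]


def phi(doc, kinds, k, node):
    """Sub-query phi_k: some descendant-or-self tagged O_k whose parent satisfies psi_k."""
    if k == 0:
        return "1" in doc[node][1]
    return any(
        f"O{k}" in doc[m][1] and doc[m][0] is not None and psi(doc, kinds, k, doc[m][0])
        for m in descendants_or_self(doc, node)
    )


def psi(doc, kinds, k, node):
    """Sub-query psi_k: all (AND) or some (OR/DUMMY) I_k-tagged children satisfy pi_k."""
    marked = [c for c in children(doc, node) if f"I{k}" in doc[c][1]]
    results = [pi(doc, kinds, k, c) for c in marked]
    return all(results) if kinds[k - 1] == "AND" else any(results)


def pi(doc, kinds, k, node):
    """Sub-query pi_k: some ancestor-or-self tagged G satisfies phi_{k-1}."""
    return any(
        "G" in doc[a][1] and phi(doc, kinds, k - 1, a)
        for a in ancestors_or_self(doc, node)
    )


def run_query(doc, kinds):
    """The top-level query: all nodes tagged R that satisfy phi_N."""
    return [n for n in doc if "R" in doc[n][1] and phi(doc, kinds, len(kinds), n)]


print(run_query(example_document(1, 0), ["AND"]))
print(run_query(example_document(1, 1), ["AND"]))
```

The plain recursion re-evaluates sub-queries many times and is meant only for tiny inputs; it is not the polynomial-time evaluation strategy the membership result relies on.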
Discussion
It is enough to show that:
V_i ∈ [φ_k] (V_i satisfies φ_k) iff G_i evaluates to true, for every i ≤ M+k.
Reason: T(R) is true for the rightmost node only.
If the last gate evaluates to 1, then the
result of the query consists of that node,
and (Q,D) is in L-Core XPath.
Otherwise, the result is empty, and (Q,D) is
not in L-Core XPath.
Tagged Tree Example
For C in the handout (M=4, N=5).
[Figure: the tagged document tree for C. The nodes V’1 to V’5 carry the tags I1-I5 and O1-O5; the remaining V’ nodes carry I2-I5/O2-O5, I3-I5/O3-O5, I4-I5/O4-O5 and I5/O5. Four gates are AND gates and one is an OR gate; the last node is tagged O5, R and G.]
Discussion
• φ_k consists of the values of the nodes in
layer k of the circuit.
• It can also be viewed as the situation at the k-th tick of a clock in a synchronous system.
• Proof:
V_i ∈ [φ_k] iff G_i evaluates to true
Despite P-Completeness
• Problems that are P-Complete are considered
inherently sequential: they are believed not to benefit
significantly from parallelization (unless P = NC).
• However, for real-world use, it may be very useful
to find restricted subsets of the problem and classify
them into lower complexity classes (easier problems).
• Does anyone recall a well-known problem that can
benefit from such a restriction?
• The paper continues by looking for ways to
restrict the problem.
First Modification Trial
• Only the axes child, parent and
descendant-or-self are allowed.
• The modification doesn’t yield lower
complexity. The same reduction will work
after changing:
ancestor-or-self::*
to
descendant-or-self::*/parent::*
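Why the substitution is harmless can be checked on the seven-node tree of the simple example. The Python sketch below is an illustration under assumptions, not part of the paper: it compares, for every node other than the root, the G-tagged nodes reached by ancestor-or-self::* with those reached by descendant-or-self::*/parent::*. The root is left out because the sub-query that uses this step is only evaluated at children of some node. The tree encoding and function names are made up for the example.

```python
# Parent relation of the example tree (root V0, children V1..V3, grandchildren V'1..V'3).
PARENT = {
    "V0": None,
    "V1": "V0", "V2": "V0", "V3": "V0",
    "V'1": "V1", "V'2": "V2", "V'3": "V3",
}
# Nodes tagged G by the tagging rules.
G_TAGGED = {"V1", "V2", "V3"}


def ancestor_or_self(node):
    found = set()
    while node is not None:
        found.add(node)
        node = PARENT[node]
    return found


def descendant_or_self(node):
    found = {node}
    for child, parent in PARENT.items():
        if parent == node:
            found |= descendant_or_self(child)
    return found


def parents_of_descendants(node):
    """Nodes selected by descendant-or-self::*/parent::* starting at node."""
    return {PARENT[m] for m in descendant_or_self(node) if PARENT[m] is not None}


def same_g_nodes(node):
    """True iff both steps reach the same G-tagged nodes from this node."""
    return ancestor_or_self(node) & G_TAGGED == parents_of_descendants(node) & G_TAGGED


print(all(same_g_nodes(n) for n in PARENT if n != "V0"))
```

The two steps differ at the root itself, where descendant-or-self::*/parent::* reaches every inner node, so the check deliberately excludes V0.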
Second Modification Trial
• Let Positive Core XPath be:
Core XPath \ queries that use negation.
• This problem is a member of LOGCFL.
• LOGCFL is the class of problems that can be reduced
in logarithmic space to a context-free language.
• Membership in LOGCFL implies that the problem can be
parallelized efficiently: segments do not depend on each
other.
• The hardness reduction is very similar. It reduces from
the problem of evaluating semi-bounded circuits.
WF and Positive WF
• WF is a subset of XPath that allows Core XPath,
arithmetic operations and conditions using
position(), last() and constants.
• Where is WF?
• Positive WF is LOGCFL-Complete. The
proof of hardness resembles the proof we
have just seen.
The Global Picture
BACKUP
PF is NL-Complete
• PF is the problem of navigating through an
XML document, with no conditions
allowed.
• NL is the class of problems solved by a
nondeterministic Turing machine that uses logarithmic space.
• Proof: PF is NL-Complete.
– Membership in NL (by nondeterministic guessing)
– NL-Hardness