Succinct Data Structures
Kunihiko Sadakane
National Institute of Informatics
BP Representation [3]
• Each node is represented by a pair of matching
open and close parentheses
• 2n bits for n nodes
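• The size matches the information-theoretic lower bound of 2n − Θ(log n) bits
[Figure: an 8-node ordered tree numbered in preorder (root 1 has children 2 and 6; node 2 has children 3, 4, 5; node 6 has children 7, 8) and its BP sequence P]
P = ((()()())(()()))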
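As a small illustration of the encoding, the following Python sketch emits a BP sequence by depth-first traversal. The nested-list tree representation and the function name `bp` are assumptions made for this example; they do not come from the slides.

```python
def bp(tree):
    """Return the BP string of an ordered tree.

    A tree is represented (hypothetically) as the list of its child
    subtrees, so a leaf is the empty list [].
    """
    # Open parenthesis on entering the node, close parenthesis on leaving it.
    return "(" + "".join(bp(child) for child in tree) + ")"


# The 8-node tree of the slide: root 1 has children 2 and 6,
# node 2 has leaf children 3, 4, 5 and node 6 has leaf children 7, 8.
tree = [[[], [], []], [[], []]]
P = bp(tree)
print(P, len(P))
```

Each node contributes exactly one open and one close parenthesis, which is where the 2n bits come from.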
Data Structure for findclose [4]
• Divide the parentheses sequence into blocks of
length B = ½ log n
– b(p): number of the block containing p
– μ(p): position of the parenthesis matching p
– parenthesis p is said to be far ⇔ b(p) ≠ b(μ(p))
• A far open parenthesis p is said to be an opening pioneer
⇔ for the far open parenthesis q which immediately
precedes p, b(μ(p)) ≠ b(μ(q))
• The positions of opening pioneers and of the parentheses
that match them are marked in a 0,1 vector
[Figure: far open parentheses r, q, p and their matching close parentheses μ(p), μ(q), μ(r)]
Lemma: Let β denote the number of blocks. Then
the number of opening pioneers is at most 2β − 3.
Proof: The graph whose nodes correspond to the blocks
and whose edges are (b(p), b(μ(p))) is an outerplanar
graph, and an outerplanar graph with β nodes has at most 2β − 3 edges.
Opening and closing pioneers again form a BP sequence.
β = n/B = 2n/log n ⇒ the length of this BP is O(n/log n)
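The definitions of far parentheses and opening pioneers can be checked on a small input with a sketch like the one below. Positions are 0-based, the block length `blk = 4` is an arbitrary illustrative value rather than ½ log n, and treating the first far open parenthesis as a pioneer (it has no predecessor) is an assumption of this example.

```python
import math


def matches(P):
    """Naive stack matching: mu[i] is the 0-based position matching i."""
    mu, stack = [0] * len(P), []
    for i, c in enumerate(P):
        if c == "(":
            stack.append(i)
        else:
            j = stack.pop()
            mu[i], mu[j] = j, i
    return mu


def opening_pioneers(P, blk):
    """0-based positions of the opening pioneers for block length blk."""
    mu = matches(P)
    pioneers, prev_far = [], None
    for p, c in enumerate(P):
        if c == "(" and p // blk != mu[p] // blk:      # p is far
            # Pioneer: no earlier far open parenthesis, or the match lies in
            # a different block from the match of the previous far one.
            if prev_far is None or mu[p] // blk != mu[prev_far] // blk:
                pioneers.append(p)
            prev_far = p
    return pioneers


P = "(()((()())())(()())())"
blk = 4                               # illustrative block length
beta = math.ceil(len(P) / blk)        # number of blocks
pioneers = opening_pioneers(P, blk)
print(pioneers, "bound:", 2 * beta - 3)
```

On this input the count of opening pioneers stays below the bound 2β − 3 of the lemma.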
Representing Recursive Structure
• Opening pioneers and their matching parentheses
are marked in a 0,1 vector B
  For a pioneer p: μ(p) = select(B, findclose(P1, rank(B, p)))
• B is a sparse vector of length 2n with O(n/log n) 1’s
– Can be represented in O(n log log n/log n) bits
[Figure: sequence P with far open parentheses r, q, p and their matches μ(p), μ(q), μ(r); the marked positions in B give the shorter sequence P1]
B = 0100 0101 0000 0000 0010 1001
P1 = ((()))
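The identity μ(p) = select(B, findclose(P1, rank(B, p))) can be exercised with a sketch such as the following. Everything here is a simplification: rank and select are linear scans, matching in P1 is done with a stack instead of a recursive structure, positions are 0-based, and the names `bits`, `rank1`, `select1` and `pioneer_findclose` are hypothetical.

```python
def matches(P):
    """Naive stack matching: mu[i] is the 0-based position matching i."""
    mu, stack = [0] * len(P), []
    for i, c in enumerate(P):
        if c == "(":
            stack.append(i)
        else:
            j = stack.pop()
            mu[i], mu[j] = j, i
    return mu


def opening_pioneers(P, blk):
    """Opening pioneers (the first far open parenthesis counts as one)."""
    mu = matches(P)
    pioneers, prev_far = [], None
    for p, c in enumerate(P):
        if c == "(" and p // blk != mu[p] // blk:
            if prev_far is None or mu[p] // blk != mu[prev_far] // blk:
                pioneers.append(p)
            prev_far = p
    return pioneers


def rank1(bits, i):
    """Number of 1s in bits[0..i], inclusive."""
    return sum(bits[: i + 1])


def select1(bits, k):
    """0-based position of the k-th 1 (k >= 1)."""
    for i, b in enumerate(bits):
        k -= b
        if b and k == 0:
            return i
    raise ValueError("fewer than k ones")


P = "(()((()())())(()())())"
blk = 4                                   # illustrative block length
mu = matches(P)
pioneers = opening_pioneers(P, blk)

bits = [0] * len(P)                       # the vector B of the slide
for p in pioneers:
    bits[p] = bits[mu[p]] = 1
P1 = "".join(c for c, b in zip(P, bits) if b)
mu1 = matches(P1)                         # stand-in for findclose on P1


def pioneer_findclose(p):
    k = rank1(bits, p)                    # p is the k-th marked parenthesis
    k_close = mu1[k - 1] + 1              # rank of its match inside P1
    return select1(bits, k_close)         # map back to a position in P


print(P1, [pioneer_findclose(p) for p in pioneers])
```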
• Let S(n) denote the size of BP representation for
an n node tree
– S(n) = 2n + O(n log log n/log n) + S(O(n/log n))
• If the number of nodes becomes O(n/log² n),
a naïve data structure which stores all the answers
uses only O(n/log n) bits
• Therefore S(n) = 2n + O(n log log n/log n)
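The recurrence can be unrolled for two levels as follows. This is a sketch of the calculation, assuming that the naive structure stores each answer in O(log n) bits and using log n₁ = Θ(log n); the names n₁ and n₂ for the sizes of the recursive levels are introduced only for this derivation.

```latex
\begin{align*}
S(n)   &= 2n + O\!\left(\frac{n \log\log n}{\log n}\right) + S(n_1),
        & n_1 &= O\!\left(\frac{n}{\log n}\right),\\
S(n_1) &= 2n_1 + O\!\left(\frac{n_1 \log\log n_1}{\log n_1}\right) + S(n_2)
        = O\!\left(\frac{n}{\log n}\right) + S(n_2),
        & n_2 &= O\!\left(\frac{n}{\log^2 n}\right),\\
S(n_2) &= O(n_2 \log n) = O\!\left(\frac{n}{\log n}\right)
        && \text{(naive table of answers)},\\
S(n)   &= 2n + O\!\left(\frac{n \log\log n}{\log n}\right).
\end{align*}
```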
Algorithm for findclose
• To compute μ(p) = findclose(P, p):
• If p is not far, μ(p) is computed by a table
• Find the pioneer p* that immediately precedes p (p* = p if p is itself a pioneer)
• Find μ(p*) using the BP for pioneers
• If p is not a pioneer, b(μ(p)) = b(μ(p*))
• The position of μ(p) is determined from the
difference between depths of p and p*
[Figure: pioneer p* and far open parenthesis p, with μ(p) and μ(p*) in the same block]
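The steps above can be sketched in Python as follows. This is not the constant-time structure: table lookups are replaced by scans inside one block, the matches of pioneers are taken from a naive stack computation instead of the recursive BP for pioneers, positions are 0-based, and all function names are assumptions of the example.

```python
from itertools import accumulate


def matches(P):
    """Naive stack matching: mu[i] is the 0-based position matching i."""
    mu, stack = [0] * len(P), []
    for i, c in enumerate(P):
        if c == "(":
            stack.append(i)
        else:
            j = stack.pop()
            mu[i], mu[j] = j, i
    return mu


def opening_pioneers(P, blk):
    """Opening pioneers (the first far open parenthesis counts as one)."""
    mu = matches(P)
    pioneers, prev_far = [], None
    for p, c in enumerate(P):
        if c == "(" and p // blk != mu[p] // blk:
            if prev_far is None or mu[p] // blk != mu[prev_far] // blk:
                pioneers.append(p)
            prev_far = p
    return pioneers


def make_findclose(P, blk):
    mu = matches(P)
    E = list(accumulate(1 if c == "(" else -1 for c in P))   # depths
    pioneers = opening_pioneers(P, blk)
    pioneer_mu = {p: mu[p] for p in pioneers}   # stand-in for the pioneer BP

    def scan(lo, hi, target):
        """First j in [lo, hi) with E[j] == target (stand-in for a table)."""
        for j in range(lo, min(hi, len(P))):
            if E[j] == target:
                return j
        return None

    def findclose(p):
        target = E[p] - 1                 # depth just after p is closed
        j = scan(p + 1, (p // blk + 1) * blk, target)
        if j is not None:                 # p is not far
            return j
        p_star = max(q for q in pioneers if q <= p)   # preceding pioneer
        b = pioneer_mu[p_star] // blk     # block of mu(p*), hence of mu(p)
        return scan(b * blk, (b + 1) * blk, target)

    return findclose


P = "(()((()())())(()())())"
findclose = make_findclose(P, 4)
print([findclose(p) for p, c in enumerate(P) if c == "("])
```

Inside the target block, the depth difference between p and p* identifies μ(p); the sketch finds it as the first position whose excess is one less than that of p.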
enclose
• Let ε(p) = enclose(P, p)
• If b(ε(p)) = b(p), ε(p) is found from a table
• If b(ε(p)) ≠ b(p), store those positions
– also store the positions of the matching parentheses
– if there is more than one pair of parentheses, store only
the outermost one
• Recur for the extracted parentheses
[Figure: example parentheses sequence for enclose]
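To pin down what enclose returns, here is a linear-scan sketch. It is only a reference definition under the assumption that enclose(P, p) is the open parenthesis of the closest pair enclosing p; the slide's structure replaces this scan with a table lookup inside the block plus stored answers for the far cases. Positions are 0-based.

```python
def enclose(P, p):
    """Open parenthesis of the closest pair enclosing the open parenthesis p.

    Returns None when p is not enclosed by any pair (the root).
    """
    pending = 0                       # close parentheses still to be matched
    for j in range(p - 1, -1, -1):
        if P[j] == ")":
            pending += 1
        elif pending == 0:
            return j                  # unmatched open parenthesis: it encloses p
        else:
            pending -= 1
    return None


P = "(()((()())())(()())())"
print(enclose(P, 7), enclose(P, 10), enclose(P, 13), enclose(P, 0))
```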
Additional Basic Operations on BP
• rank_p(P, i): number of occurrences of pattern p in P[1..i]
• select_p(P, i): position of the i-th occurrence of p in P
• If the length of p is constant, rank/select is done in
O(1) time
[Figure: an 11-node ordered tree with nodes numbered 1 to 11 and its BP sequence P]
P = (()((()())())(()())())
rank_()(P, 10) = 3
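The two operations can be written naively as below, which is enough to reproduce the example rank_()(P, 10) = 3. The scans take linear time, unlike the O(1) indexes of the slide. Counting only occurrences that lie entirely inside P[1..i] is an assumption of this sketch, and `rank_pat` / `select_pat` are hypothetical names. Positions are 1-based as on the slide.

```python
def rank_pat(P, pat, i):
    """Number of occurrences of pat lying entirely inside P[1..i]."""
    return sum(P.startswith(pat, s) for s in range(0, i - len(pat) + 1))


def select_pat(P, pat, k):
    """1-based start position of the k-th occurrence of pat, or None."""
    for s in range(len(P) - len(pat) + 1):
        if P.startswith(pat, s):
            k -= 1
            if k == 0:
                return s + 1
    return None


P = "(()((()())())(()())())"
print(rank_pat(P, "()", 10), select_pat(P, "()", 3))
```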
Operations on Leaves [5]
• Each leaf is represented by () in BP
• Position of the i-th leaf = select_()(P, i)
• The number of leaves in a subtree and the leftmost/rightmost
leaf of a subtree are also found
[Figure: an 11-node ordered tree and its BP sequence P, with the subtree rooted at node 3 highlighted]
P = (()((()())())(()())())
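One way to derive the leaf operations from rank and select on the pattern () is sketched below for the subtree rooted at node 3, whose open parenthesis is at position 4. The formulas are a plausible reading of the slide rather than a quotation from it; rank counts occurrences fully inside P[1..i], positions are 1-based, and every function name is hypothetical.

```python
def findclose(P, v):
    """1-based position of the parenthesis closing the one at position v."""
    depth = 0
    for j in range(v, len(P) + 1):
        depth += 1 if P[j - 1] == "(" else -1
        if depth == 0:
            return j


def leaf_rank(P, i):
    """Number of leaves "()" lying entirely inside P[1..i]."""
    return sum(P.startswith("()", s) for s in range(0, i - 1))


def leaf_select(P, k):
    """1-based position of the k-th leaf."""
    for s in range(len(P) - 1):
        if P.startswith("()", s):
            k -= 1
            if k == 0:
                return s + 1


def num_leaves(P, v):
    return leaf_rank(P, findclose(P, v)) - leaf_rank(P, v - 1)


def leftmost_leaf(P, v):
    return leaf_select(P, leaf_rank(P, v - 1) + 1)


def rightmost_leaf(P, v):
    return leaf_select(P, leaf_rank(P, findclose(P, v)))


P = "(()((()())())(()())())"
v = 4                                  # open parenthesis of node 3
print(num_leaves(P, v), leftmost_leaf(P, v), rightmost_leaf(P, v))
```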
Node Depths
• Define the excess array E[i] = rank_((P, i) − rank_)(P, i)
depth(v) = E[v]
• E is not explicitly stored; it can be computed by
the rank index on P
[Figure: an 11-node ordered tree and its BP sequence P with the excess array E]
P = (()((()())())(()())())
E = 1 2 1 2 3 4 3 4 3 2 3 2 1 2 3 2 3 2 1 2 1 0
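The excess array of the example can be recomputed from two rank queries per position, as in this sketch. The rank here is a plain count rather than an O(1) index, and the function names are chosen for the example. Positions are 1-based.

```python
def rank_char(P, c, i):
    """Number of occurrences of character c in P[1..i]."""
    return P[:i].count(c)


def excess(P, i):
    """E[i] = rank_( (P, i) - rank_) (P, i)."""
    return rank_char(P, "(", i) - rank_char(P, ")", i)


P = "(()((()())())(()())())"
E = [excess(P, i) for i in range(1, len(P) + 1)]
print("".join(map(str, E)))
```

With this definition the root has depth 1, and the excess after the last parenthesis is 0.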
Lowest Common Ancestor (lca)
• lca = lowest common ancestor
• u = lca(v,w): common ancestor
of v and w which is furthest
from root
• Found in O(1) time
[Figure: nodes v and w in a tree and their lowest common ancestor u]
• u = parent(RMQ_E(v, w) + 1)
– E is the excess array, which represents node depths
– m = RMQ_E(v, w): the index of a minimum value
in E[v..w] (RMQ = Range Minimum Query)
[Figure: example tree with nodes v, w and u = lca(v, w); in E, the minimum m lies between v and w]
P = (()((()())())(()())())
E = 1 2 1 2 3 4 3 4 3 2 3 2 1 2 3 2 3 2 1 2 1 0
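The formula u = parent(RMQ_E(v, w) + 1) can be tried out with linear scans in place of the O(1) structures. In the sketch below nodes are identified by the 1-based positions of their open parentheses, the leftmost minimum is used for the RMQ, and the special case v = w is an addition of this example that the slide does not discuss.

```python
from itertools import accumulate

P = "(()((()())())(()())())"
# E[i] for 1-based i; E[0] = 0 is padding.
E = [0] + list(accumulate(1 if c == "(" else -1 for c in P))


def rmq(v, w):
    """Index of the leftmost minimum of E[v..w]."""
    return min(range(v, w + 1), key=lambda i: E[i])


def parent(v):
    """Open parenthesis enclosing position v, found by a backward scan."""
    pending = 0
    for j in range(v - 1, 0, -1):
        if P[j - 1] == ")":
            pending += 1
        elif pending == 0:
            return j
        else:
            pending -= 1
    return None


def lca(v, w):
    if v > w:
        v, w = w, v
    if v == w:                 # assumption: a node is its own lca
        return v
    return parent(rmq(v, w) + 1)


print(lca(6, 11), lca(4, 6), lca(2, 20), lca(15, 17))
```

For v < w the minimum never falls on w itself, so position m + 1 always holds an open parenthesis, namely a child of the answer.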
DFUDS Representation [6]
• It encodes the degrees of nodes in unary codes in
depth-first order
(DFUDS = Depth First Unary Degree Sequence)
• Degree d ⇒ d (’s, followed by a )
• Add a dummy ( at the beginning
• 2n bits
[Figure: an 8-node ordered tree (root 1 has children 2 and 6; node 2 has children 3, 4, 5; node 6 has children 7, 8) and its DFUDS U]
U = ((()((())))(()))
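The encoding rule can be turned into a few lines of Python. The nested-list tree representation (a node is the list of its children) and the name `dfuds` are assumptions of this sketch.

```python
def dfuds(tree):
    """DFUDS string of an ordered tree given as nested lists of children."""

    def visit(node):
        # Degree d is written as d open parentheses followed by one close.
        yield "(" * len(node) + ")"
        for child in node:
            yield from visit(child)

    # A dummy open parenthesis is added at the beginning.
    return "(" + "".join(visit(tree))


# Root 1 has children 2 and 6; node 2 has leaves 3, 4, 5; node 6 has leaves 7, 8.
tree = [[[], [], []], [[], []]]
U = dfuds(tree)
print(U, len(U))
```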
Lemma: The DFUDS of an n node ordered tree forms
a balanced parentheses sequence of length 2n.
Proof: For n = 1, the root has no children (degree 0).
Its DFUDS is ().
Assume that the lemma holds for any tree with at most n − 1 nodes.
Let U1, U2, ..., Up denote the DFUDS of p trees.
(The sum of their numbers of nodes is n − 1, so the total length
of their DFUDS’s is 2n − 2.)
Consider a tree whose root has those trees as its children.
The DFUDS U for this tree is
U = ( (^p ) U1* U2* ... Up*
where the first ( is the head dummy parenthesis, (^p ) encodes the degree p of the root
(p open parentheses followed by one close parenthesis), and Ui* is Ui with
its head dummy parenthesis removed.
By the induction hypothesis, each Ui is balanced.
Because its head open parenthesis is removed, each Ui* lacks
one open parenthesis to be balanced.
The head dummy open parenthesis of U together with the
code (^p ) of the root contains p + 1 open parentheses and one close
parenthesis, so p open parentheses are left unbalanced, one for each Ui*.
Therefore U is balanced.
The number of nodes is n and the length of the sequence
is 1 + (p + 1) + (2n − 2 − p) = 2n. This proves the lemma.
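The lemma can also be checked exhaustively for small n. The sketch below enumerates all ordered trees with up to 7 nodes and tests both claims; the enumeration via ordered forests, the nested-list representation and all function names are assumptions of the example, and such a check illustrates the statement without replacing the proof.

```python
def dfuds(tree):
    """DFUDS of a tree given as nested lists of children."""

    def visit(node):
        yield "(" * len(node) + ")"
        for child in node:
            yield from visit(child)

    return "(" + "".join(visit(tree))


def forests(m):
    """All ordered forests with m nodes; a tree is the forest of its children."""
    if m == 0:
        yield []
        return
    for k in range(1, m + 1):                 # size of the first tree
        for first in forests(k - 1):          # its children
            for rest in forests(m - k):       # the remaining trees
                yield [first] + rest


def is_balanced(s):
    depth = 0
    for c in s:
        depth += 1 if c == "(" else -1
        if depth < 0:
            return False
    return depth == 0


def lemma_holds(n):
    """Check length 2n and balance for every ordered tree with n nodes."""
    return all(
        len(dfuds(t)) == 2 * n and is_balanced(dfuds(t))
        for t in forests(n - 1)
    )


print(all(lemma_holds(n) for n in range(1, 8)))
```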