Discrete Methods in Mathematical Informatics
Kunihiko Sadakane
The University of Tokyo
http://researchmap.jp/sada/resources/
A Variety of Trees
• Ordered/unordered trees
– The children of a node are ordered/not ordered
– For unordered trees, two trees are regarded as the same if one can be turned into the other by reordering children
• Edges are labeled/unlabeled
– A node label can be represented as the label of the edge from that node to its parent
Applications of Ordered Trees
• An abstract data type called “dictionary”
• A data structure D storing a set S is called a dictionary if, for a key k, it supports
– Search(D, k): returns Yes iff k ∈ S
– Insert(D, x): adds x to S
– Delete(D, x): removes x from S
• A binary search tree can be used as a dictionary.
– Any balanced binary search tree supports the above operations in O(log n) time for a set of n elements.
– We assume that element comparison is done in O(1) time.
Radix Search Trees
• We assume keys are b-bit integers
• All elements are stored in leaves
• The left (right) subtree of the root stores all the keys whose first bit is 0 (1).
• To search for a key, we traverse the tree from the root. At a node of depth d (the root has depth 1), we go to the left (right) child if the d-th bit of the key is 0 (1).
• All operations are done in O(b) time.
Example keys (b = 3):
A 000   B 001   C 010   D 011
E 100   F 101   G 110   H 111
[Figure: a radix search tree over these keys]
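A minimal sketch of a radix search tree, assuming 3-bit keys as in the example and representing each internal node as a Python dict keyed by the bits 0 and 1 (a hypothetical representation chosen for brevity). Only A, B and G are inserted; every element sits at depth b, so each operation follows exactly b bits.

```python
B = 3  # key length in bits (the slide's example uses 3-bit keys)


def bit(key, d):
    """d-th bit of key, counted from the most significant bit (d = 1..B)."""
    return (key >> (B - d)) & 1


def insert(root, key, value):
    """Create the internal nodes on the path for key; store value at the leaf."""
    node = root
    for d in range(1, B):
        node = node.setdefault(bit(key, d), {})
    node[bit(key, B)] = value  # the B-th bit selects the leaf


def search(root, key):
    """Follow one bit per level (O(B) steps); None if key is absent."""
    node = root
    for d in range(1, B + 1):
        b = bit(key, d)
        if b not in node:
            return None
        node = node[b]
    return node


root = {}
for key, name in [(0b000, "A"), (0b001, "B"), (0b110, "G")]:
    insert(root, key, name)
```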
Tries
• A trie is a data structure for storing a set of strings.
• A node has at most σ children (σ: alphabet size).
• Each edge is labeled with a character.
• The concatenation of the characters on the edges from the root to a node equals the string represented by the node.
[Figure: a trie for S = {cab, cat, do, doc, dog}]
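The trie above can be sketched in Python with one dict per node, mapping an edge character to the child. The marker key `END` (here `None`) is a hypothetical device recording that a node's string belongs to S; it is needed because "do" is stored although it is also a prefix of "doc" and "dog". A dict lookup stands in for the σ-way branching.

```python
END = None  # hypothetical marker key: "the string of this node belongs to S"


def build_trie(words):
    """Each node is a dict mapping an edge character to a child node."""
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def child(v, c):
    """child(v, c): the child of v reached by the edge labeled c, or None."""
    return v.get(c)


def contains(root, s):
    """Follow one edge per character of s."""
    node = root
    for ch in s:
        node = child(node, ch)
        if node is None:
            return False
    return END in node


def count_nodes(v):
    return 1 + sum(count_nodes(w) for c, w in v.items() if c is not END)


S = ["cab", "cat", "do", "doc", "dog"]
trie = build_trie(S)
```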
Compressed Tries
• Compressed tries are obtained from standard tries by compressing chains of redundant nodes
– redundant node = a node with only one child
– #nodes ≤ 2·#leaves − 1 if every internal node has at least two children
• Each edge represents a string
[Figure: the compressed trie for S = {cab, cat, do, doc, dog}]
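A sketch of path compression on a dict-based trie (`build_trie`, `compress`, `count` and `END` are hypothetical names). As an assumption not covered by the slide, a node that marks a string of S is kept even if it has a single child.

```python
END = None  # hypothetical marker key: "the string of this node belongs to S"


def build_trie(words):
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def compress(v):
    """Return a compressed copy of trie v in which edges carry strings."""
    out = {}
    for label, w in v.items():
        if label is END:
            out[END] = True
            continue
        # absorb a chain of redundant nodes (one child, not marking a string of S)
        while END not in w and len(w) == 1:
            (c, nxt), = w.items()
            label += c
            w = nxt
        out[label] = compress(w)
    return out


def count(v):
    """Return (#nodes, #leaves) of a (compressed) trie."""
    kids = [w for k, w in v.items() if k is not END]
    if not kids:
        return 1, 1
    nodes, leaves = 1, 0
    for w in kids:
        n, lv = count(w)
        nodes += n
        leaves += lv
    return nodes, leaves


ct = compress(build_trie(["cab", "cat", "do", "doc", "dog"]))
```

For this S the result has edges "ca" and "do" out of the root, with 7 nodes and 4 leaves.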
Suffix Trees
• The suffix tree of a string is the compressed trie of all its suffixes (the string ends with a unique terminator $)
• The suffix tree for a string consists of
– n leaves (n: length of the string)
– at most n − 1 internal nodes
– edge labels
– node depths
– suffix links
[Figure: the suffix tree of T[1..7] = ababac$]
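The following sketch builds the suffix tree of ababac$ naively, by inserting all n suffixes into a trie and then compressing one-child chains. This takes O(n²) time, unlike linear-time constructions, and it stores neither node depths nor suffix links. Leaves hold the 1-based starting positions of their suffixes; the helper names are hypothetical.

```python
def suffix_tree(text):
    """Naive O(n^2) construction: insert every suffix into a trie, then
    compress chains of one-child nodes. A leaf is the 1-based start position."""
    assert text.endswith("$") and text.count("$") == 1
    root = {}
    for i in range(len(text)):
        node = root
        for ch in text[i:-1]:
            node = node.setdefault(ch, {})
        node["$"] = i + 1  # the unique '$' edge ends at a leaf
    return compress(root)


def compress(v):
    out = {}
    for label, w in v.items():
        while isinstance(w, dict) and len(w) == 1:  # redundant node
            (c, nxt), = w.items()
            label += c
            w = nxt
        out[label] = compress(w) if isinstance(w, dict) else w
    return out


def leaves(v):
    return [x for w in v.values()
            for x in (leaves(w) if isinstance(w, dict) else [w])]


def internal_nodes(v):
    return 1 + sum(internal_nodes(w) for w in v.values() if isinstance(w, dict))


st = suffix_tree("ababac$")
```

For ababac$ it yields 7 leaves and 4 internal nodes, within the bound n − 1 = 6.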
Operations on Trees
• Binary search trees
– left(v), right(v): returns the left/right child node of v
– key(v): returns the key stored in node v
• Tries
– child(v, c): returns a child w of v with edge label c
– key(v): returns the key stored in node v
• Compressed tries
– child(v, c): returns the child w of v whose edge label begins with c
– edge(w, d): returns the d-th character on the edge pointing to w
– key(v)
– Example: child(v, a) = w and edge(w, 2) = b
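A minimal sketch of child and edge on a compressed trie, assuming each node stores the label of the edge from its parent (`Node`, `add` and the field names are hypothetical). Indexing children by the first character of their labels works because, in a compressed trie, the labels of sibling edges start with distinct characters.

```python
class Node:
    def __init__(self, label=""):
        self.label = label  # string on the edge from the parent to this node
        self.children = {}  # first character of a child's label -> child

    def add(self, label):
        w = Node(label)
        self.children[label[0]] = w
        return w


def child(v, c):
    """child(v, c): the child w of v whose edge label begins with c (or None)."""
    return v.children.get(c)


def edge(w, d):
    """edge(w, d): the d-th character (1-based) on the edge pointing to w."""
    return w.label[d - 1]


# Compressed trie for S = {cab, cat, do, doc, dog}
root = Node()
ca = root.add("ca")
ca.add("b")
ca.add("t")
do = root.add("do")
do.add("c")
do.add("g")
```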
Succinct Representations of Ordered Trees
• LOUDS (level order unary degree sequence)
– [Jacobson 89]
• BP (balanced parentheses)
– [Munro, V. Raman 97, 01]
• DFUDS (depth first unary degree sequence)
– [Benoit et al. 99, 05]
LOUDS Representation
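• Degrees of nodes are encoded by unary codes in breadth-first order
– degree d → 1^d 0 (d ones followed by a zero)
• 2n+1 bits for n nodes (asymptotically matches the lower bound of 2n − Θ(log n) bits)
• The i-th node in breadth-first order is represented by the i-th 1 (ones-based numbering)
[Figure: an 8-node tree numbered in level order; node 1 has children 2, 3; node 2 has children 4, 5, 6; node 3 has children 7, 8]
L = 10110111011000000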
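The LOUDS string can be reproduced with a breadth-first traversal. The example string begins with 10, which is consistent with prepending a virtual super-root whose only child is the real root; the sketch below assumes that convention, under which the n nodes contribute n ones and n + 1 zeros in total. The tree is given as a hypothetical child-list dict.

```python
from collections import deque


def louds(children, root):
    """Unary degree codes (1^d 0) of all nodes in breadth-first order,
    preceded by '10' (a super-root with the real root as its only child)."""
    bits = ["10"]
    queue = deque([root])
    while queue:
        v = queue.popleft()
        kids = children.get(v, [])
        bits.append("1" * len(kids) + "0")
        queue.extend(kids)
    return "".join(bits)


# The example tree: node -> ordered list of children (leaves omitted)
tree = {1: [2, 3], 2: [4, 5, 6], 3: [7, 8]}
L = louds(tree, 1)
```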
Tree Navigational Operations (1)
• i-th node: select1(L, i) (i ≥ 1)
• firstchild(x)
– y := select0(rank1(L,x)) + 1
– if L[y] = 0 then −1 else y
• lastchild(x)
– y := select0(rank1(L,x)+1) − 1
– if L[y] = 0 then −1 else y
(−1 means that x has no child)
Tree Navigational Operations (2)
• sibling(x): the next sibling of x
– if L[x+1] = 0 then −1 else x+1
• parent(x) = select1(rank0(L,x))
• degree(x) = lastchild(x) − firstchild(x) + 1 (0 if x is a leaf)
• Merits: implemented using only rank/select
• Demerits: cannot compute subtree sizes
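The formulas can be checked with naive rank and select over a Python string. This is only a sketch: real LOUDS implementations answer rank/select in O(1) time with o(n)-bit auxiliary indexes, whereas the helpers below scan the string. Positions are 1-based as on the slides, −1 means "no such node", and `node` and `bit` are hypothetical helpers.

```python
L = "10110111011000000"  # LOUDS of the example tree (positions are 1-based)


def rank(c, i):
    """rank_c(L, i): number of c's in L[1..i]."""
    return L[:i].count(c)


def select(c, j):
    """select_c(L, j): position of the j-th c in L (0 when j == 0)."""
    pos = 0
    for _ in range(j):
        pos = L.index(c, pos) + 1  # raises ValueError if there is no j-th c
    return pos


def bit(i):
    return L[i - 1]


def firstchild(x):
    y = select("0", rank("1", x)) + 1
    return -1 if bit(y) == "0" else y


def lastchild(x):
    y = select("0", rank("1", x) + 1) - 1
    return -1 if bit(y) == "0" else y


def sibling(x):
    return -1 if bit(x + 1) == "0" else x + 1


def parent(x):
    return select("1", rank("0", x))


def degree(x):
    return 0 if firstchild(x) == -1 else lastchild(x) - firstchild(x) + 1


def node(i):
    """The i-th node in level order is the i-th 1."""
    return select("1", i)
```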
Tries using LOUDS
• child(v, c)
w = firstchild(v)
if (w == -1) return -1
r = rank1(L, w), k = 0
while (L[w+k] != 0) {
  if (C[r+k] == c) return w+k
  k = k+1
}
return -1
– Binary search can also be used if the labels of each node's children are sorted
• key(v) is stored in an array indexed with rank1(L, v)
• Example (C holds the label of the edge into each node, in level order; C[1] = _ is a dummy for the root):
L = 10110111011000000
C = _acxyzad
• A trie with n nodes and a label set of size σ that supports child(v, c) in O(log σ) time is represented in n(2 + log σ) + o(n) bits.
– O(σ)-time sequential search is faster for small σ.
• With an auxiliary array, key(v) takes O(1) time.
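A runnable version of child(v, c), assuming the example bit string L and label array C from the slide (C is 1-based there, so the code subtracts 1 when indexing). Like the pseudocode, it scans the children sequentially; the naive rank/select helpers stand in for a succinct index.

```python
L = "10110111011000000"  # LOUDS bits (1-based positions)
C = "_acxyzad"           # C[k]: label of the edge into node k (level order); '_' is a dummy


def rank1(i):
    return L[:i].count("1")


def select0(j):
    return [p for p, b in enumerate(L, 1) if b == "0"][j - 1]


def firstchild(x):
    y = select0(rank1(x)) + 1
    return -1 if L[y - 1] == "0" else y


def child(v, c):
    """Sequential scan over the 1s of v's unary code: O(degree) time."""
    w = firstchild(v)
    if w == -1:
        return -1
    r = rank1(w)  # node number of the first child
    k = 0
    while L[w + k - 1] == "1":
        if C[r + k - 1] == c:  # C is 1-based on the slide, 0-based in Python
            return w + k
        k += 1
    return -1
```

Here child(1, "c") returns position 4, the 3rd 1, which represents node 3.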
Numbering of Nodes
Operation        Ones-based        Zeros-based
i-th node        s1(i)             s0(i+1)
firstchild(x)    s0(r1(x))+1       s0(r1(s0(r0(x)−1))+2)
lastchild(x)     s0(r1(x)+1)−1     s0(r1(x)+1)
sibling(x)       x+1               s0(r0(x)+1)
parent(x)        s1(r0(x))         s0(r0(s1(r0(x)−1))+1)
where sc(i) = selectc(L, i), rc(i) = rankc(L, i)
• Ones-based: the i-th node is the i-th 1 of L = 10110111011000000
• Zeros-based: the i-th node is the (i+1)-th 0 of L, i.e., the 0 that ends its own unary degree code
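Both columns of the table can be checked mechanically on the example tree. The sketch below encodes each formula as a lambda over naive rank/select (1-based positions) and compares the results with the tree's actual parent, child and sibling relations; the dictionary names are hypothetical.

```python
L = "10110111011000000"  # LOUDS of the example tree, 1-based positions


def r(c, i):
    """r_c(i) = rank_c(L, i)."""
    return L[:i].count(c)


def s(c, j):
    """s_c(j) = select_c(L, j); s_c(0) = 0."""
    return 0 if j == 0 else [p for p, b in enumerate(L, 1) if b == c][j - 1]


ONES = {   # node i is the i-th 1
    "node": lambda i: s("1", i),
    "firstchild": lambda x: s("0", r("1", x)) + 1,
    "lastchild": lambda x: s("0", r("1", x) + 1) - 1,
    "sibling": lambda x: x + 1,
    "parent": lambda x: s("1", r("0", x)),
}
ZEROS = {  # node i is the (i+1)-th 0, which ends its own degree code
    "node": lambda i: s("0", i + 1),
    "firstchild": lambda x: s("0", r("1", s("0", r("0", x) - 1)) + 2),
    "lastchild": lambda x: s("0", r("1", x) + 1),
    "sibling": lambda x: s("0", r("0", x) + 1),
    "parent": lambda x: s("0", r("0", s("1", r("0", x) - 1)) + 1),
}

# Relations in the example tree (node numbers in level order)
FIRST = {1: 2, 2: 4, 3: 7}
LAST = {1: 3, 2: 6, 3: 8}
NEXT = {2: 3, 4: 5, 5: 6, 7: 8}
PARENT = {2: 1, 3: 1, 4: 2, 5: 2, 6: 2, 7: 3, 8: 3}


def check(f):
    """True iff every formula in f maps each node's position to the expected one."""
    pos = f["node"]
    for op, table in [("firstchild", FIRST), ("lastchild", LAST),
                      ("sibling", NEXT), ("parent", PARENT)]:
        for a, b in table.items():
            if f[op](pos(a)) != pos(b):
                return False
    return True
```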
The Merits and Demerits of LOUDS
• Implemented by rank and select
– easy to implement
– fast in practice
• Suitable for labeled trees
– child() is fast because of locality of reference
– Tx library by Okanohara
• Many operations are not supported in O(1) time
– subtree size
– level ancestor
– lowest common ancestor, etc.
BP Representation
• Each node is represented by a pair of matching
open and close parentheses
• 2n bits for n nodes
• The size matches the lower bound
[Figure: the same 8-node tree as in the LOUDS example and its BP sequence]
P = ((()()())(()()))
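The BP sequence is produced by a preorder depth-first traversal. A minimal recursive sketch, assuming the tree is given as a hypothetical child-list dict (deep trees would need an iterative version to stay within Python's recursion limit):

```python
def bp(children, v):
    """Balanced parentheses by preorder DFS: '(' on entering v, ')' on leaving it."""
    return "(" + "".join(bp(children, w) for w in children.get(v, [])) + ")"


tree = {1: [2, 3], 2: [4, 5, 6], 3: [7, 8]}  # the example tree, leaves omitted
P = bp(tree, 1)
```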
Basic Operations on BP
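• A node is represented by the position of its (
• findclose(P,i): returns the position of the ) matching the ( at P[i]
• findopen(P,i): returns the position of the ( matching the ) at P[i]
• enclose(P,i): returns the position of the ( of the closest pair enclosing the ( at P[i]
[Figure: an 11-node tree with nodes numbered 1–11 in preorder, illustrating enclose and findclose]
P = (()((()())())(()())())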
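Naive linear-scan versions of these operations, for experimenting with the example P. This is a sketch only: succinct indexes such as those of [3] and [4] answer findclose, findopen and enclose in O(1) time with o(n)-bit indexes, while these functions take time proportional to the distance scanned.

```python
P = "(()((()())())(()())())"  # the example sequence; positions are 1-based


def findclose(P, i):
    """Position of the ')' matching the '(' at P[i]."""
    depth = 0
    for j in range(i, len(P) + 1):
        depth += 1 if P[j - 1] == "(" else -1
        if depth == 0:
            return j
    raise ValueError("unbalanced sequence")


def findopen(P, i):
    """Position of the '(' matching the ')' at P[i]."""
    depth = 0
    for j in range(i, 0, -1):
        depth += 1 if P[j - 1] == ")" else -1
        if depth == 0:
            return j
    raise ValueError("unbalanced sequence")


def enclose(P, i):
    """Position of the '(' of the closest pair enclosing the '(' at P[i]; -1 if none."""
    depth = 0  # number of unmatched ')' seen while scanning left
    for j in range(i - 1, 0, -1):
        if P[j - 1] == ")":
            depth += 1
        elif depth == 0:
            return j
        else:
            depth -= 1
    return -1
```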
Tree Navigational Operations
• parent(v) = enclose(P,v)
• firstchild(v) = v + 1 (if P[v+1] is a ( ; otherwise v is a leaf)
• sibling(v) = findclose(P,v) + 1 (if that position holds a ( )
• lastchild(v) = findopen(P, findclose(P,v) − 1)
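A sketch of the four navigation operations on the example P. For brevity it precomputes the partner of every parenthesis in a plain array, which uses O(n log n) bits and is therefore not succinct; it also adds the leaf checks that the slide formulas leave implicit (−1 for "no such node").

```python
P = "(()((()())())(()())())"  # 1-based positions


def match(P):
    """Partner of every parenthesis (a plain array, not a succinct index)."""
    stack, m = [], [0] * (len(P) + 1)
    for i, ch in enumerate(P, 1):
        if ch == "(":
            stack.append(i)
        else:
            j = stack.pop()
            m[i], m[j] = j, i
    return m


M = match(P)


def findclose(i):
    return M[i]


def findopen(i):
    return M[i]


def enclose(i):
    """Nearest '(' to the left of i whose partner lies to the right of i."""
    for j in range(i - 1, 0, -1):
        if P[j - 1] == "(" and M[j] > i:
            return j
    return -1  # i is the root


def is_open(i):
    return i <= len(P) and P[i - 1] == "("


def parent(v):
    return enclose(v)


def firstchild(v):
    return v + 1 if is_open(v + 1) else -1


def sibling(v):
    j = findclose(v) + 1
    return j if is_open(j) else -1


def lastchild(v):
    return findopen(findclose(v) - 1) if is_open(v + 1) else -1
```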
Tries using BP
• child(v, c)
w = firstchild(v)
while (w != NIL) {
  if (C[rank_((P, w)] == c) return w
  w = sibling(w)
}
return NIL
• key(v) is stored in an array indexed with rank_((P, v)
• Example (C holds the label of the edge into each node, in preorder; C[1] = _ is a dummy for the root):
P = ((()()())(()()))
C = _axyzcad
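A runnable version of the BP-based child(v, c), assuming P and C as in the example (preorder labels; C is 1-based on the slide). NIL is represented by None, and findclose is a naive scan standing in for a succinct index.

```python
P = "((()()())(()()))"  # BP of the example trie, 1-based positions
C = "_axyzcad"          # C[k]: label of the edge into the k-th node in preorder; '_' is a dummy


def rank_open(i):
    """rank_((P, i): number of '(' in P[1..i] = preorder number of the node at i."""
    return P[:i].count("(")


def findclose(i):
    depth = 0
    for j in range(i, len(P) + 1):
        depth += 1 if P[j - 1] == "(" else -1
        if depth == 0:
            return j


def is_open(i):
    return i <= len(P) and P[i - 1] == "("


def firstchild(v):
    return v + 1 if is_open(v + 1) else None


def sibling(w):
    j = findclose(w) + 1
    return j if is_open(j) else None


def child(v, c):
    w = firstchild(v)
    while w is not None:              # NIL is represented by None
        if C[rank_open(w) - 1] == c:  # C is 1-based on the slide
            return w
        w = sibling(w)
    return None
```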
Number of Descendants (Subtree Size)
• The size of the subtree rooted at v is
subtreesize(v) = (findclose(P,v) − v + 1)/2
• degree (#children) can be computed by repeatedly applying findclose, but it takes time proportional to the number of children
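The subtree-size formula and the child-by-child degree computation, sketched with a naive findclose on the example P (1-based positions):

```python
P = "(()((()())())(()())())"  # 1-based positions


def findclose(i):
    depth = 0
    for j in range(i, len(P) + 1):
        depth += 1 if P[j - 1] == "(" else -1
        if depth == 0:
            return j


def subtreesize(v):
    """Every node in the subtree contributes one '(' and one ')'."""
    return (findclose(v) - v + 1) // 2


def degree(v):
    """Hop from each child to the next with findclose: O(#children) steps."""
    d, w = 0, v + 1
    while P[w - 1] == "(":
        d += 1
        w = findclose(w) + 1
    return d
```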
Additional Basic Operations on BP
• rank_p(P,i): the number of occurrences of pattern p in P[1..i]
• select_p(P,i): the position of the i-th occurrence of p in P
• If the length of p is constant, rank_p and select_p take O(1) time
P = (()((()())())(()())())
Example: rank_()(P,10) = 3
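A naive sketch of rank_p and select_p for arbitrary patterns. It assumes that rank_p counts occurrences lying entirely inside P[1..i] and that select_p returns the starting position of an occurrence; the slide does not spell out these boundary conventions.

```python
P = "(()((()())())(()())())"  # 1-based positions


def occurrences(P, p):
    """Start positions (1-based) of all occurrences of pattern p in P."""
    return [s for s in range(1, len(P) - len(p) + 2) if P[s - 1:s - 1 + len(p)] == p]


def rank_p(P, p, i):
    """Occurrences of p lying entirely inside P[1..i]."""
    return sum(1 for s in occurrences(P, p) if s + len(p) - 1 <= i)


def select_p(P, p, j):
    """Start position of the j-th occurrence of p."""
    return occurrences(P, p)[j - 1]
```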
Operations on Leaves
• Each leaf is represented by () in BP
• Position of the i-th leaf = select_()(P, i)
• The number of leaves in a subtree and the leftmost/rightmost leaves of a subtree can also be found with rank_() and select_()
[Figure: the subtree rooted at node 3 highlighted in P = (()((()())())(()())())]
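Leaf queries follow from rank_() and select_(). The formulas in this sketch are one way to realize the slide's claim, under the convention that an occurrence of () is counted once both symbols lie in the prefix: the leaves of the subtree at v are the occurrences of () inside P[v..findclose(P,v)]. With preorder numbering, node 3 sits at position 4; the function names are hypothetical.

```python
P = "(()((()())())(()())())"  # 1-based positions
LEAVES = [s for s in range(1, len(P)) if P[s - 1:s + 1] == "()"]  # start of each "()"


def rank_leaf(i):
    """rank_()(P, i): number of "()" lying entirely inside P[1..i]."""
    return sum(1 for s in LEAVES if s + 1 <= i)


def select_leaf(j):
    """select_()(P, j): position of the j-th leaf."""
    return LEAVES[j - 1]


def findclose(i):
    depth = 0
    for j in range(i, len(P) + 1):
        depth += 1 if P[j - 1] == "(" else -1
        if depth == 0:
            return j


def leaf_count(v):
    return rank_leaf(findclose(v)) - rank_leaf(v - 1)


def leftmost_leaf(v):
    return select_leaf(rank_leaf(v - 1) + 1)


def rightmost_leaf(v):
    return select_leaf(rank_leaf(findclose(v)))
```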
Node Depths
• Define the excess array E[i] = rank_((P,i) − rank_)(P,i); then
depth(v) = E[v] (the root has depth 1)
• E is not stored explicitly; each E[i] is computed with the rank index on P
P = (()((()())())(()())())
E = 1212343432321232321210
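Computing E naively reproduces the row above. Since rank_)(P,i) = i − rank_((P,i), a single rank query suffices: E[i] = 2·rank_((P,i) − i, which the sketch uses (with a string count standing in for the rank index).

```python
P = "(()((()())())(()())())"  # 1-based positions


def excess(i):
    """E[i] = rank_((P, i) - rank_)(P, i) = 2 * rank_((P, i) - i."""
    return 2 * P[:i].count("(") - i


def depth(v):
    """Depth of the node whose '(' is at position v (the root has depth 1)."""
    return excess(v)


E = "".join(str(excess(i)) for i in range(1, len(P) + 1))
```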
Lowest Common Ancestor (lca)
• lca = lowest common ancestor
• u = lca(v,w): the common ancestor of v and w that is furthest from the root
• Found in O(1) time
[Figure: nodes v and w and their lowest common ancestor u]
• u = parent(RMQ_E(v,w) + 1), for v < w
– E is the excess array, which represents node depths
– m = RMQ_E(v,w): the index of a minimum value in E[v..w] (RMQ = Range Minimum Query)
P = (()((()())())(()())())
E = 1212343432321232321210
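A sketch of the lca formula on the example, with a linear-scan RMQ in place of the O(1)-time RMQ index and a naive enclose for parent. It swaps the arguments so that v < w and takes the leftmost minimum; `rmq_e` is a hypothetical helper name.

```python
P = "(()((()())())(()())())"  # 1-based positions


def excess(i):
    return 2 * P[:i].count("(") - i


def enclose(i):
    """parent: '(' of the closest pair enclosing position i; -1 for the root."""
    depth = 0
    for j in range(i - 1, 0, -1):
        if P[j - 1] == ")":
            depth += 1
        elif depth == 0:
            return j
        else:
            depth -= 1
    return -1


def rmq_e(v, w):
    """Index of the leftmost minimum of E[v..w] (linear scan, not an O(1) index)."""
    return min(range(v, w + 1), key=excess)


def lca(v, w):
    if v == w:
        return v
    if v > w:
        v, w = w, v
    return enclose(rmq_e(v, w) + 1)
```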
The Merits and Demerits of BP
• More supported operations
– subtree size
– lowest common ancestor (lca)
– level ancestors
• Succinct indexes are complicated
– the more supported operations, the more index space
– o(n)-bit indexes are not negligible in practice
• degree, i-th child, etc. were difficult to implement
• No locality of child labels
References
[1] G. Jacobson. Space-efficient Static Trees and Graphs. In Proc. IEEE FOCS, pages 549–554, 1989.
[2] O. Delpratt, N. Rahman, and R. Raman. Engineering the LOUDS Succinct Tree Representation. In Proc. WEA, pages 134–145, 2006.
[3] J. I. Munro and V. Raman. Succinct Representation of Balanced Parentheses and Static Trees. SIAM Journal on Computing, 31(3):762–776, 2001.
[4] R. F. Geary, N. Rahman, R. Raman, and V. Raman. A simple optimal representation for balanced parentheses. Theoretical Computer Science, 368:231–246, December 2006.
[5] J. I. Munro, V. Raman, and S. S. Rao. Space efficient suffix trees. Journal of Algorithms, 39:205–222, 2001.
[6] D. Benoit, E. D. Demaine, J. I. Munro, R. Raman, V. Raman, and S. S. Rao. Representing Trees of Higher Degree. Algorithmica, 43(4):275–292, 2005.
[7] J. Jansson, K. Sadakane, and W.-K. Sung. Ultra-succinct Representation of Ordered Trees. In Proc. ACM-SIAM SODA, pages 575–584, 2007.
[8] A. Farzan and J. I. Munro. A Uniform Approach Towards Succinct Representation of Trees. In Proc. SWAT, LNCS 5124, pages 173–184, 2008.
[9] A. Farzan, R. Raman, and S. S. Rao. Universal Succinct Representations of Trees? In Proc. ICALP, LNCS 5555, pages 451–462, 2009.
[10] P. Ferragina, F. Luccio, G. Manzini, and S. Muthukrishnan. Compressing and indexing labeled trees, with applications. Journal of the ACM, 57(1):4:1–4:33, 2009.
[11] R. F. Geary, R. Raman, and V. Raman. Succinct ordinal trees with level-ancestor queries. ACM Transactions on Algorithms, 2:510–534, 2006.
[12] H.-I. Lu and C.-C. Yeh. Balanced parentheses strike back. ACM Transactions on Algorithms, 4(3): Article 28, 2008.