Discrete Methods in Mathematical Informatics
Kunihiko Sadakane
The University of Tokyo
http://researchmap.jp/sada/resources/

A Variety of Trees
• Ordered/unordered trees
  – Child nodes are ordered/not ordered.
  – Two unordered trees are regarded as the same if reordering children gives them the same shape.
• Edges are labeled/not labeled
  – A node label can be represented as an edge label on the edge to the node's parent.

Applications of Ordered Trees
• An abstract data type for a “dictionary”
• A data structure D is called a dictionary if, for a set S and a key k, it supports
  – Search(D, k): returns Yes iff k ∈ S
  – Insert(D, x): adds x to S
  – Delete(D, x): removes x from S
• A binary search tree can be used as a dictionary.
  – Any balanced binary search tree supports the above operations in O(log n) time for a set of n elements.
  – We assume that an element comparison takes O(1) time.
[Figure: a binary search tree storing the keys 1, 5, 7, 8]

Radix Search Trees
• We assume keys are b-bit integers.
• All elements are stored in leaves.
• The left (right) subtree of the root stores all the keys whose first bit is 0 (1).
• To search for a key, we traverse the tree from the root. At a node of depth d (counting the root as depth 1), we go left (right) if the d-th bit of the key is 0 (1).
• All operations are done in O(b) time.
[Figure: radix search tree over the 3-bit keys A = 000, B = 001, C = 010, D = 011, E = 100, F = 101, G = 110, H = 111]

Tries
• A trie is a data structure for storing a set of strings.
• A node has at most σ children (σ: alphabet size).
• Each edge is labeled with a character c.
• The concatenation of the characters on the edges from the root to a node is the string represented by that node.
[Figure: a trie for S = {cab, cat, do, doc, dog}]

Compressed Tries
• Compressed tries are obtained from standard tries by compressing chains of redundant nodes.
  – redundant node = a node with only one child
  – #internal nodes ≤ #leaves - 1 when every internal node has at least two children
• An edge represents a string.
[Figure: the compressed trie for S = {cab, cat, do, doc, dog}]

Suffix Trees
• The compressed trie for all suffixes of a string
• The suffix tree for a string consists of
  – n leaves (n: length of the string)
  – at most n - 1 internal nodes
  – edge labels
  – node depths
  – suffix links
[Figure: the suffix tree of ababac$ (positions 1 to 7)]

Operations on Trees
• Binary search trees
  – left(v), right(v): returns the left/right child node of v
  – key(v): returns the key stored in node v
• Tries
  – child(v, c): returns the child w of v with edge label c
  – key(v): returns the key stored in node v
• Compressed tries
  – child(v, c): returns the child w of v with edge label c…
  – edge(w, d): returns the d-th character on the edge pointing to w
  – key(v)
[Figure: an edge from v to w labeled a, b, so that child(v, a) = w and edge(w, 2) = b]

Succinct Representations of Ordered Trees
• LOUDS (level order unary degree sequence)
  – [Jacobson 89]
• BP (balanced parentheses)
  – [Munro, V. Raman 97, 01]
• DFUDS (depth first unary degree sequence)
  – [Benoit et al. 99, 05]

LOUDS Representation
• Degrees of nodes are encoded by unary codes in breadth-first order.
  – degree d → 1^d 0 (d ones followed by a zero)
• 2n+1 bits for n nodes (matches the 2n - Θ(log n) lower bound up to lower-order terms)
  – the leading 10 encodes a virtual super-root whose only child is the root
• The i-th node is represented by the i-th 1 (ones-based numbering).
[Figure: an 8-node tree numbered in breadth-first order; node 1 has children 2 and 3, node 2 has children 4, 5, 6, and node 3 has children 7, 8]
LOUDS L = 10110111011000000

Tree Navigational Operations (1)
• i-th node (i ≥ 1): select1(L, i)
• firstchild(x)
  – y := select0(L, rank1(L, x)) + 1
  – if L[y] = 0 then -1 else y
• lastchild(x)
  – y := select0(L, rank1(L, x) + 1) - 1
  – if L[y] = 0 then -1 else y

Tree Navigational Operations (2)
• sibling(x)
  – if L[x+1] = 0 then -1 else x+1
• parent(x) = select1(L, rank0(L, x))
• degree(x) = lastchild(x) - firstchild(x) + 1 if x has a child, and 0 otherwise
• Merits: implemented using only rank/select
• Demerits: cannot compute subtree sizes

Tries using LOUDS
• child(v, c)
    w = firstchild(v)
    if (w == -1) return -1
    r = rank1(L, w), k = 0
    while (L[w+k] != 0) {
      if (C[r+k] == c) return w+k
      k = k+1
    }
    return -1
  – Binary search can also be used.
• key(v) is stored in an array indexed by rank1(L, v).
[Figure: the 8-node example tree as a trie; edge labels a, c below the root, x, y, z below node 2, and a, d below node 3]
L = 10110111011000000
C = _acxyzad
• A trie with n nodes over a label set of size σ that supports child(v, c) in O(log σ) time is represented in n(2 + log σ) + o(n) bits.
  – An O(σ)-time sequential search is faster for small σ.
• By using an auxiliary array, key(v) is performed in O(1) time.
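
The LOUDS operations above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: rank and select are implemented by naive linear scans over a string (a real implementation would use o(n)-bit indexes to answer them in O(1) time), positions are 1-based as on the slides, and the helper names rank and select with a bit argument are choices made for this sketch. L and C are the example sequences from the slides.

```python
# A minimal sketch: naive O(n) rank/select on a Python string.
# Positions are 1-based, as on the slides; -1 means "no such node".
L = "10110111011000000"  # LOUDS sequence of the 8-node example tree
C = "_acxyzad"           # label of the edge into each node, in node order

def rank(bit, x):
    """Number of occurrences of `bit` in L[1..x]."""
    return L[:x].count(bit)

def select(bit, i):
    """Position of the i-th occurrence of `bit` in L (i >= 1)."""
    pos = -1
    for _ in range(i):
        pos = L.index(bit, pos + 1)
    return pos + 1

def firstchild(x):
    y = select("0", rank("1", x)) + 1
    return -1 if L[y - 1] == "0" else y

def lastchild(x):
    y = select("0", rank("1", x) + 1) - 1
    return -1 if L[y - 1] == "0" else y

def sibling(x):
    # L[x] in 0-based Python indexing is L[x+1] in the slides' notation.
    return -1 if L[x] == "0" else x + 1

def parent(x):
    return select("1", rank("0", x))  # not meaningful for the root (x = 1)

def degree(x):
    first = firstchild(x)
    return 0 if first == -1 else lastchild(x) - first + 1

def child(v, c):
    """Sequential search among the children of v for edge label c."""
    w = firstchild(v)
    if w == -1:
        return -1
    r, k = rank("1", w), 0
    while L[w + k - 1] == "1":
        if C[r + k - 1] == c:   # C is 0-based here, hence the -1
            return w + k
        k += 1
    return -1

print(firstchild(1), lastchild(1), parent(6), degree(3), child(1, "c"))
```

In this sketch a node is identified by the position of its 1 in L, so the root is position 1 and its two children (nodes 2 and 3) are positions 3 and 4.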

Numbering of Nodes
• In ones-based numbering a node is the position of its 1; in zeros-based numbering it is the position of the 0 that ends its unary degree code.
                 Ones-based             Zeros-based
  i-th node      s1(i)                  s0(i+1)
  firstchild(x)  s0(r1(x)) + 1          s0(r1(s0(r0(x) - 1)) + 2)
  lastchild(x)   s0(r1(x) + 1) - 1      s0(r1(x) + 1)
  sibling(x)     x + 1                  s0(r0(x) + 1)
  parent(x)      s1(r0(x))              s0(r0(s1(r0(x) - 1)) + 1)
• sc(i) = selectc(L, i), rc(i) = rankc(L, i)
[Figure: the 8-node example tree and L = 10110111011000000, annotated with the ones-based and zeros-based node positions]

The Merits and Demerits of LOUDS
• Implemented by rank and select
  – easy to implement
  – fast in practice
• Suitable for labeled trees
  – child() is fast because of locality of reference
  – Tx library by Okanohara
• Many operations are not supported in O(1) time
  – subtree size
  – level ancestor
  – lowest common ancestor, etc.

BP Representation
• Each node is represented by a pair of matching open and close parentheses.
• 2n bits for n nodes
• The size matches the lower bound.
[Figure: the 8-node example tree]
BP P = ((()()())(()()))

Basic Operations on BP
• A node is represented by the position of its (.
• findclose(P, i): returns the position of the ) matching the ( at P[i]
• enclose(P, i): returns the position of the ( of the innermost pair that encloses the ( at P[i]
[Figure: an 11-node tree numbered in preorder, with findclose and enclose illustrated]
P = (()((()())())(()())())

Tree Navigational Operations
• parent(v) = enclose(P, v)
• firstchild(v) = v + 1 (if P[v+1] is (, otherwise v is a leaf)
• sibling(v) = findclose(P, v) + 1 (if that position holds (, otherwise v has no next sibling)
• lastchild(v) = findopen(P, findclose(P, v) - 1), where findopen(P, i) returns the position of the ( matching the ) at P[i]
[Figure: the 11-node example tree with P = (()((()())())(()())())]

Tries using BP
• child(v, c)
    w = firstchild(v)
    while (w != NIL) {
      if (C[rank_((P, w)] == c) return w
      w = sibling(w)
    }
    return NIL
• key(v) is stored in an array indexed by rank_((P, v).
[Figure: the 8-node example trie; labels are stored in preorder]
BP P = ((()()())(()()))
C = _axyzcad

Number of Descendants (Subtree Size)
• The size of the subtree rooted at v is
    subtreesize(v) = (findclose(P, v) - v + 1)/2
• The degree (number of children) can be computed by repeatedly applying findclose, but this takes time proportional to the number of children.
BP sequence P of the 11-node example tree:
(()((()())())(()())())

Additional Basic Operations on BP
• rank_p(P, i): the number of occurrences of pattern p in P[1..i]
• select_p(P, i): the position of the i-th occurrence of p in P
• If the length of p is constant, rank/select is done in O(1) time.
• Example: for P = (()((()())())(()())()), rank_()(P, 10) = 3

Operations on Leaves
• Each leaf is represented by () in BP.
• Position of the i-th leaf = select_()(P, i)
• The number of leaves in a subtree and the leftmost/rightmost leaf of a subtree can also be found.
[Figure: the subtree rooted at node 3 of the 11-node example tree]

Node Depths
• Define the excess array E[i] = rank_((P, i) - rank_)(P, i); then depth(v) = E[v].
• E is not stored explicitly; it can be computed from the rank index on P.
P = (()((()())())(()())())
E = 1212343432321232321210

Lowest Common Ancestor (lca)
• lca = lowest common ancestor
• u = lca(v, w): the common ancestor of v and w that is furthest from the root
• Found in O(1) time
• u = parent(RMQ_E(v, w) + 1), assuming v < w
  – E is the excess array, which represents node depths.
  – m = RMQ_E(v, w): the index of a minimum value in E[v..w] (RMQ = Range Minimum Query)
[Figure: the example tree with P and E, marking the positions u, v, m, and w]

The Merits and Demerits of BP
• More supported operations
  – subtree size
  – lowest common ancestor (lca)
  – level ancestors
• Succinct indexes are complicated
  – the more supported operations, the more index space
  – o(n) size indexes cannot be ignored in practice
• degree, i-th child, etc. were difficult to implement
• No locality of child labels

References
[1] G. Jacobson. Space-efficient Static Trees and Graphs. In Proc. IEEE FOCS, pages 549–554, 1989.
[2] O. Delpratt, N. Rahman, and R. Raman. Engineering the LOUDS Succinct Tree Representation. In Proc. WEA, pages 134–145, 2006.
[3] J. I. Munro and V. Raman. Succinct Representation of Balanced Parentheses and Static Trees. SIAM Journal on Computing, 31(3):762–776, 2001.
[4] R. F. Geary, N. Rahman, R. Raman, and V. Raman. A simple optimal representation for balanced parentheses. Theoretical Computer Science, 368:231–246, December 2006.
[5] J. I. Munro, V. Raman, and S. S. Rao. Space efficient suffix trees. Journal of Algorithms, 39:205–222, 2001.
[6] D. Benoit, E. D. Demaine, J. I. Munro, R. Raman, V. Raman, and S. S. Rao. Representing Trees of Higher Degree. Algorithmica, 43(4):275–292, 2005.
[7] J. Jansson, K. Sadakane, and W.-K. Sung. Ultra-succinct Representation of Ordered Trees. In Proc. ACM-SIAM SODA, pages 575–584, 2007.
[8] A. Farzan and J. I. Munro. A Uniform Approach Towards Succinct Representation of Trees. In Proc. SWAT, LNCS 5124, pages 173–184, 2008.
[9] A. Farzan, R. Raman, and S. S. Rao. Universal Succinct Representations of Trees? In Proc. ICALP, LNCS 5555, pages 451–462, 2009.
[10] P. Ferragina, F. Luccio, G. Manzini, and S. Muthukrishnan. Compressing and indexing labeled trees, with applications. Journal of the ACM, 57(1):4:1–4:33, 2009.
[11] R. F. Geary, R. Raman, and V. Raman. Succinct ordinal trees with level-ancestor queries. ACM Transactions on Algorithms, 2:510–534, 2006.
[12] H.-I. Lu and C.-C. Yeh. Balanced parentheses strike back. ACM Transactions on Algorithms, 4(3):No. 28, 2008.
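
As a supplementary sketch of the BP operations described in the slides (findclose, findopen, enclose, the navigational operations, subtree size, and lca via a range minimum query on the excess array), the following Python code may help. It is an illustration under stated assumptions, not a succinct implementation: every operation is a linear or quadratic scan standing in for the O(1)-time indexes, positions are 1-based as on the slides, -1 stands for "no such node", and the excess array is exposed as a function E(i) rather than stored.

```python
# A minimal sketch: linear scans stand in for the O(1)-time succinct indexes.
# Positions are 1-based, as on the slides; a node is the position of its "(".
P = "(()((()())())(()())())"  # BP sequence of the 11-node example tree

def E(i):
    """Excess: number of "(" minus number of ")" in P[1..i]."""
    return P[:i].count("(") - P[:i].count(")")

def findclose(i):
    """Position of the ")" matching the "(" at P[i]."""
    j = i
    while E(j) != E(i) - 1:
        j += 1
    return j

def findopen(i):
    """Position of the "(" matching the ")" at P[i]."""
    j = i
    while E(j - 1) != E(i):
        j -= 1
    return j

def enclose(i):
    """Position of the "(" of the innermost pair enclosing P[i]; -1 for the root."""
    j = i - 1
    while j >= 1 and E(j - 1) != E(i) - 2:
        j -= 1
    return j if j >= 1 else -1

def parent(v):
    return enclose(v)

def firstchild(v):
    return v + 1 if P[v] == "(" else -1  # P[v] (0-based) is position v+1

def sibling(v):
    s = findclose(v) + 1
    return s if s <= len(P) and P[s - 1] == "(" else -1

def lastchild(v):
    return findopen(findclose(v) - 1) if P[v] == "(" else -1

def subtreesize(v):
    return (findclose(v) - v + 1) // 2

def depth(v):
    return E(v)

def lca(v, w):
    """Lowest common ancestor via a range minimum query on E[v..w]."""
    v, w = min(v, w), max(v, w)
    if v == w:
        return v
    m = min(range(v, w + 1), key=E)  # leftmost index of a minimum of E[v..w]
    return parent(m + 1)

print(subtreesize(4), depth(6), lca(6, 11))
```

Here position 4 is node 3 of the example tree, position 6 is node 5, and position 11 is node 7, so the last call asks for the lowest common ancestor of nodes 5 and 7.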