Succinct Data Structures
Kunihiko Sadakane
National Institute of Informatics

BP Representation [3]
• Each node is represented by a pair of matching open and close parentheses
• 2n bits for n nodes
• The size matches the lower bound
[Figure: an ordered tree with 8 nodes numbered in preorder (root 1 with children 2 and 6; node 2 with children 3, 4, 5; node 6 with children 7, 8) and its BP sequence]
BP P = ((()()())(()()))

Data Structure for findclose [4]
• Divide the parentheses sequence into blocks of length B = ½ log n
 – b(p): number of the block containing p
 – μ(p): position of the parenthesis matching p
 – parenthesis p is said to be far ⇔ b(p) ≠ b(μ(p))
• A far open parenthesis p is said to be an opening pioneer ⇔ for the far open parenthesis q that immediately precedes p, b(μ(p)) ≠ b(μ(q))
• The positions of the parentheses that match opening pioneers are represented by a 0,1 vector
[Figure: far open parentheses r, q, p and their matching parentheses μ(p), μ(q), μ(r)]

Lemma: Let τ denote the number of blocks. Then the number of opening pioneers is at most 2τ − 3.
Proof: The graph whose nodes correspond to the blocks and whose edges are (b(p), b(μ(p))) is an outerplanar graph, and an outerplanar graph with τ nodes has at most 2τ − 3 edges. Opening/closing pioneers form a BP again.
• τ = n/B = 2n/log n ⇒ the length of this BP is O(n/log n)

Representing Recursive Structure
• Opening pioneers and their matching parentheses are represented by a 0,1 vector B
 – for a pioneer p: μ(p) = select(B, findclose(P1, rank(B, p)))
• B is a sparse vector of length 2n with O(n/log n) 1's
 – Can be represented in O(n log log n/log n) bits
[Figure: sequence P with pioneers r, q, p and their matches μ(p), μ(q), μ(r) marked]
B = 0100 0101 0000 0000 0010 1001
P1 = ((()))

• Let S(n) denote the size of the BP representation for an n-node tree
 – S(n) = 2n + O(n log log n/log n) + S(O(n/log n))
• If the number of nodes becomes O(n/log² n), a naïve data structure which stores all the answers uses only O(n/log n) bits
• Therefore S(n) = 2n + O(n log log n/log n)

Algorithm for findclose
• To compute μ(p) = findclose(P,p)
• If p is not far, μ(p) is computed by a table
• Find the pioneer p* that immediately precedes p
• Find μ(p*) using the BP for pioneers
• If p is not a pioneer, b(μ(p)) = b(μ(p*))
• The position of μ(p) is determined from the difference between the depths of p and p*
[Figure: p* and p with their matching parentheses μ(p) and μ(p*)]

enclose
• Let λ(p) = enclose(P,p)
• If b(λ(p)) = b(p), λ(p) is found from a table
• If b(λ(p)) ≠ b(p), store those positions
 – also store the positions of the matching parentheses
 – if there is more than one such pair of parentheses, store only the outermost one
• Recur for the extracted parentheses

Additional Basic Operations on BP
• rank_p(P,i): number of occurrences of pattern p in P[1..i]
• select_p(P,i): position of the i-th occurrence of p in P
• If the length of p is constant, rank/select is done in O(1) time
[Figure: an ordered tree with 11 nodes numbered in preorder]
P = (()((()())())(()())())
rank_()(P,10) = 3

Operations on Leaves [5]
• Each leaf is represented by () in BP
• Position of the i-th leaf = select_()(P, i)
• The number of leaves in a subtree and the leftmost/rightmost leaf in a subtree are also found
[Figure: the same 11-node tree, with the subtree rooted at node 3 highlighted]
P = (()((()())())(()())())

Node Depths
• Define the excess array E[i] = rank_((P,i) − rank_)(P,i)
• depth(v) = E[v]
• E is not explicitly stored; it can be computed by the rank index on P
P = (()((()())())(()())())
E = 1212343432321232321210

Lowest Common Ancestor (lca)
• lca = lowest common ancestor
• u = lca(v,w): the common ancestor of v and w which is furthest from the root
• Found in O(1) time
• u = parent(RMQ_E(v,w)+1)
 – E is the excess array, which represents node depths
 – m = RMQ_E(v,w): the index of a minimum value in E[v..w] (RMQ = Range Minimum Query)
[Figure: the 11-node tree with u, v, w marked, and the positions of u, v, m, w in P and E]
P = (()((()())())(()())())
E = 1212343432321232321210

DFUDS Representation [6]
• It encodes the degrees of nodes in unary codes in depth-first order (DFUDS = Depth First Unary Degree Sequence)
• Degree d ⇒ d ('s, followed by a )
• Add a dummy ( at the beginning
• 2n bits
[Figure: the 8-node tree from the BP example, nodes numbered 1 to 8 in preorder]
DFUDS U = ((()((())))(()))

Lemma: The DFUDS of an n-node ordered tree forms a balanced parentheses sequence of length 2n.
Proof: For n = 1, the root has no children (degree 0). Its DFUDS is (). Assume that the lemma holds for any tree with at most n − 1 nodes. Let U1, U2, ..., Up denote the DFUDS of p trees (the total number of their nodes is n − 1, and the total length of their DFUDS's is 2n − 2). Consider a tree whose root has those trees as its children. The DFUDS U of this tree is
 U = ( (^p ) U1* U2* ... Up*
where the first ( is the head dummy parenthesis, (^p ) encodes the degree p of the root, and Ui* is Ui with its head dummy parenthesis removed.
From the induction hypothesis, each Ui is balanced. Because its head open parenthesis is removed, each Ui* lacks one open parenthesis to be balanced. The head dummy open parenthesis of U and the sequence (^p ) for the root node together leave p open parentheses unbalanced, one for each Ui*. Therefore U is balanced. The number of nodes is n and the length of the sequence is 1 + (p + 1) + (2n − 2 − p) = 2n. This proves the lemma.
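
The encodings and operations from these slides can be tried out with a small Python sketch. It is only an illustration under stated assumptions: every operation is a naive linear-time scan, not the O(1)-time succinct structure described in the slides. The function names and the dict-of-children tree format are hypothetical choices made for this example, and positions are 0-indexed here while the slides count from 1.

```python
# Naive reference versions of the BP / DFUDS operations (illustrative only).
# Trees are given as {node: [children in order]}; leaves may be omitted.

def bp_encode(tree, root):
    """BP sequence: '(' on entering a node, ')' on leaving it."""
    return "(" + "".join(bp_encode(tree, c) for c in tree.get(root, [])) + ")"

def dfuds_encode(tree, root):
    """DFUDS: a dummy '(' and then, in preorder, degree d as d '(' plus one ')'."""
    out, stack = ["("], [root]
    while stack:
        v = stack.pop()
        kids = tree.get(v, [])
        out.append("(" * len(kids) + ")")
        stack.extend(reversed(kids))  # leftmost child is popped first
    return "".join(out)

def excess(P):
    """E[i] = (number of '(') - (number of ')') in P[0..i]."""
    E, e = [], 0
    for c in P:
        e += 1 if c == "(" else -1
        E.append(e)
    return E

def findclose(P, p):
    """Position of the ')' matching the '(' at position p (linear scan)."""
    depth = 0
    for i in range(p, len(P)):
        depth += 1 if P[i] == "(" else -1
        if depth == 0:
            return i
    raise ValueError("sequence is not balanced")

def enclose(P, p):
    """Position of the nearest '(' that encloses position p, or None at the root."""
    depth = 0
    for i in range(p - 1, -1, -1):
        depth += 1 if P[i] == "(" else -1
        if depth == 1:
            return i
    return None

def lca(P, v, w):
    """u = parent(RMQ_E(v, w) + 1), taking the leftmost minimum of E[v..w]."""
    if v > w:
        v, w = w, v
    E = excess(P)
    m = min(range(v, w + 1), key=E.__getitem__)  # min() keeps the first minimum
    return enclose(P, m + 1)

if __name__ == "__main__":
    tree = {1: [2, 6], 2: [3, 4, 5], 6: [7, 8]}  # the 8-node example tree
    print(bp_encode(tree, 1))
    print(dfuds_encode(tree, 1))
    P = "(()((()())())(()())())"                 # the 11-node example
    print("".join(map(str, excess(P))))
    print(findclose(P, 3), lca(P, 5, 10))
```

Taking the leftmost minimum in `lca` also covers the case where v is an ancestor of w: the minimum is then at v itself, and `enclose(P, v + 1)` returns v.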