Download Background

Chapter 6: Union-Find and Related Structures CS 6 3 1 0 A DVA NCE D D ATA S T R U CTU RE S HA DHA MUHI & HA S NA A I MA D Outline: • Background • 6.1 Union-Find: Merging Classes of a Partition • 6.2 Union-Find with Copies and Dynamic Segment Trees • 6.3 List Splitting • 6.4 Problems on Root-Directed Trees • 6.5 Maintaining a Linear Order Union WHOM??? & Find WHAT??? Background: Disjoint Data Set D i s j oi n t -set da ta s tru c tu re i s a da t a s t r uc t ur e t h a t ke e p s t r a c k o f a s e t o f e l e men t s p a r t i t io ne d i n t o a n umbe r o f di s jo i n t (n o n -o ve r lap ping) s ubs e t s . D i s jo in t -s et da t a s t r uc t ur e i s s o me t i me s c a l l e d a u ni o n - f i nd d a t a s t r u ct u r e . U n i on - f ind a lg ori thm i s a n a l g or it h m t h a t p e r f o r ms t h r e e us e f ul o p e r a t i ons o n s uch a da t a s t r uc t ur e : • F i n d : det er mi n e wh i ch s ubs et a p a r t i cular el ement i s i n. T h i s ca n be us ed f o r det er mi n i n g i f t w o e l e me n t s a r e i n t h e s a me s ubs e t . • U n i on : jo i n t w o s ubs e t s i n t o a s i n g le s ubs e t . • M a ke s et : ma ke s a s e t c o n t a ini ng o n l y a g i v e n e l e me n t ( a s i n g le t on) . Wi t h t h es e t h r ee op er a t ions, ma ny p r a ct i cal p a r t i t ioning p r obl ems ca n be s o l v ed. Background: Disjoint Data Set • Given a set {1, 2, …, n} of n elements. • Initially each element is in a different set. {1}, {2}, …, {n} • An intermixed sequence of union and find operations is performed. Background: Set as Tree S = { 2, 4 , 5 , 9 , 1 1 , 13, 30 } Some p os s i b l e tre e re p re s entati on s 5 13 13 4 2 9 11 30 4 5 13 9 4 5 11 11 30 2 2 9 30 Background: Root-Directed Trees • Union-Find structure can be represented by trees with parent pointers. • The root of the tree is called the Representative of the tree (class). Representative= 7 Representative= 13 13 4 9 7 5 11 8 3 22 30 10 2 1 6 20 16 14 12 Background: Union Operation • Union(i,j) : i and j are the roots of two different trees, i != j. • To unite the trees, make one tree a subtree of the other. • Time Complexity of Union operation is O(1) Background: Union Operation 7 13 4 9 5 11 8 3 22 6 30 2 10 1 Union(7,13) 20 16 14 12 Which tree should become a subtree of the other??? Background: Smart Union Approaches There are two approaches to apply union operation: • Union by Height • Union by Weight Background: Union by Height • Make tree with smaller height a subtree of the other tree. Union(7,13) = 13 Height = 4 13 4 9 2 5 11 7 Height = 3 30 8 3 22 1 6 10 20 16 14 12 Background: Union by Weight • The tree with fewer number of elements becomes subtree of the other tree. Union(7,13) = 7 Weight = 8 7 13 4 9 2 1 8 3 Weight = 10 22 6 5 10 11 30 20 16 14 12 Background: Find Operation • Find(i) is to identify the set that contains element i. • Start at the node that represents element i and climb up the tree until the root is reached. • Return the element in the root (Representative). • Time Complexity of find operation is O(h) 7 13 Find(9) = 13 9 4 Find(14) = 7 5 11 8 3 22 30 10 2 1 6 20 16 14 12 Quick Quiz: What is the total number of union operations to reach the last tree? n-1 What is the time complexity for Union (1, n)? O(n-1) = O(n) What are the applications that utilize this structure? There are many applications that utilize this structure. Some of them are: • Check the network connectivity. • Kruskal's minimum spanning tree algorithm. • Find least common ancestor. 6.1 Union-Find: Merging Classes of a Partition The classical version of the union-find structure works in the following model: • Items can be inserted into a set, each initially forming a one-element partition class. Items are identified by a pointer. So we have the following operations: • insert: takes an item, returns pointer to the node representing the item, and creates a oneelement class for it. (make_set operation) • join: takes two pointers to nodes and joins the classes containing these items. (union operation) • same_class: takes two pointers to nodes and decides whether their items are in the same class. (find operation with two inputs) 6.1 Union-Find: Merging Classes of a Partition same_class (i, j): we can query whether two items are in the same class by following from both nodes the path to their respective roots; they are in the same class if they reach the same root. Examples: same_class (h,i), same_class (m,r) join (i, j): We can join two classes by connecting the root of one tree to the root of the other tree. Example: join (a,d) 6.1 Union-Find: Merging Classes of a Partition • When joining two trees, which of the two roots should become the new root? • The time taken by the query is the length of the path to the root. Long path=long query time. join (a,c) The best-known solution by using the following two techniques: • Union by rank: Each node has rank field, which starts on insertion as 0. Each time we join two classes, the root with the larger rank becomes the new root, and if both roots have the same rank, we increase the rank of one of them. (rank is either the height or the weight of the tree) • Path compression: after each update, we go along the path and make all the nodes point directly to the root. Path compression 6.1 Union-Find: Merging Classes of a Partition • Worst case for each operation is O(h) which is O(log n) even without path compression. • O(log n) upper bound is only for a single operation while for sequence of operations a better amortized bound is achieved (after path compression). • The amortized complexity is expressed in a version of inverse Ackermann function which is a slow growing function unlike the fast growing Ackermann function. A(m, 0) = 0 for m ≥ 1, A(m, 1) = A(m − 1, 2) for m ≥ 1, A(0, n) = 2n for n ≥ 0, A(m, n) = A(m − 1, A(m, n − 1)) for m ≥ 1, n ≥ 2. • We define as inverse Ackermann function the function: α(n) = min{i | A(i, 1) > n} m = number of operations n = number of elements. 6.1 Union-Find: Merging Classes of a Partition Theorem. The union-find structure with union by rank and path compression supports the operations insert in O(1) and same_set and join in O(log n) time on a set with n elements. A sequence of m same_set or join operations on a set with n elements takes O((m + n)α(n)) time. 6.1 Union-Find: Merging Classes of a Partition • If we follow a node v over a sequence of operations, initially its rank is 0 and then it increases by some join operations, but only while v is still the root of its tree. • Once v becomes a non-root node, its rank cannot change further and it is not possible for a non-root node to become a root again. • If the node is a root then its level is 0, once the node becomes non-root its level increases. • After m sequence of operations done on node v, we first observe that the work done with v while v is a root node is O(1) and each operation touches at most two root nodes, and so the part of the work done on root nodes by the m operations is O(m). • The majority of the work is done on nodes when they are non-root nodes which is path compression • To reduce the worst-case complexity of the union-find operations, tree height which is determined by the number of nodes and their indegree should be reduced by increasing the indegree of the nodes. • Reduction is performed in join operation on two roots with same height and small indegree by redirecting edges of one root to the other root so that the new root has larger indegree but same height. • Height of a tree with n nodes and k indegree is Θ (logk(n)) = Θ (log(n) / log(k)) 6.1 Union-Find: Merging Classes of a Partition • Time of join operation depends on the indegree of the root plus the height of the tree which can not be better than O( log(n) / log(log(n)) ) • Rules for joining two components with roots r and s:• If r->height >= s->height: s and its lower neighbors are made pointed to a lower neighbor of r • Else if r->height = s->height: All lower neighbors of s are added to the list of lower neighbors of r, • If r->height > r->indegree: s is made to point to one lower neighbor of r. • Else s becomes the new root with r as its only lower neighbor 6.1 Union-Find: Merging Classes of a Partition Theorem. The union-find structure described before supports the operations insert in O(1) and same_set and join in O( log(n) / log(log(n) ) time on a set with n elements. 6.2 Union-Find with Copies and Dynamic Segment Trees • Model is very restricted (sets have to be disjoint and we can take only unions of them). • After n − 1 unions everything is in the same class. • Union-copy structure: keeps track of a set of items represented by fingers, and sets also represented by fingers. Example: Items = { 1, 2, 3, 4, 5, 6, 7} items Sets = A, B, C, D, E sets 6.2 Union-Find with Copies and Dynamic Segment Trees The underlying representation of the set system is as follows: • The data structure consists of item nodes, set nodes, and sets in two extended union-find structures – labeled A and B – which allow both normal and listing queries. items items sets sets • Each item node has exactly one outgoing edge. • A has at least two incoming edges and one outgoing edge. • B has exactly one incoming edge and at least two outgoing edges. • Each set node has exactly one incoming edge. • Each set node has exactly one outgoing edge. • B has at least two incoming edges and one outgoing edge. • A has exactly one incoming edge and at least two outgoing edges. • Each item node has exactly one incoming edge. 6.2 Union-Find with Copies and Dynamic Segment Trees It supports the following operations which are symmetric with respect to the role of items and sets: create item create set insert list sets join sets copy set destroy set list items join items copy item destroy item 6.2 Union-Find with Copies and Dynamic Segment Trees list_items: lists all items for a given set. Example: list_items (B). 1. Put the initial outgoing edge of the set node on the stack. 2. While the stack is not empty, take the next edge from the stack. 2.1 If this edge goes to an item node, list that item. 2.2 If this edge goes to union-find structure A, perform a listing query and put all outgoing edges on the stack. 2.3 If this edge goes to union-find structure B, perform a naming query and put the one outgoing edge on the stack. • The total complexity of a list items query that returns k items is O(k uf(n)). • The same holds for list sets query. 6.2 Union-Find with Copies and Dynamic Segment Trees copy_set: Creates representation for a new set, which is a copy of the given set, and returns a finger to it. It takes only constant time O(1). The same holds for copy_items query. • Given the set node, we follow the outgoing edge. There are only two cases: 1. The outgoing edge of the set node directly goes to an item node or to the structure A: We create a new set node and a set with two new elements in structure B. The two set nodes are joined to the elements in B, and the name of the set in B is the previous outgoing pointer of the node to be duplicated. Then we return the new set node. 2. The outgoing edge of the set node goes to the structure B: We create a new set node and a new element in B, join the set node to the element, and insert the element in the set that the previous outgoing edge pointed to. Case 1 Case 1 Case 2 6.2 Union-Find with Copies and Dynamic Segment Trees join_sets: Replaces the first set by the union of two given sets and destroys the other set. Requires the two sets to be disjoint. Given two set nodes X and Y. 1. Both go to nodes in structure A. 2. The first set node points to a node in structure A and the second to a node in structure B. 3. Both set nodes point to nodes in structure B or item nodes. 4. The first set node points to a node in structure A and the second to an item node. So the complexity of join_sets and join_items is O(uf(n)). Case 1 Case 2 Case 3 Case 4 6.2 Union-Find with Copies and Dynamic Segment Trees insert: Inserts a given item y in a given set X such that y ∉ X. 1. The set node points to a node in structure A, and the item node points to a node in structure A or a set node. 2. The set node points to a node in structure A, and the item node points to a node in structure B. 3. The set node points to a node in structure B or an item node, and the item node points to a node in structure A or a set node. 4. The set node points to a node in structure B or an item node, and the ite node points to a node in structure B. Case 1 Case 2 Case 3 Case 4 6.2 Union-Find with Copies and Dynamic Segment Trees Theorem. The union-copy structure keeps track of a system of sets of total size n, supporting the operations • create item, create set, insert, copy set, copy item in O(1); • list sets, list items in output-sensitive O(k uf(n)) time if the output has size k; • join sets, join items in O(uf(n)) time; • destroy set, destroy item in O(k uf(n)) time if the size of the destroyed object was k. 6.2 Union-Find with Copies and Dynamic Segment Trees • Segment tree is static data structure. • It can be dynamic using union-find structure. • Theorem. A segment tree that uses the union-copy structure to represent the sets associated with the tree nodes supports insert into a tree already containing n intervals in O(log n) time and list intervals for a query value contained in k intervals in O(log n + k uf(n)) output-sensitive time. Each set of intervals is represented by union-find structure 6.3 List Splitting • In union-find model, we continue joining classes until all elements are in one class, but • How to split a class into two classes, which element goes to which subclass? • This is achieved by assuming that all elements are linearly ordered. • Split is specified by an item by cutting the list in the right of the item to have smaller lists • The model for list-splitting problem is initially an ordered list of n items, each of them with a weight. • Later, this list is replaced by a set of lists which partition the items into intervals in the original ordering. • It supports the following operations: • split: splits the current list into two lists. • same_list: checks if two items are in the same list. • max_weight: returns a finger to the item of maximum weight. 6.3 List Splitting split (e) split (h) split (d) 6.3 List Splitting • The three operations listed previously can be supported by a balanced search tree such as heightbalanced trees or red-black trees. • We build a single balanced tree from the list in O(n) time as preprocessing and include in each node a pointer to the maximum weight item in its subtree. • Then each splitting operation splits the current tree in two trees: for each same_list query we just go up to the root of the current tree and check whether both nodes arrive at the same root. • For the max weight query we go to the root and report the pointer stored in it. • Each of these operations takes just O(log n) worst-case time. • This is even a dynamic data structure: we can insert new elements in a sublist as neighbor of a given element, and we can delete elements and join lists again if the tree supports this 6.3 List Splitting • Theorem. Using any balanced search tree that supports split and join, we can build a dynamic structure that supports list splitting, with operations split, same list, max weight, join, insert, and delete, all in O(log n) worst-case time on a list of initial length n. 6.4 Problems on Root-Directed Trees Least common ancestor (lca) : given two nodes of the tree, each node defines a path to the root. What is the first node that lies on both paths? lca=first common ancestor of two given nodes 6.4 Problems on Root-Directed Trees The lca structure keeps track of a set of root-directed trees and supports at least following operations: • create tree: Creates a new tree with just one node, the root, and returns a pointer to that root. • add leaf: Adds a new leaf that is linked to a given node and returns a pointer to that new leaf node. • lca: Returns a pointer to the least common ancestor of the two given nodes or NULL if they are not in the same tree. • link: Takes two nodes x and y and different subtrees, of which x is root of its subtree, and links the subtrees by introducing an edge from x to y (not all structures support it) • delete leaf: Removes a given node, which must be a leaf. • cut: Removes the link from a given node to its upper neighbor, making the given node the root of a new tree. • find root: Returns a pointer to the root of a given node. • depth: Returns the distance to the root of a given node. 6.4 Problems on Root-Directed Trees Case 1: x and y have equal depths Forward pointers of x 0 1 2 j Length= 2j 0 1 1 2 2 4 3 8 … … 2j Forward pointers of y Log h 6.4 Problems on Root-Directed Trees Case 2: x and y have different depths (k1 and k2, k1>k2) • Replace the node at depth k1 by the node along its path to the root k1 − k2 steps on. • Generate the equal depth case. Example: k1=9 k2=6 lca K1-k2=3 x` Forward pointers of x` x y Forward pointers of y 6.4 Problems on Root-Directed Trees Theorem. The lca structure based on trees with lists of exponential forward pointers attached to the nodes supports create tree, depth in O(1) and add leaf, delete leaf, lca, and find root in time O(log h), where h is the maximum height of the trees in the underlying set. 6.5 Maintaining a Linear Order • The structure we are implementing is not necessarily a linked list BUT the underlying abstract model is a set with a linear order, which can be visualized by a list. • Maintain a linear order under insertion and deletion. • The elements are identified by fingers (pointers). Therefore, the following operations should be supported: • insert(x, y): Inserts x as immediate smaller neighbor of y and returns a finger to x. • delete(x): Deletes element x. • compare(x, y): Decides whether x is smaller than y in the current linear order. • This problem would be easy if the elements came with a key and the order was the order of the keys. Then we needed just a key comparison to check the order relation. Our problem is that we have to assign these keys based on the neighbor information at the insertion time. • Solution: use a balanced search tree with elements on leaves and compare two elements (left-toright order) by going the path up to the root and checking the order in which the paths enter their first common vertex. 6.5 Maintaining a Linear Order • Theorem. Using a balanced search tree that allows constant time update at a known location, we can maintain a linear order with O(1) worst-case time of insert and delete and O(log n) worst-case time of compare. References: • Fundamentals of Data Structures in C++, Ellis Horowitz, Sartaj Sahni and Dinesh Mehta, 2nd Edition. http://inside.mines.edu/~dmehta/FDS_CPP/PPT/index.shtml • Advanced Data Structures, Peter Brass, Cambridge University Press, 2008. • Disjoint-Set Data Structures, Lecture 16. http://ocw.mit.edu/index.htm (MIT open courses). • Computer Algorithms C++, Horowitz and Sahni, 1998. • Union-copy structures and dynamic segment trees, Kreveld and Overmars, 1991. • Efficiency of a good But Not Linear Set Union Algorithm, E. Tarjan, University of California, 1975. Any Questions?? Thank You..

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Background