Download Amortized Analysis and Union-Find - Algorithms for Massive Data Sets

Amortized Analysis and Union-Find 02283, Inge Li Gørtz mandag den 28. feb 2011 1 Today • Amortized analysis • 3 different methods • 2 examples • Union-Find data structures • Worst-case complexity • Amortized complexity mandag den 28. feb 2011 2 Amortized Analysis • Amortized analysis. • Average running time per operation over a worst-case sequence of operations. • Time required to perform a sequence of data operations is averaged over all the operations performed. • Motivation: traditional worst-case-per-operation analysis can give too pessimistic bound if the only way of having an expensive operation is to have a lot of cheap ones before it. • Different from average case analysis: average over time, not input. mandag den 28. feb 2011 3 Amortized Analysis • Methods. • Aggregate method • Accounting method • Potential method mandag den 28. feb 2011 4 Aggregate method • Aggregate. • Determine total cost. • Amortized cost = total cost/#operations. mandag den 28. feb 2011 5 Dynamic Tables • Doubling strategy. • Start with empty array of size 1. • Insert: If array is full create a new array of double the size and reinsert all elements. • Analysis: n insert operations. Assume n is a power of 2. • Number of insertions 1 + 2 + 4 + ... + 2log n = O(n). • Total cost: O(n). • Amortized cost per insert: O(1). mandag den 28. feb 2011 6 Accounting method • Accounting. • Some types of operations are overcharged. • Credit allocated with elements in the data structure used to pay for subsequent operations • Total credit non-negative at all times -> total amortized cost an upper bound on the actual cost. mandag den 28. feb 2011 7 Dynamic Tables • Amortized costs: • Amortized cost of insertion: 3 • 1 for own insertion • 1 for its first reinsertion. • 1 to pay for reinsertion of one of the items that have already been reinserted once. mandag den 28. feb 2011 8 Dynamic Tables • Analysis: keep 2 credits on each element in the array that is beyond the middle. • table not full: insert costs 1, and we have 2 credits to save. • table full, i.e., doubling: half of the elements have 2 credits each. Use these to pay for reinsertion of all in the new array. • Amortized cost per operation: 3. 2 2 2 x x x x x x x x x x x 2 2 2 2 2 2 2 2 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x mandag den 28. feb 2011 9 Example: Stack with MultiPop • Stack with MultiPop. • Push(e): push element e onto stack. • MultiPop(k): pop top k elements from the stack • Worst case: Implement via linked list or array. • Push: O(1). • MultiPop: O(k). • Amortized cost per operation: 2. mandag den 28. feb 2011 10 Stack: Aggregate Analysis • Amortized analysis. Sequence of n Push and MultiPop operations. • Each object popped at most once for each time it is pushed. • #pops on non-empty stack ≤ #Push operations ≤ n. • Total time O(n). • Amortized cost per operation: 2n/n = 2. mandag den 28. feb 2011 11 Stack: Accounting Method • Amortized analysis. Sequence of n Push and MultiPop operations. • Pay 2 credits for each Push. • Keep 1 credit on each element on the stack. • Amortized cost per operation: • Push: 2 • MultiPop: 1 (to pay for pop on empty stack). mandag den 28. feb 2011 12 Potential method • Potential functions. • Prepaid credit (potential) associated with the data structure (money in the bank). • Can be used to pay for future operations. • Ensure there is always enough “money in the bank”. • Amortized cost of an operation: potential cost plus increase in potential due to the operation. • Di: data structure after i operations • Potential function Φ(Di) maps Di onto a real value. • amortized cost = actual cost + Δ(Di) = actual cost + Φ(Di) - Φ(Di-1). mandag den 28. feb 2011 13 Potential Functions • Amortized cost: • amortized cost = actual cost + Δ(Di) = actual cost + Φ(Di) - Φ(Di-1). • Stack. • Φ(Di) = #elements on the stack. • amortized cost of Push = 1 + Δ(Di) = 2. • amortized cost of MultiPop(k): If k’=min(k,|S|) elements are popped. • if S ≠ : amortized cost = k‘+ Φ(Di) -Φ(Di-1) = k’ - k’ = 0. • if S = : amortized cost = 1 + Δ(Di) = 1. mandag den 28. feb 2011 14 Potential Functions • Amortized cost: • amortized cost = actual cost + Δ(Di) = actual cost + Φ(Di) - Φ(Di-1). • Dynamic tables ! 2(k − L/2) if k ≥ L/2 • Φ(Di) = 0 otherwise • L = current array size, k = number of elements in array. • amortized cost of insertion: • Array not full: amortized cost = 1 + 2 = 3 • Array full (doubling): Actual cost = L + 1, Φ(Di-1) = L, Φ(Di)=2: amortized cost = L + 1 + (2 - L) = 3. mandag den 28. feb 2011 15 Amortized Cost vs Actual Cost • Total cost: • ∑ amortized cost = ∑(actual cost + Δ(Di)) =∑ actual cost + Φ(Dn) Φ(D0). • ∑ actual cost = ∑ amortized cost + Φ(D0) - Φ(Dn). • If potential always nonnegative and Φ(D0) = 0 then ∑ actual cost ≤ ∑ amortized cost. mandag den 28. feb 2011 16 Potential Method • Summary: 1. Pick a potential function, Φ, that will work (art). 2. Use potential function to bound the amortized cost of the operations you're interested in. 3. Bound Φ(D0) - Φ(Dfinal) • Techniques to find potential functions: if the actual cost of an operation is high, then decrease in potential due to this operation must be large, to keep the amortized cost low. mandag den 28. feb 2011 17 Union-Find Data Structures mandag den 28. feb 2011 18 Union-Find Data Structure • Union-Find data structure: • Makeset(x): Create a singleton set containing x and return its identifier. • Union(A,B): Combine the sets identified by A and B into a new set, destroying the old sets. Return the identifier of the new set. • Find(x): Return the identifier of the set containing x. • Only requirement for identifier: find(x) = find(y) iff x and y are in the same set. • Applications: Connectivity, Kruskal’s algorithm for MST, ... mandag den 28. feb 2011 19 A Simple Union-Find Data Structure • Quick-Union: • Each set represented by a tree. Elements are represented by nodes. Root is also identifier. • Make-Set(x): Create a new node x. Set p(x) = x. • Find(x): Follow parent pointers to the root. Return the root. • Union(A,B): Make root(B) a child of root(A). mandag den 28. feb 2011 20 A Simple Union-Find Data Structure • Quick-Union: • Union(A,B): Make root(B) a child of root(A). 1 2 3 4 2 3 4 5 6 7 8 9 6 7 8 9 8 9 Union(7,5) 1 5 Union(3,1) 2 3 4 6 1 7 5 Union(7,8) 2 3 4 6 1 7 5 9 8 Union(3,7) 2 3 1 4 9 7 5 mandag den 28. feb 2011 6 8 21 A Simple Union-Find Data Structure • Quick-Union: • Each set represented by a tree. Elements are represented by nodes. Root is also identifier. • Make-Set(x): Create a new node x. Set p(x) = x. • Find(x): Follow parent pointers to the root. Return the root. • Union(A,B): Make root(B) a child of root(A). • Analysis: • Make-Set(x) and Union(A,B): O(1) • Find(x): O(h), where h is the height of the tree containing x. Worst-case O(n). mandag den 28. feb 2011 22 A Simple Union-Find Data Structure • Quick Find: • Each set represented by a tree of height at most one. Elements are represented by nodes. Root is also identifier. • Make-Set(x): Create a new node x. Set p(x) = x and size(x) = 1. • Find(x): Follow parent pointer to root. Return root. • Union(A,B): Move all elements from smallest set to larger set (change parent pointers). I.e., set p(B) = A and size(A) = size(A) + size(B). mandag den 28. feb 2011 23 A Simple Union-Find Data Structure • Quick Find: • Union(A,B): Move all elements from smallest set to larger set (change parent pointers). 1 2 3 4 2 3 4 5 6 7 8 9 6 7 8 9 8 9 Union(7,5) 1 5 Union(3,1) 2 3 4 6 1 7 5 Union(7,8) 2 3 4 6 1 7 5 9 8 Union(3,7) 2 4 6 7 1 mandag den 28. feb 2011 3 5 9 8 24 A Simple Union-Find Data Structure • Quick Find: • Each set represented by a tree of height at most one. Elements are represented by nodes. Root is also identifier. • Make-Set(x): Create a new node x. Set p(x) = x and size(x) = 1. • Find(x): Follow parent pointer to root. Return root. • Union(A,B): Move all elements from smallest set to larger set (change parent pointers). I.e., set p(B) = A and size(A) = size(A) + size(B). • Analysis: • Make-Set(x) and Find(x): O(1) • Union(A,B): O(n) mandag den 28. feb 2011 25 Amortized Complexity of Quick-Find • Amortized analysis: Consider a sequence of k Unions. • Observation 1: How many elements can be touched by the k Unions? • Consider an element x: • What can we say about the size of the set containing x before and after a union that changes x’s parent pointer? • How large can the set containing x be after the k Unions? • How many times can x’s parent pointer be changed? mandag den 28. feb 2011 26 Amortized Complexity of Quick-Find • Amortized analysis: • Each time x’s parent pointer changes the size of the set containing it at least doubles. • At most 2k elements can be touched by k unions. • Size of set containing x after k unions at most 2k. • x’s parent pointer is updated at most lg(2k) times. • In total O(k log k) parent pointers updated in a sequence of k unions. • Amortized time per union: O(log k). • Lemma. Using the Quick-Find data structure a Find operation takes worst case time O(1), a Make-Set operation time O(1), and a sequence of n Union operations takes time O(n log n). mandag den 28. feb 2011 27 A Better Union-Find Data Structure • Union-by-Weight or Union-by-Rank. • Union-by-Weight. Make the root of the smallest tree a child of the root of the bigger tree. • Union-by-Rank. Each node x has an integer rank(x) associated. • Make-Set(x): Create a new node x. Set p(x) = x and rank(x) = 0. • Find(x): Follow parent pointers to the root. Return the root. • Union(A,B): 3 cases: • rank(A) > rank(B). Make B a child of A. • rank(A) < rank(B). Make A a child of B. • rank(A) = rank(B). Make B a child of A and set rank(A) = rank(A)+1. mandag den 28. feb 2011 28 Analysis of Union-by-Rank • Increasing ranks. • rank(x) < rank(p(x)). • A root of set containing x: Find(x) takes O(rank(A)+1) time. • rank(A) ≤ lg n: Show |A| ≥ 2rank(A) by induction. • A=Makeset(x): rank(A)=0 and |A| = 20 = 1. • A=Union(B,C): 2 cases • rank(A)=rank(B) or rank(A)=rank(C): ok, since set only got larger. • rank(B)=rank(C)=k and rank(A)=k+1. |A| = |B| + |C| ≥ 2k + 2k = 2k+1. • Lemma. Using the Union-by-Rank data structure a Find operation takes worst case time O(log n), and a Make-Set or Union operation takes time O(1). mandag den 28. feb 2011 29 Path Compression • Path compression. After each Find(x) operation, for all nodes y on the path from x to the root, set p(y) = x. • Example. Find(6). 7 7 3 5 1 4 9 8 6 1 2 4 3 5 8 9 6 2 mandag den 28. feb 2011 30 Path Compression • Path compression. After each Find(x) operation, for all nodes y on the path from x to the root, set p(y) = x. • Tarjan: Union-by-Rank and path compression. Starting from an empty data structure the total time of n Makes-Set operations, at most n Union operations, and m Find operations takes time O(n + m α(n)). • α(n). Extremely slowly growing inverse Ackermann’s function. • Analysis complicated, but algorithm simple. • 2 one-pass variants path halving and path splitting. Same asymptotic running time. mandag den 28. feb 2011 31 Ackermann’s function • Ackermann’s function ! j+1 if k = 0, Ak (j) = (j+1) Ak−1 (j) if k ≥ 1. • Inverse Ackermann: α(n) = min{k : Ak (1) ≥ n}. • Grows extremely slowly. mandag den 28. feb 2011 32 Path Compression: Analysis • Potential function • Potential of node x after i operations: φi (x). • Potential of forest after i operations: Φi (x) = ! φi (x). x • Auxiliary functions: • level(x) = max{k : rank(p(x)) ≥ Ak (rank(x))}. • iter(x) = max{j : rank(p(x)) ≥ (j) Alevel(x) (rank(x))} • Properties: (1) 0 ≤ level(x) ≤ α(x). (2) 1 ≤ iter(x) ≤ rank(x). mandag den 28. feb 2011 33 Path Compression: Analysis • Potential function ! α(n) · rank(x) if x root or rank(x) = 0 • φi (x) = (α(n) − level(x)) · rank(x) − iter(x) if x not root and rank(x) ≥ 1. • Potential of forest after i operations: Φi (x) = ! φi (x). x • Properties: (1) 0 ≤ level(x) ≤ α(x). (2) 1 ≤ iter(x) ≤ rank(x). (3) 0 ≤ φi (x) ≤ α(n) · rank(x). (4) If x not root and rank(x) > 0, φi (x) < α(n) · rank(x). mandag den 28. feb 2011 34 Path Compression: Analysis • Potential function ! α(n) · rank(x) if x root or rank(x) = 0 • φi (x) = (α(n) − level(x)) · rank(x) − iter(x) if x not root and rank(x) ≥ 1. • Potential of forest after i operations: Φi (x) = ! φi (x). x • Makeset(x): O(1) • Union(A,B): O(α(n)) • Find(x): O(α(n)) • Lemma (used in analysis of Union and Find). Suppose x not root, and ith operation Union or Find. Then φi (x) ≤ φi−1 (x). If also rank(x) > 0 and either level(x) or iter(x) changes, then φi (x) ≤ φi−1 (x) − 1. mandag den 28. feb 2011 35 Summary • Amortized analysis. • 2 Examples: Dynamic tables, stack with multipop. • Union-Find Data Structure. • Union-by-Rank + path compression: worst case + amortized bounds. mandag den 28. feb 2011 36

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Amortized Analysis and Union-Find - Algorithms for Massive Data Sets