Download Amortized Analysis and Union-Find - Algorithms for Massive Data Sets

Amortized Analysis and Union-Find 02283, Inge Li Gørtz 1 Today • Amortized analysis • 3 different methods • 2 examples • Union-Find data structures • Worst-case complexity • Amortized complexity 2 Amortized Analysis • Amortized analysis. • Average running time per operation over a worst-case sequence of operations. • Time required to perform a sequence of data operations is averaged over all the operations performed. • Motivation: traditional worst-case-per-operation analysis can give too pessimistic bound if the only way of having an expensive operation is to have a lot of cheap ones before it. • Different from average case analysis: average over time, not input. 3 Amortized Analysis • Methods. • Aggregate method • Accounting method • Potential method 4 Aggregate method • Aggregate. • Determine total cost. • Amortized cost = total cost/#operations. 5 Dynamic Tables • Doubling strategy. • Start with empty array of size 1. • Insert: If array is full create a new array of double the size and reinsert all elements. • Analysis: n insert operations. Assume n is a power of 2. • Number of insertions 1 + 2 + 4 + ... + 2log n = O(n). • Total cost: O(n). • Amortized cost per insert: O(1). 6 Accounting method • Accounting. • Some types of operations are overcharged. • Credit allocated with elements in the data structure used to pay for subsequent operations • Total credit non-negative at all times -> total amortized cost an upper bound on the actual cost. 7 Dynamic Tables • Amortized costs: • Amortized cost of insertion: 3 • 1 for own insertion • 1 for its first reinsertion. • 1 to pay for reinsertion of one of the items that have already been reinserted once. 8 Dynamic Tables • Analysis: keep 2 credits on each element in the array that is beyond the middle. • table not full: insert costs 1, and we have 2 credits to save. • table full, i.e., doubling: half of the elements have 2 credits each. Use these to pay for reinsertion of all in the new array. • Amortized cost per operation: 3. 2 2 2 x x x x x x x x x x x 2 2 2 2 2 2 2 2 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x 9 Example: Stack with MultiPop • Stack with MultiPop. • Push(e): push element e onto stack. • MultiPop(k): pop top k elements from the stack • Worst case: Implement via linked list or array. • Push: O(1). • MultiPop: O(k). • Amortized cost per operation: 2. 10 Stack: Aggregate Analysis • Amortized analysis. Sequence of n Push and MultiPop operations. • Each object popped at most once for each time it is pushed. • #pops on non-empty stack ≤ #Push operations ≤ n. • Total time O(n). • Amortized cost per operation: 2n/n = 2. 11 Stack: Accounting Method • Amortized analysis. Sequence of n Push and MultiPop operations. • Pay 2 credits for each Push. • Keep 1 credit on each element on the stack. • Amortized cost per operation: • Push: 2 • MultiPop: 1 (to pay for pop on empty stack). 12 Potential method • Potential functions. • Prepaid credit (potential) associated with the data structure (money in the bank). • Can be used to pay for future operations. • Ensure there is always enough “money in the bank”. • Amortized cost of an operation: potential cost plus increase in potential due to the operation. • Di: data structure after i operations • Potential function Φ(Di) maps Di onto a real value. • amortized cost = actual cost + Δ(Di) = actual cost + Φ(Di) - Φ(Di-1). 13 Potential Functions • Amortized cost: • amortized cost = actual cost + Δ(Di) = actual cost + Φ(Di) - Φ(Di-1). • Stack. • Φ(Di) = #elements on the stack. • amortized cost of Push = 1 + Δ(Di) = 2. • amortized cost of MultiPop(k): If k’=min(k,|S|) elements are popped. • if S ≠ ∅: amortized cost = k‘+ Φ(Di) -Φ(Di-1) = k’ - k’ = 0. • if S = ∅: amortized cost = 1 + Δ(Di) = 1. 14 Potential Functions • Amortized cost: • amortized cost = actual cost + Δ(Di) = actual cost + Φ(Di) - Φ(Di-1). • Dynamic tables 2(k • Φ(Di) = 0 L/2) if k ⇥ L/2 otherwise • L = current array size, k = number of elements in array. • amortized cost of insertion: • Array not full: amortized cost = 1 + 2 = 3 • Array full (doubling): Actual cost = L + 1, Φ(Di-1) = L, Φ(Di)=2: amortized cost = L + 1 + (2 - L) = 3. 15 Amortized Cost vs Actual Cost • Total cost: • ∑ amortized cost = ∑(actual cost + Δ(Di)) =∑ actual cost + Φ(Dn) Φ(D0). • ∑ actual cost = ∑ amortized cost + Φ(D0) - Φ(Dn). • If potential always nonnegative and Φ(D0) = 0 then ∑ actual cost ≤ ∑ amortized cost. 16 Potential Method • Summary: 1. Pick a potential function, Φ, that will work (art). 2. Use potential function to bound the amortized cost of the operations you're interested in. 3. Bound Φ(D0) - Φ(Dfinal) • Techniques to find potential functions: if the actual cost of an operation is high, then decrease in potential due to this operation must be large, to keep the amortized cost low. 17 Union-Find Data Structures 18 Union-Find Data Structure • Union-Find data structure: • Makeset(x): Create a singleton set containing x and return its identifier. • Union(A,B): Combine the sets identified by A and B into a new set, destroying the old sets. Return the identifier of the new set. • Find(x): Return the identifier of the set containing x. • Only requirement for identifier: find(x) = find(y) iff x and y are in the same set. • Applications: Connectivity, Kruskal’s algorithm for MST, ... 19 A Simple Union-Find Data Structure • Quick-Union: • Each set represented by a tree. Elements are represented by nodes. Root is also identifier. • Make-Set(x): Create a new node x. Set p(x) = x. • Find(x): Follow parent pointers to the root. Return the root. • Union(A,B): Make root(B) a child of root(A). 20 A Simple Union-Find Data Structure • Quick-Union: • Union(A,B): Make root(B) a child of root(A). 1 2 3 4 2 3 4 5 6 7 8 9 6 7 8 9 8 9 Union(7,5) 1 5 Union(3,1) 2 3 4 6 1 7 5 Union(7,8) 2 3 4 6 1 7 5 9 8 Union(3,7) 2 3 1 4 6 9 7 5 8 21 A Simple Union-Find Data Structure • Quick-Union: • Each set represented by a tree. Elements are represented by nodes. Root is also identifier. • Make-Set(x): Create a new node x. Set p(x) = x. • Find(x): Follow parent pointers to the root. Return the root. • Union(A,B): Make root(B) a child of root(A). • Analysis: • Make-Set(x) and Union(A,B): O(1) • Find(x): O(h), where h is the height of the tree containing x. Worst-case O(n). 22 A Simple Union-Find Data Structure • Quick Find: • Each set represented by a tree of height at most one. Elements are represented by nodes. Root is also identifier. • Make-Set(x): Create a new node x. Set p(x) = x and size(x) = 1. • Find(x): Follow parent pointer to root. Return root. • Union(A,B): Move all elements from smallest set to larger set (change parent pointers). I.e., set p(B) = A and size(A) = size(A) + size(B). 23 A Simple Union-Find Data Structure • Quick Find: • Union(A,B): Move all elements from smallest set to larger set (change parent pointers). 1 2 3 4 2 3 4 5 6 7 8 9 6 7 8 9 8 9 Union(7,5) 1 5 Union(3,1) 2 3 4 6 1 7 5 Union(7,8) 2 3 4 6 1 7 5 9 8 Union(3,7) 2 4 6 7 1 3 5 9 8 24 A Simple Union-Find Data Structure • Quick Find: • Each set represented by a tree of height at most one. Elements are represented by nodes. Root is also identifier. • Make-Set(x): Create a new node x. Set p(x) = x and size(x) = 1. • Find(x): Follow parent pointer to root. Return root. • Union(A,B): Move all elements from smallest set to larger set (change parent pointers). I.e., set p(B) = A and size(A) = size(A) + size(B). • Analysis: • Make-Set(x) and Find(x): O(1) • Union(A,B): O(n) 25 Amortized Complexity of Quick-Find • Amortized analysis: Consider a sequence of k Unions. • Observation 1: How many elements can be touched by the k Unions? • Consider an element x: • What can we say about the size of the set containing x before and after a union that changes x’s parent pointer? • How large can the set containing x be after the m Unions? • How many times can x’s parent pointer be changed? 26 Amortized Complexity of Quick-Find • Amortized analysis: • Each time x’s parent pointer changes the size of the set containing it at least doubles. • At most 2k elements can be touched by k unions. • Size of set containing x after k unions at most 2k. • x’s parent pointer is updated at most lg(2k) times. • In total O(k log k) parent pointers updated in a sequence of k unions. • Amortized time per union: O(log k). • Lemma. Using the Quick-Find data structure a Find operation takes worst case time O(1), a Make-Set operation time O(1), and a sequence of n Union operations takes time O(n log n). 27 A Better Union-Find Data Structure • Union-by-Weight or Union-by-Rank. • Union-by-Weight. Make the root of the smallest tree a child of the root of the bigger tree. • Union-by-Rank. Each node x has an integer rank(x) associated. • Make-Set(x): Create a new node x. Set p(x) = x and rank(x) = 0. • Find(x): Follow parent pointers to the root. Return the root. • Union(A,B): 3 cases: • rank(A) > rank(B). Make B a child of A. • rank(A) < rank(B). Make A a child of B. • rank(A) = rank(B). Make B a child of A and set rank(A) = rank(A)+1. 28 Analysis of Union-by-Rank • Increasing ranks. • rank(x) < rank(p(x)). • A root of set containing x: Find(x) takes O(rank(A)+1) time. • rank(A) ≤ lg n: Show |A| ≥ 2rank(A) by induction. • A=Makeset(x): rank(A)=0 and |A| = 20 = 1. • A=Union(B,C): 2 cases • rank(A)=rank(B) or rank(A)=rank(C): ok, since set only got larger. • rank(B)=rank(C)=k and rank(A)=k+1. |A| = |B| + |C| ≥ 2k + 2k = 2k+1. • Lemma. Using the Union-by-Rank data structure a Find operation takes worst case time O(log n), and a Make-Set or Union operation takes time O(1). 29 Path Compression • Path compression. After each Find(x) operation, for all nodes y on the path from x to the root, set p(y) = x. • Example. Find(6). 7 7 3 5 1 4 9 8 6 1 2 4 3 5 8 9 6 2 30 Path Compression • Path compression. After each Find(x) operation, for all nodes y on the path from x to the root, set p(y) = x. • Tarjan: Union-by-Rank and path compression. Starting from an empty data structure the total time of n Makes-Set operations, at most n Union operations, and m Find operations takes time O(n + m α(n)). • α(n). Extremely slowly growing inverse Ackermann’s function. • Analysis complicated, but algorithm simple. • 2 one-pass variants path halving and path splitting. Same asymptotic running time. 31 Ackermann’s function • Ackermann’s function Ak (j) = j+1 if k = 0, (j+1) Ak 1 (j) if k 1. • Inverse Ackermann: (n) = min{k : Ak (1) n}. • Grows extremely slowly. 32 Path Compression: Analysis • Potential function • Potential of node x after i operations: • Potential of forest after i operations: i (x). i (x) = i (x). x • Auxiliary functions: (for x, where rank(x) >0) • level(x) = max{k : rank(p(x)) • iter(x) = max{j : rank(p(x)) Ak (rank(x))}. (j) Alevel(x) (rank(x))} • Properties: (1) 0  level(x) < ↵(n) (2) 1 iter(x) rank(x). 33 Path Compression: Analysis • Potential function (n) · rank(x) ( (n) level(x)) · rank(x) • ⇥i (x) = if x root or rank(x) = 0 iter(x) if x not root and rank(x) ⇤ 1. • Potential of forest after i operations: i (x) = i (x). x • Properties: (1) 0  level(x) < ↵(n) (2) 1 iter(x) rank(x). (3) 0 ⇥ ⇥i (x) ⇥ (n) · rank(x). (4) If x not root and rank(x) > 0, ⇥i (x) < (n) · rank(x). 34 Path Compression: Analysis • Potential function • ⇥i (x) = (n) · rank(x) ( (n) level(x)) · rank(x) if x root or rank(x) = 0 iter(x) if x not root and rank(x) ⇤ 1. • Potential of forest after i operations: • Makeset(x): O(1) i (x) = i (x). x • Union(A,B) og Find(x): O(α(n)) • Lemma 1. Suppose x not root, and ith operation Union or Find. Then • x’s potential cannot increase • if x has positive rank and iter or level changes, then x’s potential decrease by at least 1. 35 Path Compression: Analysis • Lemma 1. Suppose x not root, and ith operation Union or Find. Then • x’s potential cannot increase • if x has positive rank and iter or level changes, then x’s potential decrease by at least 1. • Proof: • x not root => rank(x) unchanged • n does not change => α(n) • rank(x) = 0: ok 36 Path Compression: Analysis • Lemma 1. Suppose x not root, and ith operation Union or Find. Then • x’s potential cannot increase • if x has positive rank and iter or level changes, then x’s potential decrease by at least 1. • Proof: rank(x)>0 • level increases monotonically over time • level unchanged => iter either increase or unchanged • both unchanged: ok 37 Path Compression: Analysis • Lemma 1. Suppose x not root, and ith operation Union or Find. Then • x’s potential cannot increase • if x has positive rank and iter or level changes, then x’s potential decrease by at least 1. • Proof: rank(x)>0 • level unchanged and iter increases: iter increase by at least 1. • level increases: • level increases by at least 1 => (α(n)-level)rank(x)-iter(x) drops by at least rank(x). • iter might decrease at most rank(x)-1 • x’s potential decrease by at least 1. 38 Path Compression: Analysis • Lemma 1. Suppose x not root, and ith operation Union or Find. Then • x’s potential cannot increase • if x has positive rank and iter or level changes, then x’s potential decrease by at least 1. • Lemma. The amortized cost of Union(A,B) is O(α(n)). • Assume B made parent of A. • Real cost 1. Show increase in potential at most α(n). • Only A’s, B’s and the children of B’s potential can change. • potential of B’s children can only decrease. • A: decreases due to property 4 (before operation it was α(n)rank(A)). • B: root both before and after. rank increase by at most 1 => potential of B increases with at most α(n). 39 Path Compression: Analysis • Lemma 1. Suppose x not root, and ith operation Union or Find. Then • x’s potential cannot increase • if x has positive rank and iter or level changes, then x’s potential decrease by at least 1. • Lemma. The amortized cost of Find(x) is O(α(n)). • Real cost s = length of path. • Show decrease in potential at least max{0, s - α(n)+2}. • No nodes potential increase: (Lemma + rank of root unchanged). • Show at least max{0, s - α(n)+2} nodes’ potential decrease with at least 1. • Amortized cost = s - max{0, s - α(n)+2} = O(α(n)). 40 Path Compression: Analysis • Show at least max{0, s - α(n)+2} nodes’ potential decrease with at least 1. • x node such that • rank(x) > 0 • x has an ancestor y (not the root) with level(x) = level(y) before the Find operation. • x’s potential decrease by at least 1: k = level(x), i = iter(x) before Find. • Before Find operation rank(p(y)) Ak (rank(y)) Ak (rank(p(x)) Ak (Aik (rank(x)) = (i+1) Ak (rank(x)). • After: rank(p(x)) = rank(p(y)), rank(p(y)) not decreased, and rank(x) unchanged: Ak (rank(p(x))) (i+1) Ak (rank(x)) • Either iter(x) or level (x) increases. Lemma 1 implies potential of x decreases. 41 Summary • Amortized analysis. • 2 Examples: Dynamic tables, stack with multipop. • Union-Find Data Structure. • Union-by-Rank + path compression: worst case + amortized bounds. 42

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Amortized Analysis and Union-Find - Algorithms for Massive Data Sets