Download Amortized Analysis and Union-Find - Algorithms for Massive Data Sets

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Array data structure wikipedia , lookup

Quadtree wikipedia , lookup

B-tree wikipedia , lookup

Binary search tree wikipedia , lookup

Transcript
Amortized Analysis and Union-Find
02283, Inge Li Gørtz
mandag den 28. feb 2011
1
Today
• Amortized analysis
• 3 different methods
• 2 examples
• Union-Find data structures
• Worst-case complexity
• Amortized complexity
mandag den 28. feb 2011
2
Amortized Analysis
• Amortized analysis.
• Average running time per operation over a worst-case sequence of
operations.
• Time required to perform a sequence of data operations is
averaged over all the operations performed.
• Motivation: traditional worst-case-per-operation analysis can give
too pessimistic bound if the only way of having an expensive
operation is to have a lot of cheap ones before it.
• Different from average case analysis: average over time, not input.
mandag den 28. feb 2011
3
Amortized Analysis
• Methods.
• Aggregate method
• Accounting method
• Potential method
mandag den 28. feb 2011
4
Aggregate method
• Aggregate.
• Determine total cost.
• Amortized cost = total cost/#operations.
mandag den 28. feb 2011
5
Dynamic Tables
• Doubling strategy.
• Start with empty array of size 1.
• Insert: If array is full create a new array of double the size and
reinsert all elements.
• Analysis: n insert operations. Assume n is a power of 2.
• Number of insertions 1 + 2 + 4 + ... + 2log n = O(n).
• Total cost: O(n).
• Amortized cost per insert: O(1).
mandag den 28. feb 2011
6
Accounting method
• Accounting.
• Some types of operations are overcharged.
• Credit allocated with elements in the data structure used to pay for
subsequent operations
• Total credit non-negative at all times -> total amortized cost an
upper bound on the actual cost.
mandag den 28. feb 2011
7
Dynamic Tables
• Amortized costs:
• Amortized cost of insertion: 3
• 1 for own insertion
• 1 for its first reinsertion.
• 1 to pay for reinsertion of one of the items that have already
been reinserted once.
mandag den 28. feb 2011
8
Dynamic Tables
• Analysis: keep 2 credits on each element in the array that is beyond
the middle.
• table not full: insert costs 1, and we have 2 credits to save.
• table full, i.e., doubling: half of the elements have 2 credits each.
Use these to pay for reinsertion of all in the new array.
• Amortized cost per operation: 3.
2 2 2
x x x x x x x x x x x
2 2 2 2 2 2 2 2
x x x x x x x x x x x x x x x x
x x x x x x x x x x x x x x x x
mandag den 28. feb 2011
9
Example: Stack with MultiPop
• Stack with MultiPop.
• Push(e): push element e onto stack.
• MultiPop(k): pop top k elements from the stack
• Worst case: Implement via linked list or array.
• Push: O(1).
• MultiPop: O(k).
• Amortized cost per operation: 2.
mandag den 28. feb 2011
10
Stack: Aggregate Analysis
• Amortized analysis. Sequence of n Push and MultiPop operations.
• Each object popped at most once for each time it is pushed.
• #pops on non-empty stack ≤ #Push operations ≤ n.
• Total time O(n).
• Amortized cost per operation: 2n/n = 2.
mandag den 28. feb 2011
11
Stack: Accounting Method
• Amortized analysis. Sequence of n Push and MultiPop operations.
• Pay 2 credits for each Push.
• Keep 1 credit on each element on the stack.
• Amortized cost per operation:
• Push: 2
• MultiPop: 1 (to pay for pop on empty stack).
mandag den 28. feb 2011
12
Potential method
• Potential functions.
• Prepaid credit (potential) associated with the data structure (money
in the bank).
• Can be used to pay for future operations.
• Ensure there is always enough “money in the bank”.
• Amortized cost of an operation: potential cost plus increase in
potential due to the operation.
• Di: data structure after i operations
• Potential function Φ(Di) maps Di onto a real value.
• amortized cost = actual cost + Δ(Di) = actual cost + Φ(Di) - Φ(Di-1).
mandag den 28. feb 2011
13
Potential Functions
• Amortized cost:
• amortized cost = actual cost + Δ(Di) = actual cost + Φ(Di) - Φ(Di-1).
• Stack.
• Φ(Di) = #elements on the stack.
• amortized cost of Push = 1 + Δ(Di) = 2.
• amortized cost of MultiPop(k): If k’=min(k,|S|) elements are popped.
• if S ≠
: amortized cost = k‘+ Φ(Di) -Φ(Di-1) = k’ - k’ = 0.
• if S =
: amortized cost = 1 + Δ(Di) = 1.
mandag den 28. feb 2011
14
Potential Functions
• Amortized cost:
• amortized cost = actual cost + Δ(Di) = actual cost + Φ(Di) - Φ(Di-1).
• Dynamic tables
!
2(k − L/2) if k ≥ L/2
• Φ(Di) =
0
otherwise
• L = current array size, k = number of elements in array.
• amortized cost of insertion:
• Array not full: amortized cost = 1 + 2 = 3
• Array full (doubling): Actual cost = L + 1, Φ(Di-1) = L, Φ(Di)=2:
amortized cost = L + 1 + (2 - L) = 3.
mandag den 28. feb 2011
15
Amortized Cost vs Actual Cost
• Total cost:
• ∑ amortized cost = ∑(actual cost + Δ(Di)) =∑ actual cost + Φ(Dn) Φ(D0).
• ∑ actual cost = ∑ amortized cost + Φ(D0) - Φ(Dn).
• If potential always nonnegative and Φ(D0) = 0 then
∑ actual cost ≤ ∑ amortized cost.
mandag den 28. feb 2011
16
Potential Method
• Summary:
1. Pick a potential function, Φ, that will work (art).
2. Use potential function to bound the amortized cost of the
operations you're interested in.
3. Bound Φ(D0) - Φ(Dfinal)
• Techniques to find potential functions: if the actual cost of an
operation is high, then decrease in potential due to this operation
must be large, to keep the amortized cost low.
mandag den 28. feb 2011
17
Union-Find Data Structures
mandag den 28. feb 2011
18
Union-Find Data Structure
• Union-Find data structure:
• Makeset(x): Create a singleton set containing x and return its identifier.
• Union(A,B): Combine the sets identified by A and B into a new set, destroying
the old sets. Return the identifier of the new set.
• Find(x): Return the identifier of the set containing x.
• Only requirement for identifier: find(x) = find(y) iff x and y are in the same set.
• Applications: Connectivity, Kruskal’s algorithm for MST, ...
mandag den 28. feb 2011
19
A Simple Union-Find Data Structure
• Quick-Union:
• Each set represented by a tree. Elements are represented by nodes. Root is
also identifier.
• Make-Set(x): Create a new node x. Set p(x) = x.
• Find(x): Follow parent pointers to the root. Return the root.
• Union(A,B): Make root(B) a child of root(A).
mandag den 28. feb 2011
20
A Simple Union-Find Data Structure
• Quick-Union:
• Union(A,B): Make root(B) a child
of root(A).
1
2
3
4
2
3
4
5
6
7
8
9
6
7
8
9
8
9
Union(7,5)
1
5
Union(3,1)
2
3
4
6
1
7
5
Union(7,8)
2
3
4
6
1
7
5
9
8
Union(3,7)
2
3
1
4
9
7
5
mandag den 28. feb 2011
6
8
21
A Simple Union-Find Data Structure
• Quick-Union:
• Each set represented by a tree. Elements are represented by nodes. Root is
also identifier.
• Make-Set(x): Create a new node x. Set p(x) = x.
• Find(x): Follow parent pointers to the root. Return the root.
• Union(A,B): Make root(B) a child of root(A).
• Analysis:
• Make-Set(x) and Union(A,B): O(1)
• Find(x): O(h), where h is the height of the tree containing x. Worst-case O(n).
mandag den 28. feb 2011
22
A Simple Union-Find Data Structure
• Quick Find:
• Each set represented by a tree of height at most one. Elements are
represented by nodes. Root is also identifier.
• Make-Set(x): Create a new node x. Set p(x) = x and size(x) = 1.
• Find(x): Follow parent pointer to root. Return root.
• Union(A,B): Move all elements from smallest set to larger set (change parent
pointers). I.e., set p(B) = A and size(A) = size(A) + size(B).
mandag den 28. feb 2011
23
A Simple Union-Find Data Structure
• Quick Find:
• Union(A,B): Move all elements
from smallest set to larger set
(change parent pointers).
1
2
3
4
2
3
4
5
6
7
8
9
6
7
8
9
8
9
Union(7,5)
1
5
Union(3,1)
2
3
4
6
1
7
5
Union(7,8)
2
3
4
6
1
7
5
9
8
Union(3,7)
2
4
6
7
1
mandag den 28. feb 2011
3
5
9
8
24
A Simple Union-Find Data Structure
• Quick Find:
• Each set represented by a tree of height at most one. Elements are
represented by nodes. Root is also identifier.
• Make-Set(x): Create a new node x. Set p(x) = x and size(x) = 1.
• Find(x): Follow parent pointer to root. Return root.
• Union(A,B): Move all elements from smallest set to larger set (change parent
pointers). I.e., set p(B) = A and size(A) = size(A) + size(B).
• Analysis:
• Make-Set(x) and Find(x): O(1)
• Union(A,B): O(n)
mandag den 28. feb 2011
25
Amortized Complexity of Quick-Find
• Amortized analysis: Consider a sequence of k Unions.
• Observation 1: How many elements can be touched by the k Unions?
• Consider an element x:
• What can we say about the size of the set containing x before and after a
union that changes x’s parent pointer?
• How large can the set containing x be after the k Unions?
• How many times can x’s parent pointer be changed?
mandag den 28. feb 2011
26
Amortized Complexity of Quick-Find
• Amortized analysis:
• Each time x’s parent pointer changes the size of the set containing it at least
doubles.
• At most 2k elements can be touched by k unions.
• Size of set containing x after k unions at most 2k.
• x’s parent pointer is updated at most lg(2k) times.
• In total O(k log k) parent pointers updated in a sequence of k unions.
• Amortized time per union: O(log k).
• Lemma. Using the Quick-Find data structure a Find operation takes worst case
time O(1), a Make-Set operation time O(1), and a sequence of n Union
operations takes time O(n log n).
mandag den 28. feb 2011
27
A Better Union-Find Data Structure
• Union-by-Weight or Union-by-Rank.
• Union-by-Weight. Make the root of the smallest tree a child of the root of the
bigger tree.
• Union-by-Rank. Each node x has an integer rank(x) associated.
• Make-Set(x): Create a new node x. Set p(x) = x and rank(x) = 0.
• Find(x): Follow parent pointers to the root. Return the root.
• Union(A,B): 3 cases:
• rank(A) > rank(B). Make B a child of A.
• rank(A) < rank(B). Make A a child of B.
• rank(A) = rank(B). Make B a child of A and set rank(A) = rank(A)+1.
mandag den 28. feb 2011
28
Analysis of Union-by-Rank
• Increasing ranks.
• rank(x) < rank(p(x)).
• A root of set containing x: Find(x) takes O(rank(A)+1) time.
• rank(A) ≤ lg n: Show |A| ≥ 2rank(A) by induction.
• A=Makeset(x): rank(A)=0 and |A| = 20 = 1.
• A=Union(B,C): 2 cases
• rank(A)=rank(B) or rank(A)=rank(C): ok, since set only got larger.
• rank(B)=rank(C)=k and rank(A)=k+1.
|A| = |B| + |C| ≥ 2k + 2k = 2k+1.
• Lemma. Using the Union-by-Rank data structure a Find operation takes worst
case time O(log n), and a Make-Set or Union operation takes time O(1).
mandag den 28. feb 2011
29
Path Compression
• Path compression. After each Find(x) operation, for all nodes y on the path from
x to the root, set p(y) = x.
• Example. Find(6).
7
7
3
5
1
4
9
8
6
1
2
4
3
5
8
9
6
2
mandag den 28. feb 2011
30
Path Compression
• Path compression. After each Find(x) operation, for all nodes y on the path from
x to the root, set p(y) = x.
• Tarjan: Union-by-Rank and path compression. Starting from an empty data
structure the total time of n Makes-Set operations, at most n Union operations,
and m Find operations takes time O(n + m α(n)).
• α(n). Extremely slowly growing inverse Ackermann’s function.
• Analysis complicated, but algorithm simple.
• 2 one-pass variants path halving and path splitting. Same asymptotic running
time.
mandag den 28. feb 2011
31
Ackermann’s function
• Ackermann’s function
!
j+1
if k = 0,
Ak (j) =
(j+1)
Ak−1 (j) if k ≥ 1.
• Inverse Ackermann: α(n) = min{k : Ak (1) ≥ n}.
• Grows extremely slowly.
mandag den 28. feb 2011
32
Path Compression: Analysis
• Potential function
• Potential of node x after i operations: φi (x).
• Potential of forest after i operations: Φi (x) =
!
φi (x).
x
• Auxiliary functions:
• level(x) = max{k : rank(p(x)) ≥ Ak (rank(x))}.
• iter(x) = max{j : rank(p(x)) ≥
(j)
Alevel(x) (rank(x))}
• Properties:
(1) 0 ≤ level(x) ≤ α(x).
(2) 1 ≤ iter(x) ≤ rank(x).
mandag den 28. feb 2011
33
Path Compression: Analysis
• Potential function
!
α(n) · rank(x)
if x root or rank(x) = 0
• φi (x) = (α(n) − level(x)) · rank(x) − iter(x) if x not root and rank(x) ≥ 1.
• Potential of forest after i operations: Φi (x) =
!
φi (x).
x
• Properties:
(1) 0 ≤ level(x) ≤ α(x).
(2) 1 ≤ iter(x) ≤ rank(x).
(3) 0 ≤ φi (x) ≤ α(n) · rank(x).
(4) If x not root and rank(x) > 0, φi (x) < α(n) · rank(x).
mandag den 28. feb 2011
34
Path Compression: Analysis
• Potential function
!
α(n) · rank(x)
if x root or rank(x) = 0
• φi (x) = (α(n) − level(x)) · rank(x) − iter(x) if x not root and rank(x) ≥ 1.
• Potential of forest after i operations: Φi (x) =
!
φi (x).
x
• Makeset(x): O(1)
• Union(A,B): O(α(n))
• Find(x): O(α(n))
• Lemma (used in analysis of Union and Find). Suppose x not root, and
ith operation Union or Find. Then φi (x) ≤ φi−1 (x).
If also rank(x) > 0 and either level(x) or iter(x) changes, then
φi (x) ≤ φi−1 (x) − 1.
mandag den 28. feb 2011
35
Summary
• Amortized analysis.
• 2 Examples: Dynamic tables, stack with multipop.
• Union-Find Data Structure.
• Union-by-Rank + path compression: worst case + amortized bounds.
mandag den 28. feb 2011
36