Computer Algorithms
Lecture 12
Medians and Order Statistics
Ch. 9
Some of these slides are courtesy of D. Plaisted et al, UNC and M. Nicolescu, UNR
1
Order Statistic
• ith order statistic: ith smallest element of a set of n distinct
elements.
• Minimum: first order statistic.
• Maximum: nth order statistic.
• Median: “half-way point” of the set.
– Unique when n is odd; occurs at i = (n+1)/2.
– Two medians when n is even:
• Lower median, at i = n/2.
• Upper median, at i = n/2 + 1.
– In general:
• Lower median, at i = ⌊(n+1)/2⌋.
• Upper median, at i = ⌈(n+1)/2⌉.
• For consistency, “median” will refer to the lower median.
2
Selection Problem
• Selection problem:
– Input: A set A of n distinct numbers and a number i,
with 1 ≤ i ≤ n.
– Output: the element x ∈ A that is larger than exactly i – 1
other elements of A.
• Obviously, this can be solved in O(n lg n). Why?
(Sort A, then return the ith element of the sorted array.)
• Is it really necessary to sort all the numbers?
• We will study faster, linear-time algorithms.
– For the special cases when i = 1 and i = n.
– For the general problem.
3
Minimum (Maximum)
Minimum(A)
1. min ← A[1]
2. for i ← 2 to length[A]
3.    do if min > A[i]
4.       then min ← A[i]
5. return min
Maximum can be determined similarly.
• T(n) = Θ(n).
• No. of comparisons: n – 1.
• Can we do better in the worst case?
• Why not?
• Minimum(A) has the optimal number of comparisons: each comparison
eliminates at most one candidate for the minimum, and n – 1 candidates
must be eliminated, so n – 1 comparisons are necessary.
4
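The Minimum pseudocode above translates directly to Python (a minimal sketch; the function name is my own):

```python
def minimum(A):
    """Return the smallest element of A using exactly len(A) - 1 comparisons."""
    m = A[0]                 # min <- A[1]
    for x in A[1:]:          # for i <- 2 to length[A]
        if m > x:            # one comparison per remaining element
            m = x
    return m

print(minimum([2, 7, 1, 3, 4]))   # -> 1
```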
Simultaneous Minimum and
Maximum
• Some applications need to determine both the maximum and minimum
of a set of elements.
– Example: Graphics program trying to fit a set of points onto a
rectangular display.
• Independent determination of maximum and minimum requires 2n – 2
comparisons.
• Can we reduce this number?
– Yes.
5
Simultaneous Minimum and
Maximum
• At most 3⌊n/2⌋ comparisons.
• Setting up initial values:
– n is odd: set both min and max to the first element.
– n is even: compare the first two elements; assign the smaller
one to min and the larger one to max.
• Total number of comparisons:
– n is odd: we do 3(n – 1)/2 comparisons.
– n is even: we do 1 initial comparison + 3(n – 2)/2 more
comparisons = 3n/2 – 2 comparisons.
6
Example: Simultaneous Min, Max
• n = 5 (odd), array A = {2, 7, 1, 3, 4}
1. Set min = max = 2.
2. Compare the remaining elements in pairs:
– 1 < 7 ⟹ compare 1 with min and 7 with max
⟹ min = 1, max = 7 (3 comparisons)
– 3 < 4 ⟹ compare 3 with min and 4 with max
⟹ min = 1, max = 7 (3 comparisons)
We performed 3(n – 1)/2 = 6 comparisons.
7
Example: Simultaneous Min, Max
• n = 6 (even), array A = {2, 5, 3, 7, 1, 4}
1. Compare 2 with 5: 2 < 5 (1 comparison)
2. Set min = 2, max = 5.
3. Compare the remaining elements in pairs:
– 3 < 7 ⟹ compare 3 with min and 7 with max
⟹ min = 2, max = 7 (3 comparisons)
– 1 < 4 ⟹ compare 1 with min and 4 with max
⟹ min = 1, max = 7 (3 comparisons)
We performed 3n/2 – 2 = 7 comparisons.
8
Simultaneous Minimum and
Maximum. Summary
• At most 3⌊n/2⌋ comparisons:
– Process elements in pairs.
– Maintain the minimum and maximum of elements seen so far.
– Don’t compare each element to the minimum and maximum
separately.
– Compare the elements of a pair to each other.
– Compare the larger element to the maximum so far, and
compare the smaller element to the minimum so far.
– This leads to only 3 comparisons for every 2 elements.
• Total no. of comparisons, C ≤ 3⌊n/2⌋:
– For odd n: C = 3(n – 1)/2 = 3⌊n/2⌋.
– For even n: C = 3(n – 2)/2 + 1 (for the initial comparison)
= 3n/2 – 2 < 3⌊n/2⌋.
9
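The pairwise scheme above can be sketched in Python (the function name and the comparison counter are my own additions, included to check the 3⌊n/2⌋ bound):

```python
def min_max(A):
    """Return (min, max, comparisons) of a nonempty list, processing pairs."""
    n = len(A)
    comps = 0
    if n % 2 == 1:                       # odd n: seed both from the first element
        mn = mx = A[0]
        i = 1
    else:                                # even n: one comparison seeds min and max
        comps += 1
        mn, mx = (A[0], A[1]) if A[0] < A[1] else (A[1], A[0])
        i = 2
    while i < n:                         # 3 comparisons per remaining pair
        comps += 3
        small, large = (A[i], A[i+1]) if A[i] < A[i+1] else (A[i+1], A[i])
        if small < mn:
            mn = small
        if large > mx:
            mx = large
        i += 2
    return mn, mx, comps

print(min_max([2, 7, 1, 3, 4]))      # -> (1, 7, 6)
print(min_max([2, 5, 3, 7, 1, 4]))   # -> (1, 7, 7)
```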
General Selection Problem
• Seems more difficult than Minimum or Maximum.
– Yet, has solutions with same asymptotic complexity as Minimum
and Maximum.
• We will study 2 algorithms for the general problem.
– One with expected linear-time complexity.
– A second, whose worst-case complexity is linear.
10
Selection in Expected Linear Time
• The general selection problem seems much harder, but can be solved in
expected Θ(n) time.
• Works like QUICKSORT (Randomized Partition):
– Partition the array recursively.
– Unlike QUICKSORT: recurse on only one side of the partition!
– QUICKSORT runs in expected O(n lg n); SELECT runs in expected Θ(n).
• Exploits the abilities of Randomized-Partition (RP).
– RP returns the index k in the sorted order of a randomly chosen
element (pivot).
• If the order statistic we are interested in, i, equals k, then we are
done.
• Else, reduce the problem size using its other ability.
– RP rearranges the other elements around the random pivot.
• If i < k, the selection can be narrowed down to A[1..k – 1].
• Else, select the (i – k)th element from A[k+1..n].
(Assuming RP operates on A[1..n]. For A[p..r], change k appropriately.)
11
Randomized Quicksort: review
Quicksort(A, p, r)
  if p < r then
    q := Rnd-Partition(A, p, r);
    Quicksort(A, p, q – 1);
    Quicksort(A, q + 1, r)
  fi

Rnd-Partition(A, p, r)
  i := Random(p, r);
  A[r] ↔ A[i];
  x, i := A[r], p – 1;
  for j := p to r – 1 do
    if A[j] ≤ x then
      i := i + 1;
      A[i] ↔ A[j]
    fi
  od;
  A[i + 1] ↔ A[r];
  return i + 1

(Figure: Rnd-Partition splits A[p..r] into A[p..q – 1] ≤ A[q] ≤ A[q+1..r]
around the pivot A[q].)
12
Randomized-Select
Randomized-Select(A, p, r, i) // select ith order statistic
1. if p = r
2.    then return A[p]
3. q ← Randomized-Partition(A, p, r)
4. k ← q – p + 1
5. if i = k
6.    then return A[q]
7. elseif i < k
8.    then return Randomized-Select(A, p, q – 1, i)
9.    else return Randomized-Select(A, q + 1, r, i – k)
The pivot is not included in either subarray!
13
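The pseudocode above, together with the Rnd-Partition routine reviewed on slide 12, can be sketched in Python (0-based array indices, 1-based rank i):

```python
import random

def randomized_partition(A, p, r):
    """Pick a random pivot in A[p..r], move it to A[r], then partition (Lomuto)."""
    i = random.randint(p, r)             # randint is inclusive on both ends
    A[r], A[i] = A[i], A[r]
    x, i = A[r], p - 1
    for j in range(p, r):
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]
    return i + 1                          # final index (rank position) of the pivot

def randomized_select(A, p, r, i):
    """Return the i-th smallest element of A[p..r] (i is 1-based)."""
    if p == r:
        return A[p]
    q = randomized_partition(A, p, r)
    k = q - p + 1                         # rank of the pivot within A[p..r]
    if i == k:
        return A[q]
    elif i < k:
        return randomized_select(A, p, q - 1, i)
    else:                                 # the pivot is excluded from both subarrays
        return randomized_select(A, q + 1, r, i - k)

print(randomized_select([2, 5, 3, 7, 1, 4], 0, 5, 3))   # -> 3
```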
Analysis
• Worst-case complexity:
– Θ(n²) – we could get unlucky and always recurse on a subarray
that is only one element smaller than the previous subarray.
• Average-case complexity:
– Θ(n) – Intuition: because the pivot is chosen at random, we expect
to get rid of half of the list each time we choose a random
pivot q.
– Why Θ(n) and not Θ(n lg n)? We recurse on only one side of the
partition, so the expected work forms a decreasing geometric
series: n + n/2 + n/4 + … ≤ 2n.
14
Random Variables and Expectation
Def.: (Discrete) random variable X: a function from a
sample space S to the real numbers.
– It associates a real number with each possible outcome of an
experiment
E.g.: X = the face shown by a fair die
– Possible values: {1, 2, 3, 4, 5, 6}
– Probability of taking any one of the values: 1/6
15
Random Variables and Expectation
• Expected value (expectation, mean) of a discrete random
variable X is:
E[X] = Σx x Pr{X = x}
– “Average” over all possible values of random variable X
E.g.: X = the face shown by a fair die
E[X] = 1·1/6 + 2·1/6 + 3·1/6 + 4·1/6 + 5·1/6 + 6·1/6 = 3.5
16
Example
E.g.: flipping two coins:
– Earn $3 for each head, lose $2 for each tail
– X: random variable representing your earnings
– Three possible values for variable X:
• 2 heads ⟹ x = $3 + $3 = $6, Pr{2 H’s} = 1/4
• 2 tails ⟹ x = –$2 – $2 = –$4, Pr{2 T’s} = 1/4
• 1 head, 1 tail ⟹ x = $3 – $2 = $1, Pr{1 H, 1 T} = 1/2
– The expected value of X is:
E[X] = 6·Pr{2 H’s} + 1·Pr{1 H, 1 T} – 4·Pr{2 T’s}
= 6·1/4 + 1·1/2 – 4·1/4 = 1
17
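Both expectations above can be checked mechanically from the definition E[X] = Σx x·Pr{X = x} (a small sketch; the value-to-probability tables encode the two examples from the slides):

```python
from fractions import Fraction

def expectation(dist):
    """E[X] = sum of value * probability over a {value: probability} table."""
    return sum(x * p for x, p in dist.items())

# One fair die: faces 1..6, each with probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}
print(expectation(die))        # -> 7/2, i.e. 3.5

# Two coin flips: earn $3 per head, lose $2 per tail.
earnings = {6: Fraction(1, 4), 1: Fraction(1, 2), -4: Fraction(1, 4)}
print(expectation(earnings))   # -> 1
```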
More Examples
E.g.: X = lottery earnings
– 1/15,000,000 probability to win a 16,000,000 prize
– Possible values: 0 and 16,000,000
– Probability to win 0: 1 – 1/15,000,000
E[X] = 16,000,000 · (1/15,000,000) + 0 · (14,999,999/15,000,000)
= 16/15 ≈ 1.07
18
Indicator Random Variables
• Given a sample space S and an event A, we define the
indicator random variable I{A} associated with A:
– I{A} = 1 if A occurs,
         0 if A does not occur.
• The expected value of an indicator random variable XA is:
E[XA] = Pr{A}
• Proof:
E[XA] = E[I{A}] = 1·Pr{A} + 0·Pr{Ā} = Pr{A}
19
Example
• Determine the expected number of heads obtained when flipping a
coin
– Space of possible values:
S = {H, T}
– Random variable Y: takes on the values H and T, each with probability ½
• Indicator random variable XH: the coin coming up heads (Y = H)
– Counts the number of heads obtained in the flip
– XH = I{Y = H} = 1 if Y = H,
                  0 if Y = T.
• The expected number of heads obtained in one flip of the coin is:
E[XH] = E[I{Y = H}] = 1·Pr{Y = H} + 0·Pr{Y = T}
= 1·1/2 + 0·1/2 = 1/2
20
Analysis of Expected Time for
Randomized-Select
• The analysis follows that of randomized quicksort.
• Let T(n) = the random variable for the running time of
RANDOMIZED-SELECT on an input of size n, assuming random
numbers are independent.
• For k = 0, 1, …, n–1, define the indicator random variable
Xk = 1 if PARTITION generates a k : n–k–1 split,
     0 otherwise.
21
Average-case Analysis
• A call to RandomizedSelect may
– Terminate immediately with the correct answer,
– Recurse on A[p..q – 1], or
– Recurse on A[q+1..r].
• To obtain an upper bound, assume that the ith smallest element that we
want is always in the larger subarray.
• RP takes O(n) time on a problem of size n.
• Hence, the recurrence for T(n) is:
T(n) ≤ Σ_{k=1}^{n} Xk · (T(max(k – 1, n – k)) + O(n))
• For a given call of RS, Xk =1 for exactly one value of k, and Xk = 0 for all
other k.
22
Analysis of Expected Time for
Randomized-Select
• To obtain an upper bound, assume that the ith element
always falls in the larger side of the partition:
T(n) = T(max{0, n–1}) + Θ(n)   if 0 : n–1 split,
       T(max{1, n–2}) + Θ(n)   if 1 : n–2 split,
       …
       T(max{n–1, 0}) + Θ(n)   if n–1 : 0 split
     = Σ_{k=0}^{n–1} Xk · (T(max{k, n–k–1}) + Θ(n))
23
Average-case Analysis
T(n) ≤ Σ_{k=1}^{n} Xk · (T(max(k – 1, n – k)) + O(n))
     = Σ_{k=1}^{n} Xk · T(max(k – 1, n – k)) + O(n)
Taking expectations, we have
E[T(n)] ≤ E[ Σ_{k=1}^{n} Xk · T(max(k – 1, n – k)) ] + O(n)
        = Σ_{k=1}^{n} E[Xk · T(max(k – 1, n – k))] + O(n)   (by linearity of expectation)
        = Σ_{k=1}^{n} E[Xk] · E[T(max(k – 1, n – k))] + O(n)   (by independence, Eq. (C.23))
        = Σ_{k=1}^{n} (1/n) · E[T(max(k – 1, n – k))] + O(n)   (by Eq. (9.1))
24
Average-case Analysis (Contd.)
k  1 if k  n / 2
max(k  1, n  k )  
 n  k if k  n / 2
The summation is expanded
1  E (T (n  1))  E (T (n  2))    E (T (n  n / 2))  

E[T (n)]  O(n)  
n  E (T ( n / 2))    E (T (n  1))

• If n is odd, T(n – 1) thru T(n/2) occur twice and T(n/2) occurs once.
• If n is even, T(n – 1) thru T(n/2) occur twice.
Thus, we have
2 n 1
E[T (n)] 
E[T (k )]  O(n).

n k  n / 2 
25
Average-case Analysis (Contd.)
• We solve the recurrence by substitution.
• Guess E[T(n)] = O(n): assume inductively that E[T(k)] ≤ ck for all k < n.
E[T(n)] ≤ (2/n) Σ_{k=⌊n/2⌋}^{n–1} ck + an
        = (2c/n) · ( Σ_{k=1}^{n–1} k – Σ_{k=1}^{⌊n/2⌋–1} k ) + an
        ≤ (c/n) · ( (n – 1)n – (n/2 – 2)(n/2 – 1) ) + an
        = (c/n) · ( 3n²/4 + n/2 – 2 ) + an
        = c · ( 3n/4 + 1/2 – 2/n ) + an
        ≤ 3cn/4 + c/2 + an
        = cn – (cn/4 – c/2 – an).
This is at most cn provided
cn/4 – c/2 – an ≥ 0
⟺ n(c/4 – a) ≥ c/2
⟺ n ≥ (c/2)/(c/4 – a) = 2c/(c – 4a), if c > 4a.
Thus, if we assume T(n) = O(1) for
n < 2c/(c – 4a), we have E[T(n)] =
O(n).
26
Summary of Randomized Order
Statistic Selection
• Works fast: linear expected time.
• Excellent algorithm in practice.
• But, the worst case is very bad: Θ(n²).
• Question:
– Is there an algorithm that runs in linear time in the worst case?
• IDEA:
– Generate a good pivot recursively.
27
Selection in Worst-Case Linear Time
• Algorithm Select:
– Like RandomizedSelect, finds the desired element by
recursively partitioning the input array.
– Unlike RandomizedSelect, is deterministic.
• Uses a variant of the deterministic Partition routine.
• Partition is told which element to use as the pivot.
– Achieves linear-time complexity in the worst case by
• Guaranteeing that the split is always “good” at each
Partition.
• How can a good split be guaranteed?
28
Guaranteeing a Good Split
• We will have a good split if we can ensure that the pivot is the median
element or an element close to the median.
• Hence, determining a reasonable pivot is the first step.
29
Choosing a Pivot
• Median-of-Medians:
– Divide the n elements into ⌈n/5⌉ groups.
• ⌊n/5⌋ groups contain 5 elements each; 1 group contains
n mod 5 < 5 elements (when 5 does not divide n).
• Determine the median of each of the groups.
– Sort each group using Insertion Sort; pick the median from the
sorted group.
• Recursively find the median x of the ⌈n/5⌉ medians.
• Recurrence for the running time (of median-of-medians):
– T(n) = O(n) + T(⌈n/5⌉) + ….
30
Algorithm Select
• Determine the median-of-medians x (using the procedure on the
previous slide.)
• Partition the input array around x using the variant of Partition.
• Let k be the index of x that Partition returns.
• If k = i, then return x.
• Else if i < k, then apply Select recursively to A[1..k–1] to find the ith
smallest element.
• Else if i > k, then apply Select recursively to A[k+1..n] to find the (i – k)th
smallest element.
31
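Slides 30 and 31 can be combined into a compact Python sketch of Select (a sketch, not the textbook's in-place version: it partitions with list comprehensions rather than the Partition variant, and it assumes distinct elements, as the slides do):

```python
def select(A, i):
    """Return the i-th smallest element (1-based) of a list of distinct numbers,
    in worst-case linear time via the median-of-medians pivot."""
    if len(A) <= 5:
        return sorted(A)[i - 1]
    # Step 1: lower median of each group of 5 (the last group may be smaller).
    groups = [A[j:j + 5] for j in range(0, len(A), 5)]
    medians = [sorted(g)[(len(g) - 1) // 2] for g in groups]
    # Step 2: recursively select the median of the medians as the pivot x.
    x = select(medians, (len(medians) + 1) // 2)
    # Step 3: partition around x; k is the rank of x in A.
    lower = [a for a in A if a < x]
    upper = [a for a in A if a > x]
    k = len(lower) + 1
    # Step 4: recurse only on the side that contains the answer.
    if i == k:
        return x
    elif i < k:
        return select(lower, i)
    else:
        return select(upper, i - k)

print(select([9, 1, 8, 2, 7, 3, 6, 4, 5], 5))   # -> 5 (the median)
```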
Choosing Pivot
1. Divide the n elements into groups of 5.
32
Choosing Pivot
(Figure: groups of 5 shown as columns; arrows point from greater to
lesser elements.)
1. Divide the n elements into groups of 5. Find the median
of each 5-element group.
33
Choosing Pivot
1. Divide the n elements into groups of 5. Find the median
of each 5-element group.
2. Recursively SELECT the median x of the  n/5 group
medians to be the pivot.
34
Choosing Pivot
3. At least half the group medians are ≤ x, which is at
least ⌊⌊n/5⌋/2⌋ = ⌊n/10⌋ group medians.
• Each such group contributes 3 elements ≤ x, so at least 3⌊n/10⌋
elements are ≤ x.
• Similarly, at least 3⌊n/10⌋ elements are ≥ x.
35
Choosing Pivot
(Figure: ⌈n/5⌉ groups of 5 elements each, plus a final group of n mod 5
elements; arrows point from larger to smaller elements; the
median-of-medians x separates the elements < x from the elements > x.)
36
Interesting Results
• Assumption: elements are distinct. (Why? The counting argument
below relies on strict inequalities.)
• At least half of the ⌈n/5⌉ medians are greater than or equal to x.
• Thus, at least half of the ⌈n/5⌉ groups contribute 3 elements that are
greater than x.
– Exceptions: the last group (if it has fewer than 5 elements) and
the group containing x itself may contribute fewer than 3
elements. Exclude these 2 groups.
• Hence, the number of elements > x is at least
3 (⌈(1/2)⌈n/5⌉⌉ – 2) ≥ 3n/10 – 6.
• Symmetrically, at least 3n/10 – 6 elements are < x.
• Thus, in the worst case, Select is called recursively on at most 7n/10 + 6
elements.
37
Working with Recursion
SELECT(i, n)
1. Divide the n elements into groups of 5; find the
   median of each 5-element group (insertion sort).    — Θ(n)
2. Recursively SELECT the median x of the ⌈n/5⌉
   group medians to be the pivot.                      — T(⌈n/5⌉)
3. Partition around the pivot x. Let k = rank(x).      — Θ(n)
4. if i = k then return x
   elseif i < k
     then recursively SELECT the ith smallest
          element in the lower part
     else recursively SELECT the (i – k)th smallest
          element in the upper part                    — T(7n/10 + 6)
The Recurrence Equation
• T(n) = T(⌈n/5⌉) + T(7n/10 + 6) + O(n)
– T(⌈n/5⌉): time to find the median of medians.
– T(7n/10 + 6): the remaining part of the problem.
– O(n): dividing, finding the group medians, and partitioning around
the median of medians.
• This turns out to be linear.
• Assume T(n) = Θ(1) for n < 140.
39
Solving the recurrence
• To show: T(n) = O(n), i.e., T(n) ≤ cn for a suitable c and all n > 0.
• Assume: T(n) ≤ cn holds for all smaller inputs (inductive hypothesis).
• Substituting the inductive hypothesis into the recurrence:
T(n) ≤ c⌈n/5⌉ + c(7n/10 + 6) + an
     ≤ cn/5 + c + 7cn/10 + 6c + an
     = 9cn/10 + 7c + an
     = cn + (–cn/10 + 7c + an)
     ≤ cn, if –cn/10 + 7c + an ≤ 0.
• The latter holds iff c ≥ 10a · n/(n – 70), when n > 70.
• n/(n – 70) is a decreasing function of n. Verify!
• For n ≥ 140, n/(n – 70) ≤ 2, so c ≥ 20a suffices.
• Hence, c can be chosen for any n₀ > 70, provided it can be assumed
that T(n) = O(1) for n < n₀.
• Thus, Select has linear-time complexity in the worst case.
40
Guaranteeing a Good Split.
Summary
• We will have a good split if we can ensure that the pivot is the median
element or an element close to the median.
• Hence, determining a reasonable pivot is the first step.
• Which element do you want to partition around? Ideally, the median.
• Worst case for a naive pivot choice: the elements are already sorted.
• Does RANDOMIZED-SELECT guarantee a good split? No – only in
expectation.
• Idea: recursively find the median of medians.
1. Divide the elements into groups of 5.
2. Find the median of each group of five elements.
3. Find the median of those medians.
4. Partition around that median.
5. The partition point will have k – 1 elements to the left and
n – k elements to the right. Decide where to recurse.
41
Summary
• Finding the ith order statistic via sorting takes Θ(n lg n).
• It is not necessary to sort everything.
• We can leverage the PARTITION method from QUICKSORT.
• A good split can be guaranteed by partitioning around the median of
medians.
42