Computer Algorithms, Lecture 12: Medians and Order Statistics (Ch. 9)
Some of these slides are courtesy of D. Plaisted et al. (UNC) and M. Nicolescu (UNR).

Order Statistic
• ith order statistic: the ith smallest element of a set of n distinct elements.
• Minimum: the first order statistic.
• Maximum: the nth order statistic.
• Median: the "half-way point" of the set.
  – Unique when n is odd: it occurs at i = (n+1)/2.
  – Two medians when n is even:
    • Lower median, at i = n/2.
    • Upper median, at i = n/2 + 1.
  – In general:
    • Lower median, at i = ⌊(n+1)/2⌋.
    • Upper median, at i = ⌈(n+1)/2⌉.
  – For consistency, "median" will refer to the lower median.

Selection Problem
• Selection problem:
  – Input: a set A of n distinct numbers and a number i, with 1 ≤ i ≤ n.
  – Output: the element x ∈ A that is larger than exactly i - 1 other elements of A.
• Obviously, this can be solved in O(n lg n). Why? (Sort A, then index the ith element.)
• Is it really necessary to sort all the numbers?
• We will study faster, linear-time algorithms:
  – for the special cases i = 1 and i = n;
  – for the general problem.

Minimum (Maximum)
Minimum(A)
1. min ← A[1]
2. for i ← 2 to length[A]
3.   do if min > A[i]
4.     then min ← A[i]
5. return min
Maximum can be determined similarly.
• T(n) = Θ(n).
• Number of comparisons: n - 1.
• Can we do better in the worst case? Why not? Minimum(A) already uses the optimal number of comparisons.

Simultaneous Minimum and Maximum
• Some applications need to determine both the maximum and the minimum of a set of elements.
  – Example: a graphics program fitting a set of points onto a rectangular display.
• Determining the maximum and minimum independently requires 2n - 2 comparisons.
• Can we reduce this number? Yes.

Simultaneous Minimum and Maximum
• At most 3⌊n/2⌋ comparisons.
• Setting up the initial values:
  – n odd: set both min and max to the first element.
  – n even: compare the first two elements; assign the smaller to min and the larger to max.
• Total number of comparisons:
  – n odd: 3(n - 1)/2 comparisons.
  – n even: 1 initial comparison + 3(n - 2)/2 more comparisons = 3n/2 - 2 comparisons.

Example: Simultaneous Min, Max
• n = 5 (odd), array A = {2, 7, 1, 3, 4}
1. Set min = max = 2.
2. Compare the remaining elements in pairs:
  – 1 < 7: compare 1 with min and 7 with max; min = 1, max = 7 (3 comparisons).
  – 3 < 4: compare 3 with min and 4 with max; min = 1, max = 7 (3 comparisons).
We performed 3(n - 1)/2 = 6 comparisons.

Example: Simultaneous Min, Max
• n = 6 (even), array A = {2, 5, 3, 7, 1, 4}
1. Compare 2 with 5: 2 < 5 (1 comparison).
2. Set min = 2, max = 5.
3. Compare the remaining elements in pairs:
  – 3 < 7: compare 3 with min and 7 with max; min = 2, max = 7 (3 comparisons).
  – 1 < 4: compare 1 with min and 4 with max; min = 1, max = 7 (3 comparisons).
We performed 3n/2 - 2 = 7 comparisons.

Simultaneous Minimum and Maximum: Summary
• At most 3⌊n/2⌋ comparisons:
  – Process the elements in pairs.
  – Maintain the minimum and maximum of the elements seen so far.
  – Do not compare each element to the minimum and maximum separately.
  – Compare the two elements of a pair to each other.
  – Compare the larger element to the maximum so far, and the smaller element to the minimum so far.
  – This gives only 3 comparisons for every 2 elements.
• Total number of comparisons: C ≤ 3⌊n/2⌋.
  – For odd n: C = 3(n - 1)/2 = 3⌊n/2⌋.
  – For even n: C = 3(n - 2)/2 + 1 (for the initial comparison) = 3n/2 - 2 < 3⌊n/2⌋.
• (A runnable sketch of this pairwise scheme appears at the end of this part.)

General Selection Problem
• Seems more difficult than Minimum or Maximum.
  – Yet it has solutions with the same asymptotic complexity as Minimum and Maximum.
• We will study 2 algorithms for the general problem:
  – one with expected linear-time complexity;
  – a second whose worst-case complexity is linear.
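Note (illustrative, not part of the original slides): the pairwise min/max scheme summarized above is easy to express in runnable form. Below is a minimal Python sketch assuming a non-empty, 0-indexed list; the name min_max and the comparison counter are illustrative only.

def min_max(a):
    # Return (min, max, number of comparisons) using the pairwise scheme.
    n = len(a)
    comparisons = 0
    if n % 2 == 1:                    # odd n: seed both from the first element
        lo = hi = a[0]
        start = 1
    else:                             # even n: one comparison seeds min and max
        comparisons += 1
        lo, hi = (a[0], a[1]) if a[0] < a[1] else (a[1], a[0])
        start = 2
    for i in range(start, n, 2):      # process the remaining elements two at a time
        x, y = a[i], a[i + 1]
        comparisons += 3              # 1 within the pair + 2 against lo and hi
        small, large = (x, y) if x < y else (y, x)
        if small < lo:
            lo = small
        if large > hi:
            hi = large
    return lo, hi, comparisons

print(min_max([2, 7, 1, 3, 4]))       # slide example, n = 5: (1, 7, 6)
print(min_max([2, 5, 3, 7, 1, 4]))    # slide example, n = 6: (1, 7, 7)

The two calls reproduce the comparison counts worked out on the slides: 3(n - 1)/2 = 6 for n = 5 and 3n/2 - 2 = 7 for n = 6.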
Selection in Expected Linear Time
• The general selection problem seems much harder, but it runs in Θ(n) expected time.
• Works like QUICKSORT (Randomized Partition):
  – partition the array recursively;
  – unlike QUICKSORT, it recurses on only one side of the partition;
  – QUICKSORT runs in Θ(n lg n) expected time, SELECT runs in Θ(n) expected time.
• Exploits the abilities of Randomized-Partition (RP):
  – RP returns the index k, in the sorted order, of a randomly chosen element (the pivot).
    • If the order statistic we are interested in, i, equals k, then we are done.
    • Otherwise, reduce the problem size using its other ability:
  – RP rearranges the other elements around the random pivot.
    • If i < k, selection can be narrowed down to A[1..k - 1].
    • Else, select the (i - k)th element from A[k+1..n].
    (Assuming RP operates on A[1..n]. For A[p..r], adjust k appropriately.)

Randomized Quicksort: review
Quicksort(A, p, r)
  if p < r
    then q ← Rnd-Partition(A, p, r)
         Quicksort(A, p, q - 1)
         Quicksort(A, q + 1, r)

Rnd-Partition(A, p, r)
  i ← Random(p, r)
  exchange A[r] ↔ A[i]
  x ← A[r]; i ← p - 1
  for j ← p to r - 1
    do if A[j] ≤ x
      then i ← i + 1
           exchange A[i] ↔ A[j]
  exchange A[i + 1] ↔ A[r]
  return i + 1
(Figure: Partition splits A[p..r] around the pivot, 5 in the figure, into A[p..q - 1] ≤ pivot ≤ A[q+1..r].)

Randomized-Select
Randomized-Select(A, p, r, i)   // select the ith order statistic
1. if p = r
2.   then return A[p]
3. q ← Randomized-Partition(A, p, r)
4. k ← q - p + 1
5. if i = k
6.   then return A[q]
7. elseif i < k
8.   then return Randomized-Select(A, p, q - 1, i)
9. else return Randomized-Select(A, q + 1, r, i - k)
The pivot is not included in either of the subarrays!

Analysis
• Worst-case complexity: Θ(n²).
  – We could get unlucky and always recurse on a subarray that is only one element smaller than the previous one.
• Average-case complexity: Θ(n).
  – Intuition: because the pivot is chosen at random, we expect to get rid of about half of the list each time we choose a random pivot q.
  – Why Θ(n) and not Θ(n lg n)? Because we recurse on only one side of the partition.
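Note (illustrative, not from the slides): a direct Python transcription of Randomized-Partition and Randomized-Select, assuming 0-indexed lists of distinct numbers and a 1-indexed rank i; the function names are chosen for readability only.

import random

def randomized_partition(a, p, r):
    # Partition a[p..r] around a randomly chosen pivot; return the pivot's final index.
    i = random.randint(p, r)
    a[r], a[i] = a[i], a[r]            # move the random pivot to the end
    x = a[r]
    i = p - 1
    for j in range(p, r):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]    # put the pivot into its sorted position
    return i + 1

def randomized_select(a, p, r, i):
    # Return the i-th smallest element (1-indexed rank) of a[p..r].
    if p == r:
        return a[p]
    q = randomized_partition(a, p, r)
    k = q - p + 1                      # rank of the pivot within a[p..r]
    if i == k:
        return a[q]
    elif i < k:
        return randomized_select(a, p, q - 1, i)
    else:
        return randomized_select(a, q + 1, r, i - k)   # pivot excluded from both sides

a = [6, 19, 4, 12, 14, 9, 15, 7, 8, 11]
print(randomized_select(a[:], 0, len(a) - 1, 3))       # 3rd smallest -> 7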
Random Variables and Expectation
Def.: A (discrete) random variable X is a function from a sample space S to the real numbers.
  – It associates a real number with each possible outcome of an experiment.
E.g.: X = the face shown by one fair die.
  – Possible values: {1, 2, 3, 4, 5, 6}.
  – Probability of taking any one of these values: 1/6.

Random Variables and Expectation
• The expected value (expectation, mean) of a discrete random variable X is
  E[X] = Σ_x x · Pr{X = x},
  the "average" over all possible values of the random variable X.
E.g.: X = the face shown by one fair die:
  E[X] = 1·(1/6) + 2·(1/6) + 3·(1/6) + 4·(1/6) + 5·(1/6) + 6·(1/6) = 3.5.

Example
E.g.: flipping two coins:
  – Earn $3 for each head, lose $2 for each tail.
  – X: random variable representing your earnings.
  – Three possible values for X:
    • 2 heads: x = $3 + $3 = $6, Pr{2 H's} = 1/4.
    • 2 tails: x = -$2 - $2 = -$4, Pr{2 T's} = 1/4.
    • 1 head, 1 tail: x = $3 - $2 = $1, Pr{1 H, 1 T} = 1/2.
  – The expected value of X is:
    E[X] = 6·Pr{2 H's} + 1·Pr{1 H, 1 T} - 4·Pr{2 T's} = 6·(1/4) + 1·(1/2) - 4·(1/4) = 1.

More Examples
E.g.: X = lottery earnings:
  – 1/15,000,000 probability to win a $16,000,000 prize.
  – Possible values: 0 and 16,000,000.
  – Probability to win 0: 1 - 1/15,000,000.
  – E[X] = 16,000,000 · (1/15,000,000) + 0 · (14,999,999/15,000,000) = 16/15 ≈ 1.07.

Indicator Random Variables
• Given a sample space S and an event A, we define the indicator random variable I{A} associated with A:
  I{A} = 1 if A occurs, and 0 if A does not occur.
• The expected value of an indicator random variable X_A is E[X_A] = Pr{A}.
• Proof: E[X_A] = E[I{A}] = 1·Pr{A} + 0·Pr{Ā} = Pr{A}.

Example
• Determine the expected number of heads obtained when flipping a coin once.
  – Sample space: S = {H, T}.
  – Random variable Y: takes on the values H and T, each with probability 1/2.
• Indicator random variable X_H: the coin coming up heads (Y = H).
  – It counts the number of heads obtained in the flip:
    X_H = I{Y = H} = 1 if Y = H, and 0 if Y = T.
• The expected number of heads obtained in one flip of the coin is:
  E[X_H] = E[I{Y = H}] = 1·Pr{Y = H} + 0·Pr{Y = T} = 1·(1/2) + 0·(1/2) = 1/2.

Analysis of Expected Time for Randomized-Select
• The analysis follows that of randomized quicksort.
• Let T(n) be the random variable for the running time of RANDOMIZED-SELECT on an input of size n, assuming the random numbers are independent.
• For k = 0, 1, …, n - 1, define the indicator random variable
  X_k = 1 if PARTITION generates a k : n - k - 1 split, and X_k = 0 otherwise.

Average-case Analysis
• A call to RandomizedSelect may:
  – terminate immediately with the correct answer,
  – recurse on A[p..q - 1], or
  – recurse on A[q+1..r].
• To obtain an upper bound, assume that the ith smallest element that we want is always in the larger subarray.
• RP takes O(n) time on a problem of size n.
• Hence, the recurrence for T(n) is
  T(n) ≤ Σ_{k=1}^{n} X_k · (T(max(k - 1, n - k)) + O(n)).
• For a given call of RS, X_k = 1 for exactly one value of k, and X_k = 0 for all other k.

Analysis of Expected Time for Randomized-Select
• To obtain an upper bound, assume that the ith element always falls on the larger side of the partition:
  T(n) ≤ T(max{0, n - 1}) + Θ(n) for a 0 : n - 1 split,
  T(n) ≤ T(max{1, n - 2}) + Θ(n) for a 1 : n - 2 split,
  …
  T(n) ≤ T(max{n - 1, 0}) + Θ(n) for an n - 1 : 0 split.
  Combining the cases with the indicator variables (indexed k = 0, …, n - 1):
  T(n) ≤ Σ_{k=0}^{n-1} X_k · (T(max{k, n - k - 1}) + Θ(n)).

Average-case Analysis
  T(n) ≤ Σ_{k=1}^{n} X_k · (T(max(k - 1, n - k)) + O(n))
       = Σ_{k=1}^{n} X_k · T(max(k - 1, n - k)) + O(n).
Taking expectation, we have
  E[T(n)] ≤ E[ Σ_{k=1}^{n} X_k · T(max(k - 1, n - k)) + O(n) ]
          = Σ_{k=1}^{n} E[X_k · T(max(k - 1, n - k))] + O(n)       (by linearity of expectation)
          = Σ_{k=1}^{n} E[X_k] · E[T(max(k - 1, n - k))] + O(n)    (by Eq. (C.23): X_k and T are independent)
          = Σ_{k=1}^{n} (1/n) · E[T(max(k - 1, n - k))] + O(n)     (by Eq. (9.1): E[X_k] = 1/n).
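Note (illustrative): the last step above, labeled Eq. (9.1), uses E[X_k] = Pr{X_k = 1} = 1/n, i.e., the pivot produced by Randomized-Partition is equally likely to end up at any of the n ranks. A small, self-contained Python Monte Carlo sketch that checks this empirically; the partition helper repeats the earlier pseudocode, and the choices of n and the trial count are arbitrary.

import random
from collections import Counter

def randomized_partition(a, p, r):
    # Mirrors the Randomized-Partition pseudocode, 0-indexed.
    i = random.randint(p, r)
    a[r], a[i] = a[i], a[r]
    x, i = a[r], p - 1
    for j in range(p, r):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1

n, trials = 8, 40000
counts = Counter()
for _ in range(trials):
    a = random.sample(range(100), n)          # n distinct keys in random order
    q = randomized_partition(a, 0, n - 1)
    counts[q + 1] += 1                        # k = 1-indexed rank of the pivot

for k in range(1, n + 1):
    print(k, round(counts[k] / trials, 3))    # each frequency should be close to 1/n = 0.125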
Average-case Analysis (Contd.)
  max(k - 1, n - k) = k - 1 if k > ⌈n/2⌉, and n - k if k ≤ ⌈n/2⌉.
When the summation is expanded:
  E[T(n)] ≤ (1/n) · [ 2·E[T(n - 1)] + 2·E[T(n - 2)] + … + 2·E[T(⌈n/2⌉)] + E[T(⌊n/2⌋)] ] + O(n)   (n odd).
• If n is odd, T(n - 1) through T(⌈n/2⌉) each occur twice and T(⌊n/2⌋) occurs once.
• If n is even, T(n - 1) through T(n/2) each occur twice.
Thus, we have
  E[T(n)] ≤ (2/n) Σ_{k=⌊n/2⌋}^{n-1} E[T(k)] + O(n).

Average-case Analysis (Contd.)
• We solve the recurrence by substitution.
• Guess E[T(n)] = O(n), i.e., E[T(n)] ≤ cn for a suitable constant c:
  E[T(n)] ≤ (2/n) Σ_{k=⌊n/2⌋}^{n-1} ck + an
          = (2c/n) ( Σ_{k=1}^{n-1} k - Σ_{k=1}^{⌊n/2⌋-1} k ) + an
          ≤ (c/n) (3n²/4 + n/2 - 2) + an
          = c (3n/4 + 1/2 - 2/n) + an
          ≤ 3cn/4 + c/2 + an
          = cn - (cn/4 - c/2 - an)
          ≤ cn,   provided cn/4 - c/2 - an ≥ 0.
• The condition cn/4 - c/2 - an ≥ 0 means n(c/4 - a) ≥ c/2, i.e.,
  n ≥ (c/2)/(c/4 - a) = 2c/(c - 4a),   if c > 4a.
Thus, if we assume T(n) = O(1) for n < 2c/(c - 4a), we have E[T(n)] = O(n).

Summary of Randomized Order-Statistic Selection
• Works fast: linear expected time.
• An excellent algorithm in practice.
• But the worst case is very bad: Θ(n²).
• Question: is there an algorithm that runs in linear time in the worst case?
• IDEA: generate a good pivot recursively.

Selection in Worst-Case Linear Time
• Algorithm Select:
  – Like RandomizedSelect, it finds the desired element by recursively partitioning the input array.
  – Unlike RandomizedSelect, it is deterministic.
    • It uses a variant of the deterministic Partition routine.
    • Partition is told which element to use as the pivot.
  – It achieves linear-time complexity in the worst case by guaranteeing that the split is always "good" at each Partition.
  – How can a good split be guaranteed?

Guaranteeing a Good Split
• We will have a good split if we can ensure that the pivot is the median element or an element close to the median.
• Hence, determining a reasonable pivot is the first step.

Choosing a Pivot
• Median-of-medians:
  – Divide the n elements into ⌈n/5⌉ groups.
    • ⌊n/5⌋ groups contain 5 elements each; 1 group contains n mod 5 < 5 elements.
  – Determine the median of each of the groups.
    • Sort each group using insertion sort; pick the median from the sorted group.
  – Recursively find the median x of the ⌈n/5⌉ medians.
• Recurrence for the running time (of median-of-medians): T(n) = O(n) + T(⌈n/5⌉) + ….

Algorithm Select
• Determine the median-of-medians x (using the procedure on the previous slide).
• Partition the input array around x using the variant of Partition.
• Let k be the index of x that Partition returns.
• If k = i, then return x.
• Else if i < k, apply Select recursively to A[1..k - 1] to find the ith smallest element.
• Else (i > k), apply Select recursively to A[k+1..n] to find the (i - k)th smallest element.

Choosing the Pivot (illustrated)
1. Divide the n elements into groups of 5.
2. Find the median of each 5-element group (lesser elements below each median, greater elements above, in the figure).
3. Recursively SELECT the median x of the ⌈n/5⌉ group medians to be the pivot.
4. At least half of the group medians are ≤ x, which is at least ⌈⌈n/5⌉/2⌉ ≥ n/10 group medians.
  • Therefore, at least about 3n/10 elements are ≤ x.
  • Similarly, at least about 3n/10 elements are ≥ x.
(Figure: the ⌊n/5⌋ groups of 5 elements each plus the last group of n mod 5 elements; arrows point from larger to smaller elements; the median-of-medians x separates the elements < x from the elements > x.)
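Note (illustrative): a compact Python sketch of Select with the median-of-medians pivot rule described above. For clarity it partitions by building new lists rather than partitioning in place; it assumes distinct elements and a 1-indexed rank i, and the base-case threshold of 5 is a simplification.

def select(a, i):
    # Return the i-th smallest (1-indexed) element of a list of distinct numbers.
    if len(a) <= 5:
        return sorted(a)[i - 1]                        # base case: sort tiny inputs directly

    # 1. Split into groups of 5 and take each group's (lower) median.
    groups = [a[j:j + 5] for j in range(0, len(a), 5)]
    medians = [sorted(g)[(len(g) - 1) // 2] for g in groups]

    # 2. Recursively select the median of the medians as the pivot x.
    x = select(medians, (len(medians) + 1) // 2)

    # 3. Partition around x (clarity version: build the two sides explicitly).
    lower = [v for v in a if v < x]
    upper = [v for v in a if v > x]
    k = len(lower) + 1                                 # rank of x within a

    # 4. Return x, or recurse into the side that must contain the answer.
    if i == k:
        return x
    elif i < k:
        return select(lower, i)
    else:
        return select(upper, i - k)

a = [23, 4, 42, 15, 8, 16, 7, 11, 3, 35, 19, 2]
print(select(a, 5))   # 5th smallest -> 8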
Interesting Results
• Assumption: the elements are distinct. Why?
• At least half of the ⌈n/5⌉ medians are greater than or equal to the median of medians x.
• Thus, at least half of the ⌈n/5⌉ groups contribute 3 elements that are greater than x,
  – except for the last group (which may have fewer than 5 elements) and the group containing x itself; exclude these two groups.
• Hence, the number of elements greater than x is at least
  3 (⌈(1/2)⌈n/5⌉⌉ - 2) ≥ 3n/10 - 6.
• By symmetry, the number of elements less than x is also at least 3n/10 - 6.
• Thus, in the worst case, Select is called recursively on at most 7n/10 + 6 elements.

Working with Recursion
SELECT(i, n)
1. Divide the n elements into groups of 5; find the median of each 5-element group (insertion sort).   [Θ(n)]
2. Recursively SELECT the median x of the ⌈n/5⌉ group medians to be the pivot.   [T(n/5)]
3. Partition around the pivot x; let k = rank(x).   [Θ(n)]
4. if i = k then return x
   elseif i < k then recursively SELECT the ith smallest element in the lower part   [T(7n/10)]
   else recursively SELECT the (i - k)th smallest element in the upper part

The Recurrence Equation
• T(n) = T(⌈n/5⌉) + T(7n/10 + 6) + O(n), where
  – T(⌈n/5⌉) is the time to find the median of medians,
  – T(7n/10 + 6) covers the remaining part of the problem, and
  – O(n) accounts for dividing, finding the medians, and partitioning around the median of medians.
• This turns out to be linear.
• Assume T(n) = Θ(1) for n < 140.

Solving the recurrence
• To show: T(n) = O(n), i.e., T(n) ≤ cn for a suitable constant c and all n > 0.
• Pick c large enough that T(n) ≤ cn for all n < 140 (base case); assume inductively that T(m) ≤ cm for m < n.
• Substituting the inductive hypothesis into the recurrence:
  T(n) ≤ c⌈n/5⌉ + c(7n/10 + 6) + an
       ≤ cn/5 + c + 7cn/10 + 6c + an
       = 9cn/10 + 7c + an
       = cn + (-cn/10 + 7c + an)
       ≤ cn,   if -cn/10 + 7c + an ≤ 0.
• The condition -cn/10 + 7c + an ≤ 0 is equivalent to c ≥ 10a · n/(n - 70) when n > 70.
• n/(n - 70) is a decreasing function of n (verify); for n ≥ 140 it is at most 2, so c ≥ 20a suffices.
• Hence, c can be chosen for any n₀ > 70, provided it can be assumed that T(n) = O(1) for n ≤ n₀.
• Thus, Select has linear-time complexity in the worst case.

Guaranteeing a Good Split: Summary
• We will have a good split if we can ensure that the pivot is the median element or an element close to the median.
• Hence, determining a reasonable pivot is the first step.
• Which element do you want to partition around?
  – Worst case for a fixed-position pivot: the elements are already sorted.
  – Does RANDOMIZED-SELECT guarantee a good split? No, it only gives a good split in expectation.
• Idea: recursively find the median of medians.
  1. Divide the elements into groups of 5.
  2. Find the median of each group of five elements.
  3. Find the median of those medians.
  4. Partition around that median.
  5. The partition point will have k elements to its left and n - k elements to its right. Make the decision: return the pivot, or recurse on the appropriate side.

Summary
• Finding the ith order statistic by sorting takes Θ(n lg n).
• It is not necessary to sort everything.
• We can leverage the PARTITION method from QUICKSORT.
• A good split can be guaranteed by finding the median of medians.
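Note (illustrative): a short cross-check tying the lecture together, assuming the randomized_select and select sketches above are in scope. It compares both against the obvious Θ(n lg n) approach of sorting and indexing.

import random

# For random inputs of distinct keys, all three approaches must agree.
for _ in range(100):
    n = random.randint(1, 50)
    a = random.sample(range(1000), n)                  # n distinct keys
    i = random.randint(1, n)                           # desired order statistic (1-indexed)
    by_sorting = sorted(a)[i - 1]                      # Theta(n lg n) baseline
    assert randomized_select(a[:], 0, n - 1, i) == by_sorting   # expected Theta(n)
    assert select(a[:], i) == by_sorting                        # worst-case Theta(n)
print("all checks passed")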