Sorting
HKOI Training Team (Advanced), 2006-01-21

What is sorting?
- Given a list of n elements: A1, A2, ..., An
- Re-arrange the elements so that they follow a particular order, e.g.
  Ascending order: A1 ≤ A2 ≤ ... ≤ An
  Descending order: A1 ≥ A2 ≥ ... ≥ An
- We will talk about sorting in ascending order only

Why is sorting needed?
- Some algorithms work only when the data is sorted, e.g. binary search
- Better presentation of data
- Often required by problem setters, to reduce the workload in judging

Why learn sorting algorithms?
- The C++ STL already provides a sort() function
- Unfortunately, there is no such implementation for Pascal (this is a minor point, though)
- Most importantly, OI problems do not directly ask for sorting, but their solutions may be closely linked with sorting algorithms
- In most cases, calling the STL sort() alone is not enough; you still need to write your own "sort"
- So it is important to understand the idea behind each algorithm, and also their strengths and weaknesses

Some sorting algorithms
- Bubble Sort, Insertion Sort, Selection Sort, Shell Sort, Heap Sort, Merge Sort, Quick Sort, Counting Sort, Radix Sort
- How many of them do you know?

Bubble, Insertion, Selection...
- Simple, in terms of both idea and implementation
- Unfortunately, they are inefficient: O(n^2) is not good if n is large
- The algorithms taught today are far more efficient than these

Shell Sort
- Named after its inventor, Donald Shell
- Observation: Insertion Sort is very efficient when n is small, or when the list is almost sorted
- Divide the list into k non-contiguous segments; elements in the same segment are k positions apart
- In the beginning, choose a large k so that each segment contains only a few elements (e.g. k = n/2)
- Sort each segment with Insertion Sort
- Definition: a list is said to be "k-sorted" when A[i] ≤ A[i+k] for 1 ≤ i ≤ n-k; after the first pass with k = 5, the example list is 5-sorted
- After each pass, reduce k (e.g. by half); although the number of elements in each segment increases, the segments are usually mostly sorted, so sort each segment with Insertion Sort again
- Finally, k is reduced to 1; the list now looks mostly sorted, so perform a 1-sort, i.e. an ordinary Insertion Sort
(The original slides animate this on a 10-element example, 5-sorting, then 2-sorting, then finally 1-sorting it; those diagrams are omitted in this transcript.)
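As a concrete reference, here is a minimal C++ sketch of Shell Sort using the {n/2, n/4, ..., 1} gap sequence described above. The function and variable names are my own, not from the slides.

    #include <vector>

    // Shell Sort with the Shell sequence {n/2, n/4, ..., 1}.
    // Each pass performs a gap insertion sort, i.e. it k-sorts the list.
    void shellSort(std::vector<int>& a) {
        int n = (int)a.size();
        for (int k = n / 2; k >= 1; k /= 2) {      // shrink the gap each pass
            for (int i = k; i < n; ++i) {          // insertion sort within each segment
                int value = a[i];
                int j = i;
                while (j >= k && a[j - k] > value) {
                    a[j] = a[j - k];               // shift the larger element k positions right
                    j -= k;
                }
                a[j] = value;
            }
        }
    }

When k reaches 1, the inner loop is exactly an ordinary Insertion Sort, as noted above.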
Shell Sort – Worse than Insertion Sort?
- In Shell Sort, we still have to perform an Insertion Sort at the end, and a lot of operations are done before that final pass
- Isn't it worse than plain Insertion Sort?
- No: the final Insertion Sort is more efficient than before, and all sorting passes before the final one are done efficiently
- k-sorts compare far-apart elements
- Elements "move" faster, reducing the amount of movement and comparison

Shell Sort – Increment Sequence
- In our example, k starts at n/2 and is halved in each pass until it reaches 1, i.e. {n/2, n/4, n/8, ..., 1}
- This is called the "Shell sequence"
- In a good increment sequence, the numbers should be relatively prime to each other
- Hibbard's sequence: {2^m - 1, 2^(m-1) - 1, ..., 7, 3, 1}

Shell Sort – Analysis
- Average complexity: O(n^1.5)
- Worst case of Shell Sort with the Shell sequence: O(n^2)
- When will this happen?

Heap Sort
- In Selection Sort, we scan the entire list to search for the maximum, which takes O(n) time
- Is there a better way to get the maximum?
- With the help of a heap, we can reduce the search time to O(lg n)
- Step 1: build a heap from the list (the slides build one from the list 2, 8, 5, 7, 1, 4)
- Step 2: pick the maximum, then restore the heap property
- Step 3: repeat step 2 until the heap is empty
(The heap diagrams shown after each removal are omitted here.)

Heap Sort – Analysis
- Complexity: O(n lg n)
- Not a stable sort
- Relatively difficult to implement

Merging
- Given two sorted lists, merge them to form a new sorted list
- A naïve approach: append the second list to the first, then sort the whole thing; slow, takes O(n lg n) time
- Is there a better way?
- We make use of a property of sorted lists: the first element is always the minimum. What does that imply?
- An additional array is needed to store the temporary merged list
- Repeatedly pick the smallest of the not-yet-inserted numbers and append it to the merged list
- e.g. merging List A = 1, 3, 7 with List B = 2, 3, 6 into a temporary array (the step-by-step diagram is omitted here)

Merge Sort
- Merge Sort follows the divide-and-conquer approach
- Divide: divide the n-element sequence into two (n/2)-element subsequences
- Conquer: sort the two subsequences recursively
- Combine: merge the two sorted subsequences to produce the answer
- Step 1: divide the list into two (e.g. 2, 8, 5 and 7, 1, 4)
- Step 2: call Merge Sort recursively to sort the two subsequences (giving 2, 5, 8 and 1, 4, 7)
- Step 3: merge the two lists into a temporary array
- Step 4: move the elements back to the list

Merge Sort – Analysis
- Complexity: O(n lg n)
- A stable sort
- Not an "in-place" sort, i.e. additional memory is required
- Easy to implement; no knowledge of other data structures is needed

Stable Sort
- What is a stable sort?
  (a) The name of a sorting algorithm
  (b) A sorting algorithm with stable performance over all distributions of elements, i.e. best ≈ average ≈ worst
  (c) A sorting algorithm that preserves the original order of duplicated keys
- Example: for the original list 1, 3a, 5, 3b, 4, 2
  Stable sort:    1, 2, 3a, 3b, 4, 5
  Un-stable sort: 1, 2, 3b, 3a, 4, 5
- Which sorting algorithms are stable?
  Stable:    Bubble Sort, Insertion Sort, Merge Sort
  Un-stable: Selection Sort, Shell Sort, Heap Sort
- In our previous example, what is the difference between 3a and 3b?
- When will a stable sort be more useful? When sorting records, and when sorting on multiple keys.
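Here is a minimal C++ sketch of the Heap Sort described earlier: build a max-heap, then repeatedly pick the maximum and restore the heap property. The function names and 0-based array layout are my own choices, not from the slides.

    #include <vector>
    #include <algorithm>

    // Restore the max-heap property for the subtree rooted at i,
    // looking only at the first `size` elements of a (0-based indexing).
    void siftDown(std::vector<int>& a, int i, int size) {
        while (true) {
            int largest = i, l = 2 * i + 1, r = 2 * i + 2;
            if (l < size && a[l] > a[largest]) largest = l;
            if (r < size && a[r] > a[largest]) largest = r;
            if (largest == i) break;
            std::swap(a[i], a[largest]);
            i = largest;
        }
    }

    // Heap Sort: build a max-heap, then repeatedly move the current
    // maximum to the end of the array and shrink the heap.
    void heapSort(std::vector<int>& a) {
        int n = (int)a.size();
        for (int i = n / 2 - 1; i >= 0; --i)      // step 1: build the heap
            siftDown(a, i, n);
        for (int size = n; size > 1; --size) {    // step 2: pick the maximum, restore the heap
            std::swap(a[0], a[size - 1]);
            siftDown(a, 0, size - 1);
        }
    }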
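And a minimal C++ sketch of Merge Sort following the divide, conquer, combine steps above, with the merge done into a temporary array and then copied back. Again, the names and interface are my own.

    #include <vector>

    // Merge Sort on a[lo..hi] (inclusive), using temp as the auxiliary array.
    void mergeSort(std::vector<int>& a, std::vector<int>& temp, int lo, int hi) {
        if (lo >= hi) return;
        int mid = (lo + hi) / 2;
        mergeSort(a, temp, lo, mid);        // conquer: sort the left half
        mergeSort(a, temp, mid + 1, hi);    // conquer: sort the right half

        // combine: merge the two sorted halves into temp
        int i = lo, j = mid + 1, k = lo;
        while (i <= mid && j <= hi)
            temp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];  // <= keeps equal keys in order (stable)
        while (i <= mid) temp[k++] = a[i++];
        while (j <= hi)  temp[k++] = a[j++];

        // move the merged elements back to the list
        for (int x = lo; x <= hi; ++x) a[x] = temp[x];
    }

A typical call would be: create std::vector<int> temp(a.size()); then mergeSort(a, temp, 0, (int)a.size() - 1);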
Quick Sort
- Quick Sort also uses the divide-and-conquer approach
- Divide: divide the list into two by partitioning
- Conquer: sort the two lists by calling Quick Sort recursively
- Combine: combine the two sorted lists

Quick Sort – Partitioning
- Given: a list and a "pivot" (usually an element in the list)
- Re-arrange the elements so that elements on the left-hand side of the pivot are less than the pivot, and elements on the right-hand side are greater than or equal to the pivot
- e.g. take the first element as pivot: for the list 4, 6, 7, 0, 9, 3, 9, 4, the pivot is 4
- Scan with two indices lo and hi, and swap every pair of elements where the left one is greater than or equal to the pivot and the right one is smaller than the pivot
- Finally, swap the pivot with A[hi]
- After partitioning, the pivot is in its final position; apply Quick Sort to the sub-list before it and the sub-list after it (see the sketch after the Radix Sort section below)
(The slides animate the lo/hi scan and the recursive calls; those diagrams are omitted here.)

Quick Sort – Analysis
- Complexity: best O(n lg n), worst O(n^2), average O(n lg n)
- When will the worst case happen? How can we avoid it?
- An in-place sort
- Not a stable sort

Counting Sort
- Consider the following list of numbers: 5, 4, 2, 1, 4, 3, 4, 2, 5, 1, 4, 5, 3, 2, 3, 5, 5
- Range of numbers = [1, 5]
- We may count the occurrences of each number:
  Value:     1 2 3 4 5
  Frequency: 2 3 3 4 5

Counting Sort (1)
- With the frequency table, we can reconstruct the list in ascending order:
  1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5
- Can we sort records with this counting sort? Is this sort stable?

Counting Sort (2)
- An alternative way: use a cumulative frequency table and a temporary array
- Given the following "records": 3, 2, 1, 2, 2
  Value:                1 2 3
  Frequency:            1 3 1
  Cumulative frequency: 1 4 5
- Place each record into the temporary array at the position given by the cumulative counts
(The slides step through the placements one by one; those diagrams are omitted here.)

Counting Sort – Analysis
- Complexity: O(n + k), where k is the range of the numbers
- Not an in-place sort
- A stable sort (method 2)
- Cannot be applied to data with a wide range

Radix Sort
- Counting Sort requires a "frequency table", whose size depends on the range of the elements
- If the range is large (e.g. 32-bit), it may be infeasible, if not impossible, to create such a table
- We may instead treat an integer as a "record of digits", where each digit is a key
- The significance of the keys decreases from left to right
- e.g. the number 123 consists of 3 digits: leftmost digit 1 (most significant), middle digit 2, rightmost digit 3 (least significant)
- Now the problem becomes a multi-key record sorting problem
- Sort the records on the least significant key with a stable sort; repeat with the 2nd least significant key, the 3rd least significant key, and so on
- For all keys in these "records" the range is [0, 9], which is narrow, so we apply Counting Sort to do the sorting here

Radix Sort – Example
  Original list:                                101, 097, 141, 110, 997, 733
  After sorting on the least significant digit: 110, 101, 141, 733, 097, 997
  After sorting on the 2nd digit:               101, 110, 733, 141, 097, 997
  After sorting on the most significant digit:  097, 101, 110, 141, 733, 997
(The per-pass digit-count tables shown on the slides are omitted here.)

Radix Sort – Analysis
- Complexity: O(dn), where d is the number of digits
- Not an in-place sort
- A stable sort
- Can we run Radix Sort on real numbers? Strings?
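This is the Quick Sort sketch referred to in the partitioning section above. It is one common way to implement the "first element as pivot" scheme described there; the function names and index handling are my own assumptions.

    #include <vector>
    #include <algorithm>

    // Partition a[lo..hi] around the pivot a[lo] (first element as pivot).
    // Returns the final position of the pivot: elements to its left are
    // smaller than the pivot, elements to its right are >= the pivot.
    int partition(std::vector<int>& a, int lo, int hi) {
        int pivot = a[lo];
        int i = lo + 1, j = hi;
        while (true) {
            while (i <= j && a[i] < pivot) ++i;   // find a left element >= pivot
            while (i <= j && a[j] >= pivot) --j;  // find a right element < pivot
            if (i > j) break;
            std::swap(a[i], a[j]);                // swap the offending pair
        }
        std::swap(a[lo], a[j]);                   // put the pivot between the two parts
        return j;
    }

    void quickSort(std::vector<int>& a, int lo, int hi) {
        if (lo >= hi) return;
        int p = partition(a, lo, hi);
        quickSort(a, lo, p - 1);   // sort the elements smaller than the pivot
        quickSort(a, p + 1, hi);   // sort the elements >= the pivot
    }

On an already-sorted list this "first pivot" choice degenerates to the O(n^2) worst case mentioned in the analysis.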
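The following C++ sketch ties the last two sections together: a stable, cumulative-count Counting Sort (method 2) used as the per-digit sort inside an LSD Radix Sort for non-negative integers. Names are my own.

    #include <vector>
    #include <algorithm>

    // One pass of stable Counting Sort (method 2), keyed on a single decimal
    // digit: exp is 1 for the least significant digit, 10 for the next, etc.
    void countingSortByDigit(std::vector<int>& a, int exp) {
        std::vector<int> output(a.size());
        int count[10] = {0};
        for (int x : a) count[(x / exp) % 10]++;                 // frequency table
        for (int d = 1; d < 10; ++d) count[d] += count[d - 1];   // cumulative counts
        for (int i = (int)a.size() - 1; i >= 0; --i) {           // right-to-left keeps it stable
            int d = (a[i] / exp) % 10;
            output[--count[d]] = a[i];
        }
        a = output;
    }

    // LSD Radix Sort: sort by each digit, least significant first,
    // using the stable counting sort above.
    void radixSort(std::vector<int>& a) {
        int maxVal = 0;
        for (int x : a) maxVal = std::max(maxVal, x);
        for (int exp = 1; maxVal / exp > 0; exp *= 10)
            countingSortByDigit(a, exp);
    }

Running radixSort on {101, 97, 141, 110, 997, 733} reproduces the passes shown in the example above.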
Choosing Sorting Algorithms
- Things to consider: list size, data distribution, data type, availability of additional memory, cost of swapping/assignment
- List size: if n is small, any sorting algorithm will do; if n is large (e.g. ≥ 5000), O(n^2) algorithms may not finish within the time limit
- Data distribution: if the list is mostly sorted, running Quick Sort with the "first element as pivot" strategy is extremely slow; Insertion Sort, on the other hand, is very efficient in this situation
- Data type: it is difficult to apply Counting Sort and Radix Sort to real numbers, or to any other data type that cannot be converted to integers
- Availability of additional memory: Merge Sort, Counting Sort and Radix Sort require additional memory
- Cost of swapping/assignment: moving large records may be very time-consuming; Selection Sort takes at most (n-1) swap operations; alternatively, swap pointers to the records (i.e. swap the records logically rather than physically), as sketched below
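To illustrate the last point, here is a minimal modern C++ sketch of the "swap pointers rather than records" idea: sort an array of indices with the STL sort(), so the (possibly large) records are never moved. The Record type and function name are hypothetical examples, not from the slides.

    #include <algorithm>
    #include <numeric>
    #include <string>
    #include <vector>

    struct Record {
        int key;
        std::string payload;   // imagine this field being large and expensive to copy
    };

    // Sort logically: order an array of indices by the records' keys,
    // so only small integers are moved during the sort.
    std::vector<int> sortedOrder(const std::vector<Record>& records) {
        std::vector<int> idx(records.size());
        std::iota(idx.begin(), idx.end(), 0);              // fill with 0, 1, 2, ...
        std::sort(idx.begin(), idx.end(), [&](int i, int j) {
            return records[i].key < records[j].key;        // compare records, swap indices
        });
        return idx;   // records[idx[0]], records[idx[1]], ... is the sorted order
    }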