Download Sorting

Document related concepts

Positional notation wikipedia , lookup

Transcript
Sorting
HKOI Training Team (Advanced)
2006-01-21
What is sorting?
Given: A list of n elements: A1,A2,…,An
 Re-arrange the elements to make them
follow a particular order, e.g.

Ascending Order: A1 ≤ A2 ≤ … ≤ An
 Descending Order: A1 ≥ A2 ≥ … ≥ An


We will talk about sorting in ascending
order only
Why is sorting needed?

Some algorithms works only when data
is sorted


e.g. binary search
Better presentation of data

Often required by problem setters, to reduce
workload in judging 
Why learn Sorting Algorithms?
C++ STL already provided a sort()
function
 Unfortunately, no such implementation
for Pascal 


This is a minor point, though
Why learn Sorting Algorithms?
Most importantly, OI problems does not
directly ask for sorting, but its solution
may be closely linked with sorting
algorithms
 In most cases, C++ STL sort() is useless.
You still need to write your own “sort”
 So… it is important to understand the
idea behind each algorithm, and also
their strengths and weaknesses

Some Sorting Algorithms…









Bubble Sort
Insertion Sort
Selection Sort
Shell Sort
Heap Sort
Merge Sort
Quick Sort
Counting Sort
Radix Sort
How many of
them do you
know?
Bubble, Insertion, Selection…

Simple, in terms of
Idea, and
 Implementation


Unfortunately, they are inefficient


O(n2) – not good if N is large
Algorithms being taught today are far
more efficient than these
Shell Sort
Named after its inventor, Donald Shell
 Observation: Insertion Sort is very
efficient when

n is small
 when the list is almost sorted

Shell Sort
2

1
4
7
4
4
8
3
6
4
7
7
Divide the list into k non-contiguous segments
 Elements in each segments are k-elements
apart
 In the beginning, choose a large k so that all
segments contain a few elements (e.g. k=n/2)
 Sort each segment with Insertion Sort
Shell Sort
2
1
4
4
4
8
3
6
7
7
Definition: A list is said to be “k-sorted” when
A[i] ≤ A[i+k] for 1 ≤ i ≤ n-k
 Now the list is 5-sorted

Shell Sort
2
1
≥2

4
7
4
8
3
6
4
7
≥1
InsertInsert
After each pass, reduces k (e.g. by half)
 Although the number of elements in each
segments increased, the segments are usually
mostly sorted
 Sort each segments with Insertion Sort again
Shell Sort
2
1
4
7
≥4

4
8
3
6
4
7
≥7Insert Insert
After each pass, reduces k (e.g. by half)
 Although the number of elements in each
segments increased, the segments are usually
mostly sorted
 Sort each segments with Insertion Sort again
Shell Sort
2
1
≥2

4
7
<4
4
8
<4
3
6
4
7
Insert
After each pass, reduces k (e.g. by half)
 Although the number of elements in each
segments increased, the segments are usually
mostly sorted
 Sort each segments with Insertion Sort again
Shell Sort
2
1
3
≥1

7
4
<7
8
4
<8
6
4
7
Insert
After each pass, reduces k (e.g. by half)
 Although the number of elements in each
segments increased, the segments are usually
mostly sorted
 Sort each segments with Insertion Sort again
Shell Sort
2
1
3
6
4
7
4
8
≥4

4
7
Insert
After each pass, reduces k (e.g. by half)
 Although the number of elements in each
segments increased, the segments are usually
mostly sorted
 Sort each segments with Insertion Sort again
Shell Sort
2
1
3
6
4
7
4
≥7

8
4
<8
7
Insert
After each pass, reduces k (e.g. by half)
 Although the number of elements in each
segments increased, the segments are usually
mostly sorted
 Sort each segments with Insertion Sort again
Shell Sort
2
1

1
2
3
6
4
4
7
4
4
6
7
4
7
8
Finally, k is reduced to 1
 The list look like mostly sorted
 Perform 1-sort, i.e. the ordinary Insertion Sort
Shell Sort – Worse than Ins. Sort?
In Shell Sort, we still have to perform an
Insertion Sort at last
 A lot of operations are done before the
final Insertion Sort
 Isn’t it worse than Insertion Sort?

Shell Sort – Worse than Ins. Sort?
The final Insertion Sort is more efficient
than before
 All sorting operations before the final one
are done efficiently
 k-sorts compare far-apart elements
 Elements “moves” faster, reducing
amount of movement and comparison

Shell Sort – Increment Sequence
In our example, k starts with n/2, and half
its value in each pass, until it reaches 1,
i.e. {n/2, n/4, n/8, …, 1}
 This is called the “Shell sequence”
 In a good Increment Sequence, all
numbers should be relatively prime to
each other
 Hibbard’s Sequence: {2m-1, 2m-1-1, …, 7,
3, 1}

Shell Sort – Analysis
Average Complexity: O(n1.5)
 Worse case of Shell Sort with Shell
Sequence: O(n2)


When will it happen?
Heap Sort
In Selection Sort, we scan the entire list
to search for the maximum, which takes
O(n) time
 Are there better way to get the
maximum?
 With the help of a heap, we may reduce
the searching time to O(lg n)

Heap Sort – Build Heap
Create a Heap with the list
1.
2
8
5
7
1
2
4
8
7
5
1
4
Heap Sort
Pick the maximum, restore the heap property
2.
8
7
5
2
1
8
4
7
2
5
1
4
Heap Sort
Repeat step 2 until heap is empty
3.
7
4
5
2
1
7
8
4
2
5
1
Heap Sort
Repeat step 2 until heap is empty
3.
5
4
1
2
7
5
8
4
2
1
Heap Sort
Repeat step 2 until heap is empty
3.
4
2
1
5
7
4
8
2
1
Heap Sort
Repeat step 2 until heap is empty
3.
2
1
4
5
7
2
8
1
Heap Sort
Repeat step 2 until heap is empty
3.
1
2
4
5
7
8
1
Heap Sort – Analysis
Complexity: O(n lg n)
 Not a stable sort
 Difficult to implement

Merging
Given two sorted list, merge the list to
form a new sorted list
 A naïve approach: Append the second
list to the first list, then sort them



Slow, takes O(n lg n) time
Are there any better way?
Merging

We make use of a property of sorted lists:
The first element is always the minimum

What does that imply?
An additional array is needed store
temporary merged list
 Pick the smallest number from the uninserted numbers and append them to
the merged list

Merging
List A
1
3
7
List B
2
3
6
Temp
9
Merge Sort

Merge sort follows the divide-andconquer approach
Divide: Divide the n-element sequence into
two (n/2)-element subsequences
 Conquer: Sort the two subsequences
recursively
 Combine: Merge the two sorted
subsequence to produce the answer

Merge Sort
1. Divide the list into two
2
8
5
Merge Sort
5
8
7
1
1
4
4
7
Merge Sort
2. Call Merge Sort recursively to sort the two
subsequences
Merge Sort
3. Merge the list (to temporary array)
2
5
8
1
4
4. Move the elements back to the list
7
Merge Sort – Analysis
Complexity: O(n lg n)
 Stable Sort



Not an “In-place” sort


What is a stable sort?
i.e. Additional memory required
Easy to implement, no knowledge of
other data structures needed 
Stable Sort

What is a stable sort?
The name of a sorting algorithm
 A sorting algorithm that has stable
performance over all distribution of
elements, i.e. Best ≈ Average ≈ Worse
 A sorting algorithm that preserves the
original order of duplicated keys

Stable Sort
Original List
1
3a
5
3b
4
2
Stable Sort
1
2
3a
3b
4
5
Un-stable Sort
1
2
3b
3a
4
5
Stable Sort

Which sorting algorithms is/are stable?
Stable
Un-stable
Bubble Sort
Selection Sort
Insertion Sort
Shell Sort
Merge Sort
Heap Sort
Stable Sort
In our previous example, what is the
difference between 3a and 3b?
 When will stable sort be more useful?

Sorting records
 Multiple keys

Quick Sort

Quick Sort also uses the Divide-andConquer approach
Divide: Divide the list into two by partitioning
 Conquer: Sort the two list by calling Quick
Sort recursively
 Combine: Combine the two sorted list

Quick Sort – Partitioning

Given: A list and a “pivot” (usually an
element in the list)
< Pivot

Pivot
≥ Pivot
Re-arrange the elements so that
Elements on the left-hand side of “pivot” are
less than the pivot, and
 Elements on the right-hand side of the
“pivot” are greater than or equal to the pivot

Quick Sort – Partitioning

e.g. Take the first element as pivot
4
Pivot

6
7
0
9
3
9
4
lo ≥ pivot?
≥ pivot?
< pivot?
< pivot?
< pivot?
< pivot?
hi < pivot?
Swap all pairs of elements that meets
the following criteria:
The left one is greater than or equal to pivot
 The right one is smaller than pivot


Swap pivot with A[hi]
Quick Sort

After partitioning:
0
3
4
Quick Sort Pivot

7
4
9
6
6
7
9
Quick Sort
Apply Quick Sort on both lists
4
9
Quick Sort – Analysis

Complexity
Best: O(n lg n)
 Worst: O(n2)
 Average: O(n lg n)

When will the worst case happen?
 How to avoid the worst case?
 In-Place Sort
 Not a stable sort

Counting Sort
Consider the following list of numbers
5, 4, 2, 1, 4, 3, 4, 2, 5, 1, 4, 5, 3, 2, 3, 5, 5
 Range of numbers = [1,5]
 We may count the occurrence of each
number

1
2
3
4
5
2
3
3
4
5
Counting Sort (1)

With the frequency table, we can
reconstruct the list in ascending order
1
2
3
4
5
2
3
3
4
5
1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5
Counting Sort (1)
Can we sort records with this counting
sort?
 Is this sort stable?

Counting Sort (2)
An alternative way: use cumulative
frequency table and a temporary array
 Given the following “records”

3
2
1
2
2
Frequency Table
Cumulative
1
2
3
1
3
4
2
6
3
Counting Sort (2)
3
2
1
2
2
3
1
2
3
1
0
2
3
4
1
6
5
4
Counting Sort – Analysis
Complexity: O(n+k), where k is the range
of numbers
 Not an In-place sort
 Stable Sort (Method 2)
 Cannot be applied on data with wide
ranges

Radix Sort
Counting Sort requires a “frequency
table”
 The size of frequency table depends on
the range of elements
 If the range is large (e.g. 32-bit), it may
be infeasible, if not impossible, to create
such a table

Radix Sort
We may consider a integer as a “record
of digits”, each digit is a key
 Significance of keys decrease from left to
right
 e.g. the number 123 consists of 3 digits

Leftmost digit: 1 (Most significant)
 Middle digit: 2
 Rightmost digit: 3 (Least signficant)

Radix Sort
Now, the problem becomes a multi-key
record sorting problem
 Sort the records on the least significant
key with a stable sort
 Repeat with the 2nd least significant key,
3rd least significant key, and so on

Radix Sort
For all keys in these “records”, the range
is [0,9]  Narrow range
 We apply Counting Sort to do the sorting
here

Radix Sort
Original List
101 0 97 141 110 997 733
Radix Sort
Sort on the least significant digit
101 097 141 110 997 733
0
1
1
2
3
4
3
3
4
4
5
6
7
4
4
6
8
9
6
6
Radix Sort
Sort on the 2nd least significant digit
110 101 141 733 097 997
0
1
1
2
3
4
2
2
3
4
5
6
7
4
4
4
8
9
4
6
Radix Sort
Lastly, the most significant digit
101 110 733 141 097 997
0
1
1
2
3
4
4
4
4
4
5
6
7
4
4
5
8
9
5
6
Radix Sort – Analysis
Complexity: O(dn), where d is the
number of digits
 Not an In-place Sort
 Stable Sort
 Can we run Radix Sort on

Real numbers?
 String?

Choosing Sorting Algorithms
List Size
 Data distribution
 Data Type
 Availability of Additional Memory
 Cost of Swapping/Assignment

Choosing Sorting Algorithms

List Size
If N is small, any sorting algorithms will do
 If N is large (e.g. ≥5000), O(n2) algorithms
may not finish its job within time limit


Data Distribution
If the list is mostly sorted, running QuickSort
with “first pivot” is extremely painful
 Insertion Sort, on the other hand, is very
efficient in this situation

Choosing Sorting Algorithms

Data Type


It is difficult to apply Counting Sort and
Radix Sort on real numbers or any other
data types that cannot be converted to
integers
Availability of Additional Memory

Merge Sort, Counting Sort, Radix Sort
require additional memory
Choosing Sorting Algorithms

Cost of Swapping/Assignment
Moving large records may be very timeconsuming
 Selection Sort takes at most (n-1) swap
operations
 Swap pointers of records (i.e. swap the
records logically rather than physically)
