Non-comparision Sorting
From
← Advanced Sorts | ↑ Sorting | External Sorting → |
You may have noticed that the advanced sorts all seemed to have an upper bound of O( N log N ). It can be shown, using a type of proof called an Information Theoretic proof, that any sorting algorithm that sorts by comparing values can not do better than Ω( N log N ). This includes all the sorting algorithms that we know of and all those that might be developed in the future. If an algorithm uses comparisons, it cannot be better than Ω ( N log N ).
Are there any algorithms that don’t compare two values? We can consider two: Bucket Sort and Radix Sort
Bucket Sort
This sort requires that I know information about the data to be sorted, so it cannot be classified as a general sorting algorithm. Suppose I know that I have a file of integers, all of which are between 1 and 5. I can then make an array, A[ 6 ], which is initialized to 0 for each location A[ 1 ] to A[ 5 ]. Now, while my file is not empty, I read an integer, i, from the file and increment A[ i ]. For example, given the numbers: 3, 5, 2, 3, 1, 4, 4, 1, 5, 3, 2, 2, 1; after processing the file, the array looks like:
(...need diagram..) 0 3 3 3 2 2
0 1 2 3 4 5
Now, for each index, i, I just print out A[ i ] copies. That is, I print 3 1’s, 3 2’s, 3 3’s, 2 4’s, and 2 5’s. So the sorted file is: 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5.
Radix Sort
Radix Sort is sometimes called Card Sort because it was originally designed to sort punch cards based on the sequence number that is one each card. It is also often incorrectly referred to as an O( N ) algorithm. However, this is true only when the number of items to be sorted is bounded, that is, cannot exceed a certain amount. In the general case, Radix Sort is O( N log N ).
So, how does Radix Sort work? Suppose I have the following base 10 numbers: 329, 457, 557, 839, 436, 720, 355. We would set up 10 “bins” or queues (because these are base 10 values) and iterate three times (because each number has 3 digits). We would get the following:
(..need table here..)
Now, after the third iteration, printing out the contents of the queues gives: 329, 355, 436, 457, 657, 720, 839.