Advanced Sorts

From

Revision as of 14:15, 25 March 2009 by 209.237.84.181 (Talk)
Jump to: navigation, search

We will consider 4 advanced sorting algorithms: Merge Sort, Shell’s Sort, Heap Sort, and Quick Sort.

Contents

Merge Sort

The basis Merge Sort algorithm is to divide the array in half. Sort each half separately, and merge the two halves back into one array. Does this actually work? Consider using a basic bubble sort on an array of 1,000 items (a really dumb idea!). Basic Bubble Sort takes N2 – 2N + 1 comparisons. When N is 1,000, this is 998,001 comparisons. If the array is divided into two arrays of 500, each array would take 249,001 comparison and it would take 1,000 (i.e. N ) steps to perform the merge. Thus, a total of 2 × 249,001 + 1,000 steps are required, and this is a total of 499,002 steps versus 998,001 steps.


The next question is, “Is Bubble Sort a good choice to use for sorting each side?” Why not just use Merge Sort? So now we have a recursive sorting algorithm.

If the array has size 1 then 
	it is sorted 
else 
	Divide the array in half 
	Merge Sort each half 
	Merge the two halves.


Analyzing the run-time performance of a recursive algorithm requires the use of a mathematical technique called a recurrence relation. If we apply this technique to the Merge Sort algorithm, we find that it is O( N log N ). This is better than the simple sorts, which were O( N2 ).


The mathematical analysis, however, hides certain details about Merge Sort. First, the recursion takes time because of the overhead of making recursive calls. Part of this overhead is the creation of many activation records, since each recursive call to a method required a new activation record be pushed onto the run-time stack. Secondly, the merge process requires the use of a temporary array, so Merge Sort has a high memory requirement. Remember the trade-off between time and space? To make something use less time often requires using more space.


Shell’s Sort

This sort, sometimes called Shell Sort, was developed by Donald Shell and published in 1959 (CACM). This sort is an extension of Insertion Sort. It uses the idea that Insertion Sort runs quickly given two conditions: (1) The array being sorted is small, and (2) the array being sorted is almost in order.


The basic algorithm for Shell’s Sort is the following:

Divide the array to be sorted into n subarrays, 
where the distance (or the stride) between the elements of the subarrays is n.

Use Insertion Sort to sort each subarray.

Choose a smaller value for n and repeat this process

Example: 81, 94, 11, 96, 12, 35, 17, 95, 28, 54, 41, 75, 15 with a stride of 3.


The sequence of value used for n is called the increment sequence and the performance of Shell’s Sort is very dependent upon the increment sequence. There are three rules that the increment sequence must follow. Let I1, I2, …, Ik be the increment sequence.

The original sequence proposed by Shell is

(...need to use the math editor here...)


However, we can generate a particular set of values and show that this sequence can take O( N2 ) time. That is, a problem with Shell’s original sequence is that, while it is easy to implement, there are too many overlaps.


A better sequence was proposed by T. H. Hibbard in 1963, where


Ik = 2k – 1, for k = 1, 2, 3, …


This gives the sequence: 1, 3, 7, 15, 31, ….


Another increment sequence was given by Donald Knuth. His sequence has the form


Ik = ½( 3k + 1 ), for k = 1, 2, 3, ….


This gives the sequence: 1, 2, 5, 14, 41, 122, 365, 1094, …


In practice, however, the best increment sequence seems to be one given by Robert Sedgewick. In his sequence, increments have one of two forms:


Ik = either 9 × 4k - 9×2k + 1 or 4k - 3×2k + 1


This gives the sequence: 1, 5, 19, 41, 109, 209, 505, 929, 2161, 3905, …. Using Sedgewick’s sequence, it has been shown that the average running time for Shell’s Sort is O( N7/6 ).

Heap Sort

We have already seen what is a heap data structure. Heap Sort is an extension of Selection Sort. However, in place of running a method to find the minimum or maximum value in the unsorted area of the array, we “heapify” the unsorted area and then remove the minimum or maximum value—depending on whether it is a min-heap or a max-heap—and swap this value with the next open slot in the sorted portion of the array. Since we can embed a binary tree in an array, heap sort can be implemented using an array. And, if the array is of size N, Heap Sort can be shown to have an upper bound of O( N log N ).


While Heap Sort is not the fastest of the advanced sorts, it does have some good points. First, Heap Sort is very space efficient. That is, an array can be sorted in place with only one extra array slot needed: this is the temporary space needed as part of a swap routine. Secondly, as is the case with Selection Sort, Heap Sort does not have a best or worst case. It takes about the same time regardless of the initial order of the array.


Quick Sort

We now look at what is generally considered the fastest of the advanced sorts: Quick Sort, published by C. A. R. Hoare in 1962. It, like Merge Sort, is an example of a Divide-And- Conquer algorithm. Unlike Merge Sort—which divides the array in half, sorts each half, and then merges the results—Quick Sort basically merges the results first and then sorts each side.

MediaWiki Appliance - Powered by TurnKey Linux