Advanced Sorts

From

Revision as of 14:10, 25 March 2009 by 209.237.84.181 (Talk)
Jump to: navigation, search

We will consider 4 advanced sorting algorithms: Merge Sort, Shell’s Sort, Heap Sort, and Quick Sort.

Merge Sort

The basis Merge Sort algorithm is to divide the array in half. Sort each half separately, and merge the two halves back into one array. Does this actually work? Consider using a basic bubble sort on an array of 1,000 items (a really dumb idea!). Basic Bubble Sort takes N2 – 2N + 1 comparisons. When N is 1,000, this is 998,001 comparisons. If the array is divided into two arrays of 500, each array would take 249,001 comparison and it would take 1,000 (i.e. N ) steps to perform the merge. Thus, a total of 2 × 249,001 + 1,000 steps are required, and this is a total of 499,002 steps versus 998,001 steps.


The next question is, “Is Bubble Sort a good choice to use for sorting each side?” Why not just use Merge Sort? So now we have a recursive sorting algorithm.

If the array has size 1 then 
	it is sorted 
else 
	Divide the array in half 
	Merge Sort each half 
	Merge the two halves.


Analyzing the run-time performance of a recursive algorithm requires the use of a mathematical technique called a recurrence relation. If we apply this technique to the Merge Sort algorithm, we find that it is O( N log N ). This is better than the simple sorts, which were O( N2 ).


The mathematical analysis, however, hides certain details about Merge Sort. First, the recursion takes time because of the overhead of making recursive calls. Part of this overhead is the creation of many activation records, since each recursive call to a method required a new activation record be pushed onto the run-time stack. Secondly, the merge process requires the use of a temporary array, so Merge Sort has a high memory requirement. Remember the trade-off between time and space? To make something use less time often requires using more space.


Shell’s Sort

This sort, sometimes called Shell Sort, was developed by Donald Shell and published in 1959 (CACM). This sort is an extension of Insertion Sort. It uses the idea that Insertion Sort runs quickly given two conditions: (1) The array being sorted is small, and (2) the array being sorted is almost in order.


The basic algorithm for Shell’s Sort is the following:

Divide the array to be sorted into n subarrays, where the distance (or the stride) between the elements of the subarrays is n.

Use Insertion Sort to sort each subarray.

Choose a smaller value for n and repeat this process

Example: 81, 94, 11, 96, 12, 35, 17, 95, 28, 54, 41, 75, 15 with a stride of 3.


The sequence of value used for n is called the increment sequence and the performance of Shell’s Sort is very dependent upon the increment sequence. There are three rules that the increment sequence must follow. Let I1, I2, …, Ik be the increment sequence.

The original sequence proposed by Shell is

(...need to use the math editor here...)


However, we can generate a particular set of values and show that this sequence can take O( N2 ) time. That is, a problem with Shell’s original sequence is that, while it is easy to implement, there are too many overlaps.


A better sequence was proposed by T. H. Hibbard in 1963, where Ik = 2k – 1, for k = 1, 2, 3, … This gives the sequence: 1, 3, 7, 15, 31, …. Another increment sequence was given by Donald Knuth. His sequence has the form Ik = ½( 3k + 1 ), for k = 1, 2, 3, …. This gives the sequence: 1, 2, 5, 14, 41, 122, 365, 1094, … In practice, however, the best increment sequence seems to be one given by Robert Sedgewick. In his sequence, increments have one of two forms: IS 320 page 36 Revision: 4/2/2008 Ik = either 9 × 4k - 9×2k + 1 or 4k - 3×2k + 1 This gives the sequence: 1, 5, 19, 41, 109, 209, 505, 929, 2161, 3905, …. Using Sedgewick’s sequence,

Personal tools
MediaWiki Appliance - Powered by TurnKey Linux