Searching

From

(Difference between revisions)
Jump to: navigation, search

Admin (Talk | contribs)
(Created page with 'We will consider searching techniques in this section. First we will consider simple linear search, then binary search, and then hashing techniques. ==Linear Search== Consider ...')
Newer edit →

Revision as of 14:34, 24 March 2009

We will consider searching techniques in this section. First we will consider simple linear search, then binary search, and then hashing techniques.


Linear Search

Consider a list structure, where the data in the list is not in any kind of order. If I want to know if a certain value is contained in the list, I need to start from the front of the list and check each item until I find what I am looking for or until I reach the end of the list. This is called linear search and it is O( N ) where N is the number of items in the list.

Binary Search

Can searching a list be made faster? Consider a list of integers where the integers are in order from smallest to highest and I want to know if 50 is in the list. One way to search for 50 is to start at the head of the list and examine each element. If I find 50, I can stop. If I find a number larger than 50, I can also stop because the list is ordered.

While this is a better way of searching than using an unorganized list, I am not taking full advantage of the fact that the list is in order. Consider if the list is stored in an array and there are 1000 numbers in the array A[ 0, 999 ] and I am looking for a 50. Suppose I compare 50 to the value in A[ 500 ]. If A[ 500 ] == 50, I am in luck. But if A[ 500 ] < 50, then I need to search between A[ 501 ] and A[ 999 ]; otherwise, I need to search between A[ 0 ] and A[ 499 ]..

As an algorithm, binary search does the following:

Let the bounds of the array be called Low and High and let Mid be an integer. Let Found be a Boolean value that is initialized to false. While( ( Low < = High ) and ( not Found ) ) Let Mid = ( Low + High ) / 2 If ( A[ Mid ] == 50 then Found = true Else If (A[ Mid ] < 50 ) Low = Mid + 1 Else High = Mid – 1

What is the run-time order of this algorithm? It is ( Log2 N ). This is because it always cut the search space in half. Notice that we can not use this algorithm on a list that is implemented as a linked list. And the array must always be maintained as an ordered list.

Here is where we see what is meant by asymptotic measures. If the size of the table is small, say 8 slots, then the expected number of searches for linear search is 4 and the expected number of searches for binary search is 3. When the work that is necessary to maintain the table in order is factored in, it is probably not worth while using a binary search. On the other hand, if the size of the table is 1,000,000. Then the expected number of searches for sequential search is 500,000, but the maximum number of searches for binary search is 20.

Now, what if our list is not in an array, but a linked list? Well, this is where a structure such as a binary search tree or a B+ tree come in. Consider a B+ tree of order 100. The root has 100 children, 10,000 grandchildren, and 1,000,000 great-grandchildren. Thus, a tree of 1,000,000 nodes only has a depth from the root to the leaves of 3.

But again we see that if the size of the list is not large enough, more work is necessary than might be saved from a linear search to use one of these structures.

Personal tools
MediaWiki Appliance - Powered by TurnKey Linux