Searching

From

(Difference between revisions)
Jump to: navigation, search
m (Protected "Searching" ([edit=autoconfirmed] (indefinite) [move=autoconfirmed] (indefinite)))
Line 1: Line 1:
-
We will consider searching techniques in this section. First we will consider simple linear
+
In theoretical computer science, a search algorithm, is an algorithm that takes a problem as input and returns a solution to the problem, usually after evaluating a number of possible solutions. Most of the algorithms studied by computer scientists that solve problems are kinds of search algorithms. The set of all possible solutions to a problem is called the search space. Brute-force search, otherwise known as naïve or uninformed, algorithms use the simplest method of the searching through the search space, whereas informed search algorithms use heuristic functions to apply knowledge about the structure of the search space to try to reduce the amount of time spent searching.
-
search, then binary search, and then hashing techniques.
+
<br/><br/>
 +
In the context of a data structures course, we are most interested in seach as it relates to ADT's, in particular searching a container or list. List search algorithms are perhaps the most basic kind of search algorithm. The goal is to find one element of a set by some key (perhaps containing other information related to the key). As this is a common problem in computer science, the computational complexity of these algorithms has been well studied.
 +
<br/><br/>
 +
 +
The simplest such algorithm is linear search, which simply examines each element of the list in order. It has expensive O(n) running time, where n is the number of items in the list, but can be used directly on any unprocessed list. A more sophisticated list search algorithm is binary search; it runs in O(log n) time. This is significantly better than linear search for large lists of data, but it requires that the list be sorted before searching (see sorting algorithms) and also be random access. Interpolation search is better than binary search for large sorted lists with fairly even distributions (O(log (log n)) complexity), but has a worst-case running time of O(n).
 +
<br/><br/>
 +
 +
We will consider ADT searching techniques in this section. First we will consider simple linear
 +
search, then binary search, and then hashing techniques.
 +
<br/><br/>
==Linear Search==
==Linear Search==

Revision as of 07:00, 6 April 2009

In theoretical computer science, a search algorithm, is an algorithm that takes a problem as input and returns a solution to the problem, usually after evaluating a number of possible solutions. Most of the algorithms studied by computer scientists that solve problems are kinds of search algorithms. The set of all possible solutions to a problem is called the search space. Brute-force search, otherwise known as naïve or uninformed, algorithms use the simplest method of the searching through the search space, whereas informed search algorithms use heuristic functions to apply knowledge about the structure of the search space to try to reduce the amount of time spent searching.

In the context of a data structures course, we are most interested in seach as it relates to ADT's, in particular searching a container or list. List search algorithms are perhaps the most basic kind of search algorithm. The goal is to find one element of a set by some key (perhaps containing other information related to the key). As this is a common problem in computer science, the computational complexity of these algorithms has been well studied.

The simplest such algorithm is linear search, which simply examines each element of the list in order. It has expensive O(n) running time, where n is the number of items in the list, but can be used directly on any unprocessed list. A more sophisticated list search algorithm is binary search; it runs in O(log n) time. This is significantly better than linear search for large lists of data, but it requires that the list be sorted before searching (see sorting algorithms) and also be random access. Interpolation search is better than binary search for large sorted lists with fairly even distributions (O(log (log n)) complexity), but has a worst-case running time of O(n).

We will consider ADT searching techniques in this section. First we will consider simple linear search, then binary search, and then hashing techniques.

Linear Search

Consider a list structure, where the data in the list is not in any kind of order. If I want to know if a certain value is contained in the list, I need to start from the front of the list and check each item until I find what I am looking for or until I reach the end of the list. This is called linear search and it is O( N ) where N is the number of items in the list.

Binary Search

Can searching a list be made faster? Consider a list of integers where the integers are in order from smallest to highest and I want to know if 50 is in the list. One way to search for 50 is to start at the head of the list and examine each element. If I find 50, I can stop. If I find a number larger than 50, I can also stop because the list is ordered.

While this is a better way of searching than using an unorganized list, I am not taking full advantage of the fact that the list is in order. Consider if the list is stored in an array and there are 1000 numbers in the array A[ 0, 999 ] and I am looking for a 50. Suppose I compare 50 to the value in A[ 500 ]. If A[ 500 ] == 50, I am in luck. But if A[ 500 ] < 50, then I need to search between A[ 501 ] and A[ 999 ]; otherwise, I need to search between A[ 0 ] and A[ 499 ]..

As an algorithm, binary search does the following:

LET the bounds of the array be called Low and High and let Mid be an integer. 
LET Found be a Boolean value that is initialized to false. 

WHILE ( ( Low < = High ) and ( not Found ) ) 
	LET Mid = ( Low + High ) / 2 
	IF ( A[ Mid ] == 50 ) 
		Found = true 
	ELSE IF (A[ Mid ] < 50 ) 
		Low = Mid + 1 
	ELSE 
		High = Mid – 1
        ENDIF

What is the run-time order of this algorithm? It is ( Log2 N ). This is because it always cut the search space in half. Notice that we can not use this algorithm on a list that is implemented as a linked list. And the array must always be maintained as an ordered list.

Here is where we see what is meant by asymptotic measures. If the size of the table is small, say 8 slots, then the expected number of searches for linear search is 4 and the expected number of searches for binary search is 3. When the work that is necessary to maintain the table in order is factored in, it is probably not worth while using a binary search. On the other hand, if the size of the table is 1,000,000. Then the expected number of searches for sequential search is 500,000, but the maximum number of searches for binary search is 20.

Now, what if our list is not in an array, but a linked list? Well, this is where a structure such as a binary search tree or a B+ tree come in. Consider a B+ tree of order 100. The root has 100 children, 10,000 grandchildren, and 1,000,000 great-grandchildren. Thus, a tree of 1,000,000 nodes only has a depth from the root to the leaves of 3.

But again we see that if the size of the list is not large enough, more work is necessary than might be saved from a linear search to use one of these structures.

Hashing

...Short, generic description of Hashing here, the state that the more complex details are provided in a later section..

Personal tools
MediaWiki Appliance - Powered by TurnKey Linux