Searching

From

(Difference between revisions)
Jump to: navigation, search
(Binary Search)
 
(11 intermediate revisions not shown)
Line 1: Line 1:
-
We will consider searching techniques in this section. First we will consider simple linear
+
<noinclude>
-
search, then binary search, and then hashing techniques.
+
{{CS2:BookNav|base=/|up=Contents: CS2|prev=Algorithms and Psuedocode|next=Sorting}}
 +
</noinclude>
 +
<br/>
 +
In theoretical computer science, a search algorithm, is an algorithm that takes a problem as input and returns a solution to the problem, usually after evaluating a number of possible solutions. Most of the algorithms studied by computer scientists that solve problems are kinds of search algorithms. The set of all possible solutions to a problem is called the search space. Brute-force search, otherwise known as naïve or uninformed, algorithms use the simplest method of the searching through the search space, whereas informed search algorithms use heuristic functions to apply knowledge about the structure of the search space to try to reduce the amount of time spent searching.
 +
<br/><br/>
 +
In the context of a data structures course, we are most interested in seach as it relates to ADT's, in particular searching a container or list. List search algorithms are perhaps the most basic kind of search algorithm. The goal is to find one element of a set by some key (perhaps containing other information related to the key). As this is a common problem in computer science, the computational complexity of these algorithms has been well studied.
 +
<br/><br/>
-
==Linear Search==
+
The simplest such algorithm is linear search, which simply examines each element of the list in order. It has expensive O(n) running time, where n is the number of items in the list, but can be used directly on any unprocessed list. A more sophisticated list search algorithm is binary search; it runs in O(log n) time. This is significantly better than linear search for large lists of data, but it requires that the list be sorted before searching (see sorting algorithms) and also be random access. Interpolation search is better than binary search for large sorted lists with fairly even distributions (O(log (log n)) complexity), but has a worst-case running time of O(n).
-
Consider a list structure, where the data in the list is not in any kind of order. If I want to
+
<br/><br/>
-
know if a certain value is contained in the list, I need to start from the front of the list and
+
-
check each item until I find what I am looking for or until I reach the end of the list. This is
+
-
called linear search and it is O( N ) where N is the number of items in the list.
+
 +
We will consider the most basic searching techniques in this section (simple array searches). First we will consider linear search and then binary search. Search is a key behavior of each of the ADT's that we will look at later, so it will be a common topic in this course.
 +
<br/><br/>
 +
 +
== Linear/Sequential Search ==
 +
Consider a list structure, where the data in the list is not in any kind of order. If I want to know if a certain value is contained in the list, I need to start from the front of the list and check each item until I find what I am looking for or until I reach the end of the list. This is called '''linear''' (or '''sequential''') search.<br><br>
 +
<br/><br/>
 +
 +
The psuedocode below shows linear search on an array. This algorithm checks each item in the array, in sequential order, to see if it matches the one that we seek. The SequentialSearch function is short and easy to read. Note that it returns in the array index of where it found the match or -1 if no match was found. The -1 can be used because it is generally not a valid array index.
 +
<br>
 +
<pre>FUNCTION SequentialSearch( IntArray, Count, Target)
 +
  FOR k  Count 1
 +
      IF IntArray[k] equals Target
 +
        RETURN k  // Target was found
 +
      ENDIF
 +
  ENDFOR
 +
 +
  RETURN -1  // If reach here, Target was not found.
 +
ENDFUNCION
 +
</pre>
 +
<br/>
 +
What is the running time of this linear search? Let's use n instead of Count, as the number of data items in the array. Since the main task that gets repeated in the search is the if test that compares an item in the array with the Target, it makes sense to estimate the running time by counting the number of such data items tested. (This is sometimes called the number of probes of the array.)<br><br>In the best case, we find a match on the first probe. The first data item in the array is the one we wanted. In the worst case the number of probes is n; we have to check all of the data items in the array. In the average case the number of probes is approximately n/2. We can summarize the results as follows:<br>
 +
<br/>
 +
{| border="1" cellspacing="1" cellpadding="1" width="200"
 +
|-
 +
| Case
 +
| Run Time
 +
|-
 +
| Best
 +
| Theta( 1)
 +
|-
 +
| Average
 +
| Theta( n)
 +
|-
 +
| Worst
 +
| O( n)
 +
|}
 +
<br/><br/>
== Binary Search ==
== Binary Search ==
-
Can searching a list be made faster? Consider a list of integers where the integers are in order from smallest to highest and I want to know if 50 is in the list. One way to search for 50 is to start at the head of the list and examine each element. If I find 50, I can stop. If I find a number larger than 50, I can also stop because the list is ordered.
+
Can searching a list be made faster? Certainly!
 +
<br/><br/>
 +
 
 +
Another way to look for a target item in an array is by using a binary search. A binary search requires that all of the data be in order, typically in ascending order. (Ascending order means that as you follow the indices in order 0, 1, 2,... you get the data items in order such as 125, 200, 219,...).
 +
<br/><br/>
 +
 
 +
Consider a list of integers where the integers are in order from smallest to highest and I want to know if 50 is in the list. One way to search for 50 is to start at the head of the list and examine each element. If I find 50, I can stop. If I find a number larger than 50, I can also stop because the list is ordered.
<br/><br/>
<br/><br/>
Line 43: Line 89:
But again we see that if the size of the list is not large enough, more work is necessary than might be saved from a linear search to use one of these structures.
But again we see that if the size of the list is not large enough, more work is necessary than might be saved from a linear search to use one of these structures.
<br/><br/>
<br/><br/>
 +
----
 +
{{CS2/ChapNav}}
 +
----
 +
[[Category:CS2|{{SUBPAGENAME}}]]

Current revision as of 18:20, 9 April 2009

← Algorithms and Psuedocode ↑ Contents: CS2 Sorting →


In theoretical computer science, a search algorithm, is an algorithm that takes a problem as input and returns a solution to the problem, usually after evaluating a number of possible solutions. Most of the algorithms studied by computer scientists that solve problems are kinds of search algorithms. The set of all possible solutions to a problem is called the search space. Brute-force search, otherwise known as naïve or uninformed, algorithms use the simplest method of the searching through the search space, whereas informed search algorithms use heuristic functions to apply knowledge about the structure of the search space to try to reduce the amount of time spent searching.

In the context of a data structures course, we are most interested in seach as it relates to ADT's, in particular searching a container or list. List search algorithms are perhaps the most basic kind of search algorithm. The goal is to find one element of a set by some key (perhaps containing other information related to the key). As this is a common problem in computer science, the computational complexity of these algorithms has been well studied.

The simplest such algorithm is linear search, which simply examines each element of the list in order. It has expensive O(n) running time, where n is the number of items in the list, but can be used directly on any unprocessed list. A more sophisticated list search algorithm is binary search; it runs in O(log n) time. This is significantly better than linear search for large lists of data, but it requires that the list be sorted before searching (see sorting algorithms) and also be random access. Interpolation search is better than binary search for large sorted lists with fairly even distributions (O(log (log n)) complexity), but has a worst-case running time of O(n).

We will consider the most basic searching techniques in this section (simple array searches). First we will consider linear search and then binary search. Search is a key behavior of each of the ADT's that we will look at later, so it will be a common topic in this course.

Linear/Sequential Search

Consider a list structure, where the data in the list is not in any kind of order. If I want to know if a certain value is contained in the list, I need to start from the front of the list and check each item until I find what I am looking for or until I reach the end of the list. This is called linear (or sequential) search.



The psuedocode below shows linear search on an array. This algorithm checks each item in the array, in sequential order, to see if it matches the one that we seek. The SequentialSearch function is short and easy to read. Note that it returns in the array index of where it found the match or -1 if no match was found. The -1 can be used because it is generally not a valid array index.

FUNCTION SequentialSearch( IntArray, Count, Target)
   FOR k  Count 1
      IF IntArray[k] equals Target
         RETURN k   // Target was found
      ENDIF
   ENDFOR

   RETURN -1  // If reach here, Target was not found.
ENDFUNCION


What is the running time of this linear search? Let's use n instead of Count, as the number of data items in the array. Since the main task that gets repeated in the search is the if test that compares an item in the array with the Target, it makes sense to estimate the running time by counting the number of such data items tested. (This is sometimes called the number of probes of the array.)

In the best case, we find a match on the first probe. The first data item in the array is the one we wanted. In the worst case the number of probes is n; we have to check all of the data items in the array. In the average case the number of probes is approximately n/2. We can summarize the results as follows:

Case Run Time
Best Theta( 1)
Average Theta( n)
Worst O( n)



Binary Search

Can searching a list be made faster? Certainly!

Another way to look for a target item in an array is by using a binary search. A binary search requires that all of the data be in order, typically in ascending order. (Ascending order means that as you follow the indices in order 0, 1, 2,... you get the data items in order such as 125, 200, 219,...).

Consider a list of integers where the integers are in order from smallest to highest and I want to know if 50 is in the list. One way to search for 50 is to start at the head of the list and examine each element. If I find 50, I can stop. If I find a number larger than 50, I can also stop because the list is ordered.

While this is a better way of searching than using an unorganized list, I am not taking full advantage of the fact that the list is in order. Consider if the list is stored in an array and there are 1000 numbers in the array A[ 0, 999 ] and I am looking for a 50. Suppose I compare 50 to the value in A[ 500 ]. If A[ 500 ] == 50, I am in luck. But if A[ 500 ] < 50, then I need to search between A[ 501 ] and A[ 999 ]; otherwise, I need to search between A[ 0 ] and A[ 499 ]..

As an algorithm, binary search does the following:

LET the bounds of the array be called Low and High and let Mid be an integer. 
LET Found be a Boolean value that is initialized to false. 

WHILE ( ( Low < = High ) and ( not Found ) ) 
	LET Mid = ( Low + High ) / 2 
	IF ( A[ Mid ] == 50 ) 
		Found = true 
	ELSE IF (A[ Mid ] < 50 ) 
		Low = Mid + 1 
	ELSE 
		High = Mid – 1
        ENDIF

What is the run-time order of this algorithm? It is ( Log2 N ). This is because it always cut the search space in half. Notice that we can not use this algorithm on a list that is implemented as a linked list. And the array must always be maintained as an ordered list.

Here is where we see what is meant by asymptotic measures. If the size of the table is small, say 8 slots, then the expected number of searches for linear search is 4 and the expected number of searches for binary search is 3. When the work that is necessary to maintain the table in order is factored in, it is probably not worth while using a binary search. On the other hand, if the size of the table is 1,000,000. Then the expected number of searches for sequential search is 500,000, but the maximum number of searches for binary search is 20.

Now, what if our list is not in an array, but a linked list? Well, this is where a structure such as a binary search tree or a B+ tree come in. Consider a B+ tree of order 100. The root has 100 children, 10,000 grandchildren, and 1,000,000 great-grandchildren. Thus, a tree of 1,000,000 nodes only has a depth from the root to the leaves of 3.

But again we see that if the size of the list is not large enough, more work is necessary than might be saved from a linear search to use one of these structures.


CS2: Data Structures
Theory of Computation - ADT Preliminaries
Linear ADTs - Tree ADTs - Graph ADTs - Unordered Collection ADTs


Personal tools
MediaWiki Appliance - Powered by TurnKey Linux