Hash Tables

From

Revision as of 18:20, 28 March 2009 by 209.237.84.181 (Talk)
Jump to: navigation, search

Hashing is called a Key to Address system. Objects (records) are stored in a table, and each object stored occupies a slot in the table. The location of the slot is called the address of the object. The address of an object is obtained using a function called a hash function. Each object has an associated key. The hash function takes the key as a parameter and returns the address of the object.

Contents

Hash Table Behavior

A good hash function should distribute the objects throughout the table with equal probability. A good hash function should minimize collisions. In addition, a good hash function should be easy to calculate.
For example, a simple hash function for a table of size M is:

FUNCTION hash( key )
    RETURN (key % M);
ENDFUNCTION


However, this function works well only if the keys have an equal distribution.

Other examples of hash functions: ELF (Executable and Linking Format [Unix System V]), and Soundex

Collision Resolution

There are two basic techniques of collision resolution: Open Hashing (also called Separate Chaining) and Closed Hashing (also called Open Addressing).

Open Hashing

Open Hashing means that collisions are stored outside the table. It may be an overflow area or it may be a series of linked lists:
...IMAGE HERE...
Another version of open hashing is called Bucket Hashing. In bucket hashing, the slots of the table are organized into groups called buckets. The hash function returns a bucket number, and the item is inserted into an open slot in the bucket. If the bucket is full, there is an overflow area.

Closed Hashing

In closed hashing, collisions are stored in the table itself. This can create a condition known as a secondary collision.

One type of collision resolution is know as probing. In probing, the hash function becomes:

h( k ) + p( i ), where i is an iteration value and p( 0 ) = 0. The simplest of these is linear probing, where p( i ) = i. This, however, can result in a problem called clustering, which can increase the probability of secondary collisions.


Suppose the table has ten slots and the hash function gives an equal probability to any slot in the table. Thus the probability is 0.1 that a key hashes to any one slot. Now suppose that slots 0 through 3 are filled. Then the probability that the next value will end up in slot 4 is 0.5, because anything that hashes to slots 0, 1, 2, 3, or 4 will end up in slot 4.

In Quadratic Probing, the probe function is p( i ) = i2. With quadratic probing, if the size of the table is a prime and the table is less than one-half full, an new element can always be inserted. However, if the table is more than half full, the insertion might fail.

A problem with probing is that the same sequence is used for all keys. So, as collisions occur, secondary collisions increase. An alternative to this is double hashingItalic text. Double hashing is defined by the function: h1( k ) + i * h2( k ). Note that it is important that h2( k ) not produce a 0. A good choice for a double hashing function is the let the table size be a prime number M and to let R be another prime that is less than M, then h2( k ) = R – ( k % R ).

Personal tools
MediaWiki Appliance - Powered by TurnKey Linux