Sets and MultiSets (Bags)


In computer science, a set is a collection (container) of distinct values, stored in no particular order. It corresponds to a finite set in mathematics. Apart from having no ordering and no repeated values, a set behaves like a list. A set can also be seen as an associative array (partial mapping) in which the value of each key-value pair is ignored.

Implementations

Sets can be implemented using various data structures. A good set data structure makes it efficient to check whether an object is in the set, and also supports other useful operations such as iterating through all the objects in the set, taking the union or intersection of two sets, or taking the complement of a set within some limited domain. Any associative array data structure can be used to implement a set by letting the keys be the elements of the set and ignoring the values. Because of this similarity to associative arrays, sets are commonly implemented in the same ways: a self-balancing binary search tree for sorted sets (O(log n) time for most operations), or a hash table for unsorted sets (O(1) time on average, but O(n) in the worst case, for most operations). A sorted linear hash table[1] may be used to provide deterministically ordered sets.
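
The idea of building a set from an associative array can be sketched as follows. This is a minimal illustration, not a reference implementation: the class name DictSet and its method names are hypothetical, and Python's built-in dict (a hash table) stands in for the associative array.

```python
class DictSet:
    """A set sketched on top of a dict: keys are the elements, values are ignored."""

    def __init__(self, items=()):
        self._table = {}
        for item in items:
            self.add(item)

    def add(self, item):
        # The stored value is a placeholder; only the key matters.
        self._table[item] = None

    def discard(self, item):
        # Remove the element if present; do nothing otherwise.
        self._table.pop(item, None)

    def __contains__(self, item):
        # Membership test is a key lookup: O(1) on average for a hash table.
        return item in self._table

    def __iter__(self):
        return iter(self._table)

    def __len__(self):
        return len(self._table)

    def union(self, other):
        # Copy this set, then add every element of the other one.
        result = DictSet(self)
        for item in other:
            result.add(item)
        return result

    def intersection(self, other):
        # Keep only the elements that also occur in the other set.
        return DictSet(item for item in self if item in other)
```

Adding a value twice simply overwrites the same key, which is how the "no repeated values" property falls out of the underlying map. Swapping the dict for a balanced search tree would give the sorted variant with O(log n) operations.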

Other popular methods include arrays. In particular, a subset of the integers 1..n can be implemented efficiently as an n-bit bit array, which also supports very efficient union and intersection operations. A Bloom filter implements a set probabilistically, using a very compact representation but risking a small chance of false positives on queries.
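
A bit-array set over 1..n might look like the sketch below. It assumes a Python integer is used as the n-bit array (bit x - 1 represents the integer x) and that both operands of a set operation share the same n; the class name BitSet and its methods are hypothetical.

```python
class BitSet:
    """Subset of the integers 1..n stored as an n-bit mask in a Python int."""

    def __init__(self, n, items=()):
        self.n = n
        self.bits = 0
        for x in items:
            self.add(x)

    def add(self, x):
        if not 1 <= x <= self.n:
            raise ValueError(f"{x} is outside 1..{self.n}")
        self.bits |= 1 << (x - 1)  # set the bit that represents x

    def __contains__(self, x):
        return 1 <= x <= self.n and bool((self.bits >> (x - 1)) & 1)

    def _from_bits(self, bits):
        # Helper: wrap a raw bit mask in a new BitSet over the same domain.
        result = BitSet(self.n)
        result.bits = bits
        return result

    def union(self, other):
        # Assumes other.n == self.n; a single bitwise OR.
        return self._from_bits(self.bits | other.bits)

    def intersection(self, other):
        # Assumes other.n == self.n; a single bitwise AND.
        return self._from_bits(self.bits & other.bits)

    def complement(self):
        # Flip every bit, then mask back down to the n-bit domain.
        return self._from_bits(~self.bits & ((1 << self.n) - 1))

    def __iter__(self):
        return (x for x in range(1, self.n + 1) if x in self)
```

Union, intersection, and complement each reduce to one bitwise operation over the whole array, which is why this representation is attractive when the domain is small and dense.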

However, apart from bit arrays, few of these data structures support set operations such as union or intersection efficiently. For these operations, more specialized set data structures exist.

Multiset (Bag)

A variation of the set is the multiset or bag, which is the same as a set data structure but allows repeated values. Formally, a multiset can be thought of as an associative array that maps unique elements to positive integers indicating the multiplicity of each element, although actual implementations may vary. C++'s Standard Template Library provides the "multiset" class for the sorted multiset, and SGI's STL provides the "hash_multiset" class, which implements a multiset using a hash table. Apache Commons Collections provides the Bag and SortedBag interfaces for Java, along with implementing classes such as HashBag and TreeBag, analogous to the similarly named Set implementations.
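
The formal view of a multiset as a map from elements to positive integers can be sketched as below. The class name Bag and its methods are hypothetical and are not the API of any of the libraries named above; a Python dict serves as the associative array.

```python
class Bag:
    """Multiset sketched as a dict mapping each distinct element to its multiplicity."""

    def __init__(self, items=()):
        self._counts = {}
        for item in items:
            self.add(item)

    def add(self, item, times=1):
        if times < 1:
            raise ValueError("times must be positive")
        self._counts[item] = self._counts.get(item, 0) + times

    def remove_one(self, item):
        count = self._counts.get(item, 0)
        if count == 0:
            raise KeyError(item)
        if count == 1:
            # Drop the key so that only positive multiplicities are stored.
            del self._counts[item]
        else:
            self._counts[item] = count - 1

    def count(self, item):
        # Multiplicity of item; 0 if it is not in the bag.
        return self._counts.get(item, 0)

    def __len__(self):
        # Total number of elements, counting repeats.
        return sum(self._counts.values())

    def __iter__(self):
        # Yield each element as many times as its multiplicity.
        for item, count in self._counts.items():
            for _ in range(count):
                yield item
```

For example, Bag("banana") stores three keys ("b", "a", "n") with multiplicities 1, 3, and 2. In practice, Python's standard collections.Counter offers similar behaviour.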



