Primitive and Composite Structures
From
(→Arrays) |
|||
Line 9: | Line 9: | ||
== Arrays == | == Arrays == | ||
+ | An array is a homogeneous collection of data that is stored sequentially in memory. An array has fast random access because of this: Since the data is homogeneous, I know the size, in bytes, of each entry in the array. Since the storage is sequential, if I know the starting address of the array, it is easy to calculate the address of any slot. Most languages provide an array as a basic data type. Many languages also allow for multidimensional arrays. | ||
- | + | An array is a particular method of storing '''elements''' of indexed data. Elements of data are stored sequentially in blocks within the array. Each element is referenced by an '''index''', or subscript. | |
- | <br>''' | + | The index is usually a number used to address an element in the array. For example, if you were storing information about each day in August, you would create an array with an index capable of addressing 31 values -- one for each day of the month. Indexing rules are language dependent, however most languages use either 0 or 1 as the first element of an array. |
+ | |||
+ | The concept of an array can be daunting to the uninitiated, but it is really quite simple. Think of a notebook with pages numbered 1 through 12. Each page may or may not contain information on it. The notebook is an ''array'' of pages. Each page is an ''element'' of the array 'notebook'. Programmatically, you would retrieve information from a page by referring to its number or ''subscript'', i.e., notebook(4) would refer to the contents of page 4 of the array notebook. | ||
+ | |||
+ | |||
+ | <center>[[Image:SimpleArray.png]]<br>''The notebook (array) contains 12 pages (elements)''</center> | ||
+ | |||
+ | |||
+ | Arrays can also be multidimensional - instead of accessing an element of a one-dimensional list, elements are accessed by two or more indices, as from a matrix or tensor. | ||
+ | |||
+ | Multidimensional arrays are as simple as our notebook example above. To envision a multidimensional array, think of a calendar. Each page of the calendar, 1 through 12, is an element, representing a month, which contains approximately 30 elements, which represent days. Each day may or may not have information in it. Programmatically then, calendar(4,15) would refer to the 4th month, 15th day. Thus we have a two-dimensional array. To envision a three-dimensional array, break each day up into 24 hours. Now calendar(4,15,9) would refer to 4th month, 15th day, 9th hour. | ||
+ | |||
+ | |||
+ | <center>[[Image:MultidimensionalArray.png]]<br>''A simple 6 element by 4 element array''</center> | ||
+ | |||
+ | |||
+ | |||
+ | |||
+ | <br> | ||
+ | === Address Calculation Example: === | ||
+ | Given A[ 4, 3 ], which has 4 rows and 3 columns. Suppose this is an array of ints, so each slot in the array takes 4 bytes. If the array is stored in column major order beginning at address 100, then the memory layout is the following. | ||
(..need diagram..) | (..need diagram..) |
Revision as of 21:47, 9 April 2009
← Memory Allocation | ↑ Contents: CS2 | Concept of an Element (Node) → |
All languages start with primitive elements such as characters, floating point and fixed point numbers. While these hold data and require memory to store, they are not normally considered data structures.
Because many programming languages provide an array structure and a record structure, these are often considered primitive structures also. So, we begin with a short look at both an array and a record.
Arrays
An array is a homogeneous collection of data that is stored sequentially in memory. An array has fast random access because of this: Since the data is homogeneous, I know the size, in bytes, of each entry in the array. Since the storage is sequential, if I know the starting address of the array, it is easy to calculate the address of any slot. Most languages provide an array as a basic data type. Many languages also allow for multidimensional arrays.
An array is a particular method of storing elements of indexed data. Elements of data are stored sequentially in blocks within the array. Each element is referenced by an index, or subscript.
The index is usually a number used to address an element in the array. For example, if you were storing information about each day in August, you would create an array with an index capable of addressing 31 values -- one for each day of the month. Indexing rules are language dependent, however most languages use either 0 or 1 as the first element of an array.
The concept of an array can be daunting to the uninitiated, but it is really quite simple. Think of a notebook with pages numbered 1 through 12. Each page may or may not contain information on it. The notebook is an array of pages. Each page is an element of the array 'notebook'. Programmatically, you would retrieve information from a page by referring to its number or subscript, i.e., notebook(4) would refer to the contents of page 4 of the array notebook.
The notebook (array) contains 12 pages (elements)
Arrays can also be multidimensional - instead of accessing an element of a one-dimensional list, elements are accessed by two or more indices, as from a matrix or tensor.
Multidimensional arrays are as simple as our notebook example above. To envision a multidimensional array, think of a calendar. Each page of the calendar, 1 through 12, is an element, representing a month, which contains approximately 30 elements, which represent days. Each day may or may not have information in it. Programmatically then, calendar(4,15) would refer to the 4th month, 15th day. Thus we have a two-dimensional array. To envision a three-dimensional array, break each day up into 24 hours. Now calendar(4,15,9) would refer to 4th month, 15th day, 9th hour.
A simple 6 element by 4 element array
Address Calculation Example:
Given A[ 4, 3 ], which has 4 rows and 3 columns. Suppose this is an array of ints, so each slot in the array takes 4 bytes. If the array is stored in column major order beginning at address 100, then the memory layout is the following.
(..need diagram..)
100 104 108 112 116 120 124 128 132 136 140 144
[0,0] [1,0] [2,0] [3,0] [0,1] [1,1] [2,1] [3,1] [0,2] [1,2] [2,2] [3,2]
To access location [2, 1] requires moving 1 column plus 2 more slots. Since a column is 4 slots or 16 bytes, then the address of [2, 1] is
100 + 1 column + 2 slots or 100 bytes + 4 slots * 4 bytes per slot + 2 slots * 4 bytes per slot or 100 bytes + 16 bytes + 8 bytes = 124 bytes.
However, to access the entire array we would typically write
for( int i = 0; i < 4; i ++ ) for( int j = 0; j < 3, j++ )
This gives the sequence: [0,0], [0,1], [0,2], [1,0], [1,1], [1,2], [2,1], [2,2], [3,0], [3,1], [3,2].
Is this a problem? Here is where the theoretical meets the practical. Consider a large array and paging.
Sparse arrays: A sparse array is one where more than half of it is empty. In this case is the array the best choice for the storage of the data?
Records
A record is a heterogeneous collection of data. The components of a record structure are usually referred to as fields. Many languages provide the ability to construct records. For example, a Java class that has no methods or has only accessor and mutator methods is a form of a record structure. In this case, the class instance variables would be the fields of the record.
Variant Records: A variant record is one where a single field can represent multiple data types, but only one type at a time. If you know what an overlay refers to in an operating system, a variant record is the same idea. That is, a variant record could have one field that might represent an int, or a double, or a boolean. It would have enough memory allocated to it to represent the biggest element, in this case a double, but only one component could be represented at a time.
CS2: Data Structures
Theory of Computation - ADT Preliminaries
Linear ADTs - Tree ADTs - Graph ADTs - Unordered Collection ADTs