Notes C:
Additional Topics

Math 61 -- D. C. Smolarski, S.J.
Santa Clara University, Department of Mathematics and Computer Science


Contents

    Dynamic Data Structures
    Parallel Computers and Data Structures
    Intelligent Choice of Data Structures
    Miscellaneous

Dynamic Data Structures

An older view of programming held that there were two aspects to every running program:
                       static part  -- data structures
                      /
     running program
                      \
                       dynamic part -- program code

This division, however, was always considered imperfect: the two parts overlap, since the data can influence how the program runs and so can be said to have a dynamic aspect to it.

Some work has been done on dynamic data structures (cf. Knuth, Vol. 3), i.e., structures in which the stored information may regularly change location. Specific research has examined how such dynamic structures affect running programs (cf. Robert E. Tarjan's Turing Award lecture, "Algorithm Design," Communications of the ACM 30, no. 3 (March 1987), pp. 205-212).

In addition, in a language like LISP, where data can become part of the program, the distinction between program and data is irrelevant. Data is Program ==> Data is dynamic.

Comparison Example

(A) Given this array
	__1___2___3___4___5___6___7___8___9__
	|_1_|_2_|_3_|_4_|_5_|_6_|_7_|_8_|_9_|
the associated "decision tree" for binary search would be
                    5
                  /   \
                2       7
               / \     / \
              1   3   6   8
                   \       \
                    4       9

(At each node, move left if the value sought is smaller than the node's value, right if it is larger.)
For example, if you were searching for 4, binary search would first compare against 5, then against 2, then against 3, then finally against 4.

Now, suppose we search for and access records in this array 50 times, such that

    4    is accessed	25 times       =  25 times total
    9    is accessed	 8 times       =   8 times total
 1,3,6,8 are accessed	 3 times each  =  12 times total
   2,7   are accessed	 2 times each  =   4 times total
    5	 is accessed	  once	       =   1 time  total
				       ------
				          50 times total
The number of comparisons would be:

4 comparisons to access 4:        4 x 25     = 100 comparisons
4 comparisons to access 9:        4 x 8      =  32 comparisons
3 comparisons to access 1,3,6,8:  3 x 3 x 4  =  36 comparisons
2 comparisons to access 2,7:      2 x 2 x 2  =   8 comparisons
1 comparison  to access 5:        1 x 1      =   1 comparison
                                             ----------------
                                              177 comparisons

Average = 177/50 = 3.54 comparisons per access.
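
As a check on this arithmetic, here is a minimal sketch in C (the notes themselves contain no code, so the language and names are illustrative), charging one comparison per node probed in the decision tree:

    #include <stdio.h>

    /* Binary search over a sorted array, counting key comparisons:
       one comparison is charged per node probed in the decision tree. */
    int search_count(const int a[], int n, int key, int *comps)
    {
        int low = 0, high = n - 1;
        while (low <= high) {
            int mid = (low + high) / 2;
            (*comps)++;                     /* one probe of a[mid] */
            if (key == a[mid]) return mid;
            if (key <  a[mid]) high = mid - 1;
            else               low  = mid + 1;
        }
        return -1;                          /* not found */
    }

    int main(void)
    {
        int a[] = {1, 2, 3, 4, 5, 6, 7, 8, 9};
        /* freq[k] = number of times the value k+1 is accessed (table above) */
        int freq[] = {3, 2, 3, 25, 1, 3, 2, 3, 8};
        int total = 0;
        for (int k = 0; k < 9; k++) {
            int c = 0;
            search_count(a, 9, k + 1, &c);
            total += c * freq[k];
        }
        printf("%d comparisons, average %.2f\n", total, total / 50.0);
        /* prints: 177 comparisons, average 3.54 */
        return 0;
    }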

(B) Suppose the array were arranged in this order:

	__1___2___3___4___5___6___7___8___9__
	|_4_|_9_|_1_|_3_|_6_|_8_|_2_|_7_|_5_|
and you do a linear search!

Suppose we access the array in the same way as before (for a total of 50 accesses).

The number of comparisons would be:

1 comparison  to access 4:  1 x 25  =  25 comparisons
2 comparisons to access 9:  2 x 8   =  16 comparisons
3 comparisons to access 1:  3 x 3   =   9 comparisons
4 comparisons to access 3:  4 x 3   =  12 comparisons
5 comparisons to access 6:  5 x 3   =  15 comparisons
6 comparisons to access 8:  6 x 3   =  18 comparisons
7 comparisons to access 2:  7 x 2   =  14 comparisons
8 comparisons to access 7:  8 x 2   =  16 comparisons
9 comparisons to access 5:  9 x 1   =   9 comparisons
                                      ---------------
                                       134 comparisons

Average = 134/50 = 2.68 comparisons per access.

The number of comparisons has been reduced by 43, which is about a 25% reduction.
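
The same check for the frequency-ordered linear search (again a sketch in C; here finding a[i] simply costs i+1 comparisons):

    #include <stdio.h>

    int main(void)
    {
        /* the array ordered by access frequency, and its access counts */
        int a[]    = {4, 9, 1, 3, 6, 8, 2, 7, 5};
        int freq[] = {25, 8, 3, 3, 3, 3, 2, 2, 1};
        int total  = 0;
        for (int i = 0; i < 9; i++)
            total += (i + 1) * freq[i];     /* finding a[i] costs i+1 probes */
        printf("%d comparisons, average %.2f\n", total, total / 50.0);
        /* prints: 134 comparisons, average 2.68 */
        return 0;
    }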

(C) Even the second example is somewhat static. Suppose our algorithm were such that each time an element is accessed, it is moved to the front of the array, with all other elements shifting over one. In this case, the data structure itself would be constantly in flux.

Over the long run, this approach could be faster than binary search, and it does not require that we know the access frequencies in advance, as the second method shown above requires.

In addition, this approach may actually speed up certain subsections of the program. For example, suppose 9 is accessed 5 times in a row and then ignored. The first access would require the most work, but the next 4 accesses would require only one comparison each.
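
A minimal sketch of this move-to-front scheme in C (the function name and array representation are illustrative assumptions, not the notes' own):

    /* Search for key in a[0..n-1]; on success move it to the front,
       shifting the intervening elements right by one.  Returns the
       element's new position (0), or -1 if the key is absent. */
    int mtf_search(int a[], int n, int key)
    {
        for (int i = 0; i < n; i++) {
            if (a[i] == key) {                /* found after i+1 comparisons */
                for (int j = i; j > 0; j--)   /* shift a[0..i-1] right */
                    a[j] = a[j - 1];
                a[0] = key;                   /* accessed element to the front */
                return 0;
            }
        }
        return -1;                            /* not found; array unchanged */
    }

After mtf_search(a, 9, 9) on the array above, 9 sits in the first position, so each of the next four searches for it costs a single comparison.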

We can also ask the same questions about efficiency when the underlying data structure is a tree. Perhaps balanced binary trees are not the ideal structure; perhaps a tree should be restructured by moving an item toward the root each time it is accessed.
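
This is essentially the idea behind the self-adjusting "splay trees" of Sleator and Tarjan. A simpler version of the idea, sketched below in C under assumed node and key types, rotates an accessed item all the way to the root with single rotations:

    #include <stddef.h>

    typedef struct node {
        int key;
        struct node *left, *right;
    } Node;

    /* Search for key in a binary search tree and, if found, bring it to
       the root of the (sub)tree with single rotations ("move to root").
       Returns the new root of the subtree. */
    Node *move_to_root(Node *t, int key)
    {
        if (t == NULL || t->key == key)
            return t;
        if (key < t->key) {
            t->left = move_to_root(t->left, key);
            if (t->left != NULL && t->left->key == key) {
                Node *x = t->left;      /* rotate x up over t (right rotation) */
                t->left = x->right;
                x->right = t;
                return x;
            }
        } else {
            t->right = move_to_root(t->right, key);
            if (t->right != NULL && t->right->key == key) {
                Node *x = t->right;     /* rotate x up over t (left rotation) */
                t->right = x->left;
                x->left = t;
                return x;
            }
        }
        return t;                       /* key not found below; tree unchanged */
    }

Splay trees refine this scheme with double rotations to obtain good amortized bounds.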

Parallel Computers and Data Structures

The new parallel computers being developed can drastically alter our approach to data structures and associated algorithms. As one example, take the problem of searching, and the concept of a "parallel search" algorithm.

If you had 64 processors and an array of 64 items, you could have each processor compare the desired KEY against one item of the array AT THE SAME TIME. Each processor would then report true or false as its own bit of a common report word. The bits set in the report word would give the location(s) of the KEY.

This entire process would take merely one time cycle, no matter what the size of the array (so long as the number of processors is at least equal to the number of elements being searched). By contrast, the most efficient non-parallel searching scheme, binary search, is O(log n).
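
One way to sketch this idea on real hardware is with OpenMP (my choice, not the notes'): each loop iteration plays the role of one processor, and each sets its own bit of the common report word. Compile with -fopenmp; without it, the pragma is ignored and the loop simply runs sequentially.

    #include <stdio.h>

    int main(void)
    {
        int a[64], key = 42;
        for (int i = 0; i < 64; i++) a[i] = i;    /* sample data */

        unsigned long long report = 0;
        /* conceptually, 64 processors compare a[i] to key simultaneously */
        #pragma omp parallel for reduction(|:report)
        for (int i = 0; i < 64; i++)
            if (a[i] == key)
                report |= 1ULL << i;   /* processor i sets bit i on a match */

        if (report == 0) {
            printf("key not found\n");
        } else {
            for (int i = 0; i < 64; i++)
                if (report & (1ULL << i))
                    printf("key found at position %d\n", i);
        }
        return 0;
    }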

We have just begun to explore the possibilities that parallel computers offer us. No one knows for sure how data structures may change for use in parallel machines with truly parallel algorithms.

Intelligent Choice of Data Structures

People must be prudent in their use of data structures. Not every structure can be used in every application, and some applications may demand a variety of structures depending on the specific situation.

-->For example, take a square matrix, say 100 by 100. If almost all of its entries are non-zero, the "best" way to implement it is as a two-dimensional array. This demands 10,000 memory locations.

-->Suppose the matrix were symmetric about the main diagonal, i.e.,

    a(i,j) = a(j,i).

Then only (approximately) half the values can possibly differ and need to be stored. To be exact, there are

    n(n+1)/2 = n^2/2 + n/2

locations needed for an n by n matrix. For a 100 by 100 matrix, we would need only 5,050 memory locations, a considerable savings over the "full" amount given above. The question now becomes: how does a programmer store the elements and retrieve them easily, given the original indices?
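
One common answer is to pack the lower triangle, row by row, into a single vector. The sketch below (in C; the notes give no code, so the names and the 0-based vector are illustrative assumptions) maps the original 1-based indices to a slot in the packed vector:

    #define N 100

    /* Packed storage for a symmetric N x N matrix: only the lower
       triangle is kept, row by row, in N*(N+1)/2 = 5050 locations. */
    double packed[N * (N + 1) / 2];

    /* Map 1-based indices (i,j) to a slot: rows 1..i-1 of the lower
       triangle hold 1 + 2 + ... + (i-1) = (i-1)*i/2 entries before row i. */
    static int loc(int i, int j)
    {
        if (i < j) { int t = i; i = j; j = t; }  /* symmetry: a(i,j) = a(j,i) */
        return (i - 1) * i / 2 + (j - 1);
    }

    double sym_get(int i, int j)           { return packed[loc(i, j)]; }
    void   sym_set(int i, int j, double v) { packed[loc(i, j)] = v; }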

-->Suppose the matrix were tri-diagonal, i.e., zeroes everywhere except on the main diagonal and the two diagonals closest to it.

In this case we could store the matrix as three vectors, i.e., as a 3 by 100 matrix, using only (about) 300 spaces (compared to 10,000 or 5,050 as above)!

Here, we would have the correspondence

    a(i,i)   <==> A(2,I)   (main diagonal)
    a(i+1,i) <==> A(1,I)   (sub-diagonal)
    a(i,i+1) <==> A(3,I)   (super-diagonal)

In this case, storage and retrieval are easy, but the operations on matrices, e.g., matrix multiplication, may be difficult to program correctly.
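
A retrieval function for this scheme might look as follows (a sketch in C, with 0-based storage behind the 1-based indices of the correspondence above; the function name is my own):

    #define N 100

    double A[3][N];   /* A[0][*]: sub-diagonal, A[1][*]: main, A[2][*]: super */

    /* Return a(i,j) of the tri-diagonal matrix, 1-based indices.
       Everything off the three central diagonals is zero. */
    double tri_get(int i, int j)
    {
        if (i == j)     return A[1][i - 1];   /* a(i,i)   <==> A(2,I)        */
        if (i == j + 1) return A[0][j - 1];   /* a(i+1,i) <==> A(1,I), I = j */
        if (j == i + 1) return A[2][i - 1];   /* a(i,i+1) <==> A(3,I), I = i */
        return 0.0;                           /* off the three diagonals     */
    }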

-->Suppose the matrix were very sparse, i.e., > 80% of the elements were 0.

Given this situation, one might want to use linked lists, as described in various texts. Even with the extra space needed in each node for links to other nodes and for storing the subscripts, a great savings in space may be achieved.
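
A typical node in such a linked representation might be (a sketch; texts differ on details such as one list per row versus separate row and column links):

    /* One node per non-zero element; zero entries are simply not stored. */
    typedef struct elem {
        int row, col;          /* original subscripts of this entry */
        double value;          /* the non-zero value itself         */
        struct elem *next;     /* next non-zero element in the list */
    } Elem;

With more than 80% zeroes, a 100 by 100 matrix has fewer than 2,000 non-zero entries, so even three or four locations per node still beats the 10,000 needed by the full array.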

===>This example shows the complexity of taking an abstract data structure and implementing it concretely. We used one concept, a square matrix, yet four different implementations were discussed.

This points to the fact that data structures cannot be used blindly! We should not use more space than necessary, and we should not use a data structure that would unnecessarily make our algorithm's code inefficient!

Miscellaneous

There are many other concepts that can be studied in an introductory course. However, lack of time and the preferences of the instructor have determined that certain topics be covered, other topics be omitted, and some topics be briefly mentioned.

Some of these other concepts include the following:


This page is maintained by Dennis C. Smolarski, S.J.
© Copyright 1998, 1999, 2000 Dennis C. Smolarski, SJ, All rights reserved.
Last changed: 8 April 1999. Minor correction 21 February 2000.