EECS 560
Project 2
Due dates: November 2, 2010
November 16, 2010
Note there are two due dates for this project. The first due date is for the proof that all your
structures have been correctly implemented. The second due date is for the report which
includes the data generated by running the timing tests on your structures and your analysis of
that data. For this project the report will be a significant portion of your grade.
Description: In this project you will compare several different data structures that can be
used to implement priority queues. That is, you will “experiment” with various structures and
determine which one performs best under what circumstances. This will require that you
generate random sets of data to compare the efficiencies of the various structures for the
different operations and then run timing tests on the randomly generated data sets. The
data structures to be tested are d-heaps, binary heaps, minmax heaps, binomial queues, and
skew heaps. The random generation process is described below. The most important thing
in testing is to be sure that the set of data is identical for each type of data structure and that all
tests of the same size for the same data structure are done on different data sets. Thus, you
might find it easiest to do the runs for each structure in a separate program.
Part A: In this part you are to look at d-heaps for various values of d and compare the
efficiency of the insert and deletemin operations for the following d values: 3, 5, and 9. You
will build the d-heaps for various values of n by randomly generating a total of n values
between 1 and 4n. There will be duplicate values, but that’s O.K. Use the following values of
n: 25,000, 50,000, 100,000, and 200,000. Build the d-heaps by using a modification of
buildheap. Do not include the time required to build the initial structure in your timing tests.
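As a concrete starting point, here is a minimal sketch of an array-based d-heap in C++. The
class name and its interface are illustrative assumptions, not a required design: the children
of the node at index i live at indices d*i+1 through d*i+d, and buildheap percolates down
from the last internal node. With this layout, switching among d = 3, 5, and 9 is a constructor
argument rather than a code change.

    #include <utility>
    #include <vector>

    class DHeap {                                  // array-based min d-heap (sketch)
    public:
        DHeap(int d, const std::vector<int>& items) : d_(d), a_(items) {
            // buildheap: percolate down every internal node, last one first
            for (int i = (static_cast<int>(a_.size()) - 2) / d_; i >= 0; --i)
                percolateDown(i);
        }
        void insert(int x) {                       // climb toward the root
            a_.push_back(x);
            for (int i = static_cast<int>(a_.size()) - 1;
                 i > 0 && a_[(i - 1) / d_] > a_[i]; i = (i - 1) / d_)
                std::swap(a_[i], a_[(i - 1) / d_]);
        }
        int deleteMin() {                          // assumes the heap is nonempty
            int min = a_[0];
            a_[0] = a_.back();
            a_.pop_back();
            if (!a_.empty()) percolateDown(0);
            return min;
        }
        bool empty() const { return a_.empty(); }
    private:
        void percolateDown(int i) {                // up to d comparisons per level
            int n = static_cast<int>(a_.size());
            for (;;) {
                int smallest = i;
                for (int c = d_ * i + 1; c <= d_ * i + d_ && c < n; ++c)
                    if (a_[c] < a_[smallest]) smallest = c;
                if (smallest == i) return;
                std::swap(a_[i], a_[smallest]);
                i = smallest;
            }
        }
        int d_;
        std::vector<int> a_;
    };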
Part B: In the second part of the project you will compare the skew heaps, minmax heaps and
binomial queues with the minheap. To make things easy, use buildheap to get the
original structure for the minheap and the skew heaps. Thereafter, use the insert and deletemin
operations defined for each structure. The original data is to be generated exactly as in Part A
using n = 50,000, 100,000, 200,000, and 400,000.
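For the skew heap, everything reduces to merge; the following is one possible sketch (the node
type and function names are my own, not mandated by the project). A buildheap-style
initialization can be obtained by merging the n singleton nodes pairwise. The merge is written
recursively for clarity; an iterative version avoids deep recursion on long right paths.

    #include <utility>

    struct SkewNode {
        int key;
        SkewNode* left  = nullptr;
        SkewNode* right = nullptr;
        explicit SkewNode(int k) : key(k) {}
    };

    // Merge two skew heaps. The unconditional child swap on the merge
    // path is what yields the O(log n) amortized bound.
    SkewNode* merge(SkewNode* a, SkewNode* b) {
        if (!a) return b;
        if (!b) return a;
        if (b->key < a->key) std::swap(a, b);  // keep the smaller root on top
        a->right = merge(a->right, b);
        std::swap(a->left, a->right);          // the "skew" step
        return a;
    }

    SkewNode* insert(SkewNode* root, int x) { return merge(root, new SkewNode(x)); }

    SkewNode* deleteMin(SkewNode* root) {      // assumes root != nullptr
        SkewNode* rest = merge(root->left, root->right);
        delete root;
        return rest;
    }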
On the web page is a paper entitled “Min-Max Heaps and Generalized Priority Queues” by
Atkinson et al. in the Communications of the ACM, Volume 29, Number 10, p. 996. Use
that paper as the reference for the min-max heap.
For the binomial queue, you will have to determine how you want to “organize” the trees in the
queue. If you are in doubt about whether your method is appropriate, check with Chris or me.
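One common organization, offered only as a possibility to adapt or reject: keep a vector of
root pointers indexed by tree rank, so slot k holds either null or the root of a binomial tree
B_k; merging two queues then mimics binary addition, with the link step below playing the
role of a carry.

    #include <utility>
    #include <vector>

    struct BinomialNode {
        int key;
        BinomialNode* firstChild  = nullptr;   // children linked in decreasing rank
        BinomialNode* nextSibling = nullptr;
        explicit BinomialNode(int k) : key(k) {}
    };

    struct BinomialQueue {
        // trees[k] is either nullptr or the root of a rank-k binomial tree,
        // so the pattern of non-null slots is the binary representation of n.
        std::vector<BinomialNode*> trees;
    };

    // Combine two rank-k trees into one rank-(k+1) tree: the larger root
    // becomes the leftmost child of the smaller (the "carry" during a merge).
    BinomialNode* link(BinomialNode* a, BinomialNode* b) {
        if (b->key < a->key) std::swap(a, b);
        b->nextSibling = a->firstChild;
        a->firstChild  = b;
        return a;
    }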
Using a random number generator
To ensure that the timing tests are “fair,” you must use the same seed for the random number
generator for each structure. Note: you only seed the random number generator once, at the
beginning of a run. If you reseed it during the timing tests you will effectively be testing the
same data set each time.
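For example, with the C++ <random> library (one reasonable choice; rand()/srand() would
follow the same discipline) the engine is constructed with its seed exactly once and shared by
everything in the run:

    #include <random>

    int main() {
        std::mt19937 rng(560);   // seed chosen once, here; never reseeded below

        // Every structure's build data, operation counts, and operation
        // arguments are drawn from this one engine, so each successive
        // test consumes a fresh stretch of the same random stream.
        std::uniform_int_distribution<int> value(1, 4 * 50000);  // 1..4n for n = 50,000
        int example = value(rng);
        (void)example;
        return 0;
    }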
Structuring the timing tests
After building the initial heap, here's the way to generate your test data for a random sequence
of operations.
1. Generate a random integer between 2n and 5n. This will be the number of operations to
perform. Let's call it M.
2. Perform these steps M times:
a. Generate a random number x such that 0 ≤ x ≤ 1.
b. If 0 ≤ x < 0.5, perform a deletemin operation.
   If 0.5 ≤ x ≤ 1, generate a random integer y such that 1 ≤ y ≤ 4n and insert y.
Then, generate the data to initialize a new structure and repeat the process above. To get the
average time, you must run a minimum of 10 timing tests for each structure for each value of n.
To get better timing results, you may want to run even more tests for the smaller data sets. Do
all tests on lists of the same size in the same run of a program.
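Putting steps 1 and 2 together, one run of the test might look like the sketch below. Heap and
its members insert/deleteMin/empty are a hypothetical interface standing in for whichever
structure is under test, and the empty-heap guard is a defensive assumption the assignment
leaves unspecified.

    #include <chrono>
    #include <random>
    #include <vector>

    template <typename Heap>
    double timeOneRun(int n, std::mt19937& rng) {
        // Build the initial heap from n random values in 1..4n (not timed).
        std::uniform_int_distribution<int> value(1, 4 * n);
        std::vector<int> initial(n);
        for (int& v : initial) v = value(rng);
        Heap heap(initial);    // assumed buildheap-style constructor

        // Step 1: M, the number of operations, is random in [2n, 5n].
        int M = std::uniform_int_distribution<int>(2 * n, 5 * n)(rng);

        // Step 2: M coin-flipped operations, and only these are timed.
        std::uniform_real_distribution<double> coin(0.0, 1.0);
        auto start = std::chrono::steady_clock::now();
        for (int i = 0; i < M; ++i) {
            if (coin(rng) < 0.5) {
                if (!heap.empty()) heap.deleteMin();  // guard: heap could drain
            } else {
                heap.insert(value(rng));
            }
        }
        auto stop = std::chrono::steady_clock::now();
        return std::chrono::duration<double>(stop - start).count();
    }

Averaging at least ten return values of timeOneRun per structure per value of n yields the
times the report asks for.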
Bonus part: This part is optional, so make sure everything else is working properly before
doing this part.
To test the individual operations and experimentally estimate the complexity of each one, you
will first generate heaps of the indicated sizes. Then, instead of randomly generating a mix of
operations, do the following:
1. Randomly generate n/2 integers between 2n and 5n and insert each into the heap structure
as soon as it is generated. (Thus, you’re testing the insert operation.)
2. Perform n/2 deletemin operations.
Again, run a minimum of 10 tests for each structure for each value of n. Get separate timing
results for each of the two operations in order to try to determine the complexity of each
operation for each structure.
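Timed separately, the two bonus phases might look like this (same hypothetical heap interface
as in the earlier sketch):

    #include <chrono>
    #include <random>

    template <typename Heap>
    void timeBonusPhases(Heap& heap, int n, std::mt19937& rng,
                         double& insertSeconds, double& deleteSeconds) {
        std::uniform_int_distribution<int> value(2 * n, 5 * n);

        auto t0 = std::chrono::steady_clock::now();
        for (int i = 0; i < n / 2; ++i)   // phase 1: n/2 inserts, timed alone
            heap.insert(value(rng));
        auto t1 = std::chrono::steady_clock::now();
        for (int i = 0; i < n / 2; ++i)   // phase 2: n/2 deletemins, timed alone
            heap.deleteMin();
        auto t2 = std::chrono::steady_clock::now();

        insertSeconds = std::chrono::duration<double>(t1 - t0).count();
        deleteSeconds = std::chrono::duration<double>(t2 - t1).count();
    }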
Requirements:
1. To ensure that the timing tests are “fair,” use the same seed for the random number
generator for each of the heap variations. To get accurate timing you may need to run
more than ten tests on the short lists. Record and report both the number of operations for
each data set (i.e. the M value) as well as the average number of deletemin operations
and insertions for each group of tests. For example, suppose you run 20 tests on heaps initially
containing 25,000 integers. Report the results from each of the 20 tests and also the
average number of deletemin and insert operations.
2. You must clearly show that each of your operations is working correctly. (This must be
done before you do the timing.) It will probably be easiest to do this in a separate program
so that additional code can be used to print out information about the heaps, and/or to read
from a data file. For example, after each deletemin and insertion, print out the heap. Use
relatively small randomly generated data sets (size between 20 and 50) to illustrate the
correctness of your implementations. E-mail me the code for the correctness tests when
you turn in the written report. All code must run on the department’s Linux machines.
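For the array-based heaps, a level-order dump makes the heap property easy to verify by eye;
a helper along these lines (hypothetical, not required) prints one level per line:

    #include <iostream>
    #include <vector>

    // Print an array-based d-heap one level per line so parent/child
    // relationships can be checked by eye.
    void printHeap(const std::vector<int>& a, int d) {
        std::size_t levelStart = 0, levelSize = 1;
        while (levelStart < a.size()) {
            for (std::size_t i = levelStart; i < levelStart + levelSize && i < a.size(); ++i)
                std::cout << a[i] << ' ';
            std::cout << '\n';
            levelStart += levelSize;
            levelSize *= static_cast<std::size_t>(d);
        }
    }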
To turn in:
A two-part written report that includes:
Part 1 (Due November 2):
a. A thorough description of how your data structures were implemented, including any
problems you ran into.
b. A hard copy of the code used for the correctness verification
c. All of the data (test cases, results of operations, etc.) gathered to prove the correctness
of the implementation. Present the data so that each heap structure is clear. At the
minimum, there should be the initial set of data for each structure, the heap built from
that data, and the results of at least two inserts and at least two deletemins. These must be
data output by the computer, not put in by hand. For the minheap, an array is an
appropriate way in which to present the correctness results. For d-heaps use d = 5 in
your correctness testing. For the skew heaps, minmax heaps and binomial queues you
will need to devise a representation of the structure. Please draw (by hand is fine) a
picture of the corresponding structure next to the output for that structure.
Part 2 (Due November 16): Turn in the following:
a. A complete description of how your testing was done. This should include the number
of tests, when the timing was started and stopped for each data set, any problems you
ran into, etc.
b. A discussion of your part A results and the conclusions about which value of d gives the
best performance, and an explanation of why.
c. A tabular summary of all the data obtained for each set of tests for the heap variations in
part B. So, for each structure, for each data set size, for each set of data, report the
number of insertions, number of deletemin operations and the time taken for this test.
The last line in each table should be the average of the numbers reported in that table.
One possible table structure is this:
Set size 100,000, structure minheap
Test number | # of insertions | # of deletemins | time
d. A thorough analysis of your data and conclusions you can draw from it. For example,
use the data gathered to estimate the complexity of the insert and deletemin operations
for your programs and see how well your results agree with what the complexity should
be. Try to explain any anomalies—i.e. data that doesn’t seem to match the predicted
performance.
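As a sanity check on the complexity estimates: under an O(log n) model, doubling n from
100,000 to 200,000 should multiply the average time per operation by roughly
log2(200,000) / log2(100,000) ≈ 17.6 / 16.6 ≈ 1.06, i.e., about a 6% increase. Per-operation
times that grow much faster than that across the four data sizes would be an anomaly worth
explaining; times that barely grow at all might instead suggest amortized constant-time
behavior for that operation (for example, binomial queue insertion).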