EECS 560 Project 2

Due dates: November 2, 2010 and November 16, 2010

Note there are two due dates for this project. The first due date is for the proof that all your structures have been correctly implemented. The second due date is for the report, which includes the data generated by running the timing tests on your structures and your analysis of that data. For this project the report will be a significant portion of your grade.

Description: In this project you will compare several different data structures that can be used to implement priority queues. That is, you will "experiment" with various structures and determine which one performs best under which circumstances. This requires that you generate random data sets to compare the efficiencies of the various structures for the different operations, and then run timing tests on those randomly generated data sets. The data structures to be tested are d-heaps, binary heaps, min-max heaps, binomial queues, and skew heaps. The random generation process is described below. The most important thing in testing is to be sure that the data set is identical for each type of data structure, and that all tests of the same size for the same data structure are done on different data sets. Thus, you might find it easiest to do the runs for each structure in a separate program.

Part A: In this part you will look at d-heaps for various values of d and compare the efficiency of the insert and deletemin operations for d = 3, 5, and 9. You will build the d-heaps for various values of n by randomly generating a total of n values between 1 and 4n. There will be duplicate values, but that's O.K. Use the following values of n: 25,000, 50,000, 100,000, and 200,000. Build the d-heaps using a modification of buildheap. Do not include the time required to build the initial structure in your timing tests.
Part B: In the second part of the project you will compare the skew heaps, min-max heaps, and binomial queues with the minheap. To make things easy, use buildheap to get the original structure for the minheap and the skew heaps. Thereafter use the insert and deletemin operations defined for that structure. The original data is to be generated exactly as in Part A, using n = 50,000, 100,000, 200,000, and 400,000. On the web page is a paper entitled "Min-Max Heaps and Generalized Priority Queues" by Atkinson et al. from the Communications of the ACM, Volume 29, Number 10, p. 996. Use that paper as the reference for the min-max heap. For the binomial queue, you will have to determine how you want to "organize" the trees in the queue. If you are in doubt whether your method is appropriate, check with Chris or me.

Using a random number generator: To ensure that the timing tests are "fair," you must use the same seed for the random number generator for each structure. Note: you only seed the random number generator once, at the beginning of a run. If you reseed it during the timing tests you will effectively be testing the same data set each time.

Structuring the timing tests: After building the initial heap, here is how to generate your test data for a random sequence of operations.

1. Generate a random integer between 2n and 5n. This will be the number of operations to perform. Call it M.
2. Perform these steps M times:
   a. Generate a random number x such that 0 <= x <= 1.
   b. If 0 <= x < 0.5, perform a deletemin operation. If 0.5 <= x <= 1, generate a random integer y such that 1 <= y <= 4n and insert y.

Then generate the data to initialize a new structure and repeat the process above. To get the average time, you must run a minimum of 10 timing tests for each structure for each value of n. To get better timing results, you may want to run even more tests for the smaller data sets. Do all tests on lists of the same size in the same run of a program.
Bonus part: This part is optional, so make sure everything else is working properly before attempting it. To try to experimentally determine the complexity of the two individual operations, you will first generate heaps of the indicated size. Then, instead of randomly generating operations, do the following:

1. Randomly generate n/2 integers between 2n and 5n and insert each into the heap structure as soon as it is generated. (Thus, you're testing the insert operation.)
2. Perform n/2 deletemin operations.

Again, run a minimum of 10 tests for each structure for each value of n. Get separate timing results for each of the two operations in order to determine the complexity of each operation for each structure.

Requirements:

1. To ensure that the timing tests are "fair," use the same seed for the random number generator for each of the heap variations. To get accurate timing you may need to run more than ten tests on the short lists. Record and report both the number of operations for each data set (i.e., the M value) and the average number of deletemin operations and insertions for each group of tests. For example, suppose you run 20 tests on heaps initially containing 25,000 integers. Report the results from each of the 20 tests and also the average number of deletemin and insert operations.

2. You must clearly show that each of your operations is working correctly. (This must be done before you do the timing.) It will probably be easiest to do this in a separate program, so that additional code can be used to print out information about the heaps and/or to read from a data file. For example, after each deletemin and insertion, print out the heap. Use relatively small randomly generated data sets (size between 20 and 50) to illustrate the correctness of your implementations. E-mail me the code for the correctness tests when you turn in the written report. All code must run on the department's linux machines.
To turn in: A two-part written report that includes:

Part 1 (Due November 1):
a. A thorough description of how your data structures were implemented, including any problems you ran into.
b. A hard copy of the code used for the correctness verification.
c. All of the data (test cases, results of operations, etc.) gathered to prove the correctness of the implementation. Present the data so that each heap structure is clear. At a minimum, there should be the initial set of data for each structure, the heap built from that data, and at least two of each operation (inserts and deletemins). These must be data output by the computer, not put in by hand. For the minheap, an array is an appropriate way in which to present the correctness results. For d-heaps, use d = 5 in your correctness testing. For the skew heaps, min-max heaps, and binomial queues you will need to devise a representation of the structure. Please draw (by hand is fine) a picture of the corresponding structure next to the output for that structure.

Part 2 (Due November 16): Turn in the following:
a. A complete description of how your testing was done. This should include the number of tests, when the timing was started and stopped for each data set, any problems you ran into, etc.
b. A discussion of your Part A results and your conclusions about which value of d gives the best performance, with an explanation of why.
c. A tabular summary of all the data obtained for each set of tests for the heap variations in Part B. That is, for each structure, for each data set size, for each set of data, report the number of insertions, the number of deletemin operations, and the time taken for that test. The last line in each table should be the average of the numbers reported in that table. One possible table structure is:

   Set size 100,000, structure minheap
   Test number | # of insertions | # of deletemins | time

d. A thorough analysis of your data and the conclusions you can draw from it.
For example, use the data gathered to estimate the complexity of the insert and deletemin operations for your programs, and see how well your results agree with what the complexity should be. Try to explain any anomalies, i.e., data that doesn't seem to match the predicted performance.