National University of Ireland, Maynooth
Maynooth, Co. Kildare, Ireland.
Department of Computer Science Technical Report Series

Algorithm Analysis

Adam Duffy and Tom Dowling

NUIM-CS-TR-2003-10
http://www.cs.may.ie/
Tel: +353 1 7083847
Fax: +353 1 7083848

Algorithm Analysis
A Technical Report, Version 1.4
Adam Duffy and Tom Dowling
November 13, 2003

Abstract

Traditionally, the emphasis among computer scientists has been on the more rigorous and theoretical modes of worst-case and average-case analysis of algorithms. However, theoretical analysis cannot tell the full story about the real-world performance of algorithms. This has resulted in a growing interest in experimental analysis. This paper presents the development of a solution that aids in the experimental analysis of Java-based implementations.

1 Introduction

"Until recently experimental analysis of algorithms has been almost invisible in the theoretical computer science literature, although experimental work dominates algorithmic research in most other areas of computer science" [Johnson, 2002].

The author of the paper quoted above lists many pitfalls and pet peeves with regard to experimental analysis as practised in the field of computer science. This paper discusses the design and development of software to assist in the experimental analysis of the algorithm performance of Java-based implementations, meeting the standards listed in Johnson's paper.

We identify two roles with regard to performing experimental analysis of algorithms: the Experimenter and the Analyst. The Experimenter performs the experiments and the Analyst analyzes the results of the experiments. The role of the Experimenter can be automated, and is automated in this paper. The software developed, the Timing API, allows the Analyst to create the test data and input it to the Timing API. The API carries out the experiments and records the results for later analysis by the Analyst. This removes the tedium of developing software to test implementations, running the tests and recording the results of the tests, essentially carrying out the tasks of the Experimenter. We demonstrate the use of the Timing API by performing experimental analysis on a Polynomial Arithmetic API developed by the Crypto Group at NUI Maynooth [Burnett et al., 2003].

Furthermore, this paper discusses modifications to the Java Grande Forum (JGF) Sequential Benchmark [Bull et al., 2000a,b] to improve the reporting and interpretation of the benchmark results, as well as to allow the benchmark to be interrupted and restarted as desired. The JGF benchmark is used here to benchmark the Java Virtual Machine (JVM) so that comparisons of the same algorithms on different machines can be made.

The paper is structured as follows. Section 2 gives a brief overview of theoretical algorithm analysis. Following that, Section 3 discusses experimental algorithm analysis and some of the issues involved in performing such analysis. Sections 4 and 5 describe the design and implementation of the Timing API developed to automate the role of the Experimenter. The Java Grande Forum Benchmark is covered in Section 6, along with details of improvements made by the author to the benchmark. Section 7 provides an example illustrating the use of the Timing API on another Java-based API. Conclusions and future work are discussed in Section 8.

2 Theoretical Algorithm Analysis

The complexity of an algorithm is the cost of using the algorithm. Space complexity is the amount of memory needed.
Time complexity is the amount of computation time needed to run the algorithm. Complexity is not an absolute measure, but rather a bounding function characterizing the behavior of the algorithm as the size of the data set increases. This leads to the Big-Oh notation, which is briefly defined here. For a more comprehensive introduction, see [Wilf, 1994].

2.1 Upper Bound

A function T(n) is of O(f(n)) iff there exist positive constants k and n0 such that |T(n)| ≤ k|f(n)| for all n ≥ n0.

2.2 Lower Bound

A function T(n) is of Ω(f(n)) iff there exist positive constants k and n0 such that k|f(n)| ≤ |T(n)| for all n ≥ n0.

2.3 Tight Bound

A function T(n) is of Θ(f(n)) iff it is of O(f(n)) and it is of Ω(f(n)).

2.4 Relation Between Bounds

Some of the hierarchy of complexities is as follows:

    O(1) < O(log n) < O(n) < O(n log n) < O(n^2) < O(n^3) < O(2^n) < O(n!)
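As a brief worked illustration of these bounds (the particular function here is chosen for exposition and is not taken from the report): for T(n) = 3n^2 + 5n we have |T(n)| ≤ 8|n^2| for all n ≥ 1, so T(n) is of O(n^2) with k = 8 and n0 = 1; since 3|n^2| ≤ |T(n)| for all n ≥ 1, T(n) is also of Ω(n^2), and therefore of Θ(n^2).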
3 Experimental Algorithm Analysis

There are three different approaches to analyzing algorithms:

1. Worst-Case Analysis
2. Average-Case Analysis
3. Experimental Analysis

Johnson's paper discusses how best to perform experimental analysis of algorithms. The author discusses problems encountered when performing this type of analysis and how to overcome them. These include:

Pitfalls. Defined by Johnson as "temptations and practices that can lead experimenters into substantial wastes of time".

1. Lost code/data. Loss of the original code and/or data means the loss of the best approach to comparing it and future algorithms.

Pet Peeves. "Favourite annoyances" of Johnson. These are common practices that are misguided.

1. The millisecond testbed. Regardless of the resolution of timings, they cannot be more precise than the resolution of the underlying operating system.
2. The one-run study. A one-run study is where the tests are run once only, as opposed to the preferred method of running tests many times to reduce errors.
3. Using the best result found as an evaluation criterion. It is preferable to use the average result of a number of runs rather than the best result from those runs. This is related to Suggestion 1 below.
4. The uncalibrated machine. The raw speed of a processor is an unreliable predictor of performance. This Pet Peeve leads directly to Suggestion 2 below.
5. The lost testbed. Similar to Pitfall 1, this refers to the test data used in the experiments not being documented.
6. False precision. Related to Pet Peeve 1, this concerns the reporting of a timing without accounting for the resolution of the system clock.
7. Failure to report overall running time. The total time taken to run the tests, as opposed to the average running time.

Suggestions. Given by Johnson in order for experimenters and authors to avoid pitfalls and to improve the quality of results from experimental analysis.

1. Use variance reduction techniques. This simply means using the same testbed for each run of the experiments: use the same set of instances for all your tests.
2. Use benchmark algorithms to calibrate machine speeds. This allows future researchers to normalize their results and compare them to results obtained by the original researcher.

In response to the points above, the Timing API was created. The remaining unlisted points in the aforementioned paper relate to the selection of test data, the nature of the tests performed and the subsequent presentation of data.

4 The Timing API Design

The design of the Timing API is heavily influenced by existing implementations for performing unit tests.

4.1 Use Case Model

Use cases document the behaviour of the system from the user's point of view [Stevens and Pooley, 2000]. An individual use case, shown as a named oval, represents a kind of task which has to be done with support from the system under development. An actor, shown as a stick person, represents a user of the system. A user is anything external to the system which interacts with the system. The use case model (Figure 1) reveals that there are two actors in performing experimental analysis of algorithms. The first actor, the Experimenter, is the sole user of the Timing API. The tasks of the other actor, the Analyst, are outside the scope of the API.

[Figure 1: Use Case Model]

4.2 Class Model

Class models are used to document the static structure of the system [Stevens and Pooley, 2000]. Figure 2 contains the class model for the Timing API. The TimeRunner class is the driver class that loads in the configuration files and passes them on to the TimeConfiguration class. This class in turn creates a TimeSuiteConfiguration instance, which contains information such as the threshold and the stopwatch - both of which will be discussed in greater detail further on - to be used by the Timing API. TimeSuite instances contain the unit times. A unit time is defined below as the execution time of a single unit of code. Each unit time produces a TimeReport containing the results of the experiment on that unit time. The TimeReport instance will be recorded using result reporting.

[Figure 2: The Class Model for the Timing API]

5 The Timing API Implementation

The implementation consists of three main components:

- Timing Tests
- Timing Mechanism
- Result Reporting

Each of these components is discussed here in turn.

5.1 Timing Tests

Unit tests are so named because they test a single unit of code - typically a method or function. One or more unit tests are contained within a suite. Furthermore, suites can be contained within other suites as sub-suites. Similarly, we define a unit time to be the execution time of a single unit of code (Figure 3).

[Figure 3: Unit Tests and Unit Times]

Artima's SuiteRunner [Venners et al., 2003] is an architecture for writing unit tests that offers advanced facilities for reporting the results of the unit tests. It was hoped that an extension to SuiteRunner would be sufficient for the purposes of unit timing. Preliminary work demonstrated otherwise, as a significant number of the SuiteRunner classes are package protected, making extensions to it difficult. As a result, the Timing API code base was developed from scratch.

The Experimenter extends the unit time class TimeSuite to create a timing test as follows:

    import nuim.cs.crypto.tamull.TimeSuite;

    public class MyTimeSuite extends TimeSuite {
        public void timeMyMethod() {
            myMethod();
        }
    }

Note that the code unit - typically a method itself - being timed must be placed within a method of a particular signature with return type void. The signature of a method is a combination of the method's name and its parameter types. In this case, the signature is a method whose name begins with the letters time and which takes no parameters.

The Seek API (Figure 4) contains functionality for extracting all methods of a particular signature and return type. This functionality is contained within the MethodSeeker class. The signature of the method and the return type have to be specified by the developer using the API. Thus, the Seek API is used by the Timing API to extract the types of methods described in the preceding paragraph. See Appendix B.2.2 for an example of the Seek API in action.

[Figure 4: Seek API]
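To make the role played by the Seek API concrete, the following is a minimal sketch (not the actual MethodSeeker source; the class and method names here are hypothetical) of how methods matching the expected signature - a name beginning with time, no parameters and a void return type - could be located using the standard Java reflection facilities:

    // Illustrative sketch only: locating unit-time methods via reflection.
    import java.lang.reflect.Method;
    import java.util.ArrayList;
    import java.util.List;

    public class TimeMethodFinder {
        /** Returns the public methods of clazz that look like unit times. */
        public static List<Method> findTimeMethods(Class<?> clazz) {
            List<Method> matches = new ArrayList<Method>();
            for (Method m : clazz.getMethods()) {
                boolean named = m.getName().startsWith("time");
                boolean noArgs = m.getParameterTypes().length == 0;
                boolean returnsVoid = m.getReturnType() == void.class;
                if (named && noArgs && returnsVoid) {
                    matches.add(m);
                }
            }
            return matches;
        }
    }

For example, findTimeMethods(MyTimeSuite.class) would return the timeMyMethod method of the suite shown above.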
The ClassSeeker class is not directly used by the Timing API, but it can still play a role. For example, it may be used to find all the classes that are subclasses of the TimeSuite class. These classes can then be passed into the Timing API and the relevant methods extracted as explained before.

By saving the code and the input and output files used, the Experimenter addresses Johnson's Pet Peeve 5 of the lost testbed and Pitfall 1 of lost code/data. This allows for reproducibility of the results obtained and for comparisons to be made in the future against the implemented algorithms and possible improvements of the same algorithms. "Just as scientists are required to preserve their lab notebooks, it is still good policy to keep the original data if you are going to publish results and conclusions based on it" [Johnson, 2002].

5.2 Timing Mechanism

The Timing API utilizes the Stopwatch API in order to obtain the timings. The Stopwatch API (Figure 5) was developed to mimic the functionality of a conventional physical stopwatch. There are two types of stopwatches in the Stopwatch API:

- System Stopwatch
- Thread Stopwatch

[Figure 5: Stopwatch API]

It is worth noting that regardless of the resolution of the stopwatches (milliseconds for the System Stopwatch and nanoseconds for the Thread Stopwatch), they cannot be more precise than the underlying operating system. On Windows NT/2000 the granularity is 10 milliseconds and on Windows 98 it is 50 milliseconds. For more information see [java.sun.com, 2001]. A threshold parameter is used to ensure that all tests exceed the granularity of the operating system by a significant amount, answering Pet Peeve 1 of the millisecond testbed and Pet Peeve 6 of false precision. The test is run repeatedly until the threshold is reached. The total running time and the number of times the algorithm was executed are reported, not just the average running time. This requirement is in response to Pet Peeve 7 of failure to report overall running time.

5.2.1 System Stopwatch

This stopwatch uses the textbook approach to simple performance analysis in Java: calling the System.currentTimeMillis() method before and after the code to be measured. The difference between the two times gives the elapsed time. This is comparable to using a stopwatch when testing GUI activity, and works fine if elapsed time is really what you want. The downside is that this approach may include much more than your code's execution time. Time used by other processes on the system or time spent waiting for I/O can result in inaccurately high timing numbers.

5.2.2 Thread Stopwatch

This stopwatch is useful for obtaining elapsed timings for a thread. The Java Virtual Machine Profiler Interface (JVMPI) provides the correct information on CPU time spent in the current thread, accessed from within a Java program. This avoids the downside of using the System Stopwatch mentioned in the previous paragraph. This involves the use of the Java Native Interface (JNI) API. The basis for this particular stopwatch can be found at [Gørtz, 2003].
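The following is a minimal sketch of the threshold mechanism described in Section 5.2. It is not the Timing API's actual implementation: the class and method names are illustrative, the System Stopwatch behaviour is approximated directly with System.currentTimeMillis(), and the code unit is assumed to be wrapped in a Runnable. The unit is invoked repeatedly until the accumulated elapsed time exceeds the threshold, and both the total running time and the number of executions are reported.

    // Sketch of a threshold-based timing loop (illustrative names only).
    public class ThresholdTimer {
        /**
         * Runs the given code unit repeatedly until the elapsed wall-clock
         * time exceeds thresholdMillis, then prints total time and call count.
         */
        public static void time(Runnable codeUnit, long thresholdMillis) {
            long calls = 0;
            long start = System.currentTimeMillis();
            long elapsed = 0;
            while (elapsed < thresholdMillis) {
                codeUnit.run();           // the unit of code being timed
                calls++;
                elapsed = System.currentTimeMillis() - start;
            }
            // Report the overall running time and the number of executions,
            // not just the average (cf. Pet Peeve 7).
            System.out.println("total time (ms): " + elapsed + ", calls: " + calls);
            System.out.println("average (ms/call): " + ((double) elapsed / calls));
        }
    }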
5.3 Result Reporting

The Reporter API uses the Observer design pattern (Figure 6). Note that in the diagram the word Observable may be substituted for the word Subject. The intent of the design pattern is to define a one-to-many dependency between objects so that when one object changes state, all its dependents are notified and updated automatically. More details of the design pattern can be found in [Cooper, 1998].

[Figure 6: Class Diagram of Observer Pattern]

In the case of the Reporter API, the Observer is the DispatchReporter and the Subjects are the Reporters. A message is sent to the DispatchReporter, which then relays that message to the individual Reporters, allowing the same message to be reported at the same time in many different media (Figure 7). Examples of such reporters are:

- Console Display
- GUI Display
- Flat Text File
- XML File
- Relational Database

[Figure 7: Reporter API]

The Timing API uses the Reporter API to record the timing results of the tests and of the benchmarks. Good reporting avoids Pitfall 1 of lost code/data, and repeated runs of the tests address Pet Peeve 2 of the one-run study and Pet Peeve 3 of using the best result found as an evaluation criterion. By using the Observer design pattern, the results can be simultaneously displayed on screen and stored in one or more files of differing formats. Displaying the results on screen provides feedback to the Experimenter as to the progress of the tests, see Figure 8. The Reporter API allows the Experimenter to select the desired file format, such as flat file, XML, CSV (Comma-Separated Values), etc., without specific code being included in the Timing API for each format.

[Figure 8: GUI Display of the Timing API]

5.4 Timing API Extension

There are two factors when timing a unit of code: the algorithm and the size of the input to the algorithm. In cases where a large number of algorithms and/or input data sets are being timed, it is prohibitive to write code for each timing to be performed. To deal with this, the Timing API has been extended to make use of the Reflection API [Green, 2003] to allow the methods and input data to be specified in an XML file. Each entry in the XML file is processed, and the corresponding timing is performed and recorded. See Appendix B.2 for more details.
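As a rough sketch of the kind of reflective call this extension relies on (the class names, method names and constructor choice below are hypothetical and are not taken from the Timing API; the comments only indicate which XML elements from Appendix B.2.3 would supply each value), a declared class can be instantiated and a named method invoked on it as follows:

    // Hypothetical sketch of reflective instantiation and invocation.
    import java.lang.reflect.Method;

    public class ReflectiveTimingSketch {
        /**
         * Instantiates className using a single-String constructor, then
         * invokes the named no-argument method on it, returning the elapsed
         * wall-clock time in milliseconds.
         */
        public static long timeOnce(String className, String ctorArg,
                                    String methodName) throws Exception {
            Class<?> clazz = Class.forName(className);          // <class> element
            Object instance = clazz.getConstructor(String.class)
                                   .newInstance(ctorArg);       // <instance> element
            Method method = clazz.getMethod(methodName);        // <method> element
            long start = System.currentTimeMillis();
            method.invoke(instance);                            // <time> element
            return System.currentTimeMillis() - start;
        }
    }

In the real API the threshold loop of Section 5.2 and the Reporter API would wrap such a call; this sketch only shows the reflective core.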
6 Benchmarks

There are many Java Virtual Machine benchmarks, such as the SPEC JVM98 and the Java Grande Forum (JGF) Sequential Benchmark. The JGF benchmark was selected due to the availability of the source code, which allows users to understand exactly what the benchmark is testing. Also, the JGF benchmark is available free of charge.

The purpose of the JGF Benchmarks is to test aspects of Java execution environments [Bull et al., 2000a]. While there are several different JGF Benchmarks, the Sequential Benchmark (version 2.0) is sufficient for timing tests related to sequential, as opposed to multi-threaded or distributed, software. Version 2.0 of the Sequential Benchmark - as developed by the EPCC of the University of Edinburgh on behalf of the JGF - consists of three sections:

1. Low Level Operations. This section is designed to test low-level operations such as the performance of arithmetic operations, of creating objects and so on.
2. Kernels. Created for testing short codes likely to be found in Java Grande applications, such as the Fast Fourier Transform.
3. Applications. This section is intended to be representative of Grande applications by containing actual Grande applications. A Grande application is one which uses large amounts of processing, I/O bandwidth, or memory.

In this instance, the JGF Sequential Benchmark is used to calibrate the machine before running timing tests, see Pet Peeve 4 (the uncalibrated machine) and Suggestion 2 (use benchmark algorithms to calibrate machine speeds). "Future researchers can then calibrate their own machines in the same way and, based on the benchmark data reported in the original paper, attempt to normalize the old results to their current machines" [Johnson, 2002].

However, some alterations were necessary to the JGF Sequential Benchmark in order to satisfy Johnson's requirements for experimental analysis. The shortcomings were:

1. Lack of configuration. The Experimenter had to run each suite or section separately, as a driver class was required for each. Running a custom-made selection of the benchmark tests required the development and compilation of new code - a cumbersome and overly complicated approach.
2. Insufficient reporting of results. The results of the benchmark tests were reported to the console only. It was necessary to cut-and-paste the results into a text file before any analysis could be performed. This may result in the loss of data.
3. Result interpretation. Only the performance value of each benchmark test is reported on completion of the test. The performance value is a derived value, based on the time and opcount values. This approach is not in keeping with scientific standards of observation and measurement.
4. Interrupt unfriendly. If the benchmark were interrupted due to, say, a power failure, the tests would have to be restarted from the beginning. Furthermore, the machine that the benchmark is running on is off-limits until the completion of the benchmark, as other running processes may affect the result. This is a problem considering that the entire benchmark can take several hours to complete.

6.1 About JGF Numbers

For each section of the benchmark suite we calculate an average performance relative to a reference system, the result of which is referred to as the JGF number. This system, as selected by the authors of Java Grande, is a Sun Ultra Enterprise 3000, 250 MHz, running Solaris 2.6 and using Sun JDK 1.2.1_02 (Production version). The JGF numbers are calculated using the geometric mean as follows:

    \left( \prod_{i=1}^{b} \frac{p_{c_i}}{p_{s_i}} \right)^{1/b}    (1)

where p_{c_i} is the performance value achieved on the system being benchmarked for the ith benchmark, p_{s_i} is the performance value of the reference system for the same benchmark, and b is the number of benchmarks in the section. "Averaged performance metrics can be produced for groups of benchmarks, using the geometric mean of the relative performances on each benchmark" [Bull et al., 2000b].

For the machine used by the author, the JGF number was found to be 2.59824 for Section 1, 4.97861 for Section 2 and 6.57787 for Section 3. To clarify, the performance of low-level operations (Section 1) on the author's machine is over two and a half times that of the reference system. See Appendix A.1 for more details of the machine used by the author and Appendix A.2 for the details of the JGF numbers quoted.
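As a small illustration of equation (1), the following sketch computes the JGF number of a section as the geometric mean of the relative performances. It is not code from the benchmark suite; the class, method and parameter names are placeholders.

    // Sketch of the JGF number calculation in equation (1).
    public class JgfNumber {
        /**
         * current[i] is the performance value on the benchmarked system and
         * reference[i] the performance value on the reference system for the
         * i-th benchmark of a section.
         */
        public static double sectionNumber(double[] current, double[] reference) {
            int b = current.length;
            double logSum = 0.0;
            for (int i = 0; i < b; i++) {
                // Summing logarithms avoids overflow/underflow in the product.
                logSum += Math.log(current[i] / reference[i]);
            }
            return Math.exp(logSum / b);   // (product of ratios)^(1/b)
        }
    }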
6.2 Configuration

The benchmark XML configuration file allows a custom-made selection of benchmark tests to be run without the development of new driver code. The configuration file has the following format:

    <benchmarks>
        <benchmark>
            <classname>uk.ac.ed.epcc.jgf.section3.euler.JGFEulerBench</classname>
            <size>A</size>
        </benchmark>
        .
        .
        .
    </benchmarks>

where each classname is the name of a class that is a subclass of one of the following:

1. uk.ac.ed.epcc.jgf.jgfutil.JGFSection1
2. uk.ac.ed.epcc.jgf.jgfutil.JGFSection2
3. uk.ac.ed.epcc.jgf.jgfutil.JGFSection3

Subclasses of JGFSection1 are Low Level Operations, subclasses of JGFSection2 are Kernels and subclasses of JGFSection3 are Applications. A range of data sizes for each benchmark in Sections 2 and 3 is provided to avoid dependence on particular data sizes.

6.3 Result Reporting

Using the reporting facilities of the Timing API allows results to be both displayed on the console and stored in an XML file. The XML file also contains details such as the operating system name and version and details of the JVM that the benchmark was executed on. The new design also ensures that only one result for each benchmark (i.e., the most recent result) is stored in the specified XML file, irrespective of the number of times the same benchmark is executed. It is possible for other reporting methods to be appended at a later date. The XML report file has the following format:

    <?xml version="1.0" encoding="UTF-8"?>
    <benchmarks name="Java Grande" type="Sequential" version="2.0"
        os_name="Windows 2000" os_arch="x86" os_version="5.0"
        java_vm_specification_version="1.0"
        java_vm_specification_vendor="Sun Microsystems Inc."
        java_vm_specification_name="Java Virtual Machine Specification"
        java_vm_version="1.4.0_01-b03"
        java_vm_vendor="Sun Microsystems Inc."
        java_vm_name="Java HotSpot(TM) Client VM">
        <benchmark section="Section1" suite="Arith" name="Add:Int" size="-1">
            <time>10531</time>
            <opname>adds</opname>
            <opcount>1.048576E10</opcount>
            <calls>1</calls>
        </benchmark>
        .
        .
        .
    </benchmarks>

6.4 Result Interpretation

The time required by the benchmark is now stored in milliseconds. This means that the output has not been altered in any way, in keeping with scientific standards of observation and measurement. The performance value of a benchmark is derived information, thus it is not reported. "Temporal performance is defined in units of operations per second, where the operation is chosen to be the most appropriate for each individual benchmark" [Bull et al., 2000b].

    Performance = Operations / Time    (2)

This calculation belongs in the analysis of the results and is not included in the reported results. As the benchmark results are now stored in XML, it is easier to interpret and present the results. The existing code to convert a text file containing results into an HTML file is unnecessarily complicated. With the modifications detailed above, the process is now simpler. The introduction of templates (such as the Freemarker template JGFBenchmark.html found in the template directory) allows changes to be made to the presentation of results without modifications to code. This also allows for the possibility of the results being displayed in a format other than HTML with the creation of an appropriate Freemarker template. Furthermore, the calculation of the JGF number has been separated from the presentation logic into a component of its own. Consequently, the interpretation and presentation of the results are now two separate components (JGFBenchmarkNumber.java and JGFBenchmarkPrinter.java respectively).

6.5 Restart Functionality

An XML-based status file is used to record the current progress of the benchmark run. This allows the current run to be interrupted and restarted without re-running benchmarks that have already been completed successfully.

7 An Example

7.1 The Polynomial Arithmetic API

Polynomials and polynomial arithmetic play an important role in many areas of mathematics.
The Polynomial API was developed by the Crypto Group, Computer Science Department, National University of Ireland (NUI) Maynooth for the purpose of performing arithmetic operations on unbounded univariate and bivariate polynomials [Burnett et al., 2003] (Figure 9).

[Figure 9: Polynomial API]

A univariate (one-variable) polynomial of degree d has the following form:

    c_d x^d + c_{d-1} x^{d-1} + ... + c_2 x^2 + c_1 x^1 + c_0 x^0

where each term of the form c x^e is referred to as a monomial. Each monomial in a univariate polynomial has a coefficient c, a variable x and an exponent e. The largest exponent of all the monomials in a univariate polynomial is referred to as the degree d of the polynomial.

A bivariate (two-variable) polynomial has monomials of the form c x^{e_x} y^{e_y}, where c is the coefficient and x and y are the two variables. Each variable has its own exponent value, e_x and e_y. The total degree of a bivariate monomial is the sum of e_x and e_y, and the total degree d of a bivariate polynomial is the largest total degree value of the monomials in the polynomial.

7.1.1 Algorithm Analysis

Univariate Polynomial. The following are the results for the univariate polynomial class BigPolynomial. In these results, the variable m represents the number of monomials in the polynomial, d the degree of the polynomial and c the size of the largest coefficient in the polynomial.

1. Multiplication. There are two strategies available for univariate polynomial multiplication.

   1. School Book. The method taught in school books, hence the name, where each term in one polynomial is multiplied with each term in the other in turn. This algorithm has a complexity of O(m^2 M_integer(c^2)), where M_integer is the cost of multiplying two large integers.
   2. Binary Segmentation. This method "amounts to placing polynomial coefficients strategically within certain large integers, and doing all the arithmetic with one high-precision integer multiply" [Crandall and Pomerance, 2001]. According to Crandall and Pomerance, this has a complexity of O(M_integer(d ln(d c^2))), where M_integer is the cost of multiplying two large integers.

   Figure 10 shows that the Binary Segmentation algorithm offers a significant performance advantage compared to the School Book algorithm. This is in keeping with the theoretical complexity analysis results above.

   [Figure 10: Univariate Polynomial Multiplication Experimental Analysis]

2. Division. There are two strategies available for univariate polynomial division.

   1. Synthetic Division. Synthetic division is a numerical method for dividing a polynomial by a binomial of the form x − k, where k is a constant called the divider. This has a complexity of O(d M_integer(c^2)), where M_integer is the cost of multiplying two large integers.
   2. Pseudo Division. An implementation of Algorithm R from [Knuth, 1998], which has a complexity of O(d^2 M_integer(c^2)), where M_integer is the cost of multiplying two large integers.

   Figure 11 shows that the Synthetic Division algorithm performs significantly better than the Pseudo Division algorithm.

   [Figure 11: Univariate Polynomial Division Experimental Analysis]

3. Evaluation. For univariate polynomial evaluation, there are two strategies available. In the complexity analysis of the evaluation algorithms, v represents the value used to evaluate the polynomial.

   1. School Book Evaluation. The method taught in school texts, where each term in the polynomial is evaluated using the specified value. This has a complexity of O(m M_integer(c, E_integer(v, d))), where M_integer is the cost of multiplying two large integers and E_integer is the cost of exponentiating a large integer by another large integer.
   2. Horner Evaluation. This uses Horner's rule to evaluate a polynomial for a given value (see the sketch following this list). It has a complexity of O(d M_integer(c, v)), where M_integer is the cost of multiplying two large integers.

   Figure 12 indicates that Horner Evaluation performs slightly better than School Book Evaluation. Again, this is in keeping with the theoretical analysis of the two algorithms.

   [Figure 12: Univariate Polynomial Evaluation Experimental Analysis]
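To illustrate the Horner Evaluation strategy described above, the following is a minimal sketch. It is not the Polynomial API's implementation: the class and method names are illustrative, and the coefficients are assumed to be stored with the highest-degree coefficient first.

    // Sketch of Horner's rule for evaluating a univariate polynomial at v.
    // coefficients[0] is assumed to hold the highest-degree coefficient c_d.
    import java.math.BigInteger;

    public class HornerSketch {
        public static BigInteger evaluate(BigInteger[] coefficients, BigInteger v) {
            // c_d v^d + ... + c_0 = (...((c_d * v + c_{d-1}) * v + c_{d-2}) * v + ...) + c_0
            BigInteger result = BigInteger.ZERO;
            for (BigInteger c : coefficients) {
                result = result.multiply(v).add(c);
            }
            return result;
        }
    }

Each pass through the loop performs one large-integer multiplication, which matches the O(d M_integer(c, v)) bound quoted above.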
Bivariate Polynomial. The following are the results for the multiplication operation of the bivariate polynomial class. To date there is only one implementation of the multiplication algorithm for bivariate polynomials, the School Book method. It has the same complexity as the method of the same name for univariate polynomials, as it is a simple extension of that algorithm. However, the analysis still yields useful information, namely that the time required by the algorithm depends more on the number of monomials than on the total degree of the polynomial, see Figure 13.

[Figure 13: Bivariate Polynomial Multiplication Experimental Analysis]

8 Conclusion

This paper detailed the development of the Timing API to automate the role of the Experimenter in performing experimental algorithm analysis. It has created solutions to points raised in Johnson's paper regarding experimental algorithm analysis and incorporated those solutions in the Timing API. Consequently, this API evaluates the real-time performance of different algorithms. The benchmarking carried out in this paper will allow others to normalize the results obtained to their own machines. Therefore, our API will not be adversely affected by different machine speeds or specifications.

Future work will involve the use of the Timing API as a decision tool in selecting the most efficient component algorithms for use in elliptic curve cryptosystems. The example given measures the performance of one such component. The selection will be based on the type of curve selected and the underlying machine, as demonstrated by our Polynomial Arithmetic API example.

Appendices

A Specification and Benchmarking

A.1 Specification of Author's Computer

The machine used by the author was a Pentium 4 with a CPU speed of 2.2 GHz. It contained a 50 Gigabyte hard disk drive with Windows 2000 installed. The Java Virtual Machine (JVM) used was the Java HotSpot(TM) Client VM version 1.4.0_01-b03, supplied by Sun Microsystems Inc.

A.2 JGF Sequential Benchmark Results

The following results were obtained over 3 runs of the JGF Sequential Benchmark. The results of each run differ due to other processes required by the operating system. The averages cited in Section 6.1 are the values obtained by taking the geometric mean of the section results obtained in the following benchmark executions. The first execution of the benchmark occurred on 14th July 2003, the second on 16th July 2003 and the third on 18th July 2003.

A.2.1 Section 1 - Low Level Operations

The average result of the three runs for Section 1 is 2.59824.

    Benchmark    Run 1                 Run 2                 Run 3
    Arith        0.2866377481298084    0.2874571637081641    0.28723619273533096
    Assign       5.648397831816057     5.65588170107578      5.656594447830065
    Cast         3.246348732759473     3.2409166098580866    3.199605876693666
    Create       1.6513457919015857    1.699746341115175     1.760785792448639
    Exception    3.63815389262372      3.799868978202154     3.7290343654054343
    Loop         1.5714585719591492    1.648256004839142     1.6088281489633403
    Math         2.5126814385039196    2.545608540148679     2.5342586893189187
    Method       5.662726689464254     5.68258397724788      5.6712894064149
    Serial       6.925287645101215     7.082573741801907     7.079956513750215
    Overall      2.5699201771451845    2.6164736578515475    2.6083325181787878

A.2.2 Section 2 - Kernels

The average result of the three runs for Section 2 is 4.97861.

    Benchmark    Run 1                 Run 2                 Run 3
    Size A       4.563771698823964     4.8936401925684425    4.916076794555665
    Size B       5.077057191094815     5.007531531717145     5.012785493413412
    Size C       5.095853975765617     5.125666632138881     5.139283399867139
    Overall      4.905888201542877     5.008050443128609     5.0218852174879824

A.2.3 Section 3 - Applications

The average result of the three runs for Section 3 is 6.57787.

    Benchmark    Run 1                 Run 2                 Run 3
    Size A       6.946361082485771     6.986994916894107     6.930853897519336
    Size B       6.204399738165623     6.272015340189315     6.187887419670448
    Overall      6.564906768673763     6.619859462298622     6.5488429237563635

B Software Instructions

B.1 Java Grande Forum Benchmark

B.1.1 Building JGF Benchmark

Building the JGF Benchmark requires that Ant version 1.5 (or greater) be installed [Ant, 2003]. Navigate to the directory containing the JGF Benchmark project and type

    c:\java\grande>ant release

This will create grande.jar, allowing the benchmarks to be run.

B.1.2 Running Benchmarks

To run the benchmarks, navigate to the release directory of the grande project and type the following command:

    c:\java\grande\release>java -classpath library/gulch.jar;library/jdom.jar;
        library/lyra.jar;library/tamull.jar;grande.jar
        -Xmx512M -noverify nuim.cs.crypto.grande.JGFBenchmarkRunner
        -c [xml benchmark configuration file name]
        -r [xml reporter configuration file name]

If using Windows, an alternative is to use the batch file named benchmark.bat supplied in the release directory, as follows:

    c:\java\grande\release>benchmark -c [xml benchmark configuration file name]
        -r [xml reporter configuration file name]

The benchmark runner application requires the JDOM jar (see the link at the bottom) to read and write the XML files. The [xml configuration file] parameter should contain the location of the XML configuration file. If this parameter is omitted, then the default XML configuration file in the config directory (config.xml) is used instead. The [xml result file] parameter should contain the location of the XML result file. If this parameter is omitted, then the results are recorded to a file named benchmark.xml in the current directory. The [xml status file] parameter is the name of the XML status file. The status file is used to record the current progress of the benchmark run. This allows the current run to be interrupted and restarted without re-running benchmarks that have already been completed successfully. If this parameter is omitted, then the status is recorded to a file named status.xml in the current directory.

Therefore, using the default parameters above, it is possible to execute all the benchmarks as follows:

    c:\java\grande\release>java -classpath library/gulch.jar;library/jdom.jar;
        library/lyra.jar;library/tamull.jar;grande.jar
        -Xmx512M -noverify nuim.cs.crypto.grande.JGFBenchmarkRunner

or simply with the batch file as follows:

    c:\java\grande\release>benchmark

The -Xmx512M flag is required to increase the maximum Java heap size to 512 MB, as many of the benchmark tests are memory intensive.
The -noverify flag is required since the Serial benchmark in Section 1 compiles correctly but fails the byte-code verifier.

B.1.3 Running Benchmark Result Interpretation

This should be run only after all benchmark suites for all sizes have been run. To run the benchmark printer, navigate to the grande directory and type the following command:

    c:\java\grande\release>java -classpath library/jdom.jar;library/freemarker.jar;
        library/gulch.jar;library/lyra.jar;
        library/tamull.jar;library/xscribe.jar;
        grande.jar nuim.cs.crypto.grande.JGFBenchmarkPrinter
        -s [XML standard result file name]
        -c [XML current result file name]
        -t [template file name]
        -o [output file name]

If using Windows, an alternative is to use the batch file named printer.bat supplied in the release directory, as follows:

    c:\java\grande\release>printer -s [XML standard result file name]
        -c [XML current result file name]
        -t [template file name]
        -o [output file name]

The benchmark printer application requires the JDOM jar (see the link at the bottom) to read the XML files. The Freemarker and Xscribe jars (see the links at the bottom) are required for the presentation of the results using the template. The [XML standard result file name] parameter should contain the location of the XML standard result file. The XML standard result file should contain the performance values of the benchmarks on a configuration selected by the EPCC. It is unlikely that you will wish to use this parameter, since if it is omitted the default XML standard result file in the data directory (standard benchmark.xml) is used. The [XML current result file name] parameter should contain the location of the XML result file obtained from running the benchmarks (see Running Benchmarks above). If this parameter is omitted, then the application uses the file named benchmark.xml in the current directory, if such a file exists. The [template file name] parameter should contain the location of the template file. If this parameter is omitted, then the default template file in the template directory (JGFBenchmark.html) is used instead. The [output file name] parameter should contain the location of the output file. If this parameter is omitted, then the result is placed in a file named output.html in the current directory.

The screen-shot (Figure 14) shows the resulting HTML file from the interpretation process. The HTML file contains the details of the JVM as well as an overall JGF number of 4.945 and other derived information such as the performance of each of the benchmark tests.

[Figure 14: Java Grande Benchmark Result Interpretation]

B.2 Timing API

B.2.1 Building Timing API

Building the Timing API requires that Ant version 1.5 (or greater) be installed [Ant, 2003]. Navigate to the directory containing the Timing API project and type

    c:\java\tamull>ant release

This will create tamull.jar, allowing timing experiments to be performed.

B.2.2 Performing Unit Timing Tests

A unit timing test can be created as follows:

    import nuim.cs.crypto.tamull.TimeSuite;

    public class MyTimeSuite extends TimeSuite {
        public void timeMyMethod() {
            myMethod();
        }

        public void timeMyOtherMethod() {
            myOtherMethod();
        }

        public void foobarThis() {
            // this method will not be timed
        }
    }

Note that the code unit - typically a method itself - being timed must be placed within a method of a particular signature with return type void. The signature of a method is a combination of the method's name and its parameter types.
In this case, the signature is a method whose name begins with the letters time and which takes no parameters. Unit timing tests can be executed with the following command:

    c:\yourdirectory\yourproject>java -classpath [jardirectory]asrat.jar;
        [jardirectory]gulch.jar;
        [jardirectory]jdom.jar;
        [jardirectory]lyra.jar;
        [jardirectory]tamull.jar;
        -r [xml reporter configuration file name]
        -t [threshold time]
        -u [threshold units to power of 10]
        -w [stopwatch implementation]
        -s [xml suite configuration file name]

The structure of the reporter XML file is as follows:

    <reporters>
        <reporter>
            <class>my.class.path.MyReporter</class>
            <config>
                <parameter1></parameter1>
                .
                .
                <parameterN></parameterN>
            </config>
        </reporter>
        <reporter>
            .
        </reporter>
    </reporters>

The structure of the suite XML file is as follows:

    <suites>
        <suite>
            <class>my.class.path.MyTimeSuite</class>
        </suite>
        <suite>
            .
        </suite>
    </suites>

Valid values for the -t and -u flags are numeric values. If the desired threshold is 50 milliseconds (50 x 10^-3 seconds), the values should be -t 50 -u -3. The -w flag should contain a string with the class name, in fully qualified form, of the stopwatch implementation. All stopwatch implementations must implement the nuim.cs.crypto.gulch.Stopwatch interface. An example of a class that implements the Stopwatch interface is nuim.cs.crypto.gulch.system.SystemStopwatch.

B.2.3 Using Timing API Extension

The Timing API has been extended to make use of the Reflection API [Green, 2003] to allow the methods and input data to be specified in an XML file. Each entry in the XML file is processed, and the corresponding timing is performed and recorded. The format of the XML file is as follows:

    <auto>
        <!-- Declare The Classes Being Used -->
        <class name="my.class.path.MyClass" alias="MyClass"/>
        <class name="my.class.path.MyOtherClass" alias="MyOtherClass"/>
        .
        .
        <!-- Declare The Methods Being Used -->
        <method class="MyClass" name="myMethod" alias="MyMethod">
            <parameter>MyOtherClass</parameter>
            .
            .
        </method>
        .
        .
        <!-- Declare The Instances Being Used -->
        <instance class="MyClass" name="MyClassInstance">
            <parameter>Some Initialization String Here</parameter>
            .
            .
        </instance>
        <instance class="MyOtherClass" name="MyOtherClassInstance">
            .
        </instance>
        .
        .
        <!-- Declare The Timings To Be Performed -->
        <time method="MyMethod" instance="MyClassInstance">
            <parameter>MyOtherClassInstance</parameter>
        </time>
        .
        .
    </auto>

There may be one or more time elements in the XML file. The methods and instances used in a time element must be declared in the file in the form of method and instance elements. These elements may have zero or more parameter elements. In the case of the method element, the parameter should contain a class. For an instance, the parameter should contain a string. Both of these elements require that the classes be declared in class elements. The unit timing test is performed with the same command as before, but with the -s flag replaced by the -a flag:

    c:\yourdirectory\yourproject>java -classpath [jardirectory]asrat.jar;
        [jardirectory]gulch.jar;
        [jardirectory]jdom.jar;
        [jardirectory]lyra.jar;
        [jardirectory]tamull.jar;
        -r [xml reporter configuration file name]
        -t [threshold time]
        -u [threshold units to power of 10]
        -w [stopwatch implementation]
        -a [xml timing test configuration file name]

B.3 Software Libraries

This section gives a brief overview of the many software libraries used by the author. Note that JAR stands for Java ARchive. (∗ developed by the author; + extended by the author.)
B.3.1 asrat.jar∗

The Seek API contains code to search for classes in specified locations that extend a specified class and/or are the specified class. The API also contains functionality to search for methods of a particular signature and having a selected return type.

B.3.2 freemarker.jar

The Freemarker API [Geer and Bayer, 2001] was developed by...

B.3.3 grande.jar+

The Java Grande Benchmark [Bull et al., 2000a,b] was developed by the Edinburgh Parallel Computing Centre (EPCC). They are leading the benchmarking initiative of the Java Grande Forum, and have developed a suite of benchmarks to measure different execution environments of Java against each other.

B.3.4 gulch.jar∗

The Stopwatch API contains code to mimic the functionality of a conventional physical stopwatch.

B.3.5 jdom.jar

The JDOM API [Hunter and McLaughlin, 2003] was developed by the JDOM open source project for the purpose of processing XML files.

B.3.6 reporter.jar∗

The Reporter API has functionality to allow data to be reported to different reporters simultaneously.

B.3.7 tamull.jar∗

The Timing API allows timing tests to be performed on units of code.

B.3.8 xscribe.jar∗

The XScribe API [Duffy, 2002] provides an alternative to using XSL (eXtensible Stylesheet Language) for converting XML from one form to another.

References

Ant. Jakarta Ant project. http://ant.jakarta.apache.org, 2003.

J. M. Bull, L. A. Smith, M. D. Westhead, D. S. Henty, and R. A. Davey. A benchmark suite for high performance Java. Concurrency: Practice and Experience, pages 375-388, 2000a.

J. M. Bull, L. A. Smith, M. D. Westhead, D. S. Henty, and R. A. Davey. Benchmarking Java Grande applications. pages 63-73, 2000b.

Andrew Burnett, Adam Duffy, Tom Dowling, and Claire Whelan. A Java API for polynomial arithmetic. Principles and Practice of Programming in Java, 2003.

James W. Cooper. The Design Patterns Java Companion. Addison-Wesley, 1st edition, 1998.

Richard Crandall and Carl Pomerance. Prime Numbers - A Computational Perspective. Springer-Verlag, 1st edition, 2001.

Adam Duffy. XScribe API. http://www.ijug.org/member/articles/xscribe/, 2002.

Benjamin Geer and Mike Bayer. Freemarker API. http://freemarker.sourceforge.net/, 2001.

Jesper Gørtz. Use the JVM Profiler Interface for accurate timing. http://www.javaworld.com/javatips/jw-javatif94.html, 2003.

Dale Green. The Java Tutorial: The Reflection API. http://java.sun.com/docs/books/tutorial/reflect/, 2003.

Jason Hunter and Brett McLaughlin. JDOM API. http://www.jdom.org, 2003.

java.sun.com. System.currentTimeMillis() granularity too coarse. http://developer.java.sun.com/developer/bugParade/bugs/4423429.html, 2001.

David S. Johnson. A theoretician's guide to the experimental analysis of algorithms. Data Structures, Near Neighbor Searches, and Methodology: Fifth and Sixth DIMACS Implementation Challenges, pages 215-250, 2002.

Donald E. Knuth. The Art of Computer Programming: Volume 2 - Seminumerical Algorithms. Addison-Wesley, Reading, Mass., 1998.

Perdita Stevens and Rob Pooley. Using UML - Software Engineering with Objects and Components. Addison-Wesley, 1st edition, 2000.

Bill Venners, Matt Gerrans, and Frank Sommers. Why we refactored JUnit: The story of an open source endeavor. http://www.artima.com/suiterunner/why.html, 2003.

Herbert S. Wilf. Algorithms and Complexity. A. K. Peters Ltd, 1st edition, 1994.