National University of Ireland, Maynooth
Maynooth, Co. Kildare, Ireland.
Department of Computer Science
Technical Report Series
Algorithm Analysis
Adam Duffy and Tom Dowling
NUIM-CS-TR-2003-10
http://www.cs.may.ie/
Tel: +353 1 7083847
Fax: +353 1 7083848
Algorithm Analysis
A Technical Report
Version 1.4
Adam Duffy and Tom Dowling
November 13, 2003
Abstract
Traditionally, the emphasis among computer scientists has been on the more rigorous and theoretical modes of worst-case and average-case analysis of algorithms. However, theoretical analysis cannot tell the full story about the real-world performance of algorithms. This has resulted in a growing interest in experimental analysis. This paper presents the development of a solution that aids in the experimental analysis of Java-based implementations.
1 Introduction
"Until recently experimental analysis of algorithms has been almost invisible in the theoretical computer science literature, although experimental work dominates algorithmic research in most other areas of computer science" [Johnson, 2002].
The author of the paper quoted above lists many pitfalls and pet peeves with regard to experimental analysis as practised in the field of computer science. This paper discusses the design and development of software to assist in the experimental analysis of the performance of Java-based algorithm implementations, to meet the standards listed in Johnson’s paper.
We identify two roles with regard to performing experimental analysis of algorithms: the Experimenter and the Analyst. The Experimenter performs the experiments and the Analyst analyzes the results of the experiments. The role of the Experimenter can be automated, and this paper does so. The software developed, the Timing API, allows the Analyst to create the test data and input it to the Timing API. The API carries out the experiments and records the results for later analysis by the Analyst. This removes the tedium of developing software to test implementations, running the tests and recording the results of the tests - essentially carrying out the tasks of the Experimenter.
We demonstrate the use of the Timing API by performing experimental analysis on a Polynomial Arithmetic API developed by the Crypto
Group at NUI Maynooth [Burnett et al., 2003].
Furthermore, this paper discusses modifications to the Java Grande
Forum (JGF) Sequential Benchmark [Bull et al., 2000a,b] to improve the
reporting and interpretation of the benchmark results as well as allowing
the benchmark to be interrupted and restarted as desired. The JGF
benchmark is used here to benchmark the Java Virtual Machine (JVM)
so that comparisons of the same algorithms on different machines can be
made.
The paper is structured as follows. Section 2 gives a brief overview of
Theoretical Algorithm Analysis. Following that, Section 3 mentions Experimental Algorithm Analysis and some of the issues involved in performing such analysis. Sections 4 and 5 describe the design and implementation
of the Timing API developed to automate the role of the Experimenter.
The Java Grande Forum Benchmark is covered in Section 6 along with
details of improvements made by the author to the benchmark. Section
7 provides an example illustrating the use of the Timing API on another
Java-based API. Conclusions and future work are discussed in Section 8.
2 Theoretical Algorithm Analysis
The complexity of an algorithm is the cost of using the algorithm. Space
complexity is the amount of memory needed. Time complexity is the
amount of computation time needed to run the algorithm. Complexity is
not an absolute measure, but rather a bounding function characterizing
the behavior of the algorithm as the size of the data set increases.
This leads to the Big-Oh Notation which is briefly defined here. For a
more comprehensive introduction, see [Wilf, 1994].
2.1 Upper Bound
A function T(n) is of O(f(n)) iff there exist positive constants k and n₀ such that |T(n)| ≤ k|f(n)| for all n ≥ n₀.
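For example, T(n) = 3n² + 5n is O(n²). A short check against the definition (the constants chosen below are one possible choice, not the only one):

\[
3n^2 + 5n \;\le\; 3n^2 + n^2 \;=\; 4n^2 \quad \text{for all } n \ge 5,
\]

so taking k = 4 and n₀ = 5 satisfies |T(n)| ≤ k|f(n)| for all n ≥ n₀.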
2.2 Lower Bound
A function T(n) is of Ω(f(n)) iff there exist positive constants k and n₀ such that k|f(n)| ≤ |T(n)| for all n ≥ n₀.
2.3 Tight Bound
A function T(n) is of Θ(f(n)) iff it is of O(f(n)) and it is of Ω(f(n)).
2.4 Relation Between Bounds
Some of the hierarchy of complexities is as follows:

O(1) < O(log n) < O(n) < O(n log n) < O(n²) < O(n³) < O(2ⁿ) < O(n!)
3 Experimental Algorithm Analysis
There are three different approaches to analyzing algorithms.
1. Worst-Case Analysis
2. Average-Case Analysis
3. Experimental Analysis
Johnson’s paper discusses how best to perform experimental analysis
of algorithms. The author discusses problems of performing this type of
analysis and how to overcome them. These include
Pitfalls. Defined by Johnson as "temptations and practices that can lead experimenters into substantial wastes of time".
1. Lost code/data. Loss of the original code and/or data means the loss of the best basis for comparing it against future algorithms.
Pet Peeves. "Favourite annoyances" of Johnson. These are common but misguided practices.
1. The millisecond testbed. Regardless of the resolution of the timings, they cannot be more precise than the resolution of the underlying operating system.
2. The one-run study. A one-run study is where the tests are
run once only as opposed to the preferred method of running
tests many times to reduce errors.
3. Using the best result found as an evaluation criterion. It is preferable to use the average result of a number of runs rather than the best result from those runs. This is related to Suggestion 1 below.
4. The uncalibrated machine. The raw speed of a processor
is an unreliable predictor of performance. This Pet Peeve leads
directly to Suggestion 2 located below.
5. The lost testbed. Similar to Pitfall 1, it refers to the test
data used in the experiments not being documented.
6. False precision. Related to Pet Peeve 1, this refers to reporting a timing without accounting for the resolution of the system clock.
7. Failure to report overall running time. The total time taken to run the tests should be reported, as opposed to only the average running time.
Suggestions. Given by Johnson in order for experimenters and
authors to avoid pitfalls and to improve on the quality of results
from experimental analysis.
1. Use variance reduction techniques. This simply means using the same testbed for each run of the experiments: use the same set of instances for all your tests.
2. Use benchmark algorithms to calibrate machine speeds.
Allows future researchers to normalize their results and compare
them to results obtained by the original researcher.
In response to the points above, the Timing API was created. The
remaining unlisted points in the aforementioned paper relate to the selection of test data, the nature of the tests performed and subsequent
presentation of data.
4 The Timing API Design
The design of the Timing API is heavily influenced by existing implementations for performing unit tests.
4.1 Use Case Model
Use cases document the behaviour of the system from the user’s point
of view [Stevens and Pooley, 2000]. An individual use case, shown as a
named oval, represents a kind of task which has to be done with support
from the system under development. An actor, shown as a stick person,
represents a user of the system. A user is anything external to the system
which interacts with the system.
The use case model (Figure 1) reveals that there are two actors in performing experimental analysis of algorithms. The first actor, the Experimenter, is the sole user of the Timing API. The tasks of the other actor, the Analyst, are outside the scope of the API.

Figure 1: Use Case Model

4.2 Class Model

Class models are used to document the static structure of the system [Stevens and Pooley, 2000]. Figure 2 contains the class model for the Timing API. The TimeRunner class is the driver class that loads in the configuration files and passes them on to the TimeConfiguration class. This class in turn creates a TimeSuiteConfiguration instance which contains information such as the threshold and the stopwatch - both of which will be discussed in greater detail further on - to be used by the Timing API. TimeSuite instances contain the unit times. A unit time is defined below as the execution time of a single unit of code. Each unit time produces a TimeReport containing the results of the experiment on that unit time. The TimeReport instance is recorded using result reporting.
5 The Timing API Implementation
The implementation consists of three main components.
. Timing Tests
. Timing Mechanism
. Result Reporting
Each of these components is discussed here in turn.
5.1 Timing Tests
Unit tests are so named because they test a single unit of code - typically a
method or function. One or more unit tests are contained within a suite.
Furthermore, suites can be contained within other suites as sub-suites.
Similarly, we define a unit time to be the execution time of a single unit
of code (Figure 3).
Artima’s SuiteRunner [Venners et al., 2003] is an architecture for writing unit tests that offers advanced facilities for reporting the results of the unit tests. It was hoped that an extension to SuiteRunner would be sufficient for the purposes of unit timing. Preliminary work demonstrated otherwise, as a significant number of the SuiteRunner classes are package protected, making extensions to it difficult. As a result, the Timing API code base was developed from scratch.
The Experimenter extends the unit time class TimeSuite to create a
timing test as follows:
import nuim.cs.crypto.tamull.TimeSuite;

public class MyTimeSuite extends TimeSuite{
    public void timeMyMethod(){
        myMethod();
    }
}
Note that the code unit - typically a method itself - being timed must be placed within a method of a particular signature with return type void. The signature of a method is the combination of the method’s name and its parameter types. In this case, the required signature is a method whose name begins with the letters time and which takes no parameters.
The Seek API (Figure 4) contains functionality for extracting all methods of a particular signature and return type. This functionality is contained within the MethodSeeker class. The signature of the method and
the return type have to be specified by the developer using the API. Thus,
the Seek API is used by the Timing API to extract the types of methods
described in the preceding paragraph. See appendix B.2.2 for an example
of the Seek API in action.
Figure 2: The Class Model for the Timing API
Figure 3: Unit Tests and Unit Times
Figure 4: Seek API
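As an illustration of the kind of search the MethodSeeker class performs, the following sketch uses only the standard java.lang.reflect package to find the zero-parameter void methods whose names begin with time. It is not the actual Seek API code, and the class and method names here are hypothetical.

import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Finds public void methods with no parameters whose names start with "time" -
// the signature the Timing API expects for unit times.
public class TimeMethodFinder {
    public static List findTimeMethods(Class clazz) {
        List matches = new ArrayList();
        Method[] methods = clazz.getMethods();
        for (int i = 0; i < methods.length; i++) {
            Method m = methods[i];
            boolean named = m.getName().startsWith("time");
            boolean noArgs = m.getParameterTypes().length == 0;
            boolean returnsVoid = m.getReturnType() == Void.TYPE;
            if (named && noArgs && returnsVoid && Modifier.isPublic(m.getModifiers())) {
                matches.add(m);
            }
        }
        return matches;
    }
}

Applied to the MyTimeSuite class above, such a search would return the timeMyMethod method.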
The ClassSeeker class is not directly used by the Timing API but it can
still play a role. For example, it may be used to find all the classes that are
subclasses of the TimeSuite class. These classes can then be passed into
the Timing API and the relevant methods extracted as explained before.
By saving the code and the input and output files used, the Experimenter addresses Johnson’s Pet Peeve 5 of the lost testbed and Pitfall 1
of lost code/data. This allows for reproducibility of the results obtained
and for comparisons to be made in the future against the implemented
algorithms and possible improvements of the same algorithms. "Just as scientists are required to preserve their lab notebooks, it is still good policy to keep the original data if you are going to publish results and conclusions based on it" [Johnson, 2002].
5.2 Timing Mechanism
The Timing API utilizes the Stopwatch API in order to obtain the timings.
The Stopwatch API (Figure 5) was developed to mimic the functionality
of a conventional physical stopwatch. There are two types of stopwatches
in the Stopwatch API.
. System Stopwatch
. Thread Stopwatch
It is worth noting that regardless of the resolution of the stopwatches
(milliseconds for the System Stopwatch and nanoseconds for the Thread
Stopwatch) they cannot be more precise than the underlying operating
system. On Windows NT/2000, the granularity is 10 milliseconds and on
Windows 98 it is 50 milliseconds. For more information see [java.sun.com,
2001].
A threshold parameter is used to ensure that all tests exceed the granularity of the operating system by a significant amount, answering Pet Peeve 1 of the millisecond testbed and Pet Peeve 6 of false precision. The test is run repeatedly until the threshold is reached. The total running time and the number of times the algorithm was executed are reported, not just the average running time. This requirement is in response to Pet Peeve 7 of failure to report overall running time.
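A minimal sketch of this repeat-until-threshold mechanism is shown below; it is illustrative only and does not use the actual Timing API or Stopwatch API classes (the names here are hypothetical).

// Runs a unit of code repeatedly until the accumulated elapsed time exceeds
// the threshold, then reports the total time and the number of executions.
public class ThresholdTimer {
    public static void timeUntilThreshold(Runnable unit, long thresholdMillis) {
        long start = System.currentTimeMillis();
        long elapsed = 0;
        int calls = 0;
        while (elapsed < thresholdMillis) {
            unit.run();
            calls++;
            elapsed = System.currentTimeMillis() - start;
        }
        System.out.println("total time (ms): " + elapsed
                + ", executions: " + calls
                + ", average (ms): " + ((double) elapsed / calls));
    }
}

Reporting the total time and the execution count, rather than only the average, is what addresses Pet Peeve 7.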
Figure 5: Stopwatch API
5.2.1 System Stopwatch
This stopwatch uses the textbook approach to simple performance analysis
in Java by calling the System.currentTimeMillis() method before and after
the code to be measured. The difference between the two times gives the
elapsed time. This is comparable to using a stopwatch when testing GUI
activity, and works fine if elapsed time is really what you want. The
downside is that this approach may include much more than your code’s
execution time. Time used by other processes on the system or time spent
waiting for I/O can result in inaccurately high timing numbers.
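A minimal stopwatch of this kind might look as follows; this is a sketch of the textbook approach, not the actual SystemStopwatch class from the Stopwatch API.

// Elapsed-time stopwatch based on System.currentTimeMillis(); its precision is
// limited by the operating system clock granularity discussed above.
public class SimpleSystemStopwatch {
    private long start;
    private long elapsed;

    public void start() {
        start = System.currentTimeMillis();
    }

    public void stop() {
        elapsed = System.currentTimeMillis() - start;
    }

    public long getElapsedMillis() {
        return elapsed;
    }
}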
5.2.2 Thread Stopwatch
This stopwatch is useful for obtaining elapsed timings for a thread. The Java Virtual Machine Profiler Interface (JVMPI) provides accurate information on the CPU time spent in the current thread, accessed from within a Java program via the Java Native Interface (JNI) API. This avoids the downside of using the System Stopwatch mentioned in the previous paragraph. The basis for this particular stopwatch can be found at [Gørtz, 2003].
5.3 Result Reporting
The Reporter API uses the Observer design pattern (Figure 6). Note, in
the diagram the word Observable may be substituted for the word Subject.
The intent of the design pattern is to define a one-to-many dependency
between objects so that when one object changes state, all its dependents
are notified and updated automatically. More details of the design pattern
can be found in [Cooper, 1998].
In the case of the Reporter API, the Observer is the DispatchReporter and the Subjects are the Reporters. A message is sent to the DispatchReporter, which then relays that message to the individual Reporters, allowing the same message to be reported at the same time in many different media (Figure 7). Examples of such reporters are

. Console Display
. GUI Display
. Flat Text File
. XML File
. Relational Database

Figure 6: Class Diagram of Observer Pattern
Figure 7: Reporter API
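A minimal sketch of this dispatch mechanism, assuming a simple reporter interface; the interface and method names below are assumptions for illustration and are not the actual Reporter API signatures.

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical reporter: anything that can record a message.
interface SimpleReporter {
    void report(String message);
}

// Relays each message to every registered reporter, so a single result can be
// written to the console, a file, a database, etc. at the same time.
class SimpleDispatchReporter implements SimpleReporter {
    private final List reporters = new ArrayList();

    public void addReporter(SimpleReporter reporter) {
        reporters.add(reporter);
    }

    public void report(String message) {
        for (Iterator it = reporters.iterator(); it.hasNext(); ) {
            ((SimpleReporter) it.next()).report(message);
        }
    }
}

Registering, say, a console reporter and an XML file reporter on the dispatcher means that one report(...) call records the same timing result in both media.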
The Timing API uses the Reporter API to record the timing results
of the tests and of the benchmarks. Good reporting avoids Pitfall 1 of lost code/data and repeated runs of the tests address Pet Peeve 2 of the one-run study and Pet Peeve 3 of using the best result found as an evaluation criterion. By using the Observer design pattern, the
results can be simultaneously displayed on screen and in one or more files
of differing formats. The displaying of results on screen provides feedback
to the experimenter as to the progress of the tests, see Figure 8. The
Reporter API allows the experimenter to select the desired file format
such as flat file, XML, CSV (Comma-Separated Values), etc., without
specific code being included in the Timing API for each format.
Figure 8: GUI Display of the Timing API
5.4 Timing API Extension
There are two factors when timing a unit of code - the algorithm and the size of the input to the algorithm. In cases where a large number of algorithms and/or input data sets are being timed, it is prohibitively time-consuming to write code for each timing to be performed. To deal with this, the Timing API has been extended to make use of the Reflection API [Green, 2003] to allow methods and input data to be specified in an XML file. Each entry in the XML file is processed and the corresponding timing is performed and recorded. See Appendix B.2 for more details.
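At its core the extension relies on standard Java reflection: the class and method are looked up by name and invoked on an instance. The sketch below shows only this idea; the actual XML format is given in Appendix B.2.3, and the class and method names used here are placeholders.

import java.lang.reflect.Method;

// Loads a class by name, creates an instance and invokes a named no-argument
// method on it - the kind of call the XML-driven extension performs.
public class ReflectiveInvoker {
    public static Object invoke(String className, String methodName) throws Exception {
        Class clazz = Class.forName(className);
        Object instance = clazz.newInstance();                  // assumes a public no-arg constructor
        Method method = clazz.getMethod(methodName, new Class[0]);
        return method.invoke(instance, new Object[0]);
    }
}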
6 Benchmarks
There are many Java Virtual Machine benchmarks, such as the SPEC JVM98 and the Java Grande Forum (JGF) Sequential Benchmark. The JGF benchmark was selected due to the availability of its source code, which allows users to understand exactly what the benchmark is testing. Also, the JGF benchmark is available free of charge.
The purpose of the JGF Benchmarks is to test aspects of Java execution environments [Bull et al., 2000a]. While there are several different JGF Benchmarks, the Sequential Benchmark (version 2.0) is sufficient for timing tests related to sequential, as opposed to multi-threaded or distributed, software. Version 2.0 of the Sequential Benchmark - as developed by the EPCC of the University of Edinburgh on behalf of the JGF - consists of three sections
1 Low Level Operations. This section is designed to test low-level operations such as the performance of arithmetic operations, of creating
objects and so on.
2 Kernels. Created for testing short codes likely to be found in Java
Grande applications such as Fast Fourier Transform.
3 Applications. This section is intended to be representative of Grande
applications by containing actual Grande applications. A Grande
application is one which uses large amounts of processing, I/O bandwidth, or memory.
In this instance, the JGF Sequential Benchmark is used to calibrate the
machine before running timing tests, see Pet Peeve 4 (the uncalibrated
machine) and Suggestion 2 (use benchmark algorithms to calibrate
machine speeds).
"Future researchers can then calibrate their own machines in the same way and, based on the benchmark data reported in the original paper, attempt to normalize the old results to their current machines" [Johnson, 2002].
However, some alterations were necessary to the JGF Sequential Benchmark in order to satisfy Johnson’s requirements for experimental analysis. The shortcomings were
1. Lack of configuration. The experimenter had to run each suite or
section separately as a driver class was required for each. To run
a custom made selection of the benchmark tests required the development and compilation of new code - a cumbersome and overly
complicated approach.
2. Insufficient reporting of results. The results of the benchmark tests were reported to the console only. It was necessary to cut-and-paste the results to a text file before any analysis could be performed. This may result in the loss of data.
3. Result interpretation. Only the performance value of each benchmark test is reported on completion of the test. The performance value
is a derived value, based on the time and opcount values. This
approach is not in keeping with scientific standards of observation
and measurement.
4. Interrupt unfriendly. If the benchmark were interrupted due to, say, a power failure, the tests would have to be restarted from the beginning. Furthermore, the machine that the benchmark is running on
is off-limits until the completion of the benchmark as other running
processes may affect the result. This is a problem considering that
the entire benchmark can take several hours to complete.
6.1 About JGF Numbers
For each section of the benchmark suite we calculate an average performance relative to a reference system, the result of which is referred to as the JGF number. This system, as selected by the authors of Java Grande, is a Sun Ultra Enterprise 3000, 250 MHz, running Solaris 2.6 and using Sun JDK 1.2.1_02 (Production version). The JGF numbers are calculated using the geometric mean as follows
( ∏_{i=1}^{b} pc_i / ps_i )^(1/b)    (1)

where pc_i is the performance value achieved on the system being benchmarked for the i-th benchmark, ps_i is the corresponding performance value for the reference system, and b is the number of benchmarks in the section. "Averaged performance metrics can be produced for groups of benchmarks, using the geometric mean of the relative performances on each benchmark" [Bull et al., 2000b].
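As a concrete illustration of equation (1), the sketch below computes a JGF number from two arrays of performance values; it is not the benchmark’s own code, and the class and method names are arbitrary.

// Geometric mean of the relative performances current[i] / reference[i].
public class JGFNumber {
    public static double compute(double[] current, double[] reference) {
        double logSum = 0.0;
        int b = current.length;
        for (int i = 0; i < b; i++) {
            logSum += Math.log(current[i] / reference[i]);
        }
        return Math.exp(logSum / b);   // the b-th root of the product of the ratios
    }
}

Working with logarithms avoids overflow or underflow when many ratios are multiplied together directly.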
For the machine used by the author, the JGF number was found to be 2.59824 for Section 1, 4.97861 for Section 2 and 6.57787 for Section 3. To clarify, the performance of low level operations (Section 1) on the author’s machine is over two and a half times that of the reference system.
See appendix A.1 for more details of the machine used by the author and
appendix A.2 for the details of the JGF numbers quoted.
6.2 Configuration
The benchmark XML configuration file allows a custom-made selection of benchmark tests to be run without the development of new driver code. The configuration file has the following format
<benchmarks>
  <benchmark>
    <classname>uk.ac.ed.epcc.jgf.section3.euler.JGFEulerBench</classname>
    <size>A</size>
  </benchmark>
  ...
</benchmarks>
where each classname is the name of a class that is a subclass of one of
the following
1. uk.ac.ed.epcc.jgf.jgfutil.JGFSection1
2. uk.ac.ed.epcc.jgf.jgfutil.JGFSection2
3. uk.ac.ed.epcc.jgf.jgfutil.JGFSection3
Subclasses of JGFSection1 are Low Level Operations, subclasses of JGFSection2
are Kernels and subclasses of JGFSection3 are Applications. A range of
data sizes for each benchmark in Sections 2 and 3 is provided to avoid
dependence on particular data sizes.
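For illustration, a configuration file in this format can be read with JDOM along the following lines; this is a sketch only, not the actual benchmark runner code, and it assumes the org.jdom package of the JDOM library listed in Appendix B.3.

import java.io.File;
import java.util.Iterator;
import java.util.List;
import org.jdom.Document;
import org.jdom.Element;
import org.jdom.input.SAXBuilder;

// Reads each <benchmark> entry and prints its class name and data size.
public class ConfigReader {
    public static void main(String[] args) throws Exception {
        Document doc = new SAXBuilder().build(new File("config.xml"));
        Element root = doc.getRootElement();                 // <benchmarks>
        List benchmarks = root.getChildren("benchmark");
        for (Iterator it = benchmarks.iterator(); it.hasNext(); ) {
            Element benchmark = (Element) it.next();
            String className = benchmark.getChildText("classname");
            String size = benchmark.getChildText("size");
            System.out.println(className + " size " + size);
        }
    }
}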
6.3 Result Reporting
Using the reporting facilities of the Timing API allows for results to be
both displayed on the console and stored in an XML file. The XML file
also contains details such as the operating system name and version and
details of the JVM that the benchmark was executed on. The new design
also ensures that only one result for each benchmark (i.e., the most recent
result) is stored into the specified XML file, irrespective of the number of
times the same benchmark is executed. It is possible for other reporting
methods to be appended at a later date. The XML report file has the
following format
<?xml version="1.0" encoding="UTF-8"?>
<benchmarks
name="Java Grande"
type="Sequential"
version="2.0"
os_name="Windows 2000"
os_arch="x86"
os_version="5.0"
java_vm_specification_version="1.0"
java_vm_specification_vendor="Sun Microsystems Inc."
java_vm_specification.name="Java Virtual Machine Specification"
java_vm_version="1.4.0_01-b03"
java_vm_vendor="Sun Microsystems Inc."
java_vm_name="Java HotSpot(TM) Client VM">
  <benchmark section="Section1" suite="Arith" name="Add:Int" size="-1">
    <time>10531</time>
    <opname>adds</opname>
    <opcount>1.048576E10</opcount>
    <calls>1</calls>
  </benchmark>
  ...
</benchmarks>
6.4 Result Interpretation
The time required by the benchmark is now stored in milliseconds. This means that the output has not been altered in any way, in keeping with scientific standards of observation and measurement. The performance value of a benchmark is derived information, thus it is not reported. "Temporal performance is defined in units of operations per second, where the operation is chosen to be the most appropriate for each individual benchmark" [Bull et al., 2000b].
Performance = Operations / Time    (2)
This calculation belongs in the analysis of the results and is not included in the reported result.
As the benchmark results are now stored in XML, it is easier to interpret and present them. The existing code to convert a text file containing results into an HTML file was unnecessarily complicated; with the modifications detailed above, the process is now simpler. The introduction of templates (such as the freemarker template JGFBenchmark.html found in the template directory) allows changes to be made to the presentation of the results without modifications to code. It also allows for the possibility of the results being displayed in a format other than HTML through the creation of an appropriate freemarker template. Furthermore, the calculation of the JGF number has been separated from the presentation logic into a component of its own. Consequently, the interpretation and presentation of the results are now two separate components (JGFBenchmarkNumber.java and JGFBenchmarkPrinter.java respectively).
6.5 Restart Functionality
An XML-based status file is used to record the current progress of the
benchmark run. This allows the current run to be interrupted and restarted
without re-running benchmarks that have already been completed successfully.
7 An Example

7.1 The Polynomial Arithmetic API
Polynomials and polynomial arithmetic play an important role in many areas of mathematics. The Polynomial API was developed by the Crypto Group, Computer Science Department, National University of Ireland (NUI) Maynooth for the purpose of performing arithmetic operations on unbounded univariate and bivariate polynomials [Burnett et al., 2003] (Figure 9).

Figure 9: Polynomial API
A univariate (one-variable) polynomial of degree d has the following form

c_d x^d + c_(d-1) x^(d-1) + ... + c_2 x^2 + c_1 x^1 + c_0 x^0

where each term of the form c x^e is referred to as a monomial. Each monomial in a univariate polynomial has a coefficient c, a variable x and an exponent e. The largest exponent of all the monomials in a univariate polynomial is referred to as the degree d of the polynomial.
A bivariate (two-variable) polynomial has monomials of the form c x^(ex) y^(ey), where c is the coefficient and x and y are the two variables. Each variable has its own exponent value, ex and ey. The total degree of a bivariate monomial is the sum of ex and ey, and the total degree d of a bivariate polynomial is the largest total degree value of the monomials in the polynomial.
7.1.1 Algorithm Analysis

Univariate Polynomial
The following are the results for the univariate polynomial class BigPolynomial. In these results, the variable m represents the number of monomials in the polynomial, d the degree of the polynomial and c the size of the largest coefficient in the polynomial.
Figure 10: Univariate Polynomial Multiplication Experimental Analysis
1. Multiplication. There are two strategies available for univariate polynomial multiplication.

1. School Book. The method taught in school books, hence the name, where each term in one polynomial is multiplied with each term in the other in turn. This algorithm has a complexity of

O(m² M_integer(c²))

where M_integer is the cost of multiplying two large integers.

2. Binary Segmentation. This method "amounts to placing polynomial coefficients strategically within certain large integers, and doing all the arithmetic with one high-precision integer multiply" [Crandall and Pomerance, 2001]. According to Crandall and Pomerance, this has a complexity of

O(M_integer(d ln(dc²)))

where M_integer is the cost of multiplying two large integers.

Figure 10 shows that the Binary Segmentation algorithm offers a significant performance advantage compared to the School Book algorithm. This is in keeping with the theoretical complexity analysis results above.
2. Division. There are two strategies available for univariate polynomial division.

1. Synthetic Division. Synthetic division is a numerical method for dividing a polynomial by a binomial of the form x − k, where k is a constant called the divider. This has a complexity of

O(d M_integer(c²))

where M_integer is the cost of multiplying two large integers.

2. Pseudo Division. An implementation of Algorithm R from [Knuth, 1998], which has a complexity of

O(d² M_integer(c²))

where M_integer is the cost of multiplying two large integers.

Figure 11: Univariate Polynomial Division Experimental Analysis

Figure 11 shows that the Synthetic Division algorithm performs significantly better than the Pseudo Division algorithm.
3. Evaluation. For univariate polynomial evaluation, there are two strategies available. In the complexity analysis of the evaluation algorithms, v represents the value used to evaluate the polynomial.

1. School Book Evaluation. The method taught in school texts, where each term in the polynomial is evaluated using the specified value. This has a complexity of

O(m M_integer(c, E_integer(v, d)))

where M_integer is the cost of multiplying two large integers and E_integer is the cost of exponentiating a large integer by another large integer.

2. Horner Evaluation. This uses Horner’s rule to evaluate a polynomial for a given value. It has a complexity of

O(d M_integer(c, v))

where M_integer is the cost of multiplying two large integers.

Figure 12: Univariate Polynomial Evaluation Experimental Analysis

Figure 12 indicates that Horner Evaluation performs slightly better than School Book Evaluation. Again, this is in keeping with the theoretical analysis of the two algorithms.
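As an illustration of Horner’s rule (this is not the Polynomial API’s own code), a dense univariate polynomial with BigInteger coefficients can be evaluated as follows, where coeffs[i] is taken to be the coefficient of x^i:

import java.math.BigInteger;

// Evaluates c_d v^d + ... + c_1 v + c_0 using d multiplications and d additions,
// without ever forming v^d explicitly.
public class HornerEvaluation {
    public static BigInteger evaluate(BigInteger[] coeffs, BigInteger v) {
        BigInteger result = BigInteger.ZERO;
        for (int i = coeffs.length - 1; i >= 0; i--) {
            result = result.multiply(v).add(coeffs[i]);
        }
        return result;
    }
}

Each iteration performs one large-integer multiplication by v, which is where the d · M_integer(c, v) term in the complexity bound above comes from.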
Bivariate Polynomial
The following are the results for the multiplication operation of the bivariate polynomial class. To date there is only one implementation of the multiplication algorithm for bivariate polynomials, the School Book method. It has the same complexity as the method of the same name for univariate polynomials, as it is a simple extension of that algorithm. However, the analysis still yields useful information, namely that the time required by the algorithm depends more on the number of monomials than on the total degree of the polynomial, see Figure 13.

Figure 13: Bivariate Polynomial Multiplication Experimental Analysis
8 Conclusion
This paper detailed the development of the Timing API to automate the
role of the Experimenter in performing Experimental Algorithm Analysis.
It has created solutions to points raised in Johnson’s paper regarding
Experimental Algorithm Analysis and incorporated those solutions in the
Timing API. Consequently, this API evaluates the real-time performance
of different algorithms.
The benchmarking carried out in this paper will allow others to normalize the results obtained to their own machines. Therefore, our API
will not be adversely affected by different machine speeds or specifications.
Future work will involve the use of the Timing API as a decision tool in
selecting the most efficient component algorithms for use in elliptic curve
cryptosystems. The example given measures the performance of one such
component. The selection will be based on the type of curve selected and
the underlying machine as demonstrated by our Polynomial Arithmetic
API example.
Appendices
A Specification and Benchmarking

A.1 Specification of Author’s Computer
The machine used by the author was a Pentium 4 with a CPU speed of 2.2 GHz. It contained a 50 Gigabyte hard disk drive with Windows 2000 installed. The Java Virtual Machine (JVM) used was the Java HotSpot(TM) Client VM, version 1.4.0_01-b03, supplied by Sun Microsystems Inc.
A.2 JGF Sequential Benchmark Results
The following results were obtained over three runs of the JGF Sequential Benchmark. The results of each run differ due to other processes required by the operating system. The averages cited in Section 6.1 are the values obtained by taking the geometric mean of the section results obtained in the following benchmark executions. The first execution of the benchmark occurred on 14th July 2003, the second on 16th July 2003 and the third on 18th July 2003.
A.2.1 Section 1 - Low Level Operations

The average result of the three runs for Section 1 is 2.59824.

Benchmark    Run 1                 Run 2                 Run 3
Arith        0.2866377481298084    0.2874571637081641    0.28723619273533096
Assign       5.648397831816057     5.65588170107578      5.656594447830065
Cast         3.246348732759473     3.2409166098580866    3.199605876693666
Create       1.6513457919015857    1.699746341115175     1.760785792448639
Exception    3.63815389262372      3.799868978202154     3.7290343654054343
Loop         1.5714585719591492    1.648256004839142     1.6088281489633403
Math         2.5126814385039196    2.545608540148679     2.5342586893189187
Method       5.662726689464254     5.68258397724788      5.6712894064149
Serial       6.925287645101215     7.082573741801907     7.079956513750215
Overall      2.5699201771451845    2.6164736578515475    2.6083325181787878

A.2.2 Section 2 - Kernels

The average result of the three runs for Section 2 is 4.97861.

Benchmark    Run 1                 Run 2                 Run 3
Size A       4.563771698823964     4.8936401925684425    4.916076794555665
Size B       5.077057191094815     5.007531531717145     5.012785493413412
Size C       5.095853975765617     5.125666632138881     5.139283399867139
Overall      4.905888201542877     5.008050443128609     5.0218852174879824

A.2.3 Section 3 - Applications

The average result of the three runs for Section 3 is 6.57787.

Benchmark    Run 1                 Run 2                 Run 3
Size A       6.946361082485771     6.986994916894107     6.930853897519336
Size B       6.204399738165623     6.272015340189315     6.187887419670448
Overall      6.564906768673763     6.619859462298622     6.5488429237563635
B Software Instructions

B.1 Java Grande Forum Benchmark

B.1.1 Building JGF Benchmark
To build the JGF Benchmark requires that Ant version 1.5 (or
greater) be installed [Ant, 2003]. Navigate to the directory containing the JGF Benchmark project and type
c:\java\grande>ant release
This will create the grande.jar allowing the benchmarks to be run.
B.1.2 Running Benchmarks
To run the benchmarks, navigate to the release directory of the
grande project and type the following command
c:\java\grande\release>java -classpath
library/gulch.jar;library/jdom.jar;
library/lyra.jar;library/tamull.jar;
grande.jar
-Xmx512M -noverify
nuim.cs.crypto.grande.JGFBenchmarkRunner
-c [xml benchmark configuration file name]
-r [xml reporter configuration file name]
If using Windows, an alternative would be to use the batch file supplied in the release directory named benchmark.bat as follows.
c:\java\grande\release>benchmark
-c [xml benchmark configuration file name]
-r [xml reporter configuration file name]
The benchmark runner application requires the JDOM jar (see
link at the bottom) to read and write the xml files. The [xml configuration file] parameter should contain the location of the xml configuration file. If this parameter is omitted, then the default xml
configuration file in the config directory (config.xml) is used instead.
The [xml result file] parameter should contain the location of the xml
result file. If this parameter is omitted, then the results are recorded
to a file named benchmark.xml in the current directory. The [xml
status file] parameter is the name of the xml status file. The status file is used to record the current progress of the benchmark run.
This allows the current run to be interrupted and restarted without
re-running benchmarks that have already been completed successfully. If this parameter is omitted, then the status is recorded to a
file named status.xml in the current directory.
Therefore, using the default parameters above, it is possible to
execute all the benchmarks as follows
c:\java\grande\release>java -classpath
library/gulch.jar;library/jdom.jar;
library/lyra.jar;library/tamull.jar;
grande.jar
-Xmx512M -noverify
nuim.cs.crypto.grande.JGFBenchmarkRunner
or simply with batch file as follows
c:\java\grande\release>benchmark
The -Xmx512M flag is required to increase the maximum Java heap size to 512 MB, as many of the benchmark tests are memory intensive. The -noverify flag is required since the Serial benchmark in Section 1 compiles correctly but fails the byte-code verifier.
B.1.3 Running Benchmark Result Interpretation
This should be run only after all benchmark suites for all sizes have
been run. To run the benchmark printer, navigate to the grande
directory and type the following command
C:\java\grande\release>java -classpath
library/jdom.jar;library/freemarker.jar;
library/gulch.jar;library/lyra.jar;
library/tamull.jar;library/xscribe.jar;
grande.jar
nuim.cs.crypto.grande.JGFBenchmarkPrinter
-s [XML standard result file name]
-c [XML current result file name]
-t [template file name]
-o [output file name]
If using Windows, an alternative would be to use the batch file supplied in the release directory named printer.bat as follows.
c:\java\grande\release>printer
-s [XML standard result file name]
-c [XML current result file name]
-t [template file name]
-o [output file name]
The benchmark printer application requires the JDOM jar (see the
link at the bottom) to read the xml files. The Freemarker and
Xscribe jars (see the links at the bottom) are required for the presentation of the results using the template. The [xml standard result
file] parameter should contain the location of the xml standard result
file. The xml standard result file should contain the performance
values of the benchmarks on a configuration selected by the EPCC.
It is unlikely that you will wish to use this parameter since if it is
omitted, then the default xml standard result file in the data directory (standard benchmark.xml) is used. The [xml current result file
name] parameter should contain the location of the xml result file
obtained from running the benchmarks (see Running Benchmarks
above). If this parameter is omitted, then the application uses the
file named benchmark.xml in the current directory if such a file exists. The [template file name] parameter should contain the location
of the template file. If this parameter is omitted, then the default
template file in the template directory (JGFBenchmark.html) is used
instead. The [output file name] parameter should contain the location of the output file. If this parameter is omitted, then the result is placed in a file named output.html in the current directory. The screen-shot (Figure 14) shows the resulting HTML file from the interpretation process. The HTML file contains the details of the JVM as well as an overall JGF number of 4.945 and other derived information such as the performance of each of the benchmark tests.

Figure 14: Java Grande Benchmark Result Interpretation
B.2 Timing API

B.2.1 Building Timing API
To build the Timing API requires that Ant version 1.5 (or greater)
be installed [Ant, 2003]. Navigate to the directory containing the
Timing API project and type
c:\java\tamull>ant release
This will create the tamull.jar allowing timing experiments to be
performed.
B.2.2 Performing Unit Timing Tests
A unit timing test can be created as follows
import nuim.cs.crypto.tamull.TimeSuite;

public class MyTimeSuite extends TimeSuite{
    public void timeMyMethod(){
        myMethod();
    }
    public void timeMyOtherMethod(){
        myOtherMethod();
    }
    public void foobarThis() {
        // this method will not be timed
    }
}
Note that the code unit - typically a method itself - being timed
must be placed within a method of a particular signature with return type void. The signature of a method is a combination of the
method’s name and its parameter types. In this case, the signature
is a method beginning with the letters time and consisting of no
parameters.
Unit timing tests can be executed with the following command
c:\yourdirectory\yourproject>java -classpath
[jardirectory]asrat.jar;
[jardirectory]gulch.jar;
[jardirectory]jdom.jar;
[jardirectory]lyra.jar;
[jardirectory]tamull.jar;
-r [xml reporter configuration file name]
-t [threshold time]
-u [threshold units to power of 10]
-w [stopwatch implementation]
-s [xml suite configuration file name]
The structure of the reporter XML file is as follows
<reporters>
  <reporter>
    <class>my.class.path.MyReporter</class>
    <config>
      <parameter1></parameter1>
      ...
      <parameterN></parameterN>
    </config>
  </reporter>
  <reporter>
    ...
  </reporter>
</reporters>
The structure of the suite XML file is as follows
<suites>
  <suite>
    <class>my.class.path.MyTimeSuite</class>
  </suite>
  <suite>
    ...
  </suite>
</suites>
Valid values for the -t and -u flags are numeric values. If the desired threshold is 50 milliseconds (50 x 10^-3 seconds) the values should be -t 50 -u -3. The -w flag should contain a string with the class name, in fully qualified form, of the stopwatch implementation. All stopwatch implementations must implement the nuim.cs.crypto.gulch.Stopwatch interface. An example of a class that does implement the Stopwatch interface is nuim.cs.crypto.gulch.system.SystemStopwatch.
B.2.3 Using Timing API Extension
The Timing API has been extended to make use of the Reflection
API [Green, 2003] to allow methods and input data to be contained
within an XML file. Each timing is performed and recorded and the
corresponding entry in the XML file is processed.
The format of the XML file is as follows
<auto>
  <!-- Declare The Classes Being Used -->
  <class name="my.class.path.MyClass" alias="MyClass"/>
  <class name="my.class.path.MyOtherClass" alias="MyOtherClass"/>
  ...
  <!-- Declare The Methods Being Used -->
  <method class="MyClass" name="myMethod" alias="MyMethod">
    <parameter>MyOtherClass</parameter>
    ...
  </method>
  ...
  <!-- Declare The Instances Being Used -->
  <instance class="MyClass" name="MyClassInstance">
    <parameter>Some Initialization String Here</parameter>
    ...
  </instance>
  <instance class="MyOtherClass" name="MyOtherClassInstance">
    ...
  </instance>
  ...
  <!-- Declare The Timings To Be Performed -->
  <time method="MyMethod" instance="MyClassInstance">
    <parameter>MyOtherClassInstance</parameter>
  </time>
  ...
</auto>
There may be one or more time elements in the XML file. The methods and instances used in the element must be declared in the file in the form of method and instance elements. These elements may have zero or more parameter elements. In the case of the method element, the parameter should contain a class. For an instance, the parameter should contain a string. Both of these elements require that classes be declared in class elements.
The unit timing test is performed with the same command as
before but replacing the -s flag with the -a flag.
c:\yourdirectory\yourproject>java -classpath
[jardirectory]asrat.jar;
[jardirectory]gulch.jar;
[jardirectory]jdom.jar;
[jardirectory]lyra.jar;
[jardirectory]tamull.jar;
-r [xml reporter configuration file name]
-t [threshold time]
-u [threshold units to power of 10]
-w [stopwatch implementation]
-a [xml timing test configuration file name]
B.3 Software Libraries

This section gives a brief overview of the many software libraries used by the author. Note that JAR stands for Java ARchive.

∗ developed by the author
+ extended by the author.
B.3.1 asrat.jar∗
The Seek API contains code to search for classes in specified locations that extend a specified class and/or is the specified class. The
API also contains functionality to search for methods of a particular
signature and having a selected return type.
B.3.2 freemarker.jar

The Freemarker API [Geer and Bayer, 2001] was developed by Benjamin Geer and Mike Bayer. It provides the templating facility used here for the presentation of the benchmark results.
B.3.3 grande.jar+
The Java Grande Benchmark [Bull et al., 2000a,b] was developed
by the Edinburgh Parallel Computing Centre (EPCC). They are
leading the benchmarking initiative of the Java Grande Forum, and
have developed a suite of benchmarks to measure different execution
environments of Java against each other.
B.3.4 gulch.jar∗
The Stopwatch API contains code to mimic the functionality of a
conventional physical stopwatch.
B.3.5 jdom.jar

The JDOM API [Hunter and McLaughlin, 2003] was developed by the JDOM open source project for the purpose of processing XML files.
B.3.6 reporter.jar∗
The Reporter API has functionality to allow for data to be reported
to different reporters simultaneously.
B.3.7 tamull.jar∗
The Timing API allows for timing tests to be performed on units of
code.
B.3.8 xscribe.jar∗

The XScribe API [Duffy, 2002] provides an alternative to using XSL (eXtensible Stylesheet Language) for converting XML from one form to another.
References

Ant. Jakarta Ant project. http://ant.jakarta.apache.org, 2003.

J. M. Bull, L. A. Smith, M. D. Westhead, D. S. Henty, and R. A. Davey. A benchmark suite for high performance Java. Concurrency: Practice and Experience, pages 375–388, 2000a.

J. M. Bull, L. A. Smith, M. D. Westhead, D. S. Henty, and R. A. Davey. Benchmarking Java Grande applications. pages 63–73, 2000b.

Andrew Burnett, Adam Duffy, Tom Dowling, and Claire Whelan. A Java API for polynomial arithmetic. Principles and Practice of Programming in Java, 2003.

James W. Cooper. The Design Patterns Java Companion. Addison-Wesley, 1st edition, 1998.

Richard Crandall and Carl Pomerance. Prime Numbers - A Computational Perspective. Springer-Verlag, 1st edition, 2001.

Adam Duffy. XScribe API. http://www.ijug.org/member/articles/xscribe/, 2002.

Benjamin Geer and Mike Bayer. Freemarker API. http://freemarker.sourceforge.net/, 2001.

Jesper Gørtz. Use the JVM Profiler Interface for accurate timing. http://www.javaworld.com/javatips/jw-javatif94.html, 2003.

Dale Green. The Java Tutorial: The Reflection API. http://java.sun.com/docs/books/tutorial/reflect/, 2003.

Jason Hunter and Brett McLaughlin. JDOM API. http://www.jdom.org, 2003.

java.sun.com. System.currentTimeMillis() granularity too coarse. http://developer.java.sun.com/developer/bugParade/bugs/4423429.html, 2001.

David S. Johnson. A theoretician's guide to the experimental analysis of algorithms. Data Structures, Near Neighbor Searches, and Methodology: Fifth and Sixth DIMACS Implementation Challenges, pages 215–250, 2002.

Donald E. Knuth. The Art of Computer Programming: Volume 2 - Seminumerical Algorithms. Addison-Wesley, Reading, Mass., 1998.

Perdita Stevens and Rob Pooley. Using UML - Software Engineering with Objects and Components. Addison-Wesley, 1st edition, 2000.

Bill Venners, Matt Gerrans, and Frank Sommers. Why we refactored JUnit: The story of an open source endeavor. http://www.artima.com/suiterunner/why.html, 2003.

Herbert S. Wilf. Algorithms and Complexity. A. K. Peters Ltd, 1st edition, 1994.