Java Performance in the Real World

Figure: Architecture of a standalone C++ application on NT (see note 1)
Real-world results
Having looked at the reasoning behind potential differences in C++ and Java application performance, the
following section details the actual results of benchmarking both kinds of applications while performing the
tasks outlined above.
All the tests were carried out on a Pentium II 266 MHz machine with 128 MB RAM, running NT 4.0
Workstation. C++ applications were developed using Microsoft Visual C++ 5.0, Java applications using the
Java Development Kit (JDK) 1.1.6, and Visual Basic 5.0 was used to develop the non-Java GUI interface. Three
different applications were used to isolate and time specific tasks:
1. A Program Execution App was developed in C++ and Java to measure execution of program
instructions, loops, and method calls.
2. A Memory Analyzer was developed in C++ and Java to measure memory allocation and deallocation.
3. A GUI Plus application was developed in Visual Basic 5.0 (compiled executable) and Java to measure
program loading, graphics, and event handling.
Test                  C++ or VB (s)  JIT (s)  Interpreter (s)  Description
Integer Division      1.3            1.4      3.8              This test loops 10 million times on an integer division.
Member Method         1.3            1.5      9                This test loops 10 million times calling a member method, which contains an integer division.
First Million Primes  400            420      1800             This test calculates the first million prime numbers. It exercises variable access, array access, and function-call invocation.
Memory Allocation     0.7            1.6      1.6              This test allocates and frees 10 million 32-bit integers.
Program Load          0.6            0.3      0.3              This test loads the VB and Java GUI using an executable to tabulate time.
Render GUI            0.02           0.3      0.3              This test measures the time needed to render a complex screen with buttons, fields, list boxes, etc.
Perform Events        0.5            0.5      2                This test performs 1 million button clicks on the GUI.

For the three GUI tests (Program Load, Render GUI, Perform Events), the first timing column is the compiled
Visual Basic 5.0 application rather than C++.

Note 1: Ibid.
Analysis
The following observations can be made from the test results:




• For the first three tests, the JIT-enabled Java application is only slightly (5-15%) slower than the
  C++ version, while the interpreted code is roughly 3-7 times slower. The large performance penalty
  for interpreted code comes from repeated interpretation of the same code as well as the lack of any
  optimization. The JIT code is slightly slower because the compiler performs fewer code optimizations
  and virtually no global optimization, and also because of Java’s use of handles for object
  references.
• Memory allocation and deallocation is slightly more than twice as slow for Java applications (1.6 s
  versus 0.7 s). This is because of the working of the garbage collector discussed earlier.
• Program loading is actually faster in Java, as anticipated. This is primarily due to the difference in
  executable size.
• While the JIT-enabled Java GUI handles events just as fast as its VB counterpart, it is 15 times slower
  in rendering the GUI (0.3 s versus 0.02 s). This large difference can be attributed to the fact that
  while the VB application calls the Win32 graphics subsystem directly, the Java GUI uses the Java
  Foundation Classes (JFC) framework, which has its own built-in graphics engine.
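The Integer Division and Member Method tests can be approximated with a small timing loop. The following is a minimal sketch, not the original test program: the class name DivisionBenchmark and its methods are hypothetical, and it assumes a modern JDK (System.nanoTime and numeric literals with underscores did not exist in JDK 1.1.6). Absolute times on current hardware will be far smaller than those in the table.

```java
// Hypothetical micro-benchmark sketch; names are illustrative, not from the original tests.
final class DivisionBenchmark {
    // Results are stored here so the JIT cannot discard the loops as dead code.
    static volatile int sink;

    // Member method used by the method-call variant of the test.
    int divide(int numerator, int denominator) {
        return numerator / denominator;
    }

    // Sums numerator / i for i = 1..iterations with the division written inline.
    static int inlineLoop(int numerator, int iterations) {
        int sum = 0;
        for (int i = 1; i <= iterations; i++) {
            sum += numerator / i; // int overflow simply wraps; only the timing matters
        }
        return sum;
    }

    // Same arithmetic, but every division goes through a member method call.
    int methodLoop(int numerator, int iterations) {
        int sum = 0;
        for (int i = 1; i <= iterations; i++) {
            sum += divide(numerator, i);
        }
        return sum;
    }

    public static void main(String[] args) {
        int iterations = 10_000_000; // loop count taken from the test description
        DivisionBenchmark bench = new DivisionBenchmark();

        long start = System.nanoTime();
        sink = inlineLoop(Integer.MAX_VALUE, iterations);
        long inlineNanos = System.nanoTime() - start;

        start = System.nanoTime();
        sink = bench.methodLoop(Integer.MAX_VALUE, iterations);
        long methodNanos = System.nanoTime() - start;

        System.out.printf("inline division: %.3f s%n", inlineNanos / 1e9);
        System.out.printf("member method:   %.3f s%n", methodNanos / 1e9);
    }
}
```

A single timed pass like this includes JIT warm-up, so repeated runs (or a dedicated benchmarking harness) would be needed before drawing conclusions from the numbers.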
Distributed applications
Architecture of a Distributed Application
[Figure labels: Application Client (GUI); Firewall; Application Server; Application Server Objects;
RMI, IIOP; JDBC/ODBC; Relational Data (Oracle, Sybase); Legacy Data]
While there exist numerous real-world IT systems that are standalone, a majority of business-critical
applications developed in the last few years have been distributed, either using a client-server architecture
or more recently distributed objects. Thus, a true measure of Java’s performance in real world IT systems
can only be gained by comparing the performance of business-critical, distributed Java and C++
applications. The architecture of a typical distributed object application is shown above. It consists of a GUI
client that communicates via an object protocol with a set of application server objects that encapsulate the
business logic. These server objects provide persistence by interfacing with object and relational databases
via data access objects. They can also participate in transactions by using Transaction managers such as
Tuxedo and Encina and frequently access legacy data and applications on the mainframe.
In addition to performing all the tasks of a standalone application (program loading and execution, memory
management, accessing system resources and graphics processing), distributed applications also:
1. Make distributed object requests between client and server objects, among server objects and between
server and data access, transactional or legacy objects. Depending on the object middleware used, these
requests can be in DCOM, CORBA IIOP or, in the case of Java applications, RMI.
2. Access disparate data (relational and non-relational), using a variety of protocols including ODBC,
OLE DB, Embedded SQL and JDBC for Java applications.
3. Interface with legacy systems using middleware such as CORBA or Microsoft Transaction Server, or
even the JDK running on the mainframe or AS/400.
4. Make use of threads. Multithreaded clients allow the GUI to quickly return control to the end-user
while processing a request in a separate thread. Multithreaded servers can handle simultaneous requests
from multiple clients and can scale more easily.
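The multithreaded-client idea in the last item can be sketched as follows. This is an illustrative example only: the class BackgroundRequest, the method fetchAccountSummary and the 200 ms delay are hypothetical stand-ins for a real remote call, and the code uses java.util.concurrent from a modern JDK rather than the raw Thread API that JDK 1.1.6 offered.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch of a client that hands a slow request to a worker thread.
final class BackgroundRequest {
    // Stand-in for a slow remote object request (assumed, not from the original paper).
    static String fetchAccountSummary(String accountId) {
        try {
            Thread.sleep(200); // simulates network and server latency
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return "summary for " + accountId;
    }

    // Starts the request on the worker and returns immediately with a handle to the result.
    static CompletableFuture<String> requestAsync(Executor worker, String accountId) {
        return CompletableFuture.supplyAsync(() -> fetchAccountSummary(accountId), worker);
    }

    public static void main(String[] args) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            CompletableFuture<String> pending = requestAsync(worker, "A-100");
            System.out.println("caller is free to handle user input while the request runs");
            System.out.println(pending.join()); // waits only when the result is actually needed
        } finally {
            worker.shutdown();
        }
    }
}
```

In a GUI client the result would normally be delivered back to the UI thread through a callback instead of a blocking join; the join here just keeps the sketch short.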
Real-world results
Having described the additional tasks required of distributed applications, the following section compares
the performance of C++ and Java implementations of such applications performing the above tasks.
As before, all the tests were carried out on a Pentium II 266 MHz machine with 128 MB RAM, running NT
4.0 Workstation. In this case two workstations were used to distribute the application, with many clients
and servers running on each. C++ applications were developed using Microsoft Visual C++ 5.0 and Java
applications using the Java Development Kit (JDK) 1.1.6. VisiBroker 3.2 (C++ and Java) was used as the
CORBA object middleware. For data access, the C++ application used Microsoft’s ODBC driver to access data in
MS SQL Server 6.5 while the Java application used a JDBC Type 3 driver from Intersolv to access the same
data. Three different components were used to isolate and time specific tasks:
1. An Object Request component was developed in C++ and Java to measure distributed object requests.
2. A Data Access component was developed in C++ and Java to measure data access from SQL Server.
3. A MultiThread component was developed in C++ and Java to measure synchronized thread calls.
Test                        C++ (s)  JIT (s)  Interpreter (s)  Description
ORB init and bind           1        0.9      0.9              This test measures the time needed to initialize the CORBA client and bind to the remote application server.
Single object invocation    0.02     0.03     0.7              This test instantiates a remote object which performs 1 million operations.
Multiple object invocation  36       22       22               This test instantiates 3000 remote objects, each of which performs 1 operation.
Database connection         0.3      1        0.7              This test connects to a remote SQL Server 6.5 database.
Select                      27       12       12               This test loops 100 times and retrieves 10 rows from the database.
Synchronized Method         10       17       18               This test measures the time needed to access a synchronized method 20000 times.
Analysis
The following observations can be made from the test results:



• A JIT compiler provides limited performance improvement for distributed applications. This can be
  surmised from the fact that results for most tests are almost identical between JIT-enabled and
  interpreted Java applications. In general, the network hop is the gating factor for distributed object
  requests while the database driver is the gating factor for data access. The latter conclusion is drawn
  from the fact that the Java application accesses data more than twice as fast as the C++ application
  (12 s versus 27 s in the Select test), primarily because it uses a Type 3 JDBC driver with server-side
  SQL execution, which is much more efficient than the client-side execution provided by the C++ ODBC
  driver.
• Remote object instantiation is faster with Java. This could be attributed to a better implementation of
  the CORBA Basic Object Adapter in VisiBroker for Java vs. VisiBroker for C++.
• As expected, synchronized methods are slower in Java than C++. This is because such methods keep
  both a C and Java stack in memory and also execute a significant amount of additional code to provide
  thread-safety.
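The cost of a synchronized method can be explored with a sketch along these lines. It is an assumption-laden illustration rather than the original MultiThread component: the class SyncCounter and its methods are hypothetical, the calls are local and uncontended, and it requires a modern JDK (method references, System.nanoTime). The timings it prints are not comparable to the figures in the table.

```java
// Hypothetical sketch comparing a synchronized method with an unsynchronized twin.
final class SyncCounter {
    private int count;

    // Acquires this object's monitor on every call.
    synchronized void incrementSynchronized() {
        count++;
    }

    // No locking; only safe while a single thread owns this object.
    void incrementUnsynchronized() {
        count++;
    }

    int get() {
        return count;
    }

    // Runs the given call the requested number of times and returns elapsed nanoseconds.
    static long timeCalls(Runnable call, int calls) {
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            call.run();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int calls = 20_000; // call count taken from the Synchronized Method test
        SyncCounter counter = new SyncCounter();
        long syncNanos = timeCalls(counter::incrementSynchronized, calls);
        long plainNanos = timeCalls(counter::incrementUnsynchronized, calls);
        System.out.printf("synchronized:   %d ns%n", syncNanos);
        System.out.printf("unsynchronized: %d ns%n", plainNanos);
    }
}
```

Keeping both variants of a method, as here, only pays off when callers can guarantee single-threaded access to the unsynchronized one.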
Performance enhancing techniques
While distributed applications in general are not overly affected by Java’s performance limitations, there are
two important reasons for trying to enhance their performance:
A. If a distributed application performs computationally intensive work, or its GUI is fairly complex, then
some of the limitations of the standalone JIT code, as seen in the GUI and method-call tests, can
become more pronounced. A hint of this can be seen in the “single-object invocation” test above, where
the C++ version is slightly faster than the Java JIT code because the remote operation is performing
some computational work.
B. While the relative performance of C++ and Java distributed applications may be similar, there is
definitely an advantage in increasing the absolute performance of a Java distributed application, so that
it provides higher throughput, increased transactions/sec, and greater scalability.
Performance can be improved at several levels: At the lowest level, providing a faster Virtual Machine and
better JIT compiler can produce better optimized and faster executing code. A level above this are Java
performance tools and libraries such as specialized libraries for I/O, as well as faster data access drivers.
Finally, some of the biggest performance gains can be obtained by profiling the application code and then
optimizing it using proven techniques. Each of these methods is explored in greater detail below:
Using a faster Virtual Machine and JIT compiler
There are numerous Java VMs and JIT compilers available, especially on popular platforms such as Win95,
NT and Solaris. The speed of VMs is usually rated using one of two popular benchmarks: JMark 2.0 and
CaffeineMark 3.0. Each runs a variety of tests, including processor-intensive tasks, GUI and thread calls, on
a particular VM and combines the results into a composite JMark or CaffeineMark score which can be used
by an evaluator (but more often by the vendor’s marketing folks) to make (or push) a VM selection. Some
of the faster VMs and JIT compilers on the market today include:
1. Supercede 2.0 Pro compiler and VM with native code generation (http://www.supercede.com/).
2. TowerJ compiler and VM with native code generation (http://www.twr.com).
3. Microsoft VM 3.1 with generational garbage collector. This VM is available as part of the Visual J++
product or as a free download from Microsoft’s Java website at http://www.microsoft.com/java/. In
tests performed by Sun engineers at the JavaOne conference, the MS VM executed 20-45% faster than
the JDK 1.2 beta3 and the JDK 1.1.6 VMs with JIT.
4. Kaffe, available free on 30 operating systems, includes JIT conversion from byte to native code
(http://www.transvirtual.com/kaffe.html).
5. Symantec VM and JIT, available as part of Symantec Visual Café and also bundled with Netscape
Navigator and Sun’s JDK 1.1 (http://www.symantec.com/vcafe/index.html).
6. Inprise VM and JIT, available as part of JBuilder 2.0 (http://www.inprise.com/jbuilder/).
Using Java performance tools and libraries
Tools available for optimizing Java applications include:
1. The javac compiler itself, with the -O option for optimization. Using this compiler option provides
some basic optimization and dead code elimination.
2. JAX from IBM can reduce the size of a Java application (by up to 50%) and make it more efficient
by removing dead code, inlining method calls, etc. It is available for free download at
http://www.alphaworks.ibm.com/formula/JAX.
Java class libraries available for improving performance include:
1. The Windows Foundation Classes (WFC) and the J/Direct API from Microsoft, which allow Java
applications to call the Win32 subsystem directly and thus greatly improve graphics handling and other
system tasks. The tradeoff is application portability, because the APIs of these libraries are used as an
alternative to standard Java AWT/JFC calls. These libraries are available at
http://www.microsoft.com/java/ or with Visual J++.
2. Perflib provides a set of Java classes for high performance sorting, searching, I/O, etc. The routines
claim to be up to 5 times as fast as standard JDK implementations.
(http://www.glenmccl.com/~glenm/perflib/).
3. A variety of Type 3 and 4 JDBC drivers are available for fast, native access to most relational
databases. Some of the popular vendors include Inprise with their Data Gateway and
Microfocus/Intersolv’s DataDirect product suite (http://www.microfocus.com/products/data.htm).
Profiling and Optimizing Application code
While the above techniques can yield significant performance improvements, tuning application code can
potentially provide the greatest “bang for the buck”. This is especially true if a major inefficiency can be
identified and eliminated in the 20% of code that is executed 80% of the time. A good way to discover
programming inefficiencies is to run the Java code through a profiler. Profiling allows the detection of
performance bottlenecks, identification of CPU and memory intensive code and collection of function and
even line-level timing data. Some of the Java profilers on the market include:
1. Visual Quantify from Rational Software (http://www.rational.com/products/visualq/index.html).
2. OptimizeIt from Intuitive Systems (http://www.optimizeit.com/).
3. JProbe from KLGroup (http://www.klgroup.com/).
4. Jinsight, a freeware profiler and memory analyzer from IBM
(http://www.alphaworks.ibm.com/formula/jinsight).
Once the problem areas of an application are identified, there are a number of steps that can be taken to
improve overall performance. At the component level, there are numerous coding techniques that can be
used to increase code efficiency and avoid problem APIs. The “Java Performance Tuning Tips 1.0” article
from IBM and the “Java performance and optimization” article from Inside Java (available in the
Resources section) both discuss some of these techniques in detail. For improving the performance of
business-critical distributed applications, the following techniques are available:







• Avoid synchronized methods in multithreaded applications, if possible. As seen from the tests above,
  they are fairly slow and resource intensive. In some cases, it might be better to create two versions of a
  method, one synchronized and one non-synchronized, and only use the former when absolutely
  necessary.
• Pass objects by value when appropriate. Some middleware environments such as CORBA make it
  especially convenient to pass objects by reference to a remote module. The problem with this approach
  is that any time the remote module needs access to the object, it needs to make a remote call back to the
  passing object. So in cases where a passed object needs to be accessed frequently, it makes more sense
  to take the initial hit of passing it by value.
• Use JDBC with precompiled SQL, rather than dynamic SQL, for oft-repeated queries. Precompiled
  SQL is stored in the database server and repeatedly executed with new inputs, while dynamic SQL is
  recompiled every time it is run. Using this technique in our tests reduced the access time for
  repeated queries by a factor of 2-8.
• Multiplex database connections across several clients and maintain them rather than creating and
  destroying a connection for each client. This has the dual advantages of reduced connection time and
  improved scalability of the application.
• Many real-world applications require several layers of security, including authentication, authorization,
  data encryption and non-repudiation. Since security adds significant overhead to a distributed request
  and encryption algorithms are computationally intensive, use the minimum security level possible for a
  given operation and user.
• A common technique for traversing firewalls is to tunnel the object request through HTTP. While this
  approach is the most flexible, it comes with a significant performance penalty. A better approach, when
  possible, is to open a minimum set of firewall ports or to use an object-protocol friendly firewall proxy
  (such as Wonderwall from Iona).
• Finally, if delays can’t be avoided due to a large number of system users or slow-running server
  objects, their impact on the client can be minimized by creating a separate thread to handle the object
  request and returning control of the GUI back to the user.
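The connection-multiplexing technique in the list above can be illustrated with a small fixed-size pool. This is a minimal sketch under stated assumptions: SimplePool is a hypothetical class, plain strings stand in for real database connections so that the example runs without a JDBC driver, and it relies on java.util.concurrent from a modern JDK. A production system would more likely use an existing pooling library or the pooling built into an application server.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical fixed-size pool: resources are created once and shared by many clients.
final class SimplePool<T> {
    private final BlockingQueue<T> idle;

    // Creates all pooled resources up front using the supplied factory.
    SimplePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
        }
    }

    // Borrows a resource, runs the work, and always returns the resource to the pool.
    <R> R use(Function<T, R> work) {
        T resource;
        try {
            resource = idle.take(); // waits if every resource is currently in use
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a pooled resource", e);
        }
        try {
            return work.apply(resource);
        } finally {
            idle.add(resource); // never full here, because one slot was freed by take()
        }
    }

    int idleCount() {
        return idle.size();
    }

    public static void main(String[] args) {
        int[] created = {0}; // counts how many "connections" the factory builds
        // Strings stand in for real connections (an assumption made for runnability).
        SimplePool<String> pool = new SimplePool<>(2, () -> "conn-" + (++created[0]));
        for (int client = 1; client <= 5; client++) {
            int id = client;
            System.out.println(pool.use(conn -> "client " + id + " used " + conn));
        }
        System.out.println("connections created: " + created[0]);
    }
}
```

Five client requests are served here while only two connections are ever created, which is the saving the technique aims for.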
Future Improvements
The great interest that Java is receiving from corporate IT departments has caused system vendors to
continue the rapid pace of advancement in this technology. High on their priority list is further
improvements in the speed of Java applications, both standalone and distributed.
For standalone applications, Sun is delivering JDK 1.2 later this year, which promises performance
improvements in strings, vectors, dates and the JIT compiler. Q1 ’99 heralds the availability of the
revolutionary HotSpot compiler from Sun which in preliminary tests at JavaOne ran applications faster than
C++! HotSpot is a cross between a JIT compiler and interpreter and provides its dramatic performance
improvements with a much-improved generational garbage collector, fast thread synchronization and
“adaptive” compilation.
While improved object middleware such as CORBA 2.2 and RMI 2.0 promises to increase the speed of
distributed Java applications, the most significant development in this area is the rapid advancement of Java
Application Servers. These application servers host the server-side business objects and automatically
provide them with multithreading, database and resource pooling, and load-balancing, all of which promise
to make real-world Java applications the fastest and most scalable kind of distributed applications available.
Resources
1. “Java Performance Tuning Tips 1.0” available at
http://www.software.ibm.com/os/warp/performance/javatip.htm.
2. “Java performance and optimization” available at http://www.inside-java.com/articles/perf/index.htm.
3. “Java Optimization Resources” available at http://www.cs.cmu.edu/~jch/java/resources.html.
4. “HotSpot: A new breed of virtual machine” available at
http://www.idg.net/idg_frames/english/content.cgi?vc=docid_9-26967.html.