Architecture of a standalone C++ application on NT [1]

Real-world results

Having looked at the reasoning behind potential differences in C++ and Java application performance, the following section details the actual results of benchmarking both kinds of applications while performing the tasks outlined above. All the tests were carried out on a Pentium II 266 MHz machine with 128 MB RAM, running NT 4.0 Workstation. C++ applications were developed using Microsoft Visual C++ 5.0 and Java applications using the Java Developer’s Kit 1.1.6; Visual Basic 5.0 was used to develop the non-Java GUI interface.

Three different applications were used to isolate and time specific tasks:

1. A Program Execution App was developed in C++ and Java to measure execution of program instructions, loops, and method calls.
2. A Memory Analyzer was developed in C++ and Java to measure memory allocation and deallocation.
3. A GUI Plus application was developed in Visual Basic 5.0 (compiled executable) and Java to measure program loading, graphics, and event handling.

Test                  C++ (s)  JIT (s)  Interpreter (s)  Description
Integer Division      1.3      1.4      3.8              Loops 10 million times on an integer division.
Member Method         1.3      1.5      9                Loops 10 million times calling a member method that contains an integer division.
First Million Primes  400      420      1800             Calculates the first million prime numbers; exercises variable access, array access, and function-call invocation.
Memory Allocation     0.7      1.6      1.6              Allocates and frees 10 million 32-bit integers.
Program Load          0.6      0.3      0.3              Loads the VB and Java GUI, using an executable to tabulate the time.
Render GUI            0.02     0.3      0.3              Measures the time needed to render a complex screen with buttons, fields, list boxes, etc.
Perform Events        0.5      0.5      2                Performs 1 million button clicks on the GUI.

For the three GUI tests (Program Load, Render GUI, Perform Events), the C++ column reports the compiled Visual Basic application.

[1] Ibid.

Analysis

The following observations can be made from the test results:

- For the first three tests, the JIT-enabled Java application is only slightly (roughly 5-15%) slower than the C++ version, while the interpreted code is about 3 to 7 times slower. The large performance penalty for interpreted code comes from repeated interpretation of the same code and the lack of any optimization. The JIT code is slightly slower because the compiler performs fewer code optimizations and virtually no global optimization, and also because of Java’s use of handles for object references.
- Memory allocation and deallocation is slightly more than twice as slow for Java applications. This is because of the working of the garbage collector discussed earlier.
- Program loading is actually faster in Java, as anticipated. This is primarily due to the difference in executable size.
- While the JIT-enabled Java GUI handles events just as fast as its VB counterpart, it is 15 times slower in rendering the GUI. This large difference can be attributed to the fact that the VB application calls the Win32 graphics subsystem directly, whereas the Java GUI uses the Java Foundation Classes (JFC) framework, which has its own built-in graphics engine.

Distributed applications

[Figure: Architecture of a Distributed Application. Recoverable labels: Application Client (GUI); Firewall; Application Server; Application Server Objects; RMI, IIOP; JDBC/ODBC; Relational Data; Oracle, Sybase; Legacy Data]

While there exist numerous real-world IT systems that are standalone, a majority of business-critical applications developed in the last few years have been distributed, either using a client-server architecture or, more recently, distributed objects. Thus, a true measure of Java’s performance in real-world IT systems can only be gained by comparing the performance of business-critical, distributed Java and C++ applications. The architecture of a typical distributed object application is shown above. It consists of a GUI client that communicates via an object protocol with a set of application server objects that encapsulate the business logic. These server objects provide persistence by interfacing with object and relational databases via data access objects. They can also participate in transactions by using transaction managers such as Tuxedo and Encina, and they frequently access legacy data and applications on the mainframe.

In addition to performing all the tasks of a standalone application (program loading and execution, memory management, accessing system resources and graphics processing), distributed applications also:

1. Make distributed object requests between client and server objects, among server objects, and between server and data access, transactional or legacy objects. Depending on the object middleware used, these requests can be in DCOM, CORBA IIOP or, in the case of Java applications, RMI.
2. Access disparate data (relational and non-relational), using a variety of protocols including ODBC, OLE DB, Embedded SQL and, for Java applications, JDBC.
3. Interface with legacy systems using middleware such as CORBA or Microsoft Transaction Server, or even the JDK running on the mainframe or AS/400.
4. Make use of threads. Multithreaded clients allow the GUI to quickly return control to the end-user while processing a request in a separate thread. Multithreaded servers can handle simultaneous requests from multiple clients and can scale more easily.

Real-world results

Having described the additional tasks required of distributed applications, the following section compares the performance of C++ and Java implementations of such applications performing the above tasks. As before, all the tests were carried out on a Pentium II 266 MHz machine with 128 MB RAM, running NT 4.0 Workstation. In this case two workstations were used to distribute the application, with many clients and servers running on each. C++ applications were developed using Microsoft Visual C++ 5.0 and Java applications using the Java Developer’s Kit 1.1.6. VisiBroker 3.2 (C++ and Java) was used as the CORBA object middleware. For data access, the C++ application used Microsoft’s ODBC driver to access data in MS SQL Server 6.5, while the Java application used a JDBC Type 3 driver from Intersolv to access the same data.

Three different components were used to isolate and time specific tasks:

1. An Object Request component was developed in C++ and Java to measure distributed object requests.
2. A Data Access component was developed in C++ and Java to measure data access from SQL Server.
3. A MultiThread component was developed in C++ and Java to measure synchronized thread calls.

Test                        C++ (s)  JIT (s)  Interpreter (s)  Description
ORB init and bind           1        0.9      0.9              Measures the time needed to initialize the CORBA client and bind to the remote application server.
Single object invocation    0.02     0.03     0.7              Instantiates a remote object which performs 1 million operations.
Multiple object invocation  36       22       22               Instantiates 3000 remote objects, each of which performs 1 operation.
Database connection         0.3      1        0.7              Connects to a remote SQL Server 6.5 database.
Select                      27       12       12               Loops 100 times and retrieves 10 rows from the database.
Synchronized Method         10       17       18               Measures the time needed to access a synchronized method 20000 times.

Analysis

The following observations can be made from the test results:

- A JIT compiler provides limited performance improvement for distributed applications. This can be surmised from the fact that results for most tests are almost identical between JIT-enabled and interpreted Java applications.
- In general, the network hop is the gating factor for distributed object requests, while the database driver is the gating factor for data access. The latter conclusion is drawn from the fact that the Java application accesses data more than twice as fast as the C++ application, primarily because it uses a Type 3 JDBC driver with server-side SQL execution, which is much more efficient than the client-side execution provided by the C++ ODBC driver.
- Remote object instantiation is faster with Java. This could be attributed to a better implementation of the CORBA Basic Object Adapter in VisiBroker for Java than in VisiBroker for C++.
- As expected, synchronized methods are slower in Java than in C++. This is because such methods keep both a C and a Java stack in memory and also execute a significant amount of additional code to provide thread-safety.

Performance enhancing techniques

While distributed applications in general are not overly affected by Java’s performance limitations, there are two important reasons for trying to enhance their performance:

A. If a distributed application performs computationally intensive work, or its GUI is fairly complex, then some of the limitations of the standalone JIT code, as seen in the GUI and method-call tests, can become more pronounced. A hint of this can be seen in the “single-object invocation” test above, where the C++ version is slightly faster than the Java JIT code because the remote operation is performing some computational work.
B. While the relative performance of C++ and Java distributed applications may be similar, there is definitely an advantage in increasing the absolute performance of a Java distributed application, so that it provides higher throughput, increased transactions/sec, and greater scalability.

Performance can be improved at several levels:

- At the lowest level, providing a faster Virtual Machine and a better JIT compiler can produce better optimized and faster executing code.
- A level above this are Java performance tools and libraries, such as specialized libraries for I/O, as well as faster data access drivers.
- Finally, some of the biggest performance gains can be obtained by profiling the application code and then optimizing it using proven techniques.

Each of these methods is explored in greater detail below.

Using a faster Virtual Machine and JIT compiler

There are numerous Java VMs and JIT compilers available, especially on popular platforms such as Win95, NT and Solaris. The speed of VMs is usually rated using one of two popular benchmarks: Jmark 2.0 and CaffeineMark 3.0. Each runs a variety of tests, including processor-intensive tasks, GUI and thread calls, on a particular VM and combines the results into a composite Jmark or CaffeineMark score, which can be used by an evaluator (but more often by the vendor’s marketing folks) to make (or push) a VM selection. Some of the faster VMs and JIT compilers on the market today include:

1. SuperCede 2.0 Pro compiler and VM with native code generation (http://www.supercede.com/).
2. TowerJ compiler and VM with native code generation (http://www.twr.com).
3. Microsoft VM 3.1 with generational garbage collector. This VM is available as part of the Visual J++ product or as a free download from Microsoft’s Java website at http://www.microsoft.com/java/. In tests performed by Sun engineers at the JavaOne conference, the MS VM executed 20-45% faster than the JDK 1.2 beta3 and the JDK 1.1.6 VMs with JIT.
4. Kaffe, available free on 30 operating systems, includes JIT conversion from byte code to native code (http://www.transvirtual.com/kaffe.html).
5. Symantec VM and JIT, available as part of Symantec Visual Café and also bundled with Netscape Navigator and Sun’s JDK 1.1 (http://www.symantec.com/vcafe/index.html).
6. Inprise VM and JIT, available as part of JBuilder 2.0 (http://www.inprise.com/jbuilder/).

Using Java performance tools and libraries

Tools available for optimizing Java applications include:

1. The javac compiler itself with the -O option for optimization. Using this compiler option provides some primary optimization and dead code elimination.
2. JAX from IBM can reduce the size of a Java application (up to 50% reduction in size) and make it more efficient by removing dead code, inlining method calls, etc. It is available for free download at http://www.alphaworks.ibm.com/formula/JAX.

Java class libraries available for improving performance include:

1. The Windows Foundation Classes (WFC) and the J/Direct API from Microsoft, which allow Java applications to call the Win32 subsystem directly and thus greatly improve graphics handling and other system tasks. The tradeoff is application portability, because the APIs of these libraries are used as an alternative to standard Java AWT/JFC calls. These libraries are available at http://www.microsoft.com/java/ or with Visual J++.
2. Perflib provides a set of Java classes for high performance sorting, searching, I/O, etc. The routines claim to be up to 5 times as fast as standard JDK implementations (http://www.glenmccl.com/~glenm/perflib/).
3. A variety of Type 3 and 4 JDBC drivers are available for fast, native access to most relational databases. Some of the popular vendors include Inprise with their Data Gateway and Microfocus/Intersolv’s DataDirect product suite (http://www.microfocus.com/products/data.htm).

Profiling and Optimizing Application code

While the above techniques can yield significant performance improvements, tuning application code can potentially provide the greatest “bang for the buck”. This is especially true if a major inefficiency can be identified and eliminated in the 20% of code that is executed 80% of the time. A good way to discover programming inefficiencies is to run the Java code through a profiler.
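Short of a full profiler, a coarse timing harness can indicate whether a code region is worth profiling at all. The following is a minimal sketch, not the harness used for the tests reported earlier: the names CoarseTimer, bestOfNanos and divisionLoop are hypothetical, the workload is only modeled on the Integer Division test, and the code assumes a current JDK (System.nanoTime and lambdas are not available in JDK 1.1.6).

```java
class CoarseTimer {
    // Runs the task `runs` times and returns the smallest elapsed time in
    // nanoseconds. Taking the best of several runs reduces noise from JIT
    // warm-up and from other processes on the machine.
    static long bestOfNanos(int runs, Runnable task) {
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            long elapsed = System.nanoTime() - start;
            best = Math.min(best, elapsed);
        }
        return best;
    }

    // Hypothetical workload modeled on the "Integer Division" test:
    // loops n times on an integer division and accumulates the quotients
    // so that the loop has an observable result.
    static int divisionLoop(int n) {
        int sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += 1_000_000 / i;
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] result = new int[1]; // holds the result so the work is not discarded
        long nanos = bestOfNanos(5, () -> result[0] = divisionLoop(10_000_000));
        System.out.println("best of 5 runs: " + nanos / 1_000_000.0
                + " ms (checksum " + result[0] + ")");
    }
}
```

A harness like this only reports wall-clock time for a region chosen in advance; finding which region matters is still the profiler's job.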
Profiling allows the detection of performance bottlenecks, identification of CPU- and memory-intensive code, and collection of function-level and even line-level timing data. Some of the Java profilers on the market include:

1. Visual Quantify from Rational Software (http://www.rational.com/products/visualq/index.html).
2. OptimizeIt from Intuitive Systems (http://www.optimizeit.com/).
3. JProbe from KLGroup (http://www.klgroup.com/).
4. Jinsight, a freeware profiler and memory analyzer from IBM (http://www.alphaworks.ibm.com/formula/jinsight).

Once the problem areas of an application are identified, there are a number of steps that can be taken to improve overall performance. At the component level, there are numerous coding techniques that can be used to increase code efficiency and avoid problem APIs. The “Java performance tuning tips 1.0” article from IBM and the “Java performance and optimization” article from Inside Java (available in the Resources section) both discuss some of these techniques in detail. For improving the performance of business-critical distributed applications, the following techniques are available:

- Avoid synchronized methods in multithreaded applications, if possible. As seen from the tests above, they are fairly slow and resource intensive. In some cases, it might be better to create two versions of a method, one synchronized and one non-synchronized, and only use the former when absolutely necessary.
- Pass objects by value when appropriate. Some middleware environments such as CORBA make it especially convenient to pass objects by reference to a remote module. The problem with this approach is that any time the remote module needs access to the object, it needs to make a remote call back to the passing object. So in cases where a passed object needs to be accessed frequently, it makes more sense to take the initial hit of passing it by value.
- Use JDBC with precompiled SQL, rather than dynamic SQL, for oft-repeated queries. Precompiled SQL is stored in the database server and repeatedly executed with new inputs, while dynamic SQL is recompiled every time it is run. Using this technique in our tests reduced the access time for repeated queries by a factor of 2 to 8.
- Multiplex database connections across several clients and maintain them, rather than creating and destroying a connection for each client. This has the dual advantages of reduced connection time and improved scalability of the application.
- Many real-world applications require several layers of security, including authentication, authorization, data encryption and non-repudiation. Since security adds significant overhead to a distributed request and encryption algorithms are computationally intensive, use the minimum security level possible for a given operation and user.
- A common technique for traversing firewalls is to tunnel the object request through HTTP. While this approach is the most flexible, it comes with a significant performance penalty. A better approach, when possible, is to open a minimum set of firewall ports or to use an object-protocol friendly firewall proxy (such as Wonderwall from Iona).
- Finally, if delays can’t be avoided due to a large number of system users or slow-running server objects, their impact on the client can be minimized by creating a separate thread to handle the object request and returning control of the GUI back to the user.

Future Improvements

The great interest that Java is receiving from corporate IT departments has caused system vendors to continue the rapid pace of advancement in this technology. High on their priority list are further improvements in the speed of Java applications, both standalone and distributed. For standalone applications, Sun is delivering JDK 1.2 later this year, which promises performance improvements in strings, vectors, dates and the JIT compiler.
Q1 ’99 heralds the availability of the revolutionary HotSpot compiler from Sun, which in preliminary tests at JavaOne ran applications faster than C++! HotSpot is a cross between a JIT compiler and an interpreter, and provides its dramatic performance improvements with a much-improved generational garbage collector, fast thread synchronization and “adaptive” compilation.

While improved object middleware such as CORBA 2.2 and RMI 2.0 promises to increase the speed of distributed Java applications, the most significant development in this area is the rapid advancement of Java Application Servers. These application servers host the server-side business objects and automatically provide them with multithreading, database and resource pooling, and load-balancing, all of which promise to make real-world Java applications the fastest and most scalable kind of distributed applications available.

Resources

1. “Java Performance Tuning Tips 1.0” available at http://www.software.ibm.com/os/warp/performance/javatip.htm.
2. “Java performance and optimization” available at http://www.inside-java.com/articles/perf/index.htm.
3. “Java Optimization Resources” available at http://www.cs.cmu.edu/~jch/java/resources.html.
4. “HotSpot: A new breed of virtual machine” available at http://www.idg.net/idg_frames/english/content.cgi?vc=docid_9-26967.html.
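As a closing illustration of the first tuning technique listed under “Profiling and Optimizing Application code” (keeping a synchronized and a non-synchronized version of a method and using the former only when necessary), the following is a minimal sketch rather than code from the test components. The class HitCounter and its methods are hypothetical names introduced here, and the code assumes a current JDK because it uses lambdas.

```java
class HitCounter {
    private long hits;

    // Non-synchronized version: avoids the locking cost, but is only correct
    // while a single thread owns this counter. It must never be mixed with
    // record() on an instance that is shared between threads.
    void recordUnsynchronized() {
        hits++;
    }

    // Synchronized version: safe when several threads share the counter.
    synchronized void record() {
        hits++;
    }

    synchronized long total() {
        return hits;
    }

    // Shared case: `threads` workers each call record() `perThread` times
    // on one counter, so the synchronized version is required.
    static long countShared(int threads, int perThread) {
        HitCounter shared = new HitCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < perThread; n++) {
                    shared.record();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return shared.total();
    }

    // Thread-confined case: the counter never leaves this method, so the
    // non-synchronized version is sufficient.
    static long countLocal(int calls) {
        HitCounter local = new HitCounter();
        for (int n = 0; n < calls; n++) {
            local.recordUnsynchronized();
        }
        return local.total();
    }

    public static void main(String[] args) {
        System.out.println("shared: " + countShared(4, 20_000));
        System.out.println("local:  " + countLocal(20_000));
    }
}
```

The design choice is that the caller, not the class, decides which version applies: the non-synchronized path is reserved for objects that provably stay on one thread, and everything else goes through the synchronized path.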