HotSpot™: A Huge Step Beyond JITs

Zhanyong Wan
May 1st, 2000

Sources of Information

From Sun’s web site
  – HotSpot white paper
    http://java.sun.com/products/hotspot/whitepaper.html
  – Various articles on Sun’s web site
    http://java.sun.com/products/hotspot/
From other web sites
  – Java on Steroids: Sun's High-Performance Java Implementation, U. Hölzle et al. (slides from HotChips IX, August 1997)
    http://www.cs.ucsb.edu/oocsb/papers/HotChips.pdf
  – The HotSpot Virtual Machine, Bill Venners
    http://www.artima.com/designtechniques/hotspot.html
  – HotSpot: A new breed of virtual machine, Eric Armstrong
    http://www.javaworld.com/jw-03-1998/f_jw-03-hotspot.html

Overview

Why Java is different
Why JIT is not good enough
What HotSpot does
The HotSpot architecture
  – Memory model
  – Thread model
  – Adaptive optimization
Conclusions

History

1st generation JVM
  – Purely interpreting
  – 30-50 times slower than C++
2nd generation JVM
  – JIT compilers
  – 3-10 times slower than C++
Static compilers
  – Better performance than JITs

The Future?

HotSpot
  – Dynamic, fully optimizing compiler
  – Close-to-C++ performance
  – May even exceed the speed of C++ in the future

Questions of Interest

How is it possible that HotSpot runs programs faster than the native code generated by a static optimizing Java compiler?
How does HotSpot score? (The collection of technologies used by HotSpot.)
Where did they get the ideas?
Which of these technologies also apply in other systems (e.g. JIT, static source code/bytecode compiler, C++)?
Can Java be made to surpass the performance of C++, or is this hype?

Why Java Is Different (from C++)

Granularity of factoring
  – Smaller classes
  – Smaller methods
  – More frequent calls
  – Standard compiler analysis fails
Dynamic dispatch
  – Slower calls for virtual functions
  – Much more frequent than in C++
Sophisticated run-time system
  – Allocation, garbage collection
  – Threads, synchronization
Dynamically changing program
  – Classes loaded/discarded on the fly

Why Java Is Different (cont’d)

Distributed in a portable form
  – A compiler can generate optimal machine code for a particular processor version
    • e.g. Pentium vs. Pentium II
  – Welcomes dynamic compilation (developed in the last decade)!

Find the Java Bottleneck

Time used in a typical Java program executed w/ the JDK interpreter:
  – Allocation/GC: 1/6
  – Synchronization: 1/6
  – Byte code: 2/3
  – Native methods: negligible
[Pie chart: Byte codes, Alloc/GC, Synch, Native]
Performance-critical code: the “hot spots”

Why JIT Is Not Good Enough

Compiles on a method-by-method basis, when a method is first invoked
Compilation consumes “user time”
  – Startup latency
  – Dilemma: either good code or fast compiler
    • Gains of better optimization may not justify extra compile time
    • More concerned w/ generating code quickly than w/ generating the quickest code
Root of problem: compilation is too eager

The Baaad Way to Optimize

People try to help: the optimization lore
  – Make methods final or static
  – Large classes/methods
  – Avoid interfaces (interface method invocation much slower than regular dynamic method dispatch)
  – Avoid creating lots of short-lived objects
  – Avoid synchronization (very expensive)
  – Against good OO design!
“Premature optimization is the root of all evil.” (Donald Knuth)

The HotSpot Way to Optimize

Optimize only when you know you have a problem
  1. A program starts off being interpreted
  2. A profiler collects run-time info in the background
  3. After a while, a set of hot spots is identified
  4. A thread is launched to compile the methods in the hot spots
     • Execution of the program is *not* blocked
     • “Take your time!” – fully optimizing
     • Take advantage of the late compilation: run-time info used
  5. Once a method is compiled, it doesn’t need to be interpreted
  6. Native code can be discarded when the hot spots change
     • Keeping the footprint small
     • Bytecode is always kept around

The HotSpot Way (cont’d)

Tackles each of the bottlenecks
  – Adaptive optimization
  – Fast, accurate garbage collection
  – Fast thread synchronization
Performance
  – 2-3 times faster than JITs
  – Comparable to C++
Most importantly, eliminates the “performance excuse” for poor designs/code

The HotSpot Architecture

Memory model
Thread model
Adaptive compiler

The HotSpot Memory Model

Object references
  – Java 2 SDK: as indirect handles
    • Relocating objects made easy
    • A significant performance bottleneck
  – HotSpot: as direct pointers
    • A performance boost
    • GC must adjust all references to an object when it is relocated
Object headers
  – Java 2 SDK: 3-word
  – HotSpot: 2-word
    • 2 bits for GC mark (reference count removed?)
    • An 8% savings in heap size

Garbage Collection Background

GC traditionally considered inefficient
  – Takes 1/6 of the time in an interpreting JVM
  – Even worse in a JIT VM
Modern GC technology
  – Performs substantially better than explicit freeing
  – How can this be true?
    • Unnecessary copies avoided
    • Memory segmentation, space locality

The HotSpot Garbage Collector

A high-level GC framework
  – New collection algorithms can be “plugged in”
  – Currently has 3 cooperating GC algorithms
Major features
  – Fast allocation and reclamation
  – Fully accurate: guarantees full memory reclamation
  – Completely eliminates memory fragmentation
  – Incremental, no perceivable pauses (usually < 10 ms)
  – Small memory overhead
    • 2-bit GC mark per object
    • 2-word object header (instead of 3-word in Java 2 SDK)

The HotSpot GC: Accuracy

A partially accurate (conservative) collector must
  – Either avoid relocating objects
  – Or use handles to refer indirectly to objects (slow)
The HotSpot collector
  – Fully accurate
  – All inaccessible objects can be reclaimed
  – All objects can be relocated
    • Eliminates memory fragmentation
    • Increases memory locality

The HotSpot GC: the Structure

Three cooperating collectors
  – A generational copying collector
    • For short-lived objects
  – A mark-compact “old object” collector
    • For longer-lived objects when the live object set is small
  – An incremental “pauseless” collector
    • For longer-lived objects when the live object set is big

Generational Copying Collector

Observation: the vast majority (often > 95%) of objects are very short-lived
The way it works
  – A memory area is reserved as an object “nursery”
  – Allocation is just updating a pointer and checking for overflow: extremely fast
  – By the time the nursery overflows, most objects in it are dead; the collector just moves the few survivors to the “old object” memory area

Mark-Compact Collector

Rare case
  – Triggered by low-memory conditions or programmatic requests
Time proportional to the size of the set of live objects
  – Calls for an incremental collector when the size is large

Incremental Pauseless Collector

An alternative to the mark-compact collector
Relatively constant pause time even w/ an extremely large data set
Suitable for server applications and soft real-time applications (games, animations)
The way it works
  – The “train” algorithm
  – Breaks up GC pauses into tiny pauses
  – Not a hard real-time algorithm: no guaranteed upper limit on pause times
Side benefit: better memory locality
  – Tends to relocate tightly coupled objects together

The HotSpot Thread Model

Native thread support
  – Currently supports Solaris & 32-bit Windows
  – Preemption
  – Multiprocessing
Per-thread activation stack is shared w/ native methods
  – Fast calls between C and Java

Thread Synchronization

Synchronization takes 1/6 of the time in an interpreting JVM
  – (I think) the proportion can be even higher for a JIT
HotSpot’s thread synchronization
  – Ultra-fast (“a breakthrough”)
  – Constant time for all uncontended (no rival) synch
  – Fully scalable to multiprocessors
  – Makes fine-grain synch practical, encouraging good OO design

Adaptive Inlining

Method invocations reduce the effectiveness of optimizers
  – Standard optimizers don’t perform well across method boundaries (they need bigger blocks of code)
  – Inlining is the solution
Inlining has problems
  – Increased memory footprint
  – Inlining is harder w/ OO languages because of dynamic dispatching (worse in Java than in C++)
HotSpot uses run-time information to
  – Inline only the critical methods
  – Limit the set of methods that might be invoked at a certain point

Dynamic Deoptimization

Simple inlining may violate the Java semantics
  – A program can change the patterns of method invocation
  – A Java program can change on the fly via dynamic class loading/discarding
  – Optimizations may become invalid
Must be able to deoptimize dynamically!
  – HotSpot can deoptimize (revert to bytecode?) a hot spot even during the execution of the code for it.
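
The interplay between adaptive inlining and dynamic deoptimization can be illustrated with a toy model. The sketch below is only an illustration under stated assumptions, not HotSpot's implementation: the names `Shape`, `Square`, `Circle`, `AdaptiveCallSite` and the threshold `HOT_THRESHOLD` are all hypothetical. It models one call site `s.area()` that starts out "interpreted" (plain dynamic dispatch plus profiling), switches to an "inlined" fast path guarded by a class check once only `Square` receivers have been seen often enough, and falls back to dynamic dispatch when a different receiver class shows up.

```java
// Toy model of adaptive inlining with dynamic deoptimization.
// All names are hypothetical; this is not how HotSpot is implemented.

interface Shape {
    double area();
}

final class Square implements Shape {
    final double side;
    Square(double side) { this.side = side; }
    @Override public double area() { return side * side; }
}

final class Circle implements Shape {
    final double radius;
    Circle(double radius) { this.radius = radius; }
    @Override public double area() { return Math.PI * radius * radius; }
}

/** Models the single call site "s.area()". */
final class AdaptiveCallSite {
    static final int HOT_THRESHOLD = 3; // assumed value, chosen for the demo

    private int squareCalls;        // profile: calls whose receiver was a Square
    private boolean sawOtherClass;  // profile: some receiver was not a Square
    private boolean inlined;        // is the specialized fast path installed?
    private int deoptimizations;    // how often the fast path was thrown away

    double invoke(Shape s) {
        if (inlined) {
            // Guard: the inlined body is valid only for the assumed class.
            if (s.getClass() == Square.class) {
                Square sq = (Square) s;
                return sq.side * sq.side; // body of Square.area(), inlined
            }
            // Assumption violated: discard the fast path ("deoptimize").
            inlined = false;
            sawOtherClass = true;
            deoptimizations++;
            return s.area();
        }
        // "Interpreted" path: dynamic dispatch plus profiling.
        if (s.getClass() == Square.class) {
            squareCalls++;
        } else {
            sawOtherClass = true;
        }
        if (!sawOtherClass && squareCalls >= HOT_THRESHOLD) {
            inlined = true; // the site is hot and monomorphic so far
        }
        return s.area();
    }

    boolean isInlined() { return inlined; }
    int deoptimizations() { return deoptimizations; }
}

final class InliningDemo {
    public static void main(String[] args) {
        AdaptiveCallSite site = new AdaptiveCallSite();
        for (int i = 0; i < AdaptiveCallSite.HOT_THRESHOLD; i++) {
            site.invoke(new Square(2.0));
        }
        System.out.println("inlined after warm-up: " + site.isInlined());
        site.invoke(new Circle(1.0)); // new receiver class invalidates the guard
        System.out.println("inlined after Circle:  " + site.isInlined());
        System.out.println("deoptimizations:       " + site.deoptimizations());
    }
}
```

In this model the results of `invoke` are the same on both paths; only the route taken differs. That is the property the slides require of deoptimization: the program's observable behaviour must still follow bytecode semantics once an optimistic assumption stops holding.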

Fully Optimizing Compiler

Performs all the classic optimizations
  – Dead code elimination
  – Loop invariant hoisting
  – Common sub-expression elimination
  – Constant propagation
  – And more …
Java-specific optimizations
  – Null-check elimination
  – Range-check elimination
Global graph-coloring register allocator
Highly portable
  – Relying on a small machine description file

Transparent Debugging & Profiling Semantics

Native code generation & optimization fully transparent to the programmer
  – Uses two stacks
    • One real, one simulating
  – Overhead of two stacks?
Pure bytecode semantics: easy debugging & profiling
Question: what’s the point of a transparent profiling semantics?

Performance Evaluation

Micro-benchmarks: not the way
  – No or few method calls/synchronizations
  – Small live data set
  – No correlation w/ real programs
  – Give unrealistic results for HotSpot
SPEC JVM98 benchmark
  – The only industry-standard benchmark for Java
  – Predictive of the performance across a number of real applications

Where are the ideas from?

Mostly from the last decade’s academic work
  – Dynamic compilation
  – Modern GC
  – HotSpot puts them together
Academic research is relevant!

(My) Conclusions

HotSpot is great
  – Many new technologies previously only seen in academia
Java performance may come close to or exceed that of current C++ implementations
However, Sun’s argument that Java can be faster than C++ is not convincing yet:
  – C++ has better control over machine resources
  – Many technologies used in HotSpot can be exploited for C++ as well. Especially:
    • Fast synchronization
    • Dynamic compilation
    • Maybe GC (for some dialects of C++)
  – Whether Java can exceed C++ remains to be tested