Run Anywhere: The Hardware Platform Perspective
Ben Pollan, AMD Java Labs
October 28, 2008

Agenda
• Java Labs Introduction
• Community Collaboration
• Performance Optimization Recommendations
• Leveraging the Latest Hardware Improvements

AMD Java Labs
• Dedicated AMD Java Labs organization supports Java development community through

AMD and Java Relationship

Java Platform Performance Trends on x64
• SPECjbb2005 on AMD “Barcelona” processor
• Data obtained from Sun Microsystems. Results not verified by AMD.

Community Collaboration

Contribution Areas
• Open Source
  – OpenJDK: contributing performance enhancements
  – CodeSleuth: Java profiling plug-in for Eclipse, released as open source project on sourceforge.net
• Collaboration with proprietary JVM vendors
  – Performance optimizations to leverage hardware features

Pursuing Optimizations For:
• More efficient use of hardware features
  – Instruction selection: convert integer to double, float to double, shift by constants
  – Improve cache efficiency
    – HashMap: common case usage improvements
    – BigDecimal: class size reductions
    – Field reordering / removal
• Improved performance / profiling data
  – JVMTI: allow tools to track JITed method inlining
  – Performance data via Instruction Based Sampling

HashMap Optimization
• HashMap.get
  – Leading cause of cache misses
  – Statistics showed common use pattern
      HashMap.put(int i, object); // where i < # elements
      …
      HashMap.get(int i);
• Solution
  – For this case, implement hashing functions and buckets as an array lookup
  – Touches less memory, causing fewer cache misses
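
The idea on the HashMap Optimization slide can be sketched roughly as follows. This is only an illustration of "array lookup for small integer keys" and is not the actual OpenJDK change; the class name DenseIntMap and the denseLimit parameter are hypothetical. Keys in the range 0..denseLimit-1 are served directly from an array slot, so a lookup touches one array element instead of a hash bucket and its entry objects. All other keys fall back to an ordinary java.util.HashMap.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: a map for int keys where small non-negative keys
 * are stored in a plain array and everything else goes to a HashMap.
 */
final class DenseIntMap<V> {
    // Slot i holds the value for key i (null means "no value stored").
    private final Object[] dense;
    // Fallback for keys outside 0..dense.length-1.
    private final Map<Integer, V> overflow = new HashMap<>();

    DenseIntMap(int denseLimit) {
        this.dense = new Object[denseLimit];
    }

    /** Stores value under key and returns the previous value, or null. */
    V put(int key, V value) {
        if (key >= 0 && key < dense.length) {
            @SuppressWarnings("unchecked")
            V old = (V) dense[key];
            dense[key] = value;   // direct index: no hashing, no entry object
            return old;
        }
        return overflow.put(key, value);
    }

    /** Returns the value stored under key, or null if there is none. */
    V get(int key) {
        if (key >= 0 && key < dense.length) {
            @SuppressWarnings("unchecked")
            V value = (V) dense[key];
            return value;
        }
        return overflow.get(key);
    }
}
```

With `new DenseIntMap<String>(1024)`, calls such as `put(7, "x")` and `get(7)` never leave the array, while a key like 5000 or -1 is handled by the fallback map.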

JVMTI Method Inlining Information
• Method address to source code mapping is a common performance analysis task
  – Memory location -> JITed code -> bytecode -> source code
• JVMTI’s CompiledMethodLoad callback returns information to assist with this mapping
  – Broken: it does not provide information for inlined methods
• OpenJDK extended to make inlining information available via JVMTI
  – Tool writers can use this to produce better mapping
  – Within JVMTI specification (existing void pointer)

Performance Optimization Recommendations

Configuration Options – NUMA
• Processors have their own local memory, which is less costly to access
• If running multiple JVMs per system, pin individual JVMs to a processor
  – Windows: start /affinity xx java…
    where xx is a hexadecimal mask specifying the cores the process will run on
  – Linux: numactl --cpunodebind=processor_num --membind=processor_num java…

Configuration Options – NUMA (cont.)
• If running a single JVM per system
  – Try the node interleaved memory setting in BIOS
  – For the Sun JVM on Solaris, use -XX:+UseNUMA
  – For IBM on Linux, the heap is automatically interleaved
  – Don’t enable both BIOS and JVM interleaving (they cancel each other out)

Configuration Options – Page Size
• Page table maintains virtual to physical address mappings
• Translation Lookaside Buffer (TLB) maintains a cache of these
• With small page sizes, the number of mappings can exceed the cache size, leading to (slower) page table access
• For large (2M) page size system configuration instructions, see the article in AMD Java Zone: Supersizing Java, parts 1 and 2
• For those with even larger requirements, 1G page support to be submitted soon for OpenJDK
  – Only Solaris supports this for now
  – Working with Linux distributions for support

Configuration Options – Page Size (cont.)
• Sun: -XX:+UseLargePages -XX:LargePageSizeInBytes=n (n=2M or 1G)
• Oracle: Determines if large pages are enabled in the system, then enables their support in the JVM
• IBM: -Xlp

Configuration Options – Compressed References
• For a 64-bit OS and 64-bit JVM, object references stored on the heap are 64 bits
• Compressed references limit references to 32 bits, using fewer heap resources
• To use, your memory requirements must be lower than:
  – IBM: 25GB
  – Oracle: 4GB
  – Sun: 32GB
• To enable:
  – IBM: -Xcompressedrefs
  – Oracle: -XXcompressedRefs
  – Sun: -XX:+UseCompressedOops

Configuration Options – IBM -Xtlhprefetch
• Causes newly allocated area on the heap to be prefetched with PREFETCHNTA
• Prevents the L2 processor cache from being polluted, because when objects are removed from L1, they aren’t moved to L2
• Good to use when you have many short-lived objects

Leveraging the Latest Hardware Improvements

AMD Hardware Advances
• Increase in cores per processor
• Balance workload between GPU and CPU
• Instruction-based sampling (IBS)
  – Rich set of processor event data
  – Precisely associates event data with the instructions that cause the event
  – JVMs can use this data to make dynamic optimization decisions
• Lightweight Profiling (LWP) Specification
  – Enable code to make dynamic decisions about how to improve performance
  – Suggestions welcome: [email protected]

AMD Hardware Advances
• Advanced Synchronization Facility (ASF)
  – Experimental AMD64 extension
  – Lighter weight locking mechanism
• Instruction optimizations
  – With each processor release

Taking Advantage of Hardware Features
• Upgrade to the latest JVMs
  – Many benefits you will get for free
    – Instruction optimizations
    – Profiling information, feedback
• Use the Java Concurrency Classes
  – ParallelArray (JDK7) will take advantage of work that will leverage multiple cores

More Information at AMD Developer Central
• Detailed technical articles
• Documentation, tutorials, and guides
• Featuring tips to help optimize software for AMD “Barcelona” processors
• AMD-optimized build tools (Compilers, JITs)
• Performance analysis tools (AMD CodeAnalyst™ software)
• Performance libraries – AMD Core Math Library (ACML) and Framewave (open source)
• Inside look at AMD’s vision
• Community discussions
Subscribe to our free newsletter: developer.amd.com

Java Zone on AMD Developer Central

Disclaimer
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes. AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

Trademark Attribution
AMD, the AMD Arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. SPECjbb is a registered trademark of Standard Performance Evaluation Corporation (SPEC). Other names are for informational purposes only and may be trademarks of their respective owners.

© 2008 Advanced Micro Devices, Inc. All rights reserved.