Run Anywhere
The Hardware Platform Perspective
Ben Pollan, AMD Java Labs
October 28, 2008
Agenda
Java Labs Introduction
Community Collaboration
Performance Optimization Recommendations
Leveraging the Latest Hardware Improvements
2 | Run Anywhere: The Hardware Platform Perspective | October 28, 2008
AMD Java Labs
Dedicated AMD Java Labs organization supports Java
development community through
AMD and Java Relationship
Java Platform Performance Trends on x64
SPECjbb2005 on AMD “Barcelona” processor
Data obtained from Sun Microsystems. Results not verified by AMD.
Community Collaboration
Contribution Areas
Open Source
OpenJDK: contributing performance
enhancements
CodeSleuth: Java profiling plug-in for Eclipse,
released as open source project on
sourceforge.net
Collaboration with proprietary JVM vendors
Performance optimizations to leverage hardware
features
Pursuing Optimizations For:
More efficient use of hardware features
Instruction selection
– Convert integer to double, float to double, shift by
constants
Improve cache efficiency
– HashMap: common case usage improvements
– BigDecimal: class size reductions
– Field reordering / removal
Improved performance / profiling data
JVMTI: allow tools to track JITed method inlining
Performance data via Instruction Based Sampling
HashMap Optimization
HashMap.get
Leading cause of cache misses
Statistics showed common use pattern
HashMap.put(int i, object); // where i < number of elements
…
HashMap.get(int i);
Solution
For this case, implement hashing functions and
buckets as an array lookup
Touches less memory, causing fewer cache misses
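A minimal sketch of the idea, not the actual OpenJDK change: when keys are small non-negative integers, the value can live directly at that index in an array, with a regular HashMap kept only as a fallback. The class name DenseIntMap and its capacity parameter are hypothetical, and the sketch does not distinguish a stored null from a missing key.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration: a map for small, dense int keys backed by a plain array. */
final class DenseIntMap<V> {
    // Fast path: the value for key i is stored at slots[i].
    private final Object[] slots;
    // Fallback for keys outside the dense range (the uncommon case).
    private final Map<Integer, V> overflow = new HashMap<>();

    DenseIntMap(int denseCapacity) {
        slots = new Object[denseCapacity];
    }

    void put(int key, V value) {
        if (key >= 0 && key < slots.length) {
            slots[key] = value;          // no hashing, no bucket chain to walk
        } else {
            overflow.put(key, value);    // keeps ordinary HashMap behavior
        }
    }

    @SuppressWarnings("unchecked")
    V get(int key) {
        if (key >= 0 && key < slots.length) {
            return (V) slots[key];       // a single array load
        }
        return overflow.get(key);
    }

    public static void main(String[] args) {
        DenseIntMap<String> map = new DenseIntMap<>(4);
        map.put(0, "zero");
        map.put(3, "three");
        map.put(1000, "sparse");
        System.out.println(map.get(3) + " " + map.get(1000) + " " + map.get(2));
        // prints: three sparse null
    }
}
```

The array path avoids boxing the key and following entry pointers, which is where the extra memory touches in the general case come from.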
JVMTI Method Inlining Information
Method address to source code mapping is a common
performance analysis task
Memory location->JITed code->bytecode->source
code
JVMTI’s CompiledMethodLoad callback returns information
to assist with this mapping
Broken…does not provide information for inlined
methods
OpenJDK extended to make inlining information available
via JVMTI
Tools writers can use this to produce better mapping
Stays within the JVMTI specification (uses the existing void pointer)
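JVMTI itself is a native (C) interface, so the following Java sketch only models the tool-side lookup that inlining information makes possible. Every name here (InlineAddressMap, record, lookup) is hypothetical and not part of JVMTI, and the table tracks only range start addresses for brevity.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical tool-side table; none of these names come from JVMTI. */
final class InlineAddressMap {
    // Start address of a JITed code range -> chain of methods at that address,
    // innermost (inlined) method first, outermost compiled method last.
    private final TreeMap<Long, List<String>> ranges = new TreeMap<>();

    void record(long startAddress, String... methodChain) {
        ranges.put(startAddress, Arrays.asList(methodChain));
    }

    /** Returns the chain for the nearest range starting at or below the address. */
    List<String> lookup(long address) {
        Map.Entry<Long, List<String>> entry = ranges.floorEntry(address);
        return entry == null ? Collections.<String>emptyList() : entry.getValue();
    }

    public static void main(String[] args) {
        InlineAddressMap map = new InlineAddressMap();
        map.record(0x1000L, "Order.total");
        map.record(0x1040L, "Item.price", "Order.total"); // Item.price inlined here
        map.record(0x1080L, "Order.total");
        System.out.println(map.lookup(0x1050L));
        // prints: [Item.price, Order.total]
    }
}
```

Without inlining records, a sample at 0x1050 could only be attributed to Order.total, which is the gap described above.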
Performance Optimization Recommendations
Configuration Options - NUMA
Processors have their own local memory, which is less
costly to access
If running multiple JVMs per system, pin individual JVMs
to a processor
Windows: start /affinity xx java… where xx is a hexadecimal
mask specifying the cores the process will run on
Linux: numactl --cpunodebind=processor_num --membind=processor_num java…
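The affinity mask for the Windows command has one bit per core, with core 0 as the lowest bit. A small sketch of that arithmetic follows; the class name and the two quad-core layout are assumptions for illustration, and the real core-to-processor numbering should be checked on the target system.

```java
/** Hypothetical helper that computes a hexadecimal affinity mask from core indices. */
final class AffinityMask {
    /** Builds a bit mask with one bit set per core index (core 0 is the lowest bit). */
    static long maskFor(int... cores) {
        long mask = 0L;
        for (int core : cores) {
            if (core < 0 || core > 63) {
                throw new IllegalArgumentException("core out of range: " + core);
            }
            mask |= 1L << core;
        }
        return mask;
    }

    public static void main(String[] args) {
        // Assumed layout: two quad-core processors, cores 0-3 and cores 4-7.
        System.out.println(Long.toHexString(maskFor(0, 1, 2, 3))); // prints: f
        System.out.println(Long.toHexString(maskFor(4, 5, 6, 7))); // prints: f0
    }
}
```

Under that assumed layout, one JVM would be started with mask f and the other with mask f0.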
Configuration Options – NUMA (cont.)
If running single JVM per system
Try node interleaved memory setting in BIOS
For Sun Solaris JVM, use -XX:+UseNUMA
For IBM on Linux, heap automatically interleaved
Don’t interleave both (they cancel each other out)
Configuration Options – Page Size
Page table maintains virtual to physical address mappings
Translation Lookaside Buffer (TLB) maintains cache
of these
With small page sizes, # of mappings can exceed cache
size, leading to (slower) page table access
For large (2M) page size system configuration
instructions, see article in AMD Java Zone:
Supersizing Java, parts 1 and 2
For those with even larger requirements, 1G page support
will be submitted soon to OpenJDK
Only Solaris supports this for now
Working with Linux distributions for support
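The effect of page size on TLB coverage can be worked through as simple arithmetic: coverage is the number of TLB entries times the page size. The entry count of 512 and the 4K small page size below are assumptions for illustration, not figures from this presentation.

```java
/** Hypothetical worked example: address space covered by a fully used TLB. */
final class TlbReach {
    /** Bytes of address space covered when every TLB entry maps one page. */
    static long reachBytes(int tlbEntries, long pageSizeBytes) {
        return tlbEntries * pageSizeBytes;
    }

    public static void main(String[] args) {
        int entries = 512;          // assumed entry count, for illustration only
        long kb = 1024L;
        long mb = kb * 1024L;
        long gb = mb * 1024L;
        System.out.println(reachBytes(entries, 4 * kb) / mb + " MB with 4K pages");
        // prints: 2 MB with 4K pages
        System.out.println(reachBytes(entries, 2 * mb) / gb + " GB with 2M pages");
        // prints: 1 GB with 2M pages
    }
}
```

A heap of several gigabytes far exceeds the small-page coverage in this example, which is why larger pages reduce page table walks.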
Configuration Options – Page Size (cont.)
Sun:
-XX:+UseLargePages
-XX:LargePageSizeInBytes=n (n=2M or 1G)
Oracle:
Determines if Large Pages are enabled in the
system, then enables their support in JVM
IBM:
-Xlp
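To confirm which of these options a running JVM was actually started with, the standard management API exposes the JVM input arguments. This is a minimal sketch; the class and method names are hypothetical, it matches only the exact flag spellings listed above, and it reports what was requested, not whether the operating system granted large pages.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

/** Hypothetical check for large page options among the JVM's startup arguments. */
final class LargePageFlagCheck {
    /** True if the argument list contains one of the flags named on this slide. */
    static boolean requestsLargePages(List<String> jvmArgs) {
        return jvmArgs.contains("-XX:+UseLargePages") || jvmArgs.contains("-Xlp");
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself (not to the application's main method).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + jvmArgs);
        System.out.println("Large pages requested: " + requestsLargePages(jvmArgs));
    }
}
```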
Configuration Options – Compressed References
For 64-bit OS and 64-bit JVM, object references stored on
the heap are 64 bits
Compressed references limit references to 32 bits, using
fewer heap resources
To use, your memory requirements must be lower than:
IBM: 25GB
Oracle: 4GB
Sun: 32GB
To enable:
IBM: -Xcompressedrefs
Oracle: -XXcompressedRefs
Sun: -XX:+UseCompressedOops
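One plausible reading of the 4GB and 32GB limits is that a 32-bit reference either holds a byte address directly or is scaled by an 8-byte object alignment. The sketch below only works through that arithmetic; it is an assumption about where the limits come from, it does not model the 25GB figure, and the names are hypothetical.

```java
/** Hypothetical worked example: heap size reachable through a narrow reference. */
final class CompressedRefLimits {
    /** Bytes addressable by a reference of the given width when objects are aligned to alignmentBytes. */
    static long addressableBytes(int referenceBits, int alignmentBytes) {
        return (1L << referenceBits) * alignmentBytes;
    }

    public static void main(String[] args) {
        long gb = 1L << 30;
        System.out.println(addressableBytes(32, 1) / gb + " GB unscaled");
        // prints: 4 GB unscaled
        System.out.println(addressableBytes(32, 8) / gb + " GB with 8-byte alignment");
        // prints: 32 GB with 8-byte alignment
    }
}
```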
Configuration Options - IBM
-Xtlhprefetch
Causes newly allocated area on the heap to be
prefetched with PREFETCHNTA
Prevents L2 processor cache from being polluted,
because when objects are removed from L1, they
aren’t moved to L2
Good to use when you have many short-lived
objects
Leveraging the Latest Hardware Improvements
AMD Hardware Advances
Increase in cores per processor
Balance workload between GPU and CPU
Instruction-based sampling (IBS)
Rich set of processor event data
Precisely associates event data with the instructions
that cause the event
JVMs can use this data to make dynamic optimization
decisions
Lightweight Profiling (LWP) Specification
Enable code to make dynamic decisions about how to
improve performance
Suggestions welcome: [email protected]
AMD Hardware Advances
Advanced Synchronization Facility (ASF)
Experimental AMD64 extension
Lighter weight locking mechanism
Instruction optimizations
With each processor release
Taking Advantage of Hardware Features
Upgrade to the latest JVMs
Many benefits you will get for free
– Instruction optimizations
– Profiling information, feedback
Use the Java Concurrency Classes
ParallelArray (JDK7) will build on work that
leverages multiple cores
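As an illustration of the concurrency classes already in java.util.concurrent, the following sketch splits an array sum across a fixed thread pool. It does not use ParallelArray; the class name ParallelSum and the chunking scheme are assumptions chosen for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical example: summing an array in chunks, one task per worker thread. */
final class ParallelSum {
    static long sum(int[] data, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            int chunk = (data.length + workers - 1) / workers; // ceiling division
            for (int w = 0; w < workers; w++) {
                final int from = Math.min(data.length, w * chunk);
                final int to = Math.min(data.length, from + chunk);
                // Each task sums its own slice and touches no shared mutable state.
                parts.add(pool.submit(() -> {
                    long s = 0;
                    for (int i = from; i < to; i++) {
                        s += data[i];
                    }
                    return s;
                }));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get(); // waits for that slice to finish
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] data = new int[1_000_000];
        Arrays.fill(data, 1);
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(sum(data, cores)); // prints: 1000000
    }
}
```

Sizing the pool from availableProcessors() lets the same code use more cores as processors gain them.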
More Information at AMD Developer Central
Detailed technical articles
Documentation, tutorials, and guides
Featuring tips to help optimize software for AMD “Barcelona” processors
AMD-optimized build tools (Compilers, JITs)
Performance analysis tools (AMD CodeAnalyst™ software)
Performance libraries – AMD Core Math Library (ACML) and Framewave (open source)
Inside look at AMD’s vision
Community discussions
Subscribe to our free newsletter:
developer.amd.com
Java Zone on AMD Developer Central
Disclaimer
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and
typographical errors.
The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product
and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between
differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise
correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the
content hereof without obligation of AMD to notify any person of such revisions or changes.
AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO
RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.
AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR
PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER
CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY
ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
Trademark Attribution
AMD, the AMD Arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. SPECjbb is a registered trademark of
Standard Performance Evaluation Corporation (SPEC). Other names are for informational purposes only and may be trademarks of their
respective owners.
© 2008 Advanced Micro Devices, Inc. All rights reserved.