Confessions of a Performance Monitor Hardware Designer
Workshop on Hardware Performance Monitor Design, HPCA-11
13 February 2005
Jim Callister, Intel Corporation
© Intel Corp. 2005

Why Include a PMU?
• Ya gotta do something with all those transistors!
• Cause my PAPI told me to
• To give competitors a fighting chance
• To show my boss how great my branch predictor is (i.e., get a raise)
• To improve the performance of current and future systems

13 February 2005, Itanium® Processor PMU

How much Performance would you give up for PMU Functionality?
• Transistors may be “free” but…
  – Wires are not!
  – Design time costs
  – Validation costs
  – Documentation costs
  – Time to Market costs
• The answer is not 0%
  – PMU proven to improve performance
• But it’s not 10% either!

The PMU Has Tentacles Everywhere!
[Figure: block diagram labeled “PMU Central Collector” and several “Collector” blocks]

What to Architect in the PMU?
• “Machine Architecture is a contract between hardware and software”
• Architect too much…
  – Lowers performance through design constraints
  – Events don’t map well to hardware
• Architect too little…
  – Jeopardizes Software Investment
  – Discourages Software Support

Itanium® Architecture: PMU
• Architected
  – Access & Management of PMU Resources
    • PMD registers for Data, PMC registers to control PMU
  – Counter Overflow Behavior and Interrupt Handling
  – Only a few basic counter events
• Implementation Dependent
  – Number of counters, width of counters
  – Non-counter performance monitors
  – Events: Encourage use of CPU-specific tables
• Itanium architecture protects OS and Tool infrastructure while promoting performance and full visibility

Performance Events – Let me count the ways…
• Which events are important?
  – How will the events be used?
  – Do you really care about a cache miss if it doesn’t cause any stalls?
• Mapping an event to signals
  – Needed signal may not be available
    • On critical path, lack of wires, no signal
  – Combining signals is problematic
    • Distance between signals, timing, logic

Itanium® 2 Processor PMU Events

Event Category           Number of Events
Cycle Accounting                       89
Instruction Execution                  42
Branches                               69
Caches & TLBs                         150
Bus                                    73
Misc                                   20
Total                                 443

Where are the Performance Problems?
• Counters only give type of problem and magnitude of the problem
• Use filters on counters (hunt & peck)
• Itanium® architecture currently includes:
  – Opcode Filters
  – Privilege Level Filters
  – Instruction Address Range Filters
  – Data Address Range Filters

A Better Way to Locate Performance Problems
• Event Address Registers (EARs)
  – Log information about a single cache miss
  – The logs are sampled by software
  – Creates a statistical profile of cache misses
• Branch Trace Buffer (BTB)
  – Logs information about consecutive branches
  – Logs also sampled by software

Lend Me an EAR
• Instruction & Data EARs
  – Log Instruction Address of Miss
    • Data EAR also logs Data Address of Miss
  – Log Latency of Miss
  – Filter by latency bin
  – Have an associated counter event
  – Can also log TLB misses
    • And where TLB miss was resolved
• Have proven to be extremely useful

The D-EAR Shadow Effect
[Figure: timeline labeled “Miss”, “Miss Recorded”, and “Latency Counter Busy”]
Without extra hardware, these misses would never be recorded!
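The shadow effect on this slide can be illustrated with a toy model. The sketch below is an assumption-laden simplification, not the Itanium 2 hardware: it assumes a single tracker that follows one miss at a time and ignores any miss that starts while its latency counter is still busy. The `Miss` type, `record_misses` function, and the miss pattern are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Miss:
    start: int    # cycle at which the miss begins
    latency: int  # cycles until the miss is resolved


def record_misses(misses):
    """Return the misses a single always-track tracker would record.

    Assumption: while the latency counter is timing one miss, every
    other miss that starts in that window is ignored (it is "shadowed").
    """
    recorded = []
    busy_until = -1  # cycle at which the latency counter becomes free
    for miss in sorted(misses, key=lambda m: m.start):
        if miss.start >= busy_until:
            recorded.append(miss)
            busy_until = miss.start + miss.latency
    return recorded


# Hypothetical pattern: every long miss is followed 5 cycles later by a short one.
misses = []
for base in range(0, 1000, 100):
    misses.append(Miss(start=base, latency=50))
    misses.append(Miss(start=base + 5, latency=10))

recorded = record_misses(misses)
shadowed = [m for m in misses if m not in recorded]
print(f"recorded={len(recorded)} shadowed={len(shadowed)}")
```

In this model the short misses always begin inside the window opened by the preceding long miss, so a profile built from the recorded samples would never show them, no matter how long it runs.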
The Itanium® 2 Processor Solution
• Don’t track every opportunity: randomly pick misses to track
• Tradeoff: shadow mitigation versus sampling frequency
• Use an LFSR to decide which port to sample and whether to sample
• Every miss has a ~1 in 8 chance of being tracked
• This mitigates the shadow effect but does not totally eliminate it
• Customer feedback indicates it works very well

The Itanium® 2 Processor’s Branch Trace Buffer (BTB)
• An eight-entry circular buffer
• Each entry contains either:
  – Address & Prediction Data of a branch, or
  – Address of a branch target
• Uses of the BTB
  – Mis-predicted branch profiler
  – An efficient Instruction Address Profiler
  – Path Profiler
• Cool use: in conjunction with EARs
  – Path leading up to sampled miss!

The Itanium® 2 Processor’s PMU Helps Improve Performance
[Figure: performance improvement in percent (0% to 100%) versus tuning time in weeks (0 to 30) for App One through App Six, with a “50% CAGR” reference]
Performance is measured using specific computer systems and reflects the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance.

Conclusions
• Walking a micron in HW design shoes
  – Balancing PMU functionality & overall performance
• We need to move beyond counters!
  – Itanium® 2 processors provide EARs and BTBs
  – What’s next?
• The Itanium 2 processor’s PMU has much to offer
  – Customers are making good use of it
  – Would like to see more use. How do we do it?
• Discussion
  – What is the long-term vision for the PMU?
  – What can the PMU provide to improve current and future systems?
  – Did anything “stick” or resonate?
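As a footnote to the Itanium 2 Processor Solution slide, the idea of using an LFSR to give each miss roughly a 1 in 8 chance of being tracked can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual hardware: the 16-bit register width, the tap positions (a commonly used maximal-length choice), the seed, and the rule "track when the low three bits are zero" are all hypothetical, and port selection is left out.

```python
def lfsr16_step(state):
    """Advance a 16-bit Fibonacci LFSR (taps 16, 14, 13, 11) by one step.

    Assumption: width and taps are illustrative; the slides do not say
    which LFSR the processor uses.
    """
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def should_track(state):
    """Track a miss when the low three bits are all zero (about 1 in 8)."""
    return (state & 0b111) == 0


def tracking_rate(seed=0xACE1, n_misses=65535):
    """Fraction of n_misses that would be picked for tracking."""
    state = seed
    tracked = 0
    for _ in range(n_misses):
        state = lfsr16_step(state)  # one LFSR step per miss opportunity
        if should_track(state):
            tracked += 1
    return tracked / n_misses


rate = tracking_rate()
print(f"fraction of misses tracked: {rate:.4f}")
```

Because the decision no longer depends on when the tracker last became free, a miss that always follows another miss still gets picked some of the time, which is the mitigation the slide describes. A lower tracking probability reduces shadowing further but yields fewer samples per unit time, which is the tradeoff the slide names.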
Itanium® and Itanium® 2 are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

For More Information…
http://developer.intel.com/design/itanium/documentation.htm

Manuals
• Intel Itanium Architecture Software Developer's Manuals, Volume 1: Application Architecture
  – Part II: Optimization Guide
• Intel Itanium Architecture Software Developer's Manuals, Volume 2: System Architecture
  – Chapter 7: Debugging and Performance Monitoring
  – Chapter 12: Performance Monitoring Support
• Intel Itanium 2 Processor Reference Manual for Software Development and Optimization
  – Chapter 10: Performance Monitoring
  – Chapter 11: Performance Monitor Events