Understanding Performance Counter Data - 1
• Methodology
  – Configuration micro-benchmark
  – Validation micro-benchmark – used to predict event count
  – Prediction via tool, mathematical model, and/or simulation
  – Hardware-reported event count collection via PAPI: the instrumented benchmark is run 100 times, and the mean event count and standard deviation are calculated (see the first sketch after these slides)
  – Comparison/analysis
  – Report findings

Understanding Performance Counter Data - 2
• Can quantify PAPI overhead in some cases, e.g.,
  – Loads and stores
  – Floating-point operations (on some platforms)
• Can show that the count is reasonable in others (see the second sketch below), e.g.,
  – L1 Dcache misses
  – DTLB misses (R10K)
  – Multiprocessor cache-consistency-protocol-related events (R10K)

Understanding Performance Counter Data - 3
• Interesting facts
  – Stream buffers are incredibly effective!
  – Itanium retires 17% more instructions and takes 17% more Icache misses than predicted – this is due to no-ops
  – Itanium has 5x as many TLB misses as predicted – we don't know why yet!
  – Power3 has 5x (for smaller versions of the benchmark) and 2x (for larger versions) as many TLB misses as predicted – we don't know why yet!

Understanding Performance Counter Data - 4
• Interesting facts
  – Power3 (gcc compiler): single-precision vs. double-precision floating-point add benchmark (see the third sketch below)
    • ½ the number of floating-point operations for the double-precision benchmark, due to the rounding instructions needed for the single-precision benchmark
    • 1.39x as many cycles for the single-precision benchmark as for the double-precision benchmark
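
To make the collection step on slide 1 concrete, here is a minimal sketch of a PAPI measurement harness, assuming a validation micro-benchmark whose event count is statically predictable (N floating-point adds) and the PAPI_FP_OPS preset event (not available on every platform). The instrumented kernel is run 100 times and the mean and standard deviation of the hardware-reported count are computed; the gap between the mean and the predicted count is one way to quantify PAPI overhead (slide 2). The kernel, loop bound, and event choice are illustrative assumptions, not the benchmark actually used in this work.

    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>
    #include <papi.h>

    #define N    1000000L   /* predicted count: one FP add per iteration */
    #define RUNS 100        /* as on slide 1: run 100 times */

    /* validation micro-benchmark with a statically predictable FP-add count;
       volatile keeps the compiler from optimizing the adds away */
    static double kernel(void) {
        volatile double s = 0.0;
        for (long i = 0; i < N; i++)
            s += 1.0;
        return s;
    }

    int main(void) {
        long long counts[RUNS], v;
        int es = PAPI_NULL;

        if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT) exit(1);
        if (PAPI_create_eventset(&es) != PAPI_OK) exit(1);
        if (PAPI_add_event(es, PAPI_FP_OPS) != PAPI_OK) exit(1);

        for (int r = 0; r < RUNS; r++) {
            PAPI_start(es);
            kernel();
            PAPI_stop(es, &v);       /* hardware-reported event count */
            counts[r] = v;
        }

        /* mean and standard deviation over the 100 runs */
        double mean = 0.0, var = 0.0;
        for (int r = 0; r < RUNS; r++) mean += (double)counts[r];
        mean /= RUNS;
        for (int r = 0; r < RUNS; r++)
            var += ((double)counts[r] - mean) * ((double)counts[r] - mean);
        var /= RUNS;

        printf("predicted %ld, mean %.1f, stddev %.1f (mean - predicted ~ overhead)\n",
               N, mean, sqrt(var));
        return 0;
    }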
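Slide 2's cache-miss case can be sketched the same way: a micro-benchmark that touches one byte per cache line of an array much larger than L1, so roughly one L1 Dcache miss per line is predicted. The 32-byte line size, array size, and PAPI_L1_DCM preset below are assumptions; on a machine with stream buffers or prefetchers the measured count can come in well under this prediction, which is exactly the slide-3 observation that stream buffers are effective.

    #include <stdio.h>
    #include <stdlib.h>
    #include <papi.h>

    #define LINE  32                    /* assumed L1 Dcache line size in bytes */
    #define BYTES (16 * 1024 * 1024)    /* array far larger than any L1 cache */

    int main(void) {
        static char a[BYTES];
        long long v, sum = 0;
        int es = PAPI_NULL;

        if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT) exit(1);
        if (PAPI_create_eventset(&es) != PAPI_OK) exit(1);
        if (PAPI_add_event(es, PAPI_L1_DCM) != PAPI_OK) exit(1);

        PAPI_start(es);
        /* touch one byte per cache line: predict ~BYTES/LINE misses */
        for (long i = 0; i < BYTES; i += LINE)
            sum += a[i];
        PAPI_stop(es, &v);

        printf("predicted ~%d misses, measured %lld (sum %lld)\n",
               BYTES / LINE, v, sum);
        return 0;
    }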
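Finally, the Power3 observation on slide 4 suggests a paired comparison: two otherwise identical add loops, one single precision and one double precision, measured for floating-point operations and cycles. On Power3 the FPU computes in double precision, so single-precision code needs extra round-to-single (frsp) instructions; that is the slide's explanation for the doubled operation count. The loops and event choices below are an illustrative sketch, not the original benchmark.

    #include <stdio.h>
    #include <stdlib.h>
    #include <papi.h>

    #define N 1000000L

    static float add_sp(void) {     /* N single-precision adds */
        volatile float s = 0.0f;
        for (long i = 0; i < N; i++) s += 1.0f;
        return s;
    }

    static double add_dp(void) {    /* N double-precision adds */
        volatile double s = 0.0;
        for (long i = 0; i < N; i++) s += 1.0;
        return s;
    }

    int main(void) {
        long long v[2];             /* v[0]: FP ops, v[1]: cycles */
        int es = PAPI_NULL;

        if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT) exit(1);
        if (PAPI_create_eventset(&es) != PAPI_OK) exit(1);
        if (PAPI_add_event(es, PAPI_FP_OPS) != PAPI_OK) exit(1);
        if (PAPI_add_event(es, PAPI_TOT_CYC) != PAPI_OK) exit(1);

        PAPI_start(es); add_sp(); PAPI_stop(es, v);
        printf("single: %lld FP ops, %lld cycles\n", v[0], v[1]);

        PAPI_start(es); add_dp(); PAPI_stop(es, v);
        printf("double: %lld FP ops, %lld cycles\n", v[0], v[1]);
        return 0;
    }

With the numbers reported on slide 4, the single-precision loop would show roughly twice the floating-point operations and 1.39x the cycles of the double-precision loop.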