Understanding Performance Counter Data - 1
• Methodology
  – Configuration micro-benchmark
  – Validation micro-benchmark – used to predict event count
  – Prediction via tool, mathematical model, and/or simulation
  – Hardware-reported event count collection via PAPI: the instrumented benchmark is run 100 times, and the mean event count and standard deviation are calculated (see the first sketch after these slides)
  – Comparison/analysis
  – Report findings

Understanding Performance Counter Data - 2
• Can quantify PAPI overhead in some cases, e.g.,
  – Loads and stores
  – Floating-point operations (on some platforms)
• Can show that the count is reasonable in others (see the second sketch below), e.g.,
  – L1 Dcache misses
  – DTLB misses (R10K)
  – Multiprocessor cache-consistency-protocol-related events (R10K)

Understanding Performance Counter Data - 3
• Interesting facts
  – Stream buffers are incredibly effective!
  – Itanium retires 17% more instructions and takes 17% more Icache misses than predicted – this is due to no-ops
  – Itanium has 5x as many TLB misses as predicted – we don't know why yet!
  – Power3 has 5x (for smaller versions of the benchmark) and 2x (for larger versions) as many TLB misses as predicted – we don't know why yet!

Understanding Performance Counter Data - 4
• Interesting facts
  – Power3 (gcc compiler): single-precision vs. double-precision floating-point add benchmark (see the third sketch below)
    • ½ the number of floating-point operations for the double-precision benchmark, due to the rounding instructions needed for the single-precision benchmark
    • 1.39x as many cycles for the single-precision benchmark as for the double-precision benchmark
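
To make the collection step on slide 1 concrete, here is a minimal sketch of a PAPI measurement harness, assuming a validation micro-benchmark whose event count is statically predictable (N floating-point adds) and the PAPI_FP_OPS preset event (not available on every platform). The instrumented kernel is run 100 times and the mean and standard deviation of the hardware-reported count are computed; the gap between the mean and the predicted count is one way to quantify PAPI overhead (slide 2). The kernel, loop bound, and event choice are illustrative assumptions, not the benchmark actually used in this work.

    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>
    #include <papi.h>

    #define N    1000000L   /* predicted count: one FP add per iteration */
    #define RUNS 100        /* as on slide 1: run 100 times */

    /* validation micro-benchmark with a statically predictable FP-add count;
       volatile keeps the compiler from optimizing the adds away */
    static double kernel(void) {
        volatile double s = 0.0;
        for (long i = 0; i < N; i++)
            s += 1.0;
        return s;
    }

    int main(void) {
        long long counts[RUNS], v;
        int es = PAPI_NULL;

        if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT) exit(1);
        if (PAPI_create_eventset(&es) != PAPI_OK) exit(1);
        if (PAPI_add_event(es, PAPI_FP_OPS) != PAPI_OK) exit(1);

        for (int r = 0; r < RUNS; r++) {
            PAPI_start(es);
            kernel();
            PAPI_stop(es, &v);       /* hardware-reported event count */
            counts[r] = v;
        }

        /* mean and standard deviation over the 100 runs */
        double mean = 0.0, var = 0.0;
        for (int r = 0; r < RUNS; r++) mean += (double)counts[r];
        mean /= RUNS;
        for (int r = 0; r < RUNS; r++)
            var += ((double)counts[r] - mean) * ((double)counts[r] - mean);
        var /= RUNS;

        printf("predicted %ld, mean %.1f, stddev %.1f (mean - predicted ~ overhead)\n",
               N, mean, sqrt(var));
        return 0;
    }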
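Slide 2's cache-miss case can be sketched the same way: a micro-benchmark that touches one byte per cache line of an array much larger than L1, so roughly one L1 Dcache miss per line is predicted. The 32-byte line size, array size, and PAPI_L1_DCM preset below are assumptions; on a machine with stream buffers or prefetchers the measured count can come in well under this prediction, which is exactly the slide-3 observation that stream buffers are effective.

    #include <stdio.h>
    #include <stdlib.h>
    #include <papi.h>

    #define LINE  32                    /* assumed L1 Dcache line size in bytes */
    #define BYTES (16 * 1024 * 1024)    /* array far larger than any L1 cache */

    int main(void) {
        static char a[BYTES];
        long long v, sum = 0;
        int es = PAPI_NULL;

        if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT) exit(1);
        if (PAPI_create_eventset(&es) != PAPI_OK) exit(1);
        if (PAPI_add_event(es, PAPI_L1_DCM) != PAPI_OK) exit(1);

        PAPI_start(es);
        /* touch one byte per cache line: predict ~BYTES/LINE misses */
        for (long i = 0; i < BYTES; i += LINE)
            sum += a[i];
        PAPI_stop(es, &v);

        printf("predicted ~%d misses, measured %lld (sum %lld)\n",
               BYTES / LINE, v, sum);
        return 0;
    }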
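Finally, the Power3 observation on slide 4 suggests a paired comparison: two otherwise identical add loops, one single precision and one double precision, measured for floating-point operations and cycles. On Power3 the FPU computes in double precision, so single-precision code needs extra round-to-single (frsp) instructions; that is the slide's explanation for the doubled operation count. The loops and event choices below are an illustrative sketch, not the original benchmark.

    #include <stdio.h>
    #include <stdlib.h>
    #include <papi.h>

    #define N 1000000L

    static float add_sp(void) {     /* N single-precision adds */
        volatile float s = 0.0f;
        for (long i = 0; i < N; i++) s += 1.0f;
        return s;
    }

    static double add_dp(void) {    /* N double-precision adds */
        volatile double s = 0.0;
        for (long i = 0; i < N; i++) s += 1.0;
        return s;
    }

    int main(void) {
        long long v[2];             /* v[0]: FP ops, v[1]: cycles */
        int es = PAPI_NULL;

        if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT) exit(1);
        if (PAPI_create_eventset(&es) != PAPI_OK) exit(1);
        if (PAPI_add_event(es, PAPI_FP_OPS) != PAPI_OK) exit(1);
        if (PAPI_add_event(es, PAPI_TOT_CYC) != PAPI_OK) exit(1);

        PAPI_start(es); add_sp(); PAPI_stop(es, v);
        printf("single: %lld FP ops, %lld cycles\n", v[0], v[1]);

        PAPI_start(es); add_dp(); PAPI_stop(es, v);
        printf("double: %lld FP ops, %lld cycles\n", v[0], v[1]);
        return 0;
    }

With the numbers reported on slide 4, the single-precision loop would show roughly twice the floating-point operations and 1.39x the cycles of the double-precision loop.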