Hot-and-Cold: Using Criticality in the Design of Energy-Efficient Caches
Rajeev Balasubramonian, University of Utah
Viji Srinivasan, IBM T.J. Watson
Sandhya Dwarkadas, University of Rochester
Alper Buyuktosunoglu, IBM T.J. Watson

All Instructions are not Created Equal
Critical instructions – lie on the program critical path
Non-critical instructions – can be slowed without increasing execution time
• Potential to improve cache performance (?) [Srinivasan ’01] [Fisk ’99]
• Prioritization policies [Fields ’01] [Tune ’01]
• Energy-efficient ALUs [Seng ’01]

Energy-Delay Trade-Offs
• Example energy-delay trade-off techniques: voltage scaling, transistor sizing, way prediction, serial-access, gated-ground cells, high Vt

[Figure: Transistor sizing. Normalized delay (1 to 1.7) versus normalized dynamic energy (1 to 2).]

Variable threshold voltage
Vt        Normalized Leakage   Normalized Delay
Low       8.5                  0.88
Nominal   1                    1
High      0.23                 1.34

Exploiting Criticality
• Design two static banks – hot bank: fast and high power; cold bank: slow and low power
• Instructions have to be classified as critical or not, and
• Data has to be placed in one of two banks
Energy-efficient ALUs are easier to handle as there is no associated storage

Criticality Metric
Oldest-N: The N oldest instructions in the queue are critical
• Younger instructions are likely to be on mispredicted paths or can tolerate latencies
• N can be varied based on program needs
• Minimal hardware overhead
• Behavior comparable to more complex metrics

Instruction Classification
[Figure: bar chart. Percentage of loads with the same behavior as the last invocation (0 to 100) for bzip, crafty, eon, gap, gcc, gzip, parser, twolf, vortex, vpr, and AM.]

Data Classification
[Figure: histogram. Percentage of cache blocks (0 to 50) versus percentage of critical accesses to a cache block, in bins from 0-10% to 90-100%. Annotations: "Exclusively critical" and "Exclusively non-critical".]

Hot-and-Cold Microarchitecture
[Figure: block diagram. Components: Dispatch, Bank Predictor, Issue Queue, Cold bank, Hot bank, Criticality Counters, Placement Predictor, L2.]

Performance Results
[Figure: bar chart. HM of IPCs (1 to 1.3) for base and h-c configurations with 2-, 4-, and 6-cycle settings, and for h-c with no penalties.]

Energy Results
[Figure: bar chart. L1 energy (pJ/instr, 0 to 700), split into cold-bank and hot-bank components, for three configurations: base case; h-c, cold=0.6; h-c, cold=0.2.]

Results Summary
• Bank mispredict rate of 9.5%
• Criticality mismatch rate of 26%
• Performance loss = 2.7% (data reorganization) + (0.8 x slowdown)
• L1 cache energy savings of 37%

Related Work
• Recent split-cache organization by Abella and Gonzalez [ICCD’03]
[Figure labels: Base, Fast, Slow]
• Data allocation based on criticality of accessing instruction

Conclusions
• Data and instruction classification is reasonably accurate
• Overhead from contention is non-trivial
• Results are worthwhile in limited settings
The use of criticality for data cache reorganization yields little benefit
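
Sketch: Oldest-N classification and block placement
The Oldest-N metric and the hot/cold placement decision described earlier can be illustrated with a small Python sketch. This is only a toy model under stated assumptions, not the authors' hardware design. The slides name "Criticality Counters" and a "Placement Predictor" but do not describe how they work, so the per-block counters, the 0.5 threshold, the default of placing unseen blocks in the cold bank, and all names below (oldest_n_critical, BlockCriticality, hot_threshold) are hypothetical.

```python
from collections import defaultdict


def oldest_n_critical(issue_queue, n):
    """Oldest-N: mark the n oldest entries (front of the list) as critical.

    issue_queue is ordered oldest first; returns one bool per entry.
    """
    return [idx < n for idx in range(len(issue_queue))]


class BlockCriticality:
    """Hypothetical per-block counters of critical vs. total accesses."""

    def __init__(self, hot_threshold=0.5):
        # Assumed policy: a block goes to the hot bank when at least this
        # fraction of its recorded accesses came from critical loads.
        self.hot_threshold = hot_threshold
        self.critical = defaultdict(int)
        self.total = defaultdict(int)

    def record(self, block, is_critical):
        self.total[block] += 1
        if is_critical:
            self.critical[block] += 1

    def bank(self, block):
        total = self.total.get(block, 0)
        if total == 0:
            return "cold"  # assumption: unseen blocks default to the cold bank
        fraction = self.critical.get(block, 0) / total
        return "hot" if fraction >= self.hot_threshold else "cold"


# Four queued loads, oldest first, each paired with the cache block it touches.
queue = [("ld1", 0x100), ("ld2", 0x140), ("ld3", 0x100), ("ld4", 0x180)]
flags = oldest_n_critical(queue, n=2)

counters = BlockCriticality()
for (_, block), is_critical in zip(queue, flags):
    counters.record(block, is_critical)

placement = {hex(block): counters.bank(block) for _, block in queue}
print(placement)
```

With N = 2, only ld1 and ld2 count as critical. Block 0x100 sees one critical and one non-critical access, which puts it exactly at the assumed threshold, so a different threshold would change its placement. Blocks with mixed access behavior like this one are where a placement decision can disagree with the criticality of a later access.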