CS 7810 Lecture 13
Pipeline Gating: Speculation Control For Energy Reduction
S. Manne, A. Klauser, D. Grunwald
Proceedings of ISCA-25, June 1998

Cost of Speculation

Mispredict rates (%)
9.9, 12.2, 23.9, 10.4, 6.9, 4.6, 11.3, 1.7

Pipeline Gating
• Low-confidence branches throttle instruction fetch until they are resolved
• Pipeline gating usually lasts for fewer than five cycles

Metrics
• SPEC (specificity): fraction of all mispredicted branches that the confidence estimator flags as low-confidence (coverage)
• PVN (predictive value of a negative test): probability that a low-confidence branch is mispredicted (accuracy)

Confidence Estimators
• Perfect: used to gauge potential benefits
• Static: branches that have low prediction rates
• JRS: a branch that has yielded N successive correct predictions has high confidence
• Saturating counters: an unbiased counter value, or disagreement between two predictors, indicates low confidence
• Distance: mispredicts are clustered, hence the first 4 branches after a mispredict have low confidence

SPEC and PVN
SPEC (coverage): mispredicted branches detected by the low-confidence estimator
PVN (accuracy): % of low-confidence branches that are branch mispredicts
• It is easier to achieve a high SPEC value than a high PVN value
• A high PVN value can be achieved by requiring N low-confidence branches to invoke gating
  – if PVN is 30%, redefining low confidence as two low-confidence branches increases PVN to 51%

Perfect Gating Results

Results
• Can gating improve performance?
  – only if cache pollution is significant
• Less than 1% performance loss and up to 38% reduction in extra work
• Energy consumption could go up
  – some work is independent of the number of executed instructions (clock distribution)
  – increased execution time can increase energy
• Pipeline gating should reduce power consumption

Results

CS 7810 Lecture 13
Cache Decay: Exploiting Generational Behavior to Reduce Cache Leakage Power
S. Kaxiras, Z. Hu, M. Martonosi
Proceedings of ISCA-28, July 2001

Leakage Power Trends
• Circuit delay ∝ 1/(V - Vth)
• Leakage ∝
  – number of transistors (incr)
  – supply voltage (decr)
  – (exp) low threshold voltage (incr)
• L1 and L2 caches are the biggest contributors (high transistor budgets)

Vdd-Gating
• Leakage can be reduced by gating off the supply voltage to the circuit
• When applied to a cache, the contents of the SRAM cell are lost
• Cache decay: apply Vdd-gating when you do not care about the cache contents

Lifetime of a Cache Line

Overheads
• Hardware to determine when to decay
• Introduces additional cache misses
• Normalized cache leakage power =
    Active ratio (fraction of cache that is powered on)
    + (Counter overhead : Leak) x activity
    + (L2 access energy : Leak) x num-misses
• Increased execution time (< 0.7%)
• L2 access/leakage ratio is ~9

Skier's Dilemma
New skis: $400
Ski rentals: $20
Heuristic: buy skis once the cumulative rental cost equals the purchase price

Ski trips:    5      10     15     20     25     50
Optimal:      $100   $200   $300   $400   $400   $400
Heuristic:    $100   $200   $300   $800   $800   $800

Likewise, decay a cache line when the cost of an additional miss equals the leakage dissipated so far

Tracking Dead Time
• Each line has a 2-bit counter that is reset on every access and incremented every 2500 cycles by a global signal (negligible overhead)
• After 10,000 clock cycles without an access, the counter reaches its max value and triggers a decay
• Adaptive decay: start with a short decay period; if you have a quick miss, double the period; if there is no miss, halve the period

Results

Overheads

Other Results
• The L2 cache is equally suitable for decay techniques: lifetimes are scaled by a factor of 10, but an extra miss also costs a lot more
• For their experiments, there is little interference from multiprogramming
• Some instructions can easily be identified as last touches to a cache block
  – potential for early cache decay
• Can this apply to the branch predictor and register file?
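
Returning to the SPEC and PVN slide, the jump from 30% to 51% can be checked with a short calculation. The sketch below assumes that low-confidence branches mispredict independently of one another, which the slides do not state; under that assumption, the chance that at least one of N in-flight low-confidence branches is mispredicted is 1 - (1 - PVN)^N. The function name `gating_pvn` is made up for this illustration.

```python
def gating_pvn(pvn: float, n: int) -> float:
    """Probability that at least one of n low-confidence branches is mispredicted.

    Assumption (not from the slides): each low-confidence branch is
    mispredicted independently with probability `pvn`.
    """
    if not 0.0 <= pvn <= 1.0:
        raise ValueError("pvn must be a probability in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 - (1.0 - pvn) ** n


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f"N={n}: effective PVN = {gating_pvn(0.30, n):.2f}")
```

With PVN = 0.30 and N = 2 this gives 1 - 0.7 x 0.7 = 0.51, matching the figure on the slide.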
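
The Skier's Dilemma numbers can be reproduced with a few lines of arithmetic. This is a minimal sketch using the prices from the slide ($400 skis, $20 rentals); the function names are hypothetical, and the heuristic is read as "rent until the rental total reaches the purchase price, then buy".

```python
SKI_PRICE = 400   # purchase price from the slide
RENTAL = 20       # cost of one rental from the slide


def optimal_cost(trips: int) -> int:
    """Cost with perfect knowledge of the number of trips: rent or buy, whichever is cheaper."""
    return min(trips * RENTAL, SKI_PRICE)


def heuristic_cost(trips: int) -> int:
    """Cost when renting until the rental total equals the purchase price, then buying."""
    rentals_before_buying = SKI_PRICE // RENTAL  # 20 rentals = $400
    if trips < rentals_before_buying:
        return trips * RENTAL
    return rentals_before_buying * RENTAL + SKI_PRICE


if __name__ == "__main__":
    for trips in (5, 10, 15, 20, 25, 50):
        print(trips, optimal_cost(trips), heuristic_cost(trips))
```

The heuristic never pays more than twice the optimal cost ($800 versus $400), which is the property that motivates decaying a line once its accumulated leakage equals the cost of one extra miss.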
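
The per-line counter from the Tracking Dead Time slide can be modeled in software as follows. This is an illustrative sketch, not the paper's circuit: the names `DecayLine` and `run` are invented here, and the model assumes the line is gated off on the fourth global tick after its last access, which is one way to reconcile a 2-bit counter with the 10,000-cycle figure (4 x 2500).

```python
from dataclasses import dataclass

TICK_CYCLES = 2500    # global tick interval from the slide
TICKS_TO_DECAY = 4    # assumption: the 4th tick without an access gates the line off


@dataclass
class DecayLine:
    counter: int = 0      # ticks seen since the last access
    powered: bool = True  # False once the line has been Vdd-gated

    def access(self) -> bool:
        """Touch the line. Returns False if the line had decayed (a decay-induced miss)."""
        was_powered = self.powered
        self.powered = True
        self.counter = 0
        return was_powered

    def tick(self) -> None:
        """Global tick: advance the counter and gate the line off when it saturates."""
        if not self.powered:
            return
        self.counter += 1
        if self.counter >= TICKS_TO_DECAY:
            self.powered = False
            self.counter = 0


def run(access_cycles, end_cycle):
    """Replay accesses to one line; an access on a tick cycle is handled before the tick."""
    line = DecayLine()
    pending = sorted(access_cycles)
    outcomes = []
    i = 0
    for cycle in range(TICK_CYCLES, end_cycle + 1, TICK_CYCLES):
        while i < len(pending) and pending[i] <= cycle:
            outcomes.append(line.access())
            i += 1
        line.tick()
    while i < len(pending):  # accesses after the last tick
        outcomes.append(line.access())
        i += 1
    return outcomes


if __name__ == "__main__":
    # The third access arrives long after the line went idle, so it finds the line decayed.
    print(run([0, 5000, 20000], 20000))
```

Because the tick is global rather than per line, the idle time before decay in this model falls between 7,500 and 10,000 cycles, depending on where the last access landed relative to the tick.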