Leakage Modeling and Reduction
Amit Agarwal, Lei He et al.
Presenters: Qun Gu, Ho-Yan Wong
Courtesy of Lei He

Outline
• Introduction
• Circuit-level leakage reduction
• System-level leakage reduction
• Coupled leakage and thermal simulation and management

Power Trends

Circuit Power
• Dynamic power: determined by the circuit's performance requirements. Its share of total power is shrinking.
• Short-circuit power: dissipated while the pull-up (PU) and pull-down (PD) networks conduct simultaneously during switching. A small share (<10%).
• Leakage power: increasingly important, and dependent on many factors, including device geometry, temperature, doping, processing, and data pattern. It is complex and merits further study.

Leakage Power Sources
• Subthreshold leakage
• Gate leakage
• Reverse-biased junction band-to-band tunneling (BTBT) leakage
[Figure: leakage current components in an NMOS transistor (gate, n+ source, n+ drain, bulk)]

Leakage Dependences

Circuit Techniques to Reduce Leakage
Design-time techniques:
• Dual-threshold CMOS
Run-time techniques (standby leakage reduction):
• Natural transistor stacks
• Sleep transistor (MTCMOS)
• Forward/reverse body biasing (VTCMOS)
Run-time techniques (active leakage reduction):
• Dynamic Vth scaling (DVTS)

Dual-Threshold CMOS
Approaches to adjust Vth during fabrication:
• Adjust tox (the thicker the oxide, the higher Vth)
How?
• Low Vth on critical paths
• High Vth on non-critical paths
Concerns:
• It is not straightforward to implement; high-Vth and low-Vth requirements sometimes trade off against each other.
• Vth variation can make the technique unreliable at low supply voltages.
• An increased number of critical paths can sometimes hurt circuit performance.

Natural Transistor Stacks
How?
• Reduce leakage by stacking devices in series.
Concerns:
• Tradeoff between speed and power
• Effectiveness depends on the input data pattern
• Tradeoff with other leakage components (gate leakage)

Sleep Transistor (MTCMOS)
How?
• Insert an extra series-connected transistor (a high-Vth sleep transistor) in the PU/PD path of a gate and turn it off in standby mode.
Disadvantages:
• Increased area and delay
• Data retention problem
• Hard to turn on completely at very low supply voltages

Improvements for MTCMOS: VRC
Virtual power/ground rails clamp (VRC):
• Solves the data retention problem with diodes
• Clamps changes in the virtual rail level
• Allows data to be retained in SRAM arrays
Alternative: super cutoff CMOS (SCCMOS), using a low-Vth sleep transistor
• In standby mode, the PMOS sleep gate is driven to Vcc + 0.4 V and the NMOS sleep gate to Vss − 0.4 V to fully cut off leakage.

Forward/Reverse Body Biasing (VTCMOS)
RBB (reverse body bias): zero body bias in active mode, deep reverse bias in standby mode.
Disadvantages:
• Increases PN junction reverse leakage
• Technology scaling worsens short-channel effects and weakens the Vth modulation capability
FBB (forward body bias): high Vth in standby mode, forward body bias in active mode for better current drive.
Disadvantages:
• Larger junction capacitance
• Strong body effect for stacked devices
Technology improvements for high Vth:
• Different doping profiles
• Higher-work-function materials

Dynamic Vth Scaling (DVTS)
How?
• When the critical-path replica frequency is lower than the reference clock, adjust the body bias to decrease Vth.
• Otherwise, adjust the bias to increase Vth.
Results:
• The lowest Vth (no body bias, NBB) is delivered when the highest performance is required.
• When performance demand is low, the clock frequency is lowered and Vth is raised via RBB to reduce run-time leakage power.

Process Variation and Leakage
Variation sources:
• Channel length
• Transistor width
• Oxide thickness
• Flat-band voltage
• Random dopant effect
A larger leakage spread affects:
• Robustness of logic circuits
• Circuit design margin
[Figure: measured IDSAT and IOFF variation, 150nm process]
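Why variation widens the leakage spread so much can be illustrated with a small Monte Carlo experiment. This is a minimal sketch, not the authors' method: it assumes the textbook subthreshold model I_off = I0·exp(−Vth/(n·v_T)) at V_GS = 0 and a Gaussian Vth distribution. I0, n, the nominal Vth of 0.30 V, and the sigma values are hypothetical.

```python
import math
import random
import statistics

# Hypothetical device parameters (assumptions, not taken from the slides).
I0 = 1e-6            # pre-factor of the subthreshold current, A
N_SLOPE = 1.5        # subthreshold slope factor n
V_THERMAL = 0.0259   # thermal voltage kT/q at about 300 K, V


def off_current(vth: float) -> float:
    """Subthreshold off-current at V_GS = 0: I0 * exp(-Vth / (n * vT))."""
    return I0 * math.exp(-vth / (N_SLOPE * V_THERMAL))


def leakage_spread(vth_mean: float, vth_sigma: float, samples: int, seed: int = 0):
    """Sample Gaussian Vth values; return (mean I_off, 95th/5th percentile ratio)."""
    rng = random.Random(seed)
    currents = sorted(off_current(rng.gauss(vth_mean, vth_sigma)) for _ in range(samples))
    p5 = currents[int(0.05 * samples)]
    p95 = currents[int(0.95 * samples)]
    return statistics.fmean(currents), p95 / p5


if __name__ == "__main__":
    nominal = off_current(0.30)
    for sigma in (0.01, 0.03):
        mean_i, ratio = leakage_spread(0.30, sigma, 20000)
        print(f"sigma_Vth = {sigma * 1000:.0f} mV: "
              f"mean/nominal = {mean_i / nominal:.2f}, P95/P5 = {ratio:.1f}")
```

Because I_off is exponential in Vth, a Gaussian Vth spread produces a log-normal leakage spread: the 95th/5th-percentile ratio grows roughly as exp(3.29·σ/(n·v_T)), and the mean leakage rises above its nominal value, which is what erodes robustness and design margin.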
Circuit Techniques for Process Variation Compensation
• Adaptive body biasing for process compensation
• Process variation compensation in dynamic circuits

Adaptive Body Biasing for Process Compensation
Due to worsening parameter fluctuation:
• Some dies may not meet the target frequency.
• Others exceed the leakage power constraint.
How?
• Slow dies that fail to meet the target frequency can be forward body biased to improve performance, at the cost of more leakage power.
• Dies with excess leakage can be reverse body biased to meet the leakage power specification.
Effect:
• Adaptive body bias reduces the spread of the die frequency distribution by 7X compared with conventional zero body bias.

Process Variation Compensation in Dynamic Circuits (I)
Dynamic circuits need keepers that compensate for leakage current to retain data. Keeper sizing considerations:
• An unnecessarily large keeper hurts circuit performance.
• Dies with excess leakage cannot meet robustness requirements without a large enough keeper.
Programmable keeper scheme: the effective keeper width is chosen from {0, W, 2W, …, 7W} by the control bits.

Process Variation Compensation in Dynamic Circuits (II)
Simulation results:
• 5X reduction in the number of dies failing robustness, and a 10% improvement in average performance
• The spread of the robustness and delay distributions is reduced by 55% and 35%, respectively

System Level Leakage Reduction
• Motivation
• Leakage characteristics and reduction
• Coupled leakage and thermal simulation and management
  • Power and thermal simulation
  • Dynamic power and thermal management
  • Vdd scaling with cooling selection

Motivation
• Leakage current has increased due to the scaling of Vt, L, and tox.
• Leakage power matters more because of high-leakage devices and low activity rates.
• Leakage power depends strongly on temperature.

Power States at System Level
Three power states are defined at the system level (Pd: dynamic power, Ps: leakage power):
1. Active mode: the circuit is operating; P = Pd + Ps
2. Standby mode: the circuit is idle but ready to execute; P = Ps
3. Inactive mode: the circuit is deactivated by leakage reduction techniques; P < Ps

System Level Leakage Power Modeling
Early model:
$P_s = V_{dd} \cdot N_{FET} \cdot k_{design} \cdot I_{leakage}$
Later model, with two leakage power reduction techniques applied (described later):
$P_s = V_{dd} \cdot N_{gate} \cdot I_{avg}$

Leakage Power Characteristics
Minimum idle time (M.I.T.):
$\text{M.I.T.} = \frac{E_{s \to i} + E_{i \to s} - P_i \, (t_{s \to i} + t_{i \to s})}{P_s - P_i}$
where E_{s→i}, E_{i→s} and t_{s→i}, t_{i→s} are the energy and time of the standby-to-inactive and inactive-to-standby transitions, and P_i is the inactive-mode power. M.I.T. is the break-even idle period: entering inactive mode for an idle period t costs E_{s→i} + E_{i→s} + P_i·(t − t_{s→i} − t_{i→s}), versus P_s·t in standby. Leakage power reduction is therefore useful only when the idle period exceeds M.I.T.

Runtime Leakage Reduction for Caches
• Caches dissipate a large amount of leakage power because of their large SRAM arrays.
• Several techniques have been developed to reduce L1 cache Ps, e.g., DRI and SWAY.
• The basic principle is to dynamically turn off part of the cache array.

Ps Reduction for L2 Caches
• The L2 cache has a much larger miss penalty, so the L1 approaches cannot be applied directly.
• Use VRC to reduce Ps, and use time-out-based control to shut down the L2 cache data portion.
• The time-out threshold can be fixed (FTO), dynamic, or set by feedback control (FCTO).

Ps Reduction for L2 Caches (cont'd)
FTO: the time-out threshold is set to M.I.T.
FCTO: the time-out threshold is adjusted by a proportional-integral (PI) feedback controller, updated from:
• N: L2 cache miss rate in the previous time window
• T_old: time-out threshold in the previous time window
New time-out threshold:
$T = T_{old} + (N - \text{Setpoint}) \times \text{Gain}$

Circuits for FCTO
[Figure: FCTO circuits: the request address (tag, index, block offset); a timeout controller whose hit/miss counter is compared with the threshold to generate wakeup/shutdown signals for the data portion, with a tag match check; and a threshold controller that computes the threshold register value from Nmiss, the setpoint, and the gain]

Comparison of L2 Leakage Reduction
• Time-out schemes (FTO and FCTO) achieve much smaller performance penalties.
• Targeting a 1% performance loss, FCTO obtains more power reduction than FTO.

              Power reduction (%)              Performance penalty (%)
Benchmark     FTO    FCTO   SWAY   DRI         FTO    FCTO   SWAY   DRI
go            52.21  63.80  57.55  56.79       1.06   1.10   9.95   7.39
li            12.92  27.87  26.64  26.56       0.93   1.07   7.28   7.71
equake        35.75  48.61  46.40  45.71       0.84   1.01   9.73   10.58
art           0.07   2.20   2.17   2.18        0.37   0.92   3.18   3.14

Temperature Aware Computing
[Figure: temperature-aware computing flow: initial conditions (T, delay), the microarchitecture, floorplan, packaging, and a workload (e.g., SPEC 2k) feed a performance simulator (e.g., SimpleScalar, IMPACT) with dynamic power estimation (e.g., Wattch) and leakage estimation, forming a coupled power and thermal simulator (e.g., PTscalar, PowerImpact) that returns adjusted conditions (T, delay) to temperature-aware architecture techniques (DVS, DTM, reconfigurability, GALS, etc.)]

Leakage Model with Temperature Scaling
Exponential scaling based on BSIM3v3.
Logic circuits in ITRS 100nm technology:
$P_s = N_{gate} \, V_{dd} \, I_{avg}(T_0, V_{dd0}) \, T^2 \exp\left(\frac{1986.13 \, V_{dd} - 4396.09}{T}\right)$
Memory units in ITRS 100nm technology:
$P_l(T, V_{dd}) = \left(5.30 \times 10^{-10} \cdot \text{words} + 1.72 \times 10^{-9} \cdot \text{wordsize}\right) T^2 \, V_{dd} \exp\left(\frac{1986.13 \, V_{dd} - 4396.09}{T}\right)$
$P_c(T, V_{dd}) = 5.29 \times 10^{-10} \cdot \text{words} \cdot \text{wordsize} \cdot T^2 \, V_{dd} \exp\left(\frac{711.92 \, V_{dd} - 3725.53}{T}\right)$

Delay with Vdd and Temperature Scaling
Based on the SPICE level 1 model, the transistor saturation current is
$I_{sat} \propto \frac{(V_{dd} - V_t)^{1.2}}{T^{1.19}}$
from which we obtain
$delay(V_{dd}, T) \propto \frac{V_{dd}}{I_{sat}} \propto \frac{V_{dd} \, T^{1.19}}{(V_{dd} - V_t)^{1.2}}$
[Figure: normalized gate delay (75% to 100%) vs. Vdd (1.0 to 1.3 V) at T = 60, 80, and 100 °C, ITRS 100nm technology]

Thermal Modeling
For the lumped RC thermal circuit:
• Thermal resistance Rth: the ability to remove heat to the ambient in steady state
• Thermal capacitance Cth: captures the delay between a change in power and the corresponding change in temperature
• Thermal time constant: $\tau = R_{th} \cdot C_{th}$
A distributed model is needed for an accurate solution.

Coupled Power and Thermal Simulation
• A simulation time step ts smaller than 0.5% of the thermal time constant (~10^6 cycles) gives negligible temperature and power calculation errors.
• Clock gating reduces dynamic power and also leakage energy.
• Leakage energy changes with operating temperature.

Leakage Power at Different Temperatures
[Figure: normalized total power (dynamic and leakage) for benchmarks art and gcc at 35, 85, and 110 °C and "Dep"; 100nm, 3.33GHz, 1.2V]
• Microprocessor similar to the DEC Alpha 21264, with clock gating
• Leakage differs by up to 2X between 80 °C and 110 °C
• Leakage power also differs across applications
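To see why the coupling matters, the lumped RC thermal model (C_th·dT/dt = P(T) − (T − T_amb)/R_th) can be integrated with forward Euler using a step of 0.4% of τ = R_th·C_th, within the 0.5% guideline above. This is a minimal sketch, not the authors' simulator: it compares a "simple" model with leakage frozen at a reference value against one whose leakage follows the logic fit on the "Leakage Model with Temperature Scaling" slide, read as P_s ∝ V_dd·T²·exp((1986.13·V_dd − 4396.09)/T) and normalized to a reference point. All numeric parameters (R_th, C_th, powers, temperatures, and the 450 K runaway cutoff) are hypothetical.

```python
import math

# Exponent coefficients of the logic leakage fit (ITRS 100nm), read as
# P_s ∝ T^2 * exp((1986.13 * Vdd - 4396.09) / T).
ALPHA, BETA = 1986.13, -4396.09


def leakage_power(temp_k, vdd, p_ref, t_ref):
    """Leakage power at temp_k, scaled so that leakage_power(t_ref, ...) == p_ref."""
    b = ALPHA * vdd + BETA
    return p_ref * (temp_k / t_ref) ** 2 * math.exp(b * (1.0 / temp_k - 1.0 / t_ref))


def simulate(r_th, c_th, p_dyn, vdd, p_ref, t_ref, t_amb,
             coupled=True, n_tau=20.0, t_limit=450.0):
    """Forward-Euler solution of C_th dT/dt = P(T) - (T - T_amb) / R_th.

    Returns (final temperature in K, runaway flag). The flag is set when the
    temperature crosses the hypothetical limit t_limit.
    """
    step_fraction = 0.004               # time step = 0.4% of tau = R_th * C_th
    dt = step_fraction * r_th * c_th
    temp = t_amb                        # start at ambient temperature
    for _ in range(int(n_tau / step_fraction)):
        # Simple model: leakage fixed at p_ref; coupled model: leakage tracks T.
        p_leak = leakage_power(temp, vdd, p_ref, t_ref) if coupled else p_ref
        temp += dt * (p_dyn + p_leak - (temp - t_amb) / r_th) / c_th
        if temp > t_limit:
            return temp, True
    return temp, False


if __name__ == "__main__":
    params = dict(c_th=50.0, p_dyn=40.0, vdd=1.2, p_ref=15.0,
                  t_ref=358.15, t_amb=318.15)
    for r_th in (0.8, 1.5):             # hypothetical thermal resistances, K/W
        for coupled in (False, True):
            temp, runaway = simulate(r_th, coupled=coupled, **params)
            label = "coupled" if coupled else "fixed Ps"
            print(f"R_th={r_th}: {label:8s} T={temp:.1f} K runaway={runaway}")
```

With these made-up numbers, the fixed-Ps model settles at a stable temperature for both thermal resistances, while the temperature-dependent model settles slightly hotter at R_th = 0.8 and runs away at R_th = 1.5. In this fit, leakage rises by roughly 1.8X between 80 °C and 110 °C.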
Coupled thermal and power simulation is a must.

Thermal Runaway
• Thermal runaway is caused by the positive feedback loop between on-resistance, temperature, and power.
• It also results from the interaction between leakage power and temperature: component temperature ↑ → leakage power ↑ (exponentially) → temperature ↑.
• If cooling is inadequate, both keep increasing.

Thermal Runaway (cont'd)
• Assuming no throttling and constant power consumption, the condition for thermal runaway is equivalent to d²T/dt² > 0.
• The lowest temperature that meets the runaway criterion is the runaway temperature.

Dynamic Power and Thermal Management (DPTM)
Goal: maximize throughput subject to a maximum on-chip temperature constraint.
• For each time window of X cycles, stop or throttle instruction fetch for δ·X cycles, 0 ≤ δ ≤ 1.
• A proportional-integral (PI) feedback controller adjusts δ: for each time window, δ is updated according to the current maximum on-chip temperature and δ in the previous time window.

Dynamic Power and Thermal Management (DPTM) (cont'd)
• Fetch toggling: toggles the I-cache, I-TLB, branch prediction, and decode units.
• Dynamic frequency scaling (DFS) and dynamic voltage scaling (DVS): adjust the clock frequency and Vdd (with a stall).
• Activity migration: moves activity to a lower-temperature copy of the component.

Need for a Temperature-Dependent Leakage Model
• Dynamic thermal management using fetch toggling with a PI feedback controller
• Implemented two models: simple (fixed Ps) and accurate (temperature-dependent Ps)

Validation of PI-based DPTM
Compared with two practices:
• No dynamic management: lower Vdd to avoid thermal violations.
• Cooling down: on reaching the thermal threshold, stop the whole processor until the maximum temperature is X °C below the threshold (X = 5 in our experiments).
[Figure: system performance: throughput (BIPS) vs. Vdd (1.0 to 1.3 V) for feedback control (max T = 80 °C), no management (max T = 110 °C), and simple cooling down (max T = 80 °C), with the maximum throughput marked]
• DPTM by feedback control can improve throughput by up to 11% compared with the no-DPTM case.
• DPTM allows designing for the common workload rather than the worst case (thermal speculation).

Active Cooling
• Direct water-spray cooling: thermal resistance of 0.067, compared with 0.8 for a conventional heat sink
• Microchannels with liquid coolant, …

Impacts of Water Cooling
[Figure: throughput (BIPS) and power efficiency (BIPS/W) vs. Vdd (1.0 to 1.8 V) for water cooling (max T = 60 °C) and air cooling (max T = 80 °C)]
• Increases the maximum throughput by 30%
• Improves power efficiency by 9% and slows the decay of power efficiency

References
Amit Agarwal et al., "Leakage Mechanisms and Leakage Control for Nano-Scale CMOS Circuits", Purdue University.
Lei He et al., "System Level Leakage Reduction Considering the Interdependence of Temperature and Leakage", UCLA.