Computer Structure
Power Management
Lihu Rappoport and Adi Yoaz
Thanks to Efi Rotem for many of the foils
Computer Structure 2012 – Power Management

Processor Power Components
• The power consumed by a processor consists of
  – Dynamic power: power for toggling transistors and lines from 0→1 or 1→0
      αCV²f : α – activity, C – capacitance, V – voltage, f – frequency
  – Leakage power: leakage of transistors under voltage
      A function of: Z – total size of all transistors, V – voltage, t – temperature
• Peak power must not exceed the thermal constraints
  – Power generates heat
      Heat must be dissipated to keep transistors within the allowed temperature
  – Peak power determines peak frequency (and thus peak performance)
  – Also affects form factor, cooling solution cost, and acoustic noise
• Average power
  – Determines battery life (for mobile devices), electricity bill, air-conditioning bill
  – Average power = Total energy / Total time
      Including low-activity and idle time (~90% idle time for client)

Performance per Watt
• In small form-factor devices the thermal budget limits performance
  – Old target: get max performance
  – New target: get max performance at a given power envelope
      Performance per Watt
• Increasing f also requires increasing V (~linearly)
  – Dynamic power = αCV²f = Kf³
      X% performance costs ~3X% power
  – A power-efficient feature is better than 1:3 performance : power
      Otherwise it is better to just increase frequency (and voltage)
• Vmin is the minimal operating voltage
  – Once at Vmin, reducing frequency no longer reduces voltage
  – At this point a feature is power efficient only if it is 1:1 performance : power
• Active energy efficiency tradeoff
  – Energy_active = Power_active × Time_active ~ Power_active / Perf_active
  – Energy-efficient feature: 1:1 performance : power

Platform Power
• Processor average power is <10% of the platform
[Pie chart: share of platform power]
  Display (panel + inverter)   33%
  CPU                          10%
  Power Supply                 10%
  MCH                           9%
  HDD                           8%
  GFX                           8%
  Misc.                         8%
  CLK                           5%
  LAN, Fan, DVD, ICH            2–3% each

Managing Power
• Typical CPU usage varies over time
  – Bursts of high utilization and long idle periods (~90% of time in client)
• Optimize power and energy consumption
  – High power when high performance is needed
  – Low power at low activity or idle
• Enhanced Intel SpeedStep® Technology
  – Multiple voltage/frequency operating points
  – OS changes frequency to meet performance needs and minimize power
  – Referred to as processor Performance states = P-states
• OS notifies CPU when no tasks are ready for execution
  – CPU enters sleep state, called C-state
  – Using the MWAIT instruction, with the C-state level as an argument
  – Tradeoff between power and latency
      Deeper sleep → more power savings → longer to wake

P-states
• Operating frequencies are called P-states = Performance states
  – P0 is the highest frequency
  – P1, P2, P3, … are lower frequencies
  – Pn is the min Vcc point = energy-efficient point
• DVFS = Dynamic Voltage and Frequency Scaling
  – Power = CV²f ; f = KV ⇒ Power ~ f³
  – Program execution time ~ 1/f
  – E = P×t ⇒ E ~ f²
      Pn is the most energy-efficient point
  – Going up/down the cubic curve of power
      High cost to achieve frequency
      Large power savings for some small frequency reduction
[Figure: Power vs. Freq, with operating points Pn, P2, P1, P0]

C-States: C0
• C0: CPU active state
[Figure: Active Core Power bar; components: Local Clocks and Logic, Clock Distribution, Leakage]

C-States: C1
• C1: Halt state
  • Stop core pipeline
  • Stop most core clocks
  • No instructions are executed
  • Caches respond to external snoops
[Figure: Active Core Power bar; remaining components: Clock Distribution, Leakage]

C-States: C3
• C3 state
  • Stop remaining core clocks
  • Flush internal core caches
[Figure: Active Core Power bar; remaining component: Leakage]

C-States: C6
• C6 state
  • Processor saves architectural state
  • Turn off power gate, eliminating leakage
• Core power goes to ~0

Putting it all together
• CPU running at max power and frequency
  – Periodically enters C1
• Going into an idle period
  – Gradually enters deeper C-states
  – Controlled by OS
• Tracking CPU utilization history
  – OS identifies low activity
  – Switches CPU to a lower P-state
• CPU enters idle state again
• Further lowering the P-state
  – DVD play runs at the lowest P-state
[Figure: Power [W] (0 to 20) vs. Time; active regions labeled C0 P0, C0 P1, C0 P2; idle regions labeled C1, C2, C3, C4]

Voltage and Frequency Domains
• Two independent variable power planes
  – CPU cores, ring and LLC
      Embedded power gates – each core can be turned off individually
      Cache power gating – turn off portions or all of the cache at deeper sleep states
  – Graphics processor
      Can be varied or turned off when not active
• Shared frequency for all IA32 cores and ring
• Independent frequency for PG
• Fixed programmable power plane for System Agent
  – Optimize SA power consumption
  – System-on-chip functionality and PCU logic
  – Periphery: DDR, PCIe, Display
[Figure: die diagram with embedded power gates and power planes labeled VCC SA, VCC Core (gated), VCC Core (ungated), VCC Graphics, VCC Periphery]

Turbo Mode
• P1 is the guaranteed frequency
  – CPU and GFX simultaneous heavy load at worst-case conditions
  – Actual power has high dynamic range
• P0 is the max possible frequency – the Turbo frequency
  – P1-P0 has a significant frequency range (GHz)
      Single thread or lightly loaded applications
      GFX <> CPU balancing
  – OS treats P0 as any other P-state
      Requesting it when it needs more performance
  – P1 to P0 range is fully H/W controlled
      Frequency transitions handled completely in HW
      PCU keeps silicon within existing operating limits
  – Systems designed to the same specs, with or without Turbo Mode
• Pn is the energy-efficient state
  – Lower than Pn is controlled by Thermal-State
[Figure: frequency axis with LFM/Pn, P1 and P0 (1C); OS-visible states under OS control; "Turbo" range under H/W control; T-state and throttle below Pn]

Turbo Mode
• Power gating
  – Zero power for inactive cores
• Turbo Mode
  – Use the thermal budget of inactive cores to increase the frequency of active cores
  – Increase frequency within thermal headroom
[Figures: per-core Frequency (F) bars for Core 0 to Core 3, comparing "No Turbo" with: a lightly threaded workload; active cores running workloads < TDP; a lightly threaded workload with active cores < TDP]

Thermal Capacitance
• Classic model: steady-state thermal resistance
  – Design guide for steady state
• New model: steady-state thermal resistance AND dynamic thermal capacitance
  – More realistic response to power changes
• Temperature rises as energy is delivered to the thermal solution
• Thermal solution response is calculated in real time
[Figure: Temperature vs. Time, classic model response and new model response]
Foil taken from IDF 2011

Intel® Turbo Boost Technology 2.0
• After idle periods, the system accumulates "energy budget" and can accommodate high power/performance for a few seconds
• In steady-state conditions the power stabilizes on TDP
• Use the accumulated energy budget to enhance user experience
[Figure: Power vs. Time; "Sleep or low power" (build up thermal budget during idle periods), then "C0/P0 (Turbo)" above the "TDP" level, then Turbo Boost 2.0 settling at "TDP"]
Foil taken from IDF 2011

Core and Graphic Power Budgeting
• Cores and Graphics integrated on the same die with separate voltage/frequency controls; tight HW control
• Full package power specifications available for sharing
• Power budget can shift between Cores and Graphics
[Figure: Core Power [W] vs. Graphics Power [W]; labels: Heavy CPU workload, Heavy Graphics workload, Applications, Specification Core Power, Specification Graphics Power, Total package power, Realistic concurrent max power, Sum of max power for short periods (Sandy Bridge Next Gen Turbo)]
Foil taken from IDF 2011
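
Worked example: DVFS scaling and average power
The relations from the P-states and Processor Power Components foils (Power ~ f³, execution time ~ 1/f, E ~ f², average power = total energy / total time) can be checked numerically. The following is a minimal sketch, not part of the original foils. It assumes voltage scales linearly with frequency (so it only applies above Vmin) and ignores leakage; the function names and the 20 W / 1 W figures are hypothetical, chosen only for illustration.

```python
def relative_power(freq_ratio: float) -> float:
    """Dynamic power relative to the baseline when V scales linearly with f.

    Power = C * V^2 * f and f = K * V, so Power ~ f^3.
    Only meaningful above Vmin, where voltage can still follow frequency.
    """
    return freq_ratio ** 3


def relative_energy(freq_ratio: float) -> float:
    """Energy for a fixed task relative to the baseline.

    E = P * t with t ~ 1/f, so E ~ f^2.
    """
    relative_time = 1.0 / freq_ratio
    return relative_power(freq_ratio) * relative_time


def average_power(active_power_w: float, idle_power_w: float,
                  idle_fraction: float) -> float:
    """Average power = total energy / total time over active and idle periods."""
    return active_power_w * (1.0 - idle_fraction) + idle_power_w * idle_fraction


if __name__ == "__main__":
    # Sweep a few frequency ratios relative to the highest P-state.
    for ratio in (1.0, 0.9, 0.8, 0.5):
        print(f"f x{ratio:.2f}: power x{relative_power(ratio):.3f}, "
              f"energy x{relative_energy(ratio):.3f}")

    # Hypothetical client workload: 20 W when active, 1 W when idle, 90% idle.
    print(f"average power: {average_power(20.0, 1.0, 0.9):.2f} W")
```

Under these assumptions a 1% frequency increase costs about 3% more power, which is the 1:3 performance : power ratio quoted on the Performance per Watt foil, and the long idle residency makes idle power a large part of the average.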