Various Low-Power SoC Design Techniques
Chong-Min Kyung, KAIST

Contents
• Introduction
• Power Management using Voltage Island Technique
• Energy (Power) Management Approach by ARM
• Low Power Design Example with Samsung AP based on ARM 920T
• IBM Low Power Design using PowerPC
• Conclusions

Why Low Power?
• Limited Battery Capacity (Mobile Devices)
• For Minimal Heat Dissipation (Heat Sink, Cooler, System Size/Weight/Cost)
• For Chip/System Reliability
• Save Energy; it's limited after all!

Power vs. Energy
• Power-Critical Applications:
  – Heat Dissipation Requirement
  – Power/Ground Metal Line Width
  – Power/Ground Bounce due to IR drop
• Energy-Critical Applications:
  – Battery Lifetime
  – Heat Dissipation Requirement

Applications for Low Power Technology
• Medical: implantable hearing aid, cardiac pacemaker
• Mobile Devices: cellular phone
• Military Devices
• Hard-to-access points: space
• Too-many-to-access points: sensors/actuators in the ubiquitous world

Power Management using Voltage Island Technique

Typical Power Optimization Procedure
[Flowchart. Box labels in slide order: Applications; H/W Description and Synthesis; Initial Layout (Standard Cell/Wire Place/Route and Layout); Gate-Level Power Optimization; Functional Partitioning; Constraints (Delay, Power, Area, Noise); Cell/Interconnect Delay and Power Modeling; Switching Activity; Vdd, Vt, Wg, Wint Optimization; Power-optimized Net List; Customized Layout (Parameterized Cell/Wire Design, Place/Route and Layout); Verification for Min-Power, Delay, Area, Noise (Y/N decision); Optimized Vdd, Vt, Wg, Wint; Technology Files; Parasitic (Resistance, Capacitance) Interconnects from layout.]

Power Challenge
• Active power density increasing with device scaling and increased frequency
• Leakage power density increasing due to lower Vt and gate leakage
• Stressing packaging, cooling, battery life, etc.
• Complicates IDDq testing as well
• Thinning gate oxides increase gate tunneling leakage
Source: Bergamaschi

Low Power Levers
Dynamic Techniques:
• Clock gating
• Data gating
• Power gating
• Variable frequency
• Variable voltage supply
• Variable device threshold
Structural Techniques:
• Voltage islands
• Multi-threshold devices
• Multi-oxide devices
• Minimize capacitance by custom design
• Power-efficient circuits
• Parallelism in micro-architecture

Standby Mode Leakage Suppression
• Disconnect inactive logic from supply in standby mode
• Multi-threshold: use higher-Vt header/footer; suppresses logic leakage (gate and sub-threshold)
• Multi-oxide: use thick-oxide header/footer; suppresses gate leakage
• Header/footer gate voltage: overdrive increases frequency; under-drive reduces leakage
• Header/footer well bias: forward bias increases frequency; reverse bias reduces leakage

Voltage Islands: Standby Power Reduction Mechanism
• On-chip supervisor manages standby power
  – Clock gating
  – Functional clock gating (fine clock control)
  – Voltage scaling, shutdown
  – SOC latch save/restore
  – Timeout and interrupt driven
[Block diagram: DC/DC supplies (select, shutdown); 1.0-1.8V scalable VDD domain with SoC logic, LSSD latches and scan chains; 3.3V I/O; battery-backed domain with suspend control logic, scan control logic, reset logic, RTC and IIC control; serial NVRAM (clock, data); signals include system clocks, freeze I/O, freeze clock, IRQ, PG, wake, reset.]

Voltage Island Concept
• Trade off power for delay by running functional blocks at different voltages
• Can use mix of Low and High Vt to balance performance and leakage
• Switch off inactive blocks to reduce leakage power
• Requires IP standards for power management, clock gating, etc.
• E.g.: Telecom ASIC with 1.0/1.2 V islands saved 16 % active power and 50 % standby power
[Figure: two IP blocks (IP1, IP2) supplied from Vdd1 and Vdd2 through switches, with Vddo and a Power Management Unit; low-Vt logic shown.]
[Plot: Delay vs. Voltage. x-axis: Voltage (Vdd), 0.7 to 1.3; y-axis: delay (ps), 0 to 30; curves: Std. Vt, Low Vt.]
Source: Bergamaschi

Power Management Unit
[Block diagram. Labels: Bus Interfaces; DC/DC Converter; Reconfigurable Register Units; Control Performance Unit; Well-bias generator; Clock generator; Power Management State Machine; Clock Control Unit; Monitor Unit; Clock & Power-Gating Device; Performance Monitor; Thermal Monitor; Timer/Counter; Power Control Unit; IP Core Interfaces.]

Busses with Different Voltages
• One clock and one signaling voltage
• Some approaches:
  – Temporarily scaling V and F for communication
  – Separate different voltages with bridges
[Figure: Hot Bus, bridge, Cool Bus, bridge, Cold Bus.]

Multiple On-Chip Voltage Islands
[Chip floorplan. Labels: Power Management; ROM (Vdd1); DSP (Vdd2); RLM 1, RLM 2, RLM 3; Microcontroller (Vdd2); Analog (Vdd5); Monitor Logic (Vdd4); Memory Arrays Vdd3: Low Vt device arrays, optimized for low active power; Memory Arrays Vdd4: High Vt device arrays, optimized for low active power; independently controlled domain power switches; on-chip voltage regulators; I/O's, VReg, Gnd on all four edges.]

Functional Partitioning
• Identifying functional components with similar inactive periods
• Assigning functional components to possible chip-level power sources capable of providing the required voltage level
• Identifying the optimal grouping of components, based upon power sequencing (affects static power) and operating voltage (affects active power), that minimizes chip power within the limits (such as peak power) of the SoC
• Identifying or creating, and connecting, logic signals that will be used to control power-sequencing circuitry or control clock gates
• Connecting alternate voltage sources to latches or arrays used to save state across power sequencing

Controlling VDD and VTH for low power
• Multiple VTH: Dual-VTH (active); MTCMOS (stand-by)
• Variable VTH: VTH hopping (active); VTCMOS (stand-by)
• Multiple VDD: Dual-VDD (active); Boosted gate MOS (stand-by)
• Variable VDD: VDD hopping (active)
Annotations: Software-hardware cooperation; Technology-circuit cooperation
MTCMOS: Multi-Threshold CMOS
VTCMOS: Variable Threshold CMOS
Multiple: spatial assignment
Variable: temporal assignment

Dynamic power reduction through software-hardware cooperation
• OS and application programming
• Power: $P \propto f V^2$
• Required speed: $\propto f$
• If you don't need to hustle, relax and save power
[Figure: software passes the required speed to a hardware controller, which sets the processor clock and VDD. Plot: normalized power (0 to 1) vs. required speed (0 to 1), super-linear curve.]

Voltage Scaling Mechanism
• Four power domains
• On-chip supervisor for SOC voltage supplies
• Level shifting and latching circuits at domain interfaces
[Block diagram: DC/DC supplies (select, shutdown) feeding four domains:
  – Voltage-scalable 1.0V-1.8V logic supply domain: CPU core, caches, I/O interface logic, memory interface, accelerators
  – Battery-persistent 1.8V battery-backed domain: suspend control logic, RTC logic
  – Regulated 1.0V PLL supply domain (linear regulator)
  – Constant 3.3V I/O domain: drivers, receivers]

Dynamic Voltage/Frequency Scaling
• Frequency changed and Vdd dropped from 1.8V to 1.0V
• PLL locked at 533MHz, with CPU clock switched from 266MHz to 66MHz and back to 266MHz
• Continues to execute Dhrystone benchmark

Low Leakage Cells – Standby Power Reduction
• Dual-Vt storage cells
  – Low Vt for high performance
  – High Vt for low leakage
• Gated Vdd and DRG power switch
• Sub-threshold leakage current dominates

Energy (Power) Management Approach by ARM

Need for Energy Management
• Today's mobile consumers want:
  – longer battery life and
  – smaller, lighter products
• Manufacturers are adding new features and applications to add product appeal:
  – media players (audio, video)
  – gaming
  – video capture
• Increasing processing power requirements and longer battery life are conflicting requirements
• Battery technology alone offers only incremental improvement over the next several years

Higher performance, higher power
[Plot: power consumption (mW, log scale, 1 to 1000) vs. Dhrystone MIPS (0 to 500) for ARM7, ARM9 and ARM10/11 on 0.18um and 0.13um processes.]

Layers of power optimizations
[Layer stack: Software (OS, applications); System – Architecture; Micro-architecture; Circuits; Ambient environment; Si conditions; Power delivery.]
• Important to optimize design at each level
• ARM's partners have widely varying design-time, technology, legacy, and cost constraints.
• IEM: current focus on top two layers
  – Widely applicable dynamic power optimizations
  – Optimize for the requirements of the specific workload

Conventional Power Management
• Conventional power management schemes manage the transitions between defined power states
  – STANDBY is off, but with state retained and clocks stopped
  – IDLE is a lower-power mode with a slow clock running
  – ON state is fully powered up at maximum clock frequency
• Despite the changing software workload, the system runs at maximum performance while there is any work to be done
[State diagram: Power Manager with states ON, IDLE, STANDBY and RESTART.]

Optimizing for utilization characteristics
• Conventional power management optimizes power consumption when there is nothing to do (sleep modes).
• IEM optimizes power when work is being done.
  – Only run fast enough to meet deadlines!
  – Running fast and idling wastes power.
• The active- and sleep-mode techniques are orthogonal.
[Figure: two panels of energy used vs. utilization (0% to 100%), without and with Dynamic Voltage Scaling.]

Meeting the Performance Requirement
• Effective Energy Management requires:
  1. Automatic Performance Prediction technology
     • Determining the lowest performance level that will get the software workload done just in time
  2. Performance Scaling technology
     • Delivering just enough performance to meet the current requirement
     • Responding rapidly to changing performance levels
[Figure: Performance Prediction and Monitoring feeding Scaling Technology (Voltage Scaling, Threshold Scaling).]

Energy Management Control Components
• Software component
  – To automatically predict future software workloads by interacting with instrumented operating systems and application software
  – To determine the software deadlines
  – To balance workload and deadlines with performance
• Hardware component
  – To accurately measure the actual system performance
  – To independently manage the transitions of hardware scaling blocks, e.g., clock generators and power controllers
• Together these components determine and manage the lowest performance level that gets the work done

Adaptive Voltage Scaling (AVS)
• AVS is a closed-loop control mechanism.
  – Feedback from the PMU indicates the earliest opportunity to change processor frequency based on the voltage levels being output to the SoC.
  – APC monitors the difference between the requested performance level and the actual level achieved.
• Taking into account variations due to differences in process technology and ambient temperature, the system dynamically changes the voltage applied.
• The lowest energy consumption is achieved OR a specified performance level can be met.

Low Power Design Example with Samsung AP based on ARM 920T

Limited Battery Improvement
• Power Increase vs. Battery Improvement [ITRS 2001]
  Year: 2001, 2004, 2007, 2010, 2013, 2016
  Feature Size (nm): 130, 90, 65, 45, 32, 22
  Dynamic Power Reduction (X): 0, 1.5, 2.5, 4.0, 7.0, 20
  Stand-by Power Reduction (X): 2, 6, 15, 30, 150, 800
• Only 4~5X improvement in battery lifetime!
[Plot: volumetric energy density (Whr/L) vs. gravimetric energy density (Whr/Kg) for Ni-MH, Li-Ion/Polymer and Fuel Cell; directions labeled "Smaller" and "Lighter". Annotations: cellular phone talk time 2 to 4 hrs, standby about 1 week; cellular phone talk time about 12 hrs, standby about 1 month.]

Problem Statement
• Power Analysis on CMOS Inverter
  (a) Capacitive current: input switching to '1' or '0'; the load capacitance Cload is charged or discharged
  (b) Short-circuit current: $V_{thn} < \text{Input} < V_{DD} - |V_{thp}|$
  (c) Static leakage current: input at '1' or '0', steady state

Problem Statement
• Dynamic Power
  $P_{switching} = C_{switching} \, V_{DD}^2 \, f$
• Average Short-Circuit Current
  $I_{SC} = \frac{\beta \, \tau_{in}}{12 \, V_{DD}} (V_{DD} - 2 V_{th})^3 \, f$
  Gain factor: $\beta_n = \beta_p = \beta$; threshold voltage: $V_{thn} = |V_{thp}| = V_{th}$
• Sub-threshold Leakage Current
  $I_{DS} = K \, e^{(V_{GS} - V_{th}) q / nkT} \left(1 - e^{-V_{DS} q / kT}\right)$
  K: function of technology; $V_{GS}$: gate-to-source voltage; $V_{DS}$: drain-to-source voltage; $V_{th}$: threshold voltage; q: electronic charge; k: Boltzmann constant; T: temperature; n: nonlinearity constant, 1 to 2 ($kT/q \approx 0.0259$ V)

Problem Statement
• Domination of Leakage Current
  Feature Size: > 0.25um → 0.18/0.13/0.09um …
  Performance (AP): < 200MHz → 300/400/533MHz, 1GHz
  Core Voltage: 5.0/3.3/2.5V → 1.8/1.2/1.0V …
  VTH (Threshold): > +/- 0.6V → +/- 0.5, 0.4, 0.3V …
  TR Leakage: negligible → exponentially growing (SD/Gate)
  Stand-by Mode: PLL-off (Clock-off) → V/MTCMOS, High VTH/High VDD
  Low Power: focus on operating power → focus on operating/stand-by power

Active and Leakage Power with CMOS Scaling
• As CMOS scales down, the following stand-by leakage currents rise rapidly:
  – Source-to-drain leakage (diffusion + tunneling) as Lg scales down
  – Gate leakage current (tunneling) as Tox scales down
  – Body-to-drain leakage current (tunneling) as channel doping scales up

Two cases of Leakage Mechanism
[Figure: two transistor cross-sections. Turn off (Vg = 0V, Vd = Vdd): sub-threshold leakage, source-to-drain tunneling, drain-to-body tunneling (BTB). Turn on (Vg = Vdd, Vd = 0V): gate oxide tunneling.]

Gate Leakage Current Reduction with High-K Gate Dielectric
[Plot: current density (A/cm^2, log scale, 10^-6 to 10^1) vs. Tox (Å, 20 to 40); curves: drain leakage, gate leakage, high-K gate dielectric.]
  $C_{ox} = \frac{k \, \varepsilon_0 \, A}{T_{physical}}$
• As Tox scales, gate leakage current increases exponentially, because the tunneling probability increases exponentially as the physical tunneling distance shrinks.
• A physically thicker gate dielectric lowers leakage current, but it also lowers oxide capacitance, which reduces on-current.
• Using a high-k (dielectric constant) material, both a thicker physical thickness and a higher oxide capacitance can be achieved.
• Applying a high-k gate dielectric, several orders of magnitude lower gate leakage current can be achieved with similar oxide capacitance.

Power Saving vs. Abstraction Layers
• System/Algorithm/Architecture have a large potential!
[Figure: power saving vs. abstraction layer over design time.]
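The high-k trade-off on the gate dielectric slide follows directly from $C_{ox} = k \varepsilon_0 A / T_{physical}$. The sketch below is only an illustration of that relation: the function names are hypothetical, the SiO2 dielectric constant of 3.9 is the commonly quoted value, and the high-k value of 20 and the 5 nm film thickness are assumed example numbers, not figures from the slides.

```python
# Parallel-plate view of the gate stack: C_ox per unit area = k * eps0 / T_physical.
EPSILON_0 = 8.854e-12  # vacuum permittivity, F/m
K_SIO2 = 3.9           # dielectric constant of SiO2 (commonly quoted value)


def oxide_capacitance_per_area(k, t_physical_m):
    """Gate capacitance per unit area (F/m^2) for a film of thickness t_physical_m."""
    return k * EPSILON_0 / t_physical_m


def equivalent_oxide_thickness(k, t_physical_m):
    """SiO2 thickness that would give the same capacitance as the high-k film."""
    return t_physical_m * K_SIO2 / k


# Assumed example: a 5 nm film with k = 20 (hypothetical high-k material).
t_highk = 5e-9
eot = equivalent_oxide_thickness(20.0, t_highk)
c_highk = oxide_capacitance_per_area(20.0, t_highk)
c_sio2 = oxide_capacitance_per_area(K_SIO2, eot)

print(f"equivalent SiO2 thickness: {eot * 1e9:.3f} nm")
print(f"capacitance per area: {c_highk:.4e} F/m^2 (high-k), {c_sio2:.4e} F/m^2 (SiO2)")
```

Under these assumptions the 5 nm high-k film matches the capacitance of an SiO2 layer just under 1 nm thick, which is why tunneling leakage can drop sharply without giving up on-current.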
System Level Consideration for Low Power Design
• Mobile device's behavior over time (operation time is less than 10%)
• "Need Various Power Modes In System"
[Timeline: periodic wakeup; wakeup and operation; idle/stand-by; x-axis: time.]

Power Management: Example
• General Clock Gating: control the clock source of each IP block individually by switching the corresponding clock source enable bit on or off
• IDLE: turn off the clock source to the CPU
• STOP: turn off all of the clock sources, including the external X-tal and internal PLLs
• SLEEP: turn off all of the clock sources and also the power supply for the internal logic, except for the wake-up logic circuitry

Dynamic Voltage Scaling (DVS)
• Reduction of Stand-by Power in Leaky Process
  – By Monitoring Data Bus Congestion
  – By Monitoring/Guessing Performance Needed, for Specific Application
• Need to predict task execution time!
• Power gain $\propto \Delta V^2$
[Figure: supply voltage V vs. time for a task without DVS and with DVS; with DVS the task runs longer at a voltage lower by ΔV.]

Dynamic Voltage Scaling (DVS)
• Stretch the execution by lowering the supply voltage
  – Quadratic power saving
  – No later than the deadline
• Processors supporting DVS
  – Intel XScale
  – Transmeta Crusoe
• DVS Algorithms
  – Can be implemented as HW or SW
  – Optimal solution in continuous voltage domain, but not in discrete voltage domain

Voltage Scaling for Low Power
• Low power needs low VDD: $P \propto V_{DD}^2$
• Low VDD gives low speed: $I_{ds} \propto (V_{DD} - V_{th})^{1 \sim 2}$
• Speed-up needs low Vth: $I_{ds} \propto (V_{DD} - V_{th})^{1 \sim 2}$
• Low Vth gives high leakage: $I_{leakage} \propto e^{-C \times V_{th}}$
• High leakage calls for leakage suppression

Low-Leakage Solution – Technology
[Plot: dynamic power (W, log scale, 1m to 100m) vs. leakage power (W, log scale, 1p to 100n); trajectories for VDD control and VTH control at high speed and low speed, and an MTCMOS point; labeled values: VDD 1.5V, VDD 1.0V, VTH 0.5V, VTH 0.25V.]

VTCMOS & MTCMOS
Multi-Threshold CMOS (MTCMOS)
• Principle:
  – On-off control of internal VDD or VSS
  – Special F/Fs, two Vth's
• Merit:
  – Low leakage in stand-by mode
  – Conventional design environment
Variable-Threshold CMOS (VTCMOS)
• Principle:
  – Threshold control with bulk bias
  – Triple well is desirable
• Merit:
  – Low leakage in stand-by mode
  – Conventional design environment
Demerits (both columns, in slide order):
• Large serial MOSFET
• Ground bounce noise
• Ultra-low voltage region? (1V)
• Scalability? (junction leakage)
• TR reliability under 0.1 μm
• Latch-up immunity, Vth controllability, substrate noise, gate oxide reliability
• Gate leakage current
[Schematic diagrams. MTCMOS: low-Vth logic between VDD and GND with a high-Vth sleep transistor. VTCMOS: low-Vt logic with N-well and P-well bias from a Vt control circuit; Vpb = VDD or V+, Vnb = 0 or V-.]

MTCMOS: Reduce Stand-by Power with High Speed
• With a high-VTH switch, much lower leakage current flows between Vdd and Vss
• The high-VTH MOSFET should have much lower (> 10X) leakage current than a normal-VTH MOSFET
[Figure: the same logic with a high-VTH switch (MTCMOS) and without; normal- or low-VTH MOSFETs between Vdd and a virtual ground, and a high-VTH switch between virtual ground and Vss.]

Multi-Threshold CMOS (MTCMOS)
• Mobile Applications
  – Mostly in the idle state
  – Sub-threshold leakage current
• Power Gating
  – Low-VTH transistors for high-performance logic gates
  – High-VTH transistors for low-leakage-current gates
[Figure: logic component and cutoff switch between VDD and VSS via VGND, built from low-Vth and high-Vth MOS devices; Sleep Control (SC) waveform over time for operating modes Active, Sleep, Active.]

CCS Sizing
• The effect of CCS size
  – As the size decreases, logic performance also decreases.
  – As the size increases, leakage current and chip area also increase.
  – Proper sizing is very important.
  – CCS size should be decided within 2% performance degradation.
• $V_{op} = V_{DD} - \Delta V$; ΔV must be sized within 2% performance degradation.
[Figure: low-Vt logic between VDD and a high-Vt switch to GND, with switch control.]
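The MTCMOS slides ask for a high-VTH switch that leaks more than 10X less than a normal-VTH device. Using the sub-threshold leakage expression from the Problem Statement slide, $I_{DS} = K e^{(V_{GS}-V_{th})q/nkT}(1 - e^{-V_{DS}q/kT})$, the following sketch estimates how much extra threshold voltage that takes. The function names, n = 1.5, K = 1 and the bias values are assumptions chosen for illustration; only the formula and $kT/q \approx 0.0259$ V come from the slides.

```python
import math

THERMAL_VOLTAGE = 0.0259  # kT/q in volts, as given on the slide


def subthreshold_current(v_gs, v_ds, v_th, n=1.5, k_tech=1.0):
    """Sub-threshold leakage from the slide's formula; k_tech is the technology factor K."""
    gate_term = math.exp((v_gs - v_th) / (n * THERMAL_VOLTAGE))
    drain_term = 1.0 - math.exp(-v_ds / THERMAL_VOLTAGE)
    return k_tech * gate_term * drain_term


def vth_increase_for_ratio(ratio, n=1.5):
    """Threshold increase (V) that cuts sub-threshold leakage by `ratio`."""
    return n * THERMAL_VOLTAGE * math.log(ratio)


# Assumed example: device is off (V_GS = 0), V_DS = 1.0 V, normal V_th = 0.3 V.
delta_vth = vth_increase_for_ratio(10.0)
i_normal = subthreshold_current(0.0, 1.0, 0.3)
i_high = subthreshold_current(0.0, 1.0, 0.3 + delta_vth)

print(f"extra V_th for 10X lower leakage: {delta_vth * 1000:.1f} mV")
print(f"leakage ratio normal/high: {i_normal / i_high:.2f}")
```

With these assumed parameters, roughly 90 mV of additional threshold voltage is enough for a 10X reduction, which illustrates why a modest VTH difference between logic and cutoff switch already pays off in stand-by.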
Energy Management System – Open loop
• IEM and IEC components work together to predict the lowest acceptable processor performance level
• Power Controller, PMU and Clock Generator work together to deliver that lowest performance level
[Block diagram: System-on-Chip (SoC) with ARM core running apps and OS with IEM (Intelligent Energy Manager); IEC (Intelligent Energy Controller) passes the performance level to the PC (Power Controller) and the DCG (Dynamic Clock Generator); the PC communicates with the PMU (Power Management Unit), which supplies Vdd; the DCG supplies the CPU clock.]

Energy Management System – Closed loop
• APC operates in closed-loop control mode, using HPM to adapt to actual process and temperature
• PowerWise™ Interface provides fast control of EMU and feedback of status for optimum control
[Block diagram: System-on-Chip (SoC) with ARM core, Hardware Performance Monitor (HPM), apps and OS with IEM (Intelligent Energy Manager); IEC (Intelligent Energy Controller) passes the performance level to the APC (Adaptive Power Controller) and the Dynamic Clock Generator; the APC connects over the PowerWise Interface to the "PowerWise" EMU (Energy Management Unit), which supplies Vdd.]

MPEG video playback comparison
• Classical interval-based algorithms (e.g. LongRun) are too conservative: they choose higher performance than necessary.
[Two stacked-bar charts, "Legendary MPEG" and "Danse De Cable MPEG": fraction of time at each performance level (300, 400, 500, 600 MHz; 0% to 100%) for LongRun and Vertigo.]

Interactive app: Konqueror
• Exactly repeating the run of interactive apps is difficult.
• Our methodology: LongRun in control, estimate what IEM would have done on that same run.
[Stacked-bar chart, "Konqueror": fraction of time at each performance level (0% to 100%) for LongRun and Vertigo.]
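To make the term "interval-based algorithm" from the comparison slides concrete, here is a minimal sketch of a generic interval-based governor. It is not LongRun's or Vertigo's actual algorithm, which the slides do not describe; the function name, the saturation threshold and the policy itself are assumptions. Only the four performance levels (300 to 600 MHz) are taken from the charts.

```python
# Performance levels shown in the MPEG comparison charts (MHz).
LEVELS_MHZ = [300, 400, 500, 600]


def next_level(current_mhz, busy_fraction, levels=LEVELS_MHZ, saturated=0.95):
    """Pick the level for the next interval from the last interval's utilization.

    busy_fraction is the share of the last interval the CPU was not idle.
    If the CPU was (nearly) always busy, the true demand is unknown, so
    this sketch jumps to the highest level; that is an assumed policy.
    """
    if busy_fraction >= saturated:
        return max(levels)
    required_mhz = current_mhz * busy_fraction  # work actually done per unit time
    for level in sorted(levels):
        if level >= required_mhz:
            return level  # lowest level that still covers the demand
    return max(levels)


print(next_level(600, 0.45))  # demand of 270 MHz fits the 300 MHz level
print(next_level(600, 0.70))  # demand of 420 MHz needs the 500 MHz level
print(next_level(300, 0.99))  # saturated interval: go to the top level
```

A governor of this kind only sees past utilization, so it tends to react late and overshoot, which is one plausible reading of why the slides call interval-based schemes conservative compared with deadline-aware prediction.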
Energy Management in Action
• 4 performance (frequency and voltage) levels available in benchmarked system
[Trace over about 2 seconds of MPEG video: performance (50%, 66%, 83%, 100%) vs. time, showing the performance level requested by the algorithm and the closest available performance level of the system.]

DVS Control Sub-system
[Block diagram of the IEC and its surroundings. Labels: Configuration Interface (APB); DPM, Dynamic Performance Monitor; DEM, DVS Emulation; DPC, Dynamic Performance Controller; DCG, Dynamic Clock Generator (SoC specific); DVC, Dynamic Voltage Controller (SoC specific) with Voltage vs. Frequency lookup table; CPU; CLKGEN; PMU; signals: MAXPERF, cpuclk, CLOCK, PWRREQ, DATA, interrupts, performance index (target, current).]

DVS operation (with MAXPERF Signalling)
[Waveform: IECMAXPERF, performance level (0% to 100% in 25% steps), VDD and IECCRNTDVCIDX[7:4]. Annotations: new performance target (50%) requested by IEM S/W; index changes as voltage ramps down and the respective stable point is reached; maximum performance requested; index changes as VDD ramps up due to IECMAXPERF; back to software-programmed performance as IECMAXPERF is cleared; index changes as VDD ramps down as IECMAXPERF is cleared.]

Prototype IEM test chip
• ARM926EJ-S core
• Multiple power domains
• Voltage and frequency scaling of CPU, caches and TCMs
• First full DVS silicon with National Semiconductor PowerWise™ technology
• NSC Adaptive Power Controller (APC) implemented in FPGA
• Includes DVS emulation mode for comparative tests
• TSMC 0.13μm - CL013G - April Cyber Shuttle
  – Packaged parts: 11 August 2003
• Developed by ARM, Synopsys and National Semiconductor using Synopsys EDA tools

Conclusions
• Along with Process Technology Scaling, Signal Integrity, SoC Integration and System Verification, Low-Power Design is a critical issue.
• Low-Power Design needs to be approached at every level, from the system level (including software and algorithms) down to the device/process level.

Thank you for your kind attention!
IBM Low Power Design using PowerPC

Platforms for Information Appliances
• IBM PowerPC platforms enable highly integrated, power-efficient Information Appliance (IA) chips
[Diagram: CoreConnect™ Architecture; SOC uP Cores 405/440; Low Power PowerPC Optimizations; Platform IP Cores; SOC ASIC Tools; Custom IA Chips; Application-Specific IA Chips.]

Scalable PowerPC 405 CPU Core
CPU Goals
• Expanded operating voltage range (0.9V to 1.95V)
• Maintain full software and tools compatibility with existing PowerPC 405
• Provide a high-performance core capable of high-efficiency low-power operation
CPU Optimizations
• Redesigned custom circuits within CPU that were sensitive to low-voltage operation
• Re-optimize design and timing for extended voltage range
• Verification of equivalence
[Block diagram of the PowerPC 405 core: 64-bit Processor Local Bus; I-cache and I-cache control; D-cache and D-cache control; MMU; instruction unit; branch unit; execution unit; GPRs; load/store pipe; MAC; timers; power management; debug/trace; interrupts.]

Embedded PowerPC Cores
• PowerPC 405
  – 32-bit data, 32-bit address, MMU
  – Single-issue, 5-stage pipeline: 1.52 DMIPS/MHz
  – 266 to 400 MHz
  – L1 cache to 16KB/16KB
  – Voltage-scalable versions (405LP-1, 405LP-2)
• PowerPC 440
  – 32-bit data, 36-bit address, MMU
  – Dual-issue, 7-stage pipeline: 2.0 DMIPS/MHz
  – 400 to 800 MHz
  – L1 cache 32KB/32KB; L2 256 KB; L3

Low Power Optimizations
IBM low-power SOC designs include a wide range of optimizations to reduce both active and standby power
Active Power Reductions:
• Voltage scaling
• Frequency scaling
• Flexible clock distribution
• Clock gating
• Hardware accelerators
Standby Power Reductions:
• Clock freezing
• Hibernation
• "Cryo" standby

Voltage Scaling
Reducing operating voltage greatly reduces active power in CMOS
Voltage Scaling Benefits
• Complementary CMOS scales well over a wide voltage range
• Can be used widely over entire chip
• Can optimize power/performance (MIPS/W) over a 4X range
Voltage Scaling Challenges
• Custom circuits, PLLs, analog, and I/O drivers don't voltage scale easily
• Avoiding increases in standby power in low-active-power circuits (the VTH dilemma)
• Operating at 1/2 normal Vdd increases delay 2.4 to 3.2X but reduces power by > 10X
[Plot: CMOS ring oscillator delay and power vs. VDD.]

IBM Low-Power SOC Designs
"Palmtops to Teraflops" in a single ISA
Optimized for high-performance handheld applications, e.g., high-end PDA
• PowerPC 405LP-1
  – Joint project of IBM Research and IBM Microelectronics
  – First silicon Oct. 2001
  – 0.18 μm process
  – Frequency-scalable, < 66 to 266 MHz
  – Voltage-scalable, 1.0 to 1.8 V (0.9 to 1.65 V)
  – Technology evaluation platform
  All power and performance data are from 405LP-1 systems
• PowerPC 405LP-2
  – 0.13 μm process
  – Scalable to 333 MHz @ 1.5 V (est.)
  – Optimized for multimedia processing
  – Well into design

405LP-1 System on a Chip
[Block diagram, with new and pre-existing cores distinguished. Supplies: 3.3V I/O supply; 1.0V to 1.8V logic; 1.8V battery-backed; 1.0V internal regulator. Processor Local Bus (PLB): PPC405 CPU core, 16K I-cache, 16K D-cache, 32-bit SDRAM controller, PCMCIA/CFII RAM/ROM/peripheral controller, LCD controller, DMA controller, code decompression, 64-bit crypto accelerator, speech accelerator, PLB-OPB bridge. On-chip Peripheral Bus (OPB): UART (two), IIC, GPIO, interrupt controller, CODEC interface, passive sensor interface. Also: RTC, standby power management, clock power management, scalable low-power PLL.]

Reducing Standby Power
• Cryo mode uses
  – Customers/designs comfortable with clock-stop standby
  – Low-latency periodic sleep/wake with minimal standby power
  – IP cores with hidden state can cause problems for SW-based save/restore
• Other methods under review
  – Voltage islands and power gating
  – State-saving latches

Standby Power Modes
Standby power modes enable longer battery life and "instant on"
• System Clock Freeze Mode: clock 0 Hz; logic VDD 1V; state saved: all; restore time: observe wake-up condition (< 1 ms); logic power: CMOS leakage at 1V
• Hibernation Mode: logic VDD 0; state saved: software state; restore time: OS restore (100s of ms); logic power: ~0
• Cryo Mode: logic VDD 0; state saved: registers and software state; restore time: "Instant On", scan restore of state (20 to 200 ms); logic power: ~0

Cryo mode sequence
– Shutdown:
  • Save CPU core state
  • Flush caches and TLBs
  • Clocks stopped
  • State scanned to internal/external non-volatile storage
  • Power removed from logic
– Suspend:
  • Monitor system for wake-up condition or RTC timer
– Restore (on wake indicator):
  • Restore power to logic
  • State scanned in from non-volatile storage
  • Restore clocks
  • Restore CPU state

Dynamic Power Management
• System-wide power management (PM) during application execution
• Examples:
  – Peripheral PM, including core clock gating
  – PM at idle (including low-latency sleep modes)
  – Memory PM
  – Dynamic voltage and frequency scaling
  – Energy policy management
• DPM is proposed as an architecture for policy-guided dynamic power management.
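The ordering of the Cryo mode sequence (shutdown, suspend, restore) matters: state has to reach non-volatile storage before power is removed, and power has to return before state is scanned back in. The toy model below only mirrors the step order listed on the slide; the class, its attributes and the dictionary standing in for CPU state are hypothetical and do not represent the 405LP hardware or firmware.

```python
from enum import Enum


class Phase(Enum):
    RUNNING = "running"
    SUSPENDED = "suspended"


class CryoModel:
    """Toy model of the Cryo shutdown / suspend / restore step order."""

    def __init__(self, cpu_state):
        self.phase = Phase.RUNNING
        self.cpu_state = dict(cpu_state)  # volatile state held in logic
        self.nvram = None                 # stands in for non-volatile storage
        self.powered = True
        self.clocks_on = True
        self.log = []

    def shutdown(self):
        saved = dict(self.cpu_state)
        self.log.append("save CPU core state")
        self.log.append("flush caches and TLBs")
        self.clocks_on = False
        self.log.append("stop clocks")
        self.nvram = saved
        self.log.append("scan state out to non-volatile storage")
        self.powered = False
        self.cpu_state = {}               # volatile state is lost without power
        self.log.append("remove power from logic")
        self.phase = Phase.SUSPENDED

    def wake(self):
        if self.phase is not Phase.SUSPENDED:
            raise RuntimeError("wake is only valid while suspended")
        self.powered = True
        self.log.append("restore power to logic")
        scanned = dict(self.nvram)
        self.log.append("scan state in from non-volatile storage")
        self.clocks_on = True
        self.log.append("restore clocks")
        self.cpu_state = scanned
        self.log.append("restore CPU state")
        self.phase = Phase.RUNNING


model = CryoModel({"pc": 0x1000, "r1": 7})
model.shutdown()   # suspend phase: only the wake-up monitor would stay alive
model.wake()
print(model.cpu_state)
print(model.log)
```

Keeping the sequence as an explicit log makes it easy to check that no step is reordered, for example that power is never removed before the scan-out has completed.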
DPM Motivation
• Embedded application requirements
  – Long battery life
  – System-specific policy requirements
    • Highly variable system designs
    • Watch, cell phone, personal server, PDA, tablet
    • Soft real-time (multimedia) requirements
    • Task-specific policy requirements
  – General-purpose systems and applications
    • No/minimal application software changes for PM
  – Minimal/variable firmware
    • PM must be in the OS/applications

DPM Motivation
• Technology
  – SOC
    • CPU + peripheral PM
  – Complex clocking architectures
    • Decoupled CPU/bus frequencies
  – Heterogeneous processor architectures
    • Example: 405LP-2, asynchronous heterogeneous processing in a common voltage/memory domain
  – New performance and leakage control mechanisms at the circuit level

DPM Motivation
• Linux
  – Platform independence desired
  – Community acceptance required
    • Simplicity: ease of maintenance
    • Integration with pre-existing facilities: Linux Device Model
    • Minimal core kernel changes: 5 lines of new code in the "core" kernel
  – Scalability to server/SMP systems

DPM: An Architecture for Policy-Guided PM
Is:
• A generic software architecture for policy-guided dynamic power management proposed by IBM and MontaVista Software
• Flexible enough to implement a number of system-specific DVFS and static PM approaches
• Available in an embedded Linux distribution for several embedded processors
Is Not:
• PowerPC or Linux specific
• A DVFS algorithm
• Fully implemented yet

DPM Overview
[Block diagram. Software: power-aware applications, operating system, device drivers and policy/power managers. They signal operating/task state changes and requirements to the DPM system, and provide and manage policies and power-management information. The DPM system sets operating points, changing power-performance levels. Hardware: clock generation, power supplies, memory controller, CPU.]

Dynamic Voltage and Frequency Scaling
• Dynamic frequency scaling: 266 MHz CPU to 66 MHz CPU
• Dynamic voltage scaling: 1.8V to 1.0V at up to 1V/100us
• Uninterrupted operation: Linux 2.3.17 running Dhrystone 2.1 code, 400 loops per cycle
• Power consumption for the CPU and logic was reduced by 13X dynamically under the control of the Linux kernel (NO PLL relock and NO stopping of the application)
[Trace: total chip power (0 to 600 mW; logic power and I/O power) and logic VDD (0 to 2.0V) over time; CPU/memory frequency (MHz): 266/133, then 66/66, then 266/133.]

Idle Scaling Trace (MPEG4)
[Trace: core voltage and battery power over time.]
• MPEG4 A/V: default 2.76 W; idle scaling 2.63 W; system savings 4.7 %; core savings 11.4 %
• MP3: default 1.42 W; idle scaling 1.1 W; system savings 22.5 %; core savings 47.8 %

Load Scaling Trace (MPEG4/spmt)
[Trace: core voltage and battery power over time, with markers A, B, D, E, F.]
• MPEG4 A/V: default 2.76 W; load scaling 2.54 W; system savings 8.0 %
• MP3: default 1.42 W; load scaling 1.03 W; system savings 27.7 %

Application Scaling Trace
[Trace: video thread task state (Task-1, Task, Task+1) with markers D, E, F; regions labeled "More Performance Required" and "Working Ahead".]

AS Results
• AS achieved close to an "ideal" LS result with a simple policy manager and a straightforward modification of the application
• MPEG4 A/V: no DPM 2.76 W; DPM application scaling 2.46 W; DPM savings 10.8 %; "ideal" savings 10.8 %

Operating Point Usage for MPEG4 by Strategy
[Bar chart: share of time (%, 0 to 70) at each operating point (Idle/33, 100, 133, 166, 200, 266) for Idle Scaling, Load Scaling, App. Scaling and "Ideal".]

References
• Nowka et al., "A 32-bit PowerPC System-on-a-chip With Support for Dynamic Voltage Scaling and Dynamic Frequency Scaling", IEEE Journal of Solid-State Circuits, vol. 37(11), Nov. 2002, pp. 1441-1447.
• IBM Austin Research Laboratory (www.research.ibm.com/arl)
  – Dynamic Power Management for Embedded Systems (Whitepaper) http://www.research.ibm.com/arl/projects/papers/DPM_V1.1.pdf
• Linux 2.4 kernel including DPM implementation (Bitkeeper) bk://source.mvista.com/linuxppc_2_4_devel-pm
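As a back-of-the-envelope check on the 13X figure reported on the Dynamic Voltage and Frequency Scaling slide, the operating points quoted there (266 MHz at 1.8V, 66 MHz at 1.0V) can be plugged into the $P \propto f V^2$ relation used earlier in the deck. This sketch assumes constant switched capacitance and ignores leakage and I/O power; the function name is hypothetical.

```python
def relative_dynamic_power(freq_mhz, vdd):
    """Dynamic power up to a constant factor, using P proportional to f * V^2."""
    return freq_mhz * vdd ** 2


# Operating points quoted on the slide.
high = relative_dynamic_power(266, 1.8)  # full speed
low = relative_dynamic_power(66, 1.0)    # scaled down

frequency_factor = 266 / 66
voltage_factor = (1.8 / 1.0) ** 2
reduction = high / low

print(f"frequency contributes {frequency_factor:.2f}X")
print(f"voltage contributes   {voltage_factor:.2f}X")
print(f"combined reduction    {reduction:.2f}X")
```

Under these assumptions the two factors multiply to roughly 13X, in line with the measured reduction for CPU and logic; frequency scaling alone would account for only about 4X of it.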