Wireless Sensor Networks
Low Power Design

Outline
- Introduction: Importance of Low Power Design
- Power and Energy
- Low Power at various levels of circuit design:
  - System and Architecture Level
  - Register Transfer and Logic Level
  - Physical Level
- Conclusion

Importance of Low Power Design
- Power is considered the most important constraint in embedded systems
- Low power design is essential in:
  - high-performance systems (reason: excessive power dissipation reduces reliability and increases the cost of cooling systems and packaging)
  - portable systems (reason: battery technology cannot keep pace with the demand for devices with light batteries and long times between recharges)

Sources of Power Consumption
The three major sources of power consumption in digital CMOS circuits are:

\[ P_{avg} = p_t \, C_L \, V_{dd}^2 \, f_{clk} + I_{sc} \, V_{dd} + I_{leakage} \, V_{dd} = P_1 + P_2 + P_3 \]

where:
- P1: capacitive switching power (p_t is the switching activity, C_L the load capacitance)
- P2: short-circuit power
- P3: leakage current power

Trends in Power Management
- Reducing power is now a mainstream design issue

Power and Energy
- Power and energy are related: E = ∫ P dt
- Minimizing the power consumption is important for:
  - the design of the power supply
  - the design of voltage regulators
  - the dimensioning of interconnect
  - short term cooling
- Minimizing the energy consumption is important due to:
  - restricted availability of energy (mobile systems)
  - limited battery capacities (only slowly improving)
  - very high costs of energy (solar panels, in space)
  - cooling: high costs, limited space
  - long lifetimes, low temperatures

Low Power at various levels of circuit design
Higher levels offer higher impact and more options:
- System Level: design partitioning, power down
- Algorithm Level: complexity, concurrency, locality, regularity, data representation
- Architecture Level: voltage scaling, parallelism, instruction set, signal correlations
- Circuit Level: transistor sizing, logic optimization, activity-driven power down, low-swing logic, adiabatic switching
- Process/Device Level: threshold reduction, multi-threshold
The design of low power circuits can be tackled at different levels, from system to technology.

Potential for Power Savings
[Figure: power and synthesis flow. Accuracy of power estimation, in the order shown on the slide: 400%, 50%, 20%, 10% for the behavioral, RTL, gate and switch levels.]
Expected savings by level:
- Algorithmic: algorithm selection (orders of magnitude)
- Behavioral: concurrency, memory (several times)
- Power management: clock control (10-90%)
- RT level: structural transformations (10-15%)
- Technology independent: extraction/decomposition (15%)
- Technology dependent: technology mapping (20%), gate sizing (20%)
- Layout: placement (20%)

System and Architecture Level
Given a certain application, there are several possibilities for low power optimization of the system:
- Selection of an optimum algorithm with respect to the cost function
- Partitioning of the design into building blocks
- Voltage/frequency scaling
- Dynamic power management
- Minimizing waste and overhead (indirectly): increase regularity and locality

Global Flow
[Figure: global flow from input algorithm to output architecture, drawing on a hardware library. Steps shown: instruction set selection (Mult-Add vs. Mult, Add); module selection (ripple adder vs. carry select); memory management (memory selection, memory assignment); allocation (how many? e.g. 2 MULTs (M1, M2), 2 ADDERs (A1, A2)); assignment (which HW?); scheduling (when?).]

Algorithm selection and optimization
- The first choice in the design flow is usually the selection of an optimum algorithm with respect to the cost function
- The cost depends on the application and typically includes the number of operations, the number of memory accesses and the memory size required by the algorithm
- Power reduction is achieved with:
  - scheduling of operations
  - adaptive implementations of certain algorithms

Optimizations for Memory Accesses
A paradigm for energy-efficient software:
- Avoid memory operands as far as possible
- Improve register utilization
Example of a heapsort program [Jan M. Rabaey '97]:
- Hand-tuning for performance: 15% reduction in time, 13.5% reduction in energy
- Register allocation of temporaries: 5% reduction in current, 7% reduction in time, 11.4% reduction in energy
- Further optimization: further 22.4% reduction
- Total: 40.6% reduction in energy cost

Design partitioning
Optimum partitioning of the design can result in orders-of-magnitude power reduction.
Examples of partitioning for low power:
- Partitioning the design so as to confine the operations with maximum switching activity to a single block
- Partitioning the memory and distributing it to different blocks instead of using a centralized memory
- Hardware/software partitioning
- Optimum partition of a design into analog and digital sections

Design partitioning: Interconnections
- Interconnect power is important
- Interconnect may contribute a large percentage of the total power dissipation, and of the achievable reduction
- Interconnect power is greatly affected by architecture-level design decisions

Design partitioning: spatially global vs. spatially local
- Spatially global: all communications use long global buses
- Spatially local: few global bus accesses, cheap localized communication
  - reduced number of global bus accesses
  - reduced buffer power
  - reduced number of multiplexers

Design partitioning: spectral partitioning
[Figure: spectral partitioning of an 8th-order IIR cascade filter, with nodes placed on a 1-D axis running from -0.2 to 0.2.]
- Spectral partitioning places computational nodes on a 1-D axis based on "closeness"; this identifies candidates for clustering
- Partitioning may lead to extra hardware units. This does not necessarily mean an increase in area!
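The contrast between spatially global and spatially local communication can be illustrated with a rough energy estimate. The sketch below is only a first-order model and is not taken from the slides: it assumes every bus access charges the bus capacitance once (costing C * Vdd^2), and the supply voltage, the capacitances and the access counts are all hypothetical values chosen for illustration.

```python
# First-order interconnect energy model (an assumption, not from the slides):
# each bus access charges the bus capacitance once, costing C * Vdd^2.

VDD = 1.5            # hypothetical supply voltage in volts
C_GLOBAL = 2.0e-12   # hypothetical capacitance of a long global bus in farads
C_LOCAL = 0.2e-12    # hypothetical capacitance of a short local bus in farads


def bus_energy(accesses, capacitance, vdd=VDD):
    """Energy in joules for a number of accesses to one bus."""
    return accesses * capacitance * vdd ** 2


def communication_energy(global_accesses, local_accesses):
    """Total interconnect energy in joules for one partitioning."""
    return (bus_energy(global_accesses, C_GLOBAL)
            + bus_energy(local_accesses, C_LOCAL))


# Spatially global: all 100 transfers use the long global bus.
e_global = communication_energy(global_accesses=100, local_accesses=0)
# Spatially local: 10 transfers stay global, 90 become cheap local transfers.
e_local = communication_energy(global_accesses=10, local_accesses=90)

print(f"spatially global: {e_global:.3e} J")
print(f"spatially local:  {e_local:.3e} J")
print(f"reduction:        {100 * (1 - e_local / e_global):.1f} %")
```

The model ignores buffers and multiplexers, which the slides list as further sources of savings, so it likely understates the benefit of a local partitioning.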
Design partitioning: Result
[Figure: non-local and local arrangements of datapath (DP) and control (ctl) blocks; figure labels: Global bus: 2400, Global bus: 4700.]

                 Non-local         Local
Units            4 add, 3 shift    4 add, 4 shift
Global buses     106 accesses      6 accesses
Bus power        2 mW              0.3 mW
Total power      21.3 mW           16.3 mW
Area             8.78              7.46

(The slide gives the area unit as mW; it is presumably mm².)
Average: power reduction 18.5%, area reduction 1%

Exploiting Regularity
- Coarse-grained regularity: usually evident to the user (loops, subroutines)
- Fine-grained regularity: not obvious to the user (similar code fragments)
- Regular implementations typically reduce interconnect and/or controller requirements [Mehru96]

Common Design Approaches
[Figure: processor usage model, desired throughput over time relative to the maximum processor speed (TMAX).]
The usage model distinguishes three modes:
1) Compute-intensive and short-latency processes
2) Background and long-latency processes
3) System idle
To reduce power, the following design approaches can be used:
- Compute ASAP
- Clock Frequency Reduction
- Voltage Scaling

Compute ASAP
- In this approach the processor always performs the desired computation at maximum throughput
- This is the simplest approach
[Figure: delivered throughput and energy/operation over time, compared with the desired throughput.]

Clock Frequency Reduction
- A common low power design technique is to reduce the clock frequency, fclk
- This in turn reduces the throughput, and the power dissipation, by a proportional amount
- The energy consumed per operation remains unchanged
- This approach therefore does not improve energy efficiency: the processor delivers the same amount of computation per battery life, but at a lower level of peak throughput
[Figure: delivered throughput and energy/operation over time, compared with the desired throughput.]

Voltage Scaling
- When fclk is reduced, the processor's circuits have a longer cycle time to complete their computation
- Scaling the voltage down, i.e. reducing Vdd, increases the delay of the circuits
- But the energy/operation, which is a quadratic function of Vdd, decreases
[Figure: delivered throughput and energy/operation over time, compared with the desired throughput.]

Voltage Scaling: minimizing the delay penalty
- Architecture level: speed up first (pipelining, concurrency), then downscale the supply voltage, or match the supply voltage to the throughput requirement
  - multiple supply voltages in the same design
  - one supply voltage for each block
- Circuit level: lowering the threshold voltage (heavily process-dependent)

Dynamic Power Management
Dynamic power management is a design methodology that dynamically reconfigures an electronic system to provide the requested services and performance levels with a minimum number of active components or a minimum load on such components.
[Figure: power manager. An observer receives workload information and passes observations to a controller, which issues commands to the system.]
[Figure: power state machine with three states: RUN (P = 400 mW), IDLE (P = 50 mW, wait for interrupt) and SLEEP (P = 0.16 mW, wait for wake-up event). Transition times: RUN to IDLE and back ~10 µs each; RUN to SLEEP and IDLE to SLEEP ~90 µs each; SLEEP to RUN 160 ms.]

Register Transfer and Logic Level
Low-power techniques at the RTL and logic level can be subdivided into:
- techniques for lowering the capacitance and the switched voltage
  - minimizing global communication
  - logic optimization by synthesis tools (area, speed)
- techniques to reduce the toggle rate of nodes with a high relative capacitance
  - guarding techniques
  - pipelining
  - reorganization of logic gates and operators

Reducing switching activity: guarding technique (clock gating)
- Clock gating shuts down the clocking of a certain group of registers under a certain guard condition
- advantages: implemented with minor overhead in area and design effort
- disadvantages: testability

Examples of guarding technique
[Figure: guarding examples. An adder with inputs A and B held by latches L_A and L_B; an N-bit binary comparator computing Y = A > B from An, Bn and the lower bits A1..An-1, B1..Bn-1; further labels: R1, R2, Datain, Ctrl, Sel.]

Reducing switching activity: pipelining
- reduces the critical path (enables savings due to voltage scaling, or slower but energy-efficient algorithms)
- reduces glitches
- disadvantages: area overhead (with an implicit increase of capacitances and an increase in clock power)

Reducing switching activity: reorganization of logic gates and operators
- manual (reorganization of logic cells and reordering of inputs)
- automatic (performed by synthesis tools):
  - combinatorial: don't care optimization, path balancing, factorization
  - sequential: state encoding, retiming

Reducing switching activity: examples of reorganization
- Flattening [Figure: adder structure over the operands A, B, C, D.]
- Factoring: the idea is to remove common expressions to reduce capacitance. Example signal probabilities: Pa = 0.1, Pb = 0.5, Pc = 0.5. Caveat: this may increase activity!
- Don't care optimization [Figure: example with signals a, b, c.] Activity is maximized for P(1) = 0.5!

Sequential logic optimization
- State encoding seems to be of minimal impact in general
- Data encoding in data paths, e.g. use of sign-magnitude, one-hot, or redundant representations (mostly ad hoc)
- Retiming for low power: registers can be strategically placed to reduce glitching, or to perform path balancing

Physical Level
- At this level of abstraction the number of manually guided optimizations is quite limited
- The place and route tools automatically minimize the wire length (and wire capacitances) according to the timing constraints
- This does not represent the optimum with respect to power consumption
- Some design tasks can nevertheless be exploited to save power:
  - partitioning (taking into account the interconnections between the layout blocks)
  - back-annotation of layout capacitances, together with switching activity information from gate-level simulation, to the synthesis tool (enables reoptimization of the logic for low power)

Conclusion
- Power is a distributed problem that spans all design disciplines: standards (GSM, OS), software, digital and analog hardware, process
- Power-related design decisions must be weighed against all of the system constraints (size, cost, performance, testability, time to market, ...) to develop a successful system
- Low power design techniques have to be implemented at different levels of system design in order to achieve the best results
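
As a closing illustration of the three system-level approaches compared earlier (compute ASAP, clock frequency reduction, voltage scaling), the following sketch estimates average power and energy per operation for a workload that needs only part of the maximum throughput. It is a simplified model under stated assumptions, not the method behind the slides: only the capacitive switching term P1 is modeled, one operation completes per clock cycle, the maximum clock frequency is assumed to scale linearly with Vdd (the threshold voltage is ignored), an idle processor is assumed to draw no power, and every constant is a hypothetical value.

```python
# Sketch under stated assumptions (none of the numbers come from the slides):
# - only capacitive switching power is modeled: P = p_t * C_L * Vdd^2 * f_clk
# - one operation completes per clock cycle
# - maximum clock frequency scales linearly with Vdd (threshold voltage ignored)
# - an idle processor consumes no power

P_T = 0.5        # hypothetical switching activity
C_L = 1.0e-9     # hypothetical switched capacitance in farads
VDD_MAX = 3.3    # hypothetical nominal supply voltage in volts
F_MAX = 100e6    # hypothetical maximum clock frequency in hertz


def switching_power(vdd, f_clk):
    """Capacitive switching power in watts."""
    return P_T * C_L * vdd ** 2 * f_clk


def energy_per_operation(vdd):
    """Joules per operation; the clock frequency cancels out (E = P / f)."""
    return P_T * C_L * vdd ** 2


def compare_approaches(load):
    """Map each approach to (average power in W, energy/operation in J).

    `load` is the desired throughput as a fraction of the maximum, in (0, 1].
    """
    if not 0 < load <= 1:
        raise ValueError("load must be in (0, 1]")
    f_needed = load * F_MAX        # clock just fast enough for the workload
    vdd_scaled = load * VDD_MAX    # supply lowered in step with the clock
    return {
        # Run at full speed for a fraction `load` of the time, then idle.
        "compute_asap": (load * switching_power(VDD_MAX, F_MAX),
                         energy_per_operation(VDD_MAX)),
        # Same supply voltage, slower clock.
        "clock_frequency_reduction": (switching_power(VDD_MAX, f_needed),
                                      energy_per_operation(VDD_MAX)),
        # Slower clock and lower supply voltage.
        "voltage_scaling": (switching_power(vdd_scaled, f_needed),
                            energy_per_operation(vdd_scaled)),
    }


if __name__ == "__main__":
    for name, (power, energy) in compare_approaches(0.5).items():
        print(f"{name:27s} P = {power * 1e3:8.2f} mW   E/op = {energy * 1e9:.3f} nJ")
```

Within this model, the first two approaches spend the same energy per operation at any load, while halving Vdd cuts the energy per operation to a quarter, which is the quadratic dependence on Vdd noted in the voltage scaling slides.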