Survey
* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
Low Power Design USC Low Power CAD Low Power Design Methodologies and Techniques: An Overview Massoud Pedram Department of EE-Systems University of Southern California Los Angeles CA 90064 [email protected] Massoud Pedram USC Low Power CAD Outline Introduction Analysis/Estimation Techniques Synthesis/Optimization Techniques Summary Massoud Pedram Page 1 USC/LPCAD Low Power Design USC Low Power CAD What Is the Power Problem? Power: Biggest challenge facing the industry Ö Power goes up 2X per generation for high end CPU CAD/analysis tools are excellent at enabling tradeoffs Ö But what if performance cannot be traded-off? Ö Need innovative CAD solutions to reduce Power 40 A21164-300 HP PA8000 35 30 A21064A PPro200 UltraSparc-167 PPro-150 Power(W) 25 HP PA7200 MIPS R10000 20 SuperSparc2-90 15 10 PPC 604-120 PP-66 PPC 601-80 MIPS R5000 PP166 MIPS R4400 PP-100 PP-133 486-66 5 0 Massoud Pedram i486C-33 DX4 100 PPC603e-100 i386 i386C-33 '91 '93 '94 '94 '95 '96 [Source: Microprocessor Report] USC Low Power CAD Process Technology Trend Min. Feature Size (μm) Threshold Voltage (V) Tox(nm) Cinterconnect (aF/μm) Norminal Ion(μA/μm)(NMOS/PMOS) 1997 0.25 0.5 5 1999 0.18 0.45 4 2001 0.15 0.4 3 2003 0.13 0.35 3 2006 0.1 0.3 2 2009 0.07 0.25 1.5 2012 0.05 0.2 1 50 36 29 21 20 16 14 600/280 600/280 600/280 600/280 600/280 600/280 600/280 1 2.5 1 1.8 3 1.8 3 1.5 3 1.2 10 0.9 On-chip across-chip clock freq. (MHz) 350 526 727 928 1108 1468 Max Ioff(μA/μm) (For minimum L device) Supply Voltage (V) Off-chip peripheral bus freq. (MHz) Usable transistors (millions / cm2) Chip size (cm2) Chip pad count 75/175 100.263 100/362 125/464 125/554 150/734 8 3 14 3.4 256-800 300-976 16 3.85 24 4.3 40 5.2 64 6.2 352-1193 413-1458 524-1968 666-2656 10 0.6 1827 150/913 100 7.5 846-3578 Cost-Perf. Designs - notebooks, desktop personal computers, telecom Massoud Pedram Data extracted from NTRS’97 Page 2 USC/LPCAD Low Power Design USC Low Power CAD Design Technology Trend Power (watt) Vcc (volt) Current (Amp) 1000 10 1000 100 1 100 10 .1 10 130 100 70 Process Minimum Geometry (nm) Based on Information provided by Shakhar Borkar, Intel Massoud Pedram Data extracted from EDAR’99 USC Low Power CAD Example Applications Portable Electronics (PC, PDA, Wireless) IC Cost (Packaging and Cooling) Reliability (Electromigration, Latch-up) Signal Integrity (Switching Noise, DC Voltage Drop) Thermal Design Where Does the Power Go? Varies from one design to next Memory Memory Data Path Data Path Control, IO Control, IO Clock Clock Massoud Pedram Page 3 USC/LPCAD Low Power Design USC Low Power CAD What Has Worked? Voltage and process scaling (3x/Generation) Design methodologies Ö Power management through HW/SW, trade area for lower power Architecture Design Power down techniques Ö Clock gating, dynamic power management Dynamic voltage scaling based on workload Power conscious RT/ logic synthesis Better cell library design and resizing methods Ö Cap. reduction, threshold control, transistor layout Massoud Pedram USC Low Power CAD Opportunities for Power Savings HW/SW Co-design Custom ISA Algorithm Design Communication Synthesis System > 70 % Behavioral 40-70 % Scheduling, Binding Pipelining Behavioral Transformations RT-Level 25-40% Clock Gating, Precomputation Operand Isolation State Assignment, Retiming Logic 15-25 % Logic Restructuring Technology Mapping, Rewiring Pin Ordering & Phase Assignment Physical 10-15 % Fanout Optimization, Buffering Transistor Sizing, Placement Partitioning, Clock Tree Design Glitch Elimination Massoud Pedram Savings Page 4 USC/LPCAD Low Power Design USC Low Power CAD Realistic Estimation Expectations System Seconds > 50 % Architectural Minute 25-50 % RT-Level Minutes 15-30% Gate Hour 10-20 % Transistor Hours 5-10 % Speed Massoud Pedram Instruction-Level Models IP Core Models Stochastic Models Program Simulation Entropic Bounds Architectural Simulation I/O and Memory Accesses RT-Level Macromodels HDL Simulation Quick Synthesis Probabilistic Simulation Gate-Level Simulation Sampling and Compaction ASIC Library Models Parasitic Extraction Accurate Timing Analysis Circuit-Level Simulation Error USC Low Power CAD Sources of Power Consumption in CMOS Power Consumption Active Power Standby Power Short-Circuit Capacitive Interconnect Gate Massoud Pedram Page 5 USC/LPCAD Subthreshold P-N Junction Internal Diffusion Low Power Design USC Low Power CAD Power Dissipation Equations CMOS circuits only dissipate power when node voltages are changing VDD C short circuit charge P = 0.5CVDD fN + QSCVDD fN + I leakVDD 2 leakage current f: frequency of clocking N: number of times gate switches in a clock cycle Massoud Pedram USC Low Power CAD Power Estimation Techniques Static (non-simulative) - useful for synthesis and architectural exploration Ö Probability-based Ö Entropy-based Dynamic (simulative) - useful for final power evaluation and validation Ö Direct (flat and hierarchical) Ö Sampling-based Ö Compaction-based Hybrid (high-level simulation + low-level analytical model evaluation) Ö Power macromodels for datapath, control, memory Ö Instruction-level models for microprocessors, DSPs Massoud Pedram Page 6 USC/LPCAD Low Power Design USC Low Power CAD Probabilistic Power Estimation Capture primary input and register output statistics Ö E.g., signal probability, switching activity, spatiotemporal correlation Obtain load capacitances for all nodes Ö Estimate at the RT and logic level Ö Extract after layout Propagate input statistics throughout the circuit Ö Account for reconvergent fanout, Ö Use appropriate delay model Ö Iterate in the case of finite state machines Calculate node power and hence total power Massoud Pedram USC Low Power CAD Zero Gate Delays 0 1 g1 0 0 g2 0 f g1 1 0 g2 0 0 f g1 1 g2 0 0 Massoud Pedram Page 7 USC/LPCAD f 0 Low Power Design USC Low Power CAD General Gate Delays A gate may switch many times for a vector pair 1 1 1 1 1 1 1 2 Massoud Pedram USC Low Power CAD Static Probabilities Define pg as static probability of g = 1 A B C I K J For AND gate: pI = pA • pB For OR gate: pJ = 1 - (1 - pB) • (1 - pC) Above equations assume that A, B and C are uncorrelated Massoud Pedram Page 8 USC/LPCAD Low Power Design USC Low Power CAD Computing Static Probabilities f=ab + bc Write f as a disjoint sum-of-products expression where each product-term has a null intersection with any other f=ab + abc Then, pf = pa•pb + (1 - pa)•pb•pc Massoud Pedram USC Low Power CAD Spatial Correlations I A B C K J pK ≠ pI • pJ since I and J are correlated - both depend on B Need to compute static probabilities taking into account the spatial correlations Massoud Pedram Page 9 USC/LPCAD Low Power Design USC Low Power CAD Temporal Correlations 010101001010101010 I K 010001101101011000 pK = pI pJ J For temporally uncorrelated data, swK = 2 • pI pJ • (1- pI pJ) For temporally correlated data, this is not true E.g., every 1 on input I is immediately followed by a 0 Need to compute switching probabilities taking into account the temporal correlations Massoud Pedram USC Low Power CAD Sequential Circuits Primary Inputs PI Present State Lines PS Combinational Logic FF PO Primary Outputs NS Next State Lines Two issues for sequential circuits: Probability of machine being in a particular state Correlation in time: Given a primary input and present state, next state is uniquely determined by the next state combinational logic Solutions: Solve C-K Equations for calculating steady state probabilities Unroll the state machine Massoud Pedram Page 10 USC/LPCAD Low Power Design USC Low Power CAD Reducing the Simulation Time RT-Level or Gate-Level Netlist Probabilistic Compaction Compacted Sequence Input Sequence Event or Cycle-Based Simulation Statistical Sampling Report Power Sampled Sequence Library Data Massoud Pedram USC Low Power CAD Simple Random Sampling (SRS) Input vectors Efficiency: depends on population characteristics and sampling procedure Generate one sample of k units NO Converged? Do circuit level simulation YES “Difficult” distributions Calculate mean power & confidence interval Report Power Massoud Pedram Page 11 USC/LPCAD Low Power Design USC Low Power CAD SRS Results C7552 C6288 C5315 C3540 C2670 C1908 C1355 Biased Sequence Random Sequence C880 C432 0 2000 4000 6000 8000 Massoud Pedram USC Low Power CAD Stratified Random Sampling (StRS) Estimate the transistor-level power, assuming that the zero-delay power estimate is readily available Input vectors (population) Zero-Delay Simulation of the Entire population Population Stratification Random Sampling and PowerMill Simulation Lower sample variance leads to faster convergence Convergent? Report Power Massoud Pedram Page 12 USC/LPCAD YES NO Low Power Design USC Low Power CAD Sampling on FSMs Method 1: Functional simulation + sampling on comb. circuit Input Seq. Input Seq. + State Lines Logic Logic FFs Method 2: Do machine warm-up for every sampling unit Warm-up vectors + sampling unit Logic Warm-up sequence length can be quite large FFs Massoud Pedram USC Low Power CAD Probabilistic Compaction (PC) Sequence compaction : Generate a new, but shorter, input sequence which exhibits nearly the same spatiotemporal correlations as the initial sequence Probabilistic Model Initial Seq. Sequence Generation in Target Circuit New Seq. Massoud Pedram Page 13 USC/LPCAD out Low Power Design USC Low Power CAD PC by Example Initial sequence 01101010011111010 11100000110111000 10000110101110101 Avg. bit activity 23/16 2/3 1 1/4 1/3 3 1/2 1/2 6 1/4 1 Compacted sequence 001010 011000 010001 Avg. bit activity 8/5 1/2 5 1/2 7 4 1/2 1/2 1/2 1/2 1/2 0 Stochastic State Machine Model Massoud Pedram USC Low Power CAD Comparison with SRS L = 4,000 C om paction R atio 10 C6288 circuit C3540 C1908 SRS H ie rarc hic al C1355 C880 C432 0% 2% 4% Massoud Pedram Page 14 USC/LPCAD 6% 8% Low Power Design USC Low Power CAD Compaction for FSMs Input Sequence Target Circuit zn = out (xn, sn) xn Logic p(xnsn) Markov Chain sn+1 = next (xn, sn) sn FFs Massoud Pedram USC Low Power CAD Dynamic Markov Trees S1 = (0000, 0001, 1001, 1100, 1001, 1100, 1001, 1100) DMT0 2 3 2 2 2 2 2 2 3 1 5 1 51 6 1 1 7 81 3 3 2 1 3 3 43 6 3 2 3 3 36 3 37 6 7 8 8 2 2 Page 15 3 Upper subtree 3 4 5 2 5 3 3 3 4 Massoud Pedram USC/LPCAD 2 4 3 DMT1 1 6 17 8 6 3 3 3 1 4 1 1 4 1 1 2 Lower subtrees Low Power Design USC Low Power CAD Higher Order DMTs A lag-k Markov chain which correctly models the input sequence, also models the joint k-step conditional probabilities of the primary inputs and state lines DMT0 vi p(vi) DMT1 vj p(vj|vi) DMT2 vk p(vk|vjvi) Massoud Pedram USC Low Power CAD High Order DMT Results s9234 Compaction Ratio 10 s820 O rder 1 O rder 2 s5378 s1423 s1196 shiftreg planet mc dk17 bbara 0% 10% 20% 30% Massoud Pedram Page 16 USC/LPCAD 40% 50% 60% Low Power Design USC Low Power CAD Power Characterization Transistor-Level Netlist Gate-Level Netlist SPICE Simulation Event-Driven Simulation SPICE Parameters RT-Level Power/Delay Characterization Data Gate-Level Power/Delay Characterization Data Massoud Pedram USC Low Power CAD Power Macromodeling Training Set Macromodel Equation Form Module Model Variables Analytic Model Reduction Macro Models Massoud Pedram Page 17 USC/LPCAD Low - Level Simulation Regression Analysis Regression Coefficients Low Power Design USC Low Power CAD Dual Bit Type Model Consider a data path block: Pwr = C0 + C1 · S1 + C2 · S2 + C3 · S3 + C4 · S4 Sign bit Module MSB region LSB region S1 S2 : avg. switching activity of LSB (MSB) region of operand 1 S3 S4 : avg. switching activity of LSB (MSB) region of operand 2 Massoud Pedram USC Low Power CAD Bitwise Data Model Consider a random logic block: Pwr = C 0 + ∑ C ⋅S i i inputs Si : avg. switching activity of input signal i More parameters lead to a higher degree of accuracy, but increase the computational overhead Massoud Pedram Page 18 USC/LPCAD Module Low Power Design USC Low Power CAD Cycle-Accurate Macro-Models Power Distribution Average Power Switching Noise CycleAccurate MacroModels Peak Power DC Voltage Drop Signal Integrity Massoud Pedram USC Low Power CAD Macro-Model Construction Starting point: an exact power consumption function Pwr = f (V1 , V2 ) = Order 1 + terms Order 2 + …… + Order n terms terms Exact function After order reduction Massoud Pedram Page 19 USC/LPCAD module module module Equation simplification After input grouping Low Power Design USC Low Power CAD Regression Analysis module Variable reduction: find the “most significant” variables, by using a statistical test After variable reduction Population stratification: makes the macro-model accuracy less sensitive to different input conditions Training set design: reduces the macro-model bias k A general linear regression model: Pwr = C 0 + ∑ C ⋅ X i =1 i i Massoud Pedram USC Low Power CAD Integrated Macro-Modeling Flow Powermill Config. Powermill Netlist Spatial Correlation Analysis for Primary Inputs Program Flow Technology File Training Set Generation and Simulation Variable Reduction and Curve Fitting Library Generation Output Macro-model Library Massoud Pedram Page 20 USC/LPCAD Input Circuit Simulator (Powermill) Low Power Design USC Low Power CAD Macro-Modeling Results 4 EAP (%) 3 2 1 C8 80 M ul 16 AD D1 6 C4 32 C5 31 5 C6 28 8 C7 55 2 C1 35 5 C1 90 8 C2 67 0 C3 54 0 0 Average EAP = 2.03% 20 15 10 5 0 6 AD D1 6 M ul 1 C8 80 52 88 C7 5 15 C6 2 C5 3 C4 32 40 70 C3 5 08 C2 6 C1 9 C1 3 55 ECP (%) Average ECP = 10.04% Massoud Pedram USC Low Power CAD Adaptive Macro-Modeling Static macro-model Adaptive macro-model Training set Training set Macro-model generation Macro-model generation Macro-model Low-level simulation Macro-model Cycle-based simulation Random sampling Cycle-based simulation Power estimates Power estimates Massoud Pedram Page 21 USC/LPCAD Low Power Design USC Low Power CAD Standard Cell Characterization SPICE Netlist Configuration File Command Scripts Characterization File Generation and Simulation Function Validation Technology Data Input Circuit Simulator (HSPICE) Generalized Data Base (Timing, Power, Capacitance, Wire load, etc.) Program Flow Library Exportation Synopsys Lib. VHDL Simulation Lib. Verilog Simulation Lib. Customized Lib. Massoud Pedram USC Low Power CAD Design Tool Characteristics Accurate model Ö Include both output load (C) and input slope (S) Ö Nonlinear equation f=k0+k1*C+k2*C2+k3*S+k4*C*S Ö Variable domain stratification, different sets of coefficients for different strata Ö Default sensitization method is pin-to-pin, can be extended to pattern-dependent by customization Fast model Ö Small equation form Ö Can be customized to full table-lookup library Easy library maintenance Ö Manage the libraries through Graphic User Interface Ö Generalized data base enables data reuse Massoud Pedram Page 22 USC/LPCAD Output Low Power Design USC Low Power CAD A Power Estimation Methodology CAD Database Estimate Power Cell Library Glue Logic Memory Input Vector Sequences Behavioral/RTL Simulation Interconnect Estimates Design Planning Multiplexers Interconnect Clock Tree HDL Description Library Models Hard IP Blocks Datapath Macromodels Analytic Models Controller Soft IP Blocks Prob. Analysis Low Level Simulation Sampling/Compaction Massoud Pedram USC Low Power CAD Low Power RTL Synthesis Flow RTL Code RTL Synthesis Event or Cycle-Based Simulation RTL netlist Power Macromodeling Power Optimization Switching Activity Calculator Power Optimized Netlist Massoud Pedram Page 23 USC/LPCAD Data Trace for Ports and Registers Low Power Design USC Low Power CAD RTL Optimization for Low Power Y Resource Assignment Ö Module and/or register allocation and binding Y Clock Gating Y State Encoding ÖTwo-level and multi-level logic realizations Y Multiple Supply Voltage Scheduling ÖPipelined and non-pipelined designs Massoud Pedram USC Low Power CAD Register Assignment for Low Power Avoid value change at the input of functional units during idle cycles a x R0←R1 R2 / R3 b + R0←R3 R1 / Z= x R2 c R3 + Massoud Pedram 1 a a b + + = c ( + b) + c x2 x x x R1 Z Page 24 USC/LPCAD Low Power Design USC Low Power CAD An RTL Structure a b c Cycle R1 R2 1 2 3 4 5 R1 R3 x + R2 a b b c R3 / x x a x x a x+b a b x x +x 2 + < a, x > < R1 or R3 , x > < b, a > x a + b, x > x a b < R1 or R3 , x > < 2 + , c > x x < / OUT Can insert an isolation latch here Massoud Pedram Z USC Low Power CAD A Low Power RTL Structure a x R0 R2 b c / R3 R1 Cycle R0 1 2 3 4 5 a a a x+b a x+b R1 R2 b b c c x x a x x a x+b a b x x +x + OUT Massoud Pedram Z Page 25 USC/LPCAD R3 2 / + < a, x > < a, x > < b, a > x a + b, x > x a a b < + b, x > < 2 + , c > x x x < Low Power Design USC Low Power CAD Module Assignment for Low Power Do module binding to minimize the input toggling of modules a × O1 = a ⋅ x + b ⋅ y O2 = c ⋅ x + d ⋅ y O3 = e ⋅ x + f ⋅ y x b a × y × c d f y × y b × x × c x × + e x + y × x × e × d y × + O3 O1 O2 O1 O2 USC Gated Clocks Disable clocking of input registers if the current instruction does not access it Instruction Register Clock Registers B A Decode Logic ALU Massoud Pedram Page 26 USC/LPCAD y × + Low Power CAD Clock f + + Massoud Pedram x O3 Low Power Design USC Low Power CAD Low Power Logic Synthesis Flow Gate Netlist Cell Library Logic Synthesis Mapped Netlist Power Information from Cell Library Power Optimization Event or Cycle-Based Simulation Switching Activity Information Power Optimal Netlist Massoud Pedram USC Low Power CAD Logic Optimization for Low Power Y Combinational logic optimization Ö Technology Mapping Ö Gate Resizing Ö Pin Assignment Y Sequential logic optimization Ö Precomputation Ö State Re-encoding Ö Power management for control units Massoud Pedram Page 27 USC/LPCAD Low Power Design USC Low Power CAD Technology Mapping Outputs of nodes of combinational circuit mapped to library gates under the cost function of load capacitance multiplied by the switching activity To minimize power dissipation, high switching activity points are either hidden within gates or driven by smaller gates Minimum power realization under zero delay model can be obtained using dynamic programming Massoud Pedram USC Low Power CAD Circuit and Gate Library 0.109 0.179 0.109 Gate INV NAND2 AOI22 Area 928 1392 2320 Intrinsic Capacitances 0.1029 0.1421 0.3410 Massoud Pedram Page 28 USC/LPCAD Input Load Capacitances 0.0514 0.0747 0.1033 Low Power Design USC Low Power CAD Minimum Area Mapping AOI22 0.179 Area = 2320 + 928 = 3248 Power = 0.179 (0.3410 + 0.0514) + 0.179 (0.1029) = 0.0887 Note: Ignore capacitances at circuit inputs Massoud Pedram USC Low Power CAD Minimum Power Mapping 0.109 WIRE 0.179 0.109 Area = 1392 * 3 = 4176 Power = 0.109 (0.1421 + 0.0747) * 2 + 0.179 (0.1421) = 0.0726 Massoud Pedram Page 29 USC/LPCAD Low Power Design USC Low Power CAD Sequential Precomputation Selectively precompute the outputs of the logic circuit one clock cycle before they are required and use the precomputed values to reduce switching activity in the next clock cycle Massoud Pedram USC Low Power CAD Original Circuit X1 X2 A R1 R2 Xn Predictor functions: g1 = 1 f=1 g2 = 1 f=0 Massoud Pedram Page 30 USC/LPCAD f Low Power Design USC Low Power CAD Subset Input Disabling Architecture X1 X2 R1 A Xn R3 f R2 LE g1 g2 g1 = 1 f=1 g2 = 1 f=0 Massoud Pedram USC Low Power CAD A Comparator Example C<n-1> D<n-1> R1 C>D C<n-2> D<n-1> C<0> D<0> R3 R2 LE g1 = C<n-1> D<n-1> g2 = C<n-1> D<n-1> Massoud Pedram Page 31 USC/LPCAD f Low Power Design USC Low Power CAD Layout Optimization for Low Power and Noise Floorplanning and/or Placement Gate Resizing Gated Clock Tree Design Driver Design Chip-Package Interface Design Power Bus Planning and Sizing Massoud Pedram USC Low Power CAD Gated Clock Tree Design - sinks - Steiner points Controller clock tree edges source gate control signals Ö Clock tree : binary tree Ö Controller tree : star tree Massoud Pedram Page 32 USC/LPCAD Low Power Design USC Low Power CAD Clock Tree Layout Steiner nodes Sinks source Merging sectors Ö Use the Deferred Merge Embedding algorithm of to layout the clock tree Ö The bottom-up merging process is guided by the minimum switched capacitance heuristic Ö Top-down phase determines the exact locations of the Steiner nodes in their respective merging sectors Massoud Pedram USC Low Power CAD Buffered clock tree versus Gated clock tree Sw itche d Ca pacitance Area 800 70 60 50 Buffere d 40 30 Ga ted 20 Ga te Re d. 10 0 600 400 200 0 r1 r2 r3 r4 r1 r5 Ö The average module activity was set to Massoud Pedram Page 33 USC/LPCAD r2 40% r3 r4 r5 Low Power Design USC Low Power CAD Conclusions Analysis tools : Ö Simulation based techniques provide the best accuracy Ö Transistor level power estimation is mature; a satisfactory gate and RT level simulation engine is needed Ö Probability-based techniques are good for relative accuracy analysis and to guide the synthesis engine Optimization Tools: Ö Gate level optimization / re-mapping provides 20-40% power reduction Ö Behavioral and RTL power synthesis provides 40-60% power reduction Ö System-level optimization provides 2-5 X power reduction Power analysis, tradeoffs and optimizations for design Massoud Pedram levels higher than RTL are not mature Page 34 USC/LPCAD