Lower Power Synthesis - VADA

Lower Power Logic/Circuit/Layout Design
1998. 6. 7
Prof. Jun-Dong Cho, SungKyunKwan University
http://vlsicad.skku.ac.kr
SungKyunKwan Univ. VADA Lab.

Transition Probability
• Transition probability: the probability of a transition at the output of a gate, given a change at the inputs.
• For temporally uncorrelated data, use signal probabilities.
• Example: F = X'Y + XY'
  – Signal probability of F: Pf = Px(1-Py) + (1-Px)Py
  – Transition probability of F = 2Pf(1-Pf)
  – Assumes the inputs are independent
• Use BDDs to compute these.
• Reference: Najm'91
• For temporally correlated data this does not hold, e.g., when every 1 on an input is immediately followed by a 0.
• The switching probabilities must then be computed taking the temporal correlations into account.

Technology Mapping
• Implementing a Boolean network in terms of gates from a given library
• Popular technique: tree-based mapping
• Library gates and circuits are decomposed into canonical patterns
• Pattern matching and dynamic programming find the best cover
• NP-complete for general DAG circuits
• Ref: Keutzer'87, Rudell'89
• Idea: high transition probability points are hidden within gates

Low Power Cell Mapping
• Example of a high switching activity node
• Internal mapping in a complex gate
[Figure: inputs A, B, C, D and output Y, with the high-activity node Q mapped inside a complex gate.]

Signal Probability vs. Power
power: P(x) ∝ p(x)(1 - p(x))
[Figure: power versus signal probability p(x) from 0.0 to 1.0, with the regions p(x) < 0.5 and p(x) > 0.5 on either side of 0.5.]

Spatial Correlation
[Figure: two circuits with P(b) = P(c) = P(d) = 0.5 and P(x) = P(y) = 0.25; the output probability is P(z) = 0.4375 in one circuit and P(z) = 0.375 in the other.]

Low Activity XOR Function

GLITCH (Spurious transitions)
• 15-20% of the total power is due to glitching.

Glitches

Logic Transformation
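The signal-probability formulas on the Transition Probability slide and the two P(z) values on the Spatial Correlation slide can be checked with a small sketch. It uses brute-force enumeration rather than BDDs, and the two circuit structures (z = ab + cd versus z = ab + bc) are an assumption chosen because they reproduce the slide's numbers; the function names are hypothetical.

```python
from itertools import product

def signal_probability(func, input_probs):
    """Exact P(func = 1) for independent inputs, by enumerating all input combinations."""
    names = list(input_probs)
    total = 0.0
    for bits in product((0, 1), repeat=len(names)):
        weight = 1.0
        for name, bit in zip(names, bits):
            p = input_probs[name]
            weight *= p if bit else 1.0 - p
        if func(**dict(zip(names, bits))):
            total += weight
    return total

def transition_probability(p):
    """2p(1-p): valid only for temporally uncorrelated data."""
    return 2.0 * p * (1.0 - p)

# F = X'Y + XY' (XOR), with illustrative input probabilities Px = Py = 0.5.
pf = signal_probability(lambda x, y: x ^ y, {"x": 0.5, "y": 0.5})
tf = transition_probability(pf)

# Assumed circuit 1: x = ab, y = cd (no shared input), z = x + y.
pz_independent = signal_probability(
    lambda a, b, c, d: (a & b) | (c & d),
    {"a": 0.5, "b": 0.5, "c": 0.5, "d": 0.5},
)

# Assumed circuit 2: x = ab, y = bc (input b is shared), z = x + y.
pz_shared = signal_probability(
    lambda a, b, c: (a & b) | (b & c),
    {"a": 0.5, "b": 0.5, "c": 0.5},
)

print(pf, tf, pz_independent, pz_shared)
```

Treating x and y as independent in the second circuit would give 0.4375 again; the exact value is lower because both terms depend on b.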
Logic Transformation
• Use a signal with low switching activity to reduce the activity on a highly active signal.
• This is done by adding a redundant connection from the gate with low activity (source gate) to the gate with high switching activity (target gate).
• Signals a, b, and g1 have very high switching activity, and most of the time its value is zero.
• Suppose c and g1 are selected as the source and target of a new connection. The new connection is undetectable, hence the function of the new circuit remains the same.
• Signal c has long runs of zeros, and zero is the controlling value of the AND gate g1, so most of the switching activity at the inputs of g1 will not be seen at the output; the switching activity of gate g1 is therefore reduced.
• A redundant connection added to a circuit may cause some irredundant connections to become redundant.
• By adding the new connection, the connections from c to g3 become redundant.

Logic Transformation

High-Performance Power Distribution
• (S: switching probability; C: capacitance)
• Start with all logic at the lowest power level; then iterate delay calculation, identification of the failing blocks, and powering up, until either all of the nets pass their delay criteria or the maximum power level is reached.
• Voltage drops in ground and supply wires use up a more serious fraction of the total noise margin.

Hazard Generation in Logic Circuits
• Static hazard: a transient pulse of width w (= the delay of the inverter).
• Dynamic hazard: the transient consists of three edges, two rising and one falling, with w of two units.
• Each input can have several arriving paths.

GATED-CLOCK D-FLIP-FLOP
• Flip-flops present a large internal capacitance on the internal clock node.
• If the DFF output does not switch, the DFF does not have to be clocked.
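The gated-clock idea on the last slide can be illustrated with a small behavioural sketch: the flip-flop's internal clock node is pulsed only in cycles where D differs from Q. This is a counting model under that assumption, not a circuit description; the function name and the data stream are hypothetical.

```python
def count_clock_events(d_stream, gated, q0=0):
    """Return (final_q, clock_events) for a D flip-flop fed with d_stream.

    With gated=False the internal clock node is pulsed every cycle.
    With gated=True it is pulsed only when D differs from the stored Q.
    """
    q = q0
    events = 0
    for d in d_stream:
        if not gated or d != q:
            events += 1  # internal clock node switches in this cycle
            q = d
    return q, events

data = [0, 0, 1, 1, 1, 0, 0, 0, 1, 1]
q_free, n_free = count_clock_events(data, gated=False)
q_gated, n_gated = count_clock_events(data, gated=True)
print(n_free, n_gated)
```

Both versions end in the same state, but the gated one clocks the flip-flop only on the three cycles in which the stored value actually changes.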
Frequency Reduction
◈ Power saving
  – Reduces capacitance on the clock network
  – Reduces internal power in the affected registers
  – Reduces the need for muxes (data recirculation)
◈ Opportunity
  – Large opportunity for power reduction, dependent on:
    – number of registers gated
    – percentage of time the clock is enabled
◈ Cost
  – Testability
  – Complicates clock tree synthesis
  – Complicates clock skew balancing

Frequency Reduction
◈ Clock Gating Example - When D is not equal to Q
[Figure: 32-bit register data_reg (data_in to data_out) with load_en generated by an FSM. Before clock gating, load_en is the register enable and the register is clocked by clk. After clock gating, load_en is latched to load_en_latched and combined with clk to form clk_en, which clocks the register.]

Frequency Reduction
◈ Clock Gating Example - Before Code

    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.std_logic_unsigned.all;

    entity nongate is
      port(clk, rst : in  std_logic;
           data_in  : in  std_logic_vector(31 downto 0);
           data_out : out std_logic_vector(31 downto 0));
    end nongate;

    architecture behave of nongate is
      signal load_en  : std_logic;
      signal data_reg : std_logic_vector(31 downto 0);
      signal count    : integer range 0 to 15;
    begin
      -- Modulo-10 counter with synchronous, active-low reset
      FSM : process
      begin
        wait until clk'event and clk='1';
        if rst='0' then count <= 0;
        elsif count=9 then count <= 0;
        else count <= count+1;
        end if;
      end process FSM;

      -- load_en is high for one cycle out of every ten
      enable_logic : process(count, load_en)
      begin
        if (count=9) then load_en <= '1';
        else load_en <= '0';
        end if;
      end process enable_logic;

      -- Register is clocked every cycle; load_en only selects new data
      datapath : process
      begin
        wait until clk'event and clk='1';
        if load_en='1' then data_reg <= data_in;
        end if;
      end process datapath;

      data_out <= data_reg;
    end behave;

    configuration cfg_nongate of nongate is
      for behave
      end for;
    end cfg_nongate;
Frequency Reduction
◈ Clock Gating Example - After Code

    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.std_logic_unsigned.all;

    entity gate is
      port(clk, rst : in  std_logic;
           data_in  : in  std_logic_vector(31 downto 0);
           data_out : out std_logic_vector(31 downto 0));
    end gate;

    architecture behave of gate is
      signal load_en, load_en_latched, clk_en : std_logic;
      signal data_reg : std_logic_vector(31 downto 0);
      signal count    : integer range 0 to 15;
    begin
      FSM : process
      begin
        wait until clk'event and clk='1';
        if rst='0' then count <= 0;
        elsif count=9 then count <= 0;
        else count <= count+1;
        end if;
      end process FSM;

      enable_logic : process(count, load_en)
      begin
        if (count=9) then load_en <= '1';
        else load_en <= '0';
        end if;
      end process enable_logic;

      -- Latch load_en while clk is low so that clk_en cannot glitch
      deglitch : process(clk, load_en)
      begin
        if (clk='0') then load_en_latched <= load_en;
        end if;
      end process deglitch;

      clk_en <= clk and load_en_latched;

      -- Register is clocked only when the gated clock rises
      datapath : process
      begin
        wait until clk_en'event and clk_en='1';
        data_reg <= data_in;
      end process datapath;

      data_out <= data_reg;
    end behave;

    configuration cfg_gate of gate is
      for behave
      end for;
    end cfg_gate;

Frequency Reduction
◈ Clock Gating Example - Report

Frequency Reduction
◈ 4-bit Synchronous & Ripple counter - code
  – 4-bit Synchronous Counter

    Library IEEE;
    use IEEE.std_logic_1164.all;
    use IEEE.std_logic_arith.all;

    entity BINARY is
      Port (clk   : In std_logic;
            reset : In std_logic;
            count : BUFFER UNSIGNED (3 downto 0));
    end BINARY;

    architecture BEHAVIORAL of BINARY is
    begin
      process(reset, clk, count)
      begin
        if (reset = '0') then
          count <= "0000";
        elsif (clk'event and clk = '1') then
          if (count = UNSIGNED'("1111")) then count <= "0000";
          else count <= count + UNSIGNED'("1");
          end if;
        end if;
      end process;
    end BEHAVIORAL;

    configuration CFG_BINARY_BLOCK_BEHAVIORAL of BINARY is
      for BEHAVIORAL
      end for;
    end CFG_BINARY_BLOCK_BEHAVIORAL;
Frequency Reduction
  – 4-bit Ripple Counter

    Library IEEE;
    use IEEE.std_logic_1164.all;
    use IEEE.std_logic_arith.all;

    entity RIPPLE is
      Port (clk   : In std_logic;
            reset : In std_logic;
            count : BUFFER UNSIGNED (3 downto 0));
    end RIPPLE;

    architecture BEHAVIORAL of RIPPLE is
      signal count0, count1, count2 : std_logic;
    begin
      -- Each lower bit is used as the clock of the next stage
      process(count)
      begin
        count0 <= count(0);
        count1 <= count(1);
        count2 <= count(2);
      end process;

      process(reset, clk)
      begin
        if (reset = '0') then count(0) <= '0';
        elsif (clk'event and clk = '1') then
          if (count(0) = '1') then count(0) <= '0';
          else count(0) <= '1';
          end if;
        end if;
      end process;

      process(reset, count0)
      begin
        if (reset = '0') then count(1) <= '0';
        elsif (count0'event and count0 = '1') then
          if (count(1) = '1') then count(1) <= '0';
          else count(1) <= '1';
          end if;
        end if;
      end process;

      process(reset, count1)
      begin
        if (reset = '0') then count(2) <= '0';
        elsif (count1'event and count1 = '1') then
          if (count(2) = '1') then count(2) <= '0';
          else count(2) <= '1';
          end if;
        end if;
      end process;

      process(reset, count2)
      begin
        if (reset = '0') then count(3) <= '0';
        elsif (count2'event and count2 = '1') then
          if (count(3) = '1') then count(3) <= '0';
          else count(3) <= '1';
          end if;
        end if;
      end process;
    end BEHAVIORAL;

    configuration CFG_RIPPLE_BLOCK_BEHAVIORAL of RIPPLE is
      for BEHAVIORAL
      end for;
    end CFG_RIPPLE_BLOCK_BEHAVIORAL;

Frequency Reduction
◈ 4-bit Synchronous & Ripple counter - Report

Bus-Invert Coding for Low Power I/O
An eight-bit bus on which all eight lines toggle at the same time and which has a high peak (worst-case) power dissipation.
• There are 16 transitions over 16 clock cycles (average 1 transition per clock cycle).

Peak Power Dissipation
An eight-bit bus on which the eight lines toggle at different moments and which has a low peak power dissipation.
There are the same 16 transitions over 16 clock cycles and thus the same average power dissipation.

Bus-Invert - Coding for low power
• The Bus-Invert method proposed here uses one extra control bit called invert.
• By convention, when invert = 0 the bus value equals the data value.
• When invert = 1 the bus value is the inverted data value.
• The peak power dissipation can then be decreased by half by coding the I/O as follows:
  1. Compute the Hamming distance (the number of bits in which they differ) between the present bus value (also counting the present invert line) and the next data value.
  2. If the Hamming distance is larger than n/2, set invert = 1 (and thus make the next bus value equal to the inverted next data value).
  3. Otherwise, let invert = 0 (and let the next bus value equal the next data value).
  4. At the receiver side the contents of the bus must be conditionally inverted according to the invert line, unless the data is stored encoded as it is (e.g. in a RAM). In any case the value of invert must be transmitted over the bus (the method increases the number of bus lines from n to n + 1).

Example
A typical eight-bit synchronous data bus. The transitions between two consecutive time slots are "clean". There are 64 transitions for a period of 16 time slots. This represents an average of 4 transitions per time slot, or 0.5 transitions per bus line per time slot.

Bus encoding
The same sequence of data coded using the Bus-Invert method. There are now only 53 transitions over a period of 16 time slots. This represents an average of 3.3 transitions per time slot, or 0.41 transitions per bus line per time slot. The maximum number of transitions for any time slot is now 4.

Comparisons
Comparison of unencoded I/O and coded I/O with one or more invert lines.
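The four Bus-Invert steps listed earlier can be sketched in a few lines. This is a minimal model under stated assumptions: the bus starts with all lines low, the data words are pseudo-random (not the sequence shown on the slides), and all function names are hypothetical.

```python
import random

def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

def bus_invert_encode(words, n=8):
    """Return a list of (bus_value, invert) pairs for a stream of n-bit words."""
    mask = (1 << n) - 1
    bus, invert = 0, 0  # assumed initial state: all n + 1 lines low
    encoded = []
    for word in words:
        # Step 1: distance to the next data value, counting the invert line.
        distance = hamming(bus, word) + invert
        if distance > n / 2:
            bus, invert = ~word & mask, 1   # step 2: send inverted data
        else:
            bus, invert = word, 0           # step 3: send data unchanged
        encoded.append((bus, invert))
    return encoded

def bus_invert_decode(encoded, n=8):
    """Step 4: conditionally invert at the receiver."""
    mask = (1 << n) - 1
    return [(~bus & mask) if invert else bus for bus, invert in encoded]

def transitions(line_states):
    """Transitions per time slot for a sequence of line states, starting from 0."""
    previous, counts = 0, []
    for state in line_states:
        counts.append(hamming(previous, state))
        previous = state
    return counts

n = 8
rng = random.Random(0)
data = [rng.randrange(1 << n) for _ in range(16)]
encoded = bus_invert_encode(data, n)

raw_counts = transitions(data)
# The invert line is counted as an extra (n+1)-th bus line.
coded_counts = transitions([bus | (invert << n) for bus, invert in encoded])

print("unencoded: total", sum(raw_counts), "max", max(raw_counts))
print("coded:     total", sum(coded_counts), "max", max(coded_counts))
```

With the invert line included in the distance, no time slot can have more than n/2 transitions on the n + 1 lines, which is where the halved peak comes from.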
The comparison looks at the average and maximum number of transitions per time slot, per bus line per time slot, and I/O power dissipation for different bus widths.

Remarks
• The increase in the delay of the data path: by looking at the power-delay product, which removes the effect of frequency (delay) on power dissipation, a clear improvement is obtained in the form of an absolute lower number of transitions. It is also relatively easy to pipeline the bus activity. The extra pipeline stage and the extra latency must then be considered.
• The increased number of I/O pins: as was mentioned before, ground bounce is a big problem for simultaneous switching in high-speed designs. That is why modern microprocessors use a large number of Vdd and GND pins. The Bus-Invert method has the side effect of decreasing the maximum ground bounce by approximately 50%. Thus circuits using the Bus-Invert method can use a lower number of Vdd and GND pins, and by using the method the total number of pins might even decrease.
• The Bus-Invert method decreases the total power dissipation although both the total number of transitions increases (counting the extra internal transitions) and the total capacitance increases (because of the extra circuitry). This is possible because the transitions get redistributed very nonuniformly, more on the low-capacitance side and fewer on the high-capacitance side.

References
[1] H. B. Bakoglu, Circuits, Interconnections and Packaging for VLSI, Addison-Wesley, 1990.
[2] T. K. Callaway, E. E. Swartzlander, "Estimating the Power Consumption of CMOS Adders", 11th Symp. on Comp. Arithmetic, pp. 210-216, Windsor, Ontario, 1993.
[3] A. P. Chandrakasan, S. Sheng, R. W. Brodersen, "Low-Power CMOS Digital Design", IEEE Journal of Solid-State Circuits, pp. 473-484, April 1992.
[4] A. P. Chandrakasan, M. Potkonjak, J. Rabaey, R. W.
Brodersen, "HYPER-LP: A System for Power Minimization Using Architectural Transformations", ICCAD-92, pp. 300-303, Nov. 1992, Santa Clara, CA.
[5] A. P. Chandrakasan, M. Potkonjak, J. Rabaey, R. W. Brodersen, "An Approach to Power Minimization Using Transformations", IEEE VLSI for Signal Processing Workshop, pp. , 1992, CA.
[6] S. Devadas, K. Keutzer, J. White, "Estimation of Power Dissipation in CMOS Combinational Circuits", IEEE Custom Integrated Circuits Conference, pp. 19.7.1-19.7.6, 1990.
[7] D. Dobberpuhl et al., "A 200-MHz 64-bit Dual-Issue CMOS Microprocessor", IEEE Journal of Solid-State Circuits, pp. 1555-1567, Nov. 1992.
[8] R. J. Fletcher, "Integrated Circuit Having Outputs Configured for Reduced State Changes", U.S. Patent no. 4,667,337, May 1987.
[9] D. Gajski, N. Dutt, A. Wu, S. Lin, High-Level Synthesis, Introduction to Chip and System Design, Kluwer Academic Publishers, 1992.
[10] J. S. Gardner, "Designing with the IDT SyncFIFO: the Architecture of the Future", 1992 Synchronous (Clocked) FIFO Design Guide, Integrated Device Technology AN-60, pp. 7-10, 1992, Santa Clara, CA.
[11] A. Ghosh, S. Devadas, K. Keutzer, J. White, "Estimation of Average Switching Activity in Combinational and Sequential Circuits", Proceedings of the 29th DAC, pp. 253-259, June 1992, Anaheim, CA.
[12] J. L. Hennessy, D. A. Patterson, Computer Architecture - A Quantitative Approach, Morgan Kaufmann Publishers, Palo Alto, CA, 1990.
[13] S. Kodical, "Simultaneous Switching Noise", 1993 IDT High-Speed CMOS Logic Design Guide, Integrated Device Technology AN-47, pp. 41-47, 1993, Santa Clara, CA.
[14] F. Najm, "Transition Density, A Stochastic Measure of Activity in Digital Circuits", Proceedings of the 28th DAC, pp. 644-649, June 1991, Anaheim, CA.
[16] A. Park, R. Maeder, "Codes to Reduce Switching Transients Across VLSI I/O Pins", Computer Architecture News, pp. 17-21, Sept. 1992.
[17] Rambus - Architectural Overview, Rambus Inc., Mountain View, CA, 1993.
[18] A. Shen, A. Ghosh, S. Devadas, K. Keutzer, "On Average Power Dissipation and Random Pattern Testability", ICCAD-92, pp. 402-407, Nov. 1992, Santa Clara, CA.
[19] M. R. Stan, "Shift register generators for circular FIFOs", Electronic Engineering, pp. 26-27, February 1991, Morgan Grampian House, London, England.
[20] M. R. Stan, W. P. Burleson, "Limited-weight codes for low power I/O", International Workshop on Low Power Design, April 1994, Napa, CA.
[21] J. Tabor, Noise Reduction Using Low Weight and Constant Weight Coding Techniques, Master's Thesis, EECS Dept., MIT, May 1990.
[22] W.-C. Tan, T. H.-Y. Meng, "Low-power polygon renderer for computer graphics", Int. Conf. on A.S.A.P., pp. 200-213, 1993.
[23] N. Weste, K. Eshraghian, Principles of CMOS VLSI Design, A Systems Perspective, Addison-Wesley Publishing Company, 1988.
[24] R. Wilson, "Low power and paradox", Electronic Engineering Times, pp. 38, November 1, 1993.
[25] J. Ziv, A. Lempel, "A universal Algorithm for Sequential Data Compression", IEEE Trans. on Inf. Theory, vol. IT-23, pp. 337-343, 1977.

DesignPower Gate Level Power Model
◈ Switching Power
  – Power dissipated when a load capacitance (gate + wire) is charged or discharged at the driver's output
  – If the technology library contains the correct capacitance value of the cell and the capacitive_load_unit attribute is specified, no additional information is needed for switching power modeling
  – Output pin capacitance need not be modeled if the switching power is incorporated into the internal power

$P_{sw} = \frac{V^2}{2} \sum_{\forall\, \text{nets } i} \left[ C_i \cdot TR_i \right]$
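As a worked instance of the switching-power sum above, the sketch below evaluates it for three made-up nets. It assumes TR_i is a toggle rate in transitions per second and C_i is in farads; the net values and the function name are hypothetical and not taken from any library.

```python
def switching_power(vdd, nets):
    """P_sw = V^2 / 2 * sum(C_i * TR_i) over all nets.

    nets: iterable of (capacitance_in_farads, toggle_rate_per_second) pairs.
    """
    return 0.5 * vdd ** 2 * sum(c * tr for c, tr in nets)

# Hypothetical nets: (load capacitance, toggle rate)
nets = [
    (20e-15, 50e6),   # 20 fF toggling 50 M times per second
    (35e-15, 10e6),   # 35 fF toggling 10 M times per second
    (100e-15, 1e6),   # 100 fF toggling 1 M times per second
]

p_sw = switching_power(3.3, nets)
print(p_sw)  # watts
```

The small, fast-toggling net dominates the sum here, which is why activity and capacitance have to be weighed together when picking nets to optimize.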
DesignPower Gate Level Power Model
◈ Internal Power
  – Power dissipated internal to a library cell
  – Modeled using an energy lookup table indexed by input transition time and output load
  – Library cells may contain one or more internal energy lookup tables

$P_{int} = \sum_{\forall\, \text{cells } i} \left[ E_{int,i}(\text{output load}, \text{input transition}) \cdot TR_i \right]$

DesignPower Gate Level Power Model
◈ Leakage Power
  – The leakage power model supports a single value for each library cell
  – State-dependent leakage power is not supported

$P_{leak} = \sum_{\forall\, \text{cells } i} P_{leak,i}$

Operand Isolation
• Combinational logic dissipates significant power when its output is unused.
• With operand isolation, the inputs to the combinational logic are held stable when the output is unused.
[Figure: a register bank driven by combinational logic under FSM control (EN), without and with a latch (G) on the logic inputs; bus widths m and n; output Data_out.]

Operand Isolation Example - Diagram
[Figure: before operand isolation, inputs a and b (8 bits) feed ADD, Data_Add and c feed MUL, and the 16-bit Data_Mul is stored in DataReg (output do), clocked by Clk_En, which is derived from clk and Load_En_Latched (Load_En from the FSM, latched). After operand isolation, a latch between ADD and MUL holds Data_Add as Iso_Data_Add.]

Operand Isolation Example - Before Code

    Library IEEE;
    Use IEEE.STD_LOGIC_1164.ALL;
    Use IEEE.STD_LOGIC_SIGNED.ALL;

    Entity Logic is
      Port( a, b, c : in  std_logic_vector(7 downto 0);
            do      : out std_logic_vector(15 downto 0);
            rst     : in  std_logic;
            clk     : in  std_logic );
    End Logic;

    Architecture Behave of Logic is
      Signal Count           : integer;
      Signal Load_En         : std_logic;
      Signal Load_En_Latched : std_logic;
      Signal Clk_En          : std_logic;
      Signal Data_Add        : std_logic_vector(7 downto 0);
      Signal Data_Mul        : std_logic_vector(15 downto 0);
    Begin
      Process(clk, rst)        -- Counter Logic in FSM
      Begin
        If(clk='1' and clk'event) then
          If(rst='0') then Count <= 0;
          Elsif(Count=9) then Count <= 0;
          Else Count <= Count + 1;
          End If;
        End If;
      End Process;
Operand Isolation Example - Before Code (continued)

      Process(Count)           -- Enable Logic in FSM
      Begin
        If(Count=9) then Load_En <= '1';
        Else Load_En <= '0';
        End If;
      End Process;

      Process(clk, Load_En)    -- Latch (for Deglitch) Logic
      Begin
        If(clk='0') then Load_En_Latched <= Load_En;
        End If;
      End Process;

      Clk_En <= clk and Load_En_Latched;

      Data_Add <= a + b;
      Data_Mul <= Data_Add * c;

      Process(Data_Mul, Clk_En)  -- Data Reg Logic
      Begin
        If(Clk_En='1' and Clk_En'event) then Do <= Data_Mul;
        End If;
      End Process;
    End Behave;

    Configuration CFG_Logic of Logic is
      for Behave
      End for;
    End CFG_Logic;

Operand Isolation Example - After Code

    Library IEEE;
    Use IEEE.STD_LOGIC_1164.ALL;
    Use IEEE.STD_LOGIC_SIGNED.ALL;

    Entity Logic1 is
      Port( a, b, c : in  std_logic_vector(7 downto 0);
            do      : out std_logic_vector(15 downto 0);
            rst     : in  std_logic;
            clk     : in  std_logic );
    End Logic1;

    Architecture Behave of Logic1 is
      Signal Count           : integer;
      Signal Load_En         : std_logic;
      Signal Load_En_Latched : std_logic;
      Signal Clk_En          : std_logic;
      Signal Data_Add        : std_logic_vector(7 downto 0);
      Signal Data_Mul        : std_logic_vector(15 downto 0);
      Signal Iso_Data_Add    : std_logic_vector(7 downto 0);
    Begin
      Process(clk, rst)        -- Counter Logic in FSM
      Begin
        If(clk='1' and clk'event) then
          If(rst='0') then Count <= 0;
          Elsif(Count=9) then Count <= 0;
          Else Count <= Count + 1;
          End If;
        End If;
      End Process;
Operand Isolation Example - After Code (continued)

      Process(Count)           -- Enable Logic in FSM
      Begin
        If(Count=9) then Load_En <= '1';
        Else Load_En <= '0';
        End If;
      End Process;

      Process(clk, Load_En)    -- Latch (for Deglitch) Logic
      Begin
        If(clk='0') then Load_En_Latched <= Load_En;
        End If;
      End Process;

      Clk_En <= clk and Load_En_Latched;

      Data_Add <= a + b;

      Process(Load_En_Latched, Data_Add)  -- Latch for Operand Isolation
      Begin
        If(Load_En_Latched='1' and Load_En_Latched'event) then
          Iso_Data_Add <= Data_Add;
        End If;
      End Process;

      Data_Mul <= Iso_Data_Add * c;

      Process(Data_Mul, Clk_En)  -- Data Reg Logic
      Begin
        If(Clk_En='1' and Clk_En'event) then Do <= Data_Mul;
        End If;
      End Process;
    End Behave;

Operand Isolation Example - Report
[Figure: power reports for the Before Code and After Code versions.]

Precomputation
• Power saving
  – Reduces power dissipation of combinational logic
  – Reduces internal power to precomputed registers
• Opportunity
  – Can be significant, dependent on:
    • percentage of time latch precomputation is successful
• Cost
  – Increased area
  – Impact on circuit timing
  – Increased design complexity
    • number of bits to precompute
  – Testability
    • may generate redundant logic

Precomputation
[Figure: without precomputation, n-bit register banks feed the logic and the entire function is computed (p-bit Data_out). With precomputation, m of the n input bits feed both their own register bank and a precomputation block whose 1-bit output EN enables the register bank for the remaining n-m bits: a smaller function is defined and the enable is precomputed.]

Precomputation
• Before Precomputation Diagram
[Figure: 8-bit inputs a and b are registered on CLK and compared (a > b); the 1-bit result is registered as Data_out.]

Precomputation
• After Precomputation Diagram
[Figure: a(7) and b(7) are registered on CLK and also drive a latch that enables the registers for a(6:0) and b(6:0); the comparator a > b produces the 1-bit Data_out.]

Precomputation
• Before Precomputation - Report

Precomputation
• After Precomputation - Report
Low power circuit techniques
• Power modeling on circuit level. Node activity. Speed and supply voltage. Flip-flops and latches.
• Driving large loads. Clocking and clock distribution. Low swing.
• Circuit techniques (adiabatic, carry select adder, Manchester carry chain).

Precomputation Example - Before Code

    Library IEEE;
    Use IEEE.STD_LOGIC_1164.ALL;

    Entity before_precomputation is
      port (a, b  : in  std_logic_vector(7 downto 0);
            CLK   : in  std_logic;
            D_out : out std_logic);
    end before_precomputation;

    Architecture Behav of before_precomputation is
      signal a_in, b_in : std_logic_vector(7 downto 0);
      signal comp : std_logic;
    Begin
      process (a, b, CLK)
      Begin
        -- Input registers: all 8 bits of a and b are loaded every cycle
        if (CLK = '1' and CLK'event) then
          a_in <= a;
          b_in <= b;
        end if;
        -- Comparator
        if (a_in > b_in) then comp <= '1';
        else comp <= '0';
        end if;
        -- Output register
        if (CLK'event and CLK='1') then
          D_out <= comp;
        end if;
      end process;
    end Behav;

Precomputation Example - After Code

    Library IEEE;
    Use IEEE.STD_LOGIC_1164.ALL;

    Entity after_precomputation is
      port (a, b  : in  std_logic_vector(7 downto 0);
            CLK   : in  std_logic;
            D_out : out std_logic);
    end after_precomputation;

    Architecture Behav of after_precomputation is
      signal a_in, b_in : std_logic_vector(7 downto 0);
      signal pcom, pcom_D : std_logic;
      signal CLK_en, comp : std_logic;
    Begin
      process(a, b, CLK)
      Begin
        -- The MSBs are always registered
        if (CLK='1' and CLK'event) then
          a_in(7) <= a(7);
          b_in(7) <= b(7);
        end if;
        -- Precomputed enable: the lower bits matter only when the MSBs
        -- are equal (the slide's "a xor b" does not type-check against
        -- the std_logic signal pcom)
        pcom <= not (a(7) xor b(7));
        -- Latch the enable while CLK is low to avoid glitches on CLK_en
        if (CLK='0') then
          pcom_D <= pcom;
        end if;
        CLK_en <= pcom_D and CLK;
        -- The lower 7 bits are loaded only on the gated clock
        if (CLK_en='1' and CLK_en'event) then
          a_in(6 downto 0) <= a(6 downto 0);
          b_in(6 downto 0) <= b(6 downto 0);
        end if;
        if (a_in > b_in) then comp <= '1';
        else comp <= '0';
        end if;
        if (CLK='1' and CLK'event) then
          D_out <= comp;
        end if;
      end process;
    end Behav;
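The precomputation idea behind the comparator example can be checked exhaustively with a short behavioural sketch. It assumes unsigned 8-bit operands and that the lower seven bits are needed only when the two MSBs are equal; it models the decision logic only, not the registers or clocking of the VHDL, and the function names are hypothetical.

```python
def compare_full(a, b):
    """Reference result: 1 when a > b, using all bits."""
    return int(a > b)

def compare_precomputed(a, b, width=8):
    """Return (result, lower_bits_loaded) for unsigned a > b.

    If the MSBs differ they decide the result alone, so the registers
    holding the lower bits would not need to be loaded in that cycle.
    """
    msb = width - 1
    a_msb = (a >> msb) & 1
    b_msb = (b >> msb) & 1
    if a_msb != b_msb:
        return a_msb, False
    low_mask = (1 << msb) - 1
    return int((a & low_mask) > (b & low_mask)), True

mismatches = 0
loads = 0
for a in range(256):
    for b in range(256):
        result, loaded = compare_precomputed(a, b)
        mismatches += result != compare_full(a, b)
        loads += loaded

load_fraction = loads / (256 * 256)
print(mismatches, load_fraction)
```

For uniformly distributed inputs the lower registers are loaded in only half of the cycles, which is the source of the saving; real data with correlated MSBs would shift that fraction.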
Peak Power Reduction
• Peak power is related to EMI.
• Reducing concurrent switching reduces peak power.
  – Adjust the delay τ, within the speed of the system clock, in the bus/port driver
  – Consider the power consumption of the delay element
  – While maintaining total power consumption, EMI is improved through peak power reduction
• Before Peak Power Reduction
[Figure: n-bit-wide bus switching simultaneously; I_total versus t, energy E1.]
• After Peak Power Reduction
[Figure: n-bit-wide bus with switching spread over (n-1)τ; I_total versus t, energy E2.]

$E = V_{dd} \int I_{total}\, dt$

Factoring Example
Function: f = ab + bc + cd
The function f is not on the critical path. The signals a, b, c and d all have the same bit width. Signal b is a high activity net. The two factorings below are equivalent in both timing and area.
  f = b(a+c) + cd
  f = ab + c(b+d)
Net result: network toggling and power are reduced.
[Figure: the two gate-level implementations of f, with inputs a, c, b, c, d and a, b, c, b, d respectively.]

Low Power Logic Gate Resynthesis on Mapped Circuit
Hyun-Sang Kim, Jun-Dong Cho
School of Electrical and Computer Engineering, SungKyunKwan University

Low Power Logic Synthesis
[Figure: synthesis flow from RTL Description through Logic Synthesis (Technology Independent Optimization, Logic Equation, Technology Mapping, Connection of Gates, Resynthesis on Mapped Circuit) to Gate Level Description, supported by Timing & Power Analysis Tools.]

Technology Mapping
[Figure: three mappings (a), (b), (c) of a network; h: high switching activity node, l: low switching activity node.]

Tree Decomposition
[Figure: decompositions (a) and (b) of an AND tree from primary inputs to output f, showing the critical path and the low power alternative.]

Huffman Algorithm
[Figure: Huffman tree over leaves x1 to x5 with internal nodes y1, y2, y3; node weights 2, 3, 4, 4, 10, 5, 8, 13 and root weight 23.]
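The Huffman construction on the last slide can be sketched with a priority queue. The leaf weights 2, 3, 4, 4, 10 are read off the figure residue and their assignment to x1..x5 is an assumption; combining two nodes by plain addition follows the figure and is a simplification of how a real gate's output activity would be computed.

```python
import heapq

def huffman_merge_order(weights):
    """Repeatedly merge the two lightest nodes; return (first, second, merged) triples."""
    heap = list(weights)
    heapq.heapify(heap)
    merges = []
    while len(heap) > 1:
        first = heapq.heappop(heap)
        second = heapq.heappop(heap)
        merged = first + second
        merges.append((first, second, merged))
        heapq.heappush(heap, merged)
    return merges

leaf_weights = [2, 3, 4, 4, 10]  # assumed leaf weights from the figure
merges = huffman_merge_order(leaf_weights)
internal_weights = [merged for _, _, merged in merges]
total_internal_weight = sum(internal_weights)

for first, second, merged in merges:
    print(first, "+", second, "->", merged)
```

Merging the lightest nodes first keeps the high-weight signal close to the root, so it passes through the fewest gates; the sum of the internal weights is the quantity the greedy choice minimizes.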
Depth-Constrained Decomposition
Algorithm
  problem: minimize $\sum_{i=1}^{m} p_t(x_i)$
  input:   input signal probabilities (p1, p2, ..., pn), height (h), number of leaf nodes (n), fanin limit per gate (k)
  output:  k-ary tree topology
  Begin
    sort(signal probabilities p1, p2, ..., pn);
    while (n != 0)
      if (h > log_k n)
        assign k nodes to level L (= h+1);
        h = h-1, n = n-(k-1);          /* upward */
      else if (h < log_k n)
        assign k nodes to the previous level L (= h+2);
        h = h, n = n-(k-1);            /* downward */
      else  /* h = log_k n */
        assign the remaining nodes to level L (= h+1);
                                       /* complete: all remaining nodes go to level L (= h+1), forming a complete k-ary tree */
    for (bottom level L; L > 1; L--)
      min_edge_weight_matching(nodes in level L);
  End

Example
[Figure: step-by-step construction over levels L = 0 to L = 3 (h = 1, 2, 3) for leaves a, b, c, d, e, f with signal probabilities 0.1, 0.2, 0.3, 0.4, 0.5, 0.6 and internal nodes x, y. Before matching, the bottom-level pairs are (a, b) and (c, d); after matching they are (a, d) and (b, c).]

After Decomposition (k1 = 2)
[Figure: bar chart of SIS, SIS+OURS and Improvement Ratio (y-axis: Value, Ratio) versus fanin (6, 10, 20) and height h.]

After Tech. Mapping (k1 = 3, k2 = 3)
[Figure: bar chart of SIS+LEVEL MAP, SIS+OURS+LEVEL MAP and Improvement Ratio (y-axis: Power (mW), Ratio) versus fanin (6, 10, 15, 20) and height h.]
Buffer Chain
• Delay analysis of buffer chain

$(W/L)_{k+1} = a\,(W/L)_k$
$C_k = a^k C_{in} + a^{k-1} C_p$
$T_d = \sum_{k=1}^{n} t_d(k) = \sum_{k=1}^{n} a \cdot t_0 = n \cdot a \cdot t_0$
$C_L = a^n C_{in} \;\Rightarrow\; n = \frac{\ln(C_L/C_{in})}{\ln(a)}$
$T_d = a \cdot t_0 \cdot \frac{\ln(C_L/C_{in})}{\ln(a)}$
$\frac{\partial T_d}{\partial a} = 0 \;\Rightarrow\; (a)_{optimum} = e \approx 2.72, \quad (n)_{optimum} = \ln(C_L/C_{in})$

• Delay analysis considering parasitic capacitance C_p: a = 2 ~ 10 (typical: e)

$P_k = C_k \cdot V_{dd}^2 \cdot f = V_{dd}^2 \cdot f \cdot a^{k-1} \cdot (a \cdot C_{in} + C_p)$
$P_T = \sum_{k=1}^{n} P_k = V_{dd}^2 \cdot f \cdot (a \cdot C_{in} + C_p) \cdot \frac{a^n - 1}{a - 1}$
$Eff = \frac{P_n}{P_T} = \frac{a^{n-1}}{a^{n-1} + a^{n-2} + \cdots + a + 1} = \frac{a^{n-1}(a-1)}{a^n - 1}$

C_k, P_k: total capacitance and power at the output of the stage-k buffer
P_T: power consumption of the buffer chain
P_n: power consumption of the load capacitance C_L
Eff: power efficiency P_n/P_T

[Figure: chain of n buffers of size 1, a, ..., a^{i-2}, a^{i-1}, ..., a^{n-1}; the input capacitance is C_in at stage 1 and a^{i-1} C_in at stage i; the load is C_L = a^n C_in.]

Slew Rate
• Determining rise/fall time

$I_{mean} = \frac{2}{T}\left[\int_{t_1}^{t_2} I_{short}(t)\,dt + \int_{t_2}^{t_3} I_{short}(t)\,dt\right] = \frac{4}{T}\int_{t_1}^{t_2} I_{short}(t)\,dt = \frac{4}{T}\int_{t_1}^{t_2} \frac{\beta}{2}\,(V_{in} - V_t)^2\,dt$
where $\beta_n = \beta_p = \beta$ and $V_{tn} = -V_{tp} = V_t$

$P_{SC} = I_{mean} \cdot V_{dd} = \frac{\beta}{12}\,(V_{dd} - 2V_t)^3 \cdot \tau \cdot f$
where $t_r = t_f = \tau$

[Figure: input waveform V_in with period T, rise time t_r and fall time t_f, crossing V_tn and V_dd + V_tp; the short-circuit current pulse with I_max and I_mean over t1, t2, t3.]

Slew Rate (Cont'd)
• Power consumption of short-circuit current in an oscillation circuit
[Figure: oscillator stages and their V_i / V_o waveforms between 0 and V_dd.]

Pass Transistor Logic
• Reducing area/power
  – Macro cells (a large part of chip area): XOR/XNOR/MUX (primitive) ⇒ pass transistor logic
  – Does not use a charge/discharge scheme ⇒ appropriate for low power logic
• Pass transistor logic family
  – CPL (Complementary Pass Transistor Logic)
  – DPL (Dual Pass Transistor Logic)
  – SRPL (Swing Restored Pass Transistor Logic)
• CPL
  – Basic scheme
  – Inverter buffering
[Figure: CPL network with inputs A, B and their complements producing AB and its complement, with inverter buffering and a p-MOS latch to V_dd.]
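Returning to the buffer-chain delay analysis above, the optimum stage ratio can be checked numerically. The sketch evaluates T_d = a·t0·ln(C_L/C_in)/ln(a) for a few candidate ratios and the efficiency expression Eff = a^(n-1)(a-1)/(a^n - 1); the load ratio of 1000 and the candidate list are hypothetical, and n is treated as a real number rather than rounded to a whole number of stages.

```python
import math

def chain_delay(a, load_ratio, t0=1.0):
    """T_d = a * t0 * ln(C_L / C_in) / ln(a), in units of t0."""
    return a * t0 * math.log(load_ratio) / math.log(a)

def power_efficiency(a, n):
    """Eff = P_n / P_T = a^(n-1) * (a - 1) / (a^n - 1)."""
    return a ** (n - 1) * (a - 1) / (a ** n - 1)

load_ratio = 1000.0                    # hypothetical C_L / C_in
candidates = [2.0, math.e, 4.0, 10.0]  # stage ratios within the 2 ~ 10 range

delays = {a: chain_delay(a, load_ratio) for a in candidates}
best_ratio = min(delays, key=delays.get)
n_optimum = math.log(load_ratio)       # number of stages when a = e

for a in candidates:
    print(round(a, 3), round(delays[a], 3), round(power_efficiency(a, 3), 3))
```

Under these formulas a larger stage ratio costs delay but improves the power efficiency of the chain, which is one reason ratios above e are used in practice.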
Pass Transistor Logic (Cont'd)
• DPL
  – Pass transistor network + dual p-MOS
  – Enables rail-to-rail swing
  – Characteristics
    • Increased input capacitance (delay)
    • Increased driving ability because two ON-paths exist
    • Equals CPL in input loading capacitance
• SRPL
  – Pass transistor network + cross-coupled inverter
  – Restores the logic level
  – Inverter size must not be too big
[Figure: n-MOS CPL network with inputs A, B and their complements, producing AB and its complement.]

Dynamic Logic
• Uses a precharge/evaluation scheme
• Family
  – Domino logic
  – NORA (NO RAce) logic
• Characteristics
  – Decreased input loading capacitance
  – Power consumption in the precharge clock
  – Increased useless switching in the precharging period
• Basic architecture of Domino logic
[Figure: precharge transistor P1 and evaluation transistor N1 clocked by clk around an N logic block with inputs A, B; load C_L, input capacitance C_in; precharge and evaluation phases.]

Input Pin Ordering
• Reorder the equivalent inputs to a transistor based on critical path delays and power consumption
• N-input primitive CMOS logic
  – Symmetrical at the function level
  – Asymmetrical at the transistor level
    • capacitance of the output stage
    • body effect
• Scheme
  – The signal that has many transitions must be far from the output
  – If it is hard to estimate switching frequency, pin ordering must be determined considering the path and the path delay balance from the primary input to the transistor input
• Example of N-input CMOS logic
[Figure: transistor stack with inputs A, B, C, D, output load C_L and internal node capacitances C1, C2, C3.]
Experiment with a TI gate array: for a 4-input NAND gate in TI's BiCMOS gate array library (with a load of 13 inverters), the delay varies by 20% and the power dissipation by 10% between a good and a bad ordering.

INPUT PIN Reordering
[Figure: 4-input NAND gate with p-MOS transistors MPA to MPD and series n-MOS transistors MNA (nearest the output) to MND (nearest ground), output load C_L and internal node capacitances C_B, C_C, C_D; input cases (a) to (d).]
Simulation result (tcycle = 50 ns, tf/tr = 1 ns):
  – When A is the critical input: 38.4 uW
  – When D is the critical input: 47.2 uW
Sensitization
• Definition
  – sensitization: an input signal that forces an output transition event
  – sensitization vector: the values of the other inputs when one signal is sensitized

$\frac{\partial Y}{\partial X_i} = [f]_{X_i=0} \oplus [f]_{X_i=1} = f(X_1, \ldots, X_{i-1}, 0, X_{i+1}, \ldots, X_n) \oplus f(X_1, \ldots, X_{i-1}, 1, X_{i+1}, \ldots, X_n)$

• Example (inputs X1, X2, X3)

$Y = (X_1 + X_2) \cdot X_3$
$\frac{\partial Y}{\partial X_1} = [f]_{X_1=0} \oplus [f]_{X_1=1} = X_2 X_3 \oplus X_3 = \overline{X_2}\, X_3$

Sensitization (Cont'd)
• Considering sensitization in combinational logic: removes unnecessary transitions in the combinational logic.
• Considering sensitization in sequential logic: also reduces the power consumption in the flip-flops.
[Figure: combinational logic with inputs X1 to Xn and output Y, shown with plain D flip-flops and with enable (E) flip-flops clocked by clk.]

TTL-Compatible
• TTL level signal ⇒ CMOS input
• Characteristic curve of a CMOS inverter
[Figure: inverter transfer curve V_o versus V_i for V_dd = 3.3 V, with TTL input levels V_IL = 0.8 V, V_IH = 2.0 V and threshold 1.4 V; drain currents I_DTTL1 and I_DTTL2.]

$I_{leak} = \mathrm{avg}(I_{d1}, I_{d2})$
$P_{TTL} = N_{TTL} \cdot V_{dd} \cdot (I_{DTTL1} + I_{DTTL2})$
where N_TTL: number of TTL-compatible input pads

TTL Compatible (Cont'd)
• CMOS output signal ⇒ TTL input
  – Because of the sink current I_OL, the CMOS chip generates a large amount of heat
  – Increased chip operating temperature
  – Power consumption of the whole system
[Figure: output pad driving V_OL across the chip boundary into an input pad, with sink current I_OL.]

INPUT PIN Reordering
◈ To reduce the power dissipation one should place the input with low transition density near the ground end.
(a) If MNA turns off, only CL needs to be charged.
(b) If MND turns off, all of CL, CB, CC, and CD need to be charged.
(c) If the critical input is rising and placed near the output node, the initial charges of CB, CC, and CD are zero, and the delay of discharging CL is shorter than in (d).
(d) If the critical input is rising and placed near the ground end, the charges of CB, CC, and CD must discharge before the charge of CL discharges to zero.

Conclusion
[Figure: bar charts for 4-bit, 8-bit, and 12-bit circuits and their average, showing power [pJ] and the percentage of instances with circuit-state effects, comparing "circuit states effects considered" with "circuit states effects not considered"; annotated reductions: 9.0%, 4.0%, 12.0%.]

Device Scaling by a Factor of S
• Constant scaled wire increases coupling capacitance by S and wire resistance by S
• Supply voltage by 1/S, threshold voltage by 1/S, current drive by 1/S
• Gate capacitance by 1/S, gate delay by 1/S
• Global interconnection delay (RC load + parasitics) by S
• Interconnect delay: 50-70% of the clock cycle
• Area: $1/S^2$
• Power dissipation by $1/S$ to $1/S^2$ ($P = nCV_{dd}^2 f$, where $nC$ is the sum of capacitance times the number of transitions)
• SIA (Semiconductor Industry Association): by 2007, the physical limit: 0.1 µm, 20 billion transistors, 10 square centimeters, 12- or 16-inch wafers

Delay Variations at Low-Voltage
• At high supply voltages, the delay increases with temperature (mobility decreases with temperature), while at very low supply voltages the delay decreases with temperature (VT decreases with temperature).
• At low supply voltages, the delay ratio between large and minimum transistor widths W increases by a factor of several.
• Clock trees are delay-balanced by wire snaking in order to avoid clock skew. In this case, at low supply voltages, slight VT variations can significantly modify the delay balancing.
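The power entry on the Device Scaling slide can be checked by substituting the listed scaling factors into P = nCVdd²f. The sketch below assumes that the clock frequency scales by S (the inverse of the 1/S gate delay); that assumption and the function names are not from the slides. It reproduces the 1/S² end of the quoted range and shows that power per unit area then stays constant.

```python
from fractions import Fraction


def dynamic_power_factor(s, frequency_scales=True):
    """Factor by which P = n*C*Vdd^2*f changes for device scaling factor s.

    Uses the slide's factors: capacitance by 1/s and supply voltage by 1/s.
    Assumption of this sketch: frequency scales by s when frequency_scales
    is True, otherwise it is held constant.
    """
    s = Fraction(s)
    capacitance = 1 / s
    voltage = 1 / s
    frequency = s if frequency_scales else Fraction(1)
    return capacitance * voltage ** 2 * frequency


def power_density_factor(s):
    """Power factor divided by the slide's area factor 1/s^2."""
    area = 1 / Fraction(s) ** 2
    return dynamic_power_factor(s) / area


if __name__ == "__main__":
    for s in (2, 3, 4):
        print(s, dynamic_power_factor(s), power_density_factor(s))
```

Holding the frequency constant instead (frequency_scales=False) gives a factor of 1/S³, so the result is sensitive to what is assumed about the clock.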
Quarter Micron Challenge
• Computers/peripherals (SOC): 1996 ($50 billion), 1999 ($70 billion)
• Wiring dominates delay: wire R comparable to gate driver R; wire/wire coupling C > C to ground
• Push beyond 0.07 micron
• Quest for area (past), speed-speed (now), power-power-power (future)
• Accelerated increases of clock frequencies
• Signal integrity-based tools
• Design styles (chip + packages)
• System-level design (system partitioning)
• Synthesis with multiple constraints (power, area, timing)
• Partitioning/MCM
• Increasing speed limits complicate clock and power distribution
• Design bounded by wires, vias, via resistance, coupling
• Reverse scaling: adding area/spacing as needed: widening, thickening of wires, metal shielding and noise avoidance - adding metal

CLOCK POWER CONSUMPTION
• Clock power consumption is as large as the logic power. Because the clock signal carries the heaviest load and switches at high frequency, clock distribution is a major source of power dissipation.
• In a microprocessor, 18% of the total power is consumed by clocking.
• Clock distribution is designed as a hierarchical clock tree, according to the decomposition principle.

Power Consumption per Block in a Typical Microprocessor

Crosstalk

Solution for Clock Skew
• Dynamic effects on skew
• Capacitance coupling
• Supply voltage deviation (clock driver and receiver voltage difference)
• Capacitance deviation by circuit operation
• Global and local temperature
• Layout issues: clocks routed first
• Must be aware of all sources of delay
• Increased spacing
• Wider wires
• Insert buffers
• Specialized clocks need net matching
• Two approaches: single driver, H-tree driver
• Gated clocks: local clocks that are conditionally enabled so that the registers are clocked only during the write cycles.
The clock is partitioned into different blocks, and each block is clocked with its own clock.
• Gating the clocks to infrequently used blocks does not provide an acceptable level of power savings
• Divide the basic clock frequency to provide the lowest clock frequency needed to different parts of the circuit
• Clock distribution: large clock buffers waste power. Use smaller clock buffers with a well-balanced clock tree.

PowerPC Clocking Scheme

CLOCK DRIVERS IN THE DEC ALPHA 21164

DRIVER for PADS or LARGE CAPACITANCES
Off-chip power (drivers and pads) is increasing and is very difficult to reduce, because the pad and driver sizes cannot be decreased with the new technologies.

Layout-Driven Resynthesis for Lower Power

Low Power Process
• Dynamic power dissipation
$$P_d = a \cdot C_L \cdot V_{dd}^2 \cdot f$$
$$I_{ds} = \frac{\beta}{2}\,(V_{gs} - V_t)^2$$
$$C_{gate} = C_{ox} \cdot (W \cdot L)$$
$$C_{in} = \sum_{j=1}^{m} (C_{gate})_j$$
$$C_{ov} = C_{GD0} \cdot W$$
$$C_{dj} = C_j \cdot AD + C_{jsw} \cdot PD, \qquad AD = W \cdot D, \quad PD = 2\,(W + D)$$
[Figure: CMOS inverter (Vin, Vo, Vdd) annotated with the parasitic capacitances Cdjp, Covp, Covn, Cdjn, and a drain region of width W showing Cjb and Cjsw.]

Crosstalk
• In deep-submicron layouts, some of the nets connecting modules can be so long that their resistance is comparable to the resistance of the driver.
• Each net in a mixed analog/digital circuit is classified according to its crosstalk sensitivity:
– 1. Noisy = high-impedance signal that can disturb other signals, e.g., clock signals.
– 2. High-Sensitivity = high-impedance analog nets; the most noise-sensitive nets, such as the input nets to operational amplifiers.
– 3. Mid-Sensitivity = low/medium-impedance analog nets.
– 4. Low-Sensitivity = digital nets that directly affect the analog part in some cells, such as control signals.
– 5.
Non-Sensitivity = the most noise-insensitive nets, such as pure digital nets.
• The crosstalk between two interconnection wires also depends on the frequencies (i.e., signal activities) of the signals traveling on the wires. Recently, deep-submicron designs require crosstalk-free channel routing.

Power Measure in Layout
• The average dynamic power consumed by a CMOS gate is given below, where $C_l$ is the load capacitance at the output of the node, $V_{dd}$ is the supply voltage, $T_{cycle}$ is the global clock period, $N$ is the number of transitions of the gate output per clock cycle, $C_g$ is the load capacitance due to the input capacitance of the fanout gates, and $C_w$ is the load capacitance due to the interconnection tree formed between the driver and its fanout gates.
$$P_{av} = 0.5\,\frac{V_{dd}^2}{T_{cycle}}\,C_l\,N = 0.5\,\frac{V_{dd}^2}{T_{cycle}}\,(C_g + C_w)\,N$$
• Logic synthesis for low power attempts to minimize $\sum_i C_{g_i} N_i$.
• Physical design for low power tries to minimize $\sum_i C_{w_i} N_i$. Here $C_{w_i}$ consists of $C_{x_i} + C_{s_i}$, where $C_{x_i}$ is the capacitance of net $i$ due to its crosstalk and $C_{s_i}$ is the substrate capacitance of net $i$.
• For low-power layout applications, power dissipation due to crosstalk is minimized by ensuring that wires carrying high-activity signals are placed sufficiently far from the other wires.
• Similarly, power dissipation due to substrate capacitance is proportional to the wire length and its signal activity.

Low-Power Layout Design Using Dual Supply Voltages
School of Electrical, Electronic, and Computer Engineering, Sungkyunkwan University
Jin Hyuk Kim, Jun Sung Lee, Jun Dong Cho

Contents
• Research objectives
• Research background
• Clustered Voltage Scaling structure
• Row by Row Power Supply structure
• Mix-And-Match Power Supply structure
• Level Converter structure
• Mix-And-Match Power Supply design flow
• Experimental results
• Conclusion

Research Objectives and Background
• Propose a dual-voltage layout technique that reduces the power consumption of combinational circuits
• Reduce the amount of wiring and the number of tracks, which grow when dual-voltage cells are used and cells of the same voltage are placed together in one cell row
• Implement a Level Converter circuit that uses the minimum number of transistors
• Apply Clustered Voltage Scaling [Usami, '95], which uses dual supply voltages while maintaining device performance
• The proposed Mix-And-Match Power Supply layout structure improves on the existing Row by Row Power Supply [Usami, '97] layout structure, reducing power and area

Clustered Voltage Scaling
• Generates a low-power netlist
• Slack: $S_i = R_i - A_i$
[Figure: netlist of gates G1 to G11 between flip-flops (F/F), annotated with slacks (S1, S3, S4, S5, S6, S9 > 0; S2, S7, S8, S10, S11 < 0); legend: VDDL, VDDH, Level Converter (LC1, LC2).]

Row by Row Power Supply Structure
[Figure: standard-cell module in which VDDL cells and VDDH cells occupy separate rows; rails VDDL, VDDH, VSS.]

Mix-And-Match Power Supply Structure
[Figure: standard-cell module with VDDL cells and VDDH cells placed in the same rows; rails VDDL, VDDH, VSS.]

Structure Comparison
[Figure: module power rails for the conventional circuit, RRPS, and MAMPS.]

Level Converter Structure
• Number of transistors: 6 → 4
• Effective in terms of power and area
[Figure: conventional (Vth = 1.5 V) and proposed (Vth = 2.0 V) level converter circuits with IN, OUT, VDDL, VDDH, and VSS.]

Mix-And-Match Power Supply Design Flow
• Single voltage netlist
• Multiple voltage scaling
• Netlist with multiple supply voltage (OPUS)
• Assign supply voltage to each cell
• Physical placement (Aquarius XO)
• Routing
• Synthesis timing, power and area (PowerMill)

Experimental Results
[Figure: bar charts of total power (%) and total area (%) for the conventional circuit, RRPS, and MAMPS, with the conventional circuit at 100; chart annotations: 47%, 10%, 15%, 2%.]

Conclusion
• Compared with the single-voltage circuit, a 49.4% power reduction was obtained, at the cost of a 5.6% area overhead
• 10% area reduction and 2% power reduction compared with the existing RRPS structure
• The proposed Level Converter achieves a 30% area reduction and a 35% power reduction compared with the existing Level Converter
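The slack definition on the Clustered Voltage Scaling slide, S_i = R_i - A_i, can be sketched as a small static timing pass. This is only a minimal illustration: the gate names, delays, and required time are hypothetical, and picking gates with positive slack is a first filter for VDDL candidates, not the clustering algorithm of [Usami, '95].

```python
from graphlib import TopologicalSorter


def compute_slacks(fanin, delay, required_at_outputs):
    """Return {gate: slack}, with slack = required time - arrival time.

    fanin: gate -> list of driving gates (every gate must appear as a key).
    delay: gate -> gate delay.
    required_at_outputs: output gate -> required time at its output.
    """
    order = list(TopologicalSorter(fanin).static_order())  # drivers first

    # Forward pass: latest arrival time at each gate output.
    arrival = {}
    for gate in order:
        arrival[gate] = delay[gate] + max(
            (arrival[p] for p in fanin[gate]), default=0
        )

    # Build the fanout lists needed for the backward pass.
    fanout = {gate: [] for gate in fanin}
    for gate, drivers in fanin.items():
        for driver in drivers:
            fanout[driver].append(gate)

    # Backward pass: earliest required time at each gate output.
    required = {}
    for gate in reversed(order):
        candidates = [required[s] - delay[s] for s in fanout[gate]]
        if gate in required_at_outputs:
            candidates.append(required_at_outputs[gate])
        required[gate] = min(candidates)

    return {gate: required[gate] - arrival[gate] for gate in order}


# Hypothetical three-gate example: g1 and g2 both drive g3.
FANIN = {"g1": [], "g2": [], "g3": ["g1", "g2"]}
DELAY = {"g1": 2, "g2": 1, "g3": 2}
REQUIRED = {"g3": 4}

if __name__ == "__main__":
    slacks = compute_slacks(FANIN, DELAY, REQUIRED)
    low_voltage_candidates = [g for g, s in slacks.items() if s > 0]
    print(slacks, low_voltage_candidates)
```

In this example g1 and g3 lie on the critical path and have zero slack, so only g2 is a candidate for the lower supply. A real flow would recompute the slacks after each assignment, because a gate slows down when its supply is lowered.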
Low Power Design Tools
• Transistor-level tools (accuracy within 5-10% of silicon)
– SPICE, PowerMill (Epic), ADM (Avanti/Anagram), Lsim Power Analyst (Mentor)
• Logic-level tools (10-15%)
– DesignPower and PowerGate (Synopsys), WattWatcher/Gate (Sente), PowerSim (System Sciences), POET (Viewlogic), and QuickPower (Mentor)
• Architectural (RTL) level tools (20-25%)
– WattWatcher/Architect (Sente): 20-25% accuracy
• Behavioral (spreadsheet) level tools (50-100%)
– Active area of academic research

Commercial synthesis systems

Research synthesis systems
A - Architectural synthesis. L - Logic synthesis.

Low-Power CAD sites
• Alternative System Concepts, Inc.: 7X power reduction through optimization; contact http://www.ee.princeton.edu and Jake Karrfalt at [email protected] or (603) 437-2234.
• Reduction of glitch and clock power; modeling and optimization of interconnect power; power optimization for data-dominated designs with limited control flow.
• Mentor Graphics QuickPower: hierarchical determination of the overall benefit of exchanging blocks for lower power; powering down or disabling blocks when not in use by gated clocks; choosing candidates for power-down; calculating the effect of the power-down logic. http://www.mentorg.com
• Synopsys's Power Compiler: http://www.synopsys.com/products/power/power_ds
• Sente's WattWatcher/Architect, the first commercial tool operating at the architecture level (20-25% accuracy): http://www.powereda.com
• Behavioral tools: Hyper-LP (optimization), Explore (estimation) by J. Rabaey

DesignPower (Synopsys)
• DesignPower(TM) provides a single, integrated environment for power analysis in multiple phases of the design process:
– Early, quick feedback at the HDL or gate level through probabilistic analysis.
– Improved accuracy through simulation-based analysis for gate level and library exploration.
• DesignPower estimates switching, internal cell, and leakage power. It accepts user-defined probabilities, simulation toggle data, or a combination of both as input.
• DesignPower propagates switching information through sequential devices, including flip-flops and latches. It supports sequential, hierarchical, gated-clock, and multiple-clock designs. For simulation toggle data, it links directly to Verilog and VHDL simulators, including Synopsys' VSS.
