ELEC 5970-001/6970-001 (Fall 2005)
Special Topics in Electrical Engineering
Low-Power Design of Electronic Circuits
Low-Power Logic Design and Parallelism

Vishwani D. Agrawal
James J. Danaher Professor
Department of Electrical and Computer Engineering
Auburn University
http://www.eng.auburn.edu/~vagrawal
[email protected]

11/01/05, ELEC 5970-001/6970-001, Lecture 17

State Encoding
• Two-bit binary counter:
  – State sequence: 00→01→10→11→00
  – Six bit transitions in four clock cycles
  – 6/4 = 1.5 transitions per clock
• Two-bit Gray-code counter:
  – State sequence: 00→01→11→10→00
  – Four bit transitions in four clock cycles
  – 4/4 = 1.0 transition per clock
• The Gray-code counter is more power efficient.
G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Kluwer Academic Publishers (now Springer), 1998.

Three-Bit Counters

  Binary                     Gray-code
  State   No. of toggles     State   No. of toggles
  000     -                  000     -
  001     1                  001     1
  010     2                  011     1
  011     1                  010     1
  100     3                  110     1
  101     1                  111     1
  110     2                  101     1
  111     1                  100     1
  000     3                  000     1

N-Bit Counter: Toggles in Counting Cycle
• Binary counter: $T_{binary} = 2(2^N - 1)$
• Gray-code counter: $T_{gray} = 2^N$
• $T_{gray}/T_{binary} = 2^{N-1}/(2^N - 1) \to 0.5$ as $N \to \infty$

  Bits   T(binary)   T(gray)   T(gray)/T(binary)
  1      2           2         1.0
  2      6           4         0.6667
  3      14          8         0.5714
  4      30          16        0.5333
  5      62          32        0.5161
  6      126         64        0.5079
  ∞      -           -         0.5000

Bus Encoding
• Example: four-bit bus
• 0000→1110 has three transitions.
• If the bits of the second pattern are inverted, then 0000→0001 has only one transition.
• Bit-inversion encoding for N-bit bus:
[Figure: number of bit transitions after inversion encoding (vertical axis, marked 0, N/2, N) versus number of bit transitions (horizontal axis, marked 0, N/2, N).]

Bus-Inversion Encoding Logic
[Figure: block diagram with sent data, polarity decision logic, bus register, polarity bit, and received data.]
M. Stan and W. Burleson, “Bus-Invert Coding for Low Power I/O,” IEEE Trans. VLSI Systems, vol. 3, no. 1, pp. 49-58, March 1995.

FSM State Encoding
[Figure: the same state transition graph under two state encodings (states labeled 11, 00, 01 in the first and 01, 00, 11 in the second); edges carry transition probabilities based on PI statistics.]
Expected number of state-bit transitions:
• First encoding: 2(0.3 + 0.4) + 1(0.1 + 0.1) = 1.6
• Second encoding: 1(0.3 + 0.4 + 0.1) + 2(0.1) = 1.0
State encoding can be selected using a power-based cost function.

FSM: Clock-Gating
• Moore machine: outputs depend only on the state variables.
  – If a state has a self-loop in the state transition graph (STG), then the clock can be stopped whenever a self-loop is to be executed.
[Figure: STG fragment with states Si, Sj, Sk and edges labeled Xi/Zk, Xj/Zk, Xk/Zk.]
The clock can be stopped when the (Xk, Sk) combination occurs.

Clock-Gating in Moore FSM
[Figure: block diagram with PI, combinational logic, flip-flops, PO, clock activation logic, latch, and CK.]
L. Benini and G. De Micheli, Dynamic Power Management, Boston: Springer, 1998.

Clock-Gating in Low-Power Flip-Flop
[Figure: flip-flop with D input, Q output, and gated CK.]

Low-Power Datapath Architecture
• Lower the supply voltage
  – This slows down circuit speed
  – Use parallel computing to gain the speed back
• Works well when threshold voltage is also lowered.
• About 60% reduction in power obtainable.
• Reference: A. P. Chandrakasan and R. W. Brodersen, Low Power Digital CMOS Design, Boston: Kluwer Academic Publishers (now Springer), 1995.

A Reference Datapath
[Figure: Input → Register → Combinational logic → Register → Output, clocked by CK, total capacitance Cref.]
• Supply voltage = $V_{ref}$
• Total capacitance switched per cycle = $C_{ref}$
• Clock frequency = $f$
• Power consumption: $P_{ref} = C_{ref} V_{ref}^2 f$

A Parallel Architecture
[Figure: input register feeding N copies of the combinational logic (Copy 1, Copy 2, ..., Copy N), each followed by a register clocked at f/N; an N-to-1 multiplexer and an output register clocked at f; a multiphase clock generator and mux control driven by CK.]
• N = degree of parallelism
• Each copy processes every Nth input and operates at reduced voltage.
• Supply voltage: $V_N \le V_1 = V_{ref}$

Control Signals, N = 4
[Figure: timing diagram of CK and Phase 1, Phase 2, Phase 3, Phase 4.]

Power
$P_N = P_{proc} + P_{overhead}$
$P_{proc} = N (C_{inreg} + C_{comb}) V_N^2 \frac{f}{N} + C_{outreg} V_N^2 f = (C_{inreg} + C_{comb} + C_{outreg}) V_N^2 f = C_{ref} V_N^2 f$
$P_{overhead} = C_{overhead} V_N^2 f \approx \delta C_{ref} (N - 1) V_N^2 f$
$P_N = [1 + \delta(N - 1)] C_{ref} V_N^2 f$
$\frac{P_N}{P_1} = [1 + \delta(N - 1)] \frac{V_N^2}{V_{ref}^2}$

Voltage vs. Speed
Delay of a gate:
$T \approx \frac{C_L V_{ref}}{I} = \frac{C_L V_{ref}}{k (W/L) (V_{ref} - V_t)^2}$
where
  I is the saturation current,
  k is a technology parameter,
  W/L is the width-to-length ratio of the transistor,
  Vt is the threshold voltage.
[Figure: normalized gate delay T (0.0 to 4.0) versus supply voltage for 1.2μ CMOS; marked points are N=1 at Vref = 5V (delay 1.0), N=2 at V2 = 2.9V (delay 2.0), and N=3 at V3 (delay 3.0), with Vt marked on the voltage axis.]
Voltage reduction slows down as we get closer to Vt.

Increasing Multiprocessing
[Figure: PN/P1 (0.0 to 1.0) versus N (1 to 12) for 1.2μ CMOS, Vref = 5V, with curves for Vt = 0.8V, Vt = 0.4V, and Vt = 0V (extreme case).]

Extreme Case: Vt = 0
Delay: $T \propto 1/V_{ref}$
For N processing elements, delay = NT, so $V_N = V_{ref}/N$ and
$\frac{P_N}{P_1} = [1 + \delta(N - 1)] \frac{1}{N^2}$, which equals $\frac{1}{N}$ when $\delta = 1$.
For negligible overhead, $\delta \to 0$:
$\frac{P_N}{P_1} \approx \frac{1}{N^2}$
For Vt > 0, power reduction is less and there will be an optimum value of N.

Reduced-Power Shift Register
[Figure: two rows of D flip-flops clocked by CK(f/2), combined by a multiplexer that drives Output.]
Flip-flops are operated at full voltage and half the clock frequency.

Power Consumption of Shift Reg.
$P = C' V_{DD}^2 f / n$
16-bit shift register, 2μ CMOS

  Deg. of parallelism   Freq (MHz)   Power (μW)
  1                     33.0         1535
  2                     16.5         887
  4                     8.25         738

[Figure: normalized power (0.0 to 1.0) versus degree of parallelism, n = 1, 2, 4.]
C. Piguet, “Circuit and Logic Level Design,” pages 103-133 in W. Nebel and J. Mermet (ed.), Low Power Design in Deep Submicron Electronics, Boston: Kluwer Academic Publishers, 1997.

Multicore Processors
• D. Geer, “Chip Makers Turn to Multicore Processors,” Computer, vol. 38, no. 5, pp. 11-13, May 2005.
• A. Jerraya, H. Tenhunen and W. Wolf, “Multiprocessor Systems-on-Chips,” Computer, vol. 38, no. 7, pp. 36-40, July 2005; this special issue contains three more articles on multicore processors.

Multicore Processors
[Figure: performance based on SPECint2000 and SPECfp2000 benchmarks for multicore and single-core processors, 2000 to 2008. Source: Computer, May 2005, p. 12.]
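
The toggle counts quoted for the binary and Gray-code counters can be checked by enumeration. The following is a minimal Python sketch, not part of the lecture; the function names are illustrative, and the Gray sequence is assumed to be the standard reflected code (i XOR i>>1), which matches the three-bit sequence on the slide.

```python
def binary_sequence(n_bits):
    """States of an n-bit binary up-counter over one counting cycle."""
    return list(range(2 ** n_bits))


def gray_sequence(n_bits):
    """States of an n-bit reflected Gray-code counter (assumed code)."""
    return [i ^ (i >> 1) for i in range(2 ** n_bits)]


def toggles_per_cycle(states):
    """Total bit toggles over one full cycle, including the wrap-around."""
    total = 0
    for idx, state in enumerate(states):
        nxt = states[(idx + 1) % len(states)]
        total += bin(state ^ nxt).count("1")  # Hamming distance
    return total


def toggle_table(max_bits):
    """Rows of (bits, T_binary, T_gray, T_gray / T_binary)."""
    rows = []
    for n in range(1, max_bits + 1):
        t_bin = toggles_per_cycle(binary_sequence(n))
        t_gray = toggles_per_cycle(gray_sequence(n))
        rows.append((n, t_bin, t_gray, t_gray / t_bin))
    return rows


if __name__ == "__main__":
    for n, t_bin, t_gray, ratio in toggle_table(6):
        print(f"{n}  {t_bin:4d}  {t_gray:4d}  {ratio:.4f}")
```

The enumerated totals agree with the closed forms 2(2^N - 1) and 2^N for the sizes listed in the table.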
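
The bus-inversion idea from the Bus Encoding slides can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the exact circuit from the Stan and Burleson paper: the bus is assumed to start at all zeros, only the data lines (not the polarity line) are counted when deciding whether to invert, and the names `bus_invert_encode` and `bus_invert_decode` are made up for this example.

```python
def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def bus_invert_encode(words, n_bits):
    """Return (bus_value, polarity_bit) pairs for a stream of data words.

    A word is sent inverted (polarity 1) when sending it as-is would
    toggle more than n_bits / 2 data lines.
    """
    mask = (1 << n_bits) - 1
    prev_bus = 0  # assumption: the bus starts at all zeros
    encoded = []
    for word in words:
        word &= mask
        if hamming(prev_bus, word) > n_bits / 2:
            bus, polarity = word ^ mask, 1  # invert every data bit
        else:
            bus, polarity = word, 0
        encoded.append((bus, polarity))
        prev_bus = bus
    return encoded


def bus_invert_decode(encoded, n_bits):
    """Recover the original words at the receiver."""
    mask = (1 << n_bits) - 1
    return [bus ^ mask if polarity else bus for bus, polarity in encoded]


def data_line_transitions(bus_values, start=0):
    """Total data-line toggles for a sequence of bus values."""
    total, prev = 0, start
    for value in bus_values:
        total += hamming(prev, value)
        prev = value
    return total
```

For the slide's example, encoding the single word 1110 on a four-bit bus that currently holds 0000 puts 0001 on the bus with the polarity bit set, so one data line toggles instead of three.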
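
The power-based cost function on the FSM State Encoding slide is a probability-weighted sum of Hamming distances between state codes. The sketch below reproduces the two totals (1.6 and 1.0). The state names A, B, C and the direction of each edge are assumptions made for illustration, since the original graph is not fully recoverable; only the probabilities 0.3, 0.4, 0.1, 0.1 and the two encodings come from the slide. Self-loops are omitted because they toggle no state bits.

```python
def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def expected_transitions(edges, encoding):
    """Expected state-bit toggles per clock for a given state encoding.

    edges: iterable of (source, destination, probability)
    encoding: dict mapping state name to its binary code
    """
    return sum(p * hamming(encoding[s], encoding[d]) for s, d, p in edges)


# Hypothetical edge list: probabilities from the slide, directions assumed.
EDGES = [
    ("A", "B", 0.3),
    ("B", "A", 0.4),
    ("A", "C", 0.1),
    ("C", "B", 0.1),
]

ENCODING_1 = {"A": 0b11, "B": 0b00, "C": 0b01}  # first encoding on the slide
ENCODING_2 = {"A": 0b01, "B": 0b00, "C": 0b11}  # second encoding on the slide

if __name__ == "__main__":
    print(round(expected_transitions(EDGES, ENCODING_1), 3))
    print(round(expected_transitions(EDGES, ENCODING_2), 3))
```

The second encoding places the two most likely transitions on codes that differ in a single bit, which is why its cost is lower.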
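
The Power, Voltage vs. Speed, and Extreme Case slides combine into a small calculation: find the reduced supply voltage at which a gate is N times slower, then apply the power ratio [1 + δ(N - 1)](V_N/Vref)^2. The following is a rough sketch using only the first-order delay model T ∝ V/(V - Vt)^2 from the slides; the function names and the bisection solver are choices made for this example, and the resulting voltages should not be expected to match the measured 1.2μ CMOS curve (for instance the V2 = 2.9V point).

```python
def gate_delay(v, vt):
    """Relative gate delay from the first-order model T ~ V / (V - Vt)^2."""
    return v / (v - vt) ** 2


def scaled_voltage(n, v_ref, vt, iterations=60):
    """Supply voltage at which gate delay is n times the delay at v_ref.

    Solved by bisection on (vt, v_ref]; delay decreases monotonically
    with voltage in that interval.
    """
    target = n * gate_delay(v_ref, vt)
    lo, hi = vt, v_ref
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if gate_delay(mid, vt) > target:
            lo = mid  # too slow: voltage must be higher
        else:
            hi = mid  # fast enough: voltage can be lower
    return hi


def power_ratio(n, v_ref, vt, delta):
    """P_N / P_1 = [1 + delta (N - 1)] (V_N / V_ref)^2."""
    v_n = scaled_voltage(n, v_ref, vt)
    return (1.0 + delta * (n - 1)) * (v_n / v_ref) ** 2


if __name__ == "__main__":
    # delta = 0.1 is an arbitrary illustrative overhead factor.
    for n in range(1, 13):
        print(n, round(power_ratio(n, v_ref=5.0, vt=0.8, delta=0.1), 3))
```

With Vt = 0 the solver returns Vref/N and the ratio reduces to [1 + δ(N - 1)]/N^2, as in the extreme case. With Vt > 0 the voltage cannot drop as far, so the overhead term eventually dominates and the ratio has a minimum at some finite N.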