ELEC 7770 Advanced VLSI Design, Spring 2007
Reducing Power through Multicore Parallelism

Vishwani D. Agrawal
James J. Danaher Professor
ECE Department, Auburn University
Auburn, AL 36849
http://www.eng.auburn.edu/~vagrawal/COURSE/E7770_Spr07
Spring 07, Feb 20, ELEC 7770: Advanced VLSI Design (Agrawal)

Power Dissipation in CMOS Logic (0.25µ)
$P_{total}(0 \to 1) = C_L V_{DD}^2 + t_{sc} V_{DD} I_{peak} + V_{DD} I_{leakage}$
Approximate shares of the total: switching (dynamic) 75%, short-circuit 20%, leakage 5%.

Low-Power Datapath Architecture
Lowering the supply voltage reduces power, but it slows down the circuit.
Parallel computing is used to gain the speed back.
The approach works well when the threshold voltage is also lowered.
About 60% reduction in power is obtainable.
Reference: A. P. Chandrakasan and R. W. Brodersen, Low Power Digital CMOS Design, Boston: Kluwer Academic Publishers (now Springer), 1995.

A Reference Datapath
[Figure: Input → Register → Combinational logic → Register → Output, both registers clocked by CK.]
Supply voltage = $V_{ref}$
Total capacitance switched per cycle = $C_{ref}$
Clock frequency = $f$
Power consumption: $P_{ref} = C_{ref} V_{ref}^2 f$

A Parallel Architecture
[Figure: N copies of the combinational logic (Copy 1 to Copy N), each between registers clocked at f/N; a multiphase clock generator and mux control drives an N-to-1 multiplexer that feeds the output register, clocked at f by CK.]
N = degree of parallelism.
Each copy processes every Nth input and operates at f/N with a reduced supply voltage: $V_N \le V_1 = V_{ref}$.

Level Converter: L to H
[Figure: level converter from input Vin_L (supply VDDL) to output Vout_H (supply VDDH), using transistors with thicker oxide and longer channels.]
N. H. E. Weste and D. Harris, CMOS VLSI Design, Third Edition, Section 12.4.3, Addison-Wesley, 2005.

Level Converter: H to L
[Figure: level converter from input Vin_H to output Vout_L on supply VDDL, using transistors with thicker oxide and longer channels.]
N. H. E. Weste and D. Harris, CMOS VLSI Design, Third Edition, Section 12.4.3, Addison-Wesley, 2005.

Control Signals, N = 4
[Figure: waveforms of CK and the four clock phases, Phase 1 to Phase 4.]

Power
$P_N = P_{proc} + P_{overhead}$
$P_{proc} = N (C_{inreg} + C_{comb}) V_N^2 \frac{f}{N} + C_{outreg} V_N^2 f = (C_{inreg} + C_{comb} + C_{outreg}) V_N^2 f = C_{ref} V_N^2 f$
$P_{overhead} = C_{overhead} V_N^2 f \approx \delta C_{ref} (N - 1) V_N^2 f$
where $\delta$ is the overhead capacitance added by each additional copy, as a fraction of $C_{ref}$. Hence
$P_N = [1 + \delta (N - 1)] C_{ref} V_N^2 f$
$\frac{P_N}{P_1} = [1 + \delta (N - 1)] \frac{V_N^2}{V_{ref}^2}$

Voltage vs. Speed
Delay of a gate: $T \approx \frac{C_L V_{ref}}{I} = \frac{C_L V_{ref}}{k (W/L)(V_{ref} - V_t)^2}$
where I is the saturation current, k is a technology parameter, W/L is the width-to-length ratio of the transistor, and $V_t$ is the threshold voltage.
[Figure: normalized gate delay T (0 to 4.0) vs. supply voltage for 1.2μ CMOS, with delay levels N = 1, 2, 3 and marked voltages $V_t$, $V_3$, $V_2$ = 2.9 V and $V_{ref}$ = 5 V.]
Voltage reduction slows down as we get closer to $V_t$.

Increasing Multiprocessing
[Figure: $P_N/P_1$ (0 to 1.0) vs. N = 1 to 12 for 1.2μ CMOS with $V_{ref}$ = 5 V; curves for $V_t$ = 0.8 V, $V_t$ = 0.4 V and $V_t$ = 0 V (extreme case).]

Extreme Cases: $V_t = 0$
Delay $T \propto 1/V_{ref}$.
For N processing elements, the allowed delay is NT, so $V_N = V_{ref}/N$ and
$\frac{P_N}{P_1} = [1 + \delta (N - 1)] \frac{1}{N^2}$, which approaches $\delta/N$, i.e., falls off as 1/N, for large N.
For negligible overhead, $\delta \to 0$: $\frac{P_N}{P_1} \approx \frac{1}{N^2}$
For $V_t > 0$, the power reduction is smaller and there will be an optimum value of N.

Example: Multiplier Core
Specification:
  200 MHz clock
  15 W dissipation at 5 V
  Low-voltage operation, $V_{DD} \ge 1.5$ V
  Relative clock rate $= \frac{(V_{DD} - 0.5)^2}{20.25}$
Problem:
  Integrate the multiplier core on an SoC
  Power budget for the multiplier ≈ 5 W

A Multicore Design
[Figure: five multiplier cores (Multiplier Core 1 to Multiplier Core 5), each between registers clocked at 40 MHz; a multiphase clock generator and mux control drives a 5-to-1 multiplexer that feeds the output register, clocked at 200 MHz by CK.]
Core clock frequency = 200/N MHz, so N should divide 200.

How Many Cores?
For N cores:
  Clock frequency = 200/N MHz
  Supply voltage: $V_{DDN} = 0.5 + (20.25/N)^{1/2}$ V, obtained by setting the relative clock rate $(V_{DDN} - 0.5)^2/20.25$ equal to 1/N
  Assuming 10% overhead per core ($\delta = 0.1$): power dissipation $= 15 [1 + 0.1 (N - 1)] \left(\frac{V_{DDN}}{5}\right)^2$ W

Design Tradeoffs
N (cores)   Clock (MHz)   Core supply V_DDN (V)   Total power (W)
1           200           5.00                    15.0
2           100           3.68                    8.94
4           50            2.75                    5.90
5           40            2.51                    5.29
8           25            2.10                    4.50

Power Reduction in Processors
Just about everything is used.
Hardware methods:
  Voltage reduction for dynamic power
  Dual-threshold devices for leakage reduction
  Clock gating, frequency reduction
  Sleep mode
Architecture:
  Instruction set
  Hardware organization
Software methods

Parallel Architecture
[Figure: one processor (Input → Processor → Output at f) compared with two processors, each clocked at f/2, whose outputs are combined into one output at f.]
Single processor: capacitance = C, voltage = V, frequency = f, power = $CV^2 f$
Two parallel processors: capacitance = 2.2C, voltage = 0.6V, frequency = 0.5f, power = $2.2C \times (0.6V)^2 \times 0.5f = 0.396\,CV^2 f$

Pipeline Architecture
[Figure: one processor between input and output registers clocked at f, compared with the processor split into two half-processor stages separated by a pipeline register, all clocked at f.]
Single processor: capacitance = C, voltage = V, frequency = f, power = $CV^2 f$
Two-stage pipeline: capacitance = 1.2C, voltage = 0.6V, frequency = f, power = $1.2C \times (0.6V)^2 \times f = 0.432\,CV^2 f$

Approximate Trend
              n-parallel proc.    n-stage pipeline proc.
Capacitance   nC                  C
Voltage       V/n                 V/n
Frequency     f/n                 f
Power         $CV^2 f/n^2$        $CV^2 f/n^2$
Chip area     n times             10-20% increase
G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Kluwer Academic Publishers, 1998.
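The arithmetic behind the multiplier-core example and the parallel/pipeline comparison can be checked with a short script. This is a minimal sketch, not the course's own tool: it assumes the slide's speed model (relative clock rate $(V_{DD} - 0.5)^2/20.25$), 10% overhead per added core ($\delta = 0.1$) and dynamic power $CV^2 f$. The function and constant names are hypothetical. It keeps only core counts that divide 200 and respect $V_{DD} \ge 1.5$ V.

```python
import math
from typing import List, Optional, Tuple

# Assumed parameters, taken from the multiplier-core example.
F_REF_MHZ = 200   # single-core clock frequency (MHz)
P_REF_W = 15.0    # single-core power at V_REF (W)
V_REF = 5.0       # single-core supply voltage (V)
DELTA = 0.1       # overhead capacitance per extra core, as a fraction of C_ref
V_MIN = 1.5       # lowest supply voltage the core tolerates (V)
BUDGET_W = 5.0    # power budget for the multiplier on the SoC (W)


def relative_power(cap: float, volt: float, freq: float) -> float:
    """Dynamic power C * V^2 * f, each factor relative to the single-processor design."""
    return cap * volt ** 2 * freq


def core_supply(n: int) -> float:
    """Supply V_DDN at which a core runs at 1/n of the 200 MHz rate.

    Solves (V - 0.5)**2 / 20.25 = 1/n for V (the relative clock rate model).
    """
    return 0.5 + math.sqrt(20.25 / n)


def total_power(n: int) -> float:
    """P_N = P_1 * [1 + delta * (N - 1)] * (V_DDN / V_ref)**2, in watts."""
    ratio = core_supply(n) / V_REF
    return P_REF_W * (1 + DELTA * (n - 1)) * ratio ** 2


def tradeoff_table(max_n: int = 8) -> List[Tuple[int, float, float, float]]:
    """Rows (N, core clock in MHz, V_DDN in V, total power in W) for feasible N."""
    rows = []
    for n in range(1, max_n + 1):
        if F_REF_MHZ % n != 0:    # core clock 200/N must come out even
            continue
        v = core_supply(n)
        if v < V_MIN:             # below the core's low-voltage limit
            continue
        rows.append((n, F_REF_MHZ / n, v, total_power(n)))
    return rows


def smallest_n_within(budget_w: float, max_n: int = 20) -> Optional[int]:
    """Smallest feasible core count whose total power fits the budget."""
    for n, _, _, p in tradeoff_table(max_n):
        if p <= budget_w:
            return n
    return None


if __name__ == "__main__":
    # Two-processor parallel design and two-stage pipeline, relative to CV^2f.
    print(f"parallel: {relative_power(2.2, 0.6, 0.5):.3f} CV^2f")
    print(f"pipeline: {relative_power(1.2, 0.6, 1.0):.3f} CV^2f")

    print(f"{'N':>2} {'Clock (MHz)':>11} {'V_DDN (V)':>9} {'Power (W)':>9}")
    for n, f, v, p in tradeoff_table():
        print(f"{n:>2} {f:>11.0f} {v:>9.2f} {p:>9.2f}")
    print("Smallest N within budget:", smallest_n_within(BUDGET_W))
```

Run as a script, it prints 0.396 and 0.432 for the parallel and pipeline designs, lists rows for N = 1, 2, 4, 5, 8, and reports N = 8 as the smallest core count within the 5 W budget. Because it does not round $V_{DDN}$ before squaring, it gives 8.95 W for N = 2 and 2.09 V / 4.46 W for N = 8, a last-digit difference from the Design Tradeoffs table.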

Multicore Processors
[Figure: performance based on SPECint2000 and SPECfp2000 benchmarks, single core vs. multicore, 2000 to 2008; source: Computer, May 2005, p. 12.]

Multicore Processors
D. Geer, "Chip Makers Turn to Multicore Processors," Computer, vol. 38, no. 5, pp. 11-13, May 2005.
A. Jerraya, H. Tenhunen and W. Wolf, "Multiprocessor Systems-on-Chips," Computer, vol. 38, no. 7, pp. 36-40, July 2005; this special issue contains three more articles on multicore processors.
S. K. Moore, "Winner Multimedia Monster – Cell's Nine Processors Make It a Supercomputer on a Chip," IEEE Spectrum, vol. 43, no. 1, pp. 20-23, January 2006.

Cell: Cell Broadband Engine Architecture
© IEEE Spectrum, January 2006
Nine-processor chip: 192 Gflops
[Photo, left to right: Atsushi Kameyama, Toshiba; James Kahle, IBM; Masakazu Suzoki, Sony.]

Cell's Nine-Processor Chip
© IEEE Spectrum, January 2006
Eight identical processors, f = 5.6 GHz (max), 44.8 Gflops