Lecture 10: Reducing Power through Multicore

ELEC 7770
Advanced VLSI Design
Spring 2007
Reducing Power through Multicore Parallelism
Vishwani D. Agrawal
James J. Danaher Professor
ECE Department, Auburn University
Auburn, AL 36849
[email protected]
http://www.eng.auburn.edu/~vagrawal/COURSE/E7770_Spr07
Spring 07, Feb 20
ELEC 7770: Advanced VLSI Design (Agrawal)
1
Power Dissipation in CMOS Logic (0.25µ)
$P_{total}(0 \to 1) = C_L V_{DD}^2 + t_{sc} V_{DD} I_{peak} + V_{DD} I_{leakage}$
Approximate share of each term: switching ($C_L V_{DD}^2$) 75%, short-circuit ($t_{sc} V_{DD} I_{peak}$) 20%, leakage ($V_{DD} I_{leakage}$) 5%.
Low-Power Datapath Architecture
• Lower supply voltage
  • This slows down circuit speed
  • Use parallel computing to gain the speed back
• Works well when threshold voltage is also lowered.
• About 60% reduction in power obtainable.
Reference: A. P. Chandrakasan and R. W. Brodersen, Low Power Digital CMOS Design, Boston: Kluwer Academic Publishers (now Springer), 1995.
A Reference Datapath
[Figure: input register, combinational logic and output register between Input and Output, clocked by CK; switched capacitance Cref]
Supply voltage = $V_{ref}$
Total capacitance switched per cycle = $C_{ref}$
Clock frequency = $f$
Power consumption: $P_{ref} = C_{ref} V_{ref}^2 f$
A Parallel Architecture
[Figure: N copies of the combinational logic (Copy 1 to Copy N) with registers clocked at f/N, an N-to-1 multiplexer followed by a register clocked at f, and a multiphase clock generator and mux control driven by CK]
N = degree of parallelism
Each copy processes every Nth input and operates at reduced voltage.
Supply voltage: $V_N \le V_1 = V_{ref}$
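
The round-robin operation of the parallel architecture can be sketched in a few lines of Python. This is only a behavioral illustration, not a circuit model: the function names and the `square` stand-in for the combinational logic are hypothetical, and timing and voltage are not simulated.

```python
from typing import Callable, List


def parallel_datapath(inputs: List[int], logic: Callable[[int], int], n: int) -> List[int]:
    """Model N copies of the logic, each handling every Nth input."""
    # Copy k receives inputs k, k+N, k+2N, ... (it is clocked at f/N).
    per_copy = [[logic(x) for x in inputs[k::n]] for k in range(n)]

    # The N-to-1 multiplexer selects the copies in rotation at the full rate f,
    # which restores the original sample order.
    outputs = []
    for i in range(len(inputs)):
        outputs.append(per_copy[i % n][i // n])
    return outputs


def square(x: int) -> int:
    """Hypothetical stand-in for the combinational logic block."""
    return x * x


samples = list(range(10))
reference = [square(x) for x in samples]            # single datapath at frequency f
parallel = parallel_datapath(samples, square, n=4)  # four copies, each at f/4
print(parallel == reference)
```

The comparison checks that the multiplexed output stream equals what a single datapath at frequency f would produce, which is the functional requirement the parallel version has to meet.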
Level Converter: L to H
[Figure: level-converter circuit with input Vin_L, output Vout_H and supplies VDDL and VDDH; annotation: transistors with thicker oxide and longer channels]
N. H. E. Weste and D. Harris, CMOS VLSI Design, Third Edition, Section 12.4.3, Addison-Wesley, 2005.
Level Converter: H to L
[Figure: level-converter circuit with input Vin_H, output Vout_L and supply VDDL; annotation: transistors with thicker oxide and longer channels]
N. H. E. Weste and D. Harris, CMOS VLSI Design, Third Edition, Section 12.4.3, Addison-Wesley, 2005.
Control Signals, N = 4
[Figure: timing diagram of the clock CK and the four control signals Phase 1, Phase 2, Phase 3 and Phase 4]
Power
$P_N = P_{proc} + P_{overhead}$
$P_{proc} = N (C_{inreg} + C_{comb}) V_N^2 \frac{f}{N} + C_{outreg} V_N^2 f = (C_{inreg} + C_{comb} + C_{outreg}) V_N^2 f = C_{ref} V_N^2 f$
$P_{overhead} = C_{overhead} V_N^2 f \approx \delta C_{ref} (N - 1) V_N^2 f$
$P_N = [1 + \delta (N - 1)] C_{ref} V_N^2 f$
$\frac{P_N}{P_1} = [1 + \delta (N - 1)] \frac{V_N^2}{V_{ref}^2}$
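
As a quick numerical illustration of the power ratio, the sketch below evaluates $P_N/P_1$ for one operating point. The overhead fraction `delta = 0.1` and the choice of two copies at 2.9 V against a 5 V reference are assumptions made for the example only.

```python
def power_ratio(n: int, v_n: float, v_ref: float, delta: float) -> float:
    """P_N / P_1 = [1 + delta*(N - 1)] * (V_N / V_ref)**2."""
    return (1.0 + delta * (n - 1)) * (v_n / v_ref) ** 2


# Illustrative numbers: two copies at 2.9 V against a 5 V reference,
# with an assumed overhead fraction delta = 0.1.
ratio = power_ratio(n=2, v_n=2.9, v_ref=5.0, delta=0.1)
print(f"P_2/P_1 = {ratio:.3f}")  # prints "P_2/P_1 = 0.370"
```

The overhead term raises the power by 10% here, while the quadratic voltage term cuts it to about a third, so the voltage reduction dominates.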
Voltage vs. Speed
Delay of a gate: $T \approx \frac{C_L V_{ref}}{I} = \frac{C_L V_{ref}}{k (W/L) (V_{ref} - V_t)^2}$
where I is saturation current
k is a technology parameter
W/L is width to length ratio of transistor
Vt is threshold voltage
[Figure: normalized gate delay T (0.0 to 4.0) versus supply voltage for 1.2μ CMOS, marking N = 1 at Vref = 5V, N = 2 at V2 = 2.9V and N = 3 at V3. Voltage reduction slows down as we get closer to Vt.]
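
Under the delay model above, the supply voltage that makes a gate N times slower than at $V_{ref}$ can be worked out in closed form. The derivation below is a sketch that assumes the idealized delay expression holds over the whole voltage range; the constant K is introduced here only for convenience, and the result need not reproduce voltages read off a measured delay curve.

```latex
\frac{V_N}{(V_N - V_t)^2} = N\,\frac{V_{ref}}{(V_{ref} - V_t)^2} \equiv K
\;\Longrightarrow\;
K V_N^2 - (2 K V_t + 1)\,V_N + K V_t^2 = 0
\;\Longrightarrow\;
V_N = \frac{(2 K V_t + 1) + \sqrt{4 K V_t + 1}}{2K}
```

The larger root is taken because the supply must stay above $V_t$. For $V_t = 0$ the expression reduces to $V_N = 1/K = V_{ref}/N$.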
Increasing Multiprocessing
[Figure: PN/P1 (0.0 to 1.0) versus N (1 to 12) for 1.2μ CMOS, Vref = 5V, with curves for Vt = 0.8V, Vt = 0.4V and Vt = 0V (extreme case)]
Extreme Cases: Vt = 0
Delay: $T \propto 1/V_{ref}$
For N processing elements, delay = NT, so $V_N = V_{ref}/N$
$\frac{P_N}{P_1} = [1 + \delta (N - 1)] \frac{1}{N^2}$, which falls off as 1/N (approaching $\delta/N$) for large N
For negligible overhead, $\delta \to 0$: $\frac{P_N}{P_1} \approx \frac{1}{N^2}$
For Vt > 0, power reduction is less and there will be an optimum value of N.
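
The effect of a nonzero threshold voltage can be explored with a small sweep over N. This is a minimal sketch, assuming the idealized gate delay model T ∝ V/(V − Vt)² and an illustrative overhead fraction `DELTA = 0.1`; the helper names are hypothetical and the numbers are not taken from measured 1.2μ CMOS data.

```python
import math


def supply_for_n(n: int, v_ref: float, v_t: float) -> float:
    """Supply voltage at which gate delay is N times the delay at v_ref.

    Uses the delay model T ~ V / (V - Vt)**2 and takes the root above Vt.
    """
    k = n * v_ref / (v_ref - v_t) ** 2
    return ((2 * k * v_t + 1) + math.sqrt(4 * k * v_t + 1)) / (2 * k)


def power_ratio(n: int, v_ref: float, v_t: float, delta: float) -> float:
    """P_N / P_1 = [1 + delta*(N - 1)] * (V_N / V_ref)**2."""
    v_n = supply_for_n(n, v_ref, v_t)
    return (1 + delta * (n - 1)) * (v_n / v_ref) ** 2


V_REF = 5.0    # reference supply in volts
DELTA = 0.1    # assumed overhead fraction per extra copy (illustrative)

for v_t in (0.0, 0.4, 0.8):
    ratios = {n: power_ratio(n, V_REF, v_t, DELTA) for n in range(1, 13)}
    best_n = min(ratios, key=ratios.get)
    print(f"Vt = {v_t:.1f} V: best N = {best_n}, P_N/P_1 = {ratios[best_n]:.3f}")
```

For Vt = 0 the helper returns Vref/N, so the ratio reduces to [1 + δ(N − 1)]/N². For Vt > 0 the supply cannot drop below Vt, so the voltage saving flattens out while the overhead keeps growing with N.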
Example: Multiplier Core
• Specification:
  • 200MHz clock
  • 15W dissipation @ 5V
  • Low voltage operation, VDD ≥ 1.5 volts
  • Relative clock rate = $\frac{(V_{DD} - 0.5)^2}{20.25}$
• Problem:
  • Integrate multiplier core on a SOC
  • Power budget for multiplier ~ 5W
A Multicore Design
[Figure: the parallel architecture with five multiplier cores (Core 1 to Core 5), a 5-to-1 mux, and a multiphase clock generator and mux control; core registers are clocked at 40MHz, while the output and the clock CK run at 200MHz]
Core clock frequency = 200/N MHz, N should divide 200.
How Many Cores?
• For N cores:
  • Clock frequency = 200/N MHz
  • Supply voltage, $V_{DDN} = 0.5 + (20.25/N)^{1/2}$ volts
  • Assuming 10% overhead per core, power dissipation = $15 [1 + 0.1 (N - 1)] \left(\frac{V_{DDN}}{5}\right)^2$ watts
Design Tradeoffs
Number of cores N   Clock (MHz)   Core supply VDDN (Volts)   Total Power (Watts)
1                   200           5.00                       15.0
2                   100           3.68                       8.94
4                   50            2.75                       5.90
5                   40            2.51                       5.29
8                   25            2.10                       4.50
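
The tradeoff figures follow from the multiplier specification and the 10% overhead assumption, and can be recomputed with a short script. This is a sketch with hypothetical function names; because it uses unrounded voltages, the last digit of some entries may differ slightly from the tabulated values.

```python
import math

F_REF_MHZ = 200.0   # reference clock from the specification
P_REF_W = 15.0      # dissipation of one core at 5 V
V_REF = 5.0         # reference supply in volts
OVERHEAD = 0.1      # 10% overhead per additional core


def core_supply(n: int) -> float:
    """Supply needed for a relative clock rate of 1/N: (V - 0.5)^2 / 20.25 = 1/N."""
    return 0.5 + math.sqrt(20.25 / n)


def total_power(n: int) -> float:
    """Total power in watts for N cores, including the per-core overhead."""
    return P_REF_W * (1 + OVERHEAD * (n - 1)) * (core_supply(n) / V_REF) ** 2


print(f"{'N':>2} {'Clock (MHz)':>12} {'VDDN (V)':>9} {'Power (W)':>10}")
for n in (1, 2, 4, 5, 8):
    print(f"{n:>2} {F_REF_MHZ / n:>12.0f} {core_supply(n):>9.2f} {total_power(n):>10.2f}")
```

Extending the loop to other divisors of 200 shows how the growing overhead term eventually offsets the shrinking voltage term.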
Power Reduction in Processors
• Just about everything is used.
• Hardware methods:
  • Voltage reduction for dynamic power
  • Dual-threshold devices for leakage reduction
  • Clock gating, frequency reduction
  • Sleep mode
• Architecture:
  • Instruction set
  • Hardware organization
• Software methods
Parallel Architecture
[Figure: a single processor with input and output at frequency f, next to two processors in parallel, each running at f/2, with the combined input and output at f]
Single processor: Capacitance = C, Voltage = V, Frequency = f, Power = $C V^2 f$
Two parallel processors: Capacitance = 2.2C, Voltage = 0.6V, Frequency = 0.5f, Power = $0.396\, C V^2 f$
Pipeline Architecture
[Figure: a processor between input and output registers, next to a two-stage pipeline built from two half processors (½ Proc.) separated by a register; both run at frequency f]
Single processor: Capacitance = C, Voltage = V, Frequency = f, Power = $C V^2 f$
Two-stage pipeline: Capacitance = 1.2C, Voltage = 0.6V, Frequency = f, Power = $0.432\, C V^2 f$
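
The power figures for the two-processor and two-stage pipeline examples follow directly from scaling C, V and f in $P = C V^2 f$. The sketch below recomputes them; `relative_power` is a hypothetical helper name.

```python
def relative_power(cap_scale: float, volt_scale: float, freq_scale: float) -> float:
    """Power relative to C*V^2*f when C, V and f are each scaled."""
    return cap_scale * volt_scale ** 2 * freq_scale


# Scale factors taken from the two-processor and two-stage pipeline examples.
parallel = relative_power(cap_scale=2.2, volt_scale=0.6, freq_scale=0.5)
pipeline = relative_power(cap_scale=1.2, volt_scale=0.6, freq_scale=1.0)

print(f"parallel: {parallel:.3f} CV^2f")  # prints "parallel: 0.396 CV^2f"
print(f"pipeline: {pipeline:.3f} CV^2f")  # prints "pipeline: 0.432 CV^2f"
```

Both options rely on the same voltage reduction to 0.6V. The parallel version pays in capacitance (2.2C) but halves the clock, while the pipeline keeps the clock at f with only 1.2C.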
Approximate Trend
              n-parallel proc.   n-stage pipeline proc.
Capacitance   nC                 C
Voltage       V/n                V/n
Frequency     f/n                f
Power         CV²f/n²            CV²f/n²
Chip area     n times            10-20% increase
G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Kluwer Academic Publishers, 1998.
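
Both power entries in the trend table can be checked by substituting the scaled capacitance, voltage and frequency into $P = C V^2 f$. The subscripts below are labels introduced for this check only.

```latex
P_{parallel} = (nC)\left(\frac{V}{n}\right)^{2}\frac{f}{n} = \frac{C V^{2} f}{n^{2}},
\qquad
P_{pipeline} = C\left(\frac{V}{n}\right)^{2} f = \frac{C V^{2} f}{n^{2}}
```

The two approaches reach the same approximate power, so the difference in the table lies in the chip area.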
Multicore Processors
[Figure: performance based on SPECint2000 and SPECfp2000 benchmarks, comparing multicore and single-core processors over 2000 to 2008. Source: Computer, May 2005, p. 12]
Multicore Processors
• D. Geer, “Chip Makers Turn to Multicore Processors,” Computer, vol. 38, no. 5, pp. 11-13, May 2005.
• A. Jerraya, H. Tenhunen and W. Wolf, “Multiprocessor Systems-on-Chips,” Computer, vol. 38, no. 7, pp. 36-40, July 2005; this special issue contains three more articles on multicore processors.
• S. K. Moore, “Winner: Multimedia Monster – Cell’s Nine Processors Make It a Supercomputer on a Chip,” IEEE Spectrum, vol. 43, no. 1, pp. 20-23, January 2006.
Cell - Cell Broadband Engine Architecture
Nine-processor chip: 192 Gflops
[Photo, left to right: Atsushi Kameyama, Toshiba; James Kahle, IBM; Masakazu Suzoki, Sony. © IEEE Spectrum, January 2006]
Cell’s Nine-Processor Chip
[Figure © IEEE Spectrum, January 2006]
Eight identical processors: f = 5.6GHz (max), 44.8 Gflops