Lecture 17 - Auburn Engineering

ELEC 5970-001/6970-001(Fall 2005)
Special Topics in Electrical Engineering
Low-Power Design of Electronic Circuits
Low-Power Logic Design
and Parallelism
Vishwani D. Agrawal
James J. Danaher Professor
Department of Electrical and Computer Engineering
Auburn University
http://www.eng.auburn.edu/~vagrawal
11/01/05
ELEC 5970-001/6970-001 Lecture 17
1
State Encoding
• Two-bit binary counter:
• State sequence, 00→01→10→11→00
• Six bit transitions in four clock cycles
• 6/4 = 1.5 transitions per clock
• Two-bit Gray-code counter
• State sequence, 00→01→11→10→00
• Four bit transitions in four clock cycles
• 4/4 = 1.0 transition per clock
• Gray-code counter is more power efficient.
G. K. Yeap, Practical Low Power Digital VLSI Design, Boston:
Kluwer Academic Publishers (now Springer), 1998.
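The transition counts above can be checked mechanically. The following is a minimal Python sketch, not part of the original slides; the function names are illustrative, and the Gray sequence is assumed to be the standard reflected code, which matches the 00→01→11→10 sequence listed above.

```python
def binary_sequence(n_bits):
    """States of an n-bit binary up-counter over one full cycle."""
    return list(range(2 ** n_bits))


def gray_sequence(n_bits):
    """States of an n-bit reflected Gray-code counter over one full cycle."""
    return [i ^ (i >> 1) for i in range(2 ** n_bits)]


def toggles_per_cycle(states):
    """Total bit flips over one counting cycle, including the wrap-around."""
    total = 0
    for k, current in enumerate(states):
        nxt = states[(k + 1) % len(states)]
        total += bin(current ^ nxt).count("1")  # Hamming distance
    return total


binary_toggles = toggles_per_cycle(binary_sequence(2))
gray_toggles = toggles_per_cycle(gray_sequence(2))
print(binary_toggles, binary_toggles / 4)  # 6 1.5
print(gray_toggles, gray_toggles / 4)      # 4 1.0
```

Calling the same functions with `n_bits=3` gives the totals for a three-bit counter.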
Three-Bit Counters
| Binary state | No. of toggles | Gray-code state | No. of toggles |
|---|---|---|---|
| 000 | - | 000 | - |
| 001 | 1 | 001 | 1 |
| 010 | 2 | 011 | 1 |
| 011 | 1 | 010 | 1 |
| 100 | 3 | 110 | 1 |
| 101 | 1 | 111 | 1 |
| 110 | 2 | 101 | 1 |
| 111 | 1 | 100 | 1 |
| 000 | 3 | 000 | 1 |
N-Bit Counter: Toggles in Counting Cycle
• Binary counter: $T(\text{binary}) = 2(2^N - 1)$
• Gray-code counter: $T(\text{gray}) = 2^N$
• $T(\text{gray})/T(\text{binary}) = 2^{N-1}/(2^N - 1) \to 0.5$

| Bits | T(binary) | T(gray) | T(gray)/T(binary) |
|---|---|---|---|
| 1 | 2 | 2 | 1.0 |
| 2 | 6 | 4 | 0.6667 |
| 3 | 14 | 8 | 0.5714 |
| 4 | 30 | 16 | 0.5333 |
| 5 | 62 | 32 | 0.5161 |
| 6 | 126 | 64 | 0.5079 |
| ∞ | - | - | 0.5000 |
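The closed forms on this slide (with N read as an exponent) follow from counting how often each bit flips. The short derivation below is a sketch added for clarity; the bit index i, with i = 0 as the least significant bit, is notation introduced here and does not appear in the original slides.

```latex
% Binary counter: bit i (i = 0 is the LSB) flips once every 2^i clocks,
% so over a full cycle of 2^N clocks it flips 2^N / 2^i = 2^{N-i} times.
\begin{align*}
T(\text{binary}) &= \sum_{i=0}^{N-1} 2^{N-i} = 2^{N+1} - 2 = 2\,(2^N - 1) \\
T(\text{gray})   &= 2^N \quad \text{(exactly one bit flips on each of the } 2^N \text{ clocks)} \\
\frac{T(\text{gray})}{T(\text{binary})} &= \frac{2^N}{2\,(2^N - 1)} = \frac{2^{N-1}}{2^N - 1}
  \;\longrightarrow\; \frac{1}{2} \quad (N \to \infty)
\end{align*}
```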
Bus Encoding
• Example: Four bit bus
• 0000→1110 has three transitions.
• If bits of second pattern are inverted, then 0000→0001 will
have only one transition.
• Bit-inversion encoding for N-bit bus:
[Figure: number of bit transitions after inversion encoding (vertical axis, marked 0, N/2, N) versus number of bit transitions (horizontal axis, marked 0, N/2, N).]
Bus-Inversion Encoding Logic
[Figure: block diagram. Blocks and signals: sent data, polarity decision logic, bus register, polarity bit, received data.]
M. Stan and W. Burleson, “Bus-Invert Coding for Low Power I/O,” IEEE Trans. VLSI Systems, vol. 3, no. 1, pp. 49-58, March 1995.
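The bus-inversion idea can be sketched in a few lines of Python. This is a simplified illustration, not the circuit from the cited paper: it assumes the bus starts at all zeros, decides polarity from the data lines only (the polarity line's own transitions are ignored in the decision), and uses hypothetical function names.

```python
def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def bus_invert_encode(words, n_bits):
    """Return (bus_value, polarity) pairs; polarity 1 means 'sent inverted'."""
    mask = (1 << n_bits) - 1
    bus = 0  # assumption: bus lines start at all zeros
    encoded = []
    for word in words:
        if hamming(bus, word) > n_bits / 2:
            bus, polarity = (~word) & mask, 1  # fewer flips if inverted
        else:
            bus, polarity = word, 0
        encoded.append((bus, polarity))
    return encoded


def bus_invert_decode(encoded, n_bits):
    """Receiver side: undo the inversion wherever the polarity bit is set."""
    mask = (1 << n_bits) - 1
    return [(~bus) & mask if polarity else bus for bus, polarity in encoded]


def data_line_transitions(values, start=0):
    """Total data-line flips when `values` are driven onto the bus in order."""
    total, prev = 0, start
    for value in values:
        total += hamming(prev, value)
        prev = value
    return total


words = [0b0000, 0b1110, 0b0001, 0b0110]
encoded = bus_invert_encode(words, 4)
decoded = bus_invert_decode(encoded, 4)
raw_flips = data_line_transitions(words)
coded_flips = data_line_transitions([bus for bus, _ in encoded])
print(encoded, decoded, raw_flips, coded_flips)
```

In this sketch the second word, 1110, is sent as 0001 with the polarity bit set, which is the four-bit example from the Bus Encoding slide. No transfer flips more than N/2 data lines.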
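FSM State Encoding
[Figure: two state transition graphs of the same three-state FSM, one with state codes 11, 00, 01 and one with state codes 01, 00, 11. Edges carry transition probabilities based on PI statistics.]
Expected number of state-bit transitions:
• State codes 11, 00, 01: 2(0.3+0.4) + 1(0.1+0.1) = 1.6
• State codes 01, 00, 11: 1(0.3+0.4+0.1) + 2(0.1) = 1.0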
State encoding can be selected using a power-based cost function.
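A power-based cost function of this kind can be sketched as the probability-weighted sum of Hamming distances between state codes. In the Python sketch below, the state names A, B, C and the assignment of probabilities to state pairs are assumptions inferred from the arithmetic above (0.3 and 0.4 between one pair of states, 0.1 on each of the other two pairs); self-loops are left out because they toggle no state bits.

```python
import math

# (state, state, probability) for transitions between distinct states.
# Assumption: pairing inferred from the slide's sums; direction is irrelevant
# because Hamming distance is symmetric.
TRANSITIONS = [
    ("A", "B", 0.3),
    ("A", "B", 0.4),
    ("A", "C", 0.1),
    ("B", "C", 0.1),
]

ENCODING_1 = {"A": 0b11, "B": 0b00, "C": 0b01}
ENCODING_2 = {"A": 0b01, "B": 0b00, "C": 0b11}


def expected_bit_transitions(encoding, transitions=TRANSITIONS):
    """Sum of probability times number of state bits that flip."""
    return sum(
        prob * bin(encoding[src] ^ encoding[dst]).count("1")
        for src, dst, prob in transitions
    )


cost_1 = expected_bit_transitions(ENCODING_1)
cost_2 = expected_bit_transitions(ENCODING_2)
print(round(cost_1, 3), round(cost_2, 3))  # 1.6 1.0
```

An encoding search would evaluate this function for each candidate code assignment and keep the one with the lowest cost.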
FSM: Clock-Gating
• Moore machine: Outputs depend only on
the state variables.
– If a state has a self-loop in the state transition
graph (STG), then clock can be stopped
whenever a self-loop is to be executed.
[Figure: state transition graph fragment with states Si, Sj and Sk and edges labeled Xi/Zk, Xj/Zk and Xk/Zk.]
Clock can be stopped when the (Xk, Sk) combination occurs.
Clock-Gating in Moore FSM
[Figure: block diagram. Blocks and signals: PI, combinational logic, flip-flops, PO, clock activation logic, latch, CK.]
L. Benini and G. De Micheli, Dynamic Power Management, Boston: Springer, 1998.
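The effect of stopping the clock on self-loops can be illustrated with a small behavioral model. The three-state machine below is hypothetical (its next-state table is invented for illustration and is not the STG from the slides); the sketch only shows that gating leaves the state sequence unchanged while reducing the number of clock pulses that reach the flip-flops.

```python
# Hypothetical Moore machine: NEXT_STATE[state][input] -> next state.
NEXT_STATE = {
    "Si": {0: "Si", 1: "Sk"},
    "Sj": {0: "Sk", 1: "Si"},
    "Sk": {0: "Sk", 1: "Sj"},
}


def run(inputs, gated, start="Si"):
    """Simulate the FSM; return the state trace and flip-flop clock pulses."""
    state = start
    clock_pulses = 0
    trace = []
    for x in inputs:
        nxt = NEXT_STATE[state][x]
        # Clock activation logic: with gating, pass the clock only when the
        # state actually has to change (i.e. not on a self-loop).
        enable = (nxt != state) or not gated
        if enable:
            clock_pulses += 1
            state = nxt
        trace.append(state)
    return trace, clock_pulses


inputs = [0, 0, 1, 0, 0, 0, 1, 0]
plain_trace, plain_pulses = run(inputs, gated=False)
gated_trace, gated_pulses = run(inputs, gated=True)
print(plain_pulses, gated_pulses)  # 8 3
```

The saving depends entirely on how often self-loops occur for the actual input statistics, and in hardware it must be weighed against the power of the activation logic itself.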
Clock-Gating in Low-Power Flip-Flop
[Figure: flip-flop schematic with data input D, output Q and clock CK.]
Low-Power Datapath Architecture
• Lower supply voltage
– This slows down circuit speed
– Use parallel computing to gain the speed back
• Works well when threshold voltage is also
lowered.
• About 60% reduction in power obtainable.
• Reference: A. P. Chandrakasan and R. W.
Brodersen, Low Power Digital CMOS Design,
Boston: Kluwer Academic Publishers (now
Springer), 1995.
A Reference Datapath
[Figure: datapath with input, input register, combinational logic, output register and output; labeled Cref and clocked by CK.]
• Supply voltage = $V_{ref}$
• Total capacitance switched per cycle = $C_{ref}$
• Clock frequency = $f$
• Power consumption: $P_{ref} = C_{ref} V_{ref}^2 f$
A Parallel Architecture
[Figure: block diagram with input, registers clocked at f/N, N copies of the combinational logic (Copy 1, Copy 2, ..., Copy N), an N to 1 multiplexer, an output register clocked at f, and a multiphase clock generator and mux control driven by CK.]
• N = degree of parallelism
• Supply voltage: $V_N \le V_1 = V_{ref}$
• A copy processes every Nth input and operates at reduced voltage.
Control Signals, N = 4
[Figure: timing diagram of CK and the four clock phases, Phase 1 to Phase 4.]
Power
$P_N = P_{proc} + P_{overhead}$
$P_{proc} = N (C_{inreg} + C_{comb}) V_N^2 \frac{f}{N} + C_{outreg} V_N^2 f = (C_{inreg} + C_{comb} + C_{outreg}) V_N^2 f = C_{ref} V_N^2 f$
$P_{overhead} = C_{overhead} V_N^2 f \approx \delta C_{ref} (N - 1) V_N^2 f$
$P_N = [1 + \delta(N - 1)] C_{ref} V_N^2 f$
$\frac{P_N}{P_1} = [1 + \delta(N - 1)] \frac{V_N^2}{V_{ref}^2}$
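As a worked instance of the power ratio $P_N/P_1 = [1 + \delta(N-1)]\,V_N^2/V_{ref}^2$, the Python sketch below plugs in N = 2 with V2 = 2.9 V and Vref = 5 V, the values quoted for 1.2μ CMOS elsewhere in this lecture. The overhead factor δ = 0.15 is a hypothetical value chosen only for illustration; the slides do not give one.

```python
def power_ratio(n, v_n, v_ref, delta):
    """P_N / P_1 = [1 + delta * (N - 1)] * (V_N / V_ref)^2."""
    return (1 + delta * (n - 1)) * (v_n / v_ref) ** 2


# delta = 0.15 is an assumed overhead factor, not a value from the slides.
ratio = power_ratio(n=2, v_n=2.9, v_ref=5.0, delta=0.15)
print(f"P2/P1 = {ratio:.3f}")  # P2/P1 = 0.387
```

With these assumed numbers the two-way parallel datapath uses a little under 40% of the reference power.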
Voltage vs. Speed
Delay of a gate, $T \approx \frac{C_L V_{ref}}{I} = \frac{C_L V_{ref}}{k (W/L)(V_{ref} - V_t)^2}$
where $I$ is saturation current
$k$ is a technology parameter
$W/L$ is width to length ratio of transistor
$V_t$ is threshold voltage
[Figure: normalized gate delay, T (0.0 to 4.0), versus supply voltage for 1.2μ CMOS. Marked points: N=1 at Vref = 5V (delay 1.0), N=2 at V2 = 2.9V (delay 2.0), N=3 at V3 (delay 3.0). Voltage axis also marks Vt. Annotation: voltage reduction slows down as we get closer to Vt.]
Increasing Multiprocessing
[Figure: PN/P1 (0.0 to 1.0) versus N (1 to 12) for 1.2μ CMOS, Vref = 5V, with curves for Vt = 0.8V, Vt = 0.4V and Vt = 0V (extreme case).]
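The shape of these curves can be explored numerically. The Python sketch below assumes the square-law delay model $T \propto V/(V - V_t)^2$, solves for the supply voltage $V_N$ at which each copy is N times slower than at Vref, and evaluates $P_N/P_1 = [1 + \delta(N-1)]\,V_N^2/V_{ref}^2$. The overhead factor δ = 0.1 is a hypothetical value, and because the model is idealized its numbers need not match the plotted curves.

```python
def gate_delay(v, v_t):
    """Gate delay up to a constant factor: T ~ V / (V - Vt)^2."""
    return v / (v - v_t) ** 2


def supply_for_slowdown(n, v_ref, v_t):
    """Bisection for V_N in (Vt, Vref] with delay(V_N) = n * delay(Vref)."""
    target = n * gate_delay(v_ref, v_t)
    lo, hi = v_t + 1e-9, v_ref
    for _ in range(100):
        mid = (lo + hi) / 2
        if gate_delay(mid, v_t) > target:
            lo = mid  # too slow at this voltage: move up
        else:
            hi = mid  # fast enough: try a lower voltage
    return hi


def power_ratio(n, v_ref, v_t, delta):
    """P_N / P_1 with V_N taken from the delay model above."""
    v_n = supply_for_slowdown(n, v_ref, v_t)
    return (1 + delta * (n - 1)) * (v_n / v_ref) ** 2


DELTA = 0.1  # assumed overhead factor, not a value from the slides
for v_t in (0.8, 0.4, 0.0):
    ratios = [power_ratio(n, 5.0, v_t, DELTA) for n in range(1, 13)]
    print(f"Vt={v_t}V:", " ".join(f"{r:.2f}" for r in ratios))
```

For the same N, a larger threshold voltage leaves less room to lower the supply, so the power ratio is higher, which is the ordering of the three curves in the figure.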
Extreme Case: Vt = 0
Delay, $T \propto 1/V_{ref}$
For N processing elements, delay = $NT$ → $V_N = V_{ref}/N$
$\frac{P_N}{P_1} = [1 + \delta(N - 1)] \frac{1}{N^2} \to \frac{1}{N}$
For negligible overhead, $\delta \to 0$:
$\frac{P_N}{P_1} \approx \frac{1}{N^2}$
For Vt > 0, power reduction is less and there will be an
optimum value of N.
Reduced-Power Shift Register
[Figure: shift register built from two chains of D flip-flops clocked by CK(f/2), with a multiplexer producing the output.]
Flip-flops are operated at full voltage and half the clock frequency.
Power Consumption of Shift Reg.
$P = C' V_{DD}^2 f/n$
16-bit shift register, 2μ CMOS

| Deg. of parallelism | Freq (MHz) | Power (μW) |
|---|---|---|
| 1 | 33.0 | 1535 |
| 2 | 16.5 | 887 |
| 4 | 8.25 | 738 |

[Figure: normalized power (0.0 to 1.0) versus degree of parallelism, n (1, 2, 4).]
C. Piguet, “Circuit and Logic Level Design,” pages 103-133 in W. Nebel and J. Mermet (ed.), Low Power Design in Deep Submicron Electronics, Boston: Kluwer Academic Publishers, 1997.
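The measured shift-register powers can be normalized and compared with the ideal 1/n scaling that $P = C'V_{DD}^2 f/n$ would give if C' stayed constant. The Python sketch below uses only the three data points listed on this slide; the variable names are illustrative.

```python
# (degree of parallelism n, clock frequency in MHz, power in microwatts)
measurements = [(1, 33.0, 1535.0), (2, 16.5, 887.0), (4, 8.25, 738.0)]

base_power = measurements[0][2]
normalized = {n: power / base_power for n, _, power in measurements}
ideal = {n: 1 / n for n, _, _ in measurements}

for n in normalized:
    print(f"n={n}: measured {normalized[n]:.3f}, ideal 1/n {ideal[n]:.3f}")
```

The measured ratios come out at roughly 0.58 for n = 2 and 0.48 for n = 4, above the ideal 0.5 and 0.25. That gap is presumably due to the capacitance added by the multiplexing logic, so most of the saving is already obtained at n = 2.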
Multicore Processors
• D. Geer, “Chip Makers Turn to Multicore
Processors,” Computer, vol. 38, no. 5, pp.
11-13, May 2005.
• A. Jerraya, H. Tenhunen and W. Wolf,
“Multiprocessor Systems-on-Chips,”
Computer, vol. 38, no. 7, pp. 36-40, July
2005; this special issue contains three
more articles on multicore processors.
Multicore Processors
[Figure: performance based on SPECint2000 and SPECfp2000 benchmarks for single core and multicore processors, 2000 to 2008. Source: Computer, May 2005, p. 12.]