Low Power Design
of Integrated Systems
Assoc. Prof. Dimitrios Soudris
Technology Directions:
SIA Roadmap
Year                 1999    2002    2005    2008    2011    2014
Feature size (nm)    180     130     100     70      50      35
Logic trans/cm2      6.2M    18M     39M     84M     180M    390M
Cost/trans (mc)      1.735   .580    .255    .110    .049    .022
#pads/chip           1867    2553    3492    4776    6532    8935
Clock (MHz)          1250    2100    3500    6000    10000   16900
Chip size (mm2)      340     430     520     620     750     900
Wiring levels        6-7     7       7-8     8-9     9       10
Power supply (V)     1.8     1.5     1.2     0.9     0.6     0.5
High-perf pow (W)    90      130     160     170     175     183
Battery pow (W)      1.4     2       2.4     2.8     3.2     3.7
Technology Directions:
SIA Roadmap
[Figure: technology/process evolution (2002), showing trends in the number of transistors, frequency (performance), and power consumption]
Power Terminology
• Power is the rate at which energy is delivered
or exchanged
» electrical energy is converted to heat energy
during operation
• Power Dissipation - rate at which energy is
taken from the source (Vdd ) and converted
into heat
Why Smaller Power?
• Large Market of Portable devices
– e.g. laptops, mobile phones
• Achieve larger transistor integration
– Pentium IV contains 42 million transistors
– Teraflops chip contains 1.9 billion
transistors
• Need for “green” computers
– 10% of total electrical energy consumed by
PCs
Battery Technology Improvements
The Industry’s Reaction
• Reduce chip capacitance through process scaling
==> Expensive
• Reduce Voltage levels from 5V → 3.3V → 2V
==> Industry is hard to move (microprocessors,
memory,...)
• Better Circuit Techniques
==> Gated clocks, Power-Down of non-operational
units…
• Example: IBM 80 MHz PowerPC RISC (3 W @ 3.3V)
–Power Management Logic determines activity on per cycle basis
–Clocks of idle blocks are turned off → 12-30% savings
–Doze - Nap and Sleep mode (5 mW)
Example: Intel Pentium-II processor
• Pentium-1: 15 Watt (5V - 66MHz)
• Pentium-2: 8 Watt (3.3V- 133 MHz)
Where Does Power Go in CMOS?
• The power consumption in digital CMOS circuits
Pavg = Pdynamic + Pshort-circuit + Pleakage
• Dynamic Power Consumption
Charging and Discharging Capacitors
• Short Circuit Currents
Short Circuit Path between Supply Rails during Switching
• Leakage (Static)
Leaking diodes and transistors
Present & Future in Power
Consumption
Dynamic Power Consumption (1)
• $P_{dynamic} = C_L \cdot V_{dd}^2 \cdot N \cdot f$
where $V_{dd}$ is the supply voltage, $C_L$ the load capacitance, $N$ the average
number of transitions per clock cycle, and $f$ the operating frequency
[Figure: CMOS gate with input IN, output OUT and load capacitance CL: (a) circuit, (b) charging current drawn from Vdd, (c) discharging current]
Dynamic Power Consumption (2)
• For technology generations up to and including 0.35 μm, the dynamic
consumption is about 80% of the total consumption
• Goal ===> reduce dynamic power consumption
– reduction of capacitance
– reduction of supply voltage
– reduction of frequency
– reduction of switching activity
– or a combination of the above factors
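The dynamic power expression and the four reduction knobs listed above can be tried out numerically. The following is a minimal sketch, not a power estimator: the baseline values (1 nF switched capacitance, 3.3 V, N = 0.5, 100 MHz) are hypothetical and chosen only for illustration.

```python
def dynamic_power(c_load, v_dd, n_transitions, freq):
    """Average dynamic power in watts: P = C_L * Vdd^2 * N * f."""
    return c_load * v_dd ** 2 * n_transitions * freq


# Hypothetical baseline values, for illustration only.
baseline = dynamic_power(c_load=1e-9, v_dd=3.3, n_transitions=0.5, freq=100e6)

# Capacitance, activity and frequency enter linearly; Vdd enters quadratically.
half_cap = dynamic_power(0.5e-9, 3.3, 0.5, 100e6)
half_vdd = dynamic_power(1e-9, 1.65, 0.5, 100e6)

print(f"baseline power:   {baseline:.4f} W")
print(f"half capacitance: {half_cap / baseline:.2f} x baseline")
print(f"half Vdd:         {half_vdd / baseline:.2f} x baseline")
```

Halving a linear factor halves the power, while halving the supply voltage cuts it to a quarter, which is why voltage reduction gets so much attention in the slides that follow.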
Leakage current consumption
• the reverse-bias diode leakage at the transistor
drains, and
• the sub-threshold current through a turned-off
transistor channel
[Figure: the leakage of a reverse-biased pMOS transistor: gate and p+ regions in an n-type substrate, with leakage current through the reverse-biased drain-substrate diode]
[Figure: subthreshold leakage with respect to gate-source voltage: log ID (10^-15 to 10^-3) versus VGS (0 to 2 volts), showing the subthreshold and saturated regions for decreasing VDS]
The Design Flow
[Figure: the design flow. (a) System specifications → system-level design → architecture-level design → logic-level design → circuit-level design / layout synthesis. (b) The same flow with an analysis/estimation step at each level (system, architecture, logic, circuit), supported by power models for system-level components, for macrocells and control logic, and for gates and cells]
Power savings in terms of the design level
[Figure: power savings by design level (system, behavior, RT, logic, transistor, layout), with savings increasing toward the higher levels: system level 10-20x, RT level 2-5x, logic/transistor level 20-50%]
Lower Vdd Increases Delay
$T_d = \frac{C_L \cdot V_{dd}}{I}$, with $I \sim (V_{dd} - V_t)^2$
$\frac{T_d(V_{dd}=2)}{T_d(V_{dd}=5)} = \frac{(2) \cdot (5 - 0.7)^2}{(5) \cdot (2 - 0.7)^2} \approx 4$
[Figure: normalized delay versus Vdd (volts) in a 2.0 μm technology for a multiplier, clock generator, ring oscillator, microcoded DSP chip, adder, and adder (SPICE)]
Relatively independent of logic function and style.
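The delay ratio on this slide follows directly from the first-order model Td = CL · Vdd / I with I ~ (Vdd − Vt)². A minimal sketch, assuming Vt = 0.7 V as in the slide's arithmetic and dropping the proportionality constant (only ratios are meaningful here):

```python
def relative_delay(v_dd, v_t=0.7, c_load=1.0):
    """First-order gate delay, Td = CL * Vdd / I with I ~ (Vdd - Vt)^2.

    The proportionality constant of the current is dropped, so only
    ratios between two calls are meaningful.
    """
    if v_dd <= v_t:
        raise ValueError("model only applies above the threshold voltage")
    return c_load * v_dd / (v_dd - v_t) ** 2


delay_ratio = relative_delay(2.0) / relative_delay(5.0)
print(f"Td(Vdd=2) / Td(Vdd=5) = {delay_ratio:.2f}")
```

The model gives a slowdown of roughly 4.4, which the slide rounds to about 4.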
Reducing Vdd
$P \times t_d = E_t = C_L \cdot V_{dd}^2$
$\frac{E(V_{dd}=2)}{E(V_{dd}=5)} = \frac{C_L \cdot (2)^2}{C_L \cdot (5)^2}$, so $E(V_{dd}=2) \approx 0.16\,E(V_{dd}=5)$
[Figure: normalized power-delay product versus Vdd (volts, 1 to 5) for a 51 stage ring oscillator and an 8-bit adder, showing the quadratic dependence]
Strong function of voltage ($V^2$ dependence).
Relatively independent of logic function and style.
Power Delay Product Improves with lowering VDD.
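The energy figure above can be checked with the same kind of first-order arithmetic. This sketch assumes a normalized load capacitance of 1, since CL cancels in the ratio:

```python
def switching_energy(c_load, v_dd):
    """Energy per transition (power-delay product): E = CL * Vdd^2."""
    return c_load * v_dd ** 2


# CL cancels in the ratio, so a normalized value of 1.0 is used.
energy_ratio = switching_energy(1.0, 2.0) / switching_energy(1.0, 5.0)
print(f"E(Vdd=2) / E(Vdd=5) = {energy_ratio:.2f}")
```

Dropping the supply from 5 V to 2 V therefore trades a delay increase of about 4x for an energy per operation of about 0.16x.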
Lowering the Threshold
[Figure: delay versus Vdd, and drain current ID versus VGS for Vt = 0 and Vt = 0.2]
Reduces the Speed Loss, But Increases Leakage
Interesting Design Approach:
DESIGN FOR PLeakage == PDynamic
Transistor Sizing for Power
Minimization
Small W/L’s: Lower Capacitance, Higher Voltage
Large W/L’s: Higher Capacitance, Lower Voltage
Larger sized devices are useful only when interconnect dominated.
Minimum sized devices are usually optimal for low-power.
Techniques to reduce supply voltage
Algorithm: Transformation to exploit concurrency
Architecture: Parallelism and Pipelining
Circuit/Logic: Transistor Sizing, Fast Logic Structures
Technology: Threshold Voltage Reduction, Feature Size scaling
Techniques for minimizing the
switched capacitance
System: Partitioning, Power-down, power states
Algorithm: Complexity, Concurrency, Regularity, Locality, Data representation
Architecture: Concurrency, Instruction set selection, Signal correlations, Data representation, Data Encoding
Circuit/Logic: Transistor sizing, Logic optimization, Power down, Layout Optimization
Technology: Advanced packaging, SOI
Power consumption of data transfer and storage dominates
over datapath operations, both in hardware
[Men95] and software [Tiw94, Gon96].
[Figure: relative energy per operation (arithmetic versus SRAM read/write and external accesses) and relative energy by component]
Architecture Power Optimization
Techniques
• Architecture-driven voltage reduction: The key idea is to
speed up the circuit so that the supply voltage can be reduced while
still meeting throughput rate constraints. Voltage reduction can
be enabled by introducing parallelism in hardware or by
inserting flip-flops (pipelining)
• Switching activity minimization: Try to prevent the
generation and propagation of spurious transitions or to
reduce the number of transitions, e.g. retiming, path
balancing, data representation
• Switched capacitance minimization: Aim at the minimization
of switched capacitance
• Dynamic power management: Under certain conditions, a
circuit part becomes inactive, avoiding unnecessary
calculations, e.g. gated clocks, operand isolation, precomputation, and guarded evaluation
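As a rough illustration of dynamic power management with gated clocks, the sketch below counts how many clock pulses reach a register bank with and without gating. The enable pattern is hypothetical, and the model ignores the power of the gating logic itself.

```python
def clock_pulses(enable_per_cycle, gated):
    """Count clock pulses delivered to a register bank.

    Without gating the clock toggles every cycle; with gating it is
    passed on only in cycles where the block is enabled.
    """
    if gated:
        return sum(1 for enabled in enable_per_cycle if enabled)
    return len(enable_per_cycle)


# Hypothetical activity pattern: the block does useful work in 3 of 8 cycles.
activity = [True, False, False, True, False, False, False, True]

ungated_pulses = clock_pulses(activity, gated=False)
gated_pulses = clock_pulses(activity, gated=True)
clock_saving = 1 - gated_pulses / ungated_pulses

print(f"ungated: {ungated_pulses} pulses, gated: {gated_pulses} pulses")
print(f"clock switching avoided: {clock_saving:.1%}")
```

The less often a block is active, the larger the share of clock-related switching that gating removes.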
Architecture Trade-offs:
Reference Data Path
• Critical path delay = Tadder + Tcomparator (= 25 ns), so fref = 40 MHz
• Total capacitance being switched = Cref
• Vdd = Vref = 5 V
• Power for reference datapath = $P_{ref} = C_{ref} V_{ref}^2 f_{ref}$
Voltage Reduction Technique:
Parallelism
• The clock rate can be reduced by half with the same throughput
→ fpar = fref / 2
• Vpar = Vref / 1.7, Cpar = 2.15 Cref
• $P_{par} = (2.15\,C_{ref}) (V_{ref}/1.7)^2 (f_{ref}/2) \approx 0.36\,P_{ref}$
Voltage Reduction Technique:
Pipeline
• fpipe = fref, Cpipe = 1.1 Cref, Vpipe = Vref /1.7
• Voltage can be dropped while maintaining the original
throughput
• $P_{pipe} = C_{pipe} V_{pipe}^2 f_{pipe} = (1.1\,C_{ref}) (V_{ref}/1.7)^2 f_{ref} \approx 0.37\,P_{ref}$
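The parallel and pipelined datapaths can be compared against the reference with one small helper that scales capacitance, voltage and frequency relative to Cref, Vref and fref. This is a sketch using the scaling factors quoted on the slides; the helper name `relative_power` is an assumption for illustration.

```python
def relative_power(c_scale, v_scale, f_scale):
    """Power relative to Pref = Cref * Vref^2 * fref."""
    return c_scale * v_scale ** 2 * f_scale


# Parallel datapath: 2.15x capacitance, Vref / 1.7, half the clock rate.
p_parallel = relative_power(2.15, 1 / 1.7, 0.5)

# Pipelined datapath: 1.1x capacitance, Vref / 1.7, same clock rate.
p_pipeline = relative_power(1.1, 1 / 1.7, 1.0)

print(f"parallel:  {p_parallel:.3f} x Pref")
print(f"pipelined: {p_pipeline:.3f} x Pref")
```

With the rounded factor 1.7 the sketch gives roughly 0.37 and 0.38, close to the 0.36 and 0.37 quoted on the slides. The small gap presumably comes from rounding in the quoted voltage factor, which is an assumption rather than something the slides state.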
Comparisons
Logic Style and Power Consumption
• Power-delay product improves as voltage decreases
• The “best” logic style minimizes power-delay for a given delay
constraint
The concept of gating clock signals
[Figure: (a) comparator-controlled gating of the clock of a register (REG); (b) gated clock, scheme 1; (c) gated clock, scheme 2; waveforms shown over 1 clock period]
Resource Sharing Can Increase
Activity
Reducing Effective Capacitance
Global bus architecture
Local bus architecture
Shared Resources incur Switching Overhead
Data representation
• Sign-extension activity significantly reduced using
sign-magnitude representation
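The effect of data representation on sign-extension activity can be seen by counting bit toggles on a bus. The sketch below encodes a hypothetical stream of small values that alternate in sign, once in 16-bit two's complement and once in sign-magnitude, and counts the bits that change between consecutive words.

```python
def twos_complement(value, bits=16):
    """Encode a signed integer as a two's-complement bit pattern."""
    return value & ((1 << bits) - 1)


def sign_magnitude(value, bits=16):
    """Encode a signed integer as a sign bit plus magnitude."""
    sign = 1 << (bits - 1) if value < 0 else 0
    return sign | abs(value)


def toggles(words):
    """Total number of bit transitions between consecutive words."""
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))


# Hypothetical data: small magnitudes with frequent sign changes.
samples = [3, -2, 1, -4, 2, -1, 5, -3]

tc_toggles = toggles([twos_complement(s) for s in samples])
sm_toggles = toggles([sign_magnitude(s) for s in samples])

print(f"two's complement: {tc_toggles} bit transitions")
print(f"sign-magnitude:   {sm_toggles} bit transitions")
```

In two's complement every sign change flips all the sign-extension bits, whereas in sign-magnitude only the sign bit and a few low-order bits change.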
Switching Activity in Adders
Switching Activity in Multipliers
Signals and Operations Reordering
• Example: complex multiplication
Trading a multiplication for an addition
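[Figure: complex multiplication of X = Xr + jXi by A = Ar + jAi, producing Yr and Yi. (a) Direct form with four multiplications; (b) reordered form with three multiplications, using the coefficients Ar, Ai-Ar and Ai+Ar]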
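The reordering can be written out explicitly. The sketch below assumes the standard three-multiplication identity built from the coefficients Ar, Ai−Ar and Ai+Ar that label the figure; the exact wiring of the original figure is not recoverable, so this is one consistent reading rather than a transcription. For a constant coefficient A, the sums Ai+Ar and Ai−Ar would be precomputed.

```python
def complex_mul_direct(ar, ai, xr, xi):
    """Direct form: 4 multiplications, 2 additions."""
    yr = ar * xr - ai * xi
    yi = ar * xi + ai * xr
    return yr, yi


def complex_mul_reordered(ar, ai, xr, xi):
    """Reordered form: 3 multiplications, 3 additions.

    (ai + ar) and (ai - ar) are computed inline here; with a constant
    coefficient they would be precomputed once.
    """
    shared = ar * (xr + xi)           # multiplication 1, addition 1
    yr = shared - (ai + ar) * xi      # multiplication 2, addition 2
    yi = shared + (ai - ar) * xr      # multiplication 3, addition 3
    return yr, yi


direct = complex_mul_direct(3, -2, 5, 7)
reordered = complex_mul_reordered(3, -2, 5, 7)
print(direct, reordered)
```

Both forms return the same (Yr, Yi) pair; the reordered one replaces a multiplication with an addition, which typically costs far less switched capacitance.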
Module Selection
RTL Library
Module                  Area     Latency   Power
ripple adder            2744     30 ns     1199 μW
carry lookahead adder   3959     20 ns     1467 μW
array multiplier        16185    60 ns     18540 μW
wallace multiplier      18443    40 ns     23545 μW
[Figure: (a)-(d) alternative bindings of the multiplications *i, *ii, *iii and the additions +i, +ii to library modules]
Glitching activity reduction (3)
Function
if (x < y) then
z=c+d
else
z=a+b
[Figure: two architectures implementing the function from the inputs x, y, a, b, c, d]
ARCHITECTURE 1
Power Consumption:
Without glitches: 823.9 μW
With glitches: 1650 μW
ARCHITECTURE 2
Power Consumption:
Without glitches: 951.7 μW
With glitches: 1357.7 μW
Two-Level Logic Circuits
Switching Activity Minimization (1)
• Taking into account the static and transition
probabilities (i.e. temporal correlation) of the primary
inputs, we can insert additional input signals in certain
gates of the first logic level (i.e. AND gates),
resulting in reduced switching activity
• Appropriately-selected input signals force the
outputs of the AND gates to logic level zero for a
number of combinations of the binary input signals
Two-Level Logic Circuits Switching
Activity Minimization (2)
• Example: $F = x_0 x_1 + x_0 x_2 + x_0 x_3$
• Signal x3 exhibits low-transition probability and
high static-1 probability, while the signals x0 , x1,
and x2 are characterized by high-transition
probabilities
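[Figure: initial logic circuit: AND gates g1 (x0, x1), g2 (x0, x2), g3 (x0, x3) with outputs y1, y2, y3 feeding gate g4 to produce F. Modified logic circuit: the same structure with x3 as an additional input signal, outputs y1', y2', y3' feeding g4 to produce F']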
Additional Info
• A. Chandrakasan and R. Brodersen, “Low Power CMOS Design”, Kluwer Academic Publishers, 1995
• Christian Piguet, Editor, “Low-Power Electronics Design”, CRC Press, November 2004
• D. Soudris, C. Piguet, C. Goutis, “Designing CMOS Circuits for Low Power”, Kluwer Academic Press, October 2002
• F. Catthoor, K. Danckaert, et al., “Data Access and Storage Management for Embedded Programmable Processors”, Kluwer Academic Publishers, 2002
• Stamatis Vassiliadis and Dimitrios Soudris, “Fine- and Coarse-Grain Reconfigurable Computing”, Springer, Dordrecht/London/Boston, August 2007
• http://vlsi.ee.duth.gr/~dsoudris
• AMDREL website: http://vlsi.ee.duh.gr/amdrel