Leakage Power Minimization in
Deep-Submicron CMOS Circuits
Enrico Macii
Politecnico di Torino
Dip. di Automatica e Informatica
10129 Torino, Italy
[email protected]

Outline
• Introduction.
• Design for low leakage:
- Basics.
- Existing approaches.
• Sleep Transistor Insertion:
- Principle.
- Automated STI.
• Methodology.
• Preliminary results.
• Extensions.
• Conclusions.

Electronic Technology Today: CMOS Convergence
• CMOS technology dominates in modern ICs.
[Figure: timeline from the 1960s to the 2000s showing the technologies (PMOS, NMOS, CMOS, Bipolar ECL, BiCMOS) used for watch chips, calculators, SRAM, DRAM, FLASH, microprocessors and servers/mainframes.]

Power Dissipation in CMOS Circuits
• Power dissipation of a CMOS gate:
$P = P_{SW} + P_{SC} + P_{Lk}$
where:
• $P_{SW}$ = Switching (or dynamic) power.
• $P_{SC}$ = Short-circuit power.
• $P_{Lk}$ = Leakage (or stand-by) power.
• In older technologies (0.25um and above), $P_{Lk}$ was marginal w.r.t. switching power:
- Switching power minimization was the primary objective.
• In deep sub-micron processes, $P_{Lk}$ becomes critical.

Leakage vs. Dynamic Power in Current CMOS Circuits
• Leakage power becomes comparable to dynamic power as technology scales.
- Example: ASICs [source: STMicroelectronics].
[Figure: power density (Watts/cm2, axis from 0 to 150) of leakage power and dynamic power at the 250nm, 180nm, 130nm, 90nm and 65nm nodes.]
- Example: Microprocessors [source: Intel].
Itanium 2: 180nm, 1.5V, 1.0GHz, 221MTx (core+cache)
Itanium 3: 130nm, 1.3V, 1.5GHz, 410MTx (core+cache)
[Figure: breakdown (0% to 100%) of leakage power, I/O power and dynamic power for Itanium 2 and Itanium 3.]

Power Dissipation Due to Leakage
• CMOS inverter:
[Figure: CMOS inverter (PMOS and NMOS transistors) with supply $V_{DD}$, input $V_{IN}$, output $V_{OUT}$ and load $C_L$, annotated with the leakage currents $I_{sub}$ and $I_{gate}$.]

Power Dissipation Due to Leakage (Cont.)
• Leakage power of a CMOS gate:
$P_{Leakage} = I_L V_{DD}$
where:
• $V_{DD}$ = Supply voltage.
• $I_L$ = Leakage current.
- Leakage current $I_L$ consists of two major contributions:
$I_L = I_{sub} + I_{gate}$
where:
• $I_{sub}$ = Sub-threshold current caused by low threshold voltage.
• $I_{gate}$ = Gate current caused by reduced thickness of gate oxide.
- $I_{sub}$ dominates, but grows by 5X per generation.
- $I_{gate}$ is less relevant, but grows much faster (500X per generation).

Low-Leakage Design
• Leakage power minimization:
- Design problem (and not just a technology/process problem).
• For memory macros:
- Optimization based on ad-hoc solutions (cell optimization).
• For cell-based logic:
- Optimization requires design automation.
- Integration with existing tools (both at logic and physical level) is mandatory.
• Different solutions proposed for both sub-threshold and gate leakage.

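The leakage relations stated earlier ($P_{Leakage} = I_L V_{DD}$ with $I_L = I_{sub} + I_{gate}$) and the quoted growth rates (5X and 500X per generation) can be exercised with a few lines of Python. This is only a sketch: the starting currents and the supply voltage are made-up values, the class and function names are hypothetical, and constant growth per generation is an assumption taken at face value from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Leakage:
    i_sub: float   # sub-threshold current [A]
    i_gate: float  # gate current [A]

    @property
    def i_l(self) -> float:
        """Total leakage current: I_L = I_sub + I_gate."""
        return self.i_sub + self.i_gate

    def power(self, v_dd: float) -> float:
        """Leakage power: P_Leakage = I_L * V_DD."""
        return self.i_l * v_dd

    def after(self, generations: int) -> "Leakage":
        """Scale both currents by the per-generation factors quoted on the slide."""
        return Leakage(self.i_sub * 5.0 ** generations,
                       self.i_gate * 500.0 ** generations)


def generations_until_gate_dominates(start: Leakage) -> int:
    """Generations needed before I_gate exceeds I_sub (constant growth assumed)."""
    if start.i_gate <= 0 or start.i_sub <= 0:
        raise ValueError("both currents must be positive")
    n = 0
    while start.after(n).i_gate <= start.after(n).i_sub:
        n += 1
    return n


# Hypothetical starting point (not from the slides): I_gate = I_sub / 100000.
start = Leakage(i_sub=1e-6, i_gate=1e-11)
p_leak = start.power(v_dd=1.3)  # watts, for an arbitrary 1.3 V supply
crossover = generations_until_gate_dominates(start)
print(f"P_Leakage = {p_leak:.3e} W; I_gate overtakes I_sub after {crossover} generations")
```

With this made-up starting ratio the gate current overtakes the sub-threshold current after three generations, which illustrates why a contribution that is "less relevant" today cannot be ignored for long.
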
Existing Approaches to Low-Leakage Design
• Sub-threshold leakage:
- Variable-threshold CMOS (VTCMOS).
- Dual-threshold CMOS (DTCMOS).
- Multi-threshold CMOS (MTCMOS).
- Sleep transistor insertion (STI).
- Multi-voltage CMOS (MVCMOS).
- Body biasing (reverse, RBB, and forward, FBB).
- State assignment.
• Gate leakage:
- Boosted gate MOS (BGMOS).
- P-type Domino.
- Pin reordering and state assignment.

DTCMOS
• Compared with high-threshold cells, low-threshold cells are:
- 15-20% faster.
- 10x higher in leakage.
• Libraries containing high-$V_{Th}$ and low-$V_{Th}$ gates do exist.
• Idea:
- Use low-$V_{Th}$ gates for critical paths, high-$V_{Th}$ cells for the rest.
• Approach:
- Synthesize and map the design onto all high-$V_{Th}$ cells. This is the minimum-leakage implementation.
- Replace high-$V_{Th}$ cells on the critical path with low-$V_{Th}$ cells to meet timing constraints.
• Leakage power increase required to meet timing constraints may vary from 20% to 200%.

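The DTCMOS approach (start from an all-high-$V_{Th}$ mapping, then swap cells on the critical path until timing is met) can be sketched on a toy netlist. Everything below is illustrative and not the flow used in the slides: the netlist, the delay and leakage numbers and the function names are hypothetical, "15-20% faster" is modelled as 15% less delay, and the choice of swapping the slowest gate on the critical path first is one possible heuristic.

```python
HIGH, LOW = "high", "low"
LOW_VTH_DELAY_FACTOR = 0.85    # assumption: "15-20% faster" modelled as 15% less delay
LOW_VTH_LEAKAGE_FACTOR = 10.0  # "10x higher leakage" than a high-Vth cell


def critical_path(gates, fanin, vth):
    """Return (delay, path) of the slowest path; `gates` must be in topological order."""
    arrival, pred = {}, {}
    for g in gates:
        d = gates[g]["delay"] * (LOW_VTH_DELAY_FACTOR if vth[g] == LOW else 1.0)
        best = max(fanin.get(g, []), key=lambda p: arrival[p], default=None)
        arrival[g] = d + (arrival[best] if best is not None else 0.0)
        pred[g] = best
    node = max(arrival, key=arrival.get)
    delay, path = arrival[node], []
    while node is not None:
        path.append(node)
        node = pred[node]
    return delay, path[::-1]


def assign_dual_vth(gates, fanin, target_delay):
    """Start from all high-Vth cells and lower Vth on the critical path as needed."""
    vth = {g: HIGH for g in gates}  # minimum-leakage starting point
    while True:
        delay, path = critical_path(gates, fanin, vth)
        if delay <= target_delay:
            return vth
        candidates = [g for g in path if vth[g] == HIGH]
        if not candidates:
            raise ValueError("timing cannot be met even with low-Vth cells")
        # Heuristic (an assumption): swap the slowest remaining high-Vth gate.
        vth[max(candidates, key=lambda g: gates[g]["delay"])] = LOW


def total_leakage(gates, vth):
    """Sum cell leakage, applying the low-Vth penalty where a cell was swapped."""
    return sum(gates[g]["leak"] * (LOW_VTH_LEAKAGE_FACTOR if vth[g] == LOW else 1.0)
               for g in gates)


# Hypothetical four-gate netlist; delays and leakage are in arbitrary units.
gates = {
    "g1": {"delay": 1.0, "leak": 1.0},
    "g2": {"delay": 2.0, "leak": 0.1},
    "g3": {"delay": 1.0, "leak": 1.0},
    "g4": {"delay": 0.5, "leak": 1.0},
}
fanin = {"g3": ["g1", "g2"], "g4": ["g1"]}

vth = assign_dual_vth(gates, fanin, target_delay=2.8)
print(vth, total_leakage(gates, vth))
```

In this toy case only `g2` is moved to low $V_{Th}$: the critical path `g2 -> g3` drops from 3.0 to 2.7 delay units, and the leakage penalty is paid on one cell only.
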
MTCMOS
• Idea:
Use multi-threshold CMOS cells with capability of operating at:
- Low-$V_{Th}$, when in active mode;
- High-$V_{Th}$, when in stand-by mode.
• Leakage power control obtained thanks to two effects:
- Transistor stacking.
- Low sub-threshold leakage current of high-$V_{Th}$ transistors.

MTCMOS (Cont.)
• Principle of MTCMOS:
- Insertion of high-$V_{Th}$ transistors in series with the pull-up and pull-down networks in order to reduce the sub-threshold leakage current while maintaining high-speed operation in active mode.
[Figure: a plain low-$V_{Th}$ CMOS gate between Vdd and GND, next to an MTCMOS gate whose Virtual Vdd and Virtual GND rails are separated from Vdd and GND by sleep transistors (signals Sleep and Sleep').]

MTCMOS (Cont.)
• Limitations of MTCMOS:
- Impact on area:
• Each cell includes two extra transistors for low-leakage stand-by operation ⇒ Significant cost in terms of area.
• The PMOS transistor is normally much larger than the NMOS (e.g., form factor ~ 30) ⇒ Need of huge buffering circuitry.
- Process modifications for supporting the implementation of high-$V_{Th}$ transistors.
- Impact on performance:
• Slow-down of power gated logic cells when the circuit is active.
• Re-activation delay for re-enabling a set of powered down cells.

STI
• Idea:
Modify the MTCMOS approach by:
- Using the same sleep transistors to control blocks of higher complexity.
- Avoiding the PMOS sleep transistor.
[Figure: a low-$V_{Th}$ CMOS block (N cells) between Vdd and GND, next to the same block connected to GND through a Virtual GND rail and a sleep transistor driven by Sleep'.]

STI (Cont.)
• Further modification:
- Use low-$V_{Th}$ sleep transistors.
• Consequences:
- All devices are fabricated using the same process.
- Sub-threshold leakage power reduced by transistor stacking effect only (smaller, but still significant reduction).
• Example:
[Figure: low-$V_{Th}$ gated logic (gates G1 to G7) sharing a virtual ground Vgnd, which is tied to a leakage-control cell driven by SLEEP.]

Automated STI
• Issues:
- Granularity of STI insertion.
• Large CMOS blocks: Size of sleep transistors and driving strengths of sleep signals.
• Small CMOS blocks: Number of sleep transistors and size of control logic.
- Design of sleep transistor cells:
• Different sizes and driving strengths.
• Must be compliant with the cells in the library.
- Area and delay control.
• Selection of gates to which STI should be applied: Requires layout information.
- Generation of sleep signals:
• Area, timing and power overhead.

Automated STI (Cont.)
• Post-layout STI for combinational circuits.
- STI is performed on a row-by-row basis.
- Sleep transistors are added at the boundaries of each row and they are connected to a common virtual ground.
- Design and characterization of a library of sleep transistor cells.

Automated STI (Cont.)
• Post-layout STI for combinational circuits (Cont.).
- Assumptions:
• All cells in the circuit can be potentially controlled by sleep transistors.
• Only one control signal is used to drive all the sleep transistors and it is available from some external module (e.g., a microprocessor).
- Flow:
[Figure: flow diagram. Inputs: placed row, area constraint, delay constraint, sleep transistor library. Steps: calculate cluster of gates; size and insert sleep transistor; update layout.]

Automated STI (Cont.)
• Post-layout STI for combinational circuits (Cont.).
- Controlling area penalty.
• Use part of the area of empty regions between cells according to the tolerated congestion overhead (compaction).
• Resize the floorplan according to the tolerated area overhead.
• For each row, consider the largest sleep transistor that can be inserted (free area + allowed overhead).
[Figure: a placed row of cells G1 to G4 shown with its free area, and again with free area plus area overhead.]

Automated STI (Cont.)
• Post-layout STI for combinational circuits (Cont.).
- Controlling delay penalty.
• The maximum sustainable current of each sleep transistor is computed according to the tolerated slow-down in active mode.
• The cell selection process performs a gate-by-gate exploration of each row starting from the cell with the longest timing path and going back towards the primary inputs.
• The re-activation time penalty is traded off (or nullified) by preventing the power gating of some of the cells whose arrival times are shorter than the re-activation delay of the sleep transistors.
For each row, the process stops when the current budget is exhausted or the re-activation time penalty is violated.

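The per-row cell selection described above (explore gates from the longest timing path backwards, stop when the current budget is exhausted or the re-activation time would be violated) might look roughly as follows. This is a simplified sketch, not the authors' tool: ordering cells by arrival time stands in for the walk "back towards the primary inputs", and the `Cell` fields, units and numbers are hypothetical.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    name: str
    arrival_ns: float   # arrival time at the cell output
    current_ua: float   # current the cell draws through the sleep transistor


def select_cells(row, current_budget_ua, reactivation_ns):
    """Pick the cells of one row to be power gated by that row's sleep transistor."""
    gated, used = [], 0.0
    # Longest timing path first, then back towards earlier-arriving cells.
    for cell in sorted(row, key=lambda c: c.arrival_ns, reverse=True):
        if cell.arrival_ns < reactivation_ns:
            break  # gating this cell would expose the re-activation delay
        if used + cell.current_ua > current_budget_ua:
            break  # maximum sustainable current of the sleep transistor reached
        gated.append(cell.name)
        used += cell.current_ua
    return gated, used


# Hypothetical row of four cells.
row = [
    Cell("G1", arrival_ns=0.4, current_ua=30.0),
    Cell("G2", arrival_ns=0.9, current_ua=40.0),
    Cell("G3", arrival_ns=1.6, current_ua=50.0),
    Cell("G4", arrival_ns=2.1, current_ua=60.0),
]

tight = select_cells(row, current_budget_ua=120.0, reactivation_ns=0.5)
loose = select_cells(row, current_budget_ua=200.0, reactivation_ns=0.5)
print(tight)  # stops on the current budget
print(loose)  # stops on the re-activation constraint
```

The two calls show the two stopping conditions: with a 120 uA budget the loop ends when adding G2 would exceed the budget, while with 200 uA it ends at G1, whose arrival time is shorter than the assumed re-activation delay.
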
Automated STI (Cont.)
• Post-layout STI for combinational circuits (Cont.).
- Experimental set-up:
• Six benchmarks (from 1900 to 2600 standard cells).
• Circuits synthesized onto 0.13um CMOS technology library from STMicroelectronics.
• Sleep transistor cells chosen so as to guarantee a total performance degradation below 5% in active mode.
• Tolerated area overhead set to 5%.

Automated STI (Cont.)
• Post-layout STI for combinational circuits (Cont.).
- Results:
• Leakage power reductions around 80%.
• Total power savings, accounting for cell dynamic and internal power, are around 19%.

            Original [mW]              Optimized [mW]             Δ [%]
Benchmark   PL     Pdyn+int  Ptot      PL     Pdyn+int  Ptot      PL     Pdyn+int  Ptot
Block1      0.11   0.29      0.40      0.02   0.32      0.34      78.9   -9.0      15.0
Block2      0.19   0.22      0.41      0.04   0.24      0.28      80.0   -10.1     31.0
Block3      0.16   0.31      0.47      0.04   0.33      0.37      74.6   -8.8      21.2
Block4      0.26   0.60      0.86      0.05   0.63      0.68      82.7   -5.0      18.6
Block5      0.12   0.29      0.41      0.03   0.32      0.35      78.9   -9.7      12.5
Block6      0.46   0.88      1.34      0.09   0.98      1.07      83.5   -12.4     20.1
Avg.                                                              79.7   -9.6      18.9

Automated STI (Cont.)
• Post-layout STI for combinational circuits (Cont.).
- Results (cont.):
• Area overhead around 2.5% and circuit delay increase of 5%.

Benchmark   Gates   Sleep Cells   Area_Orig [µm2]   Area_Opt [µm2]   Δ [%]
Block1      1852    14            64912             66794            2.9
Block2      1916    14            65210             66710            2.3
Block3      2215    22            65053             66550            2.3
Block4      2267    13            65412             66524            1.7
Block5      2302    26            65918             68159            3.4
Block6      2612    20            70298             71703            2.0

Automated STI (Cont.)
• Extensions to the post-layout STI approach:
- Handle sequential circuits:
• Problem of state retention in sleep mode.
- Need to design low-leakage flip-flops (based on the concept of "Balloon Circuit").
- Automatic extraction of logic conditions for sleep.
• Can reuse idle conditions from clock gating.
• Exploit ODC-driven clock gating approach.
- Minimum overhead (shared logic with gated clock circuitry).

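Returning to the area results reported earlier (Area_Orig, Area_Opt and the overhead column), the percentages can be re-derived from the two area columns as a sanity check. The figures below are copied from that table; the helper name `overhead_percent` is just an illustrative choice.

```python
# (benchmark, Area_Orig [um2], Area_Opt [um2], reported overhead [%])
rows = [
    ("Block1", 64912, 66794, 2.9),
    ("Block2", 65210, 66710, 2.3),
    ("Block3", 65053, 66550, 2.3),
    ("Block4", 65412, 66524, 1.7),
    ("Block5", 65918, 68159, 3.4),
    ("Block6", 70298, 71703, 2.0),
]


def overhead_percent(area_orig, area_opt):
    """Relative area increase introduced by the sleep transistor cells."""
    return 100.0 * (area_opt - area_orig) / area_orig


computed = {name: overhead_percent(orig, opt) for name, orig, opt, _ in rows}
mean_overhead = sum(computed.values()) / len(computed)

for name, _, _, reported in rows:
    print(f"{name}: computed {computed[name]:.2f}% (reported {reported}%)")
print(f"mean overhead: {mean_overhead:.2f}%")
```

The recomputed values agree with the reported column to one decimal place, and their mean is slightly under the "around 2.5%" quoted in the results.
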
Automated STI (Cont.)
• Preliminary results on the design of a low-leakage storage element (flip-flop with "Balloon circuit"):
- Leakage current in stand-by mode:
• Regular flip-flop: 116nA.
• Low-leakage flip-flop: 10nA.
- Marginal delay increase.

Conclusions
• Leakage accounts for around 5-10% of power budget at 180nm; this grows to 20-25% at 130nm and to 35-50% at 90nm.
• Leakage power minimization must be addressed from the design standpoint, not just at the technology/process level.
• Several low-leakage design approaches introduced recently.
• STI is promising, although it requires significant methodology and tool infrastructure support.
• Preliminary results are currently under evaluation.