Performance Characterization
• Delay analysis
• Transistor sizing
• Logical effort
• Power analysis
Krish Chakrabarty
ECE 261
Delay Definitions
• tpdr: rising propagation delay
– From input to rising output crossing VDD/2
• tpdf: falling propagation delay
– From input to falling output crossing VDD/2
• tpd: average propagation delay
– tpd = (tpdr + tpdf)/2
• tr: rise time
– From output crossing 0.2 VDD to 0.8 VDD
• tf: fall time
– From output crossing 0.8 VDD to 0.2 VDD
Simulated Inverter Delay
• Solving differential equations by hand is too hard
• SPICE simulator solves the equations numerically
– Uses more accurate I-V models too!
• But simulations take time to write
Delay Estimation
• We would like to be able to easily estimate delay
– Not as accurate as simulation
– But easier to ask “What if?”
• The step response usually looks like a 1st-order RC response with a decaying exponential.
• Use RC delay models to estimate delay
– C = total capacitance on output node
– Use effective resistance R
– So that tpd = RC
• Characterize transistors by finding their effective R
– Depends on average current as gate switches
RC Delay Models
• Use equivalent circuits for MOS transistors
– Ideal switch + capacitance and ON resistance
– Unit nMOS has resistance R, capacitance C
– Unit pMOS has resistance 2R, capacitance C
• Capacitance proportional to width
• Resistance inversely proportional to width
Example: 3-input NAND
• Sketch a 3-input NAND with transistor widths
chosen to achieve effective rise and fall
resistances equal to a unit inverter (R).
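The sketch with the answer did not survive extraction. Under the RC model stated earlier (unit nMOS has resistance R, unit pMOS has 2R, resistance inversely proportional to width), the sizing can be worked out as in the following minimal sketch. The function names are hypothetical, and R is normalized to 1.

```python
from fractions import Fraction

# RC model from the slides: unit nMOS has resistance R, unit pMOS has 2R,
# and resistance is inversely proportional to width. R is normalized to 1.
R_UNIT_NMOS = Fraction(1)
R_UNIT_PMOS = Fraction(2)

def size_nand(n_inputs):
    """Return (nmos_width, pmos_width) giving worst-case resistance R each way."""
    # Pull-down: n_inputs nMOS in series, so each may contribute only R/n_inputs.
    nmos_width = R_UNIT_NMOS * n_inputs
    # Pull-up: pMOS in parallel; in the worst case a single pMOS conducts,
    # so one device alone must present R.
    pmos_width = R_UNIT_PMOS
    return nmos_width, pmos_width

def pulldown_resistance(n_inputs, nmos_width):
    """Resistance of n_inputs series nMOS of the given width, in units of R."""
    return n_inputs * R_UNIT_NMOS / nmos_width

def worst_case_pullup_resistance(pmos_width):
    """Resistance of a single conducting pMOS of the given width, in units of R."""
    return R_UNIT_PMOS / pmos_width

nmos_w, pmos_w = size_nand(3)
print(nmos_w, pmos_w)
```

For the 3-input NAND this gives nMOS width 3 and pMOS width 2, so both the series pull-down and a single pull-up present resistance R.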
3-input NAND Caps
• Annotate the 3-input NAND gate with gate and
diffusion capacitance.
Elmore Delay
• ON transistors look like resistors
• Pullup or pulldown network modeled as RC ladder
• Elmore delay of RC ladder: tpd ≈ R1C1 + (R1 + R2)C2 + … + (R1 + R2 + … + RN)CN
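The Elmore delay of an RC ladder sums, over every node, the node capacitance times the total resistance between that node and the source. A minimal sketch, assuming the ladder is given as two equal-length lists ordered from the source toward the far end (the function name is hypothetical):

```python
def elmore_delay(resistances, capacitances):
    """Elmore delay of an RC ladder.

    resistances[i] is the resistor feeding node i, capacitances[i] is the
    capacitance at node i; both are ordered from the source outward.
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one capacitance per resistor")
    delay = 0.0
    upstream_resistance = 0.0
    for r, c in zip(resistances, capacitances):
        upstream_resistance += r          # R1 + ... + Ri
        delay += upstream_resistance * c  # (R1 + ... + Ri) * Ci
    return delay

# Two-stage ladder with R = C = 1: R1*C1 + (R1 + R2)*C2
print(elmore_delay([1, 1], [1, 1]))
```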
Example: 2-input NAND
• Estimate worst-case rising and falling delay of 2-input NAND driving h identical gates.
Delay Components
• Delay has two parts
– Parasitic delay
• 6 or 7 RC
• Independent of load
– Effort delay
• 4h RC
• Proportional to load capacitance
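The 6 or 7 RC and 4h RC terms can be reproduced with an Elmore calculation for the 2-input NAND. The sketch below rests on assumptions that are not visible in the surviving text: all four transistors have width 2, the output node carries 6C of diffusion capacitance, the internal node between the series nMOS carries 2C, and each of the h driven gates presents 4C. The function name is hypothetical.

```python
def nand2_delays(h, R=1.0, C=1.0):
    """Return (rising, falling) worst-case delays of a NAND2 driving h copies.

    Assumed sizing: nMOS and pMOS widths all 2, so each nMOS is R/2 and
    each pMOS is R. Assumed capacitances: 6C diffusion on the output,
    2C on the internal series node, 4C of gate load per driven NAND2 input.
    """
    c_out = (6 + 4 * h) * C
    c_internal = 2 * C
    # Rising: worst case is a single pMOS (resistance R) charging the output.
    t_rise = R * c_out
    # Falling: two series nMOS (R/2 each) discharge the internal node and
    # the output; Elmore delay sums resistance-to-ground times capacitance.
    t_fall = (R / 2) * c_internal + (R / 2 + R / 2) * c_out
    return t_rise, t_fall

print(nand2_delays(0))  # parasitic part only
print(nand2_delays(1))  # adds 4 RC of effort delay per unit of h
```

With h = 0 only the parasitic part remains, and each unit of h adds 4 RC to both delays.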
Contamination Delay
• Best-case (contamination) delay can be
substantially less than propagation delay.
• Ex: If both inputs fall simultaneously
Diffusion Capacitance
• We assumed contacted diffusion on every s / d.
• Good layout minimizes diffusion area
• Ex: NAND3 layout shares one diffusion contact
– Reduces output capacitance by 2C
– Merged uncontacted diffusion might help too
Layout Comparison
• Which layout is better?
Resizing the Inverter
• Minimum-sized transistor: W = 3, L = 2
• To get equal rise and fall times, βn = βp, so Wp = 3Wn, assuming that electron mobility is three times that of holes
• Wp = 9
• Sometimes the function being implemented makes resizing unnecessary!
[Figure: inverter layout with n-diffusion (W = 3), p-diffusion (W = 9), and poly gates (L = 2)]
Analyzing the NAND Gate
[Figure: 3-input NAND with pMOS p1, p2, p3 (inputs a, b, c) in parallel between VDD and output F, and nMOS n1, n2, n3 (inputs a, b, c) in series between F and Gnd]
βn,eff = 1 / (1/βn1 + 1/βn2 + 1/βn3)
• Resistances are in series (conductances are in parallel)
• If βn1 = βn2 = βn3 = βn, then βn,eff = βn/3
• Pull-down circuit has three times the resistance, one-third the conductance
• For pull-up, only one transistor has to be on: βp,eff = min{βp1, βp2, βp3}
• If βp1 = βp2 = βp3 = βp = βn/3, then βn,eff = βp: no resizing is necessary
Analyzing the NOR Gate
[Figure: 3-input NOR with pMOS p1, p2, p3 (inputs a, b, c) in series between VDD and the output, and nMOS n1, n2, n3 (inputs a, b, c) in parallel between the output and Gnd]
βp,eff = 1 / (1/βp1 + 1/βp2 + 1/βp3)
• Resistances are in series (conductances are in parallel)
• If βp1 = βp2 = βp3 = βp, then βp,eff = βp/3
• Pull-up circuit has three times the resistance, one-third the conductance
• For pull-down, only one transistor has to be on: βn,eff = min{βn1, βn2, βn3}
• If βn1 = βn2 = βn3 = βn = 3βp, then βn,eff = 9βp,eff: considerable resizing is necessary
• Wp = 9Wn!
Effect of Series Transistors
[Figure: three series transistors, each of width W with a poly gate of length L over shared diffusion, compared with a single transistor of width W and gate length 3L]
Effect of Series Transistors
Transistor resizing example
[Figure: pull-up network of pMOS transistors a, b, c, each initially of size p, between VDD and the pull-down network]
• Resize the pull-up transistors to make pull-up times equal
• After resizing: a: 2p, b: 2p, c: p
Transistor Placement (Series Stack)
How to order transistors in a series stack?
[Figure: pull-up stack driving output F; series transistors ta, tb, tc (inputs a, b, c) between F and Gnd, with tc nearest Gnd; node capacitances Ca, Cb, Cc]
• Body effect: Vt increases with Vsb
• At time t = 0, a = b = c = 0, F = 1, capacitances are charged
• Ideally Vta = Vtb = Vtc ≈ 0.8 V
• However, Vta > Vtb > Vtc because of body effect
• If a, b, c become 1 at the same time, which transistor will switch on first?
• tc will switch on first (Vsb for tc is zero); Cc will discharge, pulling Vsb for tb to zero
• If signals arrive at different times, how should the transistors be ordered?
• Design strategy: place latest arriving signal nearest to output; early signals will discharge internal nodes
Transistor Placement
[Figure: two alternative assignments of inputs a, b, c to the series transistors ta, tb, tc (node capacitances Ca, Cb, Cc) beneath the pull-up stack driving F; the primary inputs change simultaneously]
Some Design Guidelines
• Use NAND gates (instead of NOR) wherever possible
• Place inverters (buffers) at high fanout nodes to improve drive capability
• Avoid use of NOR completely in high-speed circuits: by De Morgan's law, A1 + A2 + … + An = (A1'.A2'.….An')', where ' denotes complement
Some Design Guidelines
• Use limited fan-in (<10): high fan-in means long series stacks
• Use minimum-sized gates on high fan-out nodes: minimize load presented to driving gate
Logical Effort
• Chip designers face a bewildering array of choices
– What is the best circuit topology for a function?
– How many stages of logic give least delay?
– How wide should the transistors be?
• Logical effort is a method to make these decisions
– Uses a simple model of delay
– Allows back-of-the-envelope calculations
– Helps make rapid comparisons between alternatives
– Emphasizes remarkable symmetries
Delay in a Logic Gate
• Express delays in process-independent unit: d = dabs / τ
– τ = 3RC
– 12 ps in 180 nm process
– 40 ps in 0.6 μm process
• Delay has two components: d = f + p
• Effort delay f = gh (a.k.a. stage effort)
– Again has two components
• g: logical effort
– Measures relative ability of gate to deliver current
– g ≡ 1 for inverter
• h: electrical effort = Cout / Cin
– Ratio of output to input capacitance
– Sometimes called fanout
• Parasitic delay p
– Represents delay of gate driving no load
– Set by internal parasitic capacitance
Delay Plots
d = f + p
= gh + p
Computing Logical Effort
• Definition: Logical effort is the ratio of the input
capacitance of a gate to the input capacitance of
an inverter delivering the same output current.
• Measure from delay vs. fanout plots
• Or estimate by counting transistor widths
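Counting transistor widths can be sketched as follows. The code assumes gates are sized for unit-inverter drive (series devices widened by the stack height, pMOS twice as wide as nMOS for equal resistance), which matches the RC model used earlier. The function names are hypothetical.

```python
from fractions import Fraction

# Unit inverter: nMOS width 1 + pMOS width 2 on its single input.
INVERTER_INPUT_CAP = 3

def logical_effort_nand(n):
    """n series nMOS of width n, n parallel pMOS of width 2."""
    return Fraction(n + 2, INVERTER_INPUT_CAP)

def logical_effort_nor(n):
    """n parallel nMOS of width 1, n series pMOS of width 2n."""
    return Fraction(1 + 2 * n, INVERTER_INPUT_CAP)

def stage_delay(g, h, p):
    """Normalized stage delay d = gh + p, in units of tau."""
    return g * h + p

print(logical_effort_nand(2), logical_effort_nor(2))
print(stage_delay(g=1, h=4, p=1))
```

Each input of an n-input NAND sees a width of n + 2 and each input of an n-input NOR sees 2n + 1, against 3 for the inverter, which is where the (n+2)/3 and (2n+1)/3 entries come from.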
Catalog of Gates
• Logical effort of common gates
Gate type        Number of inputs
                 1     2     3     4     n
Inverter         1
NAND                   4/3   5/3   6/3   (n+2)/3
NOR                    5/3   7/3   9/3   (2n+1)/3
Tristate / mux   2     2     2     2     2
Catalog of Gates
• Parasitic delay of common gates
– In multiples of pinv (≈ 1)
Gate type        Number of inputs
                 1     2     3     4     n
Inverter         1
NAND                   2     3     4     n
NOR                    2     3     4     n
Tristate / mux   2     4     6     8     2n
XOR, XNOR              4     6     8
Example: Ring Oscillator
• Estimate the frequency of an N-stage ring oscillator
Logical Effort: g = 1
Electrical Effort: h = 1
Parasitic Delay: p = 1
Stage Delay: d = 2
Frequency: fosc = 1/(2*N*d) = 1/4N
• 31 stage ring oscillator in 0.6 μm process has frequency of ~ 200 MHz
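The ~200 MHz figure can be checked by converting the normalized stage delay back to seconds. A minimal sketch, assuming the delay unit of 40 ps quoted for the 0.6 μm process (the function name is hypothetical):

```python
def ring_oscillator_frequency(n_stages, tau_seconds, g=1, h=1, p=1):
    """Oscillation frequency of an N-stage inverter ring, in Hz."""
    stage_delay = g * h + p  # normalized delay per stage, d = gh + p
    # One period needs the edge to travel around the ring twice.
    period = 2 * n_stages * stage_delay * tau_seconds
    return 1.0 / period

f_osc = ring_oscillator_frequency(n_stages=31, tau_seconds=40e-12)
print(f"{f_osc / 1e6:.1f} MHz")
```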
Example: FO4 Inverter
• Estimate the delay of a fanout-of-4 (FO4) inverter
Logical Effort: g = 1
Electrical Effort: h = 4
Parasitic Delay: p = 1
Stage Delay: d = 5
• The FO4 delay is about
– 200 ps in 0.6 μm process
– 60 ps in a 180 nm process
– f/3 ns in an f μm process
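The quoted FO4 delays follow from d = 5 and the per-process delay unit. A short sketch that compares the exact conversion against the f/3 ns rule of thumb, using the 40 ps and 12 ps units given earlier (function names are hypothetical):

```python
def fo4_delay(tau_seconds, g=1, h=4, p=1):
    """Absolute FO4 inverter delay: (gh + p) * tau."""
    return (g * h + p) * tau_seconds

def fo4_rule_of_thumb(feature_size_um):
    """Rule of thumb: f/3 ns in an f-micron process, returned in seconds."""
    return feature_size_um / 3 * 1e-9

for name, tau, feature_um in [("0.6 um", 40e-12, 0.6), ("180 nm", 12e-12, 0.18)]:
    exact_ps = fo4_delay(tau) * 1e12
    approx_ps = fo4_rule_of_thumb(feature_um) * 1e12
    print(f"{name}: {exact_ps:.0f} ps from tau, {approx_ps:.0f} ps from f/3")
```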
Multistage Logic Networks
• Logical effort generalizes to multistage networks
• Path Logical Effort: G = Π gi
• Path Electrical Effort: H = Cout-path / Cin-path
• Path Effort: F = Π fi = Π gi hi
• Can we write F = GH?
Paths that Branch
• No! Consider paths that branch:
G = 1
H = 90 / 5 = 18
GH = 18
h1 = (15 + 15) / 5 = 6
h2 = 90 / 15 = 6
F = g1g2h1h2 = 36 = 2GH
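The mismatch between F and GH in this example can be reproduced directly from the capacitances in the arithmetic: an input of 5 driving two gates of 15, one of which drives a load of 90. The sketch below uses exact fractions; the helper names are hypothetical.

```python
from fractions import Fraction
from math import prod

def electrical_effort(c_out, c_in):
    """h = Cout / Cin for one stage (or H for the whole path)."""
    return Fraction(c_out, c_in)

def branching_effort(c_on_path, c_off_path):
    """b = (Con-path + Coff-path) / Con-path."""
    return Fraction(c_on_path + c_off_path, c_on_path)

g = [1, 1]                               # two inverters on the path
h = [electrical_effort(15 + 15, 5),      # stage 1 drives both branches
     electrical_effort(90, 15)]          # stage 2 drives the load
F = prod(gi * hi for gi, hi in zip(g, h))
G = prod(g)
H = electrical_effort(90, 5)
B = branching_effort(15, 15)

print(F, G * H, G * B * H)
```

The product of stage efforts is twice GH, and the factor of 2 is exactly the branching effort of the first stage.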
Branching Effort
• Introduce branching effort
– Accounts for branching between stages in path
– b = (Con-path + Coff-path) / Con-path
– B = Π bi
• Note: Π hi = BH
• Now we compute the path effort
– F = GBH
Multistage Delays
• Path Effort Delay: DF = Σ fi
• Path Parasitic Delay: P = Σ pi
• Path Delay: D = Σ di = DF + P
Designing Fast Circuits
• Delay is smallest when each stage bears same effort: f^ = gi hi = F^(1/N)
• Thus minimum delay of N stage path is D = N F^(1/N) + P
• This is a key result of logical effort
– Find fastest possible delay
– Doesn’t require calculating gate sizes
Gate Sizes
• How wide should the gates be for least delay?
– f^ = gh = g Cout / Cin, so Cin,i = gi Cout,i / f^
• Working backward, apply capacitance transformation to find input capacitance of each gate given load it drives.
• Check work by verifying input cap spec is met.
Example: 3-stage path
• Select gate sizes x and y for least delay from A to B
Example: 3-stage path
Logical Effort: G = (4/3)*(5/3)*(5/3) = 100/27
Electrical Effort: H = 45/8
Branching Effort: B = 3 * 2 = 6
Path Effort: F = GBH = 125
Best Stage Effort: f^ = F^(1/3) = 5
Parasitic Delay: P = 2 + 3 + 2 = 7
Delay: D = 3*5 + 7 = 22 = 4.4 FO4
Example: 3-stage path
• Work backward for sizes
y = 45 * (5/3) / 5 = 15
x = (15*2) * (5/3) / 5 = 10
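The whole example can be replayed in a few lines. The circuit figure is missing, so the inputs below are inferred from the worked numbers and should be read as assumptions: stage logical efforts 4/3, 5/3, 5/3 with parasitics 2, 3, 2, branching of 3 after the first stage and 2 after the second, an input capacitance of 8, and a load of 45. The function name is hypothetical.

```python
from math import prod

def design_path(g, b, p, c_in, c_load):
    """Size an N-stage path for least delay.

    g[i], p[i]: logical effort and parasitic delay of stage i.
    b[i]: branching effort at the output of stage i (1 if no branching).
    Returns (F, f_hat, D, input_caps), all normalized.
    """
    n = len(g)
    F = prod(g) * prod(b) * (c_load / c_in)   # F = GBH
    f_hat = F ** (1.0 / n)                    # best stage effort
    D = n * f_hat + sum(p)                    # D = N F^(1/N) + P
    input_caps = [0.0] * n
    c_out = c_load
    for i in range(n - 1, -1, -1):            # work backward from the load
        input_caps[i] = g[i] * (c_out * b[i]) / f_hat
        c_out = input_caps[i]
    return F, f_hat, D, input_caps

F, f_hat, D, caps = design_path(
    g=[4 / 3, 5 / 3, 5 / 3], b=[3, 2, 1], p=[2, 3, 2], c_in=8, c_load=45)
print(round(F, 6), round(f_hat, 6), round(D, 6), [round(c, 6) for c in caps])
```

The first entry of the returned capacitances should come back equal to the specified input capacitance of 8, which is the check recommended on the Gate Sizes slide.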
Best Number of Stages
• How many stages should a path use?
– Minimizing number of stages is not always fastest
• Example: drive 64-bit datapath with unit inverter
D = N F^(1/N) + P
= N (64)^(1/N) + N
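Evaluating D = N(64)^(1/N) + N for a few values of N shows why the single-stage design is not the fastest. A minimal sketch, assuming every stage is an inverter with parasitic delay 1 (function names are hypothetical):

```python
def path_delay(n_stages, path_effort=64, p_inv=1):
    """D = N * F^(1/N) + N * p_inv for a chain of N inverters."""
    return n_stages * path_effort ** (1.0 / n_stages) + n_stages * p_inv

def best_stage_count(path_effort=64, max_stages=10):
    """Stage count in 1..max_stages with the smallest path delay."""
    return min(range(1, max_stages + 1),
               key=lambda n: path_delay(n, path_effort))

for n in range(1, 5):
    print(n, round(path_delay(n), 2))
print("best N:", best_stage_count())
```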
Derivation
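• Consider adding inverters to end of path
– How many give least delay?
– D = N F^(1/N) + Σ pi + (N - n1) pinv, where n1 is the number of original stages
– Setting ∂D/∂N = 0 gives -F^(1/N) ln F^(1/N) + F^(1/N) + pinv = 0
• Define best stage effort ρ = F^(1/N)
– pinv + ρ(1 - ln ρ) = 0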
Best Stage Effort
• The best stage effort ρ satisfies pinv + ρ(1 - ln ρ) = 0, which has no closed-form solution
• Neglecting parasitics (pinv = 0), we find ρ = 2.718 (e)
• For pinv = 1, solve numerically for ρ = 3.59
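The numerical solution can be sketched with bisection on pinv + ρ(1 - ln ρ) = 0. The bracket [e, 10] is an assumption that holds for parasitic delays between 0 and about 13; the function name is hypothetical.

```python
import math

def best_stage_effort(p_inv, lo=math.e, hi=10.0, tol=1e-9):
    """Solve p_inv + rho * (1 - ln(rho)) = 0 for rho by bisection."""
    def residual(rho):
        return p_inv + rho * (1.0 - math.log(rho))
    # The residual decreases for rho > 1 (its derivative is -ln(rho)),
    # equals p_inv >= 0 at rho = e, and is negative at rho = 10.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if residual(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

print(round(best_stage_effort(0.0), 3))
print(round(best_stage_effort(1.0), 2))
```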
Review of Definitions
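Term               Stage                                   Path
number of stages   1                                       N
logical effort     g                                       G = Π gi
electrical effort  h = Cout / Cin                          H = Cout-path / Cin-path
branching effort   b = (Con-path + Coff-path) / Con-path   B = Π bi
effort             f = gh                                  F = GBH
effort delay       f                                       DF = Σ fi
parasitic delay    p                                       P = Σ pi
delay              d = f + p                               D = Σ di = DF + P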
Method of Logical Effort
1) Compute path effort
2) Estimate best number of stages
3) Sketch path with N stages
4) Estimate least delay
5) Determine best stage effort
6) Find gate sizes
Limits of Logical Effort
• Chicken and egg problem
– Need path to compute G
– But don’t know number of stages without G
• Simplistic delay model
– Neglects input rise time effects
• Interconnect
– Iteration required in designs with wire
• Maximum speed only
– Not minimum area/power for constrained delay
Summary
• Logical effort is useful for thinking of delay in circuits
– Numeric logical effort characterizes gates
– NANDs are faster than NORs in CMOS
– Paths are fastest when effort delays are ~4
– Path delay is weakly sensitive to stages, sizes
– But using fewer stages doesn’t mean faster paths
– Delay of path is about log4 F FO4 inverter delays
– Inverters and NAND2 best for driving large caps
• Provides language for discussing fast circuits
– But requires practice to master
Power and Energy
• Power is drawn from a voltage source attached to the VDD pin(s) of a chip.
• Instantaneous Power: P(t) = iDD(t) VDD
• Energy: E = ∫[0,T] P(t) dt = ∫[0,T] iDD(t) VDD dt
• Average Power: Pavg = E / T = (1/T) ∫[0,T] iDD(t) VDD dt
Dynamic Power
• Dynamic power is required to charge and discharge load
capacitances when transistors switch.
• One cycle involves a rising and falling output.
• On rising output, charge Q = CVDD is required
• On falling output, charge is dumped to GND
• This repeats T·fsw times over an interval of T
Dynamic Power Cont.
Pdynamic = (1/T) ∫[0,T] iDD(t) VDD dt
= (VDD / T) ∫[0,T] iDD(t) dt
= (VDD / T) [T fsw C VDD]
= C VDD^2 fsw
Activity Factor
• Suppose the system clock frequency = f
• Let fsw = αf, where α = activity factor
– If the signal is a clock, α = 1
– If the signal switches once per cycle, α = 1/2
– Dynamic gates:
• Switch either 0 or 2 times per cycle, α = 1/2
– Static gates:
• Depends on design, but typically α = 0.1
• Dynamic power: Pdynamic = α C VDD^2 f
Short Circuit Current
• When transistors switch, both nMOS and pMOS
networks may be momentarily ON at once
• Leads to a blip of “short circuit” current.
• < 10% of dynamic power if rise/fall times are
comparable for input and output
Example
• 200 Mtransistor chip
– 20M logic transistors
• Average width: 12 λ
– 180M memory transistors
• Average width: 4 λ
– 1.2 V 100 nm process
– Cg = 2 fF/μm
Dynamic Example
• Static CMOS logic gates: activity factor = 0.1
• Memory arrays: activity factor = 0.05 (many
banks!)
• Estimate dynamic power consumption per MHz.
Neglect wire capacitance and short-circuit current.
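The worked answer did not survive extraction, so the following is a sketch under stated assumptions rather than the original solution. It assumes the average widths of 12 and 4 are in units of λ, that λ is half the 100 nm feature size (0.05 μm), and that dynamic power is α C VDD² f with only gate capacitance counted. All names are hypothetical.

```python
LAMBDA_UM = 0.05          # assumed: lambda = half of the 100 nm feature size
CG_F_PER_UM = 2e-15       # gate capacitance, 2 fF per micron of width
VDD = 1.2                 # supply voltage in volts

def total_gate_capacitance(n_transistors, avg_width_lambda):
    """Total gate capacitance in farads for a population of transistors."""
    return n_transistors * avg_width_lambda * LAMBDA_UM * CG_F_PER_UM

def dynamic_power(activity, capacitance, vdd, frequency_hz):
    """P = alpha * C * VDD^2 * f."""
    return activity * capacitance * vdd ** 2 * frequency_hz

c_logic = total_gate_capacitance(20e6, 12)    # logic transistors
c_memory = total_gate_capacitance(180e6, 4)   # memory transistors
p_per_mhz = (dynamic_power(0.1, c_logic, VDD, 1e6)
             + dynamic_power(0.05, c_memory, VDD, 1e6))
print(f"{p_per_mhz * 1e3:.2f} mW per MHz")
```

Under these assumptions the logic contributes 24 nF and the memory 72 nF of switched gate capacitance before the activity factors are applied.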
Static Power
• Static power is consumed even when chip is
quiescent.
– Ratioed circuits burn power in fight between ON
transistors
– Leakage draws power from nominally OFF devices
Ratio Example
• The chip contains a 32 word x 48 bit ROM
– Uses pseudo-nMOS decoder and bitline pullups
– On average, one wordline and 24 bitlines are high
• Find static power drawn by the ROM
– β = 75 μA/V^2
– Vtp = -0.4V
• Solution:
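The solution itself was lost in extraction. One way to estimate it, offered as a sketch and not as the original worked answer, is to assume that each pull-up whose output is low carries the saturation current β(VDD - |Vtp|)²/2, that outputs which are high draw no static current, and that VDD is the 1.2 V of the earlier chip example. The function name is hypothetical.

```python
def rom_static_power(beta=75e-6, vdd=1.2, vtp=-0.4,
                     wordlines=32, bitlines=48,
                     wordlines_high=1, bitlines_high=24):
    """Static power of the pseudo-nMOS ROM pull-ups, in watts."""
    # Assumed: an ON pMOS pull-up fighting a low output is saturated.
    i_pullup = beta * (vdd - abs(vtp)) ** 2 / 2
    p_pullup = vdd * i_pullup
    # Only lines held low conduct from VDD to ground through a pull-up.
    lines_low = (wordlines - wordlines_high) + (bitlines - bitlines_high)
    return lines_low * p_pullup

print(f"{rom_static_power() * 1e3:.2f} mW")
```

With these assumptions each pull-up draws 24 μA, and 31 wordlines plus 24 bitlines are low at any time.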
Leakage Example
• The process has two threshold voltages and two oxide thicknesses.
• Subthreshold leakage:
– 20 nA/μm for low Vt
– 0.02 nA/μm for high Vt
• Gate leakage:
– 3 nA/μm for thin oxide
– 0.002 nA/μm for thick oxide
• Memories use low-leakage transistors everywhere
• Gates use low-leakage transistors on 80% of logic
Leakage Example Cont.
• Estimate static power:
– High leakage:
– Low leakage:
• If no low leakage devices, Pstatic = 749 mW (!)
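The high- and low-leakage figures are missing from the extracted text. The sketch below rebuilds an estimate under assumptions: transistor widths are in λ with λ = 0.05 μm, high-leakage devices combine low Vt with thin oxide, low-leakage devices combine high Vt with thick oxide, and only half the transistors are OFF (and so leak subthreshold current) at any time. These assumptions reproduce the 749 mW figure for the all-high-leakage case. All names are hypothetical.

```python
LAMBDA_UM = 0.05   # assumed: lambda = half of the 100 nm feature size
VDD = 1.2

def leakage_per_um(subthreshold_na, gate_na):
    """Leakage in amps per micron; assumes half the devices are OFF."""
    return (subthreshold_na / 2 + gate_na) * 1e-9

def static_power(high_leak_fraction_of_logic):
    logic_width = 20e6 * 12 * LAMBDA_UM     # total logic width in microns
    memory_width = 180e6 * 4 * LAMBDA_UM    # total memory width in microns
    high_width = high_leak_fraction_of_logic * logic_width
    low_width = (1 - high_leak_fraction_of_logic) * logic_width + memory_width
    current = (high_width * leakage_per_um(20, 3)
               + low_width * leakage_per_um(0.02, 0.002))
    return VDD * current

p_mixed = static_power(0.2)   # 20% of logic uses high-leakage devices
# Every device high-leakage: total width times the high-leakage current.
p_all_high = VDD * (20e6 * 12 + 180e6 * 4) * LAMBDA_UM * leakage_per_um(20, 3)
print(f"{p_mixed * 1e3:.0f} mW mixed, {p_all_high * 1e3:.0f} mW all high-leakage")
```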
Low Power Design
• Reduce dynamic power
– α: clock gating, sleep mode
– C: small transistors (esp. on clock), short wires
– VDD: lowest suitable voltage
– f: lowest suitable frequency
• Reduce static power
– Selectively use ratioed circuits
– Selectively use low Vt devices
– Leakage reduction: stacked devices, body bias, low temperature