Download A current-saving match-line sensing scheme for

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Mercury-arc valve wikipedia , lookup

Power engineering wikipedia , lookup

Immunity-aware programming wikipedia , lookup

Islanding wikipedia , lookup

Ground (electricity) wikipedia , lookup

Variable-frequency drive wikipedia , lookup

Power over Ethernet wikipedia , lookup

Rectifier wikipedia , lookup

Transistor wikipedia , lookup

History of electric power transmission wikipedia , lookup

Ohm's law wikipedia , lookup

Electrical substation wikipedia , lookup

TRIAC wikipedia , lookup

Voltage regulator wikipedia , lookup

Distribution management system wikipedia , lookup

Power electronics wikipedia , lookup

P–n diode wikipedia , lookup

Resistive opto-isolator wikipedia , lookup

Switched-mode power supply wikipedia , lookup

Current source wikipedia , lookup

Stray voltage wikipedia , lookup

Power MOSFET wikipedia , lookup

Opto-isolator wikipedia , lookup

Voltage optimisation wikipedia , lookup

Buck converter wikipedia , lookup

Rectiverter wikipedia , lookup

Surge protector wikipedia , lookup

Mains electricity wikipedia , lookup

CMOS wikipedia , lookup

Alternating current wikipedia , lookup

Current mirror wikipedia , lookup

Transcript
ISSCC 2003 / SESSION 17 / SRAM AND DRAM / PAPER 17.3
17.3
A Current-Saving Match-Line Sensing Scheme
for Content-Addressable Memories
Igor Arsovski, Ali Sheikholeslami
University of Toronto, Toronto, Canada
A Content-Addressable Memory (CAM) searches for data by its
content and returns the address of the matching data. This feature is used extensively in applications such as internet routers
to channel incoming packets towards their destination addresses contained in the packet header. Energy per search and search
speed are two important metrics used to evaluate CAM performance [1]. A current-saving match-line (ML) sensing scheme is
proposed that substantially reduces the energy per search without compromising search speed. The proposed sensing scheme
consumes only 1.3fJ/bit/search, a 60% power reduction compared
to previously reported sensing schemes [2,3], to achieve a search
time of less than 2ns in a 256x144b array implemented in a
0.13µm CMOS process.
Figure 17.3.1 shows a general CAM architecture where searchlines (SL) run perpendicular to MLs. The search data is presented
to the SLs which are connected bit-by-bit to all the words stored in
the memory. A NAND-based ML architecture is well known [4] for
its low power consumption (due to switching of only one ML) and
relatively long search time (due to having several transistors in
series). In contrast, a NOR-based CAM provides a much faster
search by pulling down a precharged-high ML through parallel
NMOS transistors that form the NORs. This speed however comes
at the price of higher switching activities of the MLs, i.e., all MLs
are charged to VDD and then to ground.
Figure 17.3.2 shows the circuit details of the current-saving ML
sensing scheme. To explain the circuit operations of this scheme,
ignore the effects of the current-saving block and assume the
VAR node is grounded [2]. Prior to a search operation, search
data is applied to the SLs while all MLs are precharged to
ground and all 'sn' nodes are precharged high. A CAM search
begins by lowering ML_EN and allowing the PMOS transistors
to provide identical currents (IML) to all MLs. An ML0 (ML with
no mismatch) is charged faster than any ML having a one-bit
mismatch (ML1) or more (MLn for an ML with n-bit miss). When
the Dummy ML (DML, equivalent to an ML0) reaches the
threshold voltage of the NMOS transistor (VTn) in the ML sense
circuit, the 'sn' node is discharged, flagging ML_EN to turn off all
current sources, hence preventing further power consumption in
the array. The signal of the DML is slightly delayed, in fact, to let
ML0 go above the threshold voltage (VTn) while all other MLs
stay below VTn. Without the current-saving block, this architecture consumes power uniformly-distributed across the MLs.
That is, all MLs will consume the same amount of power independent of the number of mismatches in the ML. The currentsaving block is added to reduce the power consumed by the MLs
having one-bit miss or more. This is achieved through dynamically allocating less current to slower-rising MLs as next
described.
The VAR node is precharged to ground to guarantee all MLs initially receive the same level of current. This node is then gradually pulled up by a threshold current (ITh) or pulled down by a
current proportional to the ML voltage (Isink). In case of an ML0
(or DML), Isink is initially less than ITh, but gradually rises above
ITh, pulling VAR to ground and providing a maximum IML to the
corresponding ML. In case of an ML1 (or MLn with n higher than
1), Isink remains less than ITh, allowing the difference to charge
VAR and cut off IML to the corresponding ML.
• 2003 IEEE International Solid-State Circuits Conference
Figures 17.3.3 and 17.3.4 show the simulation results of the current-saving ML sensing scheme. Figure 17.3.3 compares the voltages developing on ML0 and ML1, both initially at ground. In
less than 0.5ns, VAR0 and VAR1, corresponding to ML0 and
ML1, respectively, begin to separate, helping ML0 to rise faster
than ML1. In less than 2ns, the voltage difference between ML0
and ML1 reaches at least 200mV. This is far larger than any
threshold-voltage mismatch of the NMOS transistors signaling a
global shut-down of all current sources with ML_EN. Figure
17.3.4 illustrates the current supplied to the MLs during the
search operation. The amount of current saving increases monotonically with the number of mismatches in a ML. This current
saving reaches nearly its maximum for ML7, for which the total
charge delivered by the current source is 48% less than the total
charge delivered to ML0. All other MLs (MLn with n > 7) save
similar amounts of current as ML7.
To verify the circuit operation under various process conditions,
Fig. 17.3.5 shows the comparison of the voltage developed on
ML0 with the voltage developed on a 'fast' ML1 (ML1f). To model
an ML1f, its capacitance is reduced by 20% to 0.8CML and the size
of its corresponding PMOS transistor is increased by 20% to
1.2WML. As seen in the figure, ML1f does rise faster than ML0
initially, but falls below ML0 by at least 160mV in less than
2.5ns, producing correct search results. This is made possible
partly by increasing ITh in Fig. 17.3.2 through three programmable bits (provided off-chip) and partly by ensuring that under all
process conditions, the product of IML (max) and the resistance to
ground of ML1 is less than the minimum VTn. This guarantees
that the voltage of an MLn with n ≥ 1 never reaches VTn.
Increasing ITh, however, decreases IML, hence increasing the
search time by 50% to 3ns.
To compare the energy saved in the proposed scheme versus
those of the conventional precharge-high scheme and the current-race scheme, all three methods are simulated in an array of
256 rows by 144b. Figure 17.3.6 shows that the current-save
sensing scheme cuts the energy/bit/search by 60% when compared to the precharge-high scheme and by 40% when compared
to the current-race sensing scheme.
The proposed scheme is implemented along with current-race
sensing scheme in an array of 256x144b for direct comparison of
search speed and search energy in a 1.2V, 0.13µm CMOS process.
The testchip layout, having a total area of 1.6mmx1.8mm, is
shown in Fig. 17.3.7.
Acknowledgments
Authors thank Trevis Chandler, Kostas Pagiamtzis, and Marcus van
Ierssel for insightful discussions on this work. Authors also thank
Canadian Microelectronics Corporation (CMC) for testchip fabrication,
and Natural Sciences and Engineering Research Council of Canada
(NSERC) for funding.
References:
[1] I. Y. L. Hsiao, D. H. Wang and C. W. Jen, "Power Modeling and LowPower Design of Content Addressable Memories,” IEEE ISCAS, Vol. 4, pp.
926 -929, 2001.
[2] I. Arsovski, T. Chandler, and A. Sheikholeslami, "A Ternary ContentAddressable Memory (TCAM) Based on 4T Static Storage and Including a
Current-Race Sensing Scheme,” to appear in the IEEE J. Solid State
Circuits, Jan. 2003.
[3] P. Lin and J. Kuo, "A 1-V 128-kb Four-Set-Associative CMOS Cache
Memory Using Wordline-Oriented Tag Compare (WLOTC) Structure with
Content-Addressable Memory (CAM) 10-Transistor Tag Cell,” IEEE J.
Solid State Circuits, Vol. 36, No. 4, pp. 666-676, Apr. 2001.
[4] F. Shafai, K. J. Schultz, R. Gibson, et al., "Fully Parallel 30-MHz, 2.5Mb CAM,” IEEE J. Solid State Circuits, Vol. 33, No. 11, pp. 1690-1696,
Nov. 1998.
0-7803-7707-9/03/$17.00
©2003 IEEE
ISSCC 2003 / February 12, 2003 / Salon 1-6 / 9:30 AM
prog.
bias
search-lines
prog.
delay
ML_EN
SLn
0
1
1
0
0
0
0
0
0
1
1
0
SL0
d
m
d
SLn
SLn
d
m
DML
Match Lines
miss
ML
match
0
1
1
SL0 SL0
VAR
IML
SLn SLn
1
0
ITh
ML(1)
pre
ML
miss
0
bias
ML_EN
ML(0)
NAND ML architecture
miss
1
m
d d
∆
d
Current Saving
Control
ML
ML_EN
SL0
Search Data
bias
SLn
Isink
0
m
d d
Differential Search Lines
ML(m)
m
d d
MLS
sn
ML
NOR ML architecture
Figure 17.3.1: CAM ML architectures.
ML sense circuit
Current-Saving
Control
Current Source
Figure 17.3.2: Current-saving match-line sensing scheme.
ML_EN
MLS0
120
1.2
IML0
MLS0
80
Current (µA)
Voltage (V)
0.9
0.6
IML1
IML2
40
VAR1
IML3
VAR0
0.3
ML0
ML1
IML7
0
10
0
10
13
12
11
13
12
11
Time (ns)
Time (ns)
Figure 17.3.3: Voltage development on ML0 (fully-matched) and ML1
(one-bit miss).
Figure 17.3.4: Current supplied to ML0, ML1,...,and ML7. (MLn is an
ML with n-bit mismatch).
5
MLS0
ML_EN
WML
VAR0
WSW
3
Voltage (V)
CML
full match (typical model)
VAR0
0.6
VAR1f
160mV
1.2WML
0.3
VAR1f
1.2WSW
ML1f
ML0
IR
0
10
11
12
Time (ns)
3.4
ML0
MLS0
0.9
4
Current
Saving
Control
13
Current
Saving
Control
ML1f
1xR
Energy
(fJ / bit / search)
1.2
SLs
2.1
2
2.1
SLs
1
MLs
1.3
1.4
one-bit miss (worst-case model)
Figure 17.3.5: Simulation results comparing the voltage on a typical
ML0 against the voltage on a fast-rising ML1 (worst-case one-bit miss).
• 2003 IEEE International Solid-State Circuits Conference
SLs
MLs
0.7
0.8CML
MLs
0
precharged-high
current-race
[3]
[2]
current-saving
[this work]
Figure 17.3.6: NOR architecture: energy-per-search comparison.
0-7803-7707-9/03/$17.00
©2003 IEEE
17
MEMORY ARRAY2
MLSA2
MEMORY ARRAY3
MLSA3
MEMORY ARRAY4
MLSA4
WORD-LINE DRIVERS
MEMORY ARRAY1
MLSA1
BIT/SEARCH LINES DRIVERS
Figure 17.3.7: Chip Layout.
17
• 2003 IEEE International Solid-State Circuits Conference
0-7803-7707-9/03/$17.00
©2003 IEEE
SL0
Search Data
0
1
1
d
0
SL0
m
d
SLn
SLn
d
m
d
miss
Match Lines
ML
0
0
0
0
match
0
1
1
0
miss
1
0
1
NAND ML architecture
SL0 SL0
SLn SLn
1
ML
miss
0
1
0
0
m
d d
Differential Search Lines
m
d d
NOR ML architecture
Figure 17.3.1: CAM ML architectures.
• 2003 IEEE International Solid-State Circuits Conference
0-7803-7707-9/03/$17.00
©2003 IEEE
search-lines
prog.
bias
prog.
delay
ML_EN
SLn
SLn
Current Saving
Control
ML
ML_EN
m
d d
∆
DML
ML(0)
bias
ML_EN
bias
ITh
ML(1)
VAR
IML
pre
Isink
ML(m)
sn
MLS
ML
Current Source
Current-Saving
Control
ML sense circuit
Figure 17.3.2: Current-saving match-line sensing scheme.
• 2003 IEEE International Solid-State Circuits Conference
0-7803-7707-9/03/$17.00
©2003 IEEE
ML_EN
MLS0
1.2
MLS0
Voltage (V)
0.9
0.6
VAR1
VAR0
0.3
ML0
ML1
0
10
11
12
13
Time (ns)
Figure 17.3.3: Voltage development on ML0 (fully-matched) and ML1 (one-bit miss).
• 2003 IEEE International Solid-State Circuits Conference
0-7803-7707-9/03/$17.00
©2003 IEEE
120
Current (µA)
IML0
80
IML1
IML2
40
IML3
IML7
0
10
13
12
11
Time (ns)
Figure 17.3.4: Current supplied to ML0, ML1,...,and ML7. (MLn is an ML with n-bit mismatch).
• 2003 IEEE International Solid-State Circuits Conference
0-7803-7707-9/03/$17.00
©2003 IEEE
MLS0
ML_EN
1.2
WML
VAR0
WSW
ML0
MLS0
0.9
Current
Saving
Control
Voltage (V)
CML
full match (typical model)
VAR0
0.6
VAR1f
160mV
1.2WML
0.3
1.2WSW
ML1f
ML0
11
12
13
Current
Saving
Control
ML1f
IR
0
10
VAR1f
1xR
0.8CML
Time (ns)
one-bit miss (worst-case model)
Figure 17.3.5: Simulation results comparing the voltage on a typical ML0 against the voltage on a
fast-rising ML1 (worst-case one-bit miss).
• 2003 IEEE International Solid-State Circuits Conference
0-7803-7707-9/03/$17.00
©2003 IEEE
5
4
3.4
Energy
(fJ / bit / search)
3
SLs
2.1
2
2.1
SLs
1
MLs
1.3
1.4
SLs
MLs
0.7
MLs
0
precharged-high
current-race
[3]
[2]
current-saving
[this work]
Figure 17.3.6: NOR architecture: energy-per-search comparison.
• 2003 IEEE International Solid-State Circuits Conference
0-7803-7707-9/03/$17.00
©2003 IEEE
MEMORY ARRAY2
MLSA2
MEMORY ARRAY3
MLSA3
MEMORY ARRAY4
MLSA4
WORD-LINE DRIVERS
MEMORY ARRAY1
MLSA1
BIT/SEARCH LINES DRIVERS
Figure 17.3.7: Chip Layout.
• 2003 IEEE International Solid-State Circuits Conference
0-7803-7707-9/03/$17.00
©2003 IEEE