International Journal of Advanced Computer Engineering and Communication Technology (IJACECT)
________________________________________________________________________________________________
Survey on Content Addressable Memory and Sparse Clustered
Network
1Aswathy K.S., 2K. Gnana Sheela
1,2Department of Electronics and Communication, Toc H Institute of Science and Technology, Kerala, India
Email: [email protected], [email protected]
Abstract— Most memory devices store and retrieve data
by addressing specific memory locations. As a result, this
path often becomes the limiting factor for systems that rely
on fast memory accesses. The time required to find an item
stored in memory can be reduced considerably if the item
can be identified for access by its content rather than by its
address. A memory that is accessed in this way is called
content-addressable memory, or CAM. Content
Addressable Memory provides a performance advantage
over other memory search algorithms, such as binary or
tree-based searches or look-aside tag buffers, by
comparing the desired information against the entire list of
pre-stored entries simultaneously, often resulting in an
order-of-magnitude reduction in the search time. Different
Content Addressable Memory algorithms are available
today, each of them having its own advantages as well as
disadvantages. Here a comparative study of different
Content Addressable Memory architectures and algorithms
is presented. The introduction of the Sparse Clustered Network
(SCN) had a great impact on Content Addressable
Memory designs. A low power Content Addressable
Memory based on Sparse Clustered Networks has been
proposed. The dynamic energy consumption of the
proposed design is significantly lower compared with that
of other conventional low-power Content Addressable
Memory designs.
Keywords: Content Addressable Memory (CAM), Sparse
Clustered Network (SCN), Associative Memory, Message
Error Rate (MER)
I. INTRODUCTION
A Content Addressable Memory is a type of memory
that can be accessed using its contents rather than an
explicit address. In order to access a particular entry in
such memories, a search data word is compared against
previously stored entries in parallel to find a match.
Each stored entry is associated with a tag that is used in
the comparison process. Once a search data word is
applied to the input of a Content Addressable Memory,
the matching data word is retrieved within a single clock
cycle if it exists. This prominent feature makes CAM a
promising candidate for applications where frequent and
fast look-up operations are required, such as in
translation look-aside buffers (TLBs), database
accelerators, image processing, parametric curve
extraction, Hough transformation, Huffman
coding/decoding and image coding. Although dynamic
CMOS circuit techniques can result in low-power and
low-cost CAMs, these designs can suffer from low
noise margins, charge sharing, and other problems, and may
not be energy efficient when scaled. Thus a new family of
associative memories based on SCNs has been recently
introduced and implemented using field-programmable
gate arrays (FPGAs). Such memories make it possible to
store many short messages instead of few long ones as
in the conventional Hopfield networks with significantly
lower level of computational complexity. Furthermore, a
significant improvement is achieved in terms of the
number of information bits stored per memory bit
(efficiency). A low-power CAM employing a new
algorithm for associativity between the input tag and the
corresponding address of the output data, which is based
on sparse clustered network using binary connections
that on-average eliminates most of the parallel
comparisons performed during a search, is compared
against various other content addressable memories. It
was found that the dynamic energy consumption of the
proposed design is significantly lower compared with
that of a conventional low-power Content Addressable
Memory design. Given an input tag, the proposed
architecture computes a few possibilities for the location
of the matched tag and performs the comparison on
them to locate a single valid match.
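The content-based lookup described above can be modeled behaviorally in Python (an illustrative sketch with hypothetical names; real CAM hardware performs the comparison against all entries in parallel, which software can only emulate with a scan):

```python
# Behavioral model of a CAM lookup (illustrative only; names are
# hypothetical). Hardware compares the search word against every
# stored entry simultaneously; software emulates that with a scan.
def cam_search(entries, search_word):
    """Return the address (index) of the matching entry, or None."""
    for address, stored in enumerate(entries):
        if stored == search_word:  # done in parallel in real CAM hardware
            return address
    return None

entries = ["1010", "0111", "1100"]
assert cam_search(entries, "0111") == 1      # content -> address
assert cam_search(entries, "0000") is None   # no match
```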
Section II, the Literature Review, includes the
details of various content addressable memories and the
effects of their architectures on energy consumption and
performance. Section III contains a comparative
analysis of the algorithms covered in the former
section. Section IV concludes the survey, and a
better choice among the previous algorithms and its
performance results is also included.
II. LITERATURE REVIEW
Yamagata T et al (1992) proposed a 288-kb (8K words
× 36 b) fully parallel content addressable memory
(CAM) LSI using a compact dynamic Content
Addressable Memory cell with a stacked-capacitor
structure and a novel hierarchical priority encoder [1].
The proposed CAM cell is shown in Fig 2.1. It consists
of five NMOS transistors and two stacked capacitors.
Four of the transistors (Ms0, Ms1, Mw0, Mw1) are used
________________________________________________________________________________________________
ISSN (Print): 2278-5140, Volume-3, Issue-4, 2014
to store and access data, and one (Md) is used as a diode
to isolate current paths during match operations.
Charges are stored on stacked capacitors (Cs0, Cs1) and
the Ms0 and Ms1 gates. The opposite electrodes of the
Cs0 and Cs1 are connected to a cell plate voltage Vcp,
which is equal to half Vcc (Vcc: supply voltage). Two
bit lines are supplied with data in write and search
operations. The word line (WL) allows write access to
each cell in a word. The match line (ML) passes through
a word to perform a logical AND of the results of each
cell’s comparison. The ML is also used to read cell data.
The storage capacitor (Cs0) is stacked on the gate of the
Ms0 and Mw0. The gate electrodes of the Ms0 and
Mw0 are fabricated with the first poly-Si layer [1]. The
stacked capacitor is composed of the second poly-Si
layer, the insulator film, and the third poly-Si layer. The
second poly-Si layer (storage node) is also used to
connect the drain (or source) of the Mw0 to the gate of
the Ms0. Similarly, the storage capacitor (Cs1) is formed
on the gate of the Ms1 and Mw1.
A write operation is performed by activating a selected
WL and then driving the bit lines according to the write
data. A read operation is accomplished by discharging
the bit lines and then driving a selected ML to a high
level. A match operation is achieved by precharging
both the bit lines and the match lines to a high level, and
then loading the bit lines with search data. When match
occurs at several words in a search operation (multiple
response), the CAM outputs the address of the matched
word with the highest priority. A priority encoder (PE)
circuit is utilized for multiple-response resolution and
match address generation [1]. As the bit capacity of
Content Addressable Memories becomes larger, the
number of words increases rather than the bit length of
words. Thus in a high density Content Addressable
Memory chip, the configuration is such that the cell
array is divided into several blocks. This creates a
serious problem concerning the layout of the priority
encoder. When the priority encoder is incorporated in
each block, the silicon area occupied by the priority
encoder and the power dissipation of the priority
encoder are increased in proportion to the number of
divided blocks. In order to overcome this, a novel
hierarchical priority encoder architecture suitable for
high-density CAM was proposed.
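The multiple-response resolution performed by the priority encoder can be sketched behaviorally (an illustrative software model, not the hierarchical circuit of [1]):

```python
# Behavioral model of multiple-response resolution (illustrative).
# When several match lines are asserted at once, the priority encoder
# outputs the address of the highest-priority (lowest-index) match.
def priority_encoder(match_lines):
    """Return the address of the lowest-indexed asserted match line."""
    for address, ml in enumerate(match_lines):
        if ml:
            return address
    return None  # no word matched

assert priority_encoder([0, 1, 0, 1]) == 1   # lowest address wins
assert priority_encoder([0, 0, 0, 0]) is None
```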
Fig 2.1: CAM cell structure with a stacked capacitor
In the novel CAM cells, since a stacked capacitor is
adopted as the storage capacitor, a storage capacitance
of 90 fF is attained in a 66 µm² Content Addressable
Memory cell using a 0.8 µm CMOS process [1]. This is
sufficient for the high soft-error immunity and provides
stable performance of the operations mentioned above.
Furthermore, it suggests the possibility of achieving a
more compact Content Addressable Memory cell by
device scaling. The novel hierarchical priority encoder
reduces the circuit area and power dissipation. A typical
search cycle time of 150 ns and a maximum power
dissipation of 1.1 W have been obtained using circuit
simulation.
Schultz K et al (1996) developed a CAM based on bank
selection architecture. In the bank-selection architecture
the CAM array is divided into B equally partitioned
banks that are activated based on the value of added bits
of length log2(B) to the search data word. These extra
bits are decoded to determine, which banks must be
selected. The basic concept is that bank selection divides
the CAM into subsets called banks. The bank-selection
scheme partitions the Content Addressable Memory and
shuts off unneeded banks. Two extra data bits, called
bank-select bits, partition the Content Addressable
Memory into four blocks. When storing data, the
bank-select bits determine into which of the four blocks to
store the data. When searching data, the bank-select bits
determine which one of the four blocks to activate and
search. The decoder accomplishes the selection by
providing enable signals to each block [2]. In the
original preclassification schemes, this architecture was
used to reduce area by sharing the comparison circuitry
between blocks. Although the blocks are physically
separate, they can be arranged such that words from
different blocks are adjacent. Since only one of the four
blocks is active at any time, only 1/4 of the comparison
circuitry is necessary compared to the case with no bank
selection, thus saving area. Bank selection reduces
overall power consumption in proportion to the number
of blocks. Thus, using four blocks ideally reduces the
power consumption by 75% compared to a Content
Addressable Memory without bank selection.
The major drawback of bank selection is the problem of
bank overflow. Since, in a Content Addressable
Memory, there are many more input combinations than
storage locations, the storage of a bank can quickly
overflow. For example, consider a Content Addressable
Memory with 72-bit words (and an additional bank-select
bit) and 32K entries divided into two banks of 16K entries
each. While each bank has 16K locations, there are actually
2^72 possible entries per bank. Thus, it can often occur that
there are more entries than can fit in the assigned bank
[11]. This overflow condition requires extra circuitry
and forces multiple banks to be activated at once,
decreasing the savings in power. To avoid overflow, an
external mechanism can balance the data in the banks by
periodically re-partitioning the banks.
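The bank-selection mechanism can be sketched behaviorally (hypothetical helper names; assumes B = 4 banks, so log2(B) = 2 bank-select bits choose the single bank to activate):

```python
# Behavioral sketch of bank selection (illustrative only). The extra
# bank-select bits are decoded to pick one bank; only that bank is
# activated for a store or a search, the rest stay powered down.
def store(banks, select_bits, word):
    banks[int(select_bits, 2)].append(word)   # decode the select bits

def search(banks, select_bits, word):
    bank = banks[int(select_bits, 2)]         # activate one bank only
    return word in bank                       # other banks stay idle

banks = [[] for _ in range(4)]                # B = 4 -> 2 select bits
store(banks, "10", "cafe")
assert search(banks, "10", "cafe")
assert not search(banks, "01", "cafe")        # wrong bank never searched
```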
Zukowski C et al (1997) developed a CAM aimed at
reduced energy consumption. Energy reduction of
Content Addressable Memory employing circuit-level
techniques is mostly based on the following strategies:
1) reducing the SL energy consumption by disabling the
precharge process of SLs when not necessary and 2)
reducing the ML precharging, for example, by
segmenting the ML, selectively precharging the first few
segments and then propagating the precharge process if
and only if those first segments match [9]. This
segmentation strategy increases the delay as the number
of segments is increased. A hybrid-type CAM integrates
the low-power feature of the NAND type with the
high-performance NOR type while, similar to the selective
precharging method, the ML is segmented into two
portions. The high-speed CAM designed in 32-nm
CMOS achieves a cycle time of 290 ps using a
swapped Content Addressable Memory cell that reduces
the search delay while requiring a larger Content
Addressable Memory cell (11 transistors, 11T) than the
conventional Content Addressable Memory cell [9
transistors (9T)] used in the SCN-based Content
Addressable Memory [3]. A high-performance AND-type
match-line scheme has also been proposed, where multiple fan-in AND gates
are used for low switching activity along with
segmented-style match-line evaluation to reduce the
energy consumption.
Lin C et al (2003) developed an architectural technique
for saving power that applies only to binary CAM:
pre-computation. Pre-computation stores some extra
information along with each word that is used in the
search operation to save power. These extra bits are
derived from the stored word, and used in an initial
search before searching the main word. If this initial
search fails, then the Content Addressable Memory
aborts the subsequent search, thus saving power. The
extra information holds the number of ones in the stored
word. For example, when searching for the data word
10111, the pre-computation circuit counts the number of
ones (which is four in this case). Hence PB-CAM is
also known as the 1s-count approach [9]. First, it counts the number
of ones in an input and then compares the result with
that of the entries using an additional CAM circuit that
has the number of ones in the CAM data previously
stored. This activates a few MLs and deactivates the
others. In the second stage, a modified Content
Addressable Memory hierarchy is used, which has
reduced complexity, and has only one pull-down path
instead of two compared with the conventional design.
The modified architecture only considers 0 mismatches
instead of full comparison since the 1s have already
been compared. The number of comparisons can be
reduced to M × ⌈log₂(N + 2)⌉ + (M × N)/(N + 1) bits, where
M is the number of entries in the Content Addressable
Memory and N is the number of bits per entry [3]. In the
proposed design [9], the number of comparisons is
reduced to only N bits.
Furthermore, in PB-CAM, the increase of the tag length
affects the energy consumption, the delay, and also
complicates the precomputation stage.
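The two-stage 1s-count search can be sketched behaviorally (illustrative names, not the circuit of [3]; stage one compares stored ones-counts and stage two fully compares only the survivors):

```python
# Behavioral sketch of the 1s-count pre-computation (PB-CAM style,
# illustrative only). Entries whose ones-count differs from the search
# word's count are filtered out before any full comparison happens.
def precomputed_search(entries, word):
    target = word.count("1")                  # pre-computation stage
    for address, stored in enumerate(entries):
        if stored.count("1") != target:
            continue                          # match line stays off
        if stored == word:                    # full comparison stage
            return address
    return None

entries = ["10111", "00001", "11100"]
assert precomputed_search(entries, "10111") == 0   # four ones, full match
assert precomputed_search(entries, "11011") is None
```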
Pagiamtzis K et al (2004) developed a Content
Addressable Memory which is a combination of
pipelined architecture and hierarchical architecture.
Power can be saved by adding pipelining to
match-lines and adding hierarchy to search-lines in an
otherwise non-pipelined, non-hierarchical CAM. The
power savings of the pipelined match-lines is a result of
activating only a small portion of the match line
segments. Similarly, the power savings of the
hierarchical search-lines is a result of activating only a
small portion of the local search lines. Pipelining
match-lines saves 56% power compared to non-pipelined
match-lines. Adding hierarchy to search-lines saves 63%
power compared to non-hierarchical search-lines. The
combination of the two techniques reduces overall
power consumption by 60% [4].
The match line is divided into five match line segments,
each evaluated sequentially in a pipeline fashion. The
left-most segment has 8 bits while the other four
segments have 34 bits each, for a total of 144 bits (a
typical word width used for IPv6 address lookup). The
match line segment array (MLSA) current source that
provides the current is divided among the five segments
in proportion to the number of bits in each segment. This
is to guarantee identical speed in all match line segments
and to allow a fair comparison with the non pipelined
architecture. The pipelined match line operates from left
to right, with each match line segment acting as an
enable signal for the match line segment array of the
subsequent segment. Hence, only words that match a
segment proceed with the search in their subsequent
segments [4-9]. Words that fail to match a segment do
not continue the search in their subsequent segments and
hence consume no power.
Having pipelined the match-lines, a significant portion
of the power is now consumed by the highly capacitive
search lines. This problem is solved by observing how
the search lines are activated in the pipelined match line
architecture. As the match signals traverse the pipeline
stages from left to right, fewer match lines segments
survive the matching test and hence fewer match line
segments will be activated. However, the search lines
must be activated for the entire array at every stage of
the pipeline, since the search-lines must reach the
surviving match-line segments. This excessive power
consumption is curtailed in this design by breaking the
search-lines into global and local search-lines (GSLs and
LSLs), with the global search-lines using low-swing
signaling and the local search-lines using full-swing
signaling but with reduced capacitance. Also, by global
search-lines not directly serving every single Content
Addressable Memory cell on a search line, the global
search-lines capacitance is further reduced, resulting in
extra power savings.
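The pipelined match-line evaluation can be sketched behaviorally (an illustrative model using the 8/34/34/34/34 segmentation described above; real hardware evaluates segments across pipeline stages):

```python
# Behavioral sketch of pipelined match-line evaluation (illustrative).
# Segments are evaluated left to right (8 + 4*34 = 144 bits); a word
# that misses a segment drops out and its later segments never switch.
def pipelined_match(stored, search, seg_widths=(8, 34, 34, 34, 34)):
    pos = 0
    for width in seg_widths:
        if stored[pos:pos + width] != search[pos:pos + width]:
            return False          # later segments are not evaluated
        pos += width
    return True                   # all five segments matched

word = "01" * 72                  # a 144-bit word
assert pipelined_match(word, word)
assert not pipelined_match(word, "10" + word[2:])   # fails in segment 1
```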
Onizawa N et al (2012) proposed the asynchronous
architecture. In this architecture, as the CAM assigns
consecutive search data matched in different word
blocks, it operates based on the delay of matching the
first few bits instead of the full length, as long as the
consecutive sub-search words are different. However,
the cycle time is drastically increased when the
search-data patterns are correlated. For example, if we have
correlations in the first 8 bits of the stored data, the cycle
time is increased to 1.359 ns, which is 5.2 times that of
the non-correlated scenario [7]. In the proposed design,
the cycle time is independent of the correlation between
the input patterns. Furthermore, the asynchronous
architecture is more susceptible to process variations
compared with its synchronous counterparts. The
concept for saving power is to change the encoding of
stored data in the CAM cell. By storing the entry 101XX
in a 5-bit ternary Content Addressable Memory, the
stored word will match the four search data words
10100, 10101, 10110, and 10111. Correspondingly, a
stored word in a ternary Content Addressable Memory
can be viewed as actually storing multiple words.
This can be extended to multiple entries. Thus, for
example, the three entries 00011, 001XX, 01000 will
match any search word in the range 00011 to 01000 [7].
Efficiently mapping a set of ranges to stored ternary
Content Addressable Memory words allows for
reduction of the size of the Content Addressable
Memory required for an application. Reducing the
number of entries also has the effect of reducing power
consumption, since power consumption in Content
Addressable Memory is proportional to the array size.
To allow for a denser encoding than conventional
ternary Content Addressable Memory, the authors propose
changing the encoding of the Content Addressable Memory,
resulting in a smaller average number of entries. For the
application of network routing, their scheme in
comparison to conventional ternary encoding reduces
the number of entries by almost 50% in some cases.
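Ternary wildcard matching and range coverage can be sketched behaviorally (illustrative only; an X bit in a stored entry matches either search value, so a few entries cover a whole range):

```python
# Behavioral sketch of ternary matching (illustrative). A stored X bit
# matches either search value, so one ternary entry covers several
# binary words, and a small set of entries can cover a range.
def ternary_match(stored, search):
    return all(s == q or s == "X" for s, q in zip(stored, search))

# 101XX matches the four words 10100, 10101, 10110 and 10111
assert all(ternary_match("101XX", w)
           for w in ["10100", "10101", "10110", "10111"])

# the three entries below together cover the range 00011..01000
entries = ["00011", "001XX", "01000"]
assert any(ternary_match(e, "00110") for e in entries)
assert not any(ternary_match(e, "01001") for e in entries)
```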
The dense encoding scheme makes use of these
previously unused states to make possible the storage of
any combination. The match line architecture is
modified to accommodate the dense encoding. The
match line is divided into local match lines (LMLs) and
global match lines (GMLs). The global match lines are
divided into sets of local match lines that are made up of
four ternary half-cells. A global match line operates the
same way as a conventional NOR match line,
where a path to ground indicates a miss and no path to
ground indicates a match. The logic levels on the local
match lines, however, are inverted so that a path to
ground indicates a match, while no path to ground
indicates a miss. To see how this works, it is assumed
that a local match line stores the numbers 0, 2, and 3,
which accordingly means that ABCD = 1101 as per
Table 2.1. In the case where the search word is 0, 2, or 3
(corresponding to search lines set to 0001, 0100, and
1000, respectively), there is a path to ground indicating a
match. In the case of search for a 1 (corresponding to
search lines set to 0010) there is no path to ground
indicating a miss. The GMLs are used to construct
longer words that are made up of multiple local match
lines.
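The inverted local-match-line behavior for the example above can be sketched behaviorally (an illustrative model of the ABCD = 1101 case; the bit-to-value assignment is an assumption made for the sketch):

```python
# Behavioral sketch of the inverted local match line in the dense
# encoding (illustrative). ABCD holds one storage bit per two-bit
# value (assumed mapping A<->3, B<->2, C<->1, D<->0); the search value
# drives a one-hot search line, and a path to ground -- a match, in
# the inverted LML convention -- exists iff that value's bit is set.
def lml_match(abcd, value):
    onehot = [0, 0, 0, 0]
    onehot[3 - value] = 1             # value 3 -> 1000, value 0 -> 0001
    return any(s and q for s, q in zip(abcd, onehot))

abcd = (1, 1, 0, 1)                   # ABCD = 1101 stores the set {0, 2, 3}
assert lml_match(abcd, 0) and lml_match(abcd, 2) and lml_match(abcd, 3)
assert not lml_match(abcd, 1)         # searching 1: no path to ground
```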
Table 2.1: Possible encodings for two ternary cells in the
dense encoding scheme
A group of two ternary cells is called a local match line
in this scheme. In conventional ternary Content
Addressable Memory encoding, there are three possible
states (0, 1, X) for each ternary Content Addressable
Memory cell and, correspondingly, nine states in total
for two ternary cells. But only some of the possible
combinations are available. For example, the
combinations of 0, 2 and 1, 3 are available, but the
combinations of 0, 3 and 0, 2, 3 are not available. Since
there are four storage cells for two ternary cells, there
are actually 16 possible states. This architecture is
modified to accommodate the dense encoding.
Fig 2.2: Two ternary CAM cells viewed as four
independent half cells in the Dense encoding system
Hanyu T et al (2014) proposed an algorithm to address
the increased rate of data retrieval error probability by
incorporating multiple-valued weights for the
interconnections used in the network. The proposed
algorithm lowers the error rate by an order of magnitude
for our sample network with 60% deleted contents
[8]. Data storage and retrieval methodologies in
associative memories are different from the widely-known
indexed memories, in which the data is written
and accessed using explicit addresses. In associative
memories, only associations between parts of data
patterns are stored in a way that data can later be
accessed by presenting a partial input pattern. Classical
associative memories implemented with Hopfield
Neural Networks (HNN) store input patterns (messages)
in a network consisting of nodes and connections
between them, where the index of each data bit
corresponds to that of a node. The decoding algorithm
for Hopfield Neural Networks retrieves messages from
partial inputs using the stored integer weights. A
drawback of Hopfield Neural Networks is that the
number of nodes in the network must be equal to the
length of each message which, due to a fixed available
memory capacity, limits the diversity of the stored
messages - the number of different messages the
network can store.
Furthermore, the network is not efficient in terms of
memory usage since the ratio between the number of
bits stored to that used approaches zero as the network
grows to increase its storage capacity. These drawbacks
were addressed in Sparse Clustered Networks (SCNs)
where the interconnections are binary and the nodes are
clustered in such a way that a significantly larger
diversity and memory efficiency are achieved compared
to those of Hopfield Neural Networks.
A drawback of the conventional SCNs is that due to the
use of binary-weighted interconnections, deleting or
updating messages causes the removal of the shared
connections used by other stored messages [8]. The lost
connections result in a significant increase in the
Message Error Rate (MER) – the ratio of the
average number of incorrectly retrieved messages to the
total number of partial input messages selected from the
previously stored ones. Some of the lost connections can
be recovered due to the error-correcting capability of
Sparse Clustered Networks, for which the degree of
recovery depends on the selection of hardware
architecture and the application. The authors propose an
algorithm based on a variation of the conventional
Sparse Clustered Networks algorithm that incorporates
Multiple-Valued (MV) interconnections. In this
algorithm (MV-SCN), deletions and updates of
messages are achieved with a significant improvement
in the Message Error Rate compared to that of its
binary-weighted counterpart. Two architectures are
recommended for use in the related applications: the
first is optimized for aggressively updated, low
Message Error Rate applications, and the second is
suitable for applications requiring low hardware
complexity and a low Message Error Rate but with more
relaxed deletion or updating requirements. A trade-off
analysis then elaborates the benefits of each
architecture depending on the application requirements.
Both of the recommended architectures can be tuned for
a desired Message Error Rate with the cost of hardware
resources.
Fig 2.3: Graphical representation of an SCN with four
clusters storing two messages
As shown in Fig 2.3, an SCN network consists of n
nodes divided into c clusters, with binary connections
between each node to nodes in other clusters. No
connections exist within a cluster between two nodes
[8]. The number of nodes per cluster does not
necessarily have to be the same for each cluster.
However, it is commonly chosen to be equal for all
clusters. Here the number of nodes in a cluster is
considered to be equal to l for all clusters. An input
pattern (message), mi (1 ≤ i ≤ M), with a bit-length of K
bits is first divided into c parts of k = K/c bits each. Then,
each sub-message, κj (1 ≤ j ≤ c), is mapped to a node in
its corresponding cluster through a process known as
Local Decoding that activates a node, i.e., it sets the
value of the node to ‘1’ and the rest of the nodes to ‘0’.
Therefore, in order to permit a one-to-one relationship
between any value of a sub-message and the indices of
the nodes of its corresponding cluster, the number of
nodes, l, in a cluster must be equal to 2^k. If a sub-message
contains erased bits, all possible values are
computed in local decoding, and all corresponding
nodes are activated. Therefore, if a sub-message is
entirely erased, all of the nodes of its corresponding
cluster are activated.
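The Local Decoding step can be sketched behaviorally (illustrative, assuming equal cluster sizes and no erased bits):

```python
# Behavioral sketch of Local Decoding (illustrative): a K-bit message
# is split into c sub-messages of k = K/c bits, and each sub-message
# activates one of the l = 2**k nodes in its cluster via a direct
# binary-to-integer mapping.
def local_decode(message_bits, c):
    k = len(message_bits) // c
    return [int(message_bits[j * k:(j + 1) * k], 2) for j in range(c)]

# c = 4 clusters, K = 8 bits -> k = 2, l = 4 nodes per cluster
assert local_decode("01100011", 4) == [1, 2, 0, 3]
```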
Jarollahi H et al (2014) proposed a low-power
content-addressable memory employing a new algorithm for
associativity between the input tag and the
corresponding address of the output data [9]. The
proposed architecture is based on a recently developed
sparse clustered network using binary connections that
on-average eliminates most of the parallel comparisons
performed during a search. Therefore, the dynamic
energy consumption of the design is significantly lower
compared with that of a conventional low-power
content-addressable memory design. Given an input tag,
the proposed architecture computes a few possibilities
for the location of the matched tag and performs the
comparisons on them to locate a single valid match.
TSMC 65-nm CMOS technology was used for
simulation purposes. Following a selection of design
parameters, such as the number of content-addressable
memory entries, the energy consumption and the search
delay of the proposed design are 8% and 26% of those of
the conventional NAND architecture, respectively, with
a 10% area overhead.
Fig 2.4: Block diagram of SCN-CAM
The sparse clustered network based content-addressable
memory consists of a sparse clustered network based
classifier coupled to a content-addressable memory
array. The content-addressable memory array is divided
into several equally sized sub-blocks, which can be
activated independently. For a previously trained
network and given an input tag, the classifier only uses a
small portion of the tag and predicts very few sub-blocks
of the content-addressable memory to be activated. Once
the sub-blocks are activated, the tag is compared against
the few entries in them while keeping the rest
deactivated and thus lowers the dynamic energy
dissipation. As shown in Fig 2.4, this architecture
consists of a sparse clustered network based classifier,
which is connected to a special-purpose
content-addressable memory array. The sparse clustered network
based classifier is at first trained with the association
between the tags and the address of the data to be later
retrieved.
The proposed CAM array is based on a typical
architecture, but is divided into several sub-blocks that
can be compare-enabled independently. Therefore, it is
also possible to train the network with the association
between the tag and each content-addressable memory
sub-block if the number of desired sub-blocks is known.
Once an input tag is presented to the sparse clustered
network based classifier, it predicts which
content-addressable memory sub-block(s) need to be
compare-enabled and thus saves the
dynamic power by disabling the rest. Disabling a
content-addressable memory sub-block avoids charging
its highly capacitive SLs, while applying the search data,
and also turns the precharge path off for the MLs.
The SCN-CAM uses only a portion of the actual tag to
create or recover the association with the corresponding
output. The operation of the content-addressable
memory, on average, allows this reduction in the tag
length. A large enough tag length permits sparse
clustered network content-addressable memory to
always point to a single sub-block. However, the length
of the reduced-length tag affects the hardware complexity of
the SCN-based classifier. The
length of the reduced-length tag is not dependent on the
length of the original tag but rather dependent on the
number of content-addressable memory entries. The
following paragraphs deal with the various stages of the
algorithm.
A sparse clustered network based classifier consisting of
two parts, P1 and P2, is used. The neurons in P1 are
binary, correspond to the input tags, and are grouped
into c equally sized clusters with l neurons in
each. The processing of an input
tag in the sparse clustered network based classifier is for
either of the two situations: training or decoding. For
training or decoding purpose, the input tag is reduced in
length to q bits, and then divided into c equally sized
partitions of length k bits each. Each partition is then
mapped to the index of a neuron in its corresponding
cluster in P1, using a direct binary-to-integer mapping
from the tag portion to the index of the neuron to be
activated. Thus, l = 2^k. If l is a given parameter, the
number of clusters is calculated as c = q / log₂(l). For
simplicity in hardware implementation, q is chosen to be
a multiple of k [9]. There are no connections between the
neurons in P1. P2 is a single cluster consisting of M
neurons, which is equal to the number of entries in the
content-addressable memory.
The binary values of the connections in the
sparse-clustered network based classifier indicate associations
of the input tags and the corresponding outputs. The
connection values are set during the training process,
and are stored in a memory module such that they can
later be used to retrieve the address of the target data. A
connection has a value 1 when there exists an
association between the corresponding neuron in P1 and
a content-addressable memory entry, represented as a
neuron in P2. When an update is requested in sparse
clustered network CAM, retraining the entire sparse
clustered network based classifier with the entries is not
required, because the output neurons of P2 are
independent of each other. A tag can therefore be
deleted by removing the connections from its neuron in
P2 to the corresponding neurons in P1; in other words,
to delete an entry, c connections
are removed, one for each cluster. Adding new
connections to the same neuron in P2, but to different
neurons in P1, adds a new entry to the sparse clustered
network based classifier. The new entry can therefore be
added by adding new connections while keeping the
previous connections for other entries in the network.
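The training, addition, and deletion operations described above can be sketched with a simple connection store. This is a hedged illustration under assumed parameters (c = 3 clusters, l = 16 neurons per cluster, M = 8 entries); the data layout is our own, not the paper's hardware organization.

```python
import numpy as np

# Connections stored as c binary matrices of shape (l, M):
# W[cl][n, m] == 1 means neuron n of cluster cl in P1 is associated
# with CAM entry m (neuron m in P2). All names are illustrative.

l, c, M = 16, 3, 8
W = np.zeros((c, l, M), dtype=bool)     # no associations initially

def train(neuron_per_cluster, entry):
    """Add an entry: set one connection per cluster toward its P2 neuron."""
    for cl, n in enumerate(neuron_per_cluster):
        W[cl, n, entry] = True

def delete(entry):
    """Delete an entry: clear its c connections (one per cluster).
    No retraining is needed because P2 neurons are independent."""
    W[:, :, entry] = False

train([11, 1, 7], entry=0)
train([11, 2, 7], entry=1)     # may share P1 neurons with entry 0
delete(entry=0)                # entry 1's connections are untouched
print(int(W[:, :, 1].sum()))   # 3 — one connection per cluster remains
```

Because each entry owns its own column of connections, deleting or adding one entry never disturbs the associations of the others, which is exactly why full retraining is unnecessary.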
Once the sparse clustered network based classifier has
been trained, the ultimate goal after receiving the tag is
to determine which neuron(s) in P2 should be activated
based on the given q bits of the tag. This process is
called decoding in which the connection values are
________________________________________________________________________________________________
ISSN (Print): 2278-5140, Volume-3, Issue-4, 2014
20
recalled from the memory. The decoding process [9] is
divided into four steps:
1) An input tag is reduced in length to q bits and divided
into c equally sized partitions. The q bits can be selected
from within the tag bits in such a way as to reduce
correlation.
2) Local Decoding (LD): a single neuron per cluster in
P1 is activated, using a direct binary-to-integer mapping
from each tag portion to the index of the neuron to be
activated.
3) Global Decoding (GD): GD determines which
neuron(s) in P2 must be activated based on the results
from LD and the stored connection values. A neuron in
P2 is activated if there exists at least one active
connection toward it from each cluster in P1.
4) If more than one neuron is activated in P2, the same
number of word comparisons is required to detect the
correct match; a single activated neuron means no
further comparisons are required.
The next section deals with the sparse clustered network
based content addressable memory architecture.
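The four decoding steps can be sketched in software as follows. This is an assumed, minimal model (c = 3 clusters, l = 16 neurons, M = 8 entries, k = 4), not the hardware implementation from [9]; the connection store mirrors the training sketch above.

```python
import numpy as np

# Minimal decoding model: W[cl, n, m] == 1 means neuron n of cluster cl
# in P1 is connected to CAM entry m (neuron m in P2).

l, c, M, k = 16, 3, 8, 4
W = np.zeros((c, l, M), dtype=bool)
W[0, 11, 5] = W[1, 1, 5] = W[2, 7, 5] = True   # entry 5 trained on (11, 1, 7)

def decode(tag_bits):
    # Steps 1-2 (Local Decoding): activate one neuron per cluster via a
    # direct binary-to-integer mapping of each k-bit partition.
    idx = [int(tag_bits[i * k:(i + 1) * k], 2) for i in range(c)]
    # Step 3 (Global Decoding): a P2 neuron fires only if it has an
    # active connection from every cluster (AND across clusters).
    fire = np.logical_and.reduce([W[cl, n, :] for cl, n in enumerate(idx)])
    # Step 4: the CAM then compares the search word only against the
    # candidate entries returned here.
    return np.flatnonzero(fire)

print(decode("101100010111"))   # [5]
```

If several P2 neurons fire (false positives), the enabled CAM sub-blocks resolve them by exact comparison; an empty result means the tag was never trained.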
In order to exploit the prominent feature of the sparse
clustered network associative memory in the
classification of the search data, a conventional content-addressable
memory array is divided into a sufficient
number of compare-enabled sub-blocks such that: 1) the
number of sub-blocks is not so large as to expand the
layout and complicate the interconnections, and 2) the
number of sub-blocks is not so small that the
energy-saving opportunity offered by the sparse
clustered network based classifier is lost. Consequently, the
neurons in P2 are grouped and ORed to construct the
compare-enable signal(s) for the content-addressable
memory array. Even conventional content-addressable
memory arrays are divided into multiple
sub-blocks, since long bit lines and SLs can slow down
the read, write, and search operations due to the
presence of drain, gate, and wire capacitances. The total
number of sub-blocks can be selected depending on the
silicon-area availability, since each sub-block
slightly increases the silicon area. If the input data word
is not uniformly distributed, more sub-blocks are
activated during a search, consuming higher amounts of
energy, but the accuracy of the final output is not
affected; a false-negative output is never
generated. However, since the full length of the tag is
not used in sparse clustered network based content
addressable memory, it is possible to select the reduced-length
tag bits depending on the application and
according to a pattern that reduces the tag correlation. A
complete circuit for sparse clustered network based
content addressable memory was implemented and
simulated using HSPICE and TSMC 65-nm CMOS
technology [9]. The energy consumption of sparse
clustered network based content addressable memory
depends on various design parameters, such as q, c, and
the effect of non-uniform input distributions, and can be
modeled as

E_Total = E_SCN + E_CAM                                            (2.1)

E_SCN = c·E_dec + M·c·E_SRAMacc + (l − 1)·M·c·E_SRAMidle
        + λ·E_ANDc + Ψ·E_ORζ                                       (2.2)

E_CAM = (Ψ·ζ − 1)·N·E_CAMmismatch + N·E_CAMmatch
        + (M/ζ − Ψ)·ζ·N·E_CAMstat                                  (2.3)
The total energy consumption is divided into the energy
consumption of the sparse clustered network based
classifier (E_SCN) and that of the content-addressable memory
sub-blocks (E_CAM). The SCN-based classifier's
contribution to the energy consumption includes the
decoders (E_dec), the SRAMs (accessed, E_SRAMacc, and
idle, E_SRAMidle), and the logical gates that perform the
GD and generate the compare-enable signals for the
content-addressable memory array. The
content-addressable memory portion of the energy
model consists of match (E_CAMmatch) and mismatch
(E_CAMmismatch) portions, and the static energy
consumption of the idle content-addressable memory cells
(E_CAMstat) due to the leakage current present in
advanced technologies. As the value of q is increased,
the energy consumption decreases, since the number of
comparisons is reduced, but only up to the point where
the energy consumption of the sparse clustered network
based classifier itself begins to dominate that of the
content-addressable memory array. The energy
consumption of the sparse clustered network based
classifier does not depend on the original tag length, but
rather on the number of entries in the content-addressable
memory array.
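The energy model in (2.1)–(2.3) can be transcribed directly as a sketch. The per-operation energies and the interpretations of λ, Ψ, and ζ below are assumptions inferred from the surrounding text (ζ = CAM entries per sub-block, Ψ = activated sub-blocks), not definitions taken from [9].

```python
# Direct transcription of (2.1)-(2.3); all per-operation energies are
# taken as given inputs, and symbol meanings are assumed from context.

def e_scn(c, M, l, E_dec, E_sram_acc, E_sram_idle, lam, E_and_c, psi, E_or_z):
    # (2.2): decoders + accessed/idle SRAM cells + GD logic (AND/OR gates)
    return (c * E_dec + M * c * E_sram_acc
            + (l - 1) * M * c * E_sram_idle
            + lam * E_and_c + psi * E_or_z)

def e_cam(psi, zeta, M, N, E_mis, E_match, E_stat):
    # (2.3): psi*zeta candidate words (1 match, the rest mismatch);
    # the remaining M/zeta - psi sub-blocks only contribute static energy.
    return ((psi * zeta - 1) * N * E_mis + N * E_match
            + (M / zeta - psi) * zeta * N * E_stat)

def e_total(scn, cam):
    # (2.1): total search energy
    return scn + cam

total = e_total(e_scn(2, 4, 4, 1.0, 1.0, 0.1, 1, 1.0, 1, 1.0),
                e_cam(1, 2, 4, 8, 1.0, 2.0, 0.1))
print(total)   # about 40 with these toy per-operation energies
```

Plugging in measured per-operation energies would let a designer sweep q and c to find the point where the classifier's own consumption starts to dominate, as described above.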
Table 2.2: Relation between the number of bits per word (N) and the
estimated energy consumption (fJ/bit/search) in SCN-CAM as compared to a
conventional NAND architecture

  N      SCN-CAM   Conv. NAND
  0      0.398     1
  20     0.158     1
  40     0.102     1
  60     0.063     1
  80     0.050     1
  100    0.039     1
  120    0.032     1
The effect of the word length on the energy consumption,
in comparison with the conventional design, is shown
above in Table 2.2. The original tag length (N) does not
change the architecture or the energy consumption of
the sparse clustered network based classifier.
Furthermore, due to the small size of the sub-blocks, the
search power of sparse clustered network
content-addressable memory is much smaller than that
of the conventional design. Consequently, as N is increased, the
energy consumption per bit per search decreases in
sparse clustered network content-addressable memory while it stays
constant in the conventional content-addressable memory. This
also implies an advantage of sparse clustered network
content-addressable memory over precomputation-based
content-addressable memory, where longer tag lengths
increase the energy consumption as well as the
precomputation delay, since longer tags increase the
complexity of the adders and the number of comparisons.
The correlation effect is applied to the inputs by
creating similarities within their contents. For example,
a 10% correlation means that 10% of the bit values are
identical in all input patterns. It is observed that a
larger correlation increases the energy consumption, although it
does not affect the performance [9]. As mentioned earlier, the
limitation of sparse clustered network content-addressable
memory is that it requires a larger area. The
silicon area of the sparse clustered network based
classifier can be estimated by considering the area of the
decoders, the SRAM arrays, the precharge devices, the
read and write circuits, the interconnections, and the
standard cells. Similarly, the area of the content-addressable
memory array can be estimated by
considering the gaps between the content-addressable
memory sub-blocks, the pass-gate transistors, and the read
and write circuits. The area overhead [9] is estimated to
be 10.1% over the conventional content-addressable
memory design.
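The correlation model described above (a fraction of bit positions shared by all input patterns) can be sketched as a small generator. This is our own illustrative interpretation of the model, not code from [9]; the function name and parameters are hypothetical.

```python
import random

# Sketch of the correlation model: a correlation r means a fraction r of
# the bit positions hold the same value in every input pattern, while the
# remaining bits are drawn at random per pattern.

def correlated_patterns(n_patterns, n_bits, r, seed=0):
    rng = random.Random(seed)
    fixed_pos = set(rng.sample(range(n_bits), int(r * n_bits)))
    fixed_val = {p: rng.randint(0, 1) for p in fixed_pos}
    return [[fixed_val[b] if b in fixed_pos else rng.randint(0, 1)
             for b in range(n_bits)]
            for _ in range(n_patterns)]

pats = correlated_patterns(4, 10, 0.3)
shared = sum(all(p[b] == pats[0][b] for p in pats) for b in range(10))
print(shared >= 3)   # at least the 3 fixed positions agree across patterns
```

Feeding such patterns to the classifier would reproduce the reported effect: more sub-blocks are activated per search as r grows, raising energy without changing the final result.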
III. COMPARATIVE ANALYSIS
Table 3.1: Comparative analysis of various content-addressable memory architectures

Yamagata T et al (1992)
Algorithm: A 288-kb dynamic content-addressable memory cell with a stacked-capacitor structure and a novel hierarchical priority encoder.
Advantage: Reduced circuit area; reduced power dissipation.
Disadvantage: Alpha-particle-induced soft errors in the dynamic cells, as the storage capacitance decreases with shrinking transistor size.
Result: A typical search cycle time of 150 ns and a maximum power dissipation of 1.1 W were obtained using circuit simulation.

Schultz K et al (1996)
Algorithm: The content-addressable memory array is divided into B equally partitioned banks that are activated based on the value of log2(B) bits added to the search data word.
Advantage: Reduced silicon area; reduced power consumption.
Disadvantage: The banks can overflow, since the length of the words remains the same for all banks.
Result: An 8-kb test chip was fabricated and found to be fully functional at clock speeds up to 59 MHz, with a power dissipation of 260 mW at 50 MHz.

Zukowski C et al (1997)
Algorithm: The comparator or memory array is partitioned such that a small subset performs a portion of each comparison first, and each comparator in the main part of the array is activated only if still needed afterwards.
Advantage: Reduces the search-line (SL) energy consumption by disabling the SL precharge process when not necessary; reduces match-line (ML) precharging by segmenting the ML, selectively precharging the first few segments, and propagating the precharge process if and only if those first segments match.
Disadvantage: Delay increases as the number of segments increases.
Result: Cost-efficient, high-performance content-addressable memory.

Lin C et al (2003)
Algorithm: Precomputation-based (PB) content-addressable memory divides the comparison process and the circuitry into two stages. The first stage counts the number of ones in an input and compares the result against the counts of the entries, stored in advance in an additional content-addressable memory circuit. The second stage uses a modified content-addressable memory hierarchy that has reduced complexity and only one pull-down path.
Advantage: Low power; low cost; low operating voltage.
Disadvantage: An increase in the tag length affects the energy consumption and the delay, and complicates the precomputation stage.
Result: With a 128-word by 30-bit content-addressable memory, measurement results indicate that the proposed circuit works up to 100 MHz with a power consumption of 33 mW at a 3.3-V supply voltage, and up to 30 MHz at a 1.5-V supply voltage.

Pagiamtzis K et al (2004)
Algorithm: Two techniques to reduce power consumption in content-addressable memories (CAMs). The first pipelines the search operation by breaking the match-lines into several segments. The second broadcasts small-swing search data on less capacitive global search-lines and amplifies the signal to full swing only on a shorter local search-line.
Advantage: Reduced power consumption.
Disadvantage: High hardware complexity.
Result: The proposed schemes were employed in a 1024 x 144-bit ternary CAM in 1.8-V 0.18-um CMOS, illustrating an overall power reduction of 60% compared to a non-pipelined, non-hierarchical architecture; the ternary content-addressable memory achieves a 7-ns search cycle time at 2.89 fJ/bit per search.

Onizawa N et al (2012)
Algorithm: A self-timed overlapped search mechanism in which mismatches can be found by searching the first few bits of a search word; it operates based on the delay of matching the first few bits instead of the full word length, as long as the consecutive sub-search words are different.
Advantage: High throughput; reduced power dissipation; the cycle time is independent of the correlation between the input patterns.
Disadvantage: The asynchronous architecture is more susceptible to process variations compared with its synchronous counterparts.
Result: A 256 x 144-bit content-addressable memory designed in 90-nm CMOS operates with 5.57x faster throughput than a synchronous content-addressable memory, with 38% energy saving and 8% area overhead.

Hanyu T et al (2014)
Algorithm: An algorithm that addresses the data-retrieval-error problem by incorporating multiple-valued weights for the interconnections used in the network. Two architectures based on variations of previous work were proposed and investigated: first, an architecture that can correct erroneous inputs as well as partial ones, at the cost of hardware resources; second, a reduced-complexity architecture that can only be used in applications with partial input patterns.
Advantage: Low message error rate; low hardware complexity.
Disadvantage: High delay.
Result: The proposed algorithm lowers the error rate by an order of magnitude with 60% deleted contents.

Jarollahi H et al (2014)
Algorithm: The sparse clustered network content-addressable memory consists of a sparse clustered network based classifier connected to a special-purpose content-addressable memory array that is divided into sub-blocks. Once a tag is presented, the classifier predicts which content-addressable memory sub-blocks need to be compare-enabled.
Advantage: Low power dissipation; the number of comparisons is small due to the reduced length of the input tag.
Disadvantage: Requires a larger area.
Result: The energy consumption and the cycle time of sparse clustered network content-addressable memory are 8.02% and 28.6%, respectively, of those of the conventional NAND-type architecture.

Table 3.1 shows the comparative analysis of various CAM architectures and their merits and demerits. From the analysis it is found that the SCN-based CAM architecture is far better than all the former algorithms.
IV. CONCLUSION

This paper reviews the algorithm and architecture of a low-power CAM based on sparse clustered networks. In addition, other CAM architectures and a comparative study of them against the latest proposed architecture are included. From this analysis it is concluded that the proposed content-addressable memory architecture with a sparse clustered network is more advantageous and reliable than the other architectures. Sparse clustered network content-addressable memory employs a novel associativity mechanism based on a recently developed family of associative memories built on sparse clustered networks. It is suitable for low-power applications where frequent and parallel look-up operations are required. Sparse clustered network content-addressable memory employs a sparse clustered network based classifier, which is connected to several independently compare-enabled content-addressable memory sub-blocks, some of which are enabled once a tag is presented to the classifier. By using independent nodes in the output part of the training network, simple and fast updates can be achieved without retraining the network entirely. With optimized lengths of the reduced-length tags, sparse clustered network content-addressable memory eliminates most of the comparison operations, given a uniform distribution of the reduced-length inputs. Depending on the application, non-uniform inputs may result in higher power consumption, but they do not affect the accuracy of the final result. In other words, a few false positives may be generated by the SCN-based classifier, which are then filtered by the enabled content-addressable memory sub-blocks; thus, no false negatives are ever generated. It has been estimated that, for a case-study set of design parameters, the energy consumption and the cycle time of sparse clustered network content-addressable memory are 8.02% and 28.6%, respectively, of those of the conventional NAND-type architecture, with a 10.1% area overhead.

REFERENCES

[1] Yamagata T, Mihara M, Hamamoto T, Murai Y, and Kobayashi T, A 288-kb fully parallel content addressable memory using a stacked-capacitor cell structure, December 1992.
[2] Schultz K and Gulak P, Fully parallel integrated CAM/RAM using preclassification to enable large capacities, J. Solid-State Circuits, May 1996; 5, 689–699.
[3] Zukowski C and Wang S Y, Use of selective precharge for low power on the match lines of content-addressable memories, Aug. 1997; 64–68.
[4] Lin C S, Chang J C, and Liu B D, A low-power precomputation based fully parallel content-addressable memory, J. Solid-State Circuits, April 2003; 4, 654–662.
[5] Pagiamtzis K and Sheikholeslami A, Pipelined match-lines and hierarchical search-lines for low-power content-addressable memories, Sep. 2003; 39, 1512–1519.
[6] Ruan S J, Wu C Y, and Hsieh J Y, Low power design of precomputation-based content-addressable memory, March 2008.
[7] Onizawa N, Matsunaga S, Gaudet V C, and Hanyu T, High-throughput low-energy content-addressable memory based on self-timed overlapped search mechanism, May 2012; 41–48.
[8] Jarollahi H, Onizawa N, Hanyu T, and Gross W J, Associative memories based on multiple-valued sparse clustered networks, July 2014.
[9] Jarollahi H, Gripon V, Onizawa N, and Gross W J, Algorithm and architecture for a low-power content-addressable memory based on sparse clustered networks, April 2014.
[10] Gripon V and Berrou C, Sparse neural networks with large learning diversity, 2011.
[11] Pagiamtzis K and Sheikholeslami A, Content-addressable memory (CAM) circuits and architectures: a tutorial and survey, J. Solid-State Circuits, March 2006.
[12] Peng M and Azgomi S, Content-addressable memory (CAM) and its network applications, 1996.