CSE477
VLSI Digital Circuits
Fall 2003
Lecture 25: Peripheral Memory Circuits
Mary Jane Irwin ( www.cse.psu.edu/~mji )
www.cse.psu.edu/~cg477
[Adapted from Rabaey’s Digital Integrated Circuits, Second Edition, ©2003
J. Rabaey, A. Chandrakasan, B. Nikolic]
CSE477 L25 Memory Peripheral.1
Irwin&Vijay, PSU, 2003
Review: Read-Write Memories (RAMs)
Static – SRAM
  data is stored as long as supply is applied
  large cells (6 FETs/cell) – so fewer bits/chip
  fast – so used where speed is important (e.g., caches)
  differential outputs (output BL and !BL)
  use sense amps for performance
  compatible with CMOS technology
Dynamic – DRAM
  periodic refresh required (every 1 to 4 ms) to compensate for the charge loss caused by leakage
  small cells (1 to 3 FETs/cell) – so more bits/chip
  slower – so used for main memories
  single ended output (output BL only)
  need sense amps for correct operation
  not typically compatible with CMOS technology
Review: 2D Memory Bank
[Figure: 2D memory bank. Labels: address bits AN-1 … Ai and Ai-1 … A0, Read Precharge, Precharge Circuit, Write Circuitry, Sense Amps, Column Decoder, data.]
Peripheral Memory Circuitry
Row and column decoders
Read bit line precharge logic
Sense amplifiers
Read/write circuitry
Timing and control
Concerns: speed; power consumption; area – pitch matching
Row Decoders

Collection of 2^M complex logic gates organized in a regular, dense fashion

(N)AND decoder for 8 address bits
WL(0) = !A7 & !A6 & !A5 & !A4 & !A3 & !A2 & !A1 & !A0
…
WL(255) = A7 & A6 & A5 & A4 & A3 & A2 & A1 & A0

NOR decoder for 8 address bits
WL(0) = !(A7 | A6 | A5 | A4 | A3 | A2 | A1 | A0)
…
WL(255) = !(!A7 | !A6 | !A5 | !A4 | !A3 | !A2 | !A1 | !A0)

Goals: Pitch matched, fast, low power
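The two decoder forms above are related by De Morgan's law, so they should select the same word line. The following is a minimal behavioral sketch in Python (not a circuit model); the function names `and_decoder` and `nor_decoder` are illustrative and do not come from the slides.

```python
def and_decoder(addr: int, n_bits: int = 8) -> list[int]:
    """AND form: WL(i) is the AND of the address literals for row i."""
    lines = []
    for row in range(2 ** n_bits):
        wl = 1
        for bit in range(n_bits):
            a = (addr >> bit) & 1
            # Use A_bit where the row index has a 1, otherwise !A_bit.
            literal = a if (row >> bit) & 1 else 1 - a
            wl &= literal
        lines.append(wl)
    return lines


def nor_decoder(addr: int, n_bits: int = 8) -> list[int]:
    """NOR form: WL(i) is the NOR of the complemented literals for row i."""
    lines = []
    for row in range(2 ** n_bits):
        any_high = 0
        for bit in range(n_bits):
            a = (addr >> bit) & 1
            # Each NOR input is the complement of the AND-form literal.
            nor_input = 1 - a if (row >> bit) & 1 else a
            any_high |= nor_input
        lines.append(1 - any_high)
    return lines


if __name__ == "__main__":
    for addr in (0, 37, 255):
        wl = and_decoder(addr)
        assert wl == nor_decoder(addr)
        print(f"address {addr}: WL({wl.index(1)}) high, {sum(wl)} line(s) active")
```

Exactly one word line is high for every address, which is the one-hot behavior a row decoder must provide.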
Implementing a Wide NOR Function
Single stage 8x256 bit decoder (as in Lecture 22)
  One 8 input NOR gate per row x 256 rows = 256 x (8+8) = 4,096 transistors
  Pitch match and speed/power issues
Decompose logic into multiple levels
  !WL(0) = !(!(A7 | A6) & !(A5 | A4) & !(A3 | A2) & !(A1 | A0))
  First level is the predecoder (for each pair of address bits, form the four NOR terms !(Ai|Ai-1), !(Ai|!Ai-1), !(!Ai|Ai-1), and !(!Ai|!Ai-1))
  Second level is the word line driver
Predecoders reduce the number of transistors required
  Four sets of four 2-bit NOR predecoders = 4 x 4 x (2+2) = 64
  256 word line drivers, each a four input NAND – 256 x (4+4) = 2,048
  - 4,096 vs 2,112 = almost a 50% savings
Number of inputs to the gates driving the WLs is halved, so the propagation delay is reduced by a factor of ~4 (gate delay grows roughly quadratically with fan-in)
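The transistor counts and the delay estimate for the 8-bit case can be checked with a few lines of arithmetic. This is a sketch under two assumptions: every n-input static CMOS gate costs 2n transistors (as the counts above imply), and gate delay scales with the square of the fan-in. The function names are illustrative.

```python
def single_stage_transistors(n_bits: int) -> int:
    """One n-input gate per row, 2n transistors per gate."""
    return (2 ** n_bits) * (2 * n_bits)


def two_stage_transistors(n_bits: int) -> int:
    """2-bit NOR predecoders plus one word line driver per row (n_bits even)."""
    pairs = n_bits // 2
    predecoders = pairs * 4 * (2 + 2)          # four 2-input NORs per pair
    drivers = (2 ** n_bits) * (2 * pairs)      # one pairs-input NAND per row
    return predecoders + drivers


def delay_improvement(fan_in_before: int, fan_in_after: int) -> float:
    """Assumed model: gate delay grows with the square of the fan-in."""
    return (fan_in_before / fan_in_after) ** 2


if __name__ == "__main__":
    single = single_stage_transistors(8)
    two = two_stage_transistors(8)
    print(single, two, f"savings = {1 - two / single:.1%}")
    print("delay reduced by a factor of", delay_improvement(8, 4))
```

For 8 address bits this gives 4,096 versus 2,112 transistors (about 48% savings) and a delay factor of 4, matching the figures on the slide.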
Split Row Two-Level 8x256 Decoder
WL0 = !(!(!A0&!A1&!A2) | !(!A3&!A4&!A5) | !(!A6&!A7))
[Figure: Address<7:0> feeds NAND predecoders producing !(!A0 & !A1 & !A2) ... !(A0 & A1 & A2), with predecoder groups labeled *8 and *4; word line gates (*256) drive WL0 ... WL255, drawn on both sides of the split array.]
Pitch matched
Buffered word line drivers
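The two-level structure on this slide (NAND predecoders over the groups A0-A2, A3-A5 and A6-A7, followed by a 3-input NOR per word line) can be sketched behaviorally as follows. The grouping is read off the WL0 expression; the names `GROUPS`, `nand_predecode` and `word_lines` are hypothetical.

```python
# (lowest bit, width) of each predecoded group: A0-A2, A3-A5, A6-A7
GROUPS = [(0, 3), (3, 3), (6, 2)]


def nand_predecode(addr: int) -> list[list[int]]:
    """Active-low NAND outputs, one list per group, one entry per bit pattern."""
    outputs = []
    for low, width in GROUPS:
        field = (addr >> low) & ((1 << width) - 1)
        # A NAND output is 0 only for the pattern matching the address field.
        outputs.append([0 if pattern == field else 1
                        for pattern in range(1 << width)])
    return outputs


def word_lines(addr: int) -> list[int]:
    """Second level: each WL is the NOR of one predecoder output per group."""
    pre = nand_predecode(addr)
    lines = []
    for row in range(256):
        picked = [pre[g][(row >> low) & ((1 << width) - 1)]
                  for g, (low, width) in enumerate(GROUPS)]
        lines.append(0 if any(picked) else 1)
    return lines


if __name__ == "__main__":
    wl = word_lines(200)
    print("active word line:", wl.index(1), "count:", sum(wl))
```

The predecoder lists have 8, 8 and 4 entries, which lines up with the *8 and *4 labels in the figure.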
Pass Transistor Based Column Decoder
[Figure: a 2 input NOR decoder on A1, A0 generates select lines S3 … S0; pass transistors connect BL3/!BL3 … BL0/!BL0 to data_out/!data_out.]
Read: connect BLs to the Sense Amps (SA)
Writes: drive one of the BLs low to write a 0 into the cell
Fast since there is only one transistor in the signal path. However, there is a large transistor count ((K+1) x 2^K + 2 x 2^K)
  For K = 2: 3 x 2^2 (decoder) + 2 x 2^2 (PTs) = 12 + 8 = 20
Tree Based Column Decoder
[Figure: tree of pass transistors controlled by A0, !A0, A1, !A1 connects BL3/!BL3 … BL0/!BL0 to data_out/!data_out.]
Number of transistors reduced to 2 x 2 x (2^K - 1)
  for K = 2: 2 x 2 x (2^2 - 1) = 4 x 3 = 12
Delay increases quadratically with the number of sections (K) (so prohibitive for large decoders)
  can fix with buffers, progressive sizing, combination of tree and pass transistor approaches
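One way to see why the tree decoder's delay grows quadratically with K is an Elmore estimate of K pass transistors in series. The sketch below assumes identical on-resistance and node capacitance per section and ignores the bit line capacitance; the numeric values are placeholders, not data from the lecture.

```python
def tree_path_delay(k_sections: int, r_on: float, c_node: float) -> float:
    """Elmore time constant of k_sections series pass transistors.

    Node i (counted from the driven end) sees i on-resistances upstream,
    so the sum is r_on * c_node * (1 + 2 + ... + k) = r_on * c_node * k * (k + 1) / 2.
    """
    return sum(i * r_on * c_node for i in range(1, k_sections + 1))


if __name__ == "__main__":
    R_ON = 10e3      # assumed pass transistor on-resistance, ohms
    C_NODE = 5e-15   # assumed capacitance per internal node, farads
    for k in (1, 2, 4, 8):
        print(f"K = {k}: {tree_path_delay(k, R_ON, C_NODE) * 1e12:.0f} ps")
```

Doubling K from 4 to 8 raises the estimate by a factor of 3.6, approaching 4 for large K, which is the motivation for inserting buffers or mixing in a pass transistor stage.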
Decoder Complexity Comparisons
Consider a memory with 10b address and 8b data (T = transistors)

Conf.  Data/Row            Row Decoder            Single stage  Two stage  Column Decoder      PT     Tree
1D     8b                  10b = 10x2^10 decoder  20,480 T      10,320 T   -                   -      -
2D     32b (32x256 core)   8b = 8x2^8 decoder     4,096 T       2,112 T    2b = 2x2^2 decoder  76 T   96 T
2D     64b (64x128 core)   7b = 7x2^7 decoder     1,792 T       1,072 T    3b = 3x2^3 decoder  160 T  224 T
2D     128b (128x64 core)  6b = 6x2^6 decoder     768 T         432 T      4b = 4x2^4 decoder  336 T  480 T
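The entries in this comparison can be reproduced from the counting rules given on the earlier decoder slides. The sketch below makes two assumptions that are inferred from the table values rather than stated in the lecture: an odd address bit bypasses the predecoders and feeds the word line drivers directly, and for 8 data bits the column decoder logic is shared while the pass transistors are replicated per bit.

```python
def row_single_stage(n: int) -> int:
    """2^n rows, one n-input gate (2n transistors) per row."""
    return (2 ** n) * 2 * n


def row_two_stage(n: int) -> int:
    """2-bit NOR predecoders plus one word line driver per row."""
    pairs, leftover = divmod(n, 2)      # assumed: a leftover bit is not predecoded
    predecoders = pairs * 4 * (2 + 2)
    driver_inputs = pairs + leftover
    return predecoders + (2 ** n) * 2 * driver_inputs


def column_pass_transistor(k: int, data_bits: int = 1) -> int:
    """(K+1) x 2^K decoder transistors plus 2 x 2^K pass transistors per data bit."""
    return (k + 1) * 2 ** k + 2 * 2 ** k * data_bits


def column_tree(k: int, data_bits: int = 1) -> int:
    """2 x 2 x (2^K - 1) pass transistors per differential data bit."""
    return 2 * 2 * (2 ** k - 1) * data_bits


if __name__ == "__main__":
    print("per bit, K = 2:", column_pass_transistor(2), column_tree(2))
    for n, k in ((10, 0), (8, 2), (7, 3), (6, 4)):
        row = (row_single_stage(n), row_two_stage(n))
        col = (column_pass_transistor(k, 8), column_tree(k, 8)) if k else None
        print(f"{n}b row decoder: {row}, {k}b column decoder: {col}")
```

The row decoder shrinks quickly as the array is folded, while the column decoder cost grows, so the total is what should be compared across configurations.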
Bit Line Precharge Logic
First step of a Read cycle is to precharge (PC) the bit lines to VDD
  every differential signal in the memory must be equalized to the same voltage level before Read
Turn off PC and enable the WL
  the grounded PMOS load limits the bit line swing (speeding up the next precharge cycle)
[Figure: precharge circuit on BL and !BL controlled by !PC, with an equalization transistor between the two bit lines.]
equalization transistor - speeds up equalization of the two bit lines by allowing the capacitance and pull-up device of the nondischarged bit line to assist in precharging the discharged line
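The benefit of the equalization transistor can be illustrated with a charge-sharing calculation: when the two bit lines are shorted together, charge conservation sets their common voltage. This is a first-order sketch that ignores the pull-up devices, which then finish restoring both lines to VDD; the capacitance and voltage values are assumed for illustration.

```python
def equalized_voltage(c_bl: float, v_bl: float, c_blb: float, v_blb: float) -> float:
    """Common voltage after shorting two bit lines (charge conservation)."""
    return (c_bl * v_bl + c_blb * v_blb) / (c_bl + c_blb)


if __name__ == "__main__":
    VDD = 2.5         # assumed supply voltage, volts
    C_BL = 1e-12      # assumed bit line capacitance, farads
    v_discharged = 2.0
    v_eq = equalized_voltage(C_BL, VDD, C_BL, v_discharged)
    print(f"after equalization both lines sit at {v_eq:.3f} V")
    print(f"remaining gap to VDD: {VDD - v_eq:.3f} V instead of {VDD - v_discharged:.3f} V")
```

With matched capacitances the discharged line recovers half of its deficit through charge sharing alone.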
Sense Amplifiers
Amplification – resolves data with small bit line swings (in some DRAMs required for proper functionality)
Delay reduction – compensates for the limited drive capability of the memory cell to accelerate BL transition
  tp = (C * ΔV) / Iav
  C is large and Iav is small, so make ΔV as small as possible
Power reduction – eliminates a large part of the power dissipation due to charging and discharging bit lines
Signal restoration – for DRAMs, need to drive the bit lines full swing after sensing (read) to do data refresh
[Figure: sense amplifier (SA) symbol with input and output.]
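A quick numeric reading of tp = (C * ΔV) / Iav shows why a small bit line swing matters. The component values below are assumptions chosen only to make the arithmetic concrete, and the energy expression C * VDD * ΔV is the usual first-order estimate of the energy drawn from the supply to restore the bit line, not a formula from the slide.

```python
def bitline_delay(c_bl: float, delta_v: float, i_avg: float) -> float:
    """tp = (C * dV) / Iav, in seconds."""
    return c_bl * delta_v / i_avg


def bitline_energy(c_bl: float, vdd: float, delta_v: float) -> float:
    """First-order supply energy to restore a swing of delta_v, in joules."""
    return c_bl * vdd * delta_v


if __name__ == "__main__":
    C_BL = 1e-12     # assumed bit line capacitance, farads
    I_AVG = 100e-6   # assumed average cell current, amperes
    VDD = 2.5        # assumed supply voltage, volts
    for swing in (VDD, 0.1 * VDD):
        tp = bitline_delay(C_BL, swing, I_AVG)
        e = bitline_energy(C_BL, VDD, swing)
        print(f"swing {swing:.2f} V: tp = {tp * 1e9:.1f} ns, energy = {e * 1e12:.3f} pJ")
```

Under these assumptions, sensing at a swing of 0.1 VDD instead of full swing cuts both the delay and the bit line energy by a factor of ten.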
Classes of Sense Amplifiers
Differential SA – takes small signal differential inputs (BL and !BL) and amplifies them to a large signal single-ended output
  common-mode rejection – rejects noise that is equally injected to both inputs
  Only suitable for SRAMs (with BL and !BL)
  Types
    Current mirroring
    Two-stage
    Latch based
Single-ended SA – needed for DRAMs
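Common-mode rejection can be illustrated with an idealized comparison of the two sensing styles. The sketch assumes ideal comparators, a noise-free reference for the single-ended case, and voltage values picked for illustration; none of these come from the lecture.

```python
def single_ended_read(v_bl: float, v_ref: float, noise: float = 0.0) -> int:
    """Compare one noisy bit line against a fixed (assumed quiet) reference."""
    return 1 if (v_bl + noise) > v_ref else 0


def differential_read(v_bl: float, v_blb: float, noise: float = 0.0) -> int:
    """Compare BL against !BL; noise injected equally on both lines cancels."""
    return 1 if (v_bl + noise) - (v_blb + noise) > 0 else 0


if __name__ == "__main__":
    # Reading a stored 0: BL has drooped to 2.25 V, !BL stays at 2.5 V.
    BL, BLB, REF = 2.25, 2.5, 2.375
    for noise in (0.0, 0.3):
        print(f"noise {noise:+.1f} V:",
              "single-ended ->", single_ended_read(BL, REF, noise),
              "differential ->", differential_read(BL, BLB, noise))
```

With 0.3 V of coupled noise the single-ended comparison misreads the cell as a 1, while the differential comparison still returns 0.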
Latch Based Sense Amplifier
[Figure: latch based sense amplifier. Labels: bit line inputs BL and !BL (ΔV = 0.1 VDD), isolate, SE, sense amplifier outputs data_out and !data_out (ΔV = VDD).]
Alpha Differential Amplifier/Latch
[Figure: Alpha differential amplifier/latch. Labels: column decoder with selects S3 … S0 producing mux_out and !mux_out; PC (0→1); sense amplifier with transistors N1 … N5 and P1 … P4 and nodes sense and !sense; SE (0→1), off→on; outputs data_out and !data_out (ΔV = VDD).]
Read/Write Circuitry
[Figure: one column with Precharge, bit lines BL and !BL, SA, and a Local R/W block gated by CS, connected to D, W, R and !R.]
D: data (write) bus
R: read bus
W: write signal
CS: column select (column decoder)
Local W (write):
  BL = D, !BL = !D
  enabled by W & CS
Local R (read):
  R = BL, !R = !BL
  enabled by !W & CS
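The enable conditions for the local write and read paths can be written out as a small truth-function. This is a behavioral sketch only; the function name `local_rw` and the use of `None` for an undriven signal are illustrative choices.

```python
def local_rw(d: int, w: int, cs: int, bl: int, blb: int) -> dict:
    """Signals driven by the local R/W block of one column (None = not driven)."""
    out = {"BL": None, "!BL": None, "R": None, "!R": None}
    if w and cs:
        # Local write, enabled by W & CS: BL = D, !BL = !D.
        out["BL"], out["!BL"] = d, 1 - d
    elif cs:
        # Local read, enabled by !W & CS: R = BL, !R = !BL.
        out["R"], out["!R"] = bl, blb
    return out


if __name__ == "__main__":
    print("write 1:   ", local_rw(d=1, w=1, cs=1, bl=0, blb=0))
    print("read:      ", local_rw(d=0, w=0, cs=1, bl=1, blb=0))
    print("unselected:", local_rw(d=1, w=1, cs=0, bl=1, blb=0))
```

An unselected column (CS = 0) drives neither the bit lines nor the read bus, whatever the value of W.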
Approaches to Memory Timing
SRAM Timing: Self-Timed
  Address transition initiates memory operation
  [Figure: address bus carrying the address.]
DRAM Timing: Multiplexed Addressing
  [Figure: address bus carrying the row address (msb's) and the column address (lsb's), with RAS and CAS strobes; RAS-CAS timing.]
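Multiplexed addressing sends the address over a shared bus in two halves. The sketch below splits an address into its row part (msb's) and column part (lsb's) and lists the order in which they would be presented with RAS and CAS; the bit widths and function names are assumptions made for the example.

```python
def split_address(addr: int, col_bits: int) -> tuple[int, int]:
    """Return (row, column): the row is the msb's, the column is the lsb's."""
    row = addr >> col_bits
    col = addr & ((1 << col_bits) - 1)
    return row, col


def bus_sequence(addr: int, col_bits: int) -> list[tuple[str, int]]:
    """Values placed on the shared address bus, row first, then column."""
    row, col = split_address(addr, col_bits)
    return [("RAS", row), ("CAS", col)]


if __name__ == "__main__":
    # Assumed example: 10b address with an 8b row and a 2b column address.
    for strobe, value in bus_sequence(0b1011001110, col_bits=2):
        print(strobe, bin(value))
```

Sharing the bus roughly halves the number of address pins, at the cost of the two-step RAS-CAS sequence.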
Reliability and Yield
Memories operate under low signal-to-noise conditions
  word line to bit line coupling can vary substantially over the memory array
  - folded bit line architecture (routing BL and !BL next to each other ensures a closer match between parasitics and bit line capacitances)
  interwire bit line to bit line coupling
  - transposed (or twisted) bit line architecture (turn the noise into a common-mode signal for the SA)
  leakage (in DRAMs) requiring refresh operation
suffer from low yield due to high density and structural defects
  increase yield by using error correction (e.g., parity bits) and redundancy
and are susceptible to soft errors due to alpha particles and cosmic rays
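To make the role of parity bits in error correction concrete, here is a sketch of a Hamming(7,4) code, which adds three parity bits to four data bits and corrects any single-bit error. The choice of Hamming(7,4) is an illustration picked for this example; the lecture does not say which code is used.

```python
def hamming_encode(data: list[int]) -> list[int]:
    """Encode 4 data bits as a 7-bit codeword [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4   # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers codeword positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming_decode(code: list[int]) -> list[int]:
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4   # 1-based position of the bad bit, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


if __name__ == "__main__":
    word = [1, 0, 1, 1]
    stored = hamming_encode(word)
    stored[5] ^= 1                      # inject a single-bit error
    print("recovered:", hamming_decode(stored), "expected:", word)
```

A single parity bit on its own can only detect an odd number of errors; locating and correcting the bad bit requires several overlapping parity checks as above.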
Redundancy in the Memory Structure
[Figure: memory array with a redundant row and redundant columns; a fuse bank sits alongside the row address and column address paths.]
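A behavioral view of row redundancy: once testing identifies a defective row, its address is programmed into the fuse bank, and later accesses to that address are steered to a spare row. The class below is a hypothetical sketch of that remapping; the class and method names are illustrative and do not describe a specific circuit from the lecture.

```python
class RowRedundancy:
    """Remap defective row addresses to spare rows via a fuse bank."""

    def __init__(self, n_rows: int, n_spares: int):
        self.n_rows = n_rows
        self.n_spares = n_spares
        self.fuse_bank = {}   # defective row address -> spare row index

    def program_fuse(self, bad_row: int) -> None:
        """Record a defective row; fails once all spare rows are used."""
        if bad_row in self.fuse_bank:
            return
        if len(self.fuse_bank) >= self.n_spares:
            raise ValueError("no spare rows left")
        self.fuse_bank[bad_row] = len(self.fuse_bank)

    def select(self, row: int) -> tuple[str, int]:
        """Return which physical row an address activates."""
        if row in self.fuse_bank:
            return ("spare", self.fuse_bank[row])
        return ("main", row)


if __name__ == "__main__":
    mem = RowRedundancy(n_rows=256, n_spares=1)
    mem.program_fuse(42)
    print(mem.select(41), mem.select(42))
```

The die is still usable as long as the number of defective rows does not exceed the number of spares.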
Redundancy and Error Correction
Soft Errors
Nonrecurrent and nonpermanent errors from
  alpha particles (from the packaging materials)
  neutrons from cosmic rays
As feature size decreases, the charge stored at each node decreases (due to a lower node capacitance and lower VDD) and thus Qcritical (the charge necessary to cause a bit flip) decreases leading to an increase in the soft error rate (SER)
[Figure: System FITS (log scale, 1 to 10000) versus Process Technology (0.25, 0.18, 0.13, 0.09, 0.05). From Semico Research Corp.]

MTBF (hours), from Actel
                          .13 µm  .09 µm
Ground-based              895     448
Civilian Avionics System  324     162
Military Avionics System  18      9
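Two pieces of arithmetic help in reading this slide. The sketch below uses the first-order approximation that the stored charge is Q = C x VDD (an assumption standing in for Qcritical, with made-up capacitance and voltage values), and the standard definition of FIT as failures per 10^9 device-hours, so FIT = 10^9 / MTBF.

```python
def stored_charge(c_node: float, vdd: float) -> float:
    """First-order node charge Q = C * VDD, in coulombs (assumed model)."""
    return c_node * vdd


def fit_from_mtbf(mtbf_hours: float) -> float:
    """Failures in time: expected failures per 1e9 hours of operation."""
    return 1e9 / mtbf_hours


if __name__ == "__main__":
    # Assumed, illustrative node values for an older and a newer process.
    q_old = stored_charge(4e-15, 1.8)
    q_new = stored_charge(2e-15, 1.2)
    print(f"stored charge shrinks by {q_old / q_new:.1f}x")
    # MTBF values taken from the ground-based row of the table.
    for label, mtbf in ((".13 um", 895), (".09 um", 448)):
        print(f"{label}: MTBF {mtbf} h -> {fit_from_mtbf(mtbf):,.0f} FIT")
```

Halving the MTBF between the two process nodes corresponds to doubling the failure rate expressed in FIT.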
Next Lecture and Reminders
Next lecture
  Power consumption in datapaths and memories
  - Reading assignment – Rabaey, et al, 11.7; 12.5
Reminders
  HW#5 due today
  Project final reports due on-line by 5:00pm on Friday, December 5th
  Final grading negotiations/correction (except for the final exam) must be concluded by December 10th
  Final exam scheduled
  - Tuesday, December 16th from 10:10 to noon in 118 and 113 Thomas