Download Semiconductor Memories

Document related concepts
no text concepts found
Transcript
Semiconductor Memories
Mohammad Sharifkhani
Outline
• Introduction
• Non-volatile memories
Semiconductor Memory Classification
Read-Write Memory
Non-Volatile
Read-Write
Read-Only Memory
Memory
Random
Access
Non-Random
Access
EPROM
2
E PROM
SRAM
FIFO
DRAM
LIFO
Shift Register
CAM
Mask-Programmed
FLASH
Programmable (PROM)
Memory Timing: Definitions
Memory Architecture: Decoders
M bits
S0
S0
Word 0
S1
Word 1
S2
Word 2
words
N SN - 2
SN -
M bits
1
Storage
cell
Word 0
A0
Word 1
A1
Word 2
AK
-1
Word N 2 2
DecoderWord N 2 2
Word N 2 1
Word N 2 1
Storage
cell
K = log2N
Input-Output
(M bits)
Intuitive architecture for N x M memory
Too many select signals:
N words == N select signals
Input-Output
(M bits)
Decoder reduces the number of select signals
K = log2N
Array-Structured Memory Architecture
Problem: ASPECT RATIO or HEIGHT >> WIDTH
An
An+1
An+m
Bit line
Row Decoder
2m
Storage cell in
the accessed
word
Word line
M.2n
A0
An - 1
Column decoder
Input-Output
(M bits)
Amplify swing to
rail-to-rail amplitude
Selects appropriate
word
Hierarchical Memory Architecture
Advantages:
1. Shorter wires within blocks
2. Block address activates only 1 block => power savings
Block Diagram of 4 Mbit
SRAM
Clock
generator
Z-address
buffer
X-address
buffer
Predecoder and block selector
Bit line load
128 K Array Block 0
Subglobal row decoder
Subglobal rowGlobal
decoder
row decoder
Block3130
Block
Block 1
Transfer gate
Column decoder
Sense amplifier and write driver
CS, WE
buffer
I/O
buffer
x1/x4
controller
Y-address
buffer
[Hirose90]
Local row decoder
X -address
buffer
Contents-Addressable Memory
Commands
I/O Buffers
I/O Buffers
CAM Array
2 words 3 64 bits
9
Priority Encoder
Control Logic R/W Address (9 bits)
Mask
Address Decoder
Commands
Commands
Comparand
29 Validity Bits
I/O Buffers
Data (64 bits)
92Validity Bits
Priority
BitsEncode
Address Decoder 9 Validity
2 Priority Encode
Address Decoder
Memory Timing: Approaches
DRAM Timing
Multiplexed Adressing
SRAM Timing
Self-timed
• Introduction
• Non volatile memories
Non-Volatile Memories
The Floating-gate transistor
(FAMOS)
Floating gate
Gate
Source
D
Drain
G
tox
tox
n+
p
n+_
S
Substrate
Device cross-section
Schematic symbol
Floating-Gate Transistor Programming
20 V
10 V
S
5V
0V
20 V
D
Avalanche injection
- 5V
S
5V
0V
D
Removing programming
voltage leaves charge trapped
- 2.5 V
S
5V
D
Programming results in
higher V T .
A “Programmable-Threshold” Transistor
FLOTOX EEPROM
Gate
Floating gate
I
Drain
Source
20–30 nm
V GD
-10 V
10 V
n1
n1
Substrate
p
10 nm
FLOTOX transistor
Fowler-Nordheim
I-V characteristic
EEPROM Cell
BL
WL
VDD
Absolute threshold control
is hard
Unprogrammed transistor
might be depletion 
always on
 2 transistor cell
Flash EEPROM
Control gate
Floating gate
erasure
n 1 source
Thin tunneling oxide
programming
p-substrate
Many other options …
n 1 drain
Cross-sections of NVM cells
Flash
EPROM
Courtesy Intel
Basic Operations in a NOR Flash
Memory―
Erase
Basic Operations in a NOR Flash
Memory―
Write
Basic Operations in a NOR Flash
Memory―
Read
NAND Flash Memory
Select line
Word line(poly)
Gate
Unit Cell
ONO
BL
Select line
Source line
(Diff. Layer)
Courtesy Toshiba
Gate
Oxide
FG
NAND Flash Memory
Select transistor
Word lines
Active area
STI
Bit line contact
Source line contact
Courtesy Toshiba
Characteristics of State-of-the-art
NVM
Outline
• Introduction
• Non-volatile memories
• RAM
Read-Write Memories (RAM)
 STATIC (SRAM)
Data stored as long as supply is applied
Large (6 transistors/cell)
Fast
Differential
 DYNAMIC (DRAM)
Periodic refresh required
Small (1-3 transistors/cell)
Slower
Single Ended
6-transistor CMOS SRAM Cell
WL
V DD
M2
M5
Q
M1
BL
M4
Q
M6
M3
BL
CMOS SRAM Analysis (Read)
WL
V DD
M4
BL
Q= 0
M5
V DD
Cbit
M1
Q= 1
V DD
BL
M6
V DD
Cbit
CMOS SRAM Analysis (Read)
1.2
Voltage Rise (V)
1
0.8
0.6
0.4
0.2
Voltage rise [V]
0
0
0.5
1 1.2 1.5 2
Cell Ratio (CR)
2.5
3
CMOS SRAM Analysis (Write)
WL
V DD
M4
M5
Q= 1
M1
BL = 1
M6
Q= 0
V DD
BL = 0
CMOS SRAM Analysis (Write)
6T-SRAM — Layout
M2
VDD
M4
Q
Q
M1
M3
GND
M5
BL
M6
BL
WL
Decreasing Word Line Delay
Driver
WL
Polysilicon word line
Metal word line
(a) Driving the word line from both sides
Metal bypass
WL
K cells
Polysilicon word line
(b) Using a metal bypass
(c) Use silicides
Resistance-load SRAM Cell
WL
V DD
RL
M3
BL
RL
Q
Q
M1
M2
M4
BL
Static power dissipation -- Want R L large
Bit lines precharged to V DD to address t p problem
SRAM Characteristics
• Introduction
• Non-volatile memories
• RAM
– SRAM
– DRAM
3-Transistor DRAM Cell
BL 1
BL 2
WWL
WWL
RWL
M3
X
M1
CS
M2
RWL
V DD 2 V T
X
BL 1
BL 2
V DD
V DD 2 V T
No constraints on device ratios
Reads are non-destructive
Value stored at node X when writing a “1” = V WWL-VTn
DV
3T-DRAM — Layout
BL2
BL1
GND
RWL
M3
M2
WWL
M1
1-Transistor DRAM Cell
Write: C S is charged or discharged by asserting WL and BL.
Read: Charge redistribution takes places between bit line and storage capacitance
CS
DV = VBL – V PRE = V BIT – V PRE -----------C S + CBL
Voltage swing is small; typically around 250 mV.
DRAM Cell Observations
 1T
DRAM requires a sense amplifier for each bit line, due
to charge redistribution read-out.
 DRAM memory cells are single ended in contrast to
SRAM cells.
The read-out of the 1T DRAM cell is destructive; read
and refresh operations are necessary for correct
operation.
 Unlike 3T cell, 1T cell requires presence of an extra
capacitance that must be explicitly included in the design.
 When writing a “1” into a DRAM cell, a threshold voltage
is lost. This charge loss can be circumvented by
bootstrapping the word lines to a higher value than VDD
Sense Amp Operation
V
V(1)
BL
V PRE
D V(1)
V(0)
Sense amp activated
Word line activated
t
1-T DRAM Cell
Capacitor
M 1 word
line
Metal word line
SiO2
Poly
n+
Field Oxide
n+
Poly
Inversion layer
induced by
plate bias
Cross-section
Diffused
bit line
Polysilicon
gate
Polysilicon
plate
Layout
Uses Polysilicon-Diffusion Capacitance
Expensive in Area
SEM of poly-diffusion capacitor 1T-DRAM
Advanced 1T DRAM Cells
Word line
Insulating Layer
Cell plate
Capacitor dielectric layer
GND
Cell Plate Si
Capacitor Insulator
Refilling Poly
Transfer gate
Isolation
Storage electrode
Storage Node Poly
Si Substrate
2nd Field Oxide
Trench Cell
Stacked-capacitor Cell
Static CAM Memory Cell
Bit
Bit
Bit
Bit
Bit
Word
CAM
Word
•••
•••
CAM
M4
M8
M9
M6
M7
CAM
•••
•••
Bit
Word
CAM
M3
Match
Wired-NOR Match Line
S
int
S
M2
M1
M5
CAM in Cache Memory
CAM
SRAM
ARRAY
ARRAY
Hit Logic
Address Decoder
Input Drivers
Address
Tag
Sense Amps / Input Drivers
Hit
R/W
Data
•
•
•
•
Introduction
Non-volatile memories
RAM
Periphery circuits
Periphery
 Decoders
 Sense Amplifiers
 Input/Output Buffers
 Control / Timing Circuitry
Row Decoders
Collection of 2M complex logic gates
Organized in regular and dense fashion
(N)AND Decoder
NOR Decoder
Hierarchical Decoders
Multi-stage implementation improves performance
•••
WL 1
WL 0
A 0A 1 A 0A 1 A 0A 1 A 0A 1
A 2A 3 A 2A 3 A 2A 3 A 2A 3
•••
NAND decoder using
2-input pre-decoders
A1 A0
A0
A1
A3 A2
A2
A3
Dynamic Decoders
Precharge devices
GND
VDD
GND
WL 3
VDD
WL 3
WL 2
WL 2
VDD
WL 1
WL 1
V DD
WL 0
WL 0
VDD f
A0
A0
A1
A1
2-input NOR decoder
A0
A0
A1
A1
f
2-input NAND decoder
Active low inputs (all
are high except for
the selected WL
which is low)
4-input pass-transistor based column
decoder
BL 0 BL 1 BL 2 BL 3
A0
S0
S1
S2
A1
S3
2-input NOR decoder
D
Advantages: speed (tpd does not add to overall memory access time)
Only one extra transistor in signal path
Disadvantage: Large transistor count
4-to-1 tree based column
decoder
BL
BL
BL
BL
0
1
2
3
A0
A0
A1
A1
D
Number of devices drastically reduced
Delay increases quadratically with # of sections; prohibitive for large decoders
Solutions: buffers
progressive sizing
combination of tree and pass transistor approaches
Decoder for circular shiftregister
V DD
WL
V DD
V DD
WL
0
R
V DD
f
f
f
f
R
V DD
V DD
WL
1
f
f
f
f
R
V DD
2
f
f
f
f
• • •
Sense Amplifiers
×D V
C
tp = ---------------Iav
large
make D V as small
as possible
small
Idea: Use Sense Amplifer
small
transition
s.a.
input
output
Differential Sense Amplifier
V DD
M3
M4
y
M1
bit
SE
M2
Out
bit
M5
Directly applicable to
SRAMs
Differential Sensing ― SRAM
V DD
PC
V DD
BL
BL
EQ
V DD
y M3
WL i
M1
x
SE
V DD
M4
M2
2y
2x
2x
x
SE
M5
SE
SRAM cell i
Diff.
x Sense 2x
Amp
V DD
Output
y
SE
Output
(a) SRAM sensing scheme
(b) two stage differential amplifier
Latch-Based Sense Amplifier
(DRAM)
EQ
BL
BL
VDD
SE
SE
Initialized in its meta-stable point with EQ
Once adequate voltage gap created, sense amp enabled with SE
Positive feedback quickly forces output to a stable operating point.
Charge-Redistribution
Amplifier
V ref
VL
M2
M3
M1
C large
VS
C small
Transient Response
Concept
Charge-Redistribution Amplifier―
V
EPROM
DD
SE
Load
M4
Out
V casc
M3
Cascode
device
Cout
Ccol
WLC
Column
decoder
M2
BL
WL
M1
CBL
EPROM
array
Single-to-Differential Conversion
WL
BL
x
Cell
Diff.
S.A.
2x
Output
How to make a good Vref?
1
2
V ref
Open bitline architecture with
dummy cells
EQ
L
L1
L0
V DD
R0
R1
L
SE
BLL
CS
…
Dummy cell
CS
BLR
CS
…
SE
CS
CS
CS
Dummy cell
DRAM Read Process with Dummy
Cell
3
3
2
2
BL
V
1
0
0
BL
V
BL
1
2
1
0
3
BL
0
1
t (ns)
t (ns)
reading 0
reading 1
3
EQ
WL
2
V
2
SE
1
0
0
1
2
t (ns)
control signals
3
3
Voltage Regulator
VDD
Mdrive
VDL
VREF
Equivalent Model
Vbias
VREF
+
Mdrive
VDL
Charge Pump
-
Q=Cpump (VDD-Vt)
DRAM Timing
SDRAM Timing
A chunk of data is processed at the same time 
effective when data is written in large sequential blocks
RDRAM Architecture
Rambus DRAM to reduce the access time
Bus
Clocks
Data
bus
Operates at uP
clock speed
Synch.
DRAM
k
kx l
memory
array
network
mux/demux
Column
Row
demux
packet dec.
demux
packet dec.
up to 1.6
GB/sec
bandwidth
Highly parallel: A large number of bits can be read/write at the same time
interface ; fast and synch
Address Transition Detection
V DD
A0
DELAY
td
A1
DELAY
td
A N2 1
DELAY
td
ATD
…
ATD
•
•
•
•
•
Introduction
Non-volatile memories
RAM
Periphery
Reliability
Reliability and Yield
Sensing Parameters in DRAM
1000
C D(1F)
V smax (mv)
100
smax
V
,
DD
V
,S
10
C
,S
Q
,
D
C
C S(1F)
Q S(1C)
V DD (V)
Q S = C S V DD / 2
V smax = Q S / (C S 1 C D )
4K
64K
1M 16M 256M 4G
Memory Capacity (bits
/ chip)
From [Itoh01]
64G
Noise Sources in 1T DRam
BL
substrate Adjacent BL
CWBL
a -particles
WL
leakage
CS
electrode
Ccross
Open Bit-line Architecture —Cross
Coupling
EQ
WL 1
WL 0
WL
C WBL D
C WBL
WL D
WL 1
WL 0
BL
BL
C BL
C
C
C
Sense
Amplifier
C BL
C
C
C
Folded-Bitline Architecture
WL 1
WL 0 C
WBL
WL 0
WL D
WL D
CBL
BL
…
BL
WL 1
C
x
C
C
C
C
C
Sense
EQ Amplifier
x
CBL
CWBL
y
y
Transposed-Bitline Architecture
Ccross
BL 9
BL
SA
BL
BL 99
(a) Straightforward bit-line routing
Ccross
BL 9
BL
SA
BL
BL 99
(b) Transposed bit-line architecture
Alpha-particles (or Neutrons)
a -particle
WL
V
DD
BL
n1
SiO 2
2
1
2
1
1
2
2
1
2
1
1
2
1 Particle ~ 1 Million Carriers
Yield
Yield curves at different stages of process maturity
(from [Veendrick92])
Redundancy
Row
Address
Redundant
rows
:
Redundant
columns
Memory
Array
Row Decoder
Column Decoder
Column
Address
Fuse
Bank
Error-Correcting Codes
Example: Hamming Codes
e.g. B3 Wrong
with
1
1
0
=3
Redundancy and Error Correction
Sources of Power Dissipation in
Memories
V
DD
I DD = Σ C iΔ V if+Σ I DCP
CHIP
nC DE V INT f
m
selected
C PT V INT f
mi act
I DCP
n
ROW
DEC
PERIPHERY
m(n 1)i hld
non-selected
ARRAY
mC DE V INT f
COLUMN DEC
V SS
From [Itoh00]
Data Retention in SRAM
1.30u
1.10u
0.13 m m CMOS
Ileakage
900n
700n
500n
Factor 7
(A)300n
0.18 m m CMOS
100n
0.00
.600
1.20
1.80
VDD
SRAM leakage increases with technology scaling
Suppressing Leakage in
SRAM
V DD
V DD
low-threshold transistor
V DDL
sleep
V DD,int
sleep
V DD,int
SRAM
cell
SRAM
cell
sleep
SRAM
cell
SRAM
cell
SRAM
cell
SRAM
cell
V SS,int
Inserting Extra Resistance
Reducing the supply voltage
Data Retention in DRAM
From [Itoh00]
Case Studies
• SRAM
• Flash Memory
4 Mbit SRAM
Hierarchical Word-line Architecture
Bit-line Circuitry
Block
select
Bit-line
load
BEQ
Local WL
Memory cell
B /T
B /T
CD
CD
CD
I/O
I/O line
I/O
Sense amplifier
ATD
Sense Amplifier (and
Waveforms)
I /O
Address
I /O
SEQ
Block
select
ATD
ATD
BS
BEQ
SA
BS
Vdd
I/O Lines
GND
SA
SEQ
SEQ
SEQ
SEQ
SEQ
DATA
Dei
Vdd
SA, SA
GND
DATA
BS
Data-cut
1 Gbit Flash Memory
From [Nakamura02]
Writing Flash Memory
108
106
104
102
100
0V
1V
2V
3V
4V
Number of cells
Vt of memory cells
Evolution of thresholds
Final Distribution
From [Nakamura02]
Read
Charge pump
2kB Page buffer & cache
10.7mm
125mm2 1Gbit NAND Flash
Memory
32 word lines
x 1024 blocks
16896 bit lines
11.7mm
From [Nakamura02]
2
125mm
•
•
•
•
•
•
•
•
•
1Gbit NAND Flash
Memory
Technology
0.13m p-sub CMOS triple-well
1poly, 1polycide, 1W, 2Al
Cell size
0.077m2
Chip size
125.2mm2
Organization 2112 x 8b x 64 page x 1k block
Power supply 2.7V-3.6V
Cycle time
50ns
Read time
25s
Program time 200s / page
Erase time
2ms / block
From [Nakamura02]
Semiconductor Memory Trends
(up to the 90’s)
Memory Size as a function of time: x 4 every three years
Semiconductor Memory Trends
(updated)
From [Itoh01]
Trends in Memory Cell Area
From [Itoh01]
Future generations
• Very specialized technologies for stand alone
memories  expensive
• Reliability is going to be a very important issue
(SER) particularly for SRAMs and DRAMs
• Power is going to be the limiting factor
particularly when it comes to standby currents
• Embedded memories is the prominent market
thrust driven by all mobile/SoC applications
Related documents