Lecture 20: DRAMs (University of British Columbia)

CMOS Memories
Brad Quinton
(based on slides from R. Saleh)
Dept. of ECE
University of British Columbia
R. Saleh / B. Quinton
Overview
• Memories are used in almost all existing chips, and they represent a large part of the semiconductor market today.
• In many applications, the most important characteristic of a memory is its price per bit. That is, the user wants to spend the least amount of money to store the required data.
• Reducing the cost of a bit means making it smaller, which in turn means reducing the number of components it requires. Ideally, each bit should contain only one transistor. While this gives very dense storage, it leads to a number of challenging circuit design issues.
• In this lecture, we start with a review of the SRAM. Then we look at CAMs, DRAMs, ROMs, EPROMs, EEPROMs, and finally the dominant Flash memories.
• Readings: Chapter 9 of HJS (see Chapter 8 for SRAM review material)
Lecture Outline
1. SRAM
2. CAM
3. DRAM
4. ROM
5. EPROM / EEPROM
6. Flash
SRAM
Applications
1. Embedded RAM for ASICs and SoCs
2. Configuration bits for FPGAs
3. Caches in most CPUs
4. Trace buffers in debug applications
• In general, SRAM is used to replace arrays of flip-flops or latches to increase storage density and save die area. As always, there are trade-offs:
– Single word access: one address at a time
– Lower performance: not as fast as flip-flop-based designs
– Requires BIST: there is no "scan chain"
– Higher leakage current
High-level View: Memory
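Overall Structure of 64Kb SRAM
[Figure: 64Kb SRAM built as a 256 × 256 cell array. An n = 8-bit row address drives the row decoder, which selects one of 2^n = 256 word lines; an m = 8-bit column address drives the column decoder and column mux, which select among the 2^m = 256 bitline columns. Peripheral blocks: column pull-ups, sense amplifier, write driver, and read-write control (Read/Write, Sense en, Write en), with Data in and Data out.]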
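As a minimal sketch of the organization above, the snippet splits a 16-bit address into the 8 row bits and 8 column bits of the 256 × 256 array. The bit order (upper bits to the row decoder, lower bits to the column decoder) and the helper names `split_address` and `one_hot` are assumptions for illustration; real arrays may order or interleave the bits differently.

```python
ROW_BITS = 8   # n: row decoder selects one of 2**n = 256 word lines
COL_BITS = 8   # m: column decoder / mux selects one of 2**m = 256 columns


def split_address(addr: int) -> tuple[int, int]:
    """Split a 16-bit address into (row, column); the bit order is an assumption."""
    if not 0 <= addr < 2 ** (ROW_BITS + COL_BITS):
        raise ValueError("address out of range for a 64Kb array")
    row = addr >> COL_BITS               # upper bits -> row decoder
    col = addr & ((1 << COL_BITS) - 1)   # lower bits -> column decoder / mux
    return row, col


def one_hot(index: int, width: int) -> list[int]:
    """Decoder output: exactly one of `width` lines is driven high."""
    return [1 if i == index else 0 for i in range(width)]


row, col = split_address(0x1234)
wordlines = one_hot(row, 2 ** ROW_BITS)
print(row, col, sum(wordlines))  # 18 52 1
```

Only one word line is active per access, which is the "single word access" trade-off listed earlier.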
Overall Structure of 64Kb SRAM: Write
[Figure: the same 64Kb SRAM organization, annotated for a write operation.]
Overall Structure of 64Kb SRAM: Read
[Figure: the same 64Kb SRAM organization, annotated for a read operation.]
Cross-Coupled Inverters
Cross-Coupled Inverters
Static State
Cross-Coupled Inverters
• The cross-coupled inverter structure is fairly intuitive: it looks a lot like a flip-flop or latch.
• However, if you look closely, the same transistors and wires are used for both read and write operations.
• This makes things tricky. We could add logic to the cells to distinguish reads from writes, but that would be expensive in terms of area.
• Instead, we adjust the relative sizing of the transistors to ensure that reads and writes work as expected.
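6T SRAM Cell
[Figure: normal design of the 6T cell: pull-up transistors M5 and M6 to Vdd, access transistors M3 and M4 gated by the wordline and connected to bitlines b and b̄, pull-down transistors M1 and M2, storage nodes q and q̄. A second panel shows 1/2 of the mirror, labelling the pull-up (width wp), access (width wa) and pull-down (width wd) transistors.]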
Reading a Cell
[Figure: 6T cell during a read, storing q = 0 and q̄ = 1, with capacitance Cbit on each bitline; waveforms show wl rising, then a differential voltage ΔV developing between b and b̄ that triggers the sense amplifier.]
• The bitlines b and b̄ are pre-charged to Vdd before the read.
• The wordline goes high.
• b goes low because M1 is on: the cell current Icell flows from b through the access transistor M3 and the pull-down M1.
• M2 is off, so b̄ stays high.
• There is a risk the bit will "flip".
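To first order, if the cell sinks a roughly constant current Icell from the pre-charged bitline, the time to develop the swing ΔV that triggers the sense amplifier follows from I = C·dV/dt: Δt ≈ Cbit·ΔV/Icell. The function name and all numeric values below are hypothetical, chosen only to show the arithmetic.

```python
def bitline_delay(c_bit: float, delta_v: float, i_cell: float) -> float:
    """Time (s) for a constant cell current to discharge c_bit by delta_v.

    First-order model: I = C * dV/dt with the cell current held constant.
    """
    if i_cell <= 0:
        raise ValueError("cell current must be positive")
    return c_bit * delta_v / i_cell


# Hypothetical values: 200 fF bitline, 100 mV sense swing, 50 uA cell current
t = bitline_delay(200e-15, 0.1, 50e-6)
print(f"{t * 1e12:.0f} ps")  # 400 ps
```

The delay grows linearly with Cbit and shrinks with Icell, so a stronger read path is faster, but the cell ratio on the next slide limits how far the access device can be strengthened.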
Transistor Ratio Required for Read
• To ensure that the bit does not "flip" during a read, the voltage rise on q (the node storing the 0) must be controlled.
• This can be done (see textbook) by ensuring that we have the following transistor width ratio (pull-down M1 to access M3):
$$\frac{W_1}{W_3} \geq 1.5$$
Writing a Cell
[Figure: 6T cell during a write, initially storing q = 0 and q̄ = 1, with pull-ups M5 and M6 to Vdd; waveforms show wl, b, b̄, q and q̄.]
• The wordline goes high.
• The bitline on the side storing the 1 is forced low (to Gnd).
• The voltage on that storage node drops.
• q̄ must be forced below the switching threshold of the opposite inverter so that the cell flips.
Transistor Ratio Required for Write
• To ensure that the bit does "flip" during a write, the node storing the 1 (q̄) must be pulled low.
• This can be done (see textbook) by ensuring that we have the following transistor width ratio (access M4 to pull-up M6):
$$\frac{W_4}{W_6} \geq 1.5$$
Overall Transistor Widths
• Both sides of the circuit must be balanced, therefore:
W4 = W3
W6 = W5
W2 = W1
• So, given the minimum transistor width, we have a starting point for our design.
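A small sketch of that starting point, assuming the read and write ratios are lower bounds (W1/W3 ≥ 1.5, W4/W6 ≥ 1.5), that the pull-ups are drawn at minimum width, and that the mirrored pairs are equal (W1 = W2, W3 = W4, W5 = W6). The names `size_cell`, `check_cell` and `CellWidths` are hypothetical; real cells are sized against the process and checked in simulation.

```python
from dataclasses import dataclass

READ_RATIO = 1.5   # pull-down / access  (W1/W3), from the read slide
WRITE_RATIO = 1.5  # access / pull-up    (W4/W6), from the write slide


@dataclass
class CellWidths:
    pull_down: float  # W1 = W2
    access: float     # W3 = W4
    pull_up: float    # W5 = W6


def size_cell(w_min: float) -> CellWidths:
    """Start from minimum-width pull-ups (assumption) and apply both ratios."""
    pull_up = w_min
    access = WRITE_RATIO * pull_up    # access must overpower the pull-up on a write
    pull_down = READ_RATIO * access   # pull-down must overpower the access on a read
    return CellWidths(pull_down, access, pull_up)


def check_cell(w: CellWidths) -> bool:
    """True if both the read-stability and the writability ratios are met."""
    read_ok = w.pull_down / w.access >= READ_RATIO
    write_ok = w.access / w.pull_up >= WRITE_RATIO
    return read_ok and write_ok


cell = size_cell(w_min=1.0)  # widths in units of the minimum width
print(cell, check_cell(cell))
```

Under these assumptions the chain of ratios gives pull-up : access : pull-down = 1 : 1.5 : 2.25.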
Layout of SRAM Cell
[Figure: layout of the 6T SRAM cell, showing the Vdd rails, bitlines b and b̄, storage nodes q and q̄, and contacts (x).]
CAMs
Applications
• CAMs are often used in cache memories.
• Instead of storing/accessing data in a cache through a static address, we would like to be able to store it anywhere in SRAM and then retrieve it later when needed.
• We store the data with a keyword based on the application.
• The lookup is done with a tag that is matched against the keywords stored in memory alongside the data.
• The key design issue is to minimize the time required to access the data that matches the tag.
• The matching is done simultaneously against all tags to reduce the read time, so it consumes a lot of power!
• That's why CAMs are considered to be power hungry.
• NOTE: CAMs are really only useful if you *need* single-cycle latency, since you can always emulate a CAM with multiple stages of RAM lookups.
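The lookup described above can be modelled behaviorally as below. This is a toy sketch, not a circuit: the class `Cam`, its methods, and the "lowest matching row wins" priority rule are hypothetical choices for illustration. The `matchlines` method computes one compare per stored row, mirroring the fact that every matchline is evaluated on every search.

```python
from typing import Optional


class Cam:
    """Toy CAM: every stored tag is compared against the search key."""

    def __init__(self, rows: int) -> None:
        self.tags: list[Optional[int]] = [None] * rows  # None = valid bit cleared
        self.data: list[int] = [0] * rows

    def write(self, row: int, tag: int, data: int) -> None:
        self.tags[row] = tag
        self.data[row] = data

    def matchlines(self, key: int) -> list[bool]:
        # Hardware evaluates all matchlines in parallel; here we compare the
        # key against every row, which is also why a CAM search costs power.
        return [t is not None and t == key for t in self.tags]

    def search(self, key: int) -> Optional[int]:
        hits = self.matchlines(key)
        if any(hits):
            return self.data[hits.index(True)]  # lowest matching row wins (assumed)
        return None


cam = Cam(rows=8)
cam.write(3, tag=0xBEEF, data=42)
print(cam.search(0xBEEF), cam.search(0x1234))  # 42 None
```

Every search touches every row regardless of where the data lives; that is the software analogue of the CAM's power cost.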
Associative Memory
Overall Architecture of CAM Array
[Figure: a CAM array (with a dummy replica row, and CAM write I/O and tag drive on the taglines) next to a 256 × 256 SRAM array; the row decoder drives the word lines (WL), and each CAM row has a matchline. Interface signals: Addr[8:0], Tag[31:0], valid bit, index[2:0], Data[63:0]; column decode and MUX plus SRAM read/write I/O serve the data array.]
Overall Structure of CAM Lookup
[Figure: rows of 6T SRAM cells (WL 1 to WL n), each row sharing a precharged matchline (Matchline 1 to Matchline n); bits 1 to N of the TAG are driven down the columns so that every row is compared at once.]
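CAM Cell
[Figure: CAM cell: a 6T SRAM cell (word line WL, bitlines b and b̄) plus compare transistors M7 to M10 connected to the matchline; cell dimensions labelled 50λ and 80λ.]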
DRAM
Applications
1. RAM for desktops, laptops, servers, etc.
2. RAM for embedded systems: routers, switches, set-top boxes, etc.
• DRAMs are extremely dense (only 1 transistor/bit!), but they have a number of limitations:
– They require a specialized fabrication process, so they cannot (easily) be mixed with regular CMOS logic. DRAM will not be "embedded" in processors anytime soon...
– DRAM bits are dynamic (i.e., they fade away...), so they must be refreshed or they become invalid!
– DRAM is sensitive to "soft errors" caused by alpha particles, so servers usually require ECC (Error Correcting Codes).
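To illustrate what ECC buys against a soft error, here is a Hamming(7,4) code, swapped in as the simplest single-error-correcting code. It is not necessarily what any given DRAM system uses; server memory typically uses wider codes. Bit positions run 1 to 7 with parity bits at positions 1, 2 and 4, and the function names are hypothetical.

```python
def hamming74_encode(d: list[int]) -> list[int]:
    """Encode 4 data bits into 7; positions 1..7, parity at positions 1, 2, 4."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(c: list[int]) -> list[int]:
    """Correct at most one flipped bit and return the 4 data bits."""
    c = list(c)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos - 1]:
            syndrome ^= pos  # XOR of the positions that hold a 1
    if syndrome:             # non-zero syndrome = position of the flipped bit
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


word = [1, 0, 1, 1]
stored = hamming74_encode(word)
stored[4] ^= 1                     # an alpha particle flips one stored bit
print(hamming74_decode(stored))    # [1, 0, 1, 1]
```

The price is three extra cells per four data bits here; wider codes amortize the check bits over more data.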
One Transistor DRAM
• Minimal cell
– Transistor is really only an access device
– Storage device is a capacitor
• Write operation is similar to SRAM
– Place data value on bitline
– Raise wordline
– Value on bitline is now on cell
• Issue with VT drops
– Raise the wordline boosted above VDD so the full bitline voltage reaches the cell
• Reading is more complex
– Precharge bitline
– Raise wordline
– Charge sharing occurs
– Read the voltage on bitline
– Operation destroys value in cell
[Figure: 1T DRAM cell: access transistor M1, gated by the wordline, connects the bitline to the storage capacitor Ccell.]
Capacitor
• Simple way to design the capacitor: use an extra "poly-plate" layer
• Uses polysilicon-diffusion capacitance
• Expensive in area
[Figure: (a) cross-section and (b) layout of a 1T cell with a polysilicon plate capacitor: metal (M1) word line, polysilicon gate, polysilicon plate over an inversion layer induced by the plate bias, SiO2, field oxide, n+ regions, and a diffused bit line.]
Area is everything for DRAM
• Area is the number one concern of DRAM designers. Everyone wants more RAM in the same area...
• The number of transistors is already minimal. (You can't get lower than one!)
• The trick then is to design a smaller capacitor...
Capacitor Structures for DRAMs
[Figure: trench capacitor and stacked capacitor cross-sections. Labels include word line, transfer gate, storage electrode, cell plate, capacitor dielectric/insulator, refilling poly, insulating layer, isolation, 2nd field oxide, and Si substrate.]
DRAM Fabrication Diverges from Regular CMOS
• The more tricks you use to make smaller and better capacitors, the further you stray from "normal" CMOS.
• Because of this, DRAMs are almost always fabricated separately, even though it would be extremely useful to integrate DRAM in an SoC.
Issues in 1-T DRAMs
• Leakage
– Leakage rate sets the refresh rate (there is no regeneration)
– With a large memory, the time between refreshes must be long
– Want refresh to take only a few percent of the access cycles
– Must keep all leakage sources very small (subthreshold)
• Stored charge
– Want as much charge as possible
– Large C and large V: need to get the full Vdd into the cell
• Readout
– Is through charge sharing
– Limit Cbit so it is about 10x Ccell
– Need to sense small signals
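A rough sketch of the "few percent" refresh budget, assuming every row must be refreshed once per retention period and that each row refresh blocks normal accesses for a fixed time. The function name and all numbers are hypothetical.

```python
def refresh_overhead(rows: int, t_row_refresh: float, t_retention: float) -> float:
    """Fraction of time the array is busy refreshing.

    Assumes each of `rows` rows is refreshed once every `t_retention` seconds
    and that one row refresh blocks accesses for `t_row_refresh` seconds.
    """
    busy = rows * t_row_refresh
    if busy >= t_retention:
        raise ValueError("cannot refresh every row before the data leaks away")
    return busy / t_retention


# Hypothetical: 8192 rows, 50 ns per row refresh, 64 ms retention budget
print(f"{refresh_overhead(8192, 50e-9, 64e-3):.2%}")  # 0.64%
```

Doubling the number of rows at the same leakage doubles the overhead, which is why a larger memory needs a longer interval between refreshes and therefore smaller leakage.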
Reading the Cell
Hard problem:
• Small single-ended signal
• Assume bitlines are precharged to Vdd
– Voltage either stays the same (cell stored a '1')
– Or voltage drops by about 200 mV
– The precise voltage drop depends on Ccell/Cbit
Another issue:
• The values of all the cells on the wordline are destroyed
• You need to read ALL the cells on the wordline on each access
• Then you have to write them back into the cells
[Figure: storage capacitor Ccell sharing charge with the bitline capacitance Cbit.]
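The dependence on Ccell/Cbit comes from charge conservation: Cbit·Vpre + Ccell·Vcell = (Cbit + Ccell)·Vfinal. The sketch below applies it with Cbit = 10·Ccell as on the earlier slide; the supply value of 2.5 V and the function name are hypothetical.

```python
def charge_share(v_pre: float, v_cell: float, c_cell: float, c_bit: float) -> float:
    """Bitline voltage after the access transistor connects Ccell to Cbit.

    Charge conservation: c_bit*v_pre + c_cell*v_cell = (c_bit + c_cell)*v_final.
    """
    return (c_bit * v_pre + c_cell * v_cell) / (c_bit + c_cell)


VDD = 2.5              # hypothetical supply
C_CELL = 1.0           # arbitrary units; only the ratio matters
C_BIT = 10 * C_CELL    # "about 10x Ccell"

v1 = charge_share(VDD, VDD, C_CELL, C_BIT)  # cell stored '1': no change
v0 = charge_share(VDD, 0.0, C_CELL, C_BIT)  # cell stored '0': bitline drops
print(f"'1': {v1:.3f} V, '0': {v0:.3f} V, drop = {VDD - v0:.3f} V")
# '1': 2.500 V, '0': 2.273 V, drop = 0.227 V
```

The drop is Vdd·Ccell/(Ccell + Cbit), about 0.23 V here, in line with the roughly 200 mV on the slide. The cell capacitor ends at the same voltage as the bitline, so its original value is gone, which is why every cell on the wordline must be written back.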
Reading the Cell
[Figure: two columns, each with bitline halves D and D̄ (10C each) feeding a sense amp. Cells of capacitance C in rows R1 to R128 sit on one half and rows R129 to R256 on the other, with dummy cells (½C) in rows Rdummy1 and Rdummy2.]
Single-ended voltage detection is difficult, so a dummy cell is used on the opposite side of the cell being read.
The dummy cell provides a differential voltage to the sense amp.
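A sketch of why the dummy cell helps: if the reference bitline shares charge with a dummy cell preset to Vdd/2 (the value given on the sense-amplifier slide), it settles midway between the '1' and '0' levels, so the sense amp sees a signal of either sign. This assumes the dummy cell has the same capacitance as a storage cell; the figure labels the dummy cells ½C, so the actual sizing may differ. Names and the 2.5 V supply are hypothetical.

```python
def charge_share(v_pre: float, v_cell: float, c_cell: float, c_bit: float) -> float:
    """Bitline voltage after charge sharing (charge conservation)."""
    return (c_bit * v_pre + c_cell * v_cell) / (c_bit + c_cell)


def differential(stored_bit: int, vdd: float = 2.5,
                 c_cell: float = 1.0, c_bit: float = 10.0) -> float:
    """V(D) - V(D-bar): D is read from the cell, D-bar from the dummy cell."""
    v_d = charge_share(vdd, vdd if stored_bit else 0.0, c_cell, c_bit)
    v_dbar = charge_share(vdd, vdd / 2, c_cell, c_bit)  # dummy preset to vdd/2
    return v_d - v_dbar


def sense(stored_bit: int) -> int:
    """The latch resolves towards whichever side is higher."""
    return 1 if differential(stored_bit) > 0 else 0


print(f"{differential(1):+.3f} V, {differential(0):+.3f} V")  # +0.114 V, -0.114 V
```

The reference splits the single-ended swing in half, so each stored value produces about ±Vdd·Ccell/(2(Ccell + Cbit)) of differential signal.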
Latch-based Sense Amplifier
[Figure: cross-coupled inverters (M1 to M4) connected between D and D̄, with a pull-up M5 to VDD and a pull-down M6, both switched by SenseEnable.]
• When a differential voltage develops on D and D̄, the SenseEnable line is turned on.
• This activates the pull-up and pull-down.
• The cross-coupled inverters use regenerative behavior to restore full logic levels.
• These values are written back into the cell.
Dummy cell voltage is set to Vdd/2.
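The regenerative behavior can be estimated with a standard first-order small-signal model of cross-coupled inverters: the initial split grows as ΔV(t) = ΔV0·e^(t/τ), with τ roughly C/gm, so restoring a full swing takes t ≈ τ·ln(Vfinal/ΔV0). This is an approximation, not a result from the slide; the function name and the values are hypothetical.

```python
import math


def regeneration_time(delta_v0: float, v_final: float, tau: float) -> float:
    """Time for a latch to amplify delta_v0 up to v_final.

    First-order model: delta_v(t) = delta_v0 * exp(t / tau), tau ~ C / gm.
    """
    if not 0 < delta_v0 < v_final:
        raise ValueError("need 0 < delta_v0 < v_final")
    return tau * math.log(v_final / delta_v0)


# Hypothetical: 100 mV initial split, 2.5 V full swing, tau = 20 ps
print(f"{regeneration_time(0.1, 2.5, 20e-12) * 1e12:.1f} ps")  # 64.4 ps
```

Because the dependence on ΔV0 is only logarithmic, halving the bitline signal adds just τ·ln 2 to the sensing time.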
ROM
Applications
1. Initial Boot Code in embedded designs
2. Alternative implementation of some transform / coding algorithms
3. Power-on Self Test code
• The basic ROM structure forms the basis for EPROM and Flash...
Read-Only Memories
• Store values in memory at design time
• Large storage: typically 1T/bit
• Operation:
– Word line goes high
– Single bit line goes low or stays high, depending on the stored value
– Sense amp reads the value
• Design alternatives:
– NOR array
– NAND array
– Mixed NOR/NAND
• Presence of a transistor implies a stored "0"
• Absence of a transistor implies a stored "1"
[Figure: ROM array with rows (word lines), columns (bit lines), and a sense circuit on each column.]
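The read operation above can be sketched as an ideal-switch model of a NOR-style array: every bit line starts high, and a transistor present at the selected row pulls its column low, giving a stored "0". The `TRANSISTOR_PRESENT` pattern and the function name are arbitrary, hypothetical choices.

```python
# 1 = transistor present at (row, column); pattern chosen arbitrarily
TRANSISTOR_PRESENT = [
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 0],
]


def read_nor_rom_row(row: int) -> list[int]:
    """Bit-line levels when word line `row` goes high (ideal switches)."""
    bitlines = [1] * len(TRANSISTOR_PRESENT[0])  # pull-ups hold every column high
    for col, present in enumerate(TRANSISTOR_PRESENT[row]):
        if present:            # word line high turns this device on...
            bitlines[col] = 0  # ...and it discharges the bit line -> stored "0"
    return bitlines


for r in range(len(TRANSISTOR_PRESENT)):
    print(r, read_nor_rom_row(r))
```

Only the selected row's transistors are on, so each column reduces to a single pull-down decision per access.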
NOR Array
• Columns of a NOR array form a large NOR gate
• The sense circuit input may be a simple pull-up or a more complicated sense amplifier
• Program by making transistors stay in the off state even when the word line goes high:
– source or drain contact
– presence or absence of a diffusion region
– enhancement implant, i.e., VT > 5 V
[Figure: NOR ROM array with bit lines Bj to Bj+3, word lines Wi and Wi+1, and a sense circuit on each bit line.]
If the transistor is present, the bitline will be pulled down.
NAND Array
• Program by making the transistor always on (shorted)
• Can program with a depletion implant (VT < 0 V)
• No ground lines through the core; only poly rows and diffused columns
• Densest array possible since there are no contacts
• Word lines are high by default; the selected one switches low
• Can be very slow due to long series-resistance paths; only useful for small memories
[Figure: NAND ROM array: bit lines Bj to Bj+2 pulled up from VDD, each a series stack controlled by word lines W0 to Wi+2, with a column-select device labelled "to reduce power".]
If the selected transistor is present (not shorted), its low word line turns it off and the path to ground is not complete.
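The same ideal-switch view works for the NAND array, assuming the load holds each bit line high unless its whole series stack conducts. Unselected word lines are high, so every device in the stack is on except an enhancement device on the selected (low) word line; a depletion-implanted device conducts regardless. The `CHAINS` contents and the function name are hypothetical.

```python
ENH, DEP = "enh", "dep"  # enhancement device vs. depletion-implanted (always on)

# CHAINS[col][row]: device type at each position of a bit line's series stack
CHAINS = [
    [ENH, DEP, ENH],
    [DEP, DEP, ENH],
]


def read_nand_rom_row(row: int) -> list[int]:
    """Bit-line levels when word line `row` is pulled low (others stay high)."""
    out = []
    for chain in CHAINS:
        conducts = all(
            dev == DEP or r != row  # depletion always on; enhancement on if gate high
            for r, dev in enumerate(chain)
        )
        out.append(0 if conducts else 1)  # complete path to ground pulls the line low
    return out


for r in range(3):
    print(r, read_nand_rom_row(r))
```

Every read current must pass through the whole stack, which is the long series-resistance path that makes large NAND arrays slow.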
EPROM / EEPROM
Applications
1. Initial boot code or BIOS
2. Configuration storage for FPGAs
3. Software storage for embedded systems
4. Code storage for DRM systems
• EEPROM is quickly being replaced by Flash in most systems, since Flash is much more convenient.
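EPROM Structure and Operation
[Figure: drain current Id versus gate voltage Vg for the two states. Curve 1 (threshold VT0) is the erased state, read as "1"; writing shifts the device to curve 2 (threshold VT1), read as "0"; erasing shifts it back. The read voltage Vread lies between VT0 and VT1.]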
This is the key to the EPROM structure.
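Operation of Floating Gate Device
[Figure: floating-gate device and its circuit symbol: the control gate at V2 couples to the floating gate (V1) through C2, and the floating gate couples to the channel through C1.]
$$V_{1,\mathrm{new}} = V_{1,\mathrm{old}} + \frac{C_2}{C_1 + C_2}\,\Delta V_2$$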
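The coupling equation above can be evaluated directly. This sketch assumes, as in the figure description, that C2 couples the control gate to the floating gate and C1 couples the floating gate to the channel; the function name and the equal capacitances are hypothetical.

```python
def floating_gate_voltage(v1_old: float, dv2: float, c1: float, c2: float) -> float:
    """New floating-gate voltage V1 after the control gate moves by dv2.

    Capacitive divider: only the fraction c2 / (c1 + c2) of the control-gate
    step appears on the floating gate.
    """
    return v1_old + c2 / (c1 + c2) * dv2


# Hypothetical equal capacitances: half of a 5 V control-gate step couples through.
print(floating_gate_voltage(0.0, 5.0, 1.0, 1.0))   # 2.5  (no stored charge)
print(floating_gate_voltage(-2.0, 5.0, 1.0, 1.0))  # 0.5  (electrons stored earlier)
```

Stored electrons lower V1 for the same control-gate voltage, so a larger gate voltage is needed to turn the device on: that is the threshold shift between the "1" and "0" curves.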
EPROM Write/Erase
[Figure: (a) write process, hot carrier injection: Vpp on the gate and Vd on the drain, with the source at GND, so electrons (e-) from the channel are injected onto the floating gate. (b) erase process, UV light: the stored electrons leave the floating gate.]
• The programming voltage Vpp is much greater than VDD.
• The result is that the threshold voltage is now greater than VDD.
• UV light makes the SiO2 slightly conductive.
EEPROM
• What if you don't have access to the device to shine light on it?
• Customer setups, remote situations, sealed units...
• What we want is to be able to erase electrically.
• There is a solution: add a transistor per bit.
Conventional EEPROM (FLOTOX)
• Two transistors per cell
• Selective erase
• Relatively low area efficiency
• Write/erase done by Fowler-Nordheim (FN) tunneling
– FLOTOX = FLOating gate Tunneling OXide
[Figure: FLOTOX cell cross-section: a select transistor (word line WL, bit line BL) in series with the floating-gate device (control gate G over the floating gate), with electrons tunneling through the thin oxide to and from the N+ region; p substrate, source at GND.]
Write/Erase Operation
[Figure: 2 × 4 FLOTOX array (word lines WL0 and WL1, gate controls G0 and G1, bit lines BL0 to BL3, common source) biased with 0 V, VDD and 12 V. (a) Program (lower VT of FLOTOX device). (b) Erase (raise VT of whole row).]
Read Operation
• Set the gate control voltage to VDD on the FLOTOX devices
• Connect the source to Gnd
• When the word line goes high, each selected cell will either pull the BL low or leave it high, depending on the state of the programmed cell
• BUT: the 2T cell is too expensive in terms of area
[Figure: FLOTOX array during a read: G0, G1 and WL0 at VDD, WL1 at 0 V, source at 0; cells on BL0 to BL3 are marked low VT or high VT.]
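A minimal sketch of the read described above, treating each FLOTOX device as an ideal switch with its control gate at VDD: a low-VT cell turns on and pulls its bit line low, a high-VT cell stays off. The 2.5 V supply, the thresholds, and the function name are hypothetical; the result is reported as bit-line levels rather than logic values, since the slide does not fix the mapping.

```python
VDD = 2.5  # hypothetical supply and control-gate voltage


def read_flotox_row(cell_vt: list[float], selected: bool = True,
                    vdd: float = VDD) -> list[int]:
    """Bit-line levels (1 = left high, 0 = pulled low) for one row of cells.

    Gate control at vdd and source at Gnd, as on the slide; ideal switches.
    """
    if not selected:  # select transistor off: no path to ground
        return [1] * len(cell_vt)
    return [0 if vt < vdd else 1 for vt in cell_vt]


# Hypothetical thresholds on BL0..BL3: low VT = 1 V, high VT = 4 V
print(read_flotox_row([1.0, 4.0, 4.0, 1.0]))         # [0, 1, 1, 0]
print(read_flotox_row([1.0, 4.0], selected=False))   # [1, 1]
```

The separate `selected` flag stands for the second transistor in each cell, which is exactly the area cost noted on the slide.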
Flash
Applications
1. Cell phones
2. iPods
3. USB keys
4. Hard drives
5. BIOS
6. FPGA configuration bits for some specialized applications
• Flash requires only 1 transistor per bit...
• Flash supports single-bit writes with block erase. However, this is not a problem in systems with other storage, since you can read the entire block before you erase it and write it back with the changes that you want...
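The read-modify-write pattern above can be sketched as follows, assuming (consistent with the read slide, where an erased low-VT cell reads "1") that a block erase sets every bit to 1 and that programming can only clear individual bits to 0. `FlashBlock` and `update` are hypothetical names for illustration.

```python
class FlashBlock:
    """Toy flash block: erase sets every bit to 1; programming only clears bits."""

    def __init__(self, size: int) -> None:
        self.bits = [1] * size  # freshly erased

    def erase(self) -> None:
        self.bits = [1] * len(self.bits)

    def program(self, index: int, value: int) -> None:
        if value == 1 and self.bits[index] == 0:
            raise ValueError("a 0 cannot be set back to 1 without a block erase")
        self.bits[index] = value


def update(block: FlashBlock, changes: dict[int, int]) -> None:
    """Read-modify-write: buffer the block, erase it, program the merged data."""
    buffer = list(block.bits)       # 1. read the entire block into other storage
    for i, v in changes.items():    # 2. apply the wanted changes in the buffer
        buffer[i] = v
    block.erase()                   # 3. block erase (all ones)
    for i, v in enumerate(buffer):  # 4. program only the bits that must be 0
        if v == 0:
            block.program(i, 0)


blk = FlashBlock(8)
blk.program(2, 0)
blk.program(5, 0)
update(blk, {2: 1, 7: 0})  # turning bit 2 back into a 1 needs the erase
print(blk.bits)            # [1, 1, 1, 1, 1, 0, 1, 0]
```

Clearing a bit needs only `program`; setting a bit back to 1 is what forces the whole-block cycle.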
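NOR Flash Memory Architecture
[Figure: NOR flash array: a row decoder (address input) drives the word lines connected to the cell gates; bit lines connect to the drains and pass through the column decoder to the sense amplifier (data output); a common source line is controlled by a source switch. Cell cross-section: gate over N+ source and N+ drain.]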
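Write/Erase Operation
[Figure: NOR flash bias conditions, with unselected lines at Gnd. Write (hot carriers): Vpp on the selected word line and Vd(w) on the selected bit line, injecting electrons onto the floating gate. Erase (FN tunneling): Vs on the source, other terminals at GND, removing electrons from the floating gate.]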
Flash Read Operation
• Apply Vd to the selected bit line, Vread to the word line, and Gnd to the source connection
• Write-VT > Vread > Erase-VT
• Sense the drain current using a sense amplifier
– Low if data = "0"
– High if data = "1"
[Figure: NOR flash array during a read: selected word line at Vread and the others at Gnd; the selected bit line, biased at Vd through a load, feeds a sense amplifier (S/A) that compares it with Vref.]
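A sketch of the current-sensing decision, using a simple square-law drain current (zero below threshold) and comparing it with a reference current. The transconductance parameter `k`, the reference current, Vread = 2.5 V, and the two threshold values are hypothetical, chosen only so that Write-VT > Vread > Erase-VT holds.

```python
def cell_current(vt: float, v_read: float, k: float = 100e-6) -> float:
    """Square-law drain current (A) with the word line at v_read; zero below VT."""
    overdrive = v_read - vt
    return k * overdrive ** 2 if overdrive > 0 else 0.0


def read_bit(vt: float, v_read: float = 2.5, i_ref: float = 10e-6) -> int:
    """High current -> '1' (erased, low VT); low current -> '0' (written, high VT)."""
    return 1 if cell_current(vt, v_read) > i_ref else 0


ERASE_VT, WRITE_VT = 1.0, 4.0  # hypothetical, so that WRITE_VT > v_read > ERASE_VT
print(read_bit(ERASE_VT), read_bit(WRITE_VT))  # 1 0
```

Placing Vread between the two thresholds means the written cell draws essentially no current, so the sense margin comes almost entirely from the erased cell's current.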
Summary
• SRAM: simple, static, and easy to use, but with relatively low density
• CAM: special-purpose SRAM-like configuration for latency-sensitive applications
• DRAM: very high density, but complex to implement and use
• ROM: NOR or NAND configurations, depending on size/speed
• EPROM, E2PROM: maintain state without a power source
• Flash: 1T storage that maintains state without a power source; bit-wise write, block erase
End.