CMOS Memories
Brad Quinton (based on slides from R. Saleh)
Dept. of ECE, University of British Columbia
R. Saleh / B. Quinton

Overview
• Memories are used in almost all existing chips and represent a large part of the semiconductor market today.
• In many applications, the most important characteristic of a memory is its price per bit: the user wants to spend the least amount of money to store the required data.
• Reducing the cost of a bit means making it smaller, which in turn means reducing the number of components it requires. Ideally, each bit should contain only one transistor.
• While this gives very dense storage, it leads to a number of challenging circuit design issues.
• In this lecture, we start with a review of the SRAM. Then we look at CAMs, DRAMs, ROMs, EPROMs, EEPROMs, and finally the dominant Flash memories.
Readings: Chapter 9 of HJS (see Chapter 8 for SRAM review material)

Lecture Outline
1. SRAM
2. CAM
3. DRAM
4. ROM
5. EPROM/EEPROM
6. Flash

SRAM Applications
1. Embedded RAM for ASICs and SoCs
2. Configuration bits for FPGAs
3. Caches in most CPUs
4. Trace buffers in debug applications
• In general, SRAM is used to replace arrays of flip-flops or latches to increase storage density and save die area. As always, there are trade-offs:
– Single word access: one address at a time
– Lower performance: not as fast as flip-flop-based designs
– Requires BIST: there is no "scan chain"
– Higher leakage current

High-level View: Memory
[Figure: high-level view of a memory]

Overall Structure of 64Kb SRAM
[Figure: 64Kb SRAM organized as a 256 x 256 array (2^n = 256 with n = 8, and 2^m = 256 with m = 8). Blocks: address input, row decoder, word lines, bitlines with column pull-ups, column decoder, column mux, sense amplifier (sense enable), write driver (write enable), read-write control, data in, data out.]
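As a rough illustration of the 256 x 256 organization in the figure, the sketch below splits a 16-bit cell address into an index for the row decoder and an index for the column mux. The slides do not say which address bits go where, so sending the upper 8 bits to the row decoder is an assumption, and the names `ROW_BITS`, `COL_BITS`, `split_address` and `one_hot` are made up for this example.

```python
ROW_BITS = 8   # 2**8 = 256 word lines
COL_BITS = 8   # 2**8 = 256 bitlines


def split_address(addr: int) -> tuple[int, int]:
    """Split a flat bit address into (row, column) indices.

    Assumption: the upper bits select the word line and the lower
    bits select the column; a real design may order them differently.
    """
    if not 0 <= addr < (1 << (ROW_BITS + COL_BITS)):
        raise ValueError("address out of range for a 64Kb array")
    row = addr >> COL_BITS
    col = addr & ((1 << COL_BITS) - 1)
    return row, col


def one_hot(index: int, width: int) -> list[int]:
    """Model a decoder output: exactly one of `width` lines is high."""
    return [1 if i == index else 0 for i in range(width)]


total_bits = (1 << ROW_BITS) * (1 << COL_BITS)   # 65536 bits = 64Kb
row, col = split_address(0x1234)                 # row 0x12, column 0x34
wordlines = one_hot(row, 1 << ROW_BITS)          # only word line 0x12 is high
```

Only one word line is ever high, which is the "single word access" trade-off listed under SRAM Applications.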
Overall Structure of 64Kb SRAM: Write and Read
[Figure: the same 64Kb SRAM structure shown twice more, first with the write path marked and then with the read path marked]

Cross-Coupled Inverters
[Figure: cross-coupled inverters, shown on their own and then in a static state]

Cross-Coupled Inverters
• The cross-coupled inverter structure is fairly intuitive: it looks a lot like a flip-flop or latch.
• However, if you look closely, the same transistors and wires are used for both read and write operations.
• This makes things tricky. We could add logic to the cells to distinguish reads from writes, but that would be expensive in terms of area.
• Instead, we adjust the relative sizing of the transistors to ensure that reads and writes work as expected.

6T SRAM Cell
[Figure: 6T SRAM cell. M1 and M2 are the pull-down transistors, M3 and M4 the access transistors gated by the wordline, and M5 and M6 the pull-up transistors to Vdd. The internal nodes are q and q_bar and the bitlines are b and b_bar. For the normal design, one half of the mirrored cell is labelled with pull-up width wp, access width wa and pull-down width wd.]

Reading a Cell
[Figure: 6T cell during a read with q = 0 and q_bar = 1, bitline capacitances Cbit and cell current Icell. Waveforms show wl, b, b_bar, q and q_bar, with the sense amplifier triggered once the bitline difference ΔV has developed.]
• The bitlines are pre-charged to Vdd before the read.
• The wordline goes high.
• M1 is on (labelled "open" in the figure) and M2 is off (labelled "closed").
• b goes low because M1 is on, so Cbit discharges through M3 and M1 (the cell current Icell).
• While b is discharging, there is a risk that the bit will "flip".

Transistor Ratio Required for Read
• To ensure that the bit does not "flip" during a read, the voltage at q must be controlled.
• This can be done (see textbook) by ensuring that we have the following transistor width ratio:
$W_1 / W_3 \approx 1.5$

Writing a Cell
[Figure: 6T cell during a write, starting with q = 0 and q_bar = 1. Bitline b is held at Vdd and bitline b_bar is driven to Gnd. Waveforms show wl, b, b_bar, q and q_bar.]
• The wordline goes high.
• The bitline on the side storing a 1 (b_bar here) is forced low.
• The voltage at q_bar drops as the access transistor M4 pulls against the pull-up M6.
• q_bar must be forced across the threshold voltage for the cell to flip.

Transistor Ratio Required for Write
• To ensure that the bit does "flip" during a write, the node storing a 1 (q_bar here) must be pulled low.
• This can be done (see textbook) by ensuring that we have the following transistor width ratio:
$W_4 / W_6 \approx 1.5$

Overall Transistor Widths
• Both sides of the circuit must be balanced, therefore:
$W_4 = W_3$, $W_6 = W_5$, $W_2 = W_1$
• So, given the minimum transistor width, we have a starting point for our design.

Layout of SRAM Cell
[Figure: layout of the 6T SRAM cell, showing Vdd, the bitlines b and b_bar, the internal nodes q and q_bar, and contacts marked with x]
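The read ratio, write ratio and balance conditions above can be chained into a first-cut sizing. This is only a sketch under two assumptions that the slides do not state: both ratios are met with equality at 1.5, and the pull-ups M5/M6 are the devices drawn at minimum width. The function name `size_6t_cell` is hypothetical.

```python
READ_RATIO = 1.5    # W1 / W3: pull-down vs. access transistor
WRITE_RATIO = 1.5   # W4 / W6: access vs. pull-up transistor


def size_6t_cell(w_min: float) -> dict[str, float]:
    """Return widths W1..W6 for a 6T cell, in the same units as w_min.

    Assumptions: the pull-ups take the minimum width and both ratios
    are met exactly rather than exceeded.
    """
    w6 = w_min                 # pull-up (assumed minimum width)
    w5 = w6                    # balance: W6 = W5
    w4 = WRITE_RATIO * w6      # access transistor must beat the pull-up
    w3 = w4                    # balance: W4 = W3
    w1 = READ_RATIO * w3       # pull-down must beat the access transistor
    w2 = w1                    # balance: W2 = W1
    return {"W1": w1, "W2": w2, "W3": w3, "W4": w4, "W5": w5, "W6": w6}


widths = size_6t_cell(1.0)
# Pull-up : access : pull-down comes out as 1 : 1.5 : 2.25
```

Starting from the pull-up keeps every other device at or above minimum width; starting from a different device would scale the same proportions.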
CAMs Applications
• CAMs are often used in cache memories.
• Instead of storing and accessing data in a cache through a static address, we would like to be able to store it anywhere in SRAM and then retrieve it later when needed.
• We store the data with a keyword based on the application.
• The lookup is done with a tag that is matched against the keywords stored in memory and associated with the data.
• The key design issue is to minimize the time required to access the data that matches the tag.
• The matching is done simultaneously against all stored tags to reduce the read time, so it consumes a lot of power. That is why CAMs are considered power hungry.
• NOTE: CAMs are really only useful if you *need* single-cycle latency, since you can always emulate a CAM with multiple stages of RAM lookups.

Associative Memory
[Figure: associative memory]

Overall Architecture of CAM Array
[Figure: CAM array next to a 256 x 256 SRAM array. Labels: WL, row decoder, matchline, tagline, Addr[8:0], dummy replica row, column decode & MUX, CAM write I/O & TAG drive, SRAM read/write I/O, Data[63:0], Tag[31:0], valid bit, index[2:0].]

Overall Structure of CAM Lookup
[Figure: n rows, each with a wordline (WL 1 to WL n), a 6T SRAM bit per tag position, and a precharged matchline (Matchline 1 to Matchline n). Bits 1 to N of the TAG are driven down the columns.]

CAM Cell
[Figure: CAM cell with bitlines b and b_bar, wordline WL, matchline, and transistors M7 to M10; annotated dimensions 50λ and 80λ]

DRAM Applications
1. RAM for desktops, laptops, servers, etc.
2. RAM for embedded systems: routers, switches, set-top boxes, etc.
• DRAMs are extremely dense (only 1 transistor/bit!) but they have a number of limitations:
– They require a specialized fabrication process, so they cannot (easily) be mixed with regular CMOS logic. DRAM will not be "embedded" in processors anytime soon...
– DRAM bits are dynamic (i.e. they fade away): they must be refreshed or they become invalid!
– DRAM is sensitive to "soft errors" caused by alpha particles. Servers usually require ECC (Error Correcting Codes).

One Transistor DRAM
[Figure: 1T DRAM cell, with access transistor M1 between the bitline and the storage capacitor Ccell, gated by the wordline]
• Minimal cell
– Transistor is really only an access device
– Storage device is a capacitor
• Write operation is similar to SRAM
– Place data value on bitline
– Raise wordline
– Value on bitline is now on cell
• Issue with VT drops
• Reading is more complex
– Precharge bitline
– Raise wordline (boosted)
– Charge sharing occurs
– Read the voltage on bitline
– Operation destroys value in cell

Capacitor
• Simple way to design the capacitor: use an extra "poly-plate" layer
[Figure: (a) cross-section and (b) layout. Labels: metal word line, polysilicon gate, polysilicon plate, SiO2, field oxide, n+ regions, diffused bit line, M1 word line, inversion layer induced by plate bias.]
• Uses polysilicon-diffusion capacitance
• Expensive in area

Area is everything for DRAM
• Area is the number one concern of DRAM designers. Everyone wants more RAM in the same area.
• The number of transistors is already minimal. (You can't get lower than one!)
• The trick then is to design a smaller capacitor.

Capacitor Structures for DRAMs
[Figure: trench capacitor and stacked capacitor cross-sections. Labels: word line, insulating layer, cell plate, capacitor dielectric layer, capacitor insulator, refilling poly, transfer gate, isolation, storage electrode, Si substrate, 2nd field oxide.]

DRAM Fabrication Diverges from Regular CMOS
• The more tricks you use to make smaller and better capacitors, the further you stray from "normal" CMOS.
• Because of this, DRAMs are almost always fabricated separately, even though it would be extremely useful to integrate DRAM in an SoC.
Issues in 1-T DRAMs
• Leakage
– Leakage rate sets the refresh rate (there is no regeneration)
– With a large memory, the refresh interval must be long
– Want refresh to take only a few percent of the access cycles
– Must keep all leakage sources very small (subthreshold)
• Stored charge
– Want as much charge as possible
– Large C and large V; need to get full Vdd into the cell
• Readout
– Is through charge sharing
– Limit Cbit so it is about 10x Ccell
– Need to sense small signals

Reading the Cell
Hard problem:
• Small single-ended signal
• Assume bitlines precharged to Vdd
– Voltage either stays the same (cell stored a '1')
– Or voltage drops by about 200 mV (cell stored a '0')
– Precise voltage drop depends on Ccell/Cbit
[Figure: cell capacitor Ccell sharing charge with the bitline capacitance Cbit]
Another issue:
• The values of all the cells on the wordline are destroyed
• You need to read ALL the cells on the wordline on each access
• Then you have to write them back into the cells

Reading the Cell
[Figure: two columns, each with a sense amp between bitline halves D and D_bar. Each bitline half has capacitance 10C, cells of capacitance C (rows R1 to R128 on one side, R129 to R256 on the other) and a dummy cell of capacitance ½C (rows Rdummy1 and Rdummy2).]
• Single-ended voltage detection is difficult, so a dummy cell is used on the opposite side of the cell being read.
• This provides a differential voltage to the sense amp.
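The charge-sharing readout described above follows from charge conservation between Ccell and Cbit. The sketch below works through it for the Cbit = 10 x Ccell guideline from the slides; the supply value of 2.5 V and the function name `bitline_voltage_after_read` are assumptions chosen only for illustration, and the dummy cell is left out.

```python
def bitline_voltage_after_read(v_cell: float, v_pre: float,
                               c_cell: float, c_bit: float) -> float:
    """Voltage on the bitline after the cell shares charge with it.

    Total charge before (c_cell*v_cell + c_bit*v_pre) equals total
    charge after ((c_cell + c_bit) * v_final).
    """
    return (c_cell * v_cell + c_bit * v_pre) / (c_cell + c_bit)


VDD = 2.5               # assumed supply voltage, in volts
C_CELL = 1.0            # cell capacitance in arbitrary units
C_BIT = 10.0 * C_CELL   # slide guideline: Cbit is about 10x Ccell

v_one = bitline_voltage_after_read(VDD, VDD, C_CELL, C_BIT)    # stored '1'
v_zero = bitline_voltage_after_read(0.0, VDD, C_CELL, C_BIT)   # stored '0'
drop_mv = (VDD - v_zero) * 1000.0   # VDD / 11, a little over 200 mV here
```

With these assumed numbers a stored '1' leaves the bitline at Vdd while a stored '0' pulls it down by Vdd/11, which is in line with the "about 200 mV" figure quoted on the slide.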
Latch-based Sense Amplifier
[Figure: latch-based sense amplifier between D and D_bar, built from transistors M1 to M6 with VDD at the top and SenseEnable driving M5 and M6]
• When a differential voltage develops on D and D_bar, the SenseEnable line is turned on.
• This activates the pull-up and pull-down.
• The cross-coupled inverters use regenerative behavior to restore full logic levels.
• These values are written back into the cell.
• Dummy cell voltage is set to Vdd/2.

ROM Applications
1. Initial boot code in embedded designs
2. Alternative implementation of some transform / coding algorithms
3. Power-on self-test code
• The basic ROM structure forms the basis for EPROM and Flash...

Read-Only Memories
• Store values in memory at design time
• Large storage: typically 1T/bit
• Operation:
– Word line goes high
– Single bit line goes low or stays high depending on stored value
– Sense amp reads value
• Design alternatives:
– NOR array
– NAND array
– Mixed NOR/NAND
[Figure: one cell at the crossing of a row (word line) and a column (bit line), with the column feeding a sense circuit]
• Presence of a transistor implies a stored "0"
• Absence of a transistor implies a stored "1"

NOR Array
• Columns of the NOR array form a large NOR gate
• Sense circuit input may be a simple pull-up or a more complicated sense amplifier
• Program by making transistors stay in the off state even when the word line goes high:
– source or drain contact
– presence or absence of diffusion region
– enhancement implant, i.e. VT > 5 V
[Figure: NOR array with word lines Wi, Wi+1 and bit lines Bj, Bj+1, Bj+2, Bj+3, each bit line feeding a sense circuit]
• If the transistor is present, the bitline will be pulled down

NAND Array
• Program by making the transistor always on (shorted)
• Can program with depletion implant (VT < 0.0)
• No ground lines through core; only poly rows and diffused columns
• Densest array possible since there are no contacts
• Word lines are high by default; one switches low
• Can be very slow due to long series resistance paths; only useful for small memories
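The read behaviour of the two array styles can be modelled at the logic level as follows. This is a minimal sketch, not a circuit model: the list-of-lists encoding and the function names `nor_rom_read` and `nand_rom_read` are invented here, and reading a low bit line as "0" in the NAND case is an assumption carried over from the NOR convention on the slides.

```python
def nor_rom_read(has_transistor: list[list[bool]], row: int) -> list[int]:
    """Read one word from a NOR ROM.

    has_transistor[r][c] is True where a working transistor sits under
    word line r on bit line c. Raising word line `row` pulls down every
    bit line that has a transistor there, which reads as 0.
    """
    return [0 if present else 1 for present in has_transistor[row]]


def nand_rom_read(always_on: list[list[bool]], row: int) -> list[int]:
    """Read one word from a NAND ROM.

    always_on[r][c] is True where the series device is programmed to
    conduct regardless of its word line. All word lines are high except
    `row`, so the column chain reaches ground only if the selected
    device is always on. Assumption: a low bit line reads as 0.
    """
    n_rows = len(always_on)
    word = []
    for c in range(len(always_on[row])):
        conducts = all(always_on[r][c] or r != row for r in range(n_rows))
        word.append(0 if conducts else 1)
    return word


pattern = [[True, False, True, True],
           [False, False, True, False]]
nor_word = nor_rom_read(pattern, 0)     # [0, 1, 0, 0]
nand_word = nand_rom_read(pattern, 1)   # [1, 1, 0, 1]
```

The NAND read has to consider every device in the column, which mirrors the long series path that makes large NAND arrays slow.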
[Figure: NAND array. Labels: VDD, column select (to reduce power), bit lines Bj, Bj+1, Bj+2, word lines W0, Wi, Wi+1, Wi+2.]
• If the transistor is absent, the path to ground will not be complete

EPROM / EEPROM Applications
1. Initial boot code or BIOS
2. Configuration storage for FPGAs
3. Software storage for embedded systems
4. Code storage for DRM systems
• EEPROM is quickly being replaced by flash in most systems since it is much more convenient.

EPROM Structure and Operation
[Figure: floating-gate transistor with gates labelled 1 and 2, and Id versus Vg curves labelled "0" and "1" with thresholds VT0 and VT1 on either side of Vread. Write and erase move the device between the two curves.]
• This is the key to the EPROM structure.

Operation of Floating Gate Device
[Figure: circuit symbol of the floating-gate device, with voltages V1 and V2 and capacitances C1 and C2]
$V_{1,\text{new}} = V_{1,\text{old}} + \frac{C_2}{C_1 + C_2}\,\Delta V_2$

EPROM Write/Erase
[Figure: (a) write process, hot carrier injection: Vpp on the gate, Vd on the drain, GND elsewhere, electrons moving onto the floating gate; (b) erase process, UV light: gate at GND, Vs on the source, electrons leaving the floating gate. Both cross-sections show N+ regions in a P-substrate.]
• Write uses a voltage much greater than VDD. The result is that the threshold voltage is now greater than VDD.
• Erase: UV light makes the SiO2 slightly conductive.

EEPROM
• What if you don't have access to the device to shine light on it?
– Customer setups, remote situations, sealed units...
• What we want is to be able to erase electrically.
• There is a solution: add a transistor per bit.
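As a small worked example of the floating-gate coupling relation, V1,new = V1,old + C2 / (C1 + C2) x ΔV2, the sketch below evaluates it for one set of values. The capacitances and the 12 V step are illustrative assumptions, not figures from the slides, and `floating_gate_voltage` is a hypothetical helper name.

```python
def floating_gate_voltage(v1_old: float, delta_v2: float,
                          c1: float, c2: float) -> float:
    """Floating-gate voltage after the control gate moves by delta_v2.

    C1 and C2 form a capacitive divider, so only the fraction
    C2 / (C1 + C2) of the control-gate step reaches the floating gate.
    """
    coupling = c2 / (c1 + c2)
    return v1_old + coupling * delta_v2


# Assumed values: equal capacitances give a coupling ratio of 0.5.
v1_new = floating_gate_voltage(v1_old=0.0, delta_v2=12.0, c1=1.0, c2=1.0)
# v1_new is 6.0 V for this choice of C1 and C2
```

A larger C2 relative to C1 couples more of the control-gate voltage onto the floating gate.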
Conventional EEPROM (FLOTOX)
• Two transistors per cell
• Selective erase
• Relatively low area efficiency
• Write/erase done by Fowler-Nordheim (FN) tunneling
– FLOTOX = FLOating gate Tunneling OXide
[Figure: FLOTOX cell cross-section on a p substrate. Labels: select transistor, WL, BL, control gate (2), floating gate (1), G, GND, N+ regions, FN tunneling of electrons.]

Write/Erase Operation
[Figure: array with word lines WL0 and WL1, bit lines BL0 to BL3, gate lines G0 and G1 and a source line, annotated with 0 V, 12 V and VDD biases for (a) program, which lowers the VT of a FLOTOX device, and (b) erase, which raises the VT of a whole row]

Read Operation
[Figure: the same array during a read, with VDD on G0 and G1 and a mix of low-VT and high-VT cells on BL0 to BL3]
• Set the gate control voltage to VDD on the FLOTOX devices
• Connect the source to Gnd
• When the word line goes high, each selected cell will either pull the BL low or leave it high, depending on the state of the programmed cell
• BUT: the 2T cell is too expensive in terms of area

Flash Applications
1. Cell phones
2. iPods
3. USB keys
4. Hard drives
5. BIOS
6. FPGA configuration bits for some specialized applications
• Flash requires only 1 transistor per bit...
• Flash supports single-bit writes, with block erase. However, this is not a problem in systems with other storage, since you can read the entire block before you erase it and write it back with the changes that you want.

NOR Flash Memory Architecture
[Figure: NOR flash array. Labels: address, row decoder, word line, bit line, source line, source switch, column decoder, sense amplifier, data; inset cell with gate, N+ source and N+ drain.]

Write/Erase Operation
[Figure: NOR flash array biasing for write and erase, with Vpp, Vd(w), Vs and Gnd applied to the word lines, bit lines and source line]
[Figure insets: cell cross-sections for write (hot carriers, Vd(w) on the drain) and erase (FN tunneling, Vs on the source)]

Flash Read Operation
• Apply Vd to the selected bit line, Vread to the word line, and Gnd to the source connection
• Write-VT > Vread > Erase-VT
• Sense the drain current using a sense amplifier
– Low if data = "0"
– High if data = "1"
[Figure: array with Vread on the selected word line, Vd on the selected bit line and Gnd elsewhere, feeding a sense amplifier (S/A). Labels: Load, Vref, Von.]

Summary
• SRAM: simple, static, easy to use, but with relatively low density
• CAM: special-purpose SRAM-like configuration for latency-sensitive applications
• DRAM: very high density, but complex to implement and use
• ROM: NOR or NAND configurations depending on size/speed
• EPROM, EEPROM: maintain state without a power source
• Flash: 1T storage that maintains state without a power source; bit-wise write, block erase
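The "bit-wise write, block erase" behaviour of flash, and the read-erase-write-back workaround mentioned under Flash Applications, can be sketched as below. This is a toy model under stated assumptions: an erased cell reads as 1 and a write can only change a 1 to a 0, the block is just eight bits long, and the function names are hypothetical.

```python
def erase_block(block: list[int]) -> None:
    """Block erase: every cell returns to the erased state (assumed 1)."""
    for i in range(len(block)):
        block[i] = 1


def write_bit(block: list[int], index: int, value: int) -> None:
    """Bit-wise write: may clear a cell to 0 but cannot set it back to 1."""
    if value == 1 and block[index] == 0:
        raise ValueError("cannot set a 0 back to 1 without a block erase")
    block[index] = value


def update_block(block: list[int], changes: dict[int, int]) -> None:
    """Copy the block to other storage, erase it, write it back changed."""
    shadow = list(block)                 # copy held outside the flash
    for index, value in changes.items():
        shadow[index] = value            # apply the edits to the copy
    erase_block(block)
    for index, value in enumerate(shadow):
        if value == 0:                   # erased cells already read as 1
            write_bit(block, index, 0)


block = [1, 0, 1, 1, 0, 0, 1, 0]
update_block(block, {1: 1, 2: 0})        # bit 1 needs an erase, bit 2 does not
```

Changing bit 2 from 1 to 0 could have been done with a single write, but changing bit 1 from 0 to 1 forces the whole block through an erase cycle.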