Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Semiconductor Memories Mohammad Sharifkhani Outline • Introduction • Non-volatile memories Semiconductor Memory Classification Read-Write Memory Non-Volatile Read-Write Read-Only Memory Memory Random Access Non-Random Access EPROM 2 E PROM SRAM FIFO DRAM LIFO Shift Register CAM Mask-Programmed FLASH Programmable (PROM) Memory Timing: Definitions Memory Architecture: Decoders M bits S0 S0 Word 0 S1 Word 1 S2 Word 2 words N SN - 2 SN - M bits 1 Storage cell Word 0 A0 Word 1 A1 Word 2 AK -1 Word N 2 2 DecoderWord N 2 2 Word N 2 1 Word N 2 1 Storage cell K = log2N Input-Output (M bits) Intuitive architecture for N x M memory Too many select signals: N words == N select signals Input-Output (M bits) Decoder reduces the number of select signals K = log2N Array-Structured Memory Architecture Problem: ASPECT RATIO or HEIGHT >> WIDTH An An+1 An+m Bit line Row Decoder 2m Storage cell in the accessed word Word line M.2n A0 An - 1 Column decoder Input-Output (M bits) Amplify swing to rail-to-rail amplitude Selects appropriate word Hierarchical Memory Architecture Advantages: 1. Shorter wires within blocks 2. Block address activates only 1 block => power savings Block Diagram of 4 Mbit SRAM Clock generator Z-address buffer X-address buffer Predecoder and block selector Bit line load 128 K Array Block 0 Subglobal row decoder Subglobal rowGlobal decoder row decoder Block3130 Block Block 1 Transfer gate Column decoder Sense amplifier and write driver CS, WE buffer I/O buffer x1/x4 controller Y-address buffer [Hirose90] Local row decoder X -address buffer Contents-Addressable Memory Commands I/O Buffers I/O Buffers CAM Array 2 words 3 64 bits 9 Priority Encoder Control Logic R/W Address (9 bits) Mask Address Decoder Commands Commands Comparand 29 Validity Bits I/O Buffers Data (64 bits) 92Validity Bits Priority BitsEncode Address Decoder 9 Validity 2 Priority Encode Address Decoder Memory Timing: Approaches DRAM Timing Multiplexed Adressing SRAM Timing Self-timed • Introduction • Non volatile memories Non-Volatile Memories The Floating-gate transistor (FAMOS) Floating gate Gate Source D Drain G tox tox n+ p n+_ S Substrate Device cross-section Schematic symbol Floating-Gate Transistor Programming 20 V 10 V S 5V 0V 20 V D Avalanche injection - 5V S 5V 0V D Removing programming voltage leaves charge trapped - 2.5 V S 5V D Programming results in higher V T . A “Programmable-Threshold” Transistor FLOTOX EEPROM Gate Floating gate I Drain Source 20–30 nm V GD -10 V 10 V n1 n1 Substrate p 10 nm FLOTOX transistor Fowler-Nordheim I-V characteristic EEPROM Cell BL WL VDD Absolute threshold control is hard Unprogrammed transistor might be depletion always on 2 transistor cell Flash EEPROM Control gate Floating gate erasure n 1 source Thin tunneling oxide programming p-substrate Many other options … n 1 drain Cross-sections of NVM cells Flash EPROM Courtesy Intel Basic Operations in a NOR Flash Memory― Erase Basic Operations in a NOR Flash Memory― Write Basic Operations in a NOR Flash Memory― Read NAND Flash Memory Select line Word line(poly) Gate Unit Cell ONO BL Select line Source line (Diff. Layer) Courtesy Toshiba Gate Oxide FG NAND Flash Memory Select transistor Word lines Active area STI Bit line contact Source line contact Courtesy Toshiba Characteristics of State-of-the-art NVM Outline • Introduction • Non-volatile memories • RAM Read-Write Memories (RAM) STATIC (SRAM) Data stored as long as supply is applied Large (6 transistors/cell) Fast Differential DYNAMIC (DRAM) Periodic refresh required Small (1-3 transistors/cell) Slower Single Ended 6-transistor CMOS SRAM Cell WL V DD M2 M5 Q M1 BL M4 Q M6 M3 BL CMOS SRAM Analysis (Read) WL V DD M4 BL Q= 0 M5 V DD Cbit M1 Q= 1 V DD BL M6 V DD Cbit CMOS SRAM Analysis (Read) 1.2 Voltage Rise (V) 1 0.8 0.6 0.4 0.2 Voltage rise [V] 0 0 0.5 1 1.2 1.5 2 Cell Ratio (CR) 2.5 3 CMOS SRAM Analysis (Write) WL V DD M4 M5 Q= 1 M1 BL = 1 M6 Q= 0 V DD BL = 0 CMOS SRAM Analysis (Write) 6T-SRAM — Layout M2 VDD M4 Q Q M1 M3 GND M5 BL M6 BL WL Decreasing Word Line Delay Driver WL Polysilicon word line Metal word line (a) Driving the word line from both sides Metal bypass WL K cells Polysilicon word line (b) Using a metal bypass (c) Use silicides Resistance-load SRAM Cell WL V DD RL M3 BL RL Q Q M1 M2 M4 BL Static power dissipation -- Want R L large Bit lines precharged to V DD to address t p problem SRAM Characteristics • Introduction • Non-volatile memories • RAM – SRAM – DRAM 3-Transistor DRAM Cell BL 1 BL 2 WWL WWL RWL M3 X M1 CS M2 RWL V DD 2 V T X BL 1 BL 2 V DD V DD 2 V T No constraints on device ratios Reads are non-destructive Value stored at node X when writing a “1” = V WWL-VTn DV 3T-DRAM — Layout BL2 BL1 GND RWL M3 M2 WWL M1 1-Transistor DRAM Cell Write: C S is charged or discharged by asserting WL and BL. Read: Charge redistribution takes places between bit line and storage capacitance CS DV = VBL – V PRE = V BIT – V PRE -----------C S + CBL Voltage swing is small; typically around 250 mV. DRAM Cell Observations 1T DRAM requires a sense amplifier for each bit line, due to charge redistribution read-out. DRAM memory cells are single ended in contrast to SRAM cells. The read-out of the 1T DRAM cell is destructive; read and refresh operations are necessary for correct operation. Unlike 3T cell, 1T cell requires presence of an extra capacitance that must be explicitly included in the design. When writing a “1” into a DRAM cell, a threshold voltage is lost. This charge loss can be circumvented by bootstrapping the word lines to a higher value than VDD Sense Amp Operation V V(1) BL V PRE D V(1) V(0) Sense amp activated Word line activated t 1-T DRAM Cell Capacitor M 1 word line Metal word line SiO2 Poly n+ Field Oxide n+ Poly Inversion layer induced by plate bias Cross-section Diffused bit line Polysilicon gate Polysilicon plate Layout Uses Polysilicon-Diffusion Capacitance Expensive in Area SEM of poly-diffusion capacitor 1T-DRAM Advanced 1T DRAM Cells Word line Insulating Layer Cell plate Capacitor dielectric layer GND Cell Plate Si Capacitor Insulator Refilling Poly Transfer gate Isolation Storage electrode Storage Node Poly Si Substrate 2nd Field Oxide Trench Cell Stacked-capacitor Cell Static CAM Memory Cell Bit Bit Bit Bit Bit Word CAM Word ••• ••• CAM M4 M8 M9 M6 M7 CAM ••• ••• Bit Word CAM M3 Match Wired-NOR Match Line S int S M2 M1 M5 CAM in Cache Memory CAM SRAM ARRAY ARRAY Hit Logic Address Decoder Input Drivers Address Tag Sense Amps / Input Drivers Hit R/W Data • • • • Introduction Non-volatile memories RAM Periphery circuits Periphery Decoders Sense Amplifiers Input/Output Buffers Control / Timing Circuitry Row Decoders Collection of 2M complex logic gates Organized in regular and dense fashion (N)AND Decoder NOR Decoder Hierarchical Decoders Multi-stage implementation improves performance ••• WL 1 WL 0 A 0A 1 A 0A 1 A 0A 1 A 0A 1 A 2A 3 A 2A 3 A 2A 3 A 2A 3 ••• NAND decoder using 2-input pre-decoders A1 A0 A0 A1 A3 A2 A2 A3 Dynamic Decoders Precharge devices GND VDD GND WL 3 VDD WL 3 WL 2 WL 2 VDD WL 1 WL 1 V DD WL 0 WL 0 VDD f A0 A0 A1 A1 2-input NOR decoder A0 A0 A1 A1 f 2-input NAND decoder Active low inputs (all are high except for the selected WL which is low) 4-input pass-transistor based column decoder BL 0 BL 1 BL 2 BL 3 A0 S0 S1 S2 A1 S3 2-input NOR decoder D Advantages: speed (tpd does not add to overall memory access time) Only one extra transistor in signal path Disadvantage: Large transistor count 4-to-1 tree based column decoder BL BL BL BL 0 1 2 3 A0 A0 A1 A1 D Number of devices drastically reduced Delay increases quadratically with # of sections; prohibitive for large decoders Solutions: buffers progressive sizing combination of tree and pass transistor approaches Decoder for circular shiftregister V DD WL V DD V DD WL 0 R V DD f f f f R V DD V DD WL 1 f f f f R V DD 2 f f f f • • • Sense Amplifiers ×D V C tp = ---------------Iav large make D V as small as possible small Idea: Use Sense Amplifer small transition s.a. input output Differential Sense Amplifier V DD M3 M4 y M1 bit SE M2 Out bit M5 Directly applicable to SRAMs Differential Sensing ― SRAM V DD PC V DD BL BL EQ V DD y M3 WL i M1 x SE V DD M4 M2 2y 2x 2x x SE M5 SE SRAM cell i Diff. x Sense 2x Amp V DD Output y SE Output (a) SRAM sensing scheme (b) two stage differential amplifier Latch-Based Sense Amplifier (DRAM) EQ BL BL VDD SE SE Initialized in its meta-stable point with EQ Once adequate voltage gap created, sense amp enabled with SE Positive feedback quickly forces output to a stable operating point. Charge-Redistribution Amplifier V ref VL M2 M3 M1 C large VS C small Transient Response Concept Charge-Redistribution Amplifier― V EPROM DD SE Load M4 Out V casc M3 Cascode device Cout Ccol WLC Column decoder M2 BL WL M1 CBL EPROM array Single-to-Differential Conversion WL BL x Cell Diff. S.A. 2x Output How to make a good Vref? 1 2 V ref Open bitline architecture with dummy cells EQ L L1 L0 V DD R0 R1 L SE BLL CS … Dummy cell CS BLR CS … SE CS CS CS Dummy cell DRAM Read Process with Dummy Cell 3 3 2 2 BL V 1 0 0 BL V BL 1 2 1 0 3 BL 0 1 t (ns) t (ns) reading 0 reading 1 3 EQ WL 2 V 2 SE 1 0 0 1 2 t (ns) control signals 3 3 Voltage Regulator VDD Mdrive VDL VREF Equivalent Model Vbias VREF + Mdrive VDL Charge Pump - Q=Cpump (VDD-Vt) DRAM Timing SDRAM Timing A chunk of data is processed at the same time effective when data is written in large sequential blocks RDRAM Architecture Rambus DRAM to reduce the access time Bus Clocks Data bus Operates at uP clock speed Synch. DRAM k kx l memory array network mux/demux Column Row demux packet dec. demux packet dec. up to 1.6 GB/sec bandwidth Highly parallel: A large number of bits can be read/write at the same time interface ; fast and synch Address Transition Detection V DD A0 DELAY td A1 DELAY td A N2 1 DELAY td ATD … ATD • • • • • Introduction Non-volatile memories RAM Periphery Reliability Reliability and Yield Sensing Parameters in DRAM 1000 C D(1F) V smax (mv) 100 smax V , DD V ,S 10 C ,S Q , D C C S(1F) Q S(1C) V DD (V) Q S = C S V DD / 2 V smax = Q S / (C S 1 C D ) 4K 64K 1M 16M 256M 4G Memory Capacity (bits / chip) From [Itoh01] 64G Noise Sources in 1T DRam BL substrate Adjacent BL CWBL a -particles WL leakage CS electrode Ccross Open Bit-line Architecture —Cross Coupling EQ WL 1 WL 0 WL C WBL D C WBL WL D WL 1 WL 0 BL BL C BL C C C Sense Amplifier C BL C C C Folded-Bitline Architecture WL 1 WL 0 C WBL WL 0 WL D WL D CBL BL … BL WL 1 C x C C C C C Sense EQ Amplifier x CBL CWBL y y Transposed-Bitline Architecture Ccross BL 9 BL SA BL BL 99 (a) Straightforward bit-line routing Ccross BL 9 BL SA BL BL 99 (b) Transposed bit-line architecture Alpha-particles (or Neutrons) a -particle WL V DD BL n1 SiO 2 2 1 2 1 1 2 2 1 2 1 1 2 1 Particle ~ 1 Million Carriers Yield Yield curves at different stages of process maturity (from [Veendrick92]) Redundancy Row Address Redundant rows : Redundant columns Memory Array Row Decoder Column Decoder Column Address Fuse Bank Error-Correcting Codes Example: Hamming Codes e.g. B3 Wrong with 1 1 0 =3 Redundancy and Error Correction Sources of Power Dissipation in Memories V DD I DD = Σ C iΔ V if+Σ I DCP CHIP nC DE V INT f m selected C PT V INT f mi act I DCP n ROW DEC PERIPHERY m(n 1)i hld non-selected ARRAY mC DE V INT f COLUMN DEC V SS From [Itoh00] Data Retention in SRAM 1.30u 1.10u 0.13 m m CMOS Ileakage 900n 700n 500n Factor 7 (A)300n 0.18 m m CMOS 100n 0.00 .600 1.20 1.80 VDD SRAM leakage increases with technology scaling Suppressing Leakage in SRAM V DD V DD low-threshold transistor V DDL sleep V DD,int sleep V DD,int SRAM cell SRAM cell sleep SRAM cell SRAM cell SRAM cell SRAM cell V SS,int Inserting Extra Resistance Reducing the supply voltage Data Retention in DRAM From [Itoh00] Case Studies • SRAM • Flash Memory 4 Mbit SRAM Hierarchical Word-line Architecture Bit-line Circuitry Block select Bit-line load BEQ Local WL Memory cell B /T B /T CD CD CD I/O I/O line I/O Sense amplifier ATD Sense Amplifier (and Waveforms) I /O Address I /O SEQ Block select ATD ATD BS BEQ SA BS Vdd I/O Lines GND SA SEQ SEQ SEQ SEQ SEQ DATA Dei Vdd SA, SA GND DATA BS Data-cut 1 Gbit Flash Memory From [Nakamura02] Writing Flash Memory 108 106 104 102 100 0V 1V 2V 3V 4V Number of cells Vt of memory cells Evolution of thresholds Final Distribution From [Nakamura02] Read Charge pump 2kB Page buffer & cache 10.7mm 125mm2 1Gbit NAND Flash Memory 32 word lines x 1024 blocks 16896 bit lines 11.7mm From [Nakamura02] 2 125mm • • • • • • • • • 1Gbit NAND Flash Memory Technology 0.13m p-sub CMOS triple-well 1poly, 1polycide, 1W, 2Al Cell size 0.077m2 Chip size 125.2mm2 Organization 2112 x 8b x 64 page x 1k block Power supply 2.7V-3.6V Cycle time 50ns Read time 25s Program time 200s / page Erase time 2ms / block From [Nakamura02] Semiconductor Memory Trends (up to the 90’s) Memory Size as a function of time: x 4 every three years Semiconductor Memory Trends (updated) From [Itoh01] Trends in Memory Cell Area From [Itoh01] Future generations • Very specialized technologies for stand alone memories expensive • Reliability is going to be a very important issue (SER) particularly for SRAMs and DRAMs • Power is going to be the limiting factor particularly when it comes to standby currents • Embedded memories is the prominent market thrust driven by all mobile/SoC applications