Survey
* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
Digital Integrated Circuits A Design Perspective Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic Semiconductor Memories December 20, 2002 © Digital Integrated Circuits2nd Memories Chapter Overview Memory Classification Memory Architectures The Memory Core Periphery Reliability Case Studies © Digital Integrated Circuits2nd Memories Semiconductor Memory Classification Read-Write Memory Random Access Non-Random Access SRAM FIFO DRAM LIFO Non-Volatile Read-Write Memory Read-Only Memory EPROM Mask-Programmed E2PROM Programmable (PROM) FLASH Shift Register CAM © Digital Integrated Circuits2nd Memories Memory Timing: Definitions Read-access is from read request time to the time data is available on the output Write-access is from the write request to the final writing of the input data to the memory © Digital Integrated Circuits2nd Memories Memory Architecture: Decoders M bits S0 S0 Word 0 Word 1 S2 Word 2 SN 2 2 SN 2 1 Storage cell Word 0 A0 Word 1 A1 Word 2 A K2 1 Decoder S1 N words M bits Word N 2 2 Word N - 2 Word N 2 1 Word N - 1 Storage cell K = log2N Input-Output (M bits) Intuitive architecture for N x M memory Too many select signals: N words == N select signals © Digital Integrated Circuits2nd Input-Output (M bits) Decoder reduces the number of selected signals K = log2N Memories Array-Structured Memory Architecture Problem: ASPECT RATIO or HEIGHT >> WIDTH Amplify swing to rail-to-rail amplitude Selects appropriate word © Digital Integrated Circuits2nd Memories Hierarchical Memory Architecture Advantages: 1. Shorter wires within blocks 2. Block address activates only 1 block => power savings © Digital Integrated Circuits2nd Memories Block Diagram of 4 Mbit SRAM Clock generator Z-address buffer X – row address Y – column address Z – block address X-address buffer Predecoder and block selector Bit line load 128 K Array Block 0 Block 1 Subglobal row decoder Global row decoder Subglobal row decoder Block 30 Block 31 Transfer gate Column decoder Local row decoder Sense amplifier and write driver CS, WE buffer I/O buffer © Digital Integrated Circuits2nd x1/x4 controller Y-address buffer [Hirose90] X-address buffer Memories Contents-Addressable Memory Data (64 bits) Every row that satisfies the validity check (matches the mask bits) will activate the validity bits I/O Buffers Comparand Commands Data pattern to be matched © Digital Integrated Circuits2nd CAM Array 2 words 64 bits 9 ProrityEncoder Control Logic R/W Address (9 bits) Address Decoder Mask Priority encoder passes Only the valid rows and selects the one with the highest address in case of the multiple matches Memories Memory Timing: Approaches Multiplexed addressing Smaller number of pins needed since row and column addresses are sent sequentially Strobe signals RAS and CAS are used to read them DRAM Timing Multiplexed Addressing © Digital Integrated Circuits2nd Complete addressing Complete address is presented at once with sensing circuits to detect transitions on the address buss SRAM Timing Self-timed Memories Read-Only Memory Cells Bit line (BL) is resistively clamped to the ground, so its default value is 0 Diode disadvantage – no electrical isolation between bit and word lines BL BL is resistively clamped to VDD, so its default value is 1 BL BL VDD WL WL WL 1 BL WL BL BL WL WL 0 GND Diode ROM © Digital Integrated Circuits2nd MOS ROM 1 MOS ROM 2 Memories MOS OR ROM BL[0] BL[1] BL[2] BL[3] WL[0] V DD WL[1] WL[2] V DD WL[3] V bias Pull-down loads © Digital Integrated Circuits2nd Memories MOS NOR ROM V DD Pull-up devices WL[0] GND WL [1] WL [2] GND WL [3] BL [0] © Digital Integrated Circuits2nd BL [1] BL [2] BL [3] Memories MOS NOR ROM Layout Cell (9.5l x 7l) Programming using the Active Layer Only GND - diffusion is added to create transistors Polysilicon GND Metal1 Diffusion Metal1 on Diffusion VDD VDD © Digital Integrated Circuits2nd VDD VDD Connected to VDD through pMOS Memories MOS NOR ROM Layout Cell (11l x 7l) Programming using the Contact Layer Only GND - contacts are added to create transistors Polysilicon Metal1 GND Diffusion Metal1 on Diffusion VDD VDD © Digital Integrated Circuits2nd VDD VDD Connected to VDD through pMOS Memories MOS NAND ROM V DD Pull-up devices BL [0] BL [1] BL [2] BL [3] WL [0] WL [1] WL [2] WL [3] All word lines high by default with exception of selected row © Digital Integrated Circuits2nd Memories MOS NAND ROM Layout Cell (8l x 7l) Programming using the Metal-1 Layer Only No horizontal contact to GND necessary; drastically reduced cell size Loss in performance compared to NOR ROM Polysilicon Diffusion Metal1 on Diffusion Add metal to eliminate transistor (short circuit) © Digital Integrated Circuits2nd Memories NAND ROM Layout Cell (5l x 6l) Programming using Implants Only Polysilicon Implant switches transistor on permanently thus eliminating it in ROM this results in a very small layout area (2 times smaller than NOR cell) © Digital Integrated Circuits2nd Threshold-altering implant Metal1 on Diffusion Memories Equivalent Transient Model for MOS NOR ROM V DD Model for NOR ROM BL rword WL Cbit cword Word line parasitics Wire capacitance and gate capacitance Wire resistance (polysilicon) Bit line parasitics Resistance not dominant (metal) Drain and Gate-Drain capacitance © Digital Integrated Circuits2nd Memories Equivalent Transient Model for MOS NAND ROM V DD Model for NAND ROM BL CL r bit WL r word cbit cword Word line parasitics Similar to NOR ROM Bit line parasitics Resistance of cascaded transistors dominates Drain/Source and complete gate capacitance © Digital Integrated Circuits2nd Memories Decreasing Word Line Delay Driver WL Polysilicon word line Metal word line (a) Driving the word line from both sides Metal bypass WL K cells Polysilicon word line (b) Using a metal bypass (c) Use silicides © Digital Integrated Circuits2nd Memories Precharged MOS NOR ROM f V DD pre Precharge devices WL [0] GND WL [1] WL [2] GND WL [3] BL [0] BL [1] BL [2] BL [3] PMOS precharge device can be made as large as necessary, but clock driver becomes harder to design. © Digital Integrated Circuits2nd Memories Non-Volatile Memories The Floating-gate Avalanche injection MOS transistor (FAMOS) Floating gate Gate Source D Drain G tox tox n+ p n+_ S Substrate Device cross-section © Digital Integrated Circuits2nd Schematic symbol Memories Floating-Gate Transistor Programming 20 V 10 V S 5V 0V 20 V D Avalanche injection Hot electrons go through the oxide and are trapped reducing floating gate voltage © Digital Integrated Circuits2nd - 5V S 5V 0V D Removing programming voltage leaves charge trapped - 2.5 V S 5V D Programming results in higher V T . Typically new threshold is around 7 V so 5 V supply is not sufficient to turn transistor on, so the device is disabled Memories A “Programmable-Threshold” Transistor Shifted threshold effectively switches transistor off permanently Shift in the threshold depends on the charge injected onto the floating gate VT (QFG ) / CFG Injected charges are well insulated by silicon dioxide and can stay there with power off for many years. Floating gate used in almost all nonvolatile memories (EPROM, EEPROM, Flash) © Digital Integrated Circuits2nd Memories Erasable-Programmable Read-Only Memory (EPROM) EPROM is erased by shining ultraviolet light through a transparent package window UV makes the oxide slightly conductive by generation of electron-hole pairs The erasure process is slow (seconds to minutes) Programming takes 5-10 msec Erasing can be repeated up to a thousand times Threshold is difficult to control after many erasures so special on-chip circuitry is required to control it High current required during programming © Digital Integrated Circuits2nd Memories Floating Gate Tunneling Oxide FLOTOX Transistor EEPROM Gate Floating gate I Drain Source 20–30 nm V GD -10 V 10 V n1 n1 Substrate p 10 nm FLOTOX transistor Fowler-Nordheim tunneling I-V characteristic Smaller voltage is required to program and programming is reversible by changing the voltage sign © Digital Integrated Circuits2nd Memories EEPROM Cell BL Absolute threshold control WL VDD © Digital Integrated Circuits2nd is hard Programmed transistor might be in depletion mode hard to turn off by word-line signal 2 transistor cell Design is larger than EPROM device Thin oxide is hard to make Erase-program can be repeated 105 times Memories Flash EEPROM oFlash EPROM combines density of EPROM with versatility of EEPROM oProgramming performed by avalanche hot-electron injection (fast 1-10msec) oErasure of the complete chip is done using tunneling (careful control >100ms) oNo extra access transistor needed Control gate Floating gate Erasure 12 V on source n + source Thin tunneling oxide ~10 nm Programming 12 V on gate n + drain p- substrate © Digital Integrated Circuits2nd Memories Cross-sections of NVM cells Flash © Digital Integrated Circuits2nd EPROM Courtesy Intel Memories Basic Operations in a NOR Flash Memory― Erase All transistors erased Read back and repeat if erase is still needed © Digital Integrated Circuits2nd Memories Basic Operations in a NOR Flash Memory― Write Programming requires 12V ant the gate and 6V at the drain with 0V source © Digital Integrated Circuits2nd Memories Basic Operations in a NOR Flash Memory― Read In read operation the programmed transistor stores 1 as it is switched off permanently NOR Flash memories have Fast random read time Slow erasure and programming time Need precise control of thresholds © Digital Integrated Circuits2nd Memories NAND Flash Memory Select transistors High dielectric material – oxide-nitride-oxide for large CFG Word line(poly) Gate ONO Unit Cell Gate Oxide Source line (Diff. Layer) © Digital Integrated Circuits2nd FG Large storage density lower cost Fast programming 200-400 nsec Fast serial access Courtesy Toshiba Memories NAND Flash Memory All contacts between word line are eliminated resulting in 40% smaller cell than NOR structure Select transistor Active area STI Bit line contact © Digital Integrated Circuits2nd Word lines During erasure all transistor are depletion mode obtained by setting 20V at the source During program the selected word line is set to 20V to store “1” increasing its threshold During read both select transistors are enabled and read proceeds as in the NAND ROM Source line contact Courtesy Toshiba Memories New Nonvolatile Memories FRAM – Ferroelectric RAM Uses programmable capacitors Dielectric cristals polarize under electric field Very high density, many read/write cycles, Low power MRAM – Magnetoresistive RAM Similar to magnetic core memories Spin electronic or tunneling magnetic resistance © Digital Integrated Circuits2nd Memories Characteristics of State-of-the-art NVM © Digital Integrated Circuits2nd Memories Read-Write Memories (RAM) STATIC (SRAM) Data stored as long as supply is applied Large (6 transistors/cell) Fast Differential DYNAMIC (DRAM) Periodic refresh required Small (1-3 transistors/cell) Slower Single Ended © Digital Integrated Circuits2nd Memories 6-transistor CMOS SRAM Cell WL V DD M2 M5 Q M1 BL © Digital Integrated Circuits2nd M4 Q M6 M3 BL Memories CMOS SRAM Analysis (Read) WL V DD M4 BL Q= 0 M5 V DD Cbit M1 Q= 1 V DD BL M6 V DD Cbit Initially BL and BLbar are precharged. Assume that cell stores 1 When the WL goes high BLbar is discharged low Must keep Qbar voltage rise below the nMOS threshold (0.4V) to avoid flipping the cell © Digital Integrated Circuits2nd Memories CMOS SRAM Analysis (Read) 1.2 Voltage Rise (V) 1 Cell ratio 0.8 0.6 0.4 0.2 0 0 0.5 © Digital Integrated Circuits2nd 1 1.2 1.5 2 Cell Ratio (CR) 2.5 3 Design for cell ratio in the green zone Memories CMOS SRAM Analysis (Write) WL V DD M4 M5 Q= 1 M1 BL = 1 © Digital Integrated Circuits2nd M6 Q= 0 V DD BL = 0 Memories CMOS SRAM Analysis (Write) Must pull down cell voltage VQ below the nMOS threshold (0.4V) © Digital Integrated Circuits2nd Memories 6T-SRAM — Layout VDD M2 M4 Q Q M1 M3 GND M5 BL © Digital Integrated Circuits2nd M6 WL BL Memories Resistance-load SRAM Cell WL V DD RL M3 BL RL Q Q M1 For large resistor values use undoped polysilicon with sheet resistance in Tohm/sq M2 M4 BL An alternative solution uses low quality parasitic thin-film PMOS (TFT) with OFF current 10-13A Static power dissipation -- Want R L large Bit lines precharged to V DD to address t p problem © Digital Integrated Circuits2nd Memories Resistance-load SRAM Cell WL V DD RL M3 BL RL Q Q M1 For large resistor values use undoped polysilicon with sheet resistance in Tohm/sq M2 M4 BL An alternative solution uses low quality parasitic thin-film PMOS (TFT) with OFF current 10-13A Static power dissipation -- Want R L large Bit lines precharged to V DD to address t p problem © Digital Integrated Circuits2nd Memories Static CAM Memory Cell Incoming data on Bit and Bit_bar are compared with the stored data S and S_bar Bit Word Bit Bit Bit Bit CAM ••• Match line is ••• precharged high M4 CAM Word Word CAM ••• ••• Bit CAM M8 M9 M6 M7 S M3 Match int M5 S M2 M1 Wired-NOR Match Line If there is a match the internal signal int is grounded and match stays high Otherwise int is pulled high and match goes low © Digital Integrated Circuits2nd Memories SRAM Characteristics © Digital Integrated Circuits2nd Memories 3-Transistor DRAM Cell BL 1 BL 2 WWL WWL RWL M3 X M1 CS M2 RWL V DD 2 V T X BL 1 BL 2 V DD DV V DD 2 V T No constraints on device ratios Reads are non-destructive Value stored at node X when writing a “1” = V WWL-VTn © Digital Integrated Circuits2nd Memories 3T-DRAM — Layout BL2 BL1 GND RWL M3 M2 WWL M1 © Digital Integrated Circuits2nd Memories 1-Transistor DRAM Cell Storage capacitance Bit-line is precharged Write: C S is charged or discharged by asserting WL and BL. Read: Charge redistribution takes places between bit line and storage capacitance CS V = VBL – V PRE = (VBIT – V PRE )-----------C S + CBL Voltage swing is small; typically around 250 mV. © Digital Integrated Circuits2nd Memories DRAM Cell Observations 1T DRAM requires a sense amplifier for each bit line, due to charge redistribution read-out. DRAM memory cells are single ended in contrast to SRAM cells. The read-out of the 1T DRAM cell is destructive; read and refresh operations are necessary for correct operation. Unlike 3T cell, 1T cell requires presence of an extra capacitance that must be explicitly included in the design. When writing a “1” into a DRAM cell, a threshold voltage is lost. This charge loss can be circumvented by bootstrapping the word lines to a higher value than VDD © Digital Integrated Circuits2nd Memories Sense Amp Operation V BL V(1) V PRE D V(1) V(0) Sense amp activated Word line activated © Digital Integrated Circuits2nd t Memories 1-T DRAM Cell Capacitor M 1 word line Metal word line SiO2 Poly n+ Field Oxide n+ Poly Inversion layer induced by plate bias Cross-section Diffused bit line Polysilicon gate Polysilicon plate Layout Uses polysilicon-diffusion capacitance Expensive in area © Digital Integrated Circuits2nd Memories Poly-diffusion capacitor 1T-DRAM © Digital Integrated Circuits2nd Memories Advanced 1T DRAM Cells Word line Insulating Layer Cell plate Capacitor dielectric layer Cell Plate Si Capacitor Insulator Refilling Poly Transfer gate Isolation Storage electrode Storage Node Poly Si Substrate 2nd Field Oxide Trench Cell © Digital Integrated Circuits2nd Stacked-capacitor Cell Memories Advanced 1T DRAM Cells Word line Insulating Layer Cell plate Capacitor dielectric layer Cell Plate Si Capacitor Insulator Refilling Poly Transfer gate Isolation Storage electrode Storage Node Poly Si Substrate 2nd Field Oxide Trench Cell © Digital Integrated Circuits2nd Stacked-capacitor Cell Memories CAM in Cache Memory CAM SRAM ARRAY ARRAY Hit Logic Input Drivers Sense Amps / Input Drivers Address Decoder Address Tag Hit R/W Data Cash memory is used to store frequently accessed data to lower the memory access time and power. In cash CAM stores addresses and SRAM stores data. Once the address of requested data matches the one in CAM Hit signal goes high and data is read from SRAM otherwise the external slow memory must be read © Digital Integrated Circuits2nd Memories Periphery Decoders Sense Amplifiers Input/Output Buffers Control / Timing Circuitry © Digital Integrated Circuits2nd Memories Row Decoders Collection of 2M complex logic gates Organized in regular and dense fashion (N)AND Decoder - followed by inverter WL0 A0 A1 A2 A3 A4 A5 A6 A7 WL127 A0 A1 A2 A3 A4 A5 A6 A7 set by bits 00000000 set by bits 11111111 NOR Decoder WL0 A0 A1 A2 A3 A4 A5 A6 A7 WL127 A0 A1 A2 A3 A4 A5 A6 A7 © Digital Integrated Circuits2nd Memories Hierarchical Decoders Multi-stage implementation improves performance ••• WL 1 WL 0 A 0A 1 A 0A 1 A 0A 1 A 0A 1 A 2A 3 A 2A 3 A 2A 3 A 2A 3 ••• A1 A0 A0 A1 A3 A2 NAND decoder using 2-input pre-decoders © Digital Integrated Circuits2nd A2 A3 WL0 A0 A1 A2 A3 A4 A5 A6 A7 ( A0 A1 )( A2 A3 )( A4 A5 )( A6 A7 ) Memories Dynamic Decoders Precharge devices GND VDD GND WL 3 VDD WL 3 WL 2 WL 2 VDD WL 1 WL 1 V DD WL 0 WL 0 VDD f A0 A0 A1 A1 2-input NOR decoder Identical to NOR ROM © Digital Integrated Circuits2nd A0 A0 A1 A1 f 2-input NAND decoder Identical to NAND ROM Memories 4-input pass-transistor based Column Decoder BL BL BL BL 0 A0 1 2 3 S0 S1 S2 A1 S3 2-input NOR decoder D Advantages: speed (tpd does not add to overall memory access time) Only one extra transistor in signal path Disadvantage: Large transistor count © Digital Integrated Circuits2nd Memories 4-to-1 tree based Column Decoder BL 0 BL 1 BL 2 BL 3 A0 A0 A1 A1 D Number of devices drastically reduced Delay increases quadratically with # of sections; prohibitive for large decoders Solutions: buffers progressive sizing combination of tree and pass transistor approaches © Digital Integrated Circuits2nd Memories Decoder for circular shift-register V DD WL V DD V DD WL 0 R f f f f R V DD V DD WL 1 f f f f R V DD 2 f f f f • • • V DD In serial access memories read or write address changes sequentially Only one WLi is active (a pointer) and shifts by one with clock f R is used for reset © Digital Integrated Circuits2nd Memories Sense Amplifiers V C tp = ---------------Iav large make V as small as possible small Idea: Use Sense Amplifer small transition s.a. input © Digital Integrated Circuits2nd output Memories Differential Sense Amplifier V DD M3 M4 y M1 bit SE M2 M5 Out bit Directly applicable to SRAMs. Gain Asense=-gm (ro2||ro4) gm is transconductance of the input transistors © Digital Integrated Circuits2nd Memories Differential Sensing ― SRAM V DD PC BL Precharge bit lines by pulling PC_bar low Disable precharge to read Set SE to activate the sense amplifier BL EQ WL V DD V DD V DD y M3 i M1 x SE M4 M2 yx- SE M5 SE x- x Inputs from memory SRAM cell i Diff. x Sense xAmp V DD Output y SE Output (a) SRAM sensing scheme © Digital Integrated Circuits2nd (b) two stage differential amplifier Memories Latch-Based Sense Amplifier (DRAM) EQ BL BL VDD SE SE Initialized in its meta-stable point with EQ Once adequate voltage gap is created, sense amp enabled with SE Positive feedback quickly forces output to a stable operating point. © Digital Integrated Circuits2nd Memories © Digital Integrated Circuits2nd Memories Charge-Redistribution Amplifier V ref BL Vin M2 VL M3 M1 VS C large C small Transient Response Concept Vs prechrged to VDD and VL to Vref-Vth M1 is cut off When M2 pulls down M1 conducts and Vs is quickly lowered to equalize VL © Digital Integrated Circuits2nd Memories Charge-Redistribution Amplifier― V EPROM DD SE Load M4 Out V casc M3 Cascode device Cout Ccol WLC Column decoder M2 BL WL © Digital Integrated Circuits2nd M1 CBL EPROM array Memories Single-to-Differential Conversion WL BL x- x Cell Diff. S.A. + - V ref Output How to make a good Vref for differential amplifier? Must deliver x_bar smaller than x if “1” is stored and larger than x if “0” is stored © Digital Integrated Circuits2nd Memories Open bitline architecture with dummy cells EQ L L1 L0 V DD R0 R1 L SE BLL CS … CS BLR CS Dummy cell … SE CS CS CS Dummy cell Bit line divided into left and right halves to reduce capacitance Dummy cells are added for reference When EQ signal is raised BLL and BLR are precharged to VDD/2 and L and L_bar are enabled charging Dummy cell to VDD/2 When reading Li word activate both Li and L When reading Ri word activate both Ri and L_bar © Digital Integrated Circuits2nd Memories DRAM Read Process with Dummy Cell 3 3 2 2 BL V 1 0 0 BL V BL 1 2 1 0 3 BL 0 1 t (ns) 2 3 t (ns) reading 0 reading 1 3 EQ WL 2 V SE 1 0 0 1 2 3 t (ns) control signals © Digital Integrated Circuits2nd Memories Voltage Down Converter VDD IR IR IL- IR VREF Mdrive VDL IL Memories require different level of voltages Boosted word-line voltage VDD+Vtn Precharge half VDD voltage Reduced internal VDD power supply Negative substrate bias voltage Equivalent Model Vbias VREF This circuit delivers the reference voltage to VDL When VDL<VREF => PMOS drive gate is discharged increasing VDL When VDL>VREF=> PMOS drive gate is charged increasing VDL © Digital Integrated Circuits2nd + Mdrive VDL Memories Charge Pump - Transistors M1 and M2 are connected as diodes. The charges stored at capacitor Q=Cpump(VDD-VT) When B rises above Vload by more than threshold the Vload increases Maximum load voltage Vload rises to 2*(VDD-VT) – higher than VDD © Digital Integrated Circuits2nd Memories DRAM Timing Total of 24 timing constraints must be observed © Digital Integrated Circuits2nd Memories RDRAM Architecture Very high speed packet transfer protocol is used to transfer large amount of data Narrow bus uses several clock cycles to transfer the data Bus Clocks Data bus k Column Row © Digital Integrated Circuits2nd k* l memory networkarray mux/demux demux packet dec. demux packet dec. Memories Address Transition Detection SRAM memories are triggered by events detected by ATD circuits V DD A0 DELAY td A1 DELAY td A N2 1 DELAY td © Digital Integrated Circuits2nd ATD ATD … Memories Reliability and Yield © Digital Integrated Circuits2nd Memories Trends in DRAM Parameters 1000 bit line capacitance voltage swing storage charge storage capacitance power supply voltage C D (fF) V smax (mV) 100 C S(fF) Q S(fC) are reduced in new technologies 10 V DD (V) Q S = C S V DD / 2 V smax = Q S / (C S + C D ) 4K 64K 1M 16M 256M 4G Memory Capacity (bits © Digital Integrated Circuits2nd From [Itoh01] 64G / chip) Memories Open Bit-line Architecture —Cross Coupling Word line selected creates charge redistribution Dummy line selected to compensate charge redistribution EQ WL 1 WL 0 WL C WBL D C WBL WL D WL 1 WL 0 BL BL C BL C C Charge redistribution C Sense Amplifier C BL C C C VW LCW LB CW LB C BL If both sides of memory array were symmetrical then the injected bit line noise would be compensated as a common mode signal for sense amplifiers © Digital Integrated Circuits2nd Memories Folded-Bitline Architecture WL 1 WL 0 C WBL WL 0 WL D WL D C BL BL … BL WL 1 C y x C C C C C Sense EQ Amplifier x C BL y C WBL Better matching of BL BL_bar capacitances, so the cross-coupling noise can be suppressed © Digital Integrated Circuits2nd Memories Transposed-Bitline Architecture C cross BL ‘ BL SA BL BL ‘’ (a) Straightforward bit-line routing C cross BL ‘ BL SA BL BL ‘’ (b) Transposed bit-line architecture equalizes cross coupling noise in BL and BL_bar © Digital Integrated Circuits2nd Memories Sources of Power Dissipation in Memories V DD I DD = S C i V if+S I DCP CHIP nC DE V INT f m selected C PT V INT f mi act I DCP n ROW DEC PERIPHERY iact active selector current ihld data retention current CDE decoder capacitance CPT peripheral capacitance IDCP static peripheral current VINT internal supply voltage m(n - 1)i hld non-selected ARRAY mC DE V INT f COLUMN DEC V SS © Digital Integrated Circuits2nd From [Itoh00] Memories Noise Sources in 1T DRam BL CWBL substrate Adjacent BL a-particles Capacitance Cs must be above 30fF otherwise a single a-particle can destroy its charge WL leakage CS electrode Ccross © Digital Integrated Circuits2nd Free neutrons from cosmic rays carry 10 time more charges than alpha particles Memories covered with polymide to protect against alpha radiation Error correction codes used to correct most failures Memories Alpha-particles a-particle WL V DD BL n+ SiO 2 - + - + + Alpha particle have energy 8-9 MeV And penetrates silicon up to 10mm depth - - + + + - 1 Particle generates ~ 1 Million electron-hole pairs This is comparable with the 50 fF capacitance storage at 3.5V © Digital Integrated Circuits2nd Memories Memory Yield Yield curves at different stages of process maturity (from [Veendrick92]) Memory yield problem is fought using redundancy and error correction © Digital Integrated Circuits2nd Memories Redundancy Row Address Redundant rows : Fuse Bank Redundant columns Memory Array Row Decoder Column Decoder © Digital Integrated Circuits2nd Column Address Memories Error-Correcting Codes Example: Hamming Codes e.g. B3 Wrong with 1 1 =3 0 © Digital Integrated Circuits2nd Memories Redundancy and Error Correction © Digital Integrated Circuits2nd Memories Redundancy and Error Correction © Digital Integrated Circuits2nd Memories Data Retention in SRAM SRAM leakage increases with technology scaling yet for 64-Gb memory leakage current should be less than 3.5 aA at 25C Reduce leakage by turning off unused memory blocks (cashes) Increase thresholds by using body biasing Increase resistance in the leakage path Lower the supply voltage Lower the junction temperature Scale down the refresh period 1.30u 0.13 m m CMOS 1.10u Ileakage 900n 700n (A) 500n Factor 7 300n 0.18 m m CMOS 100n 0.00 .600 1.20 1.80 VDD © Digital Integrated Circuits2nd Memories Suppressing Leakage in SRAM V DD V DD low-threshold transistor V DDL sleep V DD,int sleep V DD,int SRAM cell SRAM cell sleep SRAM cell SRAM cell SRAM cell V SS,int Inserting Extra Resistance © Digital Integrated Circuits2nd SRAM cell Reducing the supply voltage Memories Data Retention in DRAM Data retention (standby) current Active current Estimated current distribution for DRAM generations © Digital Integrated Circuits2nd From [Itoh00] Memories Case Studies Programmable Logic Array SRAM Flash Memory © Digital Integrated Circuits2nd Memories PLA versus ROM Programmable Logic Array structured approach to random logic “two level logic implementation” NOR-NOR (product of sums) NAND-NAND (sum of products) IDENTICAL TO ROM! Main difference ROM: fully populated PLA: one element per minterm Note: Importance of PLA’s has drastically reduced 1. slow 2. better software techniques (mutli-level logic synthesis) for random logic design © Digital Integrated Circuits2nd Memories Programmable Logic Array Pseudo-NMOS PLA GND GND GND V DD GND GND GND GND V DD X0 X0 X1 AND-plane © Digital Integrated Circuits2nd X1 X2 X2 f0 f1 OR-plane Memories Dynamic PLA f AND V DD GND f OR f OR f AND V DD X0 X0 X1 X1 AND-plane © Digital Integrated Circuits2nd X2 X2 f0 f 1 GND OR-plane Memories Clock Signal Generation for self-timed dynamic PLA Self-timing is recommended in PLA for maximum performance Dummy AND row in PLA estimates maximum loading condition for the required precharge time Worst case discharge time is estimated in a similar way f f f f AND Dummy AND row f OR AND tpre teval f Dummy AND row f AND OR (a) Clock signals © Digital Integrated Circuits2nd (b) Timing generation circuitry Memories PLA Layout And-Plane VDD x0 x0 x1 x1 x2 x2 Pull-up devices © Digital Integrated Circuits2nd Or-Plane f GND f0 f1 Pull-up devices Memories 4 Mbit SRAM Hierarchical Word-line Architecture To improve performance and reduce power consumption in large memories the word line is hierarchically divided and sections are activated as needed © Digital Integrated Circuits2nd Memories Bit-line Circuitry Block select Bit-line load ATD BEQ Local WL Memory cell B /T B /T CD CD CD I/O I/O line I/O Sense amplifier © Digital Integrated Circuits2nd Memories Sense Amplifier (and Waveforms) Address I /O I /O ATD SEQ Block select ATD BS SA BS BEQ Vdd I/O Lines GND SA SEQ SEQ SEQ SEQ SEQ Vdd DATA Dei SA, SA GND DATA BS Data-cut © Digital Integrated Circuits2nd Memories 1 Gbit Flash Memory © Digital Integrated Circuits2nd From [Nakamura02] Memories Number of cells Writing Flash Memory 108 106 104 Read level (4.5 V) 102 100 0V 1V 2V 3V 4V Vt of memory cells Evolution of thresholds Final Distribution During erasure all bits are programmed to become depletion devices It takes four cycles of write/erase to establish all device threshold > 0.8 V © Digital Integrated Circuits2nd From [Nakamura02] Memories 1Gbit NAND Flash Memory Charge pump 2kB Page buffer & cache 10.7mm 2 125mm 32 word lines x 1024 blocks 16896 bit lines 11.7mm © Digital Integrated Circuits2nd From [Nakamura02] Memories 125mm2 1Gbit NAND Flash Memory Technology 0.13mm p-sub CMOS triple-well 1poly, 1polycide, 1W, 2Al Cell size 0.077mm2 Chip size 125.2mm2 Organization 2112 x 8b x 64 page x 1k block Power supply 2.7V-3.6V Cycle time 50ns Read time 25ms Program time 200ms / page Erase time 2ms / block © Digital Integrated Circuits2nd From [Nakamura02] Memories Semiconductor Memory Trends (up to the 90’s) Memory Size as a function of time: x 4 every three years © Digital Integrated Circuits2nd Memories Semiconductor Memory Trends (updated) © Digital Integrated Circuits2nd From [Itoh01] Memories Semiconductor Memory Trends There was an apparent shift in memory market due to the increase of flash memory use for video and other personalized storage devices other than personal computers SOC technology implements systems integrated with memories and processors on a single die Challenges of making even bigger and denser memories with lower market incentives responsible for the slow down in the memory chip capacities © Digital Integrated Circuits2nd Memories Trends in Memory Cell Area © Digital Integrated Circuits2nd From [Itoh01] Memories Semiconductor Memory Trends Technology feature size for different SRAM generations © Digital Integrated Circuits2nd Memories