ECE 300 Advanced VLSI Design, Fall 2006
Lecture 19: Memories
Yunsi Fei
[Adapted from Jan Rabaey et al.'s Digital Integrated Circuits ©2002, PSU Irwin & Vijay ©2002, and Princeton Wayne Wolf's Modern VLSI Design ©2002]
Digital Integrated Circuits, Chpt. 12

Review: Basic Building Blocks
Datapath
– Execution units
  » Adder, multiplier, divider, shifter, etc.
– Register file and pipeline registers
– Multiplexers, decoders
Control
– Finite state machines (PLA, ROM, random logic)
Interconnect
– Switches, arbiters, buses
Memory
– Caches (SRAMs), TLBs, DRAMs, buffers

A Typical Memory Hierarchy
By taking advantage of the principle of locality:
– Present the user with as much memory as is available in the cheapest technology.
– Provide access at the speed offered by the fastest technology.
[Figure: memory hierarchy. On-chip components (Control, Datapath, RegFile, ITLB, DTLB, Instr Cache, Data Cache, eDRAM), then Second Level Cache (SRAM), Main Memory (DRAM), and Secondary Memory (Disk). Moving away from the datapath: Speed (ns) .1's, 1's, 10's, 100's, 1,000's; Size (bytes) 100's, K's, 10K's, M's, T's; Cost: highest to lowest.]

Semiconductor Memories
RWM (read-write memory)
– Random access: SRAM (cache, register file), DRAM
– Non-random access: FIFO/LIFO, shift register, CAM
NVRWM (non-volatile read-write memory)
– EPROM, E2PROM, FLASH
ROM
– Mask-programmed
– Electrically-programmed (PROM)

Growth in DRAM Chip Capacity
[Figure: DRAM chip capacity (Kbit, log scale) versus year of introduction, 1980 to 2000. Labeled data points: 64, 256, 1,000, 4,000, 16,000, 64,000, and 256,000 Kbit.]

1D Memory Architecture
[Figure: two arrays of n words by m bits of storage cells, each with a shared Input/Output. Left: n words are selected directly by n select signals S0 ... Sn-1. Right: a decoder driven by address bits A0 ... Ak-1 generates the select signals S0 ... Sn-1.]
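The relationship between the n select signals and the k address bits of the 1D organization can be illustrated with a small behavioral sketch. The function names `address_bits` and `decode` are hypothetical, chosen only for this example, and the model describes behavior rather than any particular decoder circuit.

```python
def address_bits(n_words: int) -> int:
    """Number of address bits k needed to select one of n_words (k = ceil(log2 n))."""
    if n_words < 1:
        raise ValueError("n_words must be at least 1")
    return (n_words - 1).bit_length()


def decode(address: int, n_words: int) -> list[int]:
    """One-hot decode: return select signals S0..S(n-1) with exactly one line high."""
    if not 0 <= address < n_words:
        raise ValueError("address out of range")
    return [1 if i == address else 0 for i in range(n_words)]


k = address_bits(1024)      # address bits for a 1024-word memory
selects = decode(5, 8)      # select lines of an 8-word memory for address 5
print(k, selects)
```

Without the decoder, a 1024-word memory would need 1024 select inputs; with it, the same memory needs only the k address inputs.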
Decoder reduces the number of select inputs from n to k = log2 n.

2D Memory Architecture
[Figure: array of storage (RAM) cells with 2^(k-j) rows (word lines) and m·2^j columns (bit lines). Row address bits Aj ... Ak-1 drive the row decoder; column address bits A0 ... Aj-1 drive the column decoder. The bit lines pass through the sense amplifiers and read/write circuits to the Input/Output (m bits).]
– Column decoder: selects appropriate word from memory row
– Sense amplifiers: amplify bit line swing

3D Memory Architecture
[Figure: memory partitioned into blocks that share the Input/Output (m bits).]
Advantages:
1. Shorter word and/or bit lines
2. Block address activates only 1 block, saving power

Precharged MOS NOR ROM
[Figure: 4x4 precharged NOR ROM schematic. Precharge devices connect the bit lines BL(0) to BL(3) to Vdd; word lines WL(0) to WL(3) gate pull-down transistors connected to shared GND lines.]

MOS NOR ROM Layout
[Figure: layout with word lines WL(0) to WL(3) in polysilicon, shared GND lines in diffusion, and bit lines BL(0) to BL(3) in Metal1 on top of diffusion. Basic cell: 10λ x 7λ.]
Only 1 layer (contact mask) is used to program the memory array, so programming of the ROM can be delayed to one of the last process steps.

Transient Model for NOR ROM
[Figure: word line (poly) modeled as a distributed line with r_word and c_word per cell; bit line (metal1) modeled as a lumped capacitance C_bit with a precharge device.]
Word line parasitics
– Resistance/cell: 35 Ω
– Wire capacitance/cell: 0.65 fF
– Gate capacitance/cell: 5.10 fF
Bit line parasitics
– Resistance/cell: 0.15 Ω
– Wire capacitance/cell: 0.83 fF
– Drain capacitance/cell: 2.60 fF

Propagation Delay of NOR ROM
Word line delay
– Delay of a distributed rc-line containing M cells
  $t_{word} = 0.38\,(r_{word} \times c_{word})\,M^2 = 20$ nsec for M = 512
Bit line delay
– Assuming min size pull-down and 3*min size pull-up with reduced swing bit lines (5V to 2.5V)
  $C_{bit} = 1.7$ pF and $I_{avHL} = 0.36$ mA, so $t_{HL} = t_{LH} = 5.9$ nsec
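The NOR ROM delay figures can be checked numerically from the per-cell parasitics. The sketch below makes three assumptions that the slides do not state explicitly: c_word is the sum of the wire and gate capacitance per cell, C_bit is roughly M times the sum of the wire and drain capacitance per cell, and the bit line delay is measured at the midpoint of the 2.5 V swing (a 1.25 V change).

```python
# Per-cell word line parasitics quoted above
R_WORD = 35.0                      # ohms per cell
C_WORD = (0.65 + 5.10) * 1e-15     # farads per cell (assumption: wire + gate)
M = 512                            # cells along the line


def word_line_delay(r_cell: float, c_cell: float, n_cells: int) -> float:
    """Delay of a distributed rc-line: 0.38 * r * c * M^2."""
    return 0.38 * r_cell * c_cell * n_cells ** 2


def bit_line_delay(c_bit: float, delta_v: float, i_avg: float) -> float:
    """Time for an average current i_avg to move c_bit by delta_v."""
    return c_bit * delta_v / i_avg


t_word = word_line_delay(R_WORD, C_WORD, M)

# Assumption: lumped bit line capacitance is M * (wire + drain) per cell
c_bit_estimate = M * (0.83 + 2.60) * 1e-15

# Assumption: delay taken at half of the 2.5 V swing, using the quoted 1.7 pF
t_bit = bit_line_delay(1.7e-12, 1.25, 0.36e-3)

print(f"t_word = {t_word * 1e9:.1f} ns")
print(f"C_bit estimate = {c_bit_estimate * 1e12:.2f} pF")
print(f"t_bit = {t_bit * 1e9:.1f} ns")
```

Under these assumptions the results land close to the quoted 20 nsec, 1.7 pF, and 5.9 nsec, and the M² term shows why the word line dominates the access time of a large array.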
Read-Write Memories (RAMs)
Static – SRAM
– data is stored as long as supply is applied
– large cells (6 fets/cell), so fewer bits/chip
– fast, so used where speed is important (e.g., caches)
– differential outputs (output BL and !BL)
– use sense amps for performance
– compatible with CMOS technology
Dynamic – DRAM
– periodic refresh required
– small cells (1 to 3 fets/cell), so more bits/chip
– slower, so used for main memories
– single ended output (output BL only)
– need sense amps for correct operation
– not typically compatible with CMOS technology

Memory Timing Definitions
[Figure: timing diagram defining the read cycle, read access time, write cycle, write setup time, and write hold time relative to the Read, Write, and Data (valid) signals.]

4x4 SRAM Memory
[Figure: 4x4 SRAM array organized as 2 bit words. Address bits A1 and A2 select one of the word lines WL[0] to WL[3]; A0 drives the column decoder. Each column has a differential bit line pair (BL, !BL) with bit line precharge (read precharge enable), sense amplifiers, and write circuitry, all under clocking and control.]

2D Memory Configuration
[Figure: 2D memory configuration showing the sense amps.]

Decreasing Word Line Delay
– Drive the word line from both sides
  [Figure labels: driver, WL, polysilicon word line, metal word line]
– Use a metal bypass
  [Figure labels: WL, polysilicon word line, metal bypass]
– Use silicides

Decreasing Bit Line Delay (and Energy)
– Reduce the bit line voltage swing
  » need sense amp for each column to sense/restore signal
– Isolate memory cells from the bit lines after sensing (to prevent the cells from changing the bit line voltage further): pulsed word line
  » generation of word line pulses is very critical
    too short: sense amp operation may fail
    too long: power efficiency degraded (because bit line swing size depends on duration of the word line pulse)
  » use feedback signal from bit lines
– Isolate sense amps from bit lines after sensing (to prevent bit lines from having large voltage swings): bit line isolation
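The trade-off behind the word line pulse width can be sketched with a first-order model. The model is an assumption made here, not something the slides give: the cell discharges the bit line with a constant current, so the swing grows linearly with pulse duration until it saturates at Vdd, and the energy drawn from the supply to restore the bit line is taken as C_bit · Vdd · ΔV. All numeric values are hypothetical.

```python
def bit_line_swing(t_pulse: float, i_cell: float, c_bit: float, vdd: float) -> float:
    """Bit line swing after the word line has been high for t_pulse seconds.

    Assumption: constant cell current, swing clipped at the full supply.
    """
    return min(vdd, i_cell * t_pulse / c_bit)


def restore_energy(swing: float, c_bit: float, vdd: float) -> float:
    """Energy drawn from the supply to pull the bit line back up by `swing`."""
    return c_bit * vdd * swing


# Hypothetical values chosen only for illustration
I_CELL = 0.1e-3   # amperes
C_BIT = 1.0e-12   # farads
VDD = 2.5         # volts

short_swing = bit_line_swing(2.5e-9, I_CELL, C_BIT, VDD)   # short pulse
long_swing = bit_line_swing(25e-9, I_CELL, C_BIT, VDD)     # ten times longer

print(short_swing, restore_energy(short_swing, C_BIT, VDD))
print(long_swing, restore_energy(long_swing, C_BIT, VDD))
```

In this model a pulse ten times longer produces ten times the swing and ten times the restore energy, which is the sense in which an overly long pulse degrades power efficiency.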
Pulsed Word Line Feedback Signal
[Figure: Read and Complete signals, word line, regular bit lines, and dummy bit lines; the dummy column is 10% populated.]
– Dummy column: height set to 10% of a regular column and its cells are tied to a fixed value
– Capacitance is only 10% of a regular column

Pulsed Word Line Timing
[Figure: waveforms of Read, Complete, Word line, Bit line (ΔV = 0.1 Vdd), and Dummy bit line (ΔV = Vdd).]
– Dummy bit lines have reached full swing and trigger pulse shut off when regular bit lines reach 10% swing

Bit Line Isolation
[Figure: bit lines (ΔV = 0.1 Vdd) connected through isolation devices to the sense amplifier; control signals Read, isolate, and sense; sense amplifier outputs (ΔV = Vdd).]

6-transistor SRAM Cell
[Figure: 6T cell schematic. Cross-coupled inverters (M1/M2 driving !Q, M3/M4 driving Q) store the data; pass transistors M5 and M6, gated by WL, connect !Q to !BL and Q to BL.]

SRAM Cell Analysis (Read)
[Figure: read with WL = 1, !Q = 0, Q = 1, and both bit lines precharged (!BL = 1, BL = 1), each loaded by Cbit; M5 and M1 form the path from !BL to ground.]
Read-disturb (read-upset): must carefully limit the allowed voltage rise on !Q to a value that prevents the read-upset condition from occurring while simultaneously maintaining acceptable circuit speed and area constraints.
Cell Ratio: $CR = (W_{M1}/L_{M1})/(W_{M5}/L_{M5})$
$V_{!Q} = \dfrac{(V_{dd} - V_{Tn})\left(1 + CR \pm \sqrt{CR(1 + CR)}\right)}{1 + CR}$

Read Voltages Ratios
[Figure: voltage rise on !Q versus Cell Ratio (CR), for Vdd = 2.5V and VTn = 0.5V.]

SRAM Cell Analysis (Write)
[Figure: write with WL = 1, !BL = 1, BL = 0 into a cell holding !Q = 0, Q = 1; M6 pulls Q down against M4.]
Pullup Ratio: $PR = (W_{M4}/L_{M4})/(W_{M6}/L_{M6})$
$V_Q = (V_{dd} - V_{Tn}) \pm \sqrt{(V_{dd} - V_{Tn})^2 - (\mu_p/\mu_n)(PR)(V_{dd} - V_{Tn} - V_{Tp})^2}$

Write Voltages Ratios
[Figure: write voltage (VQ) versus Pullup Ratio (PR), for Vdd = 2.5V, |VTp| = 0.5V, and μp/μn = 0.5.]
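To get a feel for how the cell ratio limits the read disturb, the read expression V!Q = (Vdd − VTn)(1 + CR − sqrt(CR(1 + CR)))/(1 + CR) can be tabulated. This is a minimal sketch with one assumption: the minus root is taken as the physical one, since the plus root would place !Q above Vdd − VTn, where the pass transistor M5 no longer conducts. The function name `read_voltage_rise` is hypothetical.

```python
import math


def read_voltage_rise(cr: float, vdd: float = 2.5, vtn: float = 0.5) -> float:
    """Voltage rise on !Q during a read, as a function of the cell ratio CR.

    Assumption: the minus root of the quadratic is the physical solution.
    """
    if cr <= 0:
        raise ValueError("cell ratio must be positive")
    overdrive = vdd - vtn
    return overdrive * (1 + cr - math.sqrt(cr * (1 + cr))) / (1 + cr)


# Sweep the same CR range as the plot axis (0.3 to 2.4 in steps of 0.3)
cell_ratios = [0.3 * i for i in range(1, 9)]
rises = [read_voltage_rise(cr) for cr in cell_ratios]

for cr, v in zip(cell_ratios, rises):
    print(f"CR = {cr:.1f}  V!Q = {v:.3f} V")
```

The rise falls monotonically as CR grows, which is why a stronger pull-down relative to the pass transistor makes the cell more robust against read upset.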
Cell Sizing
Keeping cell size minimized is critical for large caches.
Minimum sized pull down fets (M1 and M3)
– Requires minimum width and longer than minimum channel length pass transistors (M5 and M6) to ensure proper CR
– But sizing of the pass transistors increases capacitive load on the word lines and limits the current discharged on the bit lines, both of which can adversely affect the speed of the read cycle
Minimum width and length pass transistors
– Boost the width of the pull downs (M1 and M3)
– Reduces the loading on the word lines and increases the storage capacitance in the cell (both are good!), but cell size may be slightly larger

6T-SRAM Layout
[Figure: cell layout showing VDD, GND, pull-ups M2 and M4, pull-downs M1 and M3, pass transistors M5 and M6, storage nodes Q and !Q, bit lines BL and !BL, and word line WL.]

Multiple Read/Write Port Cell
[Figure: 6T cell (M1 to M6) extended with a second pair of pass transistors (M7, M8), giving two ports: word line WL1 with bit lines BL1/!BL1 and word line WL2 with bit lines BL2/!BL2.]

4x4 DRAM Memory
[Figure: 4x4 DRAM array organized as 2 bit words. Address bits A1 and A2 select one of the word lines WL[0] to WL[3]; A0 drives the column decoder. Single-ended bit lines BL[0] to BL[3] with bit line precharge (read precharge enable), sense amplifiers, and write circuitry, under clocking, control, and refresh.]

3-Transistor DRAM Cell
[Figure: schematic with M1 (gated by WWL) connecting BL1 to storage node X (capacitance Cs), M2 gated by X, and M3 (gated by RWL) connecting to BL2. Waveforms: WWL during write, BL1 at Vdd, X rising to Vdd − Vt, RWL during read, BL2 starting at Vdd − Vt and dropping by ΔV.]
– No constraints on device sizes (ratioless)
– Reads are non-destructive
– Value stored at node X when writing a "1" is VWWL − VTn

3T-DRAM Layout
[Figure: layout showing BL2, BL1, GND, RWL, WWL, and transistors M1, M2, M3.]

1-Transistor DRAM Cell
[Figure: schematic with M1 (gated by WL) connecting the bit line BL (capacitance CBL) to the storage capacitor Cs at node X. Waveforms for write "1" and read "1": WL, X (GND to Vdd − Vt), and BL (Vdd on write; precharged to Vdd/2, then sensing on read).]
– Write: Cs is charged (or discharged) by asserting WL and BL
– Read: Charge redistribution occurs between CBL and Cs
– Read is destructive, so must refresh after read
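The size of the read signal in the 1T cell follows from charge conservation between Cs and CBL: the bit line moves by ΔV = (VX − Vpre) · Cs/(Cs + CBL). The sketch below evaluates that expression; the capacitance and voltage values are hypothetical and chosen only for illustration.

```python
def read_signal(v_cell: float, v_precharge: float, c_s: float, c_bl: float) -> float:
    """Bit line voltage change after charge redistribution between Cs and CBL."""
    return (v_cell - v_precharge) * c_s / (c_s + c_bl)


# Hypothetical values for illustration only
VDD = 2.5
VT = 0.5
C_S = 30e-15     # storage capacitance, farads
C_BL = 300e-15   # bit line capacitance, farads

dv_one = read_signal(VDD - VT, VDD / 2, C_S, C_BL)   # cell stores a "1" (Vdd - Vt)
dv_zero = read_signal(0.0, VDD / 2, C_S, C_BL)       # cell stores a "0"

print(f"read '1': {dv_one * 1e3:.1f} mV, read '0': {dv_zero * 1e3:.1f} mV")
```

With these numbers the bit line moves by only tens of millivolts out of a 2.5 V supply, which is why the read needs a sense amplifier and why the cell contents must be restored afterwards.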
1-T DRAM Cell
[Figure: (a) cross-section showing the metal word line, SiO2, poly gate of M1 (word line), n+ diffusions, field oxide, and poly plate, with an inversion layer induced by the plate bias; (b) layout showing the diffused bit line, polysilicon gate, polysilicon plate, and capacitor.]
– Uses polysilicon-diffusion capacitance
– Expensive in area

DRAM Cell Observations
– DRAM memory cells are single ended (complicates the design of the sense amp)
– 1T cell requires a sense amp for each bit line due to charge redistribution read
– 1T cell read is destructive; refresh must follow to restore data
– 1T cell requires an extra capacitor that must be explicitly included in the design
– A threshold voltage is lost when writing a 1 (can be circumvented by bootstrapping the word lines to a higher value than Vdd)
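The last observation can be made concrete with a first-order model of the level stored when writing a 1 through an NMOS access transistor: the storage node follows the bit line only up to one threshold below the word line voltage. This sketch assumes a fixed threshold (body effect ignored), and the voltages are hypothetical.

```python
def stored_one_level(v_bl: float, v_wl: float, vt: float) -> float:
    """Level reached on the storage node when writing a 1.

    Assumption: the NMOS access transistor cuts off once the node reaches
    v_wl - vt, and the threshold vt is constant (no body effect).
    """
    return min(v_bl, v_wl - vt)


VDD = 2.5   # hypothetical supply
VT = 0.5    # hypothetical threshold

plain = stored_one_level(VDD, VDD, VT)             # word line driven to Vdd
boosted = stored_one_level(VDD, VDD + VT, VT)      # word line bootstrapped above Vdd

print(plain, boosted)
```

Driving the word line at least one threshold above Vdd lets the full Vdd reach the storage node, recovering the charge that would otherwise be lost.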