Memory Design
• Memory Types
• Memory Organization
• ROM design
• RAM design
• PLA design
Adapted from J. M. Rabaey, A. Chandrakasan and B. Nikolic, Digital Integrated Circuits, 2nd ed. Copyright 2003 Prentice Hall/Pearson.
ECE 261, James Morizio

Semiconductor Memory Classification
• Read-Write Memory
  – Random Access: SRAM, DRAM
  – Non-Random Access: FIFO, LIFO, Shift Register, CAM
• Non-Volatile Read-Write Memory: EPROM, E2PROM, FLASH
• Read-Only Memory: Mask-Programmed, Programmable (PROM)

Memory Timing: Definitions
[Timing diagram: READ, WRITE, and DATA waveforms marking the read cycle, read access, write cycle, write access, data valid, and data written.]

Memory Architecture: Decoders
[Figure: two organizations of N words of M bits each, with M-bit input-output. Left: one select signal per word, S0 to SN-1. Right: a decoder with address inputs A0 to AK-1 drives the word selects.]
• Intuitive architecture for N x M memory: N words == N select signals, which is too many select signals
• Decoder reduces the number of select signals: $K = \log_2 N$

Array-Structured Memory Architecture
Problem: ASPECT RATIO or HEIGHT >> WIDTH
[Figure: storage-cell array with $2^{L-K}$ rows and $M \cdot 2^K$ columns. The row decoder takes address bits AK to AL-1 and drives the word lines. The bit lines feed the sense amplifiers / drivers, followed by the column decoder (address bits A0 to AK-1) and the M-bit input-output.]
• Sense amplifiers / drivers: amplify swing to rail-to-rail amplitude
• Column decoder: selects appropriate word

Hierarchical Memory Architecture
[Figure: Block 0, Block i, ..., Block P-1 sharing the row address, column address, and block address, with a global data bus, control circuitry, block selector, and global amplifier/driver to I/O.]
Advantages:
1. Shorter wires within blocks
2. Block address activates only 1 block => power savings

Read-Only Memory Cells
[Figure: cells storing a 1 and a 0 for three ROM styles (Diode ROM, MOS ROM 1, MOS ROM 2), drawn with word lines WL, bit lines BL, VDD, and GND.]

MOS OR ROM
[Figure: word lines WL[0] to WL[3] crossing bit lines BL[0] to BL[3], with VDD rails and pull-down loads biased by Vbias.]

ROM Example
• 4-word x 6-bit ROM
  – Represented with dot diagram
  – Dots indicate 1's in ROM
Word 0: 010101
Word 1: 011001
Word 2: 100101
Word 3: 101010
[Figure: A1 and A0 feed a 2:4 DEC that drives the ROM array; outputs Y5 to Y0 have weak pseudo-nMOS pullups.]
Looks like 6 4-input pseudo-nMOS NORs

MOS NOR ROM
[Figure: VDD pull-up devices on bit lines BL[0] to BL[3], word lines WL[0] to WL[3], and GND rails.]

MOS NOR ROM Layout
Cell (9.5λ x 7λ)
Programming using the Active Layer Only
[Layout legend: Polysilicon, Metal1, Diffusion, Metal1 on Diffusion]

MOS NOR ROM Layout
Cell (11λ x 7λ)
Programming using the Contact Layer Only
[Layout legend: Polysilicon, Metal1, Diffusion, Metal1 on Diffusion]

MOS NAND ROM
[Figure: VDD pull-up devices on bit lines BL[0] to BL[3], with word lines WL[0] to WL[3].]
All word lines high by default with exception of selected row

MOS NAND ROM Layout
Cell (8λ x 7λ)
Programming using the Metal-1 Layer Only
• No contact to VDD or GND necessary; drastically reduced cell size
• Loss in performance compared to NOR ROM
[Layout legend: Polysilicon, Diffusion, Metal1 on Diffusion]

NAND ROM Layout
Cell (5λ x 6λ)
Programming using Implants Only
[Layout legend: Polysilicon, Threshold-altering implant, Metal1 on Diffusion]

Decreasing Word Line Delay
(a) Driving the word line from both sides
[Figure: drivers at both ends of a polysilicon word line WL, with a metal word line.]
(b) Using a metal bypass
[Figure: metal bypass strapped to the polysilicon word line WL every K cells.]

Precharged MOS NOR ROM
[Figure: precharge devices connected to VDD and clocked by φpre, word lines WL[0] to WL[3], GND rails, and bit lines BL[0] to BL[3].]
PMOS precharge device can be made as large as necessary, but clock driver becomes harder to design.

Read-Write Memories (RAM)
• STATIC (SRAM)
  – Data stored as long as supply is applied
  – Large (6 transistors/cell)
  – Fast
  – Differential
• DYNAMIC (DRAM)
  – Periodic refresh required
  – Small (1-3 transistors/cell)
  – Slower
  – Single Ended

6-transistor CMOS SRAM Cell
[Figure: schematic with word line WL, VDD, transistors M1 to M6, storage node Q and its complement, and bit line BL and its complement.]

6T-SRAM Layout
[Figure: layout showing VDD, GND, WL, the two bit lines, storage nodes, and transistors M1 to M6.]

[Image: Statue of Goethe and Schiller: the German National Theater, Weimar]

3-Transistor DRAM Cell
[Figure: schematic with bit lines BL1 and BL2, write word line WWL, read word line RWL, transistors M1 to M3, storage node X, and capacitance CS; waveforms for WWL, RWL, X, BL1, and BL2.]
• No constraints on device ratios
• Reads are non-destructive
• Value stored at node X when writing a "1" = $V_{WWL} - V_{Tn}$

3T-DRAM Layout
[Figure: layout showing BL2, BL1, GND, RWL, WWL, and transistors M1 to M3.]

1-Transistor DRAM Cell
[Figure: schematic with bit line BL, word line WL, transistor M1, storage capacitance CS, and bit-line capacitance CBL; waveforms for "Write 1" and "Read 1" showing WL, node X (between GND and VDD - VT), and BL (VDD, VDD/2, sensing).]
• Write: CS is charged or discharged by asserting WL and BL.
• Read: charge redistribution takes place between bit line and storage capacitance:
  $\Delta V = V_{BL} - V_{PRE} = (V_{BIT} - V_{PRE}) \cdot \frac{C_S}{C_S + C_{BL}}$
• Voltage swing is small; typically around 250 mV.

DRAM Cell Observations
• 1T DRAM requires a sense amplifier for each bit line, due to charge redistribution read-out.
• DRAM memory cells are single-ended in contrast to SRAM cells.
• The read-out of the 1T DRAM cell is destructive; read and refresh operations are necessary for correct operation.
• Unlike 3T cell, 1T cell requires presence of an extra capacitance that must be explicitly included in the design.
• When writing a "1" into a DRAM cell, a threshold voltage is lost. This charge loss can be circumvented by bootstrapping the word lines to a higher value than VDD.

1-T DRAM Cell
[Figure: cross-section and layout of the cell capacitor, labeled with metal word line, M1 word line, SiO2, poly, n+ regions, field oxide, inversion layer induced by plate bias, diffused bit line, polysilicon gate, and polysilicon plate.]
• Uses Polysilicon-Diffusion Capacitance
• Expensive in Area (trend now is to use trench capacitors)

Periphery
• Decoders
• Sense Amplifiers
• Input/Output Buffers
• Control / Timing Circuitry

Row Decoders
• Collection of $2^M$ complex logic gates
• Organized in regular and dense fashion
• (N)AND Decoder
• NOR Decoder

Hierarchical Decoders
Multi-stage implementation improves performance
[Figure: NAND decoder using 2-input pre-decoders. Address inputs A0 to A3 and their complements are pre-decoded in pairs (A0 with A1, A2 with A3), and the pre-decoded signals are combined to drive word lines WL0, WL1, and so on.]

Dynamic Decoders
[Figure: 2-input NOR decoder and 2-input NAND decoder, each with precharge devices clocked by φ, address inputs A0 and A1 with their complements, VDD and GND rails, and word lines WL0 to WL3.]

4-to-1 tree based column decoder
[Figure: bit lines BL0 to BL3 routed through a tree of pass transistors controlled by A0, A1, and their complements to the data line D.]
• Number of devices drastically reduced
• Delay increases quadratically with # of sections; prohibitive for large decoders
• Solutions:
  – buffers
  – progressive sizing
  – combination of tree and pass transistor approaches

PLA versus ROM
• Programmable Logic Array: structured approach to random logic
  – "two level logic implementation"
  – NOR-NOR (product of sums)
  – NAND-NAND (sum of products)
• SIMILAR TO ROM. Main difference:
  – ROM: fully populated
  – PLA: one element per minterm
• Note: the importance of PLAs has drastically reduced
  1. slow
  2. better software techniques (multi-level logic synthesis)
But…

Programmable Logic Array
Pseudo-NMOS PLA
[Figure: AND-plane with inputs X0, X1, X2 and their complements, and OR-plane with outputs f0 and f1, with VDD and GND rails.]

Dynamic PLA
[Figure: AND-plane (inputs X0, X1, X2 and their complements) clocked by φAND and OR-plane (outputs f0, f1) clocked by φOR, with VDD and GND rails.]

PLA Layout
[Figure: layout with the And-Plane (inputs x0, x1, x2 and their complements) and the Or-Plane (outputs f0, f1), pull-up devices on each plane, VDD, GND, and clock φ.]

CAMs
• Extension of ordinary memory (e.g. SRAM)
  – Read and write memory as usual
  – Also match to see which words contain a key
[Figure: CAM block with adr, data/key, read, and write inputs and a match output.]

10T CAM Cell
• Add four match transistors to 6T SRAM
  – 56 x 43 λ unit cell
[Figure: cell schematic with bit, bit_b, word, cell, cell_b, and match.]

CAM Cell Operation
• Read and write like ordinary SRAM
• For matching:
  – Leave wordline low
  – Precharge matchlines
  – Place key on bitlines
  – Matchlines evaluate
• Miss line
  – Pseudo-nMOS NOR of match lines
  – Goes high if no words match
[Figure: CAM array with address input, row decoder, CAM cells, match lines match0 to match3, clk, weak pull-up on the miss line, read/write control, column circuitry, and data.]
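
The matching steps and the miss line on the CAM Cell Operation slide can be sketched as a small behavioral model. This is a minimal sketch, not a circuit-level description. It assumes the common arrangement in which a match line is precharged high and any cell whose stored bit differs from the key bit pulls it low. The names WIDTH, match_line, and cam_search, and the stored words, are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

WIDTH = 4  # hypothetical word width in bits


def match_line(stored: int, key: int, width: int = WIDTH) -> bool:
    """Return the final level of one match line (True = still high = match)."""
    line = True  # the match line is precharged high before the key is applied
    for bit in range(width):
        if ((stored >> bit) & 1) != ((key >> bit) & 1):
            line = False  # assumed: a mismatching cell pulls the line low
    return line


def cam_search(words: List[int], key: int) -> Tuple[List[bool], bool]:
    """Evaluate every match line, then form the miss signal as their NOR."""
    matches = [match_line(word, key) for word in words]
    miss = not any(matches)  # NOR: high only when no word matches
    return matches, miss


words = [0b1010, 0b0110, 0b1010, 0b0001]  # made-up contents of a 4-word CAM
print(cam_search(words, 0b1010))  # key stored in words 0 and 2
print(cam_search(words, 0b1111))  # key not stored anywhere
```

With these made-up contents, the first search leaves two match lines high and the miss signal low; the second leaves every match line low, so the NOR drives the miss signal high.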