CSE477 VLSI Digital Circuits, Fall 2003
Lecture 25: Peripheral Memory Circuits
Mary Jane Irwin (www.cse.psu.edu/~mji)
www.cse.psu.edu/~cg477
[Adapted from Rabaey's Digital Integrated Circuits, Second Edition, ©2003 J. Rabaey, A. Chandrakasan, B. Nikolic]
CSE477 L25 Memory Peripheral.1 Irwin&Vijay, PSU, 2003

Review: Read-Write Memories (RAMs)
- Static: SRAM
  - data is stored as long as supply is applied
  - large cells (6 FETs/cell), so fewer bits/chip
  - fast, so used where speed is important (e.g., caches)
  - differential outputs (output BL and !BL)
  - use sense amps for performance
  - compatible with CMOS technology
- Dynamic: DRAM
  - periodic refresh required (every 1 to 4 ms) to compensate for the charge loss caused by leakage
  - small cells (1 to 3 FETs/cell), so more bits/chip
  - slower, so used for main memories
  - single-ended output (output BL only)
  - need sense amps for correct operation
  - not typically compatible with CMOS technology

Review: 2D Memory Bank
[Figure: 2D memory bank. Labels: address bits AN-1 … Ai; address bits Ai-1 … A0 into the column decoders; precharge circuits; read precharge; write circuitry; sense amps; data]

Peripheral Memory Circuitry
- Row and column decoders
- Read bit line precharge logic
- Sense amplifiers
- Read/write circuitry
- Timing and control
- Design concerns: speed, power consumption, area (pitch matching)

Row Decoders
- Collection of 2^M complex logic gates organized in a regular, dense fashion
- (N)AND decoder for 8 address bits
  WL(0) = !A7 & !A6 & !A5 & !A4 & !A3 & !A2 & !A1 & !A0
  …
  WL(255) = A7 & A6 & A5 & A4 & A3 & A2 & A1 & A0
- NOR decoder for 8 address bits
  WL(0) = !(A7 | A6 | A5 | A4 | A3 | A2 | A1 | A0)
  …
  WL(255) = !(!A7 | !A6 | !A5 | !A4 | !A3 | !A2 | !A1 | !A0)
- Goals: pitch matched, fast, low power

Implementing a Wide NOR Function
- Single-stage 8x256 bit decoder (as in Lecture 22)
  - one 8-input NOR gate per row x 256 rows = 256 x (8+8) = 4,096 transistors
  - pitch matching and speed/power issues
- Decompose the logic into multiple levels:
  !WL(0) = !(!(A7 | A6) & !(A5 | A4) & !(A3 | A2) & !(A1 | A0))
  - the first level is the predecoder (for each pair of address bits, form the NOR of Ai|Ai-1, Ai|!Ai-1, !Ai|Ai-1, and !Ai|!Ai-1)
  - the second level is the word line driver
- Predecoders reduce the number of transistors required
  - four sets of four 2-input NOR predecoders = 4 x 4 x (2+2) = 64
  - 256 word line drivers, each a 4-input NAND = 256 x (4+4) = 2,048
  - 4,096 vs. 2,112 (64 + 2,048): almost a 50% savings
- The number of inputs to the gates driving the WLs is halved; since gate delay grows roughly quadratically with fan-in, the propagation delay is reduced by a factor of ~4

Split Row Two-Level 8x256 Decoder
[Figure: Address<7:0> is predecoded in three groups: A0-A2 into 8 lines (!(!A0 & !A1 & !A2) … !(A0 & A1 & A2)), A3-A5 into 8 lines, and A6-A7 into 4 lines. Two sets of 256 pitch-matched, buffered word line drivers generate WL0 … WL255, e.g., WL0 = !(!(!A0&!A1&!A2) | !(!A3&!A4&!A5) | !(!A6&!A7))]

Pass Transistor Based Column Decoder
[Figure: A1 and A0 drive a 2-input NOR decoder whose outputs S3 … S0 gate pass transistors connecting BL3/!BL3 … BL0/!BL0 to data_out/!data_out]
- Read: connect the BLs to the sense amps (SA)
- Write: drive one of the BLs low to write a 0 into the cell
- Fast, since there is only one transistor in the signal path
- However, there is a large transistor count: (K+1)2^K + 2 x 2^K
  - for K = 2: 3 x 2^2 (decoder) + 2 x 2^2 (PTs) = 12 + 8 = 20

Tree Based Column Decoder
[Figure: binary tree of pass transistors, with levels controlled by A0/!A0 and A1/!A1, routing one of BL3/!BL3 … BL0/!BL0 to data_out/!data_out]
- Number of transistors reduced to 2 x 2 x (2^K - 1)
  - for K = 2: 2 x 2 x (2^2 - 1) = 4 x 3 = 12
- Delay increases quadratically with the number of sections (K), so it is prohibitive for large decoders
  - can fix with buffers, progressive sizing, or a combination of the tree and pass transistor approaches

Decoder Complexity Comparisons
Consider a memory with a 10b address and 8b data (T = transistors, PT = pass transistor):

Conf.  Data/Row            Row decoder          Single stage  Two stage  Column decoder      PT     Tree
1D     8b                  10b = 10x2^10        20,480 T      10,320 T   none                -      -
2D     32b (32x256 core)   8b = 8x2^8           4,096 T       2,112 T    2b = 2x2^2          76 T   96 T
2D     64b (64x128 core)   7b = 7x2^7           1,792 T       1,072 T    3b = 3x2^3          160 T  224 T
2D     128b (128x64 core)  6b = 6x2^6           768 T         432 T      4b = 4x2^4          336 T  480 T

Bit Line Precharge Logic
[Figure: precharge and equalization devices on BL and !BL, controlled by !PC]
- The first step of a Read cycle is to precharge (PC) the bit lines to VDD
  - every differential signal in the memory must be equalized to the same voltage level before a Read
- Turn off PC and enable the WL
  - the grounded PMOS load limits the bit line swing (speeding up the next precharge cycle)
- Equalization transistor: speeds up equalization of the two bit lines by allowing the capacitance and pull-up device of the nondischarged bit line to assist in precharging the discharged line

Sense Amplifiers
[Figure: SA converting a small-swing input into a full-swing output]
- Amplification: resolves data with small bit line swings (in some DRAMs, required for proper functionality)
- Delay reduction: compensates for the limited drive capability of the memory cell to accelerate the BL transition
  tp = (C * ΔV) / Iav
  - C is large and Iav is small, so make ΔV as small as possible
- Power reduction: eliminates a large part of the power dissipation due to charging and discharging bit lines
- Signal restoration: for DRAMs, the bit lines must be driven full swing after sensing (read) to refresh the data

Classes of Sense Amplifiers
- Differential SA: takes small-signal differential inputs (BL and !BL) and amplifies them to a large-signal single-ended output
  - common-mode rejection: rejects noise that is injected equally into both inputs
  - only suitable for SRAMs (with BL and !BL)
  - types: current mirroring, two-stage, latch based
- Single-ended SA: needed for DRAMs

Latch Based Sense Amplifier
[Figure: bit line inputs BL and !BL (ΔV = 0.1VDD) connect through isolation devices (isolate) to a cross-coupled latch enabled by SE; sense amplifier outputs data_out and !data_out swing ΔV = VDD]

Alpha Differential Amplifier/Latch
[Figure: column decoder outputs S3 … S0 select mux_out/!mux_out; precharge (PC); sense and !sense; sense enable SE; transistors P1-P4 and N1-N5; sense amplifier outputs data_out and !data_out swing ΔV = VDD]

Read/Write Circuitry
[Figure: local R/W circuit between bit lines BL/!BL and the D and R/!R buses, with SA, precharge, and column select CS]
- D: data (write) bus; R: read bus; W: write signal; CS: column select (from the column decoder)
- Local W (write): BL = D, !BL = !D, enabled by W & CS
- Local R (read): R = BL, !R = !BL, enabled by !W & CS

Approaches to Memory Timing
- SRAM timing: self-timed
  - an address transition on the address bus initiates the memory operation
- DRAM timing: multiplexed addressing
  - the address bus carries the row address (msb's), strobed by RAS, then the column address (lsb's), strobed by CAS (RAS-CAS timing)

Reliability and Yield
- Memories operate under low signal-to-noise conditions
  - word line to bit line coupling can vary substantially over the memory array: folded bit line architecture (routing BL and !BL next to each other ensures a closer match between parasitics and bit line capacitances)
  - interwire bit line to bit line coupling: transposed (or twisted) bit line architecture (turns the noise into a common-mode signal for the SA)
  - leakage (in DRAMs), requiring a refresh operation
- Memories suffer from low yield due to high density and structural defects
  - increase yield by using error correction (e.g., parity bits) and redundancy
- Memories are susceptible to soft errors due to alpha particles and cosmic rays

Redundancy in the Memory Structure
[Figure: memory array with a redundant row and redundant columns; a fuse bank on the row address and column address paths]

Redundancy and Error Correction
[Figure]

Soft Errors
- Nonrecurrent and nonpermanent errors from
  - alpha particles (from the packaging materials)
  - neutrons from cosmic rays
- As feature size decreases, the charge stored at each node decreases (due to a lower node capacitance and lower VDD), and thus Qcritical (the charge necessary to cause a bit flip) decreases, leading to an increase in the soft error rate (SER)
[Figure: system FITs (log scale, 1 to 10,000) versus process technology (0.25, 0.18, 0.13, 0.09, 0.05 µm); from Semico Research Corp.]

MTBF (hours), from Actel:
System                      0.13 µm   0.09 µm
Ground-based                895       448
Civilian avionics system    324       162
Military avionics system    18        9

Next Lecture and Reminders
- Next lecture
  - Power consumption in datapaths and memories
  - Reading assignment: Rabaey, et al., 11.7; 12.5
- Reminders
  - HW#5 due today
  - Project final reports due online by 5:00pm on Friday, December 5th
  - Final grading negotiations/corrections (except for the final exam) must be concluded by December 10th
  - Final exam scheduled: Tuesday, December 16th, from 10:10 to noon in 118 and 113 Thomas
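The Row Decoders and Implementing a Wide NOR Function slides give the word line equations in single-stage and predecoded form. The Python sketch below is a behavioral model only (not a transistor netlist), assuming an 8-bit address with A0 as the least significant bit; all function names are hypothetical. It checks that the single-stage NOR form and the two-level form (four 2-input NOR predecoders per address pair feeding a 4-input NAND that produces !WL, followed here by an assumed inverting buffer) both assert exactly the same single word line.

```python
N_BITS = 8
N_ROWS = 1 << N_BITS  # 256 word lines


def addr_bits(addr: int) -> list[bool]:
    """Return [A0, A1, ..., A7] as booleans (A0 = least significant bit)."""
    return [bool((addr >> i) & 1) for i in range(N_BITS)]


def single_stage_wl(addr: int, row: int) -> bool:
    """WL(row) as one wide NOR: the term for bit i is Ai when row bit i is 0
    and !Ai when it is 1, e.g. WL(0) = !(A7 | ... | A0)."""
    a, r = addr_bits(addr), addr_bits(row)
    terms = [(not a[i]) if r[i] else a[i] for i in range(N_BITS)]
    return not any(terms)


def predecode_pair(a_hi: bool, a_lo: bool) -> list[bool]:
    """Four 2-input NORs for the pair (Ai, Ai-1). Output k is high only when
    2*Ai + Ai-1 == k, so exactly one output is high."""
    return [
        not (a_hi or a_lo),              # k = 0: !(Ai | Ai-1)
        not (a_hi or not a_lo),          # k = 1: !(Ai | !Ai-1)
        not (not a_hi or a_lo),          # k = 2: !(!Ai | Ai-1)
        not (not a_hi or not a_lo),      # k = 3: !(!Ai | !Ai-1)
    ]


def two_level_decode(addr: int) -> list[bool]:
    """Predecode (A1,A0), (A3,A2), (A5,A4), (A7,A6), then combine one line
    from each pair per row in a 4-input NAND that produces !WL(row)."""
    a = addr_bits(addr)
    pre = [predecode_pair(a[2 * p + 1], a[2 * p]) for p in range(4)]
    word_lines = []
    for row in range(N_ROWS):
        selected = [pre[p][(row >> (2 * p)) & 0b11] for p in range(4)]
        not_wl = not all(selected)       # 4-input NAND output = !WL(row)
        word_lines.append(not not_wl)    # assumed inverting buffer drives WL(row)
    return word_lines


if __name__ == "__main__":
    for addr in range(N_ROWS):
        direct = [single_stage_wl(addr, row) for row in range(N_ROWS)]
        assert direct == two_level_decode(addr)
        assert direct.index(True) == addr and sum(direct) == 1
    print("single-stage and two-level decoders agree for all 256 addresses")
```

The exhaustive check is cheap at 8 bits, which makes it a convenient sanity test before counting transistors.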
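The transistor counts on the Implementing a Wide NOR Function, column decoder, and Decoder Complexity Comparisons slides follow a few simple formulas. The sketch below reproduces every entry in that table under counting conventions inferred from the numbers (they are not stated on the slides): two transistors per gate input; address bits predecoded in pairs, with a leftover bit (the 7b case) feeding the word line gate without a predecoder; the column decoder shared across the 8 data bits; and the pass transistors or trees replicated per data bit. The delay helper encodes only the rough rule of thumb that gate delay grows with the square of fan-in. Function names are hypothetical.

```python
def row_single_stage(n: int) -> int:
    """One n-input gate per word line, 2 transistors per input."""
    return (1 << n) * (n + n)


def row_two_stage(n: int) -> int:
    """Pairs of address bits are predecoded by four 2-input NORs
    (4 x (2+2) transistors per pair); each word line gate takes one input
    per group (pairs plus any leftover single bit)."""
    pairs, leftover = divmod(n, 2)
    predecoders = pairs * 4 * (2 + 2)
    groups = pairs + leftover
    return predecoders + (1 << n) * (groups + groups)


def col_pass_transistor(k: int, data_bits: int = 1) -> int:
    """Shared K-input decoder ((K+1) x 2^K) plus a BL and a !BL pass
    transistor per column for each data bit."""
    return (k + 1) * (1 << k) + 2 * (1 << k) * data_bits


def col_tree(k: int, data_bits: int = 1) -> int:
    """One binary tree each for BL and !BL: levels of 2^K, 2^(K-1), ..., 2
    transistors sum to 2 x (2^K - 1) per tree."""
    return 2 * 2 * ((1 << k) - 1) * data_bits


def relative_delay(fan_in_before: int, fan_in_after: int) -> float:
    """Delay ratio under the rough 'delay ~ fan-in squared' rule of thumb."""
    return (fan_in_before / fan_in_after) ** 2


if __name__ == "__main__":
    # (row bits, column bits) for the four configurations: 10b address, 8b data
    for row_bits, col_bits in [(10, 0), (8, 2), (7, 3), (6, 4)]:
        line = (f"row {row_bits}b: single {row_single_stage(row_bits):,} T, "
                f"two-stage {row_two_stage(row_bits):,} T")
        if col_bits:
            line += (f"; column {col_bits}b: PT {col_pass_transistor(col_bits, 8)} T, "
                     f"tree {col_tree(col_bits, 8)} T")
        print(line)
    print("8-input -> 4-input word line gate, delay ratio:", relative_delay(8, 4))
```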
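The Tree Based Column Decoder slide routes one bit line to data_out through a binary tree of pass transistors. A minimal behavioral sketch, assuming the bit lines are listed in order BL0 … BL(2^K - 1) and that the level nearest the bit lines is controlled by A0/!A0 (an assumption; the slide only shows levels controlled by A0/!A0 and A1/!A1). The hypothetical tree_select function also returns the number of series pass transistors on the path, which equals K and is the reason delay grows with the number of sections.

```python
def tree_select(bitlines: list[float], address: int) -> tuple[float, int]:
    """Route one bit line to the output through a binary pass-transistor tree.

    At level i, Ai passes the odd input of each pair and !Ai passes the even
    input, halving the number of candidates. Returns the selected value and
    the number of series transistors on the path.
    """
    n = len(bitlines)
    if n == 0 or n & (n - 1):
        raise ValueError("number of bit lines must be a power of two")
    level = list(bitlines)
    level_index = 0
    while len(level) > 1:
        a_i = (address >> level_index) & 1
        level = [level[j + a_i] for j in range(0, len(level), 2)]
        level_index += 1
    return level[0], level_index


if __name__ == "__main__":
    bl = [0.10, 0.11, 0.12, 0.13]  # hypothetical values on BL0..BL3
    for a in range(4):
        value, series = tree_select(bl, a)
        print(f"address {a}: selects {value}, {series} series transistors")
```

Compared with the pass-transistor decoder (one series device, many decoder transistors), the tree trades series resistance for a smaller transistor count.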
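The delay relation on the Sense Amplifiers slide, tp = (C * ΔV) / Iav, can be evaluated directly. The operating point below (VDD = 1.8 V, a 1 pF bit line, a 50 µA average cell current) is hypothetical, chosen only for illustration; the ΔV = 0.1VDD case mirrors the input swing shown on the Latch Based Sense Amplifier slide. The energy helper uses the standard result that restoring a bit line from VDD - ΔV draws a charge of C * ΔV from the supply, so E = C * ΔV * VDD per bit line per cycle, which is the power reduction the slide refers to.

```python
def bitline_delay(c_farads: float, delta_v: float, i_avg: float) -> float:
    """tp = C * dV / Iav: time for the cell current to develop a swing dV."""
    return c_farads * delta_v / i_avg


def precharge_energy(c_farads: float, delta_v: float, vdd: float) -> float:
    """Supply energy to restore one bit line swing of dV: E = C * dV * VDD."""
    return c_farads * delta_v * vdd


# Hypothetical operating point, for illustration only.
VDD = 1.8        # supply voltage (V)
C_BL = 1e-12     # bit line capacitance (F)
I_CELL = 50e-6   # average cell read current (A)

if __name__ == "__main__":
    for label, swing in [("full swing", VDD), ("0.1 VDD swing", 0.1 * VDD)]:
        tp = bitline_delay(C_BL, swing, I_CELL)
        e = precharge_energy(C_BL, swing, VDD)
        print(f"{label}: tp = {tp * 1e9:.1f} ns, E = {e * 1e12:.3f} pJ")
```

Both delay and precharge energy scale linearly with ΔV, so sensing a 0.1VDD swing cuts each by 10x in this model.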
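The enable conditions on the Read/Write Circuitry slide can be captured as a small truth-table model. This is a behavioral sketch with hypothetical names (Column, local_rw); it ignores precharge and sense-amplifier timing and treats the bit lines as logic levels.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Column:
    bl: bool    # BL
    bl_b: bool  # !BL


def local_rw(col: Column, d: bool, w: bool, cs: bool) -> Optional[Tuple[bool, bool]]:
    """Apply one access to a column.

    Local W (enabled by W & CS): BL = D, !BL = !D; returns None.
    Local R (enabled by !W & CS): returns (R, !R) = (BL, !BL).
    Column not selected (!CS): nothing is driven; returns None.
    """
    if w and cs:
        col.bl, col.bl_b = d, not d
        return None
    if not w and cs:
        return col.bl, col.bl_b
    return None


if __name__ == "__main__":
    col = Column(bl=True, bl_b=False)               # column currently holds a 1
    local_rw(col, d=False, w=True, cs=True)         # write a 0
    print(local_rw(col, d=True, w=False, cs=True))  # read back (R, !R)
```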
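On the Approaches to Memory Timing slide, DRAM timing multiplexes the row address (msb's) and the column address (lsb's) over one address bus, strobed by RAS and then CAS. The sketch below, with hypothetical function names and a hypothetical 10-bit row / 10-bit column split, shows the address split and the resulting bus sequence; it does not model any RAS-CAS timing parameters.

```python
def split_address(addr: int, row_bits: int, col_bits: int) -> tuple[int, int]:
    """Split a flat address: msb's form the row, lsb's form the column."""
    if not 0 <= addr < (1 << (row_bits + col_bits)):
        raise ValueError("address out of range")
    return addr >> col_bits, addr & ((1 << col_bits) - 1)


def bus_sequence(addr: int, row_bits: int = 10,
                 col_bits: int = 10) -> list[tuple[str, int]]:
    """Order in which the shared address bus is used: row with RAS, then column with CAS."""
    row, col = split_address(addr, row_bits, col_bits)
    return [("RAS", row), ("CAS", col)]


if __name__ == "__main__":
    print(bus_sequence(0b1100110011_0000011111))
```

Because the two halves share the bus, it only needs max(row bits, column bits) wires instead of their sum.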
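The Redundancy in the Memory Structure slide shows a fuse bank that redirects accesses to redundant rows and columns. A minimal row-only sketch, assuming a hypothetical RowRedundancy class in which each blown fuse stores one faulty row address and maps it to the next free spare row; column redundancy would work the same way on the column address.

```python
class RowRedundancy:
    """Hypothetical fuse-bank model for row redundancy."""

    def __init__(self, n_spare_rows: int) -> None:
        self.n_spare_rows = n_spare_rows
        self.fuses: dict[int, int] = {}  # faulty row address -> spare row index

    def blow_fuse(self, faulty_row: int) -> None:
        """Program (at test time) a faulty row onto the next free spare row."""
        if faulty_row in self.fuses:
            return
        if len(self.fuses) >= self.n_spare_rows:
            raise RuntimeError("no spare rows left: the die cannot be repaired")
        self.fuses[faulty_row] = len(self.fuses)

    def route(self, row: int) -> tuple[str, int]:
        """Return ('spare', index) for a repaired row, else ('main', row)."""
        if row in self.fuses:
            return "spare", self.fuses[row]
        return "main", row


if __name__ == "__main__":
    bank = RowRedundancy(n_spare_rows=2)
    bank.blow_fuse(37)   # hypothetical defective rows found at test
    bank.blow_fuse(200)
    for row in (36, 37, 200):
        print(row, bank.route(row))
```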
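The Reliability and Yield slide mentions parity bits. A single even-parity bit per word detects any odd number of flipped bits but cannot locate the bad bit, so correcting errors needs more check bits (for example, a Hamming code). A sketch with hypothetical helper names:

```python
def parity_bit(word: int) -> int:
    """Even parity: 1 when word has an odd number of 1s, so that the word
    plus its parity bit always holds an even number of 1s."""
    if word < 0:
        raise ValueError("word must be non-negative")
    return bin(word).count("1") & 1


def check(word: int, stored_parity: int) -> bool:
    """True when the read word is consistent with its stored parity bit."""
    return parity_bit(word) == stored_parity


if __name__ == "__main__":
    data = 0b1011_0010
    p = parity_bit(data)                 # written alongside the data
    print(check(data, p))                # no error: consistent
    print(check(data ^ 0b0000_1000, p))  # single-bit flip: inconsistent (detected)
    print(check(data ^ 0b0000_0011, p))  # double-bit flip: consistent (missed)
```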
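The Soft Errors slide plots system FITs and lists MTBF in hours. One FIT is one failure per 10^9 device-hours, so MTBF (hours) = 10^9 / FIT. The sketch converts the Actel MTBF values to FIT and shows that each MTBF roughly halves from 0.13 µm to 0.09 µm; treating each system as a single unit is an assumption about how that table was computed. The names are hypothetical.

```python
HOURS_PER_BILLION = 1e9  # 1 FIT = 1 failure per 1e9 device-hours


def fit_from_mtbf(mtbf_hours: float) -> float:
    """Convert an MTBF in hours to a failure rate in FIT."""
    return HOURS_PER_BILLION / mtbf_hours


# MTBF in hours from the Actel table: (0.13 um, 0.09 um)
ACTEL_MTBF_HOURS = {
    "Ground-based": (895, 448),
    "Civilian avionics system": (324, 162),
    "Military avionics system": (18, 9),
}

if __name__ == "__main__":
    for system, (mtbf_130, mtbf_90) in ACTEL_MTBF_HOURS.items():
        print(f"{system}: {fit_from_mtbf(mtbf_130):,.0f} FIT at 0.13 um, "
              f"{fit_from_mtbf(mtbf_90):,.0f} FIT at 0.09 um "
              f"(MTBF ratio {mtbf_130 / mtbf_90:.2f})")
```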
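The Reliability and Yield slide says twisted bit lines turn interwire coupling into a common-mode signal for the SA. A toy model, assuming a hypothetical aggressor wire running beside a BL/!BL pair split into equal segments, with a hypothetical 10 mV of coupled noise per segment: each segment couples its noise into whichever line is adjacent, and a twist swaps which line that is. A differential SA responds only to BL - !BL, so a single twist at the midpoint cancels the coupled noise. The model ignores capacitance details and coupling between neighboring pairs.

```python
def coupled_noise(n_segments: int, twists: set[int],
                  noise_per_segment: float) -> tuple[float, float]:
    """Return (noise on BL, noise on !BL) from one aggressor wire.

    The aggressor starts adjacent to BL; at the start of every segment listed
    in `twists` the pair is twisted, swapping which line is adjacent.
    """
    on_bl = on_bl_b = 0.0
    adjacent_is_bl = True
    for seg in range(n_segments):
        if seg in twists:
            adjacent_is_bl = not adjacent_is_bl
        if adjacent_is_bl:
            on_bl += noise_per_segment
        else:
            on_bl_b += noise_per_segment
    return on_bl, on_bl_b


def differential_error(on_bl: float, on_bl_b: float) -> float:
    """What a differential SA sees: only the difference BL - !BL."""
    return on_bl - on_bl_b


if __name__ == "__main__":
    for twists in (set(), {2}):
        bl, bl_b = coupled_noise(4, twists, noise_per_segment=0.01)
        print(f"twists at {sorted(twists)}: BL {bl:.2f} V, !BL {bl_b:.2f} V, "
              f"differential {differential_error(bl, bl_b):+.2f} V")
```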