Adv. Radio Sci., 9, 247–253, 2011
www.adv-radio-sci.net/9/247/2011/
doi:10.5194/ars-9-247-2011
© Author(s) 2011. CC Attribution 3.0 License.

Advances in Radio Science

A sensing circuit for single-ended read-ports of SRAM cells with bit-line power reduction and access-time enhancement

T. Heselhaus and T. G. Noll
Chair of Electrical Engineering and Computer Systems, RWTH Aachen University, 52062 Aachen, Germany

Abstract. The conventional sensing scheme of single-ended read-only-ports as integrated in 8T-SRAM cells suffers from low performance compared to double-ended complementary sensing schemes. In the proposed sensing scheme the pre-charge voltage of the single-ended read-bit-line is set to a level above the threshold voltage of the sensing device with an adjustable margin. This margin is minimized to speed up the read access on the one hand and kept large enough to provide a sufficient bit-line noise margin on the other hand. The pre-charge voltage level of the proposed sensing circuit tracks the threshold voltage of the sensing device under process variations in order to maintain a minimum required bit-line noise margin. To avoid unnecessary bit-line discharging, the proposed sensing scheme employs a modified 8T-SRAM cell. Compared to the conventional 8T-SRAM cell, the read port of the proposed cell provides a virtual ground line running in parallel to the bit-lines. An internal driver of the sensing circuit releases the virtual ground line during the evaluation period to prevent the charge dissipation, resulting in a raised voltage level. The reduced pre-charge level and the increased virtual ground lead to a reduced bit-line voltage swing and thus a bit-line power reduction. Access time, energy dissipation, and noise margin of the proposed sensing circuit are compared with conventional sensing circuits from the literature for different numbers of memory cells connected to the bit-line. It is shown that for a specific number of memory cells per bit-line the proposed circuit achieves the fastest access time at low power operation.

Correspondence to: T. Heselhaus ([email protected])

1 Introduction

Single-ended read sensing schemes will become more important in memory architectures of SRAM cells with additional read-only-ports to overcome the reliability problems related to standard 6T-SRAM cells. Though such single-ended ports require only one bit-line, the main power dissipation originates from the full-swing bit-line sensing schemes. The intention of this work is to set the initial bit-line voltage close to the threshold level of the individual local sensing device and to stop the discharging once the local sensing has detected the data bit, such that no more of the bit-line charge is dissipated than necessary. The expected side effect is a performance upgrade: since the initial bit-line voltage resides close to the sensing threshold level, this level can be reached in a shorter evaluation time.

Section 2 presents a modified memory cell, which is required to enable the bit-line swing reduction. In Sect. 3 the proposed sensing circuit is described. In Sect. 4 the proposed sensing scheme is verified with simulations and compared with other sensing circuits from the literature.

2 The proposed 8T-SRAM cell

Figure 1 shows the conventional (Chang et al., 2005) and the proposed 8T-SRAM cell. Both cell variants consist of a conventional cross-coupled inverter-pair connected via two access transistors to a complementary bit-line pair p and p̄, which represents the write-port for storing data into the cell by activating the storage-word-line s. The read-only-port consists of two series connected transistors for single-ended read-out with a separate read-word-line r.
Area and most of the wiring of the proposed cell is identical to the conventional cell, however the proposed cell connects the source of the read-only-port to an additional virtual ground line v instead of VSS . This additional virtual ground Published by Copernicus Publications on behalf of the URSI Landesausschuss in der Bundesrepublik Deutschland e.V. 2248 M4 are both zero and the bit-line is charged by the series connected transistors M3 and M4. As the gates of M5 and M6 are at zero, these transistors establish a voltage divider of the T. Heselhaus and T. G. Noll: A sensing circuit for single-ended read-ports of SRAM cells T. Heselhaus and T. G. Noll: A Sensing Circuit for Single-Ended Read-Ports . . . actual bit-line voltage. The to the diϕ gate of M3 is connected r e vided voltage u, which is a fraction of the bit-line voltage m. While the bit-lineM5’ voltage rises, the tapped gate potential u ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ rises as well. The pre-charge stops, if M5 process of the bit-lineM11 p V SS m p p m p u rises above VDD −|Vth |, such thatM3 M3 enters the off-region. m m M6 Hence the final pre-charge voltage Vm,max of the bit-line has r r PSfrag replacements m M4 level VDD − |Vth | of risen in excess of the sensing threshold s s M3. This overshoot can be adjustedM7 by choosing the approM2 priate dimensioning of M5 and M6. y M9 M1 g ag replacements M8achieve the required The intention of the overshoot is to noise margin on the bit-line, which must be maintained even M10 the feed-back loop for preunder process variations. With V SS V SS charging the bit-line via the transistors M3, M4, M5 and M6, p p p p the voltage overshoot of Vm,max tracks the sensing threshold conventional proposed level of M3. This tracking effect is verified by simulations in 2. Schematic theproposed proposed single-ended sensing and and Fig. Fig. 2. Schematic ofofthe single-endedread read sensing Section 4. pre-charge circuit for one memory column and domino connected Fig. 
Conventionaland andproposed proposed 8T-SRAM-Cell as schematic Fig. 1. Conventional 8T-SRAM cell as schematic and pre-charge circuit for one memory column and domino connected Inline fact, in the proposed sensing scheme the signal is demultiplex. and layout view. additional virtual ground is labeled layout view. TheThe additional virtual ground line line is labeled as v.as v. digitdigit line multiplex. tected with two successive sensing stages, realized by M3 and M9. It behaves similar to an inverter sense circuit followed a domino NMOS pull-down the The by intention of the overshoot is to device. achieve However, the required runs in parallel to the read-bit-line m and can be easily 3linePre-charge and sensing circuit nodes x, y and v in the sensing circuit have to be pre-charged noise margin on the bit-line, which must be maintained even placed in the thin layout cell without any area penalty. The or clamped before enteringWith the evaluation phase. During the under process variations. the feed-back loop for preintention of this additional virtual groundcircuit line isoftoa singlereduce Fig. 2 shows the pre-charge and sensing pre-charge-phase the bit-line m rises to V and shortly m,max charging the bit-line via the transistors M3, M4, M5 and M6, the bit-line swing functionality ended bit-line for and one power columndissipation. of memoryThe cells. Only the M8 voltage begins to dischargeofthe internal node y such thatthreshold the main the overshoot Vm,max tracks the sensing of this virtual ground line will cell be described the nextthere secread-only-port of one memory is shown,inhowever, output stage M9 turns off. The drain of M9 is connected to level of M3. This tracking effect is verified by simulations in tion.N memory cells connected to the bit-lines m and v. The are a common node x, which could be a global bit-line in a subSect. 4. 
basic idea of the pre-charge circuit is to pre-charge the bitdivided bit-line memory architecture or a common digit line line m to a defined voltage level above the threshold-level of In fact, in the proposed sensing scheme the signal is de3 Pre-charge and sensing circuit in some other hierarchical memory architecture (e.g. a colthe sensing transistor M3. Because the drain current of M3 tected with two successive sensing stages, realized by M3 umn or data word multiplex output). Here, this output x is isFigure used 2toshows pre-charge the bit-line, should itselfofturn off as and M9. It behaves similar to an inverter sense circuit folthe pre-charge andM3 sensing circuit a singleassumed to be domino-connected, thus x requires an addithe pre-charge reached. pre-charge is lowed by a domino NMOS pull-down device. However, the ended bit-line voltage for one iscolumn of The memory cells. process Only the tional pre-charge M11, circuit which have is nottopart of the sensdescribed as follows. nodes x, y and v indevice the sensing be pre-charged read-only-port of one memory cell is shown, however, there ing circuit. However, M11 pre-charges x to V such that DD or clamped before entering the evaluation phase. During the the pre-charge-phase (φthe = 0) the word-line deareDuring N memory cells connected to bit-lines m and v.is The M10 turns on and clamps the virtual ground line v to VSS . pre-charge-phase the bit-line m rises to V and shortly activated (r = 0) and the sensing is disabled (e = 1). For an m,max basic idea of the pre-charge circuit is to pre-charge the bitThebegins circuittois discharge now prepared for sensing. M8 the internal node y such that the main initially bit-line level m theabove gate the potentials of M3 and line m todischarged a defined voltage threshold-level of φ to VDD The stage pre-charge phase is completed byM9 reverting output M9 turns off. 
The drain of is connected to M4 both transistor zero and the bit-line is charged by the series the are sensing M3. Because the drain current ofconM3 which turns off M4. Since M6 is connected as a diode, M6 a common node x, which could be a global bit-line in a subnected transistors M3 and M4. As the gates of M5 and M6 is used to pre-charge the bit-line, M3 should itself turn off as turns offbit-line as well, the voltage divider or built by M5 and divided memory architecture a common digitM6 lineis are zero, these transistors establish a voltage divider of the the at pre-charge voltage is reached. The pre-charge process is now no longer active and u follows the bit-line voltage via in some other hierarchical memory architecture (e.g. a coldescribed as follows. M5. The transistor M5’ may be added to afford a close couumn or data word multiplex output). Here, this output x is During the (φ = 0) ethe word-line is depling between and u over a wide range bit-line voltages. ϕ r pre-charge-phase assumed to bem domino-connected, thus xofrequires an addiactivated (r = 0) and the sensing is disabled (e = 1). For an Actually both M5’ and M5 together build up a transmission tional pre-charge device M11, which is not part of the sensM5’ initially discharged bit-line m the gate potentials of M3 and gatecircuit. to connect m to theM11 gatepre-charges of M3. In some cases where M5 ing However, x to V such that DD M4 are both zero and the series conM5 bit-line is charged by theM11 is strong enough for a close coupling between m and u, M5’ M10 turns on and clamps the virtual ground line v to VSS . nected transistors M3 and M4. AsM3 the gates of M5 and M6 might be omitted. However, this work M5’ is included M6 The circuit is now prepared forinsensing. are atmzero, these transistors establish a voltage divider of the Sfrag replacements in each power and performance simulation and comparison. M4 The pre-charge phase is completed by reverting φ to VDD actual bit-line voltage. 
The gate of M3 is connected to the diThe transistor M5’ is minimally dimensioned. M2 M7 which turns off M4. Since M6 is connected as a diode, M6 vided voltage u, which is a fraction of voltage m. y the bit-line Additionally, ethe is now tieddivider to zero,built andby M7 connects the M9 turns off as well, voltage M5 and M6 is M1 voltage rises, the tapped gate potential u While theg bit-line sensing transistor M3 to the internal node y. Any leakage M8 now no longer active and u follows the bit-line voltage via rises as well. The pre-charge process of the bit-line stops, if of M3 duetransistor to the near bit-line m M5. The M5’threshold-level may be addedpre-charged to afford a close couM10that M3 enters the off-region. u rises above VDD −|Vth |, such and node u is compensated now by the weakly dimensioned pling between m and u over a wide range of bit-line voltages. Hence the final pre-charge voltage Vm,max of the bit-line has transistorboth M8,M5’ suchand thatM5 y is kept tobuild VSS .upThe evaluationActually together a transmission risen in excess of the sensing threshold level VDD − |Vth | of phase starts by activating the word-line r. Depending on gate to connect m to the gate of M3. In some cases where M5 M3.2.This overshoot canproposed be adjusted by choosing the approthe stored data g, the bit-line remains pre-charged or is disFig. Schematic of the single-ended read sensing and is strong enough for a close coupling between m and u, M5’ priate dimensioning of M5 and M6. pre-charge circuit for one memory column and domino connected charged g = VDD However, via M2 and the memory cell and might beifomitted. in M1 this of work M5’ is included digit line multiplex. M10 of the sensing circuit. The gate potential u of M3 falls Adv. Radio Sci., 9, 247–253, 2011 www.adv-radio-sci.net/9/247/2011/ w tu no M pl A ga is m in Th se of an tra ph th ch M mV mV µA ρ = 0.65 (for lM5 = lM6 = 40nm). Reducing the variability M3, M4, M9, and M10 have a gate width of w = 480nm. 
of the voltage divider reduces the standard deviation of the Simulation using slow corner device models at 125°C and access time from 10.1% to 9%, while the mean access time a minimal supply voltage of 900mV are used to verify the remains nearly constant. function of the proposed sensing circuit under worst case -Ended Read-Ports . . . and T. G. Noll: A sensing circuit for single-ended3read-ports of SRAM cells T. Heselhaus 249 timing conditions. Fig. 3 shows the simulated pre-charge and evaluation-phase of a read cycle with a stored ’1’PSfrag on the sereplacements 600mV 0 M5 = 420nm lected memory cell (g = VDD ). The achieved bit-line swing ng M5 = 280nm −30 DD ∆V is approximately 435mV, a little bit less than VDD /2 tes M5 = 140nm V m,mx −60 such that bit-line energy savings of 50% compared to a full x x. V corr. swing bit-line sensing scheme can be expected. 900 y ed frag replacementsTracking the pre-charge voltage according to the sensing V DD − |Vth | m dithreshold level of M3 under process variations was functionΔV ≈ 435mV 100 100 ell ally verified ps ps 450 by Monte Carlo simulations. For this case pro300mV oncess variations apply to−M3 while all other transistors −→ −→ V DD |Vthonly, | 0 t ACC 0 t ACC t t operate in the typical corner. Fig. 4 shows the probability ed std(t ACC ) = 8.1% std(t ACC ) = 1.5% density functions (PDF) of sensing threshold level variations in0 and the resulting access time profile t ACC without tracking on the Fig. 4. 4. Verification Verification of of the t is Fig. the sensing sensing threshold threshold level level tracking. tracking. The The left side900 and with ϕtracking on the rightr side. 
On the ordinate left diagram diagram shows shows the the resulting access time profile with aafixed bitleft resulting access time profile with fixed bitlar 450sensing threshold level V , the mean pre-charge the mean line pre-charge pre-charge voltage, voltage, while the right diagram shows the results th line while the right diagram shows the results 0 nes when the the sensing sensing circuit circuit includes level Vm,max when includes the the voltage voltage divider divider M5/M6 M5/M6 for for 0 , and their PDFs 500 are plotted. 1000The abscissas 1500show ge the tracking tracking (Monte (Monte Carlo Carlo simulations, 25°C, VVDD = 900mV). the mean access times tACC and their PDFs for both cases. the simulations, 25°C, = 900mV). DD t / ps he rly ed. ca- 3. Simulatedwaveform waveform for for aa read-cycle proposed sensFig.Fig. 3. Simulated read-cycleofofthethe proposed sens- 4 Simulation results ing circuit (slow corner, 125°C, VDD = 900mV). ing circuit (slow corner, 125°C, VDD = 900mV). The sensing circuit was simulated for a 40nm general purpose bulk CMOS technology with a nominal supply voltin each power and performance simulation and comparison. transistor M5’ is minimally dimensioned. TheThe standard deviation of the access time PDF was reduced age of 900mV. All circuit simulations are based on transistor schematics. Extracted parasitics from a layout e is now tied to zero, and M7 connects the from Additionally, 8.1% to 1.5%. view are added to the bit-lines and to the virtual ground sensing transistor M3 to the internal node y. Any leakage of When considering process variations of all transistors of line corresponding to the connected number of cells N ∈ M3 due to the near threshold-level pre-charged bit-line m and thenode sensing circuit this also affects the fraction of the voltage {8,16,32,64,128}. Internal nodes of the sensing circuit are uru is compensated now by the weakly dimensioned trandivider M5/M6. 
Fig.y5isshows correlation of loaded with estimated parasitic capacitances rated between oltsistor M8, such that kept tothe VSSsimulated . The evaluation-phase thestarts sensing thresholdthelevel variation to the pre-charge level 0.4fF and 0.7fF depending on the assumed wiring complexon by activating word-line r. Depending on the stored variation. The left side of pre-charged Fig. 5 shows voltage iflevel ity. The transistor dimensioning of the proposed sensing cirdata g, the bit-line remains or isthe discharged out g = VDD viawhere M2 andall M1transistors of the memory cell sensing and M10circuit of the in cuit was basically designed with the minimal allowed gate distributions, of the nd width of w = 120nm. Only the driving/sensing transistors circuit. The gateapotential u of M3 and M3=now Fig.sensing 2 are designed with gate lengths of lfalls lM6 40nm. M3, M4, M9, and M10 have a gate width of w = 480nm. M5 = ∈ directly turnsmismatches on, which leads to avoltage steep rising edgetransistors on the Since device of the divider are Simulation using slow corner device models at 125°C and internal node y. The rising edge on y activates M9, which M5 and M6 disturb the correlation of the sensing threshold a minimal supply voltage of 900mV are used to verify the en now discharges the common data output node x. level The to the pre-charge level, a reduction of the variability of function of the proposed sensing circuit under worst case expower reduction feature is induced by the reduced preM5 and M6 the correlation. a gate timing conditions. Figure 3 shows the simulated pre-charge ircharge levelcan andimprove the discharge inhibition withChoosing the additional length of ground lM5 = line lM6v=of70nm for the8T-SRAM voltage divider M5/M6 and evaluation-phase of a read cycle with a stored “1” on the ate virtual the proposed cell and tranimproves the ofcorrelation toM10 ρ = is0.8 compared sistor M10 the sensingcoefficient circuit. 
Since connected to to selected memory cell (g = VDD ). The achieved bit-line swing ors 1V is approximately 435mV, a little bit less than VDD /2 data(for output which zero if the selected memory cell ρ =the 0.65 lM5x,= lM6 gets = 40nm). Reducing the variability such that bit-line energy savings of 50% compared to a full discharges bit-linereduces m, M10 the turnsstandard off and inhibits further of the voltagethedivider deviation of the swing bit-line sensing scheme can be expected. nd charge dissipation to VSS to as 9%, soon while the datathe bit mean is detected. Be-time access time from 10.1% access he Tracking the pre-charge voltage according to the sensing cause the bit-lines m and v are designed similar in the layout remains nearly constant. ase threshold level of M3 under process variations was functionview, the parasitic capacitances of both bit-lines are similar. ally verified by Monte Carlo simulations. For this case prond Thus, once M10 turns off, the remaining charge on the bitcess variations apply to M3 only, while all other transistors seline m will be balanced between m and v. The final bit-line acements operate in the typical corner. Figure 4 shows the probability 600mV = ng420nm potential of m would therefore settle to nearly half of the density functions (PDF) of sensing threshold level variations = 280nm /2140nm bit-line potential, when the data bit was detected. The final = V m,mx and the resulting access time profile without tracking on the potential depends on the ratio of the parasitic capacitances of ull x left side and with tracking on the right side. On the ordinate v. m Vand corr. the mean sensing threshold level Vth , the mean pre-charge ng level Vm,max , and their PDFs are plotted. The abscissas show V DD − |Vth | the mean access times tACC and their PDFs for both cases. on100 The standard deviation of the access time PDF was reduced 100 rops ps from 8.1% to 1.5%. 
300mV ors −→ −→ ity ons he ate rge 0 t ACC t std(t ACC ) = 8.1% 0 t std(t ACC ) = 1.5% t ACC Fig.www.adv-radio-sci.net/9/247/2011/ 4. Verification of the sensing threshold level tracking. The left diagram shows the resulting access time profile with a fixed bitline pre-charge voltage, while the right diagram shows the results when the sensing circuit includes the voltage divider M5/M6 for Adv. Radio Sci., 9, 247–253, 2011 The proposed sensing circuit was compared to the following sensing circuits (refer to Table 1). As explained for the proposed sensing circuit, the sensing circuits for compariT. Heselhaus and T. Noll:circuit A Sensing Circuit for read-ports Single-Ended Read-Ports T. Heselhaus and T. G. Noll: A G. sensing for single-ended of SRAM cells . . . 4250 V m,mx / mV = 40nm erenc the re gate w transi signed Designation shows 800 The proposed sense circuit as shown in Fig. 2 the re A B A domino style sensing of the local bit-line m with a time t M5 = 420nm PMOS transistor, which implicitly implements the digit and E 750 = 70nm 650 Increasing the bit-line noisevariations margin was simulated by of enWhen considering process of all transistors larging the gate length M5 up tothe lM5fraction = 420nm forvoltage a series the sensing circuit this of also affects of the Figure 5 shows thekeeping simulated ofdivider MonteM5/M6. Carlo simulations, while thecorrelation gate lengthofof thefixed sensing level variation to the pre-charge level M6 to lthreshold = 70nm. The results shown in Fig. 6 exhibit M6 The left side ofpre-charge Fig. 5 shows theinvoltage levelto a variation. similar distribution of the level correlation distributions, where all transistors of the sensing circuit in the sensing threshold voltage like depicted on the right diaFig. 2 are designed with a gate lengths of l = l = 40nm. M5 M6 gram in Fig. 5. 
However, the dot clouds are shifted to higher Since device mismatches of the voltage dividerl transistors pre-charge levels with increasing gate length M5 . The elM5 and M6 disturb the correlation of the sensingerror threshold lipses inside the dot clouds denote the standard ellipse level to the pre-charge level, a reduction of the variability of the two dimensional density function. The resulting of imM5 and M6 can improve the correlation. Choosing a gate provement of the mean bit-line noise margin with increasing length of lM5 = lM6 = 70nm for the voltage divider M5/M6 the gate length lM5 = 70nm → {140nm,280nm,420nm} are improves the correlation coefficient to ρ = 0.8 compared to respectively ∆VBL,margin = {35mV,100mV,150mV}. ρ = 0.65 (for lM5 = lM6 = 40nm). Reducing the variability sensing circuitthe wasstandard compared to the followofThe the proposed voltage divider reduces deviation of the ing sensing circuits (refer to Table 1). As explained the access time from 10.1% to 9%, while the mean accessfor time proposed sensing circuit, the sensing circuits for compariremains nearly constant. V m,mx / mV line multiplex (see Fig. 7) (Takeda et al., 2004) M5 = 280nm Sensing with an inverter followed by an NMOS transistor for digit line multiplex (Cosemans et al., 2007). A similar 700 550 sensing circuit implies a static CMOS NAND gate instead replacements of an inverter for early digit line multiplex (Chang et al., M5 = 140nm 650 2007). 300 400 500 300 400 500 m D An AC coupled sense amplifier as PMOS replacement in a PSfrag replacements V DD − |Vth | / mV V DD − |Vth | / mV PSfrag replacements domino read path (Qazi et al., 2010). For the reduced bitPearson ρ = 0.65 Pearson ρ = 0.8 600 line swing due to word-line pulsing 400mV is assumed. To mean(t ACC ) = 150ps mean(t ACC ) = 152ps compare this circuit, the driving part for the global bit-line std(t ACC ) = 10.1% std(t ACC ) = 9.0% g 300 350 400 450 500 550 is included. V DD − |Vth | / mV Fig. 
level correlation correlationtotoM3-threshold-level M3-threshold-level E An AC coupled sense amplifier (Verma and Chandrakasan, Fig.5.5.Simulated Simulatedpre-charge pre-charge level variations runs for for two twodifferent differentgate gatelengths lengths variationsof of1000 1000 Monte Monte Carlo Carlo runs 2009). For the reduced bit-line swing due to word-line 6. Monte Carlo simulated pre-charge level for different lM5 in ofofthe (25°C, VVDD =900mV). 900mV). thevoltage voltagedivider divider M5/M6 M5/M6 (25°C, pulsing 200mV ispre-charge assumed. DD= Fig. 6.Fig. Monte Carlo simulated level for different lM5 in correlation to the M3-threshold-level (lM6 = 70nm, 1000 iterations, Fig. 7 correlation to the M3-threshold-level (lM6 = 70nm, 1000 iterations, 25°C, VDD = 900mV). 1. Proposed and reference sensing circuits for comparison. sensin 25°C, VTable = 900mV). DD 600 C Table 1. Proposed and reference sensing circuits for comparison. V m,mx / mV son are simulated using transistor schematics with estimated parasitics for their internal nodes (0.4...0.7fF). The transisDesignation tors are minimally dimensioned (l = 40nm, w = 120nm), exA senseand circuit as shown in Fig. 2 ceptThe forproposed pre-charge other driving devices (w = 480nm). B A domino style sensing of the local bit-line m with a PMOS For any number of cells N ∈ {8,16,32,64,128} the extracted transistor, which implicitly implements the digit line mulparasitics from a 7) column 8T-SRAM cells as tiplex (see Fig. (Takedaofetconventional al., 2004) shown in Fig. 1 are attached to the bit-line of each refC Sensing with an inverter followed by an NMOSm transistor erence circuit schematic. The circuit simulations of the reffor digit line multiplex (Cosemans et al., 2007). 
A similar sensing circuitare implies a static NAND gate instead erence circuits applied for CMOS the same simulation corners, of an inverter for early digit line multiplex (Chang et al., temperature and supply voltage as for the proposed sensing 2007). circuit. The cycle period was 2500ns. Fig. 7 shows the refD An AC coupled sense amplifier as PMOS replacement in a erence circuits B and C as implemented for comparison. For domino read path (Qazi et al., 2010). For the reduced bitthe reference circuits domino-read and standard inverter line swing due to word-line pulsing 400mV is assumed. To the gatecompare widths this of the sensing transistor M3 and the pre-charge circuit, the driving part for the global bit-line transistor M4 were set to w = 480nm, all others were deis included. Increasing the bit-line noise margin was simulated by enEsigned An AC coupled sense gate amplifier (Verma and=Chandrakasan, with minimal width of w 120nm. Table 2 larging the gate length of M5 up to lM5 = 420nm for a series 2009). For the reduced bit-line swing due tocircuit word-line shows the simulated results of the proposed A versus of Monte Carlo simulations, while keeping the gate length of pulsing 200mV is assumed. 800 the reference circuits B. . . E for N = 128 cells. The access M6 fixed to lM6 = 70nm. The results shown in Fig. 6 exhibit time tACC of the proposed circuit A is faster than B, C, D M5 = 420nm a similar distribution of the pre-charge level in correlation to and E for N = 128 cells. The access time tACC consists of a the sensing 750 threshold voltage like depicted on the right diagram in Fig. 5. However, the dot clouds are shifted to higher tors are minimally dimensioned (l = 40nm, w = 120nm), ex M5 = 280nm pre-charge levels with increasing gate length lM5 . The elcept for rpre-chargeϕ and other drivingrdevices (w = 480nm). 
ϕ 700 the dot clouds denote the standard error ellipse lipses inside For any number of cells N ∈ {8,16,32,64,128} the extracted of the two dimensional density function. The resulting imM4 parasitics fromM4 a column of conventional 8T-SRAM cells as M5 = 140nm provement shown in Fig. 1 are attached to the bit-line m ofM3each ref650of the mean bit-line noise margin with increasing the gate length lM5 = 70nm → {140nm,280nm,420nm} are m simulations of the referencemcircuit schematic. The circuit PSfrag replacements respectively 1VBL,margin = {35mV,100mV,150mV}. Sfrag replacements erence circuits are applied for the same simulation corners, 600 M3 voltage temperature and supply as for the proposed sensThe proposed sensing circuit was compared to the followg g ing sensing300 circuits (refer to Table 1). As explained for the ing circuit. The cycle period was 2500ps. Figure 7 shows 350 400 450 500 550 proposed sensing circuit, the sensing circuits for comparithe reference circuits B and C as implemented for comparV DD − |Vth | / mV son are simulated using transistor schematics with estimated ison. For the reference circuits domino-read and standard Inverter parasitics for their internal nodes (0.4...0.7fF). The transisinverter Domino-Read the gate widths of the sensingStandard transistor M3 and the Fig. 6. Monte Carlo simulated pre-charge level for different lM5 in correlation to the M3-threshold-level (lM6 = 70nm, 1000 iterations, 25°C, Adv.VRadio Sci., 9, 247–253, 2011 DD = 900mV). Fig. 7. Schematics of the domino-read (B) and inverter-read (C) sensing circuits for comparison. www.adv-radio-sci.net/9/247/2011/ 0nm shows the simulated results of the proposed circuit A versus proposed sensing circuit dissipates the least amount energy the reference circuits B. . . E for N = 128 cells. The access and shows the fastest access time compared to all reference time tACC of the proposed circuit A is faster than B, C, D circuits for N ∈ [64...128]. 
and E for N =and 128 The tACC consists of aread-ports of SRAM cells T. Heselhaus T. cells. G. Noll: A access sensingtime circuit for single-ended 251 circu gray To signa 0nm ϕ r M4 in tions, 2 A (prop.) M3 replacements (b) t ACC 2 M4 0nm M5 (a) E tot ϕ r m 1.5 B 1.5 m m M3 C 1 g Domino-Read D PSfrag replacements 1 g PSfrag replacements0.5 V m,mx / mV V DD − |Vth | / mV 0 Standard Inverter E 8 16 32 64 128 0.5 N 8 16 32 64 128 Fig.7.7. Schematics Schematics of (B)(B) andand inverter-read (C) (C) Fig. of the thedomino-read domino-read inverter-read sensingcircuits circuits for Fig.8. 8. Simulated Simulated ttACC and Etot for different number of connected sensing for comparison. comparison. Fig. ACC and Etot for different number of connected cells N (normalized to the proposed 125°C, cells N (normalized to the proposedcircuit circuitA,A,slow slowcorner, corner, 125°C, VDD = 900mV). VDD = 900mV). Table 2. Simulated results for N = 128 cells (slow corner, 125°C, VDD = 900mV). tACC / ps tACC,SA / ps ESense / fJ EBL / fJ Etot / fJ VBL,margin / mV Transistor counta A B C D E 300 82 5.08 5.85 10.93 120 10 408 45 1.12 13.40 14.53 181 3 512 53 2.21 13.42 15.62 379 5 401 94 5.58 6.19 11.77 96 13 325 126 9.95 2.94 12.89 53 14 a without transistors of memory cells. Table 3. Comparison of the average access time and relative standard deviation of the proposed sensing circuit (with lM5 = lM6 = 70nm) for N = 64 using 1000 Monte Carlo simulations. )a mean (tACC std (tACC )a mean (tACC )b std (tACC )b A B C D E 152 ps 9.0% 153 ps 9.5% 167 ps 11.1% 164 ps 14.7% 205 ps 9.5% 204 ps 10.6% 169 ps 10.1% 170 ps 10.6% 157 ps 10.6% 158 ps 11.3% a constant temperature 25°C and supply voltage V DD = 900mV. b normally distributed temperature and supply voltage variations, temperature ∈ [−55°C...125°C], VDD ∈ [810mV...990mV]. pre-charge transistor M4 were set to w = 480nm, all others were designed with minimal gate width of w = 120nm. 
Table 2 shows the simulated results of the proposed circuit A versus the reference circuits B...E for N = 128 cells. The access time tACC of the proposed circuit A is shorter than that of B, C, D, and E for N = 128 cells. The access time tACC consists of a bit-line dependent contribution tACC,BL and an access time offset tACC,SA, which depends on the internal complexity of the sensing circuit. Circuits B and C have smaller offsets, but the proposed circuit shows a smaller intrinsic access time tACC,SA than circuits D and E.

The sense error immunity to bit-line noise was simulated with current pulses that inject charge into the bit-line during an evaluation period. Read evaluation faults are detected within an evaluation period of 2500 ps. The tolerable bit-line noise VBL,margin is shown in Table 2 for each circuit.

Table 2 shows that the bit-line energy per cycle of circuit A is similar to that of circuit D. This results from nearly the same bit-line swing achieved in circuits A and D. The energy dissipation ESense of circuit A is less than that of circuits D and E. Since circuits B and C operate with full-swing bit-lines, whose energy grows with the number of connected cells, but dissipate the least energy ESense, there must be a crossover in total energy for a certain number of cells. Figure 8a shows that this crossover point can be identified at N ≥ 64 cells. A similar crossover point for the access time is located at N ≈ 40 cells (see Fig. 8b). Simulations indicate smaller access times of circuit E for N > 256 cells. However, the proposed sensing circuit dissipates the least energy and shows the shortest access time of all compared circuits for N ∈ [64...128].

Table 3 shows the mean access time over 1000 Monte Carlo iterations for each circuit and its standard deviation relative to the mean access time. The proposed sensing circuit provides the fastest mean access time and the lowest access time variation compared to the reference circuits.
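The decomposition of the access time described above can be checked against the Table 2 values with a short script. This is only an illustrative sketch: the dictionary layout and the function names are assumptions made for this example, and the relation Etot = ESense + EBL is inferred from the tabulated numbers and is not stated explicitly in the text.

```python
# Values copied from Table 2 (N = 128 cells, slow corner, 125 °C, VDD = 900 mV).
# Tuple layout (an illustrative choice, not from the paper):
# t_acc [ps], t_acc_sa [ps], e_sense [fJ], e_bl [fJ], e_tot [fJ].
TABLE2 = {
    "A": (300, 82, 5.08, 5.85, 10.93),
    "B": (408, 45, 1.12, 13.40, 14.53),
    "C": (512, 53, 2.21, 13.42, 15.62),
    "D": (401, 94, 5.58, 6.19, 11.77),
    "E": (325, 126, 9.95, 2.94, 12.89),
}


def bitline_access_time(circuit):
    """Return t_ACC,BL = t_ACC - t_ACC,SA in ps for one circuit."""
    t_acc, t_acc_sa, *_ = TABLE2[circuit]
    return t_acc - t_acc_sa


def energy_mismatch(circuit):
    """Return |E_Sense + E_BL - E_tot| in fJ.

    Assumes E_tot is the sum of the two tabulated contributions; any
    non-zero result then comes from rounding in the table.
    """
    _, _, e_sense, e_bl, e_tot = TABLE2[circuit]
    return abs(e_sense + e_bl - e_tot)


if __name__ == "__main__":
    for name in TABLE2:
        print(name, bitline_access_time(name), round(energy_mismatch(name), 2))
```

Under these assumptions, circuit E has the smallest bit-line contribution and the largest offset tACC,SA. This is in line with the remark that E becomes faster for very long bit-lines.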
To implement the proposed sensing circuit in a hierarchical memory architecture, the sensing circuit on all but the last hierarchy level needs a small modification to provide the signaling scheme consisting of a digit-line and a virtual ground line. The modification affects the output stage of the proposed sensing circuit and requires an additional internal inverter for driving the virtual ground switch. Figure 9 shows such a modified sensing circuit and its application in a hierarchical memory architecture. The source terminal of the output transistor is now connected to a virtual ground line xv, which enables the same signaling scheme as used for the modified 8T-SRAM cell. Since the output signal xm now will not fall below the threshold voltage of the virtual ground switch M10, the output signal xm can no longer be used to control the gate of M10. Thus, the control signal for the virtual ground switch has to be generated inside the sensing circuit with an additional inverter (highlighted in Fig. 9 by a gray shaded box).

Fig. 9. The modified sensing circuit with a virtual ground on the output side and its implementation in a hierarchical memory architecture. SA* denotes the modified sensing circuit.

To show the performance improvement of the proposed signaling scheme, two test circuits representing the data signal path of a hierarchical memory architecture were designed and compared by simulations. The first test circuit employs the proposed sensing circuits and pairs of digit-line and virtual ground line on each level of hierarchy. The second test circuit employs a domino-style sensing circuit similar to the left circuit of Fig. 7. However, in domino-style sensing circuits successive stages alternately employ NMOS and PMOS sensing devices. The test circuits are depicted in Fig. 10. For a fair comparison, the corresponding bit-lines, digit-lines, and virtual grounds of the test circuits are equally loaded with parasitic capacitances on each hierarchy level.

Fig. 10. Test circuits for sensing scheme comparison. Test circuit 1 shows the proposed sensing scheme with virtual grounds. Test circuit 2 shows a domino-style sensing scheme. The gray shaded transistors are used to pre-charge the digit-lines and are not relevant for the access time comparison. The dotted box encloses the read-only port of the (modified) 8T-SRAM cell.

Figure 11 shows the simulated waveforms for read-cycles of both sensing circuits. The left read-cycle was simulated for the proposed signaling scheme including the virtual ground concept. This waveform shows the same properties as the waveform in Fig. 3: the virtual grounds are successively released as soon as the data bit is detected. The reduced voltage swing of approximately 475 mV leads to digit-line energy savings of approximately 50%. The right waveform shows the full-swing signals of the domino-style sensing circuits. The access time of the proposed sensing circuit (tACC = 531 ps) is smaller than the access time of the domino-style sensing scheme (tACC = 673 ps). This leads to an access time improvement of approximately 21%.

Fig. 11. Comparison of simulated waveforms for a read-cycle of the proposed sensing circuit on the left and a domino-style full-swing signaling circuit on the right (slow corner, 125°C, VDD = 900 mV).

5 Conclusions

A new sensing circuit for a single-ended read-only-port of SRAM cells is introduced. Instead of pre-charging the bit-line to VDD, the proposed circuit sets the pre-charge level close to the threshold level VDD − |Vth| of the individual sensing device, while ensuring a process variation tolerant bit-line noise margin. With a small modification of the 8T-SRAM cell to provide an additional bit-line v, the charge dissipation of the bit-line m during evaluation can automatically be stopped by the proposed circuit once the data bit is detected by the sensing circuit. Both effects together reduce the bit-line swing and the bit-line power dissipation. For N = 64...128 cells, the proposed circuit achieves the lowest energy dissipation compared to the reference circuits along with good performance improvements. The proposed sensing circuit is also suited for a hierarchical memory architecture utilizing the optimized pre-charge level and virtual ground concepts.
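The figures quoted for the hierarchical comparison (531 ps versus 673 ps, and a swing of approximately 475 mV at VDD = 900 mV) can be reproduced with a few lines of arithmetic. The sketch below is illustrative only. The function names are made up for this example, and the energy model, in which the charge for each transition is drawn from the VDD rail so that the energy scales as C · VDD · ΔV, is an assumption: the text does not state which model underlies the 50% figure.

```python
def relative_improvement(t_ref_ps, t_new_ps):
    """Fractional access-time reduction of the new scheme versus the reference."""
    return (t_ref_ps - t_new_ps) / t_ref_ps


def swing_energy_ratio(swing_mv, vdd_mv):
    """Energy of a reduced-swing line relative to a full-swing line.

    Assumed model (not from the paper): the charge C * dV is drawn from
    the VDD rail, so E = C * VDD * dV and the ratio to the full-swing
    energy C * VDD**2 is dV / VDD.
    """
    return swing_mv / vdd_mv


if __name__ == "__main__":
    gain = relative_improvement(673, 531)
    ratio = swing_energy_ratio(475, 900)
    print(f"access-time improvement: {gain:.1%}")
    print(f"digit-line energy relative to full swing: {ratio:.1%}")
```

Under this assumed model the reduced swing costs about 53% of the full-swing energy, which is roughly consistent with the stated savings of approximately 50%. A different supply arrangement for the pre-charge level would change the ratio.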