UNIT III: SEQUENTIAL LOGIC CIRCUITS

Static Latches and Registers

The Bistability Principle:
• Static memories use positive feedback to create a bistable circuit: a circuit having two stable states that represent 0 and 1.
[Figure: two cascaded inverters and their voltage transfer characteristics]
• The resulting circuit has only three possible operation points (A, B, and C), as demonstrated on the combined VTC.
• Under the condition that the gain of the inverter in the transient region is larger than 1, only A and B are stable operation points, and C is a metastable operation point.
• A bistable circuit has two stable states.
• In the absence of any triggering, the circuit remains in a single state (assuming that the power supply remains applied to the circuit), and hence remembers a value.
• A trigger pulse must be applied to change the state of the circuit.
• Another common name for a bistable circuit is flip-flop.

SR Flip-Flops
• The cross-coupled inverter pair provides an approach to store a binary variable in a stable way.
• However, extra circuitry must be added to enable control of the memory states.
[Figure: NOR-based SR flip-flop]
• When both S and R are 0, the flip-flop is in a quiescent state and both outputs retain their value.
• If a positive (or 1) pulse is applied to the S input, the Q output is forced into the 1 state.
• Vice versa, a 1 pulse on R resets the flip-flop and the Q output goes to 0.
• The characteristic table is the truth table of the gate and lists the output states as functions of all possible input conditions.
• Most systems operate in a synchronous fashion with transition events referenced to a clock.
• One possible realization of a clocked SR flip-flop is a level-sensitive positive latch.
• It consists of a cross-coupled inverter pair, plus 4 extra transistors to drive the flip-flop from one state to another and to provide clocked operation.
• The combination of transistors M4, M7, and M8 forms a ratioed inverter.
• In order to make the latch switch, we must succeed in bringing Q below the switching threshold of the inverter M1-M2.
• Once this is achieved, the positive feedback causes the flip-flop to invert states.
• The presented flip-flop does not consume any static power.

Multiplexer Based Latches
• Multiplexer-based latches can provide similar functionality to the SR latch, but have the important added advantage that the sizing of devices only affects performance and is not critical to the functionality.
• For a negative latch, when the clock signal is low, input 0 of the multiplexer is selected, and the D input is passed to the output.
• When the clock signal is high, input 1 of the multiplexer, which connects to the output of the latch, is selected.
• The feedback holds the output stable while the clock signal is high.
• Similarly, in the positive latch, the D input is selected when the clock is high, and the output is held (using feedback) when the clock is low.
[Figure: transistor-level implementation of a positive latch built using transmission gates]
• When CLK is high, the bottom transmission gate is on and the latch is transparent; that is, the D input is copied to the Q output.
• During this phase, the feedback loop is open since the top transmission gate is off.

Master-Slave Based Edge-Triggered Register
• The most common approach for constructing an edge-triggered register is to use a master-slave configuration.
• The register consists of cascading a negative latch (master stage) with a positive latch (slave stage).
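The multiplexer-based latches and their master-slave composition can be sketched as a zero-delay behavioral model. This is an illustrative sketch only: the names `MuxLatch`, `MasterSlaveRegister`, and `simulate` are hypothetical, and propagation delays, setup/hold times, and the transmission-gate circuit itself are ignored. Each call to `evaluate` stands for one sampling instant at which D and CLK are stable.

```python
class MuxLatch:
    """Zero-delay model of a multiplexer-based latch.

    positive=True:  transparent while clk == 1, holds while clk == 0.
    positive=False: transparent while clk == 0, holds while clk == 1.
    """

    def __init__(self, positive: bool, q: int = 0) -> None:
        self.positive = positive
        self.q = q  # value circulating on the feedback path

    def evaluate(self, d: int, clk: int) -> int:
        transparent = (clk == 1) if self.positive else (clk == 0)
        if transparent:
            self.q = d  # mux selects the D input
        # otherwise the mux selects the fed-back output and q is unchanged
        return self.q


class MasterSlaveRegister:
    """Negative latch (master) cascaded with a positive latch (slave)."""

    def __init__(self) -> None:
        self.master = MuxLatch(positive=False)
        self.slave = MuxLatch(positive=True)

    def evaluate(self, d: int, clk: int) -> int:
        qm = self.master.evaluate(d, clk)    # QM follows D while clk == 0
        return self.slave.evaluate(qm, clk)  # Q follows QM while clk == 1


def simulate(samples):
    """Apply a sequence of (d, clk) samples and return Q after each one."""
    reg = MasterSlaveRegister()
    return [reg.evaluate(d, clk) for d, clk in samples]


if __name__ == "__main__":
    trace = [(1, 0), (0, 0), (1, 0), (1, 1), (0, 1), (0, 0), (0, 1)]
    print(simulate(trace))  # [0, 0, 0, 1, 1, 1, 0]
```

In the trace, D changes both while CLK is high and while it is low, but Q only takes on the value the master held at each rising edge, which is the edge-triggered behavior the master-slave configuration is meant to provide.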
Master-Slave Based Edge-Triggered Register
• On the low phase of the clock, the master stage is transparent and the D input is passed to the master stage output, QM.
• During this period, the slave stage is in the hold mode, keeping its previous value using feedback.
• On the rising edge of the clock, the master stage stops sampling the input, and the slave stage starts sampling.
• During the high phase of the clock, the slave stage samples the output of the master stage (QM), while the master stage remains in a hold mode.
• When the clock is low (inverted clock CLKB = 1), T1 is on and T2 is off, and the D input is sampled onto node QM.
• When the clock goes high, the master stage stops sampling the input and goes into a hold mode.

Low-Voltage Static Latches
• The scaling of supply voltages is critical for low-power operation.
• Unfortunately, certain latch structures don't function at reduced supply voltages.
• Scaling to low supply voltages hence requires the use of reduced-threshold devices, which increases subthreshold leakage.
• When the registers are constantly accessed, the leakage energy is typically insignificant compared to the switching power.
• However, with the use of conditional clocks, registers may be idle for extended periods, and the leakage energy they expend can be quite significant.
• Many solutions are being explored to address the problem of high leakage during idle periods.

Dynamic Latches and Registers
• Storage in a static sequential circuit relies on the concept that a cross-coupled inverter pair produces a bistable element and can thus be used to memorize binary values.
• The major disadvantage of the static gate, however, is its complexity.
• Dynamic latches and registers instead store the value temporarily as charge on parasitic capacitors. The principle is identical to the one used in dynamic logic: charge stored on a capacitor can be used to represent a logic signal.
• The absence of charge denotes a 0, while its presence stands for a stored 1.
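Because a dynamic node holds its value only as charge, leakage gradually changes that charge, which limits how long the value survives. A first-order estimate models the leakage as a constant current I_leak discharging the storage capacitance C, so the droop after time t is ΔV = I_leak·t/C. The sketch below turns that into a maximum hold time and a corresponding minimum clock frequency. The numbers (10 fF, 0.2 V tolerable droop, 0.1 nA leakage) are purely illustrative assumptions, as is the premise that the node must hold for one clock phase, taken here as half a period.

```python
def max_hold_time(c_node: float, dv_max: float, i_leak: float) -> float:
    """Seconds until a constant leakage current i_leak (A) changes the
    voltage on a node of capacitance c_node (F) by dv_max (V)."""
    return c_node * dv_max / i_leak


def min_clock_frequency(t_hold: float) -> float:
    """Lowest clock frequency (Hz) if the node must hold for one phase,
    assumed here to be half of the clock period."""
    return 1.0 / (2.0 * t_hold)


if __name__ == "__main__":
    # Illustrative (assumed) values, not taken from a specific process.
    c_node = 10e-15   # 10 fF storage capacitance
    dv_max = 0.2      # tolerable droop before the stored level is misread
    i_leak = 0.1e-9   # 0.1 nA total leakage
    t_hold = max_hold_time(c_node, dv_max, i_leak)
    print(f"max hold time: {t_hold * 1e6:.1f} us")
    print(f"min clock frequency: {min_clock_frequency(t_hold) / 1e3:.1f} kHz")
```

Real leakage depends on voltage and temperature, so this constant-current figure is only a rough first-order bound.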
Dynamic Transmission-Gate Based Edge-Triggered Registers
• When CLK = 0, the input data is sampled on storage node 1, which has an equivalent capacitance C1 consisting of the gate capacitance of I1, the junction capacitance of T1, and the overlap gate capacitance of T1.
• During this period, the slave stage is in a hold mode, with node 2 in a high-impedance (floating) state.
• On the rising edge of the clock, the transmission gate T2 turns on, and the value sampled on node 1 right before the rising edge propagates to the output Q (note that node 1 is stable during the high phase of the clock since the first transmission gate is turned off).
• Node 2 now stores the inverted version of node 1.
• This implementation of an edge-triggered register is very efficient, as it requires only 8 transistors.

C²MOS Dynamic Register: A Clock-Skew-Insensitive Approach
[Figure: the C²MOS register]
• CLK = 0 (inverted clock CLKB = 1):
– The first tri-state driver is turned on, and the master stage acts as an inverter sampling the inverted version of D on the internal node X.
– The master stage is in the evaluation mode.
– Meanwhile, the slave section is in a high-impedance mode, or in a hold mode.
• The roles are reversed when CLK = 1.

True Single-Phase Clocked Register (TSPCR)
• In the two-phase clocking schemes described above, care must be taken in routing the two clock signals to ensure that overlap is minimized.
• The True Single-Phase Clocked Register (TSPCR) uses a single clock (without an inverse clock).
• For the positive latch, when CLK is high, the latch is in the transparent mode and corresponds to two cascaded inverters; the latch is non-inverting and propagates the input to the output.
• When CLK = 0, both inverters are disabled, and the latch is in hold mode.
• Only the pull-up networks are still active, while the pull-down circuits are deactivated.
• A register can be constructed by cascading positive and negative latches.
• The main advantage is the use of a single clock phase.
• The disadvantage is the slight increase in the number of transistors: 12 transistors are required.
• TSPC offers an additional advantage: the possibility of embedding logic functionality into the latches.
• This reduces the delay overhead associated with the latches.
• When CLK = 0, the input inverter is sampling the inverted D input on node X.
• The second (dynamic) inverter is in the precharge mode.
• The third inverter is in the hold mode.

Pulse Registers
• A fundamentally different approach for constructing a register uses pulse signals.
• The idea is to construct a short pulse around the rising (or falling) edge of the clock.
• This pulse acts as the clock input to a latch, sampling the input only in a short window.
• Race conditions are thus avoided by keeping the opening time (i.e., the transparent period) of the latch very short.
• The combination of the glitch-generation circuitry and the latch results in a positive edge-triggered register.
• This in turn activates MN, pulling X and eventually CLKG low.
• The length of the pulse is controlled by the delay of the AND gate and the two inverters.
• The advantage of the approach is the reduced clock load and the small number of transistors required.
• The glitch-generation circuitry can be amortized over multiple register bits.
• The disadvantage is a substantial increase in verification complexity.
• This has prevented widespread use.

Sense-Amplifier Based Registers
• A sense-amplifier structure can also be used to implement an edge-triggered register.
• Sense-amplifier circuits accept small input signals and amplify them to generate rail-to-rail swings.
• There are many techniques to construct these amplifiers, such as the use of feedback (e.g., cross-coupled inverters).
[Figure: positive edge-triggered register based on a sense amplifier]
• The circuit uses a precharged front-end amplifier that samples the differential input signal on the rising edge of the clock signal.
• The outputs of the front end are fed into a NAND cross-coupled SR flip-flop that holds the data and guarantees that the differential outputs switch only once per clock cycle.
• The differential inputs in this implementation don't have to have rail-to-rail swing, and hence this register can be used as a receiver for a reduced-swing differential bus.

Pipelining: An Approach to Optimize Sequential Circuits
• Pipelining is a popular design technique often used to accelerate the operation of the datapaths in digital processors.
• The goal of the presented circuit is to compute log(|a + b|), where both a and b represent streams of numbers; that is, the computation must be performed on a large set of input values.
• The minimal clock period $T_{min}$ necessary to ensure correct evaluation is given as:
$$T_{min} = t_{c-q} + t_{pd,logic} + t_{su}$$
• Here $t_{c-q}$ and $t_{su}$ are the propagation delay and the setup time of the register, respectively.
• The term $t_{pd,logic}$ stands for the worst-case delay path through the combinational network, which consists of the adder, absolute value, and logarithm functions.
• Pipelining is a technique to improve resource utilization and increase functional throughput.
• The advantage of pipelined operation becomes apparent when examining the minimum clock period of the modified circuit.
• The combinational circuit block has been partitioned into three sections, each of which has a smaller propagation delay than the original function.
• This effectively reduces the value of the minimum allowable clock period:
$$T_{min,pipe} = t_{c-q} + \max(t_{pd,add},\, t_{pd,abs},\, t_{pd,log}) + t_{su}$$
• Suppose that all logic blocks have approximately the same propagation delay, and that the register overhead is small with respect to the logic delays.
• Under these assumptions, the pipelined network outperforms the original circuit by a factor of three, or $T_{min,pipe} = T_{min}/3$.
• The increased performance comes at the relatively small cost of two additional registers and an increased latency.

Latch- vs. Register-Based Pipelines
• Pipelined circuits can be constructed using level-sensitive latches instead of edge-triggered registers.
• The pipeline system is implemented based on pass-transistor-based positive and negative latches instead of edge-triggered registers.
• That is, logic is introduced between the master and slave latches of a master-slave system.
• Latch-based systems give significantly more flexibility in implementing a pipelined system and often offer higher performance.
[Figure: operation of a two-phase pipelined circuit using dynamic registers]

NORA-CMOS: A Logic Style for Pipelined Structures
• A latch-based pipeline can also be implemented with C²MOS latches. This topology has one important property:
• A C²MOS-based pipelined circuit is race-free as long as all the logic functions F (implemented using static logic) between the latches are noninverting.
• With clock overlap, the only way a signal can race from stage to stage is when the logic function F is inverting, as when F is replaced by a single static CMOS inverter.
• Logic and latch are clocked in such a way that both are simultaneously in either evaluation or hold (precharge) mode.
• A block that is in evaluation during CLK = 1 is called a CLK-module, while the inverse (evaluating during CLK = 0) is called a CLKB-module.
• A NORA datapath consists of a chain of alternating CLK and CLKB modules.
• While one class of modules is precharging with its output latch in hold mode, preserving the previous output value, the other class is evaluating.

Memory Architecture

Semiconductor Memory Classification
• Read-Write Memory
– Random access: SRAM, DRAM
– Non-random access: FIFO, LIFO, Shift Register, CAM
• Non-Volatile Read-Write Memory: EPROM, E2PROM, FLASH
• Read-Only Memory: Mask-Programmed, Programmable (PROM)

Memory Timing: Definitions
[Figure: memory timing definitions]

Memory Architecture: Decoders
• Intuitive architecture for an N × M memory: N words of M bits each, with one select signal (S0 to SN-1) per word.
– Too many select signals: N words require N select signals.
• A decoder reduces the number of select signals: $K = \log_2 N$ address bits (A0 to AK-1) select one of the N words.

Content-Addressable Memory
[Figure: CAM architecture: a CAM array of 2^9 words × 64 bits with I/O buffers for commands and 64-bit data, comparand and mask registers, validity bits, a priority encoder, an address decoder with a 9-bit address, and control logic with an R/W input]

Memory Timing: Approaches
• DRAM timing: multiplexed addressing.
• SRAM timing: self-timed.

Read-Only Memory Cells
[Figure: ROM cells storing 1 and 0: diode ROM, MOS ROM 1, and MOS ROM 2]

MOS OR ROM
[Figure: 4 × 4 MOS OR ROM with word lines WL[0] to WL[3], bit lines BL[0] to BL[3], and pull-down loads biased by Vbias]

MOS NOR ROM
[Figure: 4 × 4 MOS NOR ROM with pull-up devices on bit lines BL[0] to BL[3] and word lines WL[0] to WL[3]]

MOS NAND ROM
[Figure: 4 × 4 MOS NAND ROM with pull-up devices on bit lines BL[0] to BL[3] and word lines WL[0] to WL[3]]
• All word lines are high by default, with the exception of the selected row.

Equivalent Transient Model for MOS NOR ROM
[Figure: NOR ROM model: word line with distributed rword and cword, bit line loaded by Cbit]
• Word line parasitics
– Wire capacitance and gate capacitance
– Wire resistance (polysilicon)
• Bit line parasitics
– Resistance not dominant (metal)
– Drain and gate-drain capacitance

Equivalent Transient Model for MOS NAND ROM
[Figure: NAND ROM model: word line with rword and cword, bit line with rbit, cbit, and load CL]
• Word line parasitics
– Similar to NOR ROM
• Bit line parasitics
– Resistance of cascaded transistors dominates
– Drain/source and complete gate capacitance

Non-Volatile Memories

The Floating-Gate Transistor (FAMOS)
[Figure: device cross-section (gate G over a floating gate, oxide thickness tox, n+ source S and drain D in a p substrate) and schematic symbol]

Floating-Gate Transistor Programming
• Avalanche injection: a high voltage (20 V) between the source and the gate-drain terminals injects electrons onto the floating gate.
• Removing the programming voltage leaves the charge trapped.
• Programming results in a higher VT.

Flash EEPROM
[Figure: Flash EEPROM cell: control gate above a floating gate, thin tunneling oxide over the p-substrate, n+ source and n+ drain, with programming and erasure paths indicated. Many other options exist.]

Basic Operations in a NOR Flash Memory
[Figures: erase, write, and read operations]

NAND Flash Memory
[Figure: NAND flash layout showing the word line (poly), unit cell, and source line (diffusion layer). Courtesy Toshiba.]

Read-Write Memories (RAM)
• STATIC (SRAM)
– Data stored as long as supply is applied
– Large (6 transistors/cell)
– Fast
– Differential
• DYNAMIC (DRAM)
– Periodic refresh required
– Small (1-3 transistors/cell)
– Slower
– Single ended

6-Transistor CMOS SRAM Cell
[Figure: 6-transistor CMOS SRAM cell with transistors M1 to M6, word line WL, complementary bit lines BL, and complementary storage nodes Q]

CMOS SRAM Analysis (Read)
[Figure: read analysis of a cell storing Q = 0, with both bit lines at VDD and loaded by Cbit]

CMOS SRAM Analysis (Write)
[Figure: write analysis, with one bit line driven to 1 and the other to 0 to flip the stored value]

3-Transistor DRAM Cell
[Figure: 3T DRAM cell (M1 to M3, storage capacitance CS at node X, write and read word lines WWL and RWL, bit lines BL1 and BL2) and its timing]
• No constraints on device ratios.
• Reads are non-destructive.
• The value stored at node X when writing a "1" is $V_{WWL} - V_{Tn}$.

1-Transistor DRAM Cell
• Write: CS is charged or discharged by asserting WL and BL.
• Read: charge redistribution takes place between the bit line and the storage capacitance CS:
$$\Delta V = V_{BL} - V_{PRE} = (V_{BIT} - V_{PRE}) \frac{C_S}{C_S + C_{BL}}$$
• The voltage swing is small; typically around 250 mV.
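The charge-redistribution read-out of the 1T cell can be evaluated numerically. The values below are illustrative assumptions, not data from these notes: VDD = 2.5 V, the bit line precharged to VDD/2, a stored 1 at the full 2.5 V (as if the word line were bootstrapped), CS = 50 fF, and CBL = 200 fF. With these, the charge-transfer ratio CS/(CS + CBL) is 0.2 and the swing is ±250 mV, in line with the typical value quoted above. The function name `bitline_swing` is hypothetical.

```python
def bitline_swing(v_bit: float, v_pre: float, c_s: float, c_bl: float) -> float:
    """Bit-line voltage change after charge sharing between the storage
    capacitance c_s (at v_bit) and the bit line c_bl (precharged to v_pre)."""
    return (v_bit - v_pre) * c_s / (c_s + c_bl)


if __name__ == "__main__":
    VDD = 2.5
    V_PRE = VDD / 2               # assumed half-VDD precharge
    C_S, C_BL = 50e-15, 200e-15   # assumed cell and bit-line capacitances
    for stored, v_bit in (("1", VDD), ("0", 0.0)):
        dv = bitline_swing(v_bit, V_PRE, C_S, C_BL)
        print(f"stored {stored}: dV = {dv * 1e3:+.0f} mV")
```

Increasing CBL relative to CS shrinks the swing proportionally, which is why the 1T cell needs a sense amplifier on every bit line.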
DRAM Cell Observations
• 1T DRAM requires a sense amplifier for each bit line, due to charge-redistribution read-out.
• DRAM memory cells are single ended, in contrast to SRAM cells.
• The read-out of the 1T DRAM cell is destructive; read and refresh operations are necessary for correct operation.
• Unlike the 3T cell, the 1T cell requires the presence of an extra capacitance that must be explicitly included in the design.
• When writing a "1" into a DRAM cell, a threshold voltage is lost. This charge loss can be circumvented by bootstrapping the word lines to a value higher than VDD.

Static CAM Memory Cell
[Figure: static CAM cell (transistors M1 to M9) on complementary bit lines and the word line; compare transistors pull down a wired-NOR match line]

CAM in Cache Memory
[Figure: cache built from a CAM array (address tag) and an SRAM array (data), with address decoder, input drivers, hit logic, sense amplifiers, and Hit, R/W, and Data signals]

Row Decoders
• Collection of 2^M complex logic gates
• Organized in a regular and dense fashion
[Figure: (N)AND decoder and NOR decoder]

Hierarchical Decoders
• Multi-stage implementation improves performance.
[Figure: NAND decoder for word lines WL0, WL1, and so on, using 2-input pre-decoders on address pairs A0A1 and A2A3]

Dynamic Decoders
[Figure: 2-input NOR and 2-input NAND dynamic decoders with precharge devices, driving WL0 to WL3]

4-Input Pass-Transistor Based Column Decoder
[Figure: bit lines BL0 to BL3 connected to D through pass transistors selected by S0 to S3 from a 2-input NOR decoder on A0 and A1]
• Advantage: speed (tpd does not add to the overall memory access time).
• Only one extra transistor in the signal path.
• Disadvantage: large transistor count.

4-to-1 Tree-Based Column Decoder
[Figure: binary tree of pass transistors controlled by A0 and A1 and their complements, selecting one of BL0 to BL3 onto D]
• The number of devices is drastically reduced.
• Delay increases quadratically with the number of sections; prohibitive for large decoders.
• Solutions: buffers, progressive sizing, or a combination of tree and pass-transistor approaches.

Decoder for Circular Shift Register
[Figure: word lines WL0, WL1, WL2, and so on, driven by a ring of clocked (φ) shift-register stages with reset R]

Sense Amplifiers
$$t_p = \frac{C \cdot \Delta V}{I_{av}}$$
• C is large and Iav is small, so make ΔV as small as possible.
• Idea: use a sense amplifier to turn a small transition at its input into a full-swing output.

Differential Sense Amplifier
[Figure: differential sense amplifier (M1 to M5) with complementary bit inputs, enable SE, and output Out]
• Directly applicable to SRAMs.

Differential Sensing: SRAM
[Figure: (a) SRAM sensing scheme with precharge (PC), equalization (EQ), word line WLi, and a differential sense amplifier enabled by SE; (b) two-stage differential amplifier]

Latch-Based Sense Amplifier (DRAM)
• Initialized in its metastable point with EQ.
• Once an adequate voltage gap is created, the sense amp is enabled with SE.
• Positive feedback quickly forces the output to a stable operating point.

Sources of Power Dissipation in Memories
$$I_{DD} = \sum C_i \Delta V_i f + \sum I_{DCP}$$
[Figure: chip power components: array (m selected cells drawing i_act, m(n − 1) non-selected cells drawing i_hld), row decoder (n C_DE V_INT f), column decoder (m C_DE V_INT f), periphery (C_PT V_INT f), and DC currents I_DCP. From [Itoh00].]

Suppressing Leakage in SRAM
[Figure: suppressing SRAM leakage by inserting extra resistance (sleep transistors creating internal rails VDD,int and VSS,int) or by reducing the supply voltage (switching to VDDL through a low-threshold transistor)]

Clocking
• Synchronous systems use a clock to keep operations in sequence
– Distinguish this from previous or next
– Determine speed at which machine operates
• Clock must be distributed to all the sequencing elements
– Flip-flops and latches
• Also distribute clock to other elements
– Domino circuits and memories

Clock Distribution
• On a small chip, the clock distribution network is just a wire
– And possibly an inverter for clkb
• On practical chips, the RC delay of the wire resistance and gate load is very long
– Variations in this delay cause the clock to reach different elements at different times
– This is called clock skew
• Most chips use repeaters to buffer the clock and equalize the delay
– Reduces but doesn't eliminate skew

Review: Skew Impact
[Figure: flip-flops F1 and F2 separated by combinational logic, with timing diagrams for the setup and hold constraints under skew]
• Ideally the full cycle is available for work
• Skew adds sequencing overhead:
$$t_{pd} \le T_c - \underbrace{(t_{pcq} + t_{setup} + t_{skew})}_{\text{sequencing overhead}}$$
• Skew increases the hold time too:
$$t_{cd} \ge t_{hold} - t_{ccq} + t_{skew}$$
• Reduce clock skew
– Careful clock distribution network design
– Plenty of metal wiring resources
• Analyze clock skew
– Only budget actual, not worst-case, skews
– Local vs. global skew budgets
• Tolerate clock skew
– Choose circuit structures insensitive to skew

Skew Tolerance
• Flip-flops are sensitive to skew because of hard edges
– Data launches at the latest rising edge of the clock
– Must set up before the earliest next rising edge of the clock
– Overhead would shrink if we could soften the edge
• Latches tolerate moderate amounts of skew
– Data can arrive any time the latch is transparent

Skew: Latches
[Figure: 2-phase latch pipeline L1, L2, L3 clocked by φ1 and φ2, with combinational logic 1 and 2 between them]
• 2-phase latches:
$$t_{pd} \le T_c - \underbrace{2 t_{pdq}}_{\text{sequencing overhead}}$$
$$t_{cd1},\, t_{cd2} \ge t_{hold} - t_{ccq} - t_{nonoverlap} + t_{skew}$$
$$t_{borrow} \le \frac{T_c}{2} - (t_{setup} + t_{nonoverlap} + t_{skew})$$
• Pulsed latches:
$$t_{pd} \le T_c - \underbrace{\max(t_{pdq},\, t_{pcq} + t_{setup} - t_{pw} + t_{skew})}_{\text{sequencing overhead}}$$
$$t_{cd} \ge t_{hold} + t_{pw} - t_{ccq} + t_{skew}$$
$$t_{borrow} \le t_{pw} - (t_{setup} + t_{skew})$$

Dynamic Circuit Review
• Static circuits are slow because fat pMOS transistors load the input
• Dynamic gates use precharge to remove pMOS transistors from the inputs
– Precharge: φ = 0, output forced high
– Evaluate: φ = 1, output may pull low
[Figure: static and dynamic implementations of a gate with inputs A, B, C, D and output Y]

Domino Circuits
• Dynamic inputs must monotonically rise during evaluation
– Place an inverting stage between each dynamic gate
– Dynamic/static pair called a domino gate
• Domino gates can be safely cascaded
[Figure: domino AND gate: dynamic NAND followed by a static inverter]

Clock Skew
• Skew increases sequencing overhead
– Traditional domino has hard edges
– Evaluate at the latest rising edge
– Set up at the latch by the earliest falling edge
$$t_{pd} \le T_c - 2t_{setup} - 2t_{skew}$$
[Figure: traditional domino pipeline with a latch at each half-cycle boundary]

Time Borrowing
• Logic may not exactly fit a half-cycle
– No flexibility to borrow time to balance logic between half cycles
• Traditional domino sequencing overhead is about 25% of cycle time in fast systems!

Skew-Tolerant Domino
• Use overlapping clocks to eliminate latches at phase boundaries
– Second phase evaluates using results of first
[Figure: two-phase skew-tolerant domino, φ1 and φ2, with no latch at the phase boundary]

Full Keeper
• After the second phase evaluates, the first phase precharges
• Input to the second phase falls
– Violates monotonicity?
• But we no longer need the value
• Now the second gate has a floating output
– Need a full keeper to hold it either high or low
[Figure: dynamic gate with output X and weak full-keeper transistors]

Time Borrowing
• Overlap can be used to
– Tolerate clock skew
– Permit time borrowing
• No sequencing overhead:
$$t_{pd} \le T_c, \qquad t_{borrow} + t_{skew} \le t_{overlap}$$
[Figure: two-phase skew-tolerant domino pipeline, Phase 1 and Phase 2]

Multiple Phases
• With more clock phases, each phase overlaps more
– Permits more skew tolerance and time borrowing
[Figure: four-phase skew-tolerant domino with clocks φ1 to φ4]

Clock Generation
[Figure: generation of clocks φ1 to φ4 from clk and an enable en]

Timing Issues
• Setup and hold time:
• Every flip-flop has restrictive time regions around the active clock edge in which the input should not change.
• We call them restrictive because if the input changes in these regions, the output may not be the expected one.
• It may be derived from the old input, the new input, or even something in between the two.
• The setup time is the interval before the clock edge during which the data must be held stable.
• The hold time is the interval after the clock edge during which the data must be held stable.
• Hold time can be negative, which means the data can change slightly before the clock edge and still be properly captured.
• Most current-day flip-flops have zero or negative hold time.
• To avoid setup time violations:
– Optimize the combinational logic between the flip-flops for minimum delay.
– Redesign the flip-flops to get a smaller setup time.
– Tweak the launch flip-flop to have a better slew at its clock pin; this makes the launch flip-flop faster, thereby helping to fix setup violations.
– Use clock skew deliberately (useful skew).
• To avoid hold time violations:
– Add delay (using buffers).
– Add lock-up latches (in cases where the hold time requirement is very large, basically to avoid data slip).
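The flip-flop constraints from the skew discussion, t_pd ≤ Tc − (t_pcq + t_setup + t_skew) and t_cd ≥ t_hold − t_ccq + t_skew, can be turned into setup and hold slacks for a single register-to-register path. The sketch below is a minimal illustration with hypothetical delay values in picoseconds; the `FlopPath` type and the helper names are assumptions made for this example, not part of any timing tool. It also estimates how many delay buffers would clear a hold violation, matching the "add delay (using buffers)" remedy above, assuming each buffer contributes a fixed delay t_buf.

```python
import math
from dataclasses import dataclass


@dataclass
class FlopPath:
    """Delays (ps) of one flip-flop to flip-flop path; names are illustrative."""
    t_pcq: float    # clock-to-Q propagation delay of the launching flop
    t_ccq: float    # clock-to-Q contamination delay of the launching flop
    t_setup: float  # setup time of the capturing flop
    t_hold: float   # hold time of the capturing flop
    t_pd: float     # longest (propagation) delay through the logic
    t_cd: float     # shortest (contamination) delay through the logic


def setup_slack(p: FlopPath, t_c: float, t_skew: float) -> float:
    """Margin in t_pd <= Tc - (t_pcq + t_setup + t_skew); negative means a violation."""
    return t_c - (p.t_pcq + p.t_setup + t_skew) - p.t_pd


def hold_slack(p: FlopPath, t_skew: float) -> float:
    """Margin in t_cd >= t_hold - t_ccq + t_skew; negative means a violation."""
    return p.t_cd - (p.t_hold - p.t_ccq + t_skew)


def buffers_for_hold(p: FlopPath, t_skew: float, t_buf: float) -> int:
    """Buffers of delay t_buf needed on the short path to clear a hold violation."""
    deficit = -hold_slack(p, t_skew)
    return 0 if deficit <= 0 else math.ceil(deficit / t_buf)


if __name__ == "__main__":
    path = FlopPath(t_pcq=50, t_ccq=35, t_setup=30, t_hold=20, t_pd=350, t_cd=10)
    print("setup slack:", setup_slack(path, t_c=500, t_skew=40))
    print("hold slack:", hold_slack(path, t_skew=40))
    print("buffers needed:", buffers_for_hold(path, t_skew=40, t_buf=10))
```

With these numbers the path meets setup with 30 ps to spare but misses hold by 15 ps, so two 10 ps buffers are needed. If the buffers are placed where they delay the longest path as well (for example, directly at the capturing flop's input), t_pd also grows by 20 ps, leaving only 10 ps of setup slack in this example.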