CMPEN 411 VLSI Digital Circuits, Spring 2012
Lecture 17: Dynamic Sequential Circuits and Timing Issues
[Adapted from Rabaey’s Digital Integrated Circuits, Second Edition, ©2003 J. Rabaey, A. Chandrakasan, B. Nikolic]
Sp12 CMPEN 411 L17 S.1

Reading
This lecture
- Dynamic sequential circuits
  - Reading assignment: Rabaey, et al, 7.3, 7.7
- Timing issues, Intro to datapath design
  - Reading assignment: Rabaey, et al, 10.1-10.3.3; 11.1-11.2
Next lecture
- Intro to datapath design
  - Reading assignment: Rabaey, et al, 11.1-11.2
- Adder design
  - Reading assignment: Rabaey, et al, 11.3

Last Lecture: Static MS ET Implementation
[Figure: static master-slave edge-triggered flip-flop; master and slave stages built from inverters I1-I6 and transmission gates T1-T4; signals D, QM, Q, clk, !clk]

Dynamic ET Flipflop
[Figure: dynamic master-slave flip-flop; transmission gates T1, T2, inverters I1, I2, storage capacitors C1, C2; signals D, QM, Q, clk, !clk]
tsu = tpd_tx
thold = zero
tc-q = 2 tpd_inv + tpd_tx
[Timing diagram: clk and !clk; the two clock phases alternate between "master transparent, slave hold" and "master hold, slave transparent"]

Pseudostatic Dynamic Latch
Robustness considerations limit the use of dynamic FFs:
- coupling between signal nets and internal storage nodes can inject significant noise and destroy the FF state
- leakage currents cause state to leak away with time
- internal dynamic nodes don’t track fluctuations in VDD, which reduces noise margins
A simple fix is to make the circuit pseudostatic.
[Figure: pseudostatic feedback logic; signals clk, !clk, QM, Q]
The above logic is added to all dynamic latches.

Dynamic ET FF Race Conditions
[Figure: dynamic master-slave flip-flop (T1, I1, T2, I2, C1, C2) with overlapping clk and !clk waveforms]
To avoid a race, the clock overlap must satisfy:
- 0-0 overlap race condition: toverlap0-0 < tT1 + tI1 + tT2
- 1-1 overlap race condition: toverlap1-1 < thold

Fix 1: Dynamic Two-Phase ET FF
Keep the clock non-overlap large enough, at the cost of routing 4 clock signals (clk1, !clk1, clk2, !clk2).
[Figure: dynamic master-slave flip-flop with T1 clocked by clk1/!clk1 and T2 clocked by clk2/!clk2; signals D, QM, Q]
[Timing diagram: clk1 and clk2 separated by tnon_overlap; phases "master transparent, slave hold" and "master hold, slave transparent"]

Fix 2: C2MOS (Clocked CMOS) ET Flipflop
A clock-skew insensitive FF
[Figure: C2MOS master-slave flip-flop schematic; master stage transistors M1-M4; signals D, clk, !clk]
[Figure, slave stage: transistors M5-M8; nodes QM and Q; capacitors C1 and C2; signals clk, !clk]

C2MOS (Clocked CMOS) ET Flipflop
A clock-skew insensitive FF
[Figure: the same C2MOS schematic annotated per clock phase]
- master transparent, slave hold: M3 and M4 on, M7 and M8 off
- master hold, slave transparent: M3 and M4 off, M7 and M8 on

C2MOS FF 0-0 Overlap Case
Clock-skew insensitive as long as the rise and fall times of the clock edges are sufficiently small.
[Figure: C2MOS flip-flop with clk = !clk = 0; devices shown: M1, M2, M4, M5, M6, M8; nodes D, QM, Q; capacitors C1, C2]

C2MOS FF 1-1 Overlap Case
[Figure: C2MOS flip-flop with clk = !clk = 1; devices shown: M1, M2, M3, M5, M6, M7; nodes D, QM, Q; capacitors C1, C2]

Fix 3: True Single Phase Clocked (TSPC) Latches
- Negative latch: transparent when clk = 0, hold when clk = 1
- Positive latch: transparent when clk = 1, hold when clk = 0
[Figure: negative and positive TSPC latch schematics; signals In, clk, Q]

Embedding Logic in TSPC Latch
[Figure: TSPC latch with embedded PUN and PDN logic networks; signals In, A, B, clk, Q]

TSPC ET FF
[Figure: TSPC master-slave flip-flop; signals D, QM, Q, clk; clocked devices annotated on/off for the two phases "master transparent, slave hold" and "master hold, slave transparent"]

Choosing a Clocking Strategy
Choosing the right clocking scheme affects the functionality, speed, and power of a circuit.
Two-phase designs
+ robust and conceptually simple
- need to generate and route two clock signals
- have to design to accommodate possible skew between the two clock signals
Single phase designs
+ only need to generate and route one clock signal
+ supported by most automated design methodologies
+ don’t have to worry about skew between the two clocks
- have to have guaranteed slopes on the clock edges

Review: Sequential Definitions
Use two level-sensitive latches of opposite type to build one master-slave flipflop that changes state on a clock edge (when the slave is transparent).
Static storage
- uses a bistable element with feedback to store its state and thus preserves state as long as the power is on
- Loading new data into the element: 1) cutting the
feedback path (mux based); 2) overpowering the feedback path (SRAM based)
Dynamic storage
- stores state on parasitic capacitors, so the state is held for only a period of time (milliseconds); requires periodic refresh
- is usually simpler (fewer transistors), higher speed, and lower power, but because of noise immunity issues the circuit is always modified (by adding a feedback loop on the output) so that it is pseudostatic

Timing Classifications
Synchronous systems
- All memory elements in the system are simultaneously updated using a globally distributed periodic synchronization signal (i.e., a global clock signal)
- Functionality is ensured by strict constraints on the clock signal generation and distribution to minimize
  - Clock skew (spatial variations in clock edges)
  - Clock jitter (temporal variations in clock edges)
Asynchronous systems
- Self-timed (controlled) systems
- No need for a globally distributed clock, but have asynchronous circuit overheads (handshaking logic, etc.)
Hybrid systems
- Synchronization between different clock domains
- Interfacing between asynchronous and synchronous domains

Review: Synchronous Timing Basics
[Figure: register R1 (clocked at tclk1) drives combinational logic that feeds register R2 (clocked at tclk2). Register parameters: tc-q, tsu, thold, tcdreg. Logic parameters: tplogic, tcdlogic.]
Under ideal conditions (i.e., when tclk1 = tclk2)
T ≥ tc-q + tplogic + tsu
thold ≤ tcdlogic + tcdreg
Under real conditions, the clock signal can have both spatial (clock skew) and temporal (clock jitter) variations
- skew is constant from cycle to cycle (by definition); skew can be positive (clock and data flowing in the same direction) or negative (clock and data flowing in opposite directions)
- jitter causes T to change on a cycle-by-cycle basis

Sources of Clock Skew and Jitter in Clock Network
[Figure: clock network with numbered sources of variation: 1 clock generation (PLL), 2 clock drivers, 3 interconnect, 4 power supply, 5 temperature, 6 capacitive load, 7 capacitive coupling]
Skew
- manufacturing device variations in clock drivers
- interconnect variations
- environmental variations (power supply and temperature)
Jitter
- clock generation
- capacitive loading and coupling
- environmental variations (power supply and temperature)

Positive Clock Skew
Clock and data flow in the same direction
[Figure: R1, combinational logic, R2, with the clock delayed between tclk1 and tclk2; timing diagram with clock edges 1-4 showing skew δ = tclk2 - tclk1 > 0, the interval T + δ, and the interval δ + thold]
T: T + δ ≥ tc-q + tplogic + tsu, so T ≥ tc-q + tplogic + tsu - δ
thold: thold + δ ≤ tcdlogic + tcdreg, so thold ≤ tcdlogic + tcdreg - δ
δ > 0: Improves performance, but makes thold harder to meet. If thold is not met (race conditions), the circuit malfunctions independent of the clock period!
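The two constraints above can be checked numerically. The following is a minimal Python sketch, not part of the original slides: the class name `PathTiming`, the function names, and the delay values (in picoseconds) are all hypothetical, chosen only to show how a positive skew lowers the minimum clock period while eating into the hold margin.

```python
from dataclasses import dataclass


@dataclass
class PathTiming:
    """Delays for one register-to-register path, all in the same time unit."""
    t_cq: float       # clock-to-Q delay of the launching register (tc-q)
    t_plogic: float   # worst-case propagation delay of the logic (tplogic)
    t_su: float       # setup time of the capturing register (tsu)
    t_hold: float     # hold time of the capturing register (thold)
    t_cdlogic: float  # contamination (minimum) delay of the logic (tcdlogic)
    t_cdreg: float    # contamination (minimum) delay of the register (tcdreg)


def min_clock_period(p, skew=0.0):
    """Smallest T satisfying T >= tc-q + tplogic + tsu - skew."""
    return p.t_cq + p.t_plogic + p.t_su - skew


def hold_slack(p, skew=0.0):
    """Margin in thold <= tcdlogic + tcdreg - skew; negative means a race."""
    return p.t_cdlogic + p.t_cdreg - skew - p.t_hold


def max_safe_skew(p):
    """Largest skew for which the hold constraint is still met."""
    return p.t_cdlogic + p.t_cdreg - p.t_hold


# Hypothetical numbers in picoseconds, for illustration only.
example = PathTiming(t_cq=100, t_plogic=800, t_su=50,
                     t_hold=30, t_cdlogic=60, t_cdreg=40)

for skew in (0, 50, 100):
    print(f"skew={skew:>3} ps  T_min={min_clock_period(example, skew)} ps  "
          f"hold slack={hold_slack(example, skew)} ps")
```

With these assumed values the minimum period drops from 950 ps to 850 ps as the skew grows from 0 to 100 ps, but the hold slack falls from 70 ps to -30 ps, so the 100 ps case races no matter how slow the clock is. Passing a negative `skew` to the same functions models clock and data flowing in opposite directions.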
Negative Clock Skew
Clock and data flow in opposite directions
[Figure: R1, combinational logic, R2, with the clock delayed from tclk2 back to tclk1; timing diagram with clock edges 1-4 showing skew δ < 0 and the interval T + δ]
T: T + δ ≥ tc-q + tplogic + tsu, so T ≥ tc-q + tplogic + tsu - δ
thold: thold + δ ≤ tcdlogic + tcdreg, so thold ≤ tcdlogic + tcdreg - δ
δ < 0: Degrades performance, but thold is easier to meet (eliminating race conditions)

Clock Jitter
Jitter causes T to vary on a cycle-by-cycle basis
[Figure: register R1 and combinational logic clocked at tclk; timing diagram in which each clock edge can shift between -tjitter and +tjitter]
T: T - 2tjitter ≥ tc-q + tplogic + tsu, so T ≥ tc-q + tplogic + tsu + 2tjitter
Jitter directly reduces the performance of a sequential circuit

Combined Impact of Skew and Jitter
Constraints on the minimum clock period (skew δ > 0)
[Figure: R1, combinational logic, R2 clocked at tclk1 and tclk2; timing diagram showing T, T + δ with δ > 0, and -tjitter on the clock edges]
T ≥ tc-q + tplogic + tsu - δ + 2tjitter
thold ≤ tcdlogic + tcdreg - δ - 2tjitter
δ > 0 with jitter: Degrades performance, and makes thold even harder to meet. (The acceptable skew is reduced by jitter.)

Clock Distribution Networks
Clock skew and jitter can ultimately limit the performance of a digital system, so designing a clock network that minimizes both is important.
In many high-speed processors, a majority of the dynamic power is dissipated in the clock network.
To reduce dynamic power, the clock network must support clock gating (shutting down units by disabling their clock).
Clock distribution techniques
- Balanced paths (H-tree network, matched RC trees)
  - In the ideal case, can eliminate skew
  - Could take multiple cycles for the clock signal to propagate to the leaves of the tree
- Clock grids
  - Typically used in the final stage of the clock distribution network
  - Minimizes absolute delay (not relative delay)

H-Tree Clock Network
If the paths are perfectly balanced, clock skew is zero.
Can insert clock gating at multiple levels in the clock tree.
Can shut off an entire subtree if all gating conditions are satisfied.
[Figure: H-tree clock network driven from Clock; gating logic combining Clock with an idle condition to produce the gated clock]

Clock Grid Network
Distributed buffering reduces absolute delay and makes clock gating easier, but is sensitive to variations in the buffer delay.
The secondary buffers isolate the local clock nets from the upstream local logic load and amplify the area clock signals degraded by the RC network
- decreases absolute skew
- gives steeper clocks
Only have to bound the skew within the local logic area.
[Figure: clock grid; Clock feeds the main clock buffer, which drives the secondary clock buffers]

DEC Alpha 21164 (EV5) Example
- 300 MHz clock (9.3 million transistors on a 16.5 x 18.1 mm die in 0.5 micron CMOS technology)
  - single phase clock
- 3.75 nF total clock load
  - Extensive use of dynamic logic
- 20 W (out of 50) in clock distribution network
- Two level clock distribution
  - Single 6 inverter stage main clock buffer at the center of the chip
  - Secondary clock buffers drive the left and right sides of the clock grid in m3 and m4
- Total equivalent driver size of 58 cm!
Secondary Clock Buffers
[Figure: secondary clock buffers]

Clock Skew in Alpha Processor
- Absolute skew smaller than 90 ps
- The critical instruction and execution units all see the clock within 65 ps

ASIC example
[Figure: ASIC example]

Microprocessor example
[Figure: microprocessor example]

Dealing with Clock Skew and Jitter
- To minimize skew, balance clock paths using H-tree or matched-tree clock distribution structures.
- If possible, route data and clock in opposite directions; this eliminates races at the cost of performance.
- The use of gated clocks to help with dynamic power consumption makes jitter worse.
- Shield clock wires (route power lines, VDD or GND, next to clock lines) to minimize or eliminate coupling with neighboring signal nets.
- Use dummy fills to reduce skew by reducing variations in interconnect capacitances due to interlayer dielectric thickness variations.
- Beware of temperature and supply rail variations and their effects on skew and jitter. Power supply noise fundamentally limits the performance of clock networks.

Clock Skew Scheduling
[Figure: example circuit with path delays labeled 12 and 16; caption: minimum clock period with zero skew]

Clock Skew Scheduling
[Figure: registers i, j, k with path delays labeled 12 and 16; T = 15; annotations: pulse at i, k tight; pulse at j max]

Clock Scaling
[Figure: clock scaling]

Next Lecture and Reminders
Next lecture
- Intro to datapath design
  - Reading assignment: Rabaey, et al, 11.1-11.2
- Adder design
  - Reading assignment: Rabaey, et al, 11.3