Digital Integrated Circuits: A Design Perspective
Jan M. Rabaey, Anantha Chandrakasan, Borivoje Nikolić

Timing Issues

Revised from Digital Integrated Circuits, © Jan M. Rabaey et al.
EE141, © Digital Integrated Circuits, 2nd

Case Study: DEC Alpha 21164
The earlier DEC Alpha 21064 used a single global clock driver operating at 200 MHz. Clock skew < 200 ps.
Clock frequency: 300 MHz (9.3 million transistors)
Total clock load: 3.75 nF
Power in clock distribution network: 20 W (out of 50 W)
Uses two-level clock distribution:
• Single 6-stage driver at center of chip
• Secondary buffers drive left-side and right-side clock grids in Metal3 and Metal4
Total driver size: 58 cm!

21164 Clocking
tcycle = 3.3 ns, trise = 0.35 ns, tskew = 150 ps
[Figure: clock waveform; location of clock drivers on die (pre-driver, final drivers)]
• 2-phase single-wire clock, distributed globally
• 2 distributed driver channels
• Reduced RC delay/skew
• Improved thermal distribution
• 3.75 nF clock load
• 58 cm final driver width
• Local inverters for matching
• Conditional clocks in caches to reduce power
• More complex race checking
• Device variation

Clock Drivers
[Figure]

Clock Skew in Alpha Processor
The resulting RC network was extracted at every grid point and simulated.
• 0 skew at driver
• Maximum skew 90 ps

EV6 (Alpha 21264) Clocking
600 MHz, 0.35 micron CMOS
tcycle = 1.67 ns, trise = 0.35 ns, tskew = 50 ps
[Figure: global clock waveform; PLL]
• 2-phase, with multiple conditional buffered clocks
• 2.8 nF clock load
• 40 cm final driver width
• Local clocks can be gated "off" to save power
• Reduced load/skew
• Reduced thermal issues
• Multiple clocks complicate race checking

21264 Clocking
[Figure]

EV6 Clock Results
[Figure: GCLK skew (scale 5 to 50 ps) and GCLK rise times (scale 300 to 345 ps; 20% to 80%, extrapolated to 0% to 100%)]

EV7 Clock Hierarchy
Active Skew Management and Multiple Clock Domains
[Figure: clock hierarchy with a PLL, three DLLs, and the clock domains GCLK (CPU Core), NCLK (Mem Ctrl), L2L_CLK (L2 Cache), L2R_CLK (L2 Cache), and SYSCLK]
+ widely dispersed drivers
+ DLLs compensate static and low-frequency variation
+ divides design and verification effort
- DLL design and verification is added work
+ tailored clocks

PLL Block Diagram
[Figure: phase detector (inputs: reference clock, local clock; outputs: Up, Down), charge pump, loop filter (vcont), VCO, divide by N; output: system clock]
A PLL is an analog circuit that is sensitive to all sources of noise, so it may need to be guarded by careful layout.
When the loop is stable, the system clock frequency is N times the reference clock frequency, ideally with zero skew relative to the reference clock.

Clock Generation using DLLs
Delay-Locked Loop (delay-line based)
[Figure: fREF, phase detector (U, D), charge pump, filter, delay line (DL), fO]
Make the clocks aligned by adjusting the delay (phase).
Phase-Locked Loop (VCO-based)
[Figure: fREF, ÷N, phase detector PD (U, D), charge pump CP, filter, VCO, fO]

DLL-Based Clock Distribution
[Figure: GLOBAL CLK distributed to several digital circuits, each served by a VCDL with its own phase detector and CP/LF]
The output clock of a DLL has the same frequency as the input clock and, ideally, zero skew relative to it.
This kind of configuration has become popular in today's high-performance microprocessors.

Self-timed and Asynchronous Design
Functions of the clock in synchronous design:
1) Acts as completion signal
2) Ensures the correct ordering of events
Truly asynchronous design:
1) Completion is ensured by careful timing analysis
2) Ordering of events is implicit in logic
Self-timed design:
1) Completion ensured by completion signal
2) Ordering imposed by handshaking protocol
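The delay-locked loop idea described above (align the clocks by adjusting a delay, leaving the frequency unchanged) can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: the bang-bang phase detector, the fixed correction step, and the names `dll_lock`, `period_ps`, `initial_delay_ps`, and `step_ps` are hypothetical and do not come from the slides.

```python
def dll_lock(period_ps, initial_delay_ps, step_ps=5.0, max_iters=10_000):
    """Behavioral bang-bang DLL (illustrative only).

    The delay line is adjusted until the delayed clock edge lines up with
    the next reference edge, i.e. until the delay equals one clock period.
    Returns the final delay and the list of delays visited.
    """
    delay = initial_delay_ps
    history = [delay]
    for _ in range(max_iters):
        # Phase detector: a positive error means the delayed edge arrives early.
        phase_error = period_ps - delay
        if abs(phase_error) < step_ps:
            break  # locked to within one correction step
        # Charge pump and loop filter, reduced here to a fixed-size correction.
        delay += step_ps if phase_error > 0 else -step_ps
        history.append(delay)
    return delay, history


if __name__ == "__main__":
    # Period and starting delay are arbitrary values chosen for illustration.
    final_delay, trace = dll_lock(period_ps=1667.0, initial_delay_ps=1200.0)
    print(f"locked at {final_delay:.0f} ps after {len(trace) - 1} steps")
```

Note that the loop only changes the delay, so the output frequency stays equal to the input frequency, which matches the DLL property stated on the slide. A real DLL uses an analog control voltage rather than fixed steps.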
Synchronous Pipelined Datapath
[Figure: In -> R1 -> Logic Block #1 (tpd1) -> R2 -> Logic Block #2 (tpd2) -> R3 -> Logic Block #3 (tpd3) -> R4; all registers are clocked by CLK; register delay tpd,reg]
Possible problems:
• Clock skew and jitter
• Significant noise when all clock transitions occur at once
• Logic delay at each stage is not the same

Self-Timed Pipelined Datapath
[Figure: In -> R1 -> F1 (tpF1) -> R2 -> F2 (tpF2) -> R3 -> F3 (tpF3) -> Out; handshake (HS) blocks exchange Req and Ack between stages and use the Start and Done signals of each function block]

Properties of Self-Timed Datapath
• In contrast to the global clocking approach, timing signals are generated locally, which avoids skew and jitter problems.
• Separating the physical and logical ordering functions can increase performance: completed data does not have to wait for the arrival of the next clock edge before proceeding to the subsequent processing stage.
• Blocks can be shut down to save power.
• Self-timed circuits are, by nature, robust to variations in manufacturing and operating conditions, whereas the performance of a synchronous design is limited by the extremes of the operating conditions.

Completion Signal Generation
Using Redundant Signal Encoding
[Figure]

Ratioed Logic with Improved Loads
Differential Cascode Voltage Switch Logic (DCVSL)
[Figure: loads M1 and M2 connected to VDD, complementary outputs (Out), pull-down networks PDN1 and PDN2 connected to VSS, complementary inputs (A, B)]

Completion Signal in DCVSL
[Figure: DCVSL gate with Start-controlled precharge, pull-down networks (PDN) driven by In1 and In2 and their complements, outputs B0 and B1, and the Done signal]
When Start = 0, B0 and B1 are precharged low, so Done = 0.
When Start = 1, one of B0 and B1 goes high, so Done = 1.

Self-Timed Adder
[Figure: carry chain with Start-controlled precharge, carries C0 to C4 and their complements, propagate signals P0 to P3, and generate signals G0 to G3]
[Figure, continued: kill signals K0 to K3 and the Done output; panels (a) Differential carry generation and (b) Completion signal]
This safely assumes that sum generation, which depends on the arrival of the carry signal, is faster than generation of the Done signal.

Completion Signal Using Current Sensing
• The redundant coding approach allows tracking of the signal statistics, but comes at the expense of power dissipation, since every single gate must make a transition for each new input.
• Another way is the critical-path replica approach, in which a replica of the critical path is configured as a delay element. The replica should track the critical path of the logic without any glitching transitions. When the output of the delay line goes high/low, the logic computation is complete.
• The advantage of this approach is that the logic can be implemented in a complementary logic style.

Completion Signal Generation
Using Delay Element (e.g. in memories)
[Figure: In -> LOGIC NETWORK -> Out; Start -> DELAY MODULE -> Done]

Completion Signal Using Current Sensing
[Figure: inputs, Start, input register, static CMOS logic, output, VDD, GNDsense, current sensor (A), min delay generator (B), Done; timing diagram of Start, A, B, and Done with tdelay, toverlap, tMDG, tpd-NOR, and output valid]
The current sensor is placed in series with the logic and monitors the current flowing through it: its output is 0 when the logic is idle and 1 when the logic is switching.

Hand-Shaking Protocol
Besides the generation of completion signals, a self-timed approach also requires a handshaking protocol that orders the circuit events to avoid races.
The four events (data change, request, data acceptance, acknowledge) proceed in a cyclic order. Note that successive cycles may take different amounts of time.
A two-phase protocol: active cycle of the sender and active cycle of the receiver.
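The redundant-encoding completion scheme described earlier in this section (Done = 0 while B0 and B1 are both precharged low, Done = 1 once one of them goes high) can be modeled in a few lines. This is a minimal sketch, not the circuit from the slides: the function names are hypothetical, and combining the per-bit signals with a logical AND for a multi-bit word is an assumption made for illustration.

```python
PRECHARGED = (0, 0)  # both rails low: the gate has not produced a value yet


def encode_dual_rail(bit):
    """Return the rail pair (b0, b1) for a valid bit: exactly one rail is high."""
    return (1, 0) if bit == 0 else (0, 1)


def bit_done(rails):
    """Done for one dual-rail signal: high once either rail has gone high."""
    b0, b1 = rails
    return bool(b0 or b1)


def word_done(word):
    """Done for a multi-bit word: every bit must have resolved (assumption)."""
    return all(bit_done(rails) for rails in word)


if __name__ == "__main__":
    # One bit has evaluated, the other is still precharged.
    partial = [encode_dual_rail(1), PRECHARGED]
    # Both bits have evaluated.
    complete = [encode_dual_rail(1), encode_dual_rail(0)]
    print(word_done(partial), word_done(complete))
```

The cost noted on the slides is visible here as well: every signal must leave the precharged state on every evaluation before Done can rise, so every gate makes a transition for each new input.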
Hand-Shaking Protocol
[Figure: two-phase handshake, showing the active cycles of the sender and the receiver]

Event Logic – The Muller-C Element
[Figure: (a) schematic symbol with inputs A and B and output F]
(b) Truth table

A  B  Fn+1
0  0  0
0  1  Fn
1  0  Fn
1  1  1

[Figure: implementations: (a) logic, using an S-R flip-flop with output Q; (b) majority function; (c) dynamic]

Implementation
Correct operation requires a strict ordering of signaling events.
The Muller-C element performs an AND operation on events. It is a basic element in self-timed circuit design.

2-Phase Handshake Protocol
Initially, Req, Ack, and Data ready are all set to 0.
Then Data ready goes to 1, so Req goes to 1.
Then Data acceptance goes to 1, so Ack goes to 1.
The data is sent and Data ready returns to 0, so Req finally returns to 0.
This completes one cycle.

Example: Self-timed FIFO
[Figure: In -> R1 -> R2 -> R3 -> Out, with register enables (En) and Done; a chain of C-elements between Reqi/Acki and Reqo/Acko]
All 1s or 0s -> pipeline empty
Alternating 1s and 0s -> pipeline full

4-Phase Handshake Protocol
Also known as RTZ (Return to Zero)
Slower, but unambiguous
Send data only when H-L

4-Phase Handshake Protocol
Implementation using Muller-C elements
[Figure]

Self-Resetting Logic
[Figure: precharged logic blocks (L1), (L2), and (L3) in sequence, each with its own completion detection]
Pipelined version of self-timed logic. L1 is precharged once L2 has received the data and no longer needs the output of L1.
[Figure: gate with inputs A, B, and C, internal node int, output out, and VDD]
Post-charge logic: it is possible to precharge a block based on completion of its own output.
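The Muller-C element truth table given earlier in this section (the output follows the inputs when A = B and otherwise holds its previous value) can be checked with a short behavioral model. This is a sketch for illustration only: the class name `MullerC` and the helper `majority` are hypothetical, and the majority form F = A·B + F·(A + B) is the standard Boolean expression for the "majority function" implementation named on the slide.

```python
class MullerC:
    """Behavioral Muller C-element: output changes only when both inputs agree."""

    def __init__(self, initial=0):
        self.f = initial

    def update(self, a, b):
        if a == b:
            self.f = a  # both inputs agree: the output follows them
        return self.f   # inputs differ: the output holds its previous value


def majority(a, b, f):
    """Next output as a majority function of A, B, and the current output F."""
    return (a & b) | (f & (a | b))


if __name__ == "__main__":
    c = MullerC()
    for a, b in [(0, 1), (1, 1), (1, 0), (0, 0)]:
        previous = c.f
        result = c.update(a, b)
        # The two descriptions must agree for every input combination.
        assert result == majority(a, b, previous)
        print(a, b, result)
```

Because the output rises only after both inputs have risen and falls only after both have fallen, the element acts as an AND on events, which is the property the handshake implementations rely on.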
However, the inputs should be reset.

Clock-Delayed Domino
[Figure: dynamic gate with pulldown network, input D1, output Q1 (also D2), clock CLK1, and CLK2 (to next stage); VDD, GND]
• Used in the IBM 1 GHz processor
• No global clock: the clock for the next stage is derived from that of the previous stage
• The two inverters and the transmission gate (TG) emulate the worst-case delay of the dynamic logic (can be controlled by sizing)
• It can provide both inverting (remove Q1) and non-inverting logic

Asynchronous-Synchronous Interface
[Figure: asynchronous system (fin) -> synchronization -> synchronous system (fCLK)]
Consider a computer. Its internal operation is synchronized by a global clock, but the keyboard input is asynchronous. The synchronous system needs to sample the asynchronous input at regular intervals and check its value. If the sampling rate is high enough, no transitions will be missed. However, the signal may be in transition at the moment it is sampled. This can result in an undefined state, which must be resolved before the value is interpreted further.

Synchronizers and Arbiters
Problem: The circuit has to make a decision in limited time; which decision it makes is not important.
Synchronizer: A circuit that implements this decision making at the interface is called a synchronizer.
Caveat: Building a perfect synchronizer that always delivers a legal answer is impossible!
Solution: We can decrease the error probability at the expense of delay.

A Simple Synchronizer (based on a latch)
[Figure: latch with input D, clock CLK, cross-coupled inverters I1 and I2, internal node int, and output Q]
Suppose that the latch samples the input on the rising edge of the clock. The setup time or hold time may be violated, in which case the output may be undefined.
Eventually, however, the output will settle to 0 or 1 (the two stable states of the cross-coupled inverters).

Synchronizer: Output Trajectories
[Figure: Vout (0.0 to 2.0) versus time (0 to 300 ps); single-pole model for a flip-flop]

Positive Feedback: Bi-Stability
[Figure: voltage transfer characteristics of two cross-coupled inverters, with Vi2 = Vo1 and Vi1 = Vo2; operating points A, B, and C]
When the gain of the inverter in the transient region is larger than 1, A and B are the only stable operating points, and C is metastable.

Arbiters
[Figure: (a) schematic symbol: arbiter with inputs Req1 and Req2 and outputs Ack1 and Ack2; (b) implementation with internal nodes A and B; (c) timing diagram of Req1, Req2, A, B (metastable, VT gap), Ack1, and Ack2 versus t]
When both requests are submitted at the same time, A and B are temporarily metastable and are then driven to mutually exclusive states.

PLL-Based Clock Synchronization
[Figure: Chip 1 and Chip 2 (digital systems) exchanging data; PLL, clock buffer, divider, reference clock, crystal oscillator; fsystem = N x fcrystal, fcrystal < 200 MHz]
A crystal oscillator generates an accurate signal at frequencies below 200 MHz. For a gigahertz clock, a PLL is needed. A PLL can also be used to synchronize the two clocks.
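The single-pole flip-flop model named under "Synchronizer: Output Trajectories" can be made concrete with a short numerical sketch. It assumes the usual linearized form v(t) = V_MS + (v(0) - V_MS)·exp(t/τ), valid only near the metastable point V_MS. The values of `v_ms`, `tau_ps`, and `margin`, and the function names, are illustrative assumptions and are not taken from the slides.

```python
import math


def trajectory(v0, t_ps, v_ms=1.0, tau_ps=20.0):
    """Output voltage after t_ps under the single-pole model.

    Valid only near the metastable point; the model does not clamp
    the result to the supply rails.
    """
    return v_ms + (v0 - v_ms) * math.exp(t_ps / tau_ps)


def resolve_time_ps(v0, margin=0.5, v_ms=1.0, tau_ps=20.0):
    """Time for the output to move `margin` volts away from the metastable point."""
    offset = abs(v0 - v_ms)
    if offset == 0.0:
        return math.inf  # exactly metastable: the ideal model never resolves
    if offset >= margin:
        return 0.0       # already outside the undefined band
    return tau_ps * math.log(margin / offset)


if __name__ == "__main__":
    # Starting points progressively closer to the metastable voltage.
    for v0 in (1.1, 1.01, 1.001, 1.0001):
        print(f"v(0) = {v0:.4f} V -> resolves in {resolve_time_ps(v0):.1f} ps")
```

The resolution time grows only logarithmically as the starting point approaches the metastable voltage, so it is unbounded in principle. This is consistent with the earlier statements that a perfect synchronizer is impossible and that the error probability can be reduced by waiting longer.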