Clock Generation and Distribution

Clock Generation

Single-phase clock system
• The simplest clocking methodology uses a single clock in conjunction with a register.
• Clocks are generated with global clock buffers; CLK and CLK-bar are generated locally.

On-Chip Clock Insertion Delay
[Figure: the external clock (ext. Clk) passes through the clock driver; the internal clock (int. Clk) at the register clock input lags it by the insertion delay.]

Clock Generation using DLLs
[Figure: DLL block diagram. Labels: fREF, Phase Det (U/D), Charge Pump, Filter, DL, fO.]

PLL
[Figure: PLL block diagram. Labels: reference clock, local clock, phase detector (Up/Down), charge pump, loop filter, vcont, VCO, divide by N, system clock.]
• Phase-locked loops (PLLs) are used to generate internal clocks on chips for two main reasons:
– to synchronize the internal clock of a chip with an external clock
– to operate the internal clock at a higher rate than the external clock input

Phase-Locked Loop Block Diagram and Operation
[Figure: block diagram with labels clock driver, PLL (PD, LP, VCO), ext. Clk, int. Clk, p, cv, Cload; waveforms of ext. Clk, int. Clk, p and cv for the 135° out-of-phase and 45° out-of-phase cases.]

Ring-Oscillator-Based VCO with CMOS Inverters as Delay Elements
[Figure: ring oscillator built from CMOS inverters, supplied from Vreg. Slide dated Nov. 14, 2003.]
$f_{osc} = \frac{1}{2 n T_{inv}}$, with $n = 2k + 1$, $k \ge 1$

System clocking schemes
[Figure: (a) single-phase clock (Clk); (b) two-phase clock (phases 1, 2); (c) multiple-phase clock (phases 1, 2, 3, 4, ..., k).]

Function of clock distribution network
• Synchronize millions (billions) of separate elements
• Within a time scale on the order of ~10 ps

Clock Parameters: Period (T), Width, Rise and Fall Times
[Figure: clock waveform annotated with period T, width W, rise time trise and fall time tfall.]
Duty cycle = W / T

Clock Parameters: Period, Width, Clock Skew and Clock Jitter
[Figure: Ref_Clock at the driver (tDRVCLK) and the received clock (tRCVCLK) with period T, annotated with tskew and tjit.]

Clock Uncertainties
[Figure: Ref_Clock (tDRV_CLK) and received clock (tRCV_CLK) with period T, annotated with tskew and tjit.]
• Clock uncertainty: jitter + skew

Timing Constraint

Static and dynamic time analysis
• Timing analysis is the methodical analysis of a digital circuit to determine whether the timing constraints imposed by components or interfaces are met.
• Typically, this means that you are trying to prove that all set-up, hold, and pulse-width times are being met.
• A chip must meet its timing constraints in order to operate at the intended clock rate.
• Devices perform their operation in an active clock cycle.

Setup time
• Setup time is the time for which the input data is available and stable before the clock pulse is applied.
• It is the minimum amount of time the data signal should be held steady before the clock event so that the data are reliably sampled by the clock. This applies to synchronous circuits such as the flip-flop.
• In short: the amount of time the synchronous input (D) must be stable before the active edge of the clock.

Hold time
• Hold time is the time for which the data input is held stable after the clock pulse.
• It is the minimum amount of time the data signal should be held steady after the clock event so that the data are reliably sampled. This applies to synchronous circuits such as the flip-flop.
• In short: the amount of time the synchronous input (D) must be stable after the active edge of the clock.

Metrics/Goals
• Besides basic connectivity, what makes a clock network good or bad?
– Skew
– Jitter
– Power
– Area
– Slew rates

Clock Skew
• Defined as: the maximum difference in arrival times of the clock signal at any two latches/FFs fed by the network
• Skew = max | t1 – t2 |
• Causes:
– Designed (unavoidable) variations: mismatch in buffer load sizes, interconnect lengths
– Temperature gradients: change MOSFET performance across the die
– IR voltage drop in the power supply: changes MOSFET performance across the die
• Note: the delay from the clock generator to the fan-out points (clock latency) is not important by itself
– BUT: increased latency leads to larger skew for the same amount of relative variation
• Effect:
– Eats into the timing budget
– Needs to be considered for maximum (setup) and minimum (hold) path timings

Cycle time Requirements
• Clock waveforms must be particularly clean and sharp.
• To avoid skew, special attention has to be paid to the design of the clock tree. CAD tools are able to design balanced clock trees.

Jitter
• From one clock cycle to the next, the period is not exactly the same each time
• Jitter is the maximum difference in phase of the clock between any two periods
• Notes: jitter J1 = t2 – t1, jitter J2 = t3 – t2
• Caused by variations in clock period that result from:
– Phase-locked loop (PLL) oscillation frequency
– Various noise sources affecting clock generation and distribution
• E.g. power supply noise, which dynamically alters the drive strength of intermediate buffer stages
• Jitter can be reduced by minimizing power supply noise (IR and L*di/dt)

Jitter Impact on Timing Budget
• Needs to be considered in maximum path timing (setup)
• Typically on the order of 50 ps in high-end microprocessors

Clock Power
• Power consumption in clocks is due to:
– Clock drivers
– Long interconnections
– Large clock loads: all clocked elements (latches, FFs) are driven
• Different components dominate
– Depending on the type of clock network used
– E.g. grid: huge pre-drivers and wire capacitance drown out the load capacitance
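The skew and jitter definitions above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `clock_skew` and `period_jitter` and the sample timestamps are hypothetical, times are in picoseconds, and jitter is taken here as the peak-to-peak spread of the measured periods, which is one reading of "maximum difference between any two periods".

```python
from typing import Sequence


def clock_skew(arrival_times: Sequence[float]) -> float:
    """Skew = max |t1 - t2| over all pairs of clock sinks.

    arrival_times holds the arrival time of the same clock edge at each
    latch/FF (hypothetical measurements, in ps).
    """
    if len(arrival_times) < 2:
        return 0.0
    # The largest pairwise difference is simply latest minus earliest.
    return max(arrival_times) - min(arrival_times)


def period_jitter(edge_times: Sequence[float]) -> float:
    """Peak-to-peak period variation seen at a single clock sink.

    edge_times holds successive rising-edge timestamps (in ps).
    This is an assumed interpretation of the jitter definition above.
    """
    periods = [later - earlier for earlier, later in zip(edge_times, edge_times[1:])]
    if len(periods) < 2:
        return 0.0
    return max(periods) - min(periods)


if __name__ == "__main__":
    print(clock_skew([102, 98, 110, 105]))            # four sinks, same edge
    print(period_jitter([0, 1000, 2010, 2995, 4000]))  # one sink, five edges
```

Skew compares different places at the same time, while jitter compares the same place at different times, which is why the two functions take different kinds of input.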
Clocks: Power-Hungry
$P = \alpha C V_{dd}^2 f$
• Not only is the clock capacitance large, it switches every cycle!

Low Power Clocking Techniques
• Gated clocks
– Prevent switching in areas of the chip not being used
– Easier in static designs
• Edge-triggered flip-flops in ARM rather than transparent latches in Alpha
– Reduced load on clock for each flip-flop

Clock Distribution Metric: Area
• Clock networks consume silicon area (clock drivers, PLL, etc.) and routing area
• Top-level metals are used to reduce RC delays
– These levels are precious resources (unscaled)
• By minimizing the area used, we also reduce wiring capacitance and power

Slew Rates
• To maintain signal integrity and latch performance, minimum slew rates are required
• Too slow: the clock is more susceptible to noise, latches are slowed down, and the timing budget is eaten into
• Too fast: too much power is burned, the network is overdesigned, and ground bounce is enhanced
• Latch set-up times depend on clock input slew rates (eats into the timing budget)
• Short-circuit power grows with larger slew (longer transition times)
– This can be significant for large clock drivers
Ref: IBM website, Carring

Technology Trends: Power
• Heavily pipelined design: more latches, more capacitive load for the clock
• Larger chips: more wire length needed to cover the entire die
• Complexity: more functionality and devices means more clocked elements
• Dynamic logic: more clocked elements

Clock Distribution

Clock tree style
1. H-Tree Model
– Less flexible
– Not applicable to placement
2. Binary Tree Model
– Easy to construct
– Weak for blest latch distribution
3. Fanout Balance Tree Model
4. Spine and Trunk Model (Fish and Bone)
– Easy to adjust net loading
– Many dummy cells are needed
– Skew hardly influenced by process scattering
– Die size increase

Clock Distribution Example
• Alpha 21264 clock distribution: grid + H-tree approach
• Power = 32% of total
• Wire usage = 3% of metals 3 and 4
• 4 major clock quadrants, each with a large driver connected to local grid structures

Network Types: Grid
• Gridded clock distribution was common on earlier DEC Alpha microprocessors
• Advantages:
– Skew is determined by grid density and is not overly sensitive to load position
– Clock signals are available everywhere
– Tolerant to process variations
– Usually yields extremely low skew values
[Figure: pre-drivers feeding the global grid.]

Grid Disadvantages
• Huge amounts of wiring and power
– Wire capacitance is large
– Strong drivers are needed, so pre-driver capacitance is large
– Routing area is large
• To minimize all these penalties, make the grid pitch coarser
– Skew gets worse
– Losing the main advantage
• Don't overdesign: let the skew be as large as tolerable
• Still, grids seem infeasible for SoCs

Network Types: Tree
• Original H-tree
– One large central driver
– Recursive H-style structure to match wire lengths
– Halve wire width at branching points to reduce reflections
[Figure: H-tree with points A and B.]

H-Tree Problems
– Slew degradation along long RC paths
– Unrealistically large central driver
• Clock drivers can create large temperature gradients (e.g. Alpha 21064, ~30 °C)
– Non-uniform load distribution
• Inherently non-scalable (wire resistance skyrockets)
• Solution to some problems
– Introduce intermediate buffers along the way
– Specifically at branching points

Buffered Clock Tree

Buffered H-tree
• Advantages
– Ideally zero skew
– Can be low power (depending on skew requirements)
– Low area (silicon and wiring)
– CAD tool friendly (regular)
• Disadvantages
– Sensitive to process variations
– Local clocking loads are inherently non-uniform

Clock Skew and Clock balancing
• Clock skew
– Hold time violation is critical to working silicon
– Aggressive skew budget for high-speed operation
– Large turn-around time for clock tree synthesis at the P&R stage
– Skew source: process + voltage + temp + load + jitter
– Skew budget == (target cycle time) / 20, min clk->Q
• Solution
– CTS (Clock Tree Synthesis)
– Insert dummy delay at synthesis: over-design

Practical problem in Clock tree synthesis
• Problems
– Large chip size due to SoC integration
– Unbalanced FF distribution
– Top level: interconnect RC dominant
– Iteration cost
– Test clock
– Multiple clock frequencies
• Solution
– Plan from the early design stage
– Skew budgeting: 100 ps @ 200 MHz

Block level clock tree
• Block-level clock skew
– Driver-limited
– Optimization of the buffer strength and number
• Clock tree synthesis
– Commercial tool
– Many iterations
– Long turn-around time
• Clock tree planning
– Virtual clock tree generation
– Needs engineering approximation

Real Clock Tree
[Figure: clock tree with nets clk.3.1, clk.4.1, clk.5.1.]

Clock tree style: Trunk-and-Branch

Top Level Clock Distribution
[Figure: PLL and top-level distribution to regions labelled NW, NE, SW, SE, SS, L2 and system.]

Real Example
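The skew-budget rule of thumb quoted earlier, (target cycle time) / 20, can be checked against the 100 ps @ 200 MHz planning figure with a short calculation. This is a minimal sketch, not a sign-off method: the function names `cycle_time_ps` and `skew_budget_ps` are hypothetical, the divisor is left as a parameter, and the "min clk->Q" term from the slide is not modelled.

```python
def cycle_time_ps(clock_mhz: float) -> float:
    """Clock period in picoseconds for a frequency given in MHz."""
    if clock_mhz <= 0:
        raise ValueError("clock frequency must be positive")
    # A 1 MHz clock has a period of 1e6 ps.
    return 1e6 / clock_mhz


def skew_budget_ps(clock_mhz: float, divisor: float = 20.0) -> float:
    """Skew budget as (target cycle time) / divisor, in picoseconds.

    divisor=20 follows the rule of thumb on the slide; any other value
    is an assumption chosen by the designer.
    """
    return cycle_time_ps(clock_mhz) / divisor


if __name__ == "__main__":
    period = cycle_time_ps(200.0)
    budget = skew_budget_ps(200.0)
    print(f"cycle time: {period:.0f} ps, T/20 budget: {budget:.0f} ps")
    # Ratio of the cycle time to the 100 ps planning figure from the slide.
    print(f"T / 100 ps = {period / 100.0:.0f}")
```

At 200 MHz the cycle time is 5000 ps, so T/20 gives 250 ps. The 100 ps planning figure is tighter than that, roughly T/50, which leaves margin for the skew sources listed above.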