ECE260B – CSE241A Winter 2005
Clocking
Website: http://vlsicad.ucsd.edu/courses/ece260b-w05
Slides courtesy of Prof. Andrew B. Kahng
http://vlsicad.ucsd.edu

Outline
- Problem Statement
- Clock Distribution Structures
- Robustness / Signal Integrity Control
- Clock Design:
  - Skew Scheduling
  - Topology Construction
  - Embedding

Why Clocks?
- Clocks provide the means to synchronize
- By allowing events to happen at known timing boundaries, we can sequence these events
- Greatly simplifies building of state machines
- No need to worry about variable delay through combinational logic (CL)
- All signals delayed until clock edge (clock imposes the worst case delay)
[Figure: FSM (combinational logic with a register) and dataflow (register, combinational logic, register). Courtesy K. Yang, UCLA]

Clock Distribution Network
- General goal of clock distribution
  - Deliver clock to all memory elements with acceptable skew
  - Deliver clock edges with acceptable sharpness
- Clocking network design is one of the greatest challenges in the design of a large chip
  - Consume up to 1/3 of chip power
  - Accurate signal delay
  - Signal integrity
  - Subject to uncertainty / variation of different processes / operating conditions

Clock Design Components
- Oscillator
- Dividers
- Buffers
  - Strong drivers
  - Reduce delay
  - Signal integrity / slew rate
- Interconnects
  - Balanced trees, meshes, etc.
  - Shielding (e.g., for crosstalk reduction)
  - Non-tree links / feedback loops

Clock Distribution Objective
- Minimum / bounded skew: performance / hold time requirements
- Guaranteed slew rate / signal integrity
- Small insertion delay
- Robustness under process / operating condition variation
- Minimum cell / routing area
- Minimum power consumption

Clock Distribution Robustness
Subject to:
- Radically different loading (flip-flop density) across the die
- ECO (Engineering Change Order)
- Process variation
  - From lot-to-lot
  - Across the die
  - Buffers
  - Metal width
- Supply voltage variation across the die
  - Both static IR drop and dynamic voltage drop
- Temperature
- Interconnect coupling
  - Signal integrity
  - Delay variation

Issues in Clock Distribution Network Design
- Skew
  - Process, voltage, and temperature
  - Data dependence
  - Noise coupling
  - Load balancing
- Power, CV^2f (consume up to 1/3 of total chip power)
  - Clock gating
- Flexibility / tunability
- Compactness: fit into existing layout / design
- Facilitate ECO

Skew: Clock Delay Varies With Position
[Figure only]

Clock Skew Causes
- Designed (unavoidable) variations: mismatch in buffer load sizes, interconnect lengths
- Process variation: process spread across the die yields different Leff, Tox, etc. values
- Temperature gradients: change MOSFET performance across the die
- IR voltage drop in power supply: changes MOSFET performance across the die
- Note: delay from clock generator to fan-out points (clock latency) is not important by itself
  - BUT: increased latency leads to larger skew for the same amount of relative variation
(Sylvester/Shepard, 2001)

Clock Distribution Structures
- RC-Tree
  - Less capacitance
  - More accuracy
  - Flexible wiring
- Grids
  - Reliable
  - Less data dependency
  - Tunable (late in design)
- Shown here for final stage drivers driving F/F loads

Grids
- Gridded clock distribution was common on earlier DEC Alpha microprocessors
- Advantages:
  - Skew determined by grid density, not too sensitive to load position
  - Clock signals available everywhere
  - Tolerant to process variations
  - Usually yields extremely low skew values
- Disadvantages:
  - Huge amount of wiring and power
  - To minimize such penalties, need to make grid pitch coarser, which loses the grid advantage
[Figure: predrivers feeding a global grid]
(Sylvester/Shepard, 2001)

H-Tree
- H-tree (Bakoglu)
  - One large central driver, recursive structure to match wirelengths
  - Halve wire width at branching points to reduce reflections
- Disadvantages
  - Slew degradation along long RC paths
  - Unrealistically large central driver
    - Clock drivers can create large temperature gradients (e.g., Alpha 21064: ~30°C)
  - Non-uniform load distribution
  - Inherently non-scalable (wire R growth)
- Partial solution: intermediate buffers at branching points
[Figure courtesy of P. Zarkesh-Ha]
(Sylvester/Shepard, 2001)

Buffered H-tree
- Advantages
  - Ideally zero-skew
  - Can be low power (depending on skew requirements)
  - Low area (silicon and wiring)
  - CAD tool friendly (regular)
- Disadvantages
  - Sensitive to process variations
    - Devices: want same size buffers at each level of tree
    - Wires: want similar segment lengths on each layer in each source-sink path
  - Local clocking loads are inherently non-uniform
(Sylvester/Shepard, 2001)

Tree Balancing
- Some techniques:
  a) Introduce dummy loads
  b) Snaking of wirelength to match delays
- Con: routing area is often more valuable than silicon
(Sylvester/Shepard, 2001)

Examples From Processor Chips
- H-Tree, Asymmetric RC-Tree (IBM)
- Grids: DEC [Alphas]
- Serpentines: Intel x86 [Young ISSCC97]

Example Skews From Processor Chips
[Figures:]
- DEC-Alpha 21064 clock spines
- DEC-Alpha 21064 RC delays
- DEC-Alpha 21164 RC local delays
- DEC-Alpha 21164 RC delays for Global Distribution (Spine + Grid)

ReShape Clocks Example (High-End ASIC)
- Balanced, shielded H-tree for pre-clock distribution
- Mesh for block level distribution
- All routes 5-6u M6/5, shielded with 1u grounds
- ~10 buffers per node, e.g., ganged BUFx20's
- Output mesh must hit every sub-block output mesh

Block Level Mesh (0.18u)
- Clumps of 1-6 clock buffers, surrounded by capacitor pads
- Shielded input and output M6 shorting straps
- Pre-clock connects to input shorting straps
- 1u M5 ribs every 20-30u (4 to 6 rows)
- Max 600u stride

Problems with Meshes
- Burn more power at low frequencies
- Difficult for 'spare' clock domains that will not tolerate regioning
- Blocks more routing resources (solution: integrated power distribution with ribs can provide shielding for 'free')
- Post-placement (and post-routing) tuning required
- No 'beneficial skew' possible
- Clock gating only easy at root
- Fighting tools to do analysis:
  - Clumped buffers are a problem in Static Timing Analysis (STA) tools
  - Large shorted meshes are a problem for STA tools
  - What does Elmore delay calculation look like for a non-tree?
  - Need full extraction and SPICE-like simulation to determine skew

Benefits of Meshes
- Deterministic, since shielded all the way down to rib distribution
- No ECO placement required: all buffers are preplaced before block placement
- Low latency, since it uses shorted (= ganged, parallel) drivers, therefore lower skew
- ECO placements of FFs later do not require rebalancing of the tree
- "Idealized" clocking environment for the "concurrent dance" of RTL design and timing convergence

Hybrid Structure
- Balanced tree on the top
- Mesh in the middle
  - Minimize skew
- Steiner minimum tree at the bottom
  - Minimize cost
  - Facilitate ECO

Process Variation
- Intra-die and inter-die variations
  - Intra-die variation is increasingly significant since 0.13um technology
- Systematic and random variations
  - Systematic variation is due to equipment, process, etc.
    - Global lens aberration in lithography causes systematic variation
    - Pattern-dependent optical proximity, chemical mechanical polish (CMP)
  - Random variation is due to inherent variation
- Spatial correlation across a chip
- Fast vs. slow corners

Process Variation
- Metal wires
  - Width variation can be estimated by LUT(width, spacing)
  - Thickness variation due to CMP (depends on local density)
  - Thickness variation also depends on wire width and spacing
  - Could be up to 30-40% in a 90nm process
- Transistors
  - Channel length variation (delay ~ L^1.5)
  - Thin gate oxide t_ox variation
  - V_th variation
  - Up to 30% variation in terms of driving capability

Process Variations: SPICE Model
- Process variations are reflected into a statistical SPICE model
- Usually only a few parameters have a statistical distribution (e.g., {DL, DW, TOX, VTn, VTp}) and the others are set to a nominal value
- The nominal SPICE model is obtained by setting the statistical parameters to their nominal values
(Slide courtesy of A. Nardi, J. Rabaey, K. Keutzer of UCB)

Global Variations (Inter-die)
- All devices have the same set of model parameter values
[Figure: process variations mapped to performance variations; example: critical path delay of a 16-bit adder]
(Slide courtesy of A. Nardi, J. Rabaey, K. Keutzer of UCB)

Local Variations (Intra-die)
- Each device instance has a slightly different set of model parameter values (aka device mismatch)
- The performance of some analog circuits strongly depends on the degree of matching of device properties
- Digital circuits are in general more immune to mismatch, but the clock distribution network is sensitive (clock skew)
(Slide courtesy of A. Nardi, J. Rabaey, K. Keutzer of UCB)

Statistical Design
- Need to account for process variations during the design phase
- Statistical design
  - Nominal design
  - Yield optimization
  - Design centering
(Slide courtesy of A. Nardi, J. Rabaey, K. Keutzer of UCB)

Statistical Design
[Figure only]
(Slide courtesy of A. Nardi, J. Rabaey, K. Keutzer of UCB)

Process Variation Tolerance Enhancement
- Rule of thumb: balanced tree
  - Identical buffers at identical heights
  - Drive identical subtree loads
- Can we do better than this?
- Process variation tolerant clock design
  - Bounded-skew DME
  - Topology construction
    - With process variation tolerance in the objective
  - Useful skew scheduling
    - To the center of permissible ranges

Signal Integrity
- Crosstalk
  - Capacitive, inductive
- Supply voltage drop
  - IR, L dI/dt, LC resonance
- Temperature
  - Increased resistance with higher temperature
- Substrate coupling
  - Parasitic resistance, capacitance in the substrate layer

Crosstalk
- Due to the coupling capacitance between interconnections, a signal switching on a net (aggressor) may affect the voltage waveform on a neighboring net (victim)
- Effects: noise propagation, increased delay

Circuit Model for Crosstalk
[Figure only]

Crosstalk Simulation
[Figure only]

Design for Crosstalk
- It can be both capacitive and inductive
- Capacitive is dominant at current switching speeds
- To reduce it:
  - Use of shielding layer (inter-layer)
  - Use of shielding wire (intra-layer)
[Figure: signal wire between GND / VDD shield wires above the substrate]

Clock Gating
- Reduce power consumption by temporarily shutting down part of the circuit
- Additional cost of enabling circuits
[Figure: CLK gated by ENABLING logic into CLK1 and CLK2, driving FFs around combinational logic]
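As a rough illustration of the clock-gating trade-off, the sketch below applies the CV^2f clock power expression from the "Issues" slide to a clock network split into gated domains. It is only a back-of-the-envelope model, not a method from the slides: the function names, the idea of an "enabled fraction" per domain, and the lumped capacitance for the enabling circuits are all assumptions made for this example, and the numbers are invented.

```python
def clock_power(c_farads, vdd_volts, freq_hz):
    """Dynamic clock power C * V^2 * f for a net that toggles every cycle."""
    return c_farads * vdd_volts ** 2 * freq_hz


def gated_clock_power(domains, vdd_volts, freq_hz, gating_overhead_farads=0.0):
    """Estimate clock power when each domain is clocked only part of the time.

    domains: list of (capacitance_farads, enabled_fraction) pairs (hypothetical model).
    gating_overhead_farads: assumed always-switching capacitance of the
    enabling circuits (the "additional cost" of gating).
    """
    total = clock_power(gating_overhead_farads, vdd_volts, freq_hz)
    for cap, enabled_fraction in domains:
        if not 0.0 <= enabled_fraction <= 1.0:
            raise ValueError("enabled_fraction must be between 0 and 1")
        total += enabled_fraction * clock_power(cap, vdd_volts, freq_hz)
    return total


if __name__ == "__main__":
    # Invented numbers: 1 nF of clock load at 1.0 V and 1 GHz.
    ungated = clock_power(1e-9, 1.0, 1e9)
    # Same load split into two domains; the second is enabled 25% of the time.
    gated = gated_clock_power([(0.5e-9, 1.0), (0.5e-9, 0.25)], 1.0, 1e9,
                              gating_overhead_farads=0.05e-9)
    print(f"ungated: {ungated:.3f} W, gated: {gated:.3f} W")
```

The overhead term is what makes gating a trade-off: if a domain is enabled almost all the time, the enabling circuits can cost more than they save.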
Skew = Local Constraint
- Timing is correct as long as the clock signals of sequentially adjacent FFs arrive within a permissible skew range
- Permissible range: $-d + t_{hold} < \text{Skew} < T_{period} - D - t_{setup}$
  - D: longest path delay
  - d: shortest path delay
  - Skew below the lower bound: race condition
  - Skew above the upper bound: cycle time violation
  - In between: safe
(W. Dai, UC Santa Cruz)

"Useful Skew": Design Robustness
- Design will be more robust if the clock signal arrival time is in the middle of the permissible skew range, rather than on the edge
[Figure: three FFs connected by 2 ns and 6 ns logic paths, T = 6 ns]
- Clock schedule "0 0 0": at verge of violation
- Clock schedule "2 0 2": more safety margin
(W. Dai, UC Santa Cruz)

Constraints on Skews
- FF_i receives the clock signal delayed by $x_i$, with $x_i \ge \mathrm{MIN\_DEL}$
- $0 < \alpha \le 1 \le \beta$: if the nominal clock delay is $x_i$, then the actual clock delay $x$ must fall within the interval $\alpha x_i \le x \le \beta x_i$
- For an FF to operate correctly when the clock edge arrives at time x, the correct input data must be present and stable during the time interval (x - SETUP, x + HOLD)
- For $1 \le i, j \le L$ (L = #FFs), we compute lower and upper bounds MIN(i,j) and MAX(i,j) for the time that is required for a signal edge to propagate from FF_i to FF_j
- Avoid double-clocking (race condition):
  $x_i + \mathrm{MIN}(i,j) \ge x_j + \mathrm{HOLD}$
- Avoid zero-clocking:
  $x_i + \mathrm{SETUP} + \mathrm{MAX}(i,j) \le x_j + P$
- P = clock period

Optimal Useful Skews by Linear Programming
- LP_SPEED (clock period reduction): minimize P subject to
  $x_i - x_j \ge \mathrm{HOLD} - \mathrm{MIN}(i,j)$
  $x_j - x_i + P \ge \mathrm{SETUP} + \mathrm{MAX}(i,j)$
  $x_i \ge \mathrm{MIN\_DEL}$
- LP_SAFETY (robustness): maximize M subject to
  $x_i - x_j - M \ge \mathrm{HOLD} - \mathrm{MIN}(i,j)$
  $x_j - x_i - M \ge \mathrm{SETUP} + \mathrm{MAX}(i,j) - P$
  $x_i \ge \mathrm{MIN\_DEL}$
- Notes
  - J. P. Fishburn, "Clock Skew Optimization", IEEE Trans. Computers 39(7) (1990), pp. 945-951.
  - T. G. Szymanski, "Computing Optimal Clock Schedules", Proc. DAC, June 1992, pp. 399-404.
  - Useful skew optimization is similar to retiming optimization
  - Peak current reductions are a side benefit

Outline
- Problem Statement
- Clock Distribution Structures
- Robustness / Signal Integrity Control
- Clock Design:
  - Skew Scheduling
  - Topology Design
  - Embedding
    - For zero skew (ZST-DME)
    - For bounded skew (BST-DME)

Zero-Skew Tree (ZST) Problem
- Zero Skew Clock Routing Problem (S, G): given a set S of sink locations and a connection topology G, construct a ZST T(S) with topology G and having minimum cost
- Skew = maximum value of $|t_d(s_0, s_i) - t_d(s_0, s_j)|$ over all sink pairs $s_i, s_j$ in S
- $t_d$ = signal delay (from source $s_0$)
- Connection topology G = rooted binary tree with the nodes of S as leaves
- Edge e_a in G is the edge from a to its parent
- |e_a| is the (assigned) length of edge e_a
- Cost = total edge length

Zero-Skew Example (555 sinks, 40 obstacles)
[Figure only]

A Zero-Skew Routing Algorithm
- Finds a ZST under the linear delay model with minimum cost over all ZSTs with topology G and sink set S
- Terms
  - Manhattan arc: line segment with slope +1 or -1
  - Tilted Rectangular Region (TRR): collection of points within a fixed distance of a Manhattan arc
    - Core = Manhattan arc
    - Radius = distance
  - Merging segment ms(v) = locus of feasible locations for a node v in the topology, consistent with minimum wirelength
    - If v is a sink, then ms(v) = {v}
    - If v is an internal node with children a and b, then ms(v) is the set of all points within distance |e_a| of ms(a), and within distance |e_b| of ms(b)

Phase 1: Tree of Merging Segments
- Goal: construct a tree of merging segments corresponding to topology G
- The merging segment of a node depends on the merging segments of its children, hence bottom-up construction
- Let a, b be children of v.
  We want placements of v that allow the subtrees TS_a and TS_b to be merged with minimum added wire while preserving zero skew
- Merging cost = |e_a| + |e_b|
- Fact: the intersection of two TRRs is also a TRR and can be found in constant time
- Constant time per new merging segment, hence linear time (in the size of S) to construct the entire tree

Phase 2: Find Node Placements
- Goal: find exact locations ("embeddings") pl(v) of internal nodes v in the ZST topology
- If v is the root node, then any point on ms(v) can be chosen as pl(v)
- If v is an internal node other than the root, and p is the parent of v, then v can be embedded at any point in ms(v) that is at distance |e_v| or less from pl(p)
- Detail: create square TRR trr_p with radius |e_v| and core equal to pl(p); the placement of v can be any point in ms(v) ∩ trr_p
- Each instruction is executed at most once for each node in G, and TRR intersection takes O(1) time
  - Find_Exact_Placements is O(n), so DME is O(n)

Non-Zero Skew Bounds
- Given a skew bound, where can internal nodes of the given topology (e.g., a, b, v) be placed?
[Figure: feasible regions for a, b, and v at skew bounds 0, 2, 4, 6; topology with source s0, internal nodes v, a, b, and sinks s1-s4]

BST-DME Bottom-Up Phase
- Bottom-up: build a tree of merging regions corresponding to the given topology
[Figure: skew bound B = 4; merging regions mr(a), mr(b), mr(v) for the topology with source s0, internal nodes v, a, b, and sinks s1-s4]

BST-DME Top-Down Phase
[Figure: skew bound B = 4; embedded locations of v, a, b for the same topology with source s0 and sinks s1-s4]

Good Luck for the Mid-Term!
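Worked sketch: the ZST-DME bottom-up step chooses edge lengths |e_a| and |e_b| so that both subtrees see the same delay at the merge point. The Python below is a minimal illustration of that balance under stated assumptions, not the course's implementation: it assumes the linear delay model with unit delay per unit wirelength, takes the Manhattan distance between the two child merging segments as a given number, and adds a detour (wire snaking) case that the slides do not spell out. The names `manhattan` and `merge_edge_lengths` are hypothetical.

```python
def manhattan(p, q):
    """Manhattan distance between two points given as (x, y) tuples."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def merge_edge_lengths(t_a, t_b, dist):
    """Balance two subtrees under a linear (delay = wirelength) model.

    t_a, t_b: delay from the root of each child subtree to its sinks.
    dist: Manhattan distance between the child merging segments.
    Returns (e_a, e_b, t_v) with t_a + e_a == t_b + e_b == t_v.
    """
    if dist < 0:
        raise ValueError("dist must be non-negative")
    # Solve e_a + e_b = dist and t_a + e_a = t_b + e_b for e_a.
    e_a = (dist + t_b - t_a) / 2.0
    if e_a < 0:
        # Subtree a is too slow: connect a with zero wire and snake the wire to b.
        e_a, e_b = 0.0, float(t_a - t_b)
    elif e_a > dist:
        # Subtree b is too slow: connect b with zero wire and snake the wire to a.
        e_a, e_b = float(t_b - t_a), 0.0
    else:
        e_b = dist - e_a
    return e_a, e_b, t_a + e_a


if __name__ == "__main__":
    # Two sinks (merging segments are single points, delay 0 at the sink itself).
    s1, s2 = (0, 0), (4, 2)
    e1, e2, t_v = merge_edge_lengths(0.0, 0.0, manhattan(s1, s2))
    print(e1, e2, t_v)  # merging cost is e1 + e2
```

In the balanced case the merging cost |e_a| + |e_b| equals the distance between the child segments; in the detour cases it exceeds that distance, which is the price of preserving zero skew.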