ELEN 468 Advanced Logic Design

ELEN 468 Lecture 30

Delay Model and Simulation

Simulation with Delay

[Figure: circuit with inputs A and B, internal node C, and output D, annotated with gate delays, plus an event timeline for A, B, C, and D over t_sim = 0 to 50. All four signals start at x.]

Delay Models

• Gate delay
  - Intrinsic delay
  - Layout-induced delay due to capacitive load
  - Waveform slope-induced delay
• Net delay / transport delay
  - Signal propagation delay along interconnect wires
• Module path delay
  - Delay between input port and output port

Inertial Delay

• Delay is caused by charging and discharging node capacitors in the circuit
• Applies to gate delay and wire delay
• Pulse rejection
  - If the pulse width is less than the delay, the pulse is ignored

Gate Delay

    and (yout, x1, x2);                                 // default, zero gate delay
    and #3 (yout, x1, x2);                              // 3 units delay for all transitions
    and #(2,3) G1(yout, x1, x2);                        // rising, falling delay
    and #(2,3) G1(yout, x1, x2), G2(yout2, x3, x4);     // multiple instances
    a_buffer #(3,5,2) (yout, x);                        // UDP, rise, fall, turnoff
    bufif1 #(3:4:5, 6:7:9, 5:7:8) (yout, xin, enable);  // min:typ:max / rise, fall, turnoff

• A simulator runs with only one of the min, typ, and max delay values
• The selection is made through compiler directives or the user interface
• The default delay is the typ delay

Gate and Wire Model

[Figure: gate modeled by an input capacitance C and an output resistance R; wire of length L modeled by a series resistance rL with a capacitance cL/2 at each end.]

• r: resistance per unit length
• c: capacitance per unit length

Example of Model

[Figure: a driver with resistance R at node 0 feeds a wire of length L1 to node 1, which branches through wires of length L2 and L3 to sinks at nodes 2 and 3 with load capacitances C2 and C3. Equivalent RC tree: series resistances rL1, rL2, rL3; node capacitances cL1/2 at node 0, (L1+L2+L3)c/2 at node 1, cL2/2 + C2 at node 2, and cL3/2 + C3 at node 3.]

Delay Estimation

[Figure: RC tree with driver resistance R into node 0 (capacitance C0), R1 from node 0 to node 1 (C1), R2 from node 1 to node 2 (C2), and R3 from node 1 to node 3 (C3).]

    D0 = R (C0 + C1 + C2 + C3)
    D1 = D0 + R1 (C1 + C2 + C3)
    D2 = D1 + R2 C2
    D3 = D1 + R3 C3

Clock Scheduling

[Figure: register i, clocked at arrival time ti, drives combinational logic that feeds register j, clocked at arrival time tj.]

• LD: logic delay

Timing Constraints

    skew_ij = ti - tj >= hold_max - LD_min
    skew_ij = ti - tj <= CP - LD_max - setup_max

• CP: clock period

Assignment

Blocking and Non-blocking Assignment

    initial begin
      a = 1;
      b = 0;
      a = b;    // a = 0
      b = a;    // b = 0
    end

    initial begin
      a = 1;
      b = 0;
      a <= b;   // a = 0
      b <= a;   // b = 1
    end

• Blocking assignment "="
  - Statement order matters
  - A statement has to be executed before the next statement
• Non-blocking assignment "<="
  - Concurrent assignment
  - Normally the last assignment at a given simulation time step
  - If it triggers other blocking assignments, it is executed before the blocking assignments it triggers
  - If there are multiple non-blocking assignments to the same variable in the same behavior, the latter overwrites the previous

Procedural Continuous Assignment

• Continuous assignment establishes a static binding for net variables
• Procedural continuous assignment (PCA) establishes a dynamic binding for variables
  - "assign ... deassign" for register variables only
  - "force ... release" for both register and net variables

Intra-assignment Delay: Blocking Assignment

    // B = 0 at time 0
    // B = 1 at time 4
    …
    #5 A = B;               // A = 1
    C = D;
    …
    A = #5 B;               // A = 0
    C = D;
    …
    A = @(enable) B;
    C = D;
    …
    A = @(named_event) B;
    C = D;
    …

• Timing control operator (#, @) on the LHS
  - Blocking delay
  - RHS evaluated at (#, @)
  - Assignment at (#, @)
• Timing control operator (#, @) on the RHS
  - Intra-assignment delay
  - RHS evaluated immediately
  - Assignment at (#, @)

Example

    initial begin
      a = #10 1;
      b = #2 0;
      c = #3 1;
    end

    initial begin
      d <= #10 1;
      e <= #2 0;
      f <= #3 1;
    end

    t    0   2   3   10  12  15
    a    x   x   x   1   1   1
    b    x   x   x   x   0   0
    c    x   x   x   x   x   1
    d    x   x   x   1   1   1
    e    x   0   0   0   0   0
    f    x   x   1   1   1   1

Tell the Differences

Which one describes an OR gate?

    always @ (a or b) y = a|b;
    always @ (a or b) #5 y = a|b;
    // Event control is blocked
    always @ (a or b) y = #5 a|b;
    always @ (a or b) y <= #5 a|b;

Race Condition

    always @ ( posedge clk ) c = b;   // c will get previous b or new b?
    always @ ( posedge clk ) b = a;

Avoid Race Condition

    // Solution 1: merge always
    always @ ( posedge clk ) begin
      c = b;
      b = a;
    end

    // Solution 2: intra-assignment delay
    always @ ( posedge clk ) c = #1 b;
    always @ ( posedge clk ) b = #1 a;

    // Solution 3: non-blocking assignment
    always @ ( posedge clk ) c <= b;
    always @ ( posedge clk ) b <= a;

Finite State Machine

FSM Example: Speed Machine

[Figure: state diagram with states stopped, low, medium, and high. Inputs: a = accelerator, b = brake, and clock; output: speed. With a = 1, b = 0 the machine moves one state toward high; with b = 1 it moves one state toward stopped.]

Verilog Code for Speed Machine

    // Explicit FSM style
    module speed_machine ( clock, accelerator, brake, speed );
      input        clock, accelerator, brake;
      output [1:0] speed;
      reg    [1:0] state, next_state;

      parameter stopped  = 2'b00;
      parameter s_low    = 2'b01;
      parameter s_medium = 2'b10;
      parameter s_high   = 2'b11;

      assign speed = state;

      always @ ( posedge clock )
        state <= next_state;

      always @ ( state or accelerator or brake )
        if ( brake == 1'b1 )
          case ( state )
            stopped:  next_state <= stopped;
            s_low:    next_state <= stopped;
            s_medium: next_state <= s_low;
            s_high:   next_state <= s_medium;
            default:  next_state <= stopped;
          endcase
        else if ( accelerator == 1'b1 )
          case ( state )
            stopped:  next_state <= s_low;
            s_low:    next_state <= s_medium;
            s_medium: next_state <= s_high;
            s_high:   next_state <= s_high;
            default:  next_state <= stopped;
          endcase
        else
          next_state <= state;
    endmodule

State Encoding Example

    #   Binary  Gray  Johnson  One-hot
    0   000     000   0000     00000001
    1   001     001   0001     00000010
    2   010     011   0011     00000100
    3   011     010   0111     00001000
    4   100     110   1111     00010000
    5   101     111   1110     00100000
    6   110     101   1100     01000000
    7   111     100   1000     10000000

State Encoding

• A state machine with N states requires a register of at least log2(N) bits to store the encoded representation of the states
• Binary and Gray encoding use the minimum number of bits for the state register
• Gray and Johnson code:
  - Two adjacent codes differ by only one bit
  - Reduce simultaneous switching
    - Reduce crosstalk
    - Reduce glitch

One-hot Encoding

• Employs a one-bit register for each state
• Less combinational logic to decode
• Consumes greater area; this does not matter for certain hardware such as FPGAs
• Easier for design, friendly to incremental change
• case and if statements may give different results for one-hot encoding
• Runs faster

    `define state_0 3'b001
    `define state_1 3'b010
    `define state_2 3'b100

Transistor Level Model

Static CMOS Circuits

    module cmos_inverter ( out, in );
      output out;
      input in;
      supply0 GND;
      supply1 PWR;
      pmos ( out, PWR, in );
      nmos ( out, GND, in );
    endmodule

[Figure: CMOS inverter between Vdd and ground with input in and output out; transistor terminals labeled source, gate, and drain.]

Pull Gates

    module nmos_nand_2 ( Y, A, B );
      output Y;
      input A, B;
      supply0 GND;
      tri w;
      pullup ( Y );
      nmos ( Y, w, A );
      nmos ( w, GND, B );
    endmodule

[Figure: two-input NAND with output Y pulled up to Vdd and two series nMOS transistors driven by A and B.]

Assign Drive Strengths

    nand ( pull1, strong0 ) G1( Y, A, B );
    wire ( pull0, weak1 ) A_wire = net1 || net2;
    assign ( pull1, weak0 ) A_net = reg_b;

• Drive strength is specified through an unordered pair
  - one value from { supply0, strong0, pull0, weak0, highz0 }
  - the other from { supply1, strong1, pull1, weak1, highz1 }
• Only scalar nets may receive strength assignment
• When a tri0 or tri1 net is not driven, it is pulled to the indicated logic value with a strength of pull0 or pull1
• A trireg net models a capacitance that holds its charge after the drivers are removed; the net has a charge strength of small, medium (default), or large

Signal Strength Levels

    Strength           0     1
    Supply drive       Su0   Su1
    Strong drive       St0   St1
    Pull drive         Pu0   Pu1
    Large capacitor    La0   La1
    Weak drive         We0   We1
    Medium capacitor   Me0   Me1
    Small capacitor    Sm0   Sm1
    High impedance     HiZ0  HiZ1

• Signal strength: a signal's ability to act as a logic driver determining the resultant logic value on a net
  - Signal contention between multiple drivers of nets
  - Charge distribution between nodes in a circuit
• Default: strong drive
• Capacitive strengths may be assigned only to trireg nets

Strength Reduction

• Dependence of output strength on input strength
  - Combinational and pull gates: NO, except 3-state gates
  - Transistor switches and bi-directional gates: YES
• In general, output strength <= input strength

Transistor Switch and Bi-directional Gate

• Transistor switch
  - nmos, pmos, cmos
• Bi-directional gate
  - tran, tranif0, tranif1
• If input is ( supply0 or supply1 )
  - Output is ( strong0, strong1 )
• Otherwise
  - Output strength = input strength

Signal Contention: Known Strength and Known Value

• The signal with greater strength dominates
• Same strength, different logic values
  - wand -> and, wor -> or
  - Otherwise -> x

[Figure: driver1 = We0 and driver2 = Pu1 resolve to Pu1.]

Synthesis

Unexpected and Unwanted Latch

• Combinational logic must specify the output value for all input values
• Incomplete case statements and conditionals (if) imply
  - Output should retain its value for unspecified input values
  - Unwanted latches

Example of Unwanted Latch

    module myMux( y, selA, selB, a, b );
      input selA, selB, a, b;
      output y;
      reg y;
      always @ ( selA or selB or a or b )
        case ( {selA, selB} )
          2'b10: y = a;
          2'b01: y = b;
        endcase
    endmodule

[Figure: synthesized result, a latch with output y whose data and enable (en) inputs are formed from a, b, selA, selB, selA', and selB'.]

Synthesis of case and if

• case and if statements imply priority
  - The synthesis tool will determine whether the case items of a case statement are mutually exclusive
  - If so, synthesis will treat them with the same priority and synthesize a mux
• A synthesis tool will treat casex and casez the same as case
  - "x" and "z" will be treated as don't cares
  - Post-synthesis simulation results may differ from pre-synthesis simulation

Example of if and case

    …
    input [3:0] data;
    output [1:0] code;
    reg [1:0] code;
    always @(data)
    begin // implicit priority
      if ( data[3] ) code = 3;
      else if (data[2]) code = 2;
      else if (data[1]) code = 1;
      else if (data[0]) code = 0;
      else code = 2'bx;
    end
    …

    …
    input [3:0] data;
    output [1:0] code;
    reg [1:0] code;
    always @(data)
      case (data)
        4'b1000: code = 3;
        4'b0100: code = 2;
        4'b0010: code = 1;
        4'b0001: code = 0;
        default: code = 2'bx;
      endcase
    …

Synthesis of Register Variables

• A hardware register will be generated for a register variable when
  - It is referenced before a value is assigned in a behavior
  - It is assigned a value in an edge-sensitive behavior and is referenced by an assignment outside the behavior
  - It is assigned a value in one clock cycle and referenced in another clock cycle
• Multi-phased latches may not be supported in synthesis

Synthesis of Arithmetic Operators

• If a corresponding library cell exists, an operator will be directly mapped to it
• The synthesis tool may select among different options in the library cell, for example, when synthesizing an adder
  - Small wordlength -> ripple-carry adder
  - Long wordlength -> carry-look-ahead adder
  - Need small area -> bit-serial adder
• Implementation of "*" and "/"
  - May be inefficient when both operands are variables
  - If the multiplier or the divisor is a power of two, it can be implemented through a shift register

Static Loops without Internal Timing Controls -> Combinational Logic

    module count1sA ( bit_cnt, data, clk, rst );
      parameter data_width = 4;
      parameter cnt_width = 3;
      output [cnt_width-1:0] bit_cnt;
      input [data_width-1:0] data;
      input clk, rst;
      reg [cnt_width-1:0] cnt, bit_cnt, i;
      reg [data_width-1:0] tmp;
      always @ ( posedge clk )
        if ( rst ) begin
          cnt = 0; bit_cnt = 0;
        end
        else begin
          cnt = 0; tmp = data;
          for ( i = 0; i < data_width; i = i + 1 ) begin
            if ( tmp[0] ) cnt = cnt + 1;
            tmp = tmp >> 1;
          end
          bit_cnt = cnt;
        end
    endmodule

Static Loops with Internal Timing Controls -> Sequential Logic

    module count1sB ( bit_cnt, data, clk, rst );
      parameter data_width = 4;
      parameter cnt_width = 3;
      output [cnt_width-1:0] bit_cnt;
      input [data_width-1:0] data;
      input clk, rst;
      reg [cnt_width-1:0] cnt, bit_cnt, i;
      reg [data_width-1:0] tmp;
      always @ ( posedge clk )
        if ( rst ) begin
          cnt = 0; bit_cnt = 0;
        end
        else begin
          cnt = 0; tmp = data;
          for ( i = 0; i < data_width; i = i + 1 )
            @ ( posedge clk ) begin
              if ( tmp[0] ) cnt = cnt + 1;
              tmp = tmp >> 1;
            end
          bit_cnt = cnt;
        end
    endmodule

Non-Static Loops without Internal Timing Controls -> Not Synthesizable

    module count1sC ( bit_cnt, data, clk, rst );
      parameter data_width = 4;
      parameter cnt_width = 3;
      output [cnt_width-1:0] bit_cnt;
      input [data_width-1:0] data;
      input clk, rst;
      reg [cnt_width-1:0] cnt, bit_cnt, i;
      reg [data_width-1:0] tmp;
      always @ ( posedge clk )
        if ( rst ) begin
          cnt = 0; bit_cnt = 0;
        end
        else begin
          cnt = 0; tmp = data;
          for ( i = 0; | tmp; i = i + 1 ) begin
            if ( tmp[0] ) cnt = cnt + 1;
            tmp = tmp >> 1;
          end
          bit_cnt = cnt;
        end
    endmodule

Non-Static Loops with Internal Timing Controls -> Sequential Logic

    module count1sD ( bit_cnt, data, clk, rst );
      parameter data_width = 4;
      parameter cnt_width = 3;
      output [cnt_width-1:0] bit_cnt;
      input [data_width-1:0] data;
      input clk, rst;
      reg [cnt_width-1:0] cnt, bit_cnt, i;
      reg [data_width-1:0] tmp;
      always @ ( posedge clk )
        if ( rst ) begin
          cnt = 0; bit_cnt = 0;
        end
        else begin: bit_counter
          cnt = 0; tmp = data;
          while ( tmp )
            @ ( posedge clk ) begin
              if ( rst ) begin
                cnt = 0;
                disable bit_counter;
              end
              else begin
                cnt = cnt + tmp[0];
                tmp = tmp >> 1;
              end
              bit_cnt = cnt;
            end
        end
    endmodule

VHDL

Example

    -- eqcomp4 is a four bit equality comparator
    -- Entity declaration
    entity eqcomp4 is
      port ( a, b:   in  bit_vector( 3 downto 0 );
             equals: out bit );  -- equals is active high
    end eqcomp4;

    -- Architecture body
    architecture dataflow of eqcomp4 is
    begin
      equals <= '1' when ( a = b ) else '0';
    end dataflow;

Behavioral Descriptions

    library ieee;
    use ieee.std_logic_1164.all;
    entity eqcomp4 is
      port ( a, b:   in  std_logic_vector( 3 downto 0 );
             equals: out std_logic );
    end eqcomp4;

    architecture behavioral of eqcomp4 is
    begin
      comp: process ( a, b )   -- sensitivity list
      begin
        if a = b then
          equals <= '1';
        else
          equals <= '0';       -- sequential assignment
        end if;
      end process comp;
    end behavioral;

Dataflow Descriptions

    library ieee;
    use ieee.std_logic_1164.all;
    entity eqcomp4 is
      port ( a, b:   in  std_logic_vector( 3 downto 0 );
             equals: out std_logic );
    end eqcomp4;

    architecture dataflow of eqcomp4 is
    begin
      -- No process
      equals <= '1' when ( a = b ) else '0';   -- Concurrent assignment
    end dataflow;

Structural Descriptions

    library ieee;
    use ieee.std_logic_1164.all;
    entity eqcomp4 is
      port ( a, b:   in  std_logic_vector( 3 downto 0 );
             equals: out std_logic );
    end eqcomp4;

    use work.gatespkg.all;
    architecture struct of eqcomp4 is
      signal x : std_logic_vector( 0 to 3 );
    begin
      u0: xnor2 port map ( a(0), b(0), x(0) );   -- component instantiation
      u1: xnor2 port map ( a(1), b(1), x(1) );
      u2: xnor2 port map ( a(2), b(2), x(2) );
      u3: xnor2 port map ( a(3), b(3), x(3) );
      u4: and4 port map ( x(0), x(1), x(2), x(3), equals );
    end struct;

Test and Design For Testability

Single Stuck-at Fault

• Three properties define a single stuck-at fault
  - Only one line is faulty
  - The faulty line is permanently set to 0 or 1
  - The fault can be at an input or output of a gate
• Example: the XOR circuit has 12 fault sites and 24 single stuck-at faults

[Figure: XOR circuit with lines a through k and output z, showing the test vector for the h s-a-0 fault; good circuit values are shown with faulty circuit values in parentheses.]

Stuck-Open Example

• Vector 1: test for A s-a-0 (initialization vector)
• Vector 2: test for A s-a-1
• A two-vector s-op test can be constructed by ordering two s-at tests

[Figure: CMOS gate with pMOS and nMOS FETs, inputs A and B, output C, and one transistor stuck open. Output C is 0 after vector 1; after vector 2 it is 1 in the good circuit and Z in the faulty circuit.]

Stuck-Short Example

[Figure: CMOS gate with PFETs and NFETs, inputs A and B, output C, and one transistor stuck short. Under the test vector for A s-a-0 there is an IDDQ path from VDD in the faulty circuit; the good circuit state is 0 and the faulty circuit state is X.]

Test Pattern for Stuck-At Faults

    Y_good  = (a · b · c)'
    Y_a-SA1 = (b · c)'        (input a stuck at 1)

• No need to enumerate all input combinations to detect a fault
• Test pattern: {a, b, c} = 011

Fault Simulation

• Fault simulation problem: given
  - A circuit
  - A sequence of test vectors
  - A fault model
• Determine
  - Fault coverage: fraction (or percentage) of modeled faults detected by the test vectors
  - Set of undetected faults
• Motivation
  - Determine test quality and in turn product quality
  - Find undetected fault targets to improve tests

Goal of Design for Testability (DFT)

• Improve
  - Controllability
  - Observability
  - Predictability

Scan Storage Cell

[Figure: scan storage cell (SSC) with inputs D, Si, N'/T, and Clk, and output Q, So.]

Integrated Serial Scan

[Figure: combinational logic between primary inputs (PI) and primary outputs (PO), with scan flip-flops (SFF) chained from SCANIN to SCANOUT under a common control signal.]

Interconnect Timing Optimization

Buffers Reduce Wire Delay

[Figure: a driver with resistance R drives a wire of length x into a load C. In the buffered case a buffer with delay t_b is inserted at the midpoint, giving two segments of length x/2, each modeled by resistance rx/2 with capacitance cx/4 at each end.]

    t_unbuf = R (cx + C) + rx (cx/2 + C)
    t_buf   = 2R (cx/2 + C) + rx (cx/4 + C) + t_b
    t_buf - t_unbuf = RC + t_b - r c x^2 / 4

Buffers Improve Slack

• RAT = Required Arrival Time
• Slack = RAT - Delay
• A buffer decouples capacitive load from the critical path

    Case             Sink   RAT   Delay   Slack   slack_min
    Without buffer   1      300   350     -50     -50
                     2      700   600     100
    With buffer      1      300   250     50      50
                     2      700   400     300

Slew Constraints

• When a buffer is inserted, assume an ideal slew rate at its input
• Check the slew rate at downstream buffers/sinks
• If the slew is too large, the candidate is discarded

Cost-Slack Trade-off

[Figure: slack (ps), from -4000 to 1000, plotted against the number of buffers, from 0 to 7.]

Wire Sizing: Monotone Property

• Ancestor edges cannot be narrower than downstream edges

Area or Radius?

• Radius: the longest source-sink path length
• Prim's minimum spanning tree
  - Small total wire length
  - Long path to sinks
• Dijkstra's shortest path tree
  - Short path to sinks
  - Large total wire length

Area Radius Trade-off

• Find a solution in the middle
  - Not too much area
  - Not too long radius
• How to find an ideal point?

Gate Characteristics

I-V Characteristics

• Cutoff region
  - Vgs < Vt
  - Ids = 0
• Linear region
  - Vgs > Vt, 0 < Vds < Vgs - Vt
  - Ids = B [ (Vgs - Vt) Vds - Vds^2 / 2 ]
• Saturation region
  - Vgs > Vt, 0 < Vgs - Vt < Vds
  - Ids = B (Vgs - Vt)^2 / 2
• B = a W/L

[Figure: transistor symbol with terminals g, d, and s, and a plot of Ids versus Vds.]

Falling Time

• Falling time = t1 + t2
  - t1: Vout drops from 0.9 Vdd to Vdd - Vt
  - t2: Vout drops from Vdd - Vt to 0.1 Vdd
• Falling time = rising time ≈ k C / (B Vdd)
• Delay ≈ falling time / 2

Gate Power Dissipation

• Leakage power
• Dynamic power
• Short circuit power

Leakage Power

• Static
• Leakage current = a · Vdd
• Leakage current = b / Vt
• Killer to CMOS technology

[Figure: two CMOS inverters showing the leakage path in each output state; transistor regions labeled linear and saturation.]

Dynamic Power

• Occurs at each switching
• Pd = CL · Vdd^2 · fp
• fp: switching frequency

[Figure: two CMOS inverters during switching; transistor regions labeled linear and saturation.]

Short Circuit Power

• During switching, there is a short moment when both the PMOS and NMOS transistors are partially on
• Ps = Q · (Vdd - Vt)^3 · tr · fp
• tr: rising time

[Figure: CMOS inverters for the input-falling and input-rising cases.]

Low Power Design

Clock Gating

• Gate off the clock to idle functional units
  - e.g., floating point units
  - need logic to generate the disable signal
    - increases complexity of control logic
    - consumes power
    - timing critical to avoid clock glitches at the OR gate output
  - additional gate delay on the clock signal
    - the gating OR gate can replace a buffer in the clock distribution tree

[Figure: clock and disable feed an OR gate whose output clocks the register (Reg) in front of the functional unit.]

Active Power Reduction - Supply Voltage Reduction

[Figure: design partitioned into slow and fast blocks, with low and high supply voltages.]

• Static
  - Pros:
    - Always active in saving
  - Cons:
    - Additional power delivery network
    - Needs special care of the interface between power domains
    - Signals close to Vt: excessive leakage and reduced noise margins
• Dynamic: adjusting operating voltage and frequency to performance requirements
  - High performance: high Vdd and frequency
  - Power saving: low Vdd and frequency
  - Pros:
    - Doesn't limit performance
  - Cons:
    - The penalty of transition between different power states can be high (in performance and power)
    - Additional control logic

Dynamic Frequency and Voltage Scaling

• Always run at the lowest supply voltage that meets the timing constraints
  - DFS (dynamic frequency scaling) saves only power
  - DVS (dynamic voltage scaling) + DFS saves both energy and power
• A DVS+DFS system requires the following
  - A programmable clock generator (PLL)
    - PLL from 200 MHz to 700 MHz in increments of 33 MHz
  - A supply regulation loop that sets the minimum VDD necessary for operation at the desired frequency
    - 32 levels of VDD from 1.1 V to 1.6 V
  - An operating system that sets the required frequency and supply voltage to meet the task completion deadlines
    - heavier load: ramp up VDD; when stable, speed up the clock
    - lighter load: slow down the clock; when the PLL locks onto the new rate, ramp down VDD

Design with Dual Vth

[Figure: dual Vth evaluation.]

• Dual Vth design
  - Two flavors of transistors: slow (high Vth) and fast (low Vth)
  - Low Vth transistors are faster, but have ≈10X leakage

Power Gating Using Sleep Transistors

• Or can reduce leakage by gating the supply rails when the circuit is in sleep mode
  - in normal mode, sleep = 0 and the sleep transistors must present as small a resistance as possible (via sizing)
  - in sleep mode, sleep = 1, and the transistor stack effect reduces leakage by orders of magnitude
• Or can eliminate leakage by switching off the power supply (but lose the memory state)
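
Returning to the unwanted-latch example (myMux) from the Synthesis part: a minimal sketch of one common way to avoid the latch, assuming it is acceptable for y to take a fixed value when neither or both select lines are active. The module name myMuxNoLatch and the default value 1'b0 are illustrative assumptions, not part of the lecture.

```verilog
// Sketch only: same ports as myMux, but every case path assigns y.
module myMuxNoLatch ( y, selA, selB, a, b );
  input selA, selB, a, b;
  output y;
  reg y;
  always @ ( selA or selB or a or b )
    case ( {selA, selB} )
      2'b10:   y = a;
      2'b01:   y = b;
      default: y = 1'b0;  // assumed value for 2'b00 and 2'b11
    endcase
endmodule
```

Because y is assigned for every value of {selA, selB}, the behavior no longer asks for the old value to be retained, so it describes purely combinational logic.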
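
For the Buffers Reduce Wire Delay slide, the sign of t_buf - t_unbuf gives a break-even wire length. The short derivation below uses only the symbols from that slide (R, C, r, c, x, t_b); the name x* for the break-even length is introduced here for illustration.

```latex
\begin{align*}
t_{\text{buf}} - t_{\text{unbuf}} &= RC + t_b - \frac{r c x^2}{4} < 0 \\
\iff\quad x &> x^{*} = 2\sqrt{\frac{RC + t_b}{r c}}
\end{align*}
```

Under this model, a midpoint buffer pays off only for wires longer than x*; for shorter wires the added RC + t_b outweighs the saving in the quadratic wire delay term.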
