Download CSE 477. VLSI Systems Design

Advanced Digital Integrated Circuits Static CMOS Logic CMOS Inverter: A First Look VDD Vin Vout CL CMOS Inverter: Steady State Response VDD VDD VOL = 0 VOH = VDD VM = f(Rn, Rp) Rp Vout = 1 Vout = 0 Rn Vin = 0 Vin = V DD CMOS Properties  Full rail-to-rail swing  high noise margins  Logic levels not dependent upon the relative device sizes  transistors can be minimum size  ratioless  Always a path to Vdd or GND in steady state  low output impedance (output resistance in k range)  large fan-out (albeit with degraded performance)  Extremely high input resistance (gate of MOS transistor is near perfect insulator)  nearly zero steady-state input current  No direct path steady-state between power and ground  no static power dissipation  Propagation delay function of load capacitance and resistance of transistors Review: Short Channel I-V Plot (NMOS) X 10-4 2.5 VGS = 2.5V 2 VGS = 2.0V 1.5 1 VGS = 1.5V 0.5 VGS = 1.0V 0 0 0.5 1 1.5 2 2.5 VDS (V) NMOS transistor, 0.25um, Ld = 0.25um, W/L = 1.5, VDD = 2.5V, VT = 0.4V Review: Short Channel I-V Plot (PMOS)  All polarities of all voltages and currents are reversed -2 VDS (V) -1 0 0 VGS = -1.0V -0.2 VGS = -1.5V -0.4 -0.6 VGS = -2.0V -0.8 VGS = -2.5V -1 X 10-4 PMOS transistor, 0.25um, Ld = 0.25um, W/L = 1.5, VDD = 2.5V, VT = -0.4V Transforming PMOS I-V Lines  Want common coordinate set Vin, Vout, and IDn IDn IDSp = -IDSn VGSn = Vin ; VGSp = Vin - VDD VDSn = Vout ; VDSp = Vout - VDD Vout Vin = 0 Vin = 0 Vin = 1.5 Vin = 1.5 VGSp = -1 VGSp = -2.5 Mirror around x-axis Vin = VDD + VGSp IDn = -IDp Horiz. shift over VDD Vout = VDD + VDSp CMOS Inverter Load Lines PMOS 2.5 NMOS X 10-4 Vin = 0V Vin = 2.5V 2 Vin = 0.5V Vin = 2.0V 1.5 Vin = 1.0V 1 Vin = 2V 0.5 Vin = 1V Vin = 1.5V Vin = 1.5V Vin = 0.5V Vin = 1.5V Vin = 1.0V Vin = 2.0V Vin = 0.5V 0 Vin = 2.5V 0 0.5 1 1.5 Vout (V) 2 2.5 Vin = 0V 0.25um, W/Ln = 1.5, W/Lp = 4.5, VDD = 2.5V, VTn = 0.4V, VTp = -0.4V CMOS Inverter VTC NMOS off PMOS res 2.5 NMOS sat PMOS res Vout (V) 2 1.5 NMOS sat PMOS sat 1 NMOS res PMOS sat 0.5 NMOS res PMOS off 0 0 0.5 1 1.5 Vin (V) 2 2.5 CMOS Inverter: Switch Model of Dynamic Behavior VDD VDD Rp Vout Vout CL Vin = 0 CL Rn Vin = V DD  Gate response time is determined by the time to charge CL through Rp (discharge CL through Rn) Simulated Inverter VM VM is relatively insensitive to variations in device ratio  1.5 1.4 1.3 setting the ratio to 3, 2.5 and 2 gives VM’s of 1.22V, 1.18V, and 1.13V  1.2 1.1 Increasing the width of the PMOS moves VM towards VDD  1 0.9 0.8 0.1 1 (W/L)p/(W/L)n Note: x-axis is semilog ~3.4 10 Increasing the width of the NMOS moves VM toward GND  Noise Margins Determining VIH and VIL By definition, VIH and VIL are where dVout/dVin = -1 (= gain) 3 VOH = VDD NMH = VDD - VIH NML = VIL - GND 2 VM Approximating: VIH = VM - VM /g VIL = VM + (VDD - VM )/g 1 VOL = GND0 VIL Vin VIH A piece-wise linear approximation of VTC So high gain in the transition region is very desirable CMOS Inverter VTC from Simulation 0.25um, (W/L)p/(W/L)n = 3.4 (W/L)n = 1.5 (min size) VDD = 2.5V 2.5 Vout (V) 2 VM  1.25V, g = -27.5 1.5 VIL = 1.2V, VIH = 1.3V NML = NMH = 1.2 (actual values are VIL = 1.03V, VIH = 1.45V NML = 1.03V & NMH = 1.05V) 1 0.5 0 0 0.5 1 Vin (V) 1.5 2 2.5 Output resistance low-output = 2.4k high-output = 3.3k Combinatorial vs. Sequential  Combinatorial Circuit      Only depend on the input No memory effect Operational/Computational structures Time does not matter Sequential Circuit    Memory based Time matters Used to keep the state of the circuit CMOS Circuit Styles  Static complementary CMOS - except during switching, output connected to either VDD or GND via a lowresistance path  high noise margins - full rail to rail swing - VOH and VOL are at VDD and GND, respectively      low output impedance, high input impedance no steady state path between VDD and GND (no static power consumption) delay a function of load capacitance and transistor resistance comparable rise and fall times (under the appropriate transistor sizing conditions) Dynamic CMOS - relies on temporary storage of signal values on the capacitance of high-impedance circuit nodes   simpler, faster gates increased sensitivity to noise Static Complementary CMOS  Pull-up network (PUN) and pull-down network (PDN) VDD PMOS transistors only In1 In2 PUN InN In1 In2 InN pull-up: make a connection from VDD to F when F(In1,In2,…InN) = 1 F(In1,In2,…InN) PDN pull-down: make a connection from F to GND when F(In1,In2,…InN) = 0 NMOS transistors only PUN and PDN are dual logic networks  One and ONLY one of the networks conduct @ DC Threshold Drops VDD PUN VDD S D VDD D 0  VDD VGS S CL VDD  0 PDN D VDD S CL 0  VDD - VTn CL VGS VDD  |VTp| S D CL Construction of PDN  NMOS devices in series implement a NAND function A•B A B  On NMOS Creates a zero NMOS devices in parallel implement a NOR function A+B A B Dual PUN and PDN  PUN and PDN are dual networks   DeMorgan’s theorems A+B=A•B [!(A + B) = !A • !B or !(A | B) = !A & !B] A•B=A+B [!(A • B) = !A + !B or !(A & B) = !A | !B] a parallel connection of transistors in the PUN corresponds to a series connection of the PDN  Complementary gate is naturally inverting (NAND, NOR, AOI, OAI)  Number of transistors for an N-input logic gate is 2N CMOS NAND A A B F 0 0 1 0 1 1 1 0 1 1 1 0 B A•B A B A B CMOS NOR B A A+B A B A B A B F 0 0 1 0 1 0 1 0 0 1 1 0 Complex CMOS Gate B A C D OUT = !(D + A • (B + C)) A D B C Standard Cell Layout Methodology Routing channel VDD signals GND What logic function is this? Cell Design   Standard Cells  General purpose logic  Can be synthesized  Same height, varying width Datapath Cells  For regular, structured designs (arithmetic)  Includes some wiring in the cell  Fixed height and width Standard Cell Layout Methodology – 1980s Routing channel VDD signals GND Standard Cell Layout Methodology – 1990s Mirrored Cell No Routing channels VDD VDD M2 M3 GND Mirrored Cell GND Standard Cells N Well VDD Cell height 12 metal tracks Metal track is approx. 3 + 3 Pitch = repetitive distance between objects Cell height is “12 pitch” 2 Cell boundary In Out GND Rails ~10 Standard Cells With minimal diffusion routing VDD VDD VDD M2 In Out In Out In Out M1 GND GND Standard Cells Proprietary cells, Artizan, Virage, etc. VDD 2-input NAND gate VDD B A B Out A GND Multi-Fingered Transistors One finger Two fingers (folded) Less diffusion capacitance XNOR/XOR Implementation XNOR XOR A A AB B A B B AB A B AB  How many transistors in each?  Can you create the stick transistor layout for the lower left circuit? AB VTC is Data-Dependent 0.5/0.25 NMOS 0.75 /0.25 PMOS 3 A M3 B M4 2 A,B: 0 -> 1 B=1, A:0 -> 1 A=1, B:0->1 F= A • B D A S D B M1 VGS1 = VB 1 M2 VGS2 = VA –VDS1 S weaker PUN Cint 0 0  1 2 The threshold voltage of M2 is higher than M1 due to the body effect () VTn1 = VTn0 VTn2 = VTn0 + ((|2F| + Vint) - |2F|) since VSB of M2 is not zero (when VB = 0) due to the presence of Cint Static CMOS Full Adder Circuit !Cout = !Cin & (!A | !B) | (!A & !B) !Sum = Cout & (!A | !B | !Cin) | (!A & !B & !Cin) B A B B A Cin A B Cin Cin !Cout !Sum A A B B A Cin A B Cin A B Cout = Cin & (A | B) | (A & B) Sum = !Cout & (A | B | Cin) | (A & B & Cin) Switch Delay Model Req A A Rp A Rp Rp B Rn Rp CL Cint A Rn A Cint A NAND2 Rp A B Rn B INV CL Rn Rn A B CL NOR2 Input Pattern Effects on Delay  Delay is dependent on the pattern of inputs  Low to high transition  Rp A Rp both inputs go low - delay is 0.69 Rp/2 CL B  one input goes low - delay is 0.69 Rp CL Rn CL B Rn A  High to low transition  both inputs go high - delay is 0.69 2Rn CL Cint Delay Dependence on Input Patterns 3 A=B=10 2.5 Voltage [V] 2 A=1 0, B=1 1.5 A=1, B=10 1 Input Data Delay Pattern (psec) A=B=01 67 A=1, B=01 64 A= 01, B=1 61 A=B=10 45 A=1, B=10 80 A= 10, B=1 81 0.5 0 -0.5 0 100 200 time [ps] 300 400 NMOS = 0.5m/0.25 m PMOS = 0.75m/0.25 m CL = 100 fF Transistor Sizing Rp 2 A Rp B Rn 2 B 2 Rn A Rp 4 B 2 CL Cint Rp 4 Cint A 1 Rn Rn A B CL 1 Transistor Sizing a Complex CMOS Gate A B 8 6 C 8 6 4 3 D 4 6 OUT = D + A • (B + C) A D 2 1 B 2C 2 Fan-In Considerations A B C D A CL B C3 C C2 D C1 Distributed RC model (Elmore delay) tpHL = 0.69 Reqn(C1+2C2+3C3+4CL) Propagation delay deteriorates rapidly as a function of fan-in – quadratically in the worst case. tp as a Function of Fan-In 1250 quadratic tp (psec) 1000 Gates with a fan-in greater than 4 should be avoided. 750 tpHL 500 250 tp tpL linear H 0 2 4 6 8 fan-in 10 12 14 16 tp as a Function of Fan-Out tpNOR2 tpNAND2 tpINV tp (psec) 2 All gates have the same drive current. Slope is a function of “driving strength” 4 6 8 10 eff. fan-out 12 14 16 tp as a Function of Fan-In and Fan-Out  Fan-in: quadratic due to increasing resistance and capacitance  Fan-out: each additional fan-out gate adds two gate capacitances to CL tp = a1FI + a2FI2 + a3FO RC Tree Definitions  RC tree characteristics  A unique resistive path exists between the source node and any node of the network s r1 r2 1 c1 - Single input (source) node, s - All capacitors are between a node and GND c2 r3 4 r4 3 c3 - No resistive loops  2 c4 ri i Path resistance (sum of the resistances on the path from the input node to node i) ci i rii =  rj  (rj  [path(s  i)] j=1  Shared path resistance (resistance shared along the paths from the input node to nodes i and k) N rik =  rj  (rj  [path(s  i)  path(s  k)]) j=1  A typical wire is a chain network with (simplified) Elmore N delay of DN =  cirii i=1 Chain Network Elmore Delay D1=c1r1 r1 1 Vin c1 r2 D2=c1r1 + c2(r1+r2) 2 c2 ri-1 i-1 ci-1 ri rN i ci N cN Di=c1r1+ c2(r1+r2)+…+ci(r1+r2+…+ri) N Elmore delay equation i DN =  cirii =  ci  rj Di=c1req+ 2c2req+ 3c3req+…+ icireq VN Elmore delay in NAND4 Fast Complex Gates: Design Technique 1  Transistor sizing   as long as fan-out capacitance dominates Progressive sizing InN CL MN In3 M3 C3 In2 M2 C2 In1 M1 C1 Distributed RC line M1 > M2 > M3 > … > MN (the fet closest to the output is the smallest) Can reduce delay by more than 20%; decreasing gains as technology shrinks Fast Complex Gates: Design Technique 2  Transistor ordering critical path In3 1 M3 charged CL In2 1 M2 C2 charged In1 M1 01 C1 charged delay determined by time to discharge CL, C1 and C2 critical path 01 In1 M3 CLcharged In2 1 M2 C2 discharged In3 1 M1 C1 discharged delay determined by time to discharge CL Sizing and Ordering Effects A 3 B 3 C 3 D A 44 B 45 C 46 C2 D 47 C1 3 CL= 100 fF C3 Progressive sizing in pull-down chain gives up to a 23% improvement. Input ordering saves 5% critical path A – 23% critical path D – 17% Fast Complex Gates: Design Technique 3  Alternative logic structures F = ABCDEFGH Fast Complex Gates: Design Technique 4  Isolating fan-in from fan-out using buffer insertion CL CL Symmetric Gates  Inputs can be made perfectly symmetric 2 2 A 1 1 B 1 1 Y More Combinatorial Circuits  There are two other important combinatorial circuit families    Transmission Gates Logic Dynamic Logic Both of these families are useful in the design of high performance circuits NMOS Transistors in Series/Parallel  Primary inputs drive both gate and source/drain terminals  NMOS switch closes when the gate input is high A B X Y X = Y if A and B A X  B X = Y if A or B Y Remember - NMOS transistors pass a strong 0 but a weak 1 PMOS Transistors in Series/Parallel  Primary inputs drive both gate and source/drain terminals  PMOS switch closes when the gate input is low A B X Y X = Y if A and B = A + B A X  B X = Y if A or B = A  B Y Remember - PMOS transistors pass a strong 1 but a weak 0 Pass Transistor (PT) Logic B B A 0 B F =AB A B F =AB 0 Gate is static – a low-impedance path exists to both supply rails under all circumstances  N transistors instead of 2N  No static power consumption  Ratioless  Bidirectional (versus undirectional)  VTC of PT AND Gate B 1.5/0.25 2 A Vout, V 0.5/0.25 B=VDD, A=0VDD 1 0.5/0.25 B 0 0.5/0.25  A=VDD, B=0VDD A=B=0VDD F= AB 0 0 1 2 Pure PT logic is not regenerative - the signal gradually degrades after passing through a number of PTs (can fix with static CMOS inverter insertion) Differential PT Logic (CPL) B A A B B PT Network A A B B Inverse PT Network B B F F=AB B B F=AB B AND/NAND B A F=A+B B A A F B A A F F A F=AB A F=A+B B F=AB A OR/NOR XOR/XNOR CPL Properties  Differential so complementary data inputs and outputs are always available (so don’t need extra inverters)  Still static, since the output defining nodes are always tied to VDD or GND through a low resistance path  Design is modular; all gates use the same topology, only the inputs are permuted.  Simple XOR makes it attractive for structures like adders  Fast (assuming number of transistors in series is small)  Additional routing overhead for complementary signals  Still have static power dissipation problems Solution 3: Transmission Gates (TGs)  Most widely used solution C C A A B B C C C = GND A = VDD B C = VDD  C = GND A = GND B C = VDD Full swing bidirectional switch controlled by the gate signal C, A = B if C = 1 Resistance of TG W/Lp=0.50/0.25 30 0V 25 Rn Rp Resistance, k 20 2.5V Vout Rp Rn 15 2.5V 10 Req W/Ln=0.50/0.25 5 0 0 1 2 TG Multiplexer S S S S F S VDD In2 S F In1 S F = !(In1  S + In2  S) GND In1 In2 Dynamic CMOS  In static circuits at every point in time (except when switching) the output is connected to either GND or VDD via a low resistance path.   fan-in of N requires 2N devices Dynamic circuits rely on the temporary storage of signal values on the capacitance of high impedance nodes.   requires only N + 2 transistors takes a sequence of precharge and conditional evaluation phases to realize logic functions Dynamic Gate CLK CLK Mp off Mp on Out In1 In2 In3 CLK CL PDN 1 Out !((A&B)|C) A C B Me CLK Two phase operation Precharge (CLK = 0) Evaluate (CLK = 1) off Me on Conditions on Output  Once the output of a dynamic gate is discharged, it cannot be charged again until the next precharge operation.  Inputs to the gate can make at most one transition during evaluation.  Output can be in the high impedance state during and after evaluation (PDN off), state is stored on CL Properties of Dynamic Gates  Logic function is implemented by the PDN only   number of transistors is N + 2 (versus 2N for static complementary CMOS) should be smaller in area than static complementary CMOS  Full swing outputs (VOL = GND and VOH = VDD)  Nonratioed - sizing of the devices is not important for proper functioning (only for performance)  Faster switching speeds     reduced load capacitance due to lower number of transistors per gate (Cint) so a reduced logical effort reduced load capacitance due to smaller fan-out (Cext) no Isc, so all the current provided by PDN goes into discharging CL Ignoring the influence of precharge time on the switching speed of the gate, tpLH = 0 but the presence of the evaluation transistor slows down the tpHL Properties of Dynamic Gates, con’t  Power dissipation should be better      But power dissipation can be significantly higher due to  higher transition probabilities  extra load on CLK PDN starts to work as soon as the input signals exceed VTn, so set VM, VIH and VIL all equal to VTn   consumes only dynamic power – no short circuit power consumption since the pull-up path is not on when evaluating lower CL- both Cint (since there are fewer transistors connected to the drain output) and Cext (since there the output load is one per connected gate, not two) by construction can have at most one transition per cycle – no glitching low noise margin (NML) Needs a precharge clock Dynamic Behavior CLK 2.5 Out Evaluate In1 1.5 In2 In3 In & CLK 0.5 In4 CLK Out Precharge -0.5 0 0.5 #Trns VOH VOL VM NMH 6 2.5V 0V VTn 2.5-VTn Time, ns NML VTn tpHL 1 tpLH tp 110ps 0ns 83ps Gate Parameters are Time Independent The amount by which the output voltage drops is a strong function of the input voltage and the available evaluation time.  Noise needed to corrupt the signal has to be larger if the evaluation time is short – i.e., the switching threshold is truly time independent. CLK 2.5 Voltage (V)  Vout (VG=0.45) 1.5 Vout (VG=0.55) 0.5 Vout (VG=0.5) VG -0.5 0 20 40 60 Time (ns) 80 100 Issues in Dynamic Design 1: Charge Leakage CLK 4 CLK 3 Mp Out 1 CL A=0 2 CLK Evaluate VOut Me Precharge Leakage sources Minimum clock rate of a few kHz Impact of Charge Leakage Output settles to an intermediate voltage determined by a resistive divider of the pull-up and pull-down networks  Once the output drops below the switching threshold of the fan-out logic gate, the output is interpreted as a low voltage. CLK 2.5 Voltage (V)  Out 1.5 0.5 -0.5 0 20 Time (ms) 40 A Solution to Charge Leakage  Keeper compensates for the charge lost due to the pulldown leakage paths. Keeper CLK Mp Mkp !Out A CL B CLK Me Same approach as level restorer for pass transistor logic Issues in Dynamic Design 2: Charge Sharing CLK Mp Out A CL B=0 CLK Ca Me Charge stored originally on CL is redistributed (shared) over CL and CA leading to static power consumption by downstream gates and possible circuit malfunction. Cb When Vout = - VDD (Ca / (Ca + CL )) the drop in Vout is large enough to be below the switching threshold of the gate it drives causing a malfunction. Solution to Charge Redistribution CLK Mp Mkp CLK Out A B CLK Me Precharge internal nodes using a clockdriven transistor (at the cost of increased area and power) Issues in Dynamic Design 3: Backgate Coupling  Susceptible to crosstalk due to 1) high impedance of the output node and 2) capacitive coupling  Out2 capacitively couples with Out1 through the gate-source and gate-drain capacitances of M4 CLK Mp A=0 M1 B=0 M2 CLK Out1 =1 CL1 M6 M5 Out2 =0 M4 CL2 M3 Me Dynamic NAND Static NAND In Issues in Dynamic Design 4: Clock Feedthrough  A special case of capacitive coupling between the clock input of the precharge transistor and the dynamic output node CLK Mp A CL B CLK Out Me Coupling between Out and CLK input of the precharge device due to the gatedrain capacitance. So voltage of Out can rise above VDD. The fast rising (and falling edges) of the clock couple to Out. Clock Feedthrough CLK Clock feedthrough Out In1 2.5 In2 1.5 In3 In4 In & CLK 0.5 Out CLK -0.5 0 0.5 Time, ns 1 Clock feedthrough Cascading Dynamic Gates V CLK Mp CLK CLK Mp Out1 Out2 In In CLK Me CLK Out1 VTn Me V Out2 t Only a single 0  1 transition allowed at the inputs during the evaluation period! Domino Logic CLK In1 In2 In3 CLK Mp 11 10 PDN Me Out1 CLK Mp Mkp 00 01 In4 In5 CLK PDN Me Out2 Why Domino? CLK In1 Ini PDN Inj CLK Ini Inj PDN Ini Inj PDN Like falling dominos! Ini Inj PDN

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download CSE 477. VLSI Systems Design