CMOS Design Methodologies

The Design Problem
• A growing gap between design complexity and design productivity
Source: sematech97

Design Methodology
• Design process traverses iteratively between three abstractions: behavior, structure, and geometry
• More and more automation for each of these steps

Design Analysis and Verification
• Accounts for largest fraction of design time
• More efficient when done at higher levels of abstraction - selection of correct analysis level can save multiple orders of magnitude in verification time
• Two major approaches:
  – Simulation
  – Verification

Digital Data Treated as Analog Signal
[Figure: inverter schematic (VDD, Vin, Vout) and transient response; Vout (V) versus t (nsec), with tpHL marked]

Circuit Simulation
• Both time and data treated as analog quantities
• Also complicated by presence of non-linear elements (relaxed in timing simulation)

Circuit versus Switch-Level Simulation
[Figure: waveforms of CIN, OUT[2], and OUT[3] from circuit and switch-level simulation; voltage versus time (nsec)]

Design analysis and simulation
• SPICE - exact but time consuming
• discrete time steps
• circuit models
• timing simulation with partitioning and relaxation method

Gate level simulation
• faster than switch level
• functional simulation
• VHDL description used

Structural Description of Accumulator

    entity accumulator is
      port ( -- definition of input and output terminals
        DI  : in bit_vector(15 downto 0);    -- a 16-bit-wide vector
        DO  : inout bit_vector(15 downto 0);
        CLK : in bit
      );
    end accumulator;

    architecture structure of accumulator is
      component reg -- definition of register ports
        port (
          DI  : in bit_vector(15 downto 0);
          DO  : out bit_vector(15 downto 0);
          CLK : in bit
        );
      end component;
      component add -- definition of adder ports
        port (
          IN0  : in bit_vector(15 downto 0);
          IN1  : in bit_vector(15 downto 0);
          OUT0 : out bit_vector(15 downto 0)
        );
      end component;
      -- definition of accumulator structure
      signal X : bit_vector(15 downto 0);
    begin
      add1 : add port map (DI, DO, X); -- defines port connectivity
      reg1 :
        reg port map (X, DO, CLK);
    end structure;

• Design defined as composition of register and full-adder cells (“netlist”)
• Data represented as {0,1,Z}
• Time discretized and progresses with unit steps
• Description language: VHDL
• Other options: schematics, Verilog

Behavioral Description of Accumulator

    entity accumulator is
      port (
        DI  : in integer;
        DO  : inout integer := 0;
        CLK : in bit
      );
    end accumulator;

    architecture behavior of accumulator is
    begin
      process(CLK)
        variable X : integer := 0; -- intermediate variable
      begin
        if CLK = '1' then
          X := DO + DI;  -- X is a variable, so it takes ":=" rather than "<="
          DO <= X;
        end if;
      end process;
    end behavior;

• Design described as set of input-output relations, regardless of chosen implementation
• Data described at higher abstraction level (“integer”)

Behavioral simulation of accumulator
• Discrete time
• Integer data
(Synopsys Waves display tool)

Design verification
Electrical verification
• checking number of inversions between two C2MOS gates
• checking pull-up and pull-down ratio in pseudo-NMOS gates
• checking minimum driver size to maintain rise and fall times
• checking charge sharing to satisfy noise margins

Design verification
Timing verification
• SPICE simulation time is too long
• RC delay estimated using Penfield-Rubinstein-Horowitz method
• identification of critical path (avoid false paths)

Timing Verification
[Figure: critical path]
• Enumerates and rank orders critical timing paths
• No simulation needed!
(Synopsys-Epic Pathmill)

Design verification
Formal verification
• components described behaviorally
• circuit model obtained from component models
• resulting circuit behavior compared with design specifications
• no generally acceptable verifier exists

Implementation approaches
Custom circuit design
• labor intensive
• high time-to-market
• cost amortized over a large volume
• reuse as a library cell
• was popular in early designs
• layout editor, DRC, circuit extraction

Layout editor
1. Polygon based (Magic)
2. Symbolic layout
   • transistor symbols
   • relative positioning
   • compaction
   • stick diagram description
   • design rules automatically satisfied
   • automatic pitch matching

Custom Design – Layout Editor
[Figure: Magic Layout Editor (UC Berkeley)]

Symbolic Layout
[Figure: stick diagram of inverter (VDD, GND, In, Out)]
• Dimensionless layout entities
• Only topology is important
• Final layout generated by “compaction” program

Design rule checking
• on-line DRC - rules checked and errors flagged during layout
• batch DRC - post design verification

Circuit extraction
• circuit schematic derived from layout
• transistors are built with proper geometry
• parasitic capacitances and resistances evaluated
• extraction of inductance requires 3D analysis

Cell-based design
• reduced cost
• reduced time
• reduced integration density
• reduced performance

Cell-based design
• standard cell
• compiled cells
• module generators
• macrocell place and route

Standard cell
• library contains basic logic cells
  - inverter, AND/NAND, OR/NOR, XOR/XNOR, flip-flop
  - AOI, MUX, adder, comparator, counter, decoder, encoder
• fan-in and fan-out specified
• schematic uses cells from library
• layout automatically generated

Standard cell
• cells have equal heights
• cell rows separated by routing channels
[Figure: standard cell design layout]
[Figure: standard cell and description]

Standard cell
• large design cost amortized over a large number of designs
• large number of different cells with different fan-ins
• large fan-out for cells to be used in different designs
• synthesis tools made standard cell design popular
• standard cell designs outperform PLAs in area and speed
• standard cells benefit from multilevel logic synthesis

Compiled cell
• cell layout generated on the fly
• transistor or gate level netlist used with transistor size specified
• layout densities approach that of human designers

Compiled cell
[Figures: circuit schematics with transistor sizing; generated layout; automatic pitch matching]

Module generators
• logic level cells not efficient for
  subcircuit design - shifters, adders, multipliers, data paths, PLAs, counters, memories
• macrocell generators - use design parameters like number of bits
• data path compilers
  - use bit slice modules and repeat them N times
  - generate interconnections between modules

Datapath compilers
• Feedthroughs used to improve routing

Datapath compilers
[Figure: datapath compiler results]

Macrocell place and route
• channel routing
  - metal 2 horizontal segments
  - metal 1 vertical segments
• over the block routing (3-6 metal layers used)

Array-based design implementation
To avoid the slow fabrication process, which takes 3-4 weeks:
• mask programmable arrays
• fuse based FPGAs
• nonvolatile FPGAs
• RAM based FPGAs

Mask programmable arrays
• gate-array - similar to standard cell
• sea-of-gate - routed over the cells (high density)
  - wires added to make logic gates
• challenge in design is to utilize the maximum cell capacity
• utilization < 75% for random logic design

Macrocell Design Methodology
• Floorplan: defines overall topology of design, relative placement of modules, and global routes of busses, supplies, and clocks
[Figure: macrocell, interconnect bus, routing channel]

Macrocell-Based Design Example
[Figure: video-encoder chip with SRAM blocks, data paths, and standard cells [Brodersen92]]

Gate Array: Sea-of-gates
[Figure: rows of uncommitted cells with routing channel; uncommitted cell (polysilicon, metal, possible contact, VDD, GND); committed cell (4-input NOR) with In1, In2, In3, In4, and Out]

Sea-of-gate Primitive Cells
[Figure: PMOS and NMOS rows using oxide-isolation and using gate-isolation]

Sea-of-gates
[Figure: random logic and memory subsystem; LSI Logic LEA300K (0.6 µm CMOS)]

Prewired Arrays
Categories of prewired arrays (or field-programmable devices):
• Fuse-based (program-once)
• Non-volatile EPROM based
• RAM based

Programmable Logic Devices
• PLA
• PROM
• PAL

Fuse-based FPGA’s
• Actel sea-of-gate and standard cell approach

Fuse-based FPGA’s
Example: XOR gate obtained by setting A=1, B=0, C=0, D=1,
SA=SB=In1, S0=S1=In2

Fuse-based FPGA’s
• Anti-fuse provides a short (low resistance) when blown

Nonvolatile FPGA’s
• programming similar to PROM
• erasable programmable logic devices - EPLD
• electrically erasable - EEPLD
• design partitioned into macrocells
• flip-flops used to make sequential circuits
• software used to program interconnections to optimize use of hardware
• input specified from schematics, truth tables, state graphs, VHDL code

EPLD Block Diagram
[Figure: primary inputs and macrocell; courtesy Altera Corp.]

RAM based (volatile) FPGA’s
• programming is fast and can be repeated many times
• no high voltage needed
• integration density is high
• information lost when the power goes off

XILINX FPGA’s
• configurable logic blocks (CLBs) used
• five input two output combinational blocks
• two D flip flops are edge or level triggered
• functionality and multiplexers controlled by RAM
• RAM can be used as look-up table or a register file

XILINX FPGA’s
• each cell connected to 4 neighbors
• routing channels provide local or global connections
• switching matrices (RAM controlled) are used for switching between channels

XILINX FPGA’s (XC4025)
• 32 × 32 CLBs
• 25000 gates
• 422 kbits of RAM
• operates at 250 MHz
• 32 kbit adder uses 62 CLBs

Design synthesis
Circuit synthesis
• derivation of the transistor schematics from logic functions
  - complementary CMOS
  - pass transistor
  - dynamic
  - DCVSL (differential cascode voltage switch logic)
• transistor sizing
  - performance modeling using RC equivalent circuits
  - layout generation
• synthesis not popular due to designers’ reluctance

Logic synthesis
• state transition diagrams, FSM, schematics, Boolean equations, truth tables, and HDL used
• synthesis
  - combinational or sequential
  - multilevel, PLA, or FPGA
• logic optimization for
  - area, speed, power
  - technology mapping

Logic optimization
• Espresso - two level minimization tool (UCB)
• state minimization and state encoding
• MIS - multilevel logic synthesis (UCB)
Example:
  S  = (A ⊕ B) ⊕ Ci
  Co = AB + ACi + BCi

Logic optimization
[Figure: multilevel implementation of adder generated by MIS II; cell library from University of Mississippi]

Architecture synthesis
• behavioral or high level synthesis
• optimizing translation, e.g. pipelining
• Cathedral and HYPER tools
• HYPER tutorial and synthesis example: http://infopad.eecs.berkeley.edu/~hyper

Architecture synthesis
[Figure: architecture synthesis example]

Vertical and Orthogonal CMOS: COSMOS
Savas Kaya
– Stack two MOSFETs under a common gate
– Improve only hole mobility by using strained SiGe channel
  • pMOS transconductance equal to nMOS
– Reduce parasitics due to wiring and isolating the sub-nets
[Figure: conventional CMOS versus COSMOS (Complementary Orthogonal Stacked MOS)]

Technology Base
• Strained Si/SiGe layers
  – Built-in strain traps more carriers and increases mobility
    • Equal and high electron and hole mobilities (Jung et al., p.460, EDL’03)
• SOI (silicon-on-insulator) substrates
  – Active areas on buried oxide (BOX) layer
  – Reduces unwanted DC leakage and AC parasitics
Mizuno et al., p.988, TED’03
Cheng et al., p.L48, SST’04

COSMOS Structure
• Single common gate: mid-gap metal or poly-SiGe
• Ultra-thin channels: 2-6 nm to control threshold/leakage
  – Strained Si1-xGex for holes (x0.3)
  – Strained or relaxed Si for electrons
• Substrate: SOI

COSMOS Structure - 3D View I
• Single gate stack: mid-gap metal or poly-SiGe
  – Must be engineered for a symmetric threshold
[Figure: 3D view, dimensions in units of µm]

COSMOS Structure - 3D View II
• Conventional self-aligned contacts
  – Doped S/D contacts: p- (blue) or n- (red) type
• Inter-dependence between gate dimensions (W and L of the nMOS and pMOS)

COSMOS Gate Control
• A single gate to control both channels
  – High-mobility strained Si1-xGex (x0.3) buried hole channel
    • High Ge% eliminates parallel conduction and improves mobility
    • Lowers the threshold voltage VT
  – Electrons are in a surface channel
  – Requires fine tuning for symmetric operation

3D Characteristics: 40nm Device
• Symmetric operation
  – No QM corrections
• Lower VT
  – Features in sub-threshold operation
    • Related to p-i-n parasitic diode included in 3D

3D COSMOS Inverter
• No additional processing
  – Just isolate COSMOS layers and establish proper contacts
  – Significantly shorter output metallization
[Figure: top view and peel-off top views]

3D TCAD Verification
• Inverter operation verified in 3D
[Figure: 40nm COSMOS NOT gate driving CL=1fF load]

Applications
• Low power static CMOS:
  – Should outperform conventional devices in terms of speed
• Multiple input circuit example: NOR gate
• Area tight designs:
  – FPGA, sensing/testing, µpower etc. ?
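
The NOR gate named above as the multiple-input example can be sketched at the switch level, the abstraction used earlier in the deck for fast functional checks. The model below is only an illustration under stated assumptions: the function name cmos_nor is hypothetical, the gate is treated as an ideal complementary static CMOS structure (series pMOS pull-up, parallel nMOS pull-down), and nothing in it models COSMOS device physics, delay, or the 1 fF load.

```python
from itertools import product


def cmos_nor(*inputs):
    """Switch-level model of a static CMOS NOR gate (hypothetical helper).

    Each input is 0 or 1; any other value is treated as not driving a switch.
    """
    # Pull-up network: pMOS devices in series conduct only if every input is 0.
    pull_up = all(bit == 0 for bit in inputs)
    # Pull-down network: nMOS devices in parallel conduct if any input is 1.
    pull_down = any(bit == 1 for bit in inputs)
    if pull_up and not pull_down:
        return 1  # output tied to VDD
    if pull_down and not pull_up:
        return 0  # output tied to GND
    # Neither network conducts (floating) or both do (contention);
    # neither case is reachable with pure 0/1 inputs.
    return "Z" if not pull_up else "X"


if __name__ == "__main__":
    # Print the truth table of a two-input NOR.
    for a, b in product((0, 1), repeat=2):
        print(a, b, cmos_nor(a, b))
```

Calling cmos_nor with a single argument gives the NOT gate of the TCAD slide, and the "Z" result mirrors the {0,1,Z} data representation used in the structural simulation slides.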