ECE260B – CSE241A Winter 2005
Introduction and ASIC Flow
Instructor: Bao Liu
Website: http://vlsicad.ucsd.edu/courses/ece260b-w05
ECE 260B – CSE 241A Intro and ASIC Flow
Slides courtesy of Prof. Andrew B. Kahng, http://vlsicad.ucsd.edu

Why not a Silicon Compiler?
- Ideal: silicon compiler
  - Simple
  - No human interaction
- Reality: design methodology
  - Complex
  - Lots of human interaction
- Stages shown: Spec/Matlab/VHDL, synthesis, verification, ?, placement, routing, circuit on silicon

Teams in a Design Process
- VLSI designers
- CAD developers
- Testing team
- Process people
[Figure: the flow from Spec/Matlab/VHDL through synthesis, verification, ?, placement, and routing to circuit on silicon, annotated with the teams above]

Class Objectives
- Learn about ASIC implementation flow: Verilog to GDSII
  - Semi-custom implementation of CMOS digital circuits, and optimization with respect to different constraints: area, speed, power, reliability, cost
  - Understand impact of constraints, tradeoffs, technology scaling
  - Get some feel for each phase of the implementation flow
- Learn about building blocks: wires, gates, memories
- Prepare for future design experiences
  - Get some feel for industry-standard design tools, libraries
  - Will mostly use Cadence BuildGates and SOC Encounter, and Artisan TSMC 0.18/0.13um libraries
  - Synthesize small cores from RTL into GDSII

Outline
- Introduction
  - Technology Evolution
  - Silicon Complexity
  - System Complexity
- Design Flows
  - Traditional
  - State of the Art
    - Design Metrics
    - Design Closure

Technology Evolution: Cost and Integration Drivers
- Moore’s Law is about cost
- Increased integration, decreased cost: more possibilities for semiconductor-based products
[Figure: Pentium 4 die shot, 2.2cm]
Slide courtesy of Mary Jane Irwin, PSU

Sense of Scale (Scaling)
What fits on a VLSI chip today?
- State of the art logic chip
  - 20mm on a side (400mm²)
  - 0.13µm drawn gate length (2λ)
  - 0.5µm wire pitch (8λ)
  - 8-level metal
  - 20mm = 40,000 wire pitches = 320,000λ
- For comparison
  - 32b RISC processor: 8Kλ x 16Kλ
  - SRAM: about 32λ x 32λ per bit; 8Kλ x 16Kλ is 128Kb, 16KB
  - DRAM: 8λ x 16λ per bit; 8Kλ x 16Kλ is 1Mb, 128KB
[Figure: chip area with a 32b RISC processor and a 64b FP processor drawn to scale]
Slide courtesy of Ken Yang, UCLA

MOS Transistor Scaling (1974 to present)
- S = 0.7 [0.5x per 2 nodes]
- Decreased transistor/feature sizes
- Increased variability (tox, BEOL, DFM, SEU, etc.)
- Short channel effect, leakage power
[Figure: poly pitch (typical MPU/ASIC) and metal pitch (typical DRAM) versus year]
Source: 2001 ITRS - Exec. Summary, ORTC Figure

HP / LOP / LSTP Device Roadmaps

Parameter     Type   99     01     03     05     07     10     13     16
Vdd           MPU    1.5    1.2    1.0    0.9    0.7    0.6    0.5    0.4
              LOP    1.3    1.2    1.1    1.0    0.9    0.8    0.7    0.6
              LSTP   1.3    1.2    1.2    1.2    1.1    1.0    0.9    0.9
Vth (V)       MPU    0.21   0.19   0.13   0.09   0.05   0.021  0.003  0.003
              LOP    0.34   0.34   0.36   0.33   0.29   0.29   0.25   0.22
              LSTP   0.51   0.51   0.53   0.54   0.52   0.49   0.45   0.45
Ion (uA/um)   MPU    1041   926    967    924    1091   1250   1492   1507
              LOP    636    600    600    600    700    700    800    900
              LSTP   300    300    400    400    500    500    600    800
CV/I (ps)     MPU    2.00   1.63   1.16   0.86   0.66   0.39   0.23   0.16
              LOP    3.50   2.55   2.02   1.58   1.14   0.85   0.56   0.35
              LSTP   4.21   4.61   2.96   2.51   1.81   1.43   0.91   0.57
Ioff (uA/um)  MPU    0.00   0.01   0.07   0.30   1.00   3      7      10
              LOP    1e-4   1e-4   1e-4   3e-4   7e-4   1e-3   3e-3   1e-2
              LSTP   1e-6   1e-6   1e-6   1e-6   1e-6   3e-6   7e-6   1e-5

SEMATECH Prototype BEOL stack, 2000
- Reverse-scaled global interconnects
- Growing interconnect complexity
- Performance critical global interconnects
[Figure: cross-section with labels: wire, via, passivation, dielectric, etch stop layer, dielectric capping layer, copper conductor with barrier/nucleation layer, pre-metal dielectric, tungsten contact plug; wiring tiers: global (up to 5), intermediate (up to 4), local (2)]

Intel 130nm BEOL Stack
- Intel 6LM 130nm process with vias shown (connecting layers)
- Aspect ratio = thickness / minimum width

Interconnect Capacitance: Parallel Plate Model
- ILD = interlevel dielectric
- Bottom plate of cap can be another metal layer
- Cint = εox * (W*L / tox)
[Figure: wire of length L, width W, and thickness T at height HILD above the substrate, separated by SiO2]

Line Dimensions and Fringing Capacitance
- Lateral cap
- Capacitive coupling
- Crosstalk effect
- Signal integrity
[Figure: adjacent wires of width w and spacing S]

Interconnect Evolution and Modeling Needs
- Before 1990, wires were thick and wide while devices were big and slow
  - Large wiring capacitances and device resistances
  - Wiring resistance << device resistance
  - Model wires as capacitances only
- In the 1990s, scaling (by scale factor S) led to smaller and faster devices and smaller, more resistive wires
  - Reverse scaling of properties of wires
  - RC models became necessary
- In the 2000s, frequencies are high enough that inductance has become a major component of total impedance

Evolving Interconnects Affect Timing
- Interconnect capacitance > gate input capacitance
  - Better prediction
- Interconnect resistance no longer ignorable
  - Better modeling: distributed R(L)C network, AWE, etc.
  - Effective capacitance < total load capacitance
- Interconnect delay > gate delay for sub-micron technologies

Sub-Wavelength Optical Lithography
- What are implications of this picture?
Slide courtesy of Numerical Technologies, Inc.

…Complexity of Photomasks
- How many wafers, on average, are printed with a mask set?
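
As a rough illustration of the parallel-plate model from the Interconnect Capacitance slide, the Python sketch below evaluates Cint = εox * (W*L / tox) for one wire and pairs it with a simple wire resistance, R = ρ * L / (W*T). The wire dimensions, the bulk copper resistivity, and the function names are assumptions chosen for illustration, not values from the slides. Fringing and lateral coupling, which the following slides flag as important, are ignored here.

```python
# Parallel-plate estimate of wire capacitance, plus wire resistance.
# All dimensions and material constants are illustrative assumptions.

EPS0 = 8.854e-12      # vacuum permittivity, F/m
EPS_R_SIO2 = 3.9      # relative permittivity of SiO2
RHO_CU = 1.7e-8       # bulk copper resistivity, ohm*m (real thin wires are worse)


def parallel_plate_cap(width_m, length_m, dielectric_thickness_m, eps_r=EPS_R_SIO2):
    """Cint = eps_ox * (W * L / t_ox); no fringing or lateral coupling."""
    return eps_r * EPS0 * width_m * length_m / dielectric_thickness_m


def wire_resistance(width_m, length_m, thickness_m, rho=RHO_CU):
    """R = rho * L / (W * T) for a rectangular wire cross-section."""
    return rho * length_m / (width_m * thickness_m)


if __name__ == "__main__":
    w, t, h = 0.25e-6, 0.5e-6, 0.5e-6   # width, thickness, ILD height (assumed)
    for length in (100e-6, 1e-3, 10e-3):
        c = parallel_plate_cap(w, length, h)
        r = wire_resistance(w, length, t)
        # R and C both grow linearly with length, so R*C grows quadratically.
        print(f"L={length * 1e3:6.2f} mm  C={c * 1e15:8.2f} fF  "
              f"R={r:9.1f} ohm  RC={r * c * 1e12:10.3f} ps")
```

Because both R and C scale with length, the RC product grows with the square of the wire length, which is one way to see why long global wires became performance critical.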

Summary of Technology Scaling
- Scaling of 0.7x every three (two?) years

  Node   Year   Metal layers
  .25u   1997   5LM
  .18u   1999   6LM
  .13u   2002   7LM
  .10u   2005   7LM
  .07u   2008   8LM
  .05u   2011   9LM

- Interconnect delay dominates system performance
  - consumes up to 70% of clock cycle
- Cross coupling capacitance is dominating
  - cross capacitance 100%, ground capacitance 0%
  - ground capacitance is 90% in .18u
  - huge signal integrity implications (e.g., guardbands in static analysis approaches)
- Multiple clock cycles required to cross chip
  - whether 3 or 15 not as important as fact of “multiple” > 1

New Materials Implications
- Lower dielectric permittivity
  - reduces total capacitance
  - doesn’t change cross-coupled / grounded capacitance proportions
- Copper metallization
  - reduces RC delay
  - avoids electromigration (factor of 4-5 ?)
  - thinner deposition reduces cross cap
- Multiple layers of routing
  - enabled by planarization; 10% extra cost per layer
  - reverse-scaled top-level interconnects
  - relative routing pitch may increase
  - room for shielding

Technical Issues
- Manufacturability (chip can't be built)
  - antenna rules
  - minimum area rules for stacked vias
  - CMP (chemical mechanical polishing) area fill rules
  - layout corrections for optical proximity effects in subwavelength lithography; associated verification issues
- Signal integrity (failure to meet timing targets)
  - crosstalk induced errors
  - timing dependence on crosstalk
  - IR drop on power supplies
- Reliability (design failures in the field)
  - electromigration on power supplies
  - hot electron effects on devices
  - wire self heat effects on clocks and signals

Noise
- Analog design concerns are due to physical noise sources
  - because of discreteness of electronic charge and stochastic nature of electronic transport processes
  - example: thermal noise, flicker noise, shot noise
- Digital circuits, due to large, abrupt voltage swings, create deterministic noise which is several orders of magnitude higher than stochastic physical noise
  - still, digital circuits are prevalent because they are inherently immune to noise
- Technology scaling and performance demands make noisiness of digital circuits a big problem
Courtesy Hormoz/Muddu, ASIC99

Silicon Complexity Challenges
Silicon Complexity = impact of process scaling, new materials, new device/interconnect architectures
- Non-ideal scaling (leakage, power management, circuit/device innovation, current delivery)
- Coupled high-frequency devices and interconnects (signal integrity analysis and management)
- Manufacturing variability (library characterization, analog and digital circuit performance, error-tolerant design, layout reusability, static performance verification methodology/tools)
- Scaling of global interconnect performance (communication, synchronization)
- Decreased reliability (soft error uncertainty, gate insulator tunneling and breakdown, joule heating and electromigration)
- Complexity of manufacturing handoff (reticle enhancement and mask writing/inspection flow, manufacturing NRE cost)
If you don’t know a term, ask…

In a PDA…
- Reference Design: personal digital assistant (PDA)
- Composed of CPU, DSP, peripheral I/O, and memory

Required Performance for Multi-Media Processing
[Figure: required performance on a GOPS axis from 0.01 to 100 for five application areas.
 Video: JPEG, MPEG1, MPEG2 (MP/ML, MP/HL), MPEG4; compression and extraction.
 Audio/Voice: Dolby-AC3, MPEG, word recognition, sentence translation, voice auto translation.
 Graphics: 2D graphics, 3D graphics (10Mpps, 100Mpps).
 Communication: FAX, modem, VoIP modem, SW defined radio.
 Recognition: voice print recognition, face recognition, moving picture recognition.]
GOPS: Giga Operations Per Second

…Implemented With an SoC
[Figure: SoC block diagram for a multimedia (MM) application.
 Specification: 0.18um / 400MHz / 470mW (typical); MP3, JPEG, simple moving picture; available time 6-10Hr.
 Processor area (max 400MHz, 6.5MTrs.): CPU, I-cache 32KB, D-cache 32KB.
 Peripheral area (4 – 48MHz): PWM, RTC, I2C, USB, OST, KEY, MMC, GPIO, DMA controller, I2S, UART, AC97, FICP, SSP, Sound.
 Data transfer area (100MHz): MEM Cnt., LCD Cnt., SDRAM, Flash, LCD; memory sizes shown: 64MB, 32MB.
 Also shown: CPG, PWR.]
- If the PDA must have 200h standby time with a 120g battery… ?

System Complexity Challenges
System Complexity = exponentially increasing transistor counts, with increased diversity (mixed-signal SOC, …)
- Reuse (hierarchical design support, heterogeneous SOC integration, reuse of verification/test/IP)
- Verification and test (specification capture, design for verifiability, verification reuse, system-level and software verification, AMS self-test, noise-delay fault tests, test reuse)
- Cost-driven design optimization (manufacturing cost modeling and analysis, quality metrics, die-package co-optimization, …)
- Embedded software design (platform-based system design methodologies, software verification/analysis, codesign w/HW)
- Reliable implementation platforms (predictable chip implementation onto multiple fabrics, higher-level handoff)
- Design process management (team size / geog distribution, data mgmt, collaborative design, process improvement)

Outline
- Introduction
  - Technology Evolution
  - Silicon Complexity
  - System Complexity
- Design Flows
  - Traditional
  - State of the Art
    - Design Metrics
    - Design Closure

Levels of VLSI Design in a Traditional Flow
- Specification: what the system (or component) is supposed to do
- Architecture: high-level design of component
  - state defined
  - logic partitioned into major blocks
- Logic: gates, flip-flops, and the connections between them
- Circuit: transistor circuits to realize logic elements
- Device: behavior of individual circuit elements
- Layout: geometry used to define and connect circuit elements
- Process: steps used to define circuit elements
[Figure: flow of Architecture Design, High Level Synthesis, RTL, Synthesis, Verification, Placement, Extraction and Timing Verification, Routing, GDSII, Manufacturing]

Design Principles (Traditional)
- Partition the problem (hierarchical design)
  - Different abstraction levels: RTL, gate-level, switch-level, transistor-level
- Orthogonalize concerns
  - Abstraction vs. implementation
  - Logic vs. timing
- Constrain the design space to simplify the design process
  - Balance between design complexity and performance
  - E.g., standard-cell methodology

VLSI Design Flow Evolution
- Expanding in two directions
  - System-on-Chip (SoC) Design
  - Design for Manufacturability (DFM)
- More design metrics
  - Area
  - Timing
  - Power
  - Signal Integrity
  - Reliability
- Tighter Integration
  - Design closure
  - RTL/GDSII sign-off re-defined
[Figure: flow of Architecture Design, High Level Synthesis, RTL, Synthesis, Verification, Placement, Extraction and Timing Verification, Routing, GDSII, Manufacturing]

Design Procedure and Tools
- Behavior modeling
  - Matlab/C/VHDL
- Logic synthesis
  - DesignCompiler, BuildGates, …
- Verification of synthesis
  - Formal Verification (Verplex)
  - Static timing analysis (PrimeTime)
- Place and route
  - Astro, SOCE, …
- Verification of layout
  - DRC, ERC, LVS (Calibre)
  - Extraction (SignalStorm)
  - Delay Calculation (CeltIC)
  - Simulation (SPICE)
[Figure: flow of Architecture Design, High Level Synthesis, RTL, Synthesis, Verification, Placement, Extraction and Timing Verification, Routing, GDSII, DFM, Manufacturing]

Design Principles (State of the Art)
- Integrate the problem (design closure)
  - Back-annotation, predictability
- Balance design metrics
  - Area/timing/power/signal integrity/reliability
- Explore the design space
  - Balance between design complexity and performance
  - Platform-based SoC design

Design Methodologies (+ business models)
- Full-Custom (high effort, leading-edge performance, high-volume)
- Semi-Custom (strong infrastructure, economical in lower volumes)
  - ASIC (Application-Specific Integrated Circuit)
  - Standard Cell/Gate Array/Via Programmable/Structured ASIC
  - FPGA
- Special
  - Analog (custom layout, I/Os and sense amps)
  - Mixed-Signal / RF (unique to each process, no scaling)
- System-on-Chip (System-in-Package)
  - Various components: IP blocks, ASIC, FPGA, memory, uP, RF, etc.
  - Define implementation platform, hardware-software co-design
  - Performance vs. complexity

Flow
[Figure: library and design flow. Labels: wire model, 3-D RLC modeling tool, layers, layout rules, parasitic extraction library, C-model, standard cell library, device model, schematic entry, layout entry, characterization, cell, synthesis library (timing/power/area), place & route library (ports), structural Verilog model, behavioral model, synthesis, Verilog structural RTL, functional block, block layout P&R, floorplan, global layout, functional/static timing, DRC/ERC/LVS, static/dynamic timing w/extract, power/area, scan/testability, clock routing/analysis]
Slide courtesy of Mary Jane Irwin, PSU

Traditional Taxonomy
Steps shown, grouped in the figure into Front End and Back End:
- Behavioral Level Design
- IO Pad Placement
- Logic Design and Simulation
- Logic Synthesis
- Logic Partitioning
- Die Planning
- Power/Ground Stripes, Rings Routing
- Global Placement
- Detail Placement
- Simulation
- Floorplanning
- Clock Tree Synthesis and Routing
- Design Verification
- Timing Verification
- Global Routing
- Test Generation
- LVS, DRC, ERC
- Extraction and Delay Calc.
- Timing Verification
- Detail Routing

Generic Flow Steps
- Library preparation
  - Library data preparation
  - Design data preparation
- Logic design
  - Specification to RTL
  - RTL simulation
  - Hierarchical floorplanning
  - Synthesis
  - Formal verification
  - Gate level simulation
  - Static timing analysis
- Physical design
  - Physical floorplanning
  - Place and route
  - RC extraction
  - Formal verification
  - Physical verification
  - Release to manufacturing
- Design for test
- Engineering change order

Library and Design Data
Models and technology data required to execute the design flow
- Power, timing: ALF, DCL, OLA, .lib, STAMP
- Layout: LEF, DEF, GDSII
- Delays and path timing, parasitics: SDF, GCF, SDC, DSPF, RSPF, SPEF, SPICE
- Layout rules: Dracula, Calibre “deck”

Architecture Design
- Platform-based SoC Design
  - Platform is a library of design resources
  - Helps design space exploration
  - Meet in the middle
- Hardware-software co-design
[Figure: embedded system platform diagram. Labels: application space, application instance, platform specification, system platform, platform design-space exploration, platform instance, architecture space]
Figure courtesy of Alberto Sangiovanni-Vincentelli, UCB

High-Level Synthesis (Behavior to RTL)
- Compilation: of the input specification language to the internal representation
- Parallelism extraction: usually via data flow analysis techniques
- Scheduling: assignment of each operation to a time slot corresponding to a clock cycle or time interval
- Resource allocation: selection of the types of hardware components and the number for each type to be included in the final implementation
- Module binding: assignment of operation to the allocated hardware components
- Controller synthesis: design of control style and clocking scheme
- …

Architecture Level Floorplanning
- Defines the basic chip layout architecture
  - Define the standard cell rows and I/O placement locations
  - Place RAMs and other macros
  - Separate gate array, memory, analog, RF blocks
  - Define power distribution structures such as rings and stripes
  - Allow space for clock, major buses, etc.
- Rules of thumb for cell density are used to initially calculate design size

Logic Synthesis
- Conversion of RTL to gate-level netlist
- Targeted to a foundry-specific library
- Can be performed hierarchically (block by block)
- Timing-driven
  - Clock information
  - Primary input arrival times, primary output required times
  - Input driving cells, output loading
  - False paths, multi-cycle paths
- Interconnect delay may be calculated based on a “wireload model” which uses fanout to estimate delay
- Clock parameters (insertion delay, skew, jitter, etc.) are assumed to be attainable later in place and route

Formal Verification
- RTL description and gate level netlist are compared to verify functional equivalence, thereby verifying the synthesis results
- Formal methods
  - Graph isomorphism
  - Binary Decision Diagram (BDD)
- Emerging technology that supplements the more traditional gate-level simulation approach
- FV also performed after place-and-route (if gate netlist changes)

RTL Simulation
- RTL code, written in Verilog, VHDL or a combination of both, is simulated to verify functional correctness
- Testbenches apply input stimulus to the design
- Several methods are used to verify the outputs
  - Self-checking testbenches automatically verify output correctness and report mismatches
  - Results can be stored in a file and compared to previous results
  - Waveform displays can be used to interactively verify the outputs

Gate-Level Simulation
- Covers both functionality and timing
- Cell timing is included in the simulation models and interconnect delay is passed from the synthesis run
- Worst case PVT conditions are used to analyze for setup violations, and best case PVT conditions are used to analyze for hold violations
- Correctness is only as good as the test vectors used
- Especially critical for non-synchronous designs, verification of false path and multi-cycle path constraints
PVT = Process, Voltage, Temperature

Static Timing Analysis
- Verifies that design operates at desired frequency
  - Implicitly assumes correct timing constraints (!), e.g., boundary conditions
- Timing constraints are similar to those used by logic synthesis
- As with gate-level simulation, both best- and worst-case analysis is performed
- Typically performed on full-chip (not block) basis
  - Verifies setup and hold times at FF inputs; can also check timing from and to PI’s and PO’s; can also check point-to-point delay values (with blocking of pins, etc.)
  - May require modified constraints for inter-block issues: multiple clock domains, multi-cycle paths, etc.
  - For compatibility with timing-driven layout flow, helps to have simple / single set of constraints
- Other issues: incremental analysis, …

Block-Level Physical Floorplanning
- Reconcile logical and physical hierarchies
  - Cells that are interconnected want to be close together
- Take advantage of RTL hierarchy
  - Generate a physical hierarchy
  - RTL hierarchy = best physical hierarchy?
- Often bundled within the same cockpit as the place and route tool
- Give placement some initial clues to reduce complexity

Place and Route
- Automatically place the standard cells
- Generate clock trees
- Add any remaining power bus connections
- Route clock lines
- Route signal interconnects
- Design rule checks on the routes and cell placements
- Timing driven tools
  - Require timing constraints and analysis algorithms similar to those used during the static timing analysis step

RC(L) Extraction
- Calculate resistance and capacitance (and inductance) of interconnects
  - Based on placement of cells
  - Routing segments
- Calculate capacitive (inductive) effects of adjacent segments
  - Extract capacitance between metal segments
- RC(L) data transferred back to (back annotation)
  - Static timing analysis
  - Gate level simulation
- Replaces wire load model used in synthesis
- Drive delay calculation, signal integrity analysis (crosstalk, other noise), static timing
- Q: How do parasitics and noise affect performance?
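
One common first-order way to turn extracted RC data into a delay estimate is the Elmore delay, which sums, for each resistance on the path, that resistance times all the capacitance downstream of it. The sketch below is a minimal illustration for a single unbranched wire modeled as an RC ladder behind a driver resistance. The function name, the ladder representation, and the example values are assumptions for illustration; the slides don't prescribe a delay model, and production delay calculators use more accurate ones.

```python
def elmore_delay_ladder(r_driver, segments, c_load):
    """Elmore delay at the far end of an unbranched RC ladder.

    segments: list of (R_i, C_i) pairs ordered from driver to sink, where
    C_i is the capacitance at the node on the downstream side of R_i.
    c_load is the sink pin capacitance at the end of the wire.
    """
    # Capacitance the driver resistance has to charge: everything.
    downstream = sum(c for _, c in segments) + c_load
    delay = r_driver * downstream
    for r_seg, c_seg in segments:
        # r_seg charges its own node capacitance and everything after it.
        delay += r_seg * downstream
        downstream -= c_seg
    return delay


if __name__ == "__main__":
    # Assumed example: 1 kohm driver, four 25 ohm / 50 fF segments, 10 fF load.
    wire = [(25.0, 50e-15)] * 4
    d = elmore_delay_ladder(1000.0, wire, 10e-15)
    print(f"Elmore delay: {d * 1e12:.1f} ps")
```

Splitting the same wire into more segments moves this estimate toward the distributed-RC value, which is why extraction typically reports a wire as several RC segments and not as one lumped pair.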

Physical Verification
- DRC – Design Rule Check
  - Spacing, min dimension rules
- LVS – Layout Versus Schematic
  - Verifies that layout and netlist are equivalent at the transistor level
- ERC – Electrical Rule Check
  - Dangling nets, floating nodes
- GDSII (Stream Format)
  - Final merge of layout, routing and placement data for mask production

Release to Manufacturing
- Final edits to the layout are made
- Metal fill and metal stress relief rules are checked
- Manufacturing information such as scribe lanes, seal rings, mask shop data, part numbers, logos and pin 1 identification information for assembly are also added
- Cadence’s Virtuoso is used for custom-manual edits of the mask layers
- DRC and LVS are run to verify the correctness of the modified database
- ‘Tapeout’ documentation is prepared prior to release of the GDSII to the foundry
- Pad location information is prepared, typically in a spreadsheet
- Manufacturing steps
  - generation of masks
  - silicon processing
  - wafer testing
  - assembly and packaging
  - manufacturing test

A More Detailed Design Flow
[Figure: flow stages: Design Specs and Constraints; Synthesis (Lib.+CWLM); Floor-plan & PG; Placement (Lib.+CWLM); Physical re-synth; Clock distribution; Route, scan re-order; Timing analysis, IPO; Fnl., pwr., SI ECO Reqmts.; ERC, DRC, LVS; Tape-out; also labeled: Fnl. Design]
Annotations on the flow:
- Architectural optimization (timing)
- Inter-group buses, bandwidth
- Clock, SI, test; validation
- Floorplanning and custom WLM
- Power distribution (Internal, I/O)
- I/O driver, padring design
- Board-level timing, SI
- Row definitions
- Placement of cells
- Congestion analysis
- Placement-based re-synthesis
- Noise minimization, isolation
- Clock distribution
- Full routing
- Scan stitching, re-ordering
- Full RC back-annotation
- Hierarchical timing, electrical and SI analysis and IPO/ECO
A. Khan, Simplex/Altius

Outline
- Introduction
  - Technology Evolution
  - Silicon Complexity
  - System Complexity
- Design Flows
  - Traditional
  - State of the Art
    - Design Metrics
    - Design Closure

More Design Metrics and Techniques
Design metrics:
- Area
  - Cell area
  - Wirelength
- Timing
  - Gate
  - Interconnect
- Power
  - Dynamic
  - Static
  - Leakage
- Signal Integrity
  - Crosstalk (capacitive, inductive)
  - Supply voltage drop (IR drop, LdI/dt)
- Reliability
  - Variation (Vdd, thermal, process variation (tox, BEOL))
  - Electromigration
  - Hot electron effect (SEU)
Techniques:
- Cost minimization
  - Synthesis (technology mapping)
  - Placement, routing
- Performance optimization
  - Logic transformation, transistor sizing
  - Buffering, re-routing
- Power minimization
  - Gating (sleep transistors), variant Vdd
  - Process optimization
  - Dual-Vth
- Signal Integrity
  - Sizing, net ordering, shielding
  - P/G design, placement, synthesis
- Reliability
  - Statistical design optimization
  - Design margin

Design Flow Evolution (ITRS-2003)
[Figure: three flows side by side: Past (250–180nm), Present (130–90nm), Future (65nm –). Each starts from System Design and a System Model and ends at MASKS. The past flow passes files between separate steps (SPEC, functional verification, RTL, synthesis, place, wire, timing analysis, EQ check). The present flow groups placement, logic optimization, and timing analysis in a cockpit over a common data model repository. The future flow runs HW/SW optimization and analysis (performance, timing, power, noise, test, manufacturing, other) under an auto-pilot over a common HW/SW data model repository.]
- Multiple design files are converged into one efficient Data Model
- Disk accesses are eliminated in critical methodology loops
- Verification of function, performance, testability and other design criteria all move to earlier, higher levels of abstraction, followed by
  - Equivalence checking
  - Assertion-driven design optimizations
- Industry standard interfaces for data access and control
- Incremental modular tools for optimization and analysis

Design Convergence Drivers and Approaches

Wireload Model
- Helps delay estimation at synthesis stage
  - Gate delay = f(input slew, load cap)
  - Wire cap = f’(fanout number)
- Empirical
  - Different for each technology, library, tool, design, and design stage
  - Statistical (from library), custom (multiple iterations), structural (look at adjacent nets) …
- Large deviation remains
  - Routing obstacles (hard IP blocks, macros, etc.)
  - Routing algorithms/implementations (timing driven, net ordering, details)
[Figures: wire cap versus #pins (2, 5, 10, 15); % estimation error (about -10 to 15) per design]

Interconnect Statistics
- Local interconnect: SLocal = STechnology
- Global interconnect: SGlobal = SDie
- What are some implications?
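
The relation on the Wireload Model slide, wire cap = f’(fanout number), is usually just a small lookup table. The sketch below shows one plausible shape for it: a table indexed by fanout, linear interpolation between entries, and a fixed slope beyond the last entry. The table values, the slope, and every name in the code are invented for illustration and don't come from any real library or from the slides.

```python
from bisect import bisect_left

# Hypothetical wireload table: (fanout, estimated wire capacitance in pF).
WIRELOAD_TABLE = [(1, 0.010), (2, 0.018), (5, 0.040), (10, 0.085), (15, 0.135)]
EXTRAPOLATION_SLOPE = 0.010  # pF per extra fanout past the table (assumed)


def wire_cap(fanout, table=WIRELOAD_TABLE, slope=EXTRAPOLATION_SLOPE):
    """Estimate net wire capacitance (pF) from fanout alone."""
    if fanout < table[0][0]:
        raise ValueError("fanout below the first table entry")
    fanouts = [f for f, _ in table]
    if fanout >= fanouts[-1]:
        # Past the table: extend with a fixed slope.
        return table[-1][1] + slope * (fanout - fanouts[-1])
    i = bisect_left(fanouts, fanout)
    f_hi, c_hi = table[i]
    if f_hi == fanout:
        return c_hi
    # Linear interpolation between the two surrounding entries.
    f_lo, c_lo = table[i - 1]
    return c_lo + (c_hi - c_lo) * (fanout - f_lo) / (f_hi - f_lo)


def net_load_cap(fanout, pin_cap_pf):
    """Load seen by the driver: estimated wire cap plus the sink pin caps."""
    return wire_cap(fanout) + fanout * pin_cap_pf


if __name__ == "__main__":
    for fo in (1, 3, 10, 20):
        print(f"fanout={fo:2d}  wire cap={wire_cap(fo):.4f} pF  "
              f"load={net_load_cap(fo, 0.004):.4f} pF")
```

Nothing in this estimate knows where the cells end up, which is the source of the large deviation the slide mentions: two nets with the same fanout get the same capacitance however far apart their pins are placed.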

Rent’s Rule
- Power law distribution: N = G^p
  - N: number of nets
  - G: number of gates
  - p: Rent exponent, between 0 and 1
- Foundation of statistical interconnect prediction
- Empirical, unclear theoretical root
[Figure: log-log plot of lg N versus lg G]

Constructive Interconnect Prediction
- Statistical models have their limitations
  - Critical paths and the law of small numbers
    - Statistics properties, e.g., average wirelength
    - Extreme statistics properties, e.g., critical path length
  - Implementation details
    - Routing congestion, e.g., horizontal effect
    - Timing optimization, e.g., layer assignment
    - Via blockage, pin accessibility, wrong way routing, etc.
- Predict by construction (physical synthesis)
  - try a fast (global) router
  - Scheffer and Nequist, Proc. ACM SLIP 2000, pp. 139-144

Goal: Design Convergence
- What must converge?
  - logic, timing, power, SI, reliability in a physical embedding
  - support front-end signoff with a predictable back-end
- Achieve Convergence through Predictability
  - correct by construction (“assume, then enforce”)
    - constraints and assumptions passed downstream; not much goes upstream
    - ignores concerns via guardbanding
    - separates concerns as able (e.g., FE logic/timing vs. BE spatial embedding)
  - construct by correction (“tight loops”)
    - logic-layout unification; synthesis-analysis unification, concurrent optimization
  - elimination of concerns
    - reduced degrees of freedom, pre-emptive design techniques
    - e.g., power distribution, layer assignment / repeater rules

“Physical Prototyping Philosophy”
- RTL: functionality known
- Gates
- Physical Prototype: timing / routability known
  - Prototype delivers accurate physical data
- Floorplan / Placement
- Routing
- Levels of accuracy
  - Placement-acknowledgeable synthesis (PKS)
  - Including global route
  - Post-detailed-route (In-Place Optimization, i.e., IPO)
- Hierarchical timing budgeting
  - Chip-level CTS, top-level route and IPO, power analysis and grid design
  - Block-level synthesis, placement, IPO, routing
- “Handoff with enough physical information to ensure correct results”
M. Courtoy, Silicon Perspective

Coarse Placement Drives Partitioning, Coarse Routing Drives Pin Assignment / Timing Opt
[Figure: a physical prototype is partitioned into Block 1, Block 2, and Block 3, producing block-level pin assignments and block-level timing budgets]
- Full-chip prototype results in optimal pin placement
  - Results in narrower channels and reduced die size
  - Reduces the routing congestion
  - Improves the chip timing
- Accurate timing budgets result in predictable timing convergence
M. Courtoy, Silicon Perspective

Cool Pictures of the Pieces…
[Figure: screenshots labeled Full Chip Power Planning; Power IR Drop Analysis; Place; Detailed Trial Route; RC Extraction; Delay Calc / STA; IPO; Full Chip Physical Prototype; Timing Closure; Hierarchical Clock Tree Synthesis (block skews of 100ps, 150ps, 130ps, 50ps, 120ps, 50ps); Block-Level Optimization; Partition]
“Tape Out Every Day”
M. Courtoy, Silicon Perspective
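
Returning to the Rent’s Rule slide: since N = G^p is a straight line through the origin on a log-log plot, with slope p, the exponent can be estimated by a least-squares fit of lg N against lg G. The sketch below is a minimal illustration under that reading of the slide. The function name is an assumption, and the sample points are synthetic, generated from a known exponent only to show that the fit recovers it; they are not measurements.

```python
import math


def fit_rent_exponent(samples):
    """Least-squares slope of lg N versus lg G for the model N = G**p.

    samples: iterable of (G, N) pairs with G > 1 and N > 0. The fit is
    forced through the origin because the model has no constant factor.
    """
    xs = [math.log10(g) for g, _ in samples]
    ys = [math.log10(n) for _, n in samples]
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


if __name__ == "__main__":
    # Synthetic data from an assumed exponent of 0.6, for illustration only.
    true_p = 0.6
    data = [(g, g ** true_p) for g in (10, 100, 1000, 10000)]
    print(f"fitted Rent exponent: {fit_rent_exponent(data):.3f}")
```

Real block-level data scatters around the line, so in practice the fit is only as good as the range of block sizes sampled.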