Off-chip Decoupling Capacitor Allocation for Chip-Package Co-Design

Hao Yu, Berkeley Design Automation
Chunta Chu and Lei He, EE Department, UCLA
The work was performed at UCLA and was partially supported by NSF and UC-MICRO.

Decap Allocation for Clean Power Delivery
- Chip-package co-design requires a noise-free off-chip power delivery system (PDS).
- Modeling inductance is a must.
- Decoupling capacitors (decaps) are allocated at the chip-package interface to satisfy power integrity.
- Finding a fast yet accurate decap allocation for a large-scale design is a challenging task.
- How can the large and complex physical-level layout be considered during system-level design?

Physical Level Challenge
[Figure: Module 1 and Module 2]
- Finite parasitic impedance affects circuit functionality at the chip-package interface:
  - supply voltage drop and electromagnetic (EM) coupling.
- A distributed post-layout model burdens system-level power integrity analysis and design:
  - millions of nodes and terminals with dense inductances.

The Need for Macromodeling
- Representing a large and complex power delivery system blindly leads to expensive design cycles.
- A compact representation by macromodeling is needed.
- Existing decap allocation methods with macromodeling [Zheng:CICC'04, Chen:ISPD'06]:
  - generate a PDS macromodel;
  - apply simulated annealing to add or remove one decap at a legal position;
  - cannot efficiently handle a large-scale design.

Limitations of Existing Macromodeling
[Figure: projection yields a small but dense macromodel; how should it be used?]
- Macromodeling algorithms [PVL, PACT, PRIMA] are limited in handling a large-scale PDS:
  1. They become ineffective when the number of terminals is large.
  2. They do not provide sensitivity information.
  3. They destroy the structure of the state matrix.

Our Decap Problem Formulation
- A multiple-ring-based problem formulation:
  - represent a decap solution as a combination of multi-level templates;
  - constrain by the noise integral at I/O instead of the noise amplitude used in [Chen:ISPD'06].
- Optimization method:
  - each step inserts a template with a given decap type based on sensitivity, instead of using simulated annealing;
  - the key is to calculate the sensitivity efficiently from the macromodel.

TBS2: Macromodeling for PDS
- Principal terminal selection: capture the essential input/output behavior.
- Parameterization: compute performance sensitivities from layout modifications.
- Structured simulation: sparsely arrange couplings (sparsity), leverage diverse physical domains (latency), and analyze at the block level (hierarchy).
- A structured and parameterized macromodel connects the layout with the system.

TBS2 (1) Principal Terminal Selection
- The input signals (J = B · I) are temporally correlated.
- They are described by a correlation matrix C (N × N).
- Correlated terminals [b0 b1 b2] can be simplified using principal component analysis (PCA).
- K principal terminals are selected by the K-means method.

TBS2 (2) Parameterization
- Decaps can be described parametrically by:
  - the sizing vector D for M2 types of decaps, and
  - the topological matrix X for M1 levels of rings.
[Figure: example ring-numbered layout illustrating an entry of the topological matrix, X(2,6)]
- In total, M1 × M2 types of parameterized templates are described by a parameterized state matrix in the s-domain.

TBS2 (3) Structured Macromodeling
- Structured projection keeps the system sparse and block-triangular, with every block of size M × M:
$$
\begin{bmatrix}
G_0 & 0 & \cdots & 0 \\
\Delta G_1 & G_0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
\Delta G_K & 0 & \cdots & G_0
\end{bmatrix}
$$
- Block-wise nominal and sensitivity voltage responses are obtained in the time domain.
- Details can be found in TBS1 [Yu:DAC'06] and [Yu:ISLPED'06].

Improved Accuracy by TBS2 Reduction
- A non-uniform RLC mesh is reduced to an 80th-order model using TBS2 and PRIMA.
- TBS2 matches more poles than PRIMA with respect to the principal terminals.
- TBS2 improves waveform accuracy in both the frequency and the time domains.

Our Decap Algorithm Overview
  1. Apply TBS2 just once to generate a structured and parameterized macromodel.
  2. Calculate the block-level nominal noise at each terminal and its sensitivity with respect to the partitioned templates.
  3. Check whether the noise integral satisfies the constraints.
  4. Allocate decaps for each block greedily according to the sensitivity.
[Flow: TBS2 → Calculate nominal + sensitivity → Check constraints → Update template]

Reduced Runtime and Cost of Decap Allocation
- Three methods are compared:
  1) simulated annealing with noise amplitude (SA-NA) [Chen:ISPD'06];
  2) multiple-ring with noise amplitude (MRA-NA) [this paper];
  3) multiple-ring with noise integral (MRA-NI) [this paper].
- MRA-NI is up to 97X faster than SA-NA due to the structured and parameterized macromodel from TBS2.
- MRA-NI reduces decap cost by up to 16% due to a more accurate integrity metric based on the noise integral.

Conclusions
  1. A macromodel connects system-level design with the physical-level layout.
     - TBS2 is a structured and parameterized macromodel.
     - It provides fast yet accurate computational prototyping for large, complex systems.
  2. An integrity-driven decap allocation is solved for chip-package co-design.
     - Such a block-wise macromodel and optimization can be applied to other layout optimization problems.
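The principal terminal selection step (TBS2 (1)) groups temporally correlated terminals and keeps one representative per group. The sketch below is a simplified stand-in, not the TBS2 procedure: it skips the PCA step, runs plain k-means directly on hypothetical sampled input waveforms (one list per terminal) with a deterministic farthest-point initialization, and picks the member nearest each centroid as the principal terminal. The function and argument names are illustrative only.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two sampled waveforms."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_principal_terminals(waveforms, k, iters=20):
    """Return sorted indices of k representative terminals (k-means sketch, no PCA)."""
    dim = len(waveforms[0])
    # Farthest-point initialization: start at terminal 0, then repeatedly add
    # the waveform farthest from all centroids chosen so far.
    centroids = [list(waveforms[0])]
    while len(centroids) < k:
        far = max(waveforms, key=lambda w: min(sq_dist(w, c) for c in centroids))
        centroids.append(list(far))

    clusters = []
    for _ in range(iters):
        # Assignment step: each terminal joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for idx, w in enumerate(waveforms):
            nearest = min(range(k), key=lambda c: sq_dist(w, centroids[c]))
            clusters[nearest].append(idx)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c, members in enumerate(clusters):
            if not members:  # keep an empty cluster's centroid unchanged
                new_centroids.append(centroids[c])
                continue
            new_centroids.append(
                [sum(waveforms[i][d] for i in members) / len(members) for d in range(dim)]
            )
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids

    # The principal terminal of each cluster is the member nearest its centroid.
    reps = [
        min(members, key=lambda i: sq_dist(waveforms[i], centroids[c]))
        for c, members in enumerate(clusters)
        if members
    ]
    return sorted(reps)
```

For four terminals where b0/b1 and b2/b3 carry nearly identical currents, `select_principal_terminals(w, 2)` keeps one terminal from each correlated pair.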
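The block-triangular structure in TBS2 (3) is what makes nominal and sensitivity responses cheap: with G(p) = G0 + Σ p_i ΔG_i, the first block row gives the nominal solution G0 x0 = b, and block row i gives G0 x_i = -ΔG_i x0, so the single factorization of G0 is reused for every sensitivity. The sketch below assumes small dense real matrices evaluated at one point (a conductance-only case rather than the full s-domain model) and uses a hand-written LU with partial pivoting; all names are hypothetical.

```python
def lu_factor(a):
    """LU with partial pivoting: returns (lu, perm) such that P·A = L·U."""
    n = len(a)
    lu = [list(map(float, row)) for row in a]
    perm = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(lu[r][k]))
        if lu[p][k] == 0.0:
            raise ValueError("singular matrix")
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for r in range(k + 1, n):
            lu[r][k] /= lu[k][k]  # multiplier stored in the L part
            for c in range(k + 1, n):
                lu[r][c] -= lu[r][k] * lu[k][c]
    return lu, perm


def lu_solve(lu, perm, b):
    """Solve A·x = b from the factors returned by lu_factor."""
    n = len(lu)
    y = [float(b[perm[i]]) for i in range(n)]
    for i in range(n):  # forward substitution with unit-diagonal L
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    for i in reversed(range(n)):  # back substitution with U
        for j in range(i + 1, n):
            y[i] -= lu[i][j] * y[j]
        y[i] /= lu[i][i]
    return y


def matvec(m, x):
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]


def nominal_and_sensitivities(g0, delta_gs, b):
    """Forward block substitution on [[G0, 0], [dG_i, G0]] [x0; x_i] = [b; 0]."""
    lu, perm = lu_factor(g0)  # factor the diagonal block once
    x0 = lu_solve(lu, perm, b)  # nominal response
    sens = []
    for dg in delta_gs:  # sensitivity of x w.r.t. parameter p_i
        rhs = [-v for v in matvec(dg, x0)]
        sens.append(lu_solve(lu, perm, rhs))
    return x0, sens
```

For G0 = [[2, -1], [-1, 2]], b = [1, 0] and ΔG_1 adding unit conductance at node 0, the sketch returns x0 = [2/3, 1/3] and ∂x/∂p_1 = [-4/9, -2/9], which agrees with differentiating the closed-form solution [2, 1]/(3 + 2p).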
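The formulation constrains the noise integral at I/O rather than the noise amplitude. The slides do not give the exact definition, so the sketch below assumes one common reading: the time integral of the noise magnitude that exceeds an allowed margin, computed with the trapezoidal rule on a sampled waveform. The waveforms and function names are hypothetical; the point is that a narrow spike can have the larger amplitude but the smaller integral.

```python
def noise_amplitude(noise):
    """Peak noise magnitude over the sampled waveform."""
    return max(abs(v) for v in noise)


def noise_integral(times, noise, margin=0.0):
    """Trapezoidal integral of the noise magnitude above `margin`.

    Assumed definition (not taken from the slides). Clipping is applied per
    sample, so crossings of the margin between samples are only approximated.
    """
    total = 0.0
    for i in range(len(times) - 1):
        e0 = max(abs(noise[i]) - margin, 0.0)
        e1 = max(abs(noise[i + 1]) - margin, 0.0)
        total += 0.5 * (e0 + e1) * (times[i + 1] - times[i])
    return total


t = [0.0, 1.0, 2.0, 3.0, 4.0]      # sample times (arbitrary units)
spike = [0.0, 0.1, 0.0, 0.0, 0.0]  # short, tall noise pulse
wide = [0.0, 0.05, 0.05, 0.05, 0.0]  # long, shallow noise pulse
```

Here `spike` has twice the amplitude of `wide` (0.1 vs 0.05), yet its integral is smaller (0.1 vs 0.15), so an amplitude-based constraint would spend decaps on the pulse that carries less noise energy.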
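The greedy loop in the algorithm overview (check the noise-integral constraints, then insert the template with the best sensitivity) can be sketched as follows. This is an illustration under a simplifying assumption: the per-terminal noise integral is frozen as a first-order linear model, noise0[j] + Σ_t sens[j][t] · counts[t], whereas the described flow recomputes nominal noise and sensitivity from the macromodel after each template update. The ranking rule (violation reduction per unit decap cost) and all names are hypothetical.

```python
def greedy_allocate(noise0, sens, cost, bound, max_steps=1000):
    """Greedy template insertion under a frozen linear sensitivity model.

    noise0[j]  : nominal noise integral at terminal j
    sens[j][t] : change of the noise integral at terminal j per inserted
                 template t (negative values reduce noise)
    cost[t]    : decap cost of one template t
    bound[j]   : noise-integral constraint at terminal j
    Returns counts[t], the number of inserted templates of each type.
    """
    n_term, n_tmpl = len(noise0), len(cost)
    counts = [0] * n_tmpl
    noise = list(noise0)
    for _ in range(max_steps):
        violation = sum(max(noise[j] - bound[j], 0.0) for j in range(n_term))
        if violation == 0.0:
            return counts  # all constraints satisfied

        def gain(t):
            # Violation removed by one more template t, per unit cost.
            after = sum(max(noise[j] + sens[j][t] - bound[j], 0.0) for j in range(n_term))
            return (violation - after) / cost[t]

        best = max(range(n_tmpl), key=gain)
        if gain(best) <= 0.0:
            raise RuntimeError("no template reduces the remaining violation")
        counts[best] += 1
        noise = [noise[j] + sens[j][best] for j in range(n_term)]
    raise RuntimeError("step limit reached before constraints were met")
```

With two terminals, noise0 = [1.0, 0.5], bound = [0.6, 0.6], a local template that lowers terminal 0 by 0.25 and a global one that lowers both by 0.1 (equal cost), the loop inserts the local template twice and stops. Using a per-step sensitivity ranking instead of random add/remove moves is what distinguishes this style of allocation from simulated annealing.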