Impact of Parameter Variations on Multi-core Chips
E. Humenay, D. Tarjan, K. Skadron
Department of Computer Science, University of Virginia
© 2004, Kevin Skadron

Motivation
• Process variations are projected to severely impact the yield of high-performance semiconductors
• Multi-core architectures have become the future trend of high-performance chips
• Understanding how process variations interact with chip multiprocessors (CMPs) is required

Variation Types
• PVT Variations
  - Process
  - Voltage
  - Temperature
• This work primarily focuses on process variations

Process Variations
• Process (P) variations stem from a variety of sources
  - Within-Die (WID)
  - Die-to-Die (D2D)
  - Wafer-to-Wafer (W2W)
  - Core-to-Core (C2C)

WID Variations
• WID variations can be further sub-divided
  - Systematic (WIDsys)
  - Random (WIDrand)
• Threshold voltage, Vth, and effective channel length, Leff, are the two parameters most susceptible to random variations
• Systematic variations cause parameter values to be spatially correlated
  - Can be modeled as deterministic or random
• WID variations cause C2C variations

Drain Induced Barrier Lowering (DIBL)
• Ideally, Vth and Leff values are independent of each other
• The DIBL effect introduces a dependency:
  $V_{th} = V_{th0} - V_{DD}\, e^{-(\alpha_{DIBL} \cdot L_{eff})}$
• DIBL creates an exponential dependency between Leff and sub-threshold leakage

Modeling Methodology
• To estimate the impact of process variations on delay, a critical path (CP) model is needed
• Prior CP models vary the inputs to an RC delay equation in Monte-Carlo analyses
• Their simplicity comes at the expense of accuracy

CP Modeling: Prior Work
• Fmax GCP model (Bowman, JSSC '02)
  - Ncp ~ Number of critical paths
  - Lcp ~ Number of gates in critical path (logic depth)
• Marculescu DAC '05
  - Ncp ~ stage's device count
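The Ncp/Lcp abstraction above can be illustrated with a small Monte-Carlo experiment. This is only a sketch under stated assumptions, not the model used in the cited work: each gate delay is drawn independently from a Gaussian around a nominal value of 1, a path delay is the sum of Lcp gate delays, and a chip's delay is the slowest of its Ncp paths. The names `sample_chip_delay` and `delay_stats`, the 5% per-gate sigma, and the default Lcp of 10 are all hypothetical choices for illustration.

```python
import random
import statistics


def sample_chip_delay(n_cp, l_cp, sigma_gate, rng):
    """Return one normalized chip delay: the slowest of n_cp critical paths.

    Assumption: gate delays are independent Gaussians with mean 1.0 and
    standard deviation sigma_gate (purely random WID variation).
    """
    worst = 0.0
    for _ in range(n_cp):
        # A path delay is the sum of l_cp gate delays.
        path = sum(rng.gauss(1.0, sigma_gate) for _ in range(l_cp))
        worst = max(worst, path)
    # Normalize to the nominal path delay (l_cp gates of delay 1.0).
    return worst / l_cp


def delay_stats(n_cp, l_cp=10, sigma_gate=0.05, samples=1000, seed=0):
    """Return (mean, standard deviation) of normalized delay over many chips."""
    rng = random.Random(seed)
    delays = [sample_chip_delay(n_cp, l_cp, sigma_gate, rng)
              for _ in range(samples)]
    return statistics.mean(delays), statistics.stdev(delays)


if __name__ == "__main__":
    for n_cp in (1, 2, 4, 16, 128):
        mean, std = delay_stats(n_cp)
        print(f"Ncp={n_cp:4d}  mean={mean:.4f}  std={std:.4f}")
```

Taking the maximum over more paths shifts the delay distribution upward and narrows it, while a larger Lcp averages out per-gate randomness within each path.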

Importance of Ncp
• As Ncp increases, mean delay increases and delay variation decreases
[Figure: histogram of Count/Samples vs. Normalized Delay for Ncp = 1, 2, 4, 16, 128]

Modified CP Model
• Goal: more accurately describe each functional unit's delay distribution in order to determine which functional units will affect the final frequency distribution
• Improvements
  - Considering wire delay when determining Lcp
  - Better Ncp assignments
  - Importance of Weff: $\sigma_{V_{th}} \propto 1 / \sqrt{W_{eff} L_{eff}}$

Modified CP Model
• Categorize each stage as being either SRAM or combinational logic
• SRAM
  - L1s
  - TLBs
  - Register File
  - Rename Map
  - Issue Queue
• Logic
  - Execution Units
  - Decode Stage
  - Issue Select

  Type   SRAM    LOGIC
  Ncp    Hi      Lo
  Lcp    Lo      Hi
  Weff   Lo/Hi   Hi

SRAM Model
• A modified version of CACTI 4.0 is used to estimate the fraction of access time susceptible to device variations
• Ncp ~ number of read ports
• Weff is dependent on unit type
  - L1 caches are assumed to be optimized for area (minimum-sized Weff)
  - Time-critical SRAM units have larger widths (assumed 5x larger than minimum)
• Only variation in SRAM access time is considered

Combinational Logic Model
• Logic model is based on the Sklansky adder
• Delay is modeled with the Horowitz delay equation
• Critical path is the carry circuitry
• Weff is chosen to alleviate fan-out delay
[Figure: Sklansky prefix cells combining generate/propagate signals (G, P) over ranges i:k and k-1:j into i:j]

WIDrand: SRAM Delay
• Because of its large Ncp, the L1 is likely to be the slowest SRAM unit
• Nominal frequency is 3GHz
[Figure: histogram of Count/Samples vs. % Frequency Slowdown Due to Random Process Variations for the 64KB L1, 120-entry RF, and 8KB TLB]

WIDrand: SRAM vs. Logic
• L1 will also be slower than logic
[Figure: histogram of Count/Samples vs. % Frequency Slowdown Due to Random Process Variations for the 64b adder critical path and the 64KB L1 cache]

WIDsys Pattern
• WIDsys model is derived from actual measurements (Friedberg ISQED '05)
[Figure: Leff map (scale 25 to 28) over a 14mm x 14mm die, with regions marked fast, high-leakage and slow, low-leakage; POWER4-like core scaled to 45nm]

Impact of WIDsys on Delay
• WIDsys can cause frequency to differ from core to core by as much as 5%
• A large Lcp value causes combinational logic units to be more affected by WIDsys variation
[Figure: % Frequency Slowdown vs. % WID Systematic Variation in Leff for the 64KB L1 and logic]

Random Leakage Variation
• WIDrand will not have an impact on leakage at the architectural level since total leakage is an aggregate sum
[Figure: histogram of Count/Samples vs. Normalized Aggregate Leakage, by Number of Transistors (1, 2, 4)]

C2C Leakage Variation
• Figure shows core leakage when considering all possible core locations on a die
• 3 different magnitudes of DIBL are considered
  - BSIM suggests 0.15 (best-case)
[Figure: # of Core Positions on Chip vs. Normalized Core Leakage (1 to 1.45) for DIBL = 0.15, 0.14, 0.13]

Conclusions
• L1 caches will determine the WID mean frequency.
  Variations in other units will not directly affect the frequency distribution
• Considering wire delay in the CP model causes device variations to have less of an impact on the frequency distribution
• WID variations do not result in significant C2C frequency differences
• At 45nm, C2C sub-threshold leakage variation may be as much as 45%
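The link between Leff and sub-threshold leakage behind the last conclusion can be sketched numerically. This is a rough illustration, not the authors' model: it assumes the DIBL relation Vth = Vth0 - VDD * exp(-alpha_DIBL * Leff) with Leff in nanometers, leakage proportional to exp(-Vth / (n * vT)), and hypothetical values for Vth0, VDD, the sub-threshold slope factor n, the thermal voltage, and the reference Leff. The function names are hypothetical as well, and the sketch makes no attempt to reproduce the 45% figure.

```python
import math


def vth_with_dibl(l_eff_nm, vth0=0.30, vdd=1.0, alpha_dibl=0.15):
    """Threshold voltage (V) including the DIBL term.

    Assumptions: l_eff_nm is in nanometers; vth0 and vdd are illustrative
    values, not taken from the slides.
    """
    return vth0 - vdd * math.exp(-alpha_dibl * l_eff_nm)


def relative_leakage(l_eff_nm, l_ref_nm=26.5, n=1.5, v_thermal=0.0259, **kw):
    """Sub-threshold leakage at l_eff_nm relative to a device at l_ref_nm.

    Assumption: leakage scales as exp(-Vth / (n * v_thermal)), with n and
    v_thermal (about 300 K) chosen for illustration only.
    """
    dv = vth_with_dibl(l_eff_nm, **kw) - vth_with_dibl(l_ref_nm, **kw)
    return math.exp(-dv / (n * v_thermal))


if __name__ == "__main__":
    for alpha in (0.15, 0.14, 0.13):
        short = relative_leakage(25.0, alpha_dibl=alpha)
        long_ = relative_leakage(28.0, alpha_dibl=alpha)
        print(f"alpha_DIBL={alpha:.2f}  Leff=25nm: {short:.3f}x  "
              f"Leff=28nm: {long_:.3f}x")
```

Under these assumptions a shorter Leff lowers Vth and raises leakage, and a smaller DIBL coefficient widens the leakage spread across the same Leff range.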