Techniques to Mitigate the Effects of Congenital Faults in Processors
Smruti R. Sarangi

Process Variation
  - Corner rounding, edge shortening (courtesy IBM Microelectronics)

Semiconductor Fabrication Facility
  - (courtesy tabalcoaching.com)

Photolithography Unit
  - (courtesy UPenn)

Basic Lithographic Process
  - The light source is typically an argon fluoride laser
  - The light passes through an array of lenses to reach the silicon substrate
  - The resolution limit is given by $R = k_1 \lambda / NA$, where $NA = n \sin\theta$
  - To decrease R (that is, to print smaller features) we need to:
      - Decrease the wavelength
      - Increase the refractive index

Parameter Variation
  - P: Process
      - Threshold voltage: Vt
      - Transistor length: Leff
  - V: Supply voltage
  - T: Temperature

Why is Variation a Problem?
  - Unpredictability of Vt, Leff and T implies lower chip frequency and higher leakage
  - (courtesy Shekhar Borkar, Intel)

Implications on Design Decisions
  - Static timing analysis not possible
  - Overly conservative designs
      - Chips too slow
      - Performance of a generation lost
  - Possible solution
      - Clock the chip at an unsafe frequency
      - Tolerate resulting timing errors
      - Reduce timing errors
          - Architectural techniques
          - Circuit techniques

Overview
  - Model for Process Variation
  - Model for Timing Errors due to Process Variation
  - Techniques to Tolerate Timing Errors
  - Techniques to Reduce Timing Errors
  - Dynamic Optimization

Process Variation
  - Systematic variation
      - Lens aberrations
      - Mask deformities
      - Thickness variation in CMP
      - Photo-lithographic effects
  - Random variation
      - Variable dopant density
      - Line edge roughness

Modeling Systematic Variation
  - Break the chip into a million cells
  - Variation map: 1000 x 1000
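
As a rough illustration of the variation map described above, the sketch below fills a small grid with spatially correlated normal deviations. It is not the author's model: the exponential correlation function, the 6 x 6 grid (a dense covariance matrix is impractical at 1000 x 1000), and the values of `sigma` and `corr_range` are all assumptions chosen for illustration, and the helper names `cholesky` and `systematic_map` are hypothetical.

```python
import math
import random


def cholesky(a):
    """Return lower-triangular L with L * L^T == a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def systematic_map(side, sigma, corr_range, rng):
    """Draw one side x side map of correlated, zero-mean normal deviations."""
    cells = [(r, c) for r in range(side) for c in range(side)]
    # Assumed correlation between two cells: exp(-distance / corr_range).
    cov = [[sigma ** 2 * math.exp(-math.dist(p, q) / corr_range) for q in cells]
           for p in cells]
    low = cholesky(cov)
    # Correlated sample = L * z, with z a vector of independent standard normals.
    z = [rng.gauss(0.0, 1.0) for _ in cells]
    flat = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(len(cells))]
    return [flat[r * side:(r + 1) * side] for r in range(side)]


rng = random.Random(0)  # fixed seed so the map is reproducible
vt_map = systematic_map(side=6, sigma=0.03, corr_range=3.0, rng=rng)
for row in vt_map:
    print(" ".join(f"{v:+.3f}" for v in row))
```

Nearby cells in the printed map tend to have similar deviations, which is the property a systematic-variation map is meant to capture.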

Systematic and Random Variation
  - Distribution of systematic components: normal distribution
  - Normal distribution with spatial correlation: multivariate normal distribution
  - Superimpose random variation on top of systematic variation

Overview
  - Model for Process Variation
  - Model for Timing Errors due to Process Variation
  - Techniques to Tolerate Timing Errors
  - Techniques to Reduce Timing Errors
  - Dynamic Optimization
  (callout: ISQED '07)

Timing Errors
  - $P(E) = 1 - \mathrm{cdf}(t_{clk})$
  - [Figure: distribution of path delays in a pipe stage, without variation and with variation; with variation, some paths exceed the clock period and cause timing errors]

Model for Timing Errors
  - Basic assumptions
      - A structure consists of many critical paths
      - The critical path depends on the input
      - Critical path delay > clock period implies a timing error
      - With no variation, clock period = delay of the longest critical path at maximum temperature
      - All pipeline stages are tightly designed: 0 slack

Paths in a Pipeline Stage
  - [Figure: pdf(t) and cdf(t) of path delays, with the clock period 1/f marked; paths beyond it cause timing errors]
  - Error rate: $P_E(t) = 1 - \mathrm{cdf}(t)$

Basic Kinds of Structures
  - Logic
      - Heterogeneous critical paths
      - ALUs, comparators, sense-amps
  - Memory
      - Homogeneous critical paths
      - SRAMs, CAMs
  - Mixed
      - x% memory and (100-x)% logic
      - Used to model renamer, wakeup/select

Logic Critical Path
  - 35% wiring: Elmore delay model
  - 65% gates: alpha power law
  - $T_g \propto \dfrac{L_{eff} V_{DD}}{\mu(T)\,(V_{DD} - V_{th})^{\alpha}}$

Logic Delay
  - Distribution of path delays with no variation: $D_{logic}$, with $d_{wire} + d_{gate} = 1$
  - Obtain $D_{logic}$ using a timing analysis tool
  - Distribution of path delays with variation:
    $D_{varlogic} = (d_{wire} + (\text{relative gate delay}) \cdot d_{gate}) \cdot D_{logic} + d_{gate} \cdot D_{extra}$
      - Relative gate delay: due to systematic variation in P, V, T
      - $D_{extra}$: delay due to variation in the random and systematic components within a stage

Memory Delay
  - Memory cell: delay distribution
      - Use Kirchhoff's equations
      - Long channel transistor equations
      - Multi-variable Taylor expansion
      - Extends the analysis done by Roy et al., IEEE TCAD '05
  - Memory line: max. distribution
      - $Delay_{line} = \max(Delay_{cell})$

Combined Error Model
  - We have the delay distributions, cdf(t), for memory and logic with variation
  - For each structure, per access: $P(E) = 1 - \mathrm{cdf}(t)$
  - $P(E)$ per inst. $= \rho \cdot P(E)$, where $\rho$ = accesses/inst.
  - Combined error rate per instruction: $P(E)_{total} = \sum \rho \cdot P(E)$

Validation – Logic
  - S. Das et al. '05

Overview
  - Model for Process Variation
  - Model for Timing Errors due to Process Variation
  - Techniques to Tolerate Timing Errors
  - Techniques to Reduce Timing Errors
  - Dynamic Optimization

Variation Aware Timing Speculation (VATS)
  - Processor core runs at an unsafe frequency
  - Checker is error free
      - Lower frequency
      - Safe design
  - [Figure: multicore chip; processor core with Razor latches, L0 cache, DIVA checker, L1 cache]

Other VATS Checkers
  - TIMERRTOL: Uht et al.
  - Razor: Dan Ernst et al., MICRO 2003
  - X-Checker: X. Vera et al., SELSE 2006
  - X-Pipe: X. Vera et al., ASGI 2006
  - Sato and Arita, COSLP 2003

Overview
  - Model for Process Variation
  - Model for Timing Errors due to Process Variation
  - Techniques to Tolerate Timing Errors
  - Techniques to Reduce Timing Errors
  - Dynamic Optimization
  (callout: Submitted to ISCA '07)

Basic Mechanisms – Shift and Tilt
  - [Figure: two plots of error rate (PE) versus frequency f, each with before and after curves; one shows the curve shifting, the other shows it tilting]

Architectural Mechanisms
  - Resizable issue queue (Albonesi et al.)
  - Switch pass transistors off: smaller queue
  - Shifts the error rate curve
  - [Figure: SRAM/CAM arrays separated by pass transistors, with sense amps; original and new error rate curves]

Gate Sizing
  - Transistor width: W
  - Delay $\propto A + B/W$
  - Power $\propto W$
  - Make faster paths slower to save power
  - [Figure: original and gate-sized path delay distributions]

Optimization: Replicate ALUs
  - Difference in error rate
  - Tradeoff is power vs. errors
  - Idea: switch between the two ALUs
  - Use the gate-sized ALU when it is not timing critical, and the original ALU otherwise
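
The combined error model from the earlier slides (per-access error rate of 1 minus the delay cdf, weighted by accesses per instruction and summed over structures) can be sketched as follows. This is a minimal illustration, not the author's implementation: it assumes each structure's path delay is normally distributed, and the structure names, means, standard deviations and access counts are made-up values.

```python
import math


def normal_cdf(t, mean, std):
    """cdf of a normal distribution, written with math.erf."""
    return 0.5 * (1.0 + math.erf((t - mean) / (std * math.sqrt(2.0))))


def error_rate(t_clk, mean, std):
    """Per-access timing error rate: P(E) = 1 - cdf(t_clk)."""
    return 1.0 - normal_cdf(t_clk, mean, std)


def combined_error_rate(t_clk, structures):
    """Per-instruction error rate: sum of (accesses/inst.) * P(E) over structures."""
    return sum(s["accesses_per_inst"] * error_rate(t_clk, s["mean"], s["std"])
               for s in structures)


# Hypothetical structures; delays are normalised to the no-variation clock period.
structures = [
    {"name": "alu", "accesses_per_inst": 0.6, "mean": 0.90, "std": 0.04},
    {"name": "issue_queue", "accesses_per_inst": 1.0, "mean": 0.93, "std": 0.03},
]

for t_clk in (1.00, 0.95, 0.90):
    print(f"t_clk={t_clk:.2f}  P(E)_total={combined_error_rate(t_clk, structures):.3e}")
```

Shortening the clock period (raising the frequency) moves `t_clk` into the tail of each delay distribution, so the printed error rate grows quickly.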

Fine Grain ABB and ASV
  - Adaptive Body Bias (ABB): Vbb
      - Raising Vbb (forward bias): delay decreases, leakage increases
      - Lowering Vbb (reverse bias): delay increases, leakage decreases
  - Adaptive Supply Voltage (ASV): Vdd
      - Raising Vdd: delay decreases, leakage and dynamic power increase
  - Multicore chip: vary supply voltage (ASV) and body voltage (ABB) within each core
  - [Figure: error rate (PE) versus frequency f]

Overview
  - Model for Process Variation
  - Model for Timing Errors due to Process Variation
  - Techniques to Tolerate Timing Errors
  - Techniques to Reduce Timing Errors
  - Dynamic Optimization

Dynamic Behavior
  - Temperature
  - Activity factors

Formulate an Optimization Problem
  - [Figure: inputs and constraints feed the optimization, which produces outputs that meet the goals]
  - Constraints
      - Temperature: at all points T < TMAX
      - Power: total core power < PMAX
      - Error: total errors < ErrMAX
  - Goal: maximize performance

Outputs
  - f: 1 output
  - 15 ABB/ASV regions: 30 values of (Vdd, Vbb)
  - ALU: 1 output
  - Issue queue size: 1 output
  - Total: 1 + 30 + 1 + 1 = 33 outputs
  - f, Vdd, Vbb can take many values
  - Very large state space

Dimensionality Reduction
  - Find the max. frequency that each stage can support
  - Find the slowest stage
  - This is the core frequency
  - Minimize power in the rest of the units
  - [Figure: minimum and maximum frequency for stages 1 to 7, with the core frequency marked]

Inputs
  - Per phase (change with each phase)
      - Activity factor
      - Accesses/cycle
  - Per heat sink cycle
      - TH: heat sink temperature
  - Constant (set forever)
      - Vt0
      - Rth: thermal resistance
      - Kleak: constant in leakage eqn.

Optimization Overview
  - [Figure: 15 frequency algorithms, each taking inputs and producing f(1) ... f(15); fcore is the minimum of these; 15 power algorithms, each taking inputs and fcore and producing (Vdd, Vbb)]

Fuzzy Logic Based Algorithm
  - Fuzzy logic
      + Very fast computation times
      + Incorporates detailed models
      - Slight inaccuracy
  - Exhaustive search based algorithm (Freq/Power)
      - Computationally expensive
      - Requires detailed models
      + Accurate results
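
The dimensionality reduction step described above (take the slowest stage's maximum frequency as the core frequency, then minimize power elsewhere) can be sketched as below. This is only an illustration under stated assumptions: the stage names, the candidate (Vdd, Vbb) settings and their frequency and power numbers are invented, and the lookup-table form of `cheapest_setting` stands in for whatever frequency and power algorithms the deck actually uses.

```python
def core_frequency(max_freq_per_stage):
    """The core can only run as fast as its slowest stage."""
    return min(max_freq_per_stage.values())


def cheapest_setting(settings, f_core):
    """Pick the lowest-power (Vdd, Vbb) setting that still supports f_core."""
    feasible = [s for s in settings if s["f_max"] >= f_core]
    if not feasible:
        raise ValueError("no setting supports the requested frequency")
    return min(feasible, key=lambda s: s["power"])


# Hypothetical per-stage maximum frequencies in GHz.
max_freq = {"fetch": 5.4, "rename": 5.1, "issue": 4.8, "alu": 5.6}
f_core = core_frequency(max_freq)

# Hypothetical candidate settings for one ABB/ASV region (power in W).
alu_settings = [
    {"vdd": 0.90, "vbb": -0.2, "f_max": 4.5, "power": 1.1},
    {"vdd": 1.00, "vbb": 0.0, "f_max": 5.0, "power": 1.6},
    {"vdd": 1.10, "vbb": 0.2, "f_max": 5.6, "power": 2.4},
]

best = cheapest_setting(alu_settings, f_core)
print(f"f_core = {f_core} GHz, ALU region runs at Vdd={best['vdd']} V, Vbb={best['vbb']} V")
```

Splitting the problem this way replaces one search over all 33 outputs with a frequency search per stage followed by an independent power search per region.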

Final Picture
  - [Figure: fuzzy sub-controllers 1 to 15 take inputs and produce f(1) ... f(15); fcore is the minimum of these; a second set of fuzzy sub-controllers 1 to 15 take inputs and fcore and produce (Vdd, Vbb)]

Timeline
  - [Figure: timeline. Heat sink cycle: 2-3 secs. Phase: 120 ms. When a new phase is detected: measure IPC and the per-subsystem activity values (indexed by i), run the fuzzy controller algorithm, bring the core to the chosen working point, then run retuning cycles that each test a configuration for 2 ms and move 1 step, until STOP. Other durations marked on the figure: 20 s, 6 s, 10 s, 0.5 s, 2 ms]

Results

Evaluation Framework
  - Processor modeled
      - Athlon 64 floorplan
      - 3-wide processor
      - 12 stage pipeline
      - 45 nm, Vdd = 1 V, 6 GHz
      - Sherwood phase detector (ISCA '03)
      - 10 SpecInt and 10 SpecFP benchmarks, 1 billion insts.
      - 4-core chip, private L2 cache
  - Variation modeling
      - PVT maps for 100 dies
  - Fuzzy controller
      - 10,000 training examples
      - 25 rules

Terminology
  - Baseline: Proc. with variation effects
  - TS: Baseline + DIVA checker
  - TS+FU: TS + FU replication
  - TS+Queue: TS + issue-queue resizing
  - TS+ABB+ASV: both circuit level techniques
  - TS+Dyn: TS + dynamic optimization
  - TS+All: TS+FU+Queue+ABB+ASV+Dyn
  - NoVar: without any variation effects

Error Plots
  - [Figure: two error plots, "TS only" and "ALL = TS + ABB + ASV", each marking ErrMAX and the maximum performance point]

Execution Point
  - [Figure: execution point. Plots of power and log(timing error rate) against frequency, with curves of constant power, constant frequency and constant errors]

Frequency
  - [Figure: frequency for Static, Fuzzy and Oracle; bars labeled 23% and 49%]
  - Frequency increase: 10-49%
  - 50% of the gains are due to dynamic opts.

Performance
  - [Figure: performance; bars labeled 19% (Static) and 34%]
  - We can nullify the effects of variation and even speed up
  - The performance loss due to fuzzy logic is minimal

Conclusion
  - Do not design processors for the worst case
  - Need to tolerate variation induced errors
  - Contributions
      - Model for timing errors
      - New framework for tradeoffs in P, f and P(E)
      - High dimensional dynamic adaptation
      - Eval. of arch. techniques to tolerate/mitigate P(E)
  - 10-49% increase in frequency
  - 7-34% increase in performance

Conclusion II
  - CADRE (DSN'06)
      - Arch. support to make a board level computer cycle-accurate deterministic
  - Phoenix (MICRO'06 & Top Picks'07)
      - Arch. support to detect and patch processor design bugs

BACKUP

Algorithm
  - [Figure: flow diagram. Inputs: f, Vdd, Vbb. Compute Pdyn; compute Pleak from Pleak0 and Vt; compute T from Rth and TH and verify T < TMAX; compute delay and Vt, feed the error model and verify Err < ErrMAX; find fmax]

Memory Delay
  - [Figure: memory cell with word line WL, VDD, bit lines BL and BR, transistors X and Y, and cell current Icell]
  - $T_{mem} \propto 1 / I_{cell}$
  - Solve for Icell using long channel eqns.
  - $I_{cell} = f(V_{tX}, V_{tY}, L_X, L_Y)$
  - VtX, VtY, LX and LY are Gaussian variables
  - Each of VtX, VtY, LX and LY has a systematic component and a random component

Memory Delay - II
  - Find a distribution for Tmem
  - Tmem is a function of four Gaussian variables
  - Model Tmem as a normal distribution
  - Find the mean and standard deviation of Tmem using multi-variable Taylor expansion
  - This is the access time dist. for 1 bit
  - A typical entry has 32-128 bits
  - Find the max distribution of 32-128 normal variables
  - Error probability = $1 - \mathrm{cdf}(t_{mem})$

Fuzzy Low Level
  - Inputs $x_j$; rule i has parameters $\mu_{ij}$, $\sigma_{ij}$ and output $y_i$
  - $W_{ij} = \exp\left[-\left(\dfrac{x_j - \mu_{ij}}{\sigma_{ij}}\right)^2\right]$
  - $W_i = \prod_j W_{ij}$
  - Final output $= \dfrac{\sum_i W_i \, y_i}{\sum_i W_i}$

Recovery Penalty

Validation – Memory

Power
  - Max power limit
  - Proc. with no variation: 25 W, PMAX = 30 W
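
The rule evaluation on the Fuzzy Low Level backup slide can be sketched as below: each rule weighs an input vector with Gaussian membership terms, the per-input weights are multiplied, and the final output is the weight-averaged rule output. The names `mu`, `sigma` and `y` for the rule parameters are assumptions (the original symbols did not survive extraction), and the two rules and their numbers are invented for illustration; a trained controller would learn them from examples.

```python
import math


def fuzzy_output(x, rules):
    """Weighted average of rule outputs, with product-of-Gaussians rule weights."""
    num = 0.0
    den = 0.0
    for rule in rules:
        w = 1.0
        # W_i = product over inputs j of exp(-((x_j - mu_ij) / sigma_ij)^2)
        for xj, mu, sigma in zip(x, rule["mu"], rule["sigma"]):
            w *= math.exp(-((xj - mu) / sigma) ** 2)
        num += w * rule["y"]
        den += w
    return num / den


# Two hypothetical rules over two inputs (say, temperature in C and activity factor).
rules = [
    {"mu": [60.0, 0.2], "sigma": [15.0, 0.2], "y": 5.5},  # cool and idle: high frequency
    {"mu": [90.0, 0.8], "sigma": [15.0, 0.2], "y": 4.5},  # hot and busy: low frequency
]

print(fuzzy_output([60.0, 0.2], rules))  # close to the first rule's output
print(fuzzy_output([75.0, 0.5], rules))  # midway between the two rules
```

Evaluating the controller costs only a handful of exponentials per rule, which is consistent with the deck's point that the fuzzy approach is fast compared with exhaustive search.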