Optimizing Stochastic Circuits for Accuracy-Energy Tradeoffs
Armin Alaghi³, Wei-Ting J. Chan¹, John P. Hayes³, Andrew B. Kahng¹,² and Jiajia Li¹
UC San Diego, ¹ECE and ²CSE Depts.; ³University of Michigan, EECS Dept.

Outline
• Background and Previous Work
• Problem Statement in SC Physical Design
• Modeling Approach
• Optimization Approach
• Conclusions

Motivation: Low Power Challenge
• Low-power design is a grand challenge
• Mobile devices must operate at extremely low power even as the performance requirements of applications grow
• Voltage scaling has slowed down in recent years
• Possible solution: employ new design paradigms to overcome these challenges and achieve the performance improvements
[Figure: Power [mW] (0 to 10,000) versus year (2011 to 2026). Trend series: memory static power, logic static power, memory dynamic power, logic dynamic power. Requirement series: dynamic plus static power. Annotations: 4 W mobile platform power requirement; 1 W SoC power requirement; slow performance improvement due to power limit and slow voltage scaling. Source: ITRS]

New Paradigm: Stochastic Computing (SC)
• Stochastic computing (SC) is a design paradigm that has recently gained attention due to its low power and error tolerance
• Random bit streams are used to represent operands
• Complex arithmetic operations are implemented by simple logic circuits
[Figure: stochastic multiplier computing Z = X1 × X2, with X1 = 4/8, X2 = 6/8 and Z = 3/8, since 3/8 = 4/8 × 6/8]

Error Tolerance, Precision, and Accuracy
Number to represent: 5/16
Stochastic: 0010 0001 0101 0010
Binary: 0.0101
• Bit-stream length grows exponentially with precision
• Redundant representation provides error tolerance
• Inaccurate computation may occur
[Figure: stochastic multiplication example; correct result = 3/8]

Area, Computation Efficiency, and Delay
SC: smaller area, longer computation latency, and shorter critical path
[Figure: stochastic multiplier compared with a conventional binary multiplier, with the critical path marked]

Application Context of SC
• Stochastic representation is similar to analog “pulse-mode” signals, as well as neural signals
• A stochastic computing circuit performs cheap pre-processing and saves resources
[Figure: low-cost preprocessing between two domains]

Summary of Advantages/Disadvantages
• Advantages
  • Low-complexity circuits (allows massive parallelism)
  • Error tolerance
  • Robustness to voltage scaling (explored and improved in this work)
• Disadvantages
  • Long computation time
  • Limited precision
  • Expensive conversion circuits and storage elements

Challenges, Problems, and Our Contributions
Challenges of stochastic computing (SC) design:
• The current digital design flow does not comprehend the tradeoff between accuracy and power in SC
• Physical implementation of SC circuits has not been well explored
Problems:
• What is an efficient way to estimate error when exhaustive simulation is not feasible?
• Given a synthesized SC circuit, what is the physical implementation recipe?
Our contributions:
• We introduce the delay matching problem in SC
• We reduce the computation error by balancing delay paths
• We propose a Markov chain model for error estimation

Stochastic Computing: Scope of Study
• Design metrics (metrics covered in this work)
  • Energy
  • Accuracy (a new model is proposed in this work)
  • Circuit area
• Design parameters
  • Computation latency (N)
  • Frequency scaling (f)
  • Voltage scaling (V)
  • Netlist implementation (a new optimization is proposed in this work)

Balance of Path Delay Matters
Three scenarios of signal transitions:
(A) Ideal: stable states of logic values are captured
(B) Balanced delay: all the transitions arrive at the same time
(C) Unbalanced delay: extra errors are caused by glitches or delayed transitions
[Figure: waveforms of inputs x0, x1 and output z against the sample clock. (A) Ideal: correct; (B) Balanced: correct; (C) Unbalanced: error]

Markov Chain for Error Prediction
• Markov chains (MC) have previously been used to model sequential SC circuits
• We augment the states with delay-induced transition errors
• Transition probabilities are trained on a small set of simulation results
• The stationary probability distribution is obtained by solving the Markov chain
• C1, D1, G1 decide the output expected values
• Used for error estimation
[Figure: state diagram. Labels: only correct states in the previous SC behavior model; errors induced by glitches and delayed transitions]

Result: Markov Chain for Error Prediction
• The model is accurate for larger errors
• The model is less accurate when the error is small
[Figure: precise prediction for high error magnitude]
Ongoing work: improve the accuracy for small errors

Outcome of Accuracy Model Study
Before our work:
• The SC behavior model is based on pre-layout simulation
• The SC behavior model did not consider the cell delay and wire delay contributed by physical implementation
Our work:
• Augment the SC behavior model by considering delayed transitions and glitches contributed by physical implementation
• Optimize the physical implementation by balancing the timing paths
[Figure: balanced delays. Outcomes shown: correct, correct, error]

Challenges of SC Physical Implementation
• The clock is fast to compensate for long computation latency
• Launch and capture flip-flops may be far apart in a huge array of SC circuits
• Unbalanced paths arise from circuit structures and variations (previous analysis shows delay balance matters)
• Timing is more critical when DVFS lowers the supply voltage
[Figure: an analog frontend circuit or random number generator drives SC sub-circuits (inputs x0, x1, output z) through Path 1 (long) and Path 2 (short), followed by a converter to the binary number system. Annotations: long physical distance in a huge array; faster clock to compensate for long latency]

Post-P&R Optimization for SC Circuits
Problem statement:
• Given an SC circuit and a range of supply voltages, we seek an implementation that minimizes error across the voltages
• Observation: transition errors increase at lower voltages due to path delay mismatch
• Approach: ILP-based retiming after P&R by a commercial tool
• Optimization constraints:
  • Number of buffers/wires inserted to compensate for shorter paths
  • Bounded delay variation across voltages
  • Buffer power penalty
• Objective: minimize path delay differences (improves accuracy)
• Side note: similar to multi-corner multi-mode (MCMM) CTS skew optimization:
  • Skew <-> path delay differences
  • MCMM <-> delays are evaluated at multiple supply voltages
  • Power penalty <-> number of buffers inserted

ILP Formulation for Buffer Insertion
Minimize $U$ ($U$: max normalized delay delta), where
$$U \ge \frac{D_{max}^{K}}{D_{max}^{k}} \cdot \left(D'^{\,k}_{max} - D'^{\,k}_{min}\right) \quad (1)$$
• $D_{max}^{K}$: max path delay at the highest voltage
• $D_{max}^{k}$: max path delay at $V_k$
• $D'^{\,k}_{max}$: max path delay at $V_k$ after optimization
• $D'^{\,k}_{min}$: min path delay at $V_k$ after optimization
Subject to
$$D'^{\,k}_{i} = D^{k}_{i} + \sum_{1 \le i \le M,\ 1 \le j \le Q} c_{rj} \cdot d^{k}_{i} \quad (2)$$
($D'_i$: optimized path delay; $D_i$: original delay)
$$\sum_{1 \le j \le Q} c_{rj} \le 1 \quad (3)$$
($c_{rj}$: binary variable denoting buffer insertion)
$$\alpha \cdot D_{max}^{k} \ge D'^{\,k}_{j} \quad (4)$$
($\alpha$: empirical parameter)
$$\beta \cdot G^{k} \ge \sum_{1 \le r \le R,\ 1 \le j \le Q} c_{rj} \cdot g^{k}_{j} \quad (5)$$
($\beta$: empirical parameter; $g_j$: buffer leakage power; $G$: circuit leakage power)
Notes:
(1) $U$ normalizes the delay mismatch across voltages, because the ranges of delays are different for each $V_k$
(2)(3) The inserted delay is decided by $c_{rj}$ (whether or not to insert a buffer on a net) and $d^{k}_{i}$ (cell delay at $V_k$)
(4) Excludes solutions with too many buffers inserted
(5) Limits the leakage power penalty

Heuristics for Buffer Choices
• Heuristic 1: various buffer/wire types to compensate for delay between voltages
  • We provide buffer candidates with different delay sensitivity to voltage scaling
  • We provide wire detour options to widen the voltage sensitivity range
• Heuristic 2: pruning buffers in the candidates to speed up MILP
  • Solutions are pruned within sub-regions of the tradeoff space by choosing the cells with the lowest leakage in each region
[Figure: candidate sets without pruning and with pruning; wire detouring]

Result: Improved Accuracy by Balancing Paths
[Figure: path delays and average errors for three flows: STRAUSS (UMich) + conventional P&R (ICC); ReSC (UMN) + conventional P&R (ICC); ReSC (UMN) + proposed P&R optimization. Annotations: less inter-path delay skew; lower error]

Result: Improved Input Delay Window
• Safe timing window: timing margin between clock edge and input delay
• Before optimization: a small input delay variation will cause errors
• After optimization: safe timing window = half of the clock cycle
[Figure: original and optimized delay distributions, each with its safe window marked. Clock period = 150 ps]

Result: Improved Energy Cost by Balancing Paths
Improved accuracy = less voltage margin needed = higher energy efficiency
• The conventional P&R flow (ICC) fails to meet the accuracy constraint when VDD is low
• Our proposed P&R optimization reduces delay mismatch at lower voltages and leads to lower energy cost for the same accuracy

MC Model: Improved Simulation Runtime
• The proposed Markov chain model is verified on four different SC application circuits
Circuit      #Cycles (Ex.)   #Cycles (MC)
GammaCorr    1024            10
PolySmall    256             10
Neuron       100             10
[Figure legend: green = new MC model; blue = exhaustive simulation. Annotation: fewer simulation cycles]

Result: Gamma Correction
• Testcase: gamma correction
• Both SC and conventional circuits are signed off at 1.0 V
• SC still generates a recognizable image at 0.6 V
• Energy saving of SC = 66%

Conclusions
• We identify the impact of delay-induced errors and propose a Markov chain-based model for error estimation
• We propose a new physical implementation approach that improves the energy-accuracy tradeoff
• The experimental results show significant energy and accuracy benefits over previous work
Future work
• Markov chain model improvement
• Comprehensive tradeoff recipe for performance, accuracy, and energy

Thank you!
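
Supplementary sketch: stochastic multiplication
The background slides represent operands as random bit streams and show Z = X1 × X2 with X1 = 4/8, X2 = 6/8 and Z = 3/8. A minimal Python sketch of that idea follows. It assumes unipolar encoding (the value is the fraction of 1s) and a bitwise AND as the multiplier, which is the usual textbook construction and is not spelled out in the slides. The function names and the specific 8-bit patterns are illustrative choices, not taken from the deck.

```python
import random

def to_stream(p, n, rng):
    """Encode probability p as n bits, each independently 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def stream_value(bits):
    """Unipolar decoding: the value is the fraction of 1s in the stream."""
    return sum(bits) / len(bits)

def sc_multiply(x1_bits, x2_bits):
    """Multiply two independent streams with a bitwise AND (assumed multiplier)."""
    return [a & b for a, b in zip(x1_bits, x2_bits)]

def estimate_product(p1, p2, n, seed=0):
    """Estimate p1 * p2 from two freshly generated n-bit random streams."""
    rng = random.Random(seed)
    return stream_value(sc_multiply(to_stream(p1, n, rng), to_stream(p2, n, rng)))

# Hand-picked 8-bit streams with four and six 1s (illustrative patterns).
X1 = [0, 1, 1, 0, 1, 0, 1, 0]  # 4/8
X2 = [1, 1, 1, 0, 1, 1, 0, 1]  # 6/8

# The 16-bit stream from the "Number to represent: 5/16" slide.
FIVE_SIXTEENTHS = [int(c) for c in "0010000101010010"]

if __name__ == "__main__":
    print("hand-picked streams:", stream_value(sc_multiply(X1, X2)))
    # Longer streams give estimates closer to 3/8, at the cost of latency.
    for n in (8, 64, 1024, 16384):
        est = estimate_product(4 / 8, 6 / 8, n)
        print(n, est, abs(est - 3 / 8))
```

With random streams the result is only an estimate, which is why the deck lists long computation time and limited precision as disadvantages: accuracy improves only as the stream gets longer.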
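
Supplementary sketch: stationary distribution of an error-augmented Markov chain
The modeling slides state that the stationary probability distribution is obtained by solving the Markov chain and is then used for error estimation. The sketch below shows one generic way to do that, by power iteration on a row-stochastic transition matrix. Everything specific in it is hypothetical: the three states (correct, delayed transition, glitch), the transition probabilities, and the choice to read the error rate as the stationary mass of the two error states. It is not the authors' model or training procedure.

```python
def stationary(P, iters=1000, tol=1e-12):
    """Stationary distribution of a row-stochastic matrix P via power iteration."""
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iters):
        # One step of pi <- pi * P
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi

# Hypothetical states: 0 = correct, 1 = delayed transition, 2 = glitch.
# In the deck these probabilities would come from a small set of simulations;
# the numbers here are made up for illustration.
P_EXAMPLE = [
    [0.90, 0.07, 0.03],
    [0.80, 0.15, 0.05],
    [0.85, 0.05, 0.10],
]

def error_rate(P, error_states=(1, 2)):
    """Assumed reading: long-run fraction of cycles spent in error states."""
    pi = stationary(P)
    return sum(pi[s] for s in error_states)

if __name__ == "__main__":
    print("stationary distribution:", stationary(P_EXAMPLE))
    print("estimated error rate:", error_rate(P_EXAMPLE))
```

Power iteration is used only to keep the sketch dependency-free; a linear solve for the same fixed point would serve equally well.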
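
Supplementary sketch: buffer selection that minimizes normalized delay spread
The ILP slide minimizes the largest normalized difference between the longest and shortest path delay across supply voltages, subject to a bound on optimized delay and a leakage budget. The sketch below mirrors that objective on a toy instance, using brute-force enumeration in place of an ILP solver. It simplifies the formulation: at most one buffer candidate is chosen per path rather than per net, leakage is a single number rather than per-voltage, and all delays, leakages, and parameter values are invented for illustration.

```python
from itertools import product

def normalized_spread(delays, ref_max):
    """Max over voltages k of (D_max^K / D_max^k) * (max delay - min delay at V_k).

    Assumes the last row corresponds to the highest voltage V_K.
    """
    d_high = ref_max[-1]
    return max((d_high / ref_max[k]) * (max(row) - min(row))
               for k, row in enumerate(delays))

def best_buffering(path_delay, buf_delay, buf_leak, alpha, beta, circuit_leak):
    """Enumerate buffer choices per path and return (U, assignment) with minimal U.

    path_delay[k][i]: delay of path i at voltage k
    buf_delay[k][j]:  delay of buffer candidate j at voltage k
    buf_leak[j]:      leakage of buffer candidate j
    """
    n_volt, n_path = len(path_delay), len(path_delay[0])
    ref_max = [max(row) for row in path_delay]  # original max delay per voltage
    choices = [None] + list(range(len(buf_leak)))  # None means no buffer
    best = None
    for assign in product(choices, repeat=n_path):
        new = [[path_delay[k][i] + (0 if assign[i] is None else buf_delay[k][assign[i]])
                for i in range(n_path)] for k in range(n_volt)]
        # Bound on optimized delay, in the spirit of constraint (4)
        if any(d > alpha * ref_max[k] for k in range(n_volt) for d in new[k]):
            continue
        # Leakage budget, in the spirit of constraint (5)
        if sum(buf_leak[j] for j in assign if j is not None) > beta * circuit_leak:
            continue
        u = normalized_spread(new, ref_max)
        if best is None or u < best[0]:
            best = (u, assign)
    return best

# Toy instance: two paths, two voltages (row 0 = low voltage, row 1 = high voltage).
PATH_DELAY = [[100, 60], [50, 30]]
BUF_DELAY = [[40, 20], [20, 10]]
BUF_LEAK = [1.0, 0.5]

if __name__ == "__main__":
    print(best_buffering(PATH_DELAY, BUF_DELAY, BUF_LEAK,
                         alpha=1.0, beta=0.2, circuit_leak=10.0))
```

Enumeration grows exponentially with the number of paths, so it is only practical for tiny examples; that scaling is the reason a formulation like the deck's relies on an ILP solver and candidate pruning.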