Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Proximity Optimization for Adaptive Circuit Design Ang Lu, Hao He, and Jiang Hu Overview • • • • Introduction Proposed Techniques Experiment Result Conclusion 2 Design Challenges Process Variations Device Aging Power 3 Adaptive Circuit • Apply power according to actual chip variations • More energy-efficient than the worst case desgin 4 Sensors and Tuning Knobs in Adaptive Circuit • Delay variation sensors – Critical path replica, used in IBM Power5 processor – Canary flip-flop, like in Razor • Tuning knobs – Adaptive body bias – Adaptive supply voltage (voltage interpolation) Liang, et al., Micro 2009 5 Pros and Cons of Adaptive Circuit Pros: More energy-efficient than the worst case designs Cons: Potentially large area overhead Higher design complexity 6 Clustering for Adaptive Circuits Two extreme cases: • Tune each cell individually? Too much overhead • Tune entire circuit collectively? Less energy savings Achieve desired trade-off? • Clustering 7 Overview • Introduction • Proposed Techniques – Overall Flow – Time and Location Aware Cell Clustering – Clustering Driven Incremental Placement • Experiment Result • Conclusion 8 Proposed Flow 9 Existing Clustering Methods • Manual partition for regular datapath Timing Location • Cluster cells based on spatial proximity • Cluster cells based on their timing characteristics – Kulcarni, et al., TCAD 2008 – Monte Carlo (MC) simulation – Optimize for each MC run Timing Location 10 Need to Consider Both Timing & Spatial Proximity Both paths A & B are critical Bad Cluster: A1 & B2 (Similar timing characteristic) Good Cluster: A2(A1) & B2 (B1) 11 Clustering Algorithm Clustering algorithm 𝐿𝑙𝑜𝑦𝑑′ 𝑠 𝐾 − 𝑚𝑒𝑎𝑛𝑠 𝑎𝑙𝑔𝑜𝑟𝑖𝑡ℎ𝑚 Distance definition Dist = α| si – sj | + β| 𝜓i - 𝜓j | + γdij Timing Location • Timing: • Si : Slack of gate i • 𝜓i : Slack sensitivity of gate i • Location: • dij : Physical distance between gate i & j 12 Timing Slack si = 𝑚𝑎𝑥(𝑚𝑖𝑛 𝑠i , 𝜃𝑚𝑎𝑥 , 𝜃𝑚𝑖𝑛) • Slack includes nominal value + standard deviation, or at process corners • Cut out threshold 𝜃 Slacks are treated as the same if they are too large or too small 13 Timing Sensitivity ψ= 𝐶𝑟𝑖𝑡𝑖𝑐𝑎𝑙_𝑃𝑎𝑡ℎ_𝑆𝑙𝑎𝑐𝑘_𝐷𝑒𝑐𝑟𝑒𝑎𝑠𝑒 𝑇𝑢𝑛𝑖𝑛𝑔_𝐶𝑜𝑠𝑡 • Critical path slack decrease due to tuning at cell 𝑐𝑖 • Tuning cost => power increase due to tuning Subsequent adaptivity optimization prefers low cost tuning 14 Correlations? • Highly correlated cells are clustered together • Spatial correlation is partially addressed by spatial proximity • Structural correlation – Not every cell on one critical path need to be tuned – Structurally correlated cells are rarely too far apart 15 Overview • Introduction • Proposed Techniques – Overall Flow – Time and Location Aware Cell Clustering – Clustering Driven Incremental Placement • Experiment Result • Conclusion 16 Cluster Driven Incremental Placement • • • • • Make cells of the same cluster to form a contiguous region Minimize total cell movement Assignment problem Simultaneously processing all clusters => multicommodity flow problem Process one cluster at a time 17 Min-Cost Network Flow Formulation Source nodes Sink nodes 18 Implementation Them min-cost network flow problem is solved by the Edmond-Karp algorithm Move cells heuristically for fractional flow solutions 19 Wirelength Overhead Control • After incremental placement, wirelength increase is estimated • If the increase > threshold, rerun clustering with increased weight on spatial proximity 20 Overview • • • • Introduction Proposed Techniques Experiment Result Conclusion 21 Experiment Setup • Benchmark: – ICCAD 2014 Incremental Timing-Driven Placement Contest benchmark suites – 7 circuits, (130K , 960K) cells • Adaptive body bias is employed as platform of adaptive circuit design • # cluster is empirically chosen in a range from 10 to 25 22 Comparison and Methodology Methods that are compared: 1. 2. 3. 4. Over design Location-driven clustering Timing-driven clustering Location and timing driven clustering (Ours) Methodology For each method, simulate multiple times with varying parameters and report the average results 23 Testcases and Placement Perturbations Circuit # gates Cells moved Avg. cell move distance edit_dist 130674 5.1% 7 matrix_mult 155341 5.7% 8 vga_lcd 164891 8.1% 27 b19 219268 8.3% 12 leon3mp 649191 3.9% 89 leon2 794286 8.3% 50 netcard 958792 5.5% 90 24 Results from only Forward Body Bias 1.6 1.4 1.2 1.0 0.8 0.6 0.4 0.2 0.0 Over design Power location-aware clustering ∆Area Timing-aware clustering Ours ∆Wire-length Our method achieves • 99% timing yield like other methods • 1/4 less power than over design • 1/3 less area overhead • substantial less wire overhead 25 Results from Forward and Reversed Body Bias 1.2 1.0 0.8 0.6 0.4 0.2 0.0 Over design location-aware clustering Power Timing-aware clustering ∆Area Ours ∆Wire-length Our method achieves • 99% timing yield • similar power • much less area overhead than location-only • much less wire overhead than timing-only 26 Power/Area – Timing Tradeoff Circuit “mgc_matrix_mult” Power/area – Timing tradeoff Δ Area Adapt Power 12000 14000 10000 12000 10000 8000 8000 6000 6000 4000 4000 2000 0 3850 2000 3900 3950 4000 4050 4100 0 4150 Timing constraint Adapt Power Δ Area 27 Impact of Weighting Factors Circuit “mgc_matrix_mult” α| si – sj | + β| 𝜓i - 𝜓j | + γdij α β γ # clusters Adapt Power ∆ Area ∆ wire 0 1 1 23 3707 4111 0.00% 5 1 1 24 3500 1851 0.61% 6.5 7.2 15 20 1 5 5 5 1 1 1 1 1 0 2 4 1 1 1 1 0 1 1 1 22 22 12 12 20 24 24 24 7764 9582 10376 6686 3771 3500 3377 4330 8538 10585 10903 8423 2533 1851 1653 3673 0.76% 0.82% 6.70% 8.50% 9.50% 0.61% 0.60% 0.55% 28 Entire Flow Runtime 6000 5000 4000 3000 2000 1000 0 Over Design Location Cluster Ours Runtime (s) 29 Conclusion Clustering and cluster-driven placement are proposed for adaptive circuit designs Reduce area and power overhead of adaptive circuit, outperform previous methods Assure gates of the same cluster locate in a contiguous region Wire-length increase <1%. 30 Built-In Self Optimization for Variation Resilience of Analog Filters Thank you!