PROCEED: Pareto Optimization-based Circuit-level Evaluation Methodology for Emerging Devices
Shaodi Wang, Andrew Pan, Chi-On Chui and Puneet Gupta
Department of Electrical Engineering, University of California, Los Angeles
NanoCAD Lab, ASP-DAC 2014

Need for Device Evaluation
• Traditional CMOS technologies are reaching physical limits
• Many alternative emerging devices are under investigation: TFET, CNT, heterogeneous CMOS, etc.
  – We need to be able to compare them quickly to guide technology development
• How should we compare emerging devices?
  – Comprehensive, systematic and automated comparison in the context of how they are going to be used
  – Account for various design types and circuit-level optimizations
  – Fast and flexible evaluation framework
  – Cover the wide performance range (kHz to GHz)

Prior Work
• Three classes of work
  – Device level [1,6]: Ion/Ioff, subthreshold slope (SS), CV/I, CV^2
  – Canonical circuit level: simple circuits + analytical model [4] based power-delay tradeoff
  – Full design flow: library generation, synthesis, placement and routing
• Existing evaluation benchmarks neglect how modern circuits really use devices. The following factors can dramatically change the conclusions:
  – Circuit topology dependence (e.g., logic depth)
  – Design-time power optimization (multi-Vth and multiple gate sizes)
  – Runtime adaptive power management (DVFS, gating)
[1] L. Wei, et al., IEDM, 2010
[4] D. J. Frank, et al., IBM J, 2006
[6] M. Luisier, et al., IEDM, 2011
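
As a rough illustration of the device-level metrics listed under Prior Work, the sketch below computes Ion/Ioff, subthreshold slope, CV/I and CV^2 from their textbook definitions. The function name, argument names and all numeric values are hypothetical and chosen only for illustration; they are not taken from the talk or from any measured device.

```python
import math


def device_metrics(i_on, i_off, c_load, vdd, vg_lo, vg_hi, id_lo, id_hi):
    """Return (Ion/Ioff, SS in mV/decade, CV/I in s, CV^2 in J).

    Currents in A, capacitance in F, voltages in V.
    (vg_lo, id_lo) and (vg_hi, id_hi) are two subthreshold I-V points.
    """
    on_off_ratio = i_on / i_off
    # Subthreshold slope: gate-voltage change per decade of drain current.
    decades = math.log10(id_hi) - math.log10(id_lo)
    ss_mv_per_dec = 1e3 * (vg_hi - vg_lo) / decades
    intrinsic_delay = c_load * vdd / i_on   # CV/I
    switching_energy = c_load * vdd ** 2    # CV^2
    return on_off_ratio, ss_mv_per_dec, intrinsic_delay, switching_energy


# Hypothetical values, for illustration only.
ratio, ss, delay, energy = device_metrics(
    i_on=1e-3, i_off=1e-9, c_load=1e-15, vdd=1.0,
    vg_lo=0.0, vg_hi=0.1, id_lo=1e-9, id_hi=1e-8,
)
print(ratio, ss, delay, energy)
```

Each of these numbers describes a single device in isolation, which is why they say nothing about logic depth, activity or power management, the factors the second bullet of the slide lists.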

Outline
• PROCEED Methodology
• Example Experimental Results

Proposed Framework: PROCEED
[Framework diagram; labels: Wire capacitance, resistance, and chip area; Logic depth histogram, fan-out; Interconnect model; Canonical circuit construction; Variation information; Ratio of throughput; Gate sizes; Device model, activity; Pareto optimizer; Power management; Delay and power Pareto curves; Throughput power Pareto curves; Vdd, Vth, gate sizes constraints]

Canonical Circuit Construction
• Utilize essential design information
  – Logic depth histogram, average number of transistors per gate, average fan-out, average interconnect load and chip area
  – Ignore detailed circuit design
    • Little impact on device performance evaluation
  – Construct logic paths in corresponding bins
• Single stage
  – Gate (NAND, XOR, etc.)
  – Buffer, interconnect, fan-out load
• Tuning parameters
  – Gate sizes, Vdd, Vth
[Figure (a): Logic depth histogram (logic depth vs. frequency), divided into Bin 1 (short paths) through Bin 5 (long paths); each bin i maps to simulation blocks Si]
[Figure (b): Simulation block Si built from i single stages between signal in and signal out; each single stage contains gate, inverter, interconnect and fan-out]

Canonical Circuit Construction
• Logic depth histogram is estimated using slack histogram
• Interconnect
  – Interconnect length is proportional to the square root of chip area [1]
  – Chip area is assumed to be linear in cell area
  – Cell area is modeled as a function of transistor width from DRE [2]
[Figure: Model for interconnect length; relative interconnect load (a.u.) vs. transistor width (µm)]
[1] J. A. Davis, et al., IEEE TED, 1998.
[2] R. S. Ghaida and P. Gupta, IEEE Trans. CAD, 2012.

Pareto Optimization: Overview
• Objective functions are weighted sums of delay and power
  – Non-convex problem
  – Gradient descent used
• Delay and power are approximated with second-order functions in a trust region
• Trust region shrinks during optimization
• Logarithmic barrier is incorporated to confine parameter range
Optimization loop:
1. Simulate starting point
2. Build models (2a) and trust region (2b)
3. Weighted sum (3a) and gradient descent (3b)
4. Simulate to evaluate new points and update Pareto points
2(a) Building models:
\[ P_i = P(V_{DD1}, \ldots, V_{DDk}, V_{T1}, \ldots, V_{Tl}, W_1, \ldots, W_p) \]
\[ D_i = D(V_{DD1}, \ldots, V_{DDk}, V_{T1}, \ldots, V_{Tl}, W_1, \ldots, W_p) \]
3(a) Adaptive weight is determined by the slope of the existing Pareto front
[Figure: power vs. delay plane showing existing points, start point, D-P slope, local models, trust region, gradient descent step and new point]

Pareto Optimization: Modeling
• Build simulation-block-level delay and power models from circuit simulation results
\[ D_{Si}(y_{i,0} + \Delta y_i) = D_{Si,0} + G_{Di}^{T} \Delta y_i + \frac{1}{2} \Delta y_i^{T} H_{Di} \Delta y_i \]
\[ P_{Si}(y_{i,0} + \Delta y_i) = P_{Si,0} + G_{Pi}^{T} \Delta y_i + \frac{1}{2} \Delta y_i^{T} H_{Pi} \Delta y_i \]
• Objective delay is the longest delay of all logic paths (constructed from simulation blocks)
  – A high-order norm is used to estimate the max-delay function, which makes the objective function continuous for gradient calculation
\[ D(X) = W_D \cdot \max\left( D_{S1}(y_1), D_{S2}(y_2), \ldots, D_{Sn}(y_n) \right) \approx W_D \cdot \left\| \left( D_{S1}(y_1), D_{S2}(y_2), \ldots, D_{Sn}(y_n) \right) \right\|_K \]
• Objective power is the weighted total power
\[ P = \sum_{i=1}^{n} W_i P_{Si} \]

Power Management: Overview
• Modern circuits allow devices to operate in three modes: normal, power saving and sleep.
  – A second, lower Vdd is applied to devices in power saving mode
  – Device is turned off in sleep mode
• Pick the best second Vdd and optimally divide time spent in each mode
  – Needs the ratio of average throughput to peak throughput as input
  – Use polynomial models for delay and power of a simulation block as a function of Vdd
• Minimize average power consumption: f1P1 + f2P2
[Figure: Model fit for delay and power; delay (ps) and power (W) vs. VDD (V); symbols: power and delay data from simulations; lines: fitting models]

Validation of PROCEED
[Figure: 45nm SOI CortexM0; power (W) vs. clock period (ps) at 20% and 100% activity; curves: PROCEED, RTL Compiler, Model [18], PROCEED w/o LDH]
• Model [3] is frequently used for device evaluation
  – Ignoring the logic depth histogram and using analytical delay and power models is inaccurate
• Proposed methodology is 21X (average) more accurate
  – Efficient to evaluate devices with performance range from MHz to GHz
[3] D. J. Frank, et al., IBM J, 2006

Outline
• PROCEED Methodology
• Example Experimental Results

Impact of Activity
[Figure: Impact of activity on 45nm ARM CortexM0; power (W) vs. clock period (ns); curves: TFET and SOI at activity = 100% and activity = 1%; annotations: 97.5X, 9.4X, 96ns]
• Low-activity circuits favor low-leakage devices
  – TFET is better than SOI MOSFET for low-activity, low-performance circuits

Impact of Circuit Topology
[Figure: Benchmark impact on 45nm; logic depth histograms (logic depth vs. frequency) for MIPS and CortexM0; power (W) vs. clock period (ns) for SOI ARM CortexM0, SOI MIPS, TFET ARM CortexM0 and TFET MIPS at activity = 1% with 2 Vdd and 2 Vt; annotations: 106ns (CortexM0), 73ns (MIPS)]
• CortexM0 is more evenly distributed in LDH
• Power consumption in MIPS is dominated by short logic paths
  – More accommodating to low-power devices (TFETs)

Impact of Power Management
[Figure: 45nm ARM CortexM0 with power management; power (W) vs. peak throughput (M ops/s) at activity = 1%; TFET and SOI curves for Ave. TP/Peak TP = 100%, 50%, 20%, 10%; annotations: 10.9M, 12.6M, 16.1M, 21.5M]
• Peak throughput cross-points shift higher as the ratio of average to peak throughput decreases
  – This indicates TFET may be a better device for applications with wide dynamic range in performance needs

Impact of Variation
[Figure: Impact of variation on 45nm ARM CortexM0; power (W) vs. clock period (ns); curves: TFET and SOI, with and without variation; annotations: 130ns, 96ns]
• Variation evaluation indicates that TFET suffers more from effective voltage drop
  – TFET and SOI are assumed to have peak 10% Vdd and 50mV Vth worst-case variation
  – High subthreshold slope (SS) devices are more sensitive to voltage drop

Conclusion
• The proposed methodology for evaluating emerging devices accounts for circuit topology, adaptivity, variability, and use context
• The proposed methodology is efficient and accurate for device evaluation over a broad operating range
  – Effective Pareto-based optimization heuristic
  – Accurate circuit simulation and device compact model
• Example comparison of TFET and SOI devices
  – TFET is better for low-activity, low-logic-depth and high-dynamic-performance designs
  – TFET is more sensitive to voltage drop and threshold voltage shifting
• Entire PROCEED source code (MATLAB, C++) will be made available openly.

Q&A
Thanks!
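
The modeling step described under "Pareto Optimization: Modeling" can be sketched in a few lines. This is a minimal illustration and not the PROCEED source code: the function names, the default norm order `k=20`, and all numeric values are assumptions made here. It shows a second-order local model, a high-order norm standing in for the max-delay function, and a weighted-sum objective.

```python
def quadratic_model(f0, grad, hess, dy):
    """Second-order local model: f0 + g^T dy + 0.5 * dy^T H dy."""
    n = len(dy)
    linear = sum(grad[i] * dy[i] for i in range(n))
    quad = sum(dy[i] * hess[i][j] * dy[j] for i in range(n) for j in range(n))
    return f0 + linear + 0.5 * quad


def smooth_max(values, k=20):
    """K-norm of positive values: a differentiable stand-in for max(values).

    The result lies between max(values) and len(values)**(1/k) * max(values).
    """
    peak = max(values)  # factor out the peak so a large k does not overflow
    return peak * sum((v / peak) ** k for v in values) ** (1.0 / k)


def objective(block_delays, block_powers, power_weights, w_delay, w_power, k=20):
    """Weighted sum of smoothed longest-path delay and weighted total power."""
    delay = smooth_max(block_delays, k)
    power = sum(w * p for w, p in zip(power_weights, block_powers))
    return w_delay * delay + w_power * power


# Hypothetical block delays (ps), for illustration only.
delays = [40.0, 55.0, 60.0]
approx = smooth_max(delays)
print(max(delays), approx)
```

A larger `k` tracks the true maximum more closely but makes the gradient concentrate on the slowest block, so the norm order is a trade-off between accuracy and smoothness.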