Circuits and Architectures to Deliver Low Power and High Speed Systems
By: Jabulani Nyathi
Washington State University, School of EECS
April 30, 2009

Outline
- CMOS Scaling
  - Its benefits and
  - The challenges it brings about
- Various Techniques for Limiting Leakage Currents
  - Their shortfalls
- Bridging the Speed-Power Gap
  - The Tunable Body Biasing Scheme
- Emerging Devices and Technologies
- Concluding Remarks

CMOS Scaling and its Benefits
- Aggressive CMOS scaling has been a very positive development, allowing:
  - Fast switching devices, thus high speed computing.
  - Massive integration due to miniaturization.
    - No longer do we need multiple chips to implement a microprocessor and its peripherals.
    - In fact, we can now have multiple computing elements on a single die, resulting in a system on a chip.

CMOS Scaling and its Challenges
- CMOS scaling results in:
  - Increased leakage currents (5X/node) and
  - Increased dynamic power dissipation.
- The interconnect does not scale as fast as the transistor, thus:
  - Highly integrated designs require elaborate clock distribution schemes.
  - IPs within a System on a Chip would be difficult to synchronize with a single clock source.

Scaling Implications
[Figure: Module1 and Module2 with scaled local interconnects, joined by global interconnects]

Dynamic Vs Leakage Power

Research Motivation
- Desire to bridge the Speed-Power Gap by exploring the feasibility of optimizing devices to operate effectively at both sub-threshold and above-threshold voltages.
- Emerging technologies that are ultra-low power can benefit from increased speed.
  - Wearable computers, sensor networks, implantable medical technology
- Emphasis on design for energy-efficiency

Existing Low Power Design Approaches
- Solve the energy dissipation problem from a region-of-operation standpoint
- Sub-threshold design
  - DTMOS: shows a 5.5 times increase in current
  - SBB: 4.4 times frequency increase
- Above threshold (super-threshold) design
  - Dynamic threshold provides energy efficiency
  - MTCMOS: high and low threshold devices
  - VT Scheme: reduce power by 50% using ABB and “sleep”/“active” modes
- Architectural
  - Gating Techniques: 45% of total power

DTMOS/SBB Output Voltage Clamping
[Figure: output voltage for SBB, DTMOS, TBB, and Traditional; levels marked at 1.8 V and 600 mV]

Proposed Approach
- Change approach to include all possible operating regions: Tunable Body Biasing (TBB)
  - Sub-threshold and super-threshold operation bridged
  - Ultra-low energy and low speed, or high energy and high speed
- Utilize body biasing to improve performance of sub-threshold operation
  - Target increased performance at sub-threshold and slightly above threshold.
  - Save energy by eliminating idle time and processing continuously with variable power supplies (perform just-in-time task completion)

Target applications
- Mobile, battery operated (power constrained), variable processing devices
  - Cell phones, PDAs, notebooks, wireless sensors, embedded systems, ASICs, medical technology, etc.
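The just-in-time idea in the Proposed Approach slide can be sketched as a small selection routine: among the available supply settings, pick the one with the lowest energy per cycle that still finishes the task before its deadline. This is only an illustration, not the author's method. The three operating points reuse the TBB LFSR delay and energy figures reported in the deck's Regions of Operation table; the cycle count, the deadlines, and the names `OperatingPoint` and `pick_just_in_time` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class OperatingPoint:
    """One supply setting of a tunable design (hypothetical helper type)."""
    vdd_v: float                # supply voltage in volts
    clock_period_ns: float      # minimum clock period at this supply
    energy_per_cycle_fj: float  # energy dissipated per clock cycle


# TBB LFSR delay/energy pairs as listed in the Regions of Operation table.
TBB_LFSR_POINTS = (
    OperatingPoint(0.25, 4500.0, 22.8),
    OperatingPoint(0.75, 4.5, 73.6),
    OperatingPoint(1.8, 0.6, 437.0),
)


def pick_just_in_time(
    points: Sequence[OperatingPoint], cycles: int, deadline_ns: float
) -> Optional[OperatingPoint]:
    """Return the lowest-energy point that completes `cycles` within the deadline.

    Returns None when no point is fast enough.
    """
    feasible = [p for p in points if p.clock_period_ns * cycles <= deadline_ns]
    if not feasible:
        return None
    return min(feasible, key=lambda p: p.energy_per_cycle_fj)


if __name__ == "__main__":
    # Assumed workload: 1000 cycles with three illustrative deadlines.
    for deadline_ns in (1_000.0, 10_000.0, 5_000_000.0):
        choice = pick_just_in_time(TBB_LFSR_POINTS, cycles=1000, deadline_ns=deadline_ns)
        print(deadline_ns, choice)
```

A relaxed deadline lets the routine fall back to the sub-threshold point, while a tight one forces the super-threshold supply, which mirrors the "ultra-low energy and low speed, or high energy and high speed" choice described in the slide.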
TBB Implementation
- Goals
  - Attain ON state current gain while minimizing OFF state leakage current increase
  - Highlight advantages of sub-threshold operation while allowing super-threshold operation if needed
  - Control bulk terminal to tunable potentials depending on VDD and desired region of operation
- MOS Bulk Control Circuits
  - Multiplexer-based approach
  - Two transistors per bulk control circuit
  - Utilizes Vthn0

TBB Bulk Control Circuits

TBB MOS bulk control signals:

VDD                  pMOS Bulk      nMOS Bulk
VSS < VDD ≤ Vthn0    VSS            VDD
VDD > Vthn0          VDD - Vthn0    Vthn0

- Relies on the good/poor logic "1" and logic "0" passing properties of pass-transistors
- Requires external control signals
  - SubVt and SubVt_b

TBB Bulk Control Circuit Simulation
- Super-threshold: pBulk = VDD - Vthn0
- Sub-threshold: pBulk = 0 V

Device Optimization
- TBB encourages varying supply voltages
- How will devices be sized for optimal operation at any supply voltage?
  - Maintain symmetric switching
  - Examine inverter at varying supply voltages

Device Optimization (Switching Point)

VDD        Ideal Inverter Threshold   Simulated Inverter Threshold   Percent Variation
1.8 V      900 mV                     900 mV                         0.0%
1.0 V      500 mV                     498 mV                         0.4%
376.2 mV   188.1 mV                   198.7 mV                       5.6%
188.1 mV   94.05 mV                   108.6 mV                       13.4%

Sub-threshold Noise Margins
- Noise margins are significant for proper logic levels
- TBB and Traditional static CMOS inverters have comparable noise margins
  - TBB VIH is 12.5% worse
  - TBB VIL is 14.3% better

Propagation Delay
Static CMOS at Vdd = Vthn0 with varying body biasing
[Figure: average switching delay (ns) of the transmission gate, inverter, two-input NAND, two-input NOR, and two-input XOR under Traditional, SBB, TBB, and DTMOS body biasing]

Gate   Traditional Delay   TBB Delay   % Decrease
TG     98 ns               14 ns       86
Inv    125 ns              20 ns       84
NAND   133 ns              18 ns       86
NOR    163 ns              25 ns       85
XOR    289 ns              40 ns       89

Review of SubVth Circuits Benefits
- So far, the presentation has shown:
  - TBB requires control of MOS bulks to span the operating regions of interest. Implementation is successful.
  - Study of simple logic gates showed:
    - TBB gives a dramatic speed increase (up to 7x)
    - Static CMOS design style is suitable for sub-threshold and super-threshold operation
    - Sizing of efficient devices for the TBB approach is possible
- However, how will a complex system perform?
  - Design with previous knowledge (logic style, sizing)
  - Analyze post-layout simulations

Complex System-on-Chip Design Using TBB
Work addresses the challenges of
- Global interconnect delays
- Clock distribution
- Synchronization of unrelated clocks and
- Power dissipation

Conclusion
- TBB scheme has been devised to span all regions of operation from ultra-low power to high-speed.
- New kind of body biasing
  - Forward-biasing causes exponential sub-threshold current gain
  - Focus on sub-threshold and slightly above threshold to utilize leakage
- Bulk control circuits are effective
  - Leads to 7 times frequency increase in simple logic gates
  - 4% area and 8.9% power dissipation increase
- Static CMOS is the ideal overall design style
- Device sizing at either sub-threshold or super-threshold allows efficient operation with variable supply voltages

Concluding Remarks
- Tunable operation lets the designer choose the operating point (kHz, MHz, GHz); energy dissipation is affected.
  - Other schemes do not offer this flexibility
- TBB can lead to significant energy savings
- LFSR results show TBB gives:
  - Maximal 5.7 times speed increase (sub-threshold)
  - Comparable energy at super-threshold and favorable at sub-threshold
  - Favorable EDP at all operating regions
  - Operate at the same speed with less energy dissipation
- Idle state leakage current can be minimized by collapsing the supply voltage

Integrating Research Into Instruction
- Data Path Circuits
- Memory Design
- Sub-System
- ROUTER CHIP

Incorporating Research into Instruction
- A long term objective is to place some of the integrated chips on development boards such as those Digilent Inc. produces.
- The integrated chips become part of a system and can be used in some of our low level courses.
- Most important is the use of these programmable boards to showcase the research outcomes, particularly to visiting prospective students.
- A sample development board:

Questions and Comments Welcome!

Multiple Clock Domain Synchronization
[Figure: computational modules (synchronous islands) connected through a MicroNetwork with isochronous communication]

$$\frac{f_{fast}}{f_{slow}} = n, \qquad \begin{cases} n = 1 & \text{equal clocks} \\ n \in \mathbb{Z} & \text{rational clocks} \\ n \in \mathbb{Q} & \text{arbitrary clocks} \end{cases}$$

Reducing Interconnect Delays
- Improved latency and bandwidth
- Global interconnects are pipelined at or near the rate of computation

Sources of Power Consumption

$P_{total} = P_{static} + P_{dynamic} + P_{short\ circuit}$
$P_{static} = P_{leakage} + P_{DC}$
$P_{dynamic} = V_{dd} \cdot V_{swing} \cdot f_{clk} \cdot C_{load}$
$P_{short\ circuit} = V_{swing} \cdot I_{avg\ short\ circuit}$

- The most straightforward method to reduce power consumption from any source is to reduce VDD
- Controlling frequency directly manipulates dynamic power
- Controlling device threshold manipulates leakage current, affecting leakage and short circuit power.

Distributed FIFO Control Circuitry

Traditional vs. Tunable Body Biasing

          Traditional Body Biasing (LocalClock2)     Tunable Body Biasing (LocalClock2)       Tunable BB % diff
Vdd (V)   delay (ps)   freq (GHz)   current (uA)     delay (ps)   freq (GHz)   current (uA)   freq    current
1         111.2        9            3100             103.1        9.7          2988           7.8     -3.6
0.7       172.55       5.8          1240             177.7        5.6          1042           -3.4    -16
0.35      1354.5       0.7383       71               1438         0.6954       72.9           -5.8    -2.7
0.2       96700        0.0103       2.81             16640        0.0601       5.051          483     79.8

The synchronizer/buffer shows an increase in performance at sub-threshold voltages when using tunable body biasing.

Tunable Body Biasing

Traditional Body Biasing
Vdd (V)   Max Freq (GHz)   Peak Current (uA)   Avg Current (uA)   Idle Current (uA)   Peak Power (uW)   Avg Power (uW)   Idle Power (uW)
1         4                5597                2382               8.696               5597              2382             8.696
0.7       2                2222                803.4              4.873               1555.4            562.38           3.411
0.35      0.125            131.1               35.58              1.468               45.885            12.453           0.514
0.2       0.01             7.452               2.895              1.349               1.49              0.579            0.27

Tunable Body Biasing
Vdd (V)   Max Freq (GHz)   Peak Current (uA)   Avg Current (uA)   Idle Current (uA)   Peak Power (uW)   Avg Power (uW)   Idle Power (uW)
1         4                5140                2460               9.54                5140              2460             9.54
0.7       2                2050                833                4.423               1435              583.1            3.096
0.35      0.167            132                 39.8               1.589               46.2              13.93            0.556
0.2       0.015            9.468               4.03               1.239               1.894             0.806            0.248

Pursuit of Low Power Operation
- It is likely that not all IP blocks in a SoC need to operate at high speed
- Power dissipation for those IP blocks could be reduced by operating at a lower voltage
- TBB offers the possibility to dynamically operate at either sub-threshold or super-threshold voltages

Variable Voltage SoC
[Figure: computational modules (synchronous islands) supplied by Vdd1 through Vdd5, connected through a MicroNetwork with isochronous communication]
- Consider a SoC with 50 IP blocks, each requiring communication at a rate of 10 MHz
  - Each IP could operate at sub-threshold levels
  - The channel could operate at super-threshold voltages while the IP blocks are in sub-threshold

Idle vs Operating Power

Vdd (V)   Idle Current (uA)   Idle Power (uW)   Operating Current (uA)   Operating Power (uW)
1         16.9                16.9              2988                     2988
0.7       5.3                 3.71              1042                     729.4
0.35      1.5                 0.525             72.9                     25.52
0.2       0.925               0.185             5.051                    1.01

During idle periods, it is advantageous to reduce leakage current by
- Reducing the power supply voltage or
- Increasing the threshold voltage (e.g. bulk voltage manipulation)

Speed at Varying VDD
[Figure: Delay Comparison of a TBB and Traditional LFSR; minimum clock period (ns) vs. supply voltage (V) for TBB and Traditional]
- TBB is 5.7x faster at 376.2 mV
- TBB is 20% faster at 1.8 V

Energy-delay Product
[Figure: Energy Delay Product for TBB with Control 8-Bit LFSR; energy-delay product (ns*fJ) vs. supply voltage (V) for TBB and Traditional]
- EDP of TBB outperforms Traditional at ALL operating regions, significantly in super-threshold

Regions of Operation
[Figure: Delay vs. Energy Dissipation Tradeoff for TBB LFSR; clock period (ns) and energy dissipation (fJ) vs. supply voltage (V)]
- 1.1 GHz with 3.85 nJ/cycle
- 222.2 MHz with 103 fJ/cycle
- 3.9 MHz with 0.6 fJ/cycle

Contributions of this work
- Proposed scheme alleviates the communication bottleneck and offers a way to synchronize multiple SoC clocks
  - Perform data transfers up to 10 GHz
- Proposed scheme maintains high performance under the influence of any clock skew
  - 6.5 GHz for any process corner and any skew
- Low power FIFO scheme with a small impact on area when used in SoCs with many modules

Contributions of this work
- Process corners have a minor impact on performance, resulting in a 10% reduction of speed
- The optimal voltage for minimum energy consumption per transaction is at 2Vth
- Introduction of TBB to address leakage and dynamic power dissipation
  - 500% increase in performance at sub-threshold voltages with a modest 80% increase in power
  - 5-10% less power dissipation than traditional body biasing

Summary of Proposed FIFO Scheme
- Linear FIFO scheme that addresses
  - Signal propagation across communication channel
  - Successful synchronization
- Synchronizes equal, rational & arbitrary clocks
- 6.5 GHz sustained performance after process corner analysis using 3 stages.
- Compared to CN scheme
  - Sustained throughput over long distances
  - Fewer devices per stage, fewer stages needed
  - 25% higher performance, 12% lower power
- Operates at both super- and sub-threshold voltages
  - Lower instantaneous power demands from local clocks (less di/dt)
  - Optimal energy per transaction at 0.7 V in a 65 nm process
  - Sub-threshold reduces power by 3 orders of magnitude
  - Tunable Body Biasing provides 50% increased performance in sub-threshold while maintaining super-threshold operation

TBB Scalability

                                     180 nm                                      90 nm
Body Biasing and Operating Region    Total Avg Power   Static Contribution [%]   Total Avg Power   Static Contribution [%]
Traditional in Sub-threshold         193 pW            0.1%                      13.1 nW           1.8%
Traditional in Super-threshold       39.6 μW           Negligible                22.1 μW           Negligible
TBB in Sub-threshold                 1430 pW           25.2%                     20.4 nW           6.1%
TBB in Super-threshold               39.4 μW           0.000034%                 22.1 μW           0.0025%

- At 180 nm, TBB sub-threshold static power % is large
- At 90 nm, the % difference is much less
- At 180 nm, total TBB sub-threshold power is large
- At 90 nm, total TBB sub-threshold power isn't so large

LFSR Energy vs. Frequency
[Figure: TBB and Traditional LFSR Energy Dissipation vs Frequency; energy dissipation [fJ] vs. frequency [MHz] for Traditional and TBB]

TBB Implementation Cont.

TBB Implementation Cont.
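The callouts on the TBB Scalability table can be checked with a short calculation. The sketch below uses only the sub-threshold rows of that table; the dictionary layout and the function names `static_power_w` and `tbb_total_power_ratio` are hypothetical, introduced here for illustration.

```python
# Sub-threshold rows of the TBB Scalability table:
# (total average power in watts, static power contribution in percent).
SUBTHRESHOLD = {
    "180 nm": {"traditional": (193e-12, 0.1), "tbb": (1430e-12, 25.2)},
    "90 nm": {"traditional": (13.1e-9, 1.8), "tbb": (20.4e-9, 6.1)},
}


def static_power_w(total_w: float, static_pct: float) -> float:
    """Absolute static power implied by a total and a percentage share."""
    return total_w * static_pct / 100.0


def tbb_total_power_ratio(node: str) -> float:
    """Total TBB sub-threshold power divided by the traditional total."""
    trad_total, _ = SUBTHRESHOLD[node]["traditional"]
    tbb_total, _ = SUBTHRESHOLD[node]["tbb"]
    return tbb_total / trad_total


if __name__ == "__main__":
    for node, rows in SUBTHRESHOLD.items():
        tbb_static = static_power_w(*rows["tbb"])
        print(node, "TBB/traditional total power:", round(tbb_total_power_ratio(node), 2))
        print(node, "TBB static power (W):", tbb_static)
```

Under these figures the TBB sub-threshold total is several times the traditional total at 180 nm but well under twice the traditional total at 90 nm, which is consistent with the table's remark that the difference is much less at the smaller node.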
Logic Gate Analysis (Power)
[Figure: Power Dissipation vs Supply Voltage; power dissipation [nW] vs. supply voltage for Traditional CMOS and TBB CMOS]

Inverter Power Dissipation

VDD      Power Dissipation [fW]   Average Power [nW]   Maximum Frequency [MHz]   Period [ns]
0.3262   8.27                     3.5                  0.416                     2400.0
0.4262   11.41                    30.0                 2.6                       380.0
0.5643   15.64                    651.6                41.7                      24.0
1.8      82.30                    68.60                833.3                     1.2

VDD      Power Dissipation [fW]   Average Power [nW]   Maximum Frequency [MHz]   Period [ns]
0.3262   8.52                     22.4                 2.6                       380.0
0.4262   13.00                    259.8                20.                       50.0
0.5643   15.13                    2102.0               138.9                     7.2
1.8      81.47                    81.5                 1000.                     1.0

Logic Gate Analysis (Energy)
[Figure: Energy Dissipation vs Supply Voltage; energy dissipation [fJ] vs. supply voltage [V] for Traditional CMOS and TBB CMOS]

Logic Gate Analysis (EDP)
[Figure: EDP vs Power Supply; EDP [fJ*ns] vs. supply voltage [V] for Traditional CMOS and TBB CMOS]

Logic Gate Analysis (Fan-in)
[Figure: propagation delay [ns] vs. number of inputs (one to four) for Traditional NAND, TBB NAND, Traditional NOR, and TBB NOR]

Logic Gate Analysis (Logic Styles)
[Figure: Energy Dissipation vs Supply Voltage; energy dissipated [fJ] for Traditional and TBB Pseudo-nMOS at supply voltages from 0.5*Vthn to 1.5*Vthn]

Power Comparison of a TBB and Traditional LFSR
[Figure: LFSR Power Dissipation; average power dissipation (uW) vs. supply voltage (V) for TBB and Traditional]

Device Optimization (Optimal Region)
[Figure: Delay vs. Energy Dissipation Tradeoff for TBB LFSR; clock period (ns) and energy dissipation (fJ) vs. supply voltage (V)]

Regions of Operation

                   Super-threshold (1.8 V)     Sub-threshold (250 mV)      Optimal (750 mV)
Design             Delay (ns)   Energy (fJ)    Delay (ns)   Energy (fJ)    Delay (ns)   Energy (fJ)
Traditional LFSR   0.7          437.6          20000        105            7            74.1
TBB LFSR           0.6          437            4500         22.8           4.5          73.6
Operating range    GHz                         kHz                         MHz

Logic Gate Results
- Results Highlights
  - TBB, SBB, and DTMOS increase speed up to 7 times in sub-threshold
  - Static CMOS has best overall logic style performance
  - Pseudo-nMOS, Domino, and pass-transistor still are valuable in niche situations
  - TBB and Traditional Noise Margins are comparable
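As a rough cross-check of the Regions of Operation table, the speedup and energy-delay product (EDP) of each design can be computed directly from the listed delay and energy values. This is a minimal sketch that takes EDP as delay times energy; the dictionary layout and the function names are hypothetical.

```python
# (delay in ns, energy in fJ) per design and region, from the Regions of Operation table.
LFSR = {
    "super-threshold (1.8 V)": {"traditional": (0.7, 437.6), "tbb": (0.6, 437.0)},
    "sub-threshold (250 mV)": {"traditional": (20000.0, 105.0), "tbb": (4500.0, 22.8)},
    "optimal (750 mV)": {"traditional": (7.0, 74.1), "tbb": (4.5, 73.6)},
}


def edp(delay_ns: float, energy_fj: float) -> float:
    """Energy-delay product in fJ*ns."""
    return delay_ns * energy_fj


def speedup(region: str) -> float:
    """Traditional delay divided by TBB delay for one operating region."""
    return LFSR[region]["traditional"][0] / LFSR[region]["tbb"][0]


def edp_ratio(region: str) -> float:
    """Traditional EDP divided by TBB EDP; values above 1 favor TBB."""
    return edp(*LFSR[region]["traditional"]) / edp(*LFSR[region]["tbb"])


if __name__ == "__main__":
    for region in LFSR:
        print(region, "speedup:", round(speedup(region), 2),
              "EDP ratio:", round(edp_ratio(region), 2))
```

With these table values the EDP ratio exceeds 1 in all three regions, in line with the deck's statement that TBB has a favorable EDP at all operating regions. Note that the table's sub-threshold column is at 250 mV, whereas the 5.7x speed figure elsewhere in the deck is quoted at 376.2 mV, so the two speedups are not expected to match.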