Towards an Efficient Low-Frequency Energy Recovery Dynamic Logic
Sujay Phadke
Advanced Computer Architecture Lab, Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor
Advisor: Prof. Marios Papaefthymiou
September 28th, 2005

Outline
Power dissipation in conventional CMOS; standard approaches to reducing power dissipation; introduction to energy recovery circuits; background: Boost Logic; description of the three new circuits designed; comparison of the different circuits (operation, reported simulation results, and pros and cons from an energy standpoint: energy dissipation and power supply variation); conclusion and future work.

Power dissipation in conventional CMOS designs
Streaming applications contain a small amount of logic but a large number of buffers. Long wires present a large capacitance C, and driving this capacitance wastes energy. Throughput-limited datapaths have a strict throughput requirement but can tolerate longer latencies (DSP applications, e.g. the ATMEL76C120 at 78 MHz). The switching power is

$$P = C_{eff} V_{dd}^2 f$$

Conventional approaches to reducing power: voltage scaling and pipelining
Voltage scaling can result in significant energy gains through lower dissipation and lower leakage. Its limitations: Vth scaling is limited by manufacturing processes, pipelining adds flip-flop overhead, and the increasing delay limits scalability.
[Figure: delay (ns) vs. supply voltage (0.5 V to 1.3 V) for unpipelined, 2-stage, and 6-stage pipelines; scaling is limited by threshold voltages.]

Reduced-voltage drivers and voltage converters
[Figure: a high-swing (vdd) driver and a low-swing (vddL) driver connected to out through a voltage converter.]
The low swing is limited by VTH, and level conversion adds delay. Efficient operation requires energy-efficient level conversion and no throughput impact from the level-conversion delay. Point of diminishing returns!
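As a rough illustration of the switching-power relation $P = C_{eff} V_{dd}^2 f$, the sketch below computes power and the quadratic saving from voltage scaling at a fixed clock rate. The 10 pF effective capacitance and the function names are hypothetical, chosen only for illustration; the 78 MHz rate echoes the ATMEL76C120 example, and leakage and level-conversion costs are ignored.

```python
def dynamic_power(c_eff: float, vdd: float, freq: float) -> float:
    """Switching power P = C_eff * Vdd^2 * f in watts (C_eff already folds in activity)."""
    return c_eff * vdd ** 2 * freq


def scaling_saving(v_nominal: float, v_scaled: float) -> float:
    """Fractional power saving from lowering Vdd while keeping the same frequency."""
    return 1.0 - (v_scaled / v_nominal) ** 2


if __name__ == "__main__":
    C_EFF = 10e-12   # hypothetical effective switched capacitance (10 pF)
    FREQ = 78e6      # 78 MHz, the clock rate of the DSP example
    for vdd in (1.2, 1.0, 0.8, 0.6):
        p = dynamic_power(C_EFF, vdd, FREQ)
        print(f"Vdd={vdd:.1f} V  P={p * 1e3:.3f} mW  saving={scaling_saving(1.2, vdd):.0%}")
```

Halving Vdd cuts switching power by 75% in this model; what limits the approach in practice is the delay penalty as Vdd approaches Vth.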
Energy dissipation in CMOS
When a DC source V charges a load capacitance C, the supply delivers $E = C \cdot V \cdot V = CV^2$: half of this, $\tfrac{1}{2}CV^2$, is dissipated in the pull-up path and half is stored on the capacitor. On discharge the supply delivers $E = C \cdot V \cdot 0 = 0$, and the stored energy is dissipated in the pull-down path, so no energy is recovered back into the supply. Per transition, with the gate delay modeled in alpha-power form (α is the velocity-saturation index, 1 < α ≤ 2):

$$E_{diss} = \tfrac{1}{2} C V^2, \qquad T_d \propto \frac{k\,C V}{(V - V_{th})^{\alpha}}$$

Reducing V decreases $E_{diss}$, but it eventually pushes the devices into the sub-threshold region, where delay increases exponentially as V is decreased. This is the point of diminishing returns in scaling Vdd.

Energy Recovery Circuits
Switching energetics differ from conventional CMOS: the DC supply is replaced by an AC supply (the power clock), and the energy required to swing the voltage on a node can be much less than the energy stored on it. Inductors supply and recover the charge, with current resonating through them between the power clock and the load capacitance. Energy recovery gates can also serve as timing elements, so their latency overhead does not translate into a throughput penalty.

Energy recovery charging/discharging
Energy dissipated in the switch resistance R when a load C is charged to V by different source waveforms:
Step source (conventional CMOS):
$$E_{diss} = \int_0^{\infty} \left(\frac{V}{R} e^{-t/RC}\right)^2 R\,dt = \tfrac{1}{2} C V^2$$
Linear ramp over time T (constant current $I_s = CV/T$):
$$E_{diss} = \int_0^{T} I_s^2 R\,dt = \frac{RC}{T}\, C V^2$$
Sinusoidal power clock (easy to generate):
$$E_{diss} \approx \int_0^{T} \left(\frac{\pi C V}{2T} \sin\frac{\pi t}{T}\right)^2 R\,dt = \frac{\pi^2}{8}\,\frac{RC}{T}\, C V^2$$
N equal voltage steps:
$$E_{diss} = N \cdot \tfrac{1}{2} C \left(\frac{V}{N}\right)^2 = \frac{C V^2}{2N}$$
For T ≫ RC, $E_{diss} \to 0$ as $T \to \infty$.

Energy Recovery: A Brief History
Reversible computing was proposed as a method of achieving asymptotically zero-energy computation. Early circuit designs (inverter chains): Maksimovic and Oklobdzija (1 clock / 2 phases, 1.2 µm process, 40 MHz); Dickinson and Denker (4 phases, 0.9 µm process, 250 MHz); Athas et al. (graphics processor, 0.5 µm process, 15 MHz); Kim et al. (True Single Phase Logic, 8-bit multiplier at 140 MHz, 0.5 µm process). Characteristics of these designs: a fundamental requirement for gradual power-clock transitions; the use of diodes to recover energy, which is delay- and energy-inefficient; tracking of the power clock only at its fastest transition; and PFET evaluation trees.

Background: Boost Logic [Sathe: ISLPED ’05]
Boost logic is a hybrid energy recovery family with high gate overdrive and voltage scaling. It uses no diodes, and its differential outputs present a data-independent capacitance to the power clock. It acts as a timing element, so there is no throughput penalty, and it is less sensitive to power supply variation than conventional CMOS. In a 0.13 µm process, post-layout simulation reached up to 1.6 GHz, and a test chip operated from 750 MHz to 1.3 GHz with 65% energy savings compared with a conventional voltage-scaled, pipelined CMOS design. However, Type 1 (Boost logic) suffers high energy dissipation at low frequencies (50 MHz to 200 MHz).

Structure and operation of Boost Logic
[Figure: Boost logic gate (M1 to M8) with an N-tree evaluation stage (N-tree and complementary N-tree, evaluating at reduced potential) and a Boost stage performing energy-recovering sense amplification, powered by the power clock PC and the rails Vdd’ and Vss’; cascaded gates alternate Eval and Senseamp stages on the clock phases $\phi$ and $\bar{\phi}$.]
The DC rails are placed around mid-supply:

$$V_{dd}' = \tfrac{1}{2}\left(V_{dd} + V_c\right), \qquad V_{ss}' = \tfrac{1}{2}\left(V_{dd} - V_c\right), \qquad V_c \approx V_{th}$$

Energy Dissipation in Type 1 (Boost)
[Figure: weak pull-up M5 to Vdd’ (gate at $\phi = 0$) opposing the N-tree evaluation path through M6 to Vss’ (gate at $\bar{\phi} = 1$).]
There is always a fight between the weak pull-up and the pull-down. The resulting crowbar energy, $E = I_{crow} V T$, increases at lower frequencies, so energy dissipation keeps increasing. How do we decrease this? (Simulated with a 32-bit ripple-carry (RC) adder, 0.13 µm.)

Circuit Configurations Investigated
Type 2: static CMOS in the evaluation stacks. Type 3: a static CMOS stack plus an inverter to create differential outputs with less area overhead. Type 4: a new domino CMOS logic in the evaluation stage and a modified energy recovery sense amplifier.

Type 2 circuit: CMOS stacks in the evaluation tree
Complementary CMOS stacks drive the differential outputs to full rails (Vdd’ and Vss’).
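The dissipation expressions on the "Energy recovery charging/discharging" slide can be checked numerically. The sketch below evaluates the closed forms and integrates $i^2 R$ with forward Euler for an RC node driven by a linear ramp. The component values (1 kΩ, 1 pF, 1.2 V) and all function names are hypothetical, and the ramp and sinusoid formulas are the T ≫ RC approximations.

```python
import math


def e_step(c: float, v: float) -> float:
    """Step (conventional CMOS) charging: 1/2 C V^2, independent of R."""
    return 0.5 * c * v ** 2


def e_ramp(r: float, c: float, v: float, t: float) -> float:
    """Linear ramp of duration t >> RC: (RC / t) * C V^2."""
    return (r * c / t) * c * v ** 2


def e_sine(r: float, c: float, v: float, t: float) -> float:
    """Half-cosine (sinusoidal) transition of duration t >> RC: (pi^2 / 8)(RC / t) C V^2."""
    return (math.pi ** 2 / 8) * (r * c / t) * c * v ** 2


def e_nstep(c: float, v: float, n: int) -> float:
    """N equal voltage steps: N * 1/2 C (V/N)^2 = C V^2 / (2N)."""
    return c * v ** 2 / (2 * n)


def simulate_ramp(r: float, c: float, v: float, t: float,
                  steps_per_rc: int = 200, settle_rc: float = 20.0) -> float:
    """Forward-Euler integral of i^2 R for an RC node driven by a ramp to V, then held."""
    dt = r * c / steps_per_rc
    n_steps = int((t + settle_rc * r * c) / dt)
    vc = 0.0       # capacitor voltage
    energy = 0.0   # energy dissipated in the switch resistance
    for k in range(n_steps):
        vs = v * min(k * dt / t, 1.0)   # source ramps linearly to V, then holds
        i = (vs - vc) / r
        energy += i * i * r * dt
        vc += i * dt / c
    return energy


if __name__ == "__main__":
    R, C, V = 1e3, 1e-12, 1.2   # hypothetical values: RC = 1 ns
    for t in (10e-9, 100e-9, 1e-6):
        print(f"T={t * 1e9:6.0f} ns  step={e_step(C, V):.2e} J  "
              f"ramp={e_ramp(R, C, V, t):.2e} J  sine={e_sine(R, C, V, t):.2e} J  "
              f"sim={simulate_ramp(R, C, V, t):.2e} J")
    print(f"N=4 steps: {e_nstep(C, V, 4):.2e} J")
```

With RC = 1 ns, stretching the transition from 100 ns to 1 µs cuts the ramp dissipation tenfold, and the sinusoid costs only π²/8 ≈ 1.23 times the ideal ramp.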
[Figure: Type 2 gate with complementary P-tree pull-up and N-tree pull-down stacks driving out and $\overline{out}$ between Vdd’ and Vss’ (devices M1 and M2, clocked by $\phi$ and $\bar{\phi}$).]
Using full CMOS stacks reduces crowbar current significantly. (Simulated with a 32-bit RC adder with clock generator, 0.13 µm.)

Type 2: Energy Dissipation
[Figure: percentage contribution of E(Crowbar) and E(Power clock) to total energy for time periods T = 5 ns, 10 ns, 20 ns, and 50 ns, for a 32-bit adder in Type 1 and in Type 2.]
The drawbacks are significant area overhead (6N+10, compared with 2N+10 for Type 1), limited fan-in, and slow operation of the PMOS devices.

Type 3: CMOS stack with complementary inverter
A single static CMOS stack (P-tree pull-up and N-tree pull-down) drives one output, and an inverter creates the complementary output to form the differential. This lowers energy dissipation at low frequencies with a 3N+10 area overhead.
[Figure: Type 3 schematic (M1 to M6) and total energy per cycle vs. time period T (0 to 60 ns) for the Type 1 and Type 3 circuits; simulated with a 32-bit RC adder with clock generator, 0.13 µm.]

Type 3: Limitations due to sub-threshold operation of the inverter
[Figure: out and $\overline{out}$ waveforms at 10 MHz and at 100 MHz; simulated with a 32-bit RC adder with clock generator, 0.13 µm.]
Because of its limited drive, the inverter operates in the sub-threshold region. The output differential $V(out) - V(\overline{out})$ shrinks with increasing frequency and fan-out, so operation is reliable (with respect to ΔV) only up to about 50 MHz. How can we increase the inverter drive?
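A first-order way to see why Type 1 (Boost) dissipation rises at low frequencies is to add the adiabatic charging term $(RC/T)CV^2$, which falls with T, to the crowbar term $E = I_{crow} V T$, which grows with T. The sketch below assumes the two terms simply add and uses hypothetical parameter values, not values from the reported simulations; under that assumption the energy per cycle is minimized at $T^* = \sqrt{R C^2 V / I_{crow}}$ and rises on either side.

```python
import math


def energy_per_cycle(t: float, r: float, c: float, v: float, i_crow: float) -> float:
    """Toy Type 1 model: adiabatic term (RC/T) C V^2 plus crowbar term I_crow * V * T."""
    e_recovery = (r * c / t) * c * v ** 2
    e_crowbar = i_crow * v * t
    return e_recovery + e_crowbar


def optimal_period(r: float, c: float, v: float, i_crow: float) -> float:
    """Period minimizing a/T + b*T, with a = RC * C V^2 and b = I_crow * V."""
    a = r * c * c * v ** 2
    b = i_crow * v
    return math.sqrt(a / b)


if __name__ == "__main__":
    # Hypothetical values: 1 kOhm switch, 1 pF load, 1.2 V swing, 10 uA crowbar current
    R, C, V, I_CROW = 1e3, 1e-12, 1.2, 10e-6
    t_opt = optimal_period(R, C, V, I_CROW)
    print(f"optimal period ~ {t_opt * 1e9:.1f} ns ({1 / t_opt / 1e6:.0f} MHz)")
    for t in (5e-9, 10e-9, 20e-9, 50e-9, 100e-9):
        e = energy_per_cycle(t, R, C, V, I_CROW)
        print(f"T={t * 1e9:5.0f} ns  E/cycle={e:.3e} J")
```

Because the crowbar term grows linearly with T, the remedies explored in Types 2 to 4 target the crowbar path rather than the power-clock term.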
Type 3: with low-threshold devices in the inverter stack
Low-threshold devices improve operation at lower frequencies, but the circuit is sensitive to coupling noise and process variation, and operation is not robust for f > 100 MHz.

A New Structure
The new structure needs to create a good differential voltage with minimum area overhead and energy dissipation; the Boost sense-amplifier stage must be modified so that the output voltage differential is independent of fan-out loading; and the circuit must tolerate power supply variations well.

Type 4: Domino CMOS with transmission gates
[Figure: Type 4 schematic. Evaluation stage between Vdd’ and Vss’ with header M1, footer M2, a complementary pull-down N-tree, precharged internal nodes n1 and n2, and transmission gates (M3 to M6) onto the proxy output lines outint and $\overline{outint}$; sense-amplifier stage (M1 to M6) driving out and $\overline{out}$, with equalization transistor M7.]
Nodes n1 and n2 are precharged, and evaluation enables a low-swing pull-down. The low-capacitance proxy output lines mask the high-capacitance output lines, and the sense amplifier performs equalization and sense amplification.

Operation: Evaluation/hold phase ($\phi = 0$)
[Figure: Type 4 gate during evaluation/hold, with the proxy lines at a weak 0 and a weak 1.]
The dual N-tree evaluates and pulls down one proxy output line, and the transmission gates transfer charge to the low-capacitance lines. There is no crowbar current because the headers are switched off. Transistor M7 in the sense-amplifier stage keeps out and $\overline{out}$ equalized at approximately Vdd/2.

Operation: Precharge/amplify phase ($\phi = 1$)
The cross-coupled inverters pull the outputs to the rails in an energy-recovering fashion. The transmission gates isolate the evaluation circuit from the sense amplifier. Transistor M7 in the sense-amplifier stage is cut off, and n1 and n2 precharge high to Vdd’.

Type 4: Simulation Results
[Figure: simulated waveforms over alternating evaluate/hold and precharge/amplify phases; 32-bit RC adder with clock generator, 0.13 µm.]

Type 4: Energy Dissipation
Simulations of a 32-bit RC adder with clock generator (0.13 µm) show substantial energy savings relative to Type 1 (Boost). The voltage differential is independent of fan-out loading, and the circuit works between 10 MHz and 200 MHz.

Energy Comparison of Different Topologies
[Figure: energy comparison of the Type 1, Type 2, Type 3, and Type 4 topologies; 32-bit RC adder with clock generator, 0.13 µm.]
Type 4 achieves a 25% to 65% reduction in energy over the operating range of frequencies with small area overhead. The savings come from the low-capacitance proxy output lines, the small charge-up of internal nodes, the isolation of the evaluation stage from the sense amplifier, and the elimination of crowbar current.

Robustness to Variations in Power Supply
[Figure: effect of power supply variation on delay at 100 MHz; percentage change in delay vs. percentage change in power supply (-15% to +15%) for domino, pseudo-NMOS, and vanilla CMOS (1.2 V).]
Delay variation is less than 5% for a 10% variation in the power supply, so the Type 4 circuit is relatively insensitive to power supply variation compared with CMOS.

Conclusions and Future Work
Conclusions: three structures were designed to improve energy recovery efficiency at low frequencies without the use of diodes or multiple clock domains. A new domino-style topology yields substantial energy savings with minimal area overhead and is relatively insensitive to power supply variations.
Future work: improve the resonance of the Type 4 circuit; redesign the clock generator to investigate potential power savings; evaluate post-layout performance and comparisons; and continue investigating other kinds of logic structures.
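For a rough baseline against the claim that Type 4 delay varies by less than 5% for a 10% supply change, the sketch below estimates the supply sensitivity of conventional CMOS from the alpha-power delay form $T_d \propto CV/(V - V_{th})^{\alpha}$. The threshold (0.35 V) and α = 1.3 are assumed values, not parameters of the 0.13 µm process used in these simulations, and the function names are hypothetical, so the output only indicates the trend.

```python
def relative_delay(v: float, vth: float = 0.35, alpha: float = 1.3) -> float:
    """Alpha-power-law gate delay up to a constant factor: T_d ~ V / (V - V_th)^alpha."""
    if v <= vth:
        raise ValueError("model is only valid above threshold")
    return v / (v - vth) ** alpha


def delay_change_pct(supply_change_pct: float, v_nom: float = 1.2, **model) -> float:
    """Percentage change in delay for a given percentage change in supply voltage."""
    v = v_nom * (1 + supply_change_pct / 100)
    return 100 * (relative_delay(v, **model) / relative_delay(v_nom, **model) - 1)


if __name__ == "__main__":
    for dv in (-15, -10, -5, 0, 5, 10, 15):
        print(f"supply {dv:+3d}%  ->  delay {delay_change_pct(dv):+6.1f}%")
```

Under these assumed parameters a 10% supply drop slows a conventional gate by roughly 10%, which is the kind of sensitivity the vanilla CMOS curve in the robustness comparison reflects.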