ECE 636 Reconfigurable Computing
Lecture 16: Power Reduction Techniques for FPGAs

Lecture 16: Power Reduction Techniques November 5, 2013

Overview
• FPGAs are generally considered power hungry compared to their ASIC and processor counterparts
  - Mostly due to unused interconnect
• Recent area of extensive research
• Device techniques
  - Voltage scaling
  - Sleep mode
• Software techniques
  - Reduced switching
  - Reduced capacitance

Dynamic Power
° Dynamic power is required to charge and discharge load capacitances when transistors switch.
° One cycle involves a rising and a falling output.
° On a rising output, charge Q = C·VDD is required.
° On a falling output, the charge is dumped to GND.
[Figure: gate driving a load capacitance C from VDD at switching frequency fsw, showing the short-circuit current and the charge/discharge current iDD(t). Courtesy: Harris]

Dynamic Power
$$P_{dynamic} = \frac{1}{T}\int_0^T i_{DD}(t)\,V_{DD}\,dt = \frac{V_{DD}}{T}\int_0^T i_{DD}(t)\,dt = \frac{V_{DD}}{T}\left[T f_{sw} C V_{DD}\right] = C V_{DD}^2 f_{sw}$$
° Short-circuit power is less than 10% of dynamic power.

FPGA Static Power Consumption
° Junction leakage
  • Small fraction of leakage
° Gate oxide leakage
  • Increases exponentially as Vgs increases
° Subthreshold leakage
  • When Vgs < Vt there is still some source-drain current
  • Increases exponentially as Vt decreases
  • Decreases exponentially as Vgs decreases
[Figure: technology trend. Courtesy: Nowak]

FPGA Power Reduction Goals
• Dynamic power goals
  - Reduce Vdd along non-critical paths
  - Low-swing signalling
  - Use CAD approaches to limit long high-toggle paths
  - Pdynamic = 0.5 * C * Vdd^2 * f
• Static power goals
  - Cut off Vdd to unused transistors
  - Use high-Vt transistors for SRAM cells
  - Various other voltage biasing techniques
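
The relation Pdynamic = C·VDD²·fsw from the Dynamic Power derivation can be checked numerically. The following is a minimal sketch, not part of the lecture: the function names and the example values (10 fF load, 100 MHz, a 1.5 V to 1.2 V supply reduction) are assumptions chosen only for illustration. It follows the derivation slide and does not use the 0.5 factor that appears on the Goals slide.

```python
def dynamic_power(c_load, vdd, f_sw):
    """Dynamic power in watts: P = C * VDD^2 * f_sw (derivation slide)."""
    return c_load * vdd ** 2 * f_sw


def relative_saving(vdd_nominal, vdd_reduced):
    """Fraction of dynamic power saved by lowering VDD at fixed C and f_sw."""
    return 1.0 - (vdd_reduced / vdd_nominal) ** 2


# Hypothetical example values, not taken from the lecture.
p = dynamic_power(c_load=10e-15, vdd=1.5, f_sw=100e6)
saving = relative_saving(vdd_nominal=1.5, vdd_reduced=1.2)

print(f"dynamic power: {p * 1e6:.2f} uW")
print(f"saving from 1.5 V to 1.2 V: {saving:.0%}")
```

Because power depends on the square of VDD, a 20% supply reduction removes roughly a third of the dynamic power in this model, which is why the goals above target Vdd on non-critical paths first.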

Traditional Routing Switch
[Figure: traditional routing switch. Labels: CONFIG SRAM cells (S), inputs i1 to i4, MUX, internal node VINT, transistors MP1 and MP2, level-restoring buffer, OUT. Courtesy: Anderson]

Proposed Switch Designs: Anderson
° Based on 3 observations:
  • Routing switch inputs are tolerant to weak-1 signals (level-restoring buffers).
  • There is considerable slack in FPGA designs, so many switches can be slowed down.
  • Most routing switches feed other routing switches.
    - Can produce weak-1 logic signals.

"Basic" Switch Design
[Figure: routing switch whose MUX is supplied from a virtual rail VVD through transistors MNX and MPX connected to VDD. Labels: CONFIG SRAM cells (S), ~SLEEP, LOW_POWER, ~LOW_POWER, SLEEP, inputs i1 to i4, MUX, OUT, GND.]
Mode operation:
  • high-speed: MNX & MPX ON
  • low-power: MNX ON, MPX OFF
  • sleep: MNX OFF, MPX OFF

High-Speed Mode
  • MNX & MPX ON
  • VVD = VDD
  • Output swing: rail-to-rail

Low-Power Mode
  • MNX ON, MPX OFF
  • VVD = VDD - VTH
  • Output swing: GND to (VDD - VTH)

Sleep Mode
  • MNX OFF, MPX OFF

Leakage Power Results: Anderson
[Figure: bar chart of % leakage power reduction vs. high-speed mode for the basic switch design. Bar labels: LP mode, Sleep mode, LP mode (+unused fanout), LP mode (+used fanout), Traditional switch. Data labels: 60.8, 39.7, 36, 38.7, 0.3.]

Region Constrained Placement
• Rather than just focusing on routing, consider constraining logic
• Most circuits exhibit locality
• Gayasen: FPGA'2004

Region Constrained Placement
• Several issues to consider
• Size of sleep transistor
  - Too large: increases leakage, area
  - Too small: affects logic performance
• Size of region
  - Too large: possibly unused resources, complicates placement
  - Too small: sleep transistors take up too much room

Experimental Flow: RCP
• Different region sizes considered for flow
• Area constraints for portions of the design determined by hand
• May encourage designers to create granular designs

Power Savings: RCP
• Note the significant reduction in leakage power savings as region size increases
• Bottom curve primarily due to luck

Performance Limitation: RCP
• Performance limited by use of regions
• Nearly 10% clock frequency reduction for many designs

Low-swing Signalling
• The techniques examined so far tinker with the supply voltage
• It is also possible to modify wire signalling to reduce voltage swing
• Most of an FPGA is made up of interconnect
• Approach targets dynamic power consumption
George and Rabaey: 1997

Low-swing Signalling
• Interconnect swing is 0.8V while the rest of the circuit operates at 1.5V
• Cascode circuitry used at the sink to overcome slow speed issues
• 50% energy savings at a cost of 25% delay

Alternate approach: Modifying FPGA CAD
• FPGA architecture modifications affect all designs, even those that don't care about power
• Can placement and routing be modified to consider dynamic power?
  - Need to know which signals are high toggle
  - Attempt to minimize length of high-toggle wires
  - Minimize impact on performance and area
• Techniques fit well into our previous work on placement and routing
Lamoureux and Wilton

Modifying FPGA CAD Placement
• Previous cost metrics for annealing considered bounding box wire length and timing costs
• Include an additional term which considers signal switching activity

FPGA Placement for Power
• Post-route energy reduced by 3.0%. Power decreased by 7% but delay increased by 4%

FPGA Routing Modifications for Power
• Original routing cost function takes congestion b(n) and delay(n) into account
• Augment with a factor that takes net activity into account
• Minimize length of the most active nets, even in the presence of congestion.

FPGA Routing for Power Results
• Potential benefits somewhat limited by placement
• Note that most nets have low activity
• Power is decreased by 6% but delay increased by 4%.
• Energy savings of about 3%

FPGA Embedded Memory Blocks
° Embedded memory blocks (EMBs) are important parts of FPGAs
° Consume roughly 14% of Altera Stratix II dynamic power *
  • Increasing in recent designs
* Stratix II Low Power Applications Note, 2005

Embedded Memory Block Port Internal View
[Figure: internal view of an EMB port. Labels: Clk, Enable, MClk, Bit Line Pre-charge, BIT lines, RAM cell, Row Decode, Column Mux, Write Buffers, Sense Amps, Latch, Address, Read Enable, Write Enable, Read Data, Write Data.]
° Reducing clocking saves dynamic power

Power Optimization #1
° Convert EMB read enable/write enable signals to associated read/write clock enable signals
° Limitations
  • Each port has read or write enable control signal
  • Embedded memory block has read enable input
[Figure: EMB connections before and after the conversion. Labels: Data, Write Address, Read Address, Clock, Wren, Rden, Write enable, Read enable, Wr clk enable, Rd clk enable, Vcc, Q.]

Implementation
° Conversion mode
  • Ties off R/W enable to RAM clock enables
  • Doesn't make transform if CE already present on port
° Combining mode
  • AND user RAM clock enables with derived R/W clock
  • Could impact performance
[Figure: Write Enable and User-defined Write Clk Enable combined into Combined Write Clk Enable.]

FPGA RAM Processing
[Figure: flow from FIFO, shift register, and RAM specification through Create Logical Memory (logical RAMs/logic), logical-to-physical RAM processing (RAM blocks/logic), and memory/logic placement (placed memory).]
° FIFOs and shift registers converted into logical RAMs
° Logical RAMs mapped to RAM blocks

Mapping RAM to EMBs
° Implementation choice can impact design area, performance, and power.
° Some mappings may require multiple EMBs
[Figure: user-defined (logical) memory of 16K bits, 4k deep x 4 wide, mapped to physical (EMB) memory: four M4K blocks of 4K bits each, or one 512K MRAM.]

Memory Organization
° Each EMB can be configured to have different depth and width (e.g. Stratix II M4K)
    Depth            Width
    4K words deep    1 bit wide
    512 words deep   8 bits wide
    128 words deep   32 bits wide
° All hold 4K bits
° Slightly lower power consumption for wider EMB configurations (not including routing)

Area and Delay Optimal Mapping
° Configure each EMB to be as deep as possible
° Number of address bits on each EMB same as on logical memory
° Area and performance efficient: no external logic needed
° Power inefficient: all EMBs must be active during each logical RAM access
[Figure: vertical slicing. Logical memory 4k words deep and 4 bits wide mapped to EMBs 4k words deep and 1 bit wide (4 times); Addr[0:11], Data[0:3]; 4 EMBs active during access.]

Alternative Mapping
° Configure EMB to have width of logical RAM (e.g. 1Kx4)
  • Allows shutdown of some RAMs each cycle
  • But adds some logic
° Saves RAM power, adds combinational logic and register power
[Figure: horizontal slicing, more power efficient. Logical memory 4k words deep and 4 bits wide mapped to EMBs 1K deep x 4 wide (4 times); Addr[0:9] to the EMBs, Addr[10:11] to the address decoder; Data[0:3]; 1 EMB active during access.]

RAM Slicing - Example
° Power reduction available with different slicing
[Figure: 4kx32 dynamic power (mW, 0 to 140) vs. maximum depth (128, 256, 512, 1k, 2k, 4k). Multiplexer power increases toward one end of the depth range, EMB power toward the other, with a best range in between.]

Power Optimization #2: Power-aware RAM Partitioning
[Figure: flow from FIFO and shift register through Create Logical Memory, power-aware physical RAM processing (using a power library), Insert Decode and Mux Logic, and memory/logic placement to completed placement.]
° Algorithm considers possible logical to physical RAM mappings

Experimental Approach
° 40 designs evaluated
° Quartus 5.1
° Mapped to smallest possible device and target max frequency
° Simulation with test vectors
° Power analysis with PowerPlay

Memory Power
° 21.0% average reduction for all techniques (9.7% with convert/combine)
[Figure: % dynamic power reduction per design (designs 1 to 39) for enable convert/combine and for enable convert/combine + mem partition.]

Overall Core Dynamic Power
° 6.8% average power reduction for all techniques (2.6% with convert/combine)
[Figure: % dynamic power reduction per design (designs 1 to 39) for enable convert/combine and for enable convert/combine + mem partition.]

Design Performance
° 1.0% average performance loss for all techniques (0.1% for enable convert/combine)
[Figure: average design clock frequency, % frequency improvement per design, for enable convert/combine and for enable convert/combine + mem partition.]

Results Summary
° Almost 7% core dynamic power reduction across all designs
  • Some designs benefit more than others
° Minimal clock frequency hit for most designs
                            Enable     Enable            Enable convert/combine
                            convert    convert/combine   + Mem partition
    Core dynamic power      -1.8%      -2.6%             -6.8%
    Memory dynamic power    -6.3%      -9.7%             -21.0%
    Max clk freq            -0.1%      -0.2%             -1.0%
    LUT count                0.0%       0.1%              0.7%

Impact of Multiple Embedded Memory Blocks
° Rerun 40 designs but only allow one type of target EMB for each mapping
° All designs targeted to Stratix II EP2S180
° Significant power impact for most designs versus EP2S180 target with no restrictions
                            M512       M4K       M-RAM
    Designs completed       23         38        4
    Core dynamic power      40.4%      6.6%      47.3%
    Memory power            279.5%     33.3%     754.0%
    Max clk freq.           -2.2%      0.6%      -1.0%
    LUT count               0.4%       -0.5%     0.0%

Summary
° Key to reducing RAM power is keeping clocks disabled.
° Movement of read/write enables to clock enables limits dynamic activity
° Power-aware RAM partitioner attempts to select the power-optimal mapping, combined with the clock enable enhancement
° Overall
  • About 21% average memory power reduction
    - 10% with enable convert/combine
  • About 7% average dynamic power reduction
    - 3% with enable convert/combine
  • Diversity of EMBs reduces power by 33%

Summary
• FPGA power consumption is under consideration at numerous levels: architecture, circuit, CAD, and physical
• FPGA companies are just now embracing power-aware CAD; power-aware architectures are on the way
• Many circuit-level techniques are still possible
• RTL CAD synthesis techniques provide a promising area for exploration
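
The vertical and horizontal slicing trade-off from the memory-mapping slides can be sketched as a small counting model. This is an illustration under stated assumptions, not the partitioning algorithm used in the lecture's experiments: `slice_mapping` and its field names are hypothetical, and the model assumes that only the EMB row selected by the upper address bits is clocked on each access.

```python
def slice_mapping(logical_depth, logical_width, emb_depth, emb_width):
    """Count EMBs used and EMBs active per access for one EMB configuration.

    Hypothetical helper: assumes the address decoder enables exactly one
    row of EMBs per access, as in the horizontal slicing slide.
    """
    cols = -(-logical_width // emb_width)   # EMBs side by side to cover the width
    rows = -(-logical_depth // emb_depth)   # EMB rows stacked to cover the depth
    return {
        "total_embs": rows * cols,
        "active_embs": cols,                     # one row clocked per access
        "decode_bits": (rows - 1).bit_length(),  # upper address bits sent to the decoder
    }


# 4k x 4 logical memory on 4K-bit EMBs, as in the slides.
vertical = slice_mapping(4096, 4, emb_depth=4096, emb_width=1)
horizontal = slice_mapping(4096, 4, emb_depth=1024, emb_width=4)

print("vertical:  ", vertical)
print("horizontal:", horizontal)
```

Both mappings use four EMBs, but the horizontal one clocks a single EMB per access at the cost of two decoded address bits (Addr[10:11]) and an output multiplexer. The model does not estimate that added logic power, which is what produces the best range in the RAM slicing example.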