Leakage Power Reduction of Embedded Memories on FPGAs through Location Assignment

Yan Meng, Tim Sherwood, and Ryan Kastner
University of California, Santa Barbara
Department of Electrical & Computer Engineering
ExPRESS Group: http://express.ece.ucsb.edu

Outline
- Motivation
  - The leakage problem of embedded memories on FPGAs is of growing importance
- Synthesis techniques for leakage power optimization of embedded memories
- Conclusions

Motivation
- FPGAs are attractive options
  - High processing power, flexibility, reconfigurability
- Power is becoming critical
- Why worry about power?
  - Heat dissipation, portability
- Where does power go in CMOS?
  - Dynamic power consumption
    - Switching power due to charging and discharging load capacitors
    - Short-circuit currents between supply rails when both transistors are on during switching
  - Leakage power consumption

Technology Scaling and Leakage Power Dissipation
[Figure: leakage power / total power (0% to 100%) versus year (1999 to 2009), with points labeled by technology node: 130nm, 70nm, 50nm, 30nm, 20nm, 15nm.]
- Leakage is dominating over dynamic power as technology scales down (improving speed, transistor density, and functionality)

On-chip Memory Leakage Control
Why control leakage through on-chip memory?
- Caches on microprocessors
  - Huge portion of chip area
  - Leakage is proportional to the number of transistors
  - Major source of leakage consumption [Roy01, Hu01, Flautner02, Mudge04]: 50% in 2005 [ITRS 02]
  - Dynamic reshuffling due to cache replacement policies
  - Cache hierarchy with data replication
- Memories on FPGAs
  - Configuration SRAMs: not on critical paths, high Vth
  - Embedded memories
    - Accesses are usually statically scheduled
    - Not necessarily a part of a memory hierarchy with inclusion

Leakage problem of embedded memories is of growing importance
[Figure: ratio of embedded memory bits to logic cells (axis 0 to 120) for FPGA families released between 1994 and 2006, grouped as New, Mainstream, and Mature/others: Virtex-5, Cyclone II, Spartan-3E, Stratix II, Virtex-4 SX, Virtex-4 FX, Virtex-4 LX, Spartan-3/3L, Stratix, Stratix GX, Spartan-IIE, Cyclone, Virtex-II Pro, Virtex-II, APEX II, Mercury, Spartan-II, Virtex-E EM, Virtex-E, ACEX 1K, APEX 20K, Spartan/XL, Virtex, Spartan, FLEX 6000, FLEX 10K.]
- Embedded memory bits/logic cells: > 20x

Leakage Power Optimization of Embedded Memories on FPGAs
Motivating Example
[Figure: three plots of BRAM line versus time (0 to t), contrasting temporal information and spatial information.]

Outline
- Motivation
- Synthesis techniques for leakage power optimization of embedded memories
  - Temporal
  - Temporal + spatial
- Conclusions

Temporal Information
- Precedence order between variables
- Saving power on variables
  - Keep frequently accessed lines active to ensure high performance
  - Turn off lines that are not used for a long time
  - Use low supply voltage to save power for the rest
- Using the generalized model to calculate maximal leakage power savings for variables [Meng HPCA'05]

Definitions – Intervals
[Figure: timeline of accesses access(v) to a variable v, marking the interval length |Ii|, the last use, a live interval, and a dead interval.]
- Live interval: time between two successive accesses to the same variable v within a memory entry
- Dead interval: time before the first access or after the last access to a variable

Definitions – Operating Modes
- Active mode
  - Power on the whole line
  - No power saving
- Sleep mode [Roy01, Hu01]
  - Sleep/"turn off" transistors
  - Lose data
- Drowsy mode [Flautner02, Mudge04]
  - Use low supply voltage to save power when it is not needed
  - Preserve data for fast reaccess
  - Wake up to the high voltage and return data
[Figure: supply voltage versus time over an interval |Ii| for each mode. Active: voltage stays at Vdd. Sleep: phases s1, s2, s3, with the voltage between Vdd and 0. Drowsy: phases d1, d2, d3, with the voltage between Vdd and Vddlow.]

Choosing Operating Modes
[Figure: an interval |Ii| that must be assigned active mode, sleep mode, or drowsy mode.]

Inflection Points
Which mode to apply on each interval?
- Active-drowsy inflection point a
  - The least amount of time drowsy mode needs to save energy
  - $a = \arg\min_t \{E^{saving}_{Drowsy}(t) \ge 0\} = d_1 + d_3$
- Sleep-drowsy inflection point b
  - The time where sleep and drowsy modes consume the same amount of energy
  - $b = \{t : E_{Drowsy}(t) = E_{Sleep}(t)\}$
  - $E_{Drowsy} = \sum_{i=1,2,3} P_L(d_i) \cdot d_i$
  - $E_{Sleep} = \sum_{i=1,2,3,4} P_L(s_i) \cdot s_i$

Selecting Operating Modes with Inflection Points
For each interval I, compare its length |I| with the inflection points:
- |I| ≤ a: active interval, active mode
- a < |I| ≤ b: drowsy interval, drowsy mode
- |I| > b: sleep interval, sleep mode

Optimal Leakage Management Policy
- Oracle knowledge of all interval lengths based on static scheduling
- Applying the appropriate operating mode on each variable interval
- Obtaining maximal leakage power saving
- Formal proof of the optimality [Meng HPCA'05]

Outline
- Motivation
- Synthesis for leakage power optimization of embedded memories
  - Temporal
  - Temporal + spatial
- Conclusions

Spatial Information
- Spatial layout of data leads to different potentials of power savings
[Figure: BRAM line versus time (0 to t) for two layouts: one variable per entry, and minimal number of entries.]

Memory Leakage Optimization Techniques
[Figure: BRAM line versus time (0 to t) for six schemes: the state-of-the-art, used-active, min-entry, sleep-dead, drowsy-long, and path-place.]

Location Assignment Schemes (I)
- The state of the art: no leakage control
[Figure: Full-active; BRAM line versus time (0 to t).]

Location Assignment Schemes (II)
- Turning off the unused part
[Figure: Used-active; BRAM line versus time (0 to t).]

Location Assignment Schemes (III)
- Packing variables into the minimal number of entries and turning off the rest
[Figure: Min-entry; BRAM line versus time (0 to t).]

Location Assignment Schemes (IV)
- Min entry + sleep dead intervals
[Figure: Sleep-dead; BRAM line versus time (0 to t).]

Location Assignment Schemes (V)
- Min entry + sleep dead + drowsy long
[Figure: Drowsy-long; BRAM line versus time (0 to t).]

Extended DAG Modeling
[Figure: intervals I1, I2, I3 on a time axis with 4 entries, and two DAGs running from a start node to an end node. Both contain interval vertices I1, I2, I3 with weights w1, w2, w3 and edges e1 to e5. The first is captioned "Temporal information"; the second adds edges E1 to E4 and is captioned "+ Spatial information".]

Path-place Algorithm
- Greedily covering the DAG with N node-disjoint paths
- The length of a path indicates the power saving of a memory entry
- First sort all vertices in topological order
- A vertex is covered each time to calculate the longest path reaching it, iff not adjacent to other nodes
- Sum the weights of the final level vertices, edges, and virtual edges from start to end if k < N
- Complexity: O((n+e)*N)

Location Assignment Schemes (VI)
- Data layout with leakage awareness
- Power savings on unused entries, dead and live intervals
[Figure: Path-place; BRAM line versus time (0 to t).]

Location Assignment Schemes
[Figure: BRAM line versus time (0 to t) for six schemes: the state-of-the-art, used-active, min-entry, sleep-dead, drowsy-long, and path-place.]

Embedded Memory Leakage-aware Design Flow
- Exploring temporal and spatial information
- Path traversal and location assignment
  - Introduced for deciding the best data layout within embedded memory to achieve the maximal leakage saving

Radix-2 FFT Example

    for ( le=4, k=0; k<2; k++) {
        le /= 2;
        for ( j=0; j<le; j++) {
            ...
            for ( i=j; i<4; i += 2*le) {
                ...
                tmpi = imag[i];
                imag[i] += imag[i + le];
                ti = tmpi - imag[i + le];
                imag[i + le] = ...
            }
            ...
        }
        ...
    }

[Figure: design flow stages (compilation, scheduling, path traversal, location assignment) applied to the loop above, with the resulting intervals of imag[0], imag[1], imag[2], and imag[3] plotted over time 0 to 50 and labeled n=0 or n=1.]

Empirical Study
- Experimental setup
  - Simulation of a configurable dual-port synchronous RAM with 18 Kbits
  - Read/write ports: both ports can read the same memory cell simultaneously, but cannot write to the same location (no write conflict)
  - Configurable: 1-bit, 2-bit, 4-bit, 9-bit, or 18-bit
- eCACTI [Dutt'04]: modeling transistor leakage
- DSP benchmarks: dft, idft, fft-2, fft-4, filter, mp

Comparing Different Schemes
[Figure: percentage of power savings (0% to 100%) for the schemes Full-active, Used-active, Min-entry, Sleep-dead, Drowsy-long, Path-place, and OPT on the benchmarks idft, dft, fft-4, fft-2, filter, mp, and their average; annotated values: 95%, 76%, 37%.]

Conclusions
- Leakage is dominating dynamic power as technology scaling trends hold
- Leakage problem of embedded memories is of growing importance
- Explored temporal and spatial information for optimizing leakage power, achieving significant leakage saving: 95%

Backup
[Figure: Multimedia, Internet, Cellular Telephony. "Won't work. The machine is too hot." BATTERY (50+ lbs): "The battery is too heavy."]
Power Optimization Techniques
- Dynamic power
  - Design time: reduced Vdd, logic synthesis, pin ordering, transistor sizing, multi-Vdd islands, path balancing, tradeoff area for power
  - Non-active modules: clock/power gating
  - Run time: DVS, DFS (based on workload)
- Leakage power
  - Design time: + multi-Vth, MTCMOS (critical/non-critical paths)
  - Non-active modules: sleep transistors, multi-Vdd, variable Vth
  - Run time: + variable Vth

Saving Leakage Power without Performance Degradation
- Deriving the interval lengths with static scheduling
- Scheduling any needed data just before it is needed
- Avoiding any performance impact

The Generalized Model
- Parameterized model
- Inputs
  - Wake-up latencies
  - Interval distribution
  - Leakage power of each state
  - Transition energy between states
- Output
  - Maximal power saving [Meng HPCA'05]
[Figure: state diagram with states Active, Drowsy, and Sleep, annotated with leakage powers P(Active), P(Drowsy), P(Sleep) and transition energies EAS, ESA, EAD, EDA.]

Example of path-place
[Figure: 5 entries (numbered 1 to 5) versus time (0 to t); a DAG from start to end with interval vertices I1, I2, I3, I4, weights w1 to w4, edges e1 to e6, and edges E1 to E8; TopList: {I4, I1, I2, I3}.]
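
The rule on the "Selecting Operating Modes with Inflection Points" slide can be illustrated with a minimal Python sketch. It assumes the two inflection points a and b have already been derived from the generalized model and that a ≤ b; the function names `choose_mode` and `assign_modes`, the mode strings, and all numeric values are hypothetical and chosen only for illustration. It is not the authors' implementation.

```python
from typing import List, Tuple


def choose_mode(length: float, a: float, b: float) -> str:
    """Pick an operating mode for one interval of the given length.

    a is the active-drowsy inflection point and b the sleep-drowsy
    inflection point. Both are taken as inputs here (assumption): the
    sketch does not derive them from the energy model.
    """
    if a > b:
        # The slide's "a < |I| <= b" case only makes sense when a <= b.
        raise ValueError("expected a <= b")
    if length <= a:
        return "active"  # too short for drowsy mode to save energy
    if length <= b:
        return "drowsy"  # a < |I| <= b
    return "sleep"       # |I| > b


def assign_modes(lengths: List[float], a: float, b: float) -> List[Tuple[float, str]]:
    """Classify every interval length, as a static schedule would allow."""
    return [(length, choose_mode(length, a, b)) for length in lengths]


if __name__ == "__main__":
    # Hypothetical interval lengths and inflection points, in cycles.
    for length, mode in assign_modes([3, 10, 40, 500], a=6, b=100):
        print(f"|I| = {length}: {mode}")
```

Because interval lengths are known from static scheduling, such a classification can run entirely at synthesis time, which is consistent with the slides' claim that no run-time performance penalty is needed.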