* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Slide
Electrification wikipedia , lookup
Pulse-width modulation wikipedia , lookup
Electric power system wikipedia , lookup
Power inverter wikipedia , lookup
Mercury-arc valve wikipedia , lookup
Ground loop (electricity) wikipedia , lookup
Variable-frequency drive wikipedia , lookup
Electrical ballast wikipedia , lookup
Electrical substation wikipedia , lookup
Three-phase electric power wikipedia , lookup
Power engineering wikipedia , lookup
Resistive opto-isolator wikipedia , lookup
Immunity-aware programming wikipedia , lookup
History of electric power transmission wikipedia , lookup
Current source wikipedia , lookup
Voltage regulator wikipedia , lookup
Power MOSFET wikipedia , lookup
Power electronics wikipedia , lookup
Distribution management system wikipedia , lookup
Surge protector wikipedia , lookup
Current mirror wikipedia , lookup
Stray voltage wikipedia , lookup
Buck converter wikipedia , lookup
Opto-isolator wikipedia , lookup
Switched-mode power supply wikipedia , lookup
Voltage optimisation wikipedia , lookup
Tortola: Addressing Tomorrow’s Computing Challenges through Hardware/Software Symbiosis Kim Hazelwood September 29, 2006 Modern Computing Challenges • Performance • Power – Energy consumption, max instantaneous power, di/dt • Temperature – Total heat output, “hot spots” • Reliability – Neutron strikes, alpha particles, MTBF, design flaws • Approaches: Circuit, microarchitecture, compiler • Constraint: Fixed HW-SW interface (e.g., x86) 2 of 26 Typical Approaches • Optimize using SW or HW techniques in isolation • Performance – SW: Compile-time optimizations – HW: Architectural improvements, VLSI technology • Reliability: Code/data duplication (HW or SW) • Power & Temperature SW – HW control mechanisms – Profile + recompile cycle HW 3 of 26 Modern Design Constraints Compilers – “Compile once, run anywhere” – Cannot ship “MS Office for 1Q05 batch of Pentium-4 3GHz, > 1GB RAM, BrandX power supply, located in high altitudes…” Microarchitecture – Limited window of application knowledge (past must predict the future) VLSI – Guaranteed correctness, reliability We currently must optimize for the common case (but must design for the worst case) 4 of 26 The Power of Virtualization • A HW-SW interface layer SW Applications Binary Modifier HW x86 x86 Initially SWI HWI Eventually 5 of 26 Dynamic Binary Modification • Creates a modified code image at run time Examples: EXE Transform Code Cache Profile Execute • • • • • • Dynamo (HP) DAISY/BOA (IBM) CMS (Transmeta) Mojo (Microsoft) Strata (UVa) Pin (Intel) 6 of 26 Dynamic Instrumentation Demo • Pin – Four architectures – IA32, EM64T, IPF, XScale – Four OSes – Linux, FreeBSD, MacOS, Windows – http://rogue.colorado.edu/pin/ 7 of 26 Dynamic Optimization Demo • DynamoRIO – Windows and Linux for IA32 – http://www.cag.lcs.mit.edu/dynamorio/ 8 of 26 Dynamic Binary Modification • Creates a modified code image at run time Examples: EXE Transform Code Cache Profile Execute • • • • • • Dynamo (HP) DAISY/BOA (IBM) CMS (Transmeta) Mojo (Microsoft) Strata (UVa) Pin (Intel) • Always triggered by software events … until now 9 of 26 Tortola: Symbiotic Optimization • Enable HW/SW Communication SW Applications Binary Modifier HW 10 of 26 Simulation Methodology • SimpleScalar 4.0 for x86 • Wattch 1.02 power extensions • Pin dynamic instrumentation system (x86/Linux version) SW Application Benchmarks Binary Modifier Pin HW Wattch & Simplescalar/x86 11 of 26 Tortola Applications • Combine global program information with runtime feedback – System-specific power usage – Application-specific heat anomalies – Workload/input specific performance optimization • Reduce hardware complexity – No more backwards compatibility warts – Fix bugs after shipment – Reduce time to market for new architectures • One such application: The di/dt problem 12 of 26 The Di/dt Problem • Voltage stability is important for reliability, performance • Low-power techniques have a negative side effect: current variation • Dips (undershoots) in supply voltage – can cause incorrect values to be calculated or stored • Spikes (overshoots) in supply voltage – can cause reliability problems 13 of 26 The Di/dt Problem • ITRS cites noise management as a Grand Challenge for 5-10 year time frame • Several trends are aggravating the issue: – – – – Voltage is scaling down with technology Current draw is increasing Package impedance is not scaling as quickly Aggressive clock gating causes large swings in processor current draw (di/dt) 14 of 26 Di/dt Solutions Software MicroArch Circuit-Level Compiler Optimizations Co-Designed MicroArch & SW Binary Modifier Sensor/Actuator Mechanisms Decoupling capacitors More Vdd Gnd pins on package 15 of 26 Sensor-Actuator Mechanisms • On-chip voltage sensors detect abnormally high/low voltage levels • On-chip actuator then attempts to quickly raise/lower the processor’s current draw – Phantom firing • increases current (at the expense of power) – Resource throttling • reduces current (at the expense of performance) 16 of 26 Detecting Imminent Emergencies Soft Emergency Operating Voltage Range 1.05V 1.03V Hard Emergency Control Threshold 1V 0.97V 0.95V 17 of 26 Targeting Mid-Frequency Di/dt 20 cycles 60 cycles Processor Current (A) Processor Current (A) • Problematic: wide current spike • Worst case: pulse at the resonant frequency *From: Minimum Voltage Time (Cycles) Supply Voltage (V) Supply Voltage (V) Maximum Voltage Joseph et al. HPCA-9 Minimum Voltage Time (Cycles) 18 of 26 Parallel High Current Sequential Low Current A Di/dt Stressmark BEGIN_LOOP: … ldt $f1, ($4) divt $f1, $f2, $f3 divt $f3, $f2, $f3 stt $f3, 8($4) ldq $7, 8($4) cmovne $31, $7, $3 stq $3, $(4) stq $3, $(4) stq $3, $(4) … stq $3, $(4) … JUMP BEGIN_LOOP But…Actuator engages every loop iteration degrading performance Why not correct the problem in the code? 19 of 26 Proposed Solution • Leverage our additional software layer to supplement existing solutions • Microarchitecture provides feedback to our softwarebased virtual layer Altered Binary VL Executable Modifier Executable SW HW Sensor+Actuator Ext Microprocessor 20 of 26 Required Investigations • Characterizing emergencies – How often do we see di/dt emergency loops? • Communication between the microarchitecture and the virtual layer – What information should be passed to virtual layer during an emergency? • Fixing di/dt via binary modification – Will existing techniques help? – New algorithms? 21 of 26 Static vs. Dynamic Instances Data suggests modifying a few code sequences will eliminate many voltage emergencies 1000000 Distinct Total 10000 1000 100 apsi ammp sixtrack facerec equake galgel mesa applu mgrid art swim wupwise twolf bzip2 vortex gap perlbmk eon parser crafty mcf gcc 1 vpr 10 gzip Emergencies 100000 22 of 26 Possible Compiler Optimizations • Our goal is to – Smooth out current profile, or – Knock pulses off of the resonant frequency • Some existing options – Software pipelining, code motion, instruction padding Executable Apply Altered Executable Optimizations Binary Modifier Sensor+Actuator Ext’ns Microprocessor 23 of 26 Loop Unrolling & SW Pipelining Problematic loop: Unrolled loop: A A A A B B B B A A B B Current Loop unrolling disrupts resonance pulse Current Software pipelining smoothes profile Iteration=1 Iteration=2 Iteration=3 Current A A B A B A B B A A B 24 of 26 B Unrolling the Di/dt Stressmark H L 1.02V Before Loop Unrolling H1 H2 L1 L2 After Loop Unrolling 1.01V 1.00V 0.99V 0.98V 0.97V 25 of 26 Summary • Symbiotic program optimization is a powerful approach • The di/dt problem – well suited for a symbiotic solution • The Tortola design can also target power reduction, temperature reduction, reliability, etc. http://www.tortolaproject.com/ 26 of 26