Survey
* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
Power-Performance Considerations of Parallel Computing on Chip Multiprocessors Jian Li and Jose F. Martinez ACM Transactions on Architecture and Code Optimization, Vol. 2, No. 4, December 2005, Pages 397-422. Presented by: Manal Houri March 23, 2006 Computer Science and Engineering Motivation Low-power computing has long been an important design objective for mobile, battery-operated devices. More recently, however, power consumption in high-performance microprocessors has drawn considerable attention from industry and researchers. Traditionally, power dissipation in CMOS technology has been significantly lower than other technologies. However, at current speeds and feature sizes, CMOS power consumption has increased dramatically. This makes microprocessor cooling increasingly difficult and expensive. Computer Science and Engineering CSE 8383 Advanced Computer Architecture Direction As a result, over the last few years, power has become a firstpriority concern to microprocessor designers/manufacturers. Industry and researchers are eyeing chip multiprocessor architectures (CMPs), which can attain higher performance by running multiple threads in parallel. By integrating multiple cores on a chip, designers hope to deliver performance growth while depending less on raw circuit speed (power that is). So far, however, very little work has been done on the powerperformance issues involving parallel applications executing on multiprocessors, in general and on multicore chips, in particular. Computer Science and Engineering CSE 8383 Advanced Computer Architecture Problem In a parallel run, processors synchronize and exchange data as they cooperate toward a common goal. Synchronization and communication constitute overheads that typically grow in importance as we increase the number of processors. This generally results in decreased parallel efficiency—speedup over number of processors used. In an application’s changing parallel efficiency, it is not obvious which voltage and frequency levels should be applied, in combination with the appropriate number of processors, to optimize a certain power-performance trade-off and/or meet a particular constraint. Computer Science and Engineering CSE 8383 Advanced Computer Architecture Objective of this paper Investigate the power-performance issues of running parallel applications on a CMP: Develop an analytical model to study the effect of: the number of processors used, the parallel efficiency, and the voltage/frequency scaling applied, on the performance and power consumption delivered by a CMP. More specifically: Optimizing power consumption given a performance target, Optimizing performance given a certain power budget, Then, Provide detailed simulations of parallel applications running on a power-performance model of a CMP are conducted. Computer Science and Engineering CSE 8383 Advanced Computer Architecture Analytical Study Power Equations Performance Equations Scenario I: Power Optimization Scenario II: Performance Optimization Computer Science and Engineering CSE 8383 Advanced Computer Architecture Analytical Study Power Equations Performance Equations Scenario I: Power Optimization Scenario II: Performance Optimization Computer Science and Engineering CSE 8383 Advanced Computer Architecture Analytical Study – Power Equations This equation establishes the relationship between the supply voltage V and maximum operating frequency fmax. fmax = η (V − Vth)α/ V Where: fmax: maximum operating frequency η, α: experimentally derived constants V: supply voltage Vth: threshold voltage Computer Science and Engineering CSE 8383 Advanced Computer Architecture Analytical Study – Power Equations This equation defines power consumption P as the sum of dynamic and static components. P = PD + PS = AC V2 f + V Where: PD : dynamic component PS : static component A: gate activity factor C: total capacitance V: supply voltage f: operating frequency Ileak: leakage current at nominal supply voltage Vn and room temperature Tstd (25°C). is the abbreviation of dependency of this curve-fitted formula on supply voltage and temperature. Computer Science and Engineering CSE 8383 Advanced Computer Architecture Analytical Study Power Equations Performance Equations Scenario I: Power Optimization Scenario II: Performance Optimization Computer Science and Engineering CSE 8383 Advanced Computer Architecture Analytical Study – Performance Equations Execution time formula as proposed by Hennessy and Patterson: -1 t = IC CPI f IC: dynamic instruction count CPI: average number of cycles per instruction Parallel efficiency of an application running on N processors: Parallel efficiency Nominal parallel efficiency t1 (N ) Nt N (N ) IC1CPI1 f11 NICN CPI N f N1 IC1CPI1 NICN CPI N Computer Science and Engineering CSE 8383 Advanced Computer Architecture Analytical Study Power Equations Performance Equations Scenario I: Power Optimization Scenario II: Performance Optimization Computer Science and Engineering CSE 8383 Advanced Computer Architecture Analytical Study - Scenario I: Power Optimization Goal Find the configuration that maximizes power savings while delivering a pre-specified level of performance, in particular, deliver the performance of a sequential execution on one processor at full throttle. In other words, t1 = tN -1 = IC CPI f -1 IC CPI f N N N Using the definition of nominal parallel efficiency, we rewrite the above as: f1 fN N n ( N ) Computer Science and Engineering CSE 8383 Advanced Computer Architecture Analytical Study - Scenario I: Power Optimization Performance can be expressed for N-processor configuration as: If we define the voltage scaling ratio above can be rewritten as: the Computer Science and Engineering CSE 8383 Advanced Computer Architecture Analytical Study - Scenario I: Power Optimization We look at 2 process technologies: 130 nm and 65 nm. The operating temperature of the single-core is set at 100°C. Using a 32-way CMP baseline, configurations running on different number of processor cores are studies. Assumption: unused cores are shut down. Computer Science and Engineering CSE 8383 Advanced Computer Architecture Analytical Study - Scenario I: Power Optimization 130 nm technology Computer Science and Engineering CSE 8383 Advanced Computer Architecture Analytical Study - Scenario I: Power Optimization 65 nm technology Computer Science and Engineering CSE 8383 Advanced Computer Architecture Scenario I: Power Optimization – Experimental Evaluation Computer Science and Engineering CSE 8383 Advanced Computer Architecture Analytical Study Power Equations Performance Equations Scenario I: Power Optimization Scenario II: Performance Optimization Computer Science and Engineering CSE 8383 Advanced Computer Architecture Analytical Study - Scenario II: Performance Optimization Goal: Find the configuration that maximizes performance under a constrained power budget. The maximum power budget is set to that of executing on one processor at full throttle, P1 = PN Computer Science and Engineering CSE 8383 Advanced Computer Architecture Analytical Study - Scenario II: Performance Optimization The performance gain or speedup S on N processors can be expressed as: Using the definition of voltage ratio get: and equation of frequency, we Computer Science and Engineering CSE 8383 Advanced Computer Architecture Analytical Study - Scenario II: Performance Optimization In order to compute , we use P1 = PN: f V N2 Now we can solve for S having obtained . Computer Science and Engineering CSE 8383 Advanced Computer Architecture Analytical Study - Scenario II: Performance Optimization Computer Science and Engineering CSE 8383 Advanced Computer Architecture Scenario II: Performance Optimization – Experimental Evaluation Computer Science and Engineering CSE 8383 Advanced Computer Architecture Concluding remarks This study shows that, under the right circumstances, parallel computing may bring significant power-performance benefits over a uniprocessor setup of similar performance. However, it also illustrates the dependency of the optimum operating point on multiple interacting factors, such as: the application’s parallel efficiency, the chip’s voltage/frequency scaling characteristics, the process technology, and restrictions in performance, power, and, sometimes, number of available processors. This study shows that these factors can interact in a nonobvious way, making power-performance optimization of on-chip parallel computation quite challenging. Computer Science and Engineering CSE 8383 Advanced Computer Architecture