Power-Performance Considerations of Parallel Computing on Chip Multiprocessors
Jian Li and Jose F. Martinez
ACM Transactions on Architecture and Code Optimization, Vol. 2, No. 4, December 2005, Pages 397-422.
Presented by: Manal Houri
March 23, 2006
Computer Science and Engineering
Motivation



Low-power computing has long been an important design objective
for mobile, battery-operated devices.
More recently, however, power consumption in high-performance
microprocessors has drawn considerable attention from industry
and researchers.
Traditionally, power dissipation in CMOS technology has been
significantly lower than in other technologies.


However, at current speeds and feature sizes, CMOS power
consumption has increased dramatically.
This makes microprocessor cooling increasingly difficult and expensive.
CSE 8383 Advanced Computer Architecture
Direction




As a result, over the last few years, power has become a first-priority
concern for microprocessor designers and manufacturers.
Industry and researchers are eyeing chip multiprocessor
architectures (CMPs), which can attain higher performance by
running multiple threads in parallel.
By integrating multiple cores on a chip, designers hope to deliver
performance growth while depending less on raw circuit speed
(and therefore on power).
So far, however, very little work has been done on the
power-performance issues involving parallel applications executing on
multiprocessors in general, and on multicore chips in particular.
Problem




In a parallel run, processors synchronize and exchange data as they
cooperate toward a common goal.
Synchronization and communication constitute overheads that
typically grow in importance as we increase the number of
processors.
This generally results in decreased parallel efficiency (speedup
divided by the number of processors used).
Because an application’s parallel efficiency changes with the number
of processors, it is not obvious which voltage and frequency levels
should be applied, in combination with the appropriate number of
processors, to optimize a certain power-performance trade-off and/or
meet a particular constraint.
Objective of this paper

Investigate the power-performance issues of running parallel
applications on a CMP:
- Develop an analytical model to study the effect of:
  - the number of processors used,
  - the parallel efficiency,
  - and the voltage/frequency scaling applied,
  on the performance and power consumption delivered by a CMP.
- More specifically:
  - optimizing power consumption given a performance target,
  - optimizing performance given a certain power budget.
- Then, conduct detailed simulations of parallel applications running
  on a power-performance model of a CMP.
Analytical Study




Power Equations
Performance Equations
Scenario I: Power Optimization
Scenario II: Performance Optimization
Analytical Study – Power Equations


This equation establishes the relationship between the
supply voltage V and maximum operating frequency fmax:

$f_{\max} = \eta \, \frac{(V - V_{th})^{\alpha}}{V}$

Where:
- fmax: maximum operating frequency
- η, α: experimentally derived constants
- V: supply voltage
- Vth: threshold voltage
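To get a feel for this relationship, the following minimal Python sketch evaluates the equation at a few supply voltages. The constants used here (η = 1.0, α = 1.3, Vth = 0.3 V) and the 1.2 V nominal voltage are illustrative assumptions, not values taken from the paper.

```python
def f_max(v, v_th=0.3, eta=1.0, alpha=1.3):
    """Maximum operating frequency f_max = eta * (V - V_th)**alpha / V.

    eta, alpha and v_th are made-up illustrative constants; the result
    is in arbitrary units.
    """
    if v <= v_th:
        return 0.0  # below the threshold voltage the model gives no usable frequency
    return eta * (v - v_th) ** alpha / v


V_NOMINAL = 1.2  # assumed nominal supply voltage (volts)
f_nominal = f_max(V_NOMINAL)

# Show how the attainable frequency shrinks as the supply voltage is lowered.
for v in (1.2, 1.0, 0.8, 0.6):
    print(f"V = {v:.1f} V  ->  f_max / f_nominal = {f_max(v) / f_nominal:.2f}")
```

With these assumed constants the attainable frequency falls faster than the voltage as V approaches Vth, which is why voltage scaling cannot be pushed arbitrarily far.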
Analytical Study – Power Equations


This equation defines power consumption P as the sum
of dynamic and static components:

$P = P_D + P_S$, with $P_D = A C V^{2} f$

The static component PS is the product of the supply voltage V and a
leakage-current term.

Where:
- PD: dynamic component
- PS: static component
- A: gate activity factor
- C: total capacitance
- V: supply voltage
- f: operating frequency
- Ileak: leakage current at nominal supply voltage Vn and room
  temperature Tstd (25°C).

A curve-fitted factor in the leakage term captures its dependence
on supply voltage and temperature.
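As a rough illustration of the dynamic term only, the sketch below compares the dynamic power of one core at two operating points. The activity factor, capacitance, voltages, and frequencies are hypothetical numbers chosen for the example, and the static (leakage) component is deliberately left out.

```python
def dynamic_power(activity, capacitance, v, f):
    """Dynamic power P_D = A * C * V**2 * f (static power is NOT modeled here)."""
    return activity * capacitance * v ** 2 * f


# Hypothetical operating points: full throttle vs. a scaled-down setting.
A = 0.5       # assumed gate activity factor
C = 1.0e-9    # assumed total switched capacitance (farads)

p_full = dynamic_power(A, C, v=1.2, f=3.0e9)
p_scaled = dynamic_power(A, C, v=0.9, f=1.5e9)

ratio = p_scaled / p_full
print(f"scaled / full dynamic power = {ratio:.3f}")
```

Halving the frequency alone would halve the dynamic power; lowering the voltage at the same time adds a quadratic saving on top, which is the effect the analytical model exploits.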
Analytical Study – Performance Equations

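Execution time formula as proposed by Hennessy and
Patterson:

$t = \mathit{IC} \cdot \mathit{CPI} \cdot f^{-1}$

- IC: dynamic instruction count
- CPI: average number of cycles per instruction

Parallel efficiency of an application running on N
processors:

$\varepsilon(N) = \frac{t_1}{N \, t_N} = \frac{\mathit{IC}_1 \, \mathit{CPI}_1 \, f_1^{-1}}{N \, \mathit{IC}_N \, \mathit{CPI}_N \, f_N^{-1}}$

Nominal parallel efficiency:

$\varepsilon_n(N) = \frac{\mathit{IC}_1 \, \mathit{CPI}_1}{N \, \mathit{IC}_N \, \mathit{CPI}_N}$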
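The two efficiency definitions can be checked numerically with a small Python sketch. The instruction counts, CPI values, and frequencies below are invented for illustration, and IC_N is assumed here to be the per-processor instruction count of the parallel run.

```python
def parallel_efficiency(ic1, cpi1, f1, n, ic_n, cpi_n, f_n):
    """Parallel efficiency t_1 / (N * t_N), with t = IC * CPI / f."""
    t1 = ic1 * cpi1 / f1
    t_n = ic_n * cpi_n / f_n
    return t1 / (n * t_n)


def nominal_parallel_efficiency(ic1, cpi1, n, ic_n, cpi_n):
    """Nominal parallel efficiency: the same ratio with the frequencies left out."""
    return (ic1 * cpi1) / (n * ic_n * cpi_n)


# Hypothetical workload: 4 processors, with some extra instructions and
# a slightly worse CPI per processor due to synchronization and communication.
IC1, CPI1 = 1.0e9, 1.0
N, IC_N, CPI_N = 4, 0.28e9, 1.1
F1 = 3.0e9

eps_n = nominal_parallel_efficiency(IC1, CPI1, N, IC_N, CPI_N)
eps_same_f = parallel_efficiency(IC1, CPI1, F1, N, IC_N, CPI_N, F1)
eps_half_f = parallel_efficiency(IC1, CPI1, F1, N, IC_N, CPI_N, F1 / 2)

print(f"nominal efficiency            = {eps_n:.3f}")
print(f"efficiency at f_N = f_1       = {eps_same_f:.3f}")
print(f"efficiency at f_N = f_1 / 2   = {eps_half_f:.3f}")
```

When all cores run at the single-core frequency the two definitions coincide; scaling the parallel run's frequency changes the measured efficiency but leaves the nominal one untouched.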
Analytical Study - Scenario I: Power
Optimization

Goal:
- Find the configuration that maximizes power savings
  while delivering a pre-specified level of performance;
  in particular, deliver the performance of a sequential
  execution on one processor at full throttle.
- In other words, $t_1 = t_N$:

  $\mathit{IC}_1 \, \mathit{CPI}_1 \, f_1^{-1} = \mathit{IC}_N \, \mathit{CPI}_N \, f_N^{-1}$

- Using the definition of nominal parallel efficiency, we
  rewrite the above as:

  $f_N = \frac{f_1}{N \, \varepsilon_n(N)}$
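A quick numeric check of this result, as a minimal Python sketch. The 3.2 GHz single-core frequency and the efficiency values are assumptions made up for the example.

```python
def target_frequency(f1, n, eps_n):
    """Per-core frequency f_N = f_1 / (N * eps_n(N)) that makes t_N equal t_1."""
    return f1 / (n * eps_n)


F1 = 3.2e9  # assumed single-core full-throttle frequency (Hz)

f_ideal = target_frequency(F1, n=4, eps_n=1.0)  # perfect nominal efficiency
f_real = target_frequency(F1, n=4, eps_n=0.8)   # assumed 80% nominal efficiency

print(f"f_N with eps_n = 1.0: {f_ideal / 1e9:.2f} GHz")
print(f"f_N with eps_n = 0.8: {f_real / 1e9:.2f} GHz")
```

The lower the nominal parallel efficiency, the faster each core must run to match the sequential execution time, which leaves less room for voltage scaling.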
Analytical Study - Scenario I: Power
Optimization

For an N-processor configuration, performance can be expressed as:
[equation not preserved]
If we define the voltage scaling ratio, the above can be rewritten as:
[equation not preserved]
Analytical Study - Scenario I: Power
Optimization



We look at two process technologies: 130 nm and 65 nm.
The operating temperature of the single-core configuration is set at
100°C.
Using a 32-way CMP baseline, configurations running
on different numbers of processor cores are studied.
- Assumption: unused cores are shut down.
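The Scenario I reasoning can be sketched end to end in Python under strong simplifications: only dynamic power is modeled (leakage is ignored), each core runs at the lowest voltage that still reaches the target frequency, and the constants η, α, Vth, and the 1.2 V nominal voltage are hypothetical. The function names are illustrative, not from the paper.

```python
def f_max(v, v_th, eta, alpha):
    """Alpha-power law: f_max = eta * (V - V_th)**alpha / V."""
    return eta * (v - v_th) ** alpha / v if v > v_th else 0.0


def min_voltage_for(f_target, v_nominal, v_th, eta, alpha, iters=60):
    """Bisection for the lowest V in (V_th, V_nominal] with f_max(V) >= f_target."""
    lo, hi = v_th, v_nominal
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f_max(mid, v_th, eta, alpha) >= f_target:
            hi = mid
        else:
            lo = mid
    return hi


def dynamic_power_ratio(n, eps_n, v_nominal=1.2, v_th=0.3, eta=1.0, alpha=1.3):
    """Dynamic power of N scaled cores relative to one core at full throttle,
    at equal execution time. Static power is ignored (assumption)."""
    if n * eps_n < 1.0:
        raise ValueError("N cores cannot match one full-throttle core here")
    f1 = f_max(v_nominal, v_th, eta, alpha)
    f_n = f1 / (n * eps_n)  # frequency needed so that t_N == t_1
    v_n = min_voltage_for(f_n, v_nominal, v_th, eta, alpha)
    # N cores, each at (V_N, f_N); the A*C factor cancels in the ratio.
    return n * v_n ** 2 * f_n / (v_nominal ** 2 * f1)


good = dynamic_power_ratio(n=4, eps_n=0.8)  # assumed high efficiency
poor = dynamic_power_ratio(n=4, eps_n=0.3)  # assumed low efficiency
print(f"4 cores, eps_n = 0.8: {good:.2f}x the single-core dynamic power")
print(f"4 cores, eps_n = 0.3: {poor:.2f}x the single-core dynamic power")
```

Under these assumptions a parallel run with high nominal efficiency uses less dynamic power than the single core, while a poorly scaling one can use more, which is why the optimum number of cores depends on the application.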
Analytical Study - Scenario I: Power
Optimization
Figure: 130 nm technology
Analytical Study - Scenario I: Power
Optimization
Figure: 65 nm technology
Scenario I: Power Optimization –
Experimental Evaluation
Analytical Study - Scenario II: Performance
Optimization

Goal:
- Find the configuration that maximizes performance
  under a constrained power budget.
- The maximum power budget is set to that of
  executing on one processor at full throttle: $P_1 = P_N$
Analytical Study - Scenario II: Performance
Optimization

The performance gain, or speedup S, on N processors
can be expressed as:
[equation not preserved]
Using the definition of the voltage ratio and the equation for
frequency, we get:
[equation not preserved]
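One form of the speedup follows directly from the execution-time formula $t = \mathit{IC} \cdot \mathit{CPI} \cdot f^{-1}$ and the definition of nominal parallel efficiency $\varepsilon_n(N) = \mathit{IC}_1 \mathit{CPI}_1 / (N \, \mathit{IC}_N \mathit{CPI}_N)$. This is a derivation from those definitions and may not match the exact form shown on the original slide:

```latex
\begin{align*}
S &= \frac{t_1}{t_N}
   = \frac{\mathit{IC}_1 \, \mathit{CPI}_1 \, f_1^{-1}}
          {\mathit{IC}_N \, \mathit{CPI}_N \, f_N^{-1}} \\
  &= \frac{\mathit{IC}_1 \, \mathit{CPI}_1}{\mathit{IC}_N \, \mathit{CPI}_N}
     \cdot \frac{f_N}{f_1} \\
  &= N \, \varepsilon_n(N) \, \frac{f_N}{f_1}
\end{align*}
```

As a sanity check, setting $f_N = f_1$ gives $S = N \, \varepsilon_n(N)$, the usual speedup at nominal frequency, and setting $S = 1$ gives $f_N = f_1 / (N \, \varepsilon_n(N))$.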
Analytical Study - Scenario II: Performance
Optimization

To compute the remaining unknown, we use the power constraint $P_1 = P_N$:
[equation not preserved; surviving terms: f, $V_N^2$]
With that value obtained, we can solve for S.
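The procedure can be illustrated with a small Python sketch under simplifying assumptions: only dynamic power counts toward the budget (leakage is ignored), every core runs at the maximum frequency its voltage allows, and the constants η, α, Vth, and the 1.2 V nominal voltage are hypothetical. It solves the power constraint for the per-core voltage by bisection and then evaluates S = N εn(N) fN / f1.

```python
V_TH, ETA, ALPHA = 0.3, 1.0, 1.3  # assumed illustrative constants


def f_max(v):
    """Alpha-power law: f_max = eta * (V - V_th)**alpha / V."""
    return ETA * (v - V_TH) ** ALPHA / v if v > V_TH else 0.0


def speedup_under_budget(n, eps_n, v_nominal=1.2, iters=60):
    """Speedup of N cores whose total dynamic power equals that of one
    core at full throttle. Static power is ignored (assumption)."""
    # Dynamic power is proportional to V**2 * f; the A*C factor cancels.
    budget = v_nominal ** 2 * f_max(v_nominal)
    lo, hi = V_TH, v_nominal
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if n * mid ** 2 * f_max(mid) <= budget:
            lo = mid  # still within budget: try a higher voltage
        else:
            hi = mid  # over budget: back off
    v_n = lo
    return n * eps_n * f_max(v_n) / f_max(v_nominal)


s_good = speedup_under_budget(n=4, eps_n=0.8)  # assumed high efficiency
s_poor = speedup_under_budget(n=4, eps_n=0.3)  # assumed low efficiency
print(f"4 cores, eps_n = 0.8: speedup = {s_good:.2f}")
print(f"4 cores, eps_n = 0.3: speedup = {s_poor:.2f}")
```

With these assumed numbers, four efficient cores outperform the single core within the same dynamic power budget, whereas four poorly scaling cores end up slower than the single core.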
Scenario II: Performance Optimization –
Experimental Evaluation
Concluding remarks


This study shows that, under the right circumstances, parallel
computing may bring significant power-performance benefits
over a uniprocessor setup of similar performance.
However, it also illustrates the dependency of the optimum
operating point on multiple interacting factors, such as:
- the application’s parallel efficiency,
- the chip’s voltage/frequency scaling characteristics,
- the process technology,
- restrictions on performance, power, and, sometimes, the
  number of available processors.
These factors can interact in non-obvious ways, making
power-performance optimization of on-chip parallel computation
quite challenging.