A Summary of the Paper: Energy Dissipation in
General Purpose Microprocessors
Chirag Sharma

Abstract—A critical summary of the paper [1] discussing
energy dissipation in general purpose microprocessors is
presented. The paper explores methods of reducing energy
consumption that do not lead to performance loss and explores
methods to reduce delay by exploiting instruction level
parallelism. First, an introduction and a summary of the contents
of the paper are presented. Next, the methods used and the results
obtained in the paper are described. This is followed by
the conclusion and significance of the paper. Finally, follow-on
research that could be performed in the area of the paper and
the paper’s relation to my field of study are discussed.
I. INTRODUCTION
The interest in lowering the power of a processor has
grown dramatically in the recent past. This is partly
because of the high power levels of today’s state-of-the-art
processors, as well as the growing market for portable
computation devices. For processors, power and performance
are strongly correlated: low power invariably means lower
performance. The paper shows that the reason for the slow
progress in processors is that the energy-delay product of a
processor is roughly set by the energy-delay product of the
underlying technology.
II. SUMMARY
The authors start the discussion with the need for a
suitable metric of energy efficiency with which to compare the
available processors. Power is not a good metric since it is
proportional to the clock frequency. Simply reducing the
clock speed reduces the power dissipated, but it
also reduces the performance of the microprocessor. Energy
consumed in a microprocessor is proportional to CV², where C is
the switched capacitance and V is the supply voltage. Hence it
can be reduced by reducing the supply voltage or by decreasing
the capacitance with smaller transistors. Both of these
changes increase the delay of the circuits, so energy is also not
a good metric.
The best metric according to the authors is the product of
energy and delay (in Joules/SPEC or its inverse SPEC²/W) of
a microprocessor, since all processors are close to each other in
terms of energy-delay product. An effective way to reduce the
energy-delay product is to shrink the technology. The
improvement due to technology shrinking would be
approximately constant for all processors, so it is ignored in
the paper.
The paper looks at a lower bound on the energy and energy-delay
product for a processor by investigating a number of
ideal machines. Finally, the paper looks at two processors
designed by the authors, a RISC machine and a superscalar
processor called TORCH [2]. These real machines are
optimized and then compared against the ideal ones.
III. METHODS AND RESULTS
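As background for the comparisons that follow, the argument from Section II about power, energy, and energy-delay product can be checked numerically. The sketch below is only an illustration: the `OperatingPoint` class, the capacitance, voltage, and frequency values, and the one-operation-per-cycle simplification are all assumptions made here, not quantities from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    """One processor operating point; every value used below is hypothetical."""
    capacitance: float  # switched capacitance per operation, in farads
    voltage: float      # supply voltage, in volts
    frequency: float    # operations per second (one per cycle assumed)

    @property
    def energy_per_op(self) -> float:
        # Dynamic switching energy, taken as exactly C * V^2 in this sketch.
        return self.capacitance * self.voltage ** 2

    @property
    def delay_per_op(self) -> float:
        return 1.0 / self.frequency

    @property
    def power(self) -> float:
        # Power is energy per operation times operations per second.
        return self.energy_per_op * self.frequency

    @property
    def energy_delay(self) -> float:
        return self.energy_per_op * self.delay_per_op


base = OperatingPoint(capacitance=1e-9, voltage=3.3, frequency=100e6)
half_clock = OperatingPoint(capacitance=1e-9, voltage=3.3, frequency=50e6)

for name, point in (("base", base), ("half clock", half_clock)):
    print(f"{name:>10}: power={point.power:.4f} W, "
          f"energy/op={point.energy_per_op:.3e} J, "
          f"energy*delay={point.energy_delay:.3e} J*s")
```

Halving the clock halves the power while leaving the energy per operation unchanged and doubling the energy-delay product, which is why power alone can make a slower design look better. Voltage scaling is not modelled here, since it would need a delay-versus-voltage model that the summary does not give.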
For investigating the lower bound on the energy and energy-delay
product, three idealized machines were considered: an
unpipelined processor, a simple pipelined RISC processor, and
a superscalar processor. All processors basically perform the
same operations. They fetch instructions from a cache, use that
information to determine which operands to fetch from a
register file, then either operate on this data or use it to
generate an address to fetch from the data cache, and finally
store the result. In a real processor, there is much more
overhead. Since the overhead depends on the implementation, it is
neglected in the paper. The authors assume that the energy cost of
performing computation is zero and that communication costs
within the datapath are zero. They consider only the energy
needed to read and write memories and, where required, the
energy to clock storage elements, such as pipeline latches. The
authors also assume that their ideal machines never need
speculative operations.
For the simulations, the unpipelined processor consisted of a
simple state machine that fetches an instruction, fetches the
operands, performs the operation, and stores the result. The
authors assume that no energy is dissipated in clocking, as the
processor is not pipelined. For the pipelined machine, the authors
use the traditional MIPS or DLX [3] five-stage pipeline. In
addition to the energy required to read and write memories,
this machine is charged the energy required to clock the latches
in the pipeline and the Program Counter (PC) chain. The ideal
superscalar processor is similar to the pipelined machine, except
that it can execute a maximum of two instructions per cycle. This
machine is an idealization of the TORCH machine [2].
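The energy accounting described above for the ideal machines can be sketched as follows. This is a minimal illustration, not the authors' simulator: the per-access energies, the cycles-per-instruction figures, the clocking energies, and the assumption that every instruction reads two operands and writes one result are all hypothetical, so the printed numbers are not expected to match the paper's results.

```python
from dataclasses import dataclass

# Hypothetical per-access energies in arbitrary units; none come from the paper.
E_ICACHE_READ = 1.0    # fetch one instruction from the instruction cache
E_REGFILE_READ = 0.2   # read one operand from the register file
E_REGFILE_WRITE = 0.2  # write one result to the register file
E_DCACHE_ACCESS = 1.0  # one load or store to the data cache


@dataclass(frozen=True)
class IdealMachine:
    name: str
    cycles_per_instruction: float  # hypothetical average; sets the delay
    clock_energy: float            # latch and PC-chain clocking per instruction

    def energy_per_instruction(self, memory_op_fraction: float) -> float:
        # Computation and datapath communication are charged zero energy,
        # following the idealization described in the text. Assumed here:
        # two operand reads and one result write per instruction.
        storage = (E_ICACHE_READ + 2 * E_REGFILE_READ + E_REGFILE_WRITE
                   + memory_op_fraction * E_DCACHE_ACCESS)
        return storage + self.clock_energy

    def energy_delay(self, memory_op_fraction: float) -> float:
        return (self.energy_per_instruction(memory_op_fraction)
                * self.cycles_per_instruction)


machines = [
    IdealMachine("unpipelined", cycles_per_instruction=4.0, clock_energy=0.0),
    IdealMachine("pipelined", cycles_per_instruction=1.5, clock_energy=0.3),
    IdealMachine("superscalar", cycles_per_instruction=1.2, clock_energy=0.4),
]

for m in machines:
    print(f"{m.name:>12}: energy={m.energy_per_instruction(0.3):.2f}, "
          f"energy*delay={m.energy_delay(0.3):.2f}")
```

The structure mirrors the text: the unpipelined machine pays no clocking energy, while the pipelined and superscalar machines add a clocking term on top of the same memory and register-file accesses.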
Fig. 1. Normalized energy-delay product of ideal machines
Figure 1 shows the energy-delay product. Pipelining
provides a big boost in performance for very little energy
cost, so it gives almost a 2× improvement in energy-delay
product. Superscalar issue, on the other hand, gives only a
small improvement in energy-delay product.
As has been shown before in [4] and [5], some simple
optimizations can yield significant reductions in energy
dissipation. The simple RISC processor is similar to the original
MIPS R3000 described by Kane [6], except that it includes on-chip
caches. It also includes all the overhead associated with the
architecture. TORCH [2] is a statically scheduled two-way
superscalar processor. The authors used TORCH as an
example because of its smaller overhead compared to that of
an aggressive dynamically scheduled processor. Following are
a few techniques that the authors used to reduce waste in
their implementations.
Clock gating can be used to eliminate transitions that should
never have happened. In their implementation, the authors
qualified the latches in the datapath so that they are not clocked
when an instruction does not produce a result. The authors also
qualified all latches that are not in the main execution datapath.
In TORCH, densely coded No-Operations (NOPs) improve the code
density and reduce the number of instruction cache accesses.
The results obtained with these optimizations were
encouraging. Clock gating saved approximately one third (33%)
of the clock power, or close to 15% of the total
power. Other important optimizations were to
eliminate accesses to the caches and the register file when the
machine is stalled, and to eliminate accesses to the instruction
cache when an instruction is dynamically NOPed. These
saved approximately 8% of the total power.
IV. CONCLUSION AND SIGNIFICANCE OF PAPER
The authors show through simulation that it is possible to
reduce the energy requirements by careful design. The paper
shows that, using easy-to-implement optimizations, the energy
can be reduced by approximately 25%. Further improvements
would require much more sophisticated optimizations, since the
remaining energy is dissipated in many small units, none of
which accounts for a significant fraction of the total energy
dissipation.
The paper addresses the reduction of energy dissipation in general
purpose microprocessors, which is very important because power is
an important design goal for portable devices like laptops and
cell phones. Moreover, the slow improvement in battery technology
and the fact that air-cooling techniques are reaching their
limits stress the need for low energy
dissipation in microprocessors. The paper comes up with a
good metric for energy efficiency. It shows the advantages of
using clock gating to reduce the dynamic power dissipated on
clock transitions.
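The savings quoted in this summary can be cross-checked with a few lines of arithmetic. The sketch below uses only the approximate percentages stated in the text; the variable names are chosen here, and treating the two savings as independent and additive is an assumption made for illustration, not something the paper is known to state.

```python
# Figures quoted in the summary (all approximate).
CLOCK_POWER_FRACTION_SAVED = 1 / 3  # share of clock power removed by clock gating
CLOCK_GATING_TOTAL_SAVING = 0.15    # the same saving as a share of total power
STALL_AND_NOP_TOTAL_SAVING = 0.08   # skipped cache/register accesses, share of total

# If one third of the clock power equals 15% of the total power, the clock
# network would account for roughly this share of total power before gating.
implied_clock_share = CLOCK_GATING_TOTAL_SAVING / CLOCK_POWER_FRACTION_SAVED

# Assumption: the two savings are independent and simply add.
combined_saving = CLOCK_GATING_TOTAL_SAVING + STALL_AND_NOP_TOTAL_SAVING

print(f"implied clock share of total power: {implied_clock_share:.0%}")
print(f"combined saving of total power:     {combined_saving:.0%}")
```

Under these assumptions the clock would account for about 45% of total power, and the two optimizations together would save about 23%, which is in line with the "approximately 25%" figure quoted for easy-to-implement optimizations.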
V. FOLLOW-ON RESEARCH
The authors considered only dynamic power dissipation
and neglected static and leakage power. With
shrinking technologies, leakage power is becoming important,
as subthreshold currents grow exponentially with increasing
temperature and decreasing threshold voltage. Follow-on
research could include developing
simulation models that also take static power consumption
into account.
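The exponential sensitivity of subthreshold leakage mentioned above can be illustrated with a simplified textbook model. This is a sketch under stated assumptions, not a model from the paper: leakage is taken as proportional to exp(-Vth / (n·kT/q)), the prefactor and the temperature dependence of the threshold voltage are ignored, and the slope factor n = 1.5 and the voltages and temperatures used are hypothetical.

```python
import math

BOLTZMANN = 1.380649e-23             # J/K
ELEMENTARY_CHARGE = 1.602176634e-19  # C


def thermal_voltage(temperature_k: float) -> float:
    """Return kT/q in volts."""
    return BOLTZMANN * temperature_k / ELEMENTARY_CHARGE


def relative_subthreshold_leakage(threshold_v: float, temperature_k: float,
                                  slope_factor: float = 1.5) -> float:
    """Off-state leakage relative to an unspecified prefactor.

    Simplified model: I is proportional to exp(-Vth / (n * kT/q)).
    The prefactor and the temperature dependence of Vth are ignored.
    """
    return math.exp(-threshold_v
                    / (slope_factor * thermal_voltage(temperature_k)))


reference = relative_subthreshold_leakage(threshold_v=0.4, temperature_k=300.0)
lower_vth = relative_subthreshold_leakage(threshold_v=0.3, temperature_k=300.0)
hotter = relative_subthreshold_leakage(threshold_v=0.4, temperature_k=360.0)

print(f"leakage ratio, Vth 0.4 V -> 0.3 V at 300 K: {lower_vth / reference:.1f}x")
print(f"leakage ratio, 300 K -> 360 K at Vth 0.4 V: {hotter / reference:.1f}x")
```

Even in this reduced form, a modest drop in threshold voltage or rise in temperature multiplies the leakage several times over, which is why a simulation model that counts only dynamic power becomes less representative as technology shrinks.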
VI. RELATION TO FIELD OF STUDY
The paper being reviewed has only slight relevance to the
reviewer’s field of research. The reviewer is currently working
on implementing a mixed-signal integrated circuit to locate
faults in aircraft wiring. This implementation would be an
Application Specific Integrated Circuit (ASIC) and not a
microprocessor-based system. Being mixed-signal, it
would have some digital components, but since it
would operate at very low power, static power dissipation
would be as important as dynamic power dissipation.
REFERENCES
[1] R. Gonzalez and M. Horowitz, “Energy dissipation in
general purpose microprocessors,” IEEE J. Solid-State Circuits,
vol. 31, no. 9, pp. 1277-1284, Sep. 1996.
[2] M. D. Smith, “Support for speculative execution in
high-performance processors,” Ph.D. thesis, Stanford University,
Stanford, CA, Nov. 1992.
[3] J. L. Hennessy and D. A. Patterson, Computer Architecture: A
Quantitative Approach, 1st ed. Morgan Kaufmann, 1990.
[4] R. Bechade et al., “A 32 b 66 MHz 1.8 W microprocessor,” in
IEEE Int. Solid-State Circuits Conf., Feb. 1994, pp. 208-209.
[5] T. Biggs et al., “A 1 Watt 68040-compatible microprocessor,”
in Symp. Low Power Electron., IEEE Solid-State Circuits Council,
Oct. 1994, vol. 1, pp. 12-13.
[6] G. Kane, MIPS RISC Architecture. Englewood Cliffs, NJ:
Prentice Hall, 1988.