Survey
A Summary of the Paper: Energy Dissipation in General Purpose Microprocessors

Chirag Sharma

Abstract—A critical summary of the paper [1] discussing energy dissipation in general purpose microprocessors is presented. The paper explores methods of reducing energy consumption that do not lead to performance loss, as well as methods of reducing delay by exploiting instruction-level parallelism. First, an introduction and a summary of the contents of the paper are presented. Next, the methods used and the results obtained in the paper are described. This is followed by the conclusion and significance of the paper. Finally, the follow-on research that could be performed in the area of the paper and the paper’s relation to my field of study are presented.

I. INTRODUCTION

The interest in lowering the power of a processor has grown dramatically in the recent past. This is partly because of the high power levels of today’s state-of-the-art processors, and partly because of the growing market for portable computation devices. For processors, power and performance are strongly correlated: low power invariably means lower performance. The paper shows that progress in reducing processor power has been slow because the energy-delay product of a processor is roughly set by the energy-delay product of the underlying technology.

II. SUMMARY

The authors start the discussion with the need for a suitable metric of energy efficiency with which to compare the various processors available. Power is not a good metric since it is proportional to the clock frequency: simply reducing the clock speed reduces the power dissipated, but it also reduces the performance of the microprocessor. The energy consumed in a microprocessor is proportional to CV², where C is the switched capacitance and V is the supply voltage. Hence it can be reduced by lowering the supply voltage or by decreasing the capacitance through the use of smaller transistors. Both of these changes increase the delay of the circuits, so energy is not a good metric either. The best metric according to the authors is the product of energy and delay (in Joules/SPEC, or its inverse SPEC²/W) of a microprocessor, since all processors are close to each other in terms of energy-delay product.

An effective way to reduce the energy-delay product is to shrink the technology. The improvement due to technology shrinking would be approximately constant for all processors, so it is ignored in the paper. The paper looks at a lower bound on the energy and energy-delay product of a processor by investigating a number of ideal machines. Finally, the paper looks at two processors designed by the authors, a RISC machine and a superscalar processor called TORCH [2]. These real machines are optimized and then compared against the ideal ones.

III. METHODS AND RESULTS

For investigating the lower bound on the energy and energy-delay product, three idealized machines were considered: an unpipelined processor, a simple pipelined RISC processor, and a superscalar processor. All processors basically perform the same operations. They fetch instructions from a cache, use that information to determine which operands to fetch from a register file, then either operate on this data or use it to generate an address to fetch from the data cache, and finally store the result. In a real processor there is much more overhead. Since the overhead depends on the implementation, it is neglected in the paper. The authors assume that the energy cost of performing computation is zero and that communication costs within the datapath are zero. They consider only the energy needed to read and write memories and, where required, the energy to clock storage elements such as pipeline latches. The authors also assume that their ideal machines never need speculative operations.

For simulations, the unpipelined processor consisted of a simple state machine that fetches an instruction, fetches the operands, performs the operation, and stores the result.
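
The argument in Section II against using power alone as a metric can be made concrete with a small calculation. The sketch below assumes the standard dynamic-power relation P = C·V²·f and a task that needs a fixed number of clock cycles; the function name and all numeric values (capacitance, supply voltage, clock frequencies, cycle count) are hypothetical and are not taken from the paper.

```python
def task_metrics(c_eff, vdd, freq_hz, cycles):
    """Return power, delay, energy and energy-delay product for one task.

    Assumption: dynamic switching power is P = C * V^2 * f, and the task
    always takes the same number of clock cycles.
    """
    power_w = c_eff * vdd ** 2 * freq_hz  # dynamic power in watts
    delay_s = cycles / freq_hz            # time to finish the task in seconds
    energy_j = power_w * delay_s          # energy spent on the task in joules
    return {
        "power_w": power_w,
        "delay_s": delay_s,
        "energy_j": energy_j,
        "edp_js": energy_j * delay_s,     # energy-delay product in joule-seconds
    }


# Hypothetical values, chosen only for illustration.
C_EFF = 1e-9   # effective switched capacitance per cycle (farads)
VDD = 3.3      # supply voltage (volts)
CYCLES = 1e8   # clock cycles needed by the task

full = task_metrics(C_EFF, VDD, 100e6, CYCLES)  # 100 MHz clock
half = task_metrics(C_EFF, VDD, 50e6, CYCLES)   # same design at 50 MHz

for name, m in (("100 MHz", full), ("50 MHz", half)):
    print(
        f"{name}: P={m['power_w']:.3f} W, delay={m['delay_s']:.1f} s, "
        f"E={m['energy_j']:.3f} J, EDP={m['edp_js']:.3f} J*s"
    )
```

Under these assumptions, halving the clock frequency halves the power, leaves the energy per task unchanged, and doubles the energy-delay product. Power therefore rewards a slower clock, while a metric that includes delay does not.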
Because this processor is not pipelined, the authors assume that no energy is dissipated in clocking. For the pipelined machine, the authors use the traditional MIPS or DLX [3] five-stage pipeline. In addition to the energy required to read and write memories, this machine requires energy to clock the latches in the pipeline and the Program Counter (PC) chain. The ideal superscalar processor is similar to the pipelined machine, except that it can execute a maximum of two instructions per cycle. This machine is an idealization of the TORCH machine [2].

Fig. 1. Normalized energy-delay product of ideal machines

Figure 1 shows the energy-delay product. Pipelining provides a big boost in performance for very little energy cost, so it gives almost a 2× improvement in energy-delay product. Superscalar issue, on the other hand, gives only a small improvement in energy-delay product.

As has been shown before in [4] and [5], some simple optimizations can yield significant reductions in energy dissipation. The simple RISC processor is similar to the original MIPS R3000 described by Kane [6], except that it includes on-chip caches. It also includes all the overhead associated with the architecture. TORCH [2] is a statically scheduled two-way superscalar processor. The authors used TORCH as an example because of its smaller overhead compared to that of an aggressive dynamically scheduled processor.

The following are a few techniques that the authors used to reduce waste in the implementations. Clock gating can be used to eliminate transitions that should never have happened. In their implementation, the authors qualified the latches in the datapath so that they are not clocked when an instruction does not produce a result. In TORCH, densely coded No-Operations (NOPs) improve the code density and reduce the number of instruction cache accesses. The authors also qualified all latches that are not in the main execution datapath.

The results obtained from these optimizations were encouraging. Through the use of clock gating, approximately one third (33%) of the clock power, or close to 15% of the total power, was saved. Other important optimizations were to eliminate accesses to the caches and the register file when the machine is stalled, and to eliminate accesses to the instruction cache when an instruction is dynamically NOPed. By doing so, approximately 8% of the total power was saved.

IV. CONCLUSION AND SIGNIFICANCE OF PAPER

The authors show through simulation that it is possible to reduce the energy requirements by careful design. The paper shows that, using easy-to-implement optimizations, the energy can be reduced by approximately 25%. Further improvements would require much more sophisticated optimizations, since the remaining energy is dissipated in many small units, none of which accounts for a significant fraction of the total energy dissipation.

The paper addresses the reduction of energy dissipation in general purpose microprocessors, which is very important because power is a key design goal for portable devices like laptops and cell phones. Moreover, factors such as the slow improvement in battery technology and air-cooling techniques reaching their limits stress the need for low energy dissipation in microprocessors. The paper comes up with a good metric for energy efficiency. It also shows the advantages of using clock gating to reduce the dynamic power dissipated on clock transitions.

V. FOLLOW-ON RESEARCH

The authors considered dynamic power dissipation only and neglected static power and leakage power. With shrinking technologies, leakage power is becoming important, as subthreshold currents grow exponentially with increasing temperature and decreasing threshold voltage. Follow-on research should include developing simulation models that also take static power consumption into account.

VI. RELATION TO FIELD OF STUDY

The paper being reviewed has only slight relevance to the reviewer’s field of research.
The reviewer is currently working on implementing a mixed-signal integrated circuit to locate faults in aircraft wiring. This implementation would be an Application-Specific Integrated Circuit (ASIC) and not a microprocessor-based system. Being mixed-signal, it would have some digital components, but since it would operate at very low power, static power dissipation would be as important as dynamic power dissipation.

REFERENCES

[1] R. Gonzalez and M. Horowitz, “Energy dissipation in general purpose microprocessors,” IEEE J. Solid-State Circuits, vol. 31, no. 9, pp. 1277-1284.
[2] M. D. Smith, “Support for speculative execution in high-performance processors,” Ph.D. thesis, Stanford University, Stanford, CA, Nov. 1992.
[3] J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, 1st ed. Morgan Kaufmann, 1990.
[4] R. Bechade et al., “A 32 b 66 MHz 1.8 W microprocessor,” in IEEE Int. Solid-State Circuits Conf., Feb. 1994, pp. 208-209.
[5] T. Biggs et al., “A 1 Watt 68040-compatible microprocessor,” in Symp. Low Power Electr., IEEE Solid-State Circuits Council, Oct. 1994, vol. 1, pp. 12-13.
[6] G. Kane, MIPS RISC Architecture. Englewood Cliffs, NJ: Prentice Hall, 1988.