An Improved Error Recovery Mechanism to the Razor Pipeline Architecture

Timothy Loo, Vincent Ng
University of California, Berkeley

1. Introduction

After decades of astonishing improvement in integrated circuit performance, digital circuits have reached a point where serious problems lie ahead. Two of the main problems are power consumption and process variation. For the past few decades, circuit designs have followed Moore's law, and the number of transistors on a chip has doubled every two years. As we fit more transistors into a given area and clock them faster and faster, power density increases exponentially. The most effective way to reduce power density is to minimize the supply voltage, as predicted by the CV²f dynamic-power relation. So far we have been successful in containing power density within tolerable levels, but this will not last. One barrier comes from the threshold voltage: to maintain the same performance, the threshold voltage must be reduced together with the supply voltage. However, reducing the threshold voltage leads to an exponential increase in off-state leakage current. Leakage has become so significant that further reduction of the threshold and supply voltages has slowed or even stopped. Without voltage scaling, the power density of a chip will grow without bound. Processor temperatures already approach that of a hot plate, and this trend cannot be allowed to continue. If a solution to this ever-increasing power consumption cannot be found, Moore's law will come to an end and we will no longer experience the tremendous performance gains of past decades.

The other problem facing the IC industry is process variation. As transistor sizes approach atomic dimensions, it is very difficult to fabricate exactly what we specify. For example, a variation in dopant implantation on the order of a few atoms may translate into a large difference in dopant concentration and cause a noticeable shift in the threshold voltage. Because traditional designs dictate that a circuit must always function correctly under all circumstances, the large process variations present today force designers to allocate extra voltage margin on top of the typical case to ensure proper operation. To make things worse, temperature, die-to-die, and IR-drop variations further increase the safety margins needed. This general practice of conservative overdesign has become a barrier to low-power design. Because these large margins exist only to prevent rare worst-case failures, a large amount of energy could be saved if the margins were eliminated in favor of error-resilient logic that can dynamically tune itself to all kinds of variation; we could then run a chip at the lowest possible energy consumption. Power consumption and variation have become two of the most important roadblocks to extending Moore's law, and these problems must be resolved before we can continue to improve the performance of electronics at the pace enjoyed in past decades.

2. Previous work

One of the more popular schemes for reducing power consumption in recent years is dynamic voltage scaling (DVS). Most electronic devices are seldom used at maximum capacity; the majority of the time, the processor is idle. Instead of clocking the processor at its maximum speed and creating lots of idle time, we can reduce the clock speed. This loosens the timing constraints and allows us to lower the supply voltage.
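To give a rough feel for the payoff of scaling both frequency and voltage, the short sketch below evaluates the CV²f relation at two operating points. The capacitance, voltage, and frequency values are hypothetical numbers chosen only for illustration; they are not taken from any cited design.

    # Hypothetical numbers for illustration only; not taken from any cited design.
    def dynamic_power(c_eff, vdd, freq):
        """Switching power P = C_eff * Vdd^2 * f (activity factor folded into C_eff)."""
        return c_eff * vdd**2 * freq

    nominal = dynamic_power(c_eff=1.0e-9, vdd=1.2, freq=1.0e9)   # 1 nF effective, 1.2 V, 1 GHz
    scaled  = dynamic_power(c_eff=1.0e-9, vdd=0.9, freq=0.75e9)  # 25% slower clock, lower Vdd

    print(f"relative power: {scaled / nominal:.2f}")  # ~0.42 -> roughly a 58% dynamic-power saving

The point of the example is that a modest slowdown buys a quadratic voltage-driven reduction in dynamic power, which is exactly the opportunity DVS exploits.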
Ample research has been done on clock control methods, and one of the more popular approaches is to use a ring oscillator as the clock. The activity of the processor controls the supply voltage; as the supply voltage is reduced, the frequency of the ring oscillator automatically falls with it, and the DVS circuit is designed so that timing constraints are met at every supply voltage level. More sophisticated frequency control circuits include the multiple-tap resettable delay line [1] and the digital sliding controller [3]. Research has also been done on choosing the best voltage for a specific program. T. Pering et al. investigated an algorithm that uses a weighted average of previous activity to determine the best voltage level [4]. They find that this algorithm gives an energy saving that is far from the optimal case. Nonetheless, DVS has been shown to provide significant energy savings ranging from 20% to 83%, with an average saving of 43% [4]. DVS is a very efficient way to reduce power consumption and has been incorporated into nearly all energy-conscious products.

Even though DVS reduces energy consumption significantly, it does not address the problem of variations: the supply voltage must still be high enough to guarantee that the circuit always produces the correct result. Ernst et al. proposed the Razor topology to account for variations [2]. In Razor, the output of a pipeline stage is latched again by a shadow latch after a preset delay. The shadow latch is timed so that its value is guaranteed to be correct under all possible operating conditions. This value is then compared with the one in the main latch to determine whether a timing error has occurred. If an error has occurred, the value in the shadow latch is fed back into the pipeline to continue the computation. Forward progress is guaranteed, at a slight timing penalty; as long as the error rate is small, the hit on performance is minimal. Because we no longer need to set the supply voltage at a level that guarantees correct operation at all times, we can reduce the supply voltage and enjoy significantly lower power consumption. Figure 1 of [2] shows that if we allow a 1.3% error rate, we can reduce power consumption by 22% and 35% compared to a zero-margin voltage level and an environmental-margin voltage level, respectively.
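To make the detect-and-correct mechanism concrete, the following is a minimal cycle-level behavioral sketch of a Razor-style register in Python. It is an illustration only, not the circuit from [2]; the class and method names are ours, and details such as metastability and the short-path constraint are ignored here.

    # Minimal behavioral sketch of Razor-style timing-error detection (illustrative only).
    class RazorRegister:
        def __init__(self):
            self.main = None      # speculative value captured at the normal clock edge
            self.shadow = None    # trusted value captured by the delayed shadow latch
            self.error = False

        def clock_edge(self, stage_output):
            # The main flip-flop may capture a value that has not finished settling.
            self.main = stage_output

        def shadow_edge(self, settled_output):
            # The shadow latch fires after the worst-case stage delay, so its value is trusted.
            self.shadow = settled_output
            self.error = (self.shadow != self.main)
            if self.error:
                # Recovery: the trusted shadow value replaces the speculative one
                # and is fed back into the pipeline.
                self.main = self.shadow
            return self.error

    # Example: the main latch captures a value before the logic settles.
    r = RazorRegister()
    r.clock_edge(0b1010)              # speculative (wrong) value
    print(r.shadow_edge(0b1110))      # True: mismatch detected, r.main corrected to 0b1110

In the real design the comparison and restart are of course done in hardware within the pipeline itself; the sketch only captures the compare-and-replace behavior described above.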
Despite these promising results, a few complications of Razor logic need to be addressed. One problem is the short-path constraint: if a pipeline stage is too fast, the shadow latch will capture the result of the next cycle instead of the cycle it intends to latch. This is usually not a problem, because Razor latches are inserted only where the timing constraint can fail; however, because the delay is data dependent, delay elements must be inserted where necessary to guarantee that the short-path constraint is always satisfied.

The other issue is that the error signal itself may be meta-stable. This can happen because the error signal depends on the output of the main latch, which may have latched while its input was still transitioning. The Razor paper [2] also proposed a way to account for meta-stability in both the error signal and the output of the main latch. In the Razor design [2], the whole pipeline has to be flushed and rerun at a safe voltage if meta-stability occurs in the error signal. Flushing the entire pipeline is necessary because detecting meta-stability of the error signal takes two clock cycles, by which time the correct value stored in the shadow latch will have been lost. This can be very expensive in both power and performance, but fortunately it rarely occurs. A simple circuit already checks whether the output of the main latch is meta-stable, and the probability that this circuit fails is very small [2]. The additional circuit that checks the error signal for meta-stability is itself very small and poses a negligible burden on power and performance; it does not hurt to be safe [2]. The last complication is that the inputs to the write-back stage must be guaranteed correct; otherwise memory or registers could be overwritten with corrupt data before any error is detected. The Razor paper [2] proposes a dummy stage before the write-back stage to account for this. This does not hurt performance but requires a non-negligible amount of hardware and power.

The remarkable property of Razor is that it can dynamically tune the supply voltage to run the circuit at the lowest possible level regardless of any global or local variation. Given the processor activity and the observed error rate, Razor can use activity to control the frequency and the error rate to control the supply voltage. DVS alone can reduce power consumption by more than 46% [4], and Razor can reduce it by an additional 40% [2]. With both methods, power consumption can be cut significantly without compromising performance.

The Razor research group [2] designed a Razor processor and ran several benchmark programs on it; the results are shown in Table 1. They show that using Razor reduces energy consumption by an average of 42.4% with a performance hit of less than 1%. They also show that the additional buffers and latches required in a Razor processor increase power consumption by only 3% during error-free operation, and that the error-correction overhead is 1% for a 10% error rate.

Recently, the Razor research group [5] proposed two more sophisticated voltage control mechanisms for the Razor design. The first is to use a local voltage level for each pipeline stage instead of a single global supply voltage. This can provide an additional 15% energy reduction compared to the original global-supply Razor design described above. However, local supply voltages complicate the design significantly, so the paper focused instead on dynamic retiming. The authors observed that the EX and ID stages account for more than 90% of the errors in the global-supply Razor design [5]; consequently, they dynamically skew the clock so that these stages receive more time than the others. The retiming controller uses an inverter delay line to dynamically skew the boundaries between pipeline stages so that each stage sees a similar error rate. The additional hardware this approach requires consumes less than 0.1% of the total chip energy while providing a 12% additional energy reduction compared to the original Razor design.
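The error-rate-driven voltage control described above can be sketched in a few lines of Python. The sketch below is a hypothetical software model, not the hardware controller from [2] or [5]; the target error rate, step size, and voltage bounds are illustrative assumptions.

    # Hypothetical model of error-rate-driven supply-voltage control (all constants are assumptions).
    TARGET_ERROR_RATE = 0.013          # e.g. the ~1.3% operating point cited from Figure 1 of [2]
    VDD_STEP = 0.01                    # volts adjusted per control interval
    VDD_MIN, VDD_MAX = 0.6, 1.2        # arbitrary bounds for the sketch

    def adjust_vdd(vdd, razor_errors, cycles):
        """Lower Vdd while the observed Razor error rate stays below target; raise it otherwise."""
        error_rate = razor_errors / max(cycles, 1)
        if error_rate < TARGET_ERROR_RATE:
            vdd -= VDD_STEP            # margin to spare: trade it for energy savings
        else:
            vdd += VDD_STEP            # too many timing errors: back off toward safety
        return min(max(vdd, VDD_MIN), VDD_MAX)

    # Example: adjust_vdd(1.0, razor_errors=5, cycles=1000) -> 0.99 (0.5% error rate is below target)

The design choice this mirrors is that Razor tolerates a small, nonzero error rate as the signal that the voltage margin has been squeezed out.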
3. Project Approach

In this project, we propose to extend the Razor topology to alleviate one of its limitations. In addition to the short-path and meta-stability issues mentioned above, Razor also suffers from a limitation in its error-handling architecture. If an error is detected by the shadow latches in a given stage, the Razor system halts the previous stages through clock gating and inserts a bubble into the next stage. It then instructs the current and previous stages to restart using the correct data stored in the shadow latches. The insertion of a bubble effectively cuts the data throughput in half. Other recovery approaches, such as counterflow pipelining, fare even worse because they require that the stages behind the errant stage be flushed. We propose to modify the Razor architecture so that error recovery does not require a bubble insertion and thus does not hurt the pipeline's throughput.

To accomplish this, our proposal uses the retiming and slack-passing concepts discussed in various papers and books [5, 6, 7]. In the original Razor error recovery design, a bubble is inserted because an error in logic stage x is detected only after half a cycle has passed. By the time the error is detected, there is not enough time remaining for the correct data to enter stage x + 1 and still finish before the data is latched. Consequently, the data in stage x + 1 must be thrown away with a bubble and the correct data entered in the next clock cycle. This problem can be solved, however, if the correct data is given extra time in stage x + 1 to compute successfully. Slack-passing concepts suggest that a stage can be given more time by borrowing slack from other stages [6], [7]. Instead of using latches, clock retiming, as used in the second Razor paper [5], can create this slack by dynamically delaying the clock signal to the flip-flop at the end of stage x + 1 so that it locks one half clock cycle later than normal. With the extra time, the correct data can progress normally through the pipeline without a bubble insertion. To make up the borrowed time, a dummy stage can be added after this stage or at the end of the pipeline to provide the extra slack. An extra dummy pipeline stage is acceptable because it affects only latency, not throughput. Moreover, the Razor design already uses a dummy stage at the end of the pipeline to handle meta-stability issues; consequently, no extra stage needs to be added to the pipeline. Although the approach does not require additional stages in the pipeline itself, duplicate stages operating in parallel with the pipeline are necessary. Because the time spent in stage x + 1 is 1.5 times longer than normal, the data in the previous stage needs to be bypassed to a duplicate stage until it can be merged back into the main pipeline. The amount of duplication required depends on how far the dummy slack stage is from the errant stage.
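As a rough sanity check of this timing argument, the sketch below works through the slack-borrowing arithmetic with a normalized clock period. The half-cycle detection latency and all function and variable names are assumptions of the sketch, not part of the Razor design itself.

    # Normalized timing sketch of the proposed bubble-free recovery (clock period T = 1.0).
    T = 1.0
    DETECT_LATENCY = 0.5 * T    # the error is flagged half a cycle after stage x's output latches

    def recovery_schedule(error_latch_time):
        """Return key event times once a bad result is latched at the stage x / x+1 boundary."""
        correct_data_ready = error_latch_time + DETECT_LATENCY   # shadow value replaces the bad data
        normal_capture     = error_latch_time + T                # when stage x+1 would normally lock
        retimed_capture    = normal_capture + 0.5 * T            # end-of-stage flip-flop delayed by T/2
        return {
            "correct_data_ready": correct_data_ready,
            "retimed_capture": retimed_capture,
            "slack_borrowed": retimed_capture - normal_capture,            # repaid by the dummy stage
            "time_given_to_correct_data": retimed_capture - correct_data_ready,
        }

    # With an error latched at time 1.0 (as in the Table 2 walkthrough below), the corrected data
    # still receives a full clock period in stage x+1 at the cost of 0.5*T of borrowed slack.
    print(recovery_schedule(error_latch_time=1.0))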
Table 2 shows the general timing flow of the proposed design. The table displays the data occupying each stage at any given time; it is assumed that each piece of data has a propagation delay of one clock cycle per stage, excluding the dummy stage. At time 0, the pipeline is full. At time 0.5, data 2 is incorrectly computed in stage EX and is passed on to the MEM stage at time 1.0. The shadow latches detect this error and replace the data at time 1.5. The flip-flop at the end of stage MEM is then delayed by half a clock cycle so that it locks at time 2.5. Because data 2 still occupies stage MEM at time 2.0, data 3 is shifted into a duplicate stage (MEM2). At time 2.5, data 2 is finished and is passed on to the dummy stage. At time 3.0, data 3 from the MEM2 stage is inserted into the dummy stage, completing the recovery process.

The goal of our project is to implement the Razor pipeline architecture, with and without our error-handling design, in a pipelined Wallace multiplier circuit. We will then explore various aspects of the design, including performance and correctness, power and area consumption, and robustness to multiple errors. We will verify the correctness and viability of our concept, design, and implementation; analyze the power and area cost of the error-correction design to determine whether the performance gain is justified; and finally examine whether the design can handle multiple errors that occur simultaneously in different logic stages or sequentially in the same logic stage.

References

[1] S. Dhar, D. Maksimovic, and B. Kranzen, "Closed-Loop Adaptive Voltage Scaling Controller for Standard-Cell ASICs," Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), August 2002.
[2] D. Ernst, N. Kim, S. Das, S. Pant, R. Rao, T. Pham, C. Ziesler, D. Blaauw, T. Austin, K. Flautner, and T. Mudge, "Razor: A Low-Power Pipeline Based on Circuit-Level Timing Speculation," MICRO-36, December 2003.
[3] J. Kim and M. Horowitz, "An Efficient Digital Sliding Controller for Adaptive Power Supply Regulation," IEEE Symposium on VLSI Circuits, June 2001, pp. 133-136.
[4] T. Pering, T. Burd, and R. Brodersen, "The Simulation and Evaluation of Dynamic Voltage Scaling Algorithms," Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), June 1998, pp. 76-81.
[5] S. Lee, S. Das, T. Pham, T. Austin, D. Blaauw, and T. Mudge, "Reducing Pipeline Energy Demands with Local DVS and Dynamic Retiming," Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), 2004, pp. 319-324.
[6] S. Krishnamohan and N. Mahapatra, "Increasing the Energy Efficiency of Pipelined Circuits via Slack Redistribution," GLSVLSI '05, April 2005.
[7] J. Rabaey, A. Chandrakasan, and B. Nikolic, Digital Integrated Circuits, Pearson Education International, New Jersey, 2003.

    Time   IF   ID   EX   MEM   Dummy   MEM2
    0.0     4    3    2    1     0      --
    0.5     4    3    2*   1     0      --
    1.0     5    4    3    2*    1      --
    1.5     5    4    3    2     1      --
    2.0     6    5    4    2     --     3
    2.5     6    5    4    --    2      3
    3.0     7    6    5    4     3      --
    3.5     7    6    5    4     3      --
    4.0     8    7    6    5     4      --
    4.5     8    7    6    5     4      --

Table 2: Timing diagram of the proposed Razor error recovery scheme. The number in each entry identifies the data occupying each stage at the given time (in half-clock-cycle increments). 2* denotes an incorrectly computed piece of data. Dashes (--) indicate unimportant data.