An Improved Error Recovery Mechanism to the Razor Pipeline Architecture
Timothy Loo Vincent Ng
University of California, Berkeley
1. Introduction
After decades of astonishing improvement
in integrated circuit performance, digital circuits
have come to a point in which there are many
problems ahead of us. Two main problems are
power consumption and process variations.
In the past few decades, circuit designs have
followed Moore’s law and the number of
transistors on a chip has doubled every two
years. As we fit more transistors into a given
area and clocked them faster and faster, power
density increases exponentially. The most
effective way to reduce power density is to
minimize the supply voltage, as predicted by the
dynamic power expression CV²f. Currently, we have been successful in
containing the power density within tolerable
levels, but this will not last. One barrier comes
from the threshold voltage. In order to maintain
the same performance, we have to reduce the
threshold voltage together with the supply
voltage. However, reducing threshold voltage
leads to an exponential increase in off-state
leakage current. Leakage current has become so
significant that a further reduction in threshold
voltage and supply voltage has slowed or even
stopped. Without voltage scaling, the power
density of a chip will increase without bound. The
temperature of a processor already approaches
that of a hot plate, and this trend cannot be
allowed to continue. If a solution to this ever
increasing power consumption cannot be found,
Moore’s law will come to an end and we will no
longer be able to experience the tremendous
increase in performance experienced in the past.
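As a rough numerical illustration of why voltage scaling is so effective, the dynamic power relation above can be sketched in a few lines; the capacitance, voltage, and frequency values below are hypothetical, chosen only to show the quadratic scaling:

```python
# Illustrative sketch: dynamic switching power follows P = a * C * V^2 * f,
# so lowering the supply voltage gives a quadratic power reduction.
def dynamic_power(c_load, v_dd, freq, activity=1.0):
    """Dynamic switching power in watts: P = a * C * V^2 * f."""
    return activity * c_load * v_dd**2 * freq

# Hypothetical numbers for illustration only: 1 nF switched capacitance, 1 GHz.
p_nominal = dynamic_power(c_load=1e-9, v_dd=1.2, freq=1e9)
p_scaled = dynamic_power(c_load=1e-9, v_dd=0.9, freq=1e9)  # same chip at 0.9 V
print(f"power at 1.2 V: {p_nominal:.3f} W")
print(f"power at 0.9 V: {p_scaled:.3f} W "
      f"({100 * (1 - p_scaled / p_nominal):.0f}% lower)")
```

Even this modest 25% voltage reduction cuts dynamic power by roughly 44%, which is why supply-voltage margins are such an attractive target.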
The other problem of the IC industry is
process variations. As transistor sizes approach
atomic levels, it is very difficult to fabricate
exactly what we specify. For example, a
variation in dopant implantation on the order of a
few atoms may translate to a huge difference in
dopant concentration and may cause a
noticeable shift in the threshold voltage. Because
traditional designs dictate that our circuit must
always function correctly in all circumstances,
the huge process variations present today force
designers to allocate extra voltage margin on top
of the typical case in order to ensure proper
operation. To make things worse, temperature,
die-to-die, and IR-drop variations
further increase the safety margins needed. The
general practice of conservative overdesign has
become a barrier to low-power design. Because
these large margins are used only to prevent the
rare worst-case scenario from failing, a large
amount of energy can be saved if these margins
are eliminated in favor of error-resilient logic that can dynamically tune itself
for all kinds of variations. We will then be able
to run our chip at the lowest possible energy
consumption.
Power consumption and variations have
become two of the most important roadblocks
for extending Moore’s law and these problems
must be resolved before we can continue to
improve the performance of electronics at the
amazing pace we enjoyed in the past decades.
2. Previous work
One of the more popular schemes to reduce
power consumption in recent years is dynamic
voltage scaling (DVS). Most electronic devices
are seldom used at maximum capacity; the
majority of the time, the processor is idle.
Instead of clocking the processor at its
maximum speed and creating lots of idle time,
we can reduce the clock speed. This will loosen
the timing constraints and allow us to lower the
supply voltage. Ample research has been done
on clock control methods, and one of the more
popular methods is to use a ring oscillator as the
clock. The activity of the processor will control
the supply voltage; as supply voltage is reduced,
the frequency of the ring oscillator will
automatically reduce as well. DVS circuits are
designed such that timing constraints are met at
all supply voltage levels. More sophisticated
frequency control circuits include the multiple-tap
resettable delay line [1] and the digital
sliding controller [3]. Research has also been
done on choosing the best voltage given a
specific program. T. Pering et al.
investigated an algorithm that uses a weighted
average of the previous activities to determine
the best voltage level [4]. They find that this
algorithm gives an energy saving that is far from
the optimal case. Nonetheless, DVS has been
shown to provide significant energy savings
ranging from 20% to 83%, with an average
energy saving of 43% [4]. DVS is a very
efficient way to reduce power consumption and
has been incorporated into nearly all energy-conscious products.
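The weighted-average voltage selection described above can be sketched as follows; the weight, the operating points, and the mapping from predicted utilization to an operating point are illustrative assumptions, not the actual constants from [4]:

```python
# Sketch of a DVS policy in the spirit of [4]: predict near-term utilization as an
# exponentially weighted average of past interval utilizations, then pick the
# lowest frequency/voltage pair that still meets the predicted demand.
# All constants here are illustrative assumptions, not values from the paper.

# (frequency in MHz, supply voltage in volts) pairs -- hypothetical silicon.
OPERATING_POINTS = [(100, 0.8), (200, 1.0), (300, 1.1), (400, 1.2)]

def predict_utilization(history, weight=0.5):
    """Exponentially weighted average of past utilizations (newest last)."""
    estimate = 0.0
    for u in history:
        estimate = weight * estimate + (1 - weight) * u
    return estimate

def choose_operating_point(history, f_max=400):
    """Pick the lowest operating point that covers the predicted demand."""
    demand = predict_utilization(history) * f_max  # required MHz
    for freq, vdd in OPERATING_POINTS:
        if freq >= demand:
            return freq, vdd
    return OPERATING_POINTS[-1]

# Utilization has been falling, so a low-voltage point suffices.
print(choose_operating_point([0.9, 0.6, 0.4, 0.3]))
```

The design choice here mirrors the text: the controller trades a small risk of under-provisioning (mispredicted demand) for the quadratic power savings of a lower supply voltage.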
Even though DVS reduces
energy consumption significantly, it does not
address the problem of variations. The supply
voltage has to be high enough to guarantee that
the circuit is always giving the correct result.
Ernst et al. proposed the Razor topology that can
account for variations [2]. In Razor, the output
of a pipeline stage is latched again by a shadow
latch after a preset delay. The shadow
latch is timed to guarantee that its value is
correct at all possible operating conditions. This
value is then compared with the one in the main
latch to determine whether a timing error has
occurred. If an error has occurred, the value in
the shadow latch is fed back into the pipeline to
continue the calculations. Forward progress is
guaranteed but at a slight timing penalty. As
long as the error rate is small, the hit on
performance is minimal. Because we no longer
need to set the supply voltage at a level that
guarantees correct operation at all times, we can
reduce the supply voltage and enjoy
significantly lower power consumption. Figure
1 [2] shows that if we allow a 1.3% error rate, we
can reduce power consumption by 22% and 35%
when compared to a zero-margin voltage level
and an environmental-margin voltage level,
respectively.
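A minimal behavioral sketch of Razor's double-sampling idea may make the mechanism concrete; this is an assumed timing model for illustration, not the circuit from [2]:

```python
# Behavioral sketch (assumed model) of Razor's double sampling: the main
# flip-flop samples at the clock edge, while the shadow latch samples the same
# signal after a delay long enough for the combinational logic to have settled.
def razor_sample(signal_at, t_edge, t_shadow_delay):
    """signal_at(t) models the combinational output's value at time t."""
    main = signal_at(t_edge)                     # may catch a still-settling value
    shadow = signal_at(t_edge + t_shadow_delay)  # guaranteed-settled value
    error = main != shadow                       # mismatch flags a timing error
    return (shadow if error else main), error

# Hypothetical slow path: the correct value 1 only settles at t = 1.2 ns,
# after the clock edge at t = 1.0 ns.
slow_path = lambda t: 1 if t >= 1.2 else 0

value, err = razor_sample(slow_path, t_edge=1.0, t_shadow_delay=0.5)
print(value, err)  # the shadow latch recovers the late-arriving correct value
```

When no timing error occurs, both samples agree and execution proceeds at full speed; the error path is exercised only for the rare slow computations, which is what makes aggressive voltage scaling safe.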
Despite the promising results of Razor logic,
there are a few complications
that need to be addressed. One
problem is the short path constraint. If the
pipeline stage is too fast, the shadow latch will
latch onto the result of the next cycle instead of
the cycle it intends to latch. This is usually not
a problem because Razor latches are only
inserted into places where the timing constraint
can fail. However, because the delay is data
dependent, we have to put in delays where
necessary to guarantee that the short path
constraint is always satisfied. The other issue is
the possibility that the error signal itself may be
meta-stable. This can happen because the error
signal depends on the output of the main latch,
which could have latched while the result is
transitioning. The Razor paper [2] also proposed
a way to account for meta-stability in both the
error signal and the output of the main latch.
According to the Razor design [2], the whole
pipeline has to be flushed and rerun at a safe
voltage if meta-stability occurs in the error
signal. Flushing the entire pipeline is
necessary because detecting meta-stability in
the error signal requires two clock cycles, by
which time the correct value stored in the
shadow latch will have been lost. This could be
very expensive in both power and performance,
but fortunately it rarely occurs. There is already
a simple circuit that checks whether the output
of the main latch is meta-stable, and the
possibility that this circuit fails is very small
[2]. Nonetheless, the additional circuitry to
check for meta-stability is very small and poses
a negligible burden on power and performance;
it does not hurt to be safe [2]. The last
complication comes from the fact that we need
to guarantee that the inputs to the write-back
stage are correct; otherwise we might overwrite
memory or registers with corrupt data before we
are able to detect any errors. The paper [2]
proposes to put a dummy stage before the
write-back stage to account for this. This does
not hurt performance but requires a
non-negligible amount of hardware and power.
The amazing thing about Razor is that it can
dynamically tune the supply voltage to run the
circuit at the lowest supply voltage possible
regardless of any global or local variations.
Given the processor activity and the error rate,
Razor can use activity to control the frequency
and use error rate to control the supply voltage.
DVS alone can reduce power consumption by
more than 46% [4] and Razor can reduce an
additional 40% [2]. With both methods, we can
reduce power consumption significantly without
compromising performance.
The Razor research group [2] designed a
razor processor and ran several different
benchmark programs on it. The results are
shown in Table 1. They show that, most of the
time, by using Razor, energy consumption can
be reduced by an average of 42.4% with a
performance hit of less than 1%. They also show
that all the additional buffers and latches
required in a razor processor only increase the
power consumption by 3% during error free
operation and the error correction overhead is
1% for a 10% error rate.
Recently, the Razor research group [5] has
proposed two more sophisticated voltage control
mechanisms for the Razor design. The first
proposal is to use local voltage level for each
pipeline stage instead of a global supply voltage
for all stages. This can provide an additional
15% energy reduction compared to the original
global supply voltage razor design mentioned
above. However, local supply voltage
complicates the design significantly and thus the
paper instead focused on dynamic retiming.
They noticed that the EX stage and the ID stage
create more than 90% of the errors in the
global-supply Razor design [5]; consequently,
they dynamically skewed the clock such that
these stages receive more time than the other
stages. The retiming controller makes use of an
inverter delay line to dynamically skew the
boundaries between pipeline stages such that
each stage has similar error rates. The additional
hardware this approach requires takes up less
than 0.1% of the total chip energy consumption
while providing a 12% additional energy
reduction compared to the original razor design.
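The retiming controller's goal of equalizing per-stage error rates can be sketched with a simple control law; the control law, the gain, and the stage error rates below are our own illustrative assumptions, not the design in [5]:

```python
# Sketch of the dynamic-retiming idea in [5]: shift the clock-boundary skew so
# that stages with high error rates borrow time from stages with low error
# rates. The proportional control law and gain here are illustrative
# assumptions, not the actual controller from the paper.
def retime_step(error_rates, skews, gain=0.1):
    """One adjustment step toward equal per-stage error rates.

    skews[i] is extra time (in inverter-delay units) granted to stage i;
    a negative skew means the stage donates time to its neighbors."""
    mean = sum(error_rates) / len(error_rates)
    return [s + gain * (e - mean) for s, e in zip(skews, error_rates)]

# EX and ID dominate the errors (as reported in [5]); the rates are made up.
rates = {"IF": 0.01, "ID": 0.40, "EX": 0.55, "MEM": 0.04}
skews = retime_step(list(rates.values()), [0.0] * 4)
print(dict(zip(rates, (round(s, 3) for s in skews))))
```

Note that the skews sum to zero: time is only redistributed between stages, so the overall clock period, and hence throughput, is unchanged.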
3. Project Approach
In our project, we propose to expand upon
the Razor topology in order to alleviate one of
its limitations. In addition to the short-path
and meta-stability issues mentioned above,
Razor also suffers from a limitation in its error
handling architecture. If an error is detected by
the shadow latches in a certain stage, the Razor
system halts the previous stages through clock
gating and inserts a bubble into the next stage.
Then it instructs the current and previous stages
to restart using the correct data stored in the
shadow latches. The insertion of a bubble
effectively cuts the data throughput by half.
Other recovery approaches, such as counterflow
pipelining, fare even worse because they require
that the stages behind the errant stage be
flushed.
We propose to modify the Razor
architecture such that error recovery does not
necessitate a bubble insertion and thus does not
hurt the pipeline’s throughput. To accomplish
this task, our proposal utilizes retiming and slack
passing concepts discussed in various papers and
books [5, 6, and 7]. In the original Razor error
recovery design, a bubble is inserted because an
error in logic stage x is detected after a half
cycle has passed. By the time the error is
detected, there is not enough time remaining for
the correct data to be entered into stage x + 1
and still finish before the data is latched.
Consequently, the data in stage x + 1 must
be thrown away with a bubble and the correct
data entered in the next clock cycle.
However, this problem can be solved if the
correct data is given extra time in stage x + 1 to
successfully compute. Slack-passing concepts
suggest that stages can be given more time to
process by borrowing slack from other stages [6],
[7]. Instead of using latches, though, clock
retiming can be used, as mentioned in the
second Razor paper [5], to create slack by
dynamically delaying the clock signal to the flip-flop
at the end of stage x + 1 so that it locks one half
clock cycle later than normal. Thus, with the
extra time, the correct data can now progress
normally through the pipeline without a bubble
insertion. In order to make up this borrowed
time, a dummy stage can be added after this
stage or at the end of the pipeline in order to
provide the extra slack. An extra dummy
pipeline stage is suitable because it does not
affect throughput, only latency. Moreover, the
Razor design already uses a dummy stage at the
end of the pipeline to handle meta-stability
issues; consequently, no extra stage will be
added to the pipeline.
Although the current approach does not
require additional stages in the pipeline itself,
duplicate stages operating in parallel to the pipeline
are necessary. Because the time spent in stage x
+ 1 is 1.5 times longer than normal, the data in
the previous stage needs to be bypassed to a
duplicate pipeline until the data can be merged
back into the main pipeline. The amount of
duplication required depends on how far away
the dummy slack stage is from the errant stage.
Table 2 shows the general timing flow of the
proposed design. The table displays the data in
each stage at any given clock cycle. It is
assumed that the data has a propagation delay of
one clock cycle in each stage, excluding the
dummy stage. At time 0, the pipeline is filled.
At time 0.5 in stage EX, data 2 is incorrectly
computed and passed on to the MEM stage at
time 1.0. The shadow latches detects this error
and replaces the data at time 1.5. The flip flop
of stage MEM is then delayed by half a clock
cycle so that it locks at time 2.0. Because data 2
is occupying stage MEM in time 2.0, data 3 is
shifted into a duplicate stage. At time 2.5, data 2
is finished and is passed onto the dummy stage.
At time 3.0, data 3 from the MEM2 stage is
inserted into the dummy stage, completing
the recovery process.
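The recovery timeline narrated above can be summarized by a small helper that computes the key event times, following the flow of Table 2; the function and its event names are our own illustration, not part of the Razor design:

```python
# Key events (in clock cycles) of one bubble-free recovery, following the flow
# of Table 2. The event names and this helper are illustrative only.
def recovery_timeline(t_error, period=1.0):
    """Map an error detected at t_error to the recovery events that follow."""
    h = period / 2
    return {
        "error_computed":  t_error,          # stage x produces a wrong value
        "shadow_replaces": t_error + 2 * h,  # shadow latch overwrites stage x+1
        "delayed_latch":   t_error + 3 * h,  # stage x+1 latch pushed back half a cycle
        "detour_to_dup":   t_error + 3 * h,  # displaced data enters duplicate stage
        "merge_at_dummy":  t_error + 5 * h,  # detoured data rejoins at dummy stage
    }

# Error in EX at time 0.5, as in Table 2: data 2 is repaired at 1.5, the MEM
# latch locks at 2.0, data 3 detours to MEM2, and everything merges by 3.0.
print(recovery_timeline(0.5))
```

The half-cycle arithmetic makes the cost explicit: one borrowed half cycle of slack plus one duplicate stage, rather than a bubble that would halve throughput.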
The goal of our project is to implement the
Razor pipeline architecture with and without our
error handling design in a pipelined Wallace
multiplier circuit. We will then explore various
aspects of this design including performance and
correctness issues, power and area consumption,
and robustness of our design to multiple errors.
We will verify the correctness and viability of
our concept, design, and implementation. We
will analyze the power and space usage of the
error correction design to determine if the boost
in performance can be justified. Finally, we will
explore our design to determine if it can handle
multiple errors that may occur simultaneously in
different logic stages or sequentially in the same
logic stages.
References
[1] S. Dhar, D. Maksimovic, and B. Kranzen,
"Closed-Loop Adaptive Voltage Scaling
Controller for Standard-Cell ASICs," Int'l
Symposium on Low Power Electronics and
Design (ISLPED 2002), August 2002.
[2] D. Ernst, N. Kim, S. Das, S. Pant, R. Rao,
T. Pham, C. Ziesler, D. Blaauw, T. Austin, K.
Flautner, and T. Mudge, "Razor: A Low-Power
Pipeline Based on Circuit-Level Timing
Speculation," MICRO-36, December 2003.
[3] J. Kim and M. Horowitz, "An Efficient
Digital Sliding Controller for Adaptive Power
Supply Regulation," IEEE Symposium on VLSI
Circuits, June 2001, pp. 133-136.
[4] T. Pering, T. Burd, and R. Brodersen, "The
Simulation and Evaluation of Dynamic Voltage
Scaling Algorithms," Proceedings of the Int'l
Symposium on Low Power Electronics and
Design, pp. 76-81, June 1998.
[5] S. Lee, S. Das, T. Pham, T. Austin, D. Blaauw,
and T. Mudge, "Reducing Pipeline Energy
Demands with Local DVS and Dynamic
Retiming," Proceedings of the Int'l Symposium on
Low Power Electronics and Design, 2004, pp.
319-214.
[6] S. Krishnamohan and N. Mahapatra,
"Increasing the Energy Efficiency of Pipelined
Circuits via Slack Redistribution," GLSVLSI '05,
April 2005.
[7] J. Rabaey, A. Chandrakasan, and B. Nikolic,
Digital Integrated Circuits, Pearson
Education International, New Jersey, 2003.
Time   IF   ID   EX   MEM   Dummy   MEM2
0.0    4    3    2    1     0       --
0.5    4    3    2*   1     0       --
1.0    5    4    3    2*    1       --
1.5    5    4    3    2     1       --
2.0    6    5    4    2     --      3
2.5    6    5    4    --    2       3
3.0    7    6    5    4     3       --
3.5    7    6    5    4     3       --
4.0    8    7    6    5     4       --
4.5    8    7    6    5     4       --
Table 2: Timing diagram of the proposed Razor
error recovery scheme. The number in each entry
identifies the data in each stage at any given time
(incremented in half clock cycles). 2* denotes an
incorrectly computed piece of data. Dashes (--)
indicate unimportant data.