Team MUX DSP
Adam Burton Mark Colombo
David Moore Daniel Toler
ECE 3663 – Spring 2010
University of Virginia
{adb7d, mjc8m, dmm4ce, dct4w}@virginia.edu
ABSTRACT
In this paper, we describe the function of a DSP designed
according to the specifications provided by PICo.
1. INTRODUCTION
The Team MUX DSP is designed to exceed the goals outlined in
the PICo specifications. The ALU we designed includes all of the
functions requested by PICo and also implements a multiplier.
PICo provided a metric that weighs delay, area, and energy usage
in evaluating the DSP; we report our results according to this
metric and discuss how we optimized for it.
2. DESIGN DESCRIPTION
PICo specified four main blocks that would make up the DSP: two
input registers, an ALU, and an output register. In addition, control
input signals and a carry-out signal were available to operate the
ALU. Our DSP consists of the four main blocks specified by PICo;
in addition, we placed input buffers between the input registers and
the ALU to reduce the fanout seen by the registers and thus improve
latency.
The ALU has an eight-to-one multiplexer that uses the three-bit
control input to select the output of one of the eight functions the
ALU can perform. Five functional blocks are connected to the mux
to fulfill PICo’s requirements: a 16-bit OR gate, a 16-bit AND gate,
an adder/subtractor, a shifter, and a multiplier. The adder/subtractor
fulfills two functions and is therefore connected to two inputs of the
multiplexer; its operation is described below. The remaining two
inputs are used for the NOP and Pass A operations, which merely
require moving data from the output of one register to the input of
another.
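As a behavioral sketch (not the team's netlist), the Python model below computes all eight results in parallel, as the hardware blocks do, and lets the three-bit control code pick one using the encoding in Table 1. Two details are assumptions not stated in the paper: NOP is modeled as holding the previous output value (`prev_out` is a hypothetical parameter), and the shift amount is taken as 1 + B[1:0] with zeros shifted in, following the shifter description.

```python
MASK16 = 0xFFFF  # 16-bit datapath


def alu(control: int, a: int, b: int, prev_out: int = 0) -> int:
    """Behavioral model of the ALU's 8:1 output mux (Table 1 encoding).

    Every functional block computes its result; the control code then
    selects one, as the eight-to-one multiplexer does in hardware.
    Assumptions: NOP holds prev_out; shifts fill with zeros.
    """
    shift = 1 + (b & 0b11)             # default shift of 1, plus B[1:0]
    results = {
        0b000: prev_out,               # NOP (assumed: output unchanged)
        0b001: a,                      # Pass A
        0b010: (a + b) & MASK16,       # Add
        0b011: (a - b) & MASK16,       # Subtract, two's complement wrap
        0b100: (a << shift) & MASK16,  # Shift A left by 1 to 4 bits
        0b101: a | b,                  # OR
        0b110: a & b,                  # AND
        0b111: (a * b) & MASK16,       # Multiply, low 16 bits only
    }
    return results[control & 0b111]


print(hex(alu(0b010, 0x0001, 0x7FFF)))  # add: 0x8000
```

Computing every result and then selecting mirrors why the slowest block (here the adder/subtractor) sets the clock period even when a faster operation is selected.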
The 16-bit AND gate and the 16-bit OR gate each consist of an
array of 2-input AND/OR gates. Because these gates are so simple
and have no long paths (unlike the adder), we kept them at their
minimum size. We hoped this would minimize power and area
without affecting delay, which is determined by other, more
complicated components.
The adder/subtractor we used is a 16-bit ripple-carry mirror adder.
When the block receives a subtraction signal, the carry-in is one
and the B inputs are inverted. This is equivalent to converting B to
its negative using two’s complement, so subtraction is performed.
For addition, the carry-in is zero and both inputs are sent directly to
the adder. We chose to combine the adder and subtractor because
sharing one block significantly reduces the area and energy
components of the metric.
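A bit-level sketch of this scheme in Python: each bit slice is a full adder, each B bit is XORed with the subtraction signal (which inverts it when `sub = 1`), and `sub` also serves as the carry-in. The function names are illustrative; the model captures the logic, not the mirror-adder transistor structure.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum, carry-out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def add_sub16(a: int, b: int, sub: int) -> tuple[int, int]:
    """16-bit ripple-carry adder/subtractor.

    sub = 0: A + B with carry-in 0.
    sub = 1: A + (NOT B) + 1, i.e. A - B in two's complement.
    Returns (16-bit result, carry-out).
    """
    carry = sub                          # carry-in equals the subtract signal
    result = 0
    for i in range(16):                  # ripple from LSB to MSB
        ai = (a >> i) & 1
        bi = ((b >> i) & 1) ^ sub        # XOR inverts B only when subtracting
        s, carry = full_adder(ai, bi, carry)
        result |= s << i
    return result, carry


print(add_sub16(5, 3, sub=1))  # (2, 1): 5 - 3 = 2
```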
One matter of interest is the generation of the subtraction signal.
Because of the assignment of the control signals shown in Table 1,
it would have been possible to use the least significant control bit
directly as the subtraction signal, since Add (010) and Subtract (011)
differ only in that bit. Alternatively, combinational logic can generate
the subtraction signal only when 011 is presented as the control. We
chose the second option: although it requires slightly more area, we
believed it might reduce power in some cases by keeping the
adder/subtractor from switching between addition and subtraction
while neither operation was selected.
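The two options can be compared with a small Python sketch. Option 1 passes control bit c0 straight through; option 2 decodes sub = NOT c2 AND c1 AND c0, which is high only for 011. Counting toggles while the control steps through all eight codes in ascending order (an assumed ordering, loosely modeled on PICo's routine that cycles through every function) illustrates why the decoded signal switches the B-input inverters less often.

```python
def sub_from_lsb(control: int) -> int:
    """Option 1: use control bit c0 directly (high for 001, 011, 101, 111)."""
    return control & 1


def sub_decoded(control: int) -> int:
    """Option 2: sub = NOT c2 AND c1 AND c0, high only for 011."""
    c2, c1, c0 = (control >> 2) & 1, (control >> 1) & 1, control & 1
    return (1 - c2) & c1 & c0


def toggles(signal, codes) -> int:
    """Number of 0->1 or 1->0 transitions of signal over a control sequence."""
    values = [signal(c) for c in codes]
    return sum(x != y for x, y in zip(values, values[1:]))


sequence = list(range(8))  # assumed order: 000, 001, ..., 111
print(toggles(sub_from_lsb, sequence), toggles(sub_decoded, sequence))  # 7 2
```

Both options give the same value for Add and Subtract, so the choice affects only switching activity, not function.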
The shifter shifts the contents of register A by an amount
determined by the two least significant bits of register B. It consists
of 16 small shifter bit slices, each of which contains two stages of
pass-gate logic. Each stage forwards the signal to a different line
depending on whether a specific input bit is high or low: one input
bit selects an additional shift of 1, and the other selects an
additional shift of 2. These shifts are in addition to the default shift
of 1, allowing the value in register A to be shifted left by up to 4
bits. Inverters act as buffers between the stages. Although pass-gate
logic reduces the output swing, this is not a problem because the
inverters restore the signal to full swing. In addition, using pass
gates instead of transmission gates, which would still require some
kind of buffer, saves significant area.
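A behavioral sketch of the two-stage scheme: the default shift of 1 is modeled as a fixed wiring offset, and each later stage either passes the word through or moves it left by its fixed amount. Which bit of B drives which stage is not stated, so B[0] selecting the extra shift of 1 and B[1] selecting the extra shift of 2 is an assumption, as is zero fill on the right.

```python
MASK16 = 0xFFFF


def shift_stage(value: int, enable: int, amount: int) -> int:
    """One pass-gate stage: route every bit `amount` places left if enabled."""
    return (value << amount) & MASK16 if enable else value


def shifter16(a: int, b: int) -> int:
    """Left shift of A by 1 + B[1:0] (1 to 4 bits), zeros shifted in."""
    x = (a << 1) & MASK16                  # default shift of 1 (wiring)
    x = shift_stage(x, b & 1, 1)           # stage 1: extra shift of 1 if B[0]
    x = shift_stage(x, (b >> 1) & 1, 2)    # stage 2: extra shift of 2 if B[1]
    return x


print([shifter16(0x0001, b) for b in range(4)])  # [2, 4, 8, 16]
```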
Table 1. Control Inputs and DSP Outputs
Control   Output
000       NOP
001       Pass A
010       Add
011       Subtract
100       Shift A
101       OR
110       AND
111       Multiply
The multiplier accepts two 16-bit inputs. However, because our ALU
has only 16-bit outputs, the input values must be limited so that the
product is less than 2^15. The upside is that the DSP maintains
compatibility with 16-bit systems. The multiplier is an array type,
because the extra function does not factor into the metric and
because 16-bit words are relatively small, so the added complexity of
a faster multiplier is unnecessary.
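The sketch below models the array structure behaviorally: row i of the array is every bit of A ANDed with bit b_i, shifted left by i, and the rows are summed. It returns the low 16 bits that the ALU can actually output, plus a flag for operands that break the 2^15 limit. The function name and the flag are illustrative additions, not part of the design.

```python
def array_multiply16(a: int, b: int) -> tuple[int, bool]:
    """Sum the 16 partial-product rows of an unsigned array multiplier.

    Returns (low 16 bits of the product, True if the full product is
    not below 2**15, i.e. the inputs exceed the stated limit).
    """
    product = 0
    for i in range(16):
        b_i = (b >> i) & 1
        row = a if b_i else 0       # row i: every bit of A ANDed with b_i
        product += row << i         # row i carries weight 2**i
    return product & 0xFFFF, product >= 2 ** 15


print(array_multiply16(4095, 3))  # (12285, False)
```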
3. INNOVATION
3.1 Sizing
It was determined during the second design review that the
adder/subtractor unit would be the critical path. Therefore, aside
from the adder/subtractor, all blocks are designed to have the
same pull-up and pull-down network resistance as a characteristic
inverter, in order to save area and power.
To size the adder for optimal delay, we hoped to use logical effort
[1]. Computing an exact logical-effort result for the adder/subtractor
proved challenging, but the calculation showed that the inputs to the
ALU would need a larger buffer to drive all eight functions, since we
could not reduce the width of most of the functions. We sized the
buffer to present a fanout of 4 to the register, so that the buffer’s
own input capacitance would not load the register too heavily. This
proved sufficient, and we avoided a larger buffer to limit area.
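The logical-effort reasoning can be sketched for the simplest case, a chain of inverters (logical effort g = 1) driving a fixed load. The stage count is chosen so that each stage has an effort near 4, the fanout-of-4 rule of thumb, and the path effort is split evenly across stages, which minimizes delay [1]. The capacitance values are hypothetical and the parasitic delay of 1 per inverter is an assumed textbook value; neither comes from the team's design.

```python
import math

P_INV = 1.0  # assumed parasitic delay of an inverter, in units of tau


def size_buffer_chain(c_in: float, c_load: float, noninverting: bool = True):
    """Logical-effort sizing of an inverter chain driving c_load from c_in.

    Inverters have logical effort g = 1, so the path effort equals the
    electrical effort H = c_load / c_in. The stage count aims for an
    effort of about 4 per stage (the FO4 rule of thumb).
    Returns (stages, stage effort, input capacitance per stage, delay).
    """
    h = c_load / c_in
    n = max(1, round(math.log(h, 4)))
    if noninverting and n % 2 == 1:
        n += 1                       # a buffer must not invert the signal
    f = h ** (1 / n)                 # equal effort per stage minimizes delay
    input_caps = [c_in * f ** k for k in range(n)]
    delay = n * f + n * P_INV        # D = N * F^(1/N) + P, in units of tau
    return n, f, input_caps, delay


# Hypothetical: a 1-unit input must drive 16 units of ALU input capacitance.
n, f, caps, delay = size_buffer_chain(1.0, 16.0)
print(n, round(f, 2), [round(c, 2) for c in caps], round(delay, 2))
```

Forcing an even stage count keeps the buffer non-inverting at the cost of a stage effort that may drift from 4, which is one reason a small, fixed fanout-of-4 buffer can be a practical compromise.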
The logical-effort result also seemed to indicate that a very
gradual tapering of transistor sizes in the adder/subtractor would
benefit delay. Because the portions of the adder that produce the
sum outputs are not on the critical path, they remained at the
characteristic size to save power and area. The portions that produce
the carry-out, on the other hand, are critical to delay, so these were
the ones that needed to be tapered. With this in mind, we ran
simulations to find the best tapering rate; the resulting design
reduced delay by about 40% relative to the untapered design.
3.2 Trade-offs
One of the trade-offs we dealt with was the type of adder to use.
We considered a carry-lookahead adder as well as other versions of
the ripple-carry adder. We ultimately chose the mirror adder because
it uses a small number of transistors and is faster than a standard
ripple-carry adder; this balance of speed and small area was
important. In addition, we used the inversion property of the adder
to reduce the number of gates in the critical path, and thus improve
the worst-case delay of the adder. This matters greatly, because the
adder’s worst-case delay is the most important factor in determining
the maximum operating frequency of the circuit.
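The inversion property states that a full adder with all inputs inverted produces inverted outputs: S(A', B', C') = S(A, B, C)' and Cout(A', B', C') = Cout(A, B, C)'. Because the mirror adder's carry gate is a single inverting stage, the property lets alternate bit slices accept an inverted carry together with locally inverted A and B inputs, instead of placing an inverter in every carry stage. The Python model below is a logic-level sketch of that general technique, not the team's schematic; it checks the property exhaustively and confirms that a ripple chain with no inverters in the carry path still adds correctly.

```python
from itertools import product


def majority(a: int, b: int, c: int) -> int:
    """Carry function of a full adder."""
    return (a & b) | (c & (a | b))


def inverting_carry_gate(a: int, b: int, c: int) -> int:
    """Mirror-adder carry gate: one inverting stage, outputs NOT(carry)."""
    return 1 - majority(a, b, c)


def inversion_property_holds() -> bool:
    """Check sum and carry with all inputs inverted against inverted outputs."""
    return all(
        majority(1 - a, 1 - b, 1 - c) == 1 - majority(a, b, c)
        and ((1 - a) ^ (1 - b) ^ (1 - c)) == 1 - (a ^ b ^ c)
        for a, b, c in product((0, 1), repeat=3)
    )


def add_no_carry_inverters(a: int, b: int, width: int = 16) -> tuple[int, int]:
    """Ripple-carry add using only inverting carry gates on the carry path."""
    result, carry_node = 0, 0              # carry into bit 0 is a true 0
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        if i % 2 == 0:
            # Even slice: true inputs; the carry gate emits NOT(carry).
            s = ai ^ bi ^ carry_node
            carry_node = inverting_carry_gate(ai, bi, carry_node)
        else:
            # Odd slice: carry_node holds NOT(carry). Invert A and B locally
            # (off the carry path); the same inverting gate now yields the
            # true carry, and the sum of inverted inputs is inverted back.
            na, nb = 1 - ai, 1 - bi
            s = 1 - (na ^ nb ^ carry_node)
            carry_node = inverting_carry_gate(na, nb, carry_node)
        result |= s << i
    # After an odd number of slices the carry node is still inverted.
    carry_out = carry_node if width % 2 == 0 else 1 - carry_node
    return result, carry_out


print(inversion_property_holds(), add_no_carry_inverters(0x0001, 0x7FFF))  # True (32768, 0)
```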
3.3 Multiplier
The binary multiplier takes two 16-bit binary numbers and
multiplies A by B. The multiplier gives a 16-bit output, which
means that the most significant bits of larger products are not
available at the output. Nevertheless, we chose to build a 16-bit
multiplier rather than an 8-bit multiplier so that the user of the
DSP can perform operations such as multiplying a 12-bit number
by 3; for example, 4095 * 3 = 12285, which still fits within the
2^15 output limit.
3.4 Other Factors
In addition to the innovations described above, PICo should note
our choice of supply voltage. Using a 5 V supply, we were able to
improve delay beyond what would be possible with lower supply
voltages. The cost of a higher supply voltage is higher power and
energy consumption; however, PICo’s metric for evaluating the
circuit depends quadratically on delay but only linearly on energy
and area. We therefore chose a supply that improves the most
heavily weighted component of the metric.
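The argument can be made explicit. Writing the metric as M = D^2 A E and scaling delay by a factor δ and energy by a factor ε (area unchanged), a higher supply lowers M only if the delay improvement outweighs the square root of the energy increase. The factors δ and ε are symbols introduced here for illustration; the paper does not report them.

```latex
M = D^{2} A E
\quad\Longrightarrow\quad
\frac{\Delta M}{M} \approx 2\,\frac{\Delta D}{D} + \frac{\Delta A}{A} + \frac{\Delta E}{E}

\frac{M'}{M} = \frac{(\delta D)^{2}\, A\, (\varepsilon E)}{D^{2} A E} = \delta^{2}\varepsilon,
\qquad
M' < M \iff \delta < \frac{1}{\sqrt{\varepsilon}}
```

For instance, under a purely hypothetical 20% energy increase (ε = 1.2), the delay would need to fall below about 0.913 of its original value, a reduction of roughly 8.7%, for the higher supply to pay off.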
4. RESULTS
In this section we will present the components of the metric and
our final value for the metric, and will also discuss the
performance of our arbitrary function, the multiplier, which is not
included in the computation of the metric.
4.1 Metric
The first and most heavily weighted component of the metric is
delay. As discussed in Section 3.1, the sizing of the adder/subtractor
and the use of buffers between the registers and the large
capacitance of the eight functions proved crucial to reducing delay.
To measure the worst-case delay, we applied the control input for
addition and added A = 0x0000 and B = 0x7FFF, producing 0x7FFF
at the output. We then changed A to 0x0001, producing an output of
0x8000; this change requires the carry to propagate from the least
significant one-bit adder to the most significant one-bit adder. The
correct output appeared with a clock period as short as 7 ns,
corresponding to an operating frequency of about 143 MHz.
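To see why this transition is a worst case, the sketch below finds the longest carry chain for a pair of operands: a carry generated where both bits are 1 ripples through every following bit where exactly one operand bit is 1. For 0x0000 + 0x7FFF no carry is generated at all, while for 0x0001 + 0x7FFF a carry generated at bit 0 ripples all the way to bit 15. The function is an illustrative helper, not part of the team's flow.

```python
def longest_carry_chain(a: int, b: int, width: int = 16) -> tuple[int, int]:
    """Return (start, end) of the longest carry ripple in a + b.

    A carry is generated at bit `start` (both operand bits are 1) and
    passes through propagate bits (exactly one operand bit is 1) until
    it is absorbed at bit `end`; end == width means it leaves as carry-out.
    Returns (0, 0) if no carry is generated.
    """
    best = (0, 0)
    for i in range(width):
        if not ((a >> i) & (b >> i) & 1):
            continue                       # no carry generated at bit i
        j = i + 1
        while j < width and ((a ^ b) >> j) & 1:
            j += 1                         # carry propagates through bit j
        if j - i > best[1] - best[0]:
            best = (i, j)
    return best


print(longest_carry_chain(0x0000, 0x7FFF))  # (0, 0): no carry generated
print(longest_carry_chain(0x0001, 0x7FFF))  # (0, 15): ripples to the MSB
print(f"{1 / 7e-9 / 1e6:.1f} MHz")          # 142.9 MHz, rounded to 143 in the text
```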
The next component of the metric is area, which was computed by
adding the widths of all the transistors in the device. After
reducing the area as much as possible while still maintaining
speed and functionality, we computed a final area of 4221 µm.
PICo specified a routine to be used to measure average energy
consumption of the DSP. This routine consisted of cycling
through all the functions of the ALU while alternating the A and
B inputs. By calculating the instantaneous power drawn by the
circuit and integrating over the period of operation, we arrived at
an average energy usage of 2.34 nJ.
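A minimal sketch of that computation, assuming the simulator exports the supply current as sampled (time, current) pairs: instantaneous power is the supply voltage times the supply current, the integral is taken with the trapezoidal rule, and dividing by the number of operations in the routine gives an average per operation. That last step is an assumption about how "average" was defined, and the function names are hypothetical.

```python
def energy_from_trace(times, currents, vdd):
    """Integrate P(t) = vdd * i(t) over a sampled trace (trapezoidal rule)."""
    energy = 0.0
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        p_mid = vdd * (currents[k] + currents[k - 1]) / 2   # mean power in step
        energy += p_mid * dt
    return energy


def average_energy(times, currents, vdd, n_ops):
    """Total energy of the routine divided by the number of operations."""
    return energy_from_trace(times, currents, vdd) / n_ops


# Hypothetical trace: a constant 1 mA drawn from 5 V for 10 ns.
t = [k * 1e-9 for k in range(11)]
i_dd = [1e-3] * len(t)
print(energy_from_trace(t, i_dd, vdd=5.0))  # about 5e-11 J
```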
The components discussed above are summarized in Table 2, along
with the final value of the metric.
Table 2. Metric Components and Metric Final Value
Component              Value
Delay, D               7 × 10^-9 s
Area, A                4.221 × 10^-3 m
Energy, E              2.3426 × 10^-9 J
Metric = D^2 * A * E   4.846 × 10^-28 s^2·m·J
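The final entry follows directly from the other three. A short Python check (the function name is illustrative) agrees with the value in Table 2 to within about 0.02%, a difference in the last reported digit:

```python
def pico_metric(delay_s: float, area_m: float, energy_j: float) -> float:
    """Metric = D^2 * A * E (lower is better); units s^2 * m * J."""
    return delay_s ** 2 * area_m * energy_j


m = pico_metric(delay_s=7e-9, area_m=4.221e-3, energy_j=2.3426e-9)
print(f"{m:.3e} s^2*m*J")
```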
4.2 Multiplier Performance
Although our multiplier design emphasizes functionality over
performance, this section presents some results from the multiplier
circuit.
The multiplier netlist was created by writing a BASIC program
that can generate netlists for an N-bit multiplier of arbitrary N. A
worst-case delay of _______ for the multiplier occurs for the
following transition:
A = 1111111111111111, B = 0000000000000001 to
A = 1111111111111111, B = 1111111111111111.
This transition proved difficult to simulate because of the complexity
of the circuit and the limited speed of the hardware being used.
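The team's generator was written in BASIC and is not reproduced here. As a hedged illustration of the same idea, the Python sketch below emits SPICE-style instance lines for an N×N unsigned array multiplier: N^2 AND gates form the partial products, and N-1 rows of ripple-carry full adders accumulate them. The subcircuit names `AND2` and `FA`, their pin order, and the node names are all hypothetical; a small gate-level evaluator is included only to check that the generated connectivity multiplies correctly.

```python
def array_multiplier_netlist(n: int) -> list[str]:
    """SPICE-style instance lines for an n x n unsigned array multiplier.

    Hypothetical subcircuits: AND2 (a b out vdd gnd) and
    FA (a b cin sum cout vdd gnd). Product bits are nodes p0..p{2n-1}.
    """
    assert n >= 2, "need at least a 2-bit multiplier"
    lines = [f"* {n}x{n} array multiplier (generated)"]
    # Partial products: pp_i_j = a_j AND b_i, weight i + j.
    for i in range(n):
        for j in range(n):
            out = "p0" if (i, j) == (0, 0) else f"pp_{i}_{j}"
            lines.append(f"XAND_{i}_{j} a{j} b{i} {out} vdd gnd AND2")
    # Running-sum bits aligned with row 1 (weights 1..n); the top one is 0.
    upper = [f"pp_0_{j}" for j in range(1, n)] + ["gnd"]
    for i in range(1, n):
        last = i == n - 1
        carry, sums = "gnd", []            # each row starts with carry-in 0
        for j in range(n):
            if j == 0:
                s = f"p{i}"                # lowest sum bit is final here
            else:
                s = f"p{i + j}" if last else f"s_{i}_{j}"
            c = f"p{2 * n - 1}" if (last and j == n - 1) else f"c_{i}_{j}"
            lines.append(
                f"XFA_{i}_{j} {upper[j]} pp_{i}_{j} {carry} {s} {c} vdd gnd FA")
            sums.append(s)
            carry = c
        upper = sums[1:] + [carry]         # realign for the next row
    return lines


def simulate(lines: list[str], n: int, a: int, b: int) -> int:
    """Evaluate the generated netlist at the logic level (gates are in order)."""
    v = {"gnd": 0}
    for k in range(n):
        v[f"a{k}"], v[f"b{k}"] = (a >> k) & 1, (b >> k) & 1
    for line in lines:
        if line.startswith("*"):
            continue
        t = line.split()
        if t[-1] == "AND2":
            v[t[3]] = v[t[1]] & v[t[2]]
        else:                              # FA: a b cin sum cout
            x, y, cin = v[t[1]], v[t[2]], v[t[3]]
            v[t[4]] = x ^ y ^ cin
            v[t[5]] = (x & y) | (cin & (x ^ y))
    return sum(v[f"p{k}"] << k for k in range(2 * n))


netlist = array_multiplier_netlist(16)
print(len(netlist) - 1)                    # 256 AND2 + 240 FA instances
```

In the DSP, only p0 through p15 would be routed to the ALU output mux, matching the 16-bit output described in Section 3.3.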
5. CONCLUSION
This paper has examined in detail the design and performance of a
DSP built to PICo’s specifications. We believe that the Team MUX
DSP is the best choice for PICo because of its superior delay, area,
and energy performance. Our thoughtful design and implementation
have led to a low metric score and a clever product that exceeds the
requirements. In addition, the 16-bit binary multiplier is a useful
extra function that most designs will lack. Since multiplication is such
a common function, any design that lacks it would place PICo at a
severe disadvantage. For these reasons, the DSP designed by Team
MUX should be chosen.
6. REFERENCES
[1] I. Sutherland, B. Sproull, and D. Harris, Logical Effort,
Morgan Kaufmann, 1999.