What are the characteristics
of DSP algorithms?
M. Smith and S. Daeninck
Tackled today
What are the basic characteristics of a
DSP algorithm?
Information on the TigerSHARC
arithmetic, multiplier and shifter units
Practice examples of C++ to assembly
code conversion
DSP Introduction,
M. Smith, ECE, University of Calgary,
Canada
IEEE Micro Magazine Article
How RISCy is DSP?
Smith, M.R.; IEEE Micro,
Volume: 12, Issue: 6, Dec. 1992,
Pages:10 - 23
Available on line via the library “Electronic
web links”
Copy placed on ENCM515 Web site.
Make sure you read it before midterm
Characteristics of an FIR algorithm
Involves one of the three basic types of DSP algorithms
FIR (Type 1), IIR (Type 2) and FFT (Type 3)
Representative of DSP equations found in filtering,
convolution, correlation (Lab) and modeling
$$ y(n) = \sum_{i=0}^{m-1} x(n-i)\, h(i); \quad 0 \le n < \infty $$
Multiplication / addition intensive
Simple format within a (long) loop
Many memory fetches of fixed and changing data
Handle “infinite amount of input data” – need FIFO buffer
when handling ON-LINE data
All calculations “MUST” be completed in the time interval
between samples
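As a minimal sketch of the FIR sum on this slide, the function below evaluates it directly for a finite record of samples. The name `firDirect` and the choice to treat samples before the start of the record as zero are assumptions made for illustration; they are not part of the lecture material.

```cpp
#include <cstddef>
#include <vector>

// Direct evaluation of y(n) = sum over i = 0 .. m-1 of x(n - i) * h(i).
// Assumption: samples before the start of the record (n - i < 0) are zero.
std::vector<float> firDirect(const std::vector<float>& x,
                             const std::vector<float>& h) {
    std::vector<float> y(x.size(), 0.0f);
    for (std::size_t n = 0; n < x.size(); ++n) {
        float sum = 0.0f;
        // One multiply and one add per coefficient: the MAC-intensive loop.
        for (std::size_t i = 0; i < h.size() && i <= n; ++i) {
            sum += x[n - i] * h[i];
        }
        y[n] = sum;
    }
    return y;
}
```

Each output sample costs up to m multiplies, m additions and 2m memory fetches, which is why the loop body dominates the time budget between samples.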
FIR
Input Value must be stored in circular
buffer
Filter operation must be performed on
circular buffer
For operational efficiency – Note that
latest value is the “last in the array”
Xarray = {Xm-1, Xm-2, Xm-3, … X1, X0 }
Harray = {Hm-1, Hm-2, Hm-3, … H1, H0 }
FIR
COMMON MISTAKE = MUCH WASTED LAB. TIME
Can work with latest value “first in the
array” when doing C++, but does not
work for assembly code optimization
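One possible way to picture the "latest value last" layout with a circular buffer is sketched below in C++. The class name `CircularFir` and its members are hypothetical, and the coefficient vector is assumed to be stored as {h(m-1), ..., h(1), h(0)} to match Harray above, with at least one coefficient.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: FIR filter whose input history lives in a circular buffer.
class CircularFir {
public:
    // hReversed holds {h(m-1), ..., h(1), h(0)}, matching Harray on the slide.
    explicit CircularFir(const std::vector<float>& hReversed)
        : h_(hReversed), x_(hReversed.size(), 0.0f), oldest_(0) {}

    float process(float newInput) {
        // Overwrite the oldest sample; the slot after it becomes the oldest.
        x_[oldest_] = newInput;
        oldest_ = (oldest_ + 1) % x_.size();

        // Walk forward from oldest to newest, so the newest sample is
        // visited last and is multiplied by h(0), the last entry of h_.
        float sum = 0.0f;
        std::size_t idx = oldest_;
        for (std::size_t k = 0; k < h_.size(); ++k) {
            sum += x_[idx] * h_[k];
            idx = (idx + 1) % x_.size();
        }
        return sum;
    }

private:
    std::vector<float> h_;
    std::vector<float> x_;
    std::size_t oldest_;
};
```

Both arrays are read with a simple increasing index, and no samples are copied when a new input arrives.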
FIR
X[N – 1] = NewInputValue;       -- Into last place of Input Buffer
Sum = 0;
For (count = 0 to N – 1)        -- N of size 100+
    Xvalue = X[count]; Hvalue = H[count];
    Product = Xvalue * Hvalue;
    Sum = Sum + Product;        -- Multiply and Accumulate -- MAC
NewOutputValue = Sum;
Update Buffer – The T-operation in the picture
For (count = 1 to N – 1)        -- Discard oldest X[0];
    X[count – 1] = X[count];
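The pseudocode on this slide can be sketched in C++ roughly as follows. The function name `firShiftBuffer` is an assumption for illustration, and the sketch assumes the two arrays have the same non-zero length N.

```cpp
#include <cstddef>
#include <vector>

// x holds the input history with the newest sample in the last place;
// h holds the coefficients in the matching order (Harray on the earlier slide).
// Assumes x.size() == h.size() and that both are at least 1.
float firShiftBuffer(std::vector<float>& x, const std::vector<float>& h,
                     float newInputValue) {
    const std::size_t N = x.size();
    x[N - 1] = newInputValue;          // into last place of input buffer

    float sum = 0.0f;
    for (std::size_t count = 0; count < N; ++count) {
        sum += x[count] * h[count];    // multiply and accumulate (MAC)
    }

    // Update buffer: discard the oldest X[0] and move every sample down one place.
    for (std::size_t count = 1; count < N; ++count) {
        x[count - 1] = x[count];
    }
    return sum;
}
```

The second loop performs N – 1 copies per sample, so the buffer update costs about as many memory operations as the filter calculation itself.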
Comparing IIR and FIR filters
Infinite Impulse Response (IIR) filters – few operations to produce output from input for each IIR stage
3 – 7 stages
Finite Impulse Response (FIR) filters – many operations to produce output from input
Long FIFO buffer, whose update may require as many operations as the FIR calculation itself
Easy to optimize
IIR -- Biquad
(Figure: biquad stage showing the state values S0, S1 and S2)
For (Stages = 0 to 3) Do
S0 = Xin * H5 + S2 * H3 + S1 * H4
Yout = S0 * H0 + S1 * H1 + S2 * H2
S2 = S1
S1 = S0
This second solution gives DIFFERENT result. Order of
calculation is different. The actual output difference depends
on how frequently samples are taken relative to how rapidly
the signal changes
CALCULATION SPEED IS DIFFERENT
Yout = S0 * H0 + S1 * H1 + S2 * H2
S2 = S1
S1 = S0
S0 = Xin * H5 + S2 * H3 + S1 * H4
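To make the difference between the two orderings concrete, here is a sketch of a single biquad stage written both ways in C++. The names `BiquadState`, `biquadFirst` and `biquadSecond` are hypothetical, and the coefficient array `h` is assumed to be indexed as H0 to H5 on the slide.

```cpp
struct BiquadState {
    double s0 = 0.0;
    double s1 = 0.0;
    double s2 = 0.0;
};

// First ordering: update S0 from the new input, then form the output.
double biquadFirst(BiquadState& st, const double h[6], double xin) {
    st.s0 = xin * h[5] + st.s2 * h[3] + st.s1 * h[4];
    double yout = st.s0 * h[0] + st.s1 * h[1] + st.s2 * h[2];
    st.s2 = st.s1;
    st.s1 = st.s0;
    return yout;
}

// Second ordering: form the output from the stored S0, then update the state.
double biquadSecond(BiquadState& st, const double h[6], double xin) {
    double yout = st.s0 * h[0] + st.s1 * h[1] + st.s2 * h[2];
    st.s2 = st.s1;
    st.s1 = st.s0;
    st.s0 = xin * h[5] + st.s2 * h[3] + st.s1 * h[4];
    return yout;
}
```

For one stage starting from zero state, the second ordering in this sketch returns the first ordering's output one sample later, because its output is formed before the new input has reached S0. A slowly changing signal sampled at a high rate therefore shows only a small difference between the two.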
We need to know how the processor
architecture affects speed of calculation
Register File and Compute Block
Volatile registers
Data
Summation
Multiply and Accumulate (MAC)
Register File and COMPUTE Units
Key Points
DAB – Data Alignment Buffer (special for quad fetches NOT writes)
Each block can load/store 4x32bit registers in a cycle.
4 inputs to Compute block, but only 3 Outputs to Register Block.
Highly parallel operations UNDER THE RIGHT CONDITIONS
NOTE – DATA PATH ISSUES OF THE X-REGISTER FILE
1 output path (128 bit) TO memory
2 input paths FROM memory
4 output paths (64 bit) TO ALU, multiplier, shifter
3 input paths (64 bit) FROM ALU, multiplier, shifter
NUMBER OF PATHS HAS IMPLICATIONS ON WHAT THINGS CAN HAPPEN IN PARALLEL
Register File - Syntax
Key Points
Each Block has 32x32 bit Data registers
Each register can store 4x8 bit, 2x16 bit or 1x32 bit words.
Registers can be combined into dual or quad groups. These
groups can store 8, 16, 32, 40 or 64 bit words.
Register Syntax
XR7 -> 1x32 bit word
XLR7:6 -> 1x64 bit word
XFR1:0 -> 1x40 bit float
XSR3:2 -> 4x16 bit words (dual group: register pair starts at a multiple of 2)
XBR3:0 -> 16x8 bit words (quad group: register group starts at a multiple of 4)
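The idea that one 32 bit register can be viewed as 4x8 bit or 2x16 bit fields, and that two registers can be combined into a 64 bit value, can be sketched in C++ as below. The helper names are hypothetical, and the sketch assumes that the higher-numbered register of a pair supplies the upper 32 bits.

```cpp
#include <cstdint>

// Extract the k-th 8 bit field of a 32 bit word (k = 0 is the least significant).
std::uint8_t byteField(std::uint32_t reg, unsigned k) {
    return static_cast<std::uint8_t>(reg >> (8u * k));
}

// Extract the k-th 16 bit field of a 32 bit word (k = 0 is the least significant).
std::uint16_t shortField(std::uint32_t reg, unsigned k) {
    return static_cast<std::uint16_t>(reg >> (16u * k));
}

// Combine two 32 bit words into one 64 bit value, as a dual register group does.
// Assumption: 'high' models the higher-numbered register of the pair.
std::uint64_t dualGroup(std::uint32_t high, std::uint32_t low) {
    return (static_cast<std::uint64_t>(high) << 32) | low;
}
```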
Register File – BIT STORAGE
Both 32 bit and 64 bit registers
Volatile Data Registers
Non-preserved during a function call
Volatile registers – no need to save
24 Volatile DATA registers in each block
    XR0 – XR23
    YR0 – YR23
2 ALU SUMMATION registers in each block
    XPR0, XPR1, YPR0, YPR1
5 MAC ACCUMULATE registers in each block
    XMR0 – XMR3, YMR0 – YMR3
    XMR4, YMR4 – Overflow registers
Arithmetic Logic Unit (ALU)
2x64 bit input paths
2x64 bit output paths
8, 16, 32, or 64 bit fixed-point addition/subtraction
32 or 64 bit fixed-point logical operations
32 or 40 bit floating-point operations
Can do the same on Y ALU AT THE SAME TIME
Sample ALU Instruction
Example of 16 bit addition
XYSR1:0 = R31:30 + R25:24
Performs “short” addition in
X and Y Compute Blocks
XR1.HH = XR31.HH + XR25.HH
XR1.HL = XR31.HL + XR25.HL
XR0.LH = XR30.LH + XR24.LH
XR0.LL = XR30.LL + XR24.LL
YR1.HH = YR31.HH + YR25.HH
YR1.HL = YR31.HL + YR25.HL
YR0.LH = YR30.LH + YR24.LH
YR0.LL = YR30.LL + YR24.LL
8 additions at the same time
.LH, .HH is my notation
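A "short" addition treats each 32 bit register as two independent 16 bit lanes. The C++ sketch below models one such register; the function name `addShortLanes` is hypothetical, and the model uses plain wrap-around arithmetic, which is an assumption rather than a statement about the processor's overflow handling.

```cpp
#include <cstdint>

// Add two 32 bit words as two independent 16 bit lanes.
// A carry out of the low lane never reaches the high lane.
std::uint32_t addShortLanes(std::uint32_t a, std::uint32_t b) {
    std::uint32_t low  = (a + b) & 0x0000FFFFu;
    std::uint32_t high = ((a >> 16) + (b >> 16)) & 0x0000FFFFu;
    return (high << 16) | low;
}
```

Applying this to both registers of a pair gives 4 additions, and doing so in the X and Y compute blocks together gives the 8 additions counted on the slide.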
Sample ALU Instructions
Fixed-Point: long word, word, short word, byte (char)
Floating-Point: single, double precision
Pass is an interesting instruction
XR4 = R5
Assignment statement -- makes XR4 = XR5
XR4 = PASS R5
Still makes XR4 = XR5 BUT USES A DIFFERENT
PATH THROUGH THE PROCESSOR
Sets the ALU flags (so that they can be used
for conditional tests)
PASS instructions can be put in parallel with
different instructions than assignments
Example code – parallel operations
occurring
int x_two = 64, y_two = 16;
int x_three = 128, y_three = 8;
int x_four = 128, y_four = 8;
int x_five = 64, y_five = 16;
int x_odd = 0, y_odd = 0;
int x_even = 0, y_even = 0;
x_odd = x_five + x_three;
x_even = x_four + x_two;
y_odd = y_five + y_three;
y_even = y_four + y_two;
XR2 = 64;;
XR3 = 128;;
XR4 = 128;;
XR5 = 64;;
YR2 = 16;;
YR3 = 8;;
YR4 = 8;;
YR5 = 16;;
XYR1:0 = R5:4 + R3:2;;
//XR1 = x_odd, XR0 = x_even
//YR1 = y_odd, YR0 = y_even
WRONG SYNTAX
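The arithmetic this example is aiming for, two independent additions on a register pair in each compute block, can be modeled in C++ as below. The helper `addPair` is hypothetical and models only the intended result, not the instruction syntax; index 0 stands for the lower-numbered register of each pair.

```cpp
#include <array>

// Hypothetical model of a register pair addition in one compute block:
// result[0] = a[0] + b[0] and result[1] = a[1] + b[1], with no interaction
// between the two elements.
std::array<int, 2> addPair(const std::array<int, 2>& a,
                           const std::array<int, 2>& b) {
    return { a[0] + b[0], a[1] + b[1] };
}
```

With the X block values {R4, R5} = {128, 64} and {R2, R3} = {64, 128}, both results are 192; with the Y block values {8, 16} and {16, 8}, both are 24.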
Multiplier
Operates on fixed, floating and complex numbers.
Fixed-Point numbers
    32x32 bit with 32 or 64 bit results
    4 (16x16 bit) with 4x16 or 4x32 bit results
    Data compaction inputs – 16, 32, 64 bits, outputs 16, 32 bit results
Floating-Point numbers
    32x32 bit with 32 bit result
    40x40 bit with 40 bit result
COMPLEX Numbers
    32x32 bit with results stored in MR register
    FIXED-POINT ONLY
Multiplier
XR0 = R1*R2;;
XR1:0 = R3*R5;;
XMR1:0 = R3*R5;; //uses XMR4 overflow
XR2 = MR3:2, XMR3:2 = R3*R5;;
XR3:2 = MR1:0, XMR1:0 = R3*R5;;
XFR0 = R1*R2;; // 32 bit mult – 24 bit mantissa
XFR1:0 = R3:2*R5:4;; // 40 bit MULTIPLY – 32 bit mantissa – high precision float
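Why a 32x32 bit multiply may need a 64 bit destination can be seen with plain integer arithmetic. The C++ sketch below uses the hypothetical helpers `mul32To64` and `low32`; it does not model the multiplier's fractional modes or the MR overflow register.

```cpp
#include <cstdint>

// Full 64 bit product of two 32 bit signed integers.
std::int64_t mul32To64(std::int32_t a, std::int32_t b) {
    return static_cast<std::int64_t>(a) * static_cast<std::int64_t>(b);
}

// Lower 32 bits of a 64 bit product: what remains if only one
// 32 bit word of the result is kept.
std::uint32_t low32(std::int64_t product) {
    return static_cast<std::uint32_t>(static_cast<std::uint64_t>(product));
}
```

For example, 100000 * 100000 needs 34 bits, so keeping only the lower 32 bits of the product gives a value that no longer equals the true result.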
Multiplier --- with 32 or 16 bit results
Note minor changes in syntax
XR5:4 = R1:0*R3:2;; (16 bit results)
XR7:4 = R3:2*R5:4;; (32 bit results)
XMR1:0 += R3:2*R5:4;; (16 bit results)
XMR3:0 += R3:2*R5:4;; (32 bit results)
XR3:2 = MR3:2, XMR3:2 = R1:0*R5:4;; (16 bit results) one instruction
XR3:0 = MR3:0, XMR3:0 = R1:0*R5:4;; (32 bit results)
Practice Examples
Convert from “C” into assembly code – use volatile registers
long int value = 6;
long int number = 7;
long int temp = 8;
value = number * temp;

float value = 6;
float number = 7;
long int temp = 8;
value = number * temp;
BAD DESIGN OF FLOATING PT CODE – WILL INTRODUCE MANY ERRORS – RE-WRITE CODE TO FIX
Avoiding common design errors
Convert from “C” into assembly code – use volatile registers
float value = 6.0;    (XFR12)
float number = 7.0;   (XFR13)
long int temp = 8;    (XR18)
value = number * temp;
// Treat as value = number * (float) temp;
XR12 = 6.0;; //valueF12 -- sets XFR12 = 6.0
XR13 = 7.0;; //numberF13
XR18 = 8;; //tempR18
//(float) tempR18
XFR18 = FLOAT R18;;
//valueF12 = numberF13 * tempF18
XFR12 = R13 * R18;;
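The same design point can be sketched in C++: the integer is converted explicitly before the floating-point multiply. The function names are hypothetical. The second helper shows what the register would contribute if an integer bit pattern were read as a float without conversion, assuming a 32 bit IEEE-754 `float`.

```cpp
#include <cstdint>
#include <cstring>

// Explicit conversion first, then a float multiply
// (mirrors the separate FLOAT step in the assembly above).
float multiplyMixed(float number, long temp) {
    float tempAsFloat = static_cast<float>(temp);
    return number * tempAsFloat;
}

// Reinterpret an integer bit pattern as a float WITHOUT converting it.
// Assumes float is a 32 bit IEEE-754 type.
float bitsAsFloat(std::uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

Under that assumption, the bit pattern of the integer 8 read as a float is a tiny denormal value rather than 8.0, so a multiply that skips the conversion step would give a result near zero instead of 56.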
Shifter Instructions
2x64 bit input paths and 2x64 bit output paths
32 or 64 bit shifting operations
32 or 64 bit manipulation operations
Examples --- shift only integers
There is a FSCALE for floats (not shifter)
long int value = 128;
long int high, low;
low = value >> 2;
high = value << 2;
XR0 = 2;;
XR1 = -2;;
XR2 = 128;;
//low = value >> 2;
XR23 = ASHIFT XR2 BY -2;;
Or
XR23 = ASHIFT XR2 BY XR1;;
//high = value << 2;
XR22 = ASHIFT XR2 BY 2;;
Or
XR22 = ASHIFT XR2 BY XR0;;
POSITIVE VALUE – LEFT SHIFT
NEGATIVE VALUE – RIGHT SHIFT
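The sign convention for the shift amount can be modeled in C++ as below. The function name `ashift` is chosen to echo the instruction but is only an illustrative helper. It assumes the magnitude of the shift amount is less than 32 and that right-shifting a negative signed value is arithmetic, which holds on mainstream compilers and is guaranteed from C++20.

```cpp
#include <cstdint>

// Positive amount: shift left. Negative amount: arithmetic shift right.
// Assumes -32 < amount < 32.
std::int32_t ashift(std::int32_t value, int amount) {
    if (amount >= 0) {
        // Shift as unsigned so that bits moved past the top are simply dropped.
        return static_cast<std::int32_t>(
            static_cast<std::uint32_t>(value) << amount);
    }
    return value >> (-amount);
}
```

With value = 128, `ashift(value, -2)` corresponds to `low` and `ashift(value, 2)` corresponds to `high`.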
ALU instructions
Under the RIGHT conditions can do multiple operations in a single instruction.
Instruction line has 4x32 bit instruction slots.
Can do 2 Compute and 2 memory operations.
This is actually 4 Compute operations counting both compute blocks.
One instruction per unit of a compute block, i.e. ALU.
Since there are only 3 result buses, only one unit (ALU or Multiplier) can use 2 result buses.
Not all instructions can be used in parallel.
Dual Operation Examples
FRm = Rx + Ry, FRn = Rx – Ry;;
Note that uses 4(8) different registers and not 6(12)
FR4 = R2 + R1, FR5 = R2 - R1;;
The source registers used around the + and – must be
the same. Very useful in FFT code
Can be floating (single or extended precision) or fixed (32 or 64 bit) add/subtract.
Rm = MRa, MRa += Rx * Ry;;
MRa must be the same register(s) (MR1:0 or MR3:2)
Can be used on fixed (32 or 64 bit results)
COMPLEX numbers (on 16 bit values)
Rm = MRa, MRa += Rx ** Ry;;
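The add/subtract dual operation produces a sum and a difference from the same two sources, which is the core step of an FFT butterfly. A minimal C++ sketch, with the hypothetical helper name `addSub`:

```cpp
#include <utility>

// Returns {x + y, x - y} computed from one pair of source values,
// mirroring FRm = Rx + Ry, FRn = Rx - Ry.
std::pair<float, float> addSub(float x, float y) {
    return { x + y, x - y };
}
```

Only two source values and two destinations are involved, which matches the slide's count of 4 registers rather than 6.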
Practice Examples
Convert from “C” into assembly code – use volatile registers
long int value = 6;
long int number = 7;
long int temp = 8;
value = number * temp;
#define value_XR12 XR12
Assignment operation
value_XR12 = 6;;
Multiply operations
value_XR12 = R5 * R6;;
Avoiding common design errors
Convert to assembly code
float value = 6.0;
float number = 7.0;
long int temp = 8;
value = value + 1;
number = number + 2;
temp = value + number;
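Before converting this exercise to assembly, it can help to write the C with every type conversion made explicit, since each conversion becomes a separate instruction. The sketch below is one way to do that in C++; the function name `practiceMixed` is hypothetical, and `static_cast<long>` truncates toward zero, which may differ from the rounding behavior of a hardware conversion.

```cpp
// The practice code with each int/float conversion written out.
long practiceMixed() {
    float value = 6.0f;
    float number = 7.0f;
    long temp = 8;

    value = value + 1.0f;      // the constant 1 is used as a float
    number = number + 2.0f;    // the constant 2 is used as a float

    // Float sum converted back to an integer explicitly (truncation).
    temp = static_cast<long>(value + number);
    return temp;
}
```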
Tackled today
What are the basic characteristics of a
DSP algorithm?
Information on the TigerSHARC
arithmetic, multiplier and shifter units
Practice examples of C++ to assembly
code conversion