Real time DSP
Professors:
 Eng. Julian Bruno
 Eng. Mariano Llamedo Soria
DSP fundamentals
Number representation and word-length effects.
Recommended bibliography
- RG Lyons, Understanding Digital Signal Processing. Prentice Hall, 1997.
  Ch9: Digital Data Formats and their effects.
- SW Smith, The Scientist and Engineer’s Guide to DSP. California Tech. Pub., 1997.
  Ch4: DSP software.
- SM Kuo, BH Lee. Real-Time Digital Signal Processing. John Wiley and Sons.
  Ch 3.4 to 3.6: DSP Fundamentals and Implementation Considerations.
- VK Madisetti, DB Williams. Digital Signal Processing Handbook. CRC Press.
  Ch3: Finite Wordlength Effects.
NOTE: Many images used in this presentation were extracted from the recommended bibliography.
Fixed point representation – Two’s complement system

Example for N=3:
bin   dec
011    3
010    2
001    1
000    0
111   -1
110   -2
101   -3
100   -4

Range: $-2^{N-1}$ to $2^{N-1}-1$ for N data bits.

Dynamic range: $DR_{dB} = 20\log_{10}(lv/sv)$
DRdB: Dynamic Range in dB
lv: Largest possible value
sv: Smallest possible (nonzero) value
$DR_{dB} = 6.02\,\text{dB} \cdot (N-1)$

Negation mechanism (the leftmost bit is the sign bit: 011 = 3, 101 = -3)
Step               Result
Original number    011 = 3
One's complement   100
Add 1              101 = -3

- One bit for sign, N-1 for number representation.
- Very popular system, widely used.
- Same logic for sum and subtraction.
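The negation steps and the dynamic range formula can be checked with a short Python sketch. The function names are illustrative only (they do not come from the slides), and the dynamic range assumes lv = 2^(N-1) and sv = 1, which is what the 6.02·(N-1) figure implies.

```python
import math

def to_twos_complement(value: int, n_bits: int) -> str:
    """Return the n_bits-wide two's complement bit string of value."""
    lo, hi = -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1
    if not lo <= value <= hi:
        raise OverflowError(f"{value} does not fit in {n_bits} bits")
    return format(value & ((1 << n_bits) - 1), f"0{n_bits}b")

def negate(bits: str) -> str:
    """Negate a bit string: one's complement, then add 1 (modulo 2**N)."""
    n_bits = len(bits)
    mask = (1 << n_bits) - 1
    ones_complement = int(bits, 2) ^ mask
    return format((ones_complement + 1) & mask, f"0{n_bits}b")

def dynamic_range_db(n_bits: int) -> float:
    """20*log10(lv/sv), assuming lv = 2**(N-1) and sv = 1."""
    return 20 * math.log10(2 ** (n_bits - 1))

print(to_twos_complement(3, 3), "->", negate("011"))  # 011 -> 101
```

Note that negating 100 (-4) returns 100 again, because +4 is outside the 3-bit range.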
Q formats

Different interpretations of the same bit pattern:
bin    Q2      Q1     dec
011    0.75    1.5     3
010    0.5     1       2
001    0.25    0.5     1
000    0       0       0
111   -0.25   -0.5    -1
110   -0.5    -1      -2
101   -0.75   -1.5    -3
100   -1      -2      -4

- Q15 means 15 bits for the fractional part (aka 1.15).
- Q31 means 31 bits for the fractional part (aka 1.31).
- Q12 means 12 bits for the fraction and 3 bits for the integer part.

Formats definition
       sign   int   frac   Total
QX      1      M     N     1+M+N
Q15     1      0     15    16
Q31     1      0     31    32
Q12     1      3     12    16

$DR_{Q15} = 6.02 \cdot 15 = 90.3$ dB
$DR_{Q31} = 6.02 \cdot 31 = 186.62$ dB
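As an illustration of the formats above, a real number can be stored in a Q format by scaling it by 2^N and rounding. This is a minimal sketch: the helper names are hypothetical, and clamping out-of-range values to the nearest limit is an assumption made here, not something the slides specify.

```python
def float_to_q(x: float, frac_bits: int = 15, total_bits: int = 16) -> int:
    """Quantize x to a signed integer with frac_bits fractional bits.

    Values outside the representable range are clamped (assumption).
    """
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    scaled = round(x * (1 << frac_bits))
    return max(lo, min(hi, scaled))

def q_to_float(q: int, frac_bits: int = 15) -> float:
    """Interpret the stored integer q as a number with frac_bits fractional bits."""
    return q / (1 << frac_bits)

print(float_to_q(0.5))               # 16384
print(q_to_float(float_to_q(-1.0)))  # -1.0
```

With the defaults this is Q15 in a 16-bit word; `float_to_q(0.75, frac_bits=2, total_bits=3)` gives 3, the pattern 011 from the 3-bit table.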
Converting between Q formats

bin     Q3               Q2              Q1
0111    0.111 = 0.875    01.11 = 1.75    011.1 = 3.5

$/2^N$: moves the dot N places left
$\times 2^N$: moves the dot N places right

Decimal equivalency for 1.X formats ($b_N$ is the sign bit, $b_0$ the least significant bit):
$$x_{10} = -b_N + \sum_{m=0}^{N-1} b_m \, 2^{m-N}$$
Range: $-1$ to $1-2^{-N}$ for N fractional bits.

- Fractional representation is equivalent to integer representation.
- The fractional dot can be placed anywhere.
- The most widely used formats are Q15 and Q31.
- Their dynamic range is exactly the same as that of their integer counterparts, the “short” and “int” C language types.
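The equivalence between fractional and integer representation can be sketched as follows. `bits_to_value` reinterprets the same bits with a different dot position, while `convert_q` keeps the value and changes the stored integer; both names are illustrative assumptions.

```python
def bits_to_value(bits: str, frac_bits: int) -> float:
    """Interpret a two's complement bit string with frac_bits fractional bits."""
    raw = int(bits, 2)
    if bits[0] == "1":            # sign bit set: subtract 2**N
        raw -= 1 << len(bits)
    return raw / (1 << frac_bits)

def convert_q(raw: int, from_frac: int, to_frac: int) -> int:
    """Re-express a stored integer in another Q format, keeping its value.

    Widening shifts left; narrowing shifts right and drops the low bits.
    """
    shift = to_frac - from_frac
    return raw << shift if shift >= 0 else raw >> -shift

for frac in (3, 2, 1):
    print(f"0111 as Q{frac}:", bits_to_value("0111", frac))
```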
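Sum in Two’s complement

Integer format
   1011   -5           1011   -5
 + 0110    6         + 1000   -8
 ------              ------
   0001    1         1 0011   -13   Overflow!

Q3 format
   1.011  -0.625       1.011  -0.625
 + 0.110   0.75      + 1.000  -1
 -------             -------
   0.001   0.125     1 0.011  -1.625   Overflow!

In the first example of each pair the carry out of the sign bit is discarded and the 4-bit result is correct. In the second example the true result needs a fifth bit, so it does not fit in the 4-bit word.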
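A possible way to reproduce these sums and detect the overflow is to add with unlimited precision, wrap the result to N bits, and compare. This is a sketch with a hypothetical helper name; Q3 values are handled as their integer equivalents (value × 8).

```python
def add_wrapped(a: int, b: int, n_bits: int = 4):
    """Add two signed n_bits integers as an n_bits adder would.

    Returns (wrapped result, overflow flag).
    """
    mask = (1 << n_bits) - 1
    raw = (a + b) & mask                       # keep only the low n_bits
    result = raw - (1 << n_bits) if raw >> (n_bits - 1) else raw
    overflow = result != a + b                 # the true sum did not fit
    return result, overflow

print(add_wrapped(-5, 6))    # (1, False)
print(add_wrapped(-5, -8))   # (3, True)
```

The same calls cover the Q3 column: -0.625 + 0.75 corresponds to `add_wrapped(-5, 6)`, whose result 1 reads as 1/8 = 0.125.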
Sum in Two’s complement
Different formats

Example 1
3.0      1011.      -5
1.1    +   01.1      1.5
3.1      1100.1     -3.5

Example 2
1.2      11.01      -0.75
0.3    +  0.011      0.375
1.3      11.101     -0.375

For C=A+B, where
A is in P.Q format
B is in R.S format
the result C is in max(P,R).max(Q,S) format.
In these examples P.Q denotes P integer bits and Q fractional bits, in addition to the sign bit.
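The alignment step behind this rule can be sketched by storing each operand as a pair (integer, number of fractional bits). The representation and the function name are assumptions for illustration; Python integers are unbounded, so sign extension is implicit.

```python
def add_fixed(a_raw: int, a_frac: int, b_raw: int, b_frac: int):
    """Add two fixed-point numbers given as (raw integer, fractional bits).

    Both operands are aligned to max(a_frac, b_frac) fractional bits
    (zero padding on the right) before the integer addition.
    """
    frac = max(a_frac, b_frac)
    a_aligned = a_raw << (frac - a_frac)
    b_aligned = b_raw << (frac - b_frac)
    return a_aligned + b_aligned, frac

raw, frac = add_fixed(-5, 0, 3, 1)   # -5 (3.0 format) + 1.5 (1.1 format)
print(raw / 2**frac)                 # -3.5
```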
Multiplication in Two’s complement

Integer format
        1011     -5
      x 0110      6
      ------
    00000000
    11110110     sign extension
    11101100     sign extension
    00000000
    --------
    11100010     -30

Fractional format Q3
       1.011     -0.625
     x 0.110      0.75
     -------
   11.100010     -0.46875 (2.6 format)
   1.1000100     -0.46875 (shifted left one place to drop the redundant sign bit)

The sign of the product is the XOR of the operands' sign bits.
For C=AxB, where A and B are N bits wide, C is 2N bits wide.
Multiplication in Two’s complement
Different formats

Example 1
       101.1     -2.5
     x 01.10      1.5
     -------
    11110110     sign extension
    11101100     sign extension
    --------
   11100.010     -3.75

Example 2
       1.011     -0.625
     x 01.10      1.5
     -------
   111.00010     -0.9375
Multiplication in Two’s complement.
Conclusion.
For C=A*B, where A and B are words of N bits, and
A is in P.Q format
B is in R.S format
the result is a word of 2N bits in (P+R).(Q+S) format.
Here P and R include the sign bit, so P+Q = R+S = N.
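The rule can be sketched by multiplying the stored integers and adding the fractional bit counts. The helper names are hypothetical; `to_bits` only formats the product as a two's complement word.

```python
def mul_fixed(a_raw: int, a_frac: int, b_raw: int, b_frac: int):
    """Multiply two fixed-point numbers given as (raw integer, fractional bits).

    The integer product is the same whatever the formats are; only the
    number of fractional bits of the result (a_frac + b_frac) changes.
    """
    return a_raw * b_raw, a_frac + b_frac

def to_bits(raw: int, n_bits: int) -> str:
    """Two's complement bit string of raw in n_bits bits."""
    return format(raw & ((1 << n_bits) - 1), f"0{n_bits}b")

raw, frac = mul_fixed(-5, 3, 6, 3)      # 1.011 x 0.110 in Q3
print(to_bits(raw, 8), raw / 2**frac)   # 11100010 -0.46875
```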
Multiplication in Two’s complement.
Examples.
A=1011, B=0110
A.FL and B.FL are the number of fractional bits of A and B.

A.FL  B.FL   A        B      C=A*B      C format   C.binary
0     0      -5       6      -30        8.0        11100010.    (Int)
0     1      -5       3      -15        7.1        1110001.0
0     2      -5       1.5    -7.5       6.2        111000.10
0     3      -5       0.75   -3.75      5.3        11100.010
1     0      -2.5     6      -15        7.1        1110001.0
1     1      -2.5     3      -7.5       6.2        111000.10
1     2      -2.5     1.5    -3.75      5.3        11100.010
1     3      -2.5     0.75   -1.875     4.4        1110.0010
2     0      -1.25    6      -7.5       6.2        111000.10
2     1      -1.25    3      -3.75      5.3        11100.010
2     2      -1.25    1.5    -1.875     4.4        1110.0010
2     3      -1.25    0.75   -0.9375    3.5        111.00010
3     0      -0.625   6      -3.75      5.3        11100.010
3     1      -0.625   3      -1.875     4.4        1110.0010
3     2      -0.625   1.5    -0.9375    3.5        111.00010
3     3      -0.625   0.75   -0.46875   2.6        11.100010    (Fract)

- The resulting bit pattern is always the same; only the position of the dot changes.
- In 2.6 format, as the numbers are <1, the integer part is never used.
Two’s complement arithmetic.
Conclusions.
- Addition/accumulation requires both operands to be represented in the same format (using sign extension and zero padding).
- The multiplication result is independent of the format representation, as long as the fixed point of the result is placed in the right position.
- The ADSP2100 family always assumes fractional multiplication, whereas modern families can flag (IS) integer multiplication.

Dynamic range constraints

Overflow
       a       2a      …    7a      8a
bin    0001    0010    …    0111    OV
dec    0.125   0.25    …    0.875   OV

Loss of precision
       a       a^2     …    a^6     a^7
bin    0110    0100    …    0001    0000
dec    0.75    0.563   …    0.178   0.133

- For non-integer Q formats, multiplying a long sequence of numbers causes loss of precision, but never overflow (the single exception is the product (-1)x(-1)).
- For non-integer Q formats, summing a long sequence of numbers can cause overflow.
- Dynamic range is closely related to the two previous statements.
- The greater the dynamic range, the smaller the probability of overflow or loss of precision.
- Remember that most DSP algorithms multiply and sum very often, so special care must be taken to prevent overflow or loss of precision.
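Both effects can be reproduced in a 4-bit Q3 word with a small sketch. The names are illustrative, and each product is truncated back to Q3, which is an assumption; with a different rounding mode the intermediate values change, so they need not match the table digit for digit.

```python
FRAC_BITS = 3                        # Q3: 1 sign bit + 3 fractional bits
N_BITS = 4
MAX_RAW = (1 << (N_BITS - 1)) - 1    # 0111 = 0.875

def sums_before_overflow(a_raw: int) -> int:
    """How many times a positive a_raw can be accumulated without overflow."""
    if a_raw <= 0:
        raise ValueError("a_raw must be positive")
    total, count = 0, 0
    while total + a_raw <= MAX_RAW:
        total += a_raw
        count += 1
    return count

def repeated_product(a_raw: int, steps: int) -> list:
    """Powers a, a^2, ... with each product truncated back to Q3."""
    values, current = [a_raw], a_raw
    for _ in range(steps - 1):
        current = (current * a_raw) >> FRAC_BITS   # drop the extra 3 bits
        values.append(current)
    return values

print(sums_before_overflow(1))     # 7: the eighth addition overflows
print(repeated_product(6, 6))      # [6, 4, 3, 2, 1, 0]
```

The sum runs out of range while the product quietly decays to zero, which is the contrast the two tables show.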
Avoiding overflow
- Always use the maximum capability (guard bits) of the accumulators during internal calculations.
- If possible, round (or truncate) only the final results to the final data size and format.
- There is (almost) no loss of precision when internal calculations are handled with guard bits.
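The benefit of rounding only once can be illustrated by comparing two multiply-accumulate loops on Q15 data. This is a sketch under assumptions: the function names are hypothetical, and Python's unbounded integers stand in for a wide accumulator with guard bits.

```python
def mac_wide(xs, hs, frac_bits: int = 15) -> int:
    """Accumulate full-precision products, then round once at the end."""
    acc = sum(x * h for x, h in zip(xs, hs))          # wide accumulator
    return (acc + (1 << (frac_bits - 1))) >> frac_bits

def mac_narrow(xs, hs, frac_bits: int = 15) -> int:
    """Truncate every product to the data format before accumulating."""
    return sum((x * h) >> frac_bits for x, h in zip(xs, hs))

xs = [16384] * 4     # four samples equal to 0.5 in Q15
hs = [1] * 4         # four coefficients equal to 2**-15 in Q15
print(mac_wide(xs, hs), mac_narrow(xs, hs))   # 2 0
```

Each product is half an LSB, so truncating early loses everything, while the wide accumulator keeps the exact sum of 2 LSB.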
Avoiding overflow
[Figure: block diagram with the labels Signal, Scale and System, and the coefficients 0.9·β and 0.8·β]
- Scaling down a signal is the most effective technique to prevent overflow.
- Scaling down always implies loss of precision.
- Both scaling down and guard bits must be used in order to avoid overflow.
- It is always more convenient to scale down the system’s coefficients instead of the signals.
Avoiding overflow
Effect of β in SNR
For example, adopting β=0.5 implies a 6.02 dB decrease in SNR. This is equivalent to dividing by 2, shifting 1 place to the right, or losing 1 bit of resolution.
[Figure: two criteria for choosing β, labeled "Never overflows" and "More relaxed scaling"]
- Scaling down always reduces SNR.
- β can be chosen with an absolutely safe criterion or with a more relaxed one.
- It is often preferable to use different fractional Q formats within an algorithm.
- As overflow is very likely to happen in fixed point processors, special care should be taken when coding and debugging algorithms.
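The formulas for the two criteria are not in the transcript. The sketch below assumes the two choices commonly used for an FIR system with coefficients h[k]: β = 1/Σ|h[k]|, which can never overflow for inputs bounded by 1, and the less conservative β = 1/sqrt(Σh[k]²). Both formulas and all function names are assumptions, not taken from the slides.

```python
import math

def beta_never_overflows(h) -> float:
    """Assumed safe criterion: beta = 1 / sum(|h[k]|)."""
    return 1.0 / sum(abs(c) for c in h)

def beta_relaxed(h) -> float:
    """Assumed relaxed criterion: beta = 1 / sqrt(sum(h[k]**2))."""
    return 1.0 / math.sqrt(sum(c * c for c in h))

def snr_change_db(beta: float) -> float:
    """Change in SNR (dB) caused by scaling the signal by beta."""
    return 20 * math.log10(beta)

h = [0.5, -0.5, 0.5, -0.5]
print(beta_never_overflows(h), beta_relaxed(h))   # 0.5 1.0
print(round(snr_change_db(0.5), 2))               # -6.02
```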
Minimizing overflow effects
[Figure: output without saturation arithmetic vs. with saturation arithmetic]
- Always use saturating arithmetic.
- If overflow does occur, saturation decreases the probability that an oscillation occurs.
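The difference between the two behaviours can be sketched for a 4-bit word. The function names are illustrative, and a real DSP does this in hardware.

```python
def add_wrapping(a: int, b: int, n_bits: int = 4) -> int:
    """Two's complement addition that wraps around on overflow."""
    raw = (a + b) & ((1 << n_bits) - 1)
    return raw - (1 << n_bits) if raw >> (n_bits - 1) else raw

def add_saturating(a: int, b: int, n_bits: int = 4) -> int:
    """Addition that clamps the result to the representable range."""
    lo = -(1 << (n_bits - 1))
    hi = (1 << (n_bits - 1)) - 1
    return max(lo, min(hi, a + b))

print(add_wrapping(5, 6), add_saturating(5, 6))   # -5 7
```

Wrapping turns a large positive sum into a negative value, a sign error that can feed an oscillation in a recursive system. Saturation keeps the sign and limits the error to the clipped amount.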

Example of an overflow oscillation
For a system defined by: [equation not reproduced]
and an input: [equation not reproduced]
with the overflow rule: [equation not reproduced]
a 4-bit word length, and no saturation arithmetic,
we have the following output: [figure not reproduced]
Quantization word-length effects
- The codec and the system’s coefficients are the main generators of quantization noise.
- The codec’s noise can be modeled as a random variable with a uniform PDF between -LSB/2 and LSB/2.
- The SNR of an ADC (in dB) increases linearly with the word-length and also depends on the loading factor.
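For a uniform PDF between -LSB/2 and LSB/2 the noise power is LSB²/12. The sketch below applies this to a sine wave in a converter with a full scale of ±1. The names are illustrative, and "loading" is taken here as the ratio of the sine amplitude to full scale, which is an assumption about how the loading factor is defined.

```python
import math

def quantization_noise_power(lsb: float) -> float:
    """Variance of noise uniformly distributed in [-lsb/2, lsb/2]."""
    return lsb ** 2 / 12.0

def sine_snr_db(n_bits: int, loading: float = 1.0) -> float:
    """SNR of a quantized sine of amplitude `loading` (full scale is +/-1)."""
    lsb = 2.0 / (1 << n_bits)
    signal_power = loading ** 2 / 2.0
    return 10 * math.log10(signal_power / quantization_noise_power(lsb))

print(round(sine_snr_db(16), 2))        # 98.09
print(round(sine_snr_db(16, 0.5), 2))   # 92.07
```

Each extra bit adds about 6.02 dB, and halving the amplitude costs the same amount.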
Quantization word-length effects
[Figure: band-pass system with two complex-conjugate poles, and its difference equation]
- When a system is defined in terms of its coefficients, their finite precision affects the behavior of the system itself.
- Thus, there is a grid of possible locations where the system’s poles can be placed.
- This grid depends first on the word-length and second on the structure adopted to implement the system.
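The slide's difference equation is not in the transcript. The sketch below assumes a direct-form second-order section y[n] = x[n] - a1·y[n-1] - a2·y[n-2] with poles at r·e^(±jθ), so that a1 = -2r·cos θ and a2 = r². This form and the function names are assumptions used only to show how coefficient quantization moves the poles.

```python
import cmath
import math

def quantize(value: float, frac_bits: int) -> float:
    """Round value to the nearest multiple of 2**-frac_bits."""
    return round(value * (1 << frac_bits)) / (1 << frac_bits)

def realized_pole(r: float, theta: float, frac_bits: int) -> complex:
    """Upper-half-plane pole obtained after quantizing a1 and a2."""
    a1 = quantize(-2 * r * math.cos(theta), frac_bits)
    a2 = quantize(r * r, frac_bits)
    disc = cmath.sqrt(a1 * a1 - 4 * a2)    # root of z**2 + a1*z + a2 = 0
    return (-a1 + disc) / 2

wanted = cmath.rect(0.9, math.pi / 4)
for bits in (3, 7, 15):
    print(bits, abs(realized_pole(0.9, math.pi / 4, bits) - wanted))
```

With only 3 fractional bits the pole radius becomes sqrt(0.75) instead of 0.9, and the error shrinks as the word-length grows.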
Quantization word-length effects
- Some structures are less sensitive to coefficient quantization than others.
- There is a trade-off between efficiency and sensitivity to coefficient quantization.
Floating point representation
- This form of representation overcomes the precision and dynamic range limitations of fixed point.
- This format segments the data into sign, exponent and mantissa.
- The mantissa is represented as a fixed point number.
- The exponent is represented in offset binary format.
- The greater the number of exponent bits $b_e$, the larger the dynamic range.
- The greater the number of mantissa bits $b_m$, the larger the precision.
- There is a trade-off between $b_m$ and $b_e$; the best balance occurs at $b_e \approx b/4$ and $b_m \approx 3b/4$, where b is the total number of bits.
Floating point representation (I)
- IEEE 754 is the most widely used floating point format.
- As the point is floating, a process called normalization is performed in order to use the full precision of the $b_m$ bits, while the exponent is adjusted accordingly.
- Floating point arithmetic usually requires many logical comparisons and branches, so software-emulated floating point achieves low performance.
- Floating point DSPs implement all the arithmetic handling in hardware, so these DSPs outperform their fixed point counterparts in ease of use and performance (while, of course, being more expensive).
Floating point representation (II)

Single Precision (32 bits)
                        Sign     Biased exponent   Fraction   Value
Positive zero           0        0                 0          0
Negative zero           1        0                 0          -0
Plus infinity           0        255 (all 1s)      0          ∞
Minus infinity          1        255 (all 1s)      0          -∞
NaN                     0 or 1   255 (all 1s)      ≠0         NaN
Positive normalized     0        0 < e < 255       f          $2^{e-127}\,(1.f)$
Negative normalized     1        0 < e < 255       f          $-2^{e-127}\,(1.f)$
Positive denormalized   0        0                 f≠0        $2^{-126}\,(0.f)$
Negative denormalized   1        0                 f≠0        $-2^{-126}\,(0.f)$

Double Precision (64 bits)
                        Sign     Biased exponent   Fraction   Value
Positive zero           0        0                 0          0
Negative zero           1        0                 0          -0
Plus infinity           0        2047 (all 1s)     0          ∞
Minus infinity          1        2047 (all 1s)     0          -∞
NaN                     0 or 1   2047 (all 1s)     ≠0         NaN
Positive normalized     0        0 < e < 2047      f          $2^{e-1023}\,(1.f)$
Negative normalized     1        0 < e < 2047      f          $-2^{e-1023}\,(1.f)$
Positive denormalized   0        0                 f≠0        $2^{-1022}\,(0.f)$
Negative denormalized   1        0                 f≠0        $-2^{-1022}\,(0.f)$
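The single precision rows of the table can be explored with Python's standard `struct` module, which packs a number into its 32-bit pattern. The three helper functions are illustrative names, not part of any library.

```python
import struct

def decode_float32(x: float):
    """Return (sign, biased exponent, fraction) of x as a 32-bit float."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def classify(sign: int, exponent: int, fraction: int) -> str:
    """Name the table row that the three fields belong to."""
    if exponent == 255:
        if fraction:
            return "NaN"
        return "-inf" if sign else "+inf"
    if exponent == 0:
        if fraction == 0:
            return "-0" if sign else "+0"
        return "denormalized"
    return "normalized"

def value_of(sign: int, exponent: int, fraction: int) -> float:
    """Rebuild the value of a finite number from its fields."""
    if exponent == 0:
        magnitude = (fraction / 2**23) * 2.0**-126            # (0.f) * 2^-126
    else:
        magnitude = (1 + fraction / 2**23) * 2.0**(exponent - 127)
    return -magnitude if sign else magnitude

print(decode_float32(-2.5))             # (1, 128, 2097152)
print(value_of(*decode_float32(-2.5)))  # -2.5
```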
Normalized & Denormalized numbers (32-bit format)

Normalized numbers, $(1.f) \times 2^{e-127}$: the range between 0 and $2^{-126}$ is unused.
Gap between adjacent numbers: 1.4e-45 from $2^{-126}$ to $2^{-125}$, 2.8e-45 from $2^{-125}$ to $2^{-124}$, doubling with each power of two.
Min. Positive Normalized
0 00000001 00000000000000000000000
$1 \times 2^{1-127} = 2^{-126}$
1.175494350822287507968e-38

Denormalized numbers, $(0.f) \times 2^{-126}$: they fill the range between 0 and $2^{-126}$.
Gap = 1.4e-45
Min. Positive Denormalized
0 00000000 00000000000000000000001
$2^{-23} \times 2^{-126} = 2^{-149}$
1.4012984643248170709e-45
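These limits and gaps can be verified by building the bit patterns directly. This is a small sketch using the standard `struct` module; `float32_from_bits` is a hypothetical helper name.

```python
import struct

def float32_from_bits(bits: int) -> float:
    """Value of the single precision number whose 32-bit pattern is bits."""
    (value,) = struct.unpack(">f", struct.pack(">I", bits))
    return value

MIN_NORMAL = float32_from_bits(0x00800000)     # 0 00000001 000...0
MIN_DENORMAL = float32_from_bits(0x00000001)   # 0 00000000 000...1

print(MIN_NORMAL == 2.0**-126, MIN_DENORMAL == 2.0**-149)   # True True
# Gap just above 2^-126 and just above 2^-125:
print(float32_from_bits(0x00800001) - MIN_NORMAL)
print(float32_from_bits(0x01000001) - float32_from_bits(0x01000000))
```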
Multiply
Flowchart for Z = X x Y:
1. X = 0? Yes: Z ← 0, RETURN.
2. Y = 0? Yes: Z ← 0, RETURN.
3. Add exponents.
4. Subtract bias.
5. Exponent overflow? Yes: report overflow.
6. Exponent underflow? Yes: report underflow.
7. Multiply significands.
8. Normalize.
9. Round.
10. RETURN.
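The multiply flow can be sketched in software for single precision operands that are zero or normalized. This is an illustration under assumptions, not how a DSP implements it: the function names are hypothetical, rounding is to nearest with ties to even, the range check is deferred until after normalization and rounding (both can raise the exponent by one), and denormalized results are reported as underflow.

```python
import struct

BIAS = 127

def fields(x: float):
    """(sign, biased exponent, fraction) of x as a 32-bit float."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def fp_multiply(x: float, y: float) -> float:
    """Multiply two single precision values that are zero or normalized."""
    if x == 0 or y == 0:
        return 0.0
    sx, ex, fx = fields(x)
    sy, ey, fy = fields(y)
    if ex in (0, 255) or ey in (0, 255):
        raise ValueError("sketch handles only zero and normalized operands")
    exponent = ex + ey - BIAS                        # add exponents, subtract bias
    product = ((1 << 23) | fx) * ((1 << 23) | fy)    # 24-bit x 24-bit significands
    # Normalize: the product lies in [2**46, 2**48); keep 24 significant bits.
    shift = 23
    if product >> 47:
        shift = 24
        exponent += 1
    significand = product >> shift
    remainder = product & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    # Round to nearest, ties to even.
    if remainder > half or (remainder == half and significand & 1):
        significand += 1
        if significand >> 24:                        # rounding overflowed: renormalize
            significand >>= 1
            exponent += 1
    if exponent >= 255:
        raise OverflowError("exponent overflow")
    if exponent <= 0:
        raise ArithmeticError("exponent underflow")
    bits = ((sx ^ sy) << 31) | (exponent << 23) | (significand & 0x7FFFFF)
    (result,) = struct.unpack(">f", struct.pack(">I", bits))
    return result

print(fp_multiply(1.5, 2.5))   # 3.75
```

The sign of the result is the XOR of the operand signs, and the number of comparisons and branches shows why software emulation is slow compared with a hardware floating point unit.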
Division
Flowchart for Z = X / Y:
1. X = 0? Yes: Z ← 0, RETURN.
2. Y = 0? Yes: Z ← ∞, RETURN.
3. Subtract exponents.
4. Add bias.
5. Exponent overflow? Yes: report overflow.
6. Exponent underflow? Yes: report underflow.
7. Divide significands.
8. Normalize.
9. Round.
10. RETURN.