Integer Operations
Outline
• Arithmetic Operations
– Overflow
– Unsigned addition, multiplication
– Signed addition, negation, multiplication
– Using shift to perform power-of-2 multiply/divide
• Suggested reading
– Chap 2.3
Unsigned Addition
Operands: w bits (u and v)
True Sum: w+1 bits (u + v)
Discard Carry: w bits (UAdd_w(u, v))
Unsigned Addition
• Standard Addition Function
– Ignores carry output
• Implements Modular Arithmetic
– s = UAdd_w(u, v) = (u + v) mod 2^w
$$
\mathrm{UAdd}_w(u, v) =
\begin{cases}
u + v, & u + v < 2^w \\
u + v - 2^w, & u + v \ge 2^w
\end{cases}
$$
P67 (2.9)
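The modular rule above can be observed directly in C, where unsigned arithmetic wraps modulo 2^w. This is a minimal sketch, not something taken from the slides: the helper name `uadd_ok` is hypothetical, and it relies only on the fact that a wrapped sum is smaller than either operand.

```c
/* Returns 1 if u + v fits in an unsigned int, 0 if the carry was discarded.
   uadd_ok is a hypothetical helper name used only for illustration. */
int uadd_ok(unsigned u, unsigned v) {
    unsigned s = u + v;   /* computes (u + v) mod 2^w */
    return s >= u;        /* after overflow, s = u + v - 2^w, which is < u */
}
```

For example, `uadd_ok(1u, 2u)` reports no overflow, while adding 1 to the largest unsigned value (`~0u`) wraps to 0 and is reported as overflow.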
Visualizing Unsigned Addition
P68 Figure 2.16
• Wraps Around
– If true sum ≥ 2^w
– At most once
[Figure: surface plot of UAdd_4(u, v) for 4-bit u and v. The true sum ranges from 0 up to 2^(w+1); sums of 2^w or more overflow and wrap around to the modular sum.]
Unsigned Addition Forms an Abelian Group
P68
• Closed under addition
– 0 ≤ UAdd_w(u, v) ≤ 2^w − 1
• Commutative
– UAdd_w(u, v) = UAdd_w(v, u)
• Associative
– UAdd_w(t, UAdd_w(u, v)) = UAdd_w(UAdd_w(t, u), v)
Unsigned Addition Forms an Abelian Group
• 0 is additive identity
– UAdd_w(u, 0) = u
• Every element has additive inverse
– Let UComp_w(u) = 2^w − u for u > 0, and UComp_w(0) = 0 P68 (2.10)
– UAdd_w(u, UComp_w(u)) = 0
Signed Addition
• Functionality
– True sum requires w+1 bits
– Drop off MSB
– Treat remaining bits as 2’s comp. integer
$$
\mathrm{TAdd}_w(u, v) =
\begin{cases}
u + v - 2^w, & TMax_w < u + v \quad \text{(PosOver)} \\
u + v, & TMin_w \le u + v \le TMax_w \\
u + v + 2^w, & u + v < TMin_w \quad \text{(NegOver)}
\end{cases}
$$
P70 (2.12)
PosOver: Positive Overflow
NegOver: Negative Overflow
Signed Addition
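P70 Figure 2.17
[Figure: the true sum u + v, ranging from −2^w to 2^w, mapped onto the TAdd result, ranging from −2^(w−1) to 2^(w−1) − 1. True sums of 2^(w−1) or more (u, v > 0) wrap to negative results (PosOver); true sums below −2^(w−1) (u, v < 0) wrap to non-negative results (NegOver).]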
Visualizing 2’s Comp. Addition
• Values
– 4-bit two’s comp.
– Range from -8 to +7
• Wraps Around
– If sum ≥ 2^(w−1)
• Becomes negative
– If sum < −2^(w−1)
• Becomes positive
Visualizing 2’s Comp. Addition P72 Figure 2.19
[Figure: surface plot of TAdd_4(u, v) for 4-bit two's-complement u and v (axes from −8 to 7), with the NegOver and PosOver regions marked.]
Detecting TAdd Overflow
P71
• Task
– Given s = TAdd_w(u, v)
– Determine if s = Add_w(u, v)
• Claim
– Overflow iff either:
• u, v < 0, s ≥ 0 (NegOver)
• u, v ≥ 0, s < 0 (PosOver)
– ovf = (u<0 == v<0) && (u<0 != s<0);
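The overflow test from this slide can be wrapped in a small function. This is a sketch under stated assumptions: the name `tadd_overflows` is hypothetical, `int` is assumed to be 32 bits in the usage below, and the sum is formed with unsigned arithmetic because signed overflow is undefined behavior in C.

```c
/* Returns 1 if u + v overflows a two's-complement int, 0 otherwise.
   tadd_overflows is a hypothetical name used only for illustration.
   The conversion of the unsigned sum back to int is implementation-defined
   in C; it wraps on common compilers, which this sketch assumes. */
int tadd_overflows(int u, int v) {
    int s = (int)((unsigned)u + (unsigned)v);   /* w-bit sum, carry dropped */
    /* Overflow iff the operands share a sign and the sum's sign differs. */
    return ((u < 0) == (v < 0)) && ((u < 0) != (s < 0));
}
```

With 32-bit `int`, `tadd_overflows(2147483647, 1)` reports positive overflow, while operands of opposite sign can never overflow.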
Mathematical Properties of TAdd
• Two’s Complement Under TAdd Forms a Group
– Closed, Commutative, Associative, 0 is additive identity
– Every element has additive inverse
• Let
$$
\mathrm{TComp}_w(u) =
\begin{cases}
-u, & u \ne TMin_w \\
TMin_w, & u = TMin_w
\end{cases}
$$
P73 (2.13)
• TAdd_w(u, TComp_w(u)) = 0
Mathematical Properties of TAdd
• Isomorphic Algebra to UAdd
– TAdd_w(u, v) = U2T(UAdd_w(T2U(u), T2U(v)))
• Since both have identical bit patterns
– T2U(TAdd_w(u, v)) = UAdd_w(T2U(u), T2U(v))
Negating with Complement & Increment
P73
• In C
– ~x + 1 == -x
• Complement
– Observation: ~x + x == 1111…111 == -1
     x    1 0 0 1 1 1 0 1
+   ~x    0 1 1 0 0 0 1 0
    -1    1 1 1 1 1 1 1 1
Negating with Complement & Increment
• Increment
– ~x + x + (-x + 1) == -1 + (-x + 1)
– ~x + 1 == -x
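The identity can be tried out in C. The sketch below works on unsigned bit patterns so that the most negative value does not trigger signed overflow; the function name `negate_bits` is hypothetical.

```c
/* Negation by complement and increment, applied to the raw bit pattern.
   negate_bits is a hypothetical name used only for illustration. */
unsigned negate_bits(unsigned x) {
    return ~x + 1u;   /* ~x + x is all ones (-1), so ~x + 1 is -x */
}
```

Adding `x` and `negate_bits(x)` always yields 0 modulo 2^w, which is exactly the additive-inverse property.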
Multiplication
P75
• Computing Exact Product of w-bit numbers x, y
– Either signed or unsigned
• Ranges
– Unsigned: 0 ≤ x * y ≤ (2^w − 1)^2 = 2^(2w) − 2^(w+1) + 1
• Up to 2w bits
– Two’s complement min: x * y ≥ (−2^(w−1)) * (2^(w−1) − 1) = −2^(2w−2) + 2^(w−1)
• Up to 2w−1 bits
– Two’s complement max: x * y ≤ (−2^(w−1))^2 = 2^(2w−2)
• Up to 2w bits, but only for (TMin_w)^2
Multiplication
• Maintaining Exact Results
– Would need to keep expanding word size with
each product computed
– Done in software by “arbitrary precision”
arithmetic packages
Power-of-2 Multiply with Shift
Operands: w bits (u and 2^k)
True Product: w+k bits (u · 2^k)
Discard k bits: w bits (UMult_w(u, 2^k) or TMult_w(u, 2^k), with k zeros in the low-order positions)
Power-of-2 Multiply with Shift
• Operation
– u << k gives u * 2^k
– Both signed and unsigned
• Examples
– u << 3 == u * 8
– (u << 5) - (u << 3) == u * 24 (the parentheses are required: in C, - binds tighter than <<)
– Most machines shift and add much faster than multiply
• Compiler will generate this code automatically
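The shift-and-subtract example can be written as a function. A minimal sketch, with the hypothetical name `times24`; unsigned operands are used so that wraparound is well defined.

```c
/* u * 24 computed as 32u - 8u using shifts.
   times24 is a hypothetical name used only for illustration. */
unsigned times24(unsigned u) {
    return (u << 5) - (u << 3);   /* (u * 32) - (u * 8) */
}
```

For instance, `times24(7u)` gives 168, the same as `7u * 24`.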
Unsigned Power-of-2 Divide with Shift
• Quotient of Unsigned by Power of 2
– u >> k gives ⌊u / 2^k⌋
– Uses logical shift
Operands: u (w bits) and 2^k
Division: u / 2^k, with the low-order k bits of u falling to the right of the binary point
Quotient: ⌊u / 2^k⌋, with k zeros shifted in at the top and the fractional bits discarded
2’s Comp Power-of-2 Divide with Shift
P77
• Quotient of Signed by Power of 2
– u >> k gives ⌊u / 2^k⌋
– Uses arithmetic shift
– Rounds wrong direction when u < 0
Operands: u (w bits) and 2^k
Division: u / 2^k, with the low-order k bits of u falling to the right of the binary point
Result: RoundDown(u / 2^k), with the sign bit replicated into the k high-order positions
Correct Power-of-2 Divide
• Quotient of Negative Number by Power of 2
– Want ⌈u / 2^k⌉ (Round Toward 0)
– Compute as ⌊(u + 2^k − 1) / 2^k⌋
• In C: (u + (1<<k)-1) >> k
• Biases dividend toward 0
Correct Power-of-2 Divide
Case 1: No rounding
Dividend: u, whose low-order k bits are all 0, plus the bias 2^k − 1 (k ones)
Divisor: 2^k
Result: ⌈u / 2^k⌉; the bias only sets bits to the right of the binary point, which the shift discards
Biasing has no effect
Correct Power-of-2 Divide
Case 2: Rounding
Dividend: u, whose low-order k bits are not all 0, plus the bias 2^k − 1 (k ones); the carry increments the bits to the left of the binary point by 1
Divisor: 2^k
Result: ⌈u / 2^k⌉, which is ⌊u / 2^k⌋ incremented by 1
Biasing adds 1 to final result
Floating Point
Topics
• Fractional Binary Numbers
• IEEE 754 Standard
• Rounding Mode
• FP Operations
• Floating Point in C
• Suggested Reading: Chap 2.4
Encoding Rational Numbers
P80
• Form V = x × 2^y
• Very useful when |V| >> 0 or |V| << 1
• An approximation to real arithmetic
• From programmer’s perspective
– Uninteresting
– Arcane and incomprehensible
* Arcane: mysterious
* Incomprehensible: impossible to understand
Encoding Rational Numbers
• Until 1980s
– Many idiosyncratic formats, fast speed, easy
implementation, less accuracy
• IEEE 754
– Designed by W. Kahan for Intel processors
– Based on a small and consistent set of principles, elegant,
understandable, hard to make go fast
* Idiosyncratic: peculiar
* Elegant: refined
Fractional Binary Numbers
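Bits: b_m b_(m−1) ••• b_2 b_1 b_0 . b_(−1) b_(−2) b_(−3) ••• b_(−n)
Weights left of the binary point: 2^m, 2^(m−1), …, 4, 2, 1
Weights right of the binary point: 1/2, 1/4, 1/8, …, 2^(−n)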
Fractional Binary Numbers
• Bits to right of “binary point” represent
fractional powers of 2
• Represents rational number: $\sum_{i=-n}^{m} b_i \times 2^i$ P81 (2.17)
Fractional Numbers to Binary Bits
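/* x is a double with 0 <= x < 1; result_bits receives its first 32 binary digits */
unsigned result_bits = 0, current_bit = 0x80000000u;
int i;
for (i = 0; i < 32; i++) {
    x *= 2;
    if (x >= 1) {
        result_bits |= current_bit;   /* this binary digit is 1 */
        if (x == 1)
            break;                    /* no fractional part left */
        x -= 1;
    }
    current_bit >>= 1;                /* move to the next lower bit */
}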
Fraction Binary Number Examples
Value    Binary Fraction
0.2      0.00110011[0011]…
• Observations:
– The form 0.11111…11 represents numbers just below 1.0, which is denoted 1.0 − ε
– Binary fractions can only exactly represent numbers of the form x/2^k
– Others have repeating bit patterns
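The repeating pattern for 0.2 can be reproduced by wrapping the conversion loop from the earlier slide in a function. This is a sketch: the name `frac_to_bits` is hypothetical, `unsigned` is assumed to be 32 bits, and the input is assumed to satisfy 0 ≤ x < 1.

```c
/* First 32 binary digits of a fraction 0 <= x < 1, most significant first.
   frac_to_bits is a hypothetical name used only for illustration. */
unsigned frac_to_bits(double x) {
    unsigned result_bits = 0, current_bit = 0x80000000u;
    int i;
    for (i = 0; i < 32; i++) {
        x *= 2;                           /* shift the next digit left of the point */
        if (x >= 1) {
            result_bits |= current_bit;   /* digit is 1 */
            if (x == 1)
                break;                    /* exact: nothing left to convert */
            x -= 1;
        }
        current_bit >>= 1;
    }
    return result_bits;
}
```

`frac_to_bits(0.2)` returns `0x33333333`, the pattern 0011 repeated, while `frac_to_bits(0.75)` terminates after two digits with `0xC0000000`.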
IEEE Floating-Point Representation
P83
• Numeric form
– V=(-1)sM  2E
• Sign bit s determines whether number is
negative or positive
• Significand M normally a fractional value in
range [1.0,2.0).
• Exponent E weights value by power of two
IEEE Floating-Point Representation
• Encoding
– Fields, from most to least significant bit: s | exp | frac
– s is sign bit
– exp field encodes E
– frac field encodes M
• Sizes
– Single precision (32 bits): 8 exp bits, 23 frac bits
– Double precision (64 bits): 11 exp bits, 52 frac bits
Normalized Values
P84
• Condition
– exp ≠ 000…0 and exp ≠ 111…1
• Exponent coded as biased value
– E = Exp – Bias
• Exp : unsigned value denoted by exp
• Bias : Bias value
– Single precision: 127 (Exp: 1…254, E : -126…127)
– Double precision: 1023 (Exp: 1…2046, E : -1022…1023)
– In general: Bias = 2^(m−1) − 1, where m is the number of exponent bits
Normalized Values
• Significand coded with implied leading 1
– M = 1.xxx…x_2
• xxx…x: bits of frac
• Minimum when 000…0 (M = 1.0)
• Maximum when 111…1 (M = 2.0 − ε)
• Get extra leading bit for “free”
Normalized Encoding Examples
• Value: 12345 (Hex: 0x3039)
• Binary bits: 11000000111001
• Fraction representation: 1.1000000111001 × 2^13
• M: 1.1000000111001, so frac: 10000001110010000000000
• E: 13, so exp: 10001100 (140 = 13 + 127)
• Binary Encoding
– 0100 0110 0100 0000 1110 0100 0000 0000
– 0x4640E400
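The encoding of 12345 can be checked by pulling the three fields out of a `float`. This is a minimal sketch that assumes `float` is the 32-bit IEEE 754 single-precision format; the struct and function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* The three fields of a single-precision value (hypothetical type name). */
struct fp_fields { unsigned s, exp, frac; };

/* Split f into its s, exp, and frac fields.
   Assumes float is 32-bit IEEE 754; float_fields is a hypothetical name. */
struct fp_fields float_fields(float f) {
    uint32_t bits;
    struct fp_fields r;
    memcpy(&bits, &f, sizeof bits);   /* copy out the raw bit pattern */
    r.s = bits >> 31;                 /* 1 sign bit */
    r.exp = (bits >> 23) & 0xFF;      /* 8 exponent bits */
    r.frac = bits & 0x7FFFFF;         /* 23 fraction bits */
    return r;
}
```

For `12345.0f` the fields come out as s = 0, exp = 140, and frac = `0x40E400`, matching the bit pattern `0x4640E400`.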
Denormalized Values
P84
• Condition
– exp = 000…0
• Values
– Exponent Value: E = 1 – Bias
– Significand Value: M = 0.xxx…x_2
• xxx…x: bits of frac
Denormalized Values
• Cases
– exp = 000…0, frac = 000…0
• Represents value 0
• Note that there are distinct values +0 and –0
– exp = 000…0, frac ≠ 000…0
• Numbers very close to 0.0
• Lose precision as they get smaller
• “Gradual underflow”
Special Values
P85
• Condition
– exp = 111…1
Special Values
• exp = 111…1, frac = 000…0
– Represents value ∞ (infinity)
– Operation that overflows
– Both positive and negative
– E.g., 1.0/0.0 = −1.0/−0.0 = +∞, 1.0/−0.0 = −∞
Special Values
• exp = 111…1, frac ≠ 000…0
– Not-a-Number (NaN)
– Represents case when no numeric value can be determined
– E.g., sqrt(–1), ∞ − ∞
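Both special values can be produced and recognized in C. The sketch below assumes IEEE 754 arithmetic (as on common platforms); the function names are hypothetical, and `volatile` is used only to keep the compiler from evaluating the division at compile time.

```c
/* Returns +infinity by dividing 1.0 by zero (assumes IEEE 754 semantics).
   make_inf is a hypothetical name used only for illustration. */
double make_inf(void) {
    volatile double zero = 0.0;
    return 1.0 / zero;
}

/* Returns 1 if x is NaN: NaN is the only value that compares unequal to itself. */
int is_nan(double x) {
    return x != x;
}
```

Subtracting infinity from infinity has no meaningful numeric value, so `is_nan(make_inf() - make_inf())` is true, while infinity itself is not a NaN.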
Summary of Real Number Encodings
Figure 2.22 P85
[Figure: number line of the encodings, from left to right: NaN, −∞, −Normalized, −Denorm, −0, +0, +Denorm, +Normalized, +∞, NaN]
8-bit Floating-Point Representations
Bit 7: s
Bits 6–3: exp
Bits 2–0: frac
8-bit Floating-Point Representations
Exp   exp    E     2^E
0     0000   -6    1/64   (denorms)
1     0001   -6    1/64
2     0010   -5    1/32
3     0011   -4    1/16
4     0100   -3    1/8
5     0101   -2    1/4
6     0110   -1    1/2
7     0111    0    1
8     1000   +1    2
9     1001   +2    4
10    1010   +3    8
11    1011   +4    16
12    1100   +5    32
13    1101   +6    64
14    1110   +7    128
15    1111   n/a          (inf, NaN)
Dynamic Range (Denormalized numbers)
Figure 2.23 P86
s  exp   frac  E    Value
0  0000  000   -6   0
0  0000  001   -6   1/8*1/64 = 1/512
0  0000  010   -6   2/8*1/64 = 2/512
…
0  0000  110   -6   6/8*1/64 = 6/512
0  0000  111   -6   7/8*1/64 = 7/512
Dynamic Range
s  exp   frac  E    Value
0  0001  000   -6   8/8*1/64 = 8/512
0  0001  001   -6   9/8*1/64 = 9/512
…
0  0110  110   -1   14/8*1/2 = 14/16
0  0110  111   -1   15/8*1/2 = 15/16
0  0111  000    0   8/8*1 = 1
0  0111  001    0   9/8*1 = 9/8
Dynamic Range
s  exp   frac  E    Value
0  0111  010    0   10/8*1 = 10/8
…
0  1110  110    7   14/8*128 = 224
0  1110  111    7   15/8*128 = 240
0  1111  000   n/a  inf
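The rows of the dynamic-range tables can be regenerated with a small decoder for the 8-bit format (1 sign bit, 4 exp bits, 3 frac bits, bias 7). This is a sketch only: `decode_fp8` is a hypothetical name, and the powers of two are applied with plain loops to keep the example free of library math calls.

```c
#include <math.h>   /* INFINITY and NAN macros */

/* Decode a byte laid out as s (bit 7), exp (bits 6-3), frac (bits 2-0).
   decode_fp8 is a hypothetical name used only for illustration. */
double decode_fp8(unsigned char bits) {
    int s = (bits >> 7) & 1;
    int exp = (bits >> 3) & 0xF;
    int frac = bits & 0x7;
    double m, value;
    int e, i;

    if (exp == 0xF) {                 /* special values: infinity or NaN */
        if (frac != 0)
            return NAN;
        return s ? -INFINITY : INFINITY;
    }
    if (exp == 0) {                   /* denormalized: E = 1 - Bias, M = 0.frac */
        e = 1 - 7;
        m = frac / 8.0;
    } else {                          /* normalized: E = exp - Bias, M = 1.frac */
        e = exp - 7;
        m = 1.0 + frac / 8.0;
    }
    value = m;
    for (i = 0; i < e; i++) value *= 2.0;   /* positive exponents */
    for (i = 0; i > e; i--) value /= 2.0;   /* negative exponents */
    return s ? -value : value;
}
```

For example, `decode_fp8(0x07)` is the largest denormalized value 7/512, `decode_fp8(0x38)` is 1, and `decode_fp8(0x77)` is the largest normalized value 240.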
Distribution of Representable Values
• 6-bit IEEE-like format
– K = 3 exponent bits
– n = 2 significand bits
– Bias is 3
• Notice how the distribution gets denser
toward zero.
Distribution of Representable Values
[Figure: number line of the representable values from −15 to 15, with a close-up from −1 to 1. Denormalized values cluster around 0, normalized values spread outward, and infinity lies at both ends.]
Interesting Numbers
P88 Figure 2.24
Special Properties of Encoding
• FP Zero Same as Integer Zero
– All bits = 0
• Can (Almost) Use Unsigned Integer
Comparison
– Must first compare sign bits
– Must consider -0 = 0
– NaNs problematic
• Will be greater than any other values
– Otherwise OK
• Denorm vs. normalized
• Normalized vs. infinity
Rounding Mode
P89
• Round down:
– rounded result is the closest value no greater than the true result.
• Round up:
– rounded result is the closest value no less than the true result.
Rounding Mode
P90 Figure 2.25
Mode                1.40  1.60  1.50  2.50  -1.50
Round-to-Even       1     2     2     2     -2
Round-toward-zero   1     1     1     2     -1
Round-down          1     1     1     2     -2
Round-up            2     2     2     3     -1
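The four modes in the table can be imitated for rounding to an integer with plain casts and comparisons. This is an illustrative sketch, not how hardware rounding modes are selected: all function names are hypothetical, and inputs are assumed to fit comfortably in a `long`.

```c
/* Round toward zero: a cast to an integer type truncates. */
long round_toward_zero(double x) {
    return (long)x;
}

/* Round down: the largest integer no greater than x. */
long round_down(double x) {
    long t = (long)x;
    return (x < (double)t) ? t - 1 : t;
}

/* Round up: the smallest integer no less than x. */
long round_up(double x) {
    long t = (long)x;
    return (x > (double)t) ? t + 1 : t;
}

/* Round to nearest, with exact halfway cases going to the even neighbor. */
long round_to_even(double x) {
    long t = (long)x;               /* truncated value */
    double diff = x - (double)t;    /* lies strictly between -1 and 1 */
    if (diff > 0.5 || (diff == 0.5 && (t & 1)))
        return t + 1;
    if (diff < -0.5 || (diff == -0.5 && (t & 1)))
        return t - 1;
    return t;
}
```

Applied to the column values 1.40, 1.60, 1.50, 2.50, and -1.50, these functions reproduce each row of the table, for example `round_to_even(2.5)` is 2 and `round_to_even(1.5)` is also 2.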
Round-to-Even
• Default Rounding Mode
– Hard to get any other kind without dropping into
assembly
– All others are statistically biased
• Sum of set of positive numbers will consistently be
over- or under- estimated
Round-to-Even
• Applying to Other Decimal Places
– When exactly halfway between two possible values
• Round so that least significant digit is even
– E.g., round to nearest hundredth
1.2349999  →  1.23  (Less than half way)
1.2350001  →  1.24  (Greater than half way)
1.2350000  →  1.24  (Half way, round up)
1.2450000  →  1.24  (Half way, round down)
Rounding Binary Number
P91
• “Even” when least significant bit is 0
• Half way when bits to right of rounding position = 100…_2
• Examples (rounding to 2 bits right of the binary point)
Value    Binary     Rounded  Action  Rounded Value
2 3/32   10.00011   10.00    Down    2
2 3/16   10.0011    10.01    Up      2 1/4
2 7/8    10.111     11.00    Up      3
2 5/8    10.101     10.10    Down    2 1/2
Floating-Point Operations
• Conceptual View
– First compute exact result
– Make it fit into desired precision
• Possibly overflow if exponent too large
• Possibly round to fit into frac
Mathematical Properties of FP Add
• Compare to those of Abelian Group
– Closed under addition? YES
• But may generate infinity or NaN
– Commutative? YES
– Associative? NO
• Overflow and inexactness of rounding
– 0 is additive identity? YES
– Every element has additive inverse? ALMOST
• Except for infinities & NaNs
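The failure of associativity is easy to trigger with one large and one small operand. A minimal sketch, assuming IEEE 754 double precision; the function names are hypothetical, and the `volatile` temporaries only force each intermediate sum to be rounded to a `double`.

```c
/* Computes (a + b) + c. add_left is a hypothetical name. */
double add_left(double a, double b, double c) {
    volatile double t = a + b;
    return t + c;
}

/* Computes a + (b + c). add_right is a hypothetical name. */
double add_right(double a, double b, double c) {
    volatile double t = b + c;
    return a + t;
}
```

With a = 3.14, b = 1e20, c = -1e20, the left grouping loses 3.14 when it is absorbed into 1e20 and returns 0.0, while the right grouping cancels the large terms first and returns 3.14.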
Mathematical Properties of FP Add
• Monotonicity
– a ≥ b ⇒ a+c ≥ b+c? ALMOST
• Except for infinities & NaNs
Algebraic Properties of FP Mult
• Compare to Commutative Ring
– Closed under multiplication? YES
• But may generate infinity or NaN
– Multiplication Commutative? YES
– Multiplication is Associative? NO (P92)
• Possibility of overflow, inexactness of rounding
– 1 is multiplicative identity? YES
– Multiplication distributes over addition? NO
• Possibility of overflow, inexactness of rounding
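Overflow gives a compact counterexample to associativity of multiplication. The sketch below assumes IEEE 754 double precision; the function names are hypothetical, and the `volatile` temporaries force each intermediate product to be stored as a `double`.

```c
/* Computes (a * b) * c. mul_left is a hypothetical name. */
double mul_left(double a, double b, double c) {
    volatile double t = a * b;
    return t * c;
}

/* Computes a * (b * c). mul_right is a hypothetical name. */
double mul_right(double a, double b, double c) {
    volatile double t = b * c;
    return a * t;
}
```

With a = b = 1e200 and c = 1e-200, the left grouping overflows to infinity in the first product and stays there, while the right grouping first forms a value near 1 and ends up near 1e200.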
Algebraic Properties of FP Mult
P90
• Monotonicity
– a ≥ b & c ≥ 0 ⇒ a*c ≥ b*c? ALMOST
• Except for infinities & NaNs
FP Multiplication
• Operands
(–1)^s1 M1 2^E1
(–1)^s2 M2 2^E2
• Exact Result
(–1)^s M 2^E
– Sign s: s1 ^ s2 (exclusive-or)
– Significand M: M1 * M2
– Exponent E: E1 + E2
FP Multiplication
• Fixing
– If M ≥ 2, shift M right, increment E
– If E out of range, overflow
– Round M to fit frac precision
FP Addition
• Operands
(–1)^s1 M1 2^E1
(–1)^s2 M2 2^E2
– Assume E1 > E2
• Exact Result
(–1)^s M 2^E
– Sign s, significand M:
• Result of signed align & add
– Exponent E: E1
FP Addition
• Fixing
– If M ≥ 2, shift M right, increment E
– If M < 1, shift M left k positions, decrement E by k
– Overflow if E out of range
– Round M to fit frac precision
FP Addition
Align: shift (–1)^s2 M2 right by E1–E2 positions relative to (–1)^s1 M1
Add: (–1)^s1 M1 + aligned (–1)^s2 M2 = (–1)^s M
Answers to Floating Point Puzzles
• int x = …;
• float f = …;
• double d = …;
• Assume neither d nor f is NaN or infinity
Floating Point in C
• x == (int)(float) x            No: 24 bit significand
• x == (int)(double) x           Yes: 53 bit significand
• f == (float)(double) f         Yes: increases precision
• d == (float) d                 No: loses precision
• f == -(-f);                    Yes: Just change sign bit
• 2/3 == 2/3.0                   No: 2/3 == 0
• d < 0.0 ⇒ ((d*2) < 0.0)        Yes!
• d > f ⇒ -f < -d                No
• d * d >= 0.0                   Yes!
• (d+f)-d == f                   No: Not associative
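Several of the "No" answers can be confirmed with concrete inputs. This is a sketch assuming 32-bit `int` and IEEE 754 `float` and `double`; the function names are hypothetical, and each function simply evaluates one puzzle expression for the values passed in.

```c
/* x == (int)(float) x: fails once x needs more than 24 significant bits. */
int puzzle_int_float(int x) {
    return x == (int)(float)x;
}

/* x == (int)(double) x: holds for every 32-bit int. */
int puzzle_int_double(int x) {
    return x == (int)(double)x;
}

/* d == (float) d: fails when d is not exactly representable as a float. */
int puzzle_double_float(double d) {
    return d == (float)d;
}

/* (d + f) - d == f: fails when f is absorbed into a much larger d. */
int puzzle_assoc(double d, float f) {
    volatile double t = d + f;   /* force the sum to be rounded to double */
    return (t - d) == f;
}
```

For instance, 16777217 (2^24 + 1) does not survive a round trip through `float` but does survive one through `double`, 0.1 changes value when narrowed to `float`, and `puzzle_assoc(1e20, 1.0f)` is false.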
Answers to Floating Point Puzzles
• C Guarantees Two Levels
– float: single precision
– double: double precision
Answers to Floating Point Puzzles
• Conversions
– Casting between int, float, and double changes numeric values
– Double or float to int
• Truncates fractional part
• Like rounding toward zero
• Not defined when out of range
– Generally saturates to TMin or TMax
– int to double
• Exact conversion, as long as int has ≤ 53 bit word size
– int to float
• Will round according to rounding mode