
Floating Point in computers
Comply with standards: IEEE 754, ISO/IEC 559

Timeline
• Introduction (quite short)
• Binary review (not so long)
• Integer Arithmetic (1/3)
• Floating Point (1/3)
• Floating Point Arithmetic (1/3)
• Other issues (extra short)

Introduction
• Who does computer arithmetic?
• Intel’s spare money
• How is it done in hardware?
• How Integer relates to Floating point
• Now, we go back to “computer structure”

Binary numbers
• What is 1001011.00101 ?
• Bit weights: $2^6\ 2^5\ 2^4\ 2^3\ 2^2\ 2^1\ 2^0\ .\ 2^{-1}\ 2^{-2}\ 2^{-3}\ 2^{-4}\ 2^{-5}$
• $64 + 8 + 2 + 1 + \frac{1}{8} + \frac{1}{32} = 75\frac{5}{32}$

Signed Binary Integers
• Sign-magnitude
• 2’s complement
• 1’s complement
• Biased

Sign-Magnitude
• High order bit = Sign
• 0101 = 5
• 1101 = -5
• 2 zeros

2’s complement
• Number + Negative = $2^n$
• 0101 = 5
• 1011 = -5
• Easy addition (drop carry)
• Formula: $-a_{n-1}2^{n-1} + a_{n-2}2^{n-2} + \dots + a_1 2^1 + a_0$

1’s Complement
• Negative = complement of every bit
• 0101 = 5
• 1010 = -5
• 2 zeros
• Number + Negative = $2^n - 1$

Biased
• Binary = Number + Bias
• Bias = 5:
  – 1010 = 5, since 5 + 5 = 10
  – 0000 = -5, since (-5) + 5 = 0
• Relative order remains

Integer Arithmetic

Adding (unsigned) Integers
• Elementary school:
     11001101
   + 10000110
   = 101010011
• Result has n+1 bits!

Adding Integers - hardware
• Half Adder (inputs a, b; outputs s, $c_{out}$):
  – $s = a\bar{b} + \bar{a}b$
  – $c_{out} = ab$
• Full Adder (inputs a, b, $c_{in}$; outputs s, $c_{out}$):
  – $s = \bar{a}\bar{b}c_{in} + \bar{a}b\bar{c}_{in} + a\bar{b}\bar{c}_{in} + abc_{in}$
  – $c_{out} = ab + ac_{in} + bc_{in}$
  – 2 logical levels

Ripple carry Adder
• Inputs: $a_{n-1}, b_{n-1}, a_{n-2}, b_{n-2}, \dots, a_1, b_1, a_0, b_0$ and $C_{in}$
• Outputs: $C_{out}, s_{n-1}, s_{n-2}, \dots, s_1, s_0$
• Slow - 2n logical levels
• Small constant (CMOS)
• Other ways exist

Adding Signed Integers
• In 2’s complement:
  – $b + (-a) = b + (2^n - a) = 2^n + (b - a)$
  – $(-b) + (-a) = (2^n - b) + (2^n - a) = (2^n - (b + a)) + 2^n$
• Hence: add as integers, discard carry out
• Example: 0011 + 1100 = ?

Subtracting Integers
• Add the negation
• Negating 2’s complement:
  – 11010100101011000110000 = ?
  – 00101011010100111010000 (complement every bit to the left of the lowest 1; keep the trailing 10000)

Integer (unsigned) Multiplication
• Elementary school:
         1101
       * 1001
         1101
        0000
       0000
      1101
     01110101
• Result is 2n bits!
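The elementary-school multiplication above adds one shifted copy of the multiplicand for every 1 bit of the multiplier. A minimal Python sketch of that idea follows; the function name `schoolbook_multiply` and its parameter `n` (the operand width in bits) are illustrative choices, not something taken from the slides.

```python
def schoolbook_multiply(a: int, b: int, n: int) -> int:
    """Multiply two unsigned n-bit integers by summing shifted partial products."""
    assert 0 <= a < (1 << n) and 0 <= b < (1 << n)
    product = 0
    for i in range(n):
        if (b >> i) & 1:        # bit i of the multiplier selects a partial product
            product += a << i   # the partial product is the multiplicand shifted left by i
    return product              # always fits in 2n bits


result = schoolbook_multiply(0b1101, 0b1001, 4)
print(format(result, "08b"))    # prints 01110101 (13 * 9 = 117)
```

Printing the result with a width of 2n = 8 bits makes the "result is 2n bits" remark visible.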
Hardware Multiplier
• Registers: P (n bits), A (n bits), B (n bits), with Carry and Shift
• P = 0
• loop:
  (i) if $A_0 = 1$, add B to P
  (ii) right-shift P & A

Integer (unsigned) Division
• Elementary school: 1101 / 11
• Result: 0100, Rem 1
• Dec: 13/3 = 4, Rem 1

Hardware Divider
• Registers: P (n+1 bits), A (n bits), B (n+1 bits), with Shift
• P = 0
• loop:
  (i) left-shift P & A
  (ii) Sub. B from P:
    – positive: $a_0 = 1$
    – negative: $a_0 = 0$, restore P (add B)

Example
• 13 / 3 = 4 (1)
• n = 4
• Start: A = 1101, B = 00011, P = 00000
• End: P = 00001 (Remainder), A = 0100 (Quotient), B = 00011

Division - remarks
• Non-restoring Algorithm
• Load P only if positive
• Check for 0
• (Total) Result is 2n bits!

Integer arithmetic - remarks
• Signed Multiply and Division
  – Algorithms exist
  – We will not use them
• What to do with extra bits?
• Faster methods

Floating Point

Non Integers - Other Methods
• Fixed Point
  – example: ###.#
  – Binary point shifted
  – Integer arithmetic (extra shifting)
  – Small number magnitude
• Rational
  – a/b ($a, b \in \mathbb{Z}$)

Floating Point
• Exponent + Significand (= Mantissa)
• $x = s \cdot 2^e$
• Example: s = 101, e = 011
  $x = 101_2 \cdot 2^{11_2} = 5 \cdot 2^3 = 40 = 101000_2$

Uniqueness
• Denormal Numbers: $123.456 \times 10^7$, $0.123 \times 10^4$
• Normalized: $\#.\#\#\# \times 10^{\#}$, e.g. $1.123 \times 10^4$
• What about 0 ?

Floating Point Standard
• Why Standardize?
  – Hardware accelerators
  – Software compatibility
  – Build Software Libraries
  – etc.
• IEEE 754-1985, ISO/IEC 559
• Includes: Structure, Arithmetic results

Float Types
• 4 Precision Types:
  – Single
  – Single extended
  – Double
  – Double extended

Single Precision
• 32 bits: Sign(1) | Exponent(8) | Significand(23)
• Exponent (e): Biased (+127)
• Significand (f): Fixed fraction: 0.###…
• Number: $1.f \cdot 2^{e-127}$

Single Precision - Example
• 1 10000001 01000000000000000000000
• 10000001 = 129, so the exponent is 129 - 127 = 2
• 01000… = 0.01000…, so the significand is 1.01 = 1.25
• $X = -1.25 \cdot 2^2$
• X = -5

Single Precision - Range
• Emax = 127 (e = 254)
• Emin = -126 (e = 1)
• Why |Emin| < |Emax|?
  – $1/2^{E_{min}}$ does not overflow
• Why Biased notation?
• What about 0 and 255 ?

Floating Point Precision
                Single   Single Extend   Double   Double Extend
Format Width    32       43              64       80
Precision       24       32              53       64
Emax            +127     1023            1023     16383
Emin            -126     -1022           -1022    -16382
Exp. Width      8        11              11       15
Exp. Bias       127                      1023

Examples
• We shall use base 10 sometimes:
• f will have 3 digits
• Emax will be 98
• Emin will be -97
• Ex: $5.34 \times 10^{70}$

NaN
• Not a Number
• Result of illegal computation:
  – $0 \times \infty$, $\infty + (-\infty)$, $0/0$, $\infty/\infty$, rem(x, 0), rem($\infty$, y)
  – Any computation involving a NaN
• e = Emax + 1 & $f \neq 0$
• # 11111111 #######################
• Many NaNs (different f’s)

NaNs in use
• Zero finder outside domain
  – f(x) = sqrt(x) - 1
• Works since every computation on a NaN returns a NaN
• No exception caused!

Zeros
• 0 00000000 00000000000000000000000 ?
• this is NOT $1.0 \times 2^{E_{min}}$
• 1 00000000 00000000000000000000000 ?
• 0 is signed! $\pm 0$ both exist!
• What is the difference?

Signed 0’s
• +0 = -0 BUT:
• Multiply/Divide keep sign rules:
  – $3 \times (+0) = (+0)$
  – $3 \times (-0) = (-0)$
• Motivation:
  – Using inf correctly (described later)
  – log(x): log(0) = -inf, log(negative) = NaN; what is log(x) if x = (-0) ?

± inf
• $1/(+0) = +\infty$, $1/(-0) = -\infty$, $1/(+\infty) = +0$, $1/(-\infty) = -0$
• More logic: $\pm 1/0 = \pm\infty$, $x + (\pm\infty) = \pm\infty$
• e = Emax + 1 & f = 0
• # 11111111 00000000000000000000000

Inf usage
• Example: $\cos^{-1}(x) = 2\tan^{-1}\sqrt{\frac{1-x}{1+x}}$
  (If $\tan^{-1}$ is defined properly)

More on 0’s and inf’s
• General Rule for 0/inf arithmetic:
  – Take appropriate limit:
    $3/\infty = \lim_{x \to \infty} 3/x = 0$
    $3/(-\infty) = \lim_{x \to (-\infty)} 3/x = (-0)$
• 1/(1/x) where x = 0 or inf
• Why not Max # instead?
  – $\sqrt{x^2 + y^2}$ with $x = 3 \times 10^{70}$, $y = 4 \times 10^{70}$
  – $\sqrt{9.99 \times 10^{98}} = 3.16 \times 10^{49}$
  – answer: $5 \times 10^{70}$

Zeros and inf’s - yet again
• $x/(x^2+1)$ is bad! Why?
• $1/(x + x^{-1})$ is better
• Do we need to check for x = 0?
• Using 2 zeros and inf’s saves some special-case checks.

Denormalized numbers
• Example:
  – $x = 1.23 \cdot 10^{-98}$, $y = 1.11 \cdot 10^{-98}$
  – $x - y = 1.20 \cdot 10^{-99} = 0$
  – so: x - y = 0 but: $x \neq y$
  – think of: if ($x \neq y$) then z = 1/(x - y)
• Solution:
  – use denormalized numbers!
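The same situation can be reproduced with IEEE double precision. The sketch below is only an illustration: it assumes a Python build on hardware that keeps denormal (subnormal) results rather than flushing them to zero, and the operands 1.5 and 1.25 times the smallest normal number are arbitrary choices made so that every value is exactly representable.

```python
import sys

smallest_normal = sys.float_info.min   # 1.0 * 2**Emin, i.e. 2**-1022 for doubles

x = 1.5 * smallest_normal
y = 1.25 * smallest_normal
diff = x - y                           # 0.25 * 2**-1022, below the normal range

print(x != y)                          # the operands differ
print(diff != 0.0)                     # so the difference is not zero
print(0.0 < diff < smallest_normal)    # it is stored as a denormal number
```

Because `diff` is non-zero whenever `x != y`, a guard such as `if x != y: z = 1 / (x - y)` never divides by zero.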
Denormal Numbers
• Smallest normal: $1.0 \cdot 2^{E_{min}}$
• Below, use denormal: $0.f \cdot 2^{E_{min}}$
• e = Emin - 1 & $f \neq 0$
• # 00000000 #######################
• Gradual underflow:
  $1.23 \cdot 10^{-4}$ (/10) $0.12 \cdot 10^{-4}$ (/10) $0.01 \cdot 10^{-4}$ (/10) 0

Denormal Numbers
• Back to our Example:
  – $x = 1.23 \cdot 10^{-98}$, $y = 1.11 \cdot 10^{-98}$
  – $x - y = 0.12 \cdot 10^{-98}$
  – and this is not 0!

Flush to 0 Vs Gradual Underflow
[Figure: two number lines, each marked 0, $2^{-4}$, $2^{-3}$, $2^{-2}$, $2^{-1}$, comparing flush to 0 with gradual underflow]

Special Values - Summary
Exponent               Fraction      Represents
Emin - 1               f = 0         $\pm 0$
Emin - 1               $f \neq 0$    $0.f \times 2^{E_{min}}$
Emin ≤ e ≤ Emax        ---           $1.f \times 2^e$
Emax + 1               f = 0         $\pm\infty$
Emax + 1               $f \neq 0$    NaN

Rounding
• Why is rounding needed?
  – Infinitely many numbers, finite representation
  – Integers only overflow
  – Almost all operations need rounding
  – IEEE specifies algorithms for arithmetic

Numbers need rounding
• Out of range:
  – $x > 2 \times 2^{E_{max}}$, $x < 1 \times 2^{E_{min}}$
• Between 2 floats:
  – $0.1_{10} = 0.00011001100\ldots_2 = 1.1001100\ldots \times 2^{-4}$
  – rounded: $1.1001 \times 2^{-4}$

Measuring Error
• ULPs (units in the last place)
  – $1.12 \times 10^{-1}$ vs 0.1124: 0.4 ulps
  – $1.12 \times 10^{-1}$ vs 0.1118: 0.2 ulps
• Relative Error
  – Difference/Original
  – $1.12 \times 10^{-1}$ vs 0.1124: Err = 0.0004/0.1124 ≈ 0.0036

Calculate Using Rounding
• Benign cancellation
  – Calculate 10.1 - 9.93 (= 0.17)
  – $1.01 \times 10^1 - 0.99 \times 10^1 = 0.02 \times 10^1 = 2.00 \times 10^{-1}$
  – 30 ulps!

Rounding problems
• Catastrophic cancellation
  – $b^2 - 4ac$
  – both $b^2$ and $4ac$ are rounded
  – the (-) exposes the error
  – b = 3.34, a = 1.22, c = 2.28
  – $b^2 = 11.2$, $4ac = 11.1$, $b^2 - 4ac = 0.10$
  – correct = 0.0292 (error of 70.8 ulps)

IEEE Arithmetic
• Requirement:
  – +, -, ×, ÷ should be EXACTLY rounded
  – √, remainder should be EXACTLY rounded
  – Integer conv. should be EXACTLY rounded
• Not all (transcendental, binary to decimal)
• “Tie break” - Round to Even

Round to Even
• How will 1.005 be rounded?
  – Round Up: 1.01
  – Round Even: 1.00
• Why? Example:
  – $x_i = x_{i-1} + y - y$, $x_0 = 1.00$, $y = 0.125$
  – Round up: 1.00, 1.01, 1.02, …
  – Round even: 1.00, 1.00, 1.00, …
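The drift in the last example can be reproduced with Python's `decimal` module. This is a sketch under two assumptions that are not stated on the slide: every intermediate result is rounded to two decimal places, and "round up" means rounding ties away from zero (`ROUND_HALF_UP`). The helper name `iterate` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP


def iterate(rounding, steps=3):
    """Apply x = (x + y) - y repeatedly, rounding after every operation."""
    two_places = Decimal("0.01")
    x, y = Decimal("1.00"), Decimal("0.125")
    history = [x]
    for _ in range(steps):
        x = (x + y).quantize(two_places, rounding=rounding)
        x = (x - y).quantize(two_places, rounding=rounding)
        history.append(x)
    return history


print([str(v) for v in iterate(ROUND_HALF_UP)])    # ['1.00', '1.01', '1.02', '1.03']
print([str(v) for v in iterate(ROUND_HALF_EVEN)])  # ['1.00', '1.00', '1.00', '1.00']
```

With ties rounded up, each pass adds 0.01 to x; with ties rounded to even, the two roundings cancel and x stays at 1.00.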
Float Multiplication
• $(s_1 \cdot 2^{e_1}) \cdot (s_2 \cdot 2^{e_2}) = (s_1 \cdot s_2) \cdot 2^{e_1 + e_2}$
  – significands: integer multiply
  – exponents: biased addition
• “Biased addition”: $(e_1 + 127) + (e_2 + 127) - 127 = (e_3 + 127)$, where $e_3 = e_1 + e_2$
  – detect Overflow: Use n+1 bit adder
  – detect Underflow: Harder (Denormals)

Rounding Multiplication
• 1.23 × 6.78 = 8.3394, round to 8.34
• 2.83 × 4.47 = 12.6501, round to 1.27 (× 10)
• 1.28 × 7.81 = 09.9968, round to 1.00 (× 10), shift needed
[Figure: binary results 1.0001|1 (round bit 0), 1.0010|0 (round bit 1, all rest 0), 1.0010|1 (round bit 1, all rest 0) and 0.1101|0 (shift needed)]

Round, Guard, Sticky
• 1.001000100: number | round | sticky
• 0.110100010: number | guard | round | sticky

Rounding Multiplication
• Registers: P (n bits), A (n bits), B (n bits), with Carry and Shift
• Product: $x_0 x_1 . x_2 x_3 x_4 x_5\ g\ r\ s\ s\ s\ s$
• Results:
  – Case 1: $x_0 = 0$, shift: $x_1 . x_2 x_3 x_4 x_5\ g$, followed by the round digit and the sticky bit
  – Case 2: $x_0 = 1$, inc. exp: $x_0 . x_1 x_2 x_3 x_4 x_5$, followed by the round digit and the sticky bit

Rounding rules
• r = 0: rounded OK
• r = 1, s = 1: add 1 to LSB
• r = 1, s = 0: add 1 if LSB = 1
• Denormals: extra shifting

Float addition
• Compute all digits and round?
  – $1.00 \times 2^{20} + 1.00 \times 2^{-20} = 10000000\ldots0000001$
  – too long!
• Use Round and Sticky bits:
  – shift to same exponent
  – r = first discarded digit
  – s = OR of rest discarded

Float addition - example
• Calculate: $1.10011 \times 2^0 + 1.10001 \times 2^{-5}$
• Shift exponents: $1.10011 \times 2^0 + 0.0000110001 \times 2^0$
• r = 1, s = 0|0|0|1 = 1
• 1.10011 + .00001 = 1.10100
• r = 1, s = 1: Round needed!
• Result: 1.10101

Signed Addition/Subtraction
• Simplest way - convert to 2’s cmpl.
• Cancellation of high order bit - shift
  – 1.00000 - 0.00000101111
  – cmpl: 1.11111010001
  – 1.00000 + 1.11111 = 0.11111
• more bits cancel - How many guard digits?

Float Division
• $(s_1 \cdot 2^{e_1}) / (s_2 \cdot 2^{e_2}) = (s_1 / s_2) \cdot 2^{e_1 - e_2}$
  – significands: integer division
  – exponents: biased subtraction
• Very similar to Multiplication
• Dividing using integer divide
• Compute 2 more bits (round, guard)
• Use remainder as sticky bit (Why?)
• Sign bit: XOR

More on floats

Rounding modes
• IEEE specifies 4 modes:
  – Nearest (default)
  – towards 0
  – towards +inf
  – towards -inf
• affects overflow (How?)
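The four rounding modes can be tried out with Python's `decimal` module, which offers a rounding constant for each of them. The sketch below works in base 10 with a 3-digit significand, like the decimal examples in these slides; the names `MODES` and `round_to_3_digits` are illustrative and not part of any standard API.

```python
from decimal import (Context, Decimal, ROUND_CEILING, ROUND_DOWN,
                     ROUND_FLOOR, ROUND_HALF_EVEN)

# One decimal rounding constant per IEEE rounding mode.
MODES = {
    "nearest (ties to even)": ROUND_HALF_EVEN,
    "towards 0": ROUND_DOWN,
    "towards +inf": ROUND_CEILING,
    "towards -inf": ROUND_FLOOR,
}


def round_to_3_digits(value: str, mode: str) -> Decimal:
    """Round a decimal string to 3 significant digits under the given mode."""
    return Context(prec=3, rounding=MODES[mode]).plus(Decimal(value))


for name in MODES:
    print(name, round_to_3_digits("1.005", name), round_to_3_digits("-1.005", name))
```

Only the directed modes treat 1.005 and -1.005 differently: rounding towards +inf gives 1.01 and -1.00, while rounding towards -inf gives 1.00 and -1.01.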
Exceptions
• Set a flag at:
  – Underflow: $|x| < 1.0 \times 2^{E_{min}}$
  – Overflow: $|x| > 1.0 \times 2^{E_{max}}$
  – divide by 0: 1/0
  – inexact: rounding was needed
  – invalid: operations that return NaN
• flags are sticky

Speeding up
• Different algorithms may be used
• Result should be exact
• divide SRT algorithm in Pentium
  – 5/2048 entries in a table
  – 1/9,000,000,000 chance
  – check:

Precision
• Why extended precisions?
  – Return higher accuracy (D*D → ext. → D)
  – use for computations: $\sqrt{x^2 + y^2}$
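The last formula shows why intermediate range matters: the squares can overflow even when the final result is representable. Python has no extended-precision float type, so the sketch below only demonstrates the overflow in double precision and then avoids it by rescaling, a different remedy from the extended format the slide has in mind. The operand values are arbitrary illustrations.

```python
import math

x, y = 3e200, 4e200

# Naive evaluation: x * x exceeds the double range and becomes inf.
naive = math.sqrt(x * x + y * y)

# Rescaling by the larger magnitude keeps every intermediate value in range.
scale = max(abs(x), abs(y))
rescaled = scale * math.sqrt((x / scale) ** 2 + (y / scale) ** 2)

# The standard library provides a routine for the same computation.
library = math.hypot(x, y)

print(naive, rescaled, library)
```

An extended format with a wider exponent would let the naive formula succeed unchanged, which is the use case named on the slide.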
