
CMPE12 – More about Numbers: Integers and Floating Point
(Rest of Textbook Chapter 2 plus more)
CMPE12 – Fall 2011 – J. Ferguson

Review: Unsigned Integer
•  A string of 0s and 1s that represents a non-negative integer.
•  The string is X_(n-1), X_(n-2), …, X_1, X_0, where each X_k is either 0 or 1 and has a weight of 2^k.
•  The represented number is the sum of the weights of all the 1s in the string.

Signed Integers
Allow us to represent positive and negative integers.
4 important types:
•  Sign and magnitude -- The leftmost bit is the sign and the remaining bits are the unsigned magnitude.
•  1's complement -- The additive inverse of a number is the bit-wise complement of the number.
•  2's complement -- The additive inverse of a number is the bit-wise complement of the number, plus one.
•  Bias or excess notation -- A bias is subtracted from the "unsigned value" of the bits to get the represented value.

Quick Review of Signed Integers
Decimal   1's complement   2's complement   Sign-and-magnitude
 22       0010110          0010110          0010110
-22       1101001          1101010          1010110

Biased notation
•  How does it work? The signed integer is offset by a fixed bias, so that the most negative value (-bias) is represented by 000…000.
•  Advantages:
   –  Preserves lexical order
   –  Single zero
   –  Most versatile
•  Disadvantages:
   –  Add and sub require one additional operation to adjust the bias

Bias 3:
Rep   Value
000   -3
001   -2
010   -1
011    0
100    1
101    2
110    3
111    4

Conversion: Decimal/BiasX
•  Decimal -> BiasX
   –  Add X, then convert to binary
•  BiasX -> Decimal
   –  Convert binary to decimal, then subtract X

Rep   D (unsigned)   Value (Bias 3)
000   0              -3
001   1              -2
010   2              -1
011   3               0
100   4               1
101   5               2
110   6               3
111   7               4

Addition of Bias representations?
Let z = x + y and let b be the bias.

What you want   What you get   How you convert   Unsigned
   (x+b)           (x+b)           z+2b              x
  +(y+b)          +(y+b)          -  b             + y
   (z+b)          (z+2b)           z+b               z

So you must subtract out the additional bias when you are finished!

Subtraction?
Let z = x - y and let b be the bias.

What you want   What you get   How you convert   Unsigned
   (x+b)           (x+b)           z+0               x
  -(y+b)          -(y+b)          +  b             - y
   (z+b)           (z+0)           z+b               z

So you must add back the bias when you are finished!

Biased notation mapping
Number represented:   -3    -2    -1     0     1     2     3     4
Number encoded:       000   001   010   011   100   101   110   111

Range on n bits: -(2^(n-1) - 1) to 2^(n-1), if the bias is 011…11 (base 2).

32-bit word
•  A 32-bit word can represent ~4.3 billion values
   –  Integers: 0 -> ~4.3 billion
   –  Signed integers: ~ -2.15B -> 2.15B
•  Fractional numbers?
•  Very large numbers?
•  Numbers with very small magnitude?

Scientific Notation
•  Example: 6.023 * 10^23
•  Of the form A.xxx * (BASE)^exponent
•  In binary: 1.xxx… * 2^exponent
   –  Or maybe Y.xx… (base 16) * 16^exponent
•  Standard: IEEE standard for floating point arithmetic

IEEE standard for floating point
1.xxx * 2^exponent in a 32-bit word.
The "1." and the "2" can be assumed.
xxx…xx and the exponent (and the sign) are all that must be specified.

Floating Point Numbers
S (1 bit)   Exponent (8 bits, in Bias 127)   Fraction xx…xx (23 bits)
S = 1 means negative.
How do we convert to decimal?
If 00000000 < Exponent < 11111111:
N = (-1)^S * 1.Fraction * 2^(Exponent-127)

Converting from Decimal to Float
1.  Convert to binary (e.g. -10010.01101)
2.  Normalize (form = 1.xxxxxx * 2^EXP)
3.  Convert EXP to bias 127 (add 127 to it)
4.  MSB [31] gets the sign
5.  [30:23] gets EXP (bias 127)
6.  [22:0] gets xxxxxxxxxxxxxxxxxxxxxxx

Convert to IEEE FP
•  56.5
•  -5.625
•  -0.0004 (carry the fraction to 5 binary places)
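The six conversion steps listed under "Converting from Decimal to Float" can be sketched in Python. This is only an illustration under stated assumptions, not course material: the names to_bias, encode_single and fields are hypothetical, the fraction is truncated rather than rounded, and zero, denormalized numbers, infinity and NaN are not handled.

```python
import math

BIAS = 127  # bias of the 8-bit exponent field in single precision


def to_bias(value, bias):
    """Decimal -> BiasX: add the bias; the sum is stored as an unsigned number."""
    return value + bias


def encode_single(x):
    """Return the 32-bit pattern for a nonzero number in the normalized range.

    Hypothetical helper: truncates extra fraction bits instead of rounding.
    """
    sign = 1 if x < 0 else 0                      # step 4: sign bit
    # Steps 1-2: normalize to 1.xxx * 2**exp.
    # math.frexp returns m, e with abs(x) == m * 2**e and 0.5 <= m < 1.
    m, e = math.frexp(abs(x))
    significand, exp = m * 2, e - 1               # now 1 <= significand < 2
    stored_exp = to_bias(exp, BIAS)               # step 3: EXP in bias 127
    fraction = int((significand - 1) * 2**23)     # step 6: 23 fraction bits
    return (sign << 31) | (stored_exp << 23) | fraction   # steps 4-6: pack


def fields(bits):
    """Format a 32-bit pattern as 'S EEEEEEEE FFF...F'."""
    s = format(bits, "032b")
    return f"{s[0]} {s[1:9]} {s[9:]}"


# The slide's step-1 example, -10010.01101 (base 2), is -18.40625 in decimal.
print(fields(encode_single(-18.40625)))
```

For -18.40625 the normalized form is -1.001001101 * 2^4, so the stored exponent is 4 + 127 = 131 = 10000011 (base 2). The same function can be used to check hand conversions of other values that fit exactly in 23 fraction bits.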
Your hard work has not gone unnoticed!
From "The Chronicle of Higher Education":
The average full-time undergraduate student studies about … 15 hours a week, but the duration varies by major, according to this year's National Survey of Student Engagement.
Engineering majors spend the most time studying, 19 hours a week, but even among those who exceed 20 hours, nearly a quarter still often show up for class without assignments completed.

Misconceptions about floats
•  Floats are not reals. Ex.: 2/3 cannot be represented exactly.
•  Floats are not decimals. 0.1 (base 10) = 0.000110011001100110011… (base 2)
•  Not all integers < 2^31 can be represented. 2^24 + 1 = 1 0000 0000 0000 0000 0000 0001 (base 2), which needs 25 significant bits.

More on IEEE 754 FP Standard
•  Distribution of floats on the number line
•  "Denormalized" floats
•  Double precision floats
•  Arithmetic on floats

How FP numbers are distributed
•  A 32-bit number can represent at most 2^32 values.
•  IEEE 754 FP can represent numbers larger than 2^127, so many integers between 0 and 2^127 are not represented.
•  High density close to 0.
•  Low density far from 0.

Specifically: 2^23 values for each value of the exponent (23 fraction bits)
•  Between 1/2048 and 1/1024 there are 2^23 floats.
•  Between 1 and 2 there are 2^23 floats.
•  Between 2^30 and 2^31 there are 2^23 floats.
•  Between 2^x and 2^(x+1), for -127 < x < 128, there are 2^23 floats.

Number Line
[Figure: number line marked from 0 to 16 and from 2^40 to 2^41; each interval between consecutive powers of two is labeled with 2^23 floats.]

Denormalized Floating Point Numbers
S (1 bit)   00000000 (8 bits, shows as denormalized)   Fraction xx…xx (23 bits)
S = 1 means negative.
How do we convert to decimal?
N = (-1)^S * 0.Fraction * 2^(-126)
1.00000000000000000000000 * 2^(-126) is the smallest normalized number; 0.11111111111111111111111 * 2^(-126) is the largest denormalized number.
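The two decoding formulas (normalized and denormalized) and the "misconceptions" slide can be checked with a short Python sketch. The function names decode_single and as_single are hypothetical; decode_single deliberately rejects an exponent field of 11111111, which this sketch does not cover. Python's own float is a double, so as_single uses the standard struct module to round a value to single precision.

```python
import struct


def decode_single(bits):
    """Decode a 32-bit pattern using the normalized/denormalized formulas."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0xFF:
        raise ValueError("exponent 11111111 is not handled in this sketch")
    if exponent == 0:
        # Denormalized: 0.Fraction * 2**-126
        magnitude = (fraction / 2**23) * 2.0**-126
    else:
        # Normalized: 1.Fraction * 2**(Exponent - 127)
        magnitude = (1 + fraction / 2**23) * 2.0**(exponent - 127)
    return -magnitude if sign else magnitude


def as_single(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


smallest_normalized = decode_single(0x00800000)   # 0 00000001 000...0
largest_denormalized = decode_single(0x007FFFFF)  # 0 00000000 111...1
print(largest_denormalized < smallest_normalized)  # the two ranges meet at 2**-126

# Floats are not decimals, and not every integer below 2**31 fits:
print(as_single(0.1) == 0.1)
print(as_single(2.0**24 + 1) == 2.0**24 + 1)
```

Both equality checks print False: 0.1 has no finite binary expansion, and 2^24 + 1 needs one more significant bit than a single-precision float provides.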
What if Exponent is 11111111?
•  If FRAC is 0, the 32 bits represent + or – infinity.
•  If FRAC is nonzero, the 32 bits represent NaN (Not a Number)
   –  Ex: 0/0

Infinity: EXP = 11111111; FRAC = 0
•  Infinity avoids an exception on overflow. (Overflow definition: the result exceeds the largest value that can be represented.)
•  Examples of operations that return infinity: 1/0, -1/0, 3 – inf, sqrt(+inf)

Double Precision IEEE 754 Floating Point Numbers
S (1 bit)   Exponent (11 bits, in Bias 1023)   Fraction xx…xx (52 bits)
S = 1 means negative.
To convert to decimal:
If 00000000000 < Exponent < 11111111111:
N = (-1)^S * 1.Fraction * 2^(Exponent-1023)

Double precision floating-point
For normalized numbers, -1022 <= Exponent - 1023 <= 1023, so magnitudes stay below 2^1024.
2^1024 is about 2 * 10^308.

Double Precision Float
52 significant figures in base 2 is approximately 16 significant figures in base 10.

Single vs. Double FP
•  Range:
   –  SP: ~2^(-126) to 2^128, approximately 10^(-38) to 10^38
   –  DP: approximately 2 * 10^(-308) to 2 * 10^308
•  Significant figures:
   –  SP: 23 significant bits, 2^23 = 8,388,608: almost 7 significant decimal digits
   –  DP: 52 significant bits, 2^52 = 4 * 2^20 * 2^30: > 15 significant decimal digits

What is this single-precision floating-point number?
0 01111010 000000…000
A.  2^(-5)
B.  0
C.  0.0000000
D.  1 * 2^exp(01111010, base 2)
E.  None of the above

What is this floating-point number?
0 00000000 01000000…000
A.  1.01
B.  1.01 * 2^(-127)
C.  2^(-129)
D.  2^(-128)
E.  None of the above

Adding two "scientific notation" numbers
5.345 * 10^23 + 1.236 * 10^25
1.  Make their exponents the same (0.05345 * 10^25 + 1.236 * 10^25)
2.  Add the non-exponents (1.28945 * 10^25)
3.  Normalize (already done)

Adding two floats
  0 11111100 01100000…000
+ 0 11111000 11010000…000
(Exponents below are written as stored, in bias-127 binary.)
1.  1.011 * 2^11111100 + 1.1101 * 2^11111000
2.  Make their exponents the same (1.011 * 2^11111100 + 0.00011101 * 2^11111100)
3.  Add the non-exponents (1.01111101 * 2^11111100)
4.  Normalize (already done)
= 0 11111100 0111110100…000

Multiplying two "scientific notation" numbers
1.  5.3 * 10^23 * 8.1 * 10^25
2.  Multiply the non-exponents and add the exponents (42.93 * 10^48)
3.  Normalize (4.293 * 10^49)

Multiplying two floats
  0 10000011 0100000…000
* 0 10000001 1100000…000
1.  1.01 * 2^4 * 1.11 * 2^2
2.  Multiply the non-exponents and add the exponents (10.0011 * 2^6)
3.  Normalize (1.00011 * 2^7)
= 0 10000110 0001100000…000

Add these two floats
  0 11111100 01110000…000
+ 0 11111110 10010000…000
1.  Write each in normalized form
2.  Make their exponents the same
3.  Add the non-exponents
4.  Normalize

Multiply these two floats
  0 10000100 0100000…000
* 0 01111000 1000000…000
1.  Write each number in normalized form
2.  Multiply the non-exponents and add the exponents
3.  Normalize

How is FP arithmetic done?
Software: very, very slow.
Hardware floating-point: expensive, but usually worth it.
Two measures of performance:
1.  MIPS: millions of instructions executed per second.
2.  MFLOPS: millions of floating point operations per second.
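The step lists for adding and multiplying two floats can be traced in Python directly on the 32-bit patterns. This is a minimal sketch under stated assumptions, not how any particular FPU does it: the names split, add_positive and multiply_positive are hypothetical, both inputs must be positive normalized numbers, bits shifted out are dropped instead of rounded, and overflow into exponent 11111111 is not checked.

```python
def split(bits):
    """Return (sign, stored exponent, 23-bit fraction) of a 32-bit pattern."""
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def add_positive(a, b):
    """Add two positive normalized singles given as bit patterns."""
    _, ea, fa = split(a)
    _, eb, fb = split(b)
    sa, sb = fa | (1 << 23), fb | (1 << 23)   # restore the assumed leading 1
    if ea < eb:                               # make a the larger-exponent operand
        (ea, sa), (eb, sb) = (eb, sb), (ea, sa)
    sb >>= ea - eb                            # make the exponents the same
    total = sa + sb                           # add the non-exponents
    exp = ea
    if total >> 24:                           # normalize if the sum reached 2.0
        total >>= 1
        exp += 1
    return (exp << 23) | (total & 0x7FFFFF)


def multiply_positive(a, b):
    """Multiply two positive normalized singles given as bit patterns."""
    _, ea, fa = split(a)
    _, eb, fb = split(b)
    sa, sb = fa | (1 << 23), fb | (1 << 23)
    product = sa * sb                         # significand product, scaled by 2**46
    exp = ea + eb - 127                       # add exponents, remove the extra bias
    if product >> 47:                         # normalize if the product reached 2.0
        product >>= 1
        exp += 1
    return (exp << 23) | ((product >> 23) & 0x7FFFFF)


# Worked examples from the slides, written as hex bit patterns:
# 0 11111100 0110...0 + 0 11111000 1101...0
print(format(add_positive(0x7E300000, 0x7C680000), "032b"))
# 0 10000011 0100...0 * 0 10000001 1100...0  (that is, 20.0 * 7.0)
print(format(multiply_positive(0x41A00000, 0x40E00000), "032b"))
```

Note that multiply_positive subtracts 127 after adding the stored exponents, which is the same "subtract out the additional bias" adjustment described for adding bias representations.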
