Lecture 2: Data Representation in Computer Systems
Lecture duration: 2 hours
Prepared by Dr. Hassan SALTI - 2012

Lecture Overview
• Introduction
• Positional Numbering System
• Decimal to binary conversion
• Signed integer representation
• Floating-point representation

Introduction: A reminder of some definitions (1/2)
• Bit: the most basic unit of information in a digital computer (on/off; 0/1 state).
• Byte: a set of 8 bits.
• Word: two or more adjacent bytes that are manipulated collectively.
• Word size: the size of a word in bits depends on the computer organization (16, 32, 64 bits, …).
• Nibble (or nybble): a set of 4 bits.
  - A byte is usually divided into two nibbles: a low-order nibble and a high-order nibble.

Introduction: A reminder of some definitions (2/2)
Example: the 16-bit word 0110 0111 1000 1101
• The leftmost bit (0) is the Most Significant Bit (MSB); the rightmost bit (1) is the Least Significant Bit (LSB).
• The word consists of a high-order byte (0110 0111) and a low-order byte (1000 1101).
• Each byte consists of a high-order nibble and a low-order nibble: 0110 and 0111 in the high-order byte, 1000 and 1101 in the low-order byte.

Positional Numbering System (1/3)
• Any numeric value is represented through increasing powers of a radix (or base).
• The number of valid numerals (digits) equals the radix of the system.
• The smallest numeral is 0 and the largest is 1 less than the radix.
Example: the decimal system (base 10)
  - The radix is 10.
  - The number of valid numerals is 10 (equal to the radix).
  - The set of valid numerals is {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}.
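As an illustration that is not part of the original slides, the positional rule can be sketched in Python. The function names `positional_value` and `_digit_value` and the digit alphabet 0-9, A-F are assumptions of this sketch; exact `Fraction` arithmetic is used so that weights such as $2^{-2}$ are not rounded.

```python
from fractions import Fraction

DIGITS = "0123456789ABCDEF"  # digit alphabet assumed for radices 2..16


def _digit_value(ch: str, radix: int) -> int:
    value = DIGITS.index(ch)  # raises ValueError for characters outside 0-9, A-F
    if value >= radix:
        raise ValueError(f"digit {ch!r} is not valid in base {radix}")
    return value


def positional_value(numeral: str, radix: int) -> Fraction:
    """Evaluate a numeral such as '10110.01' written in the given radix."""
    integer_part, _, fraction_part = numeral.upper().partition(".")
    total = Fraction(0)
    # Integer digits: the rightmost has weight radix**0, the next radix**1, ...
    for power, ch in enumerate(reversed(integer_part)):
        total += _digit_value(ch, radix) * Fraction(radix) ** power
    # Fractional digits: the first after the point has weight radix**-1, ...
    for power, ch in enumerate(fraction_part, start=1):
        total += Fraction(_digit_value(ch, radix), radix ** power)
    return total
```

For example, `positional_value("10110.01", 2)` evaluates to `Fraction(89, 4)`, that is 22.25.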
Positional Numbering System (2/3)
The most important radices (bases) in computer science are:
• Binary: radix 2 (base 2); numerals {0, 1}
• Octal: radix 8 (base 8); numerals {0, 1, 2, 3, 4, 5, 6, 7}
• Hexadecimal: radix 16 (base 16); numerals {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F}

Positional Numbering System (3/3)
Any numeric value is represented through increasing powers of a radix (or base). Examples:
• $243.51_{10} = 2\times10^{2} + 4\times10^{1} + 3\times10^{0} + 5\times10^{-1} + 1\times10^{-2}$
• $212_{3} = 2\times3^{2} + 1\times3^{1} + 2\times3^{0} = 23_{10}$
• $10110.01_{2} = 1\times2^{4} + 0\times2^{3} + 1\times2^{2} + 1\times2^{1} + 0\times2^{0} + 0\times2^{-1} + 1\times2^{-2} = 22.25_{10}$

Decimal to binary conversion: Some numbers to remember (1/1)
Keep in mind the following tables, or how to obtain them!
[Tables not preserved]

Decimal to binary conversion: Converting Unsigned Whole Numbers (1/6)
• A real number can take any value (e.g., 10323.7643; -16813.5322703).
• Whole number: no fractional part (e.g., 10, 1231, 3543, …, -12, -12334, …).
• Unsigned number: no negative values (e.g., 102313.43234, 1231.56234, 12357, …).
• Unsigned whole numbers: no fractional part and no negative values.

Decimal to binary conversion: Converting Unsigned Whole Numbers (2/6)
Convert the decimal number $113_{10}$ to binary.
Method 1: Repeated subtraction. Subtract the largest power of 2 that fits, then repeat with what remains:
  113 - 64 = 49
  49 - 32 = 17
  17 - 16 = 1
  1 - 1 = 0
The powers used are $2^{6}$, $2^{5}$, $2^{4}$ and $2^{0}$, so $113_{10} = 1110001_2$.
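Method 1 can be sketched in Python as follows (the function name `to_binary_by_subtraction` is hypothetical): find the largest power of 2 not exceeding the number, then walk down through the smaller powers, writing 1 and subtracting whenever the power fits, and 0 otherwise.

```python
def to_binary_by_subtraction(n: int) -> str:
    """Convert an unsigned whole number to binary by repeated subtraction."""
    if n < 0:
        raise ValueError("only unsigned whole numbers are handled")
    if n == 0:
        return "0"
    # Largest power of 2 that does not exceed n.
    power = 1
    while power * 2 <= n:
        power *= 2
    bits = []
    # Try every power from the largest down to 2**0.
    while power >= 1:
        if power <= n:
            bits.append("1")   # this power fits: subtract it
            n -= power
        else:
            bits.append("0")   # this power does not fit
        power //= 2
    return "".join(bits)
```

`to_binary_by_subtraction(113)` returns `'1110001'`, matching the slide.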
Decimal to binary conversion: Converting Unsigned Whole Numbers (3/6)
Method 2: Division-remainder. Divide repeatedly by 2 and record the remainders:
  113 ÷ 2 = 56, remainder 1 (LSB)
  56 ÷ 2 = 28, remainder 0
  28 ÷ 2 = 14, remainder 0
  14 ÷ 2 = 7, remainder 0
  7 ÷ 2 = 3, remainder 1
  3 ÷ 2 = 1, remainder 1
  1 ÷ 2 = 0, remainder 1 (MSB)
Reading the remainders from the last (MSB) to the first (LSB): $113_{10} = 1110001_2$.

Decimal to binary conversion: Converting Unsigned Whole Numbers (4/6)
• A binary number with N bits can represent $2^{N}$ unsigned integers, from 0 to $2^{N} - 1$.
Example:
• With N = 4 bits, we can represent $2^{4} = 16$ unsigned integers, from 0 to $2^{4} - 1 = 16 - 1 = 15$.
• The number 16 CANNOT be represented with only 4 bits!

Decimal to binary conversion: Converting Unsigned Whole Numbers (5/6)
• The subtraction method is cumbersome: it requires familiarity with the powers of the radix being used.
• The division-remainder method is faster and easier than the repeated-subtraction method.
• The division-remainder method can be used to convert from decimal to any other base (not only to base 2).

Decimal to binary conversion: Converting Unsigned Whole Numbers (6/6)
Example: Convert $104_{10}$ to base 3 using the division-remainder method.
  104 ÷ 3 = 34, remainder 2 (least significant digit)
  34 ÷ 3 = 11, remainder 1
  11 ÷ 3 = 3, remainder 2
  3 ÷ 3 = 1, remainder 0
  1 ÷ 3 = 0, remainder 1 (most significant digit)
So $104_{10} = 10212_3$.
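The division-remainder method for any destination radix can be sketched in Python like this; the function name `to_base` and the restriction to radices 2 through 16 are assumptions of the sketch. Each `divmod` call produces one quotient and one remainder, and the first remainder is the least significant digit.

```python
DIGITS = "0123456789ABCDEF"


def to_base(n: int, radix: int) -> str:
    """Convert an unsigned whole number to `radix` by repeated division."""
    if not 2 <= radix <= 16:
        raise ValueError("this sketch supports radices 2..16")
    if n < 0:
        raise ValueError("only unsigned whole numbers are handled")
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, r = divmod(n, radix)       # quotient feeds the next step
        remainders.append(DIGITS[r])  # first remainder = least significant digit
    # Read the remainders from the last (most significant) to the first.
    return "".join(reversed(remainders))
```

`to_base(113, 2)` gives `'1110001'` and `to_base(104, 3)` gives `'10212'`. Note that `to_base(16, 2)` gives `'10000'`, five digits, which is why 16 does not fit in 4 bits.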
Decimal to binary conversion: Converting fractions (1/5)
• Fractions in the decimal system can be converted (or approximated) to fractions in any other radix.
• The radix point separates the integer part of a number from its fractional part. Examples (integer part before the point, fractional part after it):
  - Base 10: 2390167.1208
  - Base 3: 2012.11022
  - Base 2: 1011110.111011
• The radix point is called a "decimal point" in the decimal system, a "binary point" in the binary system, and so on.

Decimal to binary conversion: Converting fractions (2/5)
To convert a fraction from decimal to any other base, we repeatedly multiply the fractional part by the destination radix; the integer part of each product is the next digit.
Example: Convert $0.4304_{10}$ to base 5.
  0.4304 × 5 = 2.1520 → the integer part is 2
  0.1520 × 5 = 0.7600 → the integer part is 0
  0.7600 × 5 = 3.8000 → the integer part is 3
  0.8000 × 5 = 4.0000 → the integer part is 4; the fractional part is zero, so we are done
So $0.4304_{10} = 0.2034_5$.

Decimal to binary conversion: Converting fractions (3/5)
• Some fractions are indeterminate in a given base: they contain a repeating string of digits to the right of the radix point.
  - Example: $(2/3)_{10} = 0.666\ldots_{10}$
• A fraction that is indeterminate in one base may be determinate in another (and vice versa).
  - Example: $2/3 = 0.2_3 = 0.666\ldots_{10}$; 2/3 is indeterminate in base 10 but determinate in base 3.
• When a fraction is indeterminate, an approximation is needed: we fix the number of digits to the right of the radix point.
• Approximation is also needed because computing resources are limited (for example, the limited size of the processor's registers).

Decimal to binary conversion: Converting fractions (4/5)
Example: Convert $0.34375_{10}$ to binary with 4 bits to the right of the binary point.
  0.34375 × 2 = 0.68750 → 0
  0.68750 × 2 = 1.37500 → 1
  0.37500 × 2 = 0.75000 → 0
  0.75000 × 2 = 1.50000 → 1 (this is our fourth bit; we stop here)
So $0.34375_{10} \approx 0.0101_2$. The exact value is $0.01011_2$; with only 4 bits we keep $0.0101_2 = 0.3125_{10}$.
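The repeated-multiplication method can be sketched in Python as below. The function name `fraction_to_base` and its `max_digits` cut-off are assumptions of this sketch; exact `Fraction` values are used because ordinary binary floats would add their own rounding error to the digits.

```python
from fractions import Fraction

DIGITS = "0123456789ABCDEF"


def fraction_to_base(frac: Fraction, radix: int, max_digits: int) -> tuple[str, bool]:
    """Convert a fraction 0 <= frac < 1 to `radix` by repeated multiplication.

    Returns the digits after the radix point and True if the expansion
    terminated exactly, or False if it was cut off after max_digits digits.
    """
    if not 0 <= frac < 1:
        raise ValueError("expected a fraction in [0, 1)")
    digits = []
    while frac != 0 and len(digits) < max_digits:
        frac *= radix                # multiply by the destination radix
        integer_part = int(frac)     # the integer part is the next digit
        digits.append(DIGITS[integer_part])
        frac -= integer_part         # continue with the fractional part
    return "0." + ("".join(digits) or "0"), frac == 0
```

`fraction_to_base(Fraction("0.34375"), 2, 4)` returns `('0.0101', False)`: the expansion was cut off, so the 4-bit result is an approximation. With `Fraction(2, 3)` the flag is False in base 10 but True in base 3.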
Decimal to binary conversion: Converting fractions (5/5)
Convert $26.78125_{10}$ to binary.
Using the methods just described: $26_{10} = 11010_2$ and $0.78125_{10} = 0.11001_2$.
So $26.78125_{10} = 11010.11001_2$.

Decimal to binary conversion: Going back to the positional numbering system (1/1)
Any unsigned whole or fractional number can be converted to decimal using the positional numbering system described previously. Examples:
• $0.0101_2 = 0\times2^{-1} + 1\times2^{-2} + 0\times2^{-3} + 1\times2^{-4} = 0 + 0.25 + 0 + 0.0625 = 0.3125_{10}$
• $134.2034_5 = 1\times5^{2} + 3\times5^{1} + 4\times5^{0} + 2\times5^{-1} + 0\times5^{-2} + 3\times5^{-3} + 4\times5^{-4} = 44.4304_{10}$

Decimal to binary conversion: Converting between Power-of-Two Radices (1/4)
• To convert between two bases other than base 10, it is easiest to pass through base 10.
  - Example: $3121_4 = ?_3$
  - First step: $3121_4 = 3\times4^{3} + 1\times4^{2} + 2\times4^{1} + 1\times4^{0} = 217_{10}$
  - Second step, using the division-remainder method: $217_{10} = 22001_3$
  - So $3121_4 = 22001_3$.
• Converting between bases that are powers of two is much easier.

Decimal to binary conversion: Converting between Power-of-Two Radices (2/4)
• The most common power-of-two radices are binary (base 2), octal (base $2^{3}$ = 8) and hexadecimal (base $2^{4}$ = 16).
• Each octal digit is equivalent to a group of 3 binary digits, called an octet¹.
• Each hexadecimal digit is equivalent to a group of 4 binary digits, called a hextet.
• We convert from binary to octal, and from binary to hexadecimal, simply by grouping bits.
¹ The term "octet" is also used in the literature to describe a set of 8 bits.
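The grouping rule can be sketched in Python; the function name `group_bits` is hypothetical. The string is padded with zeros on the left to a multiple of the group size (completing the leftmost group), and each group of 3 or 4 bits then maps to one octal or hexadecimal digit.

```python
DIGITS = "0123456789ABCDEF"


def group_bits(bits: str, group_size: int) -> str:
    """Convert a binary string to base 2**group_size by grouping bits.

    group_size=3 gives octal, group_size=4 gives hexadecimal.
    """
    if not bits or any(b not in "01" for b in bits):
        raise ValueError("expected a non-empty string of 0s and 1s")
    # Pad with zeros on the left so the length is a multiple of group_size.
    width = -(-len(bits) // group_size) * group_size
    padded = bits.zfill(width)
    # After padding, cutting from the left gives the same groups as cutting
    # from the right; each group becomes one digit.
    groups = [padded[i:i + group_size] for i in range(0, width, group_size)]
    return "".join(DIGITS[int(g, 2)] for g in groups)
```

`group_bits("10110010011101", 3)` gives `'26235'` and `group_bits("10110010011101", 4)` gives `'2C9D'`. For bases that are not powers of two, Python's built-in `int(text, base)` performs the first step of the base-10 route, e.g. `int("3121", 4)` is 217.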
Decimal to binary conversion: Converting between Power-of-Two Radices (3/4)
Example: Convert $10110010011101_2$ to octal.
• Make groups of 3 bits (from right to left): 10 110 010 011 101
• Add zero(s) on the left to complete the last octet: 010 110 010 011 101
• Convert each octet to its corresponding octal digit: 010 → 2, 110 → 6, 010 → 2, 011 → 3, 101 → 5
• Finally: $10110010011101_2 = 26235_8$

Decimal to binary conversion: Converting between Power-of-Two Radices (4/4)
Example: Convert $10110010011101_2$ to hexadecimal.
• Make groups of 4 bits (from right to left): 10 1100 1001 1101
• Add zero(s) on the left to complete the last hextet: 0010 1100 1001 1101
• Convert each hextet to its corresponding hexadecimal digit: 0010 → 2, 1100 → C, 1001 → 9, 1101 → D
• Finally: $10110010011101_2 = 2C9D_{16}$

Signed integer representation
• An integer is a whole number.
• Signed integers are the set of positive and negative whole numbers.
• How should we encode and handle the sign of a number? Two concepts are used:
  - Signed magnitude
  - Complement systems

Signed integer representation: Signed Magnitude (1/13)
• Signed magnitude is the most intuitive method.
• The MSB (Most Significant Bit) of a binary number is used as the sign of the number:
  - MSB = 1: negative number
  - MSB = 0: positive number
• The remaining bits represent the magnitude (absolute value) of the number.

Signed integer representation: Signed Magnitude (2/13)
Example: In an 8-bit signed-magnitude system, give the decimal value of the following numbers:
• 00000001?
  - The MSB is 0: the number is positive.
  - The remaining 7 bits are $0000001_2 = 1_{10}$.
  - The decimal number is +1.
• 10000001?
  - The MSB is 1: the number is negative.
  - The remaining 7 bits are $0000001_2 = 1_{10}$.
  - The decimal number is -1.

Signed integer representation: Signed Magnitude (3/13)
Example: In an 8-bit signed-magnitude system, give the decimal value of the following numbers:
• 10001001?
  - The MSB is 1: the number is negative.
  - The remaining 7 bits are $0001001_2 = 9_{10}$.
  - The decimal number is -9.
• 01000001?
  - The MSB is 0: the number is positive.
  - The remaining 7 bits are $1000001_2 = 65_{10}$.
  - The decimal number is +65.

Signed integer representation: Signed Magnitude (4/13)
In an N-bit signed-magnitude system:
• 1 bit is used for the sign of the number;
• N - 1 bits are used for the magnitude of the number;
• the largest integer is $2^{N-1} - 1$;
• the smallest integer is $-(2^{N-1} - 1)$.
Example: in an 8-bit signed-magnitude system:
• the largest integer is $01111111_2 = 2^{7} - 1 = 127_{10}$;
• the smallest integer is $11111111_2 = -(2^{7} - 1) = -127_{10}$.

Signed integer representation: Signed Magnitude (5/13)
Computers must be able to carry out arithmetic operations. Signed-magnitude arithmetic is carried out using essentially the same methods as humans use:
• first, we look at the signs of the two operands;
• we arrange the operands in a certain way based on their signs;
• we perform the calculation without regard to the signs;
• finally, we supply the appropriate sign.

Signed integer representation: Signed Magnitude (6/13)
Adding operands that have the same sign
Example: Add $01001111_2$ to $00100011_2$ using signed-magnitude arithmetic.
Both sign bits are 0, so we add the 7-bit magnitudes and keep the sign:
$1001111_2\;(79) + 0100011_2\;(35) = 1110010_2\;(114)$, with no carry out of the leftmost magnitude bit.
We find $01001111_2 + 00100011_2 = 01110010_2$ in signed-magnitude representation.
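The decoding rule, the N-bit range and same-sign addition can be sketched in Python. The function names `sm_decode`, `sm_encode` and `sm_add_same_sign` are hypothetical, and bit patterns are handled as plain strings of 0s and 1s for readability.

```python
def sm_decode(word: str) -> int:
    """Decode a signed-magnitude bit string: MSB = sign, remaining bits = magnitude."""
    magnitude = int(word[1:], 2)
    return -magnitude if word[0] == "1" else magnitude


def sm_encode(value: int, n_bits: int = 8) -> str:
    """Encode an integer in n_bits signed magnitude; range is +/-(2**(n_bits-1) - 1)."""
    limit = 2 ** (n_bits - 1) - 1
    if abs(value) > limit:
        raise OverflowError(f"{value} does not fit in {n_bits}-bit signed magnitude")
    sign = "1" if value < 0 else "0"
    return sign + format(abs(value), f"0{n_bits - 1}b")


def sm_add_same_sign(a: str, b: str) -> tuple[str, bool]:
    """Add two signed-magnitude words of equal length that share the same sign.

    The magnitudes are added and the common sign is kept. A carry out of the
    leftmost magnitude bit signals overflow; as on the slides, that carry is
    discarded, so the returned word is then incorrect.
    """
    if len(a) != len(b) or a[0] != b[0]:
        raise ValueError("operands must have the same length and the same sign")
    n_mag = len(a) - 1
    total = int(a[1:], 2) + int(b[1:], 2)
    overflow = total >= 2 ** n_mag       # carry out of the magnitude field
    total %= 2 ** n_mag                  # discard that carry
    return a[0] + format(total, f"0{n_mag}b"), overflow
```

`sm_add_same_sign("01001111", "00100011")` returns `('01110010', False)`, the 79 + 35 = 114 example. Note also that `sm_decode("10000000")` and `sm_decode("00000000")` both give 0: signed magnitude has two patterns for zero.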
Signed integer representation: Signed Magnitude (7/13)
Overflow condition
• In the last example, adding the leftmost (seventh) magnitude bits produces no carry.
• If it does produce a carry, we have an overflow condition: the carry is discarded, resulting in an incorrect sum.
Example: Add $01000001_2$ to $01100001_2$ using signed-magnitude arithmetic.

Signed integer representation: Signed Magnitude (8/13)
$1000001_2\;(65) + 1100001_2\;(97) = 1\,0100010_2$
• The addition overflows: the leftmost magnitude bits produce a carry.
• The last carry is discarded, leaving $00100010_2$ (34).
• The sum is incorrect: 65 + 97 = 162, which exceeds the largest magnitude, 127.

Signed integer representation: Signed Magnitude (9/13)
Signed-magnitude subtraction is carried out in a manner similar to pencil-and-paper decimal arithmetic.
Example 1: Subtract $01001111_2$ (79) from $01100011_2$ (99) using signed-magnitude arithmetic.
$1100011_2\;(99) - 1001111_2\;(79) = 0010100_2\;(20)$, with borrows as in decimal subtraction.
We find $01100011_2 - 01001111_2 = 00010100_2$ in signed-magnitude representation.

Signed integer representation: Signed Magnitude (10/13)
Example 2: Subtract $01100011_2$ (99) from $01001111_2$ (79) using signed-magnitude arithmetic.
• Here the subtrahend, 01100011, is larger than the minuend, 01001111.
• From the result of Example 1, we know that the difference of these two magnitudes is $0010100_2$.
• Because the subtrahend is larger than the minuend, all we need to do is change the sign of the difference.
• So $01001111_2 - 01100011_2 = 10010100_2$ in signed-magnitude representation.

Signed integer representation: Signed Magnitude (11/13)
Example 3: Add $10010011_2$ (-19) to $00001101_2$ (+13) using signed-magnitude arithmetic.
• The result is negative (19 has the larger magnitude).
• We subtract 13 from 19.
• The result of the binary subtraction is $10000110_2$ (-6).
Example 4: Subtract $10011000_2$ (-24) from $10101011_2$ (-43) using signed-magnitude arithmetic.
• This is equivalent to adding +24 to -43.
• The result is negative (43 has the larger magnitude).
• We subtract 24 from 43.
• The result of the binary subtraction is $10010011_2$ (-19).

Signed integer representation: Signed Magnitude (12/13)
General rules when the operands have different signs:
• Determine which operand has the larger magnitude.
• The sign of the result is the sign of the operand with the larger magnitude.
• The magnitude of the result is obtained by subtracting (not adding) the smaller magnitude from the larger one.

Signed integer representation: Signed Magnitude (13/13)
Problems with signed magnitude:
• Too many decisions to make (which number is larger? borrows? which sign?).
• The number 0 has two representations: 10000000 and 00000000.
• The method is complicated.
• The circuits are expensive.

Signed integer representation: Complement system (1/19)
• A complement is used to represent negative numbers only; positive numbers keep their ordinary binary form.
• With a complement system, subtraction is converted into addition.
• Advantages of complement systems:
  - they simplify computer arithmetic;
  - there is no need to process sign bits separately;
  - the sign of a number is easily checked by looking at its high-order bit (MSB).

Signed integer representation: Complement system (2/19)
In base 10, "casting out 9s" was used to subtract numbers. Suppose we want to find 167 - 52:
• First, compute 999 - 52 = 947.
• Then add 947 to 167: 167 + 947 = 1114.
• Remove the last carry (the leading 1) and add it to the remaining sum: 114 + 1 = 115.
So 167 - 52 = 115.
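The "casting out 9s" procedure can be sketched in Python for any radix r, using the digit r - 1 in place of 9. The function names `diminished_complement` and `subtract_by_complement` are hypothetical, and numbers are handled as plain integers with a fixed number of digits (`width`). The sketch assumes b < a: when a == b the sum is all (r - 1) digits (999 in base 10), the complement-system form of "negative zero".

```python
def diminished_complement(n: int, radix: int, width: int) -> int:
    """Return (radix**width - 1) - n, i.e. each digit d replaced by (radix - 1) - d.

    radix=10 gives the 9's complement, radix=2 the one's complement.
    """
    if not 0 <= n < radix ** width:
        raise ValueError("n must fit in `width` digits")
    return (radix ** width - 1) - n


def subtract_by_complement(a: int, b: int, radix: int, width: int) -> int:
    """Compute a - b (0 <= b < a < radix**width) by complement addition.

    Add the diminished radix complement of b to a, then remove the final
    carry and add it back to the sum (the end-around carry).
    """
    if not 0 <= b < a < radix ** width:
        raise ValueError("this sketch assumes 0 <= b < a < radix**width")
    total = a + diminished_complement(b, radix, width)
    carry, low_digits = divmod(total, radix ** width)  # split off the final carry
    return low_digits + carry
```

`subtract_by_complement(167, 52, 10, 3)` reproduces the slide: 167 + 947 = 1114, carry 1, result 114 + 1 = 115. With `radix=2, width=8` the same function computes 23 - 9 = 14 through the one's complement of 9.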
Signed integer representation: Complement system (3/19)
• The last method uses a "diminished radix complement".
• Working in base r (radix), the diminished radix is r - 1.
Example: base 10, r = 10
• The diminished radix is r - 1 = 10 - 1 = 9.
• We say that a negative number is converted to its 9's complement.
• For example, $-2468_{10}$ is converted to its 9's complement as follows: $-2468_{10} = 9999 - 2468 = 7531_{C9}$.

Signed integer representation: Complement system (4/19)
In the binary system, r = 2:
• The diminished radix is r - 1 = 1.
• We say that we work in one's complement (C1).
• To convert a negative number to its one's complement, the number is subtracted from all ones.
• A positive number is directly converted to its binary representation.
• Example: the one's complement of $0101_2$ is $1111_2 - 0101_2 = 1010_{C1}$.
  - It is nothing more than switching all the 1s to 0s and vice versa!

Signed integer representation: Complement system (5/19)
Example: Express $23_{10}$ and $-9_{10}$ in 8-bit binary one's complement form.
• $23_{10} = +(00010111_2) = 00010111_{C1}$
• $-9_{10} = -(00001001_2) = 11110110_{C1}$

Signed integer representation: Complement system (6/19)
In one's complement, subtraction is converted into addition.
• Example: $23_{10} - 9_{10} = 23_{10} + (-9_{10})$
Example: Add $23_{10}$ to $-9_{10}$ using 8-bit binary one's complement arithmetic.
$00010111_{C1}\;(23) + 11110110_{C1}\;(-9) = 1\,00001101$
The last carry is added back to the sum (end-around carry): $00001101 + 1 = 00001110$.
The result is $00001110_{C1} = +(00001110_2) = 14_{10}$.

Signed integer representation: Complement system (7/19)
Example: Add $9_{10}$ to $-23_{10}$ using 8-bit binary one's complement arithmetic.
• $-23_{10} = -(00010111_2) = 11101000_{C1}$
• $9_{10} = +(00001001_2) = 00001001_{C1}$
• $9_{10} + (-23_{10}) = 00001001_{C1} + 11101000_{C1} = 11110001_{C1}$ (there is no last carry, so nothing is added back)
Result: $11110001_{C1} = -(00001110_2) = -14_{10}$

Signed integer representation: Complement system (8/19)
• In one's complement, we still have two representations for zero: 00000000 and 11111111.
• Computer engineers long ago stopped using one's complement.
• A more efficient representation for binary numbers is two's complement.

Signed integer representation: Complement system (9/19)
• Two's complement is an example of a radix complement.
• In a radix complement, we do not subtract one from the radix r.
Example: base 10, r = 10
• We say that a negative number is converted to its 10's complement.
• For example, $-2468_{10}$ is converted to its 10's complement as follows: $-2468_{10} = 10000 - 2468 = 7532_{C10}$.

Signed integer representation: Complement system (10/19)
In the binary system, r = 2:
• The radix is r = 2.
• We say that we work in two's complement (C2).
• Let d be the number of digits.
• To convert a negative number -N to its two's complement, N is subtracted from $r^{d} = 2^{d}$: $-N_{10} = (2^{d} - N)_{C2}$.
• A positive number is directly converted to its binary representation.

Signed integer representation: Complement system (11/19)
Example: in a 4-bit system, d = 4.
• All negative numbers are converted by subtracting their magnitude from $2^{d} = 2^{4} = 16_{10} = 10000_2$.
• The two's complement of $0011_2$ is $10000_2 - 0011_2 = 1101_{C2}$.
• It is nothing more than the one's complement incremented by 1!

Signed integer representation: Complement system (12/19)
Example: Express $23_{10}$, $-23_{10}$ and $-9_{10}$ in 8-bit binary two's complement form.
• $23_{10} = +(00010111_2) = 00010111_{C2}$
• $-23_{10} = -(00010111_2) = 11101000_2 + 1 = 11101001_{C2}$
• $-9_{10} = -(00001001_2) = 11110110_2 + 1 = 11110111_{C2}$

Signed integer representation: Complement system (13/19)
Unlike in C1 arithmetic, in C2 the last carry is discarded.
Example 1: Add $9_{10}$ to $-23_{10}$ using two's complement arithmetic.
$00001001_{C2}\;(9) + 11101001_{C2}\;(-23) = 11110010_{C2}$
The result is $11110010_{C2} = -(00001110_2) = -14_{10}$.

Signed integer representation: Complement system (14/19)
Note how a negative C2 binary number is converted to decimal:
• First, all 0s and 1s are switched: 11110010 → 00001101.
• Then 1 is added: 00001101 + 1 = 00001110.
• So $11110010_{C2} = -(00001110_2) = -14_{10}$.

Signed integer representation: Complement system (15/19)
Example 2: Find the sum of $23_{10}$ and $-9_{10}$ in binary using two's complement arithmetic.
• $23_{10} = +(00010111_2) = 00010111_{C2}$
• $-9_{10} = -(00001001_2) = 11110111_{C2}$
• $23_{10} + (-9_{10}) = 00010111_{C2} + 11110111_{C2} = 1\,00001110$; the last carry is discarded.
Result: $00001110_{C2} = +(00001110_2) = 14_{10}$

Signed integer representation: Complement system (16/19)
Advantages of two's complement:
• It is the most popular choice for representing signed numbers.
• The algorithm for adding and subtracting is quite easy.
• It has a single representation for 0 (all bits 0).
• It is self-inverting: taking the two's complement of a number twice gives back the original number.
• It is easily extended to larger numbers of bits.

Signed integer representation: Complement system (17/19)
Drawback:
• The range of values that can be represented with N bits is asymmetric.
• Examples:
  - With signed magnitude, 4 bits allow us to represent the values -7 ($1111_2$) through +7 ($0111_2$).
  - With two's complement, we can represent the values -8 ($1000_{C2}$) through +7 ($0111_{C2}$).

Signed integer representation: Complement system (18/19)
Overflow in complement systems (C1 and C2):
• An overflow occurs if two positive numbers are added and the result is negative,
• or if two negative numbers are added and the result is positive.
• Overflow cannot occur when a positive and a negative number are added together.

Signed integer representation: Complement system (19/19)
To detect overflow, check the last two carries (the carry into the MSB and the carry out of the MSB):
• if they are different, there is an overflow;
• if they are equal, there is no overflow.
Example 1: Find the sum of $126_{10}$ and $8_{10}$ in binary using two's complement arithmetic.
$01111110_{C2}\;(126) + 00001000_{C2}\;(8) = 10000110_{C2}$
The result is $10000110_{C2} = -(01111010_2) = -122_{10}$, although the correct sum, 134, exceeds +127, the largest 8-bit two's complement value.
Note that the last two carries are different: the carry into the MSB is 1 and the carry out of the MSB is 0, so there is an overflow.

Floating-point representation (1/1)
• A computer is expected to solve a wide range of problems.
• Very large numbers, fractional numbers and complicated mathematical operations may be involved.
• Floating-point representation is an optimized solution that gives a good ratio of the largest representable number to the word size.
• Computers use a form of scientific notation for floating-point representation.
• Numbers written in scientific notation have three components: a sign, a mantissa (significand) and an exponent.
  - Scientific notation in base 10: $+0.579 \times 10^{7}$
  - Scientific notation in base 2: $+0.101101 \times 2^{3}$

Floating-point representation: A simple model (1/8)
In digital computers, floating-point numbers consist of three parts:
• a sign bit;
• an exponent part, representing the exponent on a power of 2;
• a fractional part called the significand (a fancy word for the mantissa).

Floating-point representation: A simple model (2/8)
• More bits for the exponent increase the range of numbers.
• More bits for the significand increase the precision.
• For simplicity, throughout this course we use a simplified 14-bit model:
  - sign bit: 1 bit
  - exponent: 5 bits
  - significand: 8 bits

Floating-point representation: A simple model (3/8)
Exercise 1: Represent the number 17 in the 14-bit floating-point representation.
• In decimal: $17 = 17.0\times10^{0} = 1.7\times10^{1} = 0.17\times10^{2}$
• Similarly, in binary: $17_{10} = 10001_2\times2^{0} = 1000.1_2\times2^{1} = 100.01_2\times2^{2} = 10.001_2\times2^{3} = 1.0001_2\times2^{4} = 0.10001_2\times2^{5} = 0.010001_2\times2^{6} = 0.0010001_2\times2^{7} = \ldots$
• By convention, we stop when the first bit of the significand (right after the binary point) is 1: $0.10001_2\times2^{5}$.
• The exponent is $5_{10} = 00101_2$.
• The significand is $10001_2$, padded on the right to 8 bits: $10001000_2$.
• So 17 is stored as: 0 | 00101 | 10001000

Floating-point representation: A simple model (4/8)
The last floating-point representation cannot handle negative exponents.
• Example: $0.25 = 0.01_2 = 0.1_2\times2^{-1}$. How do we represent the negative exponent -1?!
To solve such problems we use an excess-16 bias:
• 16 is added to every exponent, negative or positive.
• We say that the real exponent is replaced by a biased exponent.
• All exponents are thus converted to non-negative biased exponents.
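The normalization and biasing steps can be sketched in Python for the lecture's 14-bit model. The function name `encode14`, the output format "S EEEEE MMMMMMMM" and the `bias` parameter are assumptions of the sketch; zero is not handled, and the sketch does not reserve the all-zeros and all-ones exponent patterns.

```python
from fractions import Fraction

EXP_BITS, SIG_BITS = 5, 8   # the lecture's simplified 14-bit model (plus 1 sign bit)


def encode14(value, bias: int = 16) -> str:
    """Encode a nonzero number as 'S EEEEE MMMMMMMM' in the simplified model.

    The magnitude is normalized to 0.1xxx... (binary) x 2**e, the bias is
    added to e, and only the first SIG_BITS significand bits are kept (the
    rest are truncated). bias=0 gives the unbiased encoding of Exercise 1.
    """
    x = Fraction(value)
    if x == 0:
        raise ValueError("zero is a special case and is not handled in this sketch")
    sign = "1" if x < 0 else "0"
    x, e = abs(x), 0
    while x >= 1:                 # move the binary point left
        x /= 2
        e += 1
    while x < Fraction(1, 2):     # move it right until the first bit after it is 1
        x *= 2
        e -= 1
    stored_exp = e + bias
    if not 0 <= stored_exp < 2 ** EXP_BITS:
        raise OverflowError(f"exponent {e} does not fit in {EXP_BITS} bits with bias {bias}")
    significand = int(x * 2 ** SIG_BITS)   # first 8 bits after the binary point
    return f"{sign} {stored_exp:0{EXP_BITS}b} {significand:0{SIG_BITS}b}"
```

`encode14(17, bias=0)` gives `'0 00101 10001000'` as in Exercise 1, while `encode14(0.25, bias=0)` raises `OverflowError` because the exponent -1 cannot be stored without a bias; with the default excess-16 bias it gives `'0 01111 10000000'`.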
Floating-point representation: A simple model (5/8)
With an excess-16 bias:
• biased exponent values less than 16 indicate negative exponents;
• a biased exponent of exactly 16 indicates an exponent of 0;
• biased exponent values greater than 16 indicate positive exponents;
• exponents of all zeros or all ones are typically reserved for special numbers (such as zero or infinity).

Floating-point representation: A simple model (6/8)
Example 1: Represent the number 17 in 14-bit floating-point form with excess-16 bias.
• The number is positive: the sign bit is 0.
• $17_{10} = 0.10001_2\times2^{5}$
• The exponent is $5_{10}$ → $(5 + 16)_{10} = 21_{10} = 10101_2$.
• The significand is $10001_2$ → $10001000_2$.
• So 17 in floating-point form with excess-16 bias is: 0 | 10101 | 10001000

Floating-point representation: A simple model (7/8)
Example 2: Represent $0.25_{10}$ in 14-bit floating-point form with excess-16 bias.
• The number is positive: the sign bit is 0.
• $0.25 = 0.01_2\times2^{0} = 0.1_2\times2^{-1}$
• The exponent is $-1_{10}$ → $(-1 + 16)_{10} = 15_{10} = 01111_2$.
• The significand is $1_2$ → $10000000_2$.
• So 0.25 in floating-point form with excess-16 bias is: 0 | 01111 | 10000000

Floating-point representation: A simple model (8/8)
Example 3: Express $-0.03125_{10}$ in normalized floating-point form with excess-16 bias.
• The number is negative: the sign bit is 1.
• $0.03125_{10} = 0.00001_2 = 0.00001_2\times2^{0} = 0.0001_2\times2^{-1} = \ldots = 0.1_2\times2^{-4}$
• The exponent is $-4_{10}$ → $(-4 + 16)_{10} = 12_{10} = 01100_2$.
• The significand is $1_2$ → $10000000_2$.
• So -0.03125 in floating-point form with excess-16 bias is: 1 | 01100 | 10000000
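Going the other way, a 14-bit pattern can be decoded back into a value, which is a convenient way to check the examples above. This Python sketch is not from the lecture; the function name `decode14` is hypothetical and it accepts the "S | EEEEE | MMMMMMMM" layout used in these notes (spaces and bars are ignored). It computes value = (-1)^S × 0.MMMMMMMM₂ × 2^(EEEEE - 16).

```python
from fractions import Fraction

EXP_BITS, SIG_BITS = 5, 8


def decode14(bits: str, bias: int = 16) -> Fraction:
    """Decode 'S EEEEE MMMMMMMM' from the simplified 14-bit model.

    value = (-1)**S * 0.MMMMMMMM (binary) * 2**(EEEEE - bias)
    """
    b = bits.replace(" ", "").replace("|", "")
    if len(b) != 1 + EXP_BITS + SIG_BITS or any(c not in "01" for c in b):
        raise ValueError("expected 14 bits")
    sign = -1 if b[0] == "1" else 1
    exponent = int(b[1:1 + EXP_BITS], 2) - bias                      # remove the bias
    significand = Fraction(int(b[1 + EXP_BITS:], 2), 2 ** SIG_BITS)  # 0.MMMMMMMM
    return sign * significand * Fraction(2) ** exponent
```

The same decoder can check the floating-point addition example in the next section: the operands 0 | 10010 | 11001000 and 0 | 10000 | 10011010 decode to 3.125 and 0.6015625, whose exact sum is 3.7265625, while the stored 8-bit result 0 | 10010 | 11101110 decodes to 3.71875.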
Floating-point representation: Floating-point arithmetic (1/2)
To add or subtract two numbers in floating-point form, both numbers must have the same exponent. If the exponents are different:
1. we rewrite one of the numbers so that both are expressed with the same power of the base;
2. we add the binary numbers;
3. we represent the result in normalized floating-point form.

Floating-point representation: Floating-point arithmetic (2/2)
Example: Add the following binary numbers, represented in the normalized 14-bit format with excess-16 bias.
  0 | 10010 | 11001000 (biased exponent $18_{10}$ → real exponent $2_{10}$)
+ 0 | 10000 | 10011010 (biased exponent $16_{10}$ → real exponent $0_{10}$)
• The first number is $0.11001000_2\times2^{2} = 11.001000_2\times2^{0}$.
• The second number is $0.10011010_2\times2^{0}$.
• Now add: $0.10011010_2 + 11.00100000_2 = 11.10111010_2$
• The result is $11.10111010_2\times2^{0} = 0.1110111010_2\times2^{2}$.
• Keeping the first 8 bits of the significand, the result in floating-point form with excess-16 bias is: 0 | 10010 | 11101110

Floating-point representation: Floating-point errors (1/2)
• Computers are finite systems.
• With floating-point form, we model the infinite system of real numbers in a finite system.
• What we actually have is an approximation of the real number system.
• The more bits we use, the better the approximation.
• However, there is always some element of error.
• Such errors can propagate through a lengthy calculation, causing a substantial loss of precision.

Floating-point representation: Floating-point errors (2/2)
Example: in our simple model,
  - we are limited to values between $-0.11111111_2\times2^{15}$ and $+0.11111111_2\times2^{15}$;
  - we cannot store $2^{-19}$ or $2^{128}$: they simply don't fit;
  - 128.5 cannot be stored exactly, even though it is well within our range:
    → $128.5_{10} = 10000000.1_2 = 0.100000001_2\times2^{8}$
    → the significand needs more than 8 bits!
    → in practice we store only its first 8 bits: 10000000
    → we actually store 128, not 128.5, with an absolute error of 0.5
    → the relative error is $\frac{128.5 - 128}{128.5} \approx 0.00389 \approx 0.39\%$

End of lecture 2
Try to solve all the exercises related to lecture 2.