Topic 2: Representation of Information

This chapter deals with different aspects of information representation commonly used in digital systems. Depending on the type of application, we can see information as sets of bytes organized in a particular fashion to express an entity, which can be a signal, an image, a file or a series of numbers.

Objective: After this lesson you will master the different systems of numeration (binary, octal, decimal, hexadecimal) and fixed- and floating-point representations. You will also be able to analyze some basic codes such as ASCII and BCD.

1. Information

Information is represented in a digital system by means of binary sequences (0 and 1), which are organized into words. A word is a unit of information of fixed length n. A sequence of 8 bits is called a byte. Common word sizes are multiples of 8. Figure 2.1 shows some basic types of information represented in a computer.

Figure 2.1: Some basic information types - instructions (assembly code, Java, C/C++, JPEG, MP3) and data, where data comprise numbers (unsigned and signed fixed-point, floating-point) and non-numerical data (ASCII characters).

There is a fundamental division of information into instructions (control information, covered in Processor Architecture) and data. Data may be further subdivided into numerical and non-numerical. In view of the importance of numerical computation, a great deal of attention has been given to the development of number codes. Two major formats have evolved: fixed-point and floating-point. A binary fixed-point number has the form

X = a_{n-1} a_{n-2} ... a_1 a_0 . a_{-1} a_{-2} ... a_{-m},   where a_i ∈ {0, 1}.

A floating-point number, on the other hand, consists of a pair of fixed-point numbers (M, E), where M is the mantissa and E the exponent; the pair represents the number M × B^E, where B is a predetermined base or radix. Floating-point notation corresponds to the so-called scientific notation.

A variety of codes are used to represent fixed-point numbers. These codes may be classified as binary, e.g., two's complement, or decimal, e.g., BCD (binary-coded decimal). Non-numerical data usually take the form of variable-length character strings encoded in ASCII or similar codes (Unicode).

Encoding describes the process of assigning representations to information. Choosing an appropriate and efficient encoding is a real engineering challenge. The encoding process impacts design at many levels:
- Mechanism (devices, number of components used).
- Efficiency (bits used).
- Reliability (noise).
- Security (encryption).

2. Representation of Numbers (Encoding Numbers)

In selecting a number representation to be used in a digital system, the following factors should be taken into account:
- The type of numbers to be represented, e.g., integers, real numbers, complex numbers.
- The range of values likely to be encountered.
- The precision of the numbers, which refers to the maximum accuracy of the representation.
- The cost of the hardware required to store and process the numbers.

The two principal number formats are fixed-point and floating-point. In general, fixed-point formats allow a limited range of values. Floating-point numbers, on the other hand, allow a much larger range of values.
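To make the (M, E) pair concrete, the following minimal C sketch decomposes a value into a base-2 mantissa and exponent using the standard library routine frexp. The value 6.625 is only an illustrative choice, not taken from the text.

```c
#include <stdio.h>
#include <math.h>

int main(void) {
    double x = 6.625;             /* illustrative value */
    int e;
    double m = frexp(x, &e);      /* decompose x so that x = m * 2^e, with 0.5 <= m < 1 */
    printf("%g = %g x 2^%d\n", x, m, e);   /* prints: 6.625 = 0.828125 x 2^3 */
    return 0;
}
```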
The four major systems of numeration are:
a) Decimal numbers: base = 10, ten unique digits (0,1,2,3,4,5,6,7,8,9).
b) Binary numbers: base = 2, two unique digits (0 and 1); a binary digit is called a "bit".
c) Octal numbers: base = 8, eight unique digits (0,1,2,3,4,5,6,7).
d) Hexadecimal numbers: base = 16, sixteen unique digits (0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F).

In general, the representation of a positive fixed-point number X in base (radix) r uses (n + m) digits or fewer and is expressed as follows:

X = sum_{i = -m}^{n-1} a_i r^i = a_{n-1} r^{n-1} + a_{n-2} r^{n-2} + ... + a_1 r^1 + a_0 r^0 + a_{-1} r^{-1} + a_{-2} r^{-2} + ... + a_{-m} r^{-m},   with a_i ∈ {0, ..., r-1},

which is written positionally as X = a_{n-1} a_{n-2} ... a_1 a_0 . a_{-1} a_{-2} ... a_{-m}. The common radices are binary (r = 2), decimal (r = 10), octal (r = 8) and hexadecimal (r = 16). The digits a_{n-1} a_{n-2} ... a_1 a_0 represent the integer part of X, and a_{-1} a_{-2} ... a_{-m} the fraction part. An integer is represented in binary with n bits as a_{n-1} a_{n-2} ... a_1 a_0, where a_{n-1} is the most significant bit (MSB) and a_0 the least significant bit (LSB).

2.1. Fixed-point Numbers

The fixed-point format is derived from the ordinary (decimal) representation of a number as a sequence of digits separated by a decimal point. In general, we can assign weights of the form r^i, where r is the radix or base of the number system, to each digit. The most fundamental number representation used in computers employs a positional notation with 2 as the radix. A binary sequence of the form b_n ... b_3 b_2 b_1 b_0 . b_{-1} b_{-2} b_{-3} ... b_{-m} represents the number obtained by weighting each bit b_i by 2^i. For example, 1010₂ denotes the binary equivalent of the decimal number 10, and 3.3 is an example of a fixed-point number whose binary form is 11.01001...₂.

Table 2.1 summarizes methods for converting among the most common radices.

Table 2.1: Conversion methods for common radices (bases).

All the notions presented so far relate to unsigned binary numbers. Several distinct methods are used to represent signed (positive and negative) numbers.

Sign Considerations

Let us consider an n-bit word to be used to contain a signed binary number. One bit is reserved to represent the sign of the number, while the remaining bits indicate its magnitude. Generally the sign is placed in the leftmost position, and the values 0 and 1 are used to denote plus and minus respectively. Thus we obtain the format x_{n-1} x_{n-2} ... x_2 x_1 x_0, where x_{n-1} is the sign bit. The precision allowed by this format is n-1 bits, which is equivalent to (n-1)/log₂10 decimal digits. Using the n-bit integer format, we can represent all integers N with magnitude |N| in the range 0 ≤ |N| ≤ 2^{n-1} - 1. This number code is called the sign-magnitude representation.

Several number codes have been devised which use the same representation for positive numbers as the sign-magnitude code but represent negative numbers in various different ways. For example, in the ones'-complement code, -X is denoted by X̄, the bitwise logical complement of X. In the two's-complement code, -X is formed by adding 1 to the least significant bit of X̄ and ignoring any carry bit generated from the most significant (sign) position. If X is a binary fraction, this may be expressed as

-X = x̄_{n-1} . x̄_{n-2} ... x̄_1 x̄_0 + 0.00...01   (modulo 2),

where the use of modulo-2 addition corresponds to ignoring carries out of the sign position. If X is an integer, then

-X = x̄_{n-1} x̄_{n-2} ... x̄_1 x̄_0 + 1   (modulo 2^n).

In each complement code, x_{n-1} retains its role as the sign bit, but the remaining bits no longer form a simple positional code when the number is negative.
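The three signed codes can be compared with a minimal C sketch, assuming an 8-bit word; the value -5 and the variable names are illustrative choices only.

```c
#include <stdio.h>
#include <stdint.h>

/* Encode a small signed value in three 8-bit codes (illustrative sketch). */
int main(void) {
    int v = -5;                                   /* example value */
    uint8_t magnitude = (uint8_t)(v < 0 ? -v : v);

    uint8_t sign_mag = (uint8_t)((v < 0 ? 0x80 : 0x00) | magnitude);      /* sign bit + magnitude   */
    uint8_t ones_c   = (v < 0) ? (uint8_t)~magnitude : magnitude;         /* bitwise complement     */
    uint8_t twos_c   = (v < 0) ? (uint8_t)(~magnitude + 1) : magnitude;   /* ones' complement + 1   */

    printf("sign-magnitude: 0x%02X\n", (unsigned)sign_mag);  /* prints 0x85 */
    printf("ones' compl.  : 0x%02X\n", (unsigned)ones_c);    /* prints 0xFA */
    printf("two's compl.  : 0x%02X\n", (unsigned)twos_c);    /* prints 0xFB */
    return 0;
}
```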
Figure 2.2 illustrates how integers are represented using each of the three codes discussed above when n = 4.

Decimal   Sign-magnitude   Ones' complement   Two's complement
+7        0111             0111               0111
+6        0110             0110               0110
+5        0101             0101               0101
+4        0100             0100               0100
+3        0011             0011               0011
+2        0010             0010               0010
+1        0001             0001               0001
+0        0000             0000               0000
-0        1000             1111               0000
-1        1001             1110               1111
-2        1010             1101               1110
-3        1011             1100               1101
-4        1100             1011               1100
-5        1101             1010               1011
-6        1110             1001               1010
-7        1111             1000               1001

Figure 2.2: Representation of numbers using different binary codes.

For two's-complement signed numbers, the range of values for n-bit numbers is -2^{n-1} to +(2^{n-1} - 1).

Summary of complements

Complements are commonly used to represent negative numbers and to perform subtraction. Two types of complements can be applied to any base r:

General              Base 2           Base 10
(r-1)'s complement   1's complement   9's complement
r's complement       2's complement   10's complement

A general number X is represented as X = a_{n-1} a_{n-2} ... a_1 a_0 . a_{-1} a_{-2} ... a_{-m}, where n is the number of digits before the radix point and m the number of digits after it. Then:

(r-1)'s complement of X = r^n - r^{-m} - X   (or r^n - 1 - X if m = 0)
r's complement of X = r^n - X = (r-1)'s complement + r^{-m}   (or (r-1)'s complement + 1 if m = 0)

Practical techniques for complements:
- 9's complement: subtract each digit from 9.
- 10's complement: 9's complement + 1 (if m = 0).
- 1's complement: replace each 0 by 1 and each 1 by 0.
- 2's complement: 1's complement + 1 (if m = 0).

Hexadecimal and Octal Bases

To represent a number in binary form we need only two digits (0 and 1), but in some cases the number of 0s and 1s in the binary representation is very large. It is useful to have condensed forms of representation that group the binary digits into sets whose size is a power of two. With the octal base (8 = 2^3) each digit stands for 3 bits, and with the hexadecimal base (16 = 2^4) each digit stands for 4 bits.

Example: 1000 1100 1000 1111 1100₂ = 8C8FCH = 2144374₈

2.2. Conversion Algorithms

a) Converting a binary integer to decimal format (procedure BINDECi)
- Let N₂ = b_{n-1} b_{n-2} ... b_0 be the binary integer to be converted to the decimal form N₁₀. Set N₁₀ to an initial value of zero.
- Scan N₂ from left to right and, for each bit b_i in turn, compute 2 × N₁₀ + b_i and make this the new value of N₁₀. The final value of N₁₀ obtained after n steps is the desired result.

b) Converting a mixed binary number N₂ that has m fraction bits and n - m integer bits to decimal format (procedure BINDECm)
- Multiply the given number N₂ by the scale factor 2^m to change it into a binary integer N'₂.
- Use the integer conversion procedure BINDECi to convert N'₂ to the decimal form N'₁₀.
- Finally, multiply N'₁₀ by the scale factor 2^{-m} to obtain the desired (mixed) decimal result N₁₀.

c) Converting an integer from base X to base Y (will be discussed in class).

2.3. Addition and Subtraction of Nondecimal Numbers

The addition and subtraction of single binary digits obey the following rules (X = minuend, Y = subtrahend, X - Y = difference):

Addition:      0 + 0 = 0     0 + 1 = 1     1 + 0 = 1     1 + 1 = 10 (sum 0, carry 1)
Subtraction:   0 - 0 = 0     1 - 0 = 1     1 - 1 = 0     0 - 1 = 1 (borrow 1)

2.4. Two's-complement Additions and Subtractions

The rules behind arithmetic operations on two's-complement binary numbers are summarized as follows (see the sketch after this list):
- Addition (textbook page 39).
- Overflow: the result falls outside the range -2^{n-1} to 2^{n-1} - 1 (textbook page 41).
- Subtraction (textbook page 43).
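The overflow rule can be demonstrated with a small C sketch that adds two 4-bit two's-complement operands. The function name add4 and the operand values are illustrative assumptions, not part of the textbook material.

```c
#include <stdio.h>

/* 4-bit two's-complement addition with overflow detection (illustrative sketch).
 * Overflow occurs when two operands of the same sign produce a result of the
 * opposite sign, i.e. the true sum lies outside the range -2^(n-1) .. 2^(n-1)-1. */
int add4(int a, int b, int *overflow) {
    int sum = (a + b) & 0x0F;                 /* keep only the 4 stored bits     */
    if (sum & 0x08) sum -= 16;                /* sign-extend the 4-bit result    */
    *overflow = ((a >= 0) == (b >= 0)) && ((sum >= 0) != (a >= 0));
    return sum;
}

int main(void) {
    int ov;
    int s = add4(5, 6, &ov);                  /* 5 + 6 = 11, outside -8..7       */
    printf("result = %d, overflow = %d\n", s, ov);  /* result = -5, overflow = 1 */
    return 0;
}
```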
2.5. Floating-point Numbers

The previous sections dealt with fixed-point numbers. For very large numbers such as Avogadro's number (6.022 × 10^23) and very small numbers such as the charge of an electron (-1.6 × 10^-19), another standard is needed; floating-point notation has been proposed for this class of numbers. A real number, or as it is often called, a floating-point number, contains two parts: a mantissa (significand, or fraction) and an exponent; the base is 2. Figure 2.3 depicts both the 4-byte and 8-byte forms of real numbers as they are stored in some microcomputer systems. Note that the 4-byte real number is called single precision and the 8-byte form is called double precision. The form presented here is the one specified by the IEEE 754 standard.

Figure 2.3: Floating-point number fields (sign S, exponent, significand): single precision uses a bias of 7FH, double precision a bias of 3FFH.

The exponent is stored as a biased exponent. With the single-precision form of the real number, the bias is 127 (7FH); with the double-precision form, it is 1023 (3FFH). The bias is added to the exponent before it is stored in the exponent field of the floating-point number. An exponent of 2^3, for example, is stored as a biased exponent of 127 + 3 = 130 (82H) in single-precision form, or as 1026 (402H) in double-precision form.

There are two exceptions to the rules for floating-point numbers: the number 0.0 is stored as all zeros, and infinity is stored as all ones in the exponent and all zeros in the mantissa. Table 2.2 shows some numbers written in real-number format.

Decimal   Binary     Normalized     Sign   Biased exponent   Mantissa
+12       1100       1.1 x 2^3      0      1000 0010         1000000 00000000 00000000
-12       1100       -1.1 x 2^3     1      1000 0010         1000000 00000000 00000000
+100      1100100    1.1001 x 2^6   0      1000 0101         1001000 00000000 00000000
-1.75     1.11       -1.11 x 2^0    1      0111 1111         1100000 00000000 00000000
+0.25     0.01       1.0 x 2^-2     0      0111 1101         0000000 00000000 00000000
+0.0      0          0              0      0000 0000         0000000 00000000 00000000

Table 2.2: Real number format.

In condensed form, a floating-point number is represented as

(-1)^S × (1 + significand) × 2^E = (-1)^S × (1 + significand) × 2^(biased exponent - bias),   with 0 ≤ significand (mantissa) < 1.

Single-precision format (32 bits): sign (1 bit), biased exponent (8 bits), mantissa (23 bits).
Double-precision format (64 bits): sign (1 bit), biased exponent (11 bits), mantissa (52 bits).

Addition of two floating-point numbers
- Compare the exponents of N1 = (E1, M1) and N2 = (E2, M2) and identify the smaller one, say E2. This comparison can be implemented by special hardware (a comparator) or by a trial subtraction of the form E1 - E2.
- Equalize the exponents of N1 and N2 by right-shifting M2, the mantissa of the number with the smaller exponent, by E1 - E2 places.
- Add the mantissas to obtain M3.
- If the result is not normalized, right-shift M3 one place and add one to the exponent E1 to obtain the exponent E3 of the result N3 = (E3, M3).
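The field layout in Figure 2.3 and Table 2.2 can be inspected directly in C by reinterpreting the 32 stored bits of a float. The sketch below is illustrative only; the value 12.0 is chosen to match the first row of Table 2.2.

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Decode the three IEEE 754 single-precision fields of a float. */
int main(void) {
    float x = 12.0f;                          /* +12 = 1.1 x 2^3             */
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);           /* reinterpret the 32 stored bits */

    uint32_t sign     = bits >> 31;           /* 1 bit                        */
    uint32_t biasedE  = (bits >> 23) & 0xFF;  /* 8 bits                       */
    uint32_t mantissa = bits & 0x7FFFFF;      /* 23 bits                      */

    printf("sign=%u biased exponent=%u (0x%02X) mantissa=0x%06X\n",
           (unsigned)sign, (unsigned)biasedE, (unsigned)biasedE, (unsigned)mantissa);
    /* prints: sign=0 biased exponent=130 (0x82) mantissa=0x400000 */
    return 0;
}
```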
3. Codes

Data used by digital systems require a precise format. Data may appear in ASCII, BCD or other formats presented previously.

3.3. ASCII Code

ASCII (American Standard Code for Information Interchange) is used to represent alphanumeric characters. The standard ASCII code is a 7-bit code, with the eighth and most significant bit used to hold parity in some systems (Table 2.3). The first hexadecimal digit of a code (0X-7X) selects the row and the second (X0-XF) the column.

      X0   X1   X2   X3   X4   X5   X6   X7   X8   X9   XA   XB   XC   XD   XE   XF
0X    NUL  SOH  STX  ETX  EOT  ENQ  ACK  BEL  BS   HT   LF   VT   FF   CR   SO   SI
1X    DLE  DC1  DC2  DC3  DC4  NAK  SYN  ETB  CAN  EM   SUB  ESC  FS   GS   RS   US
2X    SP   !    "    #    $    %    &    '    (    )    *    +    ,    -    .    /
3X    0    1    2    3    4    5    6    7    8    9    :    ;    <    =    >    ?
4X    @    A    B    C    D    E    F    G    H    I    J    K    L    M    N    O
5X    P    Q    R    S    T    U    V    W    X    Y    Z    [    \    ]    ^    _
6X    `    a    b    c    d    e    f    g    h    i    j    k    l    m    n    o
7X    p    q    r    s    t    u    v    w    x    y    z    {    |    }    ~    DEL

Table 2.3: The ASCII code.

The ASCII control characters, also listed in Table 2.3, perform control functions in a computer system, including clear screen, backspace, line feed, etc.

3.4. Unicode Code (extract from the Unicode web page, http://unicode.org)

"What is Unicode? Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.

Fundamentally, computers just deal with numbers. They store letters and other characters by assigning a number for each one. Before Unicode was invented, there were hundreds of different encoding systems for assigning these numbers. No single encoding could contain enough characters: for example, the European Union alone requires several different encodings to cover all its languages. Even for a single language like English no single encoding was adequate for all the letters, punctuation, and technical symbols in common use.

These encoding systems also conflict with one another. That is, two encodings can use the same number for two different characters, or use different numbers for the same character. Any given computer (especially servers) needs to support many different encodings; yet whenever data is passed between different encodings or platforms, that data always runs the risk of corruption.

Unicode is changing all that! Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language. The Unicode Standard has been adopted by such industry leaders as Apple, HP, IBM, JustSystem, Microsoft, Oracle, SAP, Sun, Sybase, Unisys and many others. Unicode is required by modern standards such as XML, Java, ECMAScript (JavaScript), LDAP, CORBA 3.0, WML, etc., and is the official way to implement ISO/IEC 10646. It is supported in many operating systems, all modern browsers, and many other products. The emergence of the Unicode Standard, and the availability of tools supporting it, are among the most significant recent global software technology trends.

Incorporating Unicode into client-server or multi-tiered applications and websites offers significant cost savings over the use of legacy character sets. Unicode enables a single software product or a single website to be targeted across multiple platforms, languages and countries without re-engineering. It allows data to be transported through many different systems without corruption."

3.5. BCD Code

Binary-coded decimal (BCD) information is stored in either packed or unpacked form. Packed BCD data are stored as two digits per byte; unpacked BCD data are stored as one digit per byte. The range of a BCD digit extends from 0000₂ to 1001₂, i.e., 0-9 decimal. Table 2.4 shows some decimal numbers converted to both packed and unpacked BCD forms (a short packing/unpacking sketch appears at the end of this topic).

Decimal   Packed                 Unpacked
12        0001 0010              0000 0001  0000 0010
623       0000 0110 0010 0011    0000 0110  0000 0010  0000 0011
910       0000 1001 0001 0000    0000 1001  0000 0001  0000 0000

Table 2.4: Packed and unpacked BCD data.

3.6. Other codes: Gray code, universal product code, and error-detecting codes (Wakerly, pp. 51-65).
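As referenced in Section 3.5, the following minimal C sketch packs and unpacks a two-digit decimal value in BCD form; the value 12 is chosen to match the first row of Table 2.4, and the variable names are illustrative.

```c
#include <stdio.h>
#include <stdint.h>

/* Pack a two-digit decimal value into one BCD byte (two digits per byte)
 * and produce the unpacked form (one digit per byte). Illustrative sketch only. */
int main(void) {
    unsigned value = 12;                           /* decimal digits 1 and 2      */

    uint8_t packed      = (uint8_t)(((value / 10) << 4) | (value % 10));
    uint8_t unpacked_hi = (uint8_t)(value / 10);   /* 0000 0001                   */
    uint8_t unpacked_lo = (uint8_t)(value % 10);   /* 0000 0010                   */

    printf("packed   : 0x%02X\n", (unsigned)packed);          /* prints 0x12      */
    printf("unpacked : 0x%02X 0x%02X\n",
           (unsigned)unpacked_hi, (unsigned)unpacked_lo);      /* prints 0x01 0x02 */
    return 0;
}
```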