Topic 2: Representation of Information
This chapter deals with the aspects of information representation that are commonly used
in digital systems. Depending on the type of application, information can be viewed as sets of bytes organized
in a particular fashion to express an entity, which can be a signal, an image, a file or a series of numbers.
Objective: After this lesson you will have mastered the different systems of numeration (binary, octal, decimal,
hexadecimal) and the fixed- and floating-point representations. You will also be able to analyze some basic
codes such as ASCII and BCD.
1. Information
Information is represented in a digital system by means of binary sequences (0 and 1), which are
organized into words. A word is a unit of information of fixed length n. A sequence of 8 bits is called a byte.
Word sizes are commonly multiples of 8. Figure 2.1 shows some basic types of information represented in
a computer.
Information
    Instructions (Assembly code, Java, C/C++, JPEG, MP3)
    Data
        Numbers
            Fixed-point number
                Unsigned fixed-point number
                Signed fixed-point number
            Floating-point number
        Non-numerical data (ASCII characters)
Figure 2.1: Some basic information types.
There is a fundamental division of information into instructions (control information covered in
Processor Architecture) and data. Data may be further subdivided into numerical and non-numerical.
In view of the importance of numerical computation, a great deal of attention has been given to the
development of number codes. Two major formats have evolved: fixed-point and floating-point. A binary
fixed-point number is of the form:
$X = a_{n-1}a_{n-2} \ldots a_1a_0 \,.\, a_{-1}a_{-2} \ldots a_{-m}$
where $a_i \in \{0, 1\}$.
A floating-point number, on the other hand, consists of a pair of fixed-point numbers (M, E), where M
is the mantissa and E the exponent. The pair represents the number $M \times B^E$, where B is a predetermined base or
radix. Floating-point notation corresponds to the so-called scientific notation.
A variety of codes are used to represent fixed-point numbers. These codes may be classified as binary, e.g.,
two's complement, or decimal, e.g., BCD (binary-coded decimal).
Non-numerical data usually take the form of variable-length character strings encoded in ASCII or similar
codes (Unicode).
Encoding is the process of assigning representations to information. Choosing an appropriate
and efficient encoding is a real engineering challenge. The encoding process impacts design at many
levels:
- Mechanism (devices, # of components used).
- Efficiency (bits used).
- Reliability (noise).
- Security (encryption).
2. Representation of Numbers (Encoding Numbers)
In selecting a number representation to be used in a digital system, the following factors should be taken
into account:
- The type of numbers to be represented, e.g., integers, real numbers, complex numbers.
- The range of values likely to be encountered.
- The precision of the number, which refers to the maximum accuracy of the representation.
- The cost of the hardware required to store and process the numbers.
The two principal number formats are fixed-point and floating-point. In general, fixed-point formats allow a
limited range of values. Floating-point numbers, on the other hand, allow a much larger range of values.
The four major systems of numeration are:
a) Decimal Numbers: Base = 10, ten unique digits (0,1,2,3,4,5,6,7,8,9)
b) Binary Numbers: Base = 2, two unique digits (0 and 1), binary digit = "bit"
c) Octal Numbers: Base = 8, eight unique digits (0,1,2,3,4,5,6,7)
d) Hexadecimal Numbers: Base = 16, sixteen unique digits (0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F)
In general, the representation of a positive fixed-point number X in base (radix) r uses (n+m) digits or fewer,
and it is expressed as follows:
$X = \sum_{i=-m}^{n-1} a_i r^i = a_{n-1}r^{n-1} + a_{n-2}r^{n-2} + \ldots + a_1 r^1 + a_0 r^0 + a_{-1}r^{-1} + a_{-2}r^{-2} + \ldots + a_{-m}r^{-m}$
with $a_i \in \{0, \ldots, r-1\}$
or
$X = a_{n-1}a_{n-2} \ldots a_1a_0 \,.\, a_{-1}a_{-2} \ldots a_{-m}$
for binary (r = 2), decimal (r = 10), octal (r = 8) and hexadecimal (r = 16).
$a_{n-1}a_{n-2} \ldots a_1a_0$ represents the integer part of X, and $a_{-1}a_{-2} \ldots a_{-m}$ the fraction part.
An integer INT is represented in binary with n bits as $a_{n-1}a_{n-2} \ldots a_1a_0$, where $a_{n-1}$ is the most significant bit
(MSB) and $a_0$ the least significant bit (LSB).
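The positional sum above can be checked with a short sketch. The function name `positional_value` and the digit alphabet `DIGITS` are illustrative assumptions, not part of the notes; exact fractions are used so that fraction digits do not pick up rounding error.

```python
from fractions import Fraction

DIGITS = "0123456789ABCDEF"  # assumed digit alphabet, enough for bases up to 16

def positional_value(int_digits: str, frac_digits: str, r: int) -> Fraction:
    """Evaluate X = sum(a_i * r**i) for the digit string int_digits.frac_digits."""
    value = Fraction(0)
    n = len(int_digits)
    # Integer part: the k-th digit from the left has weight r**(n-1-k).
    for k, ch in enumerate(int_digits.upper()):
        a = DIGITS.index(ch)
        if a >= r:
            raise ValueError(f"digit {ch!r} is not valid in base {r}")
        value += a * Fraction(r) ** (n - 1 - k)
    # Fraction part: the j-th digit after the point has weight r**(-j).
    for j, ch in enumerate(frac_digits.upper(), start=1):
        a = DIGITS.index(ch)
        if a >= r:
            raise ValueError(f"digit {ch!r} is not valid in base {r}")
        value += a * Fraction(r) ** (-j)
    return value

print(positional_value("1010", "", 2))    # 10
print(positional_value("11", "01", 2))    # 13/4
print(positional_value("7F", "8", 16))    # 255/2
```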
2.1. Fixed-point Numbers
The fixed-point format is derived from the ordinary (decimal) representation of a number as a
sequence of digits separated by a decimal point. In general, we can assign weights of the form $r^i$, where r
is the radix or base of the number system, to each digit. The most fundamental number representation used
in computers employs a positional notation with 2 as the radix. A binary sequence of the form
$b_n \ldots b_3b_2b_1b_0 \,.\, b_{-1}b_{-2}b_{-3} \ldots b_{-m}$ represents the number $\sum_{i=-m}^{n} b_i 2^i$. For example, $1010_2$ denotes the
binary equivalent of the decimal number 10; 3.3 is an example of a fixed-point number, and its binary form is $11.01001\ldots_2$.
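The binary form of 3.3 quoted above can be reproduced by repeatedly doubling the fraction part. This is a minimal sketch for non-negative values; the name `to_binary_fixed` and the choice to truncate (rather than round) after m fraction bits are assumptions made for illustration.

```python
from fractions import Fraction

def to_binary_fixed(x: Fraction, m: int) -> str:
    """Binary expansion of a non-negative x, truncated to m fraction bits."""
    if x < 0:
        raise ValueError("this sketch handles non-negative values only")
    integer = int(x)          # integer part
    frac = x - integer        # fraction part, 0 <= frac < 1
    bits = []
    for _ in range(m):
        frac *= 2             # the next fraction bit is the integer part of 2*frac
        bit = int(frac)
        bits.append(str(bit))
        frac -= bit
    int_bits = bin(integer)[2:]
    return int_bits + ("." + "".join(bits) if m else "")

print(to_binary_fixed(Fraction(10), 0))       # 1010
print(to_binary_fixed(Fraction(33, 10), 5))   # 11.01001
```

Because 3.3 has no finite binary expansion, any fixed number of fraction bits gives only an approximation.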
The following table summarizes methods for converting among the most common radices:
Table 2.1: Conversion methods for common radices (bases).
All the notions we have presented are related to unsigned binary numbers. Several distinct methods are
used to represent signed (positive and negative) numbers.
Sign Considerations
Let us consider an n-bit word that is to contain a signed binary number. One bit is reserved to
represent the sign of the number, while the remaining bits indicate its magnitude. Generally the sign is
placed in the leftmost position, and the values 0 and 1 are used to denote plus and minus respectively. Thus
we obtain the format $x_{n-1}x_{n-2} \ldots x_2x_1x_0$, where $x_{n-1}$ is the sign bit. The precision allowed by this format is n-1 bits, which is
equivalent to $(n-1)/\log_2 10$ decimal digits. Using the n-bit integer format, we can represent all integers N with
magnitude $|N|$ in the range $0 \le |N| \le 2^{n-1} - 1$. This number code is called sign-magnitude representation.
Several number codes have been devised which use the same representation for positive numbers as the
sign-magnitude code but represent negative numbers in different ways. For example, in the ones-complement
code, -X is denoted by $\bar{X}$, the bitwise logical complement of X. In the twos-complement code,
-X is formed by adding 1 to the least significant bit of $\bar{X}$ and ignoring any carry bit generated from the
most significant (sign) position. If X is an n-bit binary fraction, this may be expressed as follows:
$-X = \bar{x}_{n-1}\bar{x}_{n-2} \ldots \bar{x}_2\bar{x}_1\bar{x}_0 + 0.00 \ldots 01 \pmod{2}$, where the use of modulo-2 addition corresponds to
ignoring carries from the sign position. If X is an integer, then we have
$-X = \bar{x}_{n-1}\bar{x}_{n-2} \ldots \bar{x}_2\bar{x}_1\bar{x}_0 + 1 \pmod{2^n}$.
In each of the complement codes, $x_{n-1}$ retains its role as the sign bit, but the remaining bits no longer form a simple
positional code when the number is negative.
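The two negation rules for integers can be sketched on n-bit words held in ordinary Python integers. The function names are illustrative assumptions; masking with 2^n - 1 plays the role of ignoring the carry out of the sign position.

```python
def ones_complement(x: int, n: int) -> int:
    """Bitwise complement of the n-bit word x."""
    mask = (1 << n) - 1
    return ~x & mask

def twos_complement(x: int, n: int) -> int:
    """Complement every bit, add 1, and drop any carry out of bit n-1 (modulo 2**n)."""
    mask = (1 << n) - 1
    return (ones_complement(x, n) + 1) & mask

n = 4
x = 0b0101                                     # +5
print(format(ones_complement(x, n), "04b"))    # 1010
print(format(twos_complement(x, n), "04b"))    # 1011
```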
Some 8-bit two’s complement representations are shown below:
Figure 2.2 illustrates how integers are represented using each of the three codes discussed above when n = 4.
Decimal   Sign magnitude   Ones complement   Twos complement
+7        0111             0111              0111
+6        0110             0110              0110
+5        0101             0101              0101
+4        0100             0100              0100
+3        0011             0011              0011
+2        0010             0010              0010
+1        0001             0001              0001
+0        0000             0000              0000
-0        1000             1111              0000
-1        1001             1110              1111
-2        1010             1101              1110
-3        1011             1100              1101
-4        1100             1011              1100
-5        1101             1010              1011
-6        1110             1001              1010
-7        1111             1000              1001
Figure 2.2. Representation of numbers using different binary codes.
For 2's complement signed numbers, the range of values for n-bit numbers is $-2^{n-1}$ to $+(2^{n-1} - 1)$.
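As a rough check of the three codes and of the ranges they allow, the following sketch encodes a signed integer into an n-bit string. The function `encode` and its code labels ("sm", "1c", "2c") are hypothetical names chosen for this example. A Python integer cannot distinguish -0 from +0, so the negative-zero patterns of the sign-magnitude and ones-complement codes are not produced.

```python
def encode(value: int, n: int, code: str) -> str:
    """Encode value as an n-bit string in sign-magnitude ('sm'),
    ones-complement ('1c') or twos-complement ('2c') code."""
    limit = 1 << (n - 1)          # 2**(n-1)
    mask = (1 << n) - 1
    if code == "2c":
        if not -limit <= value <= limit - 1:
            raise ValueError("out of range for twos complement")
        word = value & mask       # reduction modulo 2**n
    elif code in ("sm", "1c"):
        if not -(limit - 1) <= value <= limit - 1:
            raise ValueError("out of range for this code")
        if value >= 0:
            word = value
        elif code == "sm":
            word = limit | -value      # sign bit set, magnitude unchanged
        else:
            word = ~(-value) & mask    # complement every bit of the magnitude
    else:
        raise ValueError("unknown code")
    return format(word, f"0{n}b")

for v in (3, -3):
    print(v, encode(v, 4, "sm"), encode(v, 4, "1c"), encode(v, 4, "2c"))
# 3 0011 0011 0011
# -3 1011 1100 1101
```

Only the twos-complement code accepts $-2^{n-1}$ (for n = 4, the value -8 is encoded as 1000).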
Summary of Complements
Complements are commonly used to represent negative numbers and to perform subtraction. Two types of
complements can be applied to any base r:
General              Base 2            Base 10
(r-1)'s complement   1's complement    9's complement
r's complement       2's complement    10's complement
A general number X might be represented as:
$X = a_{n-1}a_{n-2} \ldots a_1a_0 \,.\, a_{-1}a_{-2} \ldots a_{-m}$
(n = number of digits before the radix point and m = number of digits after the radix point)
(r-1)'s complement of X = $r^n - r^{-m} - X$ (or $r^n - 1 - X$ if m = 0)
r's complement of X = $r^n - X$ = (r-1)'s complement + $r^{-m}$ (or (r-1)'s complement + 1 if m = 0)
Practical techniques for complements
9's complement: subtract each digit from 9
10's complement: 9's complement + 1 if m = 0
1's complement: replace each 0 by 1 and each 1 by 0
2's complement: 1's complement + 1 if m = 0
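The practical techniques above can be sketched for integers (m = 0) in any base up to 16. The function names and the sample operands 546700 and 1011000 are illustrative assumptions.

```python
DIGITS = "0123456789ABCDEF"  # assumed digit alphabet for bases up to 16

def diminished_radix_complement(digits: str, r: int) -> str:
    """(r-1)'s complement for m = 0: subtract each digit from r-1."""
    int(digits, r)  # raises ValueError if a digit is not valid in base r
    return "".join(DIGITS[(r - 1) - DIGITS.index(d)] for d in digits.upper())

def radix_complement(digits: str, r: int) -> str:
    """r's complement for m = 0: (r-1)'s complement + 1, kept to n digits."""
    n = len(digits)
    value = (int(diminished_radix_complement(digits, r), r) + 1) % r ** n
    out = []
    for _ in range(n):                 # convert back to exactly n digits
        value, d = divmod(value, r)
        out.append(DIGITS[d])
    return "".join(reversed(out))

print(diminished_radix_complement("546700", 10))  # 453299
print(radix_complement("546700", 10))             # 453300
print(diminished_radix_complement("1011000", 2))  # 0100111
print(radix_complement("1011000", 2))             # 0101000
```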
Hexadecimal and Octal Bases
Representing a number in binary form requires only two digits (0 and 1), but the resulting string of 0s and
1s can be very long. It is useful to have condensed forms of representation whose base is a power of two,
so that each digit stands for a fixed group of bits. In the octal base ($8 = 2^3$) each digit replaces 3 bits, and
in the hexadecimal base ($16 = 2^4$) each digit replaces 4 bits.
Example:
$1000\,1100\,1000\,1111\,1100_2$ = 8C8FCH = $2144374_8$
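The example above can be reproduced by grouping bits from the right, padding the leftmost group with zeros. The helper name `regroup` is an assumption made for this sketch.

```python
DIGITS = "0123456789ABCDEF"

def regroup(bits: str, group: int) -> str:
    """Replace each group of `group` bits by one digit (3 for octal, 4 for hexadecimal)."""
    bits = bits.replace(" ", "")
    pad = (-len(bits)) % group          # zeros needed to complete the leftmost group
    bits = "0" * pad + bits
    return "".join(DIGITS[int(bits[i:i + group], 2)]
                   for i in range(0, len(bits), group))

b = "1000 1100 1000 1111 1100"
print(regroup(b, 4))   # 8C8FC
print(regroup(b, 3))   # 2144374
```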
2.2. Conversion Algorithms
a) Converting a binary integer to decimal format (procedure BINDECi)
- Let $N_2 = b_{n-1}b_{n-2} \ldots b_0$ be the binary integer to be converted to the decimal form $N_{10}$. Set $N_{10}$
  to an initial value of zero.
- Scan $N_2$ from left to right, and for each bit $b_i$ in turn, compute $2 \times N_{10} + b_i$ and make this the
  new value of $N_{10}$. The final value of $N_{10}$ obtained after n steps is the desired result.
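Procedure BINDECi can be sketched directly from the two steps above; the Python name `bindec_i` is an assumption.

```python
def bindec_i(bits: str) -> int:
    """Procedure BINDECi: scan left to right, updating N10 = 2*N10 + b_i."""
    n10 = 0
    for b in bits:
        if b not in "01":
            raise ValueError(f"not a binary digit: {b!r}")
        n10 = 2 * n10 + int(b)
    return n10

print(bindec_i("1010"))      # 10
print(bindec_i("1100100"))   # 100
```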
b) Converting a mixed binary number $N_2$ that has m fraction bits and n-m integer bits to
decimal format (procedure BINDECm)
- Multiply the given number $N_2$ by the scale factor $2^m$ to change it into a binary integer $N'_2$.
- Use the integer conversion procedure BINDECi to convert $N'_2$ to the decimal form $N'_{10}$.
- Finally, multiply $N'_{10}$ by the scale factor $2^{-m}$ to obtain the desired (mixed) decimal result $N_{10}$.
c) Converting an integer from base X to base Y (will be discussed in class)
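Procedure BINDECm can be sketched as follows. The name `bindec_m` and the use of a text string with a "." as input are assumptions; the result is returned as an exact fraction so that the final scaling by 2^-m loses nothing.

```python
from fractions import Fraction

def bindec_m(bits: str) -> Fraction:
    """Procedure BINDECm for a mixed binary number written as 'iii.fff'."""
    int_part, _, frac_part = bits.partition(".")
    m = len(frac_part)
    scaled = int_part + frac_part      # dropping the point multiplies N2 by 2**m
    n10 = 0
    for b in scaled:                   # integer conversion as in BINDECi
        if b not in "01":
            raise ValueError(f"not a binary digit: {b!r}")
        n10 = 2 * n10 + int(b)
    return Fraction(n10, 2 ** m)       # scale back by 2**-m

print(float(bindec_m("11.01")))   # 3.25
print(float(bindec_m("1.11")))    # 1.75
```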
2.3. Addition and Subtraction of Nondecimal Numbers
The following table shows the addition and subtraction of binary digits.
Examples of additions and subtractions:
X= minuend; Y = subtrahend; X – Y = difference.
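Column-by-column binary addition and subtraction can be sketched with one function per digit position: one that produces a sum bit and a carry, and one that produces a difference bit and a borrow. All names and the sample operands 1011 and 0110 are illustrative assumptions.

```python
def add_bits(x: int, y: int, carry_in: int) -> tuple:
    """One column of binary addition: returns (sum bit, carry out)."""
    total = x + y + carry_in
    return total & 1, total >> 1

def sub_bits(x: int, y: int, borrow_in: int) -> tuple:
    """One column of binary subtraction x - y: returns (difference bit, borrow out)."""
    diff = x - y - borrow_in
    return diff & 1, 1 if diff < 0 else 0

def ripple(x_bits: str, y_bits: str, op) -> tuple:
    """Apply op to equal-length operands, starting from the least significant bit."""
    carry = 0
    out = []
    for xb, yb in zip(reversed(x_bits), reversed(y_bits)):
        s, carry = op(int(xb), int(yb), carry)
        out.append(str(s))
    return "".join(reversed(out)), carry

print(ripple("1011", "0110", add_bits))   # ('0001', 1)  11 + 6 = 17, carry out of the top bit
print(ripple("1011", "0110", sub_bits))   # ('0101', 0)  11 - 6 = 5
```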
2.4. Two’s-complement additions and subtractions
The rules behind arithmetic operations on binary numbers are summarized as follows:
Addition (textbook page 39)
Overflow (result out of the range $-2^{n-1}$ to $2^{n-1} - 1$, textbook page 41)
Subtraction (textbook page 43).
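Since the rules themselves are left to the textbook, the following is only a sketch of one common formulation, not necessarily the textbook's wording: add the n-bit words and discard the carry out of the sign position; subtract by adding the bitwise complement of the subtrahend plus 1; flag overflow when the sign of the result is inconsistent with the signs of the operands. All function names are assumptions.

```python
def add_twos_complement(x: int, y: int, n: int) -> tuple:
    """Add two n-bit twos-complement words; returns (result word, overflow flag)."""
    mask = (1 << n) - 1
    sign = 1 << (n - 1)
    raw = (x + y) & mask                       # carry out of the sign bit is discarded
    # Overflow: both operands have the same sign and the result has the other sign.
    overflow = ((x ^ raw) & (y ^ raw) & sign) != 0
    return raw, overflow

def sub_twos_complement(x: int, y: int, n: int) -> tuple:
    """Compute x - y as x + (complement of y) + 1; returns (result word, overflow flag)."""
    mask = (1 << n) - 1
    sign = 1 << (n - 1)
    raw = (x + (~y & mask) + 1) & mask
    # Overflow: operands have different signs and the result's sign differs from x's.
    overflow = ((x ^ y) & (x ^ raw) & sign) != 0
    return raw, overflow

def to_signed(word: int, n: int) -> int:
    """Interpret an n-bit word as a twos-complement integer."""
    return word - (1 << n) if word & (1 << (n - 1)) else word

print(add_twos_complement(0b0011, 0b0100, 4))   # (7, False)   3 + 4 = 7
print(add_twos_complement(0b0101, 0b0110, 4))   # (11, True)   5 + 6 is out of range for n = 4
print(sub_twos_complement(0b0010, 0b0101, 4))   # (13, False)  2 - 5 = -3, word 1101
```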
2.5. Floating-point Numbers
The previous sections deal with fixed-point numbers. For very large numbers such as Avogadro's
number ($6.022 \times 10^{23}$) and very small numbers such as the charge of an electron ($-1.6 \times 10^{-19}$), there is a
need to define another standard. Floating-point notation has been proposed for this class of numbers.
A real number, or as it is often called, a floating-point number, contains two parts: a mantissa (significand,
or fraction) and an exponent; the base is 2. Figure 2.3 depicts both the 4- and 8-byte forms of real numbers
as they are stored in some microcomputer systems. Note that the 4-byte real number is called single-precision
and the 8-byte form is called double-precision. The form presented here is the same form specified by the
IEEE standard, IEEE-754, version 10.0.
S | Exponent | Significand
Figure 2.3. The floating-point numbers: single-precision using a bias of 7FH, and double-precision using a
bias of 3FFH.
The exponent is stored as a biased exponent. With the single-precision form of the real number,
the bias is 127 (7FH); with the double-precision form, it is 1023 (3FFH). The bias is added to the exponent
before it is stored in the exponent portion of the floating-point number. An exponent of $2^3$ is represented as a
biased exponent of 127 + 3 = 130 (82H) in the single-precision form or as 1026 (402H) in the double-precision form.
There are two exceptions to the rules for floating-point numbers. The number 0.0 is stored as all zeros. The
number infinity is stored as all ones in the exponent and all zeros in the mantissa.
Table 2.2 shows numbers defined in real number format.
Decimal   Binary    Normalized      Sign   Biased Exponent   Mantissa
+12       1100      1.1 x 2^3       0      1000 0010         1000000 00000000 00000000
-12       1100      -1.1 x 2^3      1      1000 0010         1000000 00000000 00000000
+100      1100100   1.1001 x 2^6    0      1000 0101         1001000 00000000 00000000
-1.75     1.11      -1.11 x 2^0     1      0111 1111         1100000 00000000 00000000
+0.25     .01       1.0 x 2^-2      0      0111 1101         0000000 00000000 00000000
+0.0      0         0               0      0000 0000         0000000 00000000 00000000
Table 2.2. Real number format.
In condensed form, a floating-point number is represented as follows:
$(-1)^S \times (1 + \text{significand}) \times 2^E$ or $(-1)^S \times (1 + \text{significand}) \times 2^{(\text{biased exponent} - \text{bias})}$
with $0 \le \text{significand (mantissa)} < 1$
Single Precision Format (32-bit):
Sign (1-bit) | Biased Exponent (8-bit) | Mantissa (23-bit)
Double Precision Format (64-bit):
Sign (1-bit) | Biased Exponent (11-bit) | Mantissa (52-bit)
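The single-precision fields can be inspected with Python's standard `struct` module, which packs a float into its 4-byte IEEE-754 form. This is a sketch for checking the entries of the real number table; the function names are assumptions, and `single_value` applies the condensed formula to normalized numbers only (it does not handle zero or infinity).

```python
import struct

def single_fields(x: float) -> tuple:
    """Split the 32-bit single-precision form of x into (sign, biased exponent, mantissa)."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    sign = word >> 31
    biased = (word >> 23) & 0xFF
    mantissa = word & 0x7FFFFF
    return sign, biased, mantissa

def single_value(sign: int, biased: int, mantissa: int) -> float:
    """(-1)**S * (1 + significand) * 2**(biased exponent - 127), normalized numbers only."""
    significand = mantissa / 2 ** 23
    return (-1) ** sign * (1 + significand) * 2.0 ** (biased - 127)

for x in (12.0, -1.75, 0.25):
    s, e, m = single_fields(x)
    print(x, s, format(e, "08b"), format(m, "023b"))
# 12.0 0 10000010 10000000000000000000000
# -1.75 1 01111111 11000000000000000000000
# 0.25 0 01111101 00000000000000000000000
```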
Addition of two floating-point numbers
- Compare the exponents of N1 = (E1, M1) and N2 = (E2, M2) and identify the smaller one, say E2. This
  comparison can be implemented by special hardware (a comparator) or by a trial subtraction
  of the form E1-E2.
- Equalize the exponents of N1 and N2 by right-shifting the mantissa M2 of the number with the
  smaller exponent by E1-E2 places.
- Add the mantissas to obtain M3.
- If the result is not normalized, then right-shift M3 one place and add one to the exponent E1 to
  obtain the exponent E3 of the result N3 = (E3, M3).
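The four steps can be sketched on a simplified format. The assumptions here are not from the notes: both operands are positive, each mantissa is a p-bit integer with its top bit set so that the value is M x 2^E, and bits shifted out during alignment are simply discarded (real hardware would also round and handle signs).

```python
def fp_add(e1: int, m1: int, e2: int, m2: int, p: int = 8) -> tuple:
    """Add two positive numbers m * 2**e whose mantissas are normalized p-bit integers."""
    # Step 1: compare exponents; make (e1, m1) the operand with the larger exponent.
    if e1 < e2:
        e1, m1, e2, m2 = e2, m2, e1, m1
    # Step 2: equalize exponents by right-shifting the mantissa of the smaller operand.
    m2 >>= (e1 - e2)
    # Step 3: add the mantissas.
    m3 = m1 + m2
    e3 = e1
    # Step 4: if the sum no longer fits in p bits, shift right once and bump the exponent.
    if m3 >= (1 << p):
        m3 >>= 1
        e3 += 1
    return e3, m3

# 12 = 11000000b * 2**-4 and 1.75 = 11100000b * 2**-7 with p = 8
e3, m3 = fp_add(-4, 0b11000000, -7, 0b11100000)
print(e3, format(m3, "08b"), m3 * 2.0 ** e3)   # -4 11011100 13.75
```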
3. Codes
Data used by digital systems require a precise format. Data may appear as ASCII, BCD or in
other formats presented previously.
3.3. ASCII Code
ASCII (American Standard Code for Information Interchange) is used to represent alphanumeric
characters. The standard ASCII code is a 7-bit code, with the eighth and most significant bit used to hold
parity in some systems (Table 2.2).
First\Second  X0   X1   X2   X3   X4   X5   X6   X7   X8   X9   XA   XB   XC   XD   XE   XF
0X            NUL  SOH  STX  ETX  EOT  ENQ  ACK  BEL  BS   HT   LF   VT   FF   CR   SO   SI
1X            DLE  DC1  DC2  DC3  DC4  NAK  SYN  ETB  CAN  EM   SUB  ESC  FS   GS   RS   US
2X            SP   !    "    #    $    %    &    '    (    )    *    +    ,    -    .    /
3X            0    1    2    3    4    5    6    7    8    9    :    ;    <    =    >    ?
4X            @    A    B    C    D    E    F    G    H    I    J    K    L    M    N    O
5X            P    Q    R    S    T    U    V    W    X    Y    Z    [    \    ]    ^    _
6X            `    a    b    c    d    e    f    g    h    i    j    k    l    m    n    o
7X            p    q    r    s    t    u    v    w    x    y    z    {    |    }    ~    DEL
Table 2.2. ASCII code
The ASCII control characters, also listed in Table 2.2, perform control functions in a
computer system, including clear screen, backspace, line feed, etc.
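A short sketch shows how a character maps to its row (first hex digit) and column (second hex digit) in the ASCII table, and how the eighth bit can carry parity. Even parity is an assumption made for this example, since the notes do not say which parity is used; the function names are also illustrative.

```python
def ascii_code(ch: str) -> int:
    """7-bit ASCII code of a single character."""
    code = ord(ch)
    if code > 0x7F:
        raise ValueError(f"{ch!r} is not an ASCII character")
    return code

def with_even_parity(ch: str) -> int:
    """Place an even-parity bit in the most significant (eighth) bit."""
    code = ascii_code(ch)
    parity = bin(code).count("1") & 1   # 1 if the 7-bit code has an odd number of ones
    return code | (parity << 7)

row, col = divmod(ascii_code("k"), 16)
print(f"{row:X}X, X{col:X}")                 # 6X, XB
print(f"{ascii_code('A'):02X}")              # 41
print(f"{with_even_parity('C'):08b}")        # 11000011
```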
3.4. Unicode (extract from the Unicode web page, http://unicode.org)
"What is Unicode?
Unicode provides a unique number for every character,
no matter what the platform,
no matter what the program,
no matter what the language.
Fundamentally, computers just deal with numbers. They store letters and other
characters by assigning a number for each one. Before Unicode was invented,
there were hundreds of different encoding systems for assigning these numbers.
No single encoding could contain enough characters: for example, the European
Union alone requires several different encodings to cover all its languages. Even
for a single language like English no single encoding was adequate for all the
letters, punctuation, and technical symbols in common use.
These encoding systems also conflict with one another. That is, two encodings can
use the same number for two different characters, or use different numbers for the
same character. Any given computer (especially servers) needs to support many
different encodings; yet whenever data is passed between different encodings or
platforms, that data always runs the risk of corruption.
Unicode is changing all that!
Unicode provides a unique number for every character, no matter what the
platform, no matter what the program, no matter what the language. The Unicode
Standard has been adopted by such industry leaders as Apple, HP, IBM,
JustSystem, Microsoft, Oracle, SAP, Sun, Sybase, Unisys and many others.
Unicode is required by modern standards such as XML, Java, ECMAScript
(JavaScript), LDAP, CORBA 3.0, WML, etc., and is the official way to implement
ISO/IEC 10646. It is supported in many operating systems, all modern browsers,
and many other products. The emergence of the Unicode Standard, and the
availability of tools supporting it, are among the most significant recent global
software technology trends.
Incorporating Unicode into client-server or multi-tiered applications and websites
offers significant cost savings over the use of legacy character sets. Unicode
enables a single software product or a single website to be targeted across
multiple platforms, languages and countries without re-engineering. It allows data
to be transported through many different systems without corruption."
3.5. BCD Code
Binary-coded decimal (BCD) information is stored in either packed or unpacked form. Packed
BCD data are stored as two digits per byte and unpacked BCD data are stored as one digit per byte. The
range of a BCD digit extends from $(0000)_2$ to $(1001)_2$, i.e., 0 to 9 in decimal. Table 2.3 shows some decimal numbers
converted to both packed and unpacked BCD forms.
Decimal   Packed                Unpacked
12        0001 0010             0000 0001 0000 0010
623       0000 0110 0010 0011   0000 0110 0000 0010 0000 0011
910       0000 1001 0001 0000   0000 1001 0000 0001 0000 0000
Table 2.3. Packed and unpacked BCD data.
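The rows of Table 2.3 can be reproduced with a small sketch. The function names are assumptions; an odd number of digits is padded with a leading zero digit in the packed form, as in the rows for 623 and 910.

```python
def unpacked_bcd(n: int) -> bytes:
    """One decimal digit per byte."""
    if n < 0:
        raise ValueError("non-negative integers only")
    return bytes(int(d) for d in str(n))

def packed_bcd(n: int) -> bytes:
    """Two decimal digits per byte, padded with a leading zero digit if needed."""
    if n < 0:
        raise ValueError("non-negative integers only")
    digits = str(n)
    if len(digits) % 2:
        digits = "0" + digits
    return bytes((int(digits[i]) << 4) | int(digits[i + 1])
                 for i in range(0, len(digits), 2))

def bits(data: bytes) -> str:
    """Show each byte as two 4-bit groups."""
    return " ".join(f"{b >> 4:04b} {b & 0xF:04b}" for b in data)

for n in (12, 623, 910):
    print(n, "|", bits(packed_bcd(n)), "|", bits(unpacked_bcd(n)))
# 12 | 0001 0010 | 0000 0001 0000 0010
# 623 | 0000 0110 0010 0011 | 0000 0110 0000 0010 0000 0011
# 910 | 0000 1001 0001 0000 | 0000 1001 0000 0001 0000 0000
```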
3.6. Other codes:
Gray code, universal product code, and error-detecting code (Wakerly, pp. 51 - 65)