2 Bits and Bytes and Hex
In which our hero learns that there are 10 types of people in the world; those
who understand binary numbers, and those who don’t.
2.1 The Astonishing Hypothesis
In the field of neuroscience, there is something called the Astonishing Hypothesis [C94], and it is this: that every feeling and thought
we have, in fact the very essence of consciousness, is the result of a
biological process. Depending on who you are or what your background is, this is either (a) obvious, (b) mind blowing, (c) sacrilegious, or (d) some combination of these choices.
In the world of Computer Science, perhaps we have nothing that
is quite so mind blowing, but there is another astonishing hypothesis
of sorts: that everything in a computer system, whether it be a word
document, JPEG image, downloaded (i.e., stolen) movie, audio file,
video game, Facebook web page, or really anything else you can name, is
represented in digital form simply by a collection of ones and zeroes.
Everything!
We call this most basic unit of information, the one-or-zero, a
binary digit, or, much more concisely, a bit. So if you remember
nothing else about computer systems, and indeed, stop reading right
now, please do remember this: it’s all bits. There is nothing else used
to represent information of any kind inside a computer system. Just
ones and zeroes – astounding!
This fact, that it’s all ones and zeroes, implies something for you,
the reader: that you have to be somewhat fluent in binary to be good
at low-level computer programming, some of which we’ll be doing
in this book. Hopefully, you already have that fluency; if not, we’ll
do a little bit now to give you a sense of what we mean.
2.2 Binary Basics
We are assuming here you are pretty familiar with our old pal,
decimal numbers. As an example, a 4-digit decimal number can be
represented as follows:
$D_3 D_2 D_1 D_0$
Because it’s a decimal number, we know each digit can be any of
the following: 0, 1, 2, ..., 9. We also know that, crucially, the position of each digit is important in determining its value. Specifically,
we know in this case that the leftmost digit $D_3$ (the most-significant
digit) tells us how many thousands are in the number ($10^3$), the
next digit $D_2$ tells us how many hundreds ($10^2$), $D_1$ how many tens
($10^1$), and $D_0$, the least-significant digit, how many ones ($10^0$). This
compact and tidy representation enables you, the burgeoning mathematician, to deduce that 9142 refers to the number nine thousand one
hundred and forty-two, or, more formally:
$D_3 \times 10^3 + D_2 \times 10^2 + D_1 \times 10^1 + D_0 \times 10^0$
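As a quick worked instance of the formula above, here are the digits of 9142 plugged in (nothing new, just the arithmetic spelled out):

```latex
9 \times 10^3 + 1 \times 10^2 + 4 \times 10^1 + 2 \times 10^0
  = 9000 + 100 + 40 + 2
  = 9142
```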
Binary numbers are really no different. In fact, the only difference
is that each digit can only be a zero or a one – that’s it! Let us examine
the same abstract number, but this time imagine that it’s in binary:
$B_3 B_2 B_1 B_0$
If we are just interpreting this as a non-negative integer, we could compute its value much like we computed the value of the decimal number above. Specifically, we know that a binary number has the following value:
$B_3 \times 2^3 + B_2 \times 2^2 + B_1 \times 2^1 + B_0 \times 2^0$
Thus, if someone shows you the number 0110, you immediately
can compute its decimal value by plugging it into the equation above:
$0 \times 2^3 + 1 \times 2^2 + 1 \times 2^1 + 0 \times 2^0 = 6$
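The same calculation can be sketched in C. This is only an illustration, not code from the book: the function name binary_to_uint is made up here, and it assumes the input string contains nothing but the characters '0' and '1'. Rather than computing each power of two separately, it doubles a running total as it reads each digit, which yields the same sum.

```c
#include <stddef.h>

/* Hypothetical helper (not from the text): interpret a string of '0' and
 * '1' characters, most significant bit first, as a non-negative integer.
 * Assumes the string holds only '0' and '1' and is short enough that the
 * result fits in an unsigned int. */
unsigned int binary_to_uint(const char *bits)
{
    unsigned int value = 0;

    for (size_t i = 0; bits[i] != '\0'; i++) {
        /* Shifting left by one doubles the running total; the next
         * bit (0 or 1) is then placed in the lowest position. */
        value = (value << 1) | (unsigned int)(bits[i] - '0');
    }
    return value;
}
```

Called as binary_to_uint("0110"), this returns 6, matching the hand calculation.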
Converting a decimal number to binary form, in your head or on
paper, is slightly more challenging. Let’s look at an example to make
further sense of this. Imagine we are trying to represent the decimal
number 13 in binary form, as per the above. How would we go about
doing that?
One simple way is as follows. Start with the most significant bit
($B_3$), and think about the number you have to represent (13). If we
put a 1 in $B_3$, this means we are adding $2^3$ to our total, or 8. We thus
clearly need this 8 (as it is no greater than 13), and thus $B_3 = 1$. We now
subtract 8 from 13 to tell us what value we need to represent in the
remaining bits, and get 5, and repeat this process. Is 4 ($2^2$) no greater than
our remaining total of 5? As it is, we mark $B_2 = 1$ and continue.
If we subtract 4 from our remaining total, we end up with 1. Following
the same process, 2 ($2^1$) is greater than 1, so $B_1 = 0$, while 1 ($2^0$)
fits exactly, so $B_0 = 1$. Thus, our final binary representation of 13 is
1101.
THE C IN THE MACHINE (V0.1)
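The subtract-the-largest-power procedure just described can be written down as a short C routine. Treat it as a sketch under stated assumptions: the name uint_to_binary and its parameters are invented for this example, the caller must supply a buffer with room for nbits characters plus a terminator, and the value is assumed to fit in nbits bits.

```c
/* Hypothetical helper (not from the text): write the nbits-bit binary
 * form of value into out, most significant bit first.
 * Assumes nbits is at least 1 and no larger than the width of
 * unsigned int, that value fits in nbits bits, and that out has room
 * for nbits + 1 characters. */
void uint_to_binary(unsigned int value, int nbits, char *out)
{
    for (int i = nbits - 1; i >= 0; i--) {
        unsigned int weight = 1u << i;      /* 2 to the power i */

        if (weight <= value) {
            out[nbits - 1 - i] = '1';       /* we need this power of two */
            value -= weight;                /* what is left to represent */
        } else {
            out[nbits - 1 - i] = '0';       /* too big; skip it */
        }
    }
    out[nbits] = '\0';
}
```

With value 13 and nbits 4, the loop makes the same four decisions as the walk-through above and leaves "1101" in the buffer.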
2.3 N-Digit Numbers
Although it may now be obvious, it is probably worth stating
here that an N-digit binary number can represent $2^N$ different values. Hopefully this is intuitive to you; after all, you probably already
know that with decimal numbers, an N-digit number can be used
to represent $10^N$ different values (e.g., a 3-digit decimal number can
represent numbers from 000 through 999).
In the encoding above, we specifically show how to represent
numbers from 0 to $2^N - 1$. Of course, we may wish to use binary
digits to represent negative numbers too (e.g., −12), or even fractional values (e.g., 43.45). We’ll show how computers do this later,
in subsequent chapters, so if that sounds exciting, just wait! And if it
doesn’t sound exciting, well, you are probably a normal person.
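For the non-negative encoding, the counts above are easy to check with a shift. The two function names below are made up for this sketch, and both assume N is less than 64 so that the shift is well defined for an unsigned long long.

```c
/* Hypothetical helpers (not from the text). Both assume n < 64. */

/* Number of distinct values an n-bit binary number can take: 2^n. */
unsigned long long num_values(unsigned int n)
{
    return 1ULL << n;
}

/* Largest value in the non-negative encoding: 2^n - 1. */
unsigned long long max_value(unsigned int n)
{
    return (1ULL << n) - 1;
}
```

For example, num_values(4) is 16 and max_value(4) is 15, which is why four bits cover exactly 0 through 15.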
2.4 From Binary To Hexadecimal And Back Again
Binary is great for the computer, but for humans, it presents at
least one problem: it is too verbose. In later parts of your learning
about CS, you may need to represent 32-bit or 64-bit values, and doing so in binary is cumbersome and error-prone. Here, for example,
is one such 32-bit number:
1101 1110 1010 1101 1011 1110 1110 1111
There must be a better way! And it turns out there is. The way
that we will often use is called hexadecimal format, and it is quite
simple to understand. In hexadecimal (or hex, for short), a single
digit can represent a number from 0 through 15 (sixteen total possibilities, and hence the name hexadecimal).
Of course, to represent a number like 10, or 12, or 15, with a single
digit means we have a problem: our usual numbers of 0 through 9
aren’t enough. Simply put: we need some more characters. So which
ones?
Interestingly, there was originally some debate about this topic,
with a number of different possibilities proposed [W17a]. However,
many pragmatic computer designers decided it was easiest to go
with some characters familiar to most people, even if they were familiar not as numbers but as letters. And thus we arrive with today’s
common representation: the letter A for 10, B for 11, C for 12, D for
13, E for 14, and F for 15. Lowercase a through f are also acceptable.
ARPACI-DUSSEAU
Decimal   Binary   Hex
0         0000     0x0
1         0001     0x1
2         0010     0x2
3         0011     0x3
4         0100     0x4
5         0101     0x5
6         0110     0x6
7         0111     0x7
8         1000     0x8
9         1001     0x9
10        1010     0xA or 0xa
11        1011     0xB or 0xb
12        1100     0xC or 0xc
13        1101     0xD or 0xd
14        1110     0xE or 0xe
15        1111     0xF or 0xf
Table 2.1: Translating Decimal to Binary to Hex
With hex format, we can represent large numbers compactly. Take
the example above; instead of using 32 characters to represent the
number, we can use just 8. The translation results in the hex number
DEADBEEF. Yes, we can even make numbers that look like words
[W17b]. The example here, DEADBEEF, is actually a commonly used
hex string in some computer systems for debugging purposes [1].
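The translation from binary to hex is mechanical: take the bits four at a time and look each group up in the table. A minimal sketch in C follows; the name binary_to_hex is invented for this example, and it assumes the number of binary digits is a multiple of four.

```c
#include <stddef.h>

/* Hypothetical helper (not from the text): translate a string of binary
 * digits into uppercase hex digits, four bits per hex digit. Characters
 * other than '0' and '1' (such as spaces) are skipped.
 * Assumes the number of binary digits is a multiple of four (leftover
 * bits short of a full group are ignored) and that out has room for one
 * character per group plus the terminating '\0'. */
void binary_to_hex(const char *bits, char *out)
{
    static const char hex_digits[] = "0123456789ABCDEF";
    unsigned int group = 0;   /* value of the bits collected so far */
    int count = 0;            /* number of bits in the current group */
    size_t n = 0;             /* next position to write in out */

    for (size_t i = 0; bits[i] != '\0'; i++) {
        if (bits[i] != '0' && bits[i] != '1')
            continue;
        group = (group << 1) | (unsigned int)(bits[i] - '0');
        if (++count == 4) {               /* a full group of four bits */
            out[n++] = hex_digits[group];
            group = 0;
            count = 0;
        }
    }
    out[n] = '\0';
}
```

Fed the 32-bit example "1101 1110 1010 1101 1011 1110 1110 1111", it produces the eight characters DEADBEEF.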
One last point about hex numbers. When we see them in C programs (as we soon will), we have to annotate them slightly so that the
language can differentiate between hex and regular decimal numbers. Why? Well, imagine if you had the number 15 written in your
program. Is that decimal 15, or hex 15 (which is equal to decimal 21)? Thus,
as shown in Table 2.1, when we show a hex number, we will always
prefix it with the characters “0x”. When talking about DEADBEEF
above, we should have been saying 0xDEADBEEF (even though that
is less fun).
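Here is roughly what the prefix looks like in C source. The constant names and the format_hex helper are made up for illustration, and the sketch assumes unsigned int is at least 32 bits wide so that 0xDEADBEEF fits. The "%X" conversion of the standard snprintf prints an unsigned int in uppercase hex.

```c
#include <stdio.h>

/* The same two digits mean different things with and without "0x". */
static const unsigned int decimal_fifteen = 15;    /* fifteen */
static const unsigned int hex_fifteen     = 0x15;  /* 1 * 16 + 5, i.e., 21 */

/* Upper- and lowercase letters are interchangeable in a hex constant.
 * Assumes unsigned int is at least 32 bits wide. */
static const unsigned int dead_beef_upper = 0xDEADBEEF;
static const unsigned int dead_beef_lower = 0xdeadbeef;

/* Hypothetical helper (not from the text): write value into out as a
 * "0x"-prefixed hex string with uppercase letters. snprintf writes at
 * most out_size bytes, including the terminating '\0'. */
void format_hex(unsigned int value, char *out, size_t out_size)
{
    snprintf(out, out_size, "0x%X", value);
}
```

Given a 16-byte buffer, format_hex(0xDEADBEEF, buf, sizeof buf) leaves the string "0xDEADBEEF" in buf.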
2.5 Making A Byte Out Of Bits
Because bits are often moved around and accessed in larger groups
(e.g., as we’ll later see, a C integer is typically 32 bits, and a C character is 8 bits), we have developed some names for larger groups of
bits. One particularly important one is called a byte; it (almost always) represents eight bits [2].
[1] For example, Sun operating systems used to fill freed memory with this value;
if you then ever saw freed memory without this value inside of it, you would know
something bad was happening elsewhere in the program.
[2] Some older machines had different-sized bytes; but in modern times, a byte is eight
bits, so we’ll just assume that from now on.
There are some other terms people use to talk about even larger
collections of bits. For example, sometimes people refer to a word,
which is just some number of bytes, usually four or eight (32 or 64
bits, respectively). Relatedly, sometimes you’ll see a double
word, which is just (obviously) twice the word size. And sometimes
you’ll even see references to something that is half the size of a byte
(i.e., four bits): some clever person decided to call this a nibble.
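To make bytes and nibbles a little more concrete, here is a small C sketch. The function names are invented for this example. CHAR_BIT, from the standard header limits.h, is the number of bits in a byte on the machine at hand, and the nibble helpers assume it is 8.

```c
#include <limits.h>

/* Hypothetical helpers (not from the text). */

/* Number of bits in an object that is size_in_bytes bytes long. */
unsigned int bits_in(unsigned int size_in_bytes)
{
    return size_in_bytes * CHAR_BIT;
}

/* Upper four bits of a byte (assumes an 8-bit byte). */
unsigned int high_nibble(unsigned char byte)
{
    return (byte >> 4) & 0xFu;
}

/* Lower four bits of a byte. */
unsigned int low_nibble(unsigned char byte)
{
    return byte & 0xFu;
}
```

Each nibble is exactly one hex digit, which is why hex is so handy here: the byte 0xAB splits into a high nibble of 0xA and a low nibble of 0xB.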
2.6 Why Binary?
Before closing, let’s answer one last question: why use binary?
After all, humans think in decimal, so shouldn’t computer systems
“think” in decimal too? As it turns out, there are excellent practical
reasons for binary’s dominance within computer systems [B15].
The primary reason for binary is that it enables us to build reliable computer circuits. At a very low level, computers must use
analog (continuous) values from the real world to represent digital
(discrete) values found in the computer. For example, we might want
to use the current voltage level of a circuit to store a value. If we use
binary, we can have a very simple differentiator between 0 and 1: if
there is a low voltage, consider the value to be 0, and if there is a high
voltage, consider it to be 1. If we instead try to store a decimal number in such a circuit, the circuit would have to be able to differentiate
between ten different voltage levels, which is quite a bit harder [3]. The
simplest and most robust approach is to encode 1 or 0 (on or off) and
then use collections of said circuits to represent larger numbers.
2.7 Summary
We’ve introduced one of the most basic aspects of computer systems: the notion of a binary digit, or bit. As we’ve said throughout
the chapter, bits are how all information is represented within computer systems; any program you write, any document you store, any
movie you steal, any music you create, if you are doing it on a computer, the computer is storing it in digital form as a series of 0s and
1s. To become a low-level systems expert, you must thus also become an
expert with this simplest of number systems.
[3] As a poor analogy: it’s much easier to determine whether or not someone punched
you, rather than determine exactly how hard they punched you. Of course, why do you
keep getting punched? Bob and weave, friend, and you will be much less of a target.
References
[B15] “Why Computers Use Binary Numbers”
Bob Brown
http://ksuweb.kennesaw.edu/faculty/rbrow211/papers/why binary.html
A nice succinct explanation of why we use binary numbers so commonly in building computer
systems.
[C94] “The Astonishing Hypothesis”
Francis Crick
Scribner Books (Reprint), 1995
Crick of course is famous for his work in helping to discover the physical nature of DNA, the
double helix (with James Watson, though Rosalind Franklin did much of the work and received
little of the credit, sadly). Crick then spent many years thinking about the nature of human
consciousness, which is still one of the most puzzling and fascinating aspects of biology and
modern neuroscience.
[W17a] “Hexadecimal”
https://en.wikipedia.org/wiki/Hexadecimal
A good background on hex format over the years, including different suggestions for characters.
It’s probably a good thing we didn’t end up stuck with the characters K, S, N, J, F, and L like they
used on the old Illiac I computer.
[W17b] “Hexspeak”
https://en.wikipedia.org/wiki/Hexspeak
A fun wiki page about different words created in hex and then sometimes used in computer systems. Well, mostly fun, and occasionally offensive, so reader beware.