2 Bits and Bytes and Hex

In which our hero learns that there are 10 types of people in the world: those who understand binary numbers, and those who don’t.

2.1 The Astonishing Hypothesis

In the field of neuroscience, there is something called the Astonishing Hypothesis [C94], and it is this: that every feeling and thought we have, in fact the very essence of consciousness, is the result of a biological process. Depending on who you are or what your background is, this is either (a) obvious, (b) mind-blowing, (c) sacrilegious, or (d) some combination of these choices.

In the world of Computer Science, perhaps we have nothing that is quite so mind-blowing, but there is another astonishing hypothesis of sorts: that everything in a computer system, whether it be a Word document, JPEG image, downloaded (i.e., stolen) movie, audio file, video game, Facebook web page, or really anything else you can name, is represented in digital form simply by a collection of ones and zeroes. Everything! We call this most basic unit of information, the one-or-zero, a binary digit, or, much more concisely, a bit.

So if you remember nothing else about computer systems, and indeed, stop reading right now, please do remember this: it’s all bits. There is nothing else used to represent information of any kind inside a computer system. Just ones and zeroes: astounding!

This fact, that it’s all ones and zeroes, implies something for you, the reader: that you have to be somewhat fluent in binary to be good at low-level computer programming, some of which we’ll be doing in this book. Hopefully, you already have that fluency; if not, we’ll do a little bit now to give you a sense of what we mean.

2.2 Binary Basics

We are assuming here you are pretty familiar with our old pal, decimal numbers. As an example, a 4-digit decimal number can be represented as follows:

$D_3 \; D_2 \; D_1 \; D_0$

Because it’s a decimal number, we know each digit can be any of the following: 0, 1, 2, ..., 9.
We also know that, crucially, the position of each digit is important in determining its value. Specifically, we know in this case that the leftmost digit $D_3$ (the most-significant digit) tells us how many thousands are in the number ($10^3$), the next digit $D_2$ tells us how many hundreds ($10^2$), $D_1$ how many tens ($10^1$), and $D_0$, the least-significant digit, how many ones ($10^0$). This compact and tidy representation enables you, the burgeoning mathematician, to deduce that 9142 refers to the number nine thousand one hundred and forty-two, or, more formally:

$D_3 \times 10^3 + D_2 \times 10^2 + D_1 \times 10^1 + D_0 \times 10^0$

Binary numbers are really no different. In fact, the only difference is that each digit can only be a zero or a one, so each position is weighted by a power of 2 rather than a power of 10. That’s it! Let us examine the same abstract number, but this time imagine that it’s in binary:

$B_3 \; B_2 \; B_1 \; B_0$

If we are just interpreting this as a non-negative integer, we could compute its value much like we computed the value of the decimal number above. Specifically, we know that a binary number has the following value:

$B_3 \times 2^3 + B_2 \times 2^2 + B_1 \times 2^1 + B_0 \times 2^0$

Thus, if someone shows you the number 0110, you can immediately compute its decimal value by plugging it into the equation above:

$0 \times 2^3 + 1 \times 2^2 + 1 \times 2^1 + 0 \times 2^0 = 6$

Converting a decimal number to binary form, in your head or on paper, is slightly more challenging. Let’s look at an example to make further sense of this. Imagine we are trying to represent the decimal number 13 in binary form, as per the above. How would we go about doing that?

One simple way is as follows. Start with the most significant bit ($B_3$), and think about the number you have to represent (13). If we put a 1 in $B_3$, this means we are adding $2^3$ to our total, or 8. We clearly need this 8 (as it is less than 13), and thus $B_3 = 1$. We now subtract 8 from 13 to tell us what value we need to represent in the remaining bits, and get 5, and repeat this process.
Is 4 ($2^2$) no greater than our remaining total of 5? As it is, we mark $B_2 = 1$ and continue. If we subtract 4 from our remaining total, we end up with 1. Following this same process, 2 ($2^1$) is greater than 1, so $B_1 = 0$; the remaining 1 is exactly $2^0$, so $B_0 = 1$. Thus, our final binary representation of 13 is 1101.

2.3 N-Digit Numbers

Although it may now be obvious, it is probably worth stating here that an N-digit binary number can represent $2^N$ different values. Hopefully this is intuitive to you; after all, you probably already know that with decimal numbers, an N-digit number can be used to represent $10^N$ different values (e.g., a 3-digit decimal number can represent numbers from 000 through 999). In the encoding above, we specifically show how to represent numbers from 0 to $2^N - 1$.

Of course, we may wish to use binary digits to represent negative numbers too (e.g., −12), or even fractional values (e.g., 43.45). We’ll show how computers do this in later chapters, so if that sounds exciting, just wait! And if it doesn’t sound exciting, well, you are probably a normal person.

2.4 From Binary To Hexadecimal And Back Again

Binary is great for the computer, but for humans, it presents at least one problem: it is too verbose. In later parts of your learning about CS, you may need to represent 32-bit or 64-bit values, and doing so in binary is cumbersome and error-prone. Here, for example, is one such 32-bit number:

1101 1110 1010 1101 1011 1110 1110 1111

There must be a better way! And it turns out there is. The way that we will often use is called hexadecimal format, and it is quite simple to understand. In hexadecimal (or hex, for short), a single digit can represent a number from 0 through 15 (sixteen total possibilities, and hence the name hexadecimal). Because sixteen is $2^4$, each hex digit corresponds to exactly four binary digits.

Of course, to represent a number like 10, or 12, or 15, with a single digit means we have a problem: our usual numbers of 0 through 9 aren’t enough. Simply put: we need some more characters. So which ones?
Interestingly, there was originally some debate about this topic, with a number of different possibilities proposed [W17a]. However, many pragmatic computer designers decided it was easiest to go with some characters familiar to most people, even if they were familiar not as numbers but as letters. And thus we arrive at today’s common representation: the letter A for 10, B for 11, C for 12, D for 13, E for 14, and F for 15. Lowercase a through f are also acceptable.

Decimal   Binary   Hex
0         0000     0x0
1         0001     0x1
2         0010     0x2
3         0011     0x3
4         0100     0x4
5         0101     0x5
6         0110     0x6
7         0111     0x7
8         1000     0x8
9         1001     0x9
10        1010     0xA or 0xa
11        1011     0xB or 0xb
12        1100     0xC or 0xc
13        1101     0xD or 0xd
14        1110     0xE or 0xe
15        1111     0xF or 0xf

Table 2.1: Translating Decimal to Binary to Hex

With hex format, we can represent large numbers compactly. Take the 32-bit example above; instead of using 32 characters to represent the number, we can use just 8, one hex digit for each group of four bits. The translation results in the hex number DEADBEEF. Yes, we can even make numbers that look like words [W17b]. The example here, DEADBEEF, is actually a commonly used hex string in some computer systems for debugging purposes [1].

One last point about hex numbers. When we see them in C programs (as we soon will), we have to annotate them slightly so that the language can differentiate between hex and regular decimal numbers. Why? Well, imagine if you had the number 15 written in your program. Is that decimal 15, or hex 15 (which is equal to decimal 21)? Thus, as shown in Table 2.1, when we show a hex number, we will always prefix it with the characters “0x”. When talking about DEADBEEF above, we should have been saying 0xDEADBEEF (even though that is less fun).

2.5 Making A Byte Out Of Bits

Because bits are often moved around and accessed in larger groups (e.g., as we’ll later see, a C integer is typically 32 bits, and a C character is 8 bits), we have developed some names for larger groups of bits. One particularly important one is called a byte; it (almost always) represents eight bits [2].

There are some other terms people use to talk about even larger collections of bits. For example, sometimes people refer to a word, which is just some number of bytes, usually four or eight (or 32 or 64 bits, respectively). As related to this, sometimes you’ll see a double word, which is just (obviously) twice the word size. And sometimes you’ll even see references to something that is half the size of a byte (i.e., four bits): some clever person decided to call this a nibble.

[1] For example, Sun operating systems used to fill freed memory with this value; if you then ever saw freed memory without this value inside of it, you would know something bad was happening elsewhere in the program.

[2] Some older machines had different-sized bytes; but in modern times, a byte is eight bits, so we’ll just assume that from now on.

2.6 Why Binary?

Before closing, let’s answer one last question: why use binary? After all, humans think in decimal, so shouldn’t computer systems “think” in decimal too? As it turns out, there are excellent practical reasons for binary’s dominance within computer systems [B15].

The primary reason for binary is that it enables us to build reliable computer circuits. At a very low level, computers must use analog (continuous) values from the real world to represent digital (discrete) values found in the computer. For example, we might want to use the current voltage level of a circuit to store a value.
If we use binary, we can have a very simple differentiator between 0 and 1: if there is a low voltage, consider the value to be 0, and if there is a high voltage, consider it to be 1. If we instead try to store a decimal number in such a circuit, the circuit would have to be able to differentiate between ten different voltage levels, which is quite a bit harder [3]. The simplest and most robust approach is to encode 1 or 0 (on or off) and then use collections of said circuits to represent larger numbers.

[3] As a poor analogy: it’s much easier to determine whether or not someone punched you than to determine exactly how hard they punched you. Of course, why do you keep getting punched? Bob and weave, friend, and you will be much less of a target.

2.7 Summary

We’ve introduced one of the most basic aspects of computer systems: the notion of a binary digit, or bit. As we’ve said throughout the chapter, bits are how all information is represented within computer systems; any program you write, any document you store, any movie you steal, any music you create, if you are doing it on a computer, the computer is storing it in digital form as a series of 0s and 1s. To become a low-level systems expert, you thus must also become an expert with this simplest of number systems.

Arpaci-Dusseau, The C in the Machine (v0.1)

References

[B15] “Why Computers Use Binary Numbers”
Bob Brown
http://ksuweb.kennesaw.edu/faculty/rbrow211/papers/why binary.html
A nice succinct explanation of why we use binary numbers so commonly in building computer systems.

[C94] “The Astonishing Hypothesis”
Francis Crick
Scribner Books (Reprint), 1995
Crick of course is famous for his work in helping to discover the physical nature of DNA, the double helix (with James Watson, though Rosalind Franklin did much of the work and received little of the credit, sadly).
Crick then spent many years thinking about the nature of human consciousness, which is still one of the most puzzling and fascinating aspects of biology and modern neuroscience.

[W17a] “Hexadecimal”
https://en.wikipedia.org/wiki/Hexadecimal
A good background on hex format over the years, including different suggestions for characters. It’s probably a good thing we didn’t end up stuck with the characters K, S, N, J, F, and L like they used on the old Illiac I computer.

[W17b] “Hexspeak”
https://en.wikipedia.org/wiki/Hexspeak
A fun wiki page about different words created in hex and then sometimes used in computer systems. Well, mostly fun, and occasionally offensive, so reader beware.