Download chapter1

Chapter 1 Introduction The theory of computation is the mathematical study of computing machines and their capabilities. We must therefore develop a model for the data that computers manipulate. We adopt the mathematically expedient choice of representing data by strings of symbols. 1.1 Alphabets and Languages Alphabet is a finite set of symbols and is denoted by . In fact, any object can be in an alphabet; from a formal point of view, an alphabet is simply a finite set of any sort. A string over an alphabet is a finite sequence of symbols from the alphabet. A string may have no symbols at all; in this case it is called the empty string and is denoted by . The length of a string is the number of symbols in string. We denote the length of a string w by |w|; thus |101| = 3 and || = 0. Two strings over the same alphabet can be combined to form a third by the operation of concatenation. The concatenation of strings x and y, written x  y or simply xy, is the string x followed by the string y. A string v is a substring of a string w if and only if there are strings x and y such that w = xvy. Both x and y could be , so every string is a substring of itself; and taking x = w and v = y = , we see that  is a substring of every string. If w = xv for some x, then v is a suffix of w; if w = vy for some y, then v is a prefix of w. For each string w and each natural number I, the string wI is defined as w0 = , the empty string; wI+1 = wI  w for each I  0. The reversal of a string w, denoted by wR, is the string “spelled backwards”: for example, reverseR = esrever. The set of all strings - including the empty string - over an alphabet  is denoted by *. Any set of strings over an alphabet  - that is, any subset of * - will be called a language. Thus *, , and  are languages. Since a language is simply a special kind of set, we can specify a finite language by listing all its strings. For example, { aba, cde, fg} is a language over {a,b, …, z}. However, most languages of interest are infinite, so that listing all the strings is not possible. Thus we can specify infinite languages by the scheme L = { w * : w has property P } eg. { w  {0,1}* : w has an equal number of 0’s and 1’s}, and { w * : w = wR }. If L1 and L2 are languages over , their concatenation is L = L1  L2, or simply L1L2, where L = { w : w = x  y for some x  L1 and y  L2}. Example,  = { 0, 1 }, L1 = { w  *: w has an even number of 0’s} and L2 = { w : w starts with a 0 and the rest of the symbols are 1’s}, then L1  L2 = { w  *: w has an odd number of 0’s}. Another language operation is the closure or Kleene star of a single language L, denoted * by L . L* is the set of all strings obtained by concatenating zero or more strings from L. (The concatenation of zero strings is , and the concatenation of one string is the string of itself.) Thus, L* = {w  *: w = w1  w2  …  wk, for some k  0 and some w1,…,wk  L}. Example, L = {01, 1, 100}, then 110001110011  *, since it is equal to 1100011001 1. Note! L* and * = {}. We write L+ for the set LL* . Equivalently, L+= {w : w = w1  w2  …  wk, for some k  1 and some w1,…,wk  L}. 1.2 FINITE AND INFINITE SETS We call two sets A and B equinumerous if there is a bijection (one-to-one and onto function) f : A  B. For example, { 8, red, {,b}} and {1, 2, 3} are equinumerous; let f(8) = 1 , f(red) = 2, f({,b}) = 3. In general, a set is finite if it is equinumerous with {1, 2, …, n} for some natural number n. (for n = 0 means  is finite) If A and {1, 2, …, n} are equinumerous, then we say that the cardinality of A (|A|) is n. A set is infinite if it is not finite. For example, the set  of natural numbers is infinite; so are sets such as the set  of integers, the set  of reals, and the set of perfect squares. A set is said to be countably infinite if it is equinumerous with , and countable if it is finite or countably infinite. A set that is not countable is uncountable. Several techniques are useful for showing a set A to be countably infinite. The most direct way is to exhibit a bijection between some countably infinite set B (not necessarily ) and A. Since B and , and B and A, are then known to be equinumerous, so  and A. If we can count-off the elements of the set S, and be sure that this counting doesn’t miss any elements, then S is countably infinite. Example of countably infinite set, a)  = { …, -3, -2, -1, 0, 1, 2, 3, … } is countable infinite.  = {0, 1, -1, 2, -2, 3, -3, … } = {1, 2, 3, 4, 5, 6, 7, … } b) Rational numbers x/y where x>0, y>0 and x,y   is countably infinite. x+y=2 | x + y = 3 | x + y = 4 | x+y=5 |… S = { 1/1, | ½, 2/1, | 1/3, 2/2, 3/1, | ¼, 2/3, 3/2, 4/1, | … } = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, … } The followings are true: 1. If A is an infinite subset of some countably infinite set B, then A must be countably infinite. Consider a bijection f : B as a way of listing the elements of B: B = {b0, b1, b2, … } where bi = f(I). A can be listed in the same way, simply by striking out those elements of B that are missing from A. For example, we might have A = {b2, b7, b9, b13, … }. The required bijection g: A can then be obtained directly : g(0) = b2, g(1) = b7, g(2) = b9, and in general g(n) = bm, where m is the least number such that |{b0, …, bm} A| = n+1. Such an m exists for each n, since A is an infinite subset of B. 2. The union of any finite number of countably infinite sets is countably infinite. We show this for three pairwise disjoint, countably infinite sets; a similar argument works in general. Call the sets A, B, and C. The sets can be listed as : A = {a0, a1, … }, B = {b0, b1, … }, and C = {c0, c1, … }. Then their union can be listed as A BC = {a0, b0, c0, a1, b1, c1, … }. This listing amounts to a way of “visiting” all the elements in A BC by alternating between differents sets as shown below: A a0 a1 a2 a3 … B b0 b1 b2 b3 … C c0 c1 c2 c3 … The technique of interweaving the enumeration of several sets is called “dovetailing”. 3. The union of a countably infinite collection of countably infinite sets is countably infinite. For example,  is countably infinite; note that  is the union of {0} , {1}, {2},…. We use dovetailing technique as shown below: … {4}  (4,0) (4,1) (4,2) (4,3) (4,4) (4,5) … {3}  (3,0) (3,1) (3,2) (3,3) (3,4) (3,5) … {2}  (2,0) (2,1) (2,2) (2,3) (2,4) (2,5) … {1}  (1,0) (1,1) (1,2) (1,3) (1,4) (1,5) … {0}  (0,0) (0,1) (0,2) (0,3) (0,4) (0,5) … A technique for showing that a set is uncountable: The Diagonalization Principle: Let R be a binary relation on a set A, and let D, the diagonal set for R, be { a : a A and (a,a) R}. For each a A, let Ra = { b : b A and (a,b) R}. Then D is distinct from each Ra. Example: Let A = {a, b, c, d, e, f}, and R = {(a,b), (a,d), (b,b), (b,c), (c,c), (d,b), (d,c), (d,e), (d,f), (e,e), (e,f), (f,a), (f,c), (f,d), (f,e)}. R may be picture like this: a c a b  b   Rb = {b, c}  Rc = {c} c  d d  e Ra = {b, d}  e f       Rd = {b, c, e, f}   Re = {e, f}  The sequence of boxes along the diagonal is  f  Rf = {a, c, d, e} Its complement is    The diagonal set D corresponds to the complement of the sequence of boxes along the main diagonal. The diagonalization principle can then be rephrased: the complement of the diagonal is different from each row. Example of uncountable set, a) The set 2 is uncountable. The set containing all possible sets of natural numbers is called the power set of the natural numbers. It is frequently written 2. When we ask how large is the power set of , we are asking how many sets of natural numbers there can be. Some sets of natural numbers are: S1 even numbers S2 all numbers S3 , the empty set S4 prime numbers S5 {3, 5, 126} S6 {1, 4, 9, 16, 25, 36,…} Some of these sets are finite (such as S3 and S5 above) and some are infinite. Some have easy descriptions and some do not. Let us assume that the power set of  is countable infinite. If so, then there is a complete list of such sets. We can construct a chart showing which numbers are in which sets. We list the numbers across the top and list the sets down the side. We mark a “+” to show that a given number is contained within a given set, and a “-“ to show that it isn’t. The chart looks like this: 1 2 3 4 5 6 … even numbers S1 + + + … all numbers S2 + + + + + + …  S3 … Prime numbers S4 + + + … {3, 5, 126 } S5 + + … squares S6 + + … … Take a moment to understand how this chart works. It does more than simply list the elements of a set in the usual manner, it also shows which numbers are not in the set. Reading set S4 from the chart, for instance, we see that 1 is not in S4, 2 and 3 are in S4, 4 is not in S4, and so on. Now we can construct the “diagonal set” SD. We define SD as follows: If the number n is in the set Sn, then n is not in SD If the number n is not in the set Sn, then n is in SD Using the above chart as an example, we have SD = {1, 3, 4, 6, …}. We determine this as follows: 1 is in SD, because 1 is not in S1 (even numbers) 2 is not in SD, because 2 is in S2 (all numbers) 3 is in SD, because 3 is not in S3 () 4 is in SD, because 4 is not in S4 (prime numbers) 5 is not in SD, because 5 is in S5 ({3, 5, 126}) 6 is in SD, because 6 is not in S6 (squares) If SD were on the chart, it would look likes this: 1 2 3 4 5 6 … SD + + + + … At this point we take note of the perversity of SD. It is guaranteed to be different than every set in the list. How do we know that? Imagine that we are searching the list of sets, and we find some set Sk which seems to resemble SD. If we inspect the chart closely enough, we will see that there is at least one number which is in in Sk but not in SD. (or vice versa). The picture looks like this (alternatively, k might be in SD but not in Sk): 1 2 3 4 5 6 … k SD + + + + … … Sk + + + + … + … Since this perversity will hold for any list of sets of natural numbers we conclude that no complete list can exist, and therefore there are an uncountable number of such sets. Proof: Suppose that 2 is countably infinite, that is there is a bijection f : 2. Then can be enumerated as 2 = {S1, S2, S3, …}, where f(i) = si for each i . Now consider the set D = { n : nSn }. D is a set of natural numbers, and therefore should be Sk for some natural number k. Now we ask if k Sk, (a) Suppose the answer is yes, k Sk. Since D = { n : n Sn}, it follows that k D; but D = Sk, a contradiction. (b) Suppose the answer is no, k Sk. then k D. But D is Sk, so k Sk, another contradiction. Since neither (a) nor (b) is possible, the assumption that D = Sk for some k must have been in error. Hence 2 is uncountable. b) The set of number theoretic functions is uncountable. Let us consider the set of all functions f: . There are usually known as the number theoretic functions. They take a single natural number as an argument, and produce a single natural number as a result. Some examples of number theoretical functions are: y=x y=x+3 y = x2+ 2x – 1 y = x/2 y = 13 You might think that since these functions are so restricted- only one variable, working with only natural numbers- that the set of all possible functions must be countable. Let us assume this to be true. There must then be a countable list containing all number theoretic functions: f1, f2, f3, …. We don’t know what that list looks like, but let us start with the above list of example functions. We build a chart showing the behavior of every function on this list. Never mind that the chart is infinite in two dimensions, we will show only a piece of it. The chart looks like this: 1 2 3 4 5 6 … x f1 1 2 3 4 5 6 … x+3 f2 4 5 6 7 8 9 … x2 + 2x – 1 f3 2 7 14 23 34 47 … x/2 f4 1 1 2 2 3 3 … 13 f5 13 13 13 13 13 13 … f6 1 4 9 2 3 9 … . . fk 2 6 15 3 14 10 … . A number theoretic function can be any mapping from natural numbers to natural numbers, even when there is no simple equation such as “ y = x + 3” to describe it. Functions f6 and fk in the above list, for example, do not appear to have simple descriptions; we know only that for every value x the function produces a corresponding value y. Now we define a new function fd(i) = fi(i) + 1. We obtain the values for fd by inspecting the functions on our list, so fd(1) = f1(1) + 1 = 2, fd(2) = f2(2) + 1 = 6, and so on. As an entry in our chart, function fd would look like this: 1 2 3 4 5 6 … fd 2 6 15 3 14 10 … By inspection, we can see that function fd is clearly different from most of the functions we have listed. We know that fd  f1 for instance, because fd(1) = 2 and f1(1) = 1. We notice that f3(1) = 2, however, but closer inspection reveals that fd and f3 don’t seem to have very many points in common beyond that. Is fd a number theoretic function, a mapping from  to ? It most certainly is. For every possible natural number x, fd(x) (obtained by adding 1 to fx(x)) is also a natural number. At this point we remember our assumption that the set of number theoretic functions is countable, and furthermore that we are working with a (supposedly) complete list of such functions f1, f2, …. Since fd is also a number theoretic function, we conclude that fd must be somewhere on the list. We start searching the list of functions in an attempt to find fd. Somewhere in our search we find function fk, a number theoretic function, which looks like this (I’ve included fd for comparison purposes): 1 2 3 4 5 6 … fk 2 6 15 3 14 10 … fd 2 6 15 3 14 10 … How nice! We have found a function which resembles fd. Since the list f1, f2, … is supposed to contain all number therortic functions, we are relieved that the list seems to contain our new function fd. Now we look a bit closer at the chart: 1 2 3 4 5 6 … k … fk 2 6 15 3 14 10 … fk(k) … fd 2 6 15 3 14 10 … fk(k)+1 … Oops! No matter that fd and fk looks similar at first, we see that there is at least one place where they are different. We conclude that fd is not exactly the same as fk. We look further down the “complete” list of number theoretic functions, hoping that fd must be somewhere on the list. We search in vain. Every time we find a function fi which seems to resemble fd, we notice there is at least one place where the two functions differ (namely, fi(i)  fd (i)). Our new function fd is perserse. It refuses to fit anywhere on the list. If we started with a different “complete” list of functions our new function fd might look different, but it would sill be perverse. There is no such thing as a complete list of number theoretic functions. We conclude that the set of all possible number theoretic functions is not countable. c) The set  of reals is uncountable. Let us assume that  is countable infinite. If so, then there is a complete list of such sets. Consider real number between 0 and 1. r1 0.250000… r2 0.14159… r3 0.090909… . … Now we define rd = 0.d1d2d3… such that dk of rd  dk of rk. It is guaranteed that rd will be different from every rk in the list. Therefore there are an uncountable number of such sets. 1.3 FINITE REPRESENTATION OF LANGUAGES A central issue in the theory of computation is the representation of languages by finite specifications. Let us be somewhat more precise about the notion of “finite representation of a language.” The first point to be made is that any such representation must itself be a string, a finite sequence of symbols over some alphabet . Second, we certainly want different languages to have different representations. But these two requirements already imply that the possibilities for finite representation are severely limited. If  is a finite alphabet, then * is countably infinite. Construct a bijection f: * , first fix some ordering of the alphabet, say  = {a1, …, an}, where a1, …, an are distinct. The members of * can then be enumerated in the following way. 1. For each k  0, all strings of length k are enumerated before all strings of length k + 1. 2. The nk strings of length exactly k are enumerated lexicographically, that is, ai1…aik preceedes aj1…ajk provided that, for some m, 0  m  k – 1, il = jl for l = 1,…,m, and im+1 < jm+1. For example, if  = {a1, a2}, the order would be as follows.  a1 a2 a 1a 1 a 1a 2 a 2a 1 a 2a 2 a 1a 1a 1 a 1a 1a 2 a 1a 2a 1 a 1a 2 a 2 a 2a 1a 1 … On the other hand, the set of all possible languages over a given alphabet  - that is, 2*- is uncoutably, since 2, and hence the power set of any countably infinite set, is uncountably. With only a countable number of representations and an uncountable number of things to represent, we are unable to represent all languages finitely. Thus the most we can hope for is to find finite representations for at least some of the more interesting languages. This is our first result in the theory of computation : No matter how powerful are the methods we use for representing languages, only countably many languages can be represented, so long as the representations themselves are finite. There being uncountably many languages in all, uncountably many of them will inevitably be missed under any finite representational scheme. Example: L = { w  { 0, 1 } * : w has two or three occurrences of 1, the first and second of which are not consecutive}. This language can be described using only singleton sets and the symbols , , and * as L = {0}* {1}{0}*{0}{1}{0}*(({1}{0}*)  *). The only symbols used in this representation are the braces { and }, the parentheses ( and ), , 0, 1, , , and . In fact, we may dispense with the braces and  and write simply L = 010*010*(10* *) . An expression 010*010*(10* *) is called a regular expression. The regular expressions over an alphabet  is defined as follows: i.  is a regular expression. ii.  is a regular expression. iii.  a  is a regular expression. iv. If  and  are regular expressions then so is . v. If  and  are regular expressions then so is  or  + . vi. If  is a regular expressions then so is *. Note! Parentheses, ( ), can be use to take a precedence. Example a) 00 is a regular expression representing {00}. b) (0 + 1)* denotes all strings of 0’s and 1’s. c) (0 + 1)* 00 (0 + 1)* denotes all strings of 0’s and 1’s with at least two consecutive 0’s. d) (1 + 10)* denotes all strings of o’s and 1’s beginning with 1 and not having two consecutive 0’s. e) (0 + ) (1 + 10)* denotes all strings of o’s and 1’s whatsoever that do not have two consecutive 0’s. f) (0+1)*011 denotes all strings of o’s and 1’s ending in 011. g) 0*1*2* denotes any number of 0’s followed by any number of 1’s followed by any number of 2’s. Every regular expression represents a language. Formally, the relation between regular expressions and the languages they represent is established by a function L, such that if  is any regular expression, then L() is the language represented by . That is, L is a function from strings ( over the alphabet   { ), (, , , * } ) to languages. The function L is defined as follows. i. L() =  ii. L() = {  } iii. L(a) = { a } for each a . iv. If  and  are regular expressions, then L() = L()L). v. If  and  are regular expressions, then L(  ) = L()  L). vi. If  is a regular expression, then L(*) = L()*. Example What is L(((a  b)*a))? L(((a  b)*a)) = L((a  b)*) L(a) = L((a  b)*){a} = (L(a)  L(b))*{a} = ({a}  {b})*{a} = {a, b}* {a} = { w  {a, b}* : w ends with a} Therefore regular expression ((a  b)*a) represents language { w  {a, b}* : w ends with a}. So regular expressions are one method for describing concisely certain infinite languages. We already know that they can not describe all languages. The language that can be described by a regular expression is called a regular language. Unfortunately, we can not describe by regular expressions some languages that have very simple descriptions such as {0n1n: n 1}.

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download chapter1