Minds May be Computers, But…
(… let's not forget that they had to evolve)

Peter Kugel, Boston College

Summary

Minds may be computers but, unlike the computers we have on our desks, they cannot be programmed by others. Therefore, they have to develop their own programs. That simple fact could help us understand natural intelligence and develop intelligent machines. It might also help us explain why computers can play chess without intelligence, human beings need intelligence to play chess, and animals cannot play chess at all.

"Nothing in biology makes sense except in the light of evolution" Dobzhansky (1973)

1. What I am trying to do

One reason why it is so hard for us to understand our minds is that, as Bierce (1911) put it, we have nothing but our minds to do it with. Fortunately, our minds have developed some useful strategies to help us overcome their limitations. One of those strategies is to break the job of trying to understand our minds into parts and allocate each part to a different group of people. We let one group (neurologists) focus on how the mind is implemented by the brain. We let another group (psychologists) look at the behavior that things with minds are capable of. And we let others look at the mind as an information processing system.

And that is what I want to do in this paper. I want to look at minds as information processing systems, focusing on what they do and largely ignoring how that doing is implemented by the brain. That is the kind of thing that computer scientists do when they study software without thinking much about the hardware they use to do it with. But I want to go a step further and look only at what Chomsky (1965) called the underlying "competence" involved in intelligence, rather than the actual "performance" involved in actually being intelligent. Real information processing systems have to worry about limits of space and time. Performance models take those limitations into account.
Competence models ignore them in an effort to understand the limits of the machinery that underlies the performance. In other words, I want to look at intelligence in much the same spirit in which Turing (1936) looked at computation, using what we now call the "Turing machine". The Turing machine is a competence model because it is allowed to use unlimited time and space and is assumed never to make a processing error. One of the advantages of thinking of computers as Turing machines, with their unlimited resources, is that, although such machines cannot actually be constructed, what we learn about them applies to any computers we might possibly build, no matter how big or fast they might become. Thus, when Turing (1936) proved that the halting problem was unsolvable by a Turing machine, he proved that it was unsolvable by any possible computing machine [1] we might ever build.

The reason I want to look at the mind so abstractly is that I want to try to figure out whether our models of the mind have enough "stuff" in them to fully characterize the information processing that minds do, or whether we need to add something to them before they can do that. This is a bit like what people did when they asked "Can computers think?" It is a bit like what Chomsky (1959) did when he asked whether Skinner's model of the mind was powerful enough to handle human languages. And it is a bit like what Greek mathematicians did when they asked if the rational numbers were enough to deal with Euclidean space. Chomsky decided that Skinner's model did not have the right kind of "stuff" and proposed a stronger model. Mathematicians looked at the diagonal of the unit square (the 1x1 square whose diagonal is √2 units long) and found that the rational numbers were not enough to characterize its length. And so they added the irrational numbers to their conceptual toolkit.
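Turing's unsolvability result can be sketched in a few lines of modern code. The fragment below is my illustration, not part of the paper; all the names are made up. It shows the diagonal argument: assume a total decider `halts(f, x)` existed, and build a function that does the opposite of whatever it predicts.

```python
# A sketch of Turing's diagonal argument (an illustration added here,
# not part of the paper; the function names are invented). Suppose
# someone claimed to have a total decider halts(f, x) returning True
# exactly when running f(x) would eventually halt.

def halts(f, x):
    """Hypothetical halting decider; no correct total version can exist."""
    raise NotImplementedError

def diagonal(f):
    # Do the opposite of whatever halts() predicts about f applied to itself.
    if halts(f, f):
        while True:      # halts() said "f(f) halts", so loop forever
            pass
    return "halted"      # halts() said "f(f) loops", so halt at once

# Now ask what halts(diagonal, diagonal) should return. If True,
# diagonal(diagonal) loops forever; if False, it halts immediately.
# Either answer is wrong, so halts() cannot be implemented by any
# computing machine -- no matter how big or fast.
```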
In this paper, I am going to look at our computer models of minds and suggest that they do not have all the "stuff" they need to deal with intelligence. And I am going to suggest a way we might improve them.

2. Computer models of minds

Cognitive psychologists use computer models to characterize the ways minds process information, much as physicists use numbers to characterize the way balls roll down inclined planes. Thus, for example, they have described the way minds process visual information as computer programs. That has given them a language in which to describe such processes precisely enough to allow them to make testable predictions.

[1] Notice that I am calling them "computing machines" and not "computers". That is because I am reserving the word "computers" for another use.

Meanwhile, engineers working in artificial intelligence (AI) have used the computer model to develop machines that can do apparently intelligent things. They assumed that computing machines had all the "stuff" that intelligence required and that, rather than build new kinds of machines to develop an intelligent one, all they had to do was write programs for the programmable computers they already had. Although those programs don't always use methods that resemble those that intelligent minds use to achieve their results (most of the people who develop such systems don't care whether they do or not), they accomplish many of the tasks that minds can achieve by using their intelligence. As a result, we now have machines that can "read" printed pages and identify faces in photographs.

So our use of computer models of minds has led to some successful results. But so did Skinner's model of language, and so did mathematics limited to the rational numbers. There are reasons to believe that our computer models of minds could be improved, and in this paper I want to discuss one direction in which these improvements might be made.
One of my reasons for suspecting that this might be the case is (somewhat paradoxically) one of AI's most notable successes – the programs it has developed for playing chess at a world championship level. It was once widely believed that, if we could program a computer to play excellent chess, we would learn a lot about intelligence by looking at its program. The thinking behind this belief was that, since human beings need intelligence to play chess, computers will too. So once we have succeeded in programming computers to play good chess, we will only have to look at the programs we have written and we will be able to learn what intelligence is by looking at how they use intelligence to pick their moves.

But when we (or rather the intelligent programmers among us) wrote programs that played excellent chess and we looked for the intelligence in them, we couldn't find it. Somehow it looked as though computers were able to do what intelligent people did without using intelligence to do it. That was disappointing, and some of us wondered why we learned so little from programming computers to do something intelligent.

I want to suggest that we can learn something about intelligence from our failure to learn much about it from computer chess by asking ourselves how computers could do something that requires intelligence without intelligence. Most people have suggested that they do it by brute force. I want to offer another explanation.

I want to suggest that intelligence is not just the ability to apply good algorithms well, which is what our chess-playing computers do. It is also the ability to acquire those algorithms in the first place. By programming computers to play chess, the people in AI were providing their chess-playing computers with the necessary algorithms and thus leaving an important part of intelligence out of the picture. They were asking the computer to do half of what intelligent chess playing requires and leaving out the other half.
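The "good algorithm, applied well" that chess programs embody is, at its core, von Neumann's min-max search. The sketch below is my illustration, not the paper's: the game is a toy take-1-or-2-stones game standing in for chess, and all names are invented.

```python
# A minimal sketch of min-max search, the move-picking algorithm most
# chess programs apply (my illustration; the toy game stands in for chess).
#
# Game: a pile of n stones; each player removes 1 or 2 stones;
# whoever takes the last stone wins.

def minimax(n, maximizing):
    """Return +1 if best play from a pile of n stones wins for the
    maximizing player, and -1 if it loses."""
    if n == 0:
        # The previous player took the last stone and won.
        return -1 if maximizing else 1
    values = [minimax(n - take, not maximizing)
              for take in (1, 2) if take <= n]
    return max(values) if maximizing else min(values)

def best_move(n):
    """The number of stones the maximizing player should take."""
    return max((take for take in (1, 2) if take <= n),
               key=lambda take: minimax(n - take, False))
```

With best play, piles whose size is a multiple of 3 are lost for the player to move: `minimax(3, True)` is -1, while `best_move(4)` is 1 (take one stone, leaving the opponent a losing pile of 3). Nothing in the search is recognizably "intelligent"; it simply grinds through the game tree, which is the point of the paragraph above.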
I believe that, if we want to use computer chess to study intelligence as a whole, we should not program computers to play chess. We should program them to program themselves to play chess and then to apply the programs they develop to beat the heck out of their opponents. That would, of course, make it harder to develop programs that play good chess, but it might make it easier for us to learn about intelligence.

In the remainder of this paper, I want to try to explain this account and to justify it. There are four things I want to do: I want to give my reasons for thinking that acquiring the algorithms one uses is a critical part of intelligence and that merely applying the algorithms others develop for you is not enough. I want to give my reasons for believing that, although minds can acquire programs, they cannot acquire them the way that computers can, because minds cannot be programmed the way computers can be. Then I want to ask how computers might go about developing their own programs and to suggest that, although computers will be able to do that, they will only be able to do it if we use them differently than we typically use them today. And, finally, I want to suggest some of the benefits we might gain by revising our computer models of the mind so that they develop their own programs.

4. Acquiring algorithms matters

Many people believe that intelligence is the ability to do difficult and useful things like play good chess, solve hard problems, make difficult diagnoses, write piano sonatas and the like. In computer terms, it is a matter of being able to apply good algorithms. I want to suggest that, although the ability to apply good algorithms plays an important role in intelligence, it is only a part of intelligence and that, by focusing on it exclusively, we have ignored a crucial part of intelligence – the ability to acquire the algorithms one applies.
The American Heritage Dictionary suggests this two-part analysis of intelligence when it defines it as "the ability to acquire and apply knowledge." The dictionary is not alone. Many people who have looked into intelligence have believed that acquiring algorithms plays an important role. Binet (1905) developed the intelligence test, not to find out how good (or bad) children were at applying the algorithms we use to read, 'rite, and do 'rithmetic, but to try to predict how good (or bad) they would be at acquiring those algorithms in school.

People working in artificial intelligence might have recognized the importance of acquisition when critics dismissed the intelligence of some of the apparently intelligent programs they developed once they learned how the algorithms used in those programs worked. The people who developed those programs took that as criticism of their work, but it might simply have been an expression of the fact that, once somebody finds the algorithm used to do something that seemed intelligent and hands it to you, it no longer seems intelligent, because finding the algorithm is such an important part of intelligence.

It is not hard to see why psychologists and computer scientists largely ignored the role of algorithm acquisition and focused on algorithm application. It is relatively easy to observe the result of applying algorithms. It is much harder to observe the results of acquiring them. What's more, it is the results of applying algorithms that we value. That is, after all, how we win games, diagnose illnesses, invent lightbulbs and produce good music. In contrast to algorithm application, which produces results in the outside world (it moves the bishop on the chessboard), algorithm acquisition produces results "inside the head." That makes the ability to apply algorithms seem a lot more inviting than the ability to acquire them.
And the study and development of powerful algorithms has given us some impressive results. I'm not against looking at application. I just think we ought to look at acquisition too.

Psychologists tend to define "intelligence" in terms of algorithm application. For example, Gardner (1983) has defined intelligence as "the ability to solve problems, or to create products, that are valued within one or more cultural settings" and Pinker (1997) has defined it as "the ability to attain goals in the face of obstacles by means of decisions based on rational (truth-obeying) rules." And although he may have meant it as a joke, Boring's (1923) definition of intelligence as "what is measured by intelligence tests" is in the same spirit. Most computer scientists who study intelligence in machines seem to feel the same way. Thus, the founders of artificial intelligence proposed that a machine should be considered intelligent if it "behaves in ways that would be called intelligent if a human were so behaving." Minsky (1985) has defined intelligence as "the ability to solve hard problems" and McCarthy (2004) has defined it as "the computational part of the ability to achieve goals in the world."

But there are people who feel that algorithm acquisition is important too. Many tend to credit the person who comes up with an algorithm with greater intelligence than the people who merely use it. Thus, for example, most people would say that Newton's ability to develop many of the basic algorithms of modern physics indicates a higher level of intelligence than the ability of today's physics students, who know more algorithms and apply them better than Newton ever did. Idiot savants, who have strong capabilities in limited areas and are not able to develop others, are not usually considered intelligent even though they command powerful algorithms.
And we tend not to credit spiders with great intelligence, in spite of the fact that the algorithms they use to spin webs are quite impressive, because they did not acquire the web-spinning algorithms they apply. They were born with them.

Consider a simple thought experiment – the story of Joan and Ellen [2] – that seems, to me, to support this claim that algorithm acquisition is a (and perhaps the) crucial element of intelligence. Imagine that Joan has developed an algorithm for playing world-championship chess. And suppose that she cannot memorize it or apply it fast enough to play in a tournament. So she asks her friend, Ellen, who has an excellent memory and who is very good at applying algorithms she has memorized, to use Joan's algorithm in a tournament. If Ellen memorizes the algorithm and uses it effectively to win the world chess championship, whose contribution to the resulting intelligent behavior would you consider the more important? Clearly neither Joan nor Ellen could have won the championship by herself, so it makes some sense to say that each deserves part of the credit. (After all, hardly anybody has said that intelligence had to be just one thing.) But it seems, to me, that Joan contributed more to their combined intelligence than Ellen did.

5. The computer as a model

Some have said that computers cannot be intelligent because they lack important capabilities such as the ability to have feelings or conscious awareness. I want to suggest that the computer may be a bad, or at least misleading, model of the human mind, not because of what it cannot do, but because of what it can do. It can run programs written for it by others, and minds cannot. This obscures the role of algorithm acquisition because it allows humans to do the acquiring for them. Minds cannot do that.

To see the role of algorithm acquisition in intelligence, consider chess. In order to play good chess, human beings have to acquire three kinds of algorithms.
They have to acquire the algorithms they use to make legal moves, the algorithms others have developed for picking good moves and, if they want to be the best in the world, to develop new algorithms for making good moves.

[2] Joan is named after John von Neumann (1928), who developed the min-max algorithm used by most computer chess programs, and Ellen is named after Alan Turing (1936), who developed the computer on which those programs are run.

Although it looks as though humans can be given the algorithms they need to do the first two of these things in much the same way that a computer can, and that they only need to go to the trouble of developing their own algorithms in the third case [3], I want to suggest that people cannot be given any of these kinds of algorithms – not even the algorithms they use to do such simple things as make legal moves with the bishop in chess – in the way that a computer can be programmed to do them. I believe that there is a sharp difference between how computers can be told these things and how humans can. If I am right about that, then computers are bad models for intelligence because they can "cheat" on the intelligence tests we give them to see how intelligent they are. They can win games without intelligence because, unlike humans, they can use the algorithms developed by others – in this case, their human programmers – to win.

I'm going to have to use some mathematical terminology to make my case, so let me give you some definitions. An algorithm is a computable procedure for doing something – say, for doubling a positive integer. An algorithm allows the system that can apply it to evaluate a function by taking an input or argument and generating an output or value. We can think of a function that an algorithm allows a computer to evaluate as what mathematicians call a graph – an infinite set of ordered pairs of the form <argument, value>.
Some members of this set for the doubling function are <1,2>, <2,4>, and <1234,2468>. A program is a finite object that defines this infinite set and allows a computer to evaluate the function. Such a program might look like this in an imaginary programming language:

INPUT number;
result := number * 2;
OUTPUT result;

Notice that the graph of the function is infinite and the program is finite. The advantage of the program over the graph for a human being is that the program fits inside the head and can be carried around with you. A graph does not and cannot. Now, in an ordinary computer, this program only works for a finite set of cases, because a real computer runs into trouble when the numbers to be doubled get very large. But for theoretical purposes, we can think of computers as Turing machines, with their unlimited space (and time), and such machines can double any integer in principle.

[3] There are some who believe that even this does not require algorithm creation, because they believe that playing well is only a matter of applying good algorithms more quickly or accurately than your opponents.

A computer can be given a finite program that defines a function and use it to compute any (or all) of the values of that function. Therefore we will say that computers can be finitely programmed, because they can be enabled to compute a particular computable function by being given a finite object. Human beings, I will suggest, cannot be finitely programmed (to compute infinite functions). Although I believe that they can be programmed, I do not believe that they can be programmed by being given a finite object. The way they get their algorithms, I will argue, is by being infinitely programmed. I will be more precise about what this means in a moment but, for now, just think of a machine that is infinitely programmed as a machine that is given an algorithm by an infinite object of some sort (from which it derives a finite program that it can apply by evaluating it).
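The contrast between the two kinds of programming can be sketched in code. Below, `double` is a finite program in the paper's sense, while `learn_multiplier` stands in for an "infinitely programmed" learner: it reads <argument, value> pairs from a (potentially endless) stream and derives a finite program from them. The learner and its one-parameter hypothesis space are my toy illustration, not the paper's formalism.

```python
# The paper's finite program for the doubling function, in Python:
def double(number):
    return number * 2

# A toy stand-in for being "infinitely programmed": the machine is
# handed, not a program, but a stream of <argument, value> pairs, and
# must derive a finite program itself. Here the hypothesis space is
# just the multipliers k in f(x) = k * x (my simplification).
def learn_multiplier(examples, guess=1):
    for x, y in examples:          # in principle, an endless stream
        if x != 0 and guess * x != y:
            guess = y // x         # current guess refuted: revise it
    return guess                   # the finite program it settled on

k = learn_multiplier([(1, 2), (2, 4), (1234, 2468)])
learned = lambda x: k * x          # computes the same function as double()
```

Applying `double` is pure computation; arriving at `learned` required watching the graph go by and revising guesses, which is the extra step the paper says minds must perform for themselves.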
To apply a finite program that you have been given, all you need is to be able to compute. But before you can apply what we might think of as an infinite program, you first have to turn it into a finite program, and this requires more than a computation. Therefore, it cannot be done by a computer if we limit that computer to computing only. That is what we do by custom with our computers today, but computers can do more than compute.

Let us call a computer that is only used to compute a computing machine. But a computer, as a piece of machinery, can do more than compute, so, when I want to talk about this machinery without limiting it to computing, I will refer to it as a computer. In my terminology, a computer is not necessarily a computing machine, although it can be used as one.

With this terminology in hand, I can say what the main point of the rest of this paper is going to be. Computers are misleading models of intelligent minds, not because they are too weak, but because they are too strong. Computers can do things that minds cannot and, because of that, they can get away with partial stupidity in cases where minds cannot. And this obscures some crucial elements of intelligence.

There are some details in this story that I have glided over. I have said that minds cannot be finitely programmed, but I have not given any reasons for thinking so. I have also said that deriving programs requires more than computing, without giving any arguments to support my claim. It is now time to get to some of these details and, as the saying goes, the devil is in the details.

6. Minds can be programmed but not finitely programmed

If you look around, the claim that people cannot be finitely programmed may seem ridiculous. When we ask for directions to the drug store, we are given finite instructions that certainly seem like programs, and we can apply them to go to the drugstore and buy
We can be given finite instructions that tell us what to do to improve our golf game and we can apply them to do better on the golf course. But I have several reasons for believing that minds cannot be finitely programmed. One is that we don’t act as though we believe that they could be. If we believed that children could be (finitely) programmed, we would spend a lot less time and money on education because we would simply program students to read, ‘rite and do ‘rithmetic. That shouldn’t take more than a week, once somebody writes the necessary programs and it would be a lot cheaper than sending them to school for years. What’s more, some of the things we do when we instruct people wouldn’t make any sense if instructing people were like programming computers. We often use examples to convey instructions to people, but we seldom use examples to convey programs to computers – and then we do it only rarely and for a rather narrow group of programs. We use diagrams to help people follow instructions and we ask them to do homework. We don’t do either of these things when we program a computer. And, if I believed you could be programmed, I wouldn’t have written this paper to try to convince you to change your model of the mind. I would simply have programmed you to think differently. When people make mistakes that show that they have not got an algorithm quite right, it often helps to give them an example to help them correct their misunderstanding. But we do nothing of the sort when a computer does something wrong. Some of the things that children learn do not seem to be learned by anything that resembles being programmed. For example, when children learn to walk, they seem to learn, not by being told what to do but by figuring it out. And of course, children do not learn the elements of their native language by being told. They learn it from examples. 
People learn to ride a bicycle by riding a bicycle, which is lucky because, if we had to program them, we wouldn't have the faintest idea of what to say. (How do you keep a bicycle from falling over?) Try to imagine what it would be like to try to teach a child who had never ridden a bicycle before how to ride a bicycle over the telephone. You couldn't do it, but you could convey a program to a computer over the phone.

What is more, there are many algorithms that people cannot learn from others. Good examples of such algorithms are found in creativity. Gauss did not use algorithms he was given by others to prove new theorems, and Mozart did not use algorithms others gave him to write his piano sonatas. The algorithms Gauss and Mozart used were developed from algorithms that others helped them acquire, but Mozart and Gauss were not told how to build on those algorithms.

I have looked around for examples of people being given algorithms by finite programs and have been unable to find any. Now I admit that a lot of people act as though a finite set of instructions could totally define an algorithm for people, but I believe that closer inspection would show that they are wrong. Thus, for example, the people who write instructions for assembling toys, or pieces of furniture that are supposed to be "easy to assemble", believe that their instructions are unambiguous and fully define the steps that have to be taken. But, if you have ever tried to follow those instructions, you probably did at least one thing wrong; I always do. And, if you are like me, you probably blame yourself when this happens. But I believe that you are at least partially wrong. The language in which such instructions are written is learned from examples, and the examples do not fully define the language. Unlike computer programming languages, such languages are necessarily ambiguous.

There are many things that look like programs in this world.
We can tell a child the rules for playing tic tac toe – and what we say looks very much like a computer program. The child seems to be able to follow the rules we give it and play the game correctly. Cooks can follow the recipes in cookbooks. Pianists can follow the instructions given by a musical score. All those things look like programs, and people can follow them. So, at first glance, it rather looks as though people can be finitely programmed.

But, at first glance, it also looks as though the earth is flat. The obvious is not always the best place to start a science, so bear with me. If you stop and think about the examples I have given, you might notice that there is room for doubt. People sometimes get lost on their way to your house and have to call you for help. They miss turns or make other mistakes. Parents often have trouble assembling a child's toy even when the instructions have been carefully tested by the manufacturer. And the soufflé sometimes comes out quite badly even though the recipe is quite correct.

These observations are suggestive, but they are not compelling. Our friend's inability to get to our house may be due to her failure to pay attention. The inability of the child to do what the teacher tells it to do may simply be due to the child's sleepiness or lack of interest.

What's more, the fact that our attempts to program people can fail is not surprising. After all, as most programmers know, attempts to program computers almost always fail the first time around. It often takes a few tries to get a program to do what you want it to do. But there are two significant differences between what happens when we program a computer and when we try to program a person. One is that, although what a program (as a whole) does is not always what we thought it was going to do, that is not the case for the individual instructions of which a program is constructed. Those instructions must do what we expect them to do.
When, for example, a programmer writes an instruction such as "x := 7;" he or she has to be able to assume that that instruction will get the program to do what the programmer expects it to do – that it will put (some representation of) the number 7 into the location referred to as x in the program. Without that assumption, programming (as we know it) would be impossible. (Imagine how difficult programming would be if "x := 7;" occasionally got the computer to fry an egg.)

The second is that, once we write (and debug) a program, we can use it on other computers to get them to do the same thing. But when we tell a person that the bishop can only move along the diagonal in a chess game, that does not necessarily mean that everyone will interpret what we say in exactly the same way. As every teacher knows, the instructions that convey an idea to one student do not usually convey it to all.

Although these observations are suggestive, they do not fully convince me that people cannot be finitely programmed. What does convince me is that I know that the mind had to evolve, and that a finitely programmable mind could not have done that.

7. Why finitely programmable minds could not have evolved

A finitely programmable mind could only have evolved if finite programmability conferred some survival advantage on the organism that had one. And, although I believe that programmability does confer such an advantage on its possessor, finite programmability does not.

Before I go into what is wrong with finite programmability, let me say a few words about programmability in general. Consider a machine that is not programmed – say, an adding machine. Since it can only do one thing – add – its instructions can be built in. When an adding machine computes the sum of 5 and 7, it has what we can think of as one input (the pair <5,7>) and one output (the number 12). When a multi-purpose calculator adds those same two numbers, it has two inputs rather than one.
It needs to be told to add (press the +) and what to add, perhaps the input pair <5,7>. If you press the – button instead of the + button, you get 2 as a result rather than 12.

The ability to execute more than one algorithm has some clear-cut advantages, and it is not hard to see how such an ability might have evolved. Most higher organisms can execute more than one algorithm, and they can choose which one to use. Thus, for example, they can usually choose among fighting, fleeing, feeding or fornicating, and that ability to choose does a lot for their chances of surviving and passing on their genes.

What a computer does is different. Whereas the adding machine can apply only one algorithm and the simple calculator can apply somewhere between four and twenty, the computer can apply a potential infinity of algorithms. As a matter of fact, as Turing (1936) and Church (1936) suggested, a computing machine can apply all possible algorithms. I believe that minds can also apply infinitely many algorithms, which allows them to apply algorithms that were not developed at the time they were built. That is not true of the adding machine or the multi-purpose calculator. This ability is an important feature of programmability in both computing machines and human minds.

The advantage that programmability confers on the human mind is that it allows it to adapt its cognitive capabilities to a potential infinity of environments. Whereas a polar bear has command of a finite number of algorithms that allow it to survive in the Arctic, and an elephant has command of a finite number of somewhat different algorithms that allow it to survive in Africa, the human has the ability to develop algorithms that allow it to survive in either, should it find itself there. And, because the human is programmable, it could figure out what to do in an environment that was totally new.
An African elephant would probably freeze to death in the Arctic, but an African human would invent the igloo and the parka, live in the former and put on the latter, and survive.

So programmability is probably a good thing to have. But finite programmability is basically useless, for the simple reason that there are no finite programs in the world for the human being to use. (This is not the case for the computer. People are producing finite programs for it all the time.) If a finitely-programmed mind were to look for a finite program that it could use to predict the behavior of polar bears, where would it look? Such programs are not written on the foreheads of polar bears or given to people by a beneficent deity. True, today you might be able to google "polar bear behavior" and get some sort of answer, but Google is relatively recent, and there's another problem with depending on the Internet for the algorithms you will use to deal with polar bears. You'll have a lot of trouble trying to figure out which of the algorithms you find you can rely on.

One way a system might develop a suitable algorithm to deal with a particular environment is to evolve one. Let individual animals with different algorithms compete for the limited resources of Africa and let the ones with the best algorithms survive. After many generations, an elephant-surviving species might emerge. There are two problems with this strategy. One is that it takes a long time to develop a good algorithm by evolving one. The other is that you end up with several species – one for the Arctic that has algorithms suitable for dealing with polar bears and another in Africa that is capable of dealing with elephants.

Let's call this way of developing algorithms species programming. Species programming evolves good programs for specific environments, and I suspect that that is how the algorithms used by most animals were acquired. But this way of developing algorithms is rather slow.
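Species programming can be caricatured in a few lines of code. In the sketch below (my illustration; the fitness function, the reduction of an "algorithm" to a single numeric trait, and all parameters are invented), a population competes, the fitter half survives, and its mutated offspring form the next generation.

```python
import random

# A toy caricature of "species programming": a population of
# individuals, each reduced to one numeric trait, evolves toward
# whatever a fixed environment rewards. (All details invented here.)

def fitness(trait, optimum=0.7):
    # The environment rewards traits near some optimum value.
    return -abs(trait - optimum)

def evolve(generations=200, pop_size=50, seed=0):
    rng = random.Random(seed)
    population = [rng.random() for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: only the fitter half survives...
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        # ...and reproduces, with small random mutations.
        offspring = [t + rng.gauss(0, 0.02) for t in survivors]
        population = survivors + offspring
    return max(population, key=fitness)

best = evolve()   # after many generations, `best` sits near the optimum
```

The sketch also exhibits the paper's two complaints: the answer took many generations to arrive, and it fits only the one environment encoded in `fitness`. Change the optimum and the whole process must be run again.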
It took a long time to develop the algorithms spiders use to spin their webs. But I believe that there is another way, one that humans use, and it is a lot faster. The human species seems to have developed a mind that could produce its own algorithms from the environment. That procedure is exemplified by the way a child learns the meaning of the word "dog". The child is not programmed to recognize dogs. (Try to imagine what you would tell a child if you had to tell it how to go about identifying dogs.) It is given a few examples and it produces a program from those examples. Let us call this individual programming, because it develops programs for individual people. This kind of programming is what is involved when people develop finite programs from the potentially infinitely many behaviors of either polar bears or elephants – or even dragons or unicorns, should they suddenly appear.

This way of building up your arsenal of algorithms allows humans to develop good algorithms for different situations. It allows humans to adapt – within a single lifetime – to a broad variety of environments. And it even allows them to learn to play chess – something that wasn't around when their minds were evolving. That ability, I suggest, is what distinguishes humans from other animals and makes it possible for a single species to live all over the world. (As we are finding out, that is not an unalloyed blessing.) Humans also developed culture, which allows members of the human species to help each other and to accumulate algorithms they can share with each other. But because they did not develop finite programmability, they cannot share algorithms the way that owners of iPhones can share apps.

Why not? Why can't people program each other? One reason is that the finite programmability of computers depends on knowing what their internal "order code" is. Although it is true that the person who writes a program for a computer does not need to know that computer's order code, somebody does.
For a computer to run a program written in a high-level programming language, somebody has to program the computer to translate a program given to it in the high-level language into the machine code of the computer on which it will run. And the people who write that program (the compiler) need to know the target machine's internal code. Because computers are manufactured to specifications, we can know what their order code is. But humans are not manufactured, and it seems quite likely that different brains have different order codes. And besides, even if every person had the same internal order code, we would have no way to know it.

Of course, it would be useful for us to be able to finitely program each other. Think of how much time we could save if we could program children to do arithmetic rather than teaching them how to do it. But, although it sounds good, there are reasons to think that it might not be as useful as it seems. How would we decide which programmer to trust? How would we deal with the inevitable errors in such programs? And wouldn't it be hard for teachers to come up with the right program? (Please try to come up with a program that would allow a child to recognize a dog. I'll accept a short informal explanation of an algorithm if you don't know how to program.)

These thoughts suggest, to me, that minds cannot be finitely programmed. But the fact that minds cannot be finitely programmed does not mean that minds cannot be programmed somehow and, indeed, there are good reasons to believe that the minds of humans and of animals are programmable somehow. Let's look more closely at how this can be done. As I said above, spiders use an algorithm to spin their webs, and those algorithms probably evolved by some sort of Darwinian mechanism. I have called such programming species programming because it programs a whole species. Species programming takes a long time and requires speciation.
It allows spiders to develop algorithms for spinning webs that are good in a variety of environments. But the algorithms they can execute are largely determined at birth (although their parameters can often be tweaked after birth), and they are therefore limited to the possibilities dealt with at their creation.

Human beings have, I believe, a different method. I called it individual programming. When human beings find themselves in the Arctic, where they have to deal with polar bears, they can observe polar bears and, from their observations, develop programs that they can run to predict the behavior of polar bears. If, on the other hand, they find themselves where there are elephants, they can develop algorithms for predicting the behavior of elephants. This kind of programming or algorithm acquisition – we might call it "learning", although it is only one kind of learning – is much quicker than species programming and allows one species to occupy many different niches in the environment. And this is the kind of programming that I believe humans are capable of and that is the basis for what we call their "intelligence". The fact that humans can do it, animals cannot, and computers do not do it today seems, to me, to explain why computers can play chess without using intelligence (they can be given the algorithm by a finite program), why humans need intelligence to play chess (they have to figure out the algorithms themselves), and why animals cannot play chess at all (it would take them too long to develop the required algorithms).

8. How could a mind program itself?

There are, I believe, several ways that a mind might program itself, and the main reason why I believe that the computing machine is not a good model of the mind is that it is not capable of using any of them (a fact that I will try to demonstrate in a few moments). But that does not imply that the computer (which, as you may recall, is different from a computing machine) is not a good model.
There are several different ways that the mind might develop a program that it could run, but perhaps the easiest one to understand is what I will call inductive programming. This is a process that develops a finite program from a potentially infinite series of observations. (An example is the way the child learns the meaning of "dog" from examples.) Gold (1965) has suggested a model for this process that he calls black box identification. Imagine a device you cannot open or look inside of (a black box) that takes in integers and outputs integers in response. You are said to have identified the black box if you come up with a finite program that gets a computer to produce exactly the same results. With such a program in hand, you can predict the behavior (inputs and outputs) of the box. Notice that you cannot be expected to find exactly the same program the box uses, because you cannot see inside the box and, given any function that the box might be evaluating, there are always infinitely many programs that compute the same function. Without being able to look inside the box, and understand what you see, you cannot determine which of the infinitely many programs that produce the same results the black box is using.

Rather than talking about how we might generate finite programs to predict the behavior of elephants or polar bears, let us look at a simple example that deals only with integers (which are so much less complicated than elephants and polar bears). Consider a black box that computes the values of the function DOUBLE, which doubles the value of any positive integer you give it. Input 1 and it outputs 2, input 2 and it outputs 4, and so forth. And imagine that the program you are trying to produce will be used to predict future outputs of this black box and to "retrodict" its past behavior. If you could develop such a program, you could carry it around in your head and use it to predict the future behavior of the black box.
(I'm assuming that the more complicated procedures we might develop for predicting the behavior of elephants and polar bears would be acquired similarly.)

Let us use Turing's (1936) idealized computer, the Turing machine, as our model of the mind, and what Turing (1939) called an oracle to represent the input to this process. An oracle is a one-way infinite tape whose symbols represent the output of this black box. The first square of the tape represents the box's output for the input 1, the second its output for the input 2, and, in general, the nth square represents the output of the black box for the input n. This tape is the oracle for the function DOUBLE (Turing (1939), it should be said, does not deal with computable oracles) and it would look like this:

2 | 4 | 6 | 8 | 10 | 12 | 14 | …

Let us imagine that a machine with the machinery of a Turing machine reads the symbols on this tape from left to right and that it tries to compute, from this oracle (or "evidence"), a program that it can use to predict the symbols of the oracle that it has not yet seen – the future behavior of the black box. Thus, for example, after reading the first symbol (2) it might guess that this was the oracle representing the behavior of a black box that always outputs 2. The second input (4) would cause it to change its mind and perhaps guess that this was the oracle from a box that doubled its previous output. So it might go on to predict 8. It would be wrong again, so it would change its mind again when it saw 6 instead of the predicted 8. And imagine that, after seeing the 8, it guessed that it was dealing with a box that doubled its inputs. Now it would be pleased to see that the next symbols were 10, 12, and 14. That would please it enough to allow it to exclaim a machine's equivalent of "eureka" and stop, having computed the right program.
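The guessing machine just described can be sketched in a few lines. The sketch is mine, and the three-hypothesis pool is an artificial simplification (a real inductive machine would enumerate programs, not pick from three hand-written guesses), but it reproduces the sequence of mind-changes in the story: "always 2", then "double the previous output", then "double the input".

```python
# A toy version of the guessing machine: it reads the oracle for DOUBLE
# one square at a time and, after each square, emits its current guess.
# The hypothesis pool is a hand-picked illustration, not Gold's model.

HYPOTHESES = [
    ("always 2",        lambda n: 2),
    ("double previous", None),          # handled specially below
    ("double input",    lambda n: 2 * n),
]

def consistent(hypothesis, evidence):
    """Does this hypothesis fit every oracle square seen so far?"""
    name, f = hypothesis
    if name == "double previous":
        # Predicts that each output is twice the one before it.
        return all(b == 2 * a for a, b in zip(evidence, evidence[1:]))
    return all(f(n) == out for n, out in enumerate(evidence, start=1))

def guesses(oracle):
    """Yield the machine's current guess after each square it reads."""
    evidence = []
    for symbol in oracle:
        evidence.append(symbol)
        # Take the first hypothesis still consistent with the evidence.
        current = next(h for h in HYPOTHESES if consistent(h, evidence))
        yield current[0]

oracle = [2, 4, 6, 8, 10, 12, 14]   # the DOUBLE oracle, square by square
print(list(guesses(oracle)))        # settles on "double input" and stays there
```

Run on the oracle above, the guesses change twice and then stabilize – the machine has, in effect, identified the black box.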
Such a machine would qualify as a computing machine – although we don't usually think of computing machines as having infinite inputs – because it would output a finite program (or theory) in finite time and then stop. The problem with this is that it could easily come up with the wrong theory. And this possibility would remain no matter how many inputs it had read before making its final decision. (It has to make its final decision in finite time to qualify as a computing machine.) So suppose the sequence on the oracle started with 2, 4, 6, 8, 10, 12, 14 and then continued with 14, 14, 14, 14, … or 14, 12, 10, … or what have you. A computing machine has to stay committed. It has to announce its theory at some time and stick with it. A system that used such a strategy would strike us as quite stupid, or pigheaded, as it stuck to its theory in the face of considerable evidence that that theory was wrong. We would say that its theory of the black box was less and less "about" the box. And we would say that it lacked what Brentano called "intentionality", which Brentano and many others after him considered to be the essence of intelligence.

There is another way the machine might operate that would require no new machinery but would get around this difficulty. It could use what we might call (following Putnam, 1965) a trial-and-error procedure. A trial-and-error procedure is like a computation except that whereas, in a computation, we count the first value a system produces as its result, in a trial-and-error procedure we count the last value it produces as its result. (If it produces no last result, we say that its result is undefined.) Notice that both computations and trial-and-error procedures produce their results (when they produce a result) in finite time, using only finite space. To see how a trial-and-error procedure works, consider a machine that tries to solve the pi recognition problem. This is the problem of checking out an infinite tape that is purported to contain the full (infinite) decimal expansion of pi.
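A minimal sketch of such a pi-checking trial-and-error procedure follows. It is my own illustration, not anything from the text: the generator of true pi digits uses Gibbons' streaming "spigot" algorithm, and the checker emits a verdict after each square of the purported tape. In the trial-and-error reading, only the last verdict counts: "OK" is forever provisional, while "BAD" is final.

```python
# A sketch of the pi-recognition machine. Only the LAST verdict counts:
# "OK" can be revised by later evidence, but "BAD" is final.

def pi_digits():
    """Yield the decimal digits of pi (3, 1, 4, 1, 5, 9, ...) forever,
    using Gibbons' unbounded spigot algorithm."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            yield n
            q, r, t, k, n, l = (10 * q, 10 * (r - n * t), t, k,
                                (10 * (3 * q + r)) // t - 10 * n, l)
        else:
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)

def verdicts(tape):
    """Compare a purported expansion of pi, digit by digit, to the real one."""
    for claimed, actual in zip(tape, pi_digits()):
        if claimed != actual:
            yield "BAD"   # final: we may halt, certain the tape is wrong
            return
        yield "OK"        # provisional: only the last output counts

bad_tape = [3, 1, 4, 1, 6]          # wrong in the fifth place
print(list(verdicts(bad_tape)))     # ['OK', 'OK', 'OK', 'OK', 'BAD']
```

On a correct (infinite) tape this procedure would keep saying "OK" forever without our ever being entitled to switch it off, which is exactly the asymmetry the next paragraph describes.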
Such a machine would compute successive values of the decimal expansion of pi and check the symbols of its input tape against those values. It would say that the tape was OK until it read an incorrect value. Now notice that the last output this machine produces gives the correct evaluation of the tape, and gives it in finite time. If the tape does contain the infinitely many values of the decimal expansion of pi, it would indicate that it was correct after reading the first input. And if the tape contained an error, it would also come up with the right evaluation of its correctness (it is not correct) in finite time. But there is a difference between these two cases. If the machine says the tape is bad, that is its final answer and we can turn it off, convinced that it is right. If, on the other hand, it says the tape is OK, we can never be sure that that is its final answer. The OK result is a lot like a scientific theory. Unlike the result of a computation, we cannot be sure it is right. (Remember that we are in an idealized world in which machines never make mistakes in calculating.)

I want to suggest that such machines – infinitely programmable computers, or computers that derive their programs from infinite inputs – are better models of the mind than computing machines. Although they do not require any machinery that is not in a computer, they use that machinery differently. (In 1947, Turing suggested that computers might have to be used in some way that was different from the way they were originally designed to be used if they were to become intelligent.) Such machines – which use trial-and-error procedures to generate programs from examples – could have evolved, because the examples from which they derive programs can be found in the world. What is more, they can develop programs without anybody else needing to know their internal "order code", and that is lucky because it is hard to see how such knowledge could be gained about a biological system.

9.
Infinitely programmable machines

In case you didn't notice it, the infinitely programmable machines that I have discussed so far evaluate what mathematicians call "functionals" – functions whose arguments and/or values are themselves functions. We can think of them as having two parts. The first part takes an oracle (or, equivalently, what mathematicians call the graph of a function) and produces a finite program that computes the same function. The second part takes a finite program and an input and produces the value that the program computes for that input. The first part acquires a program for computing an algorithm and the second part applies the acquired algorithm. Together, they can be fully intelligent, which neither part can be alone.

So far, most of the work in cognitive science – both in cognitive psychology and in artificial intelligence – has focused on the second part, the part that applies algorithms. I would like to suggest that the time may have come for these disciplines, and others interested in intelligence, to look at both parts. There is plenty for us to look at, from plenty of different disciplines. (Fuller explanations of such inductive procedures can be found in my other publications, listed at the end of this paper.) Although I have tried to give good arguments in favor of this approach, the compelling argument for it may not be based on the problems that it may help us solve but on the problems it raises that we can study. Let me suggest some problems it suggests for people working in (a) mathematics, (b) computer science, (c) psychology and neurology, (d) biology, (e) education and (f) philosophy.

a. Mathematics

We already know quite a bit about the theory of trial-and-error processes.
Gold (1965) has called them limiting computations to highlight their resemblance, in many respects, to the process of passing to the limit in calculus. Interestingly enough, we know that inductive processes of the kind I discussed above are not universal, in that they cannot acquire all possible algorithms. (Turing machines are universal because they can compute all computable functions, given the right program. Induction machines are not, because their acquisition components cannot acquire programs for all computable functions.) Thus, if the mind uses a trial-and-error machine to acquire the algorithms it can apply, there will be things that are learnable in principle that it cannot learn, even in principle (Kugel, 2005). It would be interesting to characterize such things.

We also know that the set of all algorithms that a single machine of this sort can acquire is recursively enumerable. In other words, there is a computer program that will grind out a list of all the programs a specific trial-and-error machine can acquire by induction. We also know that any set of totally computable functions that can be recursively enumerated can be identified by an inductive trial-and-error machine. What we do not know is how to make such a machine efficient in most interesting cases. Finding such ways seems, to me, to be another good problem for mathematicians to investigate.

Induction, which tries to find an algorithm that matches observations, is not the only way that a trial-and-error machine can acquire algorithms. Strategizing – setting goals and then looking for better and better algorithms to meet those goals – is another way, and it shares many features with induction. Using such strategizing procedures might be the way chess players find better strategies for playing chess than any they have observed, or the way Mozart developed a style of composition different from any of his predecessors'.
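The identification-by-enumeration idea mentioned above – grind out a recursively enumerated list of candidate programs and move down it whenever the evidence refutes the current one – can be illustrated in a few lines. The hypothesis space below (functions n → a·n + b with small coefficients) is an arbitrary toy of my own, chosen only so that the listing is small enough to run; Gold's model enumerates programs in general.

```python
# Induction over an enumerated list of programs: keep the first program
# consistent with all the evidence so far, and move down the list when
# the evidence refutes it. The listing here is a deliberately tiny toy.

def enumerate_programs():
    """A computable listing of (description, program) pairs."""
    for a in range(5):
        for b in range(5):
            yield (f"{a}*n + {b}", lambda n, a=a, b=b: a * n + b)

def learn_in_the_limit(evidence):
    """Yield a (possibly changing) guess after each new observation.
    A successful learner's guesses eventually stop changing."""
    programs = list(enumerate_programs())
    seen = []
    index = 0
    for n, out in evidence:
        seen.append((n, out))
        # Move down the list past every program the data refutes.
        # (Raises IndexError if no listed program fits the data.)
        while any(programs[index][1](x) != y for x, y in seen):
            index += 1
        yield programs[index][0]

# Observations of a black box that happens to compute 2*n + 1:
evidence = [(1, 3), (2, 5), (3, 7)]
print(list(learn_in_the_limit(evidence)))   # ['0*n + 3', '2*n + 1', '2*n + 1']
```

As the text says, this can be very inefficient: the learner plods linearly through the list, re-testing against all the data each time, which is exactly the kind of inefficiency a better mathematical theory might remove.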
Strategizing trial-and-error procedures differ from inductive procedures in that there is not necessarily a best procedure and, if there is, there may be no way to identify it. Another trial-and-error way of generating algorithms is by adapting a program you already have. You start with a program that works badly and modify it to make it work better. This can, of course, be done by simply varying parameters. A thermostat that keeps a house too cold for comfort can be set to a higher temperature. But that is a rather limited way of producing "new" algorithms. The kind of thing I have in mind is what might be happening when we use models. Consider, for example, what we do when we use the computer as a model of the mind. We understand how a computer does what it does better than we understand how the mind does what it does. But the computer mostly does different things than the mind does. So we try to see how we might change the kinds of programs we know in computers into ones that do what minds do. Doing this has already allowed us to develop good theories of the mind.

Induction can be done by comparing the behavior of the system being studied to the available algorithms, which, in the case of an inductive trial-and-error machine, can be listed computationally (or recursively enumerated). When the evidence shows that the current algorithm is wrong, we move down the list of possible algorithms looking for one that matches all the data so far. This can be a very inefficient process, and one thing mathematicians might try to develop is more efficient ways of doing it. These are some of the kinds of procedures the mind might use to develop programs without being programmed by others. It might be worth our while to try to develop a robust mathematical theory of algorithm acquisition.
Such a theory could do lots of things, but it would be interesting to see if it could draw a clear distinction between what can be done by adjusting parameters (as in setting a thermostat) and what can be done by programming a computer. This will not be easy, because programming is, as Turing (1936) showed, equivalent to setting a parameter on a universal Turing machine.

b. Computer science

When computer scientists talk about automatic programming, they do not mean what those words imply to a layman. Automatic programming uses computers to turn finite programs that a programmer has developed into programs a computing machine can run. It might be interesting for computer science to look for procedures that would generate programs "from scratch", and implementing inductive programs that generate finite programs from infinite inputs is one way to do that. If mathematicians can come up with good schemes for doing induction in Gold's way, it might be worth the while of computer science to try to turn those ideas into systems that generate programs from examples. Some such programs already exist, but they are hardly robust techniques we can use to get computers to generate their own programs.

Or computer scientists might try to imitate a technique human beings seem to use to generate new programs: taking a program they already have and changing it slightly to make it more like what they want. Thus, for example, when people have trouble understanding how sound waves work, we sometimes tell them that sound waves are a lot like the waves you get when you toss a pebble into a pond. Because people know something about how to think about water waves (they have seen them), this gives them something to start with. As they learn more, they can adapt the procedures they use to think about water waves to develop procedures that they can use to understand sound waves. (The fact that they probably don't really understand water waves doesn't seem to make a difference.
The process seems to work without that.) This would be hard to do if programs were represented the way they are represented in most of today's programming languages, for at least two reasons. One is that small changes to today's programs (changing "input" to "unput", for example) are more likely to be fatal than useful. Another is that it is (with one exception) hard to relate changes in the program to changes in the algorithm the program defines. The one exception is changing numerical parameters. Thus, a change of "123" into "124" is unlikely to be fatal, and it is often easy to see what its effect on the resulting behavior will be. If we want to develop procedures for improving the programs for algorithms that we already have, so that they are better for achieving a particular purpose (e.g. winning chess games), it would be helpful to have languages that allowed programs to be represented in ways that supported such changes.

The human mind seems to use some such representation – representing programs in terms of the objectives they achieve. Thus, for example, a person who has learned to sign their name with their right hand can usually, without additional practice, sign their name with their left hand, even though quite different muscles are involved. They can even use their toes to sign their name in the sand on a beach, even though the actions performed involve quite different instructions to quite different muscles. We seem to be able to do things to achieve objectives (sign our names) no matter what specific steps we need to perform to achieve those objectives. Could we develop ways to represent programs in computers that would allow such things? In other words, could we develop what we might call "objective oriented programming"?

c. Psychology and neurology

One way we might go about trying to implement such languages is to try to figure out how such languages are implemented in the brain.
Perhaps we can develop objective oriented programming languages by seeing how the brain implements our ability to do such things as signing our names with our toes when all we have learned to do is sign them with our hands. More generally, we might ask how programmability is implemented in the brain. Does the brain, in fact, separate the program (the instructions) from the machinery it uses to run the program? Nature has already developed at least one procedure that uses programmability: the genes are the program the ribosome uses to generate proteins. A single ribosome can generate many different proteins, given the right genes. It would be interesting to look for programs in the brain. I don't believe that neurologists have yet found anything that represents a program in the brain.

Psychologists might look for algorithm generation by algorithm adjustment in human behavior too. When faced with new evidence that disproves the suitability of an algorithm a person is using, what does he or she do to change that algorithm? Does everyone do the same thing, or do some people do it differently than others? Can the ways people do it be classified? Can we intervene to make the process of making such changes more efficient? Can people change the way they make such changes more efficiently (which is to say, can they learn to learn)? What exactly is it that children learn when they learn to walk, and how do they learn it?

And then there is the matter of studying intelligence. I believe that the acquisition + application model helps us define what it is, and psychologists might look into how that model could help us understand it better. It can be shown that intelligence, when so defined, cannot be measured effectively, but it might be possible to characterize the effectiveness with which different people use it in some way that does not count as measuring it. One of the roles that consciousness might play in the mind is to help us acquire algorithms.
It might be the tool we use to look at the algorithms we use (unthinkingly, once we have learned them) and change them. What can we say about how humans use consciousness to do this?

d. Biology

I said (above) that I could not see how finite programmability could have evolved. But it is also not easy to see how what we might call rapid infinite programmability evolved. Programmability might have evolved first. If you want to allow random changes in cognition to drive evolution (changes that are then selected by Darwinian mechanisms), it makes sense to represent algorithms in the brain in a way that separates what is done (the program) from how it is done (the machinery that executes the program's instructions). But how did rapid infinite programming develop from the slow infinite programming that evolves algorithms by evolution? Infinite programmability is one thing. Rapid infinite programmability is another. How did it develop?

e. Education

Much of education consists in conveying algorithms. Can understanding how this is done help us do it better? Schools try to convey algorithms by explaining them. It doesn't work very well, but it is a wonder that it works at all. It would be interesting to try to understand how this is possible. It doesn't work the way programming works, because natural languages are quite different from programming languages (although some beginning teachers seem to believe that they are the same and that teaching is telling).

f. Philosophy

Finally, the extended model of the mind that includes algorithm acquisition as well as algorithm application seems to me to shed light on such philosophical issues as free will, intentionality and the nature of induction. And it seems to me to shed light on that old philosophical chestnut "Can computers think?". As I have argued elsewhere, it seems to show us the way out of Searle's (1980) Chinese room.

10. Summary and Conclusion

Although minds resemble computers in many respects, they differ from them in one important way.
Unlike computers, which are "intelligently designed", they cannot be programmed by others. That, I have suggested, is the first of three mistakes we have made in trying to use computers to model intelligence.

Our second mistake is that we have, by and large, assumed that intelligence is only the ability to do intelligent things. I have suggested that it may be more than that – that it may also be the ability to develop the ways we go about doing those intelligent things. By assuming that minds can get their algorithms the way computers do – by being programmed by others, or by what I have called finite programming – we have downplayed the importance of algorithm development. Because minds cannot develop algorithms that way, they have to develop them for themselves or, as I have put it, they can only be infinitely programmed.

Our third mistake was to assume that intelligence could be achieved by computing alone. Although computing plays an important role in intelligence, I believe that full intelligence requires more powerful procedures, such as those that I have called trial-and-error procedures. But, unlike most of the people who believe that intelligence is beyond computation, I do not believe that it requires anything that computers cannot do.

If I am right (and since I am a trial-and-error machine, I may not be), these ideas suggest an answer to Lady Lovelace's objection to the claim that computers could be intelligent. Recall that Lady Lovelace (1843) said that she thought that computers (or rather Babbage's Analytical Engine) could not ever be intelligent because they can only do what we tell them to do. But, if we want computers to be intelligent, we need not tell them what to do. We could tell them how to figure out what to do, and let them go and figure out what to do for themselves. I believe that that is what humans do. They inherit procedures for acquiring algorithms and then go out and acquire the algorithms they use to accomplish their aims.
And I believe that, with this ability, humans are capable of being intelligent. It is a pity that they do not always seem to exercise it.

References

Bierce, Ambrose (1911). The Devil's Dictionary.

Binet, Alfred (1905). "New methods for the diagnosis of the intellectual level of subnormals". L'Année Psychologique, 12, 191-244.

Boring, Edward G. (1923). "Intelligence as the tests test it". New Republic, 35, 35-37.

Chomsky, Noam (1959). "A Review of B. F. Skinner's Verbal Behavior". Language, 35:1, 26-58.

Chomsky, Noam (1965). Aspects of the Theory of Syntax. MIT Press.

Church, Alonzo (1936). "An unsolvable problem in elementary number theory". The American Journal of Mathematics, 58, 345-363.

Dobzhansky, Theodosius (1973). "Nothing in Biology Makes Sense Except in the Light of Evolution". The American Biology Teacher, 35, 125-129.

Gardner, Howard (1983). Frames of Mind: The Theory of Multiple Intelligences. New York: Basic Books.

Gold, Mark E. (1965). "Limiting Recursion". Journal of Symbolic Logic, 30:1.

Kugel, Peter (2005). "It's Time to Think Outside the Computational Box". Communications of the ACM, 48:13.

Lovelace, A. A., Countess of (1843). Notes to her translation of L. F. Menabrea, "Sketch of the Analytical Engine Invented by Charles Babbage, Esq." In Taylor's Scientific Memoirs III, ed. R. Taylor. London: R. & J. E. Taylor.

McCarthy, John (2004). "What is artificial intelligence?" www-formal.stanford.edu/jmc/whatisai/whatisai.html.

Minsky, Marvin (1985). The Society of Mind. New York: Simon and Schuster.

Pinker, Steven (1997). How the Mind Works. W. W. Norton & Company.

Putnam, Hilary (1965). "Trial and Error Predicates and the Solution to a Problem of Mostowski". The Journal of Symbolic Logic, 30:1.

Searle, John (1980). "Minds, Brains, and Programs". Behavioral and Brain Sciences, 3, 417-424.

Turing, Alan M. (1936).
"On Computable Numbers, with an Application to the Entscheidungsproblem". Proceedings of the London Mathematical Society, 42, 230-265.

Turing, Alan M. (1939). "Systems of logic based on ordinals". Proceedings of the London Mathematical Society, 45.

Von Neumann, John (1928). "Zur Theorie der Gesellschaftsspiele". Mathematische Annalen, 100, 295-320.

Related publications by me:

"You Don't Need a Hypercomputer to Evaluate an Uncomputable Function", International Journal of Unconventional Computing, Vol. 5, No. 3-4, 2009, pp. 209-222.

"It's Time to Think Outside the Computational Box", Communications of the ACM, 2005, 48:13.

"The Chinese Room is a Trick", Behavioral and Brain Sciences, 2004, 27:1.

"Toward a Theory of Intelligence", Theoretical Computer Science, 2004, 317:1-3.

"If Intelligence Is Uncomputable, Then...", paper presented at the Special Session on 'Beyond the Classical Boundaries of Computability', 987th Meeting of the American Mathematical Society, May 3-4, 2003.

"Computers Can't Be Intelligent (...and Turing Said So)", Minds and Machines, 2002, 12:4.

"Thinking May Be More Than Computing", Cognition, Vol. 22, 1986.