Minds May be Computers, But…
(… let’s not forget that they had to evolve)
Peter Kugel, Boston College
Summary
Minds may be computers, but unlike the computers we have on
our desks, they cannot be programmed by others. Therefore, they
have to develop their own programs. That simple fact could help us
understand natural intelligence and develop intelligent machines. It
might also help us explain why computers can play chess without
intelligence, human beings need intelligence to play chess, and
animals cannot play chess at all.
“Nothing in biology makes sense except in the light of evolution”
Dobzhansky (1973)
1. What I am trying to do
One reason why it is so hard for us to understand our minds is that, as Bierce (1911)
put it, we have nothing but our minds to do it with. Fortunately, our minds have
developed some useful strategies to help us overcome their limitations.
One of those strategies is to break the job of trying to understand our minds into parts
and allocate each part to different groups of people. We let one group (neurologists)
focus on how the mind is implemented by the brain. We let another group
(psychologists) look at the behavior that things with minds are capable of. And we let
others look at the mind as an information processing system.
And that is what I want to do in this paper. I want to look at minds as information
processing systems, focusing on what they do, and largely ignoring how that doing is
implemented by the brain. That is the kind of thing that computer scientists do when
they study software without thinking much about the hardware they use to do it with.
But I want to go a step further and look only at what Chomsky (1965) called the
underlying “competence” involved in intelligence, rather than the “performance”
involved in actually being intelligent. Real information processing systems have to
worry about limits of space and time. Performance models take those limitations into
account. Competence models ignore them in an effort to understand the limits of
the machinery that underlies the performance.
In other words, I want to look at intelligence in much the same spirit in which Turing
(1936) looked at computation, using what we now call the “Turing machine”. The Turing
machine is a competence model because it is allowed to use unlimited time and space
and is assumed never to make a processing error. One of the advantages of thinking
of computers as Turing machines, with their unlimited resources, is that, although such
machines cannot actually be constructed, what we learn about them applies to any
computers we might possibly build, no matter how big or fast they might become. Thus, when Turing (1936)
proved that the halting problem was unsolvable by a Turing machine, he proved that it
was unsolvable by any possible computing machine[1] we might ever build.

[1] Notice that I am calling them “computing machines” and not “computers”. That is because I am reserving the word “computers” for another use.
The reason I want to look at the mind so abstractly is that I want to try to figure out
whether our models of the mind have enough “stuff” in them to fully characterize the
information processing minds do, or whether we need to add something to them before
they can do that.
This is a bit like what people did when they asked “Can computers think?” It is a bit
like what Chomsky (1959) did when he asked whether Skinner’s model of the mind was
powerful enough to handle human languages. And it is a bit like what Greek
mathematicians did when they asked if the rational numbers were enough to deal with
Euclidean space.
Chomsky decided that Skinner’s model did not have the right kind of “stuff” and
proposed a stronger model. Mathematicians looked at the diagonal of the unit square
(the 1x1 square whose diagonal is √2 units long) and found that the rational numbers
were not enough to characterize its length. And so they added the irrational numbers to
their conceptual toolkit.
In this paper, I am going to look at our computer models of minds and suggest that
they do not have all the “stuff” they need to deal with intelligence. And I am going to
suggest a way we might improve them.
2. Computer models of minds
Cognitive psychologists use computer models to characterize the ways minds
process information, much as physicists use numbers to characterize the way balls roll
down inclined planes. Thus, for example, they have described the way minds process
visual information as computer programs. That has given them a language in which to
describe such processes precisely enough to allow them to make testable predictions.
Meanwhile, engineers working in artificial intelligence (AI) have used the computer
model to develop machines that can do apparently intelligent things. They assumed
that computing machines had all the “stuff” that intelligence required and that, rather
than build new kinds of machines to develop an intelligent one, all they had to do was to
write programs for the programmable computers they already had.
Although those programs don’t always use methods that resemble those that
intelligent minds use to achieve their results (Most of the people who develop such systems
don’t care whether they do or not.), they accomplish many of the tasks that minds can
achieve by using their intelligence. As a result, we now have machines that can “read”
printed pages and identify faces in photographs.
So our use of computer models of minds has led to some successful results. But so
did Skinner’s model of language and so did mathematics limited to the rational
numbers. There are reasons to believe that our computer models of minds could be
improved and, in this paper, I want to discuss one direction in which these improvements
might be made.
One of my reasons for suspecting that this might be the case is (somewhat
paradoxically) one of AI’s most notable successes – the programs it has developed for
playing chess at a world championship level. It was once widely believed that, if we
could program a computer to play excellent chess, we would learn a lot about
intelligence by looking at its program. The thinking behind this belief was that, since
human beings need intelligence to play chess, computers would too. So once we had
succeeded in programming computers to play good chess, we would only have to look at
the programs we had written and we would be able to learn what intelligence is by looking
at how they used intelligence to pick their moves.
But when we (or rather the intelligent programmers among us) wrote programs that
played excellent chess and we looked for the intelligence in them, we couldn’t find it.
Somehow it looked as though computers were able to do what intelligent people did
without using intelligence to do it. That was disappointing and some of us wondered
why we learned so little from programming computers to do something intelligent.
I want to suggest that we can learn something about intelligence from our failure to
learn much about it from computer chess by asking ourselves how computers could do
something that requires intelligence without intelligence. Most people have suggested
that they do it by brute force. I want to offer another explanation.
I want to suggest that intelligence is not just the ability to apply good algorithms
well, which is what our chess-playing computers do. It is also the ability to acquire
those algorithms in the first place. By programming computers to play chess, the
people in AI were providing their chess-playing computers with the necessary
algorithms and thus leaving an important part of intelligence out of the picture. They
were asking the computer to do half of what intelligent chess playing requires and
leaving out the other half.
I believe that, if we want to use computer chess to study intelligence as a whole, we
should not program computers to play chess. We should program them to program
themselves to play chess and to then apply the programs they develop to beat the heck
out of their opponents. That would, of course, make it harder to develop programs that
play good chess, but it might make it easier for us to learn about intelligence.
In the remainder of this paper, I want to try to explain this account and to
justify it. There are four things I want to do:
• I want to give my reasons for thinking that acquiring the algorithms one uses
is a critical part of intelligence and that merely applying the algorithms others
develop for you is not enough.
• I want to give my reasons for believing that although minds can acquire
programs, they cannot acquire them the way that computers can because
minds cannot be programmed the way computers can be.
• Then I want to ask how computers might go about developing their own
programs and to suggest that, although computers will be able to do that,
they will only be able to do it if we use them differently than we typically use
them today.
• And, finally, I want to suggest some of the benefits we might gain by revising
our computer models of the mind so that they develop their own programs.
4. Acquiring algorithms matters
Many people believe that intelligence is the ability to do difficult and useful things like
play good chess, solve hard problems, make difficult diagnoses, write piano sonatas
and the like. In computer terms, it is a matter of being able to apply good algorithms.
I want to suggest that, although the ability to apply good algorithms plays an
important role in intelligence, it is only a part of intelligence and that by focusing on it
exclusively, we have ignored a crucial part of intelligence – the ability to acquire the
algorithms one applies.
The American Heritage Dictionary suggests this two-part analysis of intelligence
when it defines it as “the ability to acquire and apply knowledge.” The dictionary is not
alone. Many people who have looked into intelligence have believed that acquiring
algorithms played an important role. Binet (1905) developed the intelligence test, not to
find out how good (or bad) children were at applying the algorithms we use to read, ‘rite,
and do ‘rithmetic, but to try to predict how good (or bad) they would be at acquiring those
algorithms in school.
People working in artificial intelligence might have recognized the importance of
acquisition when critics dismissed the intelligence of some of the apparently intelligent
programs they developed once they learned how the algorithms used in those programs
worked. The people who developed those programs took that as criticism of their work,
but it might simply have been an expression of the fact that, once somebody finds the
algorithm used to do something that seemed intelligent and hands it to you, it no longer
seems intelligent because finding the algorithm is such an important part of intelligence.
It is not hard to see why psychologists and computer scientists largely ignored the
role of algorithm acquisition and focused on algorithm application. It is relatively easy to
observe the result of applying algorithms. It is much harder to observe the results of
learning them. What’s more, it is the results of applying algorithms that we value. That
is, after all, how we win games, diagnose illnesses, invent lightbulbs and produce good
music.
To make the acquisition of algorithms even less inviting as a subject of study, it is quite hard to observe.
In contrast to algorithm application, which produces results in the outside world (It
moves the bishop on the chessboard.), algorithm acquisition produces results “inside
the head,” which makes the ability to apply algorithms seem a lot more inviting than the
ability to acquire them. And the study and development of powerful algorithms has
given us some impressive results. I’m not against looking at application. I just think we
ought to look at acquisition too.
Psychologists tend to define “intelligence” in terms of algorithm application. For
example, Gardner (1983) has defined intelligence as " the ability to solve problems, or
to create products, that are valued within one or more cultural settings” and Pinker
(1997) has defined it as “the ability to attain goals in the face of obstacles by means of
decisions based on rational (truth-obeying) rules.” And although he may have meant it
as a joke, Boring’s (1923) definition of intelligence as “what is measured by intelligence
tests,” is in the same spirit.
Most computer scientists who study intelligence in machines seem to feel the same
way. Thus, the founders of artificial intelligence proposed that a machine should be
considered intelligent if it “behaves in ways that would be called intelligent if a human
were so behaving.” Minsky (1985) has defined intelligence as “the ability to solve hard
problems” and McCarthy (2004) has defined it as “the computational part of the ability to
achieve goals in the world.”
But there are people who feel that algorithm acquisition is important too. Many
tend to credit the person who comes up with an algorithm with greater intelligence than
the people who merely use it. Thus, for example, most people would say that Newton’s
ability to develop many of the basic algorithms of modern physics indicates a higher
level of intelligence than the ability of today’s physics students who know more
algorithms and apply them better than Newton ever did. Idiot savants who have strong
capabilities in limited areas, and are not able to develop others, are not usually
considered intelligent even though they command powerful algorithms. And we tend
not to credit spiders with great intelligence in spite of the fact that the algorithms they
use to spin webs are quite impressive because they did not acquire the web-spinning
algorithms they apply. They were born with them.
Consider a simple thought experiment – the story of Joan and Ellen[2] – that seems,
to me, to support this claim that algorithm acquisition is a (and perhaps the) crucial
element of intelligence. Imagine that Joan has developed an algorithm for playing
world-championship chess. And suppose that she cannot memorize it or apply it fast enough
to play in a tournament. So she asks her friend, Ellen, who has an excellent memory
and who is very good at applying algorithms she has memorized, to use Joan’s
algorithm in a tournament. If Ellen memorizes the algorithm and uses it effectively to
win the world chess championship, whose contribution to the resulting intelligent
behavior would you consider the more important?
Clearly neither Joan nor Ellen could have won the championship by herself, so it
makes some sense to say that each deserves part of the credit. (After all, hardly
anybody has said that intelligence had to be just one thing.) But it seems, to me, that
Joan contributed more to their combined intelligence than Ellen did.
5. The computer as a model
Some have said that computers cannot be intelligent because they lack important
capabilities such as the ability to have feelings or conscious awareness. I want to
suggest that the computer may be a bad, or at least misleading, model of the human
mind, not because of what it cannot do, but because of what it can do. It can run
programs written for it by others and minds cannot. This obscures the role of algorithm
acquisition because it allows humans to do the acquiring for them. Minds cannot do
that.
To see the role of algorithm acquisition in intelligence, consider chess. In order to
play good chess, human beings have to acquire three kinds of algorithms. They have to
acquire the algorithms they use to make legal moves and the algorithms others have
developed for picking good moves and, if they want to be the best in the world, they
have to develop new algorithms for making good moves.
[2] Joan is named after John von Neumann (1928), who developed the min-max algorithm used by most computer chess programs, and Ellen is named after Alan Turing (1936), who developed the computer on which those programs are run.
Although it looks as though humans can be given the algorithms they need to do the
first two of these things in much the same way that a computer can, and that they only
need to go to the trouble of developing their own algorithm in the third case,[3] I want to
suggest that people cannot be given any of these kinds of algorithms – not even the
algorithms they use to do such simple things as make legal moves with the bishop in chess
– in the way that a computer can be programmed to do them. I believe that there is a
sharp difference between how computers can be told these things and how humans
can. If I am right about that, then computers are bad models for intelligence because
they can “cheat” on the intelligence tests we give them to see how intelligent they are.
They can win games without intelligence because, unlike humans, they can use the
algorithms developed by others – in this case, their human programmers – to win.
I’m going to have to use some mathematical terminology to make my case, so let me
give you some definitions. An algorithm is a computable procedure for doing
something – say for doubling a positive integer. An algorithm allows the system that
can apply it to evaluate a function by taking an input or argument and generating an
output or value. We can think of a function that an algorithm allows a computer to
evaluate as what mathematicians call a graph – an infinite set of ordered pairs of the
form <argument, value>. Some members of this set for the doubling function are
<1,2>, <2,4>, and <1234,2468>. A program is a finite object that defines this infinite set
and allows a computer to evaluate the function. Such a program might look like this in
an imaginary programming language:
INPUT number;
result := number * 2;
OUTPUT result;
Notice that the graph of the function is infinite and the program is finite. The
advantage of the program over the graph for a human being is that the program fits
inside the head and can be carried around with you. A graph does not and cannot.
Now, in an ordinary computer, this program only works for a finite set of cases
because a real computer runs into trouble when the numbers to be doubled get very
large. But for theoretical purposes, we can think of computers as Turing machines with
their unlimited space (and time) and such machines can double any integer in principle.
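To make the program/graph distinction concrete, here is a minimal sketch of the doubling example in Python (a real rather than imaginary language). The names are mine and purely illustrative: the program is a few lines of finite text, while the graph it defines is an endless stream of <argument, value> pairs that we can only ever sample.

import itertools

def double(n):                            # the finite program: a few lines of text
    return n * 2

# the graph: <1,2>, <2,4>, <3,6>, ... – an infinite set we can only sample
graph = ((n, double(n)) for n in itertools.count(1))
print(list(itertools.islice(graph, 3)))   # [(1, 2), (2, 4), (3, 6)]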
[3] There are some who believe that even this does not require algorithm creation because they believe that playing well is only a matter of applying good algorithms more quickly or accurately than your opponents.
A computer can be given a finite program that defines a function and use it to
compute any (or all) the values of that function. Therefore we will say that computers
can be finitely programmed because they can be enabled to compute a particular
computable function by being given a finite object. Human beings, I will suggest, cannot
be finitely programmed (to compute infinite functions). Although I believe that they can
be programmed, I do not believe that they can be programmed by being given a finite
object. The way they get their algorithms, I will argue, is by being infinitely
programmed. I will be more precise about what this means in a moment but, for now
just think of a machine that is infinitely programmed as a machine that is given an
algorithm by an infinite object of some sort (from which it derives a finite program that it
can apply by evaluating it).
To apply a finite program that you have been given, all you need is to be able to
compute. But before you can apply what we might think of as an infinite program, you
first have to turn it into a finite program and this requires more than a computation.
Therefore, it cannot be done by a computer if we limit that computer to computing only.
That is what we do by custom with our computers today, but computers can do more
than compute.
Let us call a computer that is only used to compute a computing machine. But, a
computer, as a piece of machinery, can do more than compute so, when I want to talk
about this machinery without limiting it to computing, I will refer to it as a computer. In
my terminology, a computer is not necessarily a computing machine although it can be
used as one.
With this terminology in hand, I can say what the main point of the rest of this paper
is going to be. Computers are misleading models of intelligent minds, not because they
are too weak, but because they are too strong. Computers can do things that minds
cannot and because of that, they can get away with partial stupidity in cases where
minds cannot. And this obscures some crucial elements of intelligence.
There are some details in this story that I have glided over. I have said that minds
cannot be finitely programmed but I have not given any reasons for thinking so. I have
also said that deriving programs requires more than computing without giving any
arguments to support my claim. It is now time to get to some of these details and, as
the saying goes, the devil is in the details.
6. Minds can be programmed but not finitely programmed
If you look around, the claim that people cannot be finitely programmed may seem
ridiculous. When we ask for directions to the drug store we are given finite instructions
that certainly seem like programs and we can apply them to go to the drugstore and buy
aspirin. We can be given finite instructions that tell us what to do to improve our golf
game and we can apply them to do better on the golf course.
But I have several reasons for believing that minds cannot be finitely programmed.
One is that we don’t act as though we believe that they could be. If we believed that
children could be (finitely) programmed, we would spend a lot less time and money on
education because we would simply program students to read, ‘rite and do ‘rithmetic.
That shouldn’t take more than a week, once somebody writes the necessary programs
and it would be a lot cheaper than sending them to school for years.
What’s more, some of the things we do when we instruct people wouldn’t make any
sense if instructing people were like programming computers. We often use examples
to convey instructions to people, but we seldom use examples to convey programs to
computers – and when we do, it is only for a rather narrow group of programs.
We use diagrams to help people follow instructions and we ask them to do homework.
We don’t do either of these things when we program a computer.
And, if I believed you could be programmed, I wouldn’t have written this paper to try
to convince you to change your model of the mind. I would simply have programmed
you to think differently.
When people make mistakes that show that they have not got an algorithm quite
right, it often helps to give them an example to help them correct their
misunderstanding. But we do nothing of the sort when a computer does something
wrong.
Some of the things that children learn do not seem to be learned by anything that
resembles being programmed. For example, when children learn to walk, they seem to
learn, not by being told what to do but by figuring it out. And of course, children do not
learn the elements of their native language by being told. They learn it from examples.
People learn to ride a bicycle by riding a bicycle, which is lucky because, if we had to
program them, we wouldn’t have the faintest idea of what to say. (How do you keep a
bicycle from falling over?) Try to imagine what it would be like to try to teach a child
who had never ridden a bicycle before how to ride a bicycle over the telephone. You
couldn’t do it, but you could convey a program to a computer over the phone.
What is more, there are many algorithms that people cannot learn from others. Good
examples of such algorithms are found in creativity. Gauss did not use algorithms he
was given by others to prove new theorems and Mozart did not use algorithms others
gave him to write his piano sonatas. The algorithms Gauss and Mozart used were
developed from algorithms that others helped them acquire, but Mozart and Gauss were
not told how to build on those algorithms.
I have looked around for examples of people being given algorithms by finite
programs and have been unable to find any. Now I admit that a lot of people act as
though a finite set of instructions could totally define an algorithm for people but I
believe that closer inspection would show that they are wrong.
Thus, for example, the people who write instructions for assembling toys, or pieces
of furniture that are supposed to be “easy to assemble”, believe that their instructions
are unambiguous and fully define the steps that have to be taken. But, if you have ever
tried to follow those instructions, you probably did at least one thing wrong. I always do.
And, if you are like me, you probably blame yourself when this happens. But I believe
that you are at least partially wrong.
The language in which such instructions are written is learned from examples and
the examples do not fully define the language. Unlike computer programming
languages, they are necessarily ambiguous.
There are many things that look like programs in this world. We can tell a child the
rules for playing tic tac toe – and what we say looks very much like a computer
program. The child seems to be able to follow the rules we give it and play the game
correctly. Cooks can follow the recipes in cookbooks. Pianists can follow the
instructions given by a musical score. All those things look like programs and people
can follow them. So, at first glance, it rather looks as though people can be finitely
programmed.
But, at first glance, it also looks as though the earth is flat. The obvious is not
always the best place to start a science, so bear with me.
If you stop and think about the examples I have given, you might notice that there is
room for doubt. People sometimes get lost on their way to your house and have to call
you for help. They miss turns or make other mistakes. Parents often have trouble
assembling a child’s toy even when the instructions have been carefully tested by the
manufacturer. And the soufflé sometimes comes out quite badly even though the recipe
is quite correct.
These observations are suggestive, but they are not compelling. Our friend’s
inability to get to our house may be due to her failure to pay attention. The inability of
the child to do what the teacher tells it to do may simply be due to the child’s sleepiness
or lack of interest.
What’s more, the fact that our attempts to program people can fail is not surprising.
After all, as most programmers know, attempts to program computers almost always fail
the first time around. It often takes a few tries to get a program to do what you want it to
do. But there are two significant differences between what happens when we program
a computer and when we try to program a person.
One is that although what a program (as a whole) does is not always what we
thought it was going to do, that is not the case for the individual instructions of which a
program is constructed. Those instructions must do what we expect them to do.
When, for example, a programmer writes an instruction such as “x := 7;” he or she
has to be able to assume that that instruction will get the program to do what the
programmer expects it to do – that it will put (some representation of) the number 7 into
the location referred to as x in the program. Without that assumption, programming (as
we know it) would be impossible. (Imagine how difficult programming would be if “x :=
7;” occasionally got the computer to fry an egg.)
The second is that, once we write (and debug) a program we can use it on other
computers to get them to do the same thing. But when we tell a person that the bishop
can only move along the diagonal in a chess game, that does not necessarily mean that
everyone will interpret what we say in exactly the same way. As every teacher knows,
the instructions that convey an idea to one student do not usually convey it to all.
Although these observations are suggestive, they do not fully convince me that
people cannot be finitely programmed. What does convince me is that I know that the
mind had to evolve, and that a finitely programmable mind could not have done that.
7. Why finitely programmable minds could not have evolved
A finitely programmable mind could only have evolved if finite programmability
conferred some survival advantage on the organism that had one. And, although I
believe that programmability does confer such an advantage to its possessor, finite
programmability does not.
Before I go into what is wrong with finite programmability, let me say a few words
about programmability in general.
Consider a machine that is not programmed – say an adding machine. Since it can
only do one thing – add – its instructions can be built in. When an adding machine
computes the sum of 5 and 7, it has what we can think of as one input (the pair <5,7>)
and one output (the number 12). When a multi-purpose calculator adds those same two
numbers, it has two inputs rather than one. It needs to be told to add (press the +) and
what to add, perhaps the input pair <5,7>. If you press the – button instead of the +
button, you get -2 as a result rather than 12.
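The difference can be sketched in a few lines of Python (the function names and the four operations are my own illustrative assumptions): the adding machine has its single algorithm built in and takes one input, while the calculator takes a second input that selects which of its few built-in algorithms to apply.

def adding_machine(pair):                 # one input, one built-in algorithm
    a, b = pair
    return a + b

def calculator(button, pair):             # two inputs: which algorithm, and the data
    a, b = pair
    operations = {"+": lambda: a + b, "-": lambda: a - b,
                  "*": lambda: a * b, "/": lambda: a / b}
    return operations[button]()

print(adding_machine((5, 7)))             # 12
print(calculator("+", (5, 7)))            # 12
print(calculator("-", (5, 7)))            # -2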
The ability to execute more than one algorithm has some clear-cut advantages and it
is not hard to see how such an ability might have evolved. Most higher organisms can
execute more than one algorithm and they can choose which one to use. Thus, for
example, they can usually choose among fighting, fleeing, feeding or fornicating and that
ability to choose does a lot for their chances of surviving and passing on their genes.
What a computer does is different. Whereas the adding machine can apply only one
algorithm and the simple calculator can apply somewhere between four and twenty, the
computer can apply a potential infinity of algorithms. As a matter of fact, as Turing
(1936) and Church (1936) suggested, a computing machine can apply all possible
algorithms. I believe that minds can also apply infinitely many algorithms, which allows
them to apply algorithms that were not developed at the time they were built. That is
not true of the adding machine or the multi-purpose calculator. This ability is an
important feature of programmability in both computing machines and human minds.
The advantage that programmability confers on the human mind is that it allows it to
adapt its cognitive capabilities to a potential infinity of environments. Whereas a polar
bear has command of a finite number of algorithms that allow it to survive in the Arctic
and an elephant has command of a finite number of somewhat different algorithms that
allow it to survive in Africa, the human has the ability to develop algorithms that allow it
to survive in either, should it find itself there. And, because the human is
programmable, it could figure out what to do in an environment that was totally new. An
African elephant would probably freeze to death in the Arctic but an African human
would invent the igloo and the parka, live in the former and put on the latter, and
survive.
So programmability is probably a good thing to have. But finite programmability is
basically useless for the simple reason that there are no finite programs in the world for
the human being to use. (This is not the case for the computer. People are producing
finite programs for it all the time.) If a finitely-programmed mind were to look for a finite
program that it could use to predict the behavior of polar bears where would it look?
Such programs are not written on the foreheads of polar bears or given to people by a
beneficent deity. True, today you might be able to Google “polar bear behavior” and get
some sort of answer, but Google is relatively recent and there’s another problem with
depending on the Internet for the algorithms you will use to deal with polar bears:
you’ll have a lot of trouble trying to figure out which of the algorithms you find you can
rely on.
One way a system might develop a suitable algorithm to deal with a particular
environment is to evolve one. Let individual animals with different algorithms compete
for the limited resources of Africa and let the ones with the best algorithms survive.
After many generations an elephant-surviving species might emerge.
There are two problems with this strategy. One is that it takes a long time to develop
a good algorithm by evolving one. The other is that you end up with several species –
one for the Arctic that has algorithms suitable for dealing with polar bears and another in
Africa that is capable of dealing with elephants.
Let’s call this way of developing algorithms species programming. Species
programming evolves good programs for specific environments and I suspect that that
is how the algorithms used by most animals were acquired. But this way of developing
algorithms is rather slow. It took a long time to develop the algorithms spiders use to
spin their webs.
But I believe that there’s another way that humans use and it’s a lot faster. The
human species seems to have developed a mind that could produce its own algorithms
from the environment. That procedure is exemplified by the way a child learns the
meaning of the word “dog”. It is not programmed to recognize dogs. (Try to imagine
what you would tell a child if you had to tell it how to go about identifying dogs.) It is
given a few examples and it produces a program from those examples.
Let us call this individual programming because it develops programs for individual
people. This kind of programming is what is involved when people develop finite
programs from the potentially infinitely many behaviors of either polar bears or
elephants or even dragons or unicorns if they should suddenly appear. This way of
building up your arsenal of algorithms allows humans to develop good algorithms for
different situations. It allows humans to adapt – within a single lifetime – to a broad
variety of environments. And it even allows them to learn to play chess – something
that wasn’t around when their minds were evolving.
That ability, I suggest, is what distinguishes humans from other animals and makes
it possible for a single species to live all over the world. (As we are finding out, that is
not an unalloyed blessing.)
Humans also developed culture -- which allows members of the human species to
help each other and to accumulate algorithms they can share with each other. But
because they did not develop finite programmability, they cannot share algorithms the
way that owners of iPhones can share apps. Why not?
Why can’t people program each other? One reason is that the finite programmability
of computers depends on knowing what their internal “order code” is. Although it is true
that the person who writes a program for a computer does not need to know that
computer’s order code, somebody does. For a computer to run a program written in a
high-level programming language, somebody has to program the computer to translate
a program given to it in the high-level programming language, into the machine code of
the computer on which it will run. And the people who write that program (the compiler)
need to know the target machine’s internal code. Because computers are
manufactured to specifications, we can know what their order code is. But humans are
not manufactured and it seems quite likely that different brains have different order
codes.
And, besides, even if every person had the same internal order code, we would have
no way to know it.
Of course it would be useful for us to be able to finitely program each other. Think of
how much time we could save if we could program children to do arithmetic rather than
teaching them how to do it. But, although it sounds good, there are reasons to think that
it might not be as useful as it seems. How would we decide which programmer to trust?
How would we deal with the inevitable errors in such programs? And wouldn’t it be hard
for teachers to come up with the right program? (Please come up with a program that
would allow a child to recognize a dog. I’ll take a short informal explanation of an
algorithm if you don’t know how to program.)
These thoughts suggest, to me, that minds cannot be finitely programmed. But the
fact that minds cannot be finitely programmed does not mean that minds cannot be
programmed somehow and, indeed, there are good reasons to believe that the minds of
humans and of animals are programmable somehow.
Let’s look at how this can be done more closely.
As I said above, spiders use an algorithm to spin their webs and those algorithms
probably evolved by some sorts of Darwinian mechanisms. I have called such
programming species programming because it programs a whole species. Species
programming takes a long time and requires speciation. It allows spiders to develop
algorithms for spinning webs that are good in a variety of environments. But the
algorithms they can execute are largely determined at birth (although their parameters
can often be tweaked after birth) and they are therefore limited to the possibilities dealt
with at their creation.
Human beings have, I believe, a different method. I called it individual programming.
When human beings find themselves in the Arctic, where they have to deal with polar
bears, they can observe polar bears and develop programs that they can run to predict
the behavior of polar bears from their observations. If, on the other hand, they find
themselves where there are elephants, they can develop algorithms for predicting the
behavior of elephants. This kind of programming or algorithm acquisition – we might
call it “learning” although it is only one kind of learning – is much quicker than species
programming and allows one species to occupy many different niches in the
environment.
And this is the kind of programming that I believe humans are capable of and that is
the basis for what we call their “intelligence”. The fact that they can do it, while animals
cannot and computers do not do it today, seems, to me, to explain why computers can
play chess without using intelligence (They can be given the algorithm by a finite
program.), why humans need intelligence to play chess (They have to figure out the
algorithms themselves.) and why animals cannot play chess at all. (It would take them
too long to develop the required algorithms.)
8. How could a mind program itself?
There are, I believe, several ways that a mind might program itself and the main
reason why I believe that the computing machine is not a good model of the mind is that
it is not capable of using any of them (a fact that I will try to demonstrate in a few
moments). But that does not imply that the computer (which, as you may recall, is
different from a computing machine) is not a good model.
There are several different ways that the mind might develop a program that it could
run but perhaps the easiest one to understand is what I will call inductive programming.
This is a process that develops a finite program from a potentially infinite series of
observations. (An example is the way the child learns the meaning of “dog” from
examples.) Gold (1965) has suggested a model for this process that he calls black box
identification. Imagine a device you cannot open or look inside of (a black box) that
takes in integers and outputs integers in response. You are said to have identified the
black box if you come up with a finite program that gets a computer to produce exactly
the same results. With such a program in hand, you can predict the behavior (inputs
and outputs) of the box.
Notice that you cannot be expected to find exactly the same program the box uses
because you cannot see inside the box and, given any function that the box might be
evaluating, there are always infinitely many programs that compute the same function.
Without being able to look inside the box, and understand what you see, you cannot
determine which of the infinitely many programs that produce the same results the black
box is using.
Rather than talking about how we might generate finite programs to predict the
behavior of elephants or polar bears, let us look at a simple example that deals only
with integers (which are so much less complicated than elephants and polar bears).
Consider a black box that computes the values of the function DOUBLE that doubles the
value of any positive integer you give it. Input 1 and it outputs 2, input 2 and it outputs
4, and so forth.
And imagine that the program you are trying to produce will be used to predict future
outputs of this black box and to “retrodict” its past behavior. If you could develop such a
program, you could carry it around in your head and use it to predict the future behavior
of the black box. (I’m assuming that the more complicated procedures we might
develop for predicting the behavior of elephants and polar bears can be acquired similarly.)
Let us use Turing’s (1936) idealized computer, the Turing machine, as our model of
the mind and what Turing (1939) called an oracle to represent the input to this process.
An oracle is a one-way infinite tape whose symbols represent the output of this black
box. The first square of the tape represents the box’s output for the input 1, the second
its output for the input 2, and, in general, the nth square represents the output of the
black box for the input n. This tape is the oracle for the function DOUBLE[4] and it would
look like this:
2 | 4 | 6 | 8 | 10 | 12 | 14 | …
Let us imagine that a machine with the machinery of a Turing machine reads the
symbols on this tape from left to right and that it tries to compute a program from this
oracle (or “evidence”) that it can use to predict the symbols of the oracle that it has not
yet seen – the future behavior of the black box.
Thus, for example, after reading the first symbol (2) it might guess that this was the
oracle representing the behavior of a black box that always outputs 2. The second input
(4) would cause it to change its mind and perhaps guess that this was the oracle from a
box that doubled its previous output. So it might go on to predict 8. It would be wrong
again so it would change its mind again when it saw 6 instead of the predicted 8. And
imagine that, after seeing the 8, it guessed that it was dealing with a box that doubled its
inputs.
[4] Turing (1939) does not deal with computable oracles.
Now it would be pleased to see that the next symbols were 10, 12, and 14. That
would please it enough to allow it to exclaim a machine’s equivalent of “eureka” and
stop, having computed the right program.
Such a machine would qualify as a computing machine – although we don’t usually
think of computing machines as having infinite inputs – because it would output a finite
program (or theory) in finite time and then stop. The problem with this is that it could
easily come up with the wrong theory. And this possibility would remain, no matter how
many inputs it had read before making its final decision. (It has to make its final
decision in finite time to qualify as a computing machine.) So, suppose the sequence on
the oracle started with 2, 4, 6, 8, 10, 12, 14 and then continued with 14, 14, 14, 14, … or
14, 12, 10, … or what have you. A computing machine has to stay committed. It has to
announce its theory at some time and stick with it.
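The guessing process just described can be sketched in Python. The three candidate theories and the consistency check are my own illustrative assumptions, not part of Gold’s or Turing’s formalism; the point is only that each new symbol of the oracle can refute the current guess and push the machine to the next one (the sketch revises its guess as soon as it is refuted, slightly earlier than in the story above).

# each theory predicts the tape's i-th symbol from the symbols before it
theories = [
    ("always outputs 2",            lambda i, prev: 2),
    ("doubles its previous output", lambda i, prev: prev[-1] * 2 if prev else 2),
    ("doubles its input n",         lambda i, prev: 2 * i),
]

def consistent(theory, seen):
    # does this theory reproduce every symbol read so far?
    return all(theory(i, seen[:i - 1]) == seen[i - 1]
               for i in range(1, len(seen) + 1))

seen, current = [], 0
for symbol in [2, 4, 6, 8, 10, 12, 14]:   # reading the oracle left to right
    seen.append(symbol)
    while not consistent(theories[current][1], seen):
        current += 1                      # the old guess is refuted: change its mind
    print("current guess:", theories[current][0])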
A system that used such a strategy would strike us as quite stupid or pigheaded as it
stuck to its theory in the face of considerable evidence that that theory was wrong. We
would say that its theory of the black box was less and less “about” the box. And we
would say that it lacked what Brentano called “Intentionality” which Brentano and many
others after him considered to be the essence of intelligence.
There is another way the machine might operate that would require no new
machinery but would get around this difficulty. It could use what we might call
(following Putnam, 1965) a trial-and-error procedure. A trial-and-error procedure is like
a computation except that whereas, in a computation, we count the first value a system
produces as its result, in a trial-and-error procedure we count the last value it produces
as its result.[5] Notice that both computations and trial-and-error procedures produce
their results (when they produce a result) in finite time, using only finite space.

[5] If it produces no last result, we say that its result is undefined.
To see how a trial-and-error procedure works, consider a machine that tries to solve
the pi recognition problem. This is the problem of checking out an infinite tape that is
purported to contain the full (infinite) decimal expansion of pi. Such a machine would
compute successive values of the decimal expansion of pi and check the symbols of its
input tape against those values. It would say that the tape was OK until it read an
incorrect value.
Now notice that the last output this machine produces gives the correct evaluation of
the tape and gives it in finite time. If the tape does contain the infinitely many values of
the decimal expansion of pi, it would indicate that it was correct after reading the first
input. And if the tape contained an error, it would also come up with the right evaluation
of its correctness (It is not correct.) in finite time. But there is a difference between
these two cases. If the machine says the tape is bad, that is its final answer and we can
turn it off, convinced that it is right. If, on the other hand, it says the tape is OK we can
never be sure that that is its final answer.
The OK result is a lot like a scientific theory. Unlike the result of a computation, we
cannot be sure it is right. (Remember that we are in an idealized world in which
machines never make mistakes in calculating.)
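Here is a sketch of the pi recognition problem as a trial-and-error procedure in Python. The digit generator is a standard streaming (“spigot”) algorithm, included only to make the sketch self-contained; the essential point is in the recognizer, whose last output is the one that counts.

def pi_digits():
    # Gibbons' unbounded spigot algorithm: streams 3, 1, 4, 1, 5, 9, ...
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            yield n
            q, r, n = 10 * q, 10 * (r - n * t), (10 * (3 * q + r)) // t - 10 * n
        else:
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)

def recognize_pi(tape):
    # Trial-and-error: the LAST answer produced is the result. "not pi" is
    # final and we may halt; "pi so far" is forever provisional.
    for claimed, actual in zip(tape, pi_digits()):
        if claimed != actual:
            print("not pi")               # final answer, reached in finite time
            return
        print("pi so far")                # provisional; may turn out to be the last

recognize_pi(iter([3, 1, 4, 1, 6]))       # "pi so far" four times, then "not pi"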
I want to suggest that such machines – infinitely programmable computers, or
computers that derive their programs from infinite inputs – are better models of the mind
than computing machines. Although they do not require any machinery that is not in a
computer, they use that machinery differently.[6] Such machines – which use trial-and-error
procedures to generate programs from examples – could have evolved because
the examples from which they derive programs can be found in the world. What is
more, they can develop programs without anybody else needing to know their internal
“order code” and that is lucky because it is hard to see how such knowledge could be
gained about a biological system.
9. Infinitely programmable machines
In case you didn’t notice it, the infinitely programmable machines that I have
discussed so far evaluate what mathematicians call “functionals” – functions whose
arguments and/or values are themselves functions. We can think of them as having two
parts. The first part takes an oracle (or equivalently, what mathematicians call the
graph of a function) and produces a finite program that computes the same function.
The second part takes a finite program and an input and produces the value that the
program computes for that input. The first part acquires a program for computing an
algorithm and the second part applies the acquired algorithm. Together, they can be
fully intelligent, which neither part can be alone.
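A toy sketch of such a two-part functional in Python, with the hypothesis list and the evidence invented for illustration: acquire turns a finite sample of the black box’s graph into a finite program, and apply_program is ordinary computation.

def acquire(evidence, hypotheses):
    # part one: return the first candidate program consistent with the evidence
    return next(h for h in hypotheses if all(h(x) == y for x, y in evidence))

def apply_program(program, argument):
    return program(argument)              # part two: ordinary computation

evidence = [(1, 2), (2, 4), (3, 6)]       # a finite sample of the black box's graph
hypotheses = [lambda n: n + 1, lambda n: 2 * n, lambda n: n * n]
program = acquire(evidence, hypotheses)
print(apply_program(program, 1234))       # 2468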
So far, most of the work in cognitive science – both in cognitive psychology and in
artificial intelligence – has focused on the second part – the part that applies
algorithms. I would like to suggest that the time may have come for these disciplines,
and others interested in intelligence, to look at both parts. There is plenty for us to look
at from plenty of different disciplines.[7]

[6] In 1947, Turing suggested that computers might have to be used in some way that was different from the way they were originally designed to be used if they were to become intelligent.
[7] Fuller explanations of such inductive procedures can be found in my other publications listed at the end of this paper.
Although I have tried to give good arguments in favor of this approach, the
compelling argument for it may not be based on the problems that it may help us solve
but on the problems it raises that we can study. Let me suggest some problems it
suggests for people working in (a) mathematics, (b) computer science, (c) psychology
and neurology, (d) biology, (e) education and (f) philosophy.
a. Mathematics
We already know quite a bit about the theory of trial-and-error processes. Gold
(1965) has called them limiting computations to highlight their similarity to the limiting
process in calculus and they resemble the processes of passing to the limit in calculus
in many respects.
Interestingly enough, we know that inductive processes of the kind I discussed
above are not universal in that they cannot acquire all possible algorithms. (Turing
machines are universal because they can compute all computable functions, given the
right program. Induction machines are not because their acquisition components
cannot acquire all computable functions). Thus, if the mind uses a trial-and-error
machine to acquire the algorithms it can apply, there will be things that are learnable in
principle that it cannot learn, even in principle (Kugel, 2005). It would be interesting to
characterize such things.
We also know that the set of all algorithms that a single machine of this sort can
acquire is recursively enumerable. In other words, there is a computer program that
will grind out a list of all the programs a specific trial-and-error machine can acquire by
induction. We also know that any set of totally computable functions that can be
recursively enumerated can be identified by an inductive trial-and-error machine. What
we do not know is how to make such a machine efficient in most interesting cases.
Finding such ways seems, to me, to be another good problem for mathematicians to
investigate.
Induction, which tries to find an algorithm that matches observations, is not the only
way that a trial-and-error machine can acquire algorithms. Strategizing by setting
goals and then looking for better and better algorithms to meet those goals is another
way and it shares many features with induction. Using such strategizing procedures
might be the way chess players find better strategies for playing chess than any they
have observed or the way Mozart developed a style of composition different from any
of his predecessors. Strategizing trial-and-error procedures differ from inductive
procedures in that there is not necessarily a best procedure and, if there is, there may
be no way to identify it.
Another trial-and-error way of generating algorithms is by adapting a program you
already have. You start with a program that works badly and modify it to make it work
better. This can, of course, be done by simply varying parameters. A thermostat that
keeps a house too cold for comfort can be set to a higher temperature. But that is a
rather limited way of producing “new” algorithms. The kind of thing I have in mind is
what might be happening when we use models.
Consider, for example, what we do when we use the computer as a model of the
mind. We understand how a computer does what it does better than we understand how
the mind does what it does. But the computer mostly does different things than the
mind does. So we try to see how we might change the kinds of programs we know in
computers into ones that do what minds do. Doing this has already allowed us to
develop good theories of the mind.
Induction can be done by comparing the behavior of the system being studied to the
available algorithms which, in the case of an inductive trial-and-error machine, can be
listed computationally (or recursively enumerated). When the evidence shows that the
current algorithm is wrong, we move down the list of possible algorithms looking for
one that matches all the data so far. This can be a very inefficient process and one
thing mathematicians might try to develop is more efficient ways of doing it.
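A sketch of that process in Python, with a toy hypothesis space (the total programs f(n) = a*n + b, recursively enumerated) standing in for the general case: the learner moves down the list whenever the evidence refutes its current guess, and its last guess is the one that counts.

def enumerate_hypotheses():
    # recursively enumerate the (total) programs f(n) = a*n + b
    k = 0
    while True:
        for a in range(k + 1):
            b = k - a
            yield (f"{a}*n + {b}", lambda n, a=a, b=b: a * n + b)
        k += 1

def identify(data):
    listing = enumerate_hypotheses()
    candidates = [next(listing)]
    current, seen = 0, []
    for x, y in data:
        seen.append((x, y))
        while any(candidates[current][1](u) != v for u, v in seen):
            current += 1                  # current guess refuted: move down the list
            if current == len(candidates):
                candidates.append(next(listing))
        print("current guess:", candidates[current][0])

identify([(1, 2), (2, 4), (3, 6)])        # settles on "2*n + 0"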
These are some of the kinds of procedures the mind might use to develop programs
without being programmed by others. It might be worth our while to try to develop a
robust mathematical theory of algorithm acquisition. Such a theory could do lots of
things but it would be interesting to see if it could develop a clear distinction between
what can be done by adjusting parameters (as in setting a thermostat) and by
programming a computer. This will not be easy because programming is, as Turing
(1936) showed, equivalent to setting a parameter on a universal Turing machine.
b. Computer science
When computer scientists talk about automatic programming they do not mean what
those words imply to a layman. Automatic programming uses computers to turn finite
programs that a programmer has developed into programs a computing machine can
run. It might be interesting for computer science to look for procedures that would
generate programs “from scratch” and implementing inductive programs that generate
finite programs from infinite inputs is one way to do that.
If mathematicians can come up with good schemes for doing induction in Gold’s
way, it might be worth the while of computer science to try to turn those ideas into
systems that generate programs from examples. Some such programs already exist
but they are hardly robust techniques we can use to get computers to generate their
own programs.
Or computer scientists might try to imitate a technique human beings seem to use to
generate new programs by taking a program they already have and changing it slightly
to make it more like what they want. Thus, for example, when people have trouble
understanding how sound waves work, we sometimes tell them that sound waves are a
lot like the waves you get when you toss a pebble into a pond. Because people know
something about how to think about water waves (They have seen them.) this gives
them something to start with. As they learn more, they can adapt the procedures they
use to think about water waves to develop procedures that they can use to understand
sound waves. (The fact that they probably don’t really understand water waves doesn’t
seem to make a difference. The process seems to work without that.)
This would be hard to do if programs were represented the way they are
represented in most of today’s programming languages for at least two reasons. One
is that small changes to today’s programs (changing “input” to “unput” for example) are
more likely to be fatal than useful. Another is that it is (with one exception) hard to
relate changes in the program to changes in the algorithm the program defines.
The one exception is changing numerical parameters. Thus, a change of “123” into
“124” is unlikely to be fatal and it is often easy to see what its effect on the resulting
behavior will be.
If we want to develop procedures for improving programs for algorithms that we
already have so that they are better for achieving a particular purpose (e.g. winning
chess games) it would be helpful to have languages that allowed programs to be
represented in ways that supported such changes. The human mind seems to use
some such representation – representing programs in terms of the objectives they
achieve. Thus, for example, a person who has learned to sign their name with their
right hand can usually, without additional practice, sign their name with their left hand
even though quite different muscles are involved. They can even use their toes to sign
their name in the sand on a beach, even though the actions performed involve quite
different instructions to quite different muscles. We seem to be able to do things to
achieve objectives (sign our names) no matter what specific steps we need to perform
to achieve those objectives.
Could we develop ways to represent programs in computers that would allow such
things? In other words, could we develop what we might call “objective oriented
programming”?
c. Psychology and neurology
One way we might go about trying to implement such languages is to try to figure out
how such languages are implemented in the brain. Perhaps we can develop objective
oriented programming languages by seeing how the brain implements our ability to do
such things as signing our names with our toes when all we have learned to do is sign
them with our hands.
More generally, we might ask how programmability is implemented in the brain. Does
the brain, in fact, separate the program (the instructions) from the machinery it uses to
run the program? Nature has already developed at least one procedure that uses
programmability: the genes are the program the ribosome uses to generate proteins.
A single ribosome can generate many different proteins, given the right genes.
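The analogy is easy to put in code. In this loose sketch (my illustration; the codon table is abbreviated and the biology greatly simplified), the ribosome is one fixed function and the gene is the program that determines what it builds:

    CODON_TABLE = {"AUG": "Met", "UUU": "Phe", "GGC": "Gly", "UAA": "STOP"}

    def ribosome(gene):
        """Fixed machinery: read codons three bases at a time until STOP."""
        protein = []
        for i in range(0, len(gene), 3):
            amino_acid = CODON_TABLE.get(gene[i:i + 3], "?")
            if amino_acid == "STOP":
                break
            protein.append(amino_acid)
        return protein

    print(ribosome("AUGUUUGGCUAA"))   # ['Met', 'Phe', 'Gly']
    print(ribosome("AUGGGCUAA"))      # ['Met', 'Gly']  (same machine, new program)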
It would be interesting to look for programs in the brain. I don’t believe that
neurologists have yet found anything that represents a program in the brain.
Psychologists might look for algorithm generation by algorithm adjustment in human
behavior too. When faced with new evidence that disproves the suitability of an
algorithm a person is using, what does he or she do to change that algorithm? Does
everyone do the same thing, or do some people do it differently than others? Can the
ways people do it be classified?
Can we intervene to make the process of making such changes more efficient? Can
people learn to make such changes more efficiently (which is to say, can they learn to
learn)? What exactly is it that children learn when they learn to walk, and how do they
learn it?
And then there is the matter of studying intelligence. I believe that the acquisition +
application model helps us define what it is, and psychologists might look into how that
model could help us understand it better. It can be shown that intelligence, when so
defined, cannot be measured effectively, but it might be possible to characterize the
effectiveness with which different people use it in some way that does not count as
measuring it.
One of the roles that consciousness might play in the mind is to help us acquire
algorithms. It might be the tool we use to look at the algorithms we use (unthinkingly
once we have learned them) and change them. What can we say about how humans
use consciousness to do this?
d. Biology
I said (above) that I could not see how finite programmability could have evolved.
But it is also not easy to see how what we might call rapid infinite programmability evolved.
Programmability might have evolved first. If you want to allow random changes in
cognition to drive evolution (changes that are then selected by Darwinian mechanisms), it
makes sense to represent algorithms in the brain in a way that separates what is done
(the program) from how it is done (the machinery that executes the program's
instructions).
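That slow style of programming is easy to caricature in code. In the sketch below (my illustration; the task and the numbers are arbitrary), each "algorithm" is a vector of parameters, random changes supply the variation, and selection keeps whatever behaves best:

    import random

    TARGET = [3, 1, 4, 1, 5]                   # the behavior selection favors

    def fitness(program):
        """Higher is better; 0 means the program's behavior matches TARGET."""
        return -sum(abs(gene - t) for gene, t in zip(program, TARGET))

    def mutate(program):
        """A small random change to the representation, not to the machinery."""
        child = program[:]
        child[random.randrange(len(child))] += random.choice([-1, 1])
        return child

    population = [[0, 0, 0, 0, 0] for _ in range(20)]
    for generation in range(200):
        population.sort(key=fitness, reverse=True)
        survivors = population[:10]            # Darwinian selection
        population = survivors + [mutate(random.choice(survivors))
                                  for _ in range(10)]

    best = max(population, key=fitness)
    print(best, fitness(best))                 # drifts toward TARGET over generations

The point of the caricature is the division of labor: the mutation operator never touches the machinery that runs the program, only the program's representation.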
But how did rapid infinite programming develop from the slow infinite programming
by which evolution generates algorithms? Infinite programmability is one thing; rapid
infinite programmability is another. How did it develop?
e. Education
Much of education consists in conveying algorithms. Can understanding how this is
done help us do it better? Schools try to convey algorithms by explaining them. It
doesn't work very well, but it is a wonder that it works at all. It would be interesting to try
to understand how this is possible.
It doesn't work the way programming works, because natural languages are
quite different from programming languages (although some beginning teachers seem
to believe that they are the same, and that teaching is telling).
f. Philosophy
Finally, the extended model of the mind that includes algorithm acquisition as well as
algorithm application seems to me to shed light on such philosophical issues as free
will, intentionality, and the nature of induction. And it seems to me to shed light on that
old philosophical chestnut, "Can computers think?". As I have argued elsewhere, it
seems to show us the way out of Searle's (1980) Chinese room.
10. Summary and Conclusion
Although minds resemble computers in many respects, they differ from them in one
important way. Unlike computers, which are "intelligently designed", minds cannot be
programmed by others. Overlooking that, I have suggested, is the first of three mistakes
we have made in trying to use computers to model intelligence.
Our second mistake is that we have, by and large, assumed that intelligence is only
the ability to do intelligent things. I have suggested that it may be more than that: that it
may also be the ability to develop the ways we go about doing those intelligent things.
By assuming that minds can get their algorithms the way computers do, by being
programmed by others or by what I have called finite programming, we have
downplayed the importance of algorithm development. Because minds cannot develop
algorithms that way, they have to develop them for themselves or, as I have put it, they
can only be infinitely programmed.
Our third mistake was to assume that intelligence could be achieved by computing
alone. Although computing plays an important role in intelligence, I believe that full
intelligence requires more powerful procedures, such as those that I have called
trial-and-error procedures. But, unlike most of the people who believe that intelligence is
beyond computation, I do not believe that it requires anything that computers cannot
do.
If I am right (and since I am a trial-and-error machine, I may not be), these ideas
suggest an answer to Lady Lovelace's objection to the claim that computers could be
intelligent. Recall that Lady Lovelace (1843) said that she thought that computers (or
rather, Babbage's Analytical Engine) could never be intelligent because they can only
do what we tell them to do.
But, if we want computers to be intelligent, we need not tell them what to do. We
could tell them how to figure out what to do and let them figure it out for
themselves. I believe that that is what humans do. They inherit procedures for
acquiring algorithms and then go out and acquire the algorithms they use to accomplish
their aims.
And I believe that, with this ability, humans are capable of being intelligent. It is a pity
that they do not always seem to exercise it.
References
Bierce, Ambrose (1911). The Devil's Dictionary.
Binet, Alfred (1905). "New methods for the diagnosis of the intellectual level of
subnormals". L'Année Psychologique, 12, 191-244.
Boring, Edward G. (1923). "Intelligence as the tests test it". New Republic, 35, 35-37.
Chomsky, Noam (1959). "A Review of B. F. Skinner's Verbal Behavior". Language,
35(1), 26-58.
Chomsky, Noam (1965). Aspects of the Theory of Syntax. MIT Press.
Church, Alonzo (1936). "An unsolvable problem in elementary number theory". The
American Journal of Mathematics, 58, 345-363.
Dobzhansky, Theodosius (1973). "Nothing in Biology Makes Sense Except in the Light
of Evolution". The American Biology Teacher, 35, 125-129.
Gardner, Howard (1983). Frames of Mind: The Theory of Multiple Intelligences. New
York: Basic Books.
Gold, Mark E. (1965). "Limiting Recursion". Journal of Symbolic Logic, 30(1).
Kugel, Peter (2005). "It's Time to Think Outside the Computational Box".
Communications of the ACM, 48:13.
Lovelace, A.A., Countess of (1843). Notes to "Sketch of the Analytical Engine Invented
by Charles Babbage, Esq." by L.F. Menabrea, of Turin, Officer of the Military Engineers;
translation with extensive notes. In Taylor's Scientific Memoirs III, ed. R. Taylor. London:
R. & J. E. Taylor.
McCarthy, John (2004). "What is artificial intelligence?" www-formal.stanford.edu/jmc/whatisai/whatisai.html.
Minsky, Marvin (1985). The Society of Mind. Simon and Schuster, New York.
Pinker, Steven (1997). How the Mind Works. W. W. Norton & Company.
Putnam, Hilary (1965). "Trial and Error Predicates and the Solution to a Problem of
Mostowski". The Journal of Symbolic Logic, 30(1).
Searle, John (1980). “Minds, Brains, and Programs.” Behavioral and Brain Sciences 3,
417-424.
Turing, Alan M. (1936). "On Computable Numbers, with an Application to the
Entscheidungsproblem". Proceedings of the London Mathematical Society, 42, 230-265.
Turing, Alan M. (1939). "Systems of Logic Based on Ordinals". Proceedings of the
London Mathematical Society, 45.
Von Neumann, John (1928). "Zur Theorie der Gesellschaftsspiele" [On the theory of
games of strategy]. Mathematische Annalen, 100, 295-320.
Related publications by me:
"You Don't Need a Hypercomputer to Evaluate an Uncomputable Function",
International Journal of Unconventional Computing Vol. 5, No.3-4, 2009, pp 209-222
"It’s Time to Think Outside the Computational Box", Communications of the ACM,
2005:48:13.
"The Chinese Room is a Trick", Behavioral and Brain Sciences, 2004, 27:1..
"Toward a Theory of Intelligence", Theoretical Computer Science, 2004, 317:1-3.
"If Intelligence Is Uncomputable, Then..." , Paper presented at the Special Session on
'Beyond the Classical Boundaries of Computability', 987th Meeting of the American
Mathematical Society , May 3-4, 2003.
"Computers Can't Be Intelligent (...and Turing Said So)". Minds and Machines , 2002,
12:4.
"Thinking May Be More Than Computing", Cognition, Vol. 22, 1986.