Interview with Eliezer Yudkowsky
http://johncarlosbaez.wordpress.com/2011/03/07/this-weeks-finds-week-311/
Eliezer Yudkowsky is a research fellow at the Singularity Institute for Artificial Intelligence, which he co-founded.
While many believe that global warming or peak oil are the biggest dangers facing humanity,
Yudkowsky is more concerned about risks inherent in the accelerating development of technology.
There are different scenarios one can imagine, but a bunch tend to get lumped under the general
heading of a technological singularity. Instead of trying to explain this idea in all its variations, let
me rapidly sketch its history and point you to some reading material. Then, on with the interview!
In 1958, the mathematician Stanislaw Ulam wrote about some talks he had with John von
Neumann:
One conversation centered on the ever accelerating progress of technology and changes in the
mode of human life, which gives the appearance of approaching some essential singularity in the
history of the race beyond which human affairs, as we know them, could not continue.
In 1965, the British mathematician Irving John Good raised the possibility of an "intelligence
explosion": if machines could improve themselves to get smarter, perhaps they would quickly
become a lot smarter than us.
In 1983 the mathematician and science fiction writer Vernor Vinge brought the singularity idea into
public prominence with an article in Omni magazine, in which he wrote:
We will soon create intelligences greater than our own. When this happens, human history will
have reached a kind of singularity, an intellectual transition as impenetrable as the knotted spacetime at the center of a black hole, and the world will pass far beyond our understanding. This
singularity, I believe, already haunts a number of science-fiction writers. It makes realistic
extrapolation to an interstellar future impossible. To write a story set more than a century hence,
one needs a nuclear war in between … so that the world remains intelligible.
In 1993 Vinge wrote an essay in which he even ventured a prediction as to when the singularity would
happen:
Within thirty years, we will have the technological means to create superhuman intelligence.
Shortly after, the human era will be ended.
You can read that essay here:
• Vernor Vinge, The coming technological singularity: how to survive in the post-human era, article
for the VISION-21 Symposium, 30-31 March, 1993.
With the rise of the internet, the number of people interested in such ideas grew enormously:
transhumanists, extropians, singularitarians and the like. In 2005, Ray Kurzweil wrote:
What, then, is the Singularity? It’s a future period during which the pace of technological change
will be so rapid, its impact so deep, that human life will be irreversibly transformed. Although
neither utopian nor dystopian, this epoch will transform the concepts we rely on to give meaning to
our lives, from our business models to the cycle of human life, including death itself.
Understanding the Singularity will alter our perspective on the significance of our past and the
ramifications for our future. To truly understand it inherently changes one’s view of life in general
and one’s particular life. I regard someone who understands the Singularity and who has reflected
on its implications for his or her own life as a "singularitarian".
He predicted that the singularity will occur around 2045. For more, see:
• Ray Kurzweil, The Singularity is Near: When Humans Transcend Biology, Viking, 2005.
Yudkowsky distinguishes three major schools of thought regarding the singularity:
• Accelerating Change that is nonetheless somewhat predictable (e.g. Ray Kurzweil).
• Event Horizon: after the rise of intelligence beyond our own, the future becomes absolutely
unpredictable to us (e.g. Vernor Vinge).
• Intelligence Explosion: a rapid chain reaction of self-amplifying intelligence until ultimate
physical limits are reached (e.g. I. J. Good and Eliezer Yudkowsky).
Yudkowsky believes that an intelligence explosion could threaten everything we hold dear unless
the first self-amplifying intelligence is "friendly". The challenge, then, is to design “friendly AI”.
And this requires understanding a lot more than we currently do about intelligence, goal-driven
behavior, rationality and ethics—and of course what it means to be “friendly”. For more, start here:
• The Singularity Institute for Artificial Intelligence, Publications.
Needless to say, there’s a fourth school of thought on the technological singularity, even more
popular than those listed above:
• Baloney: it’s all a load of hooey!
Most people in this school have never given the matter serious thought, but a few have taken time
to formulate objections. Others think a technological singularity is possible but highly undesirable
and avoidable, so they want to prevent it. For various criticisms, start here:
• Technological singularity: Criticism, Wikipedia.
Personally, what I like most about singularitarians is that they care about the future and recognize
that it may be very different from the present, just as the present is very different from the prehuman past. I wish there were more dialog between them and other sorts of people—especially
people who also care deeply about the future, but have drastically different visions of it. I find it
quite distressing how people with different visions of the future do most of their serious thinking
within like-minded groups. This leads to groups with drastically different assumptions, with each
group feeling a lot more confident about their assumptions than an outsider would deem
reasonable. I’m talking here about environmentalists, singularitarians, people who believe global
warming is a serious problem, people who don’t, etc. Members of any tribe can easily see the
cognitive defects of every other tribe, but not their own. That’s a pity.
And so, this interview:
JB: I’ve been a fan of your work for quite a while. At first I thought your main focus was artificial
intelligence (AI) and preparing for a technological singularity by trying to create "friendly AI". But
lately I’ve been reading your blog, Less Wrong, and I get the feeling you’re trying to start a
community of people interested in boosting their own intelligence—or at least, their own rationality.
So, I’m curious: how would you describe your goals these days?
EY: My long-term goals are the same as ever: I’d like human-originating intelligent life in the
Solar System to survive, thrive, and not lose its values in the process. And I still think the best
means is self-improving AI. But that’s a bit of a large project for one person, and after a few years
of beating my head against the wall trying to get other people involved, I realized that I really did
have to go back to the beginning, start over, and explain all the basics that people needed to know
before they could follow the advanced arguments. Saving the world via AI research simply can’t
compete against the Society for Treating Rare Diseases in Cute Kittens unless your audience
knows about things like scope insensitivity and the affect heuristic and the concept of marginal
expected utility, so they can see why the intuitively more appealing option is the wrong one. So I
know it sounds strange, but in point of fact, since I sat down and started explaining all the basics,
the Singularity Institute for Artificial Intelligence has been growing at a better clip and attracting
more interesting people.
Right now my short-term goal is to write a book on rationality (tentative working title: The Art of
Rationality) to explain the drop-dead basic fundamentals that, at present, no one teaches; those
who are impatient will find a lot of the core material covered in these Less Wrong sequences:
• Map and territory.
• How to actually change your mind.
• Mysterious answers to mysterious questions.
though I intend to rewrite it all completely for the book so as to make it accessible to a wider
audience. Then I probably need to take at least a year to study up on math, and then—though it
may be an idealistic dream—I intend to plunge into the decision theory of self-modifying decision
systems and never look back. (And finish the decision theory and implement it and run the AI, at
which point, if all goes well, we Win.)
JB: I can think of lots of big questions at this point, and I’ll try to get to some of those, but first I
can’t resist asking: why do you want to study math?
EY: A sense of inadequacy.
My current sense of the problems of self-modifying decision theory is that it won’t end up being
Deep Math, nothing like the proof of Fermat’s Last Theorem—that 95% of the progress-stopping
difficulty will be in figuring out which theorem is true and worth proving, not the proof. (Robin
Hanson spends a lot of time usefully discussing which activities are most prestigious in academia,
and it would be a Hansonian observation, even though he didn’t say it AFAIK, that complicated
proofs are prestigious but it’s much more important to figure out which theorem to prove.) Even
so, I was a spoiled math prodigy as a child—one who was merely amazingly good at math for
someone his age, instead of competing with other math prodigies and training to beat them. My
sometime coworker Marcello (he works with me over the summer and attends Stanford at other
times) is a non-spoiled math prodigy who trained to compete in math competitions and I have
literally seen him prove a result in 30 seconds that I failed to prove in an hour.
I’ve come to accept that to some extent we have different and complementary abilities—now and
then he’ll go into a complicated blaze of derivations and I’ll look at his final result and say "That’s
not right" and maybe half the time it will actually be wrong. And when I’m feeling inadequate I
remind myself that having mysteriously good taste in final results is an empirically verifiable talent,
at least when it comes to math. This kind of perceptual sense of truth and falsity does seem to be
very much important in figuring out which theorems to prove. But I still get the impression that
the next steps in developing a reflective decision theory may require me to go off and do some of
the learning and training that I never did as a spoiled math prodigy, first because I could sneak by
on my ability to "see things", and second because it was so much harder to try my hand at any
sort of math I couldn’t see as obvious. I get the impression that knowing which theorems to prove
may require me to be better than I currently am at doing the proofs.
On some gut level I’m also just embarrassed by the number of compliments I get for my math
ability (because I’m a good explainer and can make math things that I do understand seem
obvious to other people) as compared to the actual amount of advanced math knowledge that I
have (practically none by any real mathematician’s standard). But that’s more of an emotion that
I’d draw on for motivation to get the job done, than anything that really ought to factor into my
long-term planning. For example, I finally looked up the drop-dead basics of category theory
because someone else on a transhumanist IRC channel knew about it and I didn’t. I’m happy to
accept my ignoble motivations as a legitimate part of myself, so long as they’re motivations to
learn math.
JB: Ah, how I wish more of my calculus students took that attitude. Math professors worldwide will
frame that last sentence of yours and put it on their office doors.
I’ve recently been trying to switch from pure math to more practical things. So I’ve been reading
more about control theory, complex systems made of interacting parts, and the like. Jan Willems
has written some very nice articles about this, and your remark about complicated proofs in
mathematics reminds me of something he said:
… I have almost always felt fortunate to have been able to do research in a mathematics
environment. The average competence level is high, there is a rich history, the subject is stable.
All these factors are conducive for science. At the same time, I was never able to feel
unequivocally part of the mathematics culture, where, it seems to me, too much value is put on
difficulty as a virtue in itself. My appreciation for mathematics has more to do with its clarity of
thought, its potential of sharply articulating ideas, its virtues as an unambiguous language. I am
more inclined to treasure the beauty and importance of Shannon’s ideas on errorless
communication, algorithms such as the Kalman filter or the FFT, constructs such as wavelets and
public key cryptography, than the heroics and virtuosity surrounding the four-color problem,
Fermat’s last theorem, or the Poincaré and Riemann conjectures.
I tend to agree. Never having been much of a prodigy myself, I’ve always preferred thinking of
math as a language for understanding the universe, rather than a list of famous problems to
challenge heroes, an intellectual version of the Twelve Labors of Hercules. But for me the universe
includes very abstract concepts, so I feel "pure" math such as category theory can be a great
addition to the vocabulary of any scientist.
Anyway: back to business. You said:
I’d like human-originating intelligent life in the Solar System to survive, thrive, and not lose its
values in the process. And I still think the best means is self-improving AI.
I bet a lot of our readers would happily agree with your first sentence. It sounds warm and fuzzy.
But a lot of them might recoil from the next sentence. "So we should build robots that take over
the world???" Clearly there’s a long train of thought lurking here. Could you sketch how it goes?
EY: Well, there’s a number of different avenues from which to approach that question. I think I’d
like to start off with a quick remark—do feel free to ask me to expand on it—that if you want to
bring order to chaos, you have to go where the chaos is.
In the early twenty-first century the chief repository of scientific chaos is Artificial Intelligence.
Human beings have this incredibly powerful ability that took us from running over the savanna
hitting things with clubs to making spaceships and nuclear weapons, and if you try to make a
computer do the same thing, you can’t because modern science does not understand how this
ability works.
At the same time, the parts we do understand, such as that human intelligence is almost certainly
running on top of neurons firing, suggest very strongly that human intelligence is not the limit of
the possible. Neurons fire at, say, 200 hertz top speed; transmit signals at 150 meters/second top
speed; and even in the realm of heat dissipation (where neurons still have transistors beat cold) a
synaptic firing still dissipates around a million times as much heat as the thermodynamic limit for a
one-bit irreversible operation at 300 Kelvin. So without shrinking the brain, cooling the brain, or
invoking things like reversible computing, it ought to be physically possible to build a mind that
works at least a million times faster than a human one, at which rate a subjective year would pass
for every 31 sidereal seconds, and all the time from Ancient Greece up until now would pass in less
than a day. This is talking about hardware because the hardware of the brain is a lot easier to
understand, but software is probably a lot more important; and in the area of software, we have
no reason to believe that evolution came up with the optimal design for a general intelligence,
starting from incremental modification of chimpanzees, on its first try.
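(A quick back-of-the-envelope check of that arithmetic, as a minimal Python sketch: the million-fold speedup is the assumption stated above, while the 2,500-year span for "Ancient Greece up until now" is my own rough figure.)

    # Rough check of the "million times faster" arithmetic in the interview.
    seconds_per_year = 365.25 * 24 * 3600      # about 3.16e7 seconds
    speedup = 1e6                              # assumed speedup over biological neurons
    print(seconds_per_year / speedup)          # ~31.6 s of real time per subjective year

    years_since_ancient_greece = 2500          # my rough figure, not from the interview
    hours = years_since_ancient_greece * seconds_per_year / (speedup * 3600)
    print(hours)                               # ~22 hours, i.e. less than a day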
People say things like "intelligence is no match for a gun" and they’re thinking like guns grew on
trees, or they say "intelligence isn’t as important as social skills" like social skills are implemented
in the liver instead of the brain. Talking about smarter-than-human intelligence is talking about
doing a better version of that stuff humanity has been doing over the last hundred thousand years.
If you want to accomplish large amounts of good you have to look at things which can make large
differences.
Next lemma: Suppose you offered Gandhi a pill that made him want to kill people. Gandhi starts
out not wanting people to die, so if he knows what the pill does, he’ll refuse to take the pill,
because that will make him kill people, and right now he doesn’t want to kill people. This is an
informal argument that Bayesian expected utility maximizers with sufficient self-modification
ability will self-modify in such a way as to preserve their own utility function. You would like me to
make that a formal argument. I can’t, because if you take the current formalisms for things like
expected utility maximization, they go into infinite loops and explode when you talk about self-modifying the part of yourself that does the self-modifying. And there's a little thing called Löb's
Theorem which says that no proof system at least as powerful as Peano Arithmetic can
consistently assert its own soundness, or rather, if you can prove a theorem of the form
□P ⇒ P
(if I prove P then it is true) then you can use this theorem to prove P. Right now I don’t know how
you could even have a self-modifying AI that didn’t look itself over and say, "I can’t trust anything
this system proves to actually be true, I had better delete it". This is the class of problems I’m
currently working on—reflectively consistent decision theory suitable for self-modifying AI. A
solution to this problem would let us build a self-improving AI and know that it was going to keep
whatever utility function it started with.
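(To make the Gandhi lemma concrete, here is a deliberately crude toy sketch in Python. The utility numbers and the one-line "world model" are invented for illustration; this is not the reflective decision theory being described, just the informal point that an agent which scores self-modifications with its current utility function rejects goal changes.)

    # Toy illustration of the "Gandhi and the murder pill" argument.
    # All names, numbers, and the one-line "world model" are made up.

    def current_utility(outcome):
        # Gandhi's current values: deaths are bad.
        return -10.0 * outcome["deaths"]

    def pill_utility(outcome):
        # The pill would replace his values with ones that favor deaths.
        return 10.0 * outcome["deaths"]

    def predict_outcome(utility_fn):
        # Crude world model: an agent whose values favor deaths causes some.
        return {"deaths": 5} if utility_fn({"deaths": 1}) > 0 else {"deaths": 0}

    candidates = {
        "keep current goals": current_utility,
        "take the pill": pill_utility,
    }

    # The agent evaluates each candidate future self with its CURRENT utility function.
    best = max(candidates,
               key=lambda name: current_utility(predict_outcome(candidates[name])))
    print(best)  # -> "keep current goals": the goal-changing modification is rejected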
There’s a huge space of possibilities for possible minds; people make the mistake of asking "What
will AIs do?" like AIs were the Tribe that Lives Across the Water, foreigners all of one kind from the
same country. A better way of looking at it would be to visualize a gigantic space of possible minds
and all human minds fitting into one tiny little dot inside the space. We want to understand
intelligence well enough to reach into that gigantic space outside and pull out one of the rare
possibilities that would be, from our perspective, a good idea to build.
If you want to maximize your marginal expected utility you have to maximize on your choice of
problem over the combination of high impact, high variance, possible points of leverage, and few
other people working on it. The problem of stable goal systems in self-improving Artificial
Intelligence has no realistic competitors under any three of these criteria, let alone all four.
That gives you rather a lot of possible points for followup questions so I’ll stop there.
JB: Sure, there are so many followup questions that this interview should be formatted as a tree
with lots of branches instead of in a linear format. But until we can easily spin off copies of
ourselves I’m afraid that would be too much work.
So, I’ll start with a quick point of clarification. You say "if you want to bring order to chaos, you
have to go where the chaos is." I guess that at one level you’re just saying that if we want to
make a lot of progress in understanding the universe, we have to tackle questions that we’re really
far from understanding—like how intelligence works.
And we can say this in a fancier way, too. If we want models of reality that reduce the entropy of
our probabilistic predictions (there’s a concept of entropy for probability distributions, which is big
when the probability distribution is very smeared out), then we have to find subjects where our
predictions have a lot of entropy.
Am I on the right track?
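(An aside to make the entropy remark concrete: a minimal Python sketch of the Shannon entropy of a discrete distribution. The example distributions are made up; a smeared-out prediction has high entropy, a sharp one has low entropy.)

    from math import log2

    def entropy_bits(p):
        # Shannon entropy H(p) = -sum_i p_i * log2(p_i), in bits; 0*log(0) is taken as 0.
        return -sum(pi * log2(pi) for pi in p if pi > 0)

    print(entropy_bits([0.25, 0.25, 0.25, 0.25]))  # 2.0 bits: maximally smeared out
    print(entropy_bits([0.97, 0.01, 0.01, 0.01]))  # ~0.24 bits: a sharp, confident prediction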
EY: Well, if we wanted to torture the metaphor a bit further, we could talk about how what you
really want is not high-entropy distributions but highly unstable ones. For example, if I flip a coin,
I have no idea whether it’ll come up heads or tails (maximum entropy) but whether I see it come
up heads or tails doesn’t change my prediction for the next coinflip. If you zoom out and look at
probability distributions over sequences of coinflips, then high-entropy distributions tend not to
ever learn anything (seeing heads on one flip doesn’t change your prediction next time), while
inductive probability distributions (where your beliefs about probable sequences are such that, say,
11111 is more probable than 11110) tend to be lower-entropy because learning requires structure.
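(A minimal sketch of that contrast in Python. The "inductive" model here is Laplace's rule of succession, which is my choice of illustration rather than anything specified in the interview.)

    def iid_fair(seq):
        # i.i.d. fair coin: every length-n sequence gets probability 1/2**n,
        # so seeing heads never changes the prediction for the next flip.
        return 0.5 ** len(seq)

    def laplace(seq):
        # Laplace's rule of succession (uniform prior over the coin's bias):
        # P(next flip = 1) = (heads so far + 1) / (flips so far + 2).
        prob, heads = 1.0, 0
        for flips_so_far, x in enumerate(seq):
            p_head = (heads + 1) / (flips_so_far + 2)
            prob *= p_head if x == 1 else 1.0 - p_head
            heads += x
        return prob

    for s in [(1, 1, 1, 1, 1), (1, 1, 1, 1, 0)]:
        print(s, iid_fair(s), round(laplace(s), 4))
    # The fair-coin model gives both sequences 1/32 and never learns.
    # The Laplace model gives 11111 probability 1/6 and 11110 probability 1/30:
    # each observed head raises the predicted probability of the next head.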
But this would be torturing the metaphor, so I should probably go back to the original tangent:
Richard Hamming used to go around annoying his colleagues at Bell Labs by asking them what
were the important problems in their field, and then, after they answered, he would ask why they
weren’t working on them. Now, everyone wants to work on "important problems", so why are so
few people working on important problems? And the obvious answer is that working on the
important problems doesn’t get you an 80% probability of getting one more publication in the next
three months. And most decision algorithms will eliminate options like that before they’re even
considered. The question will just be phrased as, "Of the things that will reliably keep me on my
career track and not embarrass me, which is most important?"
And to be fair, the system is not at all set up to support people who want to work on high-risk
problems. It’s not even set up to socially support people who want to work on high-risk problems.
In Silicon Valley a failed entrepreneur still gets plenty of respect, which Paul Graham thinks is one
of the primary reasons why Silicon Valley produces a lot of entrepreneurs and other places don’t.
Robin Hanson is a truly excellent cynical economist and one of his more cynical suggestions is that
the function of academia is best regarded as the production of prestige, with the production of
knowledge being something of a byproduct. I can’t do justice to his development of that thesis in a
few words (keywords: hanson academia prestige) but the key point I want to take away is that if
you work on a famous problem that lots of other people are working on, your marginal
contribution to human knowledge may be small, but you’ll get to affiliate with all the other
prestigious people working on it.
And these are all factors which contribute to academia, metaphorically speaking, looking for its
keys under the lamppost where the light is better, rather than near the car where it lost them.
Because on a sheer gut level, the really important problems are often scary. There’s a sense of
confusion and despair, and if you affiliate yourself with the field, that scent will rub off on you.
But if you try to bring order to an absence of chaos—to some field where things are already in nice,
neat order and there is no sense of confusion and despair—well, the results are often well
described in a little document you may have heard of called the Crackpot Index. Not that this is
the only thing crackpot high-scorers are doing wrong, but the point stands, you can’t revolutionize
the atomic theory of chemistry because there isn’t anything wrong with it.
We can’t all be doing basic science, but people who see scary, unknown, confusing problems that
no one else seems to want to go near and think "I wouldn’t want to work on that!" have got their
priorities exactly backward.
JB: The never-ending quest for prestige indeed has unhappy side-effects in academia. Some of my
colleagues seem to reason as follows:
If Prof. A can understand Prof. B’s work, but Prof. B can’t understand Prof. A, then Prof. A must be
smarter—so Prof. A wins.
But I’ve figured out a way to game the system. If I write in a way that few people can understand,
everyone will think I’m smarter than I actually am! Of course I need someone to understand my
work, or I’ll be considered a crackpot. But I’ll shroud my work in jargon and avoid giving away my
key insights in plain language, so only very smart, prestigious colleagues can understand it.
On the other hand, tenure offers immense opportunities for risky and exciting pursuits if one is
brave enough to seize them. And there are plenty of folks who do. After all, lots of academics are
self-motivated, strong-willed rebels.
This has been on my mind lately since I’m trying to switch from pure math to something quite
different. I’m not sure what, exactly. And indeed that’s why I’m interviewing you!
(Next week: Yudkowsky on The Art of Rationality, and what it means to be rational.)
Whenever there is a simple error that most laymen fall for, there is always a slightly more
sophisticated version of the same problem that experts fall for. – Amos Tversky