Designing a
Safe Motivational System
for Intelligent Machines
Mark R. Waser
Inflammatory Statements
• Human intelligence REQUIRES ethics
• All humans want the same things
• Ethics are universal
• Ethics are SIMPLE in concept
• Difference in power is irrelevant (to ethics)
• Evolution has “designed” you to disagree with the above five points
Definitions (disguised assumptions)
• Human – goal-directed entity
• Goals – a destination OR a direction
• Restrictions – conditional overriding goals
• Motivation – incentive to move
• Actions – determined by goals + motivations
• Path (or direction)
• Preferences, Rules-of-Thumb and Defaults
• Ethics (the *goal* includes the path)
• Safety
Asimov's 3 Laws:
1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.
2. A robot must obey orders given to it by human beings except where such orders would conflict with the First Law.
3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.
http://www.markzug.com/
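The Three Laws form a strict priority ordering: a lower-numbered law always overrides a higher-numbered one. That lexicographic structure (sketched here with invented action names and effect flags; deciding whether an action actually "harms a human" is the hard part) can be illustrated as:

```python
# Toy example: each candidate action tagged with its law-relevant effects.
# The action names and boolean flags are purely illustrative placeholders.
actions = {
    "push human out of danger, damaging self": dict(harm=False, obey=True, self_ok=False),
    "obey order that injures a bystander":     dict(harm=True,  obey=True, self_ok=True),
    "refuse order, stay safe":                 dict(harm=False, obey=False, self_ok=True),
}

def choose(actions):
    # Lexicographic priority: First Law > Second Law > Third Law.
    # Python compares tuples element by element, so "no harm" dominates
    # "obeys orders", which in turn dominates "preserves itself".
    def score(name):
        a = actions[name]
        return (not a["harm"], a["obey"], a["self_ok"])
    return max(actions, key=score)

print(choose(actions))  # -> "push human out of danger, damaging self"
```

The First Law eliminates the harmful action even though it obeys an order, and the Second Law then prefers obedience over self-preservation.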
Four Possible Scenarios
• Asimov’s early robots (little foresight, helpful but easily confused or conflicted)
• Immediate shutdown/suicide
• VIKI from the movie “I, Robot” (generalize to “bubble-wrapping” humanity)
• Asimov’s late robots (further generalize to self-exile with invisible continuing assistance)
goals & motivations

SIAI’s Definitions
• Friendly AI – an AI that takes actions that are, on the whole, beneficial to humans and humanity; benevolent rather than malevolent; nice rather than hostile
• Coherent Extrapolated Volition of Humanity (CEV) – “In poetic terms, our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together.”
SIAI’s First Law
An AI must be
beneficial to humans and humanity
(benevolent rather than malevolent)
But . . .
What is beneficial?
What are humans and humanity?
Value Formula
Values (good/bad) are *entirely* derivative/relative with respect to some goal (CEV)

Value = f(x, y)
where
x is a set of circumstances (world state),
y is a set of (proposed) actions, and
f is an evaluation of how well your goal is advanced

Value = f(x, y, t, e)
where, additionally,
t is the time point at which goal progress is judged, and
e is the set of entities which the goal covers
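The extended formula Value = f(x, y, t, e) can be sketched as a higher-order function. Everything concrete below (the types, the toy goal, the action tuples) is a hypothetical placeholder; the slides only specify the formula's arguments.

```python
# Sketch of the extended value formula Value = f(x, y, t, e).
# The goal itself is passed in as a callable, since values are
# *entirely* relative to some goal (per the slide above).

def value(x, y, t, e, goal_progress):
    """Evaluate proposed actions y in world state x.

    x: dict describing the circumstances (world state)
    y: list of proposed actions
    t: time point at which goal progress is judged
    e: set of entities which the goal covers
    goal_progress: callable scoring how well the goal is advanced
    """
    return goal_progress(x, y, t, e)

# Toy goal: count how many covered entities have been "helped" by time t.
def toy_goal_progress(x, y, t, e):
    helped = {ent for (action, ent, when) in y if action == "help" and when <= t}
    return len(helped & e)

score = value(
    x={"world": "initial"},
    y=[("help", "alice", 1), ("help", "bob", 5), ("harm", "carol", 2)],
    t=3,
    e={"alice", "bob", "carol"},
    goal_progress=toy_goal_progress,
)
print(score)  # only "alice" is helped by t=3 -> 1
```

Note how the same actions y would score differently under a different t (judge at t=5 and "bob" counts too) or a different e, which is exactly why those two arguments were added.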
Questions
• Is this moral relativism?
• Are values complex?
• Must our goal (CEV) be complex?
Copernicus!
Assume that beneficial was a relatively simple formula (like z² + c)
Mandelbrot set
Color Illusions
Assume further that we are trying to determine that formula (beneficial) by looking at the results (color) one example (pixel) at a time
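The analogy can be made concrete. The Mandelbrot rule z → z² + c is a one-line formula, yet the pictures it generates are arbitrarily intricate; trying to reverse-engineer the rule one pixel at a time would be hopeless. A minimal escape-time membership test (a standard rendering technique, not something from the slides):

```python
# Escape-time test for the Mandelbrot rule z -> z^2 + c.
# A single simple formula generates arbitrarily intricate boundary
# detail -- the point of the analogy: simple rule, complex results.

def in_mandelbrot(c, max_iter=100):
    z = 0j
    for _ in range(max_iter):
        z = z * z + c          # the whole "formula": z^2 + c
        if abs(z) > 2:         # escaped: c is certainly outside the set
            return False
    return True                # never escaped: c is (likely) inside

print(in_mandelbrot(0 + 0j))   # True  -- the origin is in the set
print(in_mandelbrot(1 + 0j))   # False -- iterates escape quickly (1, 2, 5, ...)
```

Judging "beneficial" case by case is like judging membership pixel by pixel: the underlying rule can be simple even when every individual verdict looks bewildering.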
Current Situation of Ethics
• Two formulas (beneficial to humans and humanity & beneficial to me)
• As long as you aren’t caught, all the incentive is to shade towards the second
• Evolution has “designed” humans to be able to shade to the second (Trivers, Hauser)
• Further, for very intelligent people, it is far more advantageous for ethics to be complex
Definition
Ethics
*IS*
What is beneficial for the community
OR
What maximizes cooperation
Goal(s)/Omohundro Drives
1. AIs will want to self-improve
2. AIs will want to be rational
3. AIs will try to preserve their utility
4. AIs will try to prevent counterfeit utility
5. AIs will be self-protective
6. AIs will want to acquire resources and use
them efficiently
GDEs
“Without explicit goals to the contrary, AIs are likely to behave like human sociopaths in their pursuit of resources.”
7. GDEs (goal-directed entities) will want cooperation and to be part of a community
8. GDEs will want FREEDOM!
Humans . . .
• Are classified as obligatorily gregarious because we come from a long lineage for which life in groups is not an option but a survival strategy (Frans de Waal, 2006)
• Evolved to be extremely social because mass cooperation, in the form of community, is the best way to survive and thrive
• Have empathy not only because it helps to understand and predict the actions of others but, more importantly, because it prevents us from doing anti-social things that will inevitably hurt us in the long run (although we generally won’t believe this)
• Have not yet evolved a far-sighted rationality where the “rational” conscious mind is capable of competently making the correct social/community choices when deprived of our subconscious “sense of morality”
Circles of Morality/Moral Sombrero
Relationships and Loyalty
Redefining Friendly Entity
• Friendly Entity (“Friendly”) – an entity with goals and motivations that are, on the whole, beneficial to humans and humanity; benevolent rather than malevolent
• Friendly Entity (“Friendly”) – an entity with goals and motivations that are, on the whole, beneficial to the community of Friendlies (i.e. the set of all Friendlies, known or unknown); benevolent rather than malevolent
Friendliness’s First Law
An entity must be
beneficial to the community of Friendlies
(benevolent rather than malevolent)
But . . .
What is beneficial?
What are humans and humanity?
What is beneficial?
• Cooperation (minimize conflicts & frictions)
• Omohundro drives
• Increasing the size of the community (both growing and preventing defection)
• To meet the needs/goals of each member of the community better than any alternative (as judged by them — without interference or gaming)
What is harmful?
• Blocking/Perverting Omohundro Drives
• Lying
• Single-goaled entities
• Over-optimization (achievable top-level goals)
• The fact that we do not maintain our top-level goal and have not yet evolved a far-sighted rationality where the “rational” conscious mind is capable of competently making the correct social/community choices when deprived of our “moral sense”
OPTIMAL < community’s sense of what is correct (ethical)

This makes ethics much more complex because it includes the cultural history
The anti-gaming drive to maintain utility adds friction/resistance to the discussion of ethics
ONE non-organ donor + avoiding a defensive arms race > SIX dying patients

Credit to: Eric Baum, What Is Thought?
Triangle
[Diagram: triangle linking CEV (LOGICAL VIEW), GOAL(S), and ACTIONS; stimuli implement moral rules of thumb]
Sloman’s architecture for a human-like agent (Sloman 1999)
Inflammatory Statements
• Human intelligence REQUIRES ethics
• All humans want the same things
• Ethics are universal
• Ethics are SIMPLE in concept
• Difference in power is irrelevant (to ethics)
• Evolution has “designed” you to disagree with the above five points
Next . . . .
CEV Candidate #1:
We wish that all entities were Friendlies

Necessary? Sufficient/Complete? Possible?

Copies of this PowerPoint are available from [email protected]