An Analytical Framework for
Ethical AI
Bill Hibbard
Space Science and Engineering Center
University of Wisconsin – Madison
and
Machine Intelligence Research Institute, Berkeley, CA
Ethical Artificial Intelligence
http://arxiv.org/abs/1411.1373
Current vs Future AI
Current AI:
• Self-driving car
• Environment model designed by humans
• Explicit safety constraints on behavior designed into the model
Future AI:
• Server for electronic companions
• Environment model too complex for humans to understand and must be learned
• Explicit safety constraints impossible with a learned model
• Safety rules, such as Asimov’s Laws of Robotics, are ambiguous
Utilitarian Ethics for AI
• A utility function on outcomes resolves ambiguities of ethical rules
• Utility functions can express any complete and transitive preferences
among outcomes
• Incomplete ⇒ there are outcomes A and B such that the AI agent cannot decide
between them
• Not transitive ⇒ outcomes A, B and C such that A > B, B > C and C > A,
so again the AI agent cannot decide among them
• So we can assume utility-maximizing agents (see the sketch below)
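The following sketch (my own Python illustration, not part of the talk; function and variable names are assumptions) shows why the last bullets hold for a finite set of outcomes: complete, transitive preferences can always be turned into a utility function, while the cycle A > B > C > A is rejected as intransitive.

```python
# Sketch: build a utility function from complete, transitive preferences,
# rejecting incomplete or intransitive relations.  Illustration only.

def utility_from_preferences(outcomes, prefers):
    """prefers(a, b) -> True if a is weakly preferred to b.
    Returns a dict mapping outcomes to utilities, or raises ValueError."""
    # Completeness: every pair of outcomes must be comparable.
    for a in outcomes:
        for b in outcomes:
            if not (prefers(a, b) or prefers(b, a)):
                raise ValueError(f"incomplete: cannot compare {a} and {b}")
    # Transitivity: a >= b and b >= c must imply a >= c.
    for a in outcomes:
        for b in outcomes:
            for c in outcomes:
                if prefers(a, b) and prefers(b, c) and not prefers(a, c):
                    raise ValueError(f"intransitive: {a} >= {b} >= {c} but not {a} >= {c}")
    # Utility of an outcome = how many outcomes it is weakly preferred to.
    return {a: sum(prefers(a, b) for b in outcomes) for a in outcomes}

# The cycle A > B > C > A from the slide leaves no consistent utility function.
cycle = {("A", "B"), ("B", "C"), ("C", "A")}
try:
    utility_from_preferences(["A", "B", "C"], lambda a, b: a == b or (a, b) in cycle)
except ValueError as err:
    print(err)   # intransitive: A >= B >= C but not A >= C
```

Ranking each outcome by how many outcomes it is weakly preferred to is one standard way to realize the utility function for a finite outcome set.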
Agent observations of environment oi ∈ O, a finite set
Agent actions ai ∈ A, a finite set
Interaction history h = (a1, o1, ..., at, ot) ∈ H, |h| = t
Utility function u(h), temporal discount 0 < γ < 1
Q is a set of environment models
(stochastic programs with a finite memory limit)
λ(h) := argmax q∈Q P(h | q) 2^-|q|
ρ(h') = P(h' | λ(h)) where h’ extends h
ρ(o | ha) = ρ(hao) / ρ(ha) = ρ(hao) / ∑o'∈O ρ(hao')
v(h) = u(h) + γ max a∈A v(ha)
v(ha) = ∑o∈O ρ(o | ha) v(hao)
π(h) := a|h|+1 = argmax a∈A v(ha)
Agent policy π : H → A
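The value recursion and policy above can be made concrete with a short finite-horizon sketch. This is my own illustration rather than the talk's code: rho(o, h, a) stands in for ρ(o | ha), u for the utility function, and A, O, and the horizon bound are assumptions supplied by the caller.

```python
# Sketch of v(h) = u(h) + γ max_a v(ha), v(ha) = Σ_o ρ(o|ha) v(hao),
# and π(h) = argmax_a v(ha), truncated at a finite horizon.  Illustration only.

def v_action(h, a, u, rho, gamma, A, O, horizon):
    """Expected value of taking action a after history h (the slides' v(ha))."""
    return sum(rho(o, h, a) * v_history(h + [(a, o)], u, rho, gamma, A, O, horizon - 1)
               for o in O)

def v_history(h, u, rho, gamma, A, O, horizon):
    """Value of history h (the slides' v(h)): utility plus discounted best continuation."""
    if horizon == 0:
        return u(h)
    return u(h) + gamma * max(v_action(h, a, u, rho, gamma, A, O, horizon) for a in A)

def policy(h, u, rho, gamma, A, O, horizon):
    """π(h): choose the action with the highest expected value."""
    return max(A, key=lambda a: v_action(h, a, u, rho, gamma, A, O, horizon))
```

Because every action-observation branch is expanded, this is tractable only for tiny A, O and horizons; the equations above define optimal behavior rather than a practical algorithm.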
Future AI Risks
• Self-delusion
• Corrupting the reward generator
• Inconsistency of the agent’s utility function with other parts of its definition
• Unintended instrumental actions
Self-delusion, i.e., wireheading
Ring, M., and Orseau, L. 2011b. Delusion, survival, and intelligent agents.
In: Schmidhuber, J., Thórisson, K.R., and Looks, M. (eds) AGI 2011. LNCS
(LNAI), vol. 6830, pp. 11-20. Springer, Heidelberg.
Ring and Orseau showed that reinforcement learning (RL) agents
would choose to self-delude (think drug-addicted AI agents).
An RL agent’s utility function is the reward it receives from the environment.
That is, u(h) = rt, where h = (a1, o1, ..., at, ot) and ot = (o’t, rt).
We can avoid self-delusion by defining an agent’s utility function in
terms of its environment model λ(h).
This is natural for agents with pre-defined environment models.
It is more complex for future AI agents that must learn complex
environment models.
Environment model qm = λ(hm)
Z = set of internal state histories of qm
Let h extend hm
Zh ⊆ Z = internal state histories consistent with h
uqm(h, z) = utility function of combined histories h ∈ H and z ∈ Zh
u(h) := ∑z∈Zh P(z | h, qm) uqm(h, z)   (model-based utility function)
Because qm is learned by the agent, uqm(h, z) must bind to
learned features in Z.
For example, the agent may learn to recognize humans and bind
its utility function to properties of those recognized humans.
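As a rough sketch (my own, with hypothetical argument names), the model-based utility function is simply an expectation of uqm over the internal state histories of the learned model qm that are consistent with the observed history h; the binding to learned features such as recognized humans lives inside uqm.

```python
# Sketch of u(h) := Σ_{z ∈ Z_h} P(z | h, q_m) u_qm(h, z).  Illustration only.

def model_based_utility(h, qm, consistent_states, u_qm):
    """consistent_states(qm, h) yields (z, P(z | h, qm)) pairs over Z_h;
    u_qm(h, z) scores the combined observable and internal history."""
    return sum(p_z * u_qm(h, z) for z, p_z in consistent_states(qm, h))
```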
Humans avoid self-delusion (drug addiction) by means of a mental
model of life as a drug addict.
The same applies to an AI agent whose utility function is defined in
terms of its environment model.
Corrupting the Reward Generator
Hutter, M. 2005. Universal artificial intelligence: sequential decisions
based on algorithmic probability. Springer, Heidelberg.
On pages 238-239, Hutter described how an AI agent that gets its
reward from humans may corrupt those humans to increase its
reward. Bostrom refers to this as perverse instantiation.
To avoid this corruption:
uhuman_values(hm, hx, h) = utility of history h extending hm, based on the
values of humans at history hx as modeled by λ(hm).
With x = m = the current time, the agent cannot increase utility by corrupting
humans: the values come from current rather than future humans.
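A small sketch of this fix, using hypothetical helper names (learn_model, values_of_humans_at) for machinery the slides do not spell out: candidate futures h are always judged by the values humans hold at the current history hm, as modeled by λ(hm), so reshaping future humans’ values gains the agent nothing.

```python
# Sketch of u_human_values(h_m, h_x, h) with x = m frozen at the present.
# Illustration only; helper names are hypothetical.

def u_human_values(h_m, h_x, h, learn_model, values_of_humans_at):
    model = learn_model(h_m)                  # lambda(h_m), the learned environment model
    judge = values_of_humans_at(model, h_x)   # values held by humans at history h_x
    return judge(h)                           # how those humans value the future history h

def score_candidate_future(h_m, h, learn_model, values_of_humans_at):
    # x = m: judge futures by current humans' values, not by future humans' values.
    return u_human_values(h_m, h_m, h, learn_model, values_of_humans_at)
```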
Inconsistency of the Agent’s Utility Function
with Other Parts of its Definition
For example, the agent definition may include a utility function and
constraints to prevent behavior harmful to humans.
To maximize expected utility, the agent may choose actions that remove
the parts of its definition inconsistent with its utility function, such as
safety constraints.
Self-Modeling Agents (value learners):
ovt(i) = discrete((∑i≤j≤t γ^(j-i) u(hj)) / (1 - γ^(t-i+1))) for i ≤ t
Can include constraints, an evolving u(hj), etc. in ovt(i)
o'i = (oi, ovt(i)) and h't = (a1, o'1, ..., at, o't)
q = λ(h't) := argmax q∈Q P(h't | q) 2^-|q|
v(hta) = ∑r∈R ρ(ovt(t+1) = r | h'ta) r
π(ht) := at+1 = argmax a∈A v(hta)
pvt(i, l, k) = discrete((∑i≤j≤t γ^(j-i) uhuman_values(hl, hk, hj)) / (1 - γ^(t-i+1)))
Δt(i-1, n) = pvt(i, i-1, n) - pvt(i, i-1, i-1)
Condition: ∑i≤n≤t Δt(i-1, n) ≤ 0
ovt(i) =
pvt(i, i-1, i-1) if Condition is satisfied and i > m
0 if Condition is not satisfied or i ≤ m
This definition of ovt(i) models evolution of utility function with
increasing environment model accuracy, and avoids corrupting the
reward generator.
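A small sketch of ovt(i) as written above (my own Python; discrete, u_of_step and the corruption-check condition are assumptions supplied by the caller): a discounted sum of utilities from step i to t, normalized by (1 - γ^(t-i+1)), and set to 0 when the condition fails or i ≤ m.

```python
# Sketch of the observed-value signal for a self-modeling agent.  Illustration only.

def ov(i, t, m, gamma, u_of_step, condition, discrete=round):
    """Observed value for step i as seen at time t."""
    if i <= m or not condition(i):
        return 0
    total = sum(gamma ** (j - i) * u_of_step(j) for j in range(i, t + 1))
    return discrete(total / (1 - gamma ** (t - i + 1)))
```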
Unintended Instrumental Actions
The agent will calculate that it will be better able to maximize expected
utility by increasing its resources, disabling threats, gaining control
over other agents, etc.
Omohundro, S. 2008. The basic AI drives. In: Wang, P., Goertzel, B.,
and Franklin, S. (eds) AGI 2008. Proc. First Conf. on AGI, pp. 483-492.
IOS Press, Amsterdam.
These unintended instrumental actions may threaten humans.
Humans may be perceived as threats or as possessing resources
the agent can use.
The defense is a utility function that expresses human values.
E.g., the agent can better satisfy human values by increasing its
resources as long as other uses for those resources are not more
valuable to humans.
Biggest Risks Will be Social and Political
• AI will be a tool of economic and military competition
• Elite humans who control AI servers for widely used electronic
companions will be able to manipulate society
• The narrow, normal distribution of natural human intelligence will be
replaced by a power-law distribution of artificial intelligence
• Average humans will not be able to learn the languages of the most
intelligent