Robot Learning, Future of Robotics

Autonomous Mobile Robots
CPE 470/670
Lecture 13
Instructor: Monica Nicolescu
Review
• Hybrid control
– Selection, Advising, Adaptation, Postponing
– AuRA, Atlantis, Planner-Reactor, PRS, many others
• Adaptive behavior
– Adaptation vs. learning
– Challenges
– Types of learning algorithms
CPE 470/670 - Lecture 13
Learning Methods
• Reinforcement learning
• Neural network (connectionist) learning
• Evolutionary learning
• Learning from experience
– Memory-based
– Case-based
• Learning from demonstration
• Inductive learning
• Explanation-based learning
• Multistrategy learning
Reinforcement Learning (RL)
• Motivated by psychology (the Law of Effect,
Thorndike 1911):
Applying a reward immediately after the
occurrence of a response increases its probability
of reoccurring, while providing punishment after
the response will decrease the probability
• One of the most widely used methods for adaptation
in robotics
Reinforcement Learning
• Combinations of stimuli
(i.e., sensory readings and/or state)
and responses (i.e., actions/behaviors)
are given positive/negative reward
in order to increase/decrease their probability of future use
• Desirable outcomes are strengthened and undesirable
outcomes are weakened
• Critic: evaluates the system’s response and applies
reinforcement
– external: the user provides the reinforcement
– internal: the system itself provides the reinforcement (reward
function)
Decision Policy
• The robot can observe the state of
the environment
• The robot has a set of actions it can perform
– Policy: state/action mapping that determines which
actions to take
• Reinforcement is applied based on the results of the
actions taken
– Utility: the function that gives a utility value to each state
• Goal: learn an optimal policy that chooses the best
action for every set of possible inputs
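The relationship between a utility function and a policy can be sketched in a few lines. This is a minimal illustration only: the four-state corridor, the utility values, and the names `utility`, `next_state`, and `greedy_policy` are all assumptions made for the example, not part of the lecture.

```python
# Hypothetical 1-D corridor with states 0..3; the goal is state 3.
# All names and numbers are illustrative assumptions.
ACTIONS = {"left": -1, "right": +1}

# Utility: a value assigned to each state (higher is better).
utility = {0: 0.1, 1: 0.3, 2: 0.6, 3: 1.0}

def next_state(state, action):
    """Deterministic transition, clipped to the ends of the corridor."""
    return min(max(state + ACTIONS[action], 0), 3)

def greedy_policy(utility):
    """Policy: map each state to the action that leads to the highest-utility state."""
    return {
        s: max(ACTIONS, key=lambda a: utility[next_state(s, a)])
        for s in utility
    }

policy = greedy_policy(utility)
print(policy)
```

Here the policy is derived from a utility table that is given in advance; in reinforcement learning the robot has to estimate such values from the reinforcement it receives.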
Unsupervised Learning
• RL is an unsupervised learning method:
– No target goal state
• Feedback only provides information on the quality of
the system’s response
– Simple: binary fail/pass
– Complex: numerical evaluation
• Through RL a robot learns on its own, using its own
experiences and the feedback received
• The robot is never told what to do
Challenges of RL
• Credit assignment problem:
– When something good or bad happens, what exact
state/condition-action/behavior should be rewarded or
punished?
• Learning from delayed rewards:
– It may take a long sequence of actions that receive
insignificant reinforcement to finally arrive at a state with
high reinforcement
– How can the robot learn from reward received at some
time in the future?
Challenges of RL
• Exploration vs. exploitation:
– Explore unknown states/actions or exploit states/actions
already known to yield high rewards
• Partially observable states
– In practice, sensors provide only partial information about
the state
– Choose actions that improve observability of environment
• Life-long learning
– In many situations it may be required that robots learn
several tasks within the same environment
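One common way to balance exploration against exploitation is an epsilon-greedy rule. The lecture does not name a specific strategy, so the rule itself, the function name `epsilon_greedy`, and the action-value estimates below are assumptions chosen for illustration.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the best-known one.

    q_values: dict mapping action -> estimated value (hypothetical estimates).
    """
    if rng.random() < epsilon:
        return rng.choice(list(q_values))       # explore
    return max(q_values, key=q_values.get)      # exploit

# Illustrative estimates for three actions in some state.
estimates = {"forward": 0.8, "turn_left": 0.2, "turn_right": 0.1}
rng = random.Random(0)
picks = [epsilon_greedy(estimates, epsilon=0.1, rng=rng) for _ in range(1000)]
print(picks.count("forward") / len(picks))
```

With a small epsilon the robot mostly exploits what it already knows but still occasionally tries other actions, so poor early estimates can be corrected.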
Types of RL Algorithms
• Adaptive Heuristic Critic (AHC)
• Learning the policy is separate from
learning the utility function the critic
uses for evaluation
• Idea: try different actions in
different states and observe
the outcomes over time
Q-Learning
• Watkins, 1989
• A single utility Q-function is learned
to evaluate both actions and states
• Q values are stored in a table
• Updated at each step, using the following rule:
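$$Q(x,a) \leftarrow Q(x,a) + \alpha \left( r + \gamma E(y) - Q(x,a) \right)$$
• x: state; a: action; $\alpha$: learning rate; r: reward; $\gamma$: discount factor, $\gamma \in (0,1)$
• E(y) is the utility of the state y reached after taking action a in state x: $E(y) = \max_a Q(y,a)$, taken over all actions a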
• Guaranteed to converge to optimal solution, given infinite trials
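The table-based update above can be sketched on a toy problem. This is a minimal sketch under stated assumptions: the five-state corridor, the reward of 1.0 at the goal, the parameter names `alpha` (learning rate) and `gamma` (discount factor), the epsilon-greedy action choice, and the treatment of the goal as a terminal state with zero utility are all choices made for the example, not details given in the lecture.

```python
import random
from collections import defaultdict

# Hypothetical corridor world: states 0..4, reward 1.0 on reaching state 4.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)  # step left, step right

def step(x, a):
    """Return (next state y, reward r, done flag) for action a in state x."""
    y = min(max(x + a, 0), N_STATES - 1)
    done = (y == GOAL)
    return y, (1.0 if done else 0.0), done

def choose_action(Q, x, epsilon, rng):
    """Epsilon-greedy choice with random tie-breaking (an assumed strategy)."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(Q[(x, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if Q[(x, a)] == best])

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)  # the Q table, keyed by (state, action)
    for _ in range(episodes):
        x, done = 0, False
        while not done:
            a = choose_action(Q, x, epsilon, rng)
            y, r, done = step(x, a)
            # E(y): utility of the next state; assumed zero once the goal is reached.
            E_y = 0.0 if done else max(Q[(y, b)] for b in ACTIONS)
            # Update rule from the slide.
            Q[(x, a)] += alpha * (r + gamma * E_y - Q[(x, a)])
            x = y
    return Q

Q = q_learning()
greedy = [max(ACTIONS, key=lambda act: Q[(x, act)]) for x in range(GOAL)]
print(greedy)
```

Only the step into the goal is rewarded directly; the discount factor is what lets that reward propagate back to earlier states over repeated episodes.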
Learning to Walk
• Maes, Brooks (1990)
• Genghis: hexapod robot
• Learned stable tripod
stance and tripod gait
• Rule-based subsumption
controller
• Two sensor modalities for feedback:
– Two touch sensors to detect hitting the floor: negative feedback
– Trailing wheel to measure progress: positive feedback
Learning to Walk
• Nate Kohl & Peter Stone (2004)
Learning to Push
• Mahadevan & Connell 1991
• Obelix: 8 ultrasonic sensors, 1 IR, motor current
• Learned how to push a box (Q-learning)
• Motor outputs grouped into 5 choices: move
forward, turn left or right (22 degrees), sharp
turn left/right (45 degrees)
• 250,000 states
Supervised Learning
• Supervised learning requires the user to give the
exact solution to the robot in the form of the error
direction and magnitude
• The user must know the exact desired behavior for
each situation
• Supervised learning involves training, which can be
very slow; the user must supervise the system with
numerous examples
Neural Networks
• One of the most widely used supervised learning methods
• Used for approximating real-valued and vector-valued target functions
• Inspired by biology: learning systems are built
from complex networks of interconnecting neurons
• The goal is to minimize the error between the
network output and the desired output
– This is achieved by adjusting the weights on the network
connections
Training Neural Networks
• Hebbian learning
– Increases synaptic strength along neural pathways
associated with a stimulus and a correct response
• Perceptron learning
– Delta Rule: for networks without hidden layers
– Back-propagation: for multi-layer networks
Perceptron Learning
Repeat
• Present an example from a set of positive and negative
learning experiences
• Verify the output of the network as to whether it is correct or
incorrect
• If it is incorrect, supply the correct output at the output unit
• Adjust the synaptic weights of the perceptrons in a manner
that reduces the error between the observed output and the
correct output
Until satisfactory performance (convergence or stopping
condition is met)
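The loop above can be sketched for a single threshold unit. This is a minimal sketch, assuming 0/1 targets, a learning rate of 1.0, and the logical AND function as the training set; the names `train_perceptron` and `predict` are illustrative, not from the lecture.

```python
def predict(w, b, x):
    """Output of a single threshold unit: 1 if the weighted sum exceeds 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def train_perceptron(examples, lr=1.0, max_epochs=100):
    """examples: list of (inputs, target) pairs with targets 0 or 1."""
    n_inputs = len(examples[0][0])
    w, b = [0.0] * n_inputs, 0.0
    for _ in range(max_epochs):                    # Repeat
        errors = 0
        for x, target in examples:                 # present an example
            err = target - predict(w, b, x)        # compare with the correct output
            if err != 0:
                # Adjust the weights in the direction that reduces the error.
                w = [wi + lr * err * xi for wi, xi in zip(w, x)]
                b += lr * err
                errors += 1
        if errors == 0:                            # Until convergence
            break
    return w, b

# Hypothetical training set: logical AND, which is linearly separable.
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(AND)
print([predict(w, b, x) for x, _ in AND])
```

The weights change only when the output is wrong, which matches the "if it is incorrect" step. For a problem that is not linearly separable the loop would stop at `max_epochs` without converging.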
ALVINN
• ALVINN (Autonomous Land
Vehicle in a Neural Network)
• Dean Pomerleau (1991)
• Pittsburgh to San Diego: 98.2%
autonomous
Learning from Demonstration & RL
• S. Schaal (’97)
• Pole balancing, pendulum-swing-up
Learning from Demonstration
Inspiration:
• Human-like teaching by demonstration
Demonstration
Robot performance
Learning from Robot Teachers
• Transfer of task knowledge from humans to robots
Human demonstration
Robot performance
Classical Conditioning
• Pavlov 1927
• Assumes that unconditioned stimuli (e.g. food)
automatically generate an unconditioned response
(e.g., salivation)
• Conditioned stimulus (e.g., ringing a bell) can, over
time, become associated with the unconditioned
response
Darwin VII
• G. Edelman et al.
• Low reflectivity walls, floor
• Two types of stimulus blocks
– 6cm metallic cubes
– Blobs: low conductivity (“bad taste”)
– Stripes: high conductivity (“good taste”)
• Darwin VII Sensors
– CCD camera
– Gripper that senses conductivity
– IR sensors
• Darwin VII Actuators
– PTZ camera
– Wheels
– Gripper
Darwin’s Perceptual Categorization
Early training
After the 10th stimulus
• Instead of hard-wiring stimulus-response rules,
develop these associations over time
Genetic Algorithms
• Inspired by evolutionary biology
• Individuals in a population have a particular fitness
with respect to a task
• Individuals with the highest fitness are kept as
survivors
• Individuals with poor performance are discarded: the
process of natural selection
• Evolutionary process: search through the space of
solutions to find the one with the highest fitness
Genetic Operators
• Knowledge is encoded as bit strings (chromosomes)
– Each bit represents a “gene”
• Biologically inspired operators are applied to yield
better generations
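A bit-string genetic algorithm can be sketched as follows. The slides do not name the operators, so single-point crossover and bit-flip mutation are assumed here as two typical ones. The toy fitness function (count of 1 bits), the population size, and all function names are also illustrative assumptions.

```python
import random

def fitness(chromosome):
    """Toy fitness (an assumption): the number of 1 bits in the string."""
    return sum(chromosome)

def crossover(parent_a, parent_b, rng):
    """Single-point crossover: a prefix of one parent joined to the other's suffix."""
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]

def mutate(chromosome, rate, rng):
    """Flip each bit ("gene") independently with probability `rate`."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in chromosome]

def evolve(n_bits=20, pop_size=30, generations=60, rate=0.02, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the fitter half as survivors, discard the rest.
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            mother, father = rng.sample(survivors, 2)
            children.append(mutate(crossover(mother, father, rng), rate, rng))
        population = survivors + children
    return max(population, key=fitness)

best = evolve()
print(fitness(best), best)
```

Because the survivors are carried over unchanged, the best fitness in the population never decreases from one generation to the next.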
Classifier Systems
• ALECSYS system
• Learns new behaviors and
coordination
• Genetic operators act upon a
set of rules encoded by bit
strings
• Demonstrated tasks:
– Phototaxis
– Coordination of approaching,
chasing and escaping
behaviors by combination,
suppression and sequencing
Evolving Structure and Control
• Karl Sims 1994
• Evolved morphology and control
for virtual creatures performing
swimming, walking, jumping,
and following
• Genotypes encoded as directed graphs are used to produce
3D kinematic structures
• Genotypes encode points of attachment
• Sensors used: contact, joint angle and photosensors
Evolving Structure and Control
• Jordan Pollack
– Real structures
Fuzzy Control
• Fuzzy control produces actions using a set of fuzzy
rules based on fuzzy logic
• In fuzzy logic, variables take values based on how
much they belong to a particular fuzzy set:
– Fast, slow, far, near – not crisp values!!
• A fuzzy logic control system consists of:
– Fuzzifier: maps sensor readings to fuzzy input sets
– Fuzzy rule base: collection of IF-THEN rules
– Fuzzy inference: maps fuzzy sets to other fuzzy sets
according to the rulebase
– Defuzzifier: maps fuzzy outputs to crisp actuator commands
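The four components listed above can be sketched for a single input and a single output. This is a minimal sketch under stated assumptions: the fuzzy sets `near` and `far`, their breakpoints in metres, the two rules, and the weighted-average defuzzifier are all chosen for illustration, and are not taken from any particular robot.

```python
def near(d):
    """Membership of distance d (metres) in 'near': 1 at 0 m, falling to 0 at 2 m."""
    return max(0.0, min(1.0, (2.0 - d) / 2.0))

def far(d):
    """Membership of distance d in 'far': 0 up to 1 m, rising to 1 at 3 m."""
    return max(0.0, min(1.0, (d - 1.0) / 2.0))

# Rule base: IF distance is <set> THEN speed is <representative value in m/s>.
RULES = [
    (near, 0.1),  # IF obstacle is near THEN go slow
    (far, 1.0),   # IF obstacle is far  THEN go fast
]

def fuzzy_speed(distance):
    # Fuzzifier and inference: each rule fires to the degree its condition holds.
    strengths = [(membership(distance), speed) for membership, speed in RULES]
    total = sum(strength for strength, _ in strengths)
    if total == 0.0:
        return 0.0  # no rule fired; stopping is an assumed conservative default
    # Defuzzifier: the weighted average of the rule outputs gives one crisp command.
    return sum(strength * speed for strength, speed in strengths) / total

for d in (0.5, 1.5, 2.5):
    print(d, round(fuzzy_speed(d), 3))
```

Between 1 m and 2 m both rules are partly active, so the commanded speed changes smoothly instead of jumping at a crisp threshold.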
Examples of Fuzzy Control
• Flakey the robot:
– Behaviors are encoded as collections of fuzzy rules
IF obstacle-close-in-front AND NOT obstacle-close-on-left
THEN turn sharp-left
– Each behavior may be active to a varying degree
– Behavior responses are blended smoothly
– Multiple goals can be pursued
• Systems for learning fuzzy rules have also been
developed
Where Next?
Fringe Robotics: Beyond Behavior
Questions for the future
• Human-like intelligence
• Robot consciousness
• Complete autonomy of complex thought and action
• Emotions and imagination in artificial systems
• Nanorobotics
• Successor to human beings
A Robot Mind
• The goal of AI is to build artificial minds
• What is the mind?
• “The mind is what the brain does.” (M. Minsky)
• The mind includes
– thinking
– feeling
Computational Thought
• What does it mean for a machine to think?
• Bellman
– Thought is not well defined, so we cannot ascribe/judge it
– Computers can perform processes representative of human
thought: decision making/learning
• Albus
– For robots to understand humans, they must be indistinguishable
from humans in bodily appearance, physical and mental
development
• Brooks:
– Thought and consciousness need not be programmed in: they
will emerge
The Turing Test
• Developed by the mathematician Alan Turing
Original version of Turing Test:
• Two people (a man and a woman) are put in
separate closed rooms. A third person can interact
with each of the two through writing (no voices).
• Can the 3rd person tell the difference between the
man and the woman?
The Turing Test
AI version of the Turing Test:
• A person sits in front of two terminals: one is connected
to a human, the other to a computer. The
questioner is free to ask any questions of the
respondents at the other end of the terminals
• If the questioner cannot tell the difference between
the computer and the human subject, the computer
has passed the Turing Test!
The Turing Test
• The Turing Test contest is held annually, and it
carries a $100,000 award for anybody who passes it
• No computer so far has truly passed the Turing Test
• Is this a good test of intelligence?
– Thought is defined based on human fallibility rather than on
machine consciousness
• Many researchers oppose using this test as a proof
of intelligence
Penrose’s Critique
• Roger Penrose (The Emperor’s New Mind, Shadows of the
Mind), a British physicist, is a famous critic of AI
• Intelligence is a consequence of neural activity and
interactions in the brain
• Computers can only simulate this activity, but this is
not sufficient for true intelligence
• Intelligence requires understanding, and
understanding requires awareness, an aspect of
consciousness
• Many refuting arguments have been given
“They're Made Out Of Meat”
Terry Bisson
"They're made out of meat."
"Meat?"
"Meat. They're made out of meat."
"Meat?"
"There's no doubt about it. We picked several from different
parts of the planet, took them aboard our recon vessels,
probed them all the way through. They're completely meat."
"That's impossible. What about the radio signals? The
messages to the stars."
"They use the radio waves to talk, but the signals don't come
from them. The signals come from machines."
"So who made the machines? That's who we want to contact."
“They're Made Out Of Meat”
Terry Bisson
"They made the machines. That's what I'm trying to tell you.
Meat made the machines."
"That's ridiculous. How can meat make a machine? You're
asking me to believe in sentient meat."
"I'm not asking you, I'm telling you. These creatures are the
only sentient race in the sector and they're made out of meat."
"Maybe they're like the Orfolei. You know, a carbon-based
intelligence that goes through a meat stage."
"Nope. They're born meat and they die meat. We studied
them for several of their life spans, which didn't take too long.
Do you have any idea what’s the life span of meat?"
"Spare me. Okay, maybe they're only part meat. You know,
like the Weddilei. A meat head with an electron plasma brain
inside."
“They're Made Out Of Meat”
Terry Bisson
"Nope. We thought of that, since they do have meat heads
like the Weddilei. But I told you, we probed them. They're
meat all the way through."
"No brain?"
"Oh, there is a brain all right. It's just that the brain is made
out of meat!"
"So... what does the thinking?"
"You're not understanding, are you? The brain does the
thinking. The meat."
"Thinking meat! You're asking me to believe in thinking meat!"
"Yes, thinking meat! Conscious meat! Loving meat. Dreaming
meat. The meat is the whole deal! Are you getting the
picture?"
Conclusion
Many interesting problems remain to be explored!
Get involved!
Readings
• Lecture notes