Advanced Topics in Robotics
CS493/790 (X)
Lecture 2
Instructor: Monica Nicolescu
Robot Components
• Sensors
• Effectors and actuators
– Used for locomotion and manipulation
• Controllers for the above systems
– Coordinating information from sensors
with commands for the robot’s actuators
• Robot = an autonomous system which exists in the
physical world, can sense its environment and can
act on it to achieve some goals
Sensors
• Sensor = physical device that provides
information about the world
– Process is called sensing or perception
• Robot sensing depends on the task
• Sensor (perceptual) space
– All possible values of sensor readings
– One needs to “see” the world through the robot’s “eyes”
– Grows quickly as you add more sensors
State Space
• State: A description of the robot (of a system in general)
• State space: All possible states a robot could be in
– E.g.: light switch has two states, ON, OFF; light switch with
dimmer has continuous state (possibly infinitely many states)
• Different from the sensor/perceptual space!
– Internal state may be used to store information about the world
(maps, location of “food”, etc.)
• How intelligent a robot appears depends strongly on how
much and how fast it can sense its environment and itself
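As a concrete (illustrative, not from the slides) Python sketch of the discrete vs. continuous state-space contrast; the class and names are invented for this example:

from dataclasses import dataclass

# Discrete state space: exactly two states.
LIGHT_SWITCH_STATES = {"ON", "OFF"}

@dataclass
class DimmerSwitch:
    """A dimmer has a continuous state: brightness in [0.0, 1.0]."""
    brightness: float = 0.0

    def set_brightness(self, value: float) -> None:
        # Clamp to the valid state space [0.0, 1.0].
        self.brightness = max(0.0, min(1.0, value))

if __name__ == "__main__":
    print("Plain switch state space:", LIGHT_SWITCH_STATES)
    dimmer = DimmerSwitch()
    dimmer.set_brightness(0.37)
    print("Dimmer state (one of infinitely many):", dimmer.brightness)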
Representation
• Internal state that stores information about the world
is called a representation or internal model
– Self: stored proprioception, goals, intentions, plans
– Environment: maps
– Objects, people, other robots
– Task: what needs to be done, when, in what order
• Representations and models determine the
complexity of a robot’s “brain”
Action
• Effectors: devices of the robot that have an impact on
the environment (legs, wings → robotic legs,
propellers)
• Actuators: mechanisms that allow the effectors to
do their work (muscles → motors)
• Robotic actuators are used for
– locomotion (moving around, going places)
– manipulation (handling objects)
Autonomy
• Autonomy is the ability to make one’s own decisions
and act on them.
– For robots: take the appropriate action in a given situation
• Autonomy can be complete (R2D2) or partial
(teleoperated robots)
Control Architectures
• Robot control is the means by which the sensing and
action of a robot are coordinated
• Controllers enable robots to be autonomous
– Play the role of the “brain” and nervous system in animals
• Controllers need not (should not) be a single program
– Typically more than one controller; each processes information
from sensors and decides what actions to take
– Should control modules be centralized?
• Challenge: how do all these controllers coordinate
with each other?
Spectrum of robot control
From “Behavior-Based Robotics” by R. Arkin, MIT Press, 1998
Robot control approaches
• Reactive Control
– Don’t think, (re)act.
• Deliberative (Planner-based) Control
– Think hard, act later.
• Hybrid Control
– Think and act separately & concurrently.
• Behavior-Based Control (BBC)
– Think the way you act.
Thinking vs. Acting
• Thinking/Deliberating
– slow, speed decreases with complexity
– involves planning (looking into the future) to avoid bad
solutions
– thinking too long may be dangerous
– requires (a lot of) accurate information
– flexible for increasing complexity
• Acting/Reaction
– fast, regardless of complexity
– innate/built-in or learned (from looking into the past)
– limited flexibility for increasing complexity
Reactive Control:
Don’t think, react!
• Technique for tightly coupling perception and action to provide
fast responses to changing, unstructured environments
• Collection of stimulus-response rules
• Advantages
– Very fast and reactive
– Powerful method: animals are largely reactive
• Limitations
– No/minimal state
– No memory
– No internal representations of the world
– Unable to plan ahead
– Unable to learn
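A minimal Python sketch of a reactive controller as a collection of stimulus-response rules (illustrative only; the sensor names and threshold are assumptions):

def reactive_controller(left_dist: float, right_dist: float) -> str:
    """Map current sensor readings directly to an action; no memory, no planning."""
    SAFE = 0.5  # metres; illustrative threshold
    if left_dist < SAFE and right_dist < SAFE:
        return "reverse"
    if left_dist < SAFE:
        return "turn_right"
    if right_dist < SAFE:
        return "turn_left"
    return "go_forward"

if __name__ == "__main__":
    # Each call depends only on the current stimulus.
    print(reactive_controller(0.3, 1.2))  # turn_right
    print(reactive_controller(1.0, 1.0))  # go_forward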
Deliberative Control:
Think hard, then act!
• In deliberative control the robot uses all the available sensory
information and stored internal knowledge to create a plan of action:
the sense → plan → act (SPA) paradigm
• Limitations
– Planning requires search through potentially all possible plans →
this can take a long time
– Requires a world model, which may become outdated
– Too slow for real-time response
• Advantages
– Capable of learning and prediction
– Finds strategic solutions
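A toy sense → plan → act loop in Python, using breadth-first search over a grid world model as the planner (an illustrative sketch; the grid world and function names are invented here):

from collections import deque

def plan(grid, start, goal):
    """Breadth-first search over a grid world model: returns a list of cells."""
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path
        r, c = path[-1]
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < len(grid) and 0 <= nc < len(grid[0])
                    and grid[nr][nc] == 0 and (nr, nc) not in visited):
                visited.add((nr, nc))
                frontier.append(path + [(nr, nc)])
    return None  # no plan found

if __name__ == "__main__":
    world = [[0, 0, 0],
             [1, 1, 0],
             [0, 0, 0]]                     # sense: build/update the world model
    route = plan(world, (0, 0), (2, 0))     # plan: search through possible paths
    for cell in route:                      # act: execute the plan step by step
        print("move to", cell)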
Hybrid Control:
Think and act independently & concurrently!
• Combination of reactive and deliberative control
– Reactive layer (bottom): deals with immediate reaction
– Deliberative layer (top): creates plans
– Middle layer: connects the two layers
• Usually called “three-layer systems”
• Major challenge: design of the middle layer
– Reactive and deliberative layers operate on very different
time-scales and representations (signals vs. symbols)
– These layers must operate concurrently
• Currently one of the two dominant control paradigms
in robotics
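A rough Python sketch of the three-layer idea: a deliberative plan, a reactive override, and a middle (executive) layer mediating between them. The class and method names are illustrative, not an architecture from the lecture:

class HybridController:
    """Toy three-layer structure: deliberative plan + reactive override,
    mediated by a simple middle (executive) layer."""

    def __init__(self, plan):
        self.plan = list(plan)   # deliberative layer output (symbolic steps)

    def reactive_layer(self, obstacle_close: bool):
        # Immediate, signal-level response.
        return "stop" if obstacle_close else None

    def executive_layer(self, obstacle_close: bool):
        # Middle layer: let the reactive layer pre-empt the plan when needed.
        override = self.reactive_layer(obstacle_close)
        if override:
            return override
        return self.plan.pop(0) if self.plan else "idle"

if __name__ == "__main__":
    ctrl = HybridController(["go_to_door", "open_door", "enter_room"])
    for obstacle in (False, True, False, False, False):
        print(ctrl.executive_layer(obstacle))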
Behavior-Based Control:
Think the way you act!
• Behaviors: concurrent processes that take inputs from
sensors and other behaviors and send outputs to a
robot’s actuators or other behaviors to achieve some
goals
• An alternative to hybrid control, inspired by biology
• Has the same capabilities as hybrid control:
– Act reactively and deliberatively
• Also built from layers
– However, there is no intermediate layer
– Components have a uniform representation and time-scale
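An illustrative Python sketch of behaviors with a uniform representation, where one behavior reads another's output; the behavior names and the trivial coordination rule are assumptions for the example (true concurrency is omitted):

def avoid(sensors, outputs):
    """Behavior: produce a turn command when an obstacle is near."""
    return "turn_left" if sensors["front_dist"] < 0.5 else None

def seek_light(sensors, outputs):
    """Behavior: head toward light unless 'avoid' is already active."""
    if outputs.get("avoid") is None and sensors["light_dir"] == "right":
        return "turn_right"
    return None

def wander(sensors, outputs):
    """Behavior: default forward motion."""
    return "go_forward"

BEHAVIORS = [avoid, seek_light, wander]   # all share the same representation

def step(sensors):
    outputs = {}
    for b in BEHAVIORS:                   # every behavior is evaluated each cycle
        outputs[b.__name__] = b(sensors, outputs)
    # Pick the first active output (a trivial coordination rule for the sketch).
    return next(cmd for cmd in outputs.values() if cmd is not None)

if __name__ == "__main__":
    print(step({"front_dist": 2.0, "light_dir": "right"}))  # turn_right
    print(step({"front_dist": 0.3, "light_dir": "right"}))  # turn_left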
Fundamental Differences of Control
• Time-scale: How fast do things happen?
– how quickly the robot has to respond to the environment,
compared to how quickly it can sense and think
• Modularity: What are the components of the control system?
– Refers to the way the control system is broken up into
modules and how they interact with each other
• Representation: What does the robot keep in its brain?
– The form in which information is stored or encoded in the
robot
Behavior Coordination
• Behavior-based systems require consistent
coordination between the component behaviors for
conflict resolution
• Coordination of behaviors can be:
– Competitive: one behavior’s output is selected from
multiple candidates
– Cooperative: blend the output of multiple behaviors
– Combination of the above two
Competitive Coordination
• Arbitration: winner-take-all strategy → only one
response chosen
• Behavioral prioritization
– Subsumption Architecture
• Action selection/activation spreading (Pattie Maes)
– Behaviors actively compete with each other
– Each behavior has an activation level driven by the robot’s
goals and sensory information
• Voting strategies
– Behaviors cast votes on potential responses
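A minimal Python sketch of priority-based arbitration (subsumption-style winner-take-all); the behaviors and priority ordering are invented for illustration:

# Behaviors listed from highest to lowest priority.
def escape(sensors):
    return "reverse" if sensors["bumper"] else None

def avoid(sensors):
    return "turn_left" if sensors["front_dist"] < 0.5 else None

def cruise(sensors):
    return "go_forward"

PRIORITY = [escape, avoid, cruise]

def arbitrate(sensors):
    """Winner-take-all: the highest-priority behavior with an output wins."""
    for behavior in PRIORITY:
        command = behavior(sensors)
        if command is not None:
            return command

if __name__ == "__main__":
    print(arbitrate({"bumper": False, "front_dist": 0.3}))  # turn_left
    print(arbitrate({"bumper": True,  "front_dist": 0.3}))  # reverse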
Cooperative Coordination
• Fusion: concurrently use the output of multiple
behaviors
• Major difficulty: finding a uniform command
representation amenable to fusion
• Fuzzy methods
• Formal methods
– Potential fields
– Motor schemas
– Dynamical systems
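An illustrative Python sketch of cooperative fusion in the potential-field/motor-schema spirit: each behavior outputs a velocity vector and the outputs are summed (the gains and distances are made-up values):

import math

def attract_to_goal(robot, goal, gain=1.0):
    """Vector pointing toward the goal."""
    dx, dy = goal[0] - robot[0], goal[1] - robot[1]
    d = math.hypot(dx, dy) or 1.0
    return (gain * dx / d, gain * dy / d)

def repel_from_obstacle(robot, obstacle, gain=0.8, radius=2.0):
    """Vector pointing away from the obstacle, fading with distance."""
    dx, dy = robot[0] - obstacle[0], robot[1] - obstacle[1]
    d = math.hypot(dx, dy)
    if d == 0 or d > radius:
        return (0.0, 0.0)
    strength = gain * (radius - d) / radius
    return (strength * dx / d, strength * dy / d)

def fuse(vectors):
    """Cooperative coordination: sum the behaviors' output vectors."""
    return (sum(v[0] for v in vectors), sum(v[1] for v in vectors))

if __name__ == "__main__":
    robot, goal, obstacle = (0.0, 0.0), (5.0, 0.0), (1.0, 0.5)
    command = fuse([attract_to_goal(robot, goal),
                    repel_from_obstacle(robot, obstacle)])
    print("fused velocity command:", command)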
Example of Behavior Coordination
Fusion: flocking (formations)
Arbitration:  foraging (search, coverage)
Learning & Adaptive Behavior
• Learning produces changes within an agent that
over time enable it to perform more effectively within
its environment
• Adaptation refers to an agent’s learning by making
adjustments in order to be more attuned to its
environment
– Phenotypic (within an individual agent) or genotypic
(evolutionary)
– Acclimatization (slow) or homeostasis (rapid)
Learning
Learning can improve performance in additional ways:
• Introduce new knowledge (facts, behaviors, rules)
• Generalize concepts
• Specialize concepts for specific situations
• Reorganize information
• Create or discover new concepts
• Create explanations
• Reuse past experiences
Learning Methods
• Reinforcement learning
• Neural network (connectionist) learning
• Evolutionary learning
• Learning from experience
– Memory-based
– Case-based
• Learning from demonstration
• Inductive learning
• Explanation-based learning
• Multistrategy learning
Reinforcement Learning (RL)
• Motivated by psychology (the Law of Effect,
Thorndike 1911):
Applying a reward immediately after the
occurrence of a response increases its probability
of reoccurring, while providing punishment after
the response will decrease the probability
• One of the most widely used methods for adaptation
in robotics
Reinforcement Learning
• Goal: learn an optimal policy that chooses the
best action for every set of possible inputs
• Policy: state/action mapping that determines
which actions to take
• Desirable outcomes are strengthened and undesirable
outcomes are weakened
• Critic: evaluates the system’s response and applies
reinforcement
– external: the user provides the reinforcement
– internal: the system itself provides the reinforcement (reward
function)
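A compact tabular Q-learning sketch in Python on a made-up 4-state chain world, showing a policy learned from an internal reward function acting as the critic (all parameters and names are illustrative):

import random

# Tiny chain world: states 0..3, goal at state 3; actions move left/right.
ACTIONS = (-1, +1)
GOAL, N_STATES = 3, 4
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.2   # learning rate, discount, exploration

def step(state, action):
    nxt = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if nxt == GOAL else 0.0     # critic: internal reward function
    return nxt, reward

for episode in range(200):
    s = 0
    while s != GOAL:
        # epsilon-greedy action selection
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2, r = step(s, a)
        # strengthen/weaken the state-action value based on the outcome
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, act)] for act in ACTIONS) - Q[(s, a)])
        s = s2

# The learned policy: best action in each non-goal state.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)   # expected: each state maps to +1 (move toward the goal)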
Learning to Walk
• Maes, Brooks (1990)
• Genghis: hexapod robot
• Learned stable tripod
stance and tripod gait
• Rule-based subsumption
controller
• Two sensor modalities for feedback:
– Two touch sensors to detect hitting the floor: negative feedback
– Trailing wheel to measure progress: positive feedback
Learning to Walk
• Nate Kohl & Peter Stone (2004)
Supervised Learning
• Supervised learning requires the user to give the
exact solution to the robot in the form of the error
direction and magnitude
• The user must know the exact desired behavior for
each situation
• Supervised learning involves training, which can be
very slow; the user must supervise the system with
numerous examples
Neural Networks
• One of the most widely used supervised learning methods
• Used for approximating real-valued and vector-valued target functions
• Inspired by biology: learning systems are built
from complex networks of interconnected neurons
• The goal is to minimize the error between the
network output and the desired output
– This is achieved by adjusting the weights on the network
connections
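A minimal Python sketch of supervised weight adjustment: a single sigmoid neuron trained by gradient descent to reduce the error between its output and the target (the OR task and learning rate are chosen just for illustration):

import math, random

# Training data for logical OR: inputs -> target output.
DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# One neuron: two weights plus a bias, initialised randomly.
w = [random.uniform(-0.5, 0.5) for _ in range(2)]
b = random.uniform(-0.5, 0.5)
lr = 0.5   # learning rate

for epoch in range(5000):
    for (x1, x2), target in DATA:
        out = sigmoid(w[0] * x1 + w[1] * x2 + b)
        err = target - out                 # error between network output and target
        grad = err * out * (1 - out)       # gradient of squared error w.r.t. net input
        # Adjust the connection weights to reduce the error.
        w[0] += lr * grad * x1
        w[1] += lr * grad * x2
        b    += lr * grad

for (x1, x2), target in DATA:
    print((x1, x2), round(sigmoid(w[0] * x1 + w[1] * x2 + b), 2), "target", target)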
ALVINN
• ALVINN (Autonomous Land
Vehicle in a Neural Network)
• Dean Pomerleau (1991)
• Pittsburgh to San Diego: 98.2%
autonomous
Learning from Demonstration & RL
• S. Schaal (’97)
• Pole balancing, pendulum-swing-up
Learning from Demonstration
Inspiration:
• Human-like teaching by demonstration
Demonstration
Robot performance
Learning from Robot Teachers
• Transfer of task knowledge from humans to robots
Human demonstration
Robot performance
Classical Conditioning
• Pavlov 1927
• Assumes that unconditioned stimuli (e.g. food)
automatically generate an unconditioned response
(e.g., salivation)
• Conditioned stimulus (e.g., ringing a bell) can, over
time, become associated with the unconditioned
response
Darwin’s Perceptual Categorization
• Two types of stimulus blocks
– 6 cm metallic cubes
– Blobs: low conductivity (“bad taste”)
– Stripes: high conductivity (“good taste”)
• Instead of hard-wiring stimulus-response rules, develop these associations over
time
[Figure panels: early training vs. after the 10th stimulus]
Genetic Algorithms
• Inspired by evolutionary biology
• Individuals in a population have a particular fitness
with respect to a task
• Individuals with the highest fitness are kept as
survivors
• Individuals with poor performance are discarded: the
process of natural selection
• Evolutionary process: search through the space of
solutions to find the one with the highest fitness
Genetic Operators
• Knowledge is encoded as bit strings: chromosome
– Each bit represents a “gene”
• Biologically inspired operators are applied to yield
better generations
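A small Python sketch of a genetic algorithm over bit-string chromosomes, showing fitness evaluation, selection of survivors, single-point crossover, and mutation on the toy "one-max" problem (all parameters are illustrative):

import random

GENES, POP, GENERATIONS = 16, 20, 40

def fitness(chromosome):
    """Toy fitness: number of 1-bits (the 'one-max' problem)."""
    return sum(chromosome)

def crossover(a, b):
    point = random.randrange(1, GENES)        # single-point crossover
    return a[:point] + b[point:]

def mutate(c, rate=0.05):
    return [bit ^ 1 if random.random() < rate else bit for bit in c]

population = [[random.randint(0, 1) for _ in range(GENES)] for _ in range(POP)]
for gen in range(GENERATIONS):
    # Selection: keep the fittest half as survivors.
    population.sort(key=fitness, reverse=True)
    survivors = population[:POP // 2]
    # Fill the next generation with mutated offspring of random survivor pairs.
    offspring = [mutate(crossover(random.choice(survivors), random.choice(survivors)))
                 for _ in range(POP - len(survivors))]
    population = survivors + offspring

print("best fitness:", fitness(max(population, key=fitness)), "out of", GENES)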
Evolving Structure and Control
• Karl Sims 1994
• Evolved morphology and control
for virtual creatures performing
swimming, walking, jumping,
and following
• Genotypes encoded as directed graphs are used to produce
3D kinematic structures
• Genotypes encode points of attachment
• Sensors used: contact, joint angle and photosensors
Evolving Structure and Control
• Jordan Pollack
– Real structures
Readings
• F. Martin: Sections 1.1, 1.2.3, 5
• M. Matarić: Chapters 1, 3, 10