LEARNING FROM OBSERVATION:
Introduction
Observing a task being performed or attempted by someone else often accelerates human learning. If robots can be programmed to use such observations to accelerate learning, their usability and functionality will increase while programming and learning time will decrease. This research explores the use of task primitives in robot learning from observation. A framework has been developed in which the agent uses observed data to learn a task initially and then goes on to improve its performance through repeated task performance (learning from practice). Data collected while a human performs a task is parsed into small parts of the task called primitives. A module is created for each primitive type that encodes the movements required during the performance of the primitive, and when and where the primitives are performed. The feasibility of this method is currently being tested with agents that learn to play a virtual and an actual air hockey game. The terms robot and agent are used interchangeably to refer to an algorithm that senses its environment and can control objects in either a hardware or software domain.
Observing the Task:
The task to be performed must first be observed. For a human learner this mostly involves vision. In order for the robot to learn from observing a task being performed, it must have some way to sense what is occurring in the environment. This research does not seek new ways to use the robot's existing sensors to observe performance; the agents will be given whatever equipment is necessary to observe the performance, or will be given information that represents the performance. The equipment may include a camera or some type of motion capture device. Research is also being performed in virtual environments, where the state of objects is directly available from the simulation algorithm.
Learning from Observation:
Components of the performance element:
A direct mapping from conditions on the current state to actions.
A means to infer relevant properties of the world from the percept sequence.
Information about the way the world evolves.
Information about the results of possible actions the agent can take.
Utility information indicating the desirability of world states.
Action-value information indicating the desirability of particular actions in particular states.
Goals that describe classes of states whose achievement maximizes the agent's utility.
Representation of the components:
Any situation in which both the inputs and outputs of a component can be perceived is called
supervised learning.
In learning the condition-action component, the agent receives some evaluation of its action
(such as a hefty bill for rear-ending the car in front) but is not told the correct action (to brake
more gently and much earlier). This is called reinforcement learning.
Learning when there is no hint at all about the correct outputs is called unsupervised
learning.
Inductive Learning:
In supervised learning, the learning element is given the correct (or approximately correct) value of the function for particular inputs, and changes its representation of the function to try to match the information provided by the feedback. More formally, we say an example is a pair (x, f(x)), where x is the input and f(x) is the output of the function applied to x. The task of pure inductive inference (or induction) is this: given a collection of examples of f, return a function h that approximates f. The function h is called a hypothesis.
Learning decision trees:
A decision tree takes as input an object or situation described by a set of properties, and outputs a yes/no "decision".
The decision tree learning algorithm.
function DECISION-TREE-LEARNING(examples, attributes, default) returns a decision tree
  inputs: examples, set of examples
          attributes, set of attributes
          default, default value for the goal predicate
  if examples is empty then return default
  else if all examples have the same classification then return the classification
  else if attributes is empty then return MAJORITY-VALUE(examples)
  else
    best <- CHOOSE-ATTRIBUTE(attributes, examples)
    tree <- a new decision tree with root test best
    for each value v_i of best do
      examples_i <- {elements of examples with best = v_i}
      subtree <- DECISION-TREE-LEARNING(examples_i, attributes - best, MAJORITY-VALUE(examples))
      add a branch to tree with label v_i and subtree subtree
    end
    return tree
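As a concrete illustration, the following is a rough Python rendering of the pseudocode above. It is a sketch rather than the textbook's implementation: examples are assumed to be (attribute_dict, classification) pairs, and the attribute-selection function is a simple placeholder that a real learner would replace with an information-gain criterion (see the sketch in the "Using information theory" part of these notes).

from collections import Counter

def majority_value(examples):
    # Most common classification among the examples.
    return Counter(c for _, c in examples).most_common(1)[0][0]

def choose_attribute(attributes, examples):
    # Placeholder: a real learner would pick the attribute with the
    # highest information gain.
    return attributes[0]

def decision_tree_learning(examples, attributes, default):
    if not examples:
        return default
    classifications = [c for _, c in examples]
    if len(set(classifications)) == 1:
        return classifications[0]
    if not attributes:
        return majority_value(examples)
    best = choose_attribute(attributes, examples)
    tree = {best: {}}                          # root test on the best attribute
    for v in {a[best] for a, _ in examples}:   # branches only for observed values
        examples_v = [(a, c) for a, c in examples if a[best] == v]
        subtree = decision_tree_learning(
            examples_v,
            [a for a in attributes if a != best],
            majority_value(examples))
        tree[best][v] = subtree                # branch labelled v
    return tree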
The performance of the learning algorithm can be assessed as follows:
1. Collect a large set of examples.
2. Divide it into two disjoint sets: the training set and the test set.
3. Use the learning algorithm with the training set as examples to generate a hypothesis H.
4. Measure the percentage of examples in the test set that are correctly classified by H.
5. Repeat steps 1 to 4 for different sizes of training sets and different randomly selected training sets of each size.
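A minimal Python sketch of this train/test methodology is shown below. The learner here is only a majority-class placeholder (an assumption for illustration); in practice it would be replaced by DECISION-TREE-LEARNING or any other supervised learner.

import random

def evaluate(examples, train_fraction=0.7):
    # Steps 1-2: shuffle the examples and split them into disjoint sets.
    examples = list(examples)
    random.shuffle(examples)
    split = int(train_fraction * len(examples))
    training_set, test_set = examples[:split], examples[split:]

    # Step 3: generate a hypothesis H from the training set
    # (placeholder learner: always predict the majority class).
    labels = [label for _, label in training_set]
    majority = max(set(labels), key=labels.count)
    H = lambda x: majority

    # Step 4: measure the percentage of test examples correctly classified by H.
    correct = sum(1 for x, label in test_set if H(x) == label)
    return 100.0 * correct / len(test_set)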
Reinforcement learning:
The task of reinforcement learning is to use rewards to learn a successful agent function. The learning task can vary in several ways:
The environment can be accessible or inaccessible. In an accessible environment, states can
be identified with percepts, whereas in an inaccessible environment, the agent must maintain
some internal state to try to keep track of the environment.
The agent can begin with knowledge of the environment and the effects of its actions; or it
will have to learn this model as well as utility information.
Rewards can be received only in terminal states, or in any state.
Rewards can be components of the actual utility (points for a ping-pong agent or dollars for a betting agent) that the agent is trying to maximize, or they can be hints as to the actual utility ("nice move" or "bad dog").
The agent can be a passive learner or an active learner. A passive learner simply watches the
world going by, and tries to learn the utility of being in various states; an active learner must
also act using the learned information, and can use its problem generator to suggest
explorations of unknown portions of the environment.
Decision tree learning is used in applications such as designing oil platform equipment and learning to fly.
Using information theory:
The information gained from the attribute test is measured as the difference between the original information requirement and the new requirement.
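As a sketch of this idea for a boolean classification, the gain of an attribute is the entropy of the example set minus the expected entropy remaining after the split. The counts p and n of positive and negative examples, and the per-value counts in splits, are the assumed inputs here.

import math

def information(p, n):
    # Entropy (in bits) of a classification with p positive and n negative examples.
    total = p + n
    if p == 0 or n == 0:
        return 0.0
    pp, pn = p / total, n / total
    return -(pp * math.log2(pp) + pn * math.log2(pn))

def information_gain(p, n, splits):
    # splits is a list of (p_i, n_i) counts, one pair per value of the attribute.
    # Gain = original information requirement - weighted requirement after the test.
    remainder = sum((pi + ni) / (p + n) * information(pi, ni)
                    for pi, ni in splits)
    return information(p, n) - remainder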
Noise and overfitting in data
Whenever there is a large set of possible hypotheses, one has to be careful not to use the resulting freedom to find meaningless "regularity" in the data. This problem is called overfitting.
Techniques that address the danger of overfitting: pruning and cross-validation.
Broadening the applicability of decision trees:
Issues that must be addressed:
Missing data
Multivalued attributes
Continuous-valued attributes
Learning general logical descriptions
A hypothesis proposes an expression, which we call a candidate definition of the goal predicate.
An example is a false negative for the hypothesis if the hypothesis says it should be negative but in fact it is positive.
An example is a false positive for the hypothesis if the hypothesis says it should be positive but in fact it is negative.
Current-best-hypothesis search
The current-best-hypothesis learning algorithm. It searches for a consistent hypothesis and
backtracks when no consistent specialization/generalization can be found.
function CURRENT-BEST-LEARNING(examples) returns a hypothesis
  H <- any hypothesis consistent with the first example in examples
  for each remaining example e in examples do
    if e is a false positive for H then
      H <- choose a specialization of H consistent with examples
    else if e is a false negative for H then
      H <- choose a generalization of H consistent with examples
    if no consistent specialization/generalization can be found then fail
  end
  return H
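What counts as a specialization or generalization depends on the hypothesis representation. The Python sketch below (an illustrative assumption, not the textbook's code) represents a hypothesis as a conjunction of attribute = value constraints: a false negative is handled by dropping violated constraints, and a false positive by restoring a constraint taken from the first positive example; unlike the full algorithm, it simply fails instead of backtracking over earlier choices.

def predicts_positive(h, x):
    # A hypothesis h (a set of (attribute, value) constraints) classifies an
    # example's attribute dict x as positive when every constraint holds.
    return all(x.get(a) == v for a, v in h)

def consistent(h, examples):
    return all(predicts_positive(h, x) == label for x, label in examples)

def current_best_learning(examples):
    # examples: list of (attribute_dict, label) pairs, label True for positive.
    (x0, label0), rest = examples[0], examples[1:]
    if not label0:
        raise ValueError("this sketch assumes the first example is positive")
    h = set(x0.items())                 # maximally specific initial hypothesis
    seen = [(x0, label0)]
    for x, label in rest:
        seen.append((x, label))
        if predicts_positive(h, x) and not label:
            # false positive: specialise by restoring a constraint from x0
            # that this negative example violates
            candidates = [c for c in x0.items()
                          if c not in h and x.get(c[0]) != c[1]]
            for c in candidates:
                if consistent(h | {c}, seen):
                    h = h | {c}
                    break
            else:
                raise RuntimeError("no consistent specialisation found")
        elif not predicts_positive(h, x) and label:
            # false negative: generalise by dropping the violated constraints
            h2 = {(a, v) for a, v in h if x.get(a) == v}
            if not consistent(h2, seen):
                raise RuntimeError("no consistent generalisation found")
            h = h2
    return h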
Introduction:
Supervised Learning
Inductive Learning
Analogical Learning
Introduction
Learning is an inherent characteristic of human beings. By virtue of this, people, while executing similar tasks, acquire the ability to improve their performance. This chapter provides an overview of the principles of learning that can be applied to machines to improve their performance. Such learning is usually referred to as 'machine learning'. Machine
learning can be broadly classified into three categories: i) Supervised learning, ii)
Unsupervised learning and iii) Reinforcement learning. Supervised learning requires a trainer,
who supplies the input-output training instances. The learning system adapts its parameters
by some algorithm to generate the desired output patterns from a given input pattern. In the absence of a trainer, the desired output for a given input instance is not known, and consequently the learner has to adapt its parameters autonomously. Such learning is termed 'unsupervised learning'. The third type, called reinforcement learning, bridges the gap between the supervised and unsupervised categories. In reinforcement learning, the learner
does not explicitly know the input-output instances, but it receives some form of feedback
from its environment. The feedback signals help the learner to decide whether its action on
the environment is rewarding or punishable. The learner thus adapts its parameters based on
the states (rewarding / punishable) of its actions. Among the supervised learning techniques,
the most common are inductive and analogical learning. The inductive learning technique,
presented in the chapter, includes decision tree and version space based learning. Analogical
learning is briefly introduced through illustrative examples. The principle of unsupervised
learning is illustrated here with a clustering problem. The section on reinforcement learning
includes Q-learning and temporal difference learning. A fourth category of learning, which
has emerged recently from the disciplines of knowledge engineering, is called 'inductive logic
programming'. The principles of inductive logic programming have also been briefly
introduced in this chapter. The chapter ends with a brief discussion on the 'computational
theory of learning'. With the background of this theory, one can measure the performance of
the learning behavior of a machine from the training instances and their count.
Supervised Learning:
As already mentioned, in supervised learning a trainer submits the input-output exemplary patterns and the learner has to adjust the parameters of the system autonomously, so that it can yield the correct output pattern when excited with one of the given input patterns. We shall cover two important types of supervised learning in this section: i) inductive learning and ii) analogical learning.
Inductive Learning:
In supervised learning we have a set of {xi, f(xi)} for 1≤i≤n, and our aim is to determine 'f' by some adaptive algorithm. Inductive learning is a special class of supervised learning techniques where, given a set of {xi, f(xi)} pairs, we determine a hypothesis h(xi) such that h(xi) ≈ f(xi), ∀ i. A natural question that may be raised is how to compare hypotheses h that approximate f. For instance, there could be more than one h(xi), all of which are approximately close to f(xi). Let there be two hypotheses h1 and h2, where h1(xi) ≈ f(xi) and h2(xi) ≈ f(xi). We may select one of the two hypotheses by a preference criterion, called bias.
When {xi, f(xi)}, 1≤ ∀i ≤ n are numerical quantities we may employ the neural learning
techniques presented in the next chapter. Readers may wonder: could we find 'f' by curve fitting as well? Should we then call curve fitting a learning technique? The answer, of course, is in the negative. A learning algorithm for such numerical sets {xi, f(xi)} must be able to adapt the parameters of the learner: the more training instances there are, the larger the number of adaptations. But what happens when xi and f(xi) are non-numerical? For instance, suppose we are given the truth table of the following training instances.
Truth Table: Training Instances
Here we may denote bi = f(ai, ai → bi) for all i = 1 to n. From these training instances we infer a generalized hypothesis h as follows:
h ≡ ∀i ((ai ∧ (ai → bi)) ⇒ bi).
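The induced hypothesis is simply modus ponens. As a small illustrative check (an added example, not from the text), one can verify over every truth assignment that b follows whenever both a and a → b hold:

def implies(p, q):
    return (not p) or q

for a in (False, True):
    for b in (False, True):
        if a and implies(a, b):
            assert b   # b is true whenever a and (a -> b) are both true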
Analogical Learning:
In inductive learning we observed that there exist many positive and negative instances of a
problem and the learner has to form a concept that supports most of the positive and no
negative instances. This demonstrates that a number of training instances are required to
form a concept in inductive learning. Unlike this, analogical learning can be accomplished
from a single example. For instance, given the training instance that the plural form of fungus is fungi, one has to determine the plural form of bacillus.
Obviously, one can answer that the plural form of bacillus is bacilli. But how do we do so? From common sense reasoning, it follows that the result comes from the similarity of bacillus with fungus. The analogical learning system thus learns that the way to form the plural of a word ending in 'us' is to replace the 'us' with 'i'.
The main steps in analogical learning are now formalized below.
Identifying Analogy: Identify the similarity between an experienced problem instance and a
new problem.
Determining the Mapping Function: Relevant parts of the experienced problem are selected
and the mapping is determined.
Apply Mapping Function: Apply the mapping function to transform the new problem from the
given domain to the target domain.
Validation: The newly constructed solution is validated for its applicability through trial processes such as theorem proving or simulation.
Learning: If the validation is found to work well, the new knowledge is encoded and saved for
future usage.
Decision Trees:
"A decision tree takes as input an object or situation described by a set of properties, and
outputs a yes/no decision. Decision trees therefore represent Boolean functions. Functions
with a larger range of outputs can also be represented...."
Decision tree advantages:
Amongst other data mining methods, decision trees have various advantages:
Simple to understand and interpret. People are able to understand decision tree models after
a brief explanation.
Requires little data preparation. Other techniques often require data normalisation, dummy variables to be created and blank values to be removed.
Able to handle both numerical and categorical data. Other techniques are usually specialised in analysing datasets that have only one type of variable. For example, relation rules can be used only with nominal variables, while neural networks can be used only with numerical variables.
Uses a white box model. If a given situation is observable in a model the explanation for the
condition is easily explained by boolean logic. An example of a black box model is an artificial
neural network since the explanation for the results is difficult to understand.
Possible to validate a model using statistical tests. That makes it possible to account for the
reliability of the model.
Robust. Performs well even if its assumptions are somewhat violated by the true model from
which the data were generated.
Performs well with large data in a short time. Large amounts of data can be analysed using personal computers in a time short enough to enable stakeholders to make decisions based on the analysis.
Decision tree learning is a method commonly used in data mining. The goal is to create a model that predicts the value of a target variable based on several input variables. Each interior node corresponds to one of the input variables;
there are edges to children for each of the possible values of that input variable. Each leaf
represents a value of the target variable given the values of the input variables represented
by the path from the root to the leaf.
A tree can be "learned" by splitting the source set into subsets based on an attribute value
test. This process is repeated on each derived subset in a recursive manner called recursive
partitioning. The recursion is completed when the subset at a node all has the same value of
the target variable, or when splitting no longer adds value to the predictions.
In data mining, trees can be described also as the combination of mathematical and
computational techniques to aid the description, categorisation and generalisation of a given
set of data.
Data comes in records of the form (x, Y) = (x1, x2, x3, ..., xk, Y).
The dependent variable, Y, is the target variable that we are trying to understand, classify or
generalize. The vector x is composed of the input variables, x1, x2, x3 etc., that are used for
that task.
Decision trees used in data mining are of two main types:
Classification tree analysis is when the predicted outcome is the class to which the data
belongs.
Regression tree analysis is when the predicted outcome can be considered a real number (e.g.
the price of a house, or a patient’s length of stay in a hospital).
The term Classification And Regression Tree (CART) analysis is an umbrella term used to refer
to both of the above procedures, first introduced by Breiman et al.[1] Trees used for
regression and trees used for classification have some similarities - but also some differences,
such as the procedure used to determine where to split.[1]
Some techniques use more than one decision tree for their analysis:
A Random Forest classifier uses a number of decision trees, in order to improve the
classification rate.
Boosted Trees can be used for regression-type and classification-type problems.[2][3]
There are many specific decision-tree algorithms. Notable ones include:
ID3 algorithm
C4.5 algorithm
CHi-squared Automatic Interaction Detector (CHAID). Performs multi-level splits when
computing classification trees.[4]
MARS: extends decision trees to better handle numerical data
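As a hands-on sketch (assuming scikit-learn is available; its DecisionTreeClassifier implements an optimised CART-style algorithm, and the toy data below is purely illustrative), a single tree and a random forest can be fitted as follows:

from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

X = [[0, 0], [0, 1], [1, 0], [1, 1]]    # toy feature vectors
y = [0, 1, 1, 0]                        # target classes

tree = DecisionTreeClassifier().fit(X, y)
forest = RandomForestClassifier(n_estimators=10).fit(X, y)   # many trees vote

print(tree.predict([[1, 0]]), forest.predict([[1, 0]]))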
Explanation-Based Learning
An Explanation-Based Learning (EBL) system accepts an example (i.e. a training example) and explains what it learns from the example. The EBL system takes only the relevant aspects of the training example. This explanation is translated into a particular form that a problem-solving program can understand. The explanation is generalized so that it can be used to solve other problems.
PRODIGY is a system that integrates problem solving, planning, and learning methods in a
single architecture. It was originally conceived by Jaime Carbonell and Steven Minton, as an
AI system to test and develop ideas on the role that machine learning plays in planning and
problem solving. PRODIGY uses EBL to acquire control rules.
The EBL module uses the results from the problem-solving trace (i.e. the steps taken in solving problems) that were generated by the central problem solver (a search engine that searches
over a problem space). It constructs explanations using an axiomatized theory that describes
both the domain and the architecture of the problem solver. The results are then translated
as control rules and added to the knowledge base. The control knowledge that contains
control rules is used to guide the search process effectively.
What is Reinforcement Learning?
Definition...
Reinforcement Learning is a type of Machine Learning, and thereby also a branch of Artificial Intelligence. It allows machines and software agents to automatically determine the ideal behaviour within a specific context, in order to maximize their performance. Simple reward feedback is required for the agent to learn its behaviour; this is known as the reinforcement signal.
There are many different algorithms that tackle this issue. As a matter of fact, Reinforcement
Learning is defined by a specific type of problem, and all its solutions are classed as
Reinforcement Learning algorithms. In the problem, an agent is supposed to decide the best action to select based on its current state. When this step is repeated, the problem is known as a Markov Decision Process.
Reinforcement Learning allows the machine or software agent to learn its behaviour based
on feedback from the environment. This behaviour can be learnt once and for all, or it can keep adapting as time goes by. If the problem is modelled with care, some Reinforcement Learning
algorithms can converge to the global optimum; this is the ideal behaviour that maximises
the reward.
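The chapter outline mentions Q-learning; as an illustrative sketch of one such algorithm (the environment interface env.reset()/env.step() and all parameter values here are assumptions, not part of the original text), a tabular Q-learning agent repeatedly applies the update Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)):

import random
from collections import defaultdict

def q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    # Minimal tabular Q-learning sketch. env.reset() is assumed to return an
    # initial state and env.step(action) to return (next_state, reward, done).
    Q = defaultdict(float)                 # Q[(state, action)] -> value estimate
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # epsilon-greedy exploration
            if random.random() < epsilon:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            # move Q(s, a) toward the reward plus the discounted best future value
            best_next = max(Q[(next_state, a)] for a in actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next
                                           - Q[(state, action)])
            state = next_state
    return Q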
This automated learning scheme implies that there is little need for a human expert who
knows about the domain of application. Much less time will be spent designing a solution,
since there is no need for hand-crafting complex sets of rules as with Expert Systems, and all
that is required is someone familiar with Reinforcement Learning.
The possible applications of Reinforcement Learning are abundant, due to the generality of the problem specification. As a matter of fact, a very large number of problems in Artificial Intelligence can be fundamentally mapped to a decision process. This is a distinct advantage, since the same theory can be applied to many different domain-specific problems with little effort.
In practice, this ranges from controlling robotic arms to find the most efficient motor
combination, to robot navigation where collision avoidance behaviour can be learnt by
negative feedback from bumping into obstacles. Logic games are also well-suited to
Reinforcement Learning, as they are traditionally defined as a sequence of decisions: games
such as poker, backgammon, Othello and chess have been tackled more or less successfully.