LEARNING FROM OBSERVATION

Introduction: Observing a task being performed or attempted by someone else often accelerates human learning. If robots can be programmed to use such observations to accelerate learning, their usability and functionality will increase while programming and learning time decrease. This research explores the use of task primitives in robot learning from observation. A framework has been developed in which observed data is used to learn a task initially, after which the agent improves its performance through repeated task execution (learning from practice). Data collected while a human performs a task is parsed into small parts of the task called primitives. A module is created for each primitive type that encodes the movements required during the performance of the primitive, and when and where the primitive is performed. The feasibility of this method is currently being tested with agents that learn to play a virtual and an actual air hockey game. The terms robot and agent are used interchangeably to refer to an algorithm that senses its environment and can control objects in either a hardware or software domain.

Observing the Task: The task to be performed must first be observed. For a human learner this mostly involves vision. For a robot to learn from observing a task being performed, it must have some way to sense what is occurring in the environment. This research does not seek ways to use the robot's existing sensors to observe performance; instead, the agents are given whatever equipment is necessary to observe the performance, or are given information that represents the performance. The equipment may include a camera or some type of motion-capture device.
Research is also being performed in virtual environments, where the state of objects is directly available from the simulation algorithm.

Learning from Observation: Components of the performance element:
- A direct mapping from conditions on the current state to actions.
- A means to infer relevant properties of the world from the percept sequence.
- Information about the way the world evolves.
- Information about the results of possible actions the agent can take.
- Utility information indicating the desirability of world states.
- Action-value information indicating the desirability of particular actions in particular states.
- Goals that describe classes of states whose achievement maximizes the agent's utility.

Representation of the components: Any situation in which both the inputs and outputs of a component can be perceived is called supervised learning. In learning the condition-action component, the agent receives some evaluation of its action (such as a hefty bill for rear-ending the car in front) but is not told the correct action (to brake more gently and much earlier); this is called reinforcement learning. Learning when there is no hint at all about the correct outputs is called unsupervised learning.

Inductive Learning: In supervised learning, the learning element is given the correct (or approximately correct) value of the function for particular inputs, and changes its representation of the function to try to match the information provided by the feedback. More formally, we say an example is a pair (x, f(x)), where x is the input and f(x) is the output of the function applied to x. The task of pure inductive inference (or induction) is this: given a collection of examples of f, return a function h that approximates f. The function h is called a hypothesis.

Learning Decision Trees: A decision tree takes as input an object or situation described by a set of properties, and outputs a yes/no decision. The decision tree learning algorithm:
function DECISION-TREE-LEARNING(examples, attributes, default) returns a decision tree
  inputs: examples, set of examples
          attributes, set of attributes
          default, default value for the goal predicate
  if examples is empty then return default
  else if all examples have the same classification then return the classification
  else if attributes is empty then return MAJORITY-VALUE(examples)
  else
    best <- CHOOSE-ATTRIBUTE(attributes, examples)
    tree <- a new decision tree with root test best
    for each value v_i of best do
      examples_i <- {elements of examples with best = v_i}
      subtree <- DECISION-TREE-LEARNING(examples_i, attributes - best, MAJORITY-VALUE(examples))
      add a branch to tree with label v_i and subtree subtree
    end
    return tree

Assessing the performance of the learning algorithm:
1. Collect a large set of examples.
2. Divide it into two disjoint sets: the training set and the test set.
3. Use the learning algorithm with the training set as examples to generate a hypothesis H.
4. Measure the percentage of examples in the test set that are correctly classified by H.

Reinforcement Learning: The task of reinforcement learning is to use rewards to learn a successful agent function. The learning task can vary:
- The environment can be accessible or inaccessible. In an accessible environment, states can be identified with percepts, whereas in an inaccessible environment the agent must maintain some internal state to try to keep track of the environment.
- The agent can begin with knowledge of the environment and the effects of its actions, or it may have to learn this model as well as utility information.
- Rewards can be received only in terminal states, or in any state.
- Rewards can be components of the actual utility (points for a ping-pong agent or dollars for a betting agent) that the agent is trying to maximize, or they can be hints as to the actual utility ("nice move" or "bad dog").
- The agent can be a passive learner or an active learner.
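The DECISION-TREE-LEARNING algorithm above can be sketched in Python. This is a minimal illustration, not the text's implementation: examples are assumed to be dictionaries mapping attribute names to values, with the classification stored under an assumed 'class' key, and CHOOSE-ATTRIBUTE is realised here with the information-gain heuristic discussed later in these notes.

```python
from collections import Counter
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def majority_value(examples):
    """MAJORITY-VALUE: the most common classification among the examples."""
    return Counter(e['class'] for e in examples).most_common(1)[0][0]

def choose_attribute(attributes, examples):
    """CHOOSE-ATTRIBUTE: pick the attribute with the highest information
    gain, i.e. the largest drop from the original information requirement
    to the requirement remaining after the attribute test."""
    labels = [e['class'] for e in examples]
    def gain(a):
        remainder = 0.0
        for v in {e[a] for e in examples}:
            sub = [e['class'] for e in examples if e[a] == v]
            remainder += len(sub) / len(examples) * entropy(sub)
        return entropy(labels) - remainder
    return max(attributes, key=gain)

def decision_tree_learning(examples, attributes, default):
    if not examples:
        return default
    classes = {e['class'] for e in examples}
    if len(classes) == 1:                  # all examples agree
        return classes.pop()
    if not attributes:
        return majority_value(examples)
    best = choose_attribute(attributes, examples)
    tree = {best: {}}                      # a tree is a nested dict here
    for v in {e[best] for e in examples}:
        subset = [e for e in examples if e[best] == v]
        tree[best][v] = decision_tree_learning(
            subset, [a for a in attributes if a != best],
            majority_value(examples))
    return tree
```

The returned tree is a nested dictionary: interior nodes test one attribute, and each branch leads either to a subtree or to a leaf classification.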
A passive learner simply watches the world going by and tries to learn the utility of being in various states; an active learner must also act using the learned information, and can use its problem generator to suggest explorations of unknown portions of the environment.

To complete the assessment procedure described earlier, repeat steps 1 to 4 for different sizes of training sets and for different randomly selected training sets of each size.

Decision tree learning is used in applications such as designing oil platform equipment and learning to fly.

Using information theory: The information gained from an attribute test is measured as the difference between the original information requirement and the new information requirement after the test.

Noise and overfitting in data: Whenever there is a large set of possible hypotheses, one has to be careful not to use the resulting freedom to find meaningless "regularity" in the data. This problem is called overfitting. Techniques that reduce the danger of overfitting include pruning and cross-validation.

Broadening the applicability of decision trees: Issues that must be addressed include missing data, multivalued attributes, and continuous-valued attributes.

Learning general logical descriptions: The hypothesis proposes expressions, which we call a candidate definition of the goal predicate. An example is a false negative for the hypothesis if the hypothesis says it should be negative but in fact it is positive. An example is a false positive for the hypothesis if the hypothesis says it should be positive but in fact it is negative.

Current-best-hypothesis search: The current-best-hypothesis learning algorithm searches for a consistent hypothesis and backtracks when no consistent specialization/generalization can be found.
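Current-best-hypothesis search can be made concrete with a toy hypothesis space, before stating the general algorithm. In this illustrative sketch (not from the text), a hypothesis is an interval [lo, hi] on the real line, an example is a pair (x, positive), generalization widens the interval to cover a false negative, and specialization shrinks it to exclude a false positive.

```python
def consistent(h, examples):
    """A hypothesis (lo, hi) is consistent if it covers exactly the positives."""
    lo, hi = h
    return all((lo <= x <= hi) == positive for x, positive in examples)

def current_best_learning(examples):
    """One-pass current-best-hypothesis search over interval hypotheses."""
    x0, pos0 = examples[0]
    if not pos0:
        raise ValueError("this sketch expects the first example to be positive")
    h = (x0, x0)                      # most specific hypothesis for example 1
    seen = [examples[0]]
    eps = 1e-9
    for x, positive in examples[1:]:
        seen.append((x, positive))
        lo, hi = h
        inside = lo <= x <= hi
        if inside and not positive:           # false positive: specialize
            candidates = [(x + eps, hi), (lo, x - eps)]
        elif not inside and positive:         # false negative: generalize
            candidates = [(min(lo, x), max(hi, x))]
        else:
            continue                          # example already handled
        for c in candidates:
            if consistent(c, seen):           # keep only a consistent revision
                h = c
                break
        else:
            raise RuntimeError("no consistent specialization/generalization")
    return h
```

A real current-best-hypothesis learner would backtrack over earlier choices instead of failing outright; the sketch keeps only the specialize/generalize core.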
function CURRENT-BEST-LEARNING(examples) returns a hypothesis
  H <- any hypothesis consistent with the first example in examples
  for each remaining example e in examples do
    if e is a false positive for H then
      H <- choose a specialization of H consistent with examples
    else if e is a false negative for H then
      H <- choose a generalization of H consistent with examples
    if no consistent specialization/generalization can be found then fail
  end
  return H

Introduction: Supervised Learning, Inductive Learning, Analogical Learning

Introduction: Learning is an inherent characteristic of human beings. By virtue of it, people acquire the ability to improve their performance while executing similar tasks. This chapter provides an overview of the principles of learning that can be applied to machines to improve their performance. Such learning is usually referred to as 'machine learning'. Machine learning can be broadly classified into three categories: i) supervised learning, ii) unsupervised learning and iii) reinforcement learning. Supervised learning requires a trainer, who supplies the input-output training instances. The learning system adapts its parameters by some algorithm to generate the desired output pattern from a given input pattern. In the absence of a trainer, the desired output for a given input instance is not known, and consequently the learner has to adapt its parameters autonomously. This type of learning is termed 'unsupervised learning'. The third type, reinforcement learning, bridges the gap between the supervised and unsupervised categories. In reinforcement learning, the learner does not explicitly know the input-output instances, but it receives some form of feedback from its environment. The feedback signals help the learner decide whether its action on the environment is rewarding or punishable. The learner thus adapts its parameters based on the states (rewarding / punishable) of its actions.
Among the supervised learning techniques, the most common are inductive and analogical learning. The inductive learning techniques presented in this chapter include decision tree and version space based learning. Analogical learning is briefly introduced through illustrative examples. The principle of unsupervised learning is illustrated here with a clustering problem. The section on reinforcement learning includes Q-learning and temporal difference learning. A fourth category of learning, which has emerged recently from the discipline of knowledge engineering, is called 'inductive logic programming'. The principles of inductive logic programming are also briefly introduced in this chapter. The chapter ends with a brief discussion of the 'computational theory of learning'. With the background of this theory, one can measure the performance of the learning behavior of a machine from the training instances and their count.

Supervised Learning: As already mentioned, in supervised learning a trainer submits the input-output exemplary patterns and the learner has to adjust the parameters of the system autonomously, so that it can yield the correct output pattern when excited with one of the given input patterns. We shall cover two important types of supervised learning in this section: i) inductive learning and ii) analogical learning.

Inductive Learning: In supervised learning we have a set of pairs {xi, f(xi)} for 1 ≤ i ≤ n, and our aim is to determine f by some adaptive algorithm. Inductive learning is a special class of supervised learning techniques where, given a set of {xi, f(xi)} pairs, we determine a hypothesis h(xi) such that h(xi) ≈ f(xi), ∀i. A natural question is how to compare the hypotheses h that approximate f; for instance, there could be more than one h(xi), all of which are approximately close to f(xi). Let there be two such hypotheses h1 and h2, where h1(xi) ≈ f(xi) and h2(xi) ≈ f(xi).
We may select one of the two hypotheses by a preference criterion, called bias. When {xi, f(xi)}, 1 ≤ i ≤ n, are numerical quantities, we may employ the neural learning techniques presented in the next chapter. Readers may wonder: could we find f by curve fitting as well, and should we then call curve fitting a learning technique? The answer, of course, is in the negative. A learning algorithm for such numerical sets {xi, f(xi)} must be able to adapt the parameters of the learner: the more training instances there are, the larger the number of adaptations. But what happens when xi and f(xi) are non-numerical? Suppose, for instance, we are given the truth table of the following training instances.

Truth Table: Training Instances

Here we may denote bi = f(ai, ai → bi) for all i = 1 to n. From these training instances we infer a generalized hypothesis h as follows: h ≡ ∀i (ai, ai → bi) ⇒ bi.

Analogical Learning: In inductive learning we observed that there exist many positive and negative instances of a problem, and the learner has to form a concept that supports most of the positive instances and no negative instances. This demonstrates that a number of training instances are required to form a concept in inductive learning. Unlike this, analogical learning can be accomplished from a single example. For instance, given the training instance that the plural of fungus is fungi, one has to determine the plural form of bacillus. Obviously, one can answer that the plural form of bacillus is bacilli. But how do we do so? From common sense reasoning, it follows that the answer rests on the similarity of bacillus to fungus: the analogical learning system learns that to obtain the plural form of a word ending in 'us', one replaces the 'us' with 'i'. The main steps in analogical learning are formalized below.

Identifying Analogy: Identify the similarity between an experienced problem instance and a new problem.
Determining the Mapping Function: Relevant parts of the experienced problem are selected and the mapping is determined.

Applying the Mapping Function: Apply the mapping function to transform the new problem from the given domain to the target domain.

Validation: The newly constructed solution is validated for its applicability through trial processes such as theorem proving or simulation.

Learning: If the validation is found to work well, the new knowledge is encoded and saved for future usage.

Decision Trees: "A decision tree takes as input an object or situation described by a set of properties, and outputs a yes/no decision. Decision trees therefore represent Boolean functions. Functions with a larger range of outputs can also be represented."

Decision tree advantages: Among other data mining methods, decision trees have various advantages:
- Simple to understand and interpret. People are able to understand decision tree models after a brief explanation.
- Require little data preparation. Other techniques often require data normalisation, creation of dummy variables, and removal of blank values.
- Able to handle both numerical and categorical data. Other techniques are usually specialised in analysing datasets that have only one type of variable; for example, relation rules can be used only with nominal variables, while neural networks can be used only with numerical variables.
- Use a white box model. If a given situation is observable in a model, the explanation for the condition is easily expressed in Boolean logic. An artificial neural network is an example of a black box model, since the explanation for its results is difficult to understand.
- Possible to validate a model using statistical tests, which makes it possible to account for the reliability of the model.
- Robust: they perform well even if their assumptions are somewhat violated by the true model from which the data were generated.
- Perform well with large data in a short time.
Large amounts of data can be analysed using personal computers in a time short enough to enable stakeholders to make decisions based on the analysis.

Decision tree learning is a method commonly used in data mining. The goal is to create a model that predicts the value of a target variable based on several input variables. An example is shown on the right. Each interior node corresponds to one of the input variables; there are edges to children for each of the possible values of that input variable. Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf.

A tree can be "learned" by splitting the source set into subsets based on an attribute value test. This process is repeated on each derived subset in a recursive manner called recursive partitioning. The recursion is complete when the subset at a node all has the same value of the target variable, or when splitting no longer adds value to the predictions. In data mining, trees can also be described as the combination of mathematical and computational techniques to aid the description, categorisation and generalisation of a given set of data. Data comes in records of the form (x, Y) = (x1, x2, x3, ..., xk, Y). The dependent variable, Y, is the target variable that we are trying to understand, classify or generalize. The vector x is composed of the input variables x1, x2, x3, etc., that are used for that task.

Decision trees used in data mining are of two main types:
- Classification tree analysis, when the predicted outcome is the class to which the data belongs.
- Regression tree analysis, when the predicted outcome can be considered a real number (e.g. the price of a house, or a patient's length of stay in a hospital).
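For the regression case, one recursive-partitioning step can be sketched as follows. This is an illustrative sketch with made-up data, not a full CART implementation: it picks the threshold t on a single numeric input that minimises the summed squared error of the two resulting leaves, each leaf predicting the mean of its targets.

```python
def sse(ys):
    """Sum of squared errors of the targets around their mean."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

def best_split(xs, ys):
    """Return (threshold, error) for the best binary split x <= t,
    scoring each candidate by the total SSE of the two leaves."""
    best = (None, float('inf'))
    for t in sorted(set(xs))[:-1]:         # candidate thresholds
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        error = sse(left) + sse(right)
        if error < best[1]:
            best = (t, error)
    return best
```

Recursive partitioning applies this step again to each side of the chosen split until a stopping criterion holds, exactly as described for the classification case above.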
The term Classification And Regression Tree (CART) analysis is an umbrella term used to refer to both of the above procedures, first introduced by Breiman et al.[1] Trees used for regression and trees used for classification have some similarities, but also some differences, such as the procedure used to determine where to split.[1] Some techniques use more than one decision tree for their analysis: a random forest classifier uses a number of decision trees in order to improve the classification rate, and boosted trees can be used for regression-type and classification-type problems.[2][3]

There are many specific decision-tree algorithms. Notable ones include:
- the ID3 algorithm
- the C4.5 algorithm
- CHAID (Chi-squared Automatic Interaction Detector), which performs multi-level splits when computing classification trees[4]
- MARS, which extends decision trees to better handle numerical data

Explanation-Based Learning: An explanation-based learning (EBL) system accepts an example (i.e. a training example) and explains what it learns from the example. The EBL system takes only the relevant aspects of the training example. This explanation is translated into a particular form that a problem-solving program can understand, and the explanation is generalized so that it can be used to solve other problems. PRODIGY is a system that integrates problem solving, planning, and learning methods in a single architecture. It was originally conceived by Jaime Carbonell and Steven Minton as an AI system to test and develop ideas on the role that machine learning plays in planning and problem solving. PRODIGY uses EBL to acquire control rules. The EBL module uses the results from the problem-solving trace (i.e. the steps in solving problems) that were generated by the central problem solver (a search engine that searches over a problem space). It constructs explanations using an axiomatized theory that describes both the domain and the architecture of the problem solver.
The results are then translated into control rules and added to the knowledge base. The control knowledge that contains the control rules is used to guide the search process effectively.

What is Reinforcement Learning? Reinforcement Learning is a type of machine learning, and thereby also a branch of artificial intelligence. It allows machines and software agents to automatically determine the ideal behaviour within a specific context, in order to maximize their performance. Simple reward feedback is required for the agent to learn its behaviour; this is known as the reinforcement signal. There are many different algorithms that tackle this issue. In fact, Reinforcement Learning is defined by a specific type of problem, and all its solutions are classed as Reinforcement Learning algorithms. In the problem, an agent must decide the best action to select based on its current state. When this step is repeated, the problem is known as a Markov decision process.

Reinforcement Learning allows the machine or software agent to learn its behaviour based on feedback from the environment. This behaviour can be learnt once and for all, or it can keep adapting as time goes by. If the problem is modelled with care, some Reinforcement Learning algorithms can converge to the global optimum: the ideal behaviour that maximises the reward. This automated learning scheme implies that there is little need for a human expert with knowledge of the application domain. Much less time will be spent designing a solution, since there is no need for hand-crafting complex sets of rules as with expert systems; all that is required is someone familiar with Reinforcement Learning.

The possible applications of Reinforcement Learning are abundant, due to the genericness of the problem specification. In fact, a very large number of problems in artificial intelligence can fundamentally be mapped to a decision process.
This is a distinct advantage, since the same theory can be applied to many different domain-specific problems with little effort. In practice, this ranges from controlling robotic arms to find the most efficient motor combination, to robot navigation, where collision-avoidance behaviour can be learnt from the negative feedback of bumping into obstacles. Logic games are also well suited to Reinforcement Learning, as they are traditionally defined as a sequence of decisions: games such as poker, backgammon, Othello and chess have been tackled more or less successfully.
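Q-learning, mentioned earlier in this chapter, is one of the best-known Reinforcement Learning algorithms for such Markov decision processes. The sketch below uses a made-up toy environment (a one-dimensional corridor of five states, with reward 1 for reaching the right end); the environment, parameter values, and tie-breaking rule are illustrative assumptions, not from the text. Each step moves Q(s, a) toward the observed reward plus the discounted value of the best next action.

```python
import random

N_STATES = 5                        # corridor states 0..4; state 4 is terminal
ACTIONS = (-1, +1)                  # move left / move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

def step(state, action):
    """Environment dynamics: reward 1 only on reaching the right end."""
    nxt = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1

def q_learning(episodes=500, seed=0):
    random.seed(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if random.random() < EPSILON:                 # explore
                a = random.choice(ACTIONS)
            else:                                         # exploit (ties -> right)
                a = max(ACTIONS, key=lambda act: (q[(s, act)], act))
            s2, r, done = step(s, a)
            # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
            target = r + GAMMA * max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += ALPHA * (target - q[(s, a)])
            s = s2
    return q
```

After training, the learned Q-values encode the ideal behaviour for this toy problem: in every state, moving right has a higher value than moving left, with values discounted by distance from the goal.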