Numerical Methods

Information-theoretic Policy Search Methods for Learning Versatile

... Abdolmaleki, …, Neumann, Model-Based Relative Entropy Stochastic Search, NIPS 2015; Kupcsik, …, Neumann, Model-based Contextual Policy Search for Data-Efficient Generalization of Robot Skills, Artificial Intelligence, 2015; Kupcsik, …, Neumann, Data-Efficient Generalization of Robot Skills with Contex ...
Learning Visual Representations for Perception

... A blend of deliberative and reactive concepts is probably best suited for autonomous robots. We nevertheless believe that the reactive paradigm holds powerful promise that is not yet fully exploited. Tightly linked perception-action loops enable powerful possibilities for incremental learning at man ...
Optimization Techniques

... The values inside the nodes show the value of the state variable at each stage ...
The Foundations of Cost-Sensitive Learning

... 4 Effects of changing base rates Changing the training set prevalence of positive and negative examples is a common method of making a learning algorithm cost-sensitive. A natural question is what effect such a change has on the behavior of standard learning algorithms. Separately, many researchers ...
Concurrent Effect Search in Evolutionary Systems

The Model-based Approach to Autonomous Behavior: A

... Multi-agent planning. I’ve discussed planning models involving single agents, yet often an autonomous agent must interact with other autonomous agents. We do this naturally all the time: walking on the street, driving, etc. The first question is how plans and plan costs should be defined in such setti ...
Learning Bayesian network structure using LP relaxations

... We call the polytope over the parent set selections arising from all such lifted cycle inequalities, together with the simple constraints η_i(s_i) ≥ 0 and Σ_{s_i} η_i(s_i) = 1, the cycle relaxation P_cycle. It can be shown that these cycle inequalities are equivalent to the transitivity constraints used ...
Using Reinforcement Learning to Spider the Web Efficiently

... We represent the value function using a collection of naive Bayes text classifiers, performing the mapping by casting this regression problem as classification [Torgo and Gama, 1997]. We discretize the discounted sum of future reward values of our training data into bins, place the text in the neigh ...
Autonomously Learning an Action Hierarchy Using a Learned

... (although any action on the corresponding magnitude variable of Y is excluded from A_C to prevent infinite regress). (2) The qualitative action qa(X, x) brings about the antecedent event of r. (3) QLAP subtracts those actions whose goal is already achieved in state s. To construct T_r : S_r × A_{sr} → S_r ...
Simple Algorithmic Theory of Subjective Beauty, Novelty

... selector or controller use a general Reinforcement Learning (RL) algorithm (which should be able to observe the current state of the adaptive compressor) to maximize expected reward, including intrinsic curiosity reward. To optimize the latter, a good RL algorithm will select actions that focus the ...
Hybrid Evolutionary Learning Approaches for The Virus Game

... Games, particularly two-person zero sum games such as chess [12] and draughts [13] have been fertile ground for AI research for many years. In this paper we will consider the Virus game [14] [15] [16]. Virus is a two-person, zero-sum game of perfect information played on a square board. This report ...
A comprehensive survey of multi

... r_{k+1} = ρ(x_k, u_k, x_{k+1}). This reward evaluates the immediate effect of action u_k, i.e., the transition from x_k to x_{k+1}. It says, however, nothing directly about the long-term effects of this action. For deterministic models, the transition probability function f is replaced by a simpler transiti ...
Technical Article Recent Developments in Discontinuous Galerkin Methods for the Time–

... frequency is denoted by ω > 0. The real-valued functions µ, ε and σ are the magnetic permeability, electric permittivity, and electric conductivity, respectively. The origins of DG methods can be traced back to the seventies, when they were proposed for the numerical solution of the neutron transpo ...
The State of SAT - Cornell Computer Science

... Currently the most practical extension of general resolution is symmetry detection. The pigeon hole problem is intuitively easy because we immediately see that different pigeons and holes are indistinguishable, so we do not need to actually consider all possible matchings — without loss of generali ...
Chapter 6 - Learning

... followed by rewarding stimulus – Negative reinforcement = response followed by removal of an aversive stimulus • Escape learning • Avoidance learning • Decreasing a response: – Punishment – Problems with punishment ...
An Evolutionary Artificial Neural Network Time Series Forecasting

... Artificial Neural Networks (ANNs) have the ability to learn and to adapt to new situations by recognizing patterns in previous data. Time Series (TS) (observations ordered in time) often present a high degree of noise, which makes forecasting difficult. Using ANNs for Time Series Forecasting (TSF) may ...
CIS 730 (Introduction to Artificial Intelligence) Lecture

Online Adaptable Learning Rates for the Game Connect-4

... instead of batch updates. Sutton [11] proposed some extensions to IDBD with the algorithms K1 and K2 and compares them with the Least Mean Square (LMS) algorithm and Kalman filtering. Almeida [12] discussed another method of step-size adaptation and applied it to the minimization of nonlinear functi ...
Online Planning for Large Markov Decision Processes

... online planning to find the near-optimal action for the current state while exploiting the hierarchical structure of the underlying problem. Notice that MAXQ was originally developed for reinforcement learning problems. To the best of our knowledge, MAXQOP is the first algorithm that utilizes MAXQ hi ...
Beyond Classical Search

... Repeat n times: 1) Pick an initial state S at random with one queen in each column. 2) Repeat k times: a) If GOAL?(S) then return S. b) Pick an attacked queen Q at ... Why does it work??? 1) There are many goal states that are well-distributed over the state space. 2) If no solution has been found after a few ...
Partially observable Markov decision processes for

... his likely responses. The partially observable Markov decision process (POMDP) is a mathematical model of the interaction between an agent and its environment. It provides a mechanism by which the agent can be programmed to act optimally with respect to a set of goals. This paper reports on an inve ...
Improving Control-Knowledge Acquisition for Planning by Active

... ally, these parameters can be domain-independent (number of goals) or domain-dependent (number of objects, trucks, cities, robots, . . . ). The advantage of this scheme for generating training problems is its simplicity. As disadvantages, the user needs to adjust the parameters in such a way that th ...
Online Adaptable Learning Rates for the Game Connect-4

... chines (SVM): The low dimensional board is projected into a high dimensional sample space by the n-tuple indexing process [9]. ... from all LUTs. It can be a rather big vector, containing, e.g., 9 million weights in our standard Connect-4 implementation with 70 8-tuples. ... N-tuples in Connect-4: An n- ...
Non-Monotonic Search Strategies for Grammatical Inference

... algorithm - this time based on evidence of individual states. Different valid merges label states as either accepting or rejecting. Suppose a number of valid merges label a particular state s as accepting. The idea is that of labeling state s (without merging any states initially) as accepting, depe ...

Reinforcement learning

Reinforcement learning is an area of machine learning inspired by behaviorist psychology, concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward. The problem, due to its generality, is studied in many other disciplines, such as game theory, control theory, operations research, information theory, simulation-based optimization, multi-agent systems, swarm intelligence, statistics, and genetic algorithms. In the operations research and control literature, the field where reinforcement learning methods are studied is called approximate dynamic programming. The problem has also been studied in the theory of optimal control, though most studies there are concerned with the existence of optimal solutions and their characterization, not with the learning or approximation aspects. In economics and game theory, reinforcement learning may be used to explain how equilibrium may arise under bounded rationality.

In machine learning, the environment is typically formulated as a Markov decision process (MDP), as many reinforcement learning algorithms for this context utilize dynamic programming techniques. The main difference between the classical techniques and reinforcement learning algorithms is that the latter do not need knowledge of the MDP and target large MDPs where exact methods become infeasible.

Reinforcement learning differs from standard supervised learning in that correct input/output pairs are never presented, nor are sub-optimal actions explicitly corrected. Further, there is a focus on on-line performance, which involves finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge). The exploration vs. exploitation trade-off in reinforcement learning has been most thoroughly studied through the multi-armed bandit problem and in finite MDPs.
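To make the exploration/exploitation balance described above concrete, here is a minimal sketch of an ε-greedy agent on the multi-armed bandit problem. The function name, arm means, and parameter values are illustrative assumptions, not taken from any of the documents listed above.

```python
import random

def epsilon_greedy_bandit(true_means, steps=10000, epsilon=0.1, seed=0):
    """Run an epsilon-greedy agent on a k-armed bandit with noisy rewards.

    true_means: the (unknown to the agent) mean reward of each arm.
    Returns the agent's value estimates and the total reward collected.
    """
    rng = random.Random(seed)
    k = len(true_means)
    estimates = [0.0] * k   # sample-average value estimate per arm
    counts = [0] * k        # number of pulls per arm
    total_reward = 0.0

    for _ in range(steps):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if rng.random() < epsilon:
            arm = rng.randrange(k)
        else:
            arm = max(range(k), key=lambda a: estimates[a])

        reward = rng.gauss(true_means[arm], 1.0)  # noisy reward from the chosen arm
        counts[arm] += 1
        # Incremental sample-average update: Q_{n} = Q_{n-1} + (r - Q_{n-1}) / n
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return estimates, total_reward

if __name__ == "__main__":
    estimates, total = epsilon_greedy_bandit([0.2, 0.5, 0.9])
    print("value estimates:", [round(q, 2) for q in estimates])
    print("average reward:", round(total / 10000, 3))
```

With ε = 0.1 the agent spends roughly 10% of its pulls exploring at random, which is typically enough for the sample-average estimates to single out the best arm while the remaining pulls exploit it; setting ε = 0 recovers a purely greedy agent that can lock onto a sub-optimal arm.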