Planning with Markov Decision Processes

... the theoretical foundations of MDPs and the fundamental solution techniques for them. We then discuss modern optimal algorithms based on heuristic search and the use of structured representations. A major focus of the book is on the numerous approximation schemes for MDPs that have been developed in ...
Improving Adjustable Autonomy Strategies for Time

... in its detailed local information when predicting the results of implementing the human decision. (vi) Human decision consistent (Hdc) - The human has made a decision and the agent believes that the decision will either increase the reward for the team or does not have enough information to raise an ...
Metareasoning for Concurrent Planning and Execution

... Figure 2: Backed up heuristic error in Dynamic fˆ. the duration of the actions that have been queued for execution (line 6). The longer it will take the agent to execute the actions that have been committed to, the more computation can be done by the search before it must terminate the subsequent it ...
6 Learning in Multiagent Systems

... in so far as it performs all learning activities (in the case of centralized learning), or it may act as a “specialist” in so far as it is specialized in a particular activity (in the case of decentralized learning). (4) Goal-specific features. Two examples of features that characterize learning in ...
artificial intelligence - MET Engineering College

... An omniscient agent knows the actual outcome of its actions and can act accordingly; but omniscience is impossible in reality. Doing actions in order to modify future percepts-sometimes called information gathering-is an important part of rationality. Our definition requires a rational agent not onl ...
The Operant Conditioning of Letter String Problem

Locality Preserving Hashing Kang Zhao, Hongtao Lu and Jincheng Mei

... The above problem is difficult to solve since it is a typical combinatorial optimization problem which is usually NP hard. In the next section, we relax the constraints and adopt a coordinate-descent iterative algorithm to get an approximate solution. ...
LAO*: A heuristic search algorithm that finds solutions with loops

... depth one or more, and the current state is updated based on the outcome of the action. In addition, the evaluation function is updated for a subset of states that includes the current state. At the beginning of each trial, the current state is set to the start state. A trial ends when the goal stat ...
Metareasoning in Real-time Heuristic Search

... Because each search iteration is taking place under a time constraint, a real-time search must be careful about its use of computation time. After performing some amount of lookahead, a real-time search will have identified a most promising state on the search frontier, along with the path leading t ...
Considerations on Belief Revision in an Action Theory

... transition system framework, by simply combining an account of reasoning about knowledge with an account of belief revision. This in turn could be accomplished by simply specifying a faithful assignment, as in Definition 2, for every set of states. And indeed it is straightforward to incorporate bel ...
approximate reasoning using anytime algorithms

LNCS 3242 - A Hybrid GRASP – Evolutionary Algorithm Approach to

... Computation (EC) techniques to the search for OGRs. In 1995, Soliday, Homaifar and Lebby [12] used a genetic algorithm on different instances of the Golomb Ruler problem. In their representation, each chromosome is composed by a permutation of n − 1 integers that represents the sequence of the n − 1 ...

This PDF is a selection from an out-of-print volume from... of Economic Research Volume Title: International Economic Policy Coordination

... simpler to specify and implement, but at the cost of some (though not excessive) loss in performance. These results are arrived at by means of methods of policy evaluation that, although non-standard in their use of stochastic control under rational expectations, are typical in their focus on the si ...
HTN Planning Approach Using Fully Instantiated

... possible actions in a planning problem allows to perform a priori study on the reachable propositions in the problem. This study is a necessary prerequisite for the preparation and the development of efficient heuristics to guide search process [18], [21]–[25]. Thirdly, This preprocessing step is a ...
Environments and Problem Solving Methods

... 4. Introduction: Environments and Problem Solving Methods ...
Foundations of Artificial Intelligence

... 4. Introduction: Environments and Problem Solving Methods ...
A Concentration Inequalities B Benchmark C

... linear AMO as follows. The time horizon T = n, and the number of arms K = 2. For each t ∈ [T ], the context of the first arm, xt (1) = zt , and its reward rt (1) = yt . The context of the second arm, xt (2) = 0, the all zeroes vector, and the reward rt (2) is also 0. The total reward of a linear pol ...
Unifying Instance-Based and Rule

... is to try each one in turn, and use cross-validation to choose the one that appears to perform best (Schaffer, 1994a). This is a long and tedious process, especially considering the large number of algorithms and systems now available, and the fact that each typically has options and parameters that ...
A Review of Machine Learning for Automated Planning

... these two knowledge acquisition problems. A comprehensive survey of AP systems that benefit from ML can be found at (Zimmerman and Kambhampati, 2003). At the present time, there is renewed interest in using ML to improve planning. In 2005 the first International Competition on Knowledge Engineering ...
artificial intelligence - ABIT Group of Institutions

... considered subgoals and possible actions was similar to that in which humans approached the same problems. Thus, GPS was probably the first program to embody the "thinking humanly" approach. At IBM, Nathaniel Rochester and his colleagues produced some of the first A1 programs. Herbert Gelernter (195 ...
Kernel Estimation and Model Combination in A Bandit Problem with

... problem. With a model combining strategy and dimension reduction technique to be introduced later, the kernel estimation based randomized allocation strategy can be quite flexible with wide potential practical use. Moreover, in Appendix A, we incorporate the kernel estimation into a UCB-type algorit ...
Introduction of Markov Decision Process

... The rate of decay is related to the characteristic values, which is one over the zeros of the determinant. The characteristic values are 1, 1/4, and 1/4. The decay rate at each step is 1/16. ...
Information Gathering and Reward Exploitation of Subgoals for

... suggests that states associated with high rewards or informative observations may also be important. Our algorithm leverages this structure by attempting to identify these potentially important states and use them as subgoals. Specifically, we define two heuristic functions hr (s) and hi (s) of a st ...

Reinforcement learning

Reinforcement learning is an area of machine learning inspired by behaviorist psychology, concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward. Because of its generality, the problem is studied in many other disciplines, such as game theory, control theory, operations research, information theory, simulation-based optimization, multi-agent systems, swarm intelligence, statistics, and genetic algorithms.

In the operations research and control literature, the field where reinforcement learning methods are studied is called approximate dynamic programming. The problem has also been studied in the theory of optimal control, though most of that work concerns the existence and characterization of optimal solutions rather than learning or approximation. In economics and game theory, reinforcement learning may be used to explain how equilibrium can arise under bounded rationality.

In machine learning, the environment is typically formulated as a Markov decision process (MDP), since many reinforcement learning algorithms for this setting use dynamic programming techniques. The main difference from the classical techniques is that reinforcement learning algorithms do not need knowledge of the MDP's model and target large MDPs where exact methods become infeasible.

Reinforcement learning differs from standard supervised learning in that correct input/output pairs are never presented, nor are sub-optimal actions explicitly corrected. Further, there is a focus on on-line performance, which involves finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge). The exploration vs. exploitation trade-off has been most thoroughly studied through the multi-armed bandit problem and in finite MDPs.
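The exploration vs. exploitation trade-off mentioned above can be illustrated with a minimal epsilon-greedy strategy for a stationary multi-armed bandit. This is only a sketch under assumed parameters: the arm means, noise model, step count, and epsilon value are arbitrary choices for illustration, not taken from any of the documents listed here.

```python
import random

def epsilon_greedy_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy play on a stationary Gaussian multi-armed bandit.

    true_means are the (hypothetical) expected rewards per arm, unknown
    to the agent. Returns the agent's value estimates and pull counts.
    """
    rng = random.Random(seed)
    k = len(true_means)
    estimates = [0.0] * k   # incremental sample-average value estimates
    counts = [0] * k        # how often each arm has been pulled
    for _ in range(steps):
        # Exploration vs. exploitation: with probability epsilon pick a
        # random arm; otherwise pick the arm with the highest estimate.
        if rng.random() < epsilon:
            arm = rng.randrange(k)
        else:
            arm = max(range(k), key=lambda a: estimates[a])
        reward = rng.gauss(true_means[arm], 1.0)  # noisy reward sample
        counts[arm] += 1
        # Incremental mean update: no need to store past rewards.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts

estimates, counts = epsilon_greedy_bandit([0.2, 0.5, 0.9])
```

After enough steps the agent concentrates its pulls on the best arm while the epsilon fraction of random pulls keeps every arm's estimate from going stale, which is exactly the on-line balance the paragraph above describes.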