Wrappers for feature subset selection

Part I: Heuristics

... Defined in either way, a stochastic shortest-path problem is a special case of a fully-observable infinite-horizon Markov decision process (MDP). There are several ...
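
The snippet above casts stochastic shortest-path problems as special cases of fully observable MDPs. As an illustrative sketch (not drawn from the paper itself), the following Python fragment runs value iteration on a hypothetical three-state instance with an absorbing, cost-free goal state; the transition matrices and per-step costs are invented for the example:

```python
import numpy as np

# Hypothetical 3-state stochastic shortest-path MDP: states 0 and 1 are
# non-goal, state 2 is the absorbing, cost-free goal. P[a][s, s'] is the
# probability of moving from s to s' under action a.
P = {
    "a": np.array([[0.8, 0.2, 0.0],
                   [0.0, 0.5, 0.5],
                   [0.0, 0.0, 1.0]]),
    "b": np.array([[0.1, 0.9, 0.0],
                   [0.3, 0.0, 0.7],
                   [0.0, 0.0, 1.0]]),
}
cost = {"a": np.array([2.0, 1.0, 0.0]),  # per-step cost of each action in each state
        "b": np.array([1.0, 4.0, 0.0])}

V = np.zeros(3)  # expected cost-to-go, initialized to zero
for _ in range(1000):
    # Bellman backup: V(s) = min_a [ c(s, a) + sum_s' P(s' | s, a) * V(s') ]
    V_new = np.min([cost[a] + P[a] @ V for a in P], axis=0)
    if np.max(np.abs(V_new - V)) < 1e-9:
        break
    V = V_new

print(V)  # V[2] stays 0: once at the goal, no further cost accrues
```
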
Using a Goal-Agenda and Committed Actions in Real

... introduced by [11]. It not only orders the (top-level) goals but also the sub-goals that will necessarily arise during planning; i.e., it also takes into account what they called “landmarks”. The key feature of a landmark is that it must be true at some point on any solution path to the giv ...
Literature Review on Feature Selection Methods for High

... The best example of a feature subset-based method is correlation-based feature subset selection (CRFS), developed by Hall [3]. This approach considers two correlation measures: feature-class correlation and feature-feature correlation. Initially, N numbers of feature ...
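
To make the two measures concrete, here is a small sketch of a CFS-style subset merit in Python, following Hall's heuristic Merit(S) = k * r_cf / sqrt(k + k(k-1) * r_ff), where r_cf is the mean feature-class correlation and r_ff the mean pairwise feature-feature correlation over the k features in S. Using absolute Pearson correlation in place of Hall's symmetrical uncertainty is a simplifying assumption, and `cfs_merit` is a hypothetical helper name:

```python
import numpy as np
from itertools import combinations

def cfs_merit(X, y, subset):
    """Correlation-based merit of a feature subset: rewards high average
    feature-class correlation and penalizes redundancy via the average
    feature-feature correlation."""
    k = len(subset)
    # mean feature-class correlation (absolute Pearson as a stand-in)
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        return r_cf
    # mean pairwise feature-feature correlation within the subset
    r_ff = np.mean([abs(np.corrcoef(X[:, i], X[:, j])[0, 1])
                    for i, j in combinations(subset, 2)])
    return (k * r_cf) / np.sqrt(k + k * (k - 1) * r_ff)
```
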
E - Read

... Implementation is relatively simple compared to their deterministic counterparts. ...
Policies for a Multi-Class Queue with Convex Holding Cost and

A Classification of Hyper-heuristic Approaches

... without learning. Hyper-heuristics without learning include approaches that use several heuristics (neighbourhood structures), but select the heuristics to call according to a predetermined sequence. Therefore, this category contains approaches such as variable neighbourhood search [42]. The hyper-h ...
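
As a minimal sketch of the “without learning” category described above, the loop below applies low-level heuristics in a fixed, predetermined cyclic order, with no feedback-driven selection; the move-function interface (a callable mapping a solution to a modified solution) is an assumption for illustration:

```python
from itertools import cycle

def fixed_sequence_hyper_heuristic(solution, heuristics, steps):
    """Hyper-heuristic without learning: call the low-level heuristics
    (e.g., neighbourhood moves) in a predetermined cyclic sequence."""
    order = cycle(heuristics)       # fixed order, never adapted
    for _ in range(steps):
        move = next(order)          # select per the predetermined sequence
        solution = move(solution)   # apply the chosen heuristic
    return solution
```
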
Multi-Period Stock Allocation Via Robust Optimization

Integrating Planning, Execution and Learning to Improve Plan

... off-the-shelf spirit of the architecture allows pela to acquire other useful execution information, such as action durations (Lanchas et al., 2007). 3.1. Learning rules about action performance: For each action a ∈ A, pela learns a model of the performance of a in terms of these three classes: ...
Explanation-Based Generalization: A Unifying View

... knowledge, these methods are able to produce a valid generalization of the example along with a deductive justification of the generalization in terms of the system's knowledge. More precisely, these explanation-based methods analyze the training example by first constructing an explanation of how t ...
Efficient Dynamic Allocation with Strategic Arrivals

Incremental Heuristic Search in AI

... We now discuss one particular way of solving fully dynamic shortest-path problems. As an example, we use route planning in known eight-connected gridworlds with cells whose traversability changes over time. They are either traversable (with cost one) or untraversable. The route-planning problem is t ...
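
To make the setting concrete, here is a short sketch of successor generation in such an eight-connected gridworld, where entering any traversable cell costs one; the boolean-grid encoding is an assumption, and the incremental replanning machinery itself (e.g., D* Lite) is beyond this fragment:

```python
def neighbors(cell, grid):
    """Successors in an eight-connected gridworld: traversable cells
    (grid[x][y] truthy) cost one to enter; untraversable cells are
    skipped entirely."""
    x, y = cell
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if dx == dy == 0:
                continue  # skip the cell itself
            nx, ny = x + dx, y + dy
            if 0 <= nx < len(grid) and 0 <= ny < len(grid[0]) and grid[nx][ny]:
                yield (nx, ny), 1  # successor and its unit traversal cost
```
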
Non-Deterministic Planning with Temporally Extended Goals

... can often be leveraged. The more challenging problem of planning with non-deterministic actions and LTL goals has not been studied to the same extent; Kabanza, Barbeau, and St.-Denis (1997), and Pistore and Traverso (2001) have proposed their own LTL planners, while Patrizi, Lipovetzky, and Geffner ...
Stochastic Search and Surveillance Strategies for

Moral Hazard and the Spanning Condition

... of conditions under which the FOA is valid. Nevertheless, it is well understood that the FOA is not always valid. That is, sometimes the agent must be prevented from deviating to more remote actions; “non-local” incentive compatibility constraints may be binding. Another common approach simply assum ...
Introduction to Artificial Intelligence

... search (2:38) A∗: Proof 1 of Optimality (2:40) Complexity of A∗ (2:41) A∗: Proof 2 of Optimality (2:42) Admissible heuristics (2:44) Memory-bounded A∗ (2:47) ...
Monte-Carlo Tree Search for the Multiple Sequence Alignment Problem

... expansion (Hatem and Ruml 2013). Still, the memory requirements rise exponentially with the problem complexity (measured as the sum of the input sequence lengths). In this paper we apply a fixed-memory-bound randomized search that incorporates no expert knowledge in the form of refined heuristics. The algorithm ...
airline seat allocation with multiple nested fare classes - U

... demand is exhausted. Sales to this fare class are then closed, and sales to the class with the next lowest fare are begun, and so on for all fare classes. It is assumed that any time limits on bookings for fare classes are prespecified. That is, the setting of such time limits is not part of the pro ...
Schematic Invariants by Reduction to Ground Invariants

... In this work we devise algorithms that share the benefits of both approaches: the algorithms are simple and efficient to implement (similar to invariant algorithms based on grounding), and scale well even when the number of ground instances is very high (similar to schematic invariant algorithms). C ...
Separate-and-Conquer Rule Learning

... Figure 4 shows a generic separate-and-conquer rule learning algorithm that calls various subroutines which can be used to instantiate the generic algorithm into specific algorithms known from the literature. SEPARATEANDCONQUER starts with an empty theory. If there are any positive examples in the tr ...
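
The generic loop described above can be sketched compactly in Python; `learn_rule` and `covers` stand in for the caller-supplied subroutines through which the generic algorithm is instantiated (their signatures are assumptions here):

```python
def separate_and_conquer(positives, negatives, learn_rule, covers):
    """Generic separate-and-conquer rule learning: repeatedly learn a
    rule ("conquer"), then remove the positive examples it covers
    ("separate") until no positives remain."""
    theory = []                                    # start with an empty theory
    while positives:                               # positives still uncovered
        rule = learn_rule(positives, negatives)    # learn one rule
        covered = [p for p in positives if covers(rule, p)]
        if not covered:                            # no progress: stop safely
            break
        theory.append(rule)
        # separate step: drop the positives the new rule covers
        positives = [p for p in positives if not covers(rule, p)]
    return theory
```
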
Introduction to Artificial Intelligence (Undergraduate Topics in

... undergraduates studying in all areas of computing and information science. From core foundational and theoretical material to final-year topics and applications, UTiCS books take a fresh, concise, and modern approach and are ideal for self-study or for a one- or two-semester course. The texts are all ...
Examining Random Number Generators used in Stochastic Iteration

... theoretically they should provide false results. This is obvious for the Hammersley sequence, in which the first coordinate is increasing. It is less obvious, but is also true for the Halton sequence. Due to its construction using radical inversion, the subsequent points in the sequence are rather f ...
Knowledge Acquisition Via Incremental Conceptual Clustering

Time Series Prediction and Online Learning

... our knowledge, the only exception is the recent work of Kuznetsov and Mohri (2015), who analyzed the general non-stationary and non-mixing scenario and gave high-probability generalization bounds for this framework. The on-line learning scenario requires no distributional assumption. In on-line lear ...
Tree-Based State Generalization with Temporally Abstract Actions


Reinforcement learning

Reinforcement learning is an area of machine learning inspired by behaviorist psychology, concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward. The problem, due to its generality, is studied in many other disciplines, such as game theory, control theory, operations research, information theory, simulation-based optimization, multi-agent systems, swarm intelligence, statistics, and genetic algorithms. In the operations research and control literature, the field where reinforcement learning methods are studied is called approximate dynamic programming. The problem has also been studied in the theory of optimal control, though most studies there are concerned with the existence of optimal solutions and their characterization, not with the learning or approximation aspects. In economics and game theory, reinforcement learning may be used to explain how equilibrium may arise under bounded rationality.

In machine learning, the environment is typically formulated as a Markov decision process (MDP), as many reinforcement learning algorithms for this context utilize dynamic programming techniques. The main difference between the classical techniques and reinforcement learning algorithms is that the latter do not need knowledge of the MDP and target large MDPs where exact methods become infeasible.

Reinforcement learning differs from standard supervised learning in that correct input/output pairs are never presented, nor are sub-optimal actions explicitly corrected. Further, there is a focus on on-line performance, which involves finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge). The exploration vs. exploitation trade-off in reinforcement learning has been most thoroughly studied through the multi-armed bandit problem and in finite MDPs.
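
As one concrete illustration of the exploration/exploitation balance (Q-learning is a standard algorithm in this family, chosen here for brevity rather than named in the text above), the sketch below learns a tabular action-value function with epsilon-greedy action selection. The `env` interface (reset/step/actions) is hypothetical, not any particular library's API:

```python
import random

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning with epsilon-greedy exploration.

    Assumed (hypothetical) environment interface:
      env.reset() -> state
      env.step(state, action) -> (next_state, reward, done)
      env.actions -> finite list of actions
    """
    Q = {}  # Q[(state, action)] -> estimated return, defaulting to 0.0
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # exploration vs. exploitation: random action with prob. epsilon
            if random.random() < epsilon:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda act: Q.get((s, act), 0.0))
            s2, r, done = env.step(s, a)
            # temporal-difference update toward r + gamma * max_a' Q(s', a')
            best_next = max(Q.get((s2, act), 0.0) for act in env.actions)
            target = r + gamma * (0.0 if done else best_next)
            Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (target - Q.get((s, a), 0.0))
            s = s2
    return Q
```

With epsilon = 0 the agent purely exploits its current estimates; a larger epsilon sacrifices short-term reward for information about untried actions, which is exactly the trade-off the bandit literature analyzes.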