IJAI-13 - aut.upt.ro

... different perspectives, the segmentation issue is in fact a multi-objective problem, and also analyzed, according to different techniques available in the literature, how this nature had been faced. It was also shown that, given that some key dominant points are shared by different solutions with di ...

Lecture II -- Problem Solving and Search

... Can be modified to explore uniform-cost tree ...

Linear-Time Filtering Algorithms for the Disjunctive Constraint

... sorted by est, lct, ect, lst, and processing times. We assume that all time points are encoded with w-bit integers and that these sets can be sorted in linear time O(n). This assumption is supported by the fact that a word of w = 32 bits is sufficient to encode all time points, with a precision of a ...

Accepting Optimally in Automated Negotiation with Incomplete

... cases, agents have only incomplete information about each other [5, 9, 15]. In this paper, we explore in particular the setting where the opponent has only limited or no knowledge of A’s preferences, and the proposals that A receives will therefore be necessarily uncertain. This makes A’s task of pr ...

Altered neural reward and loss processing and

... neural responses during anticipation and receipt of gains and losses and related PE-signals. Additionally, we assessed the relationship between neural responsivity during gain/loss processing and hedonic capacity. When compared with healthy controls, depressed individuals showed reduced fronto-stria ...

POMDP solution methods - Department of Computer Science

... the underlying environment state. The POMDP formalism is very general and powerful, extending the application of MDPs to many realistic problems. Unfortunately, the generality of POMDPs entails high computational cost. The problem of finding optimal policies for finite-horizon POMDPs has been proven ...

Motor planning under unpredictable reward: modulations of

... they would be rewarded only in the trials immediately following withheld rewards. In these trials, the animals responded sooner and moved faster. Single-unit recordings from the dorsal striatum revealed modulations in neural firing that reflected changes in movement vigor. First, in the trials with ...

INTRODUCTION TO Al AND PRODUCTION SYSTEMS 9

... - It can be easily modified to correct errors and reflect changes in real conditions. - It can be widely used even if it is incomplete or inaccurate. - It can be used to help overcome its own sheer bulk by helping to narrow the range of possibilities that must be usually considered. Problem Spaces a ...

Paper

... real world domain contain very valuable planning knowledge. In order to make this compiled knowledge re-usable for novel situations, a specific integrated knowledge acquisition method has been developed: First, a domain theory is established from documentation materials or texts, which is then used ...

Semi-supervised collaborative clustering with partial background

... initialize) the clusters of the k-means algorithm. Two algorithms, seeded kmeans and constrained kmeans, are presented. In the first algorithm, the samples are only used to initialize the clusters and can eventually be affected to another class during the clustering process. In the second algorithm, ...

A case-based expert system for scheduling problems

... system approach cannot be applied. In scheduling problems, theoretical knowledge and some heuristics exists. Unfortunately, practical knowledge and experience may be of limited value. A number of review papers have been published in this area (Steffen [13], Kusiak and Chen [14], Charalambous and Hin ...

math 214 (notes) - Department of Mathematics and Statistics

... deviation is σ = $900. Suppose that a random sample of 50 state universities will be selected. a. Show the sampling distribution of x̄ where x̄ is the sample mean tuition cost for the 50 state universities. b. What is the probability that the simple random sample will provide a sample mean within $2 ...

Lecture Slides (PowerPoint)

... • History Table: track how often a particular move at any depth caused αβpruning or had best minimax value ...

Goal Recognition Design - Association for the Advancement of

... fi is a copy of F for agent i, split is a fluent representing the no-cost action DoSplit has occurred, and done0 is a fluent indicating the no-cost Done0 has occurred. The initial state is common to both agents and does not include the split and done0 fluents. Until a DoSplit action is performed, th ...

Distributed Constraint Satisfaction Algorithm for Complex Local

... agent must exhaustively search its local problem in order to change the bad decision made by the higher priority agent. When a local problem becomes large and complex, conducting such an exhaustive search becomes impossible. This approach is similar to that in method 1 described above, except that e ...

Multi-Robot Box-Pushing in Presence of Measurement Noise

... the paper lies in formulating the objective functions i) to confirm that the energy- and time-optimal local planning aligns the box towards its pre-defined goal position and ii) to ensure smoothness of the planned trajectory. Traditional approaches of calculus-based MOO, usually, cannot be used to h ...

Final Course Review

... 1. The essence of the frame is that it is a module of knowledge about something which we can call a concept. This can be a situation, an object, a phenomenon, a relation. 2. Frames contain smaller pieces of knowledge: components, attributes, actions which can be (or must be) taken when conditions fo ...

Symbol Acquisition for Probabilistic High

... associated with the probability P (C(s) = 1) of some condition C holding at every state s ∈ S. Symbols of type 2 are probabilistic classifiers, giving the probability that a condition holds for every state in S. Since the agent operates in an SMDP, a condition either holds at each state or it does n ...

A Stochastic Algorithm for Feature Selection in Pattern Recognition

... discriminant properties . In a recent work of Fleuret (2004), the author suggests to use mutual information to recursively select features and obtain performance as good as that obtained with a boosting algorithm (Friedman et al., 2000) with fewer variables. Weston et al. (2000) and Chapelle et al. ...

Conflict-Based Search For Optimal Multi

... Other decoupled approaches establish flow restrictions similar to traffic laws, directing agents at a given location to move only in a designated direction (Wang and Botea 2008; Jansen and Sturtevant 2008). Decoupled approaches run relatively fast, but optimality and even completeness are not always ...

Modeling Opponent Decision in Repeated One

... our framework has full information about the indivisible item being negotiated, but the other agent, without this information, moves first. In our setting, the buyer always gains if there is a contract. If the seller rejects the proposal, the buyer gets nothing. So, the buyer’s goal is to make the c ...

artificial intelligence - ABIT Group of Institutions

... tests, such as the one below ...

tablefinal

... world becomes more first at the same time more and more complex and competitive so that decision making must be taken in an optimal way for better results in a faster way. Therefore optimization is very important and concerning act of obtaining the best result under given situations. This is the rea ...

Hindsight Optimization for Probabilistic Planning with Factored Actions

... scalability and/or have been restricted to finding open-loop plans. Furthermore, these approaches have so far only been developed for concrete action spaces and hence are not applicable to factored action spaces. The question then is whether one can develop a reduction-based approach, which is both ...

< 1 ... 4 5 6 7 8 9 10 11 12 ... 27 >

Multi-armed bandit

In probability theory, the multi-armed bandit problem (sometimes called the K- or N-armed bandit problem) is a problem in which a gambler at a row of slot machines (sometimes known as ""one-armed bandits"") has to decide which machines to play, how many times to play each machine and in which order to play them. When played, each machine provides a random reward from a distribution specific to that machine. The objective of the gambler is to maximize the sum of rewards earned through a sequence of lever pulls.Robbins in 1952, realizing the importance of the problem, constructed convergent population selection strategies in ""some aspects of the sequential design of experiments"".A theorem, the Gittins index published first by John C. Gittins gives an optimal policy in the Markov setting for maximizing the expected discounted reward.In practice, multi-armed bandits have been used to model the problem of managing research projects in a large organization, like a science foundation or a pharmaceutical company. Given a fixed budget, the problem is to allocate resources among the competing projects, whose properties are only partially known at the time of allocation, but which may become better understood as time passes.In early versions of the multi-armed bandit problem, the gambler has no initial knowledge about the machines. The crucial tradeoff the gambler faces at each trial is between ""exploitation"" of the machine that has the highest expected payoff and ""exploration"" to get more information about the expected payoffs of the other machines. The trade-off between exploration and exploitation is also faced in reinforcement learning.

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Multi-armed bandit