Learning Domain-Specific Control Knowledge from Random Walks Alan Fern
... take n uniformly random actions to produce the state sequence (s0, s1, ..., sn). At each uniformly random action selection, we assume that an extra "no-op" action (one that does not change the state) is selected with some fixed probability, for reasons explained below. Finally, return the planning ...
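A minimal sketch of the random-walk generator described in this snippet, assuming hypothetical helpers actions(s) (the actions applicable in s) and apply_fn(s, a) (the successor state); the no-op probability p_noop is an illustrative default:

    import random

    def random_walk(s0, n, actions, apply_fn, p_noop=0.1):
        # Take n uniformly random actions from s0; with probability p_noop an
        # extra "no-op" is chosen instead, leaving the state unchanged.
        states = [s0]
        s = s0
        for _ in range(n):
            if random.random() >= p_noop:
                s = apply_fn(s, random.choice(actions(s)))
            states.append(s)
        return states  # (s0, s1, ..., sn)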
AAAI Proceedings Template
... these agents was propositional representations of the 123 RPM-inspired visual analogy problems. The propositional representations were written by the instructors of the course to prevent students from building inferential advantages into the representations. During the design of their agents, studen ...
View - Association for Computational Creativity
... Agent 1 uses an intermediate propositional knowledge representation for working memory. In the agent’s representation, each frame in an RPM consists of objects, and each object consists of the following attributes: shape, size, fill, rotation, and relative-position to other shapes. A library ...
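A sketch of that working-memory representation; the attribute names come from the snippet, while the class and field types are illustrative assumptions:

    from dataclasses import dataclass, field

    @dataclass
    class RPMObject:
        shape: str                    # e.g. "triangle"
        size: str                     # e.g. "large"
        fill: bool                    # filled vs. outline
        rotation: int                 # in degrees
        relative_position: dict = field(default_factory=dict)  # e.g. {"above": "obj2"}

    @dataclass
    class Frame:
        objects: dict                 # object name -> RPMObject

    frame_A = Frame(objects={
        "obj1": RPMObject("triangle", "large", False, 0, {"above": "obj2"}),
        "obj2": RPMObject("circle", "small", True, 0),
    })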
Team-Maxmin Equilibrium: Efficiency Bounds and Algorithms
... strategy si is mixed, as defined above for a generic normal-form game. In other words, teammates can jointly decide their strategies, but they cannot synchronize their actions, which must be drawn independently. The appropriate solution concept in such cases is the Team-maxmin equilibrium. A Team-m ...
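To make the concept concrete, here is a brute-force sketch that approximates the team-maxmin value of a tiny two-teammate game on a grid of independent mixed strategies, with the adversary best-responding; the payoff tensor U is made up for illustration:

    import itertools
    import numpy as np

    U = np.array([[[3.0, 0.0], [0.0, 2.0]],
                  [[0.0, 2.0], [3.0, 0.0]]])      # team payoff, indexed [a1, a2, b]

    grid = [np.array([p, 1.0 - p]) for p in np.linspace(0.0, 1.0, 101)]
    best = -np.inf
    for x1, x2 in itertools.product(grid, grid):
        # expected team payoff against each adversary action b, actions drawn independently
        payoff_by_b = np.einsum('i,j,ijb->b', x1, x2, U)
        best = max(best, payoff_by_b.min())       # adversary minimizes the team's payoff
    print("approximate team-maxmin value:", best)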
artificial intelligence - cs2302 computer networks
... 1.1.4 The state of the art What can AI do today? Autonomous planning and scheduling: A hundred million miles from Earth, NASA's Remote Agent program became the first on-board autonomous planning program to control the scheduling of operations for a spacecraft (Jonsson et al., 2000). Remote Agent generat ...
Dopamine: generalization and bonuses
... that the learned values of states estimate the sum of all the delayed rewards starting from those states. Thus, states with high value are good destinations, even if the act of getting to them does not itself lead to substantial reward. More concretely, a policy is a systematic, though possibly stoc ...
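One standard way to learn such state values is temporal-difference (TD(0)) learning, sketched below in generic form (the update rule is textbook TD, not code from the article):

    def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.95):
        # Move V[s] toward the one-step bootstrapped return r + gamma * V[s_next],
        # so V[s] comes to estimate the sum of (discounted) delayed rewards from s.
        V[s] += alpha * (r + gamma * V[s_next] - V[s])

    V = {"A": 0.0, "B": 0.0}
    td0_update(V, "A", 1.0, "B")    # observed transition A --(r=1)--> B
    print(V["A"])                   # 0.1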
Hybrid Reasoning Model for Strengthening the problem solving
... The advantages of case-based reasoning include: 1) The ability to encode historical knowledge directly. In many domains, cases can be obtained from existing case histories, repair logs, or other sources, eliminating the need for intensive knowledge acquisition with a human expert. 2) Allows shortcut ...
A New Hybrid PSOGSA Algorithm for Function Optimization
... exploration and exploitation. Exploration is the ability of an algorithm to search the whole problem space, whereas exploitation is the ability to converge on the best solution in the neighborhood of a good solution. The ultimate goal of all heuristic optimization algorithms is to balance the ability of exploitat ...
Multi-objective Optimization Using Particle Swarm Optimization
... – Both update the population and search for the optimum with random techniques. – Neither system guarantees success. • Dissimilarity – However, unlike GA, PSO has no evolution operators such as crossover and mutation. – In PSO, the potential solutions, called particles, fly through the problem s ...
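A single PSO update step, as a sketch of the "flying particles" idea in the snippet; the inertia and attraction coefficients are typical defaults, not values from any particular paper:

    import numpy as np

    def pso_step(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5):
        # Pull each particle toward its personal best (pbest) and the swarm's
        # global best (gbest); there is no crossover or mutation operator.
        r1, r2 = np.random.rand(*x.shape), np.random.rand(*x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        return x + v, v               # updated positions and velocities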
Problem Solving and Search
... 3.2 Uniform-cost search Complete? Yes (if step cost ≥ ε > 0) Time? # of nodes with path cost g ≤ cost of optimal solution Space? # of nodes with g ≤ cost of optimal solution Optimal? Yes (nodes are expanded in order of increasing g) ...
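The properties above correspond to the following standard implementation, which expands nodes in order of path cost g (graph maps a node to (neighbor, step_cost) pairs):

    import heapq

    def uniform_cost_search(graph, start, goal):
        frontier = [(0, start, [start])]            # (g, node, path)
        best_g = {start: 0}
        while frontier:
            g, node, path = heapq.heappop(frontier)
            if node == goal:
                return g, path                      # first goal popped is optimal
            for nbr, cost in graph.get(node, []):
                g2 = g + cost
                if g2 < best_g.get(nbr, float('inf')):
                    best_g[nbr] = g2
                    heapq.heappush(frontier, (g2, nbr, path + [nbr]))
        return None                                 # no path exists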
Goal-Based Action Priors - Humans to Robots Laboratory
... Robotic planning tasks are often formalized as a stochastic sequential decision-making problem, modeled as a Markov Decision Process (MDP) (Thrun, Burgard, and Fox ...
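A compact value-iteration sketch for such an MDP, assuming finite states S, actions A, a transition model T[s][a] as (probability, next_state) pairs, and rewards R[s][a]:

    def value_iteration(S, A, T, R, gamma=0.95, eps=1e-6):
        V = {s: 0.0 for s in S}
        while True:
            delta = 0.0
            for s in S:
                # Bellman backup: best expected one-step reward plus discounted value
                v_new = max(R[s][a] + gamma * sum(p * V[s2] for p, s2 in T[s][a])
                            for a in A)
                delta = max(delta, abs(v_new - V[s]))
                V[s] = v_new
            if delta < eps:
                return V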
1 Hybrid Evolutionary Algorithms: Methodologies, Architectures, and
... Summary. Evolutionary computation has become an important problem-solving methodology among many researchers. The population-based collective learning process, self-adaptation, and robustness are some of the key features of evolutionary algorithms when compared to other global optimization techniques ...
Maximum likelihood bounded tree-width Markov networks
... is known to be hard [5], the hardness of many restricted learning scenarios is not well understood. The potential for guaranteed approximation algorithms is also unresolved. Here, we focus on the simpler maximum likelihood criterion, in which regularization is attained solely by limiting the models t ...
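For the simplest case, tree-width 1, the maximum-likelihood structure is the classic Chow-Liu tree: a maximum-weight spanning tree under empirical pairwise mutual information. A sketch for binary data (rows are samples, columns are variables):

    import numpy as np
    from itertools import combinations

    def mutual_info(x, y):
        # Empirical mutual information between two binary columns.
        mi = 0.0
        for a in (0, 1):
            for b in (0, 1):
                pxy = np.mean((x == a) & (y == b))
                px, py = np.mean(x == a), np.mean(y == b)
                if pxy > 0:
                    mi += pxy * np.log(pxy / (px * py))
        return mi

    def chow_liu_edges(X):
        d = X.shape[1]
        edges = sorted(((mutual_info(X[:, i], X[:, j]), i, j)
                        for i, j in combinations(range(d), 2)), reverse=True)
        parent = list(range(d))
        def find(u):                   # union-find with path halving
            while parent[u] != u:
                parent[u] = parent[parent[u]]
                u = parent[u]
            return u
        tree = []
        for _, i, j in edges:          # Kruskal on mutual-information weights
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj
                tree.append((i, j))
        return tree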
Evolutionary Algorithms
... The original work on evolution strategies (Schwefel, 1965) used a (1 + 1) strategy. This took a single parent and produced a ...
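A sketch of that (1 + 1) strategy: one parent, one Gaussian-mutated offspring per generation, and the better of the two survives (the objective and step size here are illustrative):

    import random

    def one_plus_one_es(f, x, sigma=0.3, generations=200):
        fx = f(x)
        for _ in range(generations):
            child = [xi + random.gauss(0.0, sigma) for xi in x]   # Gaussian mutation
            fc = f(child)
            if fc <= fx:               # offspring replaces parent only if no worse
                x, fx = child, fc
        return x, fx

    sphere = lambda v: sum(xi * xi for xi in v)   # toy minimization objective
    print(one_plus_one_es(sphere, [2.0, -1.5]))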
Equilibrium Strategies for Multi-unit Sealed
... multiple units of the good sold in a multi-unit auction. Examples of such auctions are the US Treasury bill auctions and the FCC spectrum auctions. In fact, the latter reinvigorated research on auctions with multi-unit demand bidders in the mid-1990s. Before reviewing some of the literature ...
A shifted hyperbolic augmented Lagrangian
... Metaheuristics are approximate methods or heuristics that are designed to search for good solutions, known as near-optimal solutions, with less computational effort and time than the more classical algorithms. While heuristics are tailored to solve a specific problem, metaheuristics are general-purp ...
Why Dreyfus’ Frame Problem Argument Cannot Justify Anti- Representational AI
... Now, while this solves the logical frame problem (the challenge of logically representing the commonsense law of inertia), it doesn't solve the related problem I alluded to above: how does a system engineer produce the correct set of axioms, the list of properties that do change given an action, in a ...
Strong Cyclic Planning with Incomplete Information and Sensing
... The language KL makes use of epistemic formulas of the logic ALCKNF (see (De Giacomo et al. 1997; Iocchi et al. 2000) for details). More specifically, we introduce a set of primitive properties (or fluents) P, that will be used to characterize the possible states of the world. The primitive fluent ...
CS6659-ARTIFICIAL INTELLIGENCE
... the agent needs some sort of goal information that describes situations that are desirable; for example, being at the passenger's destination. 10. What are utility-based agents? Goals alone are not really enough to generate high-quality behavior in most environments. For example, there are many actio ...
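The distinction can be sketched in a few lines: a utility-based agent scores outcomes with a utility function rather than testing a binary goal, and picks the action whose expected outcome scores best (outcome_model and utility are assumed callables):

    def choose_action(state, actions, outcome_model, utility):
        # outcome_model(state, a) -> list of (probability, next_state) pairs
        def expected_utility(a):
            return sum(p * utility(s2) for p, s2 in outcome_model(state, a))
        return max(actions, key=expected_utility)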
Plan Recognition As Planning
... action sequence in Π*_P(G), if one exists, for any planning domain P and goal G. The cost of such a plan is the optimal cost c*_P(G). This computation is done by running an admissible search algorithm like A* or IDA* along with an admissible heuristic h that is extracted automatically from the probl ...
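A generic A* sketch of the computation described (the automatically extracted heuristic of the paper is replaced here by an arbitrary admissible h, which guarantees the first goal popped has the optimal cost c*_P(G)):

    import heapq

    def a_star(graph, start, goal, h):
        frontier = [(h(start), 0, start, [start])]   # (g + h, g, node, path)
        best_g = {start: 0}
        while frontier:
            f, g, node, path = heapq.heappop(frontier)
            if node == goal:
                return g, path                       # optimal if h is admissible
            for nbr, cost in graph.get(node, []):
                g2 = g + cost
                if g2 < best_g.get(nbr, float('inf')):
                    best_g[nbr] = g2
                    heapq.heappush(frontier, (g2 + h(nbr), g2, nbr, path + [nbr]))
        return None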
Heuristics, Planning and Cognition
... those differences [Newell and Simon 1963]. Since then, the idea of means-ends analysis has been refined and extended in many ways, seeking planning algorithms that are sound (produce only valid plans), complete (produce a plan if one exists), and effective (scale up to large problems). By the early 1990s, ...
Lakireddy Bali Reddy College of Engineering, Mylavaram
... Assuming that no machine needs adjustment twice on the same day, determine the probabilities that, on a particular day: (i) just 2 old and no new machines need adjustment; (ii) if just 2 machines need adjustment, they are of the same type. [Ans. 0.016; 0.028] Problem 7: An irregular six-faced die is ...
Planning with Partially Specified Behaviors
... are deterministic from the perspective of planning: when performing a task, even though individual actions are stochastic, the task policy keeps executing until the task objective is met. The environments we are working with have no dead ends. Thus, each task can be trained independently, but still i ...
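The execution model described above, as a sketch (policy, step, and objective are assumed callables; the no-dead-end assumption is what makes the loop expected to terminate):

    def execute_task(state, policy, step, objective, max_steps=1000):
        # Individual actions are stochastic, but the task policy keeps running
        # until the task objective holds, so the planner sees a single
        # deterministic macro-step.
        for _ in range(max_steps):
            if objective(state):
                return state
            state = step(state, policy(state))
        raise RuntimeError("objective not reached within max_steps")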
Multi-armed bandit
In probability theory, the multi-armed bandit problem (sometimes called the K- or N-armed bandit problem) is a problem in which a gambler at a row of slot machines (sometimes known as "one-armed bandits") has to decide which machines to play, how many times to play each machine, and in which order to play them. When played, each machine provides a random reward from a distribution specific to that machine. The objective of the gambler is to maximize the sum of rewards earned through a sequence of lever pulls.

Robbins in 1952, realizing the importance of the problem, constructed convergent population selection strategies in "Some aspects of the sequential design of experiments". A theorem, the Gittins index, first published by John C. Gittins, gives an optimal policy in the Markov setting for maximizing the expected discounted reward.

In practice, multi-armed bandits have been used to model the problem of managing research projects in a large organization, like a science foundation or a pharmaceutical company. Given a fixed budget, the problem is to allocate resources among the competing projects, whose properties are only partially known at the time of allocation but which may become better understood as time passes.

In early versions of the multi-armed bandit problem, the gambler has no initial knowledge about the machines. The crucial tradeoff the gambler faces at each trial is between "exploitation" of the machine that has the highest expected payoff and "exploration" to get more information about the expected payoffs of the other machines. The trade-off between exploration and exploitation is also faced in reinforcement learning.
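A common baseline for the exploitation/exploration tradeoff described above is the epsilon-greedy strategy sketched below (one standard heuristic, not the optimal Gittins-index policy):

    import random

    def epsilon_greedy(pull, n_arms, n_rounds=1000, epsilon=0.1):
        counts = [0] * n_arms
        means = [0.0] * n_arms
        total = 0.0
        for _ in range(n_rounds):
            if random.random() < epsilon:
                arm = random.randrange(n_arms)                     # explore
            else:
                arm = max(range(n_arms), key=lambda a: means[a])   # exploit
            r = pull(arm)                                          # sample the arm's reward
            counts[arm] += 1
            means[arm] += (r - means[arm]) / counts[arm]           # incremental mean
            total += r
        return total, means

    # e.g. three Gaussian arms with different mean payoffs:
    total, means = epsilon_greedy(lambda a: random.gauss([0.1, 0.5, 0.3][a], 1.0), 3)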