Intelligent Agents

... • An agent is anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators – Human agent: eyes, ears, and other organs for sensors; hands, legs, mouth, and other body parts for ...
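The percept-action cycle described in this excerpt can be sketched as a minimal agent interface; the Agent/Environment names and methods below are illustrative assumptions, not taken from the cited slides.

    # Minimal sketch of the sense -> act cycle (illustrative names only).
    class Agent:
        def act(self, percept):
            # The agent program: map the current percept to an action.
            raise NotImplementedError

    def run(agent, environment, steps):
        for _ in range(steps):
            percept = environment.sense()   # sensors read the environment
            action = agent.act(percept)     # agent chooses an action
            environment.apply(action)       # actuators change the environment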

... In evaluating learning, we will be interested in precision, recall, and performance as the training set size changes. We can also combine poor-performing classifiers to get ...
More data speeds up training time in learning halfspaces over sparse vectors

Dynamic Potential-Based Reward Shaping

... An MDP is a tuple ⟨S, A, T, R⟩, where S is the state space, A is the action space, T(s, a, s′) = Pr(s′ | s, a) is the probability that action a in state s will lead to state s′, and R(s, a, s′) is the immediate reward r received when action a taken in state s results in a transition to state s′ ...
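To make the tuple concrete, the sketch below writes out a tiny MDP as plain dictionaries; the states, actions, and numbers are invented purely for illustration.

    # A tiny explicit MDP <S, A, T, R>; states, actions and probabilities are made up.
    S = ["s0", "s1"]
    A = ["stay", "move"]

    # T[(s, a)] maps each successor state s' to Pr(s' | s, a)
    T = {
        ("s0", "stay"): {"s0": 1.0},
        ("s0", "move"): {"s0": 0.2, "s1": 0.8},
        ("s1", "stay"): {"s1": 1.0},
        ("s1", "move"): {"s0": 0.9, "s1": 0.1},
    }

    # R[(s, a, s')] is the immediate reward for the transition s --a--> s'
    R = {("s0", "move", "s1"): 1.0}   # reaching s1 is rewarded

    def reward(s, a, s_next):
        return R.get((s, a, s_next), 0.0)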
Artificial Intelligence: Modern Approach

... about the world—how it works, what it is currently like, what one's actions might do—and how to reason logically with that knowledge. Part IV, "Acting Logically," then discusses how to use these reasoning methods to decide what to do, particularly by constructing plans. Part V, "Uncertain Knowledge ...
Reinforcement Learning in the Presence of Rare Events

... done in several ways, including stochastic approximation and the cross-entropy method (Rubinstein and Kroese, 2004; de Boer et al., 2002). The goal of these methods is to find a change of measure such that the variance in the rare event probability estimator is minimized. Finding an optimal change o ...
Solving Large Markov Decision Processes (depth paper)

Combining Rule Induction and Reinforcement Learning

... to compete with existing routing or planning algorithms, but rather to study the effect of combining reinforcement and rule learning. The specific problem considered here is how quickly different learning strategies (Q-learning, rule learning + Q-learning) converge to an optimal or near optimal routi ...
Behavioural Abnormality

... Anything which has the effect of increasing the likelihood of the behaviour being repeated by using consequences that are pleasant when they stop. Anything unpleasant which has the effect of decreasing the likelihood of any behaviour which is not the desired behaviour. ...
Filtering Actions of Few Probabilistic Effects

... Pre: safe-open ∨ com1 Eff: safe-open • (try-com1-fail): 0.2 Pre: safe-open ∨ com1 Eff: ~safe-open ...
Intelligence: Real and Artificial

... Discrete control theory approaches, Empirical evaluations, Evaluations and Implemented systems, Fuzzy control techniques, Graphplan-based algorithms, MDP planning, Partial-order planning, Planning using dynamic belief networks, Scheduling algorithms, Specialized planning algorithms, Robotics, Tasks or Problems ...
Compute-Intensive Methods in Artificial Intelligence

... finding procedures have significantly extended the range and size of constraint and satisfiability problems that can be solved effectively. It has now become feasible to solve problem instances with tens of thousands of variables and up to several million constraints. Being able to tackle problem en ...
Current and Future Trends in Feature Selection and Extraction for

... aforementioned Data and Web Mining, where text mining is a central issue. The classical text classification is based on the vector-space model studied in the area of information retrieval. The basic challenge for this model is the large number of terms (features) compared to a relatively small numbe ...
Overview of Artificial Intelligence

... – build programs to simulate inference, learning... ...
Resources - CSE, IIT Bombay

... Nodes from the open list are taken in some order and expanded; children are put into the open list and the parent is put into the closed list. Assumption: the monotone restriction is satisfied. That is, the estimated cost of reaching the goal node from a particular node is no more than the cost of reaching a child and th ...
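Read literally, the excerpt describes best-first search with open and closed lists under a consistent (monotone) heuristic, i.e. A*-style search. The sketch below is one way to phrase that loop; the function and argument names are assumptions for illustration.

    import heapq
    import itertools

    def a_star(start, goal, neighbors, h):
        # neighbors(n) yields (child, edge_cost); h(n) is a heuristic estimate
        # of the remaining cost, assumed consistent (monotone restriction),
        # so a node never needs re-expansion once it enters the closed list.
        counter = itertools.count()   # tie-breaker so the heap never compares nodes
        open_list = [(h(start), next(counter), 0, start, [start])]
        closed = set()
        while open_list:
            _, _, g, node, path = heapq.heappop(open_list)
            if node == goal:
                return path, g
            if node in closed:
                continue
            closed.add(node)              # expanded parent goes to the closed list
            for child, cost in neighbors(node):
                if child not in closed:   # children go to the open list
                    heapq.heappush(open_list,
                                   (g + cost + h(child), next(counter),
                                    g + cost, child, path + [child]))
        return None, float("inf")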
The Robustness-Performance Tradeoff in Markov Decision Processes

Decision-Theoretic Planning for Multi

... 1. Guess a joint policy and write it down in exponential time. This is possible, because a joint policy consists of n mappings from observation histories to actions. Since h ≤ |S|, the number of possible histories is exponentially bounded by the problem description. 2. The DEC-POMDP together with th ...
Planning with Partially Specified Behaviors

... policy, i.e. a mapping from states to actions, that maximizes some measure of expected future reward. Most RL algorithms are value-based, maintaining and updating a value function that implicitly defines a policy. Model-free RL algorithms can learn an optimal or near-optimal policy even in the absen ...
Introduction - The MIT Press

... natural language processing, speech recognition, vision, robotics, planning, game playing, pattern recognition, expert systems, and so on. In principle, progress in ML can be leveraged in all these areas; it is truly at the core of artificial intelligence. Recently, machine learning resear ...
Reinforcement and Shaping in Learning Action Sequences with

... Further, we have demonstrated how goal-directed sequences of EBs can be learned from reward, received by the agent at the end of a successful sequence [7]. The latter architecture has integrated the DFT-based system for behavioural organization with the Reinforcement Learning algorithm (RL; [8], [9] ...
LTFeb7

... Amsel – nonreward elicits frustration, an aversive state. ...
Theory and applications of convex and non-convex

... or reflection operator R_C := 2P_C − I on a closed convex set C in Hilbert space. These methods work best when the projection on each set C_i is easy to describe or approximate. These methods are especially useful when the number of sets involved is large as the methods are fairly easy to parallelize. ...
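For a set whose projection has a closed form, both operators take only a few lines; the sketch below uses a Euclidean ball as an assumed example of such a set C.

    import numpy as np

    def project_ball(x, center, radius):
        # Euclidean projection P_C(x) onto the closed ball C = B(center, radius)
        d = x - center
        n = np.linalg.norm(d)
        return x.copy() if n <= radius else center + radius * d / n

    def reflect_ball(x, center, radius):
        # Reflection operator R_C(x) = 2 P_C(x) - x
        return 2.0 * project_ball(x, center, radius) - x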
Machine Learning

... • What function is to be learned and how will it be used by the performance system? • For checkers, assume we are given a function for generating the legal moves for a given board position and want to decide the best move. – Could learn a function: ChooseMove(board, legal-moves) → best-move – Or cou ...
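One way to realise the alternative hinted at in the excerpt is to learn a board evaluation function V(board) as a weighted sum of board features and then pick the legal move whose successor position scores highest; the helper names (apply_move, features) below are hypothetical.

    # Hypothetical sketch: learn V(board) = w . features(board) rather than
    # ChooseMove directly, then choose the move with the best-valued successor.
    def evaluate(board, weights, features):
        return sum(w * f for w, f in zip(weights, features(board)))

    def choose_move(board, legal_moves, apply_move, weights, features):
        # best-move = argmax over legal moves of V(resulting board)
        return max(legal_moves,
                   key=lambda m: evaluate(apply_move(board, m), weights, features))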
Document

The Size of MDP Factored Policies

... MDP M, one of its states s, and one of its actions a, decide whether a is the action to execute in s according to some optimal policy. We prove this problem to be k;NP-hard. This is done by first showing it NP-hard, and then proving that the employed reduction is monotonic. Let Π be a set of clauses ...

Reinforcement learning

Reinforcement learning is an area of machine learning inspired by behaviorist psychology, concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward. The problem, due to its generality, is studied in many other disciplines, such as game theory, control theory, operations research, information theory, simulation-based optimization, multi-agent systems, swarm intelligence, statistics, and genetic algorithms. In the operations research and control literature, the field where reinforcement learning methods are studied is called approximate dynamic programming. The problem has been studied in the theory of optimal control, though most studies are concerned with the existence of optimal solutions and their characterization, and not with the learning or approximation aspects. In economics and game theory, reinforcement learning may be used to explain how equilibrium may arise under bounded rationality.

In machine learning, the environment is typically formulated as a Markov decision process (MDP), as many reinforcement learning algorithms for this context utilize dynamic programming techniques. The main difference between the classical techniques and reinforcement learning algorithms is that the latter do not need knowledge about the MDP and they target large MDPs where exact methods become infeasible.

Reinforcement learning differs from standard supervised learning in that correct input/output pairs are never presented, nor sub-optimal actions explicitly corrected. Further, there is a focus on on-line performance, which involves finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge). The exploration vs. exploitation trade-off in reinforcement learning has been most thoroughly studied through the multi-armed bandit problem and in finite MDPs.
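To make the exploration/exploitation balance concrete, the sketch below shows tabular Q-learning with an epsilon-greedy policy on a small discrete environment; the env.reset()/env.step() interface and the env.actions attribute are assumptions in the style of common RL toolkits, not part of the text above.

    import random
    from collections import defaultdict

    def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
        # Minimal tabular Q-learning. Assumes env.reset() -> state,
        # env.step(a) -> (next_state, reward, done), and env.actions is a
        # list of discrete actions. Purely illustrative.
        Q = defaultdict(float)   # Q[(state, action)], defaults to 0.0

        def greedy(s):
            return max(env.actions, key=lambda a: Q[(s, a)])

        for _ in range(episodes):
            s = env.reset()
            done = False
            while not done:
                # explore with probability epsilon, otherwise exploit current knowledge
                a = random.choice(env.actions) if random.random() < epsilon else greedy(s)
                s_next, r, done = env.step(a)
                # TD update toward r + gamma * max_a' Q(s', a')
                best_next = 0.0 if done else max(Q[(s_next, b)] for b in env.actions)
                Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
                s = s_next
        return Q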