The Implementation of Artificial Intelligence and Temporal Difference Learning Algorithms in a Computerized Chess Programme
By James Mannion
Computer Systems Lab 08-09
Period 3
Abstract
• Searching through large sets of data
• Complex, vast domains
• Heuristic searches
• Chess
• Evaluation function
• Machine learning
Introduction
• Simple domains, simple heuristics
• The domain of chess
• Deep Blue – brute force
• Looking at 30^100 moves before making the first
• Supercomputer
• Too many calculations
• Not efficient
Introduction (cont’d)
• Minimax search
• Alpha-beta pruning (see the sketch below)
• Only look 2-3 moves into the future
• Estimate strength of position
• Evaluation function
• Can improve heuristic by learning
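To make the search idea concrete, here is a minimal sketch of depth-limited minimax with alpha-beta pruning in Python. The helpers legal_moves(), make_move(), undo_move(), and evaluate() are hypothetical placeholders, not the project's actual code:

```python
# Minimal sketch of depth-limited minimax with alpha-beta pruning.
# legal_moves(), make_move(), undo_move(), and evaluate() are
# hypothetical helpers, not names from the project itself.

def alphabeta(board, depth, alpha, beta, maximizing):
    if depth == 0:
        return evaluate(board)          # heuristic estimate of position strength
    if maximizing:
        best = float('-inf')
        for move in legal_moves(board):
            make_move(board, move)
            best = max(best, alphabeta(board, depth - 1, alpha, beta, False))
            undo_move(board, move)
            alpha = max(alpha, best)
            if alpha >= beta:           # prune: opponent will avoid this line
                break
        return best
    else:
        best = float('inf')
        for move in legal_moves(board):
            make_move(board, move)
            best = min(best, alphabeta(board, depth - 1, alpha, beta, True))
            undo_move(board, move)
            beta = min(beta, best)
            if alpha >= beta:
                break
        return best
```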
Introduction (cont’d)
• Seems simple, but can become quite complex: chess masters spend careers learning how to “evaluate” moves
• Purpose: can a computer learn a good evaluation function?
Background
• Claude Shannon, 1950
• Brute force would take too long
• Discusses evaluation function
• 2-ply algorithm, but looks further into the future for moves that could lead to checkmate
• Possibility of learning in distant future
Background (cont’d)
• D.F. Beal, M.C. Smith, 1999
• Temporal difference learning
• Program spends 20,000 games learning the evaluation function
• Beats program that did not learn a function
• Shows that programs can make their evaluation functions better
Background (cont’d)
• David E. Moriarty, Risto Miikkulainen
• Evolutionary neural networks
• Othello
• Complex strategies developed
Background (cont’d)
• Shiu-li Huang, Fu-ren Lin, 2007
• Temporal difference learning
• Multi-agent bargaining
• Bargaining, while not necessarily adversarial, is similar to chess and other games
Development
• Python
• Stage 1: Text-based chess game
• Two humans input their moves
• Illegal moves not allowed (see the loop sketch below)
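A minimal sketch of what such a Stage 1 loop could look like, assuming a hypothetical Board class plus parse_move() and is_legal() helpers (none of these names come from the project itself):

```python
# Hypothetical sketch of the Stage 1 loop: two humans alternate
# entering moves as text, and illegal moves are rejected.
# Board, parse_move(), and is_legal() are placeholder names.

def play():
    board = Board()                      # hypothetical board representation
    side = 'white'
    while not board.game_over():
        print(board)
        text = input(f"{side} to move: ")
        move = parse_move(text, board)
        if move is None or not is_legal(board, move, side):
            print("Illegal move, try again.")
            continue                     # same player must enter a new move
        board.apply(move)
        side = 'black' if side == 'white' else 'white'
```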
Development (cont’d)
• Stage 2: Introduce a computer player
• 2-3 ply
• Evaluation function will start out such that choices are random (see the sketch below)
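One common way to realize "random at first" is an evaluation function that is a weighted sum of board features with randomly initialized weights; the feature names and extract_features() stub below are assumptions for illustration, not the project's actual code:

```python
import random

FEATURES = ['material', 'mobility', 'king_safety']   # assumed feature set

def extract_features(board):
    # Hypothetical stub: a real version would compute each feature
    # from the position; here it just returns placeholder values.
    return {f: 0.0 for f in FEATURES}

def init_weights():
    # Small random initial weights: with these, the program's move
    # choices are effectively random until learning adjusts them.
    return {f: random.uniform(-1, 1) for f in FEATURES}

def evaluate(board, weights):
    values = extract_features(board)
    return sum(weights[f] * values[f] for f in FEATURES)
```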
Development (cont’d)
• Stage 3: Learning
• Temporal difference learning
• Adjusts the weights of the evaluation function slightly based on gameplay (see the update sketch below)
• Evaluation function updated each time a game is played
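Continuing the weighted-feature sketch above, a TD(0)-style weight update, as commonly formulated for a linear evaluation function, could look like the following. The learning rate and helpers are assumptions, not values from the project:

```python
ALPHA = 0.01   # assumed learning rate, not a value from the project

def td_update(weights, prev_board, next_board):
    # TD(0)-style update: shift evaluate(prev_board) toward
    # evaluate(next_board) by adjusting each feature weight in
    # proportion to its contribution (the feature's value).
    error = evaluate(next_board, weights) - evaluate(prev_board, weights)
    values = extract_features(prev_board)
    for f in FEATURES:
        weights[f] += ALPHA * error * values[f]
```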
Testing
• Learning vs. no learning
• Two equal, random computer players pitted against each other
• One will have the ability to learn
• Thousands of games
• Win-loss differential tracked over the length of the test (see the harness sketch below)
• By the end, the learner should be winning significantly more games
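A minimal sketch of such a test harness, assuming a hypothetical play_game(learner, fixed) that plays one full game (TD-updating the learner's weights as it goes) and reports the winner:

```python
# Hypothetical test harness: track the learner's win-loss
# differential over thousands of self-play games.

def run_experiment(n_games=5000):
    learner = init_weights()
    fixed = init_weights()
    differential = 0                     # learner wins minus learner losses
    history = []
    for _ in range(n_games):
        winner = play_game(learner, fixed)   # hypothetical helper
        if winner == 'learner':
            differential += 1
        elif winner == 'fixed':
            differential -= 1
        history.append(differential)     # trend should rise if learning works
    return history
```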
References
• Shannon, Claude. “Programming a Computer for Playing Chess.” 1950.
• Beal, D.F., Smith, M.C. “Temporal Difference Learning for Heuristic Search and Game Playing.” 1999.
• Moriarty, David E., Miikkulainen, Risto. “Discovering Complex Othello Strategies Through Evolutionary Neural Networks.” 1995.
• Huang, Shiu-li, Lin, Fu-ren. “Using Temporal-Difference Learning for Multi-Agent Bargaining.” 2007.