The Implementation of Artificial
Intelligence and Temporal
Difference Learning Algorithms in
a Computerized Chess Program
By James Mannion
Computer Systems Lab 08-09
Period 3
Abstract






Searching through large sets of data
Complex, vast domains
Heuristic searches
Chess
Evaluation Function
Machine Learning
Introduction







Games
Minimax search
Alpha-beta pruning
Only look 2-3 moves into the future
Estimate strength of position
Evaluation function
Can improve heuristic by learning
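The search described above can be sketched roughly as follows. This is a minimal illustration of depth-limited minimax with alpha-beta pruning on a toy tree, not the project's actual code; the `children` and `evaluate` callables and the nested-list tree are assumptions made for the example.

```python
import math

def alphabeta(state, depth, alpha, beta, maximizing, children, evaluate):
    """Depth-limited minimax with alpha-beta pruning.

    children(state) returns the successor positions (hypothetical helper);
    evaluate(state) returns a heuristic score for the maximizing player.
    """
    successors = children(state)
    if depth == 0 or not successors:
        return evaluate(state)  # estimate the strength of the position
    if maximizing:
        best = -math.inf
        for child in successors:
            best = max(best, alphabeta(child, depth - 1, alpha, beta,
                                       False, children, evaluate))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # the minimizer would never allow this line
        return best
    best = math.inf
    for child in successors:
        best = min(best, alphabeta(child, depth - 1, alpha, beta,
                                   True, children, evaluate))
        beta = min(beta, best)
        if alpha >= beta:
            break  # the maximizer already has a better option elsewhere
    return best

# Toy game tree: nested lists are positions, numbers are leaf evaluations.
tree = [[3, 5], [2, 9], [0, 1]]
children = lambda s: s if isinstance(s, list) else []
evaluate = lambda s: s
value = alphabeta(tree, 2, -math.inf, math.inf, True, children, evaluate)
```

In a chess program, `children` would generate legal moves and `evaluate` would be the evaluation function that the learning stage tries to improve.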
Introduction



Seems simple, but can become quite complex.
Chess masters spend careers learning how to
“evaluate” moves
Purpose: can a computer learn a good
evaluation function?
Background





Claude Shannon, 1950
Brute force would take too long
Discusses evaluation function
2-ply algorithm, but looks further into the future
for moves that could lead to checkmate
Possibility of learning in distant future
Development




Python
Stage 1: Text based chess game
Two humans input their moves
Illegal moves not allowed
Development
Stage 2: Introduce a computer player
2-3 ply
Evaluation function starts out basing choices on a simple piece differential in which each piece is weighted equally
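A piece-differential evaluation of this kind might look like the sketch below. The board encoding (a string of piece letters, uppercase for the side being scored and lowercase for the opponent) is an assumption chosen for brevity, not the representation used in the project.

```python
PIECES = "PNBRQ"  # pawn, knight, bishop, rook, queen

def piece_differential(board, weights=None):
    """Sum over piece types of weight * (own count - opponent count).

    `board` is a hypothetical string of piece letters: uppercase for the
    side being scored, lowercase for the opponent. Kings and empty
    squares ('.') are ignored.
    """
    if weights is None:
        weights = {p: 1.0 for p in PIECES}  # every piece weighted equally
    score = 0.0
    for p in PIECES:
        own = board.count(p)
        opp = board.count(p.lower())
        score += weights[p] * (own - opp)
    return score

start = "RNBQKBNR" + "P" * 8 + "." * 32 + "p" * 8 + "rnbqkbnr"
after_capture = start.replace("q", ".", 1)  # opponent has lost the queen
```

With equal weights, losing a queen costs exactly as much as losing a pawn, which is the weakness the learning stage is meant to correct.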
Development







Stage 3: Learning
Temporal Difference Learning
Weight adjustment:
$w \leftarrow w + a \, (P_t - P_{t-1}) \, \partial_w P_{t-1}$
$a = 200 / (199 + n)$
$P = 1 / (1 + e^{-h})$
$h = w_1 (j_1 - k_1) + \dots + w_5 (j_5 - k_5)$
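One way to read the update rule is sketched below. The slides do not define every symbol, so this code assumes that j_i and k_i are the two sides' counts of each of the five piece types (pawn, knight, bishop, rook, queen) and that n counts updates starting at 1. The gradient uses the standard sigmoid derivative, dP/dw_i = P (1 - P) (j_i - k_i).

```python
import math

def win_probability(weights, diffs):
    """P = 1 / (1 + e^-h) with h = sum_i w_i * (j_i - k_i)."""
    h = sum(w * d for w, d in zip(weights, diffs))
    return 1.0 / (1.0 + math.exp(-h))

def learning_rate(n):
    """a = 200 / (199 + n); n is assumed to count updates from 1."""
    return 200.0 / (199.0 + n)

def td_update(weights, diffs_prev, diffs_curr, n):
    """One step of w <- w + a * (P_t - P_{t-1}) * dP_{t-1}/dw."""
    p_prev = win_probability(weights, diffs_prev)
    p_curr = win_probability(weights, diffs_curr)
    a = learning_rate(n)
    # Sigmoid derivative: dP/dw_i = P * (1 - P) * (j_i - k_i)
    grad = [p_prev * (1.0 - p_prev) * d for d in diffs_prev]
    return [w + a * (p_curr - p_prev) * g for w, g in zip(weights, grad)]

weights = [1.0] * 5              # pawn, knight, bishop, rook, queen
prev_diffs = [1, 0, 0, 0, 0]     # up one pawn at time t-1
curr_diffs = [1, 0, 0, 0, 1]     # up a pawn and a queen at time t
updated = td_update(weights, prev_diffs, curr_diffs, n=1)
```

Because the gradient is taken at time t-1, only weights whose piece differential was nonzero in the earlier position move; here the position improved, so the pawn weight rises while the others stay put.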
Testing





Learning vs No Learning
Two equal, piece-differential players pitted
against each other.
One will have the ability to learn
Multiple Games
Weight values and win-loss differential
tracked over the length of the test
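The bookkeeping for such a test could be sketched as follows. The `play_game` callable is hypothetical: it is assumed to play one game between the learning and non-learning players and return the result from the learner's side (+1 win, 0 draw, -1 loss) together with the learner's current weights.

```python
def run_trial(play_game, num_games):
    """Play num_games and log the win-loss differential and weights.

    play_game() is a hypothetical callable returning (result, weights),
    where result is +1, 0, or -1 from the learning player's point of view.
    """
    differential = 0
    log = []
    for game in range(1, num_games + 1):
        result, weights = play_game()
        differential += result
        log.append({"game": game,
                    "differential": differential,
                    "weights": list(weights)})  # copy so later updates do not alter the log
    return log

# Stub standing in for real games: a fixed sequence of results.
results = iter([1, -1, 1, 0])
log = run_trial(lambda: (next(results), [1.0] * 5), 4)
```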
Results
[Figure: Change In Weights Over Time. Weight (y-axis) vs. Turns Taken (10 turns) (x-axis), with one series each for Pawn, Knight, Bishop, Rook, and Queen.]
Results
[Figure: Win Percentage Over Time. Win Percentage (y-axis, 0 to 1) vs. Games Played (x-axis, 0 to 15).]
Results
Weights changed
This affected performance
Equilibrium values reached
Program actually got worse at chess
Probably due to code error
References




Shannon, Claude. “Programming a Computer for Playing Chess.” 1950.
Beal, D.F., Smith, M.C. “Temporal Difference Learning for Heuristic Search and Game Playing.” 1999.
Moriarty, David E., Miikkulainen, Risto. “Discovering Complex Othello Strategies Through Evolutionary Neural Networks.”
Huang, Shiu-li, Lin, Fu-ren. “Using Temporal-Difference Learning for Multi-Agent Bargaining.” 2007.