Artificial WINtelligence
Group Members:
Jamaal Alleyne, C. Barrett Ames, Daniel Sullivan
TA:
Yue Gao
1 Problem Statement and Motivation
1.1 Problem Statement
Given sample code for a game of modified Chinese Checkers, improve it so that it
moves our pieces from their starting positions to occupy the starting positions of
the opponent's pieces before the opponent can do the same.
1.2 Project Scope
Project Objective
Artificial WINtelligence seeks to improve the given sample code so that it learns
and adapts in an evolving Chinese Checkers environment while maneuvering its
pieces to victoriously occupy the starting positions of the opponent's pieces.
Deliverables
• Preliminary proposal due September 13th, 2011
• Final proposal due September 16th, 2011
• Progress updates to the Teaching Assistant due weekly
• Code review #1 due Tuesday, October 4th, 2011
• Code review #2 due Tuesday, November 1st, 2011
• Final presentation due Tuesday, November 29th, 2011
• Final project reports due Friday, December 2nd, 2011
Behaviors
The program will interact with a game server and react to the input it receives
from that server. To do this, the program is expected to process the input provided
by the game server. Processing this input involves:
• Determining the current state of the board
• Ascertaining the next best action for reaching the opponent's starting positions,
  using adversarial search
• Learning from the experience of past games played
After processing the given input, the program will return the decided move.
Technical Requirements
• Java programming language
• GitHub code repository
Limitations and Constraints
• Expected runtime of the application is 10 minutes
• The application should not communicate outside of the server
• The application should not access files it does not own
• The application should not interfere with other processes
2 Input-Output Specification
Input and output are managed via stdin and stdout. For input, according to the 4701
wiki: 0 means game start, 1 means a move is requested, 2 means the opponent moved,
and 3 means an error occurred. An opponent's move is described by its start cell and
end cell. The output of the AI will be a move described in a format that the server
can understand.
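A minimal sketch of the resulting read-and-respond loop is given below. It assumes
that each server message arrives on its own line beginning with the numeric code
above; the exact tokens that follow each code, and the move format echoed back, are
assumptions that must be confirmed against the 4701 wiki.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class ServerLoop {
    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) {
            String[] tokens = line.trim().split("\\s+");
            int code = Integer.parseInt(tokens[0]);
            switch (code) {
                case 0: // game start: initialize the internal board representation
                    break;
                case 1: // move requested: run the search and print the chosen move
                    System.out.println(chooseMove());
                    System.out.flush();
                    break;
                case 2: // opponent moved: tokens[1] and tokens[2] are assumed to be
                        // the start and end cells; update the board accordingly
                    break;
                case 3: // error reported by the server: stop cleanly
                    return;
            }
        }
    }

    // Placeholder; the real implementation would call into the goal system.
    private static String chooseMove() {
        return "0 0"; // hypothetical "start end" move format
    }
}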
3 Background Research
Most of the information that was researched came from Artificial Intelligence: A
Modern Approach. In addition, several games of Chinese Checkers were played, and
online strategy guides describing important plays made by professional Chinese
Checkers players were consulted.
4 General Approach
The approach that will be taken is to have a scripted set of opening moves. This
comes from our study of Chinese Checkers strategy: there are a few opening moves
that experts always use, and the game play does not really get interesting until the
pieces begin to interact. Thus, until the pieces are interacting, a script will be
followed, as sketched below. Once the pieces begin to interact, moves will be made
using an adversarial search algorithm paired with reinforcement learning to increase
its long-term WINtelligence.
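As a rough illustration of this hand-off, the sketch below dispatches between a
scripted opening book and the search. OpeningBook, Searcher, and BoardState are
hypothetical names used only for illustration; they are not part of the provided
sample code.

public class MoveSelector {
    // Illustrative placeholder interfaces.
    interface BoardState { boolean piecesInteracting(); }
    interface OpeningBook { boolean hasMoveFor(BoardState s); String nextMove(BoardState s); }
    interface Searcher { String bestMove(BoardState s); }

    private final OpeningBook openingBook;
    private final Searcher searcher;

    public MoveSelector(OpeningBook openingBook, Searcher searcher) {
        this.openingBook = openingBook;
        this.searcher = searcher;
    }

    // Follow the scripted opening until the pieces begin to interact,
    // then hand off to the adversarial search.
    public String selectMove(BoardState state) {
        if (!state.piecesInteracting() && openingBook.hasMoveFor(state)) {
            return openingBook.nextMove(state);
        }
        return searcher.bestMove(state);
    }
}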
4.1 “Where’s the AI?”
Adversarial Search
The adversarial search will utilize the minimax algorithm to determine the optimal
move to take in a given state. The minimax algorithm computes the minimax value of
a given state. If we are player 1 (MAX), we move to the successor state with the
maximum minimax value among its sibling states; if we are player 2 (MIN), we move
to the successor state with the minimum minimax value among its sibling states.
The proposed algorithm for the minimax search, as taken from the text Artificial
Intelligence: A Modern Approach, is as follows:
function MINIMAX-DECISION(state) returns an action
  return arg max over a in ACTIONS(state) of MIN-VALUE(RESULT(state, a))
end function

function MAX-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← −∞
  for each a in ACTIONS(state) do
    v ← MAX(v, MIN-VALUE(RESULT(state, a)))
  end for
  return v
end function

function MIN-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for each a in ACTIONS(state) do
    v ← MIN(v, MAX-VALUE(RESULT(state, a)))
  end for
  return v
end function
Figure 1: A two-ply game tree with MAX to move at the root. Root A has MIN
successors B, C, and D. B's leaf children are E (13), F (15), and G (14); C's are
H (34), I (9), and J (45); D's are K (10), L (22), and M (23). The backed-up minimax
values are B = 13, C = 9, D = 10, and A = 13.
Implementing the minimax algorithm on the tree in Figure 1 to find the MAX value at
the root A would result in a maximum value of 13. This is because 13 is the maximum
of the values of A's successors B, C and D, which are 13, 9 and 10 respectively, and
those values are in turn the minimum values of the successors of B, C and D. This
example shows that the entire tree would need to be searched in order to determine
the maximum value of the root. This can be improved using alpha-beta pruning.
Using alpha-beta pruning reduces the need to examine every node in the tree when
determining the next best move under the minimax algorithm. For example, in Figure 1
alpha-beta pruning can be applied to reduce the number of nodes processed. Consider
the following set of equations:
MINIMAX(root) = max(min(13, 15, 14), min(34, 9, 45), min(10, 22, 23))
              = max(13, 9, min(10, x, y))
              = max(13, 9, z), where z = min(10, x, y) ≤ 10
              = 13
Utilizing alpha-beta pruning has therefore resulted in only one of D's successors
being generated. This is because the best value found so far at the root (13, backed
up from B) is greater than the utility value of D's first successor (10), so D's
minimum value can be at most 10 and will always be less than 13. This nullifies the
need to continue generating D's successors.
This may, however, prove to be too time consuming because of the depth of the tree.
If that proves to be the case, then the method explained in the book for imperfect
real-time decisions will be utilized. In short, this allows the algorithm to make
quicker decisions by treating nodes at a certain depth as terminal states, with
values decided by a heuristic evaluation function, as sketched below.
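The following is a minimal Java sketch of this search: minimax with alpha-beta
pruning and a depth cutoff at which a heuristic evaluation stands in for the exact
utility. The GameState interface and its methods are hypothetical stand-ins for
whatever board representation the input system ends up providing.

import java.util.List;

public class AlphaBetaSearch {
    // Hypothetical board interface; method names are assumptions.
    interface GameState {
        boolean isTerminal();
        double utility();              // exact utility at terminal states
        double evaluate();             // heuristic value at the depth cutoff
        List<String> actions();        // legal moves from this state
        GameState result(String move); // successor state after a move
    }

    private final int maxDepth;

    public AlphaBetaSearch(int maxDepth) { this.maxDepth = maxDepth; }

    // Returns the action with the highest backed-up value for MAX.
    public String decide(GameState state) {
        String best = null;
        double bestValue = Double.NEGATIVE_INFINITY;
        for (String a : state.actions()) {
            double v = minValue(state.result(a), bestValue, Double.POSITIVE_INFINITY, 1);
            if (v > bestValue) { bestValue = v; best = a; }
        }
        return best;
    }

    private double maxValue(GameState s, double alpha, double beta, int depth) {
        if (s.isTerminal()) return s.utility();
        if (depth >= maxDepth) return s.evaluate();   // imperfect real-time decision
        double v = Double.NEGATIVE_INFINITY;
        for (String a : s.actions()) {
            v = Math.max(v, minValue(s.result(a), alpha, beta, depth + 1));
            if (v >= beta) return v;                  // prune: MIN would never allow this line
            alpha = Math.max(alpha, v);
        }
        return v;
    }

    private double minValue(GameState s, double alpha, double beta, int depth) {
        if (s.isTerminal()) return s.utility();
        if (depth >= maxDepth) return s.evaluate();
        double v = Double.POSITIVE_INFINITY;
        for (String a : s.actions()) {
            v = Math.min(v, maxValue(s.result(a), alpha, beta, depth + 1));
            if (v <= alpha) return v;                 // prune: MAX already has a better option
            beta = Math.min(beta, v);
        }
        return v;
    }
}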
Reinforcement Learning
Passive reinforcement learning will be used to learn the utility functions that
should be applied to moves. This will help the adversarial search become more
accurate at choosing the proper plan. It will also help significantly if (when)
imperfect real-time decisions must be made, since the reinforcement learning will
help create a good heuristic function for the states treated as terminal at the
cutoff.
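The proposal does not fix a particular update rule. One possibility, sketched below,
is a temporal-difference update for the weights of a linear evaluation function
V(s) = w · f(s), which is one of the passive learning methods described in the text.
The learning rate, discount factor, and feature vector here are illustrative
assumptions.

public class TdEvaluator {
    private final double[] weights;
    private final double alpha = 0.01;  // learning rate (assumed)
    private final double gamma = 1.0;   // no discounting within a game (assumed)

    public TdEvaluator(int numFeatures) {
        this.weights = new double[numFeatures];
    }

    // Heuristic value of a state described by its feature vector f(s).
    public double value(double[] features) {
        double v = 0.0;
        for (int i = 0; i < weights.length; i++) v += weights[i] * features[i];
        return v;
    }

    // TD(0) update after observing a transition from one state to the next,
    // with the given reward (e.g. +1 for a win at the end of a game, 0 otherwise).
    public void update(double[] features, double reward, double[] nextFeatures, boolean terminal) {
        double target = terminal ? reward : reward + gamma * value(nextFeatures);
        double error = target - value(features);
        for (int i = 0; i < weights.length; i++) {
            weights[i] += alpha * error * features[i];
        }
    }
}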
5 System Architecture
5.1 Input System
Purpose
This is the eye of the AI. It acts as the sensory node: the board state, as well as
the actions of the opponent, can be read in from the server. The random bot provides
a base for this. However, an internal representation of the world will have to be
built. This representation will be used for making goals and tracking the overall
state of the game.
Development and Testing
The interface for the data structure will be agreed upon at the beginning of
development; this will allow other sections of the code to be developed without the
data structure. One individual will work on it until it has been completed and
tested thoroughly. It shall be tested using unit tests of the data structure; these
will ensure that the data passed into it is being properly organized, and they will
prevent future changes from breaking the current build. A sketch of such an
interface follows.
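The sketch below shows what that agreed-upon interface might look like. The cell
indexing, board size, method names, and move encoding are all assumptions, not part
of the provided sample code.

import java.util.List;

public interface Board {
    int NUM_CELLS = 121;                       // standard Chinese Checkers board (assumed)

    int ownerOf(int cell);                     // 0 = empty, 1 = us, 2 = opponent
    List<Integer> ourPieces();                 // cells currently occupied by our pieces
    List<Integer> opponentPieces();            // cells currently occupied by the opponent
    List<int[]> legalMoves(int player);        // each move encoded as {startCell, endCell}
    Board applyMove(int player, int start, int end); // successor board after a move
    boolean isGoalReached(int player);         // all of player's pieces in the target triangle
}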
5.2 Learning System
Purpose
The learning system looks at what the goals were from the last plan, the
actions that were taken to carry those goals out, and the current state, to see if the
proposed actions carried out the plan. It should reinforce good behaviors that lead
the program closer to the goal and should negatively influence goals and behaviors
that lead the program farther away from its goals.
Development and Testing
The selection of metrics by which the program can measure its progress is a very
important part of developing a good learning system. Thus an abundance of features
will be selected at first. The system will be run nightly against other players, and
its performance will be judged on how well it is learning from those interactions,
as well as how well it is actually playing. This will be the last piece of the code
to be developed.
5.3 Goal System
Purpose
The goal system interfaces with both the input and output systems. It uses the data
structure the input system provides as the starting point from which it plans. Once
it has created a plan that will maximize its utility function, it sends the plan to
the output system, which generates the proper actions to carry this plan out.
Development and Testing
The utility function that the goal system will attempt to maximize shall be agreed
upon before the development of the goal system. Testing of the goal system will
occur by comparing the actual maximum value of the utility function to the value
achieved by the move the goal system selects. For trivial cases this can be checked
by a unit test, as sketched below. Also, by playing against an opponent that
performs the same moves or uses the same strategy every time, the utility function
maximum can be compared against previous runs.
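A sketch of such a trivial-case unit test is shown below. It builds on the
hypothetical Board interface sketched in Section 5.1; GoalSystem, TestBoards,
planNextMove, and utilityOf are illustrative names for code that does not yet exist.

import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class GoalSystemTest {

    @Test
    public void choosesMoveWithMaximumUtilityOnTrivialBoard() {
        Board board = TestBoards.trivialEndgame();   // hand-constructed position
        GoalSystem goals = new GoalSystem();

        int[] chosen = goals.planNextMove(board);

        // Brute force: evaluate every legal move and keep the best utility.
        double bestUtility = Double.NEGATIVE_INFINITY;
        for (int[] move : board.legalMoves(1)) {
            double u = goals.utilityOf(board.applyMove(1, move[0], move[1]));
            bestUtility = Math.max(bestUtility, u);
        }

        // The goal system's choice should match the brute-force maximum.
        double chosenUtility = goals.utilityOf(board.applyMove(1, chosen[0], chosen[1]));
        assertEquals(bestUtility, chosenUtility, 1e-9);
    }
}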
5.4 Output System
Purpose
This is the action center of the AI. It translates the plan that the goal system
comes up with into movements of the pieces.
Development and Testing
This will be tested by using unit tests to make sure the actions sent to it are
being properly carried out, and that, if an action is not properly carried out, the
output system notifies the goal system. The API will be decided upon beforehand so
that the other systems can be developed independently.
6 Evaluation Plans
Possible evaluation metrics include the degree of grouping, efficiency measured as
the number of jumps, and the number of moves needed to reach the goal state. A “toy”
task, such as whether the AI can get all of its pieces to the other side of the
board when there is no opponent, is an important test. The “hard” test will be to
play against human players and see how effective the AI is. Sketches of two of the
metrics are given below.
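The sketch below shows how two of these metrics might be computed. The axial
hex-coordinate representation and the exact definitions are assumptions to be pinned
down once the board representation is fixed.

import java.util.List;

public class EvaluationMetrics {

    // Degree of grouping: average pairwise hex distance between our pieces.
    // Lower values mean the pieces are moving as a tighter group.
    public static double degreeOfGrouping(List<int[]> pieces) {
        double total = 0.0;
        int pairs = 0;
        for (int i = 0; i < pieces.size(); i++) {
            for (int j = i + 1; j < pieces.size(); j++) {
                total += hexDistance(pieces.get(i), pieces.get(j));
                pairs++;
            }
        }
        return pairs == 0 ? 0.0 : total / pairs;
    }

    // Jump efficiency: fraction of moves in a finished game that were jumps.
    public static double jumpEfficiency(int jumpMoves, int totalMoves) {
        return totalMoves == 0 ? 0.0 : (double) jumpMoves / totalMoves;
    }

    // Distance between two cells given in axial hex coordinates {q, r}.
    private static int hexDistance(int[] a, int[] b) {
        int dq = a[0] - b[0];
        int dr = a[1] - b[1];
        return (Math.abs(dq) + Math.abs(dr) + Math.abs(dq + dr)) / 2;
    }
}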