Artificial WINtelligence
Group Members: Jamaal Alleyne, C. Barrett Ames, Daniel Sullivan
TA: Yue Gao

1 Problem Statement and Motivation

1.1 Problem Statement
Given sample code for a game of modified Chinese Checkers, improve it so that it moves our pieces from their starting positions into the starting positions of the opponent's pieces before the opponent can do the same.

1.2 Project Scope

Project Objective
Artificial WINtelligence seeks to improve the given sample code so that it learns and adapts in an evolving Chinese Checkers environment while maneuvering its pieces to victoriously occupy the starting positions of the opponent's pieces.

Deliverables
Preliminary proposal due September 13th, 2011
Final proposal due September 16th, 2011
Progress updates to Teaching Assistant due weekly
Code review #1 due Tuesday, October 4th, 2011
Code review #2 due Tuesday, November 1st, 2011
Final presentation due Tuesday, November 29th, 2011
Final project report due Friday, December 2nd, 2011

Behaviors
The program will interact with a game server and react based on the input gathered from the server. To do this, the program is expected to process the input provided by the game server. Processing this input involves:
Determining the current state of the board
Ascertaining the next best action for reaching the opponent's starting positions, using adversarial search
Learning from the experience of past games played
After processing the given input, the program will return the decided move.

Technical Requirements
Java programming language
GitHub code repository

Limitations and Constraints
Expected runtime of the application is 10 minutes
The application should not communicate outside of the server
The application should not access files it does not own
The application should not interfere with other processes

Input-Output Specification
Input and output are managed via stdin and stdout. For input, according to the 4701 wiki: 0 means game start, 1 means request move, 2 means the opponent moved, and 3 means error.
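The message codes above suggest a simple read-dispatch loop on stdin. The sketch below assumes each message begins with its integer code as the first token; the exact wire format beyond the codes is an assumption, not part of this specification.

```java
import java.util.Scanner;

// Minimal sketch of the stdin message loop. Only the four integer codes
// (0=game start, 1=request move, 2=opponent moved, 3=error) come from the
// 4701 wiki; the one-code-per-message layout is an assumption.
public class ServerIO {
    public static String describe(int code) {
        switch (code) {
            case 0:  return "GAME_START";
            case 1:  return "MOVE_REQUEST";
            case 2:  return "OPPONENT_MOVED";
            case 3:  return "ERROR";
            default: return "UNKNOWN";
        }
    }

    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        while (in.hasNextInt()) {
            int code = in.nextInt();
            System.out.println(describe(code));
            // A real agent would dispatch here: update the internal board
            // state on OPPONENT_MOVED, and write a move to stdout on
            // MOVE_REQUEST.
        }
    }
}
```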
An opponent's move is tracked by its start cell and end cell. The output of the AI will be a move, described in a fashion the server can understand.

3 Background Research
Most of the information researched came from Artificial Intelligence: A Modern Approach. In addition, several games of Chinese Checkers were played, and some online strategy guides describing important plays made by professional Chinese Checkers players were consulted.

4 General Approach
The approach is to open with a set sequence of moves. This comes from the study of Chinese Checkers strategy: there are a few opening moves that experts use consistently, and play does not really get interesting until the pieces begin to interact. Until then, a script will be followed. Once the pieces begin to interact, moves will be made using an adversarial search algorithm paired with reinforcement learning to increase the program's long-term WINtelligence.

4.1 "Where's the AI?"

Adversarial Search
The adversarial search will use the minimax algorithm to determine the best move to take in a given state. The minimax algorithm computes the minimax value of a given state. If we are player 1 (MAX), we move to the successor state with the maximum value among its sibling states; if we are player 2 (MIN), we move to the successor state with the minimum value among its sibling states.
The proposed algorithm for the minimax search, as taken from the text Artificial Intelligence: A Modern Approach, is as follows:

function MINIMAX-DECISION(state) returns an action
    return argmax over a in ACTIONS(state) of MIN-VALUE(RESULT(state, a))

function MAX-VALUE(state) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state)
    v <- -infinity
    for each a in ACTIONS(state) do
        v <- MAX(v, MIN-VALUE(RESULT(state, a)))
    return v

function MIN-VALUE(state) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state)
    v <- +infinity
    for each a in ACTIONS(state) do
        v <- MIN(v, MAX-VALUE(RESULT(state, a)))
    return v

Figure 1: A two-ply game tree. The MAX root A has MIN successors B, C, and D. B's leaves are E (13), F (15), and G (14); C's leaves are H (34), I (9), and J (45); D's leaves are K (10), L (22), and M (23).

Running the minimax algorithm on this tree to find the MAX value at root A yields 13. This is because 13 is the maximum of the values of A's successors B, C, and D, which are 13, 9, and 10 respectively, and those values are in turn the minima of the leaves under B, C, and D. The example shows that the entire tree would need to be searched to determine the value of the root. This can be improved with alpha-beta pruning, which removes the need to examine every node in the tree when determining the next best move. For example, applying alpha-beta pruning to the minimax computation on the tree of Figure 1 gives:

MINIMAX(root) = max(min(13, 15, 14), min(34, 9, 45), min(10, 22, 23))
              = max(13, 9, min(10, x, y))
              = max(13, 9, z)    where z = min(10, x, y) <= 10
              = 13

Alpha-beta pruning therefore generates only one of D's successors, because the value MAX has already secured (13, from B) exceeds the utility of D's first successor (10).
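The pruning argument above can be checked concretely. The sketch below runs minimax with alpha-beta pruning on the Figure 1 tree; the array representation of the tree is just a convenience for this example.

```java
// Minimax with alpha-beta pruning on the Figure 1 tree: MAX root A with
// MIN children B, C, D, whose leaf utilities are {13,15,14}, {34,9,45},
// and {10,22,23}. With pruning, only 6 of the 9 leaves are examined:
// C is cut off after I (9), and D after its first successor K (10),
// matching the derivation above.
public class AlphaBetaDemo {
    static int leavesExamined = 0;

    // MAX node whose children are MIN nodes, each given as leaf utilities.
    static int maxValue(int[][] children, int alpha, int beta) {
        int v = Integer.MIN_VALUE;
        for (int[] child : children) {
            v = Math.max(v, minValue(child, alpha, beta));
            if (v >= beta) return v;       // beta cutoff
            alpha = Math.max(alpha, v);
        }
        return v;
    }

    static int minValue(int[] leaves, int alpha, int beta) {
        int v = Integer.MAX_VALUE;
        for (int leaf : leaves) {
            leavesExamined++;
            v = Math.min(v, leaf);
            if (v <= alpha) return v;      // alpha cutoff: MAX will never pick this
            beta = Math.min(beta, v);
        }
        return v;
    }

    public static void main(String[] args) {
        int[][] tree = { {13, 15, 14}, {34, 9, 45}, {10, 22, 23} };
        int root = maxValue(tree, Integer.MIN_VALUE, Integer.MAX_VALUE);
        System.out.println("root value = " + root);            // 13
        System.out.println("leaves examined = " + leavesExamined); // 6 of 9
    }
}
```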
There is thus no need to continue generating D's successors: MIN's value at D will be at most 10, which will always be less than 13. The full search may, however, prove too time consuming because of the depth of the tree. If that proves to be the case, the method explained in the book for imperfect real-time decisions will be used. In short, this allows the algorithm to make quicker decisions by cutting off search at a certain depth and treating the nodes there as terminal states whose values are decided by a heuristic.

Reinforcement Learning
Passive reinforcement learning will be used to learn the utility values that should be assigned to moves. This will help the adversarial search become more accurate at choosing the proper plan. It will also help significantly if (when) imperfect real-time decisions must be made, since the reinforcement learning will help create a good heuristic function for the cut-off states.

5 System Architecture

5.1 Input System

Purpose
This is the eye of the AI. It acts as the sensory node: the board state, as well as the actions of the opponent, can be read in from the server. The random bot provides a base for this, but an internal representation of the world will have to be built. This representation will be used for making goals and tracking the overall state of the game.

Development and Testing
The interface for the data structure will be agreed upon at the beginning of development; this will allow other sections of the code to be developed without the data structure. One individual will work on it until it has been completed and tested thoroughly. It will be tested with unit tests of the data structure, which will ensure that the data passed into it is properly organized and will prevent future changes from breaking the current build.
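As a sketch of the internal representation the Input System would maintain, the class below maps cell ids to occupying players. All names, the integer cell-id scheme, and the two-player assumption are illustrative, not part of the project specification.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical internal board representation for the Input System.
// Cells are integer ids; players are 0 and 1; empty cells are absent
// from the map. All of this is an assumption for illustration.
public class Board {
    private final Map<Integer, Integer> occupied = new HashMap<>();

    public void place(int cell, int player) {
        occupied.put(cell, player);
    }

    // Apply a move reported by the server (start cell -> end cell).
    public void applyMove(int startCell, int endCell) {
        Integer p = occupied.remove(startCell);
        if (p == null) throw new IllegalStateException("no piece on cell " + startCell);
        occupied.put(endCell, p);
    }

    public boolean isOccupied(int cell) {
        return occupied.containsKey(cell);
    }

    public Set<Integer> piecesOf(int player) {
        Set<Integer> cells = new TreeSet<>();
        for (Map.Entry<Integer, Integer> e : occupied.entrySet())
            if (e.getValue() == player) cells.add(e.getKey());
        return cells;
    }
}
```

The unit tests proposed in Section 5.1 would assert, for example, that applyMove relocates exactly one piece and rejects moves from empty cells.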
5.2 Learning System

Purpose
The learning system looks at what the goals of the last plan were, the actions taken to carry those goals out, and the current state, to see whether the proposed actions carried out the plan. It should reinforce behaviors that lead the program closer to its goals and should negatively weight goals and behaviors that lead the program farther away from them.

Development and Testing
Selecting the metrics by which the program can measure its progress is a very important part of developing a good learning system, so an abundance of features will be selected at first. The system will be run nightly against other players, and its performance will be judged on how well it learns from those interactions as well as how well it actually plays. This will be the last piece of the code to be developed.

5.3 Goal System

Purpose
The goal system interfaces with both the input and output systems. It uses the data structure the input system provides as the starting point from which it plans. Once it has created a plan that maximizes its utility function, it sends the plan to the output system, which generates the proper actions to carry the plan out.

Development and Testing
The utility function that the goal system will attempt to maximize shall be agreed on before development of the goal system begins. Testing will compare the actual maximum value of the utility function to the value of the plan the goal system selects. For trivial cases this can be checked by a unit test. Also, by playing against an opponent that performs the same moves / uses the same strategy every time, the utility function maximum can be compared across runs.

5.4 Output System

Purpose
This is the action center of the AI. It translates the plan that the goal system produces into movements of the pieces.
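Since a move is tracked by its start cell and end cell, the Output System's translation step could be as small as the sketch below. The "start end" text format is an assumption; the proposal does not specify the wire format beyond the two cells.

```java
// Hypothetical Output System translation step. The server's expected move
// format is NOT specified here beyond "start cell and end cell", so the
// "start end" line below is an assumed placeholder format.
public class MoveWriter {
    public static String serialize(int startCell, int endCell) {
        return startCell + " " + endCell;
    }

    public static void main(String[] args) {
        // The Goal System hands over a planned move; the Output System
        // writes it to stdout for the game server.
        System.out.println(serialize(42, 57));
    }
}
```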
Development and Testing
This will be tested with unit tests that make sure the actions sent to it are properly carried out, and that the goal system is notified when an action is not. The API will be decided upon beforehand so that other systems can be developed independently.

6 Evaluation Plans
Possible evaluation metrics include the degree of grouping, efficiency measured as the number of jumps, and the number of moves needed to reach the goal state. A "toy" task, such as whether the AI can get all of its pieces to the other side of the board when there is no opponent, is an important test. The "hard" test will be to play against human players and see how effective the AI is.
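Two of the metrics named above can be made concrete. The sketch below assumes a hypothetical (row, col) coordinate scheme with the goal region toward higher rows; the names and distance measures are illustrative assumptions, not the project's agreed metrics.

```java
import java.util.List;

// Illustrative evaluation metrics, assuming pieces are (row, col) pairs
// and progress means moving toward a goal row. Both measures are
// assumptions for the sake of the sketch.
public class Metrics {
    // Progress: total rows the pieces still have to travel toward goalRow.
    public static int distanceToGoal(List<int[]> pieces, int goalRow) {
        int total = 0;
        for (int[] p : pieces) total += Math.abs(goalRow - p[0]);
        return total;
    }

    // Degree of grouping: average pairwise Manhattan distance
    // between pieces (lower means a tighter cluster).
    public static double grouping(List<int[]> pieces) {
        int pairs = 0, sum = 0;
        for (int i = 0; i < pieces.size(); i++) {
            for (int j = i + 1; j < pieces.size(); j++) {
                sum += Math.abs(pieces.get(i)[0] - pieces.get(j)[0])
                     + Math.abs(pieces.get(i)[1] - pieces.get(j)[1]);
                pairs++;
            }
        }
        return pairs == 0 ? 0.0 : (double) sum / pairs;
    }
}
```

Metrics like these make the nightly runs of Section 5.2 measurable: both numbers should trend downward as the learner improves.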