Class Project 510
Team Members
John A. Watne
Jordan D. Howe
Ian R. Erlanson
Geoffrey A. Reglos
Sengdara Phetsomphou
Project Overview
I. Problem Description
II. Requirements Analysis
III. Technology
IV. Settings and System Design
V. Algorithm
VI. Graphical User Interface (GUI)
VII. Lessons Learned
VIII. Future Enhancement
Problem Description
• In this project, we design a Genetic Programming (GP) system that produces a function equivalent to the pre-defined mathematical equation y = (x² + 1) / 2, derived from training data consisting of several values of x and the resulting values of y.
• Analogous to the evolution of DNA, the system uses operations such as crossover and mutation.
• A key component of the system is a fitness and selection function that decides whether a generated solution meets the minimum requirements.
• We expect each subsequent generation of solutions to be "better" than the previous one, that is, to reproduce the training data more closely, eventually resulting in a correct mathematical equation.
• Output is expected within a fifteen-minute time frame.
Requirements Analysis
– Given training data consisting of a set of ten positive x values and the matching y values, the genetic programming system will generate a function that closely matches the pre-defined mathematical function y = (x² + 1)/2.
– The resulting function must be generated by the genetic programming system within the allotted fifteen minutes.
– The expected output of the system will consist of:
  • the function,
  • the total elapsed time, and
  • any pertinent information related to the resulting function, such as the number of generations evolved, its fitness value, etc.
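To make the training-data requirement concrete, the following Java sketch builds hard-coded training data for y = (x² + 1)/2. The project does not list its ten x values, so x = 1 through 10 is an assumption. The TrainingDataDemo class and its nested TrainingData class only loosely mirror the TrainingData class in the UML.

```java
/**
 * Sketch of hard-coded training data for the target y = (x^2 + 1) / 2.
 * ASSUMPTION: x = 1, 2, ..., 10; the project's actual x values are not given.
 */
public class TrainingDataDemo {

    /** An (x, y) pair, loosely mirroring the TrainingData class in the UML. */
    static final class TrainingData {
        private final double x;
        private final double y;

        TrainingData(double x, double y) {
            this.x = x;
            this.y = y;
        }

        double getX() { return x; }
        double getY() { return y; }
    }

    /** The pre-defined target function y = (x^2 + 1) / 2. */
    static double target(double x) {
        return (x * x + 1.0) / 2.0;
    }

    /** Builds ten positive training points (assumed x values 1..10). */
    static TrainingData[] readTrainingData() {
        TrainingData[] data = new TrainingData[10];
        for (int i = 0; i < data.length; i++) {
            double x = i + 1;
            data[i] = new TrainingData(x, target(x));
        }
        return data;
    }

    public static void main(String[] args) {
        for (TrainingData d : readTrainingData()) {
            System.out.println("x = " + d.getX() + ", y = " + d.getY());
        }
    }
}
```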
Requirements Definition – Continued
• If the genetic programming system fails to produce a function within an acceptable tolerance in the fifteen-minute time frame, execution terminates.
• When the GP generation and testing loop terminates, the system outputs the best function along with its associated fitness value. This applies whether the loop ended because a solution within the desired tolerance was found or because the allotted time expired.
• The genetic programming system must run on the PCs available in the classroom.
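The termination requirement can be sketched as a loop that stops when the best fitness is within tolerance or when the fifteen-minute timer expires, and then reports the best result. This is a minimal sketch, not the project's code. The nested Timer loosely follows the UML Timer methods (setCutoffTime, start, minutesElapsed, timeExpired). TerminationDemo, runUntilDone, and the evolveGeneration supplier are hypothetical names.

```java
import java.util.function.DoubleSupplier;

/**
 * Sketch of the termination rule: stop when the best fitness is within
 * tolerance or the time limit expires, then report the best result.
 */
public class TerminationDemo {

    /** Minimal timer, loosely following the Timer class in the UML. */
    static final class Timer {
        private long startTime;
        private long cutOffMillis;

        void setCutoffTime(long minutes) { cutOffMillis = minutes * 60_000L; }
        void start() { startTime = System.currentTimeMillis(); }
        long minutesElapsed() { return (System.currentTimeMillis() - startTime) / 60_000L; }
        boolean timeExpired() { return System.currentTimeMillis() - startTime >= cutOffMillis; }
    }

    /**
     * Runs generations until done and returns the number of generations evolved.
     * evolveGeneration is a hypothetical stand-in that builds and evaluates one
     * generation and returns its best (lowest) sum-of-squared-errors fitness.
     */
    static int runUntilDone(DoubleSupplier evolveGeneration, double tolerance, Timer timer) {
        timer.start();
        int generations = 0;
        double bestFit = Double.POSITIVE_INFINITY; // best fitness seen so far
        while (!timer.timeExpired()) {
            bestFit = Math.min(bestFit, evolveGeneration.getAsDouble());
            generations++;
            if (bestFit <= tolerance) {
                break; // solution found within tolerance
            }
        }
        // Reached either by success or by the time limit expiring.
        System.out.println("Generations: " + generations + ", best fitness: " + bestFit
                + ", minutes elapsed: " + timer.minutesElapsed());
        return generations;
    }

    public static void main(String[] args) {
        Timer timer = new Timer();
        timer.setCutoffTime(15); // fifteen-minute limit from the requirements
        double[] fits = {12.0, 3.5, 0.4, 0.0}; // made-up fitness values for illustration
        int[] next = {0};
        runUntilDone(() -> fits[Math.min(next[0]++, fits.length - 1)], 0.01, timer);
    }
}
```

The timer is checked only before each generation, so a single slow generation can still run past the cutoff. Checking timeExpired() inside the generation-building code as well would tighten the limit.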
Requirements Analysis – cont.
[Figure: Finite State Machine]
Unified Modeling Language (UML) Class Diagram
Timer
  -startTime : long
  -currTime : long
  -elapsedTime : long
  -cutOffTime : long
  +setCutoffTime(minutes : long) : void
  +minutesElapsed() : long
  +start() : void
  +timeExpired() : boolean
GPTester
  -tolerance : double
  -TheTimer : Timer
  +readTrainingData() : TrainingData[]
  +withinTolerance(inp : gpNode) : boolean
  +printGenerationResults() : String
GPGeneration
  -nodeSet[] : GPNode
  -totalFit : double
  -bestNode : GPNode
  -bestFit : double
  -numberInGeneration : int
  -averageFit : double
  -crossoverProbability : double
  -mutateProbability : double
  -newEntrantProbability : double
  -maxNumberInGeneration : double
  +addNodeToGeneration() : void
  +chooseNode(inp : GPNode, GPNode) : GPNode[]
  +doCrossover(inp : GPNode, GPNode) : GPNode[]
  +setMaxNumberInGeneration(inp : int) : void
  +setTotalFit() : double
  +getBestNode() : GPNode
  +getTotalNode() : GPNode
  +getAverageFit() : double
  +setProbabilities(inp : double, double, double) : void
GPRandomNumerGenerator
  +initialize() : double
  +getNumber() : Double
TrainingData
  -x : double
  -y : double
  +setX() : void
  +setY(inp : double, double) : void
  +getX() : double
  +getY() : double
GPNode
  -leftOperand : GPNode
  -rightOperand : GPNode
  -label : char
  -level : int
  -nodeType : int
  -parent : GPNode
  +getLevel() : int
  +toString() : String
  +stringToCharStack() : stack
  +evaluate(inp : double) : double
  +getPrecedence(inp : char) : int
  +doMutate() : void
  +clone() : GPNode
  +getFit(inp : TrainingData[]) : double
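The GPNode class in the diagram represents a program as a binary expression tree. Below is a minimal Java sketch of such a node. It assumes single-character labels ('+', '-', '*', '/', 'x', or a digit). Only leftOperand, rightOperand, label, evaluate, toString, and clone come from the UML; depth() is a hypothetical helper. evaluate() visits both children before applying the operator, which is the post-order traversal described under Algorithms.

```java
/**
 * Sketch of a binary expression-tree node in the spirit of GPNode.
 * ASSUMPTIONS: labels are '+', '-', '*', '/', 'x', or a single digit constant;
 * field names follow the UML, but the logic is illustrative only.
 */
public class GPNodeDemo {

    static final class GPNode {
        GPNode leftOperand;
        GPNode rightOperand;
        char label;

        GPNode(char label, GPNode left, GPNode right) {
            this.label = label;
            this.leftOperand = left;
            this.rightOperand = right;
        }

        static GPNode leaf(char label) { return new GPNode(label, null, null); }

        boolean isOperator() { return leftOperand != null; }

        /** Post-order evaluation: evaluate both children, then apply this node. */
        double evaluate(double x) {
            if (!isOperator()) {
                return label == 'x' ? x : Character.digit(label, 10);
            }
            double a = leftOperand.evaluate(x);
            double b = rightOperand.evaluate(x);
            switch (label) {
                case '+': return a + b;
                case '-': return a - b;
                case '*': return a * b;
                case '/': return a / b; // yields Infinity or NaN when b == 0
                default: throw new IllegalStateException("Unknown operator " + label);
            }
        }

        /** Depth of the subtree rooted here (a single leaf has depth 1). */
        int depth() {
            return isOperator() ? 1 + Math.max(leftOperand.depth(), rightOperand.depth()) : 1;
        }

        /** Fully parenthesized infix form. */
        @Override
        public String toString() {
            return isOperator()
                    ? "(" + leftOperand + " " + label + " " + rightOperand + ")"
                    : String.valueOf(label);
        }

        /** Deep copy, so offspring can be modified without touching the parent. */
        @Override
        public GPNode clone() {
            return isOperator()
                    ? new GPNode(label, leftOperand.clone(), rightOperand.clone())
                    : leaf(label);
        }
    }

    public static void main(String[] args) {
        // ((x * x) + 1) / 2, the target function
        GPNode target = new GPNode('/',
                new GPNode('+', new GPNode('*', GPNode.leaf('x'), GPNode.leaf('x')), GPNode.leaf('1')),
                GPNode.leaf('2'));
        System.out.println(target + " at x = 3 is " + target.evaluate(3.0));
    }
}
```

Because clone() copies the whole subtree, an offspring can be crossed over or mutated without altering its parent.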
Technology
– Programming Language Used
  • Sun Java 1.4
– Development Environments
  • NetBeans
  • Eclipse
  • EditPlus
  • DOS Prompt
– Drawing UML
  • Microsoft Visio
Why Java?
• Several programming languages were available for this project, such as C or C++.
• Java was chosen as the programming language for a number of reasons:
  – When we evaluated the technical skills of each team member, Java was the language the group was most familiar with.
  – Java is free to download and use.
  – Building GP programs from individual nodes lends itself to an object-oriented methodology, and Java is an object-oriented programming language.
  – Ease of implementation was another consideration, since we are not familiar with the classroom where the presentation will take place.
Settings & System Design
– The system design is object-oriented and reflects the UML shown in the Requirements Analysis section. Each class will be implemented as a separate Java .class file.
– All .class files needed by the genetic programming system will be stored in the same directory on the PC on which the program is run.
– For the first version of the program:
  • all inputs will be hard-coded within the Java source code, and
  • the output will be written to standard output when the program is executed from a command prompt.
– Future iterations of the program, to be implemented as time allows, may add further enhancements. These could include user-chosen values for numeric constraints such as:
  • the probability of each genetic programming operation,
  • the maximum depth of program trees (currently 5), and
  • the maximum time allowed to reach a solution.
Settings & System Design – cont.
– Random Number Generator
– Function and Terminal Set
– Data Structure Used
  • Binary Tree
  • Stack
– Tree Structure Execution and Memory
– Initializing GP Population
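Initializing the GP population brings together the random number generator, the function and terminal sets, and the binary-tree structure. The sketch below uses the common "grow" method with java.util.Random. The function set {+, -, *, /}, the terminal set {x, 1, 2}, and the 30% early-stop chance are assumptions. Only the maximum depth of 5 comes from the design notes.

```java
import java.util.Random;

/**
 * Sketch of initializing a GP population with random trees ("grow" method).
 * ASSUMPTIONS: function set {+, -, *, /}, terminal set {x, 1, 2}, a 30%
 * chance of stopping early at a terminal, and max depth 5 (the depth limit
 * mentioned in the design notes). The project's actual sets may differ.
 */
public class PopulationInitDemo {

    static final char[] FUNCTIONS = {'+', '-', '*', '/'};
    static final char[] TERMINALS = {'x', '1', '2'};
    static final int MAX_DEPTH = 5;

    /** Minimal binary-tree node: operators have two children, terminals none. */
    static final class Node {
        final char label;
        final Node left, right;

        Node(char label, Node left, Node right) {
            this.label = label;
            this.left = left;
            this.right = right;
        }

        int depth() { return left == null ? 1 : 1 + Math.max(left.depth(), right.depth()); }

        @Override
        public String toString() {
            return left == null ? String.valueOf(label) : "(" + left + " " + label + " " + right + ")";
        }
    }

    /** Grows a random tree whose depth never exceeds maxDepth. */
    static Node grow(Random rng, int maxDepth) {
        // At the depth limit (or by chance), pick a terminal; otherwise pick a function.
        if (maxDepth <= 1 || rng.nextDouble() < 0.3) {
            return new Node(TERMINALS[rng.nextInt(TERMINALS.length)], null, null);
        }
        char op = FUNCTIONS[rng.nextInt(FUNCTIONS.length)];
        return new Node(op, grow(rng, maxDepth - 1), grow(rng, maxDepth - 1));
    }

    /** Creates the initial generation of the given size. */
    static Node[] initialPopulation(int size, long seed) {
        Random rng = new Random(seed); // seeded so a run can be reproduced
        Node[] population = new Node[size];
        for (int i = 0; i < size; i++) {
            population[i] = grow(rng, MAX_DEPTH);
        }
        return population;
    }

    public static void main(String[] args) {
        for (Node n : initialPopulation(5, 42L)) {
            System.out.println(n + "  (depth " + n.depth() + ")");
        }
    }
}
```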
Genetic Operators
• Tree-Based Crossover
  – Consists of choosing two individuals as parents, selecting a random subtree in each parent, and then swapping the selected subtrees between the two parents.
  – The probability of crossover in our GP Project has been set at 80%.
• Mutation
  – Mutation operates on only one individual tree. When a tree has been selected for mutation, a point in the tree is chosen at random. The existing subtree at that point is replaced with a newly generated random subtree. The new subtree is created in the same way, and within the same limitations, as the existing tree.
  – The probability of mutation in our GP Project has been set at 10%.
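Both operators can be sketched in Java on a small mutable tree. This is an illustrative implementation, not the project's. The Node class, the allNodes helper, and the technique of swapping node contents (label and children) instead of re-linking parent pointers are assumptions, and the maximum-depth check is omitted for brevity. The randomSubtree supplier stands in for whatever generator creates new trees.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.Supplier;

/**
 * Sketch of subtree crossover and subtree mutation on binary expression trees.
 * ASSUMPTIONS: the mutable Node class and the content-swapping technique are
 * illustrative; a depth limit check (such as max depth 5) is omitted.
 */
public class GeneticOperatorsDemo {

    static final class Node {
        char label;
        Node left, right;

        Node(char label, Node left, Node right) {
            this.label = label;
            this.left = left;
            this.right = right;
        }

        /** Deep copy of the subtree rooted here. */
        Node copy() {
            return new Node(label, left == null ? null : left.copy(), right == null ? null : right.copy());
        }

        @Override
        public String toString() {
            return left == null ? String.valueOf(label) : "(" + left + " " + label + " " + right + ")";
        }
    }

    /** Collects every node of a tree (pre-order), so one can be picked at random. */
    static List<Node> allNodes(Node root) {
        List<Node> nodes = new ArrayList<>();
        collect(root, nodes);
        return nodes;
    }

    private static void collect(Node n, List<Node> out) {
        if (n == null) return;
        out.add(n);
        collect(n.left, out);
        collect(n.right, out);
    }

    /** Overwrites node 'to' in place with the given label and children. */
    private static void overwrite(Node to, char label, Node left, Node right) {
        to.label = label;
        to.left = left;
        to.right = right;
    }

    /**
     * Tree-based crossover: copies both parents, picks a random node in each
     * copy, and swaps the subtrees rooted there by swapping node contents.
     */
    static Node[] crossover(Node p1, Node p2, Random rng) {
        Node c1 = p1.copy(), c2 = p2.copy();
        List<Node> n1 = allNodes(c1), n2 = allNodes(c2);
        Node a = n1.get(rng.nextInt(n1.size()));
        Node b = n2.get(rng.nextInt(n2.size()));
        char label = a.label;
        Node left = a.left, right = a.right;
        overwrite(a, b.label, b.left, b.right);
        overwrite(b, label, left, right);
        return new Node[] {c1, c2};
    }

    /** Mutation: replaces a randomly chosen subtree with a freshly generated one. */
    static Node mutate(Node parent, Supplier<Node> randomSubtree, Random rng) {
        Node child = parent.copy();
        List<Node> nodes = allNodes(child);
        Node target = nodes.get(rng.nextInt(nodes.size()));
        Node fresh = randomSubtree.get(); // hypothetical generator, e.g. the grow method
        overwrite(target, fresh.label, fresh.left, fresh.right);
        return child;
    }

    public static void main(String[] args) {
        Random rng = new Random(2024L);
        Node p1 = new Node('+', new Node('x', null, null), new Node('1', null, null));
        Node p2 = new Node('*', new Node('x', null, null), new Node('x', null, null));
        Node[] children = crossover(p1, p2, rng);
        System.out.println("Children: " + children[0] + ", " + children[1]);
        System.out.println("Mutant:   " + mutate(p1, () -> new Node('2', null, null), rng));
    }
}
```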
Genetic Operators – cont.
• Cloning
  – Cloning consists of selecting a program tree from the parent generation and copying it unaltered to the child generation.
  – The probability of cloning in our GP Project has been set at 15%.
• New Entrant
  – A new entrant is a newly created tree that becomes part of the generation. It has not been spawned from prior generations.
  – The probability of an individual program being a new entrant in our GP Project has been set at 5%.
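Cloning and new entrants complete the set of operations. One way to choose among the four operations is to map a uniform random draw onto cumulative probability ranges. This sketch assumes that cloning takes whatever probability remains after the three values passed to the UML's setProbabilities(). Under that assumption, 80% crossover, 10% mutation, and 5% new entrant leave 5% for cloning, not the 15% stated above. The four stated percentages add up to 110%, so the project's actual scheme may differ. OperatorChoiceDemo and choose() are hypothetical names.

```java
import java.util.Random;

/**
 * Sketch of choosing which genetic operation produces the next child program.
 * ASSUMPTION: following the three-argument setProbabilities() in the UML,
 * cloning receives whatever probability is left over after crossover,
 * mutation, and new entrant.
 */
public class OperatorChoiceDemo {

    enum Operation { CROSSOVER, MUTATION, NEW_ENTRANT, CLONE }

    /** Maps a uniform draw r in [0, 1) onto the cumulative probability ranges. */
    static Operation choose(double r, double crossoverP, double mutateP, double newEntrantP) {
        if (r < crossoverP) return Operation.CROSSOVER;
        if (r < crossoverP + mutateP) return Operation.MUTATION;
        if (r < crossoverP + mutateP + newEntrantP) return Operation.NEW_ENTRANT;
        return Operation.CLONE; // remaining probability
    }

    public static void main(String[] args) {
        Random rng = new Random(1L);
        int[] counts = new int[Operation.values().length];
        for (int i = 0; i < 10_000; i++) {
            counts[choose(rng.nextDouble(), 0.80, 0.10, 0.05).ordinal()]++;
        }
        for (Operation op : Operation.values()) {
            System.out.println(op + ": " + counts[op.ordinal()]);
        }
    }
}
```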
Algorithms
by John A. Watne
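Algorithms
• Fitness and Selection
  – Fitness: the sum of squared errors over the training data. The target fitness value is zero, so lower is better.
  – Selection probability of program i in a generation of n eligible programs:
    p(i) = (1 / (n − 1)) × [1 − Fit(i) / Σⱼ Fit(j)] for n > 1; p(i) = 100% when n = 1.
    Programs with lower (better) fitness receive higher probabilities. Summing over all n programs gives (1 / (n − 1)) × (n − 1) = 1.
  – Any GP program that produces a division-by-zero error for any x value in the training data is deemed "Dead On Arrival". It is not allowed to reproduce and does not count toward the total and average fitness values for the generation.
• Method of Tree Traversal
  – We implemented a post-order method for tree traversal.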
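The fitness and selection rules can be sketched as follows. This sketch models each candidate program as a DoubleUnaryOperator rather than a GPNode. Java double division by zero yields Infinity or NaN instead of throwing an exception, so a non-finite output is used here to flag a program as Dead On Arrival. select() is a standard roulette-wheel pick over the probabilities p(i). The class and method names are hypothetical.

```java
import java.util.Random;
import java.util.function.DoubleUnaryOperator;

/**
 * Sketch of sum-of-squared-errors fitness, "Dead On Arrival" detection, and
 * selection using p(i) = (1/(n-1)) * [1 - Fit(i) / Sum Fit(j)].
 * ASSUMPTIONS: a program is modeled as a DoubleUnaryOperator, and a non-finite
 * output (what Java double division by zero produces) marks it DOA.
 */
public class FitnessSelectionDemo {

    /** Returns the SSE over the training data, or NaN if the program is DOA. */
    static double fitness(DoubleUnaryOperator program, double[] xs, double[] ys) {
        double sse = 0.0;
        for (int i = 0; i < xs.length; i++) {
            double out = program.applyAsDouble(xs[i]);
            if (Double.isNaN(out) || Double.isInfinite(out)) {
                return Double.NaN; // Dead On Arrival: excluded from reproduction and totals
            }
            double err = out - ys[i];
            sse += err * err;
        }
        return sse;
    }

    /** Selection probabilities for the n live programs; they sum to 1. */
    static double[] selectionProbabilities(double[] fits) {
        int n = fits.length;
        double[] p = new double[n];
        if (n == 1) {
            p[0] = 1.0;
            return p;
        }
        double total = 0.0;
        for (double f : fits) total += f;
        for (int i = 0; i < n; i++) {
            // ASSUMPTION: if every program is perfect (total == 0), choose uniformly.
            p[i] = total == 0.0 ? 1.0 / n : (1.0 / (n - 1)) * (1.0 - fits[i] / total);
        }
        return p;
    }

    /** Roulette-wheel pick of an index according to the probabilities p. */
    static int select(double[] p, Random rng) {
        double r = rng.nextDouble(), cumulative = 0.0;
        for (int i = 0; i < p.length; i++) {
            cumulative += p[i];
            if (r < cumulative) return i;
        }
        return p.length - 1; // guards against rounding leaving r just above the sum
    }

    public static void main(String[] args) {
        double[] xs = {1, 2, 3};
        double[] ys = {1.0, 2.5, 5.0}; // y = (x^2 + 1) / 2
        System.out.println(fitness(x -> (x * x + 1) / 2, xs, ys)); // perfect program
        System.out.println(fitness(x -> x, xs, ys));
        System.out.println(fitness(x -> 1 / (x - 2), xs, ys)); // DOA at x = 2
        double[] p = selectionProbabilities(new double[] {1.0, 3.0});
        System.out.println(p[0] + " " + p[1] + " -> picked " + select(p, new Random(0L)));
    }
}
```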
Algorithms – continued
• Sorting
  – After a new generation of GP programs has been created and each program evaluated, the programs could be sorted in ascending order of fitness.
  – This would ease the selection of programs into the subsequent generation, because the most promising solutions would be toward the front of the array. We chose not to use sorting anywhere in the GP Project, for several reasons.
  – One reason is that we were concerned about the fifteen-minute time limit.
  – We also chose to simplify the design to meet the project deadline. Because we are also attempting to implement a GUI, we were concerned that sorting logic would take much-needed processing time away from the CPU.
  – We have considered adding sorting by fitness value as a future enhancement.
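If sorting is added later, Java's standard library already provides what is needed. Arrays.sort with a comparator on fitness orders a generation from best (lowest error) to worst. The Program holder class below is hypothetical and stands in for a GPNode paired with its computed fitness.

```java
import java.util.Arrays;
import java.util.Comparator;

/**
 * Sketch of the proposed enhancement: sort a generation in ascending order of
 * fitness so the best (lowest-SSE) program comes first.
 * ASSUMPTION: Program is a hypothetical holder pairing a tree's text with its fitness.
 */
public class SortByFitnessDemo {

    static final class Program {
        final String expression;
        final double fitness;

        Program(String expression, double fitness) {
            this.expression = expression;
            this.fitness = fitness;
        }
    }

    /** Sorts in place, lowest fitness first; Arrays.sort on objects is stable. */
    static void sortAscending(Program[] generation) {
        Arrays.sort(generation, Comparator.comparingDouble(p -> p.fitness));
    }

    public static void main(String[] args) {
        Program[] gen = {
            new Program("(x * x)", 42.5),
            new Program("(((x * x) + 1) / 2)", 0.0),
            new Program("(x + 1)", 120.25)
        };
        sortAscending(gen);
        for (Program p : gen) {
            System.out.println(p.fitness + "  " + p.expression);
        }
    }
}
```

This adds one O(n log n) sort per generation of n programs. That cost is usually modest compared with evaluating every program against the training data.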
Algorithms – continued
Key Correction to Algorithm:
• Issue: When we reviewed the graph of best fit and average fit for each succeeding generation, the values swung up and down. They were not continuously non-increasing (that is, never increasing; always decreasing or remaining level).
• Resolution: Rather than only cloning randomly selected individuals from the prior generation, we make sure that the best program from the prior generation survives unchanged as the first program added to the new generation. This guarantees that the best fit of any program in the new generation can be no worse than the best fit of its previous (parent) generation.
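This resolution is commonly called elitism. A minimal generic sketch follows. ElitismDemo, nextGeneration, and breedOne are hypothetical names: breedOne stands in for crossover, mutation, cloning, or a new entrant, and random numbers stand in for fitness values in main().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;
import java.util.function.Supplier;
import java.util.function.ToDoubleFunction;

/**
 * Sketch of the key correction (elitism): the best program of generation N is
 * added, unchanged, as the first member of generation N+1, so the best fitness
 * can never get worse from one generation to the next.
 * ASSUMPTIONS: programs are generic values; breedOne is a hypothetical stand-in
 * for crossover, mutation, cloning, or a new entrant.
 */
public class ElitismDemo {

    static <T> List<T> nextGeneration(List<T> parents, ToDoubleFunction<T> fitness,
                                      Supplier<T> breedOne, int size) {
        T best = parents.stream()
                .min(Comparator.comparingDouble(fitness))
                .orElseThrow(IllegalArgumentException::new);
        List<T> next = new ArrayList<>(size);
        next.add(best); // survives unchanged (a real GP tree would be deep-copied here)
        while (next.size() < size) {
            next.add(breedOne.get());
        }
        return next;
    }

    public static void main(String[] args) {
        Random rng = new Random(3L);
        List<Double> generation = new ArrayList<>();
        for (int i = 0; i < 5; i++) generation.add(100 * rng.nextDouble());
        for (int g = 1; g <= 5; g++) {
            generation = nextGeneration(generation, d -> d, () -> 100 * rng.nextDouble(), 5);
            System.out.println("Generation " + g + " best fit: " + Collections.min(generation));
        }
    }
}
```

Because the elite program is carried over unchanged, the minimum fitness of generation N+1 can only equal or improve on that of generation N. The average fitness carries no such guarantee.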
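Best Fit of GP Program by Generation – Before Fix
[Figure: graph of best fit by generation, before the fix]
Best Fit of GP Program by Generation – After Fix
[Figure: graph of best fit by generation, after the fix]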
Graphical User Interface
by Ian R. Erlanson
Output Screen
[Figure: output screen showing the current generation]
Summary and Future Enhancement
by Geoffrey A. Reglos
Lessons Learned
Jordan Howe:
• Individual: I got good practice at reading and working with other people’s code, and at writing code that conformed to project specifications.
• Group: I think we worked together well in terms of figuring out who’s good at what and dividing up tasks accordingly.
• Technical: I hadn’t known about representing arithmetic expressions as trees, or about using postfix order to parse them.
Sengdara Phetsomphou:
• I personally have learned an essential step in developing a computer program. John and others start with a simple solution and then seek to understand that solution’s performance characteristics, and I feel this helps me see how to develop a computational procedure for solving a problem. I have not yet fully grasped the significance of GP program generation in depth. Even so, I learned from the rest of the team members, especially about our general approach to developing algorithmic solutions for this project. Finally, I learned how to use a new tool, Microsoft Visio.
Lessons Learned – continued
Geoffrey Reglos:
• I underestimated the work involved in documentation. I learned that the documenter needs to work more closely with the developer to understand the details of the program(s).
• I learned to work with a group of people on a short-term project. We were able to work within each individual’s strengths and weaknesses to accomplish our goal of completing the project successfully and in a timely manner. The important characteristics of working with this group were communication and a degree of trust.
• Although I am able to read and comprehend the code, I feel that I need more exposure to Java to be more involved in future projects.
Lessons Learned – continued
John Watne:
• I think the main thing I learned on this project was how essential it is to make sure the best fit GP program of generation N survives unmodified to generation (N+1). This dramatically improves the performance of the GP algorithm in finding an equivalent program. It ensures that the fitness value of the best fit function for each generation is non-increasing with elapsed time and generations; that is, it never gets worse with a new generation, but stays the same or improves.
• In addition:
  – How to use trees to represent equations.
  – How to use postfix notation, and its value in simplifying the coding of an equation using a tree.
  – The use of probability of survival, so common in actuarial work, applied to the creation of new software by software.
Proposed Enhancements, Possible Future Work and Influences
• Implement sorting of the functions in a generation in ascending order of fitness, so that the function with the best fitness value is at the top.
• Implement more flexible input of training data. Currently, the training data is hard-coded. We would like a GUI that offers the user a number of choices for accepting training data in different formats. This would also involve adding logic to parse the data and convert it into a form usable by the GP program.
• Use Ant to simplify managing the build of the project.
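The training-data input enhancement would need a parser behind the GUI. The sketch below reads one x, y pair per line, separated by a comma, semicolon, or whitespace, and skips blank lines and lines starting with #. This input format is an assumption, since the slides do not say which formats would be supported. TrainingDataParserDemo and parse() are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of flexible training-data input: parse "x,y" pairs (comma, semicolon,
 * or whitespace separated), skipping blank lines and '#' comments.
 * ASSUMPTION: this input format is hypothetical.
 */
public class TrainingDataParserDemo {

    /** Parses lines into {x, y} pairs; throws IllegalArgumentException on bad input. */
    static List<double[]> parse(String text) {
        List<double[]> pairs = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            String line;
            int lineNo = 0;
            while ((line = in.readLine()) != null) {
                lineNo++;
                line = line.trim();
                if (line.isEmpty() || line.startsWith("#")) continue; // skip blanks and comments
                String[] parts = line.split("[,;\\s]+");
                if (parts.length != 2) {
                    throw new IllegalArgumentException("Line " + lineNo + ": expected x and y");
                }
                try {
                    pairs.add(new double[] {Double.parseDouble(parts[0]), Double.parseDouble(parts[1])});
                } catch (NumberFormatException e) {
                    throw new IllegalArgumentException("Line " + lineNo + ": not a number", e);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected when reading from a String
        }
        return pairs;
    }

    public static void main(String[] args) {
        String sample = "# x, y\n1, 1.0\n2; 2.5\n3 5.0\n";
        for (double[] p : parse(sample)) {
            System.out.println(p[0] + " -> " + p[1]);
        }
    }
}
```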
Visual Materials
** Graphs will be included where necessary **
User Manual
Notes:
• These instructions are specific to a Windows environment. If you are using an environment other than Windows, please consult the appropriate operating system documentation.
• The CLASSPATH and PATH variables of the Windows environment may need to be adjusted so that the Java environment knows where the necessary files are located.
Setup
1. Download the appropriate Java 2 Platform from http://java.sun.com/downloads/index.html
2. Install the Java 2 Platform according to the Java documentation provided.
3. Download the GP files into a single directory.
Using GPTester
• Start a DOS command prompt:
  – Start → Run → type cmd
• or
  – Windows XP: Start → All Programs → Accessories → Command Prompt
• or
  – Windows 2000/NT/Me/95: Start → Programs → Accessories → Command Prompt
At DOS Prompt
[Figure: screenshot of the DOS prompt]
How to?
• Change into the directory containing the GP files (from Step 3 in the Setup section) using the DOS command cd. Type cd followed by a directory name to move into that directory, and cd .. to move up one level.
• To compile the Java files, type: javac *.java
Output
[Figure: sample program output]
To run GPTester
• Type: java GPTester
Q&A
Questions?