Assignment 1
Submitters:
Arie Kozak 314346024
Amir Patoka 041857178
Question 1
Problem description
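The goal is to solve the Ackley problem [ACK87], which is defined as finding the minimum point of the following function for n = 3:
$$f(\mathbf{x}) = -20\exp\left(-0.2\sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2}\right) - \exp\left(\frac{1}{n}\sum_{i=1}^{n}\cos(2\pi x_i)\right) + 20 + e$$
The search is performed in (-32.768, 32.768) for each x_i. The global minimum is f(0, 0, 0) = 0.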
Genome Representation
As is usual in genetic algorithms, the genome is represented as a bit string. I used a Java "String" filled with "0" and "1" characters for this. The genome encodes the phenotype, a vector of 3 numbers of type double (the x_i of the function), as a concatenation of bit strings of equal length. Each such string represents one real number, calculated as follows:
X = N * (B – A) / (2^L – 1) + A
Where:
X – the real number in the phenotype
N – the value of the bit string read as a natural binary number
A – the lower bound of the search interval, -32.768
B – the upper bound of the search interval, 32.768
L – the length of the bit string
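A minimal Java sketch of this decoding, assuming (as described above) that the genome is a String of '0'/'1' characters made of equal-length chunks. The names AckleyDecoding, decode and decodeGenome are hypothetical and are not taken from AckleyProblem.java.

```java
/**
 * Hypothetical sketch of the decoding X = N * (B - A) / (2^L - 1) + A.
 * Assumes the genome is a String of '0'/'1' characters made of equal-length chunks.
 */
final class AckleyDecoding {
    static final double A = -32.768; // lower bound of the search interval
    static final double B = 32.768;  // upper bound of the search interval

    /** Decodes one L-bit chunk; Long.parseLong handles chunks of up to 62 bits. */
    static double decode(String bits) {
        long n = Long.parseLong(bits, 2);             // N: the chunk read as a natural binary number
        double maxN = Math.pow(2, bits.length()) - 1; // 2^L - 1
        return n * (B - A) / maxN + A;
    }

    /** Splits the genome into `dimensions` chunks of equal length and decodes each one. */
    static double[] decodeGenome(String genome, int dimensions) {
        int len = genome.length() / dimensions;       // representation length L
        double[] x = new double[dimensions];
        for (int i = 0; i < dimensions; i++) {
            x[i] = decode(genome.substring(i * len, (i + 1) * len));
        }
        return x;
    }
}
```

With L = 16, an all-zero chunk decodes to A = -32.768 and an all-one chunk to B = 32.768; the genome length is 3 * L.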
Fitness calculation
The fitness is a real number (a Java double), calculated as follows:
Fitness = 20 + e – f(x)
This keeps the fitness always positive (not essential, but convenient): 20 + e – f(x) reduces to the sum of the two exponential terms of the Ackley function, both of which are positive. A higher fitness indicates a better candidate, because it corresponds to a lower value of f(x).
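The function and the fitness transform can be sketched in Java as follows, assuming the standard form of the Ackley function with constants 20, 0.2 and 2π. AckleyFitness, ackley and fitness are illustrative names, not the ones used in the original source.

```java
/**
 * Hypothetical sketch: the Ackley function and the fitness transform 20 + e - f(x).
 */
final class AckleyFitness {
    /** Standard Ackley function for any dimension n = x.length. */
    static double ackley(double[] x) {
        int n = x.length;
        double sumSquares = 0.0;
        double sumCos = 0.0;
        for (double xi : x) {
            sumSquares += xi * xi;
            sumCos += Math.cos(2 * Math.PI * xi);
        }
        return -20.0 * Math.exp(-0.2 * Math.sqrt(sumSquares / n))
                - Math.exp(sumCos / n)
                + 20.0 + Math.E;
    }

    /** Higher is better; equals the sum of the two (positive) exponential terms. */
    static double fitness(double[] x) {
        return 20.0 + Math.E - ackley(x);
    }
}
```

For example, fitness(new double[] {0, 0, 0}) is 20 + e, about 22.718, which is the largest value the fitness can take.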
Process of work, conclusions and results
Results are written in the following format: first the input parameters of the algorithm, then the plot of the best and average fitness, and finally the result.
Representation length is the number of bits for each x_i in the phenotype (so the length of the genome is 3 times this number).
Pm and Pc are the mutation and cross-over probabilities, as defined for genetic algorithms.
Average fitness is the average fitness of the last generation.
The best phenotype is presented as a vector of 3 numbers in “[]”, and the “f value” is f(best phenotype), the function value at the best phenotype.
At first I used a shorter representation length; 16 bits seemed to be sufficient.
Pc was set to 1, since experiments with different values showed that cross-over is beneficial: it increases diversity with little damaging effect, so lower mutation rates can be used (mutation is more damaging). A larger population brings slightly better results (with all other parameters equal), but it increases the run time, and that run time is better spent on more generations, so a population size of 100 was sufficient.
The mutation rate was changed from one run to another. Higher mutation rates increase the chance of a “lucky” best fitness but reduce the average fitness, so the gap between the average and the best fitness grows. I therefore used higher mutation rates for smaller numbers of generations, which gave better results thanks to “lucky” guesses, but not too high: 0.01, since anything higher was destructive. For larger numbers of generations the average fitness seems more important, because the quality of the population builds up over time and leads to better fitness.
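The two operators controlled by Pc and Pm can be sketched for the String genome as follows. One-point crossover and independent per-bit mutation are assumptions on our part (for Question 1 the report only gives the probabilities), and the class and method names are hypothetical.

```java
import java.util.Random;

/**
 * Hypothetical sketch of crossover and mutation for '0'/'1' String genomes.
 * One-point crossover and per-bit mutation are assumptions, not taken from AckleyProblem.java.
 */
final class BitStringOperators {
    /** With probability pc, swaps the tails of the two parents after a random cut point. */
    static String[] crossover(String p1, String p2, double pc, Random rnd) {
        if (rnd.nextDouble() >= pc || p1.length() < 2) {
            return new String[] {p1, p2}; // no crossover: children are copies of the parents
        }
        int cut = 1 + rnd.nextInt(p1.length() - 1); // cut point in [1, length - 1]
        return new String[] {
            p1.substring(0, cut) + p2.substring(cut),
            p2.substring(0, cut) + p1.substring(cut)
        };
    }

    /** Flips each bit independently with probability pm. */
    static String mutate(String genome, double pm, Random rnd) {
        StringBuilder sb = new StringBuilder(genome);
        for (int i = 0; i < sb.length(); i++) {
            if (rnd.nextDouble() < pm) {
                sb.setCharAt(i, sb.charAt(i) == '0' ? '1' : '0'); // flip this bit
            }
        }
        return sb.toString();
    }
}
```

With Pc = 1.0, as in all the runs below, the copy branch is never taken, so every selected pair of parents is recombined.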
First run
Representation length: 16
Pm (probability for mutation): 0.01
Pc (probability for cross-over): 1.0
Population size: 100
Number of generations: 100
[Plot: average fitness and best fitness per generation (generations 1 to 100, fitness 0 to 25)]
Best phenotype: [-0.013500205996798798, -0.015500236514839116, 0.027500419623102346]; Fitness: 22.61818603229005
Average fitness: 16.801908295479414
f value: 0.10009579616899389
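Because the fitness is defined as 20 + e minus the function value, the reported f value can be checked directly from the reported fitness. For the first run (our own arithmetic on the numbers above):

```latex
f(\text{best}) = (20 + e) - \text{Fitness}
             = 22.718281828\ldots - 22.618186032\ldots
             \approx 0.1000958
```

This agrees with the reported f value of 0.10009579616899389.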
Both the average fitness and the best fitness rise throughout the whole graph, which is a good indication that better values will be found with more generations. So in the next run only the number of generations is increased.
Run 2
Representation length: 16
Pm (probability for mutation): 0.01
Pc (probability for cross-over): 1.0
Population size: 100
Number of generations: 500
[Plot: average fitness and best fitness per generation (generations 1 to 500, fitness 0 to 25)]
Best phenotype: [0.003500053406575887, -0.008500129701687342, 0.008500129701687342]; Fitness: 22.686587031235483
Average fitness: 17.10398251808778
f value: 0.031694797223562166
The results are better than before, so the representation length is increased to 20 to allow better accuracy.
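What the extra bits buy can be seen from the decoding formula X = N * (B – A) / (2^L – 1) + A. The following is our own calculation, not part of the original runs: the grid step is Δ_L = (B – A)/(2^L – 1), and since A = –B the decoded value can be rewritten as an odd multiple of Δ_L/2.

```latex
\Delta_L = \frac{B - A}{2^L - 1} = \frac{65.536}{2^L - 1}:\qquad
\Delta_{16} = \frac{65.536}{65535} \approx 1.0 \times 10^{-3},\quad
\Delta_{20} = \frac{65.536}{1048575} \approx 6.25 \times 10^{-5},\quad
\Delta_{21} = \frac{65.536}{2097151} \approx 3.125 \times 10^{-5}

X = A + N\,\Delta_L = \left(N - \frac{2^L - 1}{2}\right)\Delta_L = (2N - 2^L + 1)\,\frac{\Delta_L}{2}
```

Since 2N – 2^L + 1 is odd, x_i = 0 is never exactly representable; the closest values are ±Δ_L/2 (about ±0.0005 for L = 16). This is consistent with Run 2's best phenotype, whose coordinates are odd multiples of Δ_16/2 (for example 0.0035000534 ≈ 7·Δ_16/2 and 0.0085001297 ≈ 17·Δ_16/2).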
Run 3
Representation length: 20
Pm (probability for mutation): 0.01
Pc (probability for cross-over): 1.0
Population size: 100
Number of generations: 2000
[Plot: average fitness and best fitness per generation (generations 1 to 2000, fitness 0 to 25)]
Best phenotype: [0.00246875235438182, -0.002718752592805629, 0.00259375247360083]; Fitness: 22.707539978982076
Average fitness: 16.66716814436697
f value: 0.010741849476968657
The best fitness still improves, but the average fitness has declined. The effect of mutation seems to be destructive, so the mutation rate is reduced to 0.001.
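A rough way to compare the settings, assuming Pm is applied independently to every bit of the genome (the report does not state this explicitly), is the expected number of flipped bits per genome, 3 · L · Pm:

```latex
E[\text{flips}] = 3 \cdot L \cdot P_m:\qquad
3 \cdot 16 \cdot 0.01 = 0.48,\qquad
3 \cdot 20 \cdot 0.01 = 0.6,\qquad
3 \cdot 20 \cdot 0.001 = 0.06
```

Under this assumption, moving from 16 to 20 bits at Pm = 0.01 slightly increases the per-genome disruption, while Pm = 0.001 reduces it tenfold.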
Run 4
Representation length: 20
Pm (probability for mutation): 0.0010
Pc (probability for cross-over): 1.0
Population size: 100
Number of generations: 5000
[Plot: average fitness and best fitness per generation (generations 1 to 5000, fitness 0 to 25)]
Best phenotype: [0.004718754500160571, 0.0017812516987376625, 4.06250387428031E-4]; Fitness: 22.70614157387505
Average fitness: 22.037964018920682
f value: 0.012140254583993926
With the lower mutation rate the average fitness is very good and very close to the best fitness. The graph looks flat: apart from the improvements at the beginning, it does not improve over a large number of generations. Further improvement may or may not be possible. But since there is a slight improvement, the representation length is increased again (to 21).
Run 5
Representation length: 21
Pm (probability for mutation): 0.0010
Pc (probability for cross-over): 1.0
Population size: 100
Number of generations: 10000
[Plot: average fitness and best fitness per generation (generations 1 to 10000, fitness 0 to 25)]
Best phenotype: [-0.001328125633293098, -7.65625365083622E-4, 0.0025781262293520513]; Fitness: 22.711195137059793
Average fitness: 22.059355966696174
f value: 0.007086691399252221
The graph continues to be stable.
This is the best phenotype found so far, and it is very close to the solution [0, 0, 0], which is a very good result for a stochastic algorithm.
Diversity was never a problem here (and it is not very important for this specific problem). The diversity graphs of all the runs look like this one, which shows the last generation of the last run:
[Histogram: Count, the number of phenotypes (0 to 2.5), per fitness value (0 to 25)]
The x axis is the fitness and the y axis is the number of phenotypes with that fitness. Almost no two phenotypes share the same fitness, although many have close fitness values. Since the fitness is a function of the phenotype, different fitness values always mean different phenotypes (and different genotypes, since the mapping between them is 1:1), so almost all individuals in the population are distinct.
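The data behind such a histogram can be tabulated by counting how many individuals share each exact fitness value. The helper below (FitnessHistogram.countByFitness) is a hypothetical sketch, not code from the original program.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper for the diversity histogram. */
final class FitnessHistogram {
    /** Maps each exact fitness value to the number of individuals that have it. */
    static Map<Double, Integer> countByFitness(double[] fitnesses) {
        Map<Double, Integer> counts = new TreeMap<>(); // sorted by fitness, like the x axis
        for (double f : fitnesses) {
            counts.merge(f, 1, Integer::sum);          // add one to this value's count
        }
        return counts;
    }
}
```

If every value in the resulting map has a count of 1, no two individuals share a fitness value, and therefore no two individuals share a phenotype.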
Source code
AckleyProblem.java
Question 2
The following may help the algorithm get unstuck:
- Increasing the mutation rate.
- Increasing the cross-over rate, or getting more diversity from cross-over.
- Simply waiting for more generations, which might bring a “jump” to the next fitness level.
- Using a different selection technique. For example, fitness-proportionate selection is known to cause premature convergence of the population and loss of diversity. There are some ways to counter that (not covered yet).
- Increasing the population size; a population that is too small has limited diversity and can get stuck at a local maximum.
- Occasionally introducing new (possibly random) phenotypes into the population.
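The last suggestion, occasionally injecting random individuals (often called "random immigrants"), could look like the sketch below for the bit-string genome of Question 1. The replaced fraction, the choice to replace the worst individuals, and all names are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;

/** Hypothetical sketch: replace the worst part of the population with random bit strings. */
final class RandomImmigrants {
    static String randomGenome(int length, Random rnd) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(rnd.nextBoolean() ? '1' : '0');
        }
        return sb.toString();
    }

    /** Returns a new population in which the lowest-fitness `fraction` is replaced. */
    static String[] inject(String[] population, double[] fitness, double fraction, Random rnd) {
        Integer[] order = new Integer[population.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        // Sort indices by ascending fitness, so the worst individuals come first.
        Arrays.sort(order, Comparator.comparingDouble(i -> fitness[i]));
        String[] next = population.clone();
        int replace = (int) Math.round(fraction * population.length);
        for (int k = 0; k < replace; k++) {
            int idx = order[k];
            next[idx] = randomGenome(population[idx].length(), rnd);
        }
        return next;
    }
}
```

Replacing only the worst individuals keeps the best genomes intact while still adding fresh material for cross-over to work with.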
Question 3
Problem Description:
Implement a genetic algorithm for the Maximum Clique problem: find the maximum (by number of vertices) subgraph that is a complete graph.
Process of Work:
- Genome representation:
We chose to represent each genome as an array of boolean values. Array[i] == true means that the i-th vertex of the source graph is included in the clique the genome represents.
- Fitness:
The fitness is the number of flagged (true) values in the genome; the flagged cells are the vertices that participate in the subgraph. Since every genome is a clique, the fitness directly gives the clique size.
- Genetic Operators:
o Selection:
We chose to use fitness-proportionate selection with roulette-wheel sampling.
o Cross-Over:
We used one-point cross-over with a cross-over rate of 0.7. We tried two cross-over variants. The first only tried to add vertices to the genome after the cross-over point (the purpose was to find a less destructive cross-over). The second removed the child's original vertices after the cross-over point and then tried to add vertices from the other child after the cross-over point.
o Mutation:
We used mutation rates of 0.001, 0.01 and 0.1. Mutation flipped a vertex out of the genome, and also tried to flip a vertex into the genome, keeping it only if the result was a legitimate clique.
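A Java sketch of this representation and of the operators, under our own reading of the description: the graph is assumed to be an adjacency matrix, mutation is assumed to act independently on each vertex with probability pm, and a vertex is only ever added if it is adjacent to every vertex already in the genome, so every genome stays a valid clique. All names are hypothetical.

```java
import java.util.Random;

/** Hypothetical sketch: genome[v] == true means vertex v is in the subgraph. */
final class CliqueGA {
    /** Fitness = number of flagged vertices, i.e. the clique size. */
    static int fitness(boolean[] genome) {
        int count = 0;
        for (boolean inClique : genome) {
            if (inClique) count++;
        }
        return count;
    }

    /** True if every pair of flagged vertices is adjacent. */
    static boolean isClique(boolean[] genome, boolean[][] adj) {
        for (int i = 0; i < genome.length; i++) {
            if (!genome[i]) continue;
            for (int j = i + 1; j < genome.length; j++) {
                if (genome[j] && !adj[i][j]) return false;
            }
        }
        return true;
    }

    /** Roulette-wheel (fitness-proportionate) selection; returns a population index. */
    static int rouletteSelect(int[] fitness, Random rnd) {
        long total = 0;
        for (int f : fitness) total += f;
        if (total == 0) return rnd.nextInt(fitness.length); // all zero: choose uniformly
        long spin = (long) (rnd.nextDouble() * total);      // point on the wheel in [0, total)
        long acc = 0;
        for (int i = 0; i < fitness.length; i++) {
            acc += fitness[i];
            if (spin < acc) return i;
        }
        return fitness.length - 1; // guard against floating-point edge cases
    }

    /** Adds vertex v only if it is adjacent to every current member; returns whether it was added. */
    static boolean tryAdd(boolean[] genome, int v, boolean[][] adj) {
        for (int u = 0; u < genome.length; u++) {
            if (genome[u] && u != v && !adj[u][v]) return false;
        }
        genome[v] = true;
        return true;
    }

    /**
     * One-point crossover giving one child of parent a, with donor b.
     * removeTail == false: first variant, keep a's vertices and only try to add b's after the cut.
     * removeTail == true: second variant, drop a's vertices after the cut, then try to add b's.
     */
    static boolean[] crossover(boolean[] a, boolean[] b, int cut, boolean removeTail, boolean[][] adj) {
        boolean[] child = a.clone();
        if (removeTail) {
            for (int i = cut; i < child.length; i++) child[i] = false; // removing vertices keeps a clique
        }
        for (int i = cut; i < child.length; i++) {
            if (b[i] && !child[i]) tryAdd(child, i, adj);
        }
        return child;
    }

    /** With probability pm per vertex: remove it if present, otherwise try to add it. */
    static void mutate(boolean[] genome, double pm, boolean[][] adj, Random rnd) {
        for (int v = 0; v < genome.length; v++) {
            if (rnd.nextDouble() < pm) {
                if (genome[v]) genome[v] = false; // removal always leaves a valid clique
                else tryAdd(genome, v, adj);      // addition is kept only if still a clique
            }
        }
    }
}
```

In a full loop, the caller would pick the cut point at random and apply crossover with probability 0.7; since removal never breaks a clique and tryAdd checks adjacency, isClique holds for every genome produced.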
Running Results:
- hamming6-2.clq:
Best result was 32.
Clique=1,4,6,7,10,11,13,16,18,19,21,24,25,28,30,31,34,35,37,40,41,44,46,47,49,52,54,55,58,59,61,64
Clique=1,4,6,7,10,11,13,16,18,19,21,24,25,28,30,31,34,35,37,40,41,44,46,47,49,52,54,55,58,59,61,64
Clique=1,4,6,7,10,11,13,16,18,19,21,24,25,28,30,31,34,35,37,40,41,44,46,47,49,52,54,55,58,59,61,64
- c-fat500-1.clq:
Best result was 14.
Clique=11,12,91,92,171,172,251,252,331,332,411,412,491,492
Clique=12,13,92,93,172,173,252,253,332,333,412,413,492,493
Clique=12,13,92,93,172,173,252,253,332,333,412,413,492,493
- p_hat500-1.clq:
Best result was 9.
Clique=47,69,71,107,148,242,266,279,408
Clique=47,69,71,148,242,248,266,279,412
Clique=47,69,71,148,248,266,279,412,489
Conclusions:
Overall, most of the time, the more we emphasized exploration over exploitation, the better the results. The most obvious evidence is that for both the first and second graphs the better results were achieved with a mutation rate of 0.01, and for the third with 0.1. Furthermore, the better cross-over was the more destructive one, which tends to be more explorative than the other cross-over we used.
The most apparent reason for this is that positions that are close together in the genotype string have no connection in the phenotype itself: the vertices that participate in a clique do not have to be grouped together in their index order, so keeping contiguous segments intact gives little benefit.
Usage (needs JFreeChart):
javaw -jar ecal_1_4.jar <clq file> <mutation probability> <cross-over probability>
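The .clq inputs above come from the DIMACS clique benchmarks, whose text format uses "c" comment lines, a "p edge <vertices> <edges>" problem line and "e <u> <v>" edge lines with 1-based vertex numbers. The hypothetical reader below builds an adjacency matrix from that format; it only illustrates the input format and is not the parser inside ecal_1_4.jar.

```java
/** Hypothetical reader for DIMACS-style .clq text. */
final class ClqReader {
    /** Returns a symmetric adjacency matrix with 0-based vertex indices. */
    static boolean[][] parse(String text) {
        boolean[][] adj = null;
        for (String line : text.split("\\R")) {
            String[] tok = line.trim().split("\\s+");
            if (tok[0].equals("p")) {
                int n = Integer.parseInt(tok[2]);     // "p edge N M" (some files use "p col")
                adj = new boolean[n][n];
            } else if (tok[0].equals("e")) {
                if (adj == null) throw new IllegalArgumentException("edge before 'p' line");
                int u = Integer.parseInt(tok[1]) - 1; // convert to 0-based
                int v = Integer.parseInt(tok[2]) - 1;
                adj[u][v] = true;
                adj[v][u] = true;                     // undirected graph
            }
            // "c" comment lines and blank lines are ignored
        }
        if (adj == null) throw new IllegalArgumentException("missing 'p' line");
        return adj;
    }
}
```

On Java 11 or later, a file could be read with ClqReader.parse(java.nio.file.Files.readString(java.nio.file.Path.of("hamming6-2.clq"))).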