Genetic algorithm
Definition
• The genetic algorithm is a probabilistic search algorithm that iteratively transforms a set (called a population) of mathematical objects (typically fixed-length binary character strings), each with an associated fitness value, into a new population of offspring objects, using the Darwinian principle of natural selection and operations patterned after naturally occurring genetic operations, such as crossover (sexual recombination) and mutation.
Genetic Algorithms - History
• Pioneered by John Holland in the 1970s
• Became popular in the late 1980s
• Based on ideas from Darwinian evolution
• Can be used to solve a variety of problems that are not easy to solve using other techniques
Finding a solution to a problem is often thought of as search
• In computer science: a process of search through the space of possible solutions. Partial solutions are viewed as points in the search space.
• In engineering and mathematics: the problems are first formulated as mathematical models, and the parameters that give the best solution are sought.
Why genetic algorithms?
• Genetic algorithms can be used to solve problems that are not well suited for standard optimization algorithms, including problems in which the objective function is discontinuous, non-differentiable, stochastic, or highly nonlinear.
Classical derivative-based optimization vs. genetic algorithm

Classical derivative-based optimization:
• Generates a single point at each iteration. The sequence of points approaches an optimal solution.
• Selects the next point in the sequence by a deterministic computation.

Genetic algorithm:
• Generates a population of points at each iteration. The best point in the population approaches an optimal solution.
• Selects the next population by computation that uses random number generators.
Optimization
• Optimization: the process of finding an optimal solution (maximum/minimum) satisfying the constraints
• It focuses on 3 factors:
• 1) Objective function: the function to be maximized or minimized (example: maximize the profit and minimize the cost in the case of manufacturing)
• 2) A set of unknowns or variables (the amount of resources used, time spent, etc.)
• 3) A set of constraints (availability of space, money, etc.)
Our main concerns here are:
1) How to describe the process of search
2) How to implement and carry out the search
3) What elements are required to carry out the search
Genetic Algorithm
Basic genetics
• All living organisms consist of cells
• Each cell of a living thing contains chromosomes - strings of DNA
• Each chromosome contains a set of genes - blocks of DNA
• Each gene determines some aspect of the organism (like eye colour)
• A collection of genes is sometimes called a genotype
• A collection of aspects (like eye colour) is sometimes called a phenotype
Basic genetics
General scheme of Evolutionary
process
Terminology
Working principle
Outline of genetic algorithm
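To make the outline concrete, here is a minimal sketch of the generic GA loop in Python. It is illustrative only: the population size, operator rates, and fitness function are assumptions, not taken from the slides.

```python
import random

def genetic_algorithm(fitness, pop_size=20, n_bits=10, n_generations=50,
                      crossover_rate=0.9, mutation_rate=0.01):
    """Minimal GA loop: selection -> crossover -> mutation, repeated.

    Assumes `fitness` returns a non-negative score for a list of bits.
    """
    # Initial population: random fixed-length bit strings
    population = [[random.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    for _ in range(n_generations):
        scores = [fitness(ind) for ind in population]
        new_population = []
        while len(new_population) < pop_size:
            # Selection: fitness-proportionate (roulette wheel)
            p1, p2 = random.choices(population, weights=scores, k=2)
            c1, c2 = p1[:], p2[:]
            # One-point crossover
            if random.random() < crossover_rate:
                point = random.randint(1, n_bits - 1)
                c1 = p1[:point] + p2[point:]
                c2 = p2[:point] + p1[point:]
            # Bit-flip mutation
            for child in (c1, c2):
                for i in range(n_bits):
                    if random.random() < mutation_rate:
                        child[i] = 1 - child[i]
            new_population += [c1, c2]
        population = new_population[:pop_size]
    return max(population, key=fitness)
```

For the oil example that follows, the fitness function would decode the 10-bit string to a road position and return the oil output at that position.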
Silly Example - Drilling for Oil
• Imagine you had to drill for oil somewhere along a single 1 km desert road
• Problem: choose the place on the road that produces the most oil per day
• We could represent each solution as a position on the road, say, a whole number in the range [0..1000]
Where to drill for oil?
[Figure: a road from 0 to 1000 (midpoint 500), with Solution1 = 300 and Solution2 = 900 marked as candidate positions.]
Digging for Oil
• The set of all possible solutions [0..1000] is called the search space or state space
• In this case it is just one number, but it could be many numbers or symbols
• GAs often encode numbers in binary, producing a bit string that represents a solution
• In our example we choose 10 bits, which is enough to represent 0..1000
Convert to binary string

       512  256  128   64   32   16    8    4    2    1
 900     1    1    1    0    0    0    0    1    0    0
 300     0    1    0    0    1    0    1    1    0    0
1023     1    1    1    1    1    1    1    1    1    1
In GAs these encoded strings are sometimes called “genotypes” or “chromosomes”, and the individual bits are sometimes called “genes”.
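In Python, this encoding and decoding can be sketched as follows (10 bits, as chosen above):

```python
def encode(x, n_bits=10):
    """Integer -> fixed-length bit string, most significant bit first."""
    return format(x, f'0{n_bits}b')

def decode(bits):
    """Bit string -> integer."""
    return int(bits, 2)

print(encode(900))           # '1110000100'
print(encode(300))           # '0100101100'
print(decode('1111111111'))  # 1023
```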
Drilling for Oil
Solution1 = 300 (0100101100)
Solution2 = 900 (1110000100)
[Figure: oil output (OIL) plotted against location along the road from 0 to 1000; the axis values 30 and 5 mark the outputs of the two candidate solutions.]
Summary
We have seen how to:
• represent possible solutions as a number
• encode a number into a binary string
• generate a score for each number given a function of “how good” each solution is - this is often called a fitness function
• Our silly oil example is really optimisation over a function f(x) where we adapt the parameter x
Lecture 2
• Representation
• Selection (Reproduction)
• Crossover
• Mutation
• Problem solving using GA
Representation
• Before any algorithm is put to work on a problem, the partial solutions have to be encoded so that a computer can process them.
Chromosomes could be:
– Bit strings (0101 ... 1100)
– Real numbers (43.2 -33.1 ... 0.0 89.2)
– Permutations of elements (E11 E3 E7 ... E1 E15)
– Lists of rules (R1 R2 R3 ... R22 R23)
– Program elements (genetic programming)
– ... any data structure ...
Binary encoding
• Binary representation: here, encoding is done using a sequence of 1's and 0's.
Example: Decoding a value
• For a string of length n_i, the accuracy in the variable approximation is (X_i^U - X_i^L) / 2^{n_i}, where X_i^L and X_i^U are the lower and upper bounds on variable X_i.
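As an illustration, a common way to decode an n-bit string into a real value in [X^L, X^U] divides the interval into equal steps. This sketch uses 2^n - 1 steps so that both endpoints are reachable; the exact mapping is an assumption, since the slide gives only the accuracy formula.

```python
def decode_real(bits, x_low, x_high):
    """Map an n-bit string onto [x_low, x_high] in 2**n - 1 equal steps."""
    n = len(bits)
    step = (x_high - x_low) / (2 ** n - 1)  # resolution of the mapping
    return x_low + int(bits, 2) * step

# Example: 10 bits over [0.0, 1.0]; '1110000100' (decimal 900) maps to ~0.88
print(decode_real('1110000100', 0.0, 1.0))
```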
Permutation encoding
Tree encoding
Genetic operators
• Selection (Reproduction)
• Crossover (Recombination)
• Mutation
Selection
The fitness value F_i is calculated for each chromosome.
The probability of selecting the i-th chromosome is

P_i = F_i / Σ_{j=1..pop_size} F_j

The cumulative probability is

q_i = Σ_{j=1..i} P_j

• Generate a random number r from the range [0, 1]
• If r < q_1, select the first chromosome; otherwise select the i-th chromosome (2 ≤ i ≤ pop_size) such that q_{i-1} < r ≤ q_i
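A sketch of this procedure in Python (illustrative; it assumes non-negative fitness values):

```python
import random

def roulette_select(population, fitnesses):
    """Select one chromosome with probability proportional to its fitness."""
    total = sum(fitnesses)
    probs = [f / total for f in fitnesses]   # P_i
    cumulative = []                          # q_i
    running = 0.0
    for p in probs:
        running += p
        cumulative.append(running)
    r = random.random()                      # r in [0, 1)
    for chromosome, q in zip(population, cumulative):
        if r <= q:                           # first i with r <= q_i
            return chromosome
    return population[-1]                    # guard against rounding error
```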
Different methods
• Roulette wheel selection
• Rank selection
• Boltzmann selection
• Tournament selection
Example: Roulette-wheel selection
In roulette-wheel selection, individuals are given a probability of being selected that is directly proportional to their fitness.

No | String | Fitness | Probability p_i | Expected count (n x p_i) | Cumulative frequency q_i | Random number in [0, 1] | Selected string no. | Count in mating pool
1  | 0000   | 1    | .0429 | 0.33  | 0.0429 | 0.259 | 3 | 1
2  | 0010   | 2.1  | .090  | 0.72  | 0.1326 | 0.038 | 1 | 1
3  | 0001   | 3.11 | .1336 | 1.064 | 0.266  | 0.486 | 5 | 1
4  | 0010   | 4.01 | .1723 | 1.368 | 0.438  | 0.428 | 4 | 2
5  | 0110   | 4.66 | .2    | 1.6   | 0.638  | 0.095 | 2 | 2
6  | 1110   | 1.91 | .082  | 0.656 | 0.720  | 0.3   | 4 | 0
7  | 1100   | 1.93 | .0829 | 0.664 | 0.809  | 0.616 | 5 | 0
8  | 0111   | 4.55 | .1955 | 1.56  | 1      | 0.897 | 8 | 1
Problem
• Find the expected number of copies of the best string for a maximization problem using 1) roulette-wheel selection 2) tournament selection

String | Fitness
01101  | 5
11000  | 2
10110  | 1
00111  | 10
10101  | 3
00010  | 100
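As a hint for part 1: under roulette-wheel selection, the expected count of string i in a mating pool of size n is n x F_i / ΣF (the tournament-selection part depends on the tournament size). A quick check for the best string:

```python
fitnesses = [5, 2, 1, 10, 3, 100]
n = len(fitnesses)
expected_best = n * max(fitnesses) / sum(fitnesses)
print(round(expected_best, 2))  # 6 * 100 / 121 is about 4.96
```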
Boltzmann Selection
Crossover
One-point crossover
Two-point crossover
Example (two-point crossover; the middle segments of the parents are exchanged):
Offspring 1: 11011 1100001 0110
Offspring 2: 11011 0010011 1110
Uniform crossover
Arithmetic crossover
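Illustrative Python sketches of one-point and two-point crossover on bit-string chromosomes (lists of bits); these are the standard formulations, not code from the slides:

```python
import random

def one_point_crossover(p1, p2):
    """Swap the tails of two parents after a single random cut point."""
    point = random.randint(1, len(p1) - 1)
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def two_point_crossover(p1, p2):
    """Swap the segment between two random cut points."""
    a, b = sorted(random.sample(range(1, len(p1)), 2))
    return (p1[:a] + p2[a:b] + p1[b:],
            p2[:a] + p1[a:b] + p2[b:])
```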
Mutation
• Mutation is a genetic operator used to maintain genetic diversity from one generation of a population of chromosomes to the next.
• Various mutation operators are: boundary, uniform, and non-uniform.
Uniform Mutation
A gene (real number) is selected at a randomly chosen position and replaced by a random value within a specific range. For a chromosome X^t = [X_1, X_2, ..., X_m], a random position k ∈ {1, ..., m} is selected, and an offspring X^{t+1} = [X_1, ..., X'_k, ..., X_m] is produced, where X'_k is a random value generated according to a uniform probability distribution over the range [X_k^L, X_k^U]. Here X_k^L and X_k^U are the lower and upper bounds on variable X_k.
Boundary Mutation
The replacement of X'_k by either X_k^L or X_k^U, each with equal probability, is known as boundary mutation.
Non-uniform Mutation
Here X'_k is selected as

X'_k = X_k + Δ(t, X_k^U - X_k)   if the random digit is 0
X'_k = X_k - Δ(t, X_k - X_k^L)   if the random digit is 1

where Δ(t, y) returns a value in the range [0, y] such that the probability of Δ(t, y) being close to 0 increases as t increases (the mutation step therefore shrinks in later generations).
Mutation can be implemented using 1) the one's complement operator, 2) logical bitwise operators, 3) shift operators, and 4) masking operators.
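Illustrative Python sketches of bit-flip mutation for binary chromosomes and of uniform mutation for real-valued ones (the per-gene bound lists "lower" and "upper" are assumed inputs):

```python
import random

def bit_flip_mutation(chromosome, rate=0.01):
    """Flip each bit independently with a small probability."""
    return [1 - g if random.random() < rate else g for g in chromosome]

def uniform_mutation(x, lower, upper):
    """Replace one randomly chosen real gene with a uniform value in its bounds."""
    k = random.randrange(len(x))
    x = x[:]                                  # work on a copy
    x[k] = random.uniform(lower[k], upper[k])
    return x
```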
Problem
Support Vector Machine
• One of the most well-studied and widely used learning algorithms for binary classification
• Extensions of SVMs exist for a variety of other learning problems, including regression, multiclass classification, ordinal regression, ranking, structured prediction, and many others
• Similar to perceptrons, they aim to find a hyperplane that linearly separates data points belonging to different classes
• In addition, SVMs aim to find the hyperplane that is least likely to overfit the training data
Separating hyperplanes
• Which one is better: B1 or B2? Why?
• Many other separating hyperplanes are possible
• Each instance in X is an n-dimensional real vector, i.e. X ⊆ R^n
• Given a sample of m labeled examples, classification is done using a classifier of the form h(x) = sign(w·x + b) for some w ∈ R^n, b ∈ R
• Thus for X ⊆ R^n, the basic SVM algorithm selects a classifier from the class of linear classifiers over X
Learning linear SVM
• It is convenient to represent the classes by +1 and -1:
y = +1 if w·x + b > 0
y = -1 if w·x + b < 0
• w can be rescaled such that for all points x lying on the respective margin boundaries it holds that w·x + b = 1 or w·x + b = -1
• These points are called the support vectors
• The task of learning a linear SVM consists of estimating the parameters w and b
• The first criterion is that all points in the training data must be classified correctly:
w·x_i + b ≥ 1 if y_i = 1
w·x_i + b ≤ -1 if y_i = -1
• This can be re-written as:
y_i(w·x_i + b) ≥ 1 for 1 ≤ i ≤ N
Linearly separable case - hard-margin SVM
• Although both classifiers separate the data, the distance (margin) with which the separation is achieved is different.
• The SVM algorithm selects the classifier with the maximum margin.
• The margin on (x_i, y_i) is simply a signed version of the distance from x_i to the hyperplane, with a positive sign if the example is classified correctly and negative otherwise.
• The margin of the classifier given by (w, b) on a sample S is then defined as the minimal margin on S. With the rescaling above, the margin of such a classifier on S becomes simply 2/||w||, the distance between the two boundary hyperplanes w·x + b = ±1.
• Thus maximizing the margin becomes equivalent to minimizing the norm ||w|| subject to the constraints y_i(w·x_i + b) ≥ 1, i.e. the requirement that all points in the training data be classified correctly. This can be written as the optimization problem: minimize (1/2)||w||^2 subject to y_i(w·x_i + b) ≥ 1 for all i.
• This problem can be solved using Lagrange multipliers.
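For concreteness, here is a minimal sketch of fitting a near hard-margin linear SVM, assuming scikit-learn is available; the toy data are made up for illustration, and the very large C value approximates the hard-margin problem:

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable sample (assumed for illustration)
X = np.array([[2.0, 2.0], [3.0, 3.0], [1.0, 1.5],        # class +1
              [-1.0, -1.0], [-2.0, -1.5], [0.0, -2.0]])  # class -1
y = np.array([1, 1, 1, -1, -1, -1])

# A very large C heavily penalizes margin violations, approximating hard margin
clf = SVC(kernel='linear', C=1e10).fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
print("w =", w, "b =", b)
print("support vectors:", clf.support_vectors_)  # points with w.x + b = +/-1
print("margin width 2/||w|| =", 2 / np.linalg.norm(w))
```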