EuroGP 2006
Geometric Crossover for
Biological Sequences
Alberto Moraglio, Riccardo Poli
& Rolv Seehuus
Contents
I. Geometric Crossover
II. Geometric Crossover for Sequences
III. Is Biological Recombination Geometric?
I. Geometric Crossover
Geometric Crossover
• Representation-independent generalization
of traditional crossover
• Informally: all offspring are between parents
• Search space: all offspring are on shortest
paths connecting parents
Geometric Crossover & Distance
• Search Space is a Metric Space: d(A,B)
=length of shortest paths between A and B
• Metric space: all offspring C are in the
segment between parents
• C ∈ [A,B]_d ⇔ d(A,C) + d(C,B) = d(A,B)
Example1: Traditional Crossover
• Traditional Crossover is Geometric
Crossover under Hamming Distance
Parent1: 011|101
Parent2: 010|111
Child:   011|111
HD(P1,C) + HD(C,P2) = HD(P1,P2)
    1    +    1     =    2
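The check on this slide can be reproduced with a few lines of Python. This is a minimal sketch, not code from the talk: the helper names `hamming`, `one_point_crossover` and `in_segment` are made up for illustration, and the cut point 3 is read off the `|` marker in the example.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def one_point_crossover(p1: str, p2: str, cut: int) -> str:
    """Head of p1 up to the cut point, tail of p2 after it."""
    return p1[:cut] + p2[cut:]


def in_segment(a, b, c, dist) -> bool:
    """True if c lies in the metric segment [a, b] under dist."""
    return dist(a, c) + dist(c, b) == dist(a, b)


p1, p2 = "011101", "010111"
child = one_point_crossover(p1, p2, 3)   # "011" + "111" -> "011111"

# Every cut point gives a child in the Hamming segment between the parents.
all_cuts_ok = all(
    in_segment(p1, p2, one_point_crossover(p1, p2, k), hamming)
    for k in range(len(p1) + 1)
)
```

The same `in_segment` test works for any distance function, which is the sense in which the definition is representation-independent.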
Example2: Blending Crossover
• Blending Crossover for real vectors is
geometric under Euclidean Distance
[Figure: child C on the straight line segment between parents P1 and P2]
ED(P1,C)+ED(C,P2)=ED(P1,P2)
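A small numerical sketch of the same identity, assuming "blending" means a convex combination of the parents with a weight t in [0, 1] (the slide does not specify the operator in more detail). The function names and the sample vectors are illustrative only.

```python
import math


def euclidean(u, v) -> float:
    """Euclidean distance between two real vectors of equal length."""
    return math.dist(u, v)


def blend(p1, p2, t: float):
    """Convex combination (1 - t) * p1 + t * p2, assumed here with 0 <= t <= 1."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1] for the child to stay on the segment")
    return [(1.0 - t) * x + t * y for x, y in zip(p1, p2)]


p1, p2 = [0.0, 0.0], [3.0, 4.0]
c = blend(p1, p2, 0.25)                      # [0.75, 1.0]

lhs = euclidean(p1, c) + euclidean(c, p2)    # 1.25 + 3.75
rhs = euclidean(p1, p2)                      # 5.0
on_segment = math.isclose(lhs, rhs)
```

With floating-point vectors the equality is tested with a tolerance (`math.isclose`) rather than exactly.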
Many Recombinations are Geometric
• Traditional Crossover for multary strings
• Box and Discrete recombinations for real
vectors
• PMX, Cycle and Order Crossovers for
permutations
• Homologous Crossover for GP trees
• Ask me for more examples over a coffee!
Being a geometric crossover is
important because…
• We know how the search space is going to
be searched by geometric crossover for
any representation: convex search
• We know a rule-of-thumb on what type of
landscapes geometric crossover will
perform well: “smooth” landscape
• This is just a beginning of general theory,
in the future we will know more!
II. Geometric Crossover for
Sequences
Sequences & Edit Distance
• Sequence: variable-length string of characters
from an alphabet A
• Edit distance: minimum number of edit
operations – insertion, deletion, substitution – to
transform one sequence into the other
• A = {a,c,t,g}, seq1 = agcacaca, seq2 = acacacta
• Seq1=agcacaca → acacaca → acacacta=Seq2
• ED(Seq1,Seq2)=2 (g deleted, t inserted)
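The edit distance defined above can be computed with the standard dynamic-programming recurrence. The sketch below uses unit costs for insertion, deletion and substitution; the function name `edit_distance` is chosen for illustration and is not taken from the talk.

```python
def edit_distance(s: str, t: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning s into t."""
    prev = list(range(len(t) + 1))          # distances from the empty prefix of s
    for i, a in enumerate(s, start=1):
        curr = [i]                          # turning s[:i] into "" takes i deletions
        for j, b in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,                # delete a
                curr[j - 1] + 1,            # insert b
                prev[j - 1] + (a != b),     # substitute a by b, or match for free
            ))
        prev = curr
    return prev[-1]


seq1, seq2 = "agcacaca", "acacacta"
d = edit_distance(seq1, seq2)               # 2: delete g, insert t
```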
Sequence Alignment (on contents)
• Alignment: insert gaps (-) into both sequences
so that they have the same length
Seq1’= agcacac-a
Seq2’= a-cacacta
• Alignment Score: number of mismatches = 2
• Optimal alignment: minimal score alignment
(Best Inexact Alignment on Contents)
• The score of the optimal alignment of two
sequences equals their edit distance:
ED(Seq1,Seq2)=Score(A)=2
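One way to obtain an optimal alignment is to fill the edit-distance table and trace back through it. The sketch below assumes unit costs; `align` and `score` are illustrative names. When several optimal alignments exist, the traceback returns one of them, which may differ from the one shown on the slide while having the same score.

```python
def align(s: str, t: str, gap: str = "-"):
    """Return one optimal alignment (s', t') of s and t under unit edit costs."""
    n, m = len(s), len(t)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i
    for j in range(m + 1):
        D[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i - 1][j] + 1,
                          D[i][j - 1] + 1,
                          D[i - 1][j - 1] + (s[i - 1] != t[j - 1]))

    # Trace back from the bottom-right corner, preferring match/substitution.
    a, b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + (s[i - 1] != t[j - 1]):
            a.append(s[i - 1]); b.append(t[j - 1]); i -= 1; j -= 1
        elif i > 0 and D[i][j] == D[i - 1][j] + 1:
            a.append(s[i - 1]); b.append(gap); i -= 1      # gap in t'
        else:
            a.append(gap); b.append(t[j - 1]); j -= 1      # gap in s'
    return "".join(reversed(a)), "".join(reversed(b))


def score(a: str, b: str) -> int:
    """Number of mismatching columns; a gap against a character is a mismatch."""
    return sum(x != y for x, y in zip(a, b))


s_aligned, t_aligned = align("agcacaca", "acacacta")
best = score(s_aligned, t_aligned)   # equals the edit distance of the two sequences
```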
Homologous Crossover
1. Align optimally two parent sequences
2. Generate randomly a crossover mask as long
as the alignment
3. Recombine as traditional crossover
4. Remove dashes from offspring
Mask  = 111111000
Seq1’ = agcacac-a
Seq2’ = a-cacacta
SeqC’ = a-cacac-a
SeqC  = acacaca
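Steps 2 to 4 can be sketched as follows, taking the already aligned parents from the slide as input. The names `recombine_aligned` and `random_mask` are illustrative. The mask convention (bit 1 takes the symbol from Seq2’, bit 0 from Seq1’) is inferred from the example above, since it is the reading that reproduces SeqC’; the talk does not state it explicitly.

```python
import random

GAP = "-"


def random_mask(length: int, rng=random) -> str:
    """Step 2: a uniformly random binary mask as long as the alignment."""
    return "".join(rng.choice("01") for _ in range(length))


def recombine_aligned(a1: str, a2: str, mask: str):
    """Steps 3 and 4: mask-based crossover on aligned parents, then drop gaps.

    Assumed convention: mask bit '1' copies from a2, '0' copies from a1.
    Returns (aligned child, child with gaps removed).
    """
    if not (len(a1) == len(a2) == len(mask)):
        raise ValueError("aligned parents and mask must have the same length")
    child_aligned = "".join(
        y if bit == "1" else x for x, y, bit in zip(a1, a2, mask)
    )
    return child_aligned, child_aligned.replace(GAP, "")


seq1_aligned, seq2_aligned = "agcacac-a", "a-cacacta"
child_aligned, child = recombine_aligned(seq1_aligned, seq2_aligned, "111111000")
# child_aligned == "a-cacac-a", child == "acacaca"
```

Calling `recombine_aligned(seq1_aligned, seq2_aligned, random_mask(len(seq1_aligned)))` would give a randomly recombined offspring.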
Theorem: Geometricity of HC
• Homologous Crossover is geometric crossover
under edit distance
Seq1=agcacaca → SeqC=acacaca → acacacta=Seq2
ED(Seq1,SeqC) + ED(SeqC,Seq2) = ED(Seq1,Seq2)
      1       +       1       =      2
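The theorem can be checked by brute force on this small example: enumerate every crossover mask over the slide's alignment and test the segment condition for each distinct offspring. This is only an empirical check on one pair of parents, not a proof, and the helper names are illustrative.

```python
from itertools import product


def edit_distance(s: str, t: str) -> int:
    """Unit-cost edit distance (insertion, deletion, substitution)."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        curr = [i]
        for j, b in enumerate(t, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (a != b)))
        prev = curr
    return prev[-1]


def offspring(a1: str, a2: str, mask) -> str:
    """Mask-based crossover on aligned parents followed by gap removal."""
    mixed = "".join(y if bit else x for x, y, bit in zip(a1, a2, mask))
    return mixed.replace("-", "")


a1, a2 = "agcacac-a", "a-cacacta"            # optimal alignment from the slides
seq1, seq2 = a1.replace("-", ""), a2.replace("-", "")
total = edit_distance(seq1, seq2)

# All 2**9 masks collapse to a handful of distinct children.
children = {offspring(a1, a2, mask) for mask in product((0, 1), repeat=len(a1))}

all_in_segment = all(
    edit_distance(seq1, c) + edit_distance(c, seq2) == total for c in children
)
```

Only the two columns where the aligned parents differ matter, so the 512 masks yield four distinct offspring, including the parents themselves.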
More theory on HC in the paper
• Extension to weighted edit distances
• Extension to block ins/del edit distances
• Peculiarity of metric segments in edit
distance spaces
• Bounds on offspring size due to parents
size
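As a rough illustration of the first extension, the unit-cost recurrence generalises directly to weighted edit moves. The sketch below is not the formulation from the paper: the parameter names (`indel`, `sub`) and the example weights are assumptions made for illustration. For the result to behave as a distance, the weights would presumably need to be symmetric and satisfy the triangle inequality.

```python
def weighted_edit_distance(s, t, indel=1.0, sub=None):
    """Edit distance with a cost `indel` per inserted or deleted symbol and a
    substitution cost given by `sub(a, b)` (hypothetical parameters)."""
    if sub is None:
        sub = lambda a, b: 0.0 if a == b else 1.0   # unit substitution cost
    prev = [j * indel for j in range(len(t) + 1)]
    for i, a in enumerate(s, start=1):
        curr = [i * indel]
        for j, b in enumerate(t, start=1):
            curr.append(min(
                prev[j] + indel,            # delete a
                curr[j - 1] + indel,        # insert b
                prev[j - 1] + sub(a, b),    # substitute a by b (0 on a match)
            ))
        prev = curr
    return prev[-1]


seq1, seq2 = "agcacaca", "acacacta"
unit = weighted_edit_distance(seq1, seq2)               # same as the plain edit distance
costly_gaps = weighted_edit_distance(seq1, seq2, indel=2.0)
```

With gaps twice as expensive, the best transformation is still one deletion plus one insertion (cost 4), since substitutions alone would need six moves.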
III. Is Biological
Recombination Geometric?
Recombination at a molecular level
• DNA strands align on the contents, not
positionally
• DNA strands are flexible: they can be stretched
or folded to align better with each other
• DNA strands do not need to be aligned at the
extremities
• Some pair matchings are preferred to others
• DNA strands can form loops
• Crossover points happen to be where DNA
strands align better
• Not all details worked out yet!
Homologous Crossover as
a Model of Biological Recombination
Homologous Crossover ↔ Biological Recombination
• Alignment on contents @ minimum distance ↔ Alignment on contents @ minimum free energy
• Ins/del move ↔ Frame-shift (one base gap)
• Replacement move ↔ Base mismatch
• Weighted move ↔ Allows specifying preferred matching (a-t preferred to a-g)
• Block ins/del move ↔ Allows specifying preference for loops, folds, bigger gaps
• Transpositions/reversals ↔ Subsequence transp./reversal
Many possible variants of edit distance that fit many
real requirements of biological recombination
“Minimum Free Energy” & Edit Distance
DNA strands align optimally according to edit
distance because:
(i) The alignment of two DNA strands
(macromolecules) obeys chemistry: it is the state
at “minimum free energy”
(ii) The weights of the edit moves can be
interpreted as repulsion forces at the single-base
level
(iii) The best alignment on edit distance is the best
trade-off for which the global effect of repulsion
forces is minimized: the “minimum free energy”
alignment
Is Biological Recombination
Geometric? Yes?!
So what?
Bridging Natural and Artificial Evolution
• Bridging Natural and Artificial Evolution
into a common theoretical framework
• Change in perspective: this allows us to study
real biological evolution as a
computational process
• In the paper: we use geometric arguments
to claim that biological evolution does
efficient adaptation!
Summary
• Geometric crossover
– Geometric crossover: offspring between parents
– Many recombinations are geometric
– Some general theory for geometric crossover
• Homologous crossover
– Homologous crossover for sequences: alignment on contents before
recombination
– Homologous crossover is geometric under edit distance
• Biological Recombination
– Homologous crossover models biological recombination at DNA
level, so it is geometric
– Geometric theory applies to biological recombination, bridging
biological & artificial evolution
Questions?