EuroGP 2006
Geometric Crossover for Biological Sequences
Alberto Moraglio, Riccardo Poli & Rolv Seehuus

Contents
I. Geometric Crossover
II. Geometric Crossover for Sequences
III. Is Biological Recombination Geometric?

I. Geometric Crossover

Geometric Crossover
• Representation-independent generalization of traditional crossover
• Informally: all offspring are between parents
• Search space: all offspring are on shortest paths connecting parents

Geometric Crossover & Distance
• Search space is a metric space: d(A,B) = length of the shortest path between A and B
• Metric space: all offspring C are in the segment between parents
• C ∈ [A,B]_d ⇔ d(A,C) + d(C,B) = d(A,B)

Example 1: Traditional Crossover
• Traditional crossover is geometric crossover under Hamming distance
Parent1: 011|101
Parent2: 010|111
Child:   011|111
HD(P1,C) + HD(C,P2) = HD(P1,P2)
1 + 1 = 2

Example 2: Blending Crossover
• Blending crossover for real vectors is geometric under Euclidean distance
[Figure: child C on the segment joining P1 and P2]
ED(P1,C) + ED(C,P2) = ED(P1,P2)

Many Recombinations are Geometric
• Traditional crossover for multary strings
• Box and discrete recombinations for real vectors
• PMX, cycle and order crossovers for permutations
• Homologous crossover for GP trees
• Ask me for more examples over a coffee!

Being a geometric crossover is important because…
• We know how the search space is going to be searched by geometric crossover for any representation: convex search
• We have a rule of thumb for the type of landscape on which geometric crossover performs well: “smooth” landscapes
• This is just the beginning of a general theory; in the future we will know more!

II. Geometric Crossover for Sequences

Sequences & Edit Distance
• Sequence: variable-length string of characters from an alphabet A
• Edit distance: minimum number of edit operations (insertion, deletion, substitution) needed to transform one sequence into the other
• A = {a,c,t,g}, seq1 = agcacaca, seq2 = acacacta
• Seq1 = agcacaca → acacaca → acacacta = Seq2
• ED(Seq1,Seq2) = 2 (g deleted, t inserted)

Sequence Alignment (on contents)
• Alignment: put spaces (-) in both sequences so that they become the same length
Seq1’ = agcacac-a
Seq2’ = a-cacacta
• Alignment score: number of mismatches = 2
• Optimal alignment: minimal-score alignment (Best Inexact Alignment on Contents)
• The score of the optimal alignment A of two sequences equals their edit distance: ED(Seq1,Seq2) = Score(A) = 2

Homologous Crossover
1. Optimally align the two parent sequences
2. Randomly generate a crossover mask as long as the alignment
3. Recombine as in traditional crossover
4. Remove dashes from the offspring
Mask  = 111111000
Seq1’ = agcacac-a
Seq2’ = a-cacacta
SeqC’ = a-cacac-a
SeqC  = acacaca

Theorem: Geometricity of HC
• Homologous crossover is geometric crossover under edit distance
Seq1 = agcacaca → SeqC = acacaca → acacacta = Seq2
ED(Seq1,SeqC) + ED(SeqC,Seq2) = ED(Seq1,Seq2)
1 + 1 = 2

More theory on HC in the paper
• Extension to weighted edit distances
• Extension to block ins/del edit distances
• Peculiarity of metric segments in edit distance spaces
• Bounds on offspring size due to parents’ size

III. Is Biological Recombination Geometric?

Recombination at a molecular level
• DNA strands align on contents, not positionally
• DNA strands are flexible and can be stretched or folded to align better with each other
• DNA strands do not need to be aligned at the extremities
• Some pair matchings are preferred to others
• DNA strands can form loops
• Crossover points tend to occur where DNA strands align better
• Not all details worked out yet!
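
The four homologous crossover steps from Part II (align, mask, recombine, remove dashes) can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: unit-cost edit operations, a textbook dynamic-programming alignment that returns one of possibly several optimal alignments, and a mask convention (0 takes the symbol from the first parent, 1 from the second) chosen because it reproduces the example on the Homologous Crossover slide. All function names are illustrative.

```python
import random

GAP = "-"


def edit_distance_table(a, b):
    """t[i][j] = unit-cost edit distance between a[:i] and b[:j]."""
    t = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        t[i][0] = i  # delete all of a[:i]
    for j in range(len(b) + 1):
        t[0][j] = j  # insert all of b[:j]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            t[i][j] = min(t[i - 1][j] + 1,        # deletion
                          t[i][j - 1] + 1,        # insertion
                          t[i - 1][j - 1] + sub)  # match / substitution
    return t


def edit_distance(a, b):
    return edit_distance_table(a, b)[len(a)][len(b)]


def align(a, b):
    """Step 1: trace back through the table to get one optimal alignment."""
    t = edit_distance_table(a, b)
    i, j = len(a), len(b)
    ra, rb = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and t[i][j] == t[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            ra.append(a[i - 1]); rb.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and t[i][j] == t[i - 1][j] + 1:
            ra.append(a[i - 1]); rb.append(GAP); i -= 1
        else:
            ra.append(GAP); rb.append(b[j - 1]); j -= 1
    return "".join(reversed(ra)), "".join(reversed(rb))


def recombine(aligned1, aligned2, mask):
    """Steps 3-4: pick one symbol per column, then remove dashes.

    Assumed convention: mask bit '0' -> parent 1, '1' -> parent 2.
    """
    child = [y if bit == "1" else x
             for x, y, bit in zip(aligned1, aligned2, mask)]
    return "".join(child).replace(GAP, "")


def homologous_crossover(p1, p2, rng=random):
    a1, a2 = align(p1, p2)
    mask = "".join(rng.choice("01") for _ in a1)  # step 2: random mask
    return recombine(a1, a2, mask)


seq1, seq2 = "agcacaca", "acacacta"
print(edit_distance(seq1, seq2))                        # 2
print(recombine("agcacac-a", "a-cacacta", "111111000"))  # acacaca

# Empirical check of the geometricity theorem on random offspring.
rng = random.Random(0)
for _ in range(100):
    c = homologous_crossover(seq1, seq2, rng)
    assert edit_distance(seq1, c) + edit_distance(c, seq2) == edit_distance(seq1, seq2)
```

The final loop only tests the theorem on this pair of parents; it is a sanity check, not a proof.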
Homologous Crossover as a Model of Biological Recombination

Homologous Crossover | Biological Recombination
• Alignment on contents @ minimum distance | • Alignment on contents @ minimum free energy
• Ins/del move | • Frame-shift (one base gap)
• Replacement move | • Base mismatch
• Weighted move | • Allows specifying preferred matching (a-t preferred to a-g)
• Block ins/del move | • Allows specifying preference for loops, folds, bigger gaps
• Transpositions/reversals | • Subsequence transp./reversal

Many possible variants of edit distance fit many real requirements of biological recombination

“Minimum Free Energy” & Edit Distance
DNA strands align optimally according to edit distance because:
(i) The alignment of two DNA strands (macromolecules) obeys chemistry: it is the state at “minimum free energy”
(ii) The weights of the edit moves can be interpreted as repulsion forces at the single-base level
(iii) The best alignment under edit distance is the best trade-off, in which the global effect of the repulsion forces is minimized: the “minimum free energy” alignment

Is Biological Recombination Geometric?
Yes?!
So what?

Bridging Natural and Artificial Evolution
• Bridging natural and artificial evolution into a common theoretical framework
• Change in perspective: this allows real biological evolution to be studied as a computational process
• In the paper: we use geometric arguments to claim that biological evolution does efficient adaptation!

Summary
• Geometric crossover
– Geometric crossover: offspring between parents
– Many recombinations are geometric
– Some general theory for geometric crossover
• Homologous crossover
– Homologous crossover for sequences: alignment on contents before recombination
– Homologous crossover is geometric under edit distance
• Biological recombination
– Homologous crossover models biological recombination at DNA level, so it is geometric
– Geometric theory applies to biological recombination, bridging biological & artificial evolution

Questions?
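
As a supplement to the summary's first point (offspring lie between parents), the segment condition d(A,C) + d(C,B) = d(A,B) can be checked exhaustively for mask-based crossover on the binary strings of Example 1. This is a minimal sketch; the helper names are illustrative and do not come from the talk.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def mask_crossover(p1, p2, mask):
    """Take each position from p1 (mask bit 0) or p2 (mask bit 1)."""
    return "".join(y if bit else x for x, y, bit in zip(p1, p2, mask))


def in_segment(dist, a, c, b):
    """True if c lies in the metric segment [a, b] under dist."""
    return dist(a, c) + dist(c, b) == dist(a, b)


p1, p2 = "011101", "010111"

# Enumerate every crossover mask, which includes all one-point crossovers.
children = {mask_crossover(p1, p2, m) for m in product((0, 1), repeat=len(p1))}

# Every offspring is in the Hamming segment between the parents.
assert all(in_segment(hamming, p1, c, p2) for c in children)

# A string that no mask can produce falls outside the segment.
assert not in_segment(hamming, p1, "111101", p2)
```

The same `in_segment` helper works with any distance function, so it can be reused with an edit distance to test sequence offspring.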