Evolving L-Systems to Capture Protein Structure Native Conformations
Gabi Escuela¹, Gabriela Ochoa² and Natalio Krasnogor³
¹,² Department of Computer Science, Universidad Simon Bolivar, Caracas, Venezuela
[email protected], [email protected]
³ School of Computer Science and I.T., University of Nottingham
[email protected]

Content
• Proteins
• Protein Structure Prediction (PSP)
• The HP model
• EA approaches to PSP: current encoding
• L-Systems
• Why a grammatical encoding?
• Methods and Results
• Discussion and Future Work
[Figure: 3D structure of myoglobin, showing coloured alpha helices]

Proteins
• Linear chains of ~30-400 units built from 20 different amino acids
• Fold into a unique functional structure: the native state, or tertiary structure
• Show repeated substructures: alpha helices and beta sheets
[Figure: 3-D structure of 1A8M]

Protein Structure Prediction (PSP)
• Goal: determine the 3D structure of a protein from its amino acid sequence
• Strategy: find the state of minimum energy of the amino acid chain
• A solution would have practical consequences in medicine, drug development and agriculture

The 2D HP Model
• The hydrophobic effect is the main force governing folding
• Two amino acid types: hydrophobic (H) and polar or hydrophilic (P)
• Given q ∈ {H, P}+, each letter of q must be placed on a distinct vertex of a given lattice L (at each point the chain turns 90º left or right, or continues ahead)
• Scoring function: adds -1 for each "contact", i.e. each pair of H residues that are adjacent in the lattice but not consecutive in q
• Objective: find the embedding of q in L with minimum score (maximum number of contacts)
[Figure: HPHPPHHPHPPHPHHPPHPH folded on the square lattice, 9 H-H contacts, score = -9]

EA approaches to PSP: Current (Direct) Encoding
• EAs and other stochastic methods: global optimization of a suitable energy function
• Encodings: Cartesian coordinates, distance geometries, internal coordinates
• Absolute: the structure is encoded as a string of symbols.
  For example, on the 2D square lattice: s ∈ {Up, Down, Left, Right}+
• Relative: each move is interpreted in terms of the previous one: s ∈ {Forward, TurnLeft, TurnRight}+
Protein: HPHPPHHPHPPHPHHPPHPH (length 20)
• Absolute encoding: RDDLULDLDLUURULURRD (length 19; the first position is fixed)
• Relative encoding: RFRRLLRLRRFRLLRRFR (length 18; the first and second positions are fixed)

L-Systems (Lindenmayer, 1968)
• A model of morphogenesis, based on formal grammars
• Rewriting: complex objects are defined by replacing parts of a simple object using a set of productions
• Symbols: F, f, +, -, [, ]
• Axiom: S = F
• Production (replacement) rules: r1: F → F+f, r2: f → F
Derivation:
  start:  F
  step 1: F+f
  step 2: F+f+F
  step 3: F+f+F+F+f

Why a Grammatical Encoding?
• It specifies how to construct the phenotype
• It can achieve greater scalability through self-similar and hierarchical structure
• Proteins exhibit a high degree of regularity and repeated motifs
• The current encoding may not be suitable for crossover and for transferring building blocks between individuals
[Figure: 3D L-system protein structure]

Method
• Proof of principle: can a folded protein be captured (encoded) by an L-system?
• How to find that L-system: an EA is used to evolve an L-system that captures a given folded protein (an inverse problem)
  Input: the folded structure in relative coordinates, RFRRLLRLRRFRLLRRFR
  EA output: an L-system that, once derived, produces the target string RFRRLLRLRRFRLLRRFR, e.g.
  Axiom = 01F, Rules = {0:RFR1, 1:2L2, 2:R0L}

Proposed Grammatical Encoding
D0L-system (deterministic and context-free):
• Alphabet: Σ = t ∪ nt
  t = {F, L, R}: terminal symbols (relative coordinates)
  nt = {0, 1, 2, ..., m-1}: non-terminal symbols (rewriting rules), where m is the maximum number of rules
• Axiom: α ∈ Σ*
• Rewriting rules: i → wi, where i ∈ nt and wi ∈ Σ*
Example: axiom R2; rules 0:R03F; 1:R01L; 2:F310; 3:LRL3

Evolutionary Algorithm
• Generational, with rank-based selection
• Randomly generated initial population
• Prefixed maximum number of rules
• Axiom and rules: randomly generated strings of prefixed maximum length
Genetic operators
• Uniform-like (homologous) recombination (rate = 1.0): complete production rules are interchanged
• Per-symbol mutation in both axioms and rules: deletion (30%), insertion (10%), modification (60%)

Derivation and Fitness Function
• Derivation: from genotype (axiom and rules) to phenotype (folded structure)
• Post-processing: pruning of non-terminal symbols
• Fitness: number of matches between the target string and the solution (min. = 0, max. = length of the desired folding)
Example genotype: Axiom = 31, Rules = {0:3LL2; 1:R0RL; 2:RRF; 3:RFR1}
  axiom:    31
  1st step: RFR1 R0RL
  2nd step: RFR R0RL R 3LL2 RL
  3rd step: RFRR 3LL2 RLR RFR1 LL RRF RL
  post-processing (phenotype): RFRRLLRLRRFRLLRRFR
  fitness = 18

Results (1)
(Length: length of the target relative encoding. Successes: successful runs out of 50, with the maximum number of rules R in parentheses.)
• Instance HPHPPHHPHPPHPHHPPHPH, target RFRRLLRLRRFRLLRRFR, length 18
  Successes: 5/50 (4 R)
  One solution: A = 31, R = {0:3LL2, 1:R0RL, 2:RRF, 3:RFR1}
• Instance HHHPPHPHPHPPHPHPHPPH, target RRFRFRLFRRFLRLRFRR, length 18
  Successes: 3/50 (4 R)
  One solution: A = R2, R = {0:RLR, 1:3F32L, 2:1FR33, 3:R102}
• Instance HHPPHPPHPPHPPHPPHPPHPPHH, target RLLFLFFRRFLLFRRLRFFRRF, length 22
  Successes: 0/50 (4 R), 1/50 (5 R)
  One solution: A = 1R, R = {0:4LF3, 1:RL243, 2:00F3, 3:RRFL, 4:0R14F}
• Instance PPHPPHHPPPPHHPPPPHHPPPPHH, target FFRRFFFLLFFFFRRFFFFLLFF, length 23
  Successes: 1/50 (5 R)
  One solution: A = 32, R = {0:20R2, 1:132F, 2:FF012, 3:0FLL}

Results (2)
[Figure: evolutionary progression towards the target structure]

Discussion
• The proposed EA discovered L-systems that capture a target folding under the HP model in 2D lattices
• We are not solving the PSP yet, but we are proposing a novel and potentially useful generative encoding for evolutionary approaches to PSP

Future Work
• Incorporate problem knowledge about secondary structures
• Explore longer chains and 3D lattices
[Figures: alpha helix, beta sheet, beta turn]
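To make the scoring function on The 2D HP Model slide and the two encodings on the EA approaches slide concrete, here is a minimal Python sketch (not the authors' implementation). It assumes residue 0 sits at the origin, y grows upwards, and absolute moves use the letters U, D, L, R; note that L means "move left" in the absolute encoding but "turn left" in the relative one. The function names `decode_absolute`, `hp_score` and `absolute_to_relative` are hypothetical.

```python
# Minimal sketch of the 2D square-lattice HP model (not the authors' code).
# Assumption: residue 0 sits at the origin and y grows upwards.

# Absolute moves as unit steps on the square lattice.
STEPS = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}

# Absolute headings in counter-clockwise order: a left turn is +1, a right turn -1 (mod 4).
HEADINGS = "RULD"


def decode_absolute(moves: str) -> list[tuple[int, int]]:
    """Place residue 0 at the origin and follow one absolute move per bond."""
    x, y = 0, 0
    coords = [(x, y)]
    for move in moves:
        dx, dy = STEPS[move]
        x, y = x + dx, y + dy
        coords.append((x, y))
    return coords


def hp_score(sequence: str, coords: list[tuple[int, int]]) -> int:
    """Add -1 for each pair of H residues that are lattice neighbours but not consecutive."""
    if len(set(coords)) != len(coords):
        raise ValueError("the conformation is not self-avoiding")
    index_at = {c: i for i, c in enumerate(coords)}
    score = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Checking only the right and upper neighbours counts each contact once.
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = index_at.get(neighbour)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                score -= 1
    return score


def absolute_to_relative(moves: str) -> str:
    """Rewrite each move after the first as F (ahead), L (turn left) or R (turn right)."""
    relative = []
    for prev, cur in zip(moves, moves[1:]):
        turn = (HEADINGS.index(cur) - HEADINGS.index(prev)) % 4
        if turn == 2:
            raise ValueError("a move cannot reverse the previous one")
        relative.append({0: "F", 1: "L", 3: "R"}[turn])
    return "".join(relative)


sequence = "HPHPPHHPHPPHPHHPPHPH"
absolute = "RDDLULDLDLUURULURRD"
print(hp_score(sequence, decode_absolute(absolute)))  # -9
print(absolute_to_relative(absolute))  # RFRRLLRLRRFRLLRRFR
```

Run on the 20-residue example, the absolute string RDDLULDLDLUURULURRD gives the score of -9 reported on the HP model slide, and converting it reproduces the relative encoding RFRRLLRLRRFRLLRRFR.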
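The derivation on the L-Systems slide rewrites every symbol of the current word in parallel at each step. A minimal Python sketch of that rewriting process follows; the function name `derive` is hypothetical, and only the string rewriting is modelled, not a graphical interpretation of F, f, +, -, [ and ].

```python
# Minimal sketch of parallel rewriting in a deterministic, context-free L-system.
# Symbols without a production (here +) are copied unchanged.


def derive(axiom: str, rules: dict[str, str], steps: int) -> list[str]:
    """Return the axiom followed by the word obtained after each rewriting step."""
    words = [axiom]
    for _ in range(steps):
        # Every symbol of the current word is replaced simultaneously.
        words.append("".join(rules.get(symbol, symbol) for symbol in words[-1]))
    return words


for step, word in enumerate(derive("F", {"F": "F+f", "f": "F"}, 3)):
    print(step, word)
# 0 F
# 1 F+f
# 2 F+f+F
# 3 F+f+F+F+f
```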
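The genotype-to-phenotype mapping on the Derivation and Fitness Function slide can be sketched as below. This is an illustrative reconstruction, not the authors' code: the names `develop` and `fitness` are hypothetical, rules are assumed to be indexed by single digits, derivation is assumed to run for three steps as in the worked example, and fitness is assumed to count position-by-position matches over the target's length. With three steps the 31 genotype yields 19 terminal symbols, one more than its 18-symbol target, so the handling of extra symbols here is a guess.

```python
# Minimal sketch of the genotype-to-phenotype mapping of the proposed D0L encoding.
# Assumptions (not stated on the slides): rules are indexed by single digits
# (so at most 10 rules), derivation runs for 3 steps as in the worked example,
# and fitness compares symbols position by position over the target's length.

TERMINALS = "FLR"


def develop(axiom: str, rules: dict[str, str], steps: int = 3) -> str:
    """Rewrite every non-terminal in parallel, then prune the remaining non-terminals."""
    word = axiom
    for _ in range(steps):
        # Terminals have no production and are copied unchanged.
        word = "".join(rules.get(symbol, symbol) for symbol in word)
    return "".join(symbol for symbol in word if symbol in TERMINALS)


def fitness(phenotype: str, target: str) -> int:
    """Number of positions where phenotype and target agree (0 to len(target))."""
    return sum(1 for a, b in zip(phenotype, target) if a == b)


# (target, axiom, rules) for the solutions listed under Results (1).
SOLUTIONS = [
    ("RFRRLLRLRRFRLLRRFR", "31", {"0": "3LL2", "1": "R0RL", "2": "RRF", "3": "RFR1"}),
    ("RRFRFRLFRRFLRLRFRR", "R2", {"0": "RLR", "1": "3F32L", "2": "1FR33", "3": "R102"}),
    ("RLLFLFFRRFLLFRRLRFFRRF", "1R",
     {"0": "4LF3", "1": "RL243", "2": "00F3", "3": "RRFL", "4": "0R14F"}),
    ("FFRRFFFLLFFFFRRFFFFLLFF", "32", {"0": "20R2", "1": "132F", "2": "FF012", "3": "0FLL"}),
]

for target, axiom, rules in SOLUTIONS:
    print(fitness(develop(axiom, rules), target), "/", len(target))
# 18 / 18
# 18 / 18
# 22 / 22
# 23 / 23
```

Under these assumptions each solution listed under Results (1) reaches the maximum fitness, the length of its target.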
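The operators listed on the Evolutionary Algorithm slide can be sketched as follows. Only the deletion/insertion/modification split (30/10/60%) and the exchange of complete rules come from the slides. The per-symbol mutation probability `p_mut`, the position of inserted symbols, the 0.5 exchange probability per rule, the linear ranking weights and all function names are hypothetical. Axioms are left out of recombination because the slides do not say how they are handled.

```python
# Minimal sketch of the EA's variation and selection operators (not the authors' code).
# Hypothetical choices, not given on the slides: the per-symbol mutation
# probability p_mut, inserting the new symbol after the mutated one, exchanging
# each rule with probability 0.5, and linear ranking weights.
import random

TERMINALS = "FLR"


def make_alphabet(m: int) -> str:
    """Terminals plus the non-terminals 0..m-1 (assumes m <= 10)."""
    return TERMINALS + "".join(str(i) for i in range(m))


def mutate(word: str, alphabet: str, p_mut: float, rng: random.Random) -> str:
    """Each symbol mutates with probability p_mut: deletion 30%, insertion 10%, modification 60%."""
    out = []
    for symbol in word:
        if rng.random() >= p_mut:
            out.append(symbol)  # symbol left untouched
            continue
        kind = rng.random()
        if kind < 0.3:
            continue  # deletion: drop the symbol
        if kind < 0.4:
            out.append(symbol)  # insertion: keep it and add a random symbol after it
            out.append(rng.choice(alphabet))
        else:
            # modification: replace with a different random symbol
            out.append(rng.choice([c for c in alphabet if c != symbol]))
    return "".join(out)


def recombine(rules_a: list[str], rules_b: list[str], rng: random.Random) -> list[str]:
    """Homologous uniform recombination: rule i of the child is rule i of either parent, whole."""
    return [a if rng.random() < 0.5 else b for a, b in zip(rules_a, rules_b)]


def rank_select(population: list, fitnesses: list[int], rng: random.Random):
    """Linear rank-based selection: the worst individual gets weight 1, the best weight n."""
    order = sorted(range(len(population)), key=lambda i: fitnesses[i])
    weights = [0] * len(population)
    for rank, i in enumerate(order, start=1):
        weights[i] = rank
    return rng.choices(population, weights=weights, k=1)[0]


rng = random.Random(42)
alphabet = make_alphabet(4)  # "FLR0123"
parent_a = ["3LL2", "R0RL", "RRF", "RFR1"]
parent_b = ["RLR", "3F32L", "1FR33", "R102"]
child = [mutate(rule, alphabet, 0.1, rng) for rule in recombine(parent_a, parent_b, rng)]
print(child)
```

Exchanging whole rules keeps each rule index aligned between the parents, which is one way to read "homologous": rule i of one parent only ever competes with rule i of the other.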