Evolving L-Systems to Capture Protein Structure Native Conformations
Gabi Escuela¹, Gabriela Ochoa² and Natalio Krasnogor³
¹,² Department of Computer Science, Universidad Simón Bolívar, Caracas, Venezuela
³ School of Computer Science and I.T., University of Nottingham
Contents
• Proteins
• Protein Structure Prediction (PSP)
• The HP model
• EA approaches to PSP: current encoding
• L-Systems
• Why a grammatical encoding?
• Methods and Results
• Discussion and Future Work
Proteins
• Linear chains of ~30-400 units drawn from 20 different amino acids
• Fold into a unique functional structure: the native state, or tertiary structure
• Show repeated substructures: alpha helices and beta sheets
[Figure: 3D structure of myoglobin, showing coloured alpha helices]
[Figure: 1A8M 3-D structure]
Protein Structure Prediction (PSP)
• Goal: determine the 3D structure of a protein from its amino acid sequence
• Strategy: find the state of minimum energy of the amino acid chain
• A solution would have practical consequences in medicine, drug development and agriculture

The 2D HP Model
• The hydrophobic effect is the main force governing folding
• Two amino acid types: hydrophobic (H) and polar or hydrophilic (P)
• Given a sequence $q \in \{H, P\}^+$, each letter of q has to be placed on a vertex of a given lattice L (at each point: turn 90° left or right, or continue ahead)
• Scoring function: adds -1 for each "contact" between two Hs that are adjacent in the lattice but not consecutive in q
• Objective: find the organization (embedding) of q in L of minimum score (maximum contacts)
Example on the square lattice: HPHPPHHPHPPHPHHPPHPH, 9 H-H contacts, Score = -9
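As an illustration only, the HP scoring function can be sketched in Python. The sketch assumes the fold is written as absolute moves on the square lattice (U, D, L, R) with the first residue fixed at the origin; the function names are hypothetical, not taken from the paper. Applied to the example chain with the absolute fold RDDLULDLDLUURULURRD, it finds the 9 H-H contacts.

```python
# Minimal sketch: HP-model score of a fold on the 2D square lattice.
# Assumption: the fold is given as absolute moves (U, D, L, R) and the first
# residue sits at the origin. Function names are illustrative.

# Unit step for each absolute move.
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def absolute_to_coords(moves: str) -> list[tuple[int, int]]:
    """Place residue 0 at the origin and apply one move per following residue."""
    x, y = 0, 0
    coords = [(x, y)]
    for move in moves:
        dx, dy = MOVES[move]
        x, y = x + dx, y + dy
        coords.append((x, y))
    return coords


def hp_score(sequence: str, moves: str) -> int:
    """Add -1 for every pair of Hs that are lattice neighbours but not
    consecutive in the chain. Raises ValueError for invalid folds."""
    if len(moves) != len(sequence) - 1:
        raise ValueError("need exactly len(sequence) - 1 moves")
    coords = absolute_to_coords(moves)
    if len(set(coords)) != len(coords):
        raise ValueError("fold is not self-avoiding")
    index_at = {pos: i for i, pos in enumerate(coords)}
    score = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Checking only the right and upper neighbours counts each contact once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1 and sequence[j] == "H":
                score -= 1
    return score


if __name__ == "__main__":
    print(hp_score("HPHPPHHPHPPHPHHPPHPH", "RDDLULDLDLUURULURRD"))  # -9
```

Looking only to the right and upwards from each H means every adjacent pair is visited exactly once, so no double counting correction is needed.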
EA approaches to PSP: Current (Direct) Encoding
• EAs and other stochastic methods: global optimization of a suitable energy function
• Encodings: Cartesian coordinates, distance geometries, internal coordinates
• Absolute: the structure is encoded as a string of symbols. For example, in the 2D square lattice, $s \in \{\text{Up}, \text{Down}, \text{Left}, \text{Right}\}^+$
• Relative: each move is interpreted in terms of the previous one, $s \in \{\text{Forward}, \text{TurnLeft}, \text{TurnRight}\}^+$
Protein: HPHPPHHPHPPHPHHPPHPH (L = 20)
• Absolute encoding: RDDLULDLDLUURULURRD (L = 19; the first position is fixed)
• Relative encoding: RFRRLLRLRRFRLLRRFR (L = 18; the first and second positions are fixed)
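The relationship between the two encodings can be made concrete with a short Python sketch. It assumes that each relative symbol records the turn between two consecutive absolute moves and that the fixed first move is R; under that reading, the absolute fold RDDLULDLDLUURULURRD maps exactly onto the relative string RFRRLLRLRRFRLLRRFR. The function names are illustrative.

```python
# Sketch: convert between the absolute (U/D/L/R) and relative (F/L/R) encodings.
# Assumption: relative symbol k describes the turn from absolute move k to
# move k+1, so the relative string is one symbol shorter than the absolute one.

# Headings in clockwise order: a right turn advances one place in this list.
CLOCKWISE = ["U", "R", "D", "L"]


def absolute_to_relative(moves: str) -> str:
    turns = []
    for prev, cur in zip(moves, moves[1:]):
        delta = (CLOCKWISE.index(cur) - CLOCKWISE.index(prev)) % 4
        if delta == 0:
            turns.append("F")  # same heading: continue ahead
        elif delta == 1:
            turns.append("R")  # quarter turn clockwise
        elif delta == 3:
            turns.append("L")  # quarter turn counter-clockwise
        else:
            raise ValueError(f"move {prev}->{cur} reverses the chain")
    return "".join(turns)


def relative_to_absolute(turns: str, first: str = "R") -> str:
    """Inverse mapping; `first` is the fixed initial move (assumed to be 'R')."""
    heading = CLOCKWISE.index(first)
    moves = [first]
    for turn in turns:
        heading = (heading + {"F": 0, "R": 1, "L": 3}[turn]) % 4
        moves.append(CLOCKWISE[heading])
    return "".join(moves)


if __name__ == "__main__":
    relative = absolute_to_relative("RDDLULDLDLUURULURRD")
    print(relative)  # RFRRLLRLRRFRLLRRFR
    print(relative_to_absolute(relative))  # RDDLULDLDLUURULURRD
```

Keeping the headings in clockwise order turns every turn into arithmetic modulo 4, and a difference of 2 (an immediate reversal) is rejected because it would place two residues on the same vertex.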
L-Systems (Lindenmayer, 1968)
• A model of morphogenesis, based on formal grammars
• Rewriting: define complex objects by replacing parts of a simple object using a set of productions
• Symbols: F, f, +, -, [, ]
• Axiom (S)
• Production (replacement) rules
Example:
r1: F → F+f
r2: f → F
S: F
Derivation: start: F; step 1: F+f; step 2: F+f+F; step 3: F+f+F+F+f
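Because all symbols of an L-system word are rewritten in parallel, one derivation step is a single pass over the string. A minimal Python sketch of the example with r1: F → F+f and r2: f → F (the function names are hypothetical); symbols without a production, such as +, are copied unchanged:

```python
# Sketch of parallel (D0L) rewriting for the example:
# productions r1: F -> F+f and r2: f -> F, axiom S = F.


def rewrite(word: str, rules: dict[str, str]) -> str:
    """One derivation step: every symbol is replaced simultaneously;
    symbols without a production (such as '+') are copied unchanged."""
    return "".join(rules.get(symbol, symbol) for symbol in word)


def derive(axiom: str, rules: dict[str, str], steps: int) -> list[str]:
    """Return the axiom followed by the word obtained after each step."""
    words = [axiom]
    for _ in range(steps):
        words.append(rewrite(words[-1], rules))
    return words


if __name__ == "__main__":
    for step, word in enumerate(derive("F", {"F": "F+f", "f": "F"}, 3)):
        print(step, word)
```

Running it prints the four words F, F+f, F+f+F and F+f+F+F+f, one per derivation step.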
Why a Grammatical Encoding?
• Specifies how to construct the phenotype
• Can achieve greater scalability through self-similar and hierarchical structure
• Proteins exhibit a high degree of regularity and repeated motifs
• The current (direct) encoding may not be suitable for crossover and building-block transfer between individuals
[Figure: a 3D L-system and a protein structure]
Method
• Proof of principle: can a folded protein be captured (encoded) by an L-system?
• How to find that L-system: an EA is used to evolve an L-system that captures a folded protein (an inverse problem)
Input: folded structure in relative coordinates
RFRRLLRLRRFRLLRRFR
→ EA →
Output: an L-system L that, once derived, produces the target string
RFRRLLRLRRFRLLRRFR
Axiom = 01F
Rules = {0:RFR1, 1:2L2, 2:R0L}
Proposed Grammatical Encoding
• D0L-system (deterministic and context-free)
• Alphabet: $\Sigma = t \cup nt$, where $t = \{F, L, R\}$ are the terminal symbols (relative coordinates) and $nt = \{0, 1, 2, \ldots, m-1\}$ are the non-terminal symbols (rewriting rules), with $m$ the maximum number of rules
• Axiom: $\alpha \in \Sigma^*$
• Rewriting rules: $i : w_i$, where $i \in nt$ and $w_i \in \Sigma^*$
Example
axiom R2
rules 0:R03F; 1:R01L; 2:F310; 3:LRL3
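One possible in-memory form of this genotype, sketched in Python under assumptions not stated on the slides: non-terminals are single digits (so m ≤ 10), the rule for non-terminal i is stored at list index i, and m is taken to be the number of rules actually present. `Genotype`, `parse` and `is_valid` are illustrative names. Storing exactly one successor per non-terminal is what makes the system deterministic, and rewriting a symbol without looking at its neighbours keeps it context-free.

```python
# Sketch of the D0L genotype. Assumptions (not from the slides): non-terminals
# are single digits, so m <= 10, and rules[i] holds the successor of non-terminal i.
from dataclasses import dataclass

TERMINALS = set("FLR")  # relative moves: Forward, turn Left, turn Right


@dataclass
class Genotype:
    axiom: str
    rules: list[str]  # exactly one successor per non-terminal -> deterministic

    def is_valid(self) -> bool:
        """Every symbol must belong to t union nt, with nt = {0, ..., m-1}."""
        m = len(self.rules)
        alphabet = TERMINALS | {str(i) for i in range(m)}
        return m <= 10 and all(
            set(word) <= alphabet for word in [self.axiom, *self.rules]
        )


def parse(axiom: str, rules: str) -> Genotype:
    """Parse the slide notation, e.g. '0:R03F; 1:R01L'."""
    table = {}
    for item in rules.split(";"):
        left, right = item.split(":")
        table[int(left)] = right.strip()
    # Assumes the rule indices are exactly 0..m-1.
    return Genotype(axiom, [table[i] for i in range(len(table))])


if __name__ == "__main__":
    g = parse("R2", "0:R03F; 1:R01L; 2:F310; 3:LRL3")
    print(g.rules, g.is_valid())
```

For the example genotype this prints the four rule bodies and `True`, since every symbol is either F, L, R or one of the non-terminals 0 to 3.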
Evolutionary Algorithm
• Generational, with rank-based selection
• Randomly generated initial population:
  • predefined maximum number of rules
  • axiom and rules: randomly generated strings of predefined maximum length
• Genetic operators:
  • uniform-like (homologous) recombination (rate = 1.0): complete production rules are interchanged
  • per-symbol mutation in both axioms and rules (deletion 30%, insertion 10%, modification 60%)
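The operators above could be sketched as follows. This is not the authors' implementation: the linear ranking scheme, the 50% per-rule swap probability, keeping each axiom with its own parent, and the per-symbol mutation rate `p_mut` are all assumptions; only the 30/10/60 split between deletion, insertion and modification comes from the slide. Genotypes are (axiom, rules) pairs with digit non-terminals, and all function names are hypothetical.

```python
# Sketch of initialization, rank-based selection, homologous crossover and
# per-symbol mutation. Assumptions are listed in the text; names are illustrative.
import random

TERMINALS = "FLR"  # relative moves


def alphabet(n_rules: int) -> str:
    """Terminals plus the non-terminal digits 0..n_rules-1 (assumes n_rules <= 10)."""
    return TERMINALS + "".join(str(i) for i in range(n_rules))


def random_word(symbols: str, max_len: int, rng: random.Random) -> str:
    return "".join(rng.choice(symbols) for _ in range(rng.randint(1, max_len)))


def random_genotype(n_rules: int, max_len: int, rng: random.Random):
    """Random axiom and n_rules random rules, each of length 1..max_len."""
    symbols = alphabet(n_rules)
    rules = [random_word(symbols, max_len, rng) for _ in range(n_rules)]
    return random_word(symbols, max_len, rng), rules


def rank_select(population, fitnesses, rng: random.Random):
    """Linear ranking: the worst individual has weight 1, the best has weight N."""
    order = sorted(range(len(population)), key=lambda i: fitnesses[i])
    weights = list(range(1, len(order) + 1))
    return population[rng.choices(order, weights=weights, k=1)[0]]


def crossover(a, b, rng: random.Random):
    """Uniform homologous crossover: rule i of each child is rule i of one
    parent, swapped with probability 0.5. Axioms stay with their parent."""
    (axiom_a, rules_a), (axiom_b, rules_b) = a, b
    child_a, child_b = [], []
    for rule_a, rule_b in zip(rules_a, rules_b):
        if rng.random() < 0.5:
            rule_a, rule_b = rule_b, rule_a
        child_a.append(rule_a)
        child_b.append(rule_b)
    return (axiom_a, child_a), (axiom_b, child_b)


def mutate_word(word: str, symbols: str, p_mut: float, rng: random.Random) -> str:
    out = []
    for s in word:
        if rng.random() >= p_mut:  # symbol left untouched
            out.append(s)
            continue
        r = rng.random()
        if r < 0.3:  # deletion (30%): drop the symbol
            continue
        if r < 0.4:  # insertion (10%): keep s and add a random symbol after it
            out.extend([s, rng.choice(symbols)])
        else:  # modification (60%): replace s with a random symbol
            out.append(rng.choice(symbols))
    return "".join(out)


def mutate(genotype, p_mut: float, rng: random.Random):
    axiom, rules = genotype
    symbols = alphabet(len(rules))
    return mutate_word(axiom, symbols, p_mut, rng), [
        mutate_word(w, symbols, p_mut, rng) for w in rules
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    population = [random_genotype(4, 5, rng) for _ in range(10)]
    scores = [rng.randint(0, 18) for _ in population]  # placeholder fitness values
    parent_a = rank_select(population, scores, rng)
    parent_b = rank_select(population, scores, rng)
    child_a, child_b = crossover(parent_a, parent_b, rng)
    print(mutate(child_a, 0.1, rng))  # 0.1 is a hypothetical mutation rate
```

Because rule i only ever meets rule i of the other parent, recombination is homologous: each child still has exactly one successor for every non-terminal, so it remains a valid D0L genotype.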
Derivation and Fitness Function
• Derivation: from genotype (axiom and rules) to phenotype (folded structure)
• Post-processing: pruning of non-terminal symbols
• Fitness calculation: number of matches between the target string and the solution; Min = 0, Max = length of the desired folding
Example genotype: Axiom = 31, Rules = {0:3LL2; 1:R0RL; 2:RRF; 3:RFR1}
axiom: 31
1st step: RFR1 R0RL (3 → RFR1, 1 → R0RL)
2nd step: RFR R0RL R 3LL2 RL (1 → R0RL, 0 → 3LL2)
3rd step: RFRR 3LL2 RL R RFR1 LL RRF RL (0 → 3LL2, 3 → RFR1, 2 → RRF)
post-processing (phenotype): RFRRLLRLRRFRLLRRFR
fitness = 18
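A sketch of derivation, pruning and fitness in Python, assuming digit non-terminals, three parallel derivation steps as in the example, and position-wise matching over the target length. Pruning the third-step string above actually leaves 19 symbols (RFRRLLRLRRFRLLRRFRL), one more than the target; the sketch simply ignores the extra symbol, which reproduces fitness = 18. How the authors treat over-long phenotypes is an assumption here, and the function names are hypothetical.

```python
# Sketch of genotype -> phenotype derivation, pruning and fitness.
# Assumptions: non-terminals are single digits, derivation runs a fixed
# number of parallel steps, and fitness compares positions up to the
# target length (extra trailing symbols are ignored).

NONTERMINALS = set("0123456789")


def derive_genotype(axiom: str, rules: dict[str, str], steps: int) -> str:
    """Apply `steps` D0L steps; terminals (F, L, R) have no rule and are copied."""
    word = axiom
    for _ in range(steps):
        word = "".join(rules.get(symbol, symbol) for symbol in word)
    return word


def prune(word: str) -> str:
    """Post-processing: delete the non-terminals that remain after derivation."""
    return "".join(symbol for symbol in word if symbol not in NONTERMINALS)


def fitness(phenotype: str, target: str) -> int:
    """Count positions where phenotype and target agree (0 .. len(target))."""
    return sum(p == t for p, t in zip(phenotype, target))


if __name__ == "__main__":
    rules = {"0": "3LL2", "1": "R0RL", "2": "RRF", "3": "RFR1"}
    target = "RFRRLLRLRRFRLLRRFR"
    raw = derive_genotype("31", rules, steps=3)
    print(raw)  # RFRR3LL2RLRRFR1LLRRFRL
    phenotype = prune(raw)
    print(phenotype)  # RFRRLLRLRRFRLLRRFRL
    print(fitness(phenotype, target))  # 18
```

Because `zip` stops at the shorter string, the score always lies between 0 and the target length, matching the Min and Max stated on the slide.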
Results (1)
Instance | Target folding | Length | Successes | One solution
HPHPPHHPHPPHPHHPPHPH | RFRRLLRLRRFRLLRRFR | 18 | 5/50 (4 R) | A = 31, R = {0:3LL2, 1:R0RL, 2:RRF, 3:RFR1}
HHHPPHPHPHPPHPHPHPPH | RRFRFRLFRRFLRLRFRR | 18 | 3/50 (4 R) | A = R2, R = {0:RLR, 1:3F32L, 2:1FR33, 3:R102}
HHPPHPPHPPHPPHPPHPPHPPHH | RLLFLFFRRFLLFRRLRFFRRF | 22 | 0/50 (4 R), 1/50 (5 R) | A = 1R, R = {0:4LF3, 1:RL243, 2:00F3, 3:RRFL, 4:0R14F}
PPHPPHHPPPPHHPPPPHHPPPPHH | FFRRFFFLLFFFFRRFFFFLLFF | 23 | 1/50 (5 R) | A = 32, R = {0:20R2, 1:132F, 2:FF012, 3:0FLL}
Results (2)
[Figure: evolutionary progression towards the target structure]
Discussion
• The proposed EA discovered L-systems that capture a target folding under the HP model in 2D lattices
• We are not solving the PSP yet, but...
• We are proposing a novel and potentially useful generative encoding for evolutionary approaches to PSP
Future work
• Incorporate problem knowledge about secondary structures
[Figure: alpha helix, beta sheet, beta turn]
• Explore longer chains and 3D lattices