Protein Structure Prediction on a Lattice Model via
Multimodal Optimization Techniques
Ka-Chun Wong, Kwong-Sak Leung, Man-Hon Wong
Department of Computer Science & Engineering
The Chinese University of Hong Kong, HKSAR, China
{kcwong, ksleung, mhwong}@cse.cuhk.edu.hk
Outline
Introduction
Background
Objective
Related Works
Paper Contributions
Apply multimodal optimization techniques
Propose a novel mutation method
Experiments
Conclusion
Introduction
Protein is:
a sequence of amino acid residues folded
into a 3D structure
important for living:
Material transportations across cells
Catalyzing metabolic reactions
Body defenses against viruses
Introduction
Protein Function:
Substantially depends on its 3D structure
http://www.pdb.org/pdb/explore/explore.do?structureId=2X7M
Introduction
Protein Structure Determination
“Wet-lab” experiments exist
X-ray crystallography
NMR spectroscopy
……
But they are:
Labor intensive
Not scalable
Expensive
Introduction
“Wet-lab” experiments for Protein Structure Determination are:
Costly
Time-consuming
Not scalable
Accurate
Computational approaches for Protein Structure Prediction are:
Less costly
Fast
Scalable
Less accurate
Complementary Twins
Wet-labs for fine-tuning
Computation for coarse-tuning
Introduction
Protein Structure Prediction (PSP)
Input: An amino acid sequence
Output: The 3D structure of the sequence
Divided into two classes:
Using similar sequences & their structures
Not using similar sequences & their structures
[Figure: the sequence ……YDVAEGCKVV…… is passed to Prediction, optionally together with similar sequences & their structures]
Introduction
This paper focuses on
De novo protein structure prediction on
the 3D HP lattice model using
evolutionary algorithms *
De novo means: the input of the method
only contains the sequence to be
predicted
*N. Krasnogor, W.E. Hart, J. Smith, and D. Pelta. Protein structure prediction with
evolutionary algorithms. In Eiben Garzon Honovar Jakiela Banzhaf, Daida and Smith,
editors, International Genetic and Evolutionary Computation Conference (GECCO99),
pages 1569-1601. Morgan Kaufmann, 1999.
Background
3D HP lattice model
Assume the main driving forces are the
interactions among the hydrophobic
amino acid residues
All known amino acid residues are
experimentally classified as either
hydrophobic (H) or polar (P).
Background
3D HP lattice model
An amino acid sequence is represented
as a string {H,P}+
The sequence is folded in a restricted space,
a cubic lattice
Background
Amino acid residue – Bead
Peptide bond – Straight Line
HPHPPHHPHPPHPHHPPHPH
H: Red color
P: Blue color
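A minimal sketch of this representation, assuming an absolute move alphabet (the letters E/W/N/S/U/D and the helper names are illustrative, not taken from the paper). Each residue is a bead on an integer lattice point and each peptide bond is a unit step between consecutive beads:

```python
# Unit steps on the cubic lattice; the letter names are an assumption.
MOVES = {
    "E": (1, 0, 0), "W": (-1, 0, 0),
    "N": (0, 1, 0), "S": (0, -1, 0),
    "U": (0, 0, 1), "D": (0, 0, -1),
}

def decode(moves):
    """Return the lattice coordinates of every bead, starting at the origin."""
    coords = [(0, 0, 0)]
    for m in moves:
        dx, dy, dz = MOVES[m]
        x, y, z = coords[-1]
        coords.append((x + dx, y + dy, z + dz))
    return coords

def is_self_avoiding(coords):
    """A valid conformation never places two beads on the same lattice point."""
    return len(set(coords)) == len(coords)

sequence = "HPHPPHHPHPPHPHHPPHPH"  # the 20-residue example from the slide
# A chain of n residues needs n - 1 moves; this one is fully stretched out.
straight = decode("E" * (len(sequence) - 1))
```

A chain that steps back onto itself, such as `decode("EW")`, fails the self-avoidance check and would be treated as infeasible.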
Objective
To find the conformation with the
minimal energy.
Maximize the number of the H-H bonds
which are formed by two non-sequence-adjacent residues (non-local H-H bonds)
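The objective can be sketched as a contact count. This assumes the common HP convention of energy -1 per non-local H-H contact; the function names and the coordinate-list input are illustrative:

```python
def hh_contacts(sequence, coords):
    """Count non-local H-H contacts: pairs of H residues that are lattice
    neighbours but not consecutive in the sequence."""
    position = {c: i for i, c in enumerate(coords)}
    if len(position) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    count = 0
    for i, (x, y, z) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only in the three positive directions so each pair is counted once.
        for nb in ((x + 1, y, z), (x, y + 1, z), (x, y, z + 1)):
            j = position.get(nb)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                count += 1
    return count

def energy(sequence, coords):
    """Assumed energy: -1 for every non-local H-H contact, 0 otherwise."""
    return -hh_contacts(sequence, coords)

# Four residues folded into a unit square: the two end residues touch.
square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
print(energy("HPPH", square))
```

Minimizing this energy is the same as maximizing the number of non-local H-H bonds.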
Objective
Mathematically, it is to minimize the
following function:
Distance Function
Only non-sequence-adjacent residues are
checked
Bond Energy
* H. Li, R. Helling, C. Tang, and N. Wingreen. Emergence of Preferred Structures in a
Simple Model of Protein Folding. Science, 273(5275):666–669, 1996.
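The formula itself did not survive in this text. One common way to write an HP energy that matches the three labels above is sketched here; the symbols and the contact values are assumptions, and other parameterisations of the bond energy exist:

```latex
E \;=\; \sum_{i=1}^{n-2} \; \sum_{j=i+2}^{n} e(s_i, s_j)\, \delta(r_i, r_j),
\qquad
\delta(r_i, r_j) =
\begin{cases}
1 & \text{if } \lVert r_i - r_j \rVert = 1,\\
0 & \text{otherwise,}
\end{cases}
\qquad
e(s_i, s_j) =
\begin{cases}
-1 & \text{if } s_i = s_j = \mathrm{H},\\
0 & \text{otherwise.}
\end{cases}
```

Here $s_i \in \{\mathrm{H}, \mathrm{P}\}$ is the type of residue $i$, $r_i$ is its lattice position, $\delta$ plays the role of the distance function, $e$ is the bond energy, and the condition $j \ge i + 2$ restricts the sum to non-sequence-adjacent residues.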
Related Works
Unger et al. first apply a hybridized
genetic algorithm to solve the problem
[1]
Patton et al. use a standard genetic
algorithm [2]
[1] Unger, R. and Moult, J. 1993. Genetic Algorithm for 3D Protein Folding Simulations. In
Proceedings of the 5th international Conference on Genetic Algorithms S. Forrest, Ed. Morgan
Kaufmann Publishers, San Francisco, CA, 581-588.
[2] Patton, A. L., Punch, W. F., and Goodman, E. D. 1995. A Standard GA Approach to Native
Protein Conformation Prediction. In Proceedings of the 6th international Conference on Genetic
Algorithms (July 15 - 19, 1995). L. J. Eshelman, Ed. Morgan Kaufmann Publishers, San Francisco,
CA, 574-581.
Related Works
Berger et al. prove that the problem
is NP-complete [1]
Krasnogor et al. publish a work
discussing the basic algorithmic
factors affecting the problem [2]
[1] Berger, B. and Leighton, T. 1998. Protein folding in the hydrophobic-hydrophilic (HP) model is NP-complete. In Proceedings of the Second Annual International Conference on Computational Molecular
Biology. RECOMB '98. ACM, New York, NY, 30-39.
[2] N. Krasnogor, W.E. Hart, J. Smith, and D. Pelta. Protein structure prediction with evolutionary
algorithms. In Eiben Garzon Honovar Jakiela Banzhaf, Daida and Smith, editors, International
Genetic and Evolutionary Computation Conference (GECCO99), pages 1569-1601. Morgan Kaufmann,
1999.
Related Works
Since then, many related algorithms
have been proposed. Some examples:
Multimeme algorithm by Krasnogor et al.
Guided genetic algorithm by Hoque et al.
Ant colony algorithm by Shmygelska et al.
Differential Evolution by Bitello et al.
Immune Algorithm by Cutello et al.
EDA by Santana et al.
Paper Contributions
Observation:
Some diversity preserving techniques are
incorporated in most algorithms
Duplicate predator [1]
Aging operator [2]
Additional renormalization of the pheromone [3]
[1] G. A. Cox, T. V. Mortimer-Jones, R. P. Taylor, and R. L. Johnston. Development and
optimisation of a novel genetic algorithm for studying model protein folding. Theoretical
Chemistry Accounts: Theory, Computation, and Modeling, 112(3):163–178, 2004.
[2] V. Cutello, G. Nicosia, M. Pavone, and J. Timmis. An immune algorithm for protein
structure prediction on lattice models. IEEE Transactions on Evolutionary Computation,
11(1):101–117, Feb. 2007.
[3] A. Shmygelska and H. Hoos. An ant colony optimisation algorithm for the 2d and 3d
hydrophobic polar protein folding problem. BMC Bioinformatics, 6(1):30, 2005.
Paper Contributions
Observation
Unger et al. have observed that there
can be multiple conformations for each
energy value [1]
A study also indicates the fitness
landscapes of the problem are
multimodal [2]
[1] R. Unger and J. Moult. Genetic algorithms for protein folding simulations. J. Mol.
Biol., 231:75–81, May 1993.
[2] S. D. Flores and J. Smith. Study of fitness landscapes for the HP model of
protein structure prediction. In Evolutionary Computation, 2003. CEC ’03. pages
2338–2345, Dec. 2003.
Paper Contributions
In this paper:
Apply multimodal optimization techniques
to solve the PSP problem
Fitness Sharing (SharingGA) [1]
Species Conserving (SCGA) [2]
Crowding (CGA) [3]
[1] Goldberg, D. E. and Richardson, J. 1987. Genetic algorithms with sharing for multimodal
function optimization. In Proceedings of the Second International Conference on Genetic
Algorithms on Genetic Algorithms and their Application, 41-49.
[2] Li, J., Balazs, M. E., Parks, G. T., and Clarkson, P. J. 2002. A species conserving genetic
algorithm for multimodal function optimization. Evol. Comput. 10, 3 (Sep. 2002), 207-234.
[3] De Jong, K. A. 1975. An Analysis of the Behavior of a Class of Genetic Adaptive Systems.
Doctoral Thesis. UMI Order Number: AAI7609381. University of Michigan.
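As an illustration of one of these techniques, below is a minimal sketch of crowding-style replacement using Hamming distance between encoded conformations. The function names and the crowding factor parameter are assumptions; the exact variant used in the paper is not specified on these slides.

```python
import random

def hamming(a, b):
    """Number of positions at which two equal-length encodings differ."""
    return sum(x != y for x, y in zip(a, b))

def crowding_insert(population, child, crowding_factor, rng):
    """Replace the individual most similar to `child` among a random sample
    of `crowding_factor` population members. Returns the replaced index."""
    sample = rng.sample(range(len(population)), crowding_factor)
    closest = min(sample, key=lambda i: hamming(population[i], child))
    population[closest] = child
    return closest

rng = random.Random(0)
population = ["FFFF", "LLLL", "RRRR"]
# Sampling the whole population makes the outcome deterministic here.
replaced = crowding_insert(population, "FFFL", crowding_factor=3, rng=rng)
```

Because a new individual displaces a similar one rather than an arbitrary one, dissimilar conformations tend to survive side by side, which is the diversity-preserving effect the observation above points to.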
Paper Contributions
In this paper:
Propose a novel mutation method
Mixing two types of mutations together
Sometimes use RM, sometimes use AM
and apply it in CGA (called CGA-mixed)
RM: Mutation in Relative Encoding
AM: Mutation in Absolute Encoding
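A rough sketch of how the two mutations could be mixed on a relative-encoded individual. The move alphabet F/L/R/U/D, the initial frame, the probability `p_rm`, and the fallback when AM yields an unencodable chain are all assumptions for illustration, not the paper's settings:

```python
import random

REL_MOVES = "FLRUD"  # forward, left, right, up, down (assumed alphabet)
ABS_DIRS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

def neg(a):
    return (-a[0], -a[1], -a[2])

def turn(heading, up, move):
    """Return the (heading, up) frame after one relative move."""
    left = cross(up, heading)
    if move == "F": return heading, up
    if move == "L": return left, up
    if move == "R": return neg(left), up
    if move == "U": return up, neg(heading)
    if move == "D": return neg(up), heading
    raise ValueError(move)

def rel_to_abs(rel):
    """Decode relative moves into absolute unit steps."""
    heading, up = (1, 0, 0), (0, 0, 1)
    steps = []
    for m in rel:
        heading, up = turn(heading, up, m)
        steps.append(heading)
    return steps

def abs_to_rel(steps):
    """Encode absolute steps as relative moves; None if a step reverses
    the previous heading, which this alphabet cannot express."""
    heading, up = (1, 0, 0), (0, 0, 1)
    rel = []
    for d in steps:
        for m in REL_MOVES:
            h2, u2 = turn(heading, up, m)
            if h2 == d:
                heading, up = h2, u2
                rel.append(m)
                break
        else:
            return None
    return "".join(rel)

def relative_mutation(rel, rng):
    """RM: change one relative move; every later absolute step is rotated."""
    i = rng.randrange(len(rel))
    new = rng.choice([m for m in REL_MOVES if m != rel[i]])
    return rel[:i] + new + rel[i + 1:]

def absolute_mutation(rel, rng):
    """AM: change one absolute step; later absolute steps keep their direction."""
    steps = rel_to_abs(rel)
    i = rng.randrange(len(steps))
    steps[i] = rng.choice([d for d in ABS_DIRS if d != steps[i]])
    return abs_to_rel(steps)

def mixed_mutation(rel, rng, p_rm=0.5):
    """Use RM with probability p_rm, otherwise AM (parent kept if AM fails)."""
    if rng.random() < p_rm:
        return relative_mutation(rel, rng)
    child = absolute_mutation(rel, rng)
    return child if child is not None else rel
```

The two operators move through the search space differently: RM swings the whole tail of the chain, while AM shifts the tail without rotating it, so mixing them gives two kinds of neighbourhood.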
Experiments
Experiments are conducted:
Relative Encoding [1]
Hamming Distance
100 Individuals (Overlapping)
Uniform Deterministic (Parent Selection)
Truncation (Survival Selection)
50 runs
10^5 and 5×10^6 energy evaluations
UN [2] as a control algorithm
[1] N. Krasnogor, W.E. Hart, J. Smith, and D. Pelta. Protein structure prediction with
evolutionary algorithms. In Eiben Garzon Honovar Jakiela Banzhaf, Daida and
Smith, editors, International Genetic and Evolutionary Computation Conference
(GECCO99), pages 1569-1601. Morgan Kaufmann, 1999.
[2] K.A. De Jong. Evolutionary computation: a unified approach. MIT Press,
Cambridge MA, 2006.
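One reading of the selection settings above, sketched for a single generation. It assumes that "uniform deterministic" parent selection means every individual produces exactly one offspring, and that "overlapping" with truncation means parents and offspring compete and the best survive; the function and parameter names are illustrative:

```python
def next_generation(population, fitness, mutate):
    """One overlapping generation; `fitness` is an energy, so lower is better."""
    # Uniform deterministic parent selection: each parent is used exactly once.
    offspring = [mutate(parent) for parent in population]
    # Overlapping population: parents and offspring compete together.
    pool = population + offspring
    # Truncation survival selection: keep the best len(population) individuals.
    pool.sort(key=fitness)
    return pool[:len(population)]

# Toy usage with integers standing in for conformations and their energies.
survivors = next_generation([5, 3, 8], fitness=lambda x: x, mutate=lambda x: x - 1)
```

With 100 individuals, each generation under this reading would spend 100 energy evaluations on the new offspring.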
Experiments
10^5 energy evaluations over 50 runs
H(x): The lowest energy over 50 runs
mean+σ: The lowest energy of a run averaged over 50 runs
Experiments
5×10^6 energy evaluations over 50 runs
H(x): The lowest energy over 50 runs
mean+σ: The lowest energy of a run averaged over 50 runs
Experiments
The experimental results reported in the following
papers are quoted and compared under the
same termination condition
Santana, R.; Larranaga, P.; Lozano, J.A.; , "Protein Folding in
Simplified Models With Estimation of Distribution Algorithms,"
Evolutionary Computation, IEEE Transactions on , vol.12, no.4,
pp.418-438, Aug. 2008
Cutello, V.; Nicosia, G.; Pavone, M.; Timmis, J.; , "An Immune
Algorithm for Protein Structure Prediction on Lattice
Models," Evolutionary Computation, IEEE Transactions on , vol.11,
no.1, pp.101-117, Feb. 2007
Experiments
10^5 energy evaluations over 50 runs
H(x): The lowest energy over 50 runs
mean+σ: The lowest energy of a run averaged over 50 runs
Experiments
5×10^6 energy evaluations over 50 runs
H(x): The lowest energy over 50 runs
mean+σ: The lowest energy of a run averaged over 50 runs
Conclusion
In this paper, we:
Apply multimodal optimization techniques for PSP
Propose a novel mutation method for PSP
Some results comparable with the state-of-the-art algorithms have been obtained
The source code can be downloaded at:
http://pc89075.cse.cuhk.edu.hk:8080/myapp/GECCO2010-PSP-LatticeModels.zip
Q&A
Paper Contributions
Proposed mutation method
Applied in CGA (called CGA-mixed)