Protein Structure Prediction on a Lattice Model via
Multimodal Optimization Techniques
Ka-Chun Wong, Kwong-Sak Leung, Man-Hon Wong
Department of Computer Science & Engineering
The Chinese University of Hong Kong, HKSAR, China
{kcwong, ksleung, mhwong}@cse.cuhk.edu.hk
Outline
 Introduction
 Background
 Objective
 Related Works
 Paper Contributions
 Apply multimodal optimization techniques
 Propose a novel mutation method
 Experiments
 Conclusion
Introduction
 A protein is:
 a sequence of amino acid residues folded into a 3D structure
 important for life:
 Transporting materials across cells
 Catalyzing metabolic reactions
 Defending the body against viruses
Introduction
 Protein Function:
 Substantially depends on its 3D structure
http://www.pdb.org/pdb/explore/explore.do?structureId=2X7M
Introduction
 Protein Structure Determination
 “Wet-lab” experiments exist
 X-ray crystallography
 NMR spectroscopy
 ……
 But they are:
 Labor intensive
 Not scalable
 Expensive
Introduction
“Wet-lab” experiments for Protein Structure Determination are:
 Costly
 Time-consuming
 Not scalable
 Accurate
Computational approaches for Protein Structure Prediction are:
 Less costly
 Fast
 Scalable
 Less accurate
Complementary Twins
 Wet-labs for fine-tuning
 Computation for coarse-tuning
Introduction
 Protein Structure Prediction (PSP)
 Input: An amino acid sequence
 Output: The 3D structure of the sequence
 Methods are divided into two classes:
 Those using / not using similar sequences & their structures
[Figure: input sequence ……YDVAEGCKVV…… → Prediction, with similar sequences & their structures as the optional extra input]
Introduction
 This paper focuses on
 De novo protein structure prediction on the 3D HP lattice model using evolutionary algorithms *
 De novo means that the input to the method contains only the sequence to be predicted
*N. Krasnogor, W.E. Hart, J. Smith, and D. Pelta. Protein structure prediction with
evolutionary algorithms. In Eiben Garzon Honovar Jakiela Banzhaf, Daida and Smith,
editors, International Genetic and Evolutionary Computation Conference (GECCO99),
pages 1569-1601. Morgan Kaufmann, 1999.
Background
 3D HP lattice model
 Assumes that the main driving forces of folding are the interactions among the hydrophobic amino acid residues
 All known amino acid residues are experimentally classified as either hydrophobic (H) or polar (P).
Background
 3D HP lattice model
 An amino acid sequence is represented as a string in {H,P}+
 The sequence is folded within a restricted space, a cubic lattice
Background
 Amino acid residue – Bead
 Peptide bond – Straight Line
HPHPPHHPHPPHPHHPPHPH
H: Red color
P: Blue color
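As a minimal sketch of this bead-and-line representation (not the authors' code), a conformation can be stored as one integer lattice coordinate per bead. It is valid when consecutive beads, which are joined by a peptide bond, sit one lattice step apart and no two beads share a site. The function name and the straight-line example fold are assumptions made for illustration; only the HP string comes from the slide.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]


def is_valid_conformation(sequence: str, coords: List[Coord]) -> bool:
    """Return True if coords is a self-avoiding walk on the cubic lattice."""
    # One coordinate per residue, and only H/P symbols are allowed.
    if len(sequence) != len(coords) or set(sequence) - {"H", "P"}:
        return False
    # No two beads may occupy the same lattice site.
    if len(set(coords)) != len(coords):
        return False
    # Each peptide bond must span exactly one lattice step.
    for a, b in zip(coords, coords[1:]):
        if sum(abs(p - q) for p, q in zip(a, b)) != 1:
            return False
    return True


sequence = "HPHPPHHPHPPHPHHPPHPH"          # sequence shown on the slide
straight = [(i, 0, 0) for i in range(len(sequence))]  # hypothetical unfolded chain
collided = straight[:-1] + [straight[0]]   # last bead placed on top of the first

print(is_valid_conformation(sequence, straight))  # True
print(is_valid_conformation(sequence, collided))  # False
```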
Objective
 To find the conformation with the minimal energy.
 Maximize the number of H-H bonds formed by two non-sequence-adjacent residues (non-local H-H bonds)
Objective
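 Mathematically, it is to minimize the following function *:
H = \sum_{i<j} E_{\sigma_i \sigma_j} \Delta(r_i - r_j)
Distance function \Delta(r_i - r_j): 1 if residues i and j occupy adjacent lattice sites and are not adjacent in the sequence, 0 otherwise (only non-sequence-adjacent residues are checked)
Bond energy E_{\sigma_i \sigma_j}: energy of a contact between residue types \sigma_i and \sigma_j (in the HP model, -1 for an H-H contact and 0 otherwise)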
* H. Li, R. Helling, C. Tang, and N. Wingreen. Emergence of Preferred Structures in a
Simple Model of Protein Folding. Science, 273(5275):666–669, 1996.
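The objective can be illustrated with a short sketch, assuming the usual HP convention that each non-local H-H contact contributes -1 and every other contact contributes 0. The function name and the four-residue toy fold are hypothetical and are not taken from the paper's source code.

```python
from itertools import combinations
from typing import List, Tuple

Coord = Tuple[int, int, int]


def hp_energy(sequence: str, coords: List[Coord]) -> int:
    """Energy of a lattice conformation: -1 per non-local H-H contact."""
    energy = 0
    for i, j in combinations(range(len(sequence)), 2):
        if j - i < 2:
            continue  # sequence-adjacent residues are not checked
        if sequence[i] == "H" and sequence[j] == "H":
            # Adjacent lattice sites are exactly one step apart.
            dist = sum(abs(a - b) for a, b in zip(coords[i], coords[j]))
            if dist == 1:
                energy -= 1
    return energy


# Toy example: four residues folded around a unit square in the z = 0 plane.
toy_sequence = "HPPH"
toy_coords = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
print(hp_energy(toy_sequence, toy_coords))  # -1
```

Minimizing this value is the same as maximizing the number of non-local H-H bonds.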
Related Works
 Unger et al. were the first to apply a hybridized genetic algorithm to the problem [1]
 Patton et al. used a standard genetic algorithm [2]
[1] Unger, R. and Moult, J. 1993. Genetic Algorithm for 3D Protein Folding Simulations. In
Proceedings of the 5th international Conference on Genetic Algorithms S. Forrest, Ed. Morgan
Kaufmann Publishers, San Francisco, CA, 581-588.
[2] Patton, A. L., Punch, W. F., and Goodman, E. D. 1995. A Standard GA Approach to Native
Protein Conformation Prediction. In Proceedings of the 6th international Conference on Genetic
Algorithms (July 15 - 19, 1995). L. J. Eshelman, Ed. Morgan Kaufmann Publishers, San Francisco,
CA, 574-581.
Related Works
 Berger et al. proved that the problem is NP-complete [1]
 Krasnogor et al. discussed the basic algorithmic factors affecting the problem [2]
[1] Berger, B. and Leighton, T. 1998. Protein folding in the hydrophobic-hydrophilic (HP) model is NP-complete. In Proceedings of the Second Annual international Conference on Computational Molecular Biology. RECOMB '98. ACM, New York, NY, 30-39.
[2] N. Krasnogor, W.E. Hart, J. Smith, and D. Pelta. Protein structure prediction with evolutionary
algorithms. In Eiben Garzon Honovar Jakiela Banzhaf, Daida and Smith, editors, International
Genetic and Evolutionary Computation Conference (GECCO99), pages 1569-1601. Morgan Kaufmann,
1999.
Related Works
 Since then, many related algorithms have been proposed. Some examples:
 Multimeme algorithm by Krasnogor et al.
 Guided genetic algorithm by Hoque et al.
 Ant colony algorithm by Shmygelska et al.
 Differential Evolution by Bitello et al.
 Immune Algorithm by Cutello et al.
 EDA by Santana et al.
Paper Contributions
 Observation:
 Most algorithms incorporate some diversity-preserving technique
 Duplicate predator [1]
 Aging operator [2]
 Additional renormalization of the pheromone [3]
[1] G. A. Cox, T. V. Mortimer-Jones, R. P. Taylor, and R. L. Johnston. Development and
optimisation of a novel genetic algorithm for studying model protein folding. Theoretical
Chemistry Accounts: Theory, Computation, and Modeling, 112(3):163–178, 2004.
[2] V. Cutello, G. Nicosia, M. Pavone, and J. Timmis. An immune algorithm for protein
structure prediction on lattice models. IEEE Transactions on Evolutionary Computation,
11(1):101–117, Feb. 2007.
[3] A. Shmygelska and H. Hoos. An ant colony optimisation algorithm for the 2d and 3d
hydrophobic polar protein folding problem. BMC Bioinformatics, 6(1):30, 2005.
Paper Contributions
 Observation
 Unger et al. have observed that there
can be multiple conformations for each
energy value [1]
 A study also indicates the fitness
landscapes of the problem are
multimodal [2]
[1] R. Unger and J. Moult. Genetic algorithms for protein folding simulations. J. Mol.
Biol., 231:75–81, May 1993.
[2] S. D. Flores and J. Smith. Study of fitness landscapes for the HP model of
protein structure prediction. In Evolutionary Computation, 2003. CEC ’03. pages
2338–2345, Dec. 2003.
Paper Contributions
 In this paper:
 Apply multimodal optimization techniques
to solve the PSP problem
 Fitness Sharing (SharingGA) [1]
 Species Conserving (SCGA) [2]
 Crowding (CGA) [3]
[1] Goldberg, D. E. and Richardson, J. 1987. Genetic algorithms with sharing for multimodal function optimization. In Proceedings of the Second international Conference on Genetic Algorithms on Genetic Algorithms and their Application, 41-49.
[2] Li, J., Balazs, M. E., Parks, G. T., and Clarkson, P. J. 2002. A species conserving genetic algorithm for multimodal function optimization. Evol. Comput. 10, 3 (Sep. 2002), 207-234.
[3] De Jong, K. A. 1975. An Analysis of the Behavior of a Class of Genetic Adaptive Systems. Doctoral Thesis. UMI Order Number: AAI7609381. University of Michigan.
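To make the first of these techniques concrete, here is a minimal sketch of fitness sharing in the Goldberg and Richardson style: each individual's raw fitness is divided by a niche count that grows with the number of similar individuals. The slides do not give the parameter values, so `sigma_share`, `alpha`, the move-string genomes, and the use of a non-negative raw fitness (for example, the number of H-H contacts) are all assumptions for illustration.

```python
from typing import List, Sequence


def hamming(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of positions at which two equal-length genomes differ."""
    return sum(x != y for x, y in zip(a, b))


def shared_fitness(population: List[str], raw_fitness: List[float],
                   sigma_share: float, alpha: float = 1.0) -> List[float]:
    """Divide each raw fitness by its niche count (to be maximized)."""
    shared = []
    for i, individual in enumerate(population):
        niche_count = 0.0
        for other in population:
            d = hamming(individual, other)
            if d < sigma_share:
                # Sharing function: 1 at distance 0, falling to 0 at sigma_share.
                niche_count += 1.0 - (d / sigma_share) ** alpha
        # niche_count >= 1 because every individual is at distance 0 from itself.
        shared.append(raw_fitness[i] / niche_count)
    return shared


# Hypothetical genomes: two identical folds share a niche, the third is alone.
population = ["FFLR", "FFLR", "UDLR"]
raw = [4.0, 4.0, 4.0]
print(shared_fitness(population, raw, sigma_share=2))  # [2.0, 2.0, 4.0]
```

The isolated individual keeps its full fitness, which is how sharing discourages the population from collapsing onto one conformation.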
Paper Contributions
 In this paper:
 Propose a novel mutation method
 Mix two types of mutation together
 Sometimes use RM, sometimes use AM
 Apply it in CGA (called CGA-mixed)
RM: Mutation in Relative Encoding
AM: Mutation in Absolute Encoding
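The mixing idea can be sketched as a wrapper that picks one of two mutation operators at random for each call. The slides do not say how the choice between RM and AM is made, so the probability `p_rm` is an assumption, and the two operators below are hypothetical stand-ins that only tag the genome; real RM and AM operators would modify the relative and absolute encodings of a conformation.

```python
import random
from typing import Callable, List

Genome = List[str]
Mutation = Callable[[Genome, random.Random], Genome]


def mixed_mutation(genome: Genome, rm: Mutation, am: Mutation,
                   p_rm: float, rng: random.Random) -> Genome:
    """Apply rm with probability p_rm, otherwise apply am."""
    operator = rm if rng.random() < p_rm else am
    return operator(genome, rng)


# Placeholder operators (NOT the paper's RM/AM): they only record which ran.
def rm_stub(genome: Genome, rng: random.Random) -> Genome:
    return list(genome) + ["<RM>"]


def am_stub(genome: Genome, rng: random.Random) -> Genome:
    return list(genome) + ["<AM>"]


rng = random.Random(0)
parent = ["F", "L", "R", "U"]
children = [mixed_mutation(parent, rm_stub, am_stub, 0.5, rng) for _ in range(1000)]
rm_uses = sum(child[-1] == "<RM>" for child in children)
print(f"RM used {rm_uses} times, AM used {1000 - rm_uses} times")
```

Passing the operators in as arguments keeps the mixing rule independent of how each encoding is implemented.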
Experiments
 Experiments are conducted:
 Relative Encoding [1]
 Hamming Distance
 100 Individuals (Overlapping)
 Uniform Deterministic (Parent Selection)
 Truncation (Survival Selection)
 50 runs
 10^5 and 5×10^6 energy evaluations
 UN [2] as a control algorithm
[1] N. Krasnogor, W.E. Hart, J. Smith, and D. Pelta. Protein structure prediction with evolutionary algorithms. In Eiben Garzon Honovar Jakiela Banzhaf, Daida and Smith, editors, International Genetic and Evolutionary Computation Conference (GECCO99), pages 1569-1601. Morgan Kaufmann, 1999.
[2] K.A. De Jong, Evolutionary computation: a unified approach. MIT Press, Cambridge MA, 2006.
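As an illustration of how the Hamming distance enters a crowding scheme, the sketch below follows De Jong's original formulation: each offspring replaces the most similar member of a small random sample of the population. The slides do not state the exact replacement rule or the sample size used in CGA, so `crowding_factor`, the function names, and the move-string genomes are assumptions.

```python
import random
from typing import List, Sequence


def hamming(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of positions at which two equal-length genomes differ."""
    return sum(x != y for x, y in zip(a, b))


def crowding_replace(population: List[str], offspring: str,
                     crowding_factor: int, rng: random.Random) -> int:
    """Insert offspring in place of the closest sampled individual.

    Returns the index that was overwritten.
    """
    # Sample crowding_factor distinct individuals without replacement.
    candidates = rng.sample(range(len(population)), crowding_factor)
    closest = min(candidates, key=lambda idx: hamming(population[idx], offspring))
    population[closest] = offspring
    return closest


# Hypothetical population of relative-move strings.
pop = ["FFFF", "LLLL", "RRRR"]
replaced = crowding_replace(pop, "FFFL", crowding_factor=3, rng=random.Random(0))
print(replaced, pop)  # 0 ['FFFL', 'LLLL', 'RRRR']
```

Because the offspring displaces a similar individual, dissimilar conformations elsewhere in the population survive.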
Experiments
 10^5 energy evaluations over 50 runs
H(x): The lowest energy over 50 runs
mean+σ: The lowest energy of a run averaged over 50 runs
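The two reported statistics can be computed from the per-run results as sketched below. The energies in `runs` are made-up illustration values, not results from the paper, and the use of the sample standard deviation for σ is an assumption, since the slides do not say which estimator was used.

```python
import statistics
from typing import List, Tuple


def summarize_runs(best_energies: List[int]) -> Tuple[int, float, float]:
    """Return (lowest energy over all runs, mean of per-run lowest, sigma)."""
    lowest = min(best_energies)                  # H(x)
    mean = statistics.mean(best_energies)        # mean of the per-run lowest energies
    sigma = statistics.stdev(best_energies)      # sample standard deviation (assumed)
    return lowest, mean, sigma


# Hypothetical lowest energies from five runs (the experiments use 50 runs).
runs = [-10, -9, -10, -8, -9]
lowest, mean, sigma = summarize_runs(runs)
print(f"H(x) = {lowest}, mean = {mean:.2f}, sigma = {sigma:.2f}")
```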
Experiments
 5×10^6 energy evaluations over 50 runs
H(x): The lowest energy over 50 runs
mean+σ: The lowest energy of a run averaged over 50 runs
Experiments
 The experimental results reported in the following papers are compared under the same termination condition:
 Santana, R.; Larranaga, P.; Lozano, J.A., "Protein Folding in Simplified Models With Estimation of Distribution Algorithms," Evolutionary Computation, IEEE Transactions on, vol.12, no.4, pp.418-438, Aug. 2008
 Cutello, V.; Nicosia, G.; Pavone, M.; Timmis, J., "An Immune Algorithm for Protein Structure Prediction on Lattice Models," Evolutionary Computation, IEEE Transactions on, vol.11, no.1, pp.101-117, Feb. 2007
Experiments
 10^5 energy evaluations over 50 runs
H(x): The lowest energy over 50 runs
mean+σ: The lowest energy of a run averaged over 50 runs
Experiments
 5×10^6 energy evaluations over 50 runs
H(x): The lowest energy over 50 runs
mean+σ: The lowest energy of a run averaged over 50 runs
Conclusion
 In this paper, we:
 Apply multimodal optimization techniques for PSP
 Propose a novel mutation method for PSP
 Some results comparable with the state-of-the-art algorithms have been obtained
 The source codes can be downloaded at:
http://pc89075.cse.cuhk.edu.hk:8080/myapp/GECCO2010-PSP-LatticeModels.zip
Q&A