Algorithms for Protein Folding and Structure Prediction
Martin Paluszewski
Ph.D. student
[email protected]
DIKU
12/12-2006

Topics
Introduction to proteins
Protein folding is an optimization problem
Not recent but popular algorithms
Recent algorithms
The Cell - Building block of life and protein factory

Protein Synthesis
Proteins act as structural building blocks, tools or machines in the cells.
1. Structural proteins: hair, skin, nails, etc.
2. Enzymatic proteins: catalysts in chemical reactions.
3. Functional proteins: transportation of oxygen in blood, antibody defense, etc.

Synthesis steps
1. DNA is transcribed into mRNA.
2. mRNA is transported to the ribosomes.
3. mRNA is translated into a chain of amino acids.
4. The chain folds into a protein.
[Figure: DNA (...CAGAGATCA...) is transcribed into mRNA (...CAGAGAUCA...), which the ribosomes translate into amino acids (Q, R, S)]
The Genetic Code
[Figure: the genetic code]

The 20 Amino Acids
[Figure: the 20 amino acids]

The Chain of Amino Acids
[Figure: amino acids H2N-CH(R)-CO2H with side chains R1, R2 and R3]

Peptide Bonds
[Figure: the amino acids joined by peptide bonds into the chain H2N-CH(R1)-CO-NH-CH(R2)-CO-NH-CH(R3)-CO2H]
The Rigid Planes
[Figure: the backbone as a chain of rigid peptide-bond planes, each containing C, O, N and H atoms, joined at the Cα atoms that carry the side chains R; the planes rotate about the angles φ and ψ]

Degree of Freedom
Atoms in the peptide bonds are fixed in planes.
The planes can rotate according to the dihedral angles φ and ψ.
The conformational space is the set of all non-clashing conformations.

Example of Conformational Space Size
Given a protein of length N, there are 2(N − 1) dihedral angles (φ and ψ).
Assume 4 discrete values for each φ and ψ. This gives $C(N) = 4^{2N-2}$ possible conformations.
$C(20) = 4^{38} = 75557863725914323419136$.
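To make the growth of $C(N)$ concrete, here is a small Python sketch (our own illustration, not part of the talk) that evaluates $4^{2N-2}$ exactly with integer arithmetic. The name `conformation_count` and its `values_per_angle` parameter are hypothetical. Like the slide, it ignores clashes, so it counts an upper bound on the non-clashing conformational space.

```python
def conformation_count(n_residues: int, values_per_angle: int = 4) -> int:
    """Number of backbone conformations when each of the 2(N - 1)
    dihedral angles (phi, psi) takes one of `values_per_angle` values.
    Clashes are not removed, so this is an upper bound."""
    if n_residues < 1:
        raise ValueError("a protein has at least one residue")
    n_angles = 2 * (n_residues - 1)
    return values_per_angle ** n_angles


if __name__ == "__main__":
    for n in (10, 20, 50):
        print(n, conformation_count(n))
```

For example, `conformation_count(20)` reproduces the value of C(20), and every additional residue multiplies the count by 16.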
Protein Folding
An amino acid chain folds into a unique 3D structure that is determined by its amino acid sequence (Anfinsen).
It should therefore be possible to compute the folded structure of a protein using only the amino acid sequence as input.
Hypothesis: the folded structure is the conformation with minimum free energy. Local or global minimum?

Energy of a Protein
Energy function (model):
U = Bond + Angle + Dihedral + Van der Waals + Electrostatic
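The energy model names five terms without giving their functional forms. The sketch below fills them in with generic textbook choices, which are our assumption rather than the model used in the talk: harmonic bonds and angles, a periodic cosine torsion, a 12-6 Lennard-Jones potential for Van der Waals and a Coulomb term for electrostatics. All function names, index layouts and parameter values are hypothetical placeholders.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]

def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(a: Vec) -> float:
    return math.sqrt(dot(a, a))

def bond_energy(p: Vec, q: Vec, kf: float, r0: float) -> float:
    """Harmonic bond stretching: kf (r - r0)^2."""
    return kf * (norm(sub(p, q)) - r0) ** 2

def angle_energy(p: Vec, q: Vec, r: Vec, kf: float, theta0: float) -> float:
    """Harmonic bending of the angle p-q-r (radians): kf (theta - theta0)^2."""
    u, v = sub(p, q), sub(r, q)
    cos_t = max(-1.0, min(1.0, dot(u, v) / (norm(u) * norm(v))))
    return kf * (math.acos(cos_t) - theta0) ** 2

def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Torsion angle (radians) about the p1-p2 bond; 0 = cis, pi = trans."""
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    return math.atan2(norm(b2) * dot(b1, n2), dot(n1, n2))

def dihedral_energy(phi: float, kf: float, n: int, delta: float) -> float:
    """Periodic torsion term: kf (1 + cos(n phi - delta))."""
    return kf * (1 + math.cos(n * phi - delta))

def lennard_jones(r: float, eps: float, sigma: float) -> float:
    """Van der Waals term: 4 eps [(sigma/r)^12 - (sigma/r)^6]."""
    s6 = (sigma / r) ** 6
    return 4 * eps * (s6 * s6 - s6)

def coulomb(r: float, qa: float, qb: float, ke: float = 332.06) -> float:
    """Electrostatic term ke qa qb / r (ke is roughly 332 kcal/mol * Angstrom / e^2)."""
    return ke * qa * qb / r

def total_energy(coords: Sequence[Vec], bonds, angles, torsions, pairs) -> float:
    """U = Bond + Angle + Dihedral + Van der Waals + Electrostatic.

    bonds:    (a, b, kf, r0)          angles: (a, b, c, kf, theta0)
    torsions: (a, b, c, d, kf, n, delta)
    pairs:    (a, b, eps, sigma, qa, qb) for non-bonded atom pairs
    """
    u = sum(bond_energy(coords[a], coords[b], kf, r0) for a, b, kf, r0 in bonds)
    u += sum(angle_energy(coords[a], coords[b], coords[c], kf, t0)
             for a, b, c, kf, t0 in angles)
    u += sum(dihedral_energy(dihedral(coords[a], coords[b], coords[c], coords[d]), kf, n, delta)
             for a, b, c, d, kf, n, delta in torsions)
    for a, b, eps, sigma, qa, qb in pairs:
        r = norm(sub(coords[a], coords[b]))
        u += lennard_jones(r, eps, sigma) + coulomb(r, qa, qb)
    return u
```

Real force fields differ in their exact forms and in which atom pairs count as non-bonded; this sketch only shows how the five contributions add up to U.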
Why should we fold proteins?
The 3D structure of a protein is determined experimentally using X-ray crystallography and NMR spectroscopy.
This is expensive, time-consuming and, for some proteins, impossible.
Can we construct an algorithm that determines the 3D structure of a protein given its amino acid sequence?

Problem variations
Protein Folding Problem: [Figure: the trajectory from the unfolded amino acid chain to the folded protein]
Protein Structure Prediction: [Figure: the amino acid sequence DTSGNALYQVGLAINDYKLA mapped directly to the folded protein]
Backbone of a protein
[Figure: the protein backbone, a chain of peptide-bond planes (C, O, N, H) linked at Cα atoms that carry the side chains R, with dihedral angles φ and ψ]

Structure Levels
Primary structure: the amino acid sequence (DTSGNALY ... DYKLA).
Secondary structure: local structures, such as helix, sheet, turn, coil, etc.
Tertiary structure: 3D coordinates of all atoms.
Popular Algorithms and Models
Exact algorithm in the HP model.
Side-chain positioning problem.
Protein threading.
Secondary structure prediction using neural networks.
Protein folding using distributed computing (Folding@Home).

The HP Model
Amino acids are either hydrophobic (H) or hydrophilic (P).
Observation: hydrophobic amino acids tend to be packed in the protein core.
Consider a binary chain of H's and P's.
Embed the chain as a chain of points (one per amino acid) in a lattice such that the number of neighbouring H's (H-H pairs that are adjacent in the lattice but not consecutive in the chain) is maximized.
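A minimal Python sketch of the HP objective on the 2D square lattice, assuming the usual definition of neighbouring H's as H-H pairs that are lattice neighbours but not consecutive in the chain. `hh_contacts` is a hypothetical helper; the HP energy is then commonly taken as minus this count.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]

def hh_contacts(sequence: str, fold: List[Point]) -> int:
    """Count H-H topological contacts of a fold on the 2D square lattice.
    Raises ValueError if the fold is not a self-avoiding lattice walk."""
    if len(sequence) != len(fold):
        raise ValueError("one lattice point per residue is required")
    if len(set(fold)) != len(fold):
        raise ValueError("fold is not self-avoiding")
    for (x1, y1), (x2, y2) in zip(fold, fold[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("consecutive residues must be lattice neighbours")
    position: Dict[Point, int] = {p: i for i, p in enumerate(fold)}
    contacts = 0
    for i, (x, y) in enumerate(fold):
        if sequence[i] != "H":
            continue
        # Look only right and up, so each unordered pair is counted once.
        for q in ((x + 1, y), (x, y + 1)):
            j = position.get(q)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return contacts
```

For instance, folding HPPH into a unit square brings the two H's together and gives one contact.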
The HP Model - Example
Amino acids: FRDLDRYYFHDINNFRHIEG
HP:          HPPHPPHHHPPHPPHPPHPP

Approximation
The HP problem is NP-hard (also in 2D).
Approximation algorithms exist (Hart and Istrail).
Approximation ratio: 1/4 (2D) and 3/8 (3D).
[Figure: 2D lattice and 3D lattice]
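Because the problem is NP-hard, exact answers by brute force are only feasible for very short chains. The sketch below is plain exhaustive search over self-avoiding walks on the 2D square lattice; it is neither Hart and Istrail's approximation algorithm nor the core-based exact method, and `best_fold` is a hypothetical name. It exists only to make the exponential cost concrete.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]
STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def best_fold(seq: str) -> Tuple[int, List[Point]]:
    """Exhaustive search over self-avoiding walks on the 2D square lattice.
    Returns (maximum number of H-H contacts, one fold achieving it)."""
    if not seq:
        return 0, []
    fold: List[Point] = [(0, 0)]
    occupied: Dict[Point, int] = {(0, 0): 0}
    best: Tuple[int, List[Point]] = (-1, [])

    def new_contacts(i: int, p: Point) -> int:
        # H-H contacts created by placing residue i at p
        # (only earlier, non-consecutive residues are counted).
        if seq[i] != "H":
            return 0
        count = 0
        for dx, dy in STEPS:
            j = occupied.get((p[0] + dx, p[1] + dy))
            if j is not None and j < i - 1 and seq[j] == "H":
                count += 1
        return count

    def extend(score: int) -> None:
        nonlocal best
        if len(fold) == len(seq):
            if score > best[0]:
                best = (score, list(fold))
            return
        i = len(fold)
        x, y = fold[-1]
        for dx, dy in STEPS:
            p = (x + dx, y + dy)
            if p in occupied:          # collision: the walk must be self-avoiding
                continue
            fold.append(p)
            occupied[p] = i
            extend(score + new_contacts(i, p))
            fold.pop()
            del occupied[p]

    extend(0)
    return best
```

The search visits up to about 4 · 3^(N−2) walks, so already the 20-residue example above is out of reach for this approach.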
Exact Method
Observation: optimal solutions usually have a core of optimally packed H's.
Algorithm
Input: a string $s \in \{H, P\}^*$.
Output: lattice positions of s.
1. Generate a set of compact cores of H's.
2. Thread the string onto the cores.
3. If this succeeds, we are done; otherwise generate sets of almost compact cores and try again.
Exact Method - Interactive demo

Side-chain Positioning Problem (SCP)
Fixed backbone.
For each amino acid i, there is a set of possible rotamers $\{r_i\}$.
A rotamer is a specific rotation of a side chain.
Problem: choose one rotamer for each amino acid that minimizes
$$E = E_0 + \sum_{i} E(i_r) + \sum_{i<j} E(i_r, j_s)$$
where $E(i_r)$ is the energy of rotamer r at amino acid i, $E(i_r, j_s)$ is the interaction energy between rotamer r at amino acid i and rotamer s at amino acid j, and $E_0$ does not depend on the choice of rotamers.
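The SCP objective can be written down directly. This is a brute-force Python sketch, assuming the energies are supplied as precomputed tables (`single` and `pair` are hypothetical names). It enumerates every rotamer combination, so it is exponential in the number of amino acids; practical solvers prune this space, for example with dead-end elimination, which is not shown here.

```python
from itertools import product
from typing import Dict, Sequence, Tuple

def solve_scp(e0: float,
              single: Sequence[Sequence[float]],
              pair: Dict[Tuple[int, int], Sequence[Sequence[float]]]
              ) -> Tuple[float, Tuple[int, ...]]:
    """Choose one rotamer per amino acid minimising
    E = E0 + sum_i E(i_r) + sum_{i<j} E(i_r, j_s).

    single[i][r]       : E(i_r), energy of rotamer r at amino acid i
    pair[(i, j)][r][s] : E(i_r, j_s) for i < j (missing pairs count as 0)
    """
    best_energy, best_choice = float("inf"), ()
    # Every combination: one rotamer index per amino acid.
    for choice in product(*(range(len(rots)) for rots in single)):
        energy = e0 + sum(single[i][r] for i, r in enumerate(choice))
        for (i, j), table in pair.items():
            energy += table[choice[i]][choice[j]]
        if energy < best_energy:
            best_energy, best_choice = energy, choice
    return best_energy, best_choice
```

With two amino acids of two rotamers each, the function checks all four assignments and returns the cheapest one together with its energy.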
Protein Threading
Naive idea
Similar amino acid sequences have similar structures.
Better idea
We want to predict the structure of sequence A.
Find a set B of homologous proteins with similar sequences and known structures.
Thread A onto each core structural model in B.
The structure of A corresponds to the threading with the lowest cost.
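A sketch of what threading means computationally, assuming a core structural model given as fixed core-segment lengths plus an allowed length range for each loop before, between and after the cores. The function names and the cost function are hypothetical; the talk does not specify its threading cost. Full enumeration is only practical for small models.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

def threadings(sequence: str,
               core_lengths: Sequence[int],
               loop_ranges: Sequence[Tuple[int, int]]) -> List[List[str]]:
    """All ways to thread `sequence` onto a core model laid out as
    loop_0, core_0, loop_1, ..., core_{k-1}, loop_k, where core_i has a
    fixed length and loop_i a length in [lo, hi]."""
    if len(loop_ranges) != len(core_lengths) + 1:
        raise ValueError("need exactly one more loop than cores")
    free = len(sequence) - sum(core_lengths)   # residues left for the loops
    result = []
    for loops in product(*(range(lo, hi + 1) for lo, hi in loop_ranges)):
        if sum(loops) != free:
            continue
        pieces, pos = [], 0
        for k, loop_len in enumerate(loops):
            pieces.append(sequence[pos:pos + loop_len])
            pos += loop_len
            if k < len(core_lengths):
                pieces.append(sequence[pos:pos + core_lengths[k]])
                pos += core_lengths[k]
        result.append(pieces)
    return result

def best_threading(sequence: str,
                   core_lengths: Sequence[int],
                   loop_ranges: Sequence[Tuple[int, int]],
                   cost: Callable[[List[str]], float]) -> List[str]:
    """Minimum-cost threading under a caller-supplied cost function."""
    return min(threadings(sequence, core_lengths, loop_ranges), key=cost)
```

For a 22-residue sequence with cores of lengths 5, 2 and 6 and loop ranges [1, 4], [3, 6], [1, 2] and [0, 4], this enumerates 23 threadings; larger models call for the branch-and-bound approach mentioned in the talk.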
Core Structural Model - Example
[Figure: a core structural model with three core segments of lengths 5, 2 and 6; the loops before, between and after the cores may have lengths in [1, 4], [3, 6], [1, 2] and [0, 4]]
One threading of AGABZILMKAPFAHETWTNDAB onto the model (loop | core | loop | core | loop | core | loop):
AGA | BZILM | KAP | FA | HE | TWTNDA | B

Threading
Find the threading with minimum cost.
There are exponentially many threadings.
The problem is NP-hard. However, large instances can be solved using branch and bound.

Prediction of Secondary Structure Using Neural Networks
Input:  AGABZILMKAPFAHETWTNDABGAPFAHET
Output: CCCCHHHHHHHHHCCCSSSSSSSCCCHHHH
Secondary Structure Categories
H: helix
S: sheet
C: coil
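A prediction like the Input/Output pair above is typically scored per residue. A common measure is the three-state accuracy Q3, the fraction of residues whose predicted label matches the observed one; the talk does not define its accuracy measure, so treating it as Q3 is our assumption, and `q3` is a hypothetical name.

```python
def q3(predicted: str, observed: str) -> float:
    """Three-state accuracy: fraction of residues whose predicted label
    (H, S or C) equals the observed label."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty label strings of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)
```

A perfect prediction scores 1.0; guessing the same class everywhere scores the frequency of that class.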
Prediction of Secondary Structure Using Neural Networks
Biology of the Brain
Dendrites receive signals.
Axons transmit signals.
Synapses are junctions between axons and dendrites.
[Figure: a neuron with dendrites, axon and synapses]

[Figure: a multilayer network for secondary structure prediction. The input layer reads a window of the sequence (... N V K E T ...) with 21 input units per amino acid; a hidden layer feeds an output layer with one unit per class, here H: 0.6, S: 0.3, C: 0.1]
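The network in the figure can be sketched as follows. Everything concrete here is an assumption: a 5-residue window (as drawn), 21 input units per position read as the 20 amino acids plus one unit for positions outside the chain (non-standard letters also use it), a sigmoid hidden layer of 10 units, a softmax output over H, S and C, and random untrained weights. The names `encode_window` and `WindowMLP` are hypothetical, so the output probabilities carry no meaning until the weights are trained, for example by backpropagation on proteins of known structure.

```python
import math
import random
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard one-letter codes
UNITS = 21                              # 20 amino acids + 1 "no residue" unit
CLASSES = "HSC"                         # helix, sheet, coil

def encode_window(seq: str, centre: int, half: int) -> List[float]:
    """Concatenate one 21-unit one-hot block per window position.
    Positions outside the chain (and unknown letters) switch on unit 20."""
    x: List[float] = []
    for pos in range(centre - half, centre + half + 1):
        block = [0.0] * UNITS
        if 0 <= pos < len(seq) and seq[pos] in AMINO_ACIDS:
            block[AMINO_ACIDS.index(seq[pos])] = 1.0
        else:
            block[UNITS - 1] = 1.0
        x.extend(block)
    return x

class WindowMLP:
    """Untrained perceptron: window -> sigmoid hidden layer -> softmax over H, S, C."""

    def __init__(self, half: int = 2, hidden: int = 10, seed: int = 0):
        rng = random.Random(seed)
        n_in = (2 * half + 1) * UNITS
        self.half = half
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [[rng.gauss(0.0, 0.1) for _ in range(hidden)] for _ in CLASSES]
        self.b2 = [0.0] * len(CLASSES)

    def predict(self, seq: str, centre: int) -> Dict[str, float]:
        """Class probabilities for the residue at position `centre`."""
        x = encode_window(seq, centre, self.half)
        h = [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
             for row, b in zip(self.w1, self.b1)]
        z = [sum(w * hi for w, hi in zip(row, h)) + b for row, b in zip(self.w2, self.b2)]
        top = max(z)
        e = [math.exp(v - top) for v in z]        # numerically stable softmax
        total = sum(e)
        return {c: v / total for c, v in zip(CLASSES, e)}
```

Sliding the window along the chain and taking the most probable class at each position turns these per-residue outputs into an H/S/C string like the one in the example above.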
Results
Best non-neural-network approach: 55 % accuracy.
Multilayer perceptron: 60 % accuracy.
Consensus using different methods: > 80 % accuracy.

Protein Folding Using Distributed Computing
[Figure: the folding trajectory from the unfolded amino acid chain to the folded protein]
Simulation
Start with an unfolded amino acid chain.
Iteratively apply Newton's laws of motion.
A home PC can simulate ≈ 10^-10 seconds of motion in an hour (small proteins).
Simulating 1 second of protein motion would take ≈ 1,000,000 years.
Supercomputers: > 1,000 years.
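A quick check of the simulation arithmetic, taking the figure of about 10^-10 simulated seconds per hour at face value. `wall_clock_years` and its `speedup` parameter are hypothetical names, and the speed-up of a supercomputer over a home PC is left as an input rather than asserted.

```python
SIMULATED_SECONDS_PER_HOUR = 1e-10   # home PC, small protein (figure from the talk)
HOURS_PER_YEAR = 24 * 365.25

def wall_clock_years(simulated_seconds: float, speedup: float = 1.0) -> float:
    """Years of computing needed to simulate `simulated_seconds` of motion
    on a machine `speedup` times faster than the home PC."""
    hours = simulated_seconds / SIMULATED_SECONDS_PER_HOUR / speedup
    return hours / HOURS_PER_YEAR
```

`wall_clock_years(1.0)` gives about 1.14 × 10^6 years, in line with the estimate of roughly a million years; even a machine 1,000 times faster would still need over 1,000 years.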
Simulation
The new paradigm of worldwide distributed computing might break the CPU time barrier.
SETI@HOME: analyzes radio telescope data. CPU time > 2M years.
Folding@HOME: how can a protein folding simulation be massively distributed?

Ensemble Dynamics
[Figure: energy plotted over the conformational space]
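The slides leave the question open. One well-known answer behind Folding@Home's ensemble dynamics, given here as background rather than taken from the talk, rests on a simple statistical fact: if each simulation crosses the folding barrier after an exponentially distributed time with rate k, then the first crossing among M independent simulations is exponential with rate Mk, so M machines see the first event about M times sooner. The Monte Carlo sketch below checks this; the function names and parameters are hypothetical.

```python
import random
import statistics

def first_event_time(rate: float, n_sims: int, rng: random.Random) -> float:
    """Earliest folding time among n_sims independent simulations, each
    folding after an exponentially distributed waiting time with `rate`."""
    return min(rng.expovariate(rate) for _ in range(n_sims))

def mean_first_event(rate: float, n_sims: int, trials: int = 5000, seed: int = 1) -> float:
    """Monte Carlo estimate of the expected time until the first of n_sims
    simulations folds; theory predicts 1 / (n_sims * rate)."""
    rng = random.Random(seed)
    return statistics.mean(first_event_time(rate, n_sims, rng) for _ in range(trials))
```

With 100 simulations the estimated waiting time drops by roughly a factor of 100. The argument depends on the single-exponential assumption and does not carry over unchanged to kinetics that are not exponential.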
Reconstructing Protein Structure From Solvent Exposure Using Tabu Search

Algorithms for Molecular Biology (BioMed Central), Open Access, Research
Reconstructing protein structure from solvent exposure using tabu search
Martin Paluszewski*1, Thomas Hamelryck2 and Pawel Winter1
Address: 1Department of Computer Science, University of Copenhagen, Universitetsparken 1, 2100 Copenhagen, Denmark and 2Bioinformatics Center, Institute of Molecular Biology, University of Copenhagen, Universitetsparken 15 building 10, 2100 Copenhagen, Denmark
Email: Martin Paluszewski* - [email protected]; Thomas Hamelryck - [email protected]; Pawel Winter - [email protected]
* Corresponding author
Received: 30 March 2006. Accepted: 27 October 2006. Published: 27 October 2006.
Algorithms for Molecular Biology 2006, 1:20, doi:10.1186/1748-7188-1-20
This article is available from: http://www.almob.org/content/1/1/20
© 2006 Paluszewski et al; licensee BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
Background: A new, promising solvent exposure measure, called half-sphere-exposure (HSE), has
recently been proposed. Here, we study the reconstruction of a protein's Cα trace solely from
structure-derived HSE information. This problem is of relevance for de novo structure prediction
using predicted HSE measure. For comparison, we also consider the well-established contact
number (CN) measure. We define energy functions based on the HSE- or CN-vectors and
minimize them using two conformational search heuristics: Monte Carlo simulation (MCS) and tabu
search (TS). While MCS has been the dominant conformational search heuristic in literature, TS has
been applied only a few times. To discretize the conformational space, we use lattice models with
various complexity.
Results: The proposed TS heuristic with a novel tabu definition generally performs better than
MCS for this problem. Our experiments show that, at least for small proteins (up to 35 amino
acids), it is possible to reconstruct the protein backbone solely from the HSE or CN information.
In general, the HSE measure leads to better models than the CN measure, as judged by the RMSD
and the angle correlation with the native structure. The angle correlation, a measure of structural
similarity, evaluates whether equivalent residues in two structures have the same general
Contact Vector
The contact number (CN) of a Cα atom is the number of other Cα atoms within a sphere centered at that Cα atom.
The CN-vector consists of the contact numbers of all Cα atoms.
The CN-vector can be predicted from the amino acid sequence using neural networks.
Is it possible to reconstruct the protein backbone from the CN-vector?
[Figure: the amino acid sequence MLSDEDFKAVFGMTR is fed to a neural network (NN), which outputs the predicted contact numbers (the CN-vector); the structure behind it is the unknown (?)]
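A direct Python sketch of the CN-vector definition. The sphere radius is not given in the talk; the 13 Å default is an assumption, and `cn_vector` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]

def cn_vector(ca: Sequence[Vec], radius: float = 13.0) -> List[int]:
    """Contact number of every C-alpha atom: the number of *other* C-alpha
    atoms strictly inside a sphere of the given radius centred on it."""
    counts = []
    for i, a in enumerate(ca):
        counts.append(sum(1 for j, b in enumerate(ca)
                          if j != i and math.dist(a, b) < radius))
    return counts
```

The quadratic double loop is fine for small proteins; for large ones a spatial grid would avoid comparing every pair.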
Half Sphere Exposure Vector
The sphere is split into an upper and a lower hemisphere.
HSE is therefore a pair of numbers (up CN, down CN).
Recent results show that HSE-vectors can be predicted.
Is it possible to reconstruct the protein backbone from the HSE-vector?
[Figure: the sphere around a Cα atom split into an up half (on the Cβ side) and a down half, giving the HSE-vector-up and the HSE-vector-down]

Energy
Given a structure S with N amino acids, let $HSE^s$ be the HSE-vector of structure S and let $HSE^p$ be the predicted HSE-vector. The energy of S is:
$$E = \sqrt{\frac{\sum_{i=1}^{N} (HSE^p_i - HSE^s_i)^2}{N}}$$
We want to find a structure S with zero (or near-zero) energy.
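A sketch of the HSE-vector and of the energy E. The talk does not say how the sphere is split; here we assume the plane through $C\alpha_i$ perpendicular to a pseudo side-chain direction built from the neighbouring Cα atoms, $v = (C\alpha_i - C\alpha_{i-1}) + (C\alpha_i - C\alpha_{i+1})$, with "up" on the side of v. We also read $(HSE^p_i - HSE^s_i)^2$ as the sum of the squared up and down differences. The function names and the 13 Å radius are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]

def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def hse_vector(ca: Sequence[Vec], radius: float = 13.0) -> List[Tuple[int, int]]:
    """(up, down) counts for every C-alpha except the two chain ends."""
    result = []
    for i in range(1, len(ca) - 1):
        # Assumed pseudo side-chain direction from the two neighbouring C-alphas.
        a, b = _sub(ca[i], ca[i - 1]), _sub(ca[i], ca[i + 1])
        v = (a[0] + b[0], a[1] + b[1], a[2] + b[2])
        up = down = 0
        for j, other in enumerate(ca):
            d = _sub(other, ca[i])
            if j == i or _dot(d, d) >= radius ** 2:
                continue
            if _dot(d, v) > 0:      # on the side-chain side of the dividing plane
                up += 1
            else:                   # opposite side (points on the plane count as down)
                down += 1
        result.append((up, down))
    return result

def hse_energy(predicted: Sequence[Tuple[int, int]],
               structure: Sequence[Tuple[int, int]]) -> float:
    """E = sqrt(sum_i (HSE^p_i - HSE^s_i)^2 / N), squaring both components."""
    if len(predicted) != len(structure) or not predicted:
        raise ValueError("HSE-vectors must be non-empty and of equal length")
    total = sum((pu - su) ** 2 + (pd - sd) ** 2
                for (pu, pd), (su, sd) in zip(predicted, structure))
    return math.sqrt(total / len(predicted))
```

A search procedure would call `hse_energy(predicted, hse_vector(candidate))` on each candidate structure and try to drive the result towards zero.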
Lattice Model
Chain of Cα atoms.
The Cα atoms are positioned at lattice points.
Succeeding Cα atoms are positioned at connected lattice points.

Advantages of using lattices
Discrete state space → well-defined combinatorial optimization problem.
Exact computations: no rounding errors.
Many algorithmic problems can be solved efficiently:
Collision detection
Finding neighbours
Local moves
Simple backbone representation

Disadvantage
Nature does not represent proteins in lattices.
[Figure: the Cubic, Face Centered Cubic (FCC) and High Coordination (HC) lattices]
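Exact integer coordinates make the lattice advantages easy to see in code. This sketch assumes the FCC lattice with its 12 basis vectors; it checks that succeeding Cα atoms occupy connected lattice points, detects collisions with a hash set, and lists free neighbouring points as candidates for local moves. `is_valid_fcc_chain` and `free_neighbours` are hypothetical helpers, not the authors' implementation.

```python
from itertools import product
from typing import List, Sequence, Tuple

Point = Tuple[int, int, int]

# The 12 FCC basis vectors: integer vectors with two entries of +-1 and one 0.
FCC_VECTORS = [v for v in product((-1, 0, 1), repeat=3)
               if sorted(map(abs, v)) == [0, 1, 1]]
FCC_SET = set(FCC_VECTORS)

def is_valid_fcc_chain(chain: Sequence[Point]) -> bool:
    """Exact check: consecutive atoms differ by an FCC basis vector and
    no two atoms share a lattice point."""
    for a, b in zip(chain, chain[1:]):
        if (b[0] - a[0], b[1] - a[1], b[2] - a[2]) not in FCC_SET:
            return False                      # not connected lattice points
    return len(set(chain)) == len(chain)      # collision detection via hashing

def free_neighbours(chain: Sequence[Point], i: int) -> List[Point]:
    """Unoccupied lattice points adjacent to atom i (candidates for local moves)."""
    occupied = set(chain)
    x, y, z = chain[i]
    return [(x + dx, y + dy, z + dz) for dx, dy, dz in FCC_VECTORS
            if (x + dx, y + dy, z + dz) not in occupied]
```

Because all coordinates are integers, equality tests are exact and no tolerance is needed.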
Lattices - A Comparison
Cubic: 6 basis vectors, RMSD: 2.69
Face Centered Cubic: 12 basis vectors, RMSD: 1.84
High Coordination: 678 basis vectors, RMSD: 0.34

cRMSD
Let $a_1, \dots, a_N$ be the coordinates of structure A, and let $b_1, \dots, b_N$ be the coordinates of structure B. Then the similarity between A and B is:
$$\mathrm{cRMSD}(A,B) = \sqrt{\frac{\sum_{i=1}^{N} |a_i - b_i|^2}{N}}$$
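A literal Python transcription of the cRMSD formula. It assumes both structures are already superimposed: the formula as written does not include the optimal rotation and translation (for example via the Kabsch algorithm) that is usually applied first. `crmsd` is a hypothetical name.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]

def crmsd(a: Sequence[Vec], b: Sequence[Vec]) -> float:
    """cRMSD(A, B) = sqrt(sum_i |a_i - b_i|^2 / N) on the coordinates as given
    (no superposition is performed here)."""
    if len(a) != len(b) or not a:
        raise ValueError("structures must have the same, non-zero length")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(total / len(a))
```

Identical coordinate lists give 0; moving one of two atoms by 5 units gives sqrt(25/2) ≈ 3.54.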
Strategy
Given an amino acid sequence A, predict the HSE-vector using neural networks.
Find a structure S in a high coordination lattice such that the energy is minimized. This structure will have an HSE-vector similar to the predicted HSE-vector.
Hypothesis: S will be similar to the native structure of the amino acid sequence A.

Tabu Search
bestStructure, s ← random_conformation()
bestCost ← cost(bestStructure)
while not stop() do
    N ← compute_neighbours(s)
    sort N with respect to cost
    for all i ∈ N do
        if cost(N_i) < bestCost then
            bestCost ← cost(N_i)
            s, bestStructure ← N_i
            break loop
        end if
        if not Tabu(N_i, Q) then
            s ← N_i
            break loop
        end if
    end for
    pushback(Q, s)
end while
return bestStructure
Tabu
Tabu(s, Q):
    for all i ∈ Q do
        if cost(s) > cost(Q_i) AND RMSD(s, Q_i) ≤ ε then
            return true
        end if
    end for
    return false
[Figure: energy profile with structures labelled a, b, c and 1 to 4, illustrating the RMSD threshold ε]

Protein: 1EDN, size: 21 amino acids.
Protein: 1SRK, size: 35 amino acids.

Conclusion
Tabu search and other metaheuristics often have difficulty reaching all parts of the conformational space, especially for large proteins.
HSE-based energy functions generate a reasonable energy landscape for small proteins.
However, more information is needed to fold large proteins.
Work in progress: Branch and bound algorithm.
Use branch and bound on the whole conformational space to guarantee implicit evaluation of all possible conformations.
Add more predictable information to the model:
Secondary structure.
Radius of gyration.
Radius of gyration
Elongated chains have a large Rg.
Compact chains have a small Rg.
Definition: $R_g = \sqrt{\frac{\sum_i (v_i - v_{cm})^2}{N}}$, where $v_i$ is the position of the $i$-th of the $N$ Cα atoms and $v_{cm}$ is their center of mass.
The Rg of globular proteins can be predicted: $R_g = 2.2 N^{0.38}$.
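Both formulas can be evaluated with a minimal Python sketch. It assumes unweighted Cα coordinates (every atom counted with equal mass) and that N is the number of residues. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

def radius_of_gyration(coords: Sequence[Tuple[float, float, float]]) -> float:
    """Rg = sqrt(sum_i |v_i - v_cm|^2 / N), with an unweighted center of mass."""
    n = len(coords)
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    sq = sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in coords)
    return math.sqrt(sq / n)

def predicted_rg(n_residues: int) -> float:
    """Predicted Rg of a globular protein with N residues: 2.2 * N**0.38."""
    return 2.2 * n_residues ** 0.38
```

Comparing `radius_of_gyration` of a candidate structure with `predicted_rg(N)` is one way to penalise conformations that are too elongated or too compact.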
Segments
Segments are constructed from the secondary structure assignment of the amino acid sequence.
Amino acid sequence
MLSDEDFKAVFGMTRSAFANLPLWKQQNLKKEKGLF
Secondary structure assignment
CCCHHHHHHHCCCCHHHHHCCCHHHHHHHHCCCCCC
[Figure: the resulting segments]

Discrete directions
The direction of a segment must be one of the 12 FCC (face-centered cubic) lattice directions:
(1,1,0), (1,0,1), (1,-1,0), (1,0,-1), (-1,1,0), (-1,0,1), (-1,-1,0), (-1,0,-1), (0,1,1), (0,1,-1), (0,-1,1), (0,-1,-1)
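The segment construction and the direction set can be sketched in Python. The slides leave two points unstated, so both are assumptions here. First, a segment is taken to be a maximal run of one secondary structure symbol. Second, the FCC directions are generated as the distinct permutations of (±1, ±1, 0), which reproduces the 12 vectors listed. `segments` and `fcc_directions` are hypothetical names.

```python
from itertools import groupby, permutations
from typing import List, Tuple

def segments(ss: str) -> List[Tuple[str, int, int]]:
    """Split a secondary structure string into maximal runs of one symbol.

    Returns (symbol, start, end) with 0-based, end-exclusive residue indices.
    Assumption: one segment per maximal run of identical symbols.
    """
    result, pos = [], 0
    for symbol, run in groupby(ss):
        length = sum(1 for _ in run)
        result.append((symbol, pos, pos + length))
        pos += length
    return result

def fcc_directions() -> List[Tuple[int, int, int]]:
    """All 12 FCC lattice directions: the distinct permutations of (+-1, +-1, 0)."""
    dirs = set()
    for a in (1, -1):
        for b in (1, -1):
            dirs.update(permutations((a, b, 0)))
    return sorted(dirs, reverse=True)
```

On the example assignment, this yields 7 segments. The last one, residues 31 to 36, is KEKGLF, which is the coil sequence used on the segment structures slide.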
Super structure
Super structure definition
A super structure is a list of segments and their directions. If S is the number of segments, then there exist $N = 4 \times 11^{S-2}$ different super structures.
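Evaluating the count directly shows how quickly the search space grows. This is a minimal sketch that takes the formula as stated on the slide. It is only meaningful for S ≥ 2, and the function name is hypothetical.

```python
def num_super_structures(num_segments: int) -> int:
    """N = 4 * 11**(S - 2), the super structure count stated on the slide (S >= 2)."""
    if num_segments < 2:
        raise ValueError("the formula is stated for S >= 2 segments")
    return 4 * 11 ** (num_segments - 2)
```

For S = 7, the formula gives 644,204 super structures. That is the segment count of the 36-residue example if each run of one secondary structure symbol forms a segment.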
Segment structures
Given a segment, there are many configurations of its internal Cα atoms. We call these segment structures.
For helices and sheets, we generate U such segment structures by rotating one structure uniformly around the axis defined by the segment.
For coils, we find the U/2 best-matching structures in a fragment library and use 2 rotations for each match.
[Figure: the coil sequence K E K G L F matched against a fragment library]
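The uniform rotation used for helices and sheets can be sketched with Rodrigues' rotation formula. Here is the main assumption: the slides only say "the axis defined by the segment", and this sketch takes that axis to be the line from the first to the last Cα atom. A real helix axis may be defined differently. `rotate_about_axis` and `segment_structures` are hypothetical names.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float, float]

def rotate_about_axis(p: Vec, origin: Vec, axis: Vec, theta: float) -> Vec:
    """Rotate p by theta around the line through origin with direction axis (Rodrigues)."""
    norm = math.sqrt(sum(a * a for a in axis))
    kx, ky, kz = (a / norm for a in axis)               # unit axis k
    vx, vy, vz = (p[i] - origin[i] for i in range(3))   # p relative to the axis
    c, s = math.cos(theta), math.sin(theta)
    dot = kx * vx + ky * vy + kz * vz                   # k . v
    cx, cy, cz = ky * vz - kz * vy, kz * vx - kx * vz, kx * vy - ky * vx  # k x v
    # v_rot = v cos(theta) + (k x v) sin(theta) + k (k . v)(1 - cos(theta))
    return (origin[0] + vx * c + cx * s + kx * dot * (1 - c),
            origin[1] + vy * c + cy * s + ky * dot * (1 - c),
            origin[2] + vz * c + cz * s + kz * dot * (1 - c))

def segment_structures(coords: List[Vec], u: int) -> List[List[Vec]]:
    """U copies of one segment structure, rotated by 2*pi*k/U for k = 0..U-1.

    Assumption: the segment axis runs from the first to the last Cα atom.
    """
    start, end = coords[0], coords[-1]
    axis = (end[0] - start[0], end[1] - start[1], end[2] - start[2])
    return [[rotate_about_axis(p, start, axis, 2 * math.pi * k / u) for p in coords]
            for k in range(u)]
```

Atoms on the axis stay fixed, so every rotated copy keeps the segment's endpoints, and therefore its direction, unchanged.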
Complete structure
Complete structure definition
Given a super structure, a corresponding complete structure is the selection of exactly one segment structure for each segment. There exist $N = 4 \times 11^{S-2} \times U^S$ different complete structures.
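For one fixed super structure, the complete structures are the Cartesian product of the per-segment options, which is where the factor U^S comes from. This is a minimal sketch; the function names and the placeholder option labels are hypothetical.

```python
from itertools import product
from typing import Iterator, Sequence, Tuple, TypeVar

T = TypeVar("T")

def complete_structures(segment_options: Sequence[Sequence[T]]) -> Iterator[Tuple[T, ...]]:
    """For one fixed super structure: every way to pick one segment structure per segment."""
    return product(*segment_options)

def num_complete_structures(num_segments: int, u: int) -> int:
    """N = 4 * 11**(S - 2) * U**S over all super structures (formula from the slide)."""
    return 4 * 11 ** (num_segments - 2) * u ** num_segments
```

With three segments and U = 2 options each, one super structure has 2^3 = 8 complete structures. Across all 44 super structures, there are 352.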
Branching
A node is branched by either fixing a segment direction or fixing a segment structure.
Since each of the S segments needs both a direction and a segment structure, nodes at level 2 × S contain complete structures.
The performance of the algorithm depends strongly on the order in which segment directions and segment structures are fixed.
The best performance is observed when segment directions are fixed as early as possible.

Branching example
Branch tree for three segments: helix, coil, helix.
[Figure: branch tree]
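A generic depth-first branch-and-bound loop, modelled on the branching slide, can be sketched in Python. It is a sketch, not the author's implementation. It assumes that `domains[k]` holds the options at tree level k: candidate directions for the first S levels, and candidate segment structures for the last S. It also assumes that `lower_bound` never exceeds the cost of any completion of a partial assignment. All names are hypothetical.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

def branching_order(num_segments: int) -> List[Tuple[str, int]]:
    """Fix all S segment directions first, then the S segment structures."""
    directions = [("direction", i) for i in range(num_segments)]
    structures = [("structure", i) for i in range(num_segments)]
    return directions + structures  # 2*S levels; leaves hold complete structures

def branch_and_bound(domains: Sequence[Sequence],
                     lower_bound: Callable[[list], float],
                     cost: Callable[[list], float]) -> Tuple[float, Optional[list]]:
    """Depth-first branch and bound; domains[k] lists the options at tree level k."""
    best_cost: float = math.inf
    best_choice: Optional[list] = None

    def dfs(partial: list) -> None:
        nonlocal best_cost, best_choice
        if lower_bound(partial) >= best_cost:
            return  # prune: no completion of this node can beat the incumbent
        if len(partial) == len(domains):
            c = cost(partial)  # leaf: a complete assignment
            if c < best_cost:
                best_cost, best_choice = c, list(partial)
            return
        for option in domains[len(partial)]:
            partial.append(option)
            dfs(partial)
            partial.pop()

    dfs([])
    return best_cost, best_choice
```

Consider a toy problem with `doms = [[3, 1, 2], [5, 4]]`, cost `sum`, and bound `sum(p) + sum(min(d) for d in doms[len(p):])`. The call returns `(5, [1, 4])`. The subtree starting with 2 is pruned because its bound (6) is not below the incumbent (5).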
Energy Example
Amino acid sequence: MLSDEDFKAVFGMTR

Residue              M  L  S  D  E  D  F  K  A  V  F  G  M  T  R
Desired CN           5  8 10 11 10 11 13 12 13 10  9 10  8  7  7
Structure CN         7  9 12 10 12 10 13 12 10 12  9 10  7  8  9
Difference           2  1  2  1  2  1  0  0  3  2  0  0  1  1  2
Difference squared   4  1  4  1  4  1  0  0  9  4  0  0  1  1  4

4 + 1 + 4 + 1 + 4 + 1 + 0 + 0 + 9 + 4 + 0 + 0 + 1 + 1 + 4 = 34
Per segment: 15 + 13 + 6 = 34

Definition
$\mathrm{CN}_{\mathrm{diff}}(i) = \sum_{j=0}^{s_i} (\mathrm{CNd}_{i,j} - \mathrm{CN}_{i,j})^2$
$E(Z, Q) = \sum_{i=0}^{N} \mathrm{CN}_{\mathrm{diff}}(i)$
where $\mathrm{CNd}_{i,j}$ and $\mathrm{CN}_{i,j}$ are the desired and structure contact numbers of residue $j$ in segment $i$.
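The energy of the example can be recomputed with a short Python sketch. The contact numbers are taken from the slide. The split into segments (residues 1 to 6, 7 to 10 and 11 to 15) is an assumption, inferred from the per-segment totals 15, 13 and 6. `cn_diff` and `energy` are hypothetical names.

```python
from typing import List, Sequence, Tuple

def cn_diff(desired: Sequence[int], actual: Sequence[int]) -> int:
    """CNdiff of one segment: sum over its residues of (desired CN - structure CN)^2."""
    if len(desired) != len(actual):
        raise ValueError("contact number lists must have equal length")
    return sum((d - a) ** 2 for d, a in zip(desired, actual))

def energy(desired: Sequence[int], actual: Sequence[int],
           bounds: List[Tuple[int, int]]) -> int:
    """E = sum of CNdiff over all segments; bounds are 0-based, end-exclusive slices."""
    return sum(cn_diff(desired[s:e], actual[s:e]) for s, e in bounds)

DESIRED = [5, 8, 10, 11, 10, 11, 13, 12, 13, 10, 9, 10, 8, 7, 7]
STRUCTURE = [7, 9, 12, 10, 12, 10, 13, 12, 10, 12, 9, 10, 7, 8, 9]
SEGMENTS = [(0, 6), (6, 10), (10, 15)]  # assumed split, inferred from the totals 15, 13, 6
```

Because E is a plain sum over segments, a partial assignment can be bounded segment by segment. The lower bound on the next slide exploits this.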
Lower bound
$\mathrm{CNL}_{\mathrm{diff}}(i) = \min_{Q \in Q_Z} \mathrm{CN}_{\mathrm{diff}}(i) \le \min_{Q \in Q_Z^{\mathrm{OPT}}} \mathrm{CN}_{\mathrm{diff}}(i)$

Problem 1
Given a segment i with a fixed segment structure q_a ∈ Q_i, choose exactly one of the allowed segment structures q_b ∈ Q_j for each segment 1 ≤ j ≤ N, j ≠ i, such that the CNdiff of segment i having structure q_a is minimized. Call this value CNLdiff(a, i). Let CNLdiff(i) be the smallest of these values over all q_a ∈ Q_i.

Bound example
Segment  Black  Red
1        (1,0)  (2,1)
2        (1,1)  −
3        (0,1)  (0,2)
R = (2,3)
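The bound example can be checked by brute force. This relies on an interpretation the slide does not spell out: Black and Red are read as alternative candidate vectors for a segment, segment 2 has only the black option, and the goal is the choice whose vector sum is closest to R. `ROWS`, `R` and `best_choice` are hypothetical names.

```python
import math
from itertools import product

# Candidate vectors per segment, read from the table (segment 2 has no red option).
ROWS = [
    [(1, 0), (2, 1)],  # segment 1: black, red
    [(1, 1)],          # segment 2: black only
    [(0, 1), (0, 2)],  # segment 3: black, red
]
R = (2, 3)

def best_choice(rows, target):
    """Exhaustively pick one vector per row so that the sum is closest (Euclidean) to target."""
    best = (math.inf, None)
    for combo in product(*rows):
        total = tuple(sum(c) for c in zip(*combo))  # component-wise sum of the chosen vectors
        dist = math.dist(total, target)
        if dist < best[0]:
            best = (dist, combo)
    return best
```

Here the black vectors of segments 1 and 2 and the red vector of segment 3 sum exactly to R = (2,3), so the distance is 0. Exhaustive search is exponential in the number of segments, which motivates the reformulation that follows.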
Another formulation
Problem 2
Given an X × Y matrix M of d-dimensional vectors and a d-dimensional result vector R, choose exactly one vector from each row of M. Let S be the sum of the chosen vectors, and let D be the Euclidean distance between S and R. Choose the vectors of M such that D is minimized.
Input: X × Y matrix M of 2-dimensional vectors, where M_{i,j} is the j-th vector of row i, and result vector R
Output: Minimal D as defined in Problem 2
S1 ← ∅
for j ← 1 to X do
    S1.insert(M_{1,j})
end for
for i ← 2 to Y do
    S2 ← ∅
    for j ← 1 to X do
        for all m ∈ S1 do
            S2.insert(m + M_{i,j})
        end for
    end for
    S1 ← S2
end for
return the minimal distance between a vector in S1 and R
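The algorithm can be sketched in Python under one assumption: S1 and S2 are sets, so identical partial sums are stored only once. With small integer vectors, that deduplication is what keeps the work below the X^Y of exhaustive search. The test matrix is the 3 × 3 example from the Example slide. `per_dimension_bound` is an added illustration: it solves each coordinate separately. The sum of the per-coordinate squared minima can never exceed the squared joint minimum, so it is a lower bound. All names are hypothetical.

```python
import math
from typing import Sequence, Set, Tuple

Vector = Tuple[int, ...]

def min_distance(m: Sequence[Sequence[Vector]], r: Vector) -> float:
    """Keep every distinct partial sum, one row at a time, then pick the sum closest to r."""
    sums: Set[Vector] = {tuple(v) for v in m[0]}  # S1: vectors of the first row
    for row in m[1:]:
        # S2 is rebuilt from scratch for each row and then replaces S1
        sums = {tuple(a + b for a, b in zip(s, v)) for s in sums for v in row}
    return min(math.dist(s, r) for s in sums)

def per_dimension_bound(m: Sequence[Sequence[Vector]], r: Vector) -> int:
    """Sum over coordinates of the best squared 1-D distance; a lower bound on min_distance**2."""
    bound = 0
    for k in range(len(r)):
        sums = {v[k] for v in m[0]}
        for row in m[1:]:
            sums = {s + v[k] for s in sums for v in row}
        bound += min((s - r[k]) ** 2 for s in sums)
    return bound

M = [[(2, 4), (3, 2), (9, 1)],
     [(7, 8), (0, 1), (3, 4)],
     [(5, 5), (1, 2), (3, 3)]]
R = (23, 12)
```

For this matrix, the best joint choice is (9,1) + (7,8) + (5,5) = (21,14), which gives D² = 8. Solving the coordinates separately gives 2² + 0 = 4.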
Example
Matrix M (3 × 3, 2-dimensional vectors):
(2,4) (3,2) (9,1)
(7,8) (0,1) (3,4)
(5,5) (1,2) (3,3)
R = (23,12), S = (21,14), D² = 2² + 2²

Each coordinate on its own:
First coordinate     Second coordinate
2 3 9                4 2 1
7 0 3                8 1 4
5 1 3                5 2 3
R = 23, S = 21, D² = 2²     R = 12, S = 12, D² = 0

Results. Work in progress.
Protein 2GB1, 56 amino acids.