Proteins
Proteins control the biological functions of cellular organisms
e.g. metabolism, blood clotting, immune system
Building blocks – amino acids
amino group (NH2), carboxyl group (COOH), side chain R
The Protein Data Bank
[Figure: yearly and total number of structures in the Protein Data Bank, 1972 to 2009; vertical axis from 0 to 60000]
Protein sequence and structure
Protein alphabet consists of 20 amino acids
Sequence view
ADKELKFLVVDDFSTMRRIV.....
Structure view
Protein structure and function
Function is determined by 3D shape/structure
Thrombin
Facilitates blood clotting
Hirudin
Anticoagulant
(blocks active site)
Protein structure and function
Structure preserves evolutionary information better than sequence
Myoglobin family
1MBC: VLSEGEWQLVLHVWAKVE.....
2FAL: XSLSAAEADLAGKSWAPV.....
Structural Bioinformatics
Pairwise alignment algorithms
DALI (Holm and Sander, Journal of Molecular Biology, 1993)
LOCK (Singh and Brutlag, ISMB, 1997)
CE (Shindyalov and Bourne, Protein Engineering, 1998)
SSM (Krissinel and Henrick, Acta Cryst., 2004)
Ye et al. JBCB, 2004
Multiple alignment algorithms
Gerstein and Levitt, ISMB, 1996: Iterative dynamic programming
SSAP (Orengo and Taylor, Methods Enzymol., 1996): Two-level DP
Leibowitz et al., ISMB, 1999: Geometric hashing
CE-MC (Guda et al., PSB, 2001)
MAMMOTH (Lupyan et al., Bioinformatics, 2005)
MAPSCI (Ye et al., WABI, 2006)
Structural Bioinformatics
Homology detection
Hidden Markov models (Jaakkola et al., JCB, 2000)
Spectrum, Mismatch kernel (Leslie et al., Bioinformatics, 2002)
Structure kernel (Qiu et al., Bioinformatics, 2007)
Protein structure prediction
Jones and Hadley, Bioinformatics: Sequence, structure and databanks. 2000.
FUGUE (Shi et al., J. Mol. Biol., 2001)
SCOP (Andreeva et al., Nucleic Acids Res., 2004)
Protein docking
Shoichet et al., J. Comput. Chem., 1992.
Choi et al., WABI, 2004.
Wang et al., PSB, 2005.
Sousa et al., Proteins, 2006.
Pairwise Structure Alignment
Given two proteins represented by the Cα atoms (backbone)
find 3D transformation that superimposes a large number of the Cα atoms
ensure that overall distance between matched pairs is as small as possible
Trade-off between the number of matches and the total distance between matched pairs
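The trade-off above can be made concrete with a small sketch. It assumes the two structures have already been superimposed and that the matched Cα pairs are given. The function `alignment_score` and its `weight` parameter are hypothetical, chosen only to illustrate the idea; they are not the objective used by any of the cited methods.

```python
import math

def rmsd(pairs):
    """Root-mean-square distance over matched (p, q) coordinate pairs."""
    sq = sum((a - b) ** 2 for p, q in pairs for a, b in zip(p, q))
    return math.sqrt(sq / len(pairs))

def alignment_score(pairs, weight=0.5):
    """Hypothetical objective: reward each match, penalise total distance."""
    total = sum(math.dist(p, q) for p, q in pairs)
    return len(pairs) - weight * total

# Two well-matched Calpha pairs (coordinates in Angstrom, toy values).
close = [((0, 0, 0), (0, 0, 0)), ((3.8, 0, 0), (3.8, 0, 1.0))]
# Adding a third pair that is 6 Angstrom apart gives more matches but a worse fit.
extended = close + [((7.6, 0, 0), (7.6, 0, 6.0))]

print(rmsd(close), alignment_score(close))
print(rmsd(extended), alignment_score(extended))
```

With this toy weight, the extra match lowers the score, so the shorter alignment would be preferred. A smaller `weight` would favour the longer one.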
Pairwise Structure Alignment
Ye et al. JBCB 2004
Uses an orientation-independent representation of proteins based on
the fact that consecutive Cα atoms are ~4 Å apart
Pairwise Structure Alignment
Ye et al. JBCB 2004
The protein is represented as a sequence of angle triplets
{(α1, β1, γ1), (α2, β2, γ2), …, (αn, βn, γn) }
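The slide does not define the angles α, β, γ, so the sketch below does not reproduce the triplet of Ye et al. As a stand-in, it computes two standard internal angles of a Cα trace (the bond angle and the torsion angle) and checks that they do not change when the whole structure is rotated and translated. All function names and the toy coordinates are assumptions made for illustration.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(a):
    return math.sqrt(dot(a, a))

def bond_angle(p0, p1, p2):
    """Angle at p1 between the directions to p0 and p2, in radians."""
    u, v = sub(p0, p1), sub(p2, p1)
    c = dot(u, v) / (norm(u) * norm(v))
    return math.acos(max(-1.0, min(1.0, c)))

def torsion(p0, p1, p2, p3):
    """Signed torsion angle around the p1-p2 axis (sign convention is a choice)."""
    b0, b1, b2 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b0, b1), cross(b1, b2)
    y = dot(cross(n1, b1), n2) / norm(b1)
    return math.atan2(y, dot(n1, n2))

def internal_angles(coords):
    """One (bond angle, torsion) pair per interior position of the trace."""
    return [(bond_angle(coords[i - 1], coords[i], coords[i + 1]),
             torsion(coords[i - 1], coords[i], coords[i + 1], coords[i + 2]))
            for i in range(1, len(coords) - 2)]

# Toy trace (not real protein coordinates).
pts = [(0, 0, 0), (3.8, 0, 0), (5.0, 3.6, 0), (8.0, 4.0, 2.5), (9.0, 7.5, 3.0)]
# Same trace rotated by 90 degrees about the z axis and then translated.
moved = [(-y + 1.0, x - 2.0, z + 3.0) for x, y, z in pts]

print(internal_angles(pts))
print(internal_angles(moved))
```

Both calls print the same angles up to rounding, which is what makes such a representation usable without first superimposing the two proteins.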
Pairwise Structure Alignment
Ye et al. JBCB 2004
Compute a local alignment based on angle representation
Find maximal subset of runs with similar transformation matrices
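A local alignment over angle triplets can be sketched with the Smith-Waterman recurrence. The scoring below (+1 when all three angles agree within a tolerance, -1 otherwise, and a gap cost of -1) is an assumption made for illustration; the slide does not state the scoring used by Ye et al.

```python
import math

def angle_similarity(a, b, tol=0.3):
    """Hypothetical score: +1 if every angle differs by less than tol radians."""
    for x, y in zip(a, b):
        d = abs(x - y) % (2 * math.pi)
        if min(d, 2 * math.pi - d) >= tol:   # compare angles modulo 2*pi
            return -1.0
    return 1.0

def local_align(s, t, gap=-1.0):
    """Smith-Waterman over two triplet sequences.
    Returns (best score, list of aligned index pairs)."""
    m, n = len(s), len(t)
    H = [[0.0] * (n + 1) for _ in range(m + 1)]
    best, bi, bj = 0.0, 0, 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            H[i][j] = max(0.0,
                          H[i - 1][j - 1] + angle_similarity(s[i - 1], t[j - 1]),
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j
    # Trace back from the best cell until the score drops to zero.
    pairs, i, j = [], bi, bj
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + angle_similarity(s[i - 1], t[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return best, pairs

# Toy triplet sequences: s[0..2] resembles t[1..3].
s = [(1.0, 0.5, 0.2), (2.0, 1.0, 0.7), (0.3, 2.5, 1.5), (0.4, 0.4, 0.4)]
t = [(3.0, 3.0, 3.0), (1.05, 0.5, 0.25), (2.1, 0.95, 0.7), (0.3, 2.4, 1.6)]
score, pairs = local_align(s, t)
print(score, pairs)
```

Each aligned run found this way would then be checked for a consistent 3D transformation, which is the second step named above and is not covered by this sketch.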
Pairwise Structure Alignment
Ye et al. JBCB 2004
The main algorithm
Compute the angle based representation
Align the angle based representation
Identify runs with similar transformation matrices
Compute initial structural alignment
Refine the alignment iteratively
Running time is ~(m+n)^2, where m and n are the protein lengths
Multiple Structure Alignment
Given a set of proteins represented by the Cα atoms (backbone)
find a simultaneous alignment of all structures
find a consensus structure that represents all of them
Multiple Structure Alignment
The main algorithm
1. find initial consensus structure (one of the given proteins)
2. pairwise align the consensus and each of the proteins
3. merge the pairwise alignments from previous step
4. recompute the consensus protein; repeat from step 2
Merging the pairwise alignments similar to sequence case
P1 = BBCA, P2 = CBBA, P3 = BCCA
Pairwise alignment of P1 and P2:
P1: -BBCA
P2: CBB-A
Pairwise alignment of P1 and P3:
P1: BBCA
P3: BCCA
Merged alignment:
P1: -BBCA
P2: CBB-A
P3: -BCCA
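The merge in this example can be sketched as follows, assuming it works like the centre-star merge used for sequences: every gap that any pairwise alignment opens in the centre row (here P1) is propagated to all other rows. The function name `merge_center_alignments` is hypothetical.

```python
def merge_center_alignments(pairwise):
    """pairwise: list of (center_row, other_row) gapped strings of equal length.
    Every center_row must spell the same sequence once gaps are removed.
    Returns the merged rows, centre first."""
    center = pairwise[0][0].replace("-", "")
    # gaps_before[k]: widest gap run any alignment opens before centre residue k
    # (k == len(center) stands for a trailing gap run).
    gaps_before = [0] * (len(center) + 1)
    for c_row, _ in pairwise:
        k = run = 0
        for ch in c_row:
            if ch == "-":
                run += 1
            else:
                gaps_before[k] = max(gaps_before[k], run)
                k, run = k + 1, 0
        gaps_before[k] = max(gaps_before[k], run)

    def expand(c_row, o_row):
        out, run, k = [], [], 0
        for c, o in zip(c_row, o_row):
            if c == "-":
                run.append(o)  # residue inserted relative to the centre
            else:
                out.append("".join(run).ljust(gaps_before[k], "-") + o)
                run, k = [], k + 1
        out.append("".join(run).ljust(gaps_before[k], "-"))
        return "".join(out)

    merged_center = "".join("-" * g + r for g, r in zip(gaps_before, center))
    merged_center += "-" * gaps_before[-1]
    return [merged_center] + [expand(c, o) for c, o in pairwise]

# The two pairwise alignments from the example, with P1 as the centre.
merged = merge_center_alignments([("-BBCA", "CBB-A"), ("BBCA", "BCCA")])
for name, row in zip(("P1", "P2", "P3"), merged):
    print(name + ":", row)
```

For the example input this prints the rows -BBCA, CBB-A and -BCCA. For structures, the letters would be replaced by residue positions, but the gap bookkeeping stays the same.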
Multiple Structure Alignment
Computation of consensus structure (after merging alignments)
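The slide gives no formula for the consensus, so the following is only a sketch under two assumptions: all structures have already been superimposed, and each consensus point is the centroid of the Cα atoms in one column of the merged alignment. The function name `consensus_structure` is hypothetical.

```python
def consensus_structure(aligned):
    """aligned: list of rows, one per protein; each entry is an (x, y, z)
    tuple or None for a gap. Returns one centroid per non-empty column."""
    consensus = []
    for column in zip(*aligned):
        pts = [p for p in column if p is not None]
        if not pts:
            continue  # column contains only gaps
        consensus.append(tuple(sum(c) / len(pts) for c in zip(*pts)))
    return consensus

# Three superimposed toy structures over three alignment columns.
rows = [
    [None,       (0, 0, 0), (4, 0, 0)],
    [(-4, 0, 0), (0, 2, 0), None],
    [None,       (0, 1, 0), (4, 0, 2)],
]
consensus = consensus_structure(rows)
print(consensus)
```

A practical variant might keep only columns occupied by a minimum number of proteins; that threshold is left out here to keep the sketch short.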
Multiple Structure Alignment
Algorithm flowchart