Download Protein Structure Prediction PPT

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
11/11/05
Protein Structure
Prediction & Modeling
11/11/05
D Dobbs ISU - BCB 444/544X: Protein Structure Prediction
1
Bioinformatics Seminars
Nov 11 Fri 12:10 BCB Seminar
in E164 Lago
Building Supertrees Using Distances
Steve Willson, Dept of Mathematics
http://www.bcb.iastate.edu/courses/BCB691-F2005.html
Next week - Baker Center/BCB Seminars:
(seminar abstracts available at above link)
Nov 14 Mon 1:10 PM
Doug Brutlag, Stanford
Discovering transcription factor binding sites
Nov 15 Tues 1:10 PM
Ilya Vakser, Univ Kansas
Modeling protein-protein interactions
both seminars will be in Howe Hall Auditorium
11/11/05
D Dobbs ISU - BCB 444/544X: Protein Structure Prediction
2
Protein Structure & Function:
Analysis & Prediction
Mon
Protein structure: basics;
classification,databases,
visualization
Protein structure databases - cont.
Wed
Thurs Lab
Protein structure databases
Visualization software
Secondary structure prediction
Fri
Protein structure prediction
Protein-nucleic acid interactions
11/11/05
D Dobbs ISU - BCB 444/544X: Protein Structure Prediction
3
Reading Assignment (for Mon-Fri)
Mount Bioinformatics
• Chp 10 Protein classification & structure prediction
http://www.bioinformaticsonline.org/ch/ch10/index.html
• pp. 409-491
• Ck Errata: http://www.bioinformaticsonline.org/help/errata2.html
11/11/05
D Dobbs ISU - BCB 444/544X: Protein Structure Prediction
4
BCB 544 Additional Reading
Required:
• Gene Prediction:
• Burge & Karlin 1997 JMB 268:78
Prediction of complete gene structures in human genomic DNA
Optional:
• Structure Prediction:
• Schueler-Furman…Baker 2005 Science 310:638
Progress in modeling of protein structures and interactions
11/11/05
D Dobbs ISU - BCB 444/544X: Protein Structure Prediction
5
Review last lecture:
Protein Structure:
Databases, Classification &
Visualization
11/11/05
D Dobbs ISU - BCB 444/544X: Protein Structure Prediction
6
Protein sequence databases
• UniProt (SwissProt, PIR, EBI)
http://www.pir.uniprot.org
• NCBI Protein
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Protein
More on these later: protein function prediction
11/11/05
D Dobbs ISU - BCB 444/544X: Protein Structure Prediction
7
Protein sequence & structure: analysis
• Diamond STING Millennium analysis tools, including Protein Dossier
many useful structure
http://trantor.bioc.columbia.edu/SMS/
• SwissProt (UniProt)
protein knowledgebase
http://us.expasy.org/sprot
• InterPRO
sequence analysis tools
http://www.ebi.ac.uk/interpro
11/11/05
D Dobbs ISU - BCB 444/544X: Protein Structure Prediction
8
Protein structure databases
• PDB
• MMDB
Protein Data Bank http://www.rcsb.org/pdb/
(RCSB) - THE protein structure database
Molecular Modeling Database
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Structure
(NCBI Entrez) - has "added" value
• MSD
11/11/05
Molecular Structure Database
http://www.ebi.ac.uk/msd
Especially good for interactions, binding sites
D Dobbs ISU - BCB 444/544X: Protein Structure Prediction
9
Protein structure classification
• SCOP = Structural Classification of Proteins
Levels reflect both evolutionary and structural relationships
http://scop.mrc-lmb.cam.ac.uk/scop
• CATH = Classification by Class, Architecture,
Topology & Homology
http://cathwww.biochem.ucl.ac.uk/latest/
• DALI/FSSP (recently moved to EBI & reorganized)
• fully automated structure alignments
• DALI server
http://www.ebi.ac.uk/dali/index.html
• DALI Database (fold classification)
http://ekhidna.biocenter.helsinki.fi/dali/start
11/11/05
D Dobbs ISU - BCB 444/544X: Protein Structure Prediction
10
Protein structure visualization
• Molecular Visualization Freeware:
http://www.umass.edu/microbio/rasmol
• MolviZ.Org
http://www.umass.edu/microbio/chime
• Protein Explorer
http://www.umass.edu/microbio/chime/pe/protexpl/frntdoor.htm
• RASMOL (& many decendents: Protein Explorer,PyMol, MolMol, etc.)
http://www.umass.edu/microbio/rasmol/index2.htm
• CHIME
http://www.umass.edu/microbio/chime/getchime.htm
• Cn3D
http://www.biosino.org/mirror/www.ncbi.nlm.nih.gov/Structure/cn3d/
• Deep View = Swiss-PDB Viewer
http://www.expasy.org/spdbv
11/11/05
D Dobbs ISU - BCB 444/544X: Protein Structure Prediction
11
Protein structure visualization
Superb interactive structure visualization software
by Jane & Dave Richardson, Duke University
• KINIMAGE
http://kinemage.biochem.duke.edu/
• Fantastic research tools for structure analysis & refinement
11/11/05
D Dobbs ISU - BCB 444/544X: Protein Structure Prediction
12
RCSB PDB - Beta site
http://pdbbeta.rcsb.org/pdb/Welcome.do
11/11/05
D Dobbs ISU - BCB 444/544X: Protein Structure Prediction
13
MMDB
http://www.ncbi.nlm.nih.gov/Structure/MMDB/mmdb.shtml
11/11/05
D Dobbs ISU - BCB 444/544X: Protein Structure Prediction
14
Cn3D
http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml
11/11/05
D Dobbs ISU - BCB 444/544X: Protein Structure Prediction
15
Cn3D : Displaying 2' Structures
11/11/05
D Dobbs ISU - BCB 444/544X: Protein Structure Prediction
16
Cn3D: Structural Alignments
11/11/05
D Dobbs ISU - BCB 444/544X: Protein Structure Prediction
17
SCOP - Structure Classification
http://scop.mrc-lmb.cam.ac.uk/scop
QuickTime™ and a
TIFF (LZW) decompressor
are needed to see this picture.
11/11/05
D Dobbs ISU - BCB 444/544X: Protein Structure Prediction
18
6 main classes of protein structure
1)
a Domains
2)
 Domains
3)
aDomains
4)
aDomains
5)
Multidomain (a  
6)
Membrane & cell-surface proteins
• Bundles of helices connected by loops
• Mainly antiparallel sheets, usually with 2 sheets forming
sandwich
• Mainly parallel sheets with intervening helices, also
mixed sheets
• Mainly segregated helices and sheets
• Containing domains from more than one class
11/11/05
D Dobbs ISU - BCB 444/544X: Protein Structure Prediction
19
CATH - Structure Classification
http://cathwww.biochem.ucl.ac.uk/latest/
QuickTime™ and a
TIFF (LZW) decompressor
are needed to see this picture.
11/11/05
D Dobbs ISU - BCB 444/544X: Protein Structure Prediction
20
Structural Genomics
~ 30,000 "traditional" genes in human genome
(not counting: ???)
~ 3,000 proteins in a typical cell
> 2 million sequences in UniProt
> 33,000 protein structures in the PDB
 Experimental determination of protein structure
lags far behind sequence determination!
Goal: Determine structures of "all" protein folds in nature, using
combination of experimental structure determination methods
(X-ray crystallography, NMR, mass spectrometry) & structure
prediction
11/11/05
D Dobbs ISU - BCB 444/544X: Protein Structure Prediction
21
Structural Genomics Projects
TargetDB: database of structural genomics targets
http://targetdb.pdb.org
Protein Structure Prediction?
11/11/05
D Dobbs ISU - BCB 444/544X: Protein Structure Prediction
22
Protein Folding
"Major unsolved problem in molecular biology"
In cells:
spontaneous
assisted by enzymes
assisted by chaperones
In vitro:
many proteins fold spontaneously
& many do not!
11/11/05
D Dobbs ISU - BCB 444/544X: Protein Structure Prediction
23
Steps in Protein Folding
1- "Collapse"- driving force is burial of hydrophobic aa’s
(fast - msecs)
2- Molten globule - helices & sheets form, but "loose"
(slow - secs)
3- "Final" native folded state - compaction, some 2'
structures rearranged
Native state? - assumed to be lowest free energy
- may be an ensemble of structures
11/11/05
D Dobbs ISU - BCB 444/544X: Protein Structure Prediction
24
Protein Dynamics
• Protein in native state is NOT static
• Function of many proteins depends on conformational
changes, sometimes large, sometimes small
• Globular proteins are inherently "unstable"
(NOT evolved for maximum stability)
• Energy difference between native and denatured
state is very small (5-15 kcal/mol)
(this is equivalent to 1 or 2 H-bonds!)
• Folding involves changes in both entropy & enthalpy
11/11/05
D Dobbs ISU - BCB 444/544X: Protein Structure Prediction
25
Protein Structure Prediction
•
Structure is largely determined by sequence
BUT:
• Similar sequences can assume different structures
• Dissimilar sequences can assume similar structures
• Many proteins are multi-functional
• Protein folding:
• determination of folding pathways
• prediction of tertiary structure
 still largely unsolved problems
11/11/05
D Dobbs ISU - BCB 444/544X: Protein Structure Prediction
26
New today:
Protein Structure Prediction
Secondary structure
(text focuses on this - I won't)
Tertiary structure
(let's do this instead!)
11/11/05
D Dobbs ISU - BCB 444/544X: Protein Structure Prediction
27
Deciphering the Protein Folding Code
• Protein Structure Prediction
or "Protein Folding" Problem
given the amino acid sequence
of a protein, predict its
3-dimensional structure (fold)
• "Inverse Folding" Problem
given a protein fold, identify
every amino acid sequence
that can adopt its
3-dimensional structure
11/11/05
D Dobbs ISU - BCB 444/544X: Protein Structure Prediction
28
Protein Structure Determination?
High-resolution structure determination
• X-ray crystallography (<1A)
• Nuclear magnetic resonance (NMR) (~1-2.5A)
Lower-resolution structure determination
• Cryo-EM (electron-microscropy) ~10-15A
Theoretical Models?
• Highly variable - now, some equiv to X-ray!
11/11/05
D Dobbs ISU - BCB 444/544X: Protein Structure Prediction
29
Tertiary Structure Prediction
Fold or tertiary structure prediction
problem can be formulated as a search
for minimum energy conformation
• search space is defined by psi/phi angles of backbone and
side-chain rotamers
• search space is enormous even for small proteins!
• number of local minima increases exponentially
of the number of residues
Computationally it is an exceedingly difficult problem!
11/11/05
D Dobbs ISU - BCB 444/544X: Protein Structure Prediction
30
Ab Initio Prediction
1. Develop energy function
•
•
•
•
•
bond energy
bond angle energy
dihedral angle energy
van der Waals energy
electrostatic energy
2. Calculate structure by minimizing energy function
(usually Molecular Dynamics or Monte Carlo methods)
 Ab initio prediction - not practical in general
•
•
Computationally? very expensive
Accuracy? Usually poor for all but short peptides
(but see Baker review!)
Provides both folding pathway & folded structure
11/11/05
D Dobbs ISU - BCB 444/544X: Protein Structure Prediction
31
Comparative Modeling
Two primary methods
1) Homology modeling
2) Threading (fold recognition)
Note: both rely on availability of experimentally
determined structures that are "homologous" or
at least structurally very similar to target
Provide folded structure only
11/11/05
D Dobbs ISU - BCB 444/544X: Protein Structure Prediction
32
Homology Modeling
1. Identify homologous protein sequences
•
•
PSI-BLAST
multiple sequence alignment (MSA)
2. Among those with available structures, choose
closest sequence match for template
3. Build model by placing residues into corresponding
positions of homologous structure models & refine
by "tweaking"
 Homology modeling - works "well"
•
•
Computationally? not very expensive
Accuracy? higher sequence identity  better model
Requires >30% sequence identity
11/11/05
D Dobbs ISU - BCB 444/544X: Protein Structure Prediction
33
Threading - Fold Recognition
Identify “best” fit between target sequence & template structure
1.
2.
3.
4.
5.

Develop energy function
Develop template library
Align target sequence with each template & score
Determine best score (1D to 3D alignment)
Build refine structure as in homology modeling
Threading - works "sometimes"
•
•
•
Computationally? Can be expensive or cheap, depends on energy
function & whether "all atom" or "backbone only" threading
Accuracy? in theory, should not depend on sequence
identity (should depend on quality of template library & "luck")
But, usually higher sequence identity  better model
11/11/05
D Dobbs ISU - BCB 444/544X: Protein Structure Prediction
34
Threading - a "local" example
Target
Sequence
ALKKGF…HFDTSE
Structure
Templates
1. Align target sequence with template structures
(fold library) from the Protein Data Bank (PDB)
2. Calculate energy (score) to evaluate goodness of fit
between target sequence & template structure
3. Rank models based on energy scores
11/11/05
D Dobbs ISU - BCB 444/544X: Protein Structure Prediction
35
Threading Goals & Issues
Find “correct” sequence-structure alignment of a
target sequence with its native-like fold in PDB
Structure database - must be complete: no decent model if no good
template in library!
Sequence-structure alignment algorithm:
Bad alignment  Bad score!
Energy function (scoring scheme):
• must distinguish correct sequence-fold alignment from
incorrect sequence-fold alignments
• must distinguish “correct” fold from close decoys
Prediction reliability assessment - how determine whether
predicted structure is correct (or even close?)
11/11/05
D Dobbs ISU - BCB 444/544X: Protein Structure Prediction
36
Threading – Structure database
Build a template database
(e.g., ASTRAL domain library derived from PDB)
Supplement with additional decoys, e.g., generated using
ab initio approach such as Rosetta (Baker)
11/11/05
D Dobbs ISU - BCB 444/544X: Protein Structure Prediction
37
Threading - Energy function
Two main methods (and combinations of these)
• Structural profile (environmental)
physico-chemical properties of aa’s
• Contact potential (statistical)
based on contact statistics from PDB
(Miyazawa & Jernigan - Jernigan now at ISU)
11/11/05
D Dobbs ISU - BCB 444/544X: Protein Structure Prediction
38
Protein Threading – typical
energy function
MTYKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEWTYTE
What is "probability"
that two specific
residues are in
contact?
How well does a
specific residue fit
structural environment?
Alignment gap
penalty?
Total energy: E_p + E_s + E_g
Find a sequence-structure alignment that minimizing
the energy function
11/11/05
D Dobbs ISU - BCB 444/544X: Protein Structure Prediction
39
A Rapid Threading Approach for
Protein Structure Prediction
Kai-Ming Ho, Physics
Haibo Cao
Yungok Ihm
Zhong Gao
James Morris
Cai-zhuang Wang
Drena Dobbs, GDCB
Jae-Hyung Lee
Michael Terribilini
Jeff Sander
11/11/05
D Dobbs ISU - BCB 444/544X: Protein Structure Prediction
40
Performance Evaluation? "Blind Test"
CASP5 Competition
(Critical Assessment of Protein Structure Prediction)
Given: Amino acid sequence
Goal: Predict 3-D structure
(before experimental results published)
11/11/05
D Dobbs ISU - BCB 444/544X: Protein Structure Prediction
41
Typical Results:
well, actually, BEST Results:
HO = #1 ranked CASP prediction for this target
Target 174
PDB ID = 1MG7
Predicted Structure
T174_1
Actual Structure
T174_2
Overall Performance in CASP5 Contest
(M. Levitt, Stanford)
FR Fold Recognition
(targets manually assessed by Nick Grishin)
-----------------------------------------------------------
Rank
1
2
3
4
5
6
7
8
9
Z-Score Ngood
24.26
21.64
19.55
16.88
15.25
14.56
13.49
11.34
10.45
9.00
7.00
8.00
6.00
7.00
6.50
4.00
3.00
3.00
Npred
12.00
12.00
12.50
10.00
7.00
11.50
11.00
6.00
5.50
NgNW
9
7
9
6
7
7
4
3
3
NpNW
12
12
14
10
7
13
11
6
6
Group-name
Ginalski
Skolnick Kolinski
Baker
BIOINFO.PL
Shortle
BAKER-ROBETTA
Brooks
Ho-Kai-Ming
Jones-NewFold
----------------------------------------------------------FR NgNW - number of good predictions without weighting for multiple models
FR NpNW - number of total predictions without weighting for multiple models
Related documents