Protein structure
Wednesday, October 4, 2006
Introduction to Bioinformatics
Johns Hopkins School of Public Health
260.602.01
J. Pevsner
[email protected]
Copyright notice
Many of the images in this powerpoint presentation
are from Bioinformatics and Functional Genomics
by Jonathan Pevsner (ISBN 0-471-21004-8).
Copyright © 2003 by John Wiley & Sons, Inc.
These images and materials may not be used
without permission from the publisher. We welcome
instructors to use these powerpoints for educational
purposes, but please acknowledge the source.
The book has a homepage at http://www.bioinfbook.org
including hyperlinks to the book chapters.
Announcements
On Monday, Ingo Ruczinski will discuss
protein structure including modeling techniques
and hidden Markov models for structure prediction.
Keep working on the find-a-gene project.
If you’ve got a novel protein, you can try
to solve its structure (today’s topic).
Next, you can put it in a multiple sequence
alignment (the topic for Wednesday, October 11).
Classical structural biology
Determine biochemical activity
Purify protein
Determine structure
Understand mechanism, function
Fig. 9.1
Page 274
Structural genomics
Determine genomic DNA sequence
Predict protein
Determine structure or analyze in silico
Understand mechanism, function
Fig. 9.1
Page 274
Structural genomics
A goal of structural genomics is to determine
protein structures that span the full extent
of sequence space.
Page 273
Protein Structure Initiative
http://www.nigms.nih.gov/Initiatives/PSI/
Protein function and structure
Function is often assigned based on homology. However,
homology based on sequence identity may be subtle.
Consider RBP and OBP: these are true homologs
(they are both lipocalins, sharing the GXW motif).
But they are distant relatives, and do not share significant
amino acid identity in a pairwise alignment.
Protein structure evolves more slowly than
primary amino acid sequence. RBP and OBP share
highly similar three dimensional structures.
Page 274
Questions addressed by structural genomics
Consider the lipocalin family of carrier proteins.
• What ligand does each protein transport?
• Can we predict the structural and functional
consequences of a particular mutation?
• Lipocalins can be classified by molecular phylogeny.
Do phylogenetic groupings reflect structural differences?
• Can we use the known structure of lipocalins (such as
RBP, β-lactoglobulin, OBP) to predict the structures
of other lipocalins?
Page 276
Principles of protein structure
Primary amino acid sequence
Secondary structure: α helices, β sheets
Tertiary structure: from X-ray, NMR
Quaternary structure: multiple subunits
Page 276
Protein secondary structure
Protein secondary structure is determined by the
amino acid side chains.
Myoglobin is an example of a protein having many
α-helices. These are formed by amino acid stretches
4-40 residues in length.
Thioredoxin from E. coli is an example of a protein
with many β sheets, formed from β strands composed
of 5-10 residues. They are arranged in parallel or
antiparallel orientations.
Page 279
Myoglobin
(John Kendrew, 1958)
Fig. 9.2
Page 275
Thioredoxin
Fig. 9.2
Page 275
Secondary structure prediction
Chou and Fasman (1974) developed an algorithm
based on the frequencies of amino acids found in
α helices, β-sheets, and turns.
Proline: occurs at turns, but not in α helices.
GOR (Garnier, Osguthorpe, Robson): related algorithm
Modern algorithms: use multiple sequence alignments
and achieve higher success rate (about 70-75%)
Page 279-280
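The frequency idea behind Chou and Fasman's approach can be sketched in a few lines. This is a minimal illustration, not the published algorithm: the function name, the toy sequences, and the toy labels (H = helix, C = coil or turn) are all hypothetical, and a real propensity table would be derived from a large set of solved structures.

```python
from collections import Counter


def propensities(sequences, labels, state):
    """Return {residue: propensity} for one secondary-structure state.

    Propensity = (fraction of `state` positions occupied by the residue)
               / (fraction of all positions occupied by the residue).
    Values above 1 mean the residue is over-represented in that state.
    """
    in_state = Counter()
    overall = Counter()
    for seq, lab in zip(sequences, labels):
        if len(seq) != len(lab):
            raise ValueError("sequence and label strings must have equal length")
        for residue, s in zip(seq, lab):
            overall[residue] += 1
            if s == state:
                in_state[residue] += 1
    n_state = sum(in_state.values())
    n_all = sum(overall.values())
    if n_state == 0:
        raise ValueError("no positions are labeled with the requested state")
    return {
        residue: (in_state[residue] / n_state) / (count / n_all)
        for residue, count in overall.items()
    }


# Hypothetical toy data (not real structural annotations).
toy_sequences = ["AAAAGP", "LEAGPA"]
toy_labels = ["HHHHCC", "HHHCCC"]
helix_propensity = propensities(toy_sequences, toy_labels, "H")
```

In this toy set proline never appears in a helix, so its helix propensity is 0, while alanine scores above 1. A predictor in this style would then scan a new sequence for windows whose average propensity favors one state.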
Secondary structure prediction
Web servers:
GOR4
Jpred
NNPREDICT
PHD
Predator
PredictProtein
PSIPRED
SAM-T99sec
Table 9-1
Page 276
Fig. 9.3
Page 277
Tertiary protein structure: protein folding
Three main approaches:
[1] experimental determination
(X-ray crystallography, NMR)
[2] Comparative modeling (based on homology)
[3] Ab initio (de novo) prediction
(Ingo Ruczinski)
Page 282
Experimental approaches to protein structure
[1] X-ray crystallography
-- Used to determine 80% of structures
-- Requires high protein concentration
-- Requires crystals
-- Able to trace amino acid side chains
-- Earliest structure solved was myoglobin
[2] NMR
-- Magnetic field applied to proteins in solution
-- Largest structures: 350 amino acids (40 kD)
-- Does not require crystallization
Page 283
Steps in obtaining a protein structure
Target selection
Obtain, characterize protein
Determine, refine, model the structure
Deposit in repository
Fig 9.4
page 279;
page 285
Priorities for target selection for protein structures
Historically, small, soluble, abundant proteins were
studied (e.g. hemoglobin, cytochrome c, insulin).
Modern criteria:
• Represent all branches of life
• Represent previously uncharacterized families
• Identify medically relevant targets
• Some are attempting to solve all structures
within an individual organism (Methanococcus
jannaschii, Mycobacterium tuberculosis)
Page 285-286
The Protein Data Bank (PDB)
• PDB is the principal repository for protein structures
• Established in 1971
• Accessed at http://www.pdb.org
• Currently contains over 38,000 structure entities
Updated 9/06
Page 287
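PDB entries have traditionally been distributed as text files with fixed-width ATOM records. The sketch below reads alpha-carbon coordinates from such records. The function name is hypothetical, and the code assumes the legacy PDB column layout; it ignores alternate locations, multiple models, and HETATM records.

```python
def parse_ca_atoms(pdb_lines):
    """Return (residue_name, chain_id, residue_number, (x, y, z)) per C-alpha atom."""
    atoms = []
    for line in pdb_lines:
        # Only ATOM records are read; HETATM (ligands, ions) is skipped.
        if not line.startswith("ATOM"):
            continue
        # Legacy PDB columns (1-based): atom name 13-16, residue name 18-20,
        # chain 22, residue number 23-26, x 31-38, y 39-46, z 47-54.
        if line[12:16].strip() != "CA":
            continue
        residue_name = line[17:20].strip()
        chain_id = line[21]
        residue_number = int(line[22:26])
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        atoms.append((residue_name, chain_id, residue_number, xyz))
    return atoms


# Usage (assumes a file such as a downloaded myoglobin entry saved locally
# under a name of your choosing):
#     with open("entry.pdb") as handle:
#         ca_atoms = parse_ca_atoms(handle)
```

Alpha-carbon coordinates are a common starting point for the structure comparisons discussed later, because one point per residue is enough to describe the overall fold.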
Fig. 9.5
Page 280
PDB content growth (www.pdb.org)
[Plot: number of structures (axis ticks 10,000 to 40,000) versus year (1980 to 2006); updated 8-22-06]
Fig. 9.6
Page 281
Number of unique folds (defined by SCOP) in PDB
[Plot: count (axis labeled "structures", ticks 500 and 1,000) versus year (1980 to 2006); updated 8-22-06]
PDB holdings (updated 8-22-06)
proteins, peptides          35,093
protein/nucl. complexes      1,532
nucleic acids                1,656
other/carbohydrates            ~15
total                       38,320
Table 9-2
Page 281
Figure 9.7
Page 282
Figure 9.8
Page 283
Visualizing structures in PDB with WebMol
For any entry in PDB, click WebMol (under Display Molecule)
to access a very useful visualization tool.
A peptide bond connects two amino acids
There are three main peptide torsion angles: phi (φ), psi (ψ), and
omega (ω). The peptide bond itself (omega) is nearly planar, while
phi and psi are free to rotate.
Ramachandran plotted the phi versus psi angles
to describe the allowable areas for amino acids
http://swissmodel.expasy.org/course/text/chapter1.htm
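Each point on a Ramachandran plot is a pair of torsion (dihedral) angles, and each angle is computed from four consecutive backbone atoms: phi from C(i-1), N(i), CA(i), C(i), and psi from N(i), CA(i), C(i), N(i+1). The sketch below computes one such angle from four coordinates. The function and helper names are hypothetical, and coordinates are assumed to be (x, y, z) tuples in any consistent unit.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees (-180 to 180) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    if norm == 0:
        raise ValueError("p1 and p2 must be distinct points")
    b1 = tuple(c / norm for c in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

Applying this function to every residue of a structure and plotting phi against psi would reproduce the kind of plot shown by the viewer.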
1. Go to www.pdb.org
2. Enter 4MBN (a myoglobin)
3. In WebMol, click Rama
A Ramachandran plot shows favored
conformations of amino acids
Many alpha helices are evident. The plot excludes
proline [no phi angle]
Gateways to access PDB files: Swiss-Prot, NCBI, EMBL
Protein Data Bank
Databases that interpret PDB files: CATH, Dali, SCOP, FSSP
Fig. 9.10
Page 285
Access to PDB through NCBI
You can access PDB data at the NCBI several ways.
• Go to the Structure site, from the NCBI homepage
• Use Entrez
• Perform a BLAST search, restricting the output
to the PDB database
Page 289
Fig. 9.11
Page 286
Fig. 9.12
Page 287
Fig. 9.13
Page 288
Fig. 9.14
Page 289
Access to PDB through NCBI
Molecular Modeling DataBase (MMDB)
Cn3D (“see in 3D” or three dimensions):
structure visualization software
Vector Alignment Search Tool (VAST):
view multiple structures
Page 291
Fig. 9.15
Page 290
Fig. 9.16
Page 291
Fig. 9.17
Page 292
Access to structure data at NCBI: VAST
Vector Alignment Search Tool (VAST) offers a variety
of data on protein structures, including
-- PDB identifiers
-- root-mean-square deviation (RMSD) values
to describe structural similarities
-- NRES: the number of equivalent pairs of
alpha carbon atoms superimposed
-- percent identity
Page 294
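The RMSD value reported for a pair of structures can be illustrated with a short calculation. This is a sketch of the standard formula only, assuming the two structures have already been superimposed and that the alpha-carbon coordinates are supplied as equivalent pairs (so the list length plays the role of NRES). The function name is hypothetical, and the superposition step itself is not shown.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired (x, y, z) coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between one pair of equivalent atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```

If the coordinates are in ångströms, the result is in ångströms as well; smaller values indicate more similar structures.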
Fig. 9.18
Page 293
Additional web-based sites to visualize structures
Swiss-PDB Viewer
Chime
RasMol
MICE
VRML
Page 292
Swiss-Pdb Viewer
Fig. 9.19
Page 294
Chime
Fig. 9.20
Page 295
Many databases explore protein structures
SCOP
CATH
Dali Domain Dictionary
FSSP
Page 293
Structural Classification of Proteins (SCOP)
SCOP describes protein structures using a
hierarchical classification scheme:
Classes
Folds
Superfamilies (likely evolutionary relationship)
Families
Domains
Individual PDB entries
http://scop.mrc-lmb.cam.ac.uk/scop/
Page 293
Fig. 9.22
Page 297
SCOP statistics (September, 2006)
Class      # folds   # superfamilies   # families
All α        218          376              608
All β        144          290              560
α/β          136          222              629
α+β          279          409              717
…
Total        945         1539             2845
Table 9-4
Page 298
Class, Architecture, Topology, and
Homologous Superfamily (CATH) database
CATH clusters proteins at four levels:
C Class (α, β, α&β folds)
A Architecture (shape of domain, e.g. jelly roll)
T Topology (fold families; not necessarily homologous)
H Homologous superfamily
http://www.biochem.ucl.ac.uk/basm/cath_new
Page 293
The CATH hierarchy
Fig. 9.23
Page 298
Fig. 9.24
Page 299
Fig. 9.25
Page 300
Fig. 9.26
Page 301
Fig. 9.27
Page 302
Fig. 9.28
Page 303
Dali (Distance mAtrix aLIgnment)
DALI offers pairwise alignments of protein
structures. The algorithm uses the three-dimensional
coordinates of each protein to calculate
distance matrices comparing residues.
See Holm L and Sander C (1993) J. Mol. Biol.
233:123-138.
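The distance-matrix representation mentioned above can be sketched as follows. This shows only the first step, building the matrix of residue-to-residue distances within one protein from its alpha-carbon coordinates; it is not the Dali alignment procedure, which compares such matrices between two proteins. The function name is hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math


def distance_matrix(ca_coords):
    """Return a symmetric matrix of pairwise distances between C-alpha atoms."""
    n = len(ca_coords)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(ca_coords[i], ca_coords[j])
            matrix[i][j] = d
            matrix[j][i] = d  # distances are symmetric
    return matrix
```

Because the matrix depends only on distances within the molecule, it does not change when the protein is rotated or translated, which is what makes it convenient for comparing structures.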
Dali Domain Dictionary
Dali contains a numerical taxonomy of all known
structures in PDB. Dali integrates additional data
for entries within a domain class, such as secondary
structure predictions and solvent accessibility.
Page 302
Fig. 9.29
Page 303
Fig. 9.30
Page 304
Fold classification based on structure-structure
alignment of proteins (FSSP)
FSSP is based on a comprehensive comparison of
PDB proteins (greater than 30 amino acids in length)
using DALI. Representative sets exclude sequence
homologs sharing > 25% amino acid identity.
The output includes a “fold tree.”
http://www.ebi.ac.uk/dali/fssp
Page 293
Fig. 9.31
Page 305
FSSP: fold tree
Fig. 9.32
Page 306
Fig. 9.33
Page 307
Fig. 9.34
Page 307
Approaches to predicting protein structures
There are ~38,000 structures in PDB, and ~3.1 million
protein sequences in UniProtKB (release 8.0, 5/06). For
most proteins, structural models derive from computational
biology approaches, rather than experimental methods.
The most reliable method of modeling and evaluating
new structures is by comparison to previously
known structures. This is comparative modeling.
An alternative is ab initio modeling.
Page 303-305
Approaches to predicting protein structures
obtain sequence (target)
fold assignment
comparative modeling or ab initio modeling
build, assess model
Fig. 9.35
Page 308
Comparative modeling of protein structures
[1] Perform fold assignment (e.g. BLAST, CATH, SCOP);
identify structurally conserved regions
[2] Align the target (unknown protein) with the template.
This is performed for >30% amino acid identity
over a sufficient length
[3] Build a model
[4] Evaluate the model
Page 305
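The identity criterion in step [2] can be checked with a short calculation on an existing target-template alignment. This is a sketch under stated assumptions: the function name is hypothetical, gaps are written as "-", and percent identity is taken over columns where neither sequence has a gap (other definitions of the denominator are also in use and give different values).

```python
def percent_identity(aligned_target, aligned_template, gap="-"):
    """Percent identical residues over ungapped columns of a pairwise alignment."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_target, aligned_template):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("alignment has no ungapped columns")
    return 100.0 * identical / compared


# Hypothetical aligned fragments, for illustration only.
identity = percent_identity("ACDE-F", "ACDQGF")
suitable_template = identity > 30.0
```

The number of ungapped columns should be inspected as well, since a high identity over a very short stretch does not satisfy the "sufficient length" condition.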
Errors in comparative modeling
Errors may occur for many reasons
[1] Errors in side-chain packing
[2] Distortions within correctly aligned regions
[3] Errors in regions of target that do not match template
[4] Errors in sequence alignment
[5] Use of incorrect templates
Page 306
Comparative modeling
In general, accuracy of structure prediction depends
on the percent amino acid identity shared between
target and template.
For >50% identity, RMSD is often only 1 Å.
Page 306
Baker and Sali (2000)
Fig. 9.36
Page 308
Comparative modeling
Many web servers offer comparative modeling services.
Examples are
SWISS-MODEL (ExPASy)
Predict Protein server (Columbia)
WHAT IF (CMBI, Netherlands)
Page 309
Ab initio protein structure prediction
Ab initio prediction can be performed when a protein
has no detectable homologs.
Protein folding is modeled based on global free-energy
minimum estimates.
The Rosetta method was applied to sequence
families lacking known structures. For 80 of 131
proteins, one of the top five ranked models successfully
predicted the structure within 6.0 Å RMSD (Bonneau
et al., 2002).
Page 309-310
Protein structure and human disease
In some cases, a change to a single amino acid
can induce a dramatic change in protein structure.
For example, the ΔF508 mutation of CFTR (deletion of
one phenylalanine) alters the α helical content of the
protein, and disrupts intracellular trafficking.
Other changes are subtle. The E6V mutation in the
gene encoding hemoglobin beta causes sickle-cell
anemia. The substitution introduces a
hydrophobic patch on the protein surface,
leading to clumping of hemoglobin molecules.
Page 311
Protein structure and human disease
Disease                  Protein
Cystic fibrosis          CFTR
Sickle-cell anemia       hemoglobin beta
“mad cow” disease        prion protein
Alzheimer disease        amyloid precursor protein
Table 9.5
Page 312