Protein structure
Wednesday, October 4, 2006
Introduction to Bioinformatics
Johns Hopkins School of Public Health 260.602.01
J. Pevsner

Copyright notice
Many of the images in this PowerPoint presentation are from Bioinformatics and Functional Genomics by Jonathan Pevsner (ISBN 0-471-21004-8). Copyright © 2003 by John Wiley & Sons, Inc. These images and materials may not be used without permission from the publisher. We welcome instructors to use these PowerPoint slides for educational purposes, but please acknowledge the source. The book has a homepage at http://www.bioinfbook.org including hyperlinks to the book chapters.

Announcements
On Monday, Ingo Ruczinski will discuss protein structure, including modeling techniques and hidden Markov models for structure prediction.
Keep working on the find-a-gene project. If you've got a novel protein, you can try to solve its structure (today's topic). Next, you can put it in a multiple sequence alignment (the topic for Wednesday, October 11).

Classical structural biology
Determine biochemical activity → Purify protein → Determine structure → Understand mechanism, function
(Fig. 9.1, p. 274)

Structural genomics
Determine genomic DNA sequence → Predict protein → Determine structure or analyze in silico → Understand mechanism, function
(Fig. 9.1, p. 274)

Structural genomics
A goal of structural genomics is to determine protein structures that span the full extent of sequence space.
(p. 273)
Protein Structure Initiative: http://www.nigms.nih.gov/Initiatives/PSI/

Protein function and structure
Function is often assigned based on homology. However, homology based on sequence identity may be subtle. Consider RBP and OBP: they are true homologs (both are lipocalins, sharing the GXW motif), but they are distant relatives and do not share significant amino acid identity in a pairwise alignment.
Protein structure evolves more slowly than primary amino acid sequence: RBP and OBP share highly similar three-dimensional structures.
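To make "amino acid identity in a pairwise alignment" concrete, here is a minimal sketch that computes percent identity from two sequences that have already been aligned (gaps written as "-"). The function name `percent_identity` is hypothetical, and dividing by the number of gap-free columns is an assumption: tools differ in the denominator they use (alignment length, length of the shorter sequence, and so on). The sequences in the example are toy strings, not RBP or OBP.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumes both strings come from the same pairwise alignment (equal length).
    The denominator convention (gap-free columns) is one of several in use.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns in which both sequences have a residue.
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x.upper() == y.upper())
    return 100.0 * identical / len(pairs)


# Toy sequences (hypothetical): 7 of the 9 gap-free columns are identical.
print(round(percent_identity("MKT-AYIAKQ", "MKSGAYLAKQ"), 1))  # 77.8
```

A low value from a function like this does not by itself rule out homology, which is exactly the point of the RBP/OBP example: the structures can remain similar long after sequence identity has drifted toward background levels.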
(p. 274)

Questions addressed by structural genomics
Consider the lipocalin family of carrier proteins.
• What ligand does each protein transport?
• Can we predict the structural and functional consequences of a particular mutation?
• Lipocalins can be classified by molecular phylogeny. Do phylogenetic groupings reflect structural differences?
• Can we use the known structures of lipocalins (such as RBP, β-lactoglobulin, and OBP) to predict the structures of other lipocalins?
(p. 276)

Principles of protein structure
• Primary structure: amino acid sequence
• Secondary structure: α helices, β sheets
• Tertiary structure: from X-ray crystallography, NMR
• Quaternary structure: multiple subunits
(p. 276)

Protein secondary structure
Protein secondary structure is determined by the amino acid side chains. Myoglobin is an example of a protein having many α helices. These are formed by amino acid stretches 4–40 residues in length. Thioredoxin from E. coli is an example of a protein with many β sheets, formed from β strands composed of 5–10 residues. The strands are arranged in parallel or antiparallel orientations.
(p. 279)

Myoglobin (John Kendrew, 1958) (Fig. 9.2, p. 275)
Thioredoxin (Fig. 9.2, p. 275)

Secondary structure prediction
Chou and Fasman (1974) developed an algorithm based on the frequencies of amino acids found in α helices, β sheets, and turns. For example, proline occurs at turns but not in α helices.
GOR (Garnier, Osguthorpe, Robson): a related algorithm
Modern algorithms use multiple sequence alignments and achieve a higher success rate (about 70–75%).
(pp. 279–280)

Secondary structure prediction web servers (Table 9-1, p. 276)
GOR4, Jpred, NNPREDICT, PHD, Predator, PredictProtein, PSIPRED, SAM-T99sec
Figure slides: Fig. 9.3 (p. 277)

Tertiary protein structure: protein folding
Three main approaches:
[1] Experimental determination (X-ray crystallography, NMR)
[2] Comparative modeling (based on homology)
[3] Ab initio (de novo) prediction (Ingo Ruczinski)
(p. 282)

Experimental approaches to protein structure
[1] X-ray crystallography
-- Used to determine 80% of structures
-- Requires high protein concentration
-- Requires crystals
-- Able to trace amino acid side chains
-- Earliest structure solved was myoglobin
[2] NMR
-- Magnetic field applied to proteins in solution
-- Largest structures: 350 amino acids (40 kDa)
-- Does not require crystallization
(p. 283)

Steps in obtaining a protein structure
Target selection → Obtain, characterize protein → Determine, refine, model the structure → Deposit in repository
(Fig. 9.4, p. 279; p. 285)

Priorities for target selection for protein structures
Historically, small, soluble, abundant proteins were studied (e.g., hemoglobin, cytochrome c, insulin).
Modern criteria:
• Represent all branches of life
• Represent previously uncharacterized families
• Identify medically relevant targets
• Some projects aim to solve all structures within an individual organism (Methanococcus jannaschii, Mycobacterium tuberculosis)
(pp. 285–286)

The Protein Data Bank (PDB)
• PDB is the principal repository for protein structures
• Established in 1971
• Accessed at http://www.pdb.org
• Contains over 38,000 structure entries (updated 9/06)
(p. 287)
Fig. 9.5 (p. 280)

[Figure: PDB content growth (www.pdb.org): number of structures per year, 1980–2006, axis up to 40,000; updated 8-22-06] (Fig. 9.6, p. 281)
[Figure: Number of unique folds (defined by SCOP) in PDB structures per year, 1980–2006, axis up to 1,000; updated 8-22-06]

PDB holdings (updated 8-22-06; Table 9-2, p. 281)
Proteins, peptides: 35,093
Protein/nucleic acid complexes: 1,532
Nucleic acids: 1,656
Other/carbohydrates: ~15
Total: 38,320

Figure slides: Figure 9.7 (p. 282), Figure 9.8 (p. 283)

Visualizing structures in PDB with WebMol
For any entry in PDB, click WebMol (under Display Molecule) to access a very useful visualization tool.

A peptide bond connects two amino acids
There are three main peptide torsion angles: phi (φ), psi (ψ), and omega (ω). Phi (rotation about the N–Cα bond) and psi (rotation about the Cα–C bond) are relatively free to rotate, whereas omega, the rotation about the peptide (C–N) bond itself, is restricted because that bond has partial double-bond character. Ramachandran plotted phi versus psi angles to describe the allowable regions for amino acids.
http://swissmodel.expasy.org/course/text/chapter1.htm
1. Go to www.pdb.org
2. Enter 4MBN (a myoglobin)
3. In WebMol, click Rama

A Ramachandran plot shows favored conformations of amino acids
Many alpha helices are evident. The plot excludes proline, which has no freely rotating phi angle (its ring constrains phi to about −60°).

Fig. 9.10 (p. 285): Protein Data Bank; gateways to access PDB files: Swiss-Prot, NCBI, EMBL; databases that interpret PDB files: CATH, Dali, SCOP, FSSP

Access to PDB through NCBI
You can access PDB data at the NCBI in several ways:
• Go to the Structure site from the NCBI homepage
• Use Entrez
• Perform a BLAST search, restricting the output to the PDB database
(p. 289)
Figure slides: Fig. 9.11 (p. 286), Fig. 9.12 (p. 287), Fig. 9.13 (p. 288), Fig. 9.14 (p. 289)

Access to PDB through NCBI
• Molecular Modeling DataBase (MMDB)
• Cn3D ("see in 3D," or three dimensions): structure visualization software
• Vector Alignment Search Tool (VAST): view multiple structures
(p. 291)
Figure slides: Fig. 9.15 (p. 290), Fig. 9.16 (p. 291), Fig. 9.17 (p. 292)

Access to structure data at NCBI: VAST
The Vector Alignment Search Tool (VAST) offers a variety of data on protein structures, including
-- PDB identifiers
-- root-mean-square deviation (RMSD) values to describe structural similarities
-- NRES: the number of equivalent pairs of alpha carbon atoms superimposed
-- percent identity
(p. 294)
Fig. 9.18 (p. 293)

Additional web-based sites to visualize structures
Swiss-PDB Viewer, Chime, RasMol, MICE, VRML
(p. 292)
Swiss-Pdb Viewer (Fig. 9.19, p. 294)
Chime (Fig. 9.20, p. 295)

Many databases explore protein structures
SCOP, CATH, Dali Domain Dictionary, FSSP
(p. 293)

Structural Classification of Proteins (SCOP)
SCOP describes protein structures using a hierarchical classification scheme:
• Classes
• Folds
• Superfamilies (likely evolutionary relationship)
• Families
• Domains
• Individual PDB entries
http://scop.mrc-lmb.cam.ac.uk/scop/
(p. 293)
Fig. 9.22 (p. 297)

SCOP statistics (September 2006; Table 9-4, p. 298)
Class             All α   All β   α/β   α+β   …   Total
# folds             218     144   136   279   …     945
# superfamilies     376     290   222   409   …    1539
# families          608     560   629   717   …    2845

Class, Architecture, Topology, and Homologous Superfamily (CATH) database
CATH clusters proteins at four levels:
C  Class (α, β, α&β folds)
A  Architecture (shape of domain, e.g., jelly roll)
T  Topology (fold families; not necessarily homologous)
H  Homologous superfamily
http://www.biochem.ucl.ac.uk/basm/cath_new
(p. 293)
The CATH hierarchy (Fig. 9.23, p. 298)
Figure slides: Fig. 9.24 (p. 299), Fig. 9.25 (p. 300), Fig. 9.26 (p. 301), Fig. 9.27 (p. 302), Fig. 9.28 (p. 303)

Dali (Distance mAtrix aLIgnment)
Dali offers pairwise alignments of protein structures. The algorithm uses the three-dimensional coordinates of each protein to calculate distance matrices comparing residues. See Holm L and Sander C (1993) J. Mol. Biol. 233:123-138.

Dali Domain Dictionary
Dali contains a numerical taxonomy of all known structures in PDB. Dali integrates additional data for entries within a domain class, such as secondary structure predictions and solvent accessibility.
(p. 302)
Figure slides: Fig. 9.29 (p. 303), Fig. 9.30 (p. 304)

Fold classification based on structure-structure alignment of proteins (FSSP)
FSSP is based on a comprehensive comparison of PDB proteins (greater than 30 amino acids in length) using Dali. Representative sets exclude sequence homologs sharing >25% amino acid identity.
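The phi and psi angles on a Ramachandran plot are dihedral (torsion) angles, each defined by four consecutive backbone atoms: phi by C(i−1), N(i), Cα(i), C(i), and psi by N(i), Cα(i), C(i), N(i+1). The sketch below computes a signed dihedral angle from four 3D points with NumPy. It assumes the atom coordinates have already been read from a PDB file (no parser is shown), and the points in the example are made-up test coordinates, not atoms from 4MBN.

```python
import numpy as np


def dihedral(p0, p1, p2, p3) -> float:
    """Signed dihedral angle in degrees (between -180 and 180) for four points.

    The angle is the rotation about the p1-p2 bond; it is positive when the
    p1->p0 bond must turn clockwise (viewed from p1 toward p2) to eclipse p2->p3.
    """
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)  # unit vector along the central bond
    # Components of the outer bonds perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


# Made-up coordinates, chosen so the answers are easy to check by eye.
print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)), 1))  # 180.0 (trans)
print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)), 1))   # 90.0
```

Applied to a real chain, phi for residue i would be `dihedral(C[i-1], N[i], CA[i], C[i])` and psi `dihedral(N[i], CA[i], C[i], N[i+1])` (the arrays `C`, `N`, `CA` are hypothetical per-residue coordinate lists). Residues in α helices cluster around phi ≈ −60°, psi ≈ −45°, which is why many helices show up as one dense region on the 4MBN plot.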
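Dali's starting point, as described above, is a residue-by-residue distance matrix computed from each protein's three-dimensional coordinates. A useful property is that such a matrix does not change when the whole structure is rotated or translated, so two proteins can be compared without first superimposing them. The sketch below builds a Cα distance matrix and compares two equal-size matrices by the mean absolute difference of their elements. That comparison is a deliberately simplified stand-in chosen for illustration; it is not Dali's scoring function, and Dali's alignment search (Holm and Sander, 1993) is considerably more elaborate. Function names and coordinates are hypothetical.

```python
import numpy as np


def distance_matrix(coords) -> np.ndarray:
    """Pairwise Euclidean distances between residues (e.g., C-alpha atoms).

    coords: array-like of shape (n_residues, 3). Returns an (n, n) matrix
    that is symmetric with zeros on the diagonal.
    """
    xyz = np.asarray(coords, dtype=float)
    diff = xyz[:, None, :] - xyz[None, :, :]  # shape (n, n, 3)
    return np.sqrt((diff ** 2).sum(axis=-1))


def mean_abs_difference(dm_a: np.ndarray, dm_b: np.ndarray) -> float:
    """Toy comparison of two same-size distance matrices (NOT Dali's score).

    Assumes residue i of one protein already corresponds to residue i of the other.
    """
    if dm_a.shape != dm_b.shape:
        raise ValueError("matrices must have the same shape")
    return float(np.abs(dm_a - dm_b).mean())


# Hypothetical C-alpha coordinates (3.8 A spacing) and a translated copy.
ca = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [3.8, 3.8, 0.0]])
shifted = ca + np.array([10.0, -5.0, 2.0])
# Translation leaves every internal distance unchanged, so this is ~0.
print(mean_abs_difference(distance_matrix(ca), distance_matrix(shifted)))
```

The printed value is zero up to floating-point rounding, which illustrates why a distance-matrix representation sidesteps the superposition step that coordinate-based comparisons need.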
The output includes a "fold tree."
http://www.ebi.ac.uk/dali/fssp
(p. 293)
Fig. 9.31 (p. 305)
FSSP: fold tree (Fig. 9.32, p. 306)
Figure slides: Fig. 9.33 (p. 307), Fig. 9.34 (p. 307)

Approaches to predicting protein structures
There are ~38,000 structures in PDB and ~3.1 million protein sequences in UniProtKB (release 8.0, 5/06). For most proteins, therefore, structural models derive from computational biology approaches rather than experimental methods. The most reliable method of modeling and evaluating new structures is by comparison to previously known structures; this is comparative modeling. An alternative is ab initio modeling.
(pp. 303–305)

Approaches to predicting protein structures (Fig. 9.35, p. 308)
Obtain sequence (target) → fold assignment → comparative modeling or ab initio modeling → build, assess model

Comparative modeling of protein structures
[1] Perform fold assignment (e.g., BLAST, CATH, SCOP); identify structurally conserved regions
[2] Align the target (unknown protein) with the template. This is feasible when target and template share >30% amino acid identity over a sufficient length
[3] Build a model
[4] Evaluate the model
(p. 305)

Errors in comparative modeling
Errors may occur for many reasons:
[1] Errors in side-chain packing
[2] Distortions within correctly aligned regions
[3] Errors in regions of the target that do not match the template
[4] Errors in sequence alignment
[5] Use of incorrect templates
(p. 306)

Comparative modeling
In general, the accuracy of structure prediction depends on the percent amino acid identity shared between target and template. For >50% identity, RMSD is often only about 1 Å.
(p. 306)
Baker and Sali (2000) (Fig. 9.36, p. 308)

Comparative modeling
Many web servers offer comparative modeling services. Examples are:
• SWISS-MODEL (ExPASy)
• Predict Protein server (Columbia)
• WHAT IF (CMBI, Netherlands)
(p. 309)

Ab initio protein structure prediction
Ab initio prediction can be performed when a protein has no detectable homologs.
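RMSD, the measure VAST reports and the one used above to describe how close a comparative model is to the true structure, is the square root of the mean squared distance between equivalent atoms after the two structures have been optimally superimposed. The slides do not say how the superposition is computed; the sketch below uses the Kabsch algorithm (an SVD-based method, a common choice) and assumes the equivalent atom pairs are already known, for example the NRES pairs of Cα atoms that VAST superimposes. The function name and coordinates are hypothetical.

```python
import numpy as np


def kabsch_rmsd(p, q) -> float:
    """RMSD between two (n, 3) coordinate sets after optimal rigid superposition.

    Row i of p and row i of q are assumed to be equivalent atoms
    (e.g., paired C-alpha atoms). Only proper rotations are allowed.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("p and q must both have shape (n, 3)")
    # Remove translation by centering each set on its centroid.
    p = p - p.mean(axis=0)
    q = q - q.mean(axis=0)
    # SVD of the 3x3 covariance matrix yields the optimal rotation.
    h = p.T @ q
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    p_rot = p @ rotation.T
    return float(np.sqrt(((p_rot - q) ** 2).sum(axis=1).mean()))


tetra = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
rz90 = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
moved = tetra @ rz90.T + np.array([5.0, -2.0, 1.0])  # rotated and translated copy
mirror = tetra * np.array([1.0, 1.0, -1.0])          # mirror image through the xy-plane
print(round(kabsch_rmsd(tetra, moved), 3))   # 0.0
print(round(kabsch_rmsd(tetra, mirror), 3))  # 0.5
```

The reflection check matters for proteins: a mirror-image model is not a correct structure, so the superposition is restricted to proper rotations, and the mirror case above keeps a nonzero RMSD.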
Protein folding is modeled based on estimates of the global free-energy minimum. The Rosetta method was applied to sequence families lacking known structures: for 80 of 131 proteins, one of the top five ranked models successfully predicted the structure within 6.0 Å RMSD (Bonneau et al., 2002).
(pp. 309–310)

Protein structure and human disease
In some cases, a single amino acid change can induce a dramatic change in protein structure. For example, the ΔF508 mutation of CFTR (a deletion of phenylalanine 508) alters the α-helical content of the protein and disrupts intracellular trafficking.
Other changes are subtle. The E6V mutation in the gene encoding hemoglobin beta causes sickle-cell anemia. The substitution introduces a hydrophobic patch on the protein surface, leading to clumping of hemoglobin molecules.
(p. 311)

Protein structure and human disease (Table 9.5, p. 312)
Disease               Protein
Cystic fibrosis       CFTR
Sickle-cell anemia    hemoglobin beta
"Mad cow" disease     prion protein
Alzheimer disease     amyloid precursor protein
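The label E6V packs three facts into one token: the wild-type residue (E, glutamate), its position (6), and the replacement residue (V, valine). A minimal sketch that parses such a label and applies it to a sequence, refusing to proceed if the wild-type residue does not match, is below. The function names are hypothetical; positions are assumed to be 1-based. The fragment VHLTPEEK is the N-terminus of mature β-globin, numbered without the initiator methionine, which is the numbering the E6V name uses. Only simple substitutions are handled; a deletion such as ΔF508 would need a different rule.

```python
import re

# Wild-type one-letter code, 1-based position, replacement one-letter code, e.g. "E6V".
SUBSTITUTION_RE = re.compile(r"([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])")


def parse_substitution(label: str) -> tuple[str, int, str]:
    """Split a label such as 'E6V' into (wild_type, position, mutant)."""
    match = SUBSTITUTION_RE.fullmatch(label)
    if match is None:
        raise ValueError(f"not a simple substitution label: {label!r}")
    wild_type, position, mutant = match.groups()
    return wild_type, int(position), mutant


def apply_substitution(sequence: str, label: str) -> str:
    """Return a copy of sequence with the substitution applied."""
    wild_type, position, mutant = parse_substitution(label)
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    found = sequence[position - 1]
    # Guard against numbering mismatches (e.g., counting the initiator Met).
    if found != wild_type:
        raise ValueError(f"expected {wild_type} at position {position}, found {found}")
    return sequence[: position - 1] + mutant + sequence[position:]


hbb_fragment = "VHLTPEEK"  # mature beta-globin N-terminus, initiator Met removed
print(apply_substitution(hbb_fragment, "E6V"))  # VHLTPVEK
```

The wild-type check is the useful design choice here: if the same label were applied to a sequence that still includes the initiator methionine, position 6 would hold P rather than E, and the function raises an error instead of silently mutating the wrong residue.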