PLPTH 890 Introduction to Genomic Bioinformatics
Lecture 20: Protein Structure Analysis - I
Liangjiang (LJ) Wang
April 8, 2005

Outline
• Basic concepts.
• How are protein structures determined?
  – X-ray crystallography.
  – NMR spectroscopy.
• Protein structure databases (PDB, MMDB).
• Protein structure visualization (RasMol, Cn3D, etc.).
• Protein structure classification (SCOP and CATH).

Structural Bioinformatics
• A subdiscipline of bioinformatics that focuses on the representation, storage, visualization, prediction and evaluation of structural information.
• References:
  – Baxevanis and Ouellette. 2005. Bioinformatics - A practical guide to the analysis of genes and proteins. 3rd edition. Chapter 9 and part of chapter 8.
  – Pevsner. 2003. Bioinformatics and functional genomics. Chapter 9.
  – Bourne and Weissig. 2003. Structural bioinformatics.

Protein Primary Structures
• Amino acid sequence of a polypeptide chain.
• 20 amino acids, each with a different side chain (R).
• Peptide units are the building blocks of protein structures.
• The angle of rotation around the N−Cα bond is called phi (φ), and the angle around the Cα−C′ bond from the same Cα atom is called psi (ψ).
(Branden and Tooze, 1998)

Protein Secondary Structures
• Local substructures that result from hydrogen bond formation between neighboring amino acids (backbone interactions).
• The amino acid side chains affect secondary structure formation.
• Types of secondary structures:
  – α helix,
  – β sheet,
  – Loop or random coil.

α Helix
• Most abundant secondary structure.
• 3.6 amino acids per turn, with hydrogen bonds formed between residues four positions apart (i and i+4).
• Often found on the surface of proteins.

β Sheet
• Hydrogen bonds are formed between adjacent polypeptide strands.
• The strand directions can be the same (parallel sheet), opposite (anti-parallel), or mixed.

Loop or Coil
• Regions between helices and sheets.
• Various lengths and 3-D configurations.
• Often functionally significant (e.g., part of an active site).
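
The phi (φ) and psi (ψ) angles defined under Protein Primary Structures are dihedral angles, each fixed by four consecutive backbone atoms. As a minimal sketch (not taken from the lecture), the Python function below computes a signed dihedral angle from four 3-D points. The function name `dihedral` and the toy coordinates are illustrative assumptions. For φ the four atoms would be C′(i−1), N(i), Cα(i), C′(i); for ψ they would be N(i), Cα(i), C′(i), N(i+1).

```python
import math


def _sub(a, b):
    """Component-wise difference a - b of two 3-D points."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    """Dot product of two 3-D vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    """Cross product of two 3-D vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p1, p2, p3, p4):
    """Signed dihedral angle in degrees (between -180 and 180) about the p2-p3 bond."""
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b3 = _sub(p4, p3)
    n1 = _cross(b1, b2)  # normal of the plane through p1, p2, p3
    n2 = _cross(b2, b3)  # normal of the plane through p2, p3, p4
    b2_len = math.sqrt(_dot(b2, b2))
    y = b2_len * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Toy coordinates (hypothetical, not from a real structure).
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0))
print(cis, trans)
```

In the toy example the first set of points is planar with the outer atoms on the same side (a cis arrangement, angle 0), and the second has them on opposite sides (trans, magnitude 180). Applying the same function to every residue of a structure yields the (φ, ψ) pairs that a Ramachandran plot displays.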
Figure: The active site of open α/β-barrel structures is in a crevice outside the carboxy ends of the β strands. (Branden and Tooze, 1998)

Protein Tertiary Structure
• The 3-D structure of a protein is assembled from different secondary structure components.
• Tertiary structure is determined primarily by hydrophobic interactions between side chains.
• Different classes of protein structures:
  – All α: Hemoglobin (3HHB)
  – All β: T cell CD8 (1CD8)
  – Mixed (α and β): Thermolysin (7TLN)

Protein Tertiary Structure (Cont’d)
• Fold: a certain type of 3-D arrangement of secondary structures.
• Protein structures evolve more slowly than primary amino acid sequences.
• Examples:
  – Four-helix bundles: E. coli cytochrome b562 (256B), human growth hormone (1HUW)
  – Three-helix bundle: Drosophila engrailed homeodomain (1ENH)

Protein Quaternary Structure
• Two or more independent tertiary structures are assembled into a larger protein complex.
• Important for understanding protein-protein interactions.
• Examples: Horse spleen ferritin (1IES), E. coli ribosome (1ML5)

Biological Knowledge from Structures
(Bourne, 2004)

X-Ray Crystallography
• Basic steps: Gene targets → Expression, purification → Proteins → Crystallization → X-ray diffraction → Structure solution
• Advantages:
  – High-resolution structures.
  – Applicable to large protein complexes and membrane proteins.
• Disadvantages:
  – Molecules are in a solid-state (crystal) environment.
  – Requirement for crystals.

Nuclear Magnetic Resonance (NMR)
• NMR reveals the neighborhood information of atoms in a molecule, and the information can be used to construct a 3-D model of the molecule.
• Advantages:
  – No requirement for crystals.
  – Proteins are in a liquid state (near the physiological state).
• Disadvantages:
  – Limited by molecule size (up to 30 kDa).
  – Membrane proteins may not be studied.
  – Inherently less precise than X-ray crystallography.

Protein Data Bank (PDB)
• The primary repository for protein structures.
• Established in 1971 (the first bioinformatics database, set up with 7 protein structures).
• Contained 30,179 structures as of March 22, 2005.
• Supports services for structure submission, search, retrieval, and visualization.
• Search options:
  – SearchLite: PDB ID and keyword search.
  – SearchFields: advanced search.
(PDB can be accessed at http://www.rcsb.org/pdb/)

PDB Content Growth
[Figure: number of structures (axis ticks at 5,000 and 30,000) by year, 1972 to 2005. Last updated: 06-Mar-2005]

Access to Structures through NCBI
• MMDB (Molecular Modeling Database):
  – Structures obtained from PDB.
  – Data in NCBI’s ASN.1 format.
  – Integrated into NCBI’s Entrez system.
• Cn3D (“see in 3D”): NCBI’s 3-D protein structure viewer.
• VAST (Vector Alignment Search Tool): for direct comparison of 3-D protein structures.
(NCBI at http://www.ncbi.nlm.nih.gov/)

Ramachandran Plot
• Used to assess the quality of structures.
• Good structures show tight clustering patterns.
[Figure: Ramachandran plot of thioredoxin (2TRX), PSI versus PHI, with the β sheet and α helix regions labeled] (Baxevanis and Ouellette, 2005)

3-D Visualization Tool - RasMol
• An open source software package, and the most popular tool for viewing 3-D structures.
• RasMol represented a major breakthrough in software-driven 3-D structure visualization.
• Structure file formats supported by RasMol:
  – PDB file format: outdated but human-readable.
  – mmCIF: a new and robust data representation, but supported by few software tools.
• RasTop: provides a user-friendly graphical interface to RasMol. RasTop is available at http://www.geneinfinity.org/rastop/.

Cn3D: NCBI’s Structure Viewer
• Cn3D (“see in 3D”): allows interactive exploration of 3-D structures, sequences and alignments.
• Can be used to produce high-quality molecular images.
• Limitation: only accepts structure files in NCBI’s ASN.1 format (from MMDB).
• Cn3D is available at http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml.

Other 3-D Visualization Tools
• Chime: a Netscape plug-in for 3-D structure visualization; based on RasMol source code.
• Protein Explorer (http://www.proteinexplorer.org/):
  – A Chime-based software package.
  – Particularly user-friendly and feature-rich.
• Swiss-Pdb Viewer (DeepView, available at http://us.expasy.org/spdbv/):
  – Probably the most powerful, freely available molecular modeling and visualization package.
  – Supports homology modeling, site-directed mutagenesis, structure superposition, etc.

Protein Structure Comparison
• Why is structure comparison important?
  – To understand structure-function relationships.
  – To study the evolution of many key proteins (structure is more conserved than sequence).
• Comparing 3-D structures is much more difficult than sequence comparison.
• Protein structure classification:
  – SCOP: Structural Classification of Proteins.
  – CATH: Class, Architecture, Topology and Homology.
• Protein structure alignment: DALI and VAST.

SCOP
• SCOP is based on expert definition of protein structural similarities, and is manually curated.
• Classification hierarchy: Class → Fold → Superfamily → Family
• SCOP has 7 major classes: all α, all β, α/β, α+β, multi-domain proteins (α and β), membrane and cell surface proteins, and small proteins.
• The domain is the base unit of the SCOP hierarchy, and proteins with multiple domains may appear at different places in the hierarchy.
• SCOP at http://scop.mrc-lmb.cam.ac.uk/scop/.

An Example of the SCOP Hierarchy
SCOP fold definition:
• Same major secondary structures.
• Same arrangement.
• Same topology.
(Bourne, 2004)

CATH
• Classification hierarchy: Class (C) → Architecture (A) → Topology (T) → Homologous superfamily (H)
• Based on secondary structure content (for C), literature (for A), structure connectivity and general shape (for T, using the SSAP algorithm), and sequence similarity (for H).
• Multi-domain proteins are partitioned into their constituent domains before classification.
• CATH at http://www.biochem.ucl.ac.uk/bsm/cath/.

An Example of the CATH Hierarchy
CATH classes:
• Mainly α.
• Mainly β.
• Mixed α and β.
• Few secondary structures.
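
The four CATH levels (Class → Architecture → Topology → Homologous superfamily) are commonly written as a dotted four-number identifier, one number per level. That numbering convention is not stated in the slides and is assumed here from CATH's own usage, with class numbers 1 to 4 matching the four classes listed above. The sketch below splits such an identifier into its levels; the function name `parse_cath_code` and the input `3.40.50.300` are illustrative only.

```python
# Class numbers as assumed from CATH's numbering convention (not given in the slides).
CATH_CLASSES = {
    1: "mainly alpha",
    2: "mainly beta",
    3: "mixed alpha and beta",
    4: "few secondary structures",
}

LEVELS = ("class", "architecture", "topology", "homologous_superfamily")


def parse_cath_code(code):
    """Split a dotted C.A.T.H identifier into a dict keyed by hierarchy level."""
    parts = code.strip().split(".")
    if len(parts) != len(LEVELS):
        raise ValueError("expected four dot-separated numbers, got: %r" % code)
    numbers = [int(p) for p in parts]
    node = dict(zip(LEVELS, numbers))
    # Attach a readable class label; unknown numbers are reported as such.
    node["class_name"] = CATH_CLASSES.get(numbers[0], "unknown")
    return node


node = parse_cath_code("3.40.50.300")
print(node["class_name"], node["architecture"], node["topology"])
```

Keeping each level as a separate field makes it easy to group domains at any depth of the hierarchy, for example by comparing only the first two or three numbers.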
(Pevsner, 2003)

Summary
• Protein structures are important for addressing many biological questions.
• Protein Data Bank (PDB) is the primary repository for protein structures.
• Powerful software tools (e.g., RasMol) are available for viewing 3-D protein structures.
• SCOP and CATH are two manually curated databases for structure classification.
• Next: structure alignment and prediction.
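
The PDB file format, described in the RasMol slide as human-readable, stores each atom on one fixed-column ATOM or HETATM line. As a rough sketch of how such a line can be read, the parser below slices the standard column ranges (serial 7-11, atom name 13-16, residue name 18-20, chain 22, residue number 23-26, x/y/z 31-54). The helper names `parse_atom_line` and `make_atom_line` are hypothetical, and the sample record uses made-up coordinates rather than data from a real entry.

```python
def parse_atom_line(line):
    """Return selected fields of one ATOM/HETATM record, or None for other lines."""
    if not line.startswith(("ATOM  ", "HETATM")):
        return None
    return {
        "record": line[0:6].strip(),
        "serial": int(line[6:11]),
        "atom_name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain_id": line[21],
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
    }


def make_atom_line(serial, atom_name, res_name, chain_id, res_seq, x, y, z):
    """Build a fixed-column ATOM line (illustration only; omits occupancy etc.)."""
    return "{:<6}{:>5} {:<4} {:>3} {}{:>4}    {:8.3f}{:8.3f}{:8.3f}".format(
        "ATOM", serial, atom_name, res_name, chain_id, res_seq, x, y, z
    )


# Hypothetical alpha-carbon record with invented coordinates.
sample = make_atom_line(1, " CA", "ALA", "A", 1, 11.104, 6.134, -6.504)
atom = parse_atom_line(sample)
print(atom["atom_name"], atom["res_name"], atom["chain_id"], atom["x"])
```

Slicing by column rather than splitting on whitespace matters because adjacent fields in a PDB line can touch with no space between them. For real work, an established parser for PDB or mmCIF files is usually a safer choice than a hand-written one.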