CS 177
Proteins part 1: Structure-function relationships

- Review of protein structures
- Need for analyses of protein structures
- Sources of protein structure information
- Computational Modeling

Need for analyses of protein structures

A protein performs metabolic, structural, or regulatory functions in a cell.
Cellular biochemistry is based on interactions between 3-D molecular structures.
The 3-D structure of a protein determines its function.
Therefore, the relationship of sequence to function is primarily concerned with understanding the 3-D folding of proteins and inferring protein functions from these 3-D structures (e.g. binding sites, catalytic activities, interactions with other molecules).

The study of protein structure is not only of fundamental scientific interest for understanding biochemical processes, but also produces very valuable practical benefits:
- Medicine: understanding enzyme function allows the design of new and improved drugs
- Agriculture: therapeutic proteins and drugs for veterinary purposes and for the treatment of plant diseases
- Industry: protein engineering has potential for the synthesis of enzymes to carry out various industrial processes on a mass scale

Need for analyses of protein structures

Protein 3-D structure has direct medical implications: an incorrectly folded protein will not function properly.
Examples:
- Adult-onset diabetes: protein misfolding may be responsible for blood-vessel damage, blindness and other debilitating effects of the disease
- Cystic fibrosis: the most common mutation underlying cystic fibrosis hinders the dissociation of
the transport-regulator protein from one of its chaperones. Thus, the final steps in normal folding cannot occur, and normal amounts of active protein are not produced.

Need for analyses of protein structures

Examples of diseases associated with protein misfolding (cont.):
- Alzheimer's disease: new studies indicate that Alzheimer's disease may be caused by small clumps of wrongly folded proteins. Scientists have found that misfolded amyloid beta protein molecules hinder memory processes in rat brains by blocking synapses.

References
1. Walsh, D. M. et al. Naturally secreted oligomers of amyloid β protein potently inhibit hippocampal long-term potentiation in vivo. Nature, 416, 535-539 (2002).
2. Bucciantini, M. et al. Inherent toxicity of aggregates implies a common mechanism for protein misfolding diseases. Nature, 416, 507-511 (2002).

Figure: CT scan of the brain of an Alzheimer's patient showing widespread destruction (pink) of brain tissue (green)

Need for analyses of protein structures

Examples of diseases associated with protein misfolding (cont.):
- Transmissible spongiform encephalopathies (TSEs), such as mad cow disease or its human version, Creutzfeldt-Jakob disease: the infectious agent is probably a small misfolded protein called a prion. Prions occur naturally in the brain, where their function is unknown. Infectious prions can cause correctly folded proteins to misfold. Domino effect: large numbers of misfolded prions cause neural degeneration.
- Other non-infectious brain diseases such as Parkinson's, Huntington's, and Lou Gehrig's
Sources of protein structure information

3-D macromolecular structures are stored in databases.
The most important database: the Protein Data Bank (PDB)
The PDB is maintained by the Research Collaboratory for Structural Bioinformatics (RCSB) and can be accessed at three different sites (plus a number of mirror sites outside the USA):
- http://rcsb.rutgers.edu/pdb (Rutgers University)
- http://www.rcsb.org/pdb/ (San Diego Supercomputer Center)
- http://tcsb.nist.gov/pdb/ (National Institute of Standards and Technology)
It is the very first "bioinformatics" database ever built.

The Protein Data Bank (PDB)

PDB: 20,254 structures (4 March 2003)
SwissProt: 122,564 entries (5 March 2003)
Ratio: about 1:6 (the structure of more than 83% of proteins is still unknown)

Sources of protein structure information

Experimental structure determination
In practice, most biomolecular structures (>99% of structures in PDB) are determined using three techniques:
- X-ray crystallography (low to very high resolution). Problem: requires crystals; it is difficult to crystallize proteins while maintaining their native conformation; not all proteins can be crystallized
- Nuclear magnetic resonance (NMR) spectroscopy of proteins in solution (medium to high resolution). Problem: works only with small and medium-sized proteins (~50% of proteins cannot be studied with this method); requires high solubility
- Electron microscopy and crystallography (low to medium resolution). Problem: (still) relatively
low resolution
Experimental methods are still very time consuming and expensive; in most cases the experimental data will contain errors and/or be incomplete. Thus the initial model needs to be refined and rebuilt.

Sources of protein structure information

Computational modeling
Researchers have been working for decades to develop procedures for predicting protein structure that are less time consuming and not hindered by size and solubility constraints.
Because protein sequences are encoded in DNA, it should in principle be possible to translate a gene sequence into an amino acid sequence, and to predict the three-dimensional structure of the resulting chain from this amino acid sequence.

Some common terminology used in homology modeling

Motif (sequence context): conserved pattern of amino acids that is found in two or more proteins
Motif (structural context): combination of several secondary structure elements (also referred to as super-secondary structures and folds)
Fold (also referred to as folding motif): larger combination of secondary structure units in the same configuration. Thus, proteins sharing the same fold have the same combination of secondary structures connected by similar loops
Domain (sequence context) (also referred to as homologous domain): extended sequence pattern, generally found by sequence alignment methods, that indicates a common evolutionary origin.
It is generally longer than a motif (and may include all of a given protein sequence)
Domain (structural context): segment of the protein that can fold into a 3-D structure; domains are considered elementary units of molecular function
Family (sequence context): group of proteins of similar biochemical function that are more than 50% identical when aligned
Family (structural context): structures that have a significant level of structural similarity but not necessarily significant sequence similarity
Superfamily: group of protein families that are related by distant yet detectable sequence similarities

Computational modeling

Gene finding
Identification of protein-coding regions within DNA sequences (open reading frames, ORFs)
This is one of the biggest challenges facing the bioinformatics specialists working on genome projects.
Existing software is only about 90% accurate in predicting genes in large stretches of genomic DNA.
The problem is made worse in eukaryotic genomes by the common occurrence of pseudogenes, which are highly similar to real genes but are not transcribed.

Computational modeling

How to find genes?
- Similarity search against the expressed sequence tag (EST) database (e.g. dbEST)
- Translation and similarity search against the protein databanks (e.g.
SWISS-PROT and GenPept)
  - automatic translate-and-search functions are implemented in BLASTX and TFASTA
  - if a protein (or EST sequence) matches, it can be aligned with the unknown genomic sequence; start and stop codons should line up nicely and the introns should be obvious
  - a small error rate remains
- If there are no suitable template sequences in the databanks, one must rely on knowledge of the DNA code
  - translation generally starts at an ATG codon; in eukaryotes, transcription usually starts about 30 bp downstream of a TATA box (consensus TATAAA, or some close approximation)
  - a graphic map of all 6 reading frames can be produced to search for a long open one
  - several software packages are available that map ORFs (e.g. FRAMES, GeneWorks, MacVector, DNA Strider, GRAIL, ORF finder, DNA translation, BCM GeneFinder)
  - problem: none of those programs is perfect; errors will occur
  - confirming evidence can be collected by looking for regulatory sequences (promoters, enhancers, transcription factor binding sites; also known as signal sequences) that generally occur near ORFs. Several databases for signal sequences are available (e.g. TransFac) and several software tools make use of these databases (e.g. Signal Scan, FindPatterns)

Computational modeling

How to predict the protein structure?
Ab initio prediction of protein structure from sequence: not yet.
Problem: the information contained in protein structures lies essentially in the conformational torsion angles. Even if we only assume that every amino-acid residue has three such torsion angles, and that each of these three can only assume one of three "ideal" values (e.g., 60, 180 and -60 degrees), this still leaves us with 3 × 3 × 3 = 27 possible conformations per residue.
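As a rough check on how quickly 27 conformations per residue compound along a chain, the sketch below counts the conformations with exact integer arithmetic and estimates an exhaustive enumeration time in log space. It uses the 200-residue chain and the rate of 10^9 conformations per second that this section works with. The function names are illustrative, and the age of the universe (taken here as about 13.8 billion years) is an assumption that the slides do not state.

```python
import math

STATES_PER_ANGLE = 3       # three "ideal" torsion values per angle (from the text)
ANGLES_PER_RESIDUE = 3     # three torsion angles per residue (from the text)
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 13.8e9   # assumption, not given in the slides


def conformations(n_residues: int) -> int:
    """Exact number of conformations for a chain of n_residues."""
    per_residue = STATES_PER_ANGLE ** ANGLES_PER_RESIDUE  # 3^3 = 27
    return per_residue ** n_residues


def log10_universe_ages(n_residues: int, rate_per_second: float = 1e9) -> float:
    """log10 of the enumeration time, expressed in ages of the universe."""
    # Work with logarithms: the raw count is far too large for a float.
    log_seconds = math.log10(conformations(n_residues)) - math.log10(rate_per_second)
    return log_seconds - math.log10(AGE_OF_UNIVERSE_YEARS * SECONDS_PER_YEAR)


if __name__ == "__main__":
    total = conformations(200)
    print(f"27^200 has {len(str(total))} decimal digits")
    print(f"enumeration time ~ 10^{log10_universe_ages(200):.1f} ages of the universe")
```

Python integers are arbitrary precision, so the count itself is exact; only the time estimate is approximate, and it shifts by a fraction of an order of magnitude depending on the age assumed.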
For a typical 200-amino acid protein, this would give 27^200 (roughly 1.87 × 10^286) possible conformations!
Q: Can't we just generate all these conformations, calculate their energy and see which conformation has the lowest energy?
If we were able to evaluate 10^9 conformations per second, this would still keep us busy for about 4 × 10^259 times the current age of the universe.
There are optimized ab initio prediction algorithms available, as well as fold recognition algorithms that use threading (comparing the query sequence with known fold structures from databases), but the results are still very poor.

Computational modeling

Solution: homology modeling
Homology (comparative) modeling attempts to predict structure on the strength of a protein's sequence similarity to another protein of known structure.
Basic idea: a significant alignment of the query sequence with a target sequence from PDB is evidence that the query sequence has a similar 3-D structure (current threshold ~40% sequence identity).
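To make the identity threshold concrete, here is a minimal sketch of how percent identity might be computed for two sequences that have already been aligned. The helper names are hypothetical. Identity is counted over the columns where neither sequence has a gap, which is only one of several conventions in use, and the 40% cutoff is simply the figure quoted above.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of identical residues over the gap-free columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # assumption: columns containing a gap are not counted
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def is_modeling_candidate(aligned_a: str, aligned_b: str, threshold: float = 40.0) -> bool:
    """True if the identity reaches the ~40% threshold quoted in the text."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    query = "MKT-AYIAK"     # made-up aligned sequences, for illustration only
    template = "MKTQAYLAR"
    print(percent_identity(query, template))
    print(is_modeling_candidate(query, template))
```

A high identity over a very short alignment is not meaningful, so a real pipeline would also look at alignment length and statistical significance before choosing a template.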
Then multiple sequence alignment and pattern analysis can be used to predict the structure of the protein.

Computational modeling

Flow chart for protein structure prediction (from Mount, 2001)

Protein sequence
- partial or full sequences; predicted through gene finding

Database similarity search
- the sequence is used as a query in a database similarity search against proteins in PDB

Does the sequence align with a protein of known structure?
- Yes: if the database similarity search reveals a significant alignment between the query sequence and a PDB target sequence, the alignment can be used to position the amino acids of the query sequence in the same approximate 3-D structure
- No: proceed to protein family analysis

Protein family analysis/relationship to known structure
- Family (structural context): structures that have a significant level of structural similarity but not necessarily significant sequence similarity
- the goal is to exploit these structure-sequence relationships; two questions: 1) is the new protein a member of a family, 2) does the family have a predicted structural fold?
- analyze the sequence for family-specific profiles and patterns (available databases: 3D-Ali, 3D-PSSM, BLOCKS, eMOTIF, INTERPRO, Pfam …)
- if the family analysis reveals that the query protein is a member of a family with a predicted structural fold, multiple alignment can be used for structural modeling
- if the family analysis is unsuccessful, proceed to structural analyses

Structural analysis
- several different types of analyses can be used to infer structural information
- the presence of small amino acid motifs in a protein can be an indicator of a biochemical function associated with a particular structure. Motifs are available from the Prosite catalog
- spacing and arrangement of amino acids (e.g. hydrophobic amino acids) provide important structural clues that can be used for modeling
- certain amino acid combinations tend to occur in certain types of secondary structure
- these structural analyses can provide clues as to the presence of active sites and regions of secondary structure. This information can help to identify a new protein as a member of a known structural class

3-D structural analysis in lab
- proteins that fail to show any relationship to proteins of known structure are candidates for structural analyses (X-ray crystallography, NMR). There are about 600 known fold families, and new structures are frequently found to have an already known structural fold.
Accordingly, protein families with no relatives of known structure may represent a novel fold.

Computational modeling: summary
- Partial or full sequences predicted through gene finding
- Similarity search against proteins in PDB
- Find structures that have a significant level of structural similarity (but not necessarily significant sequence similarity)
- Alignment can be used to position the amino acids of the query sequence in the same approximate 3-D structure
- If the protein is a member of a family with a predicted structural fold, multiple alignment can be used for structural modeling
- Structural analyses in the lab (X-ray crystallography, NMR)
- Inferred structural information (e.g. presence of small amino acid motifs; spacing and arrangement of amino acids; certain typical amino acid combinations associated with certain types of secondary structure) can provide clues as to the presence of active sites and regions of secondary structure

Computational modeling: summary

How to predict the protein structure?
- Ab initio prediction of protein structure from sequence
- Homology (comparative) modeling attempts to predict structure on the strength of a protein's sequence similarity to another protein of known structure
- Experimental structure determination

Computational modeling: summary
- Ab initio prediction
- Homology modeling
- Experimental structure determination

Computational modeling

Viewing protein structures
A number of molecular viewers are freely available and run on most computer platforms and operating systems.
Examples:
- Cn3D 4.0 (stand-alone)
- Rasmol (stand-alone)
- Chime (Web browser plug-in based on Rasmol)
- Swiss 3D viewer Spdbv (stand-alone)
All these viewers can use the PDB identification code or the structural file from PDB.
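For readers curious about what such a structural file contains, the following is a minimal sketch of reading atom coordinates from a file in the legacy fixed-column PDB text format. It assumes that format (not the newer mmCIF format), reads only ATOM and HETATM records, and ignores fields such as occupancy and alternate locations. The names `Atom`, `parse_atoms` and `alpha_carbons` are made up for this illustration and do not belong to any viewer or library.

```python
from typing import List, NamedTuple


class Atom(NamedTuple):
    record: str      # "ATOM" or "HETATM"
    serial: int      # atom serial number
    name: str        # atom name, e.g. "CA"
    res_name: str    # residue name, e.g. "MET"
    chain_id: str    # chain identifier
    res_seq: int     # residue sequence number
    x: float         # orthogonal coordinates in angstroms
    y: float
    z: float


def parse_atoms(pdb_text: str) -> List[Atom]:
    """Extract ATOM/HETATM records using the fixed column positions of the PDB format."""
    atoms = []
    for line in pdb_text.splitlines():
        # Skip other record types and lines too short to hold coordinates.
        if not line.startswith(("ATOM  ", "HETATM")) or len(line) < 54:
            continue
        atoms.append(Atom(
            record=line[0:6].strip(),
            serial=int(line[6:11]),        # columns 7-11
            name=line[12:16].strip(),      # columns 13-16
            res_name=line[17:20].strip(),  # columns 18-20
            chain_id=line[21],             # column 22
            res_seq=int(line[22:26]),      # columns 23-26
            x=float(line[30:38]),          # columns 31-38
            y=float(line[38:46]),          # columns 39-46
            z=float(line[46:54]),          # columns 47-54
        ))
    return atoms


def alpha_carbons(atoms: List[Atom]) -> List[Atom]:
    """Keep only the C-alpha atoms of protein residues (a simple backbone trace)."""
    return [a for a in atoms if a.record == "ATOM" and a.name == "CA"]
```

Given the text of a downloaded entry, `alpha_carbons(parse_atoms(text))` yields one coordinate per residue, which is roughly the information a viewer needs to draw a backbone trace.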