An Introduction to Bioinformatics
Protein Modules

AIMS
To introduce the concept of multidomain proteins
To define the terms associated with the analysis of multidomain proteins
To introduce the major secondary databases

OBJECTIVES
To select an appropriate secondary database for the analysis of protein domains
To carry out an analysis to establish the domain structure of a protein
To ascribe likely biological functions to protein domains

When the amino acid sequences of two proteins are compared and found to exhibit significant similarity, they are assumed to be evolutionarily related, i.e. they are homologues. There are two classes of homologue: orthologues and paralogues. Orthologous genes descend from a single ancestral gene, and their divergence in different organisms simply parallels speciation. Paralogous genes descend from copies of a gene that was duplicated within a single ancestral genome.

A substantial proportion of all proteins are composed of more than one domain. A domain is defined as a stretch of sequentially consecutive residues in a protein that can fold up independently of the rest of the protein. Crystallographers commonly refer to domains as folds, and the term module is also used. The domain/module is the fundamental unit of protein structure. Inter-domain splicing, fusion, deletion, duplication and shuffling have occurred frequently during evolution, whereas intra-domain rearrangements have occurred rarely.

Figure: Influenza virus haemagglutinin

When two homologous proteins are aligned, there are one or more regions where sequence identity is particularly high, and these regions frequently allow the definition of motifs or signature sequences that are diagnostic of the family (see Module 4). Any particular domain may have one or more characteristic motifs. Domains/modules and motifs/signature sequences make up the content of many secondary databases and are of enormous value in predicting the function and structure of new proteins.

Low complexity regions
The individual domains of multidomain proteins are frequently separated from each other by regions of low complexity, also referred to as linker sequences. Long stretches of repeated residues, particularly proline, glutamine, serine or threonine, often indicate linker sequences. The program SEG detects such low complexity regions and can be used as part of BLAST to mask off segments of the query sequence that have low compositional complexity. This leaves the biologically interesting regions of the query sequence available for matching against database sequences.

Secondary (pattern) databases
Analysis of the primary protein sequence databases, usually through multiple sequence alignments, has led to the identification of sequence patterns (motifs, signatures, blocks, profiles) common to homologous proteins or protein modules. These motifs, usually about 10-20 amino acids long, commonly correspond to key functional or structural elements, often domains/modules, and are extremely useful for identifying such features in new, uncharacterized proteins. An unknown protein is often too distantly related to any protein of known sequence for the resemblance to be detected by overall sequence alignment, but it can potentially be identified by the occurrence of a particular motif in its sequence.

A number of programs allow an unknown protein to be searched against databases of motifs, profiles and similar descriptors.

Pfam is a collection of multiple alignments and profile hidden Markov models of protein domain families, based on proteins from both SWISS-PROT and SP-TrEMBL.

SMART (a Simple Modular Architecture Research Tool) allows the identification and annotation of genetically mobile domains and the analysis of domain architectures.

PROSITE is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles that help to reliably identify to which known protein family (if any) a new sequence belongs.
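The two ideas above (masking low complexity stretches of a query, then scanning what remains for diagnostic patterns) can be illustrated with a minimal Python sketch. It is a simplification, not SEG or the PROSITE scanning software. The masker flags every 12-residue window whose Shannon entropy falls below 2.2 bits. These values are assumptions loosely modeled on SEG's commonly quoted defaults, and real SEG uses a more elaborate trigger-and-extend procedure. The pattern translator covers only core PROSITE syntax: x, [..], {..}, (n) or (n,m) repeats, and the < and > terminal anchors. The helper names (shannon_entropy, mask_low_complexity, prosite_to_regex, scan), the toy MOTIFS table and the query sequence are hypothetical. The two patterns are the familiar P-loop (Walker A) pattern [AG]-x(4)-G-K-[ST] and the RGD cell-attachment tripeptide.

```python
import math
import re
from collections import Counter

# Hypothetical thresholds, loosely modeled on SEG's commonly quoted defaults.
WINDOW = 12        # residues per sliding window
MIN_ENTROPY = 2.2  # bits; windows below this are treated as low complexity


def shannon_entropy(window: str) -> float:
    """Compositional complexity of a window, in bits."""
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in Counter(window).values())


def mask_low_complexity(seq: str, window: int = WINDOW,
                        min_entropy: float = MIN_ENTROPY) -> str:
    """Replace every residue in a low-entropy window with 'X' (simplified, SEG-like)."""
    masked = list(seq)
    for i in range(len(seq) - window + 1):
        if shannon_entropy(seq[i:i + window]) < min_entropy:
            masked[i:i + window] = "X" * window
    return "".join(masked)


def prosite_to_regex(pattern: str) -> str:
    """Translate core PROSITE pattern syntax into a Python regular expression.

    Handles x, [..], {..}, (n) and (n,m) repeats and the < / > terminal
    anchors; does not handle '>' inside square brackets.
    """
    rx = pattern.rstrip(".")
    rx = rx.replace("{", "[^").replace("}", "]")   # {ED} = any residue except E or D
    rx = rx.replace("(", "{").replace(")", "}")    # x(4) -> .{4}, x(2,4) -> .{2,4}
    rx = rx.replace("x", ".").replace("-", "")     # x = any residue; '-' is a separator
    return rx.replace("<", "^").replace(">", "$")  # N- and C-terminal anchors


# Toy "secondary database" of PROSITE-style patterns (illustrative only).
MOTIFS = {
    "P-loop": "[AG]-x(4)-G-K-[ST].",
    "RGD": "R-G-D.",
}


def scan(seq: str, motifs: dict[str, str]) -> list[tuple[str, int, int, str]]:
    """Return (motif name, 1-based start, end, matched residues) for each hit."""
    hits = []
    for name, pattern in motifs.items():
        for m in re.finditer(prosite_to_regex(pattern), seq):
            hits.append((name, m.start() + 1, m.end(), m.group()))
    return hits


if __name__ == "__main__":
    # Hypothetical query: two motifs, then a poly-Q linker, then more sequence.
    query = "MGPSGSGKSTVRGDWLEHFNCAY" + "Q" * 14 + "DEIKLMNPRSTVW"
    masked = mask_low_complexity(query)
    print(masked)
    print(scan(masked, MOTIFS))
```

In this toy query the poly-Q linker is replaced by X while both motifs survive masking and are still reported. Note that re.finditer reports only non-overlapping matches, and that a masked X can still satisfy an x wildcard position; a production search tool treats both cases more carefully.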