Homology and Information Gathering and Domain Annotation for Proteins

Outline
• Homology
• Information Gathering for Proteins
• Domain Annotation for Proteins
• Examples and exercises

The concept of homology
“The same organ in different animals under every variety of form and function.” – Richard Owen, 1843
[Figure: Homologous forelimbs. Source: http://bytesizebio.net/wp-content/uploads/2009/07/homology-limbs]

Homology – Alikeness because of common ancestry
• Homology: The relationship of any two characters that have descended with divergence from a common ancestral character (common ancestry)
• Analogy: The relationship of any two characters that have descended convergently from unrelated ancestors (convergent evolution)
• Characters exist at very different levels of biological organization, ranging from entire organs through genes and domains to single nucleotides
• Homology is a qualitative concept (all-or-none)
• Homology is not precisely defined
[Figure: pterosaur, bat, bird. Source: http://upload.wikimedia.org/wikipedia/commons/3/38/Homology.jpg]
Steven M. Carr, 2009, http://www.mun.ca/biology/scarr/Molecular_Homology_&_Analogy.html

Subtypes of homology
• Three disjoint subtypes
– Orthology: Two homologous characters separated by a speciation event
– Paralogy: Two homologous characters arising from a duplication event
– Xenology: Two homologous characters whose history involves interspecies (horizontal) transfer of genetic material
[Figure labels: Speciation, Duplication, Horizontal transfer]
Walter M. Fitch, Trends in Genetics, 2000

The protein domain is a basic evolutionary module and an important unit of homology
• Definition: A polypeptide chain capable of autonomous folding
• Many proteins are multi-domain proteins
• Many domains are found in different contexts (domain shuffling)
• Exons in eukaryotic genomes often correspond to domains
• Therefore, protein classification schemes build on domains, not on entire proteins
Söding & Lupas, BioEssays, 2003

Assessment of homology in proteins
• Homology is assessed by comparing sequence, structure, and function
• Sequence similarity is the primary marker of homology
• Because protein structure space is relatively small, similar structures are more likely to originate by convergence
• However, structure diverges more slowly than sequence and therefore allows for the recognition of more distant relationships
• Functional residues within an active site are often the most highly conserved positions in a protein sequence
[Figure labels: Sequence, Structure, Function]

Information gathering and domain annotation for proteins
• Databases and servers
• Domain annotation

A variety of databases enable information gathering about your protein of interest
• Run by different research institutions
• Allow for free information retrieval for academic purposes
• The spectrum ranges from broad all-around databases (UniProt or NCBI) to databases that specialize in particular aspects (e.g. hierarchical structural classification)

The National Center for Biotechnology Information (NCBI) at the National Institutes of Health in the US
• “The NCBI advances science and health by providing access to biomedical and genomic information”
• Contains numerous popular resources
– PubMed (life science literature)
– Sequences (whole genomes to individual proteins)
– Gene expression data
– Taxonomy
– Numerous tools, most importantly BLAST for homology detection
→ A good starting point for an analysis

Protein classifications bring order to the tremendous diversity of proteins
• Sequence-based domain classifications (grouping is based on homology inferred from detectable sequence similarity):
– SMART: emphasizes signaling domains; fast
– Pfam: a comprehensive database to classify newly found domains into domain families
• Structure-based classification schemes:
– CATH: Class – Architecture – Topology – Homology
– SCOP: Structural Classification of Proteins; Class – Fold – Superfamily – Family
→ Homology is not a criterion at all levels of classification
– In contrast to cellular life, proteins are polyphyletic

Example 1: Annotate domains in LRRK2 (Human)
• Obtain the sequence in FASTA (1) format from the NCBI (2)
• Enter the name of the protein (LRRK2) in UniProt (3) and see all the information one can retrieve there
• Put the sequence into domain databases like SMART (4) or Pfam (5) and mark the identified domains in your log file
1) FASTA: a widely used plain text file format for sequence data
2) NCBI: google “ncbi” or http://www.ncbi.nlm.nih.gov/
3) UniProt: google “uniprot” or http://www.uniprot.org/
4) SMART: google “embl smart” or http://smart.embl-heidelberg.de/
5) Pfam: google “pfam” or http://pfam.sanger.ac.uk/

Example 2: Annotate domains in NarX (E. coli)
• …
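
Both examples start from a sequence in FASTA format, described above as a plain text format for sequence data. The sketch below shows one way such a file could be read before pasting the sequence into SMART or Pfam. It assumes the conventional layout (a header line starting with ">", followed by one or more sequence lines). The function name parse_fasta and the toy record are hypothetical and for illustration only; the sequence shown is not LRRK2 or NarX.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {header: sequence} dictionary.

    Assumes the conventional layout: each record starts with a line
    beginning with '>', and the following lines hold the sequence.
    """
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            if header in records:
                raise ValueError(f"duplicate header: {header!r}")
            records[header] = ""
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            # Sequence lines of one record are concatenated in order.
            records[header] += line
    return records


# Hypothetical toy record, NOT a real database entry.
example = """>toy_protein hypothetical example record
MKTAYIAKQR
QISFVKSHFS
"""

records = parse_fasta(example.splitlines())
for name, seq in records.items():
    print(f"{name}: {len(seq)} residues")
```

To use it on a downloaded file, the lines of the file can be passed in place of example.splitlines(), for instance by opening the file and passing the file object itself.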