FOG: High-Resolution Fungal Orthologous Groups
René van der Heijden
Project 5.10: Comparative genomics for the prediction of protein function and pathways in Saccharomyces cerevisiae

What is this presentation about?
• What is ‘orthology’?
• Why do we study gene ancestry and gene trees (phylogenies)?
• Why high-resolution orthology?
• Automated high-resolution orthology detection
• The FOG database and some applications

Orthology
• “This gene in that other species …”
• We don’t have chicken genes!
• They mean: the corresponding gene? Why that particular gene?
• Are we sure this actually is the gene?
• Are we sure that all n orthologs are correct?

Orthologous genes
[Figure: gene history illustrating orthologs and paralogs over time. A set of ancestral genes goes through speciation events; one gene is duplicated, giving two paralogous genes within one species; a later speciation gives two species with the same, orthologous gene; one of the paralogs is lost in an ancestral species, so the apparent history differs from the real one.]

Duplications, Speciations, and Orthology
• Two genes in two species are orthologous if they derive from one gene in their last common ancestor
• Orthologous genes are likely to have the same function

Detecting orthologous genes
• Usual methods are based on BLAST hit quality, e.g. the bi-directional best hit (BBH)
[Figure: two genes that are each other’s best BLAST hit (BBH) are taken as orthologs]

KOG clusters
• Based on triangles of BBHs between genes of three species
• In-paralogs are added
• Triangles are extended by other genes and other species

KOG statistics
Low resolution: there must be functional specialization within these clusters!
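The BBH criterion and the triangle that seeds a KOG cluster can be sketched in Python. This is a minimal illustration under stated assumptions, not the KOG procedure itself: the score dictionary, the “species|gene” identifiers, the species codes and all function names are hypothetical, ties between equal scores are ignored, and the in-paralog and extension steps are left out.

```python
from __future__ import annotations

from itertools import combinations

# Hypothetical input format: hits[(query, subject)] = BLAST bit score,
# with gene identifiers written as "species|gene".
Hits = dict[tuple[str, str], float]


def species_of(gene: str) -> str:
    """Species code of an identifier of the assumed form 'species|gene'."""
    return gene.split("|", 1)[0]


def best_hits(hits: Hits, query_species: str, target_species: str) -> dict[str, str]:
    """Map each gene of query_species to its highest-scoring gene in target_species."""
    best: dict[str, tuple[float, str]] = {}
    for (query, subject), score in hits.items():
        if species_of(query) != query_species or species_of(subject) != target_species:
            continue
        # On ties the first hit seen is kept; a real pipeline needs an explicit tie rule.
        if query not in best or score > best[query][0]:
            best[query] = (score, subject)
    return {query: subject for query, (_, subject) in best.items()}


def bidirectional_best_hits(hits: Hits, a: str, b: str) -> set[frozenset[str]]:
    """Gene pairs (one from species a, one from species b) that are each other's best hit."""
    a_to_b = best_hits(hits, a, b)
    b_to_a = best_hits(hits, b, a)
    return {frozenset((ga, gb)) for ga, gb in a_to_b.items() if b_to_a.get(gb) == ga}


def bbh_triangles(hits: Hits, species: list[str]) -> set[frozenset[str]]:
    """Gene triples from three species that are pairwise BBHs (a KOG-style seed)."""
    bbh = {(s, t): bidirectional_best_hits(hits, s, t) for s, t in combinations(species, 2)}
    triangles: set[frozenset[str]] = set()
    for s1, s2, s3 in combinations(species, 3):
        for pair in bbh[(s1, s2)]:
            g1 = next(g for g in pair if species_of(g) == s1)
            g2 = next(g for g in pair if species_of(g) == s2)
            for pair13 in bbh[(s1, s3)]:
                if g1 not in pair13:
                    continue
                (g3,) = pair13 - {g1}
                # Close the triangle only if the third edge is also a BBH.
                if frozenset((g2, g3)) in bbh[(s2, s3)]:
                    triangles.add(frozenset((g1, g2, g3)))
    return triangles


# Toy scores: one gene per species plus a weaker hit to a second "spo" gene.
hits = {
    ("sce|A", "spo|A"): 300.0, ("spo|A", "sce|A"): 300.0,
    ("sce|A", "ncr|A"): 250.0, ("ncr|A", "sce|A"): 250.0,
    ("spo|A", "ncr|A"): 280.0, ("ncr|A", "spo|A"): 280.0,
    ("sce|A", "spo|B"): 120.0, ("spo|B", "sce|A"): 120.0,
}
print([sorted(t) for t in bbh_triangles(hits, ["sce", "spo", "ncr"])])
# [['ncr|A', 'sce|A', 'spo|A']]
```

The weaker hit to “spo|B” does not become a BBH, which is the point of requiring the best hit in both directions.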
These large KOG clusters must have multiple representatives per species.

High-res versus Low-res
• Many,
• Complete, and
• Closely related genomes

Challenge: Automatic Orthology assignment

Gene Families
• Use PSI-BLAST to recognize (distant) homologs
• Split the gene set into families of homologous genes

Challenge: Promiscuous domains
Multi-domain genes occur very often in eukaryotic genomes

Gene Families
• Promiscuous domains cause genes to be only partially homologous:
  – Gene A-B is partially homologous to gene A-C, and gene B-C is partially homologous to both
• Merging everything with homologous parts generates far too large gene families:
  – It is then not possible to obtain proper multiple alignments
• A more advanced technique is needed for separating multidomain genes into gene families

Generating Gene Families
• The more advanced technique for merging genes into gene families is not functional yet
• Fall back on ‘known’ gene families using KOG:
  – Low-resolution orthology assignments for eukaryotes
  – Some inclusive families with many genes per species

Some statistics:
• 15 fungal species with 104,440 genes in total
• Divided into 11,020 KOG clusters (gene families)
• Involving 70,867 genes (= 68%)

Uncertainty in trees
• Evolutionary noise
  – Differing rates of evolution
  – Convergent evolution (low complexity, coiled coils)
  – Promiscuous domains (recombination, fusion, fission)
• Use of heuristic methods
  – Multiple alignment
  – Tree making

Reading Gene-Trees
Although genes spec1,1 and spec2,1 are closer relatives, their distance is larger than that between spec1,1 and spec3,1.
The tree suggests at least 2 gene losses.

Analyze trees … but don’t trust them fully
If this is correct … this can’t be
• Rigid analysis suggests many duplications and losses
• Presume the scp branch is wrongly placed!
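Reading a gene tree amounts to labelling each internal node as a speciation or a duplication. The sketch below uses the simple species-overlap rule (a node is a duplication when both of its subtrees contain a gene from the same species) as a stand-in; it is not claimed to be how LOFT works, it does not count losses (that needs a species tree), and the nested-tuple tree format, species codes and function names are assumptions. Following the orthology definition above, two genes are called orthologous when their last common ancestor node is a speciation. Because the labels follow the topology exactly, a wrongly placed branch can introduce spurious duplications.

```python
from __future__ import annotations

from typing import Tuple, Union

# Assumed tree format: a leaf is a gene id "species|gene"; an internal node is a
# (left, right) tuple. Real trees would come from multiple alignment + tree making.
Tree = Union[str, Tuple["Tree", "Tree"]]


def species_of(gene: str) -> str:
    return gene.split("|", 1)[0]


def leaves(tree: Tree) -> list[str]:
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


def is_duplication(node: Tuple[Tree, Tree]) -> bool:
    """Species-overlap rule: both subtrees contain a gene from the same species."""
    left, right = node
    return bool({species_of(g) for g in leaves(left)} & {species_of(g) for g in leaves(right)})


def count_duplications(tree: Tree) -> int:
    if isinstance(tree, str):
        return 0
    left, right = tree
    return int(is_duplication(tree)) + count_duplications(left) + count_duplications(right)


def orthologous_pairs(tree: Tree) -> set[frozenset[str]]:
    """Gene pairs whose last common ancestor node is labelled a speciation."""
    if isinstance(tree, str):
        return set()
    left, right = tree
    pairs = orthologous_pairs(left) | orthologous_pairs(right)
    if not is_duplication(tree):
        # Every left/right leaf pair has this node as its last common ancestor.
        pairs |= {frozenset((a, b)) for a in leaves(left) for b in leaves(right)}
    return pairs


# Toy tree: a duplication at the root, followed by a speciation in each copy.
tree = (("sce|X1", "kla|X"), ("sce|X2", "spo|X"))
print(count_duplications(tree))  # 1
print(sorted(sorted(p) for p in orthologous_pairs(tree)))
# [['kla|X', 'sce|X1'], ['sce|X2', 'spo|X']]
```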
Analyze trees … but don’t trust them fully
• And if we accept wrong placement of branches …
• Three orthologous groups, suggesting 15 gene losses
• Considering one wrongly placed gene leaves only 2 gene losses

Automatic Orthology assignment
• LOFT: Levels of Orthology From Trees

Result
• The collection of genes is split into KOG families
• KOG families are aligned and phylogenetic trees are derived
• Phylogenetic trees are analyzed using LOFT, resulting in high-resolution orthology

Result
Can LOFT be trusted? It seems okay!

Applications
• We now have FOG: a complete set of high-resolution orthology assignments for fungi
• We ‘know’ which orthologous genes are present and absent in which species
• Phyletic distribution

Complex I

Phyletic distribution of mitochondrial orthologous groups

Phylogenetic Tree for Mitochondrial Carrier Proteins
Orthologous group 24 is an uncharacterized mitochondrial carrier.
In yeast this is known as YMC1, of unknown function.
It is present in all fungi except Ashbya gossypii.

YMC1: predicted glycine/serine antiporter
• There are three S. cerevisiae genes with the same phyletic distribution:
  – a subunit of glycine decarboxylase
  – another subunit of glycine decarboxylase
  – a gene with unknown function
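Finding genes with the same phyletic distribution as YMC1 is a comparison of presence/absence profiles over the species. The sketch below assumes exact profile matching; the species codes (with “ago” standing for Ashbya gossypii), group names and presence sets are invented toy data for illustration, not FOG results.

```python
from __future__ import annotations

# Hypothetical species codes; "ago" stands for Ashbya gossypii in this sketch.
SPECIES = ("sce", "kla", "ago", "spo", "ncr")


def phyletic_profile(present: set[str], species: tuple[str, ...] = SPECIES) -> tuple[bool, ...]:
    """Presence/absence vector of an orthologous group over a fixed species order."""
    return tuple(s in present for s in species)


def same_distribution(groups: dict[str, set[str]], query: str,
                      species: tuple[str, ...] = SPECIES) -> list[str]:
    """Names of the other groups whose phyletic profile exactly equals that of `query`."""
    target = phyletic_profile(groups[query], species)
    return sorted(name for name, present in groups.items()
                  if name != query and phyletic_profile(present, species) == target)


# Toy presence sets, invented for illustration (not FOG data).
all_but_ago = set(SPECIES) - {"ago"}
groups = {
    "OG24_YMC1": all_but_ago,
    "decarboxylase_subunit_1": all_but_ago,
    "decarboxylase_subunit_2": all_but_ago,
    "unknown_function": all_but_ago,
    "present_everywhere": set(SPECIES),
    "sce_only": {"sce"},
}
print(same_distribution(groups, "OG24_YMC1"))
# ['decarboxylase_subunit_1', 'decarboxylase_subunit_2', 'unknown_function']
```

Exact matching is the strictest choice; with more genomes, or with uncertain gene absences, a tolerant comparison such as a small Hamming distance between profiles may be more practical.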