Orthology, paralogy and GO annotation
Paul D. Thomas
SRI International

Outline
• Why does orthology matter to us?
• A little background on evolution, orthology and paralogy
• Practical considerations for RefGenome

Why does “orthology” matter to us?
• Goal
  – Identify genes in reference genomes that have the same or similar functions, so that comprehensive curation can be done simultaneously
• Why?
  – Different model organisms have different strengths for exploring different facets of gene function, and these can often inform each other
  – Most genes did not first evolve within a given extant species: they were INHERITED from a common ancestor. Genes in different organisms have similar functions because they were inherited and haven’t changed much since the common ancestor.

How do we identify genes with similar functions?
• Evolutionary analysis
• Where do orthologs fit in, and what do we mean by orthologs?
  – Simple answer: “The same gene in different organisms” (separated only by speciation)
    • Orthology = similar function
  – Unfortunately, the world is not that simple
    • Orthologous genes can have different functions
    • Paralogous genes (duplications) can have (to some extent at least) similar functions
  – Fortunately, a slightly more complicated view can get us much closer to addressing the question of gene function

Representing evolution of related genes
• Start with Darwin’s basic model:
  – Copying
    • An ancestral “species” “splits” into two separate species
  – Divergence
    • Each copy (species) changes independently over generations
      – NATURAL SELECTION: adaptation to different environment

Darwin’s species tree
• Number of generations/time along one axis
• Amount of divergence along other axis
• Characters in common are due to inheritance
  – Also tells us something about common ancestor

Representing evolution of related genes
• “Gene families”
• Add detail from population genetics/molecular evolution to apply to genes
  – Copying
    • An ancestral species “splits” into two separate species
      – SPECIATION
    • A gene is duplicated in one population and subsequently inherited
      – DUPLICATION
  – Divergence
    • Each copy (gene sequence) changes independently over generations
      – NATURAL SELECTION: sequence substitutions to adapt to new function/role
      – NEUTRAL DRIFT: accumulation of “neutral” substitutions

How does this relate to gene function?
• Copying
  – An ancestral species “splits” into two separate species
    • SPECIATION: likely to continue performing ancestral function
      – BUT not always
  – A gene is duplicated in one population and subsequently inherited
    • DUPLICATION: “redundant gene” free from previous constraints can adapt to a new function
      – BUT still inherits some aspects of ancestral function
• Divergence
  – Each “new” copy (gene sequence) changes independently over generations
    • NATURAL SELECTION: sequence substitutions adapt to new/modified function/role
    • NEUTRAL DRIFT: sequence changes from accumulation of “neutral” substitutions. This is the MAJOR source of sequence differences!

A gene tree
[Figure: gene tree with leaves E.c., A.t. MTHFR1, A.t. MTHFR2, D.d., S.p., S.c. MET13, S.p., S.c. MET12, C.e., D.m., A.g., D.r., G.g., H.s. MTHFR, R.n., M.m.]
• Only one “informative” axis: rate of sequence evolution
  – For neutral changes this can often act as a “molecular clock”
  – Non-neutral changes will speed up the rate of evolution

So what? Practical considerations

OrthoMCL “ortholog cluster”
[Figure: the same gene tree]
• An “ortholog cluster” is made by one or more “slices” through the protein family tree
• Some combination of evolutionary rates and history of duplications
• Might miss genes that could be efficiently annotated at the same time
• From a strict evolutionary standpoint, orthologs are separated ONLY by speciation events; TIGRFAMs has coined the term “equivalog” for functionally conserved groups

“ISS”
• Inference from sequence similarity
• A class of database search algorithm (e.g. BLAST) has become a metaphor
  – Implies “genes have similar functions because they have similar sequences”
  – Function is best determined using pairwise comparison

“ISS”
• More properly, ISS of function is inheritance!
  – “Related genes have a common function because their common ancestor had that function, which was inherited by its descendants”
  – ISS is not just a statement about one gene. It is also making assertions about
    • The common ancestor
    • Inheritance of a “character” by
      – Both “pairwise similar” descendants
      – Other descendants

Homology inference in a tree
inheritance and divergence of function
[Figure: gene tree with leaves E.c., A.t. MTHFR1, A.t. MTHFR2, D.d., S.p., S.c. MET13, S.p., S.c. MET12, C.e., D.m., A.g., D.r., G.g., H.s. MTHFR, R.n., M.m. Labels on the tree: “methylene tetrahydrofolate reductase activity” (m.f.); “methionine metabolic process” (b.p.)]

Homology inference in a tree
[Figure: the same gene tree. Labels on the tree: “methylene tetrahydrofolate reductase activity” (m.f.); “methionine metabolic process” (b.p.); NOT “methionine metabolic process” (b.p.); NOT “methylene tetrahydrofolate reductase activity” (m.f.)?; NOT “methionine metabolic process” (b.p.)?; “regulation of homocysteine metabolic process” (b.p.)]

Homology inference in a tree
[Figure: the same gene tree]
COMBINES:
1. Evolutionary information (tree)
2. Experimental knowledge (GO annotations from literature)
3. Organism-specific biological knowledge (curators)

This is just an easy, self-consistent way of doing ISS!
[Figure: the same gene tree]
We have a picture of ALL the relationships rather than N flat lists that need to be reconciled

Tree annotation tool for RefGenome
• Pre-computed, searchable “library” of gene trees
  – Include “outgroup” organisms to help infer evolutionary histories
  – Gene members in any tree can be modified by curator feedback
• Tool for viewing tree and selecting “homology group” to be annotated
• Tool for viewing tree labeled with in-depth GO annotations from all MODs, and inferring ancestral functions and homology annotations
• Homology annotations will be supported by a tree node as evidence; trees will be available to scientific community
• HMMs will be constructed to allow other genome projects to infer GO terms, distributed by InterPro
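
Illustrative sketch (not the RefGenome tool itself)
The idea of inferring ancestral functions and supporting each homology annotation with a tree node can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the Node class, the propagate function, the node names AN0/AN1, the genes geneA/geneB/geneC, and the placement of the GO terms on this toy tree are hypothetical and do not describe the MTHFR tree in the slides. A term asserted at an ancestral node is inherited by every descendant unless a node further down is marked as having lost it (a NOT annotation).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a gene tree: leaves are genes, internal nodes are ancestors."""
    name: str
    children: list[Node] = field(default_factory=list)
    # GO terms a curator asserts were present (gained) at this node.
    gained: set[str] = field(default_factory=set)
    # GO terms a curator asserts were lost at this node (a NOT annotation).
    lost: set[str] = field(default_factory=set)


def propagate(node, inherited=None, out=None):
    """Return {leaf name: {GO term: name of the tree node supporting it}}."""
    inherited = dict(inherited or {})   # copy, so sibling branches stay independent
    out = {} if out is None else out
    for term in node.lost:
        inherited.pop(term, None)       # a loss blocks inheritance below this node
    for term in node.gained:
        inherited[term] = node.name     # this node becomes the evidence
    if not node.children:
        out[node.name] = inherited      # leaf: record what it inherited
    for child in node.children:
        propagate(child, inherited, out)
    return out


MF = "methylene tetrahydrofolate reductase activity"
BP = "methionine metabolic process"
REG = "regulation of homocysteine metabolic process"

# Toy tree: the root ancestor AN0 has MF and BP; the sub-ancestor AN1
# loses BP and gains REG. This placement is invented for the example.
root = Node("AN0", gained={MF, BP}, children=[
    Node("geneA"),
    Node("AN1", lost={BP}, gained={REG}, children=[
        Node("geneB"),
        Node("geneC"),
    ]),
])

result = propagate(root)
for gene, terms in result.items():
    for term, evidence_node in sorted(terms.items()):
        print(f"{gene}\t{term}\tsupported by {evidence_node}")
```

Each inferred annotation carries the name of the ancestral node it was inherited from, which mirrors the slide’s point that a single tree replaces N flat lists that would otherwise need to be reconciled.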