Finding Orthologous Groups
René van der Heijden
NCMLS / CMBI, October 24th 2006

What is this lecture about?
• What is ‘orthology’?
• Why do we study gene ancestry / gene trees (phylogenies)?
• Several approaches to find orthologous genes
• High-resolution orthology
• Steps involved
• Things to think about (homework)

Homology
Genes are homologous if and only if they derive from the same ancestral gene
• Sufficient sequence similarity proves homology
• Very dissimilar sequences: PSI-BLAST, HMM searches

Homologous genes tend to have similar functions
[Figure; surviving label: "The usual range"]
Accurate function prediction requires something better than homology

Orthology
“This gene in that other species …”
We don’t have chicken genes!
• They mean: the corresponding gene
• Why that particular gene?
• Sure this actually is the gene?
• Sure that all n orthologs are correct?

Duplications, Speciations, and Orthology
Evolution results in:
• Growing number of genes
  – Gene duplications
  – Horizontal gene transfer
  – De novo generation
• Growing number of species
The fate of gene duplicates:
• Perish
• Find a new functional niche
Tendency for functional expansion

Duplications, Speciations, and Orthology
Two genes in two species are orthologous if they derive from one gene in their last common ancestor
• Orthologous genes are likely to have the same function
• Much stronger than “tend to have similar function”

Orthologous genes
[Animated figure; surviving labels, order partly uncertain: "a long long time ago in a land far far away"; "time"; "current set of genes"; "the line represents a gene"; "there is a speciation event"; "another speciation event"; "one of the genes gets duplicated"; "resulting in two paralogous genes within one of the new species"; "resulting in two species"; "the same, orthologous gene"; "but one of the paralogous genes is lost in some ancestral species"; "with apparent history"; "orthologs"; "paralogs"]

Duplications, Speciations, and Orthology
[Figure; surviving labels: "present genes", "primal ancestor", "evolutionary distance"]

Homologs, Orthologs, and Paralogs
• Homologous: one common ancestral gene
• Orthologous: separated by a speciation event
• Paralogous: separated by a duplication event
• Orthologs and paralogs must be homologs
The view on orthology and paralogy is relative to a certain speciation
Are there homologous genes which are neither orthologous nor paralogous?

Inparalogs and Outparalogs
• Both in- and outparalogous genes are separated by a gene duplication event
• For inparalogs, the duplication event is not followed by speciation(s)
• Outparalogs are separated by a duplication event, followed by speciation(s)
• Inparalogs are recent paralogs
• Outparalogs are more ancient paralogs
Are inparalogs orthologs? Depends on your definition:
  Yes: two genes are orthologous if they derive from one gene in the last common ancestor
  No: two genes are orthologous if they are only separated by cell division events

Reading Gene-Trees
Although genes spec1,1 and spec2,1 are closer relatives, their distance is larger than that between spec1,1 and spec3,1
The tree suggests at least 2 gene losses

In- and Outparalogs, Orthologs, and Co-orthologs
[Figure]

www = What, Why, and hoW?
• What: Orthologous genes are separated by cell division only
• Why: Orthologous genes are likely to have the same function
• How: Indeed, the “how” forms the remainder of this lecture

Several approaches
• The COG approach
• InParanoid
• Tree-based methods

COG approach
• Based on BLAST hits
• Establishment and extension of triangles
[Figure]

COG approach II
Extension of orthologous groups
[Figure]

InParanoid I
• Method denotes
  – In- and outparalogs
  – For TWO species
• Find all hits from species A on B
• Find all hits from species B on A
• Find all bi-directional best hits (BBH)
  – These form putative orthologs

InParanoid II
• Find all hits from A on A
• Find all hits from B on B
• Find all inparalogs
  – These are all hits better than the orthologs
  – Better => more recently split

Detecting orthologous genes
• Usual methods are based on BLAST hit quality, e.g. bi-directional best hit (BBH)
[Figure; surviving labels: "ortholog", "BBH"]

Genes with promiscuous domains
• Gene A may hit on gene B because of a shared domain X
• Gene B may hit on gene C because of a shared domain Y
• Promiscuous domains require (manual) curation

Tree-based methods
1. Get all homologous genes
2. Make multiple alignments
3. Generate phylogenetic gene trees
4. Analyze trees
• Uncertainty in multiple alignment?
• Different methods for distance calculations
• Superpose a trusted species tree?
• How to assess a level of accuracy?

The Phylogenetic Gene-Tree
• Multiple alignment for all genes
• Distance matrix calculation
  – Kimura correction
  – PAM model
  – Categories model
• Large trees: distance-based methods
  – Neighbor Joining

Uncertainty in trees
• Evolutionary noise
  – Differing rates of evolution
  – Convergent evolution (low complexity, coiled coils)
  – Promiscuous domains (recombination, fusion, fission)
• Use of heuristic methods
  – Multiple alignment
  – Tree making

Analyze trees … but don’t trust them fully
If this is correct … this can’t be
• Rigid analysis suggests many duplications and losses
• Presume scp branch is wrongly placed!

Analyze trees … but don’t trust them fully
• And if we accept wrong placement of branches …
  – Three orthologous groups, suggesting 15 gene losses
  – Considering one wrongly placed gene leaves only 2 gene losses
Horizontal gene transfer!
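
The "analyze trees" step of the tree-based approach can be illustrated with a small Python sketch. It uses the species-overlap rule, a common heuristic that the lecture itself does not name: an internal node is called a duplication if its two subtrees share at least one species, and a speciation otherwise. Two genes are then reported as orthologous when their last common node is a speciation. The nested-tuple tree format, the `species_copy` leaf names and the function names are assumptions made for this example only. As the slides warn, the labels are only as good as the tree: one wrongly placed branch is enough to produce spurious duplications and losses.

```python
from itertools import product


def species_of(leaf):
    # Assumed naming convention for this sketch: "<species>_<copy>", e.g. "spec1_1".
    return leaf.split("_")[0]


def leaves(node):
    """Return all leaf names below a node of a rooted binary tree (nested 2-tuples)."""
    if isinstance(node, str):
        return [node]
    left, right = node
    return leaves(left) + leaves(right)


def classify(tree):
    """Label every internal node and collect orthologous gene pairs.

    Species-overlap rule (a heuristic, not the lecture's own procedure):
    a node whose two subtrees share a species is a duplication,
    otherwise it is a speciation.
    """
    events = []        # (label, sorted leaf names), in pre-order
    orthologs = set()  # frozenset({gene_a, gene_b})

    def walk(node):
        if isinstance(node, str):
            return
        left, right = node
        left_leaves, right_leaves = leaves(left), leaves(right)
        shared = ({species_of(g) for g in left_leaves}
                  & {species_of(g) for g in right_leaves})
        label = "duplication" if shared else "speciation"
        events.append((label, sorted(left_leaves + right_leaves)))
        if label == "speciation":
            # Genes on opposite sides of a speciation node are orthologs.
            for a, b in product(left_leaves, right_leaves):
                orthologs.add(frozenset((a, b)))
        walk(left)
        walk(right)

    walk(tree)
    return events, orthologs


# Hypothetical gene tree: an ancient duplication followed by speciations,
# with the second copy lost in spec2.
example_tree = ((("spec1_1", "spec2_1"), "spec3_1"), ("spec1_2", "spec3_2"))
events, orthologs = classify(example_tree)

for label, members in events:
    print(label, members)
```

In this example the root is labelled a duplication, so `spec1_1` and `spec1_2` come out as paralogs, while each of the two subtrees forms its own orthologous group.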
Remember … “In- and Outparalogs, Orthologs, and Co-orthologs”

Levels of Orthology

High-res versus Low-res
• Many,
• Complete, and
• Closely related genomes
• Use phylogenetic trees

Challenge: Automatic Orthology assignment

Differential gene-loss

Things to think about (homework)
• Select a partner
• Collect a gene tree (and some copies)
• Carefully deduce which nodes are duplications and which are speciations
• Denote which genes are orthologous to each other (orthologous groups)
• Select interesting parts to predict what
  – The COG procedure would say
  – InParanoid would say
  – What would have happened if some genes (or species) were not involved in the analysis

Homework: also think about …
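
To predict what InParanoid "would say" for part of a tree, it can help to run the two steps from the InParanoid slides by hand on a small score table. The Python sketch below does this under simplifying assumptions: `example_scores` is a made-up dictionary of directional similarity scores (for instance BLAST bit scores) keyed by `(query, subject)`, gene names follow a hypothetical `species_copy` pattern, and the function names are invented here. It is not the real InParanoid program, which applies further cut-offs and confidence values.

```python
def species_of(gene):
    # Assumed naming convention for this sketch: "<species>_<copy>", e.g. "A_1".
    return gene.split("_")[0]


def best_hit(scores, query, target_species):
    """Return the highest-scoring hit of `query` in `target_species`, or None."""
    candidates = {
        subject: score
        for (q, subject), score in scores.items()
        if q == query and species_of(subject) == target_species
    }
    return max(candidates, key=candidates.get) if candidates else None


def bidirectional_best_hits(scores, species_a, species_b):
    """Pairs (a, b) where a's best hit in B is b and b's best hit in A is a."""
    genes_a = {q for (q, _) in scores if species_of(q) == species_a}
    pairs = []
    for a in sorted(genes_a):
        b = best_hit(scores, a, species_b)
        if b is not None and best_hit(scores, b, species_a) == a:
            pairs.append((a, b))
    return pairs


def inparalogs(scores, seed, ortholog):
    """Same-species hits of `seed` that score better than seed vs. its ortholog."""
    threshold = scores[(seed, ortholog)]
    own_species = species_of(seed)
    return sorted(
        subject
        for (q, subject), score in scores.items()
        if q == seed
        and subject != seed
        and species_of(subject) == own_species
        and score > threshold
    )


# Made-up scores for two species, A and B (illustration only).
example_scores = {
    ("A_1", "B_1"): 200, ("B_1", "A_1"): 195,
    ("A_2", "B_1"): 150, ("B_1", "A_2"): 140,
    ("A_1", "A_2"): 250, ("A_2", "A_1"): 250,
    ("A_3", "B_2"): 300, ("B_2", "A_3"): 300,
    ("A_1", "A_3"): 50,  ("A_3", "A_1"): 50,
}

bbh_pairs = bidirectional_best_hits(example_scores, "A", "B")
recent_copies = inparalogs(example_scores, "A_1", "B_1")
print(bbh_pairs)
print(recent_copies)
```

Here `A_2` is not part of any bi-directional best hit, but it scores higher against `A_1` than `A_1` does against its putative ortholog `B_1`, so it is reported as an inparalog of `A_1` ("better => more recently split").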