PrIME: Probabilistic Integrated Models of Evolution
Bengt Sennblad, Dept. Plant and Env. Sciences, GU and Stockholm Bioinformatics Center (SBC)

Outline
• PrIME models
– GEM
    • Gene Evolution Model
– SRT
    • Sequence evolution with Rates and Times
– GSR
    • Gene Sequence evolution with iid Rates
– HGM
    • Gene evolution in hybrid networks

Outline
• What?
• Why?
• When?
• How?
• So?

GEM: The Gene Evolution Model
Lars Arvestad, KTH
Ann-Charlotte Berggren, UU
Jens Lagergren, KTH
Bengt Sennblad

What?
• Gene (loci) tree evolution
– GEM: gene loci (Coalescence: gene alleles)
– Gene duplication and loss process
    • Models duplications/losses that are fixed in the population

What? The genetic mechanism
• Recombination errors
– Unequal crossing-over
– Tandem repeats
– Segmental duplication
• Retrotransposition
– mRNA → cDNA → insertion
– Loss of introns and regulatory regions
• Chromosome/genome duplication

What? The process
• Gene tree evolves inside species tree
[Figure: gene tree evolving inside a species tree; legend: speciation, duplication, loss]

Why?
Ohno, S. 1970, Evolution by gene duplication, Springer

Fitch's Definition of Orthology
• Fitch 1970
• Orthologs: LCA is a speciation
• Paralogs: LCA is a duplication
• Orthologs are more likely to share function
• Paralogs are more likely to have different functions
[Figure: gene tree for mouse, chicken, and frog with speciation and duplication vertices marked]

Why?
• Biological causes
– Duplication-loss
– Lateral gene transfer
– Allele sorting
• Methodological
– Systematic
– Stochastic
Mushegian et al. 1998. Genome Res

Reconciling given species and gene tree
[Figure: species tree with leaves A, B, C, D, E, F, G, H; gene tree with leaves A, B, C, A, B, C, D, E, E, F, F, G, H, H]

Reconciliation
• Reconciled tree (G, γ)
• Gene tree vertex on species tree edge is a duplication
• Gene tree vertex on species tree vertex is a speciation

When?
• Parsimony Reconciliation (MPR)
– Goodman et al. 1979, Page and co-workers (several papers), Guigo 1996, Hallett & Lagergren 2000

Most parsimonious reconciliation
• Goodman et al. 1979
[Figure: gene tree with leaves A, A, B, C, B, C reconciled step by step; speciation and duplication vertices marked]

When? (continued)
• Bootstrapped MPR
– Storm & Sonnhammer 2002, Zmasek & Eddy 2002

Why only MPR?
[Figure: alternative reconciliations of the same gene tree; speciation and duplication vertices marked]
• Intuition: depending on the gene family, we might believe more in some reconciliations than in others
• Probabilistic reconciliation

When? (continued)
• Probabilistic models
– Full reconciliation model: Arvestad et al. 2003, 2004
– Gene copy number: Hahn 2005, Csürös & Miklós 2006

The Gene Evolution Model
How?
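As a baseline for the probabilistic approach, the most parsimonious reconciliation from the "When?" slides can be sketched with the standard LCA mapping: each gene tree vertex is mapped to the last common ancestor of the species its children map to, and it is labelled a duplication when it maps to the same species vertex as one of its children. This is a minimal sketch, not the code of any PrIME program; the data layout (a child-to-parent dictionary for the species tree, nested tuples for the gene tree) and all names are assumptions made for illustration.

```python
def ancestors(node, parent):
    """Path from a species tree node up to the root, inclusive."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def lca(a, b, parent):
    """Last common ancestor of two species tree nodes."""
    seen = set(ancestors(a, parent))
    for node in ancestors(b, parent):
        if node in seen:
            return node
    raise ValueError("nodes are not in the same tree")


def mpr(gene_tree, parent, species_of, events=None):
    """LCA-mapping reconciliation of a binary gene tree.

    gene_tree  : a leaf name (str) or a pair (left, right) of subtrees
    parent     : child -> parent dictionary describing the species tree
    species_of : gene leaf name -> species leaf name
    Returns (species node the subtree root maps to, list of events),
    where events lists (species node, label) for internal gene
    vertices in postorder.
    """
    if events is None:
        events = []
    if isinstance(gene_tree, str):
        return species_of[gene_tree], events
    left, right = gene_tree
    map_left, _ = mpr(left, parent, species_of, events)
    map_right, _ = mpr(right, parent, species_of, events)
    here = lca(map_left, map_right, parent)
    # A vertex mapped to the same place as a child must be a duplication.
    label = "duplication" if here in (map_left, map_right) else "speciation"
    events.append((here, label))
    return here, events


# Toy example (invented): species tree ((A,B),C), gene tree ((a1,b1),(a2,c1)).
species_parent = {"A": "AB", "B": "AB", "AB": "ABC", "C": "ABC"}
gene_species = {"a1": "A", "b1": "B", "a2": "A", "c1": "C"}
root_map, toy_events = mpr((("a1", "b1"), ("a2", "c1")), species_parent, gene_species)
```

In the toy example the two cherries are speciations and the gene tree root, which maps to the same species vertex as its right child, is a duplication. MPR returns exactly one such labelling, which is the limitation the probabilistic model addresses.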
Generation vs reconstruction: the birth-death model
• Generation of data from model
– Example: BD(λ, μ) → tree T
– Repeated generation → distribution
– Statistical tests, e.g., 'parametric bootstrapping'
• Reconstruction
– Probability of given data
– Example: given T → Pr[T | λ, μ] under BD
– Compare different trees → 'reconstruction'

Birth-death process gives trees
[Figure: tree generated by a birth-death process over the time interval 0 to 1]

Extinct lineages
[Figure: birth-death tree with extinct lineages over the time interval 0 to 1]

Isomorphisms

Gene evolution model
• Gene-affecting events: losses, duplications, and speciation
• A gene tree evolves inside the species tree
1. A gene starts at time t before the root
2. Along an edge: linear birth-death process
3. At a speciation, each gene lineage splits into two
4. Finally, losses are pruned
• Biologically sound (Nei et al., 1997)

Generating a scenario
[Figure: a scenario generated step by step inside the species tree; legend: speciation, duplication, loss]

Reconciled tree (G, γ)
• Reconciliation: losses have been pruned

Reconstruction
• Reconstruction Pr[G, γ | S, λ, μ, t] is non-trivial
– BD over edges
– Ghosts
– Isomorphisms
– Sum over scenarios
• No. of scenarios is exponential in tree size
– Dynamic programming (DP)
– Efficient algorithm

GEM
• Generation relatively simple
• Probability of reconciled tree
• Probability of gene tree
• Posterior of reconciliation
• Max likelihood reconciliation

Example application: Probabilistic orthology analysis
Sennblad, Lagergren (submitted)

MHC example: MPR orthology
[Figure: MPR orthology for the MHC example, panel 1(b)]

Three other reconciliations

Reconciliation probabilities
[Figure: reconciliation probabilities, panels 2 and 1(b)]

Posterior & posterior mean
• Modification of DP for Pr[G | S, λ, μ]
• MCMC

Speciation probabilities

ABCA speciation probabilities

MHC birth-death parameter posterior

When is MP-reconciliation incorrect?
• 1000 (G, γ) per square
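The four generative steps of the gene evolution model can be sketched as a small simulation. This is a simplified illustration under stated assumptions, not the PrIME implementation: it tracks only the number of gene copies on each species edge rather than the full gene tree topology, the species tree layout (a dictionary of node → (edge length, children)) and every function name are hypothetical, and the linear birth-death process is simulated with a plain Gillespie loop.

```python
import random


def birth_death_count(n, lam, mu, duration, rng):
    """Linear birth-death process along one species edge.

    Starts with n gene lineages; each lineage duplicates at rate lam
    and is lost at rate mu. Returns the lineage count after `duration`.
    """
    elapsed = 0.0
    while n > 0:
        total_rate = n * (lam + mu)
        if total_rate == 0:
            break  # no events can occur
        elapsed += rng.expovariate(total_rate)
        if elapsed > duration:
            break  # next event falls beyond the end of the edge
        if rng.random() < lam / (lam + mu):
            n += 1  # duplication
        else:
            n -= 1  # loss
    return n


def simulate_copies(tree, node, n, lam, mu, rng, counts=None):
    """Propagate gene copies down the species tree.

    tree : node -> (length of the edge leading to node, list of children)
    The root's edge length plays the role of t, the time before the root
    at which the single gene starts (step 1). Returns leaf -> copy count;
    lost lineages are simply absent from the counts (step 4).
    """
    if counts is None:
        counts = {}
    length, children = tree[node]
    n = birth_death_count(n, lam, mu, length, rng)  # step 2
    if not children:
        counts[node] = n
    for child in children:
        # Step 3: at a speciation every lineage enters both child edges.
        simulate_copies(tree, child, n, lam, mu, rng, counts)
    return counts


# Toy species tree ((A,B),C) with invented edge lengths.
toy_tree = {
    "ABC": (0.5, ["AB", "C"]),
    "AB": (1.0, ["A", "B"]),
    "A": (1.0, []),
    "B": (1.0, []),
    "C": (2.0, []),
}
copies = simulate_copies(toy_tree, "ABC", 1, lam=0.6, mu=0.4, rng=random.Random(1))
```

Repeating the call with fresh random seeds gives a distribution over copy-number profiles, which is the "repeated generation → distribution" idea from the generation vs reconstruction slide.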
Performance of probabilistic orthology analysis
• Draw from posterior distribution
– Biological realism
• Generate synthetic (G, γ)
– Speciations are positives
• Analyze
– Classify genes as duplication/speciation based on probability and threshold
• ROC curves
– Sensitivity = TP/(TP+FN)
– Specificity = TN/(TN+FP)

ROC for MHC-like data
[Figure: ROC curve for speciation classification; Y axis: sensitivity, X axis: specificity]

ROC for ABCA-like data
[Figure: ROC curve for speciation classification; Y axis: sensitivity, X axis: specificity]

Programs
• primeGEM
– Probability of gene tree
– Orthology analysis
• xprimeGEM
– Probability of reconciled tree
– Posterior of reconciliation
• xprimeGEM-max
– Max likelihood reconciliation
• xprimeGEM-enum
– Enumerates all reconciliations with their probabilities

'Comparison'
• Coalescence: allele evolution
– Alleles evolve by random sampling from parental population
• Gene duplication and loss: gene family evolution
– Gene copies are created by duplication events
– Gene copies are lost by loss events
– Underlying pure birth process
– Underlying birth-death process

Later similar work: gene copy number

Later similar work: coalescence
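The sensitivity and specificity definitions from the performance slide can be sketched as a threshold sweep over speciation probabilities, with speciations as positives. This is an illustrative sketch only: the function name, the convention that a vertex is called a speciation when its probability is at least the threshold, and the toy numbers are all assumptions, not results from the talk.

```python
def roc_points(probs, is_speciation, thresholds):
    """Sensitivity and specificity at each threshold.

    probs         : estimated speciation probability per gene tree vertex
    is_speciation : true label per vertex (True = speciation, the positive class)
    Returns a list of (threshold, sensitivity, specificity).
    """
    points = []
    for threshold in thresholds:
        tp = fp = tn = fn = 0
        for prob, truth in zip(probs, is_speciation):
            predicted = prob >= threshold
            if predicted and truth:
                tp += 1
            elif predicted and not truth:
                fp += 1
            elif truth:
                fn += 1
            else:
                tn += 1
        # Sensitivity = TP/(TP+FN), Specificity = TN/(TN+FP)
        sensitivity = tp / (tp + fn) if tp + fn else float("nan")
        specificity = tn / (tn + fp) if tn + fp else float("nan")
        points.append((threshold, sensitivity, specificity))
    return points


# Toy values invented for illustration.
toy_probs = [0.9, 0.8, 0.4, 0.2]
toy_truth = [True, True, False, True]
toy_curve = roc_points(toy_probs, toy_truth, [0.0, 0.5, 1.0])
```

Plotting sensitivity on the Y axis against specificity on the X axis for a fine grid of thresholds gives curves in the orientation used on the ROC slides.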