PrIME
Probabilistic Integrated Models of Evolution
Bengt Sennblad,
Dept. Plant and Env. Sciences, GU
and Stockholm Bioinformatics Center (SBC)
Outline
• PrIME models
– GEM
• Gene Evolution Model
– SRT
• Sequence evolution with Rates and Times
– GSR
• Gene Sequence evolution with iid Rates
– HGM
• Gene evolution in hybrid networks
Outline
• What?
• Why?
• When?
• How?
• So?
GEM
The Gene Evolution Model
Lars Arvestad, KTH
Ann-Charlotte Berggren, UU
Jens Lagergren, KTH
Bengt Sennblad
What?
• Gene (locus) tree evolution
– GEM: gene loci (cf. coalescence: gene alleles)
– Gene duplication and loss process
• Models duplications/losses that have become fixed in the population
What?
– genetic mechanisms
• Recombination errors
– Unequal crossing-over
– Tandem repeats
– Segmental duplication
• Retrotransposition
– mRNA → cDNA → insertion
– Loss of introns and regulatory regions
• Chromosome/genome duplication
What?
– the process
• Gene tree evolves inside species tree
[Figure: gene tree evolving inside a species tree; legend: speciation, duplication, loss]
Why?
Ohno, S. 1970. Evolution by Gene Duplication. Springer.
Fitch’s Definition of Orthology
[Figure: gene tree of mouse, chicken and frog genes with speciations and duplications marked]
• Fitch 1970
– Orthologs: the LCA (lowest common ancestor) of the two genes is a speciation
– Paralogs: the LCA of the two genes is a duplication
• Orthologs are more likely to share function
• Paralogs are more likely to have different functions
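Fitch's definition becomes a simple classification rule once the internal vertices of a gene tree carry speciation/duplication labels. The minimal Python sketch below assumes such a labelled tree is already available; the `GeneNode` class and the function names are hypothetical, chosen only for illustration.

```python
class GeneNode:
    """Vertex of a rooted gene tree; internal vertices carry an event label."""

    def __init__(self, name, event=None, children=()):
        self.name = name
        self.event = event  # "speciation" or "duplication"; None for leaves
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self


def path_to_root(node):
    """Return the vertex followed by all of its ancestors up to the root."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def lca(a, b):
    """Lowest common ancestor of two vertices of the same tree."""
    ancestors_of_a = {id(n) for n in path_to_root(a)}
    for n in path_to_root(b):
        if id(n) in ancestors_of_a:
            return n
    raise ValueError("vertices are not in the same tree")


def fitch_relation(a, b):
    """Orthologs if the LCA is a speciation, paralogs if it is a duplication."""
    event = lca(a, b).event
    if event == "speciation":
        return "orthologs"
    if event == "duplication":
        return "paralogs"
    raise ValueError(f"LCA has no speciation/duplication label: {event!r}")


# Hypothetical example: a duplication, then a mouse/chicken speciation in each copy.
mouse1, chicken1 = GeneNode("mouse1"), GeneNode("chicken1")
mouse2, chicken2 = GeneNode("mouse2"), GeneNode("chicken2")
root = GeneNode("root", "duplication", [
    GeneNode("s1", "speciation", [mouse1, chicken1]),
    GeneNode("s2", "speciation", [mouse2, chicken2]),
])
print(fitch_relation(mouse1, chicken1))  # orthologs
print(fitch_relation(mouse1, chicken2))  # paralogs
```

Note that mouse1 and chicken2 are paralogs even though they come from different species: only the event at their LCA matters.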
Why?
• Biological causes
– Duplication-loss
– Lateral gene transfer
– Allele sorting
• Methodological
– Systematic
– Stochastic
Mushegian et al. 1998. Genome Res
Reconciling given species and gene trees
[Figure: species tree with leaves A–H; gene tree with leaves A, B, C, A, B, C, D, E, E, F, F, G, H, H]
Reconciliation γ → reconciled tree (G, γ)
Gene tree vertex on a species tree edge is a duplication
Gene tree vertex on a species tree vertex is a speciation
When?
• Parsimony Reconciliation (MPR)
– Goodman et al. 1979; Page and co-workers, several papers; Guigo 1996; Hallett & Lagergren 2000
Most parsimonious reconciliation
[Figure: most parsimonious reconciliation of a gene tree with leaves A, A, B, C, B, C; speciations and duplications marked]
Goodman et al. 1979
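A common way to compute the most parsimonious reconciliation is the LCA mapping: each gene tree vertex is mapped to the lowest species tree vertex whose clade contains all species below it, and a vertex that maps to the same species vertex as one of its children is a duplication. The sketch below is a minimal illustration, assuming binary trees written as nested Python tuples, unique gene names, and a hypothetical `species_of` helper that reads the species off a gene name. It labels events but does not count losses.

```python
def clades(tree, out):
    """Append the leaf set of every vertex of a nested-tuple tree to `out` (post-order)."""
    if isinstance(tree, str):
        clade = frozenset([tree])
    else:
        clade = frozenset().union(*(clades(child, out) for child in tree))
    out.append(clade)
    return clade


def lca_reconcile(gene_tree, species_tree, species_of):
    """LCA mapping: returns {internal gene vertex: "speciation" | "duplication"}."""
    species_clades = []
    clades(species_tree, species_clades)

    def lca_of(species):
        # Species clades containing `species` form a chain; the smallest is the LCA.
        return min((c for c in species_clades if species <= c), key=len)

    events = {}

    def walk(g):
        if isinstance(g, str):
            return lca_of(frozenset([species_of(g)]))
        child_maps = [walk(child) for child in g]
        mapped = lca_of(frozenset().union(*child_maps))
        # Same species vertex as a child: the vertex sits on an edge (duplication).
        events[g] = "duplication" if mapped in child_maps else "speciation"
        return mapped

    walk(gene_tree)
    return events


# Hypothetical example: gene names start with their species letter.
species_tree = (("A", "B"), "C")
gene_tree = (("A1", "B1"), ("A2", "C1"))
events = lca_reconcile(gene_tree, species_tree, species_of=lambda gene: gene[0])
for vertex, event in events.items():
    print(vertex, event)
```

In this example the root of the gene tree maps to the root of the species tree, as does its child ("A2", "C1"), so the root is labelled a duplication.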
When?
• Parsimony Reconciliation (MPR)
– Goodman et al. 1979; Page and co-workers, several papers; Guigo 1996; Hallett & Lagergren 2000
• Bootstrapped MPR
– Storm & Sonnhammer 2002, Zmasek & Eddy 2002
Why only MPR?
[Figure: alternative reconciliations of the same gene tree; speciations and duplications marked]
Intuition: depending on the gene family, we might believe more in some reconciliations than in others
→ Probabilistic reconciliation
When?
• Parsimony Reconciliation (MPR)
– Goodman et al. 1979; Page and co-workers, several papers; Guigo 1996; Hallett & Lagergren 2000
• Bootstrapped MPR
– Storm & Sonnhammer 2002, Zmasek & Eddy 2002
• Probabilistic models
– Full reconciliation model: Arvestad et al. 2003, 2004
– Gene copy number: Hahn 2005, Csürös & Miklós 2006
The Gene Evolution Model
How?
Generation vs reconstruction
– the birth-death model
• Generation of data from the model
– Example: BD(λ, μ) → tree T
– Repeated generation → distribution
– Statistical tests, e.g., 'parametric bootstrapping'
Birth-death process gives trees
[Figure: tree generated by a birth-death process; time axis from 0 to 1]
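To make the 'generation' direction concrete, here is a minimal simulation of a linear birth-death process BD(λ, μ), written as a sketch rather than PrIME code. Each of the n living lineages splits at rate λ and dies at rate μ, so the next event occurs after an exponential waiting time with rate n(λ + μ). The function name and the event format are hypothetical, and λ + μ > 0 is assumed. Repeating the simulation gives an empirical distribution, here of the number of surviving lineages.

```python
import math
import random


def simulate_bd(lam, mu, t_end, rng):
    """Simulate BD(lam, mu) from one lineage at time 0 up to time t_end.
    Returns the events [(time, "birth" | "death", lineage_id, daughter_ids)]
    and the ids of the lineages still alive at t_end."""
    alive = [0]
    next_id = 1
    t = 0.0
    events = []
    while alive:
        # With n lineages alive, the next event happens at total rate n * (lam + mu).
        t += rng.expovariate(len(alive) * (lam + mu))
        if t > t_end:
            break
        lineage = rng.choice(alive)
        alive.remove(lineage)
        if rng.random() < lam / (lam + mu):
            # birth: the lineage is replaced by two daughter lineages
            daughters = (next_id, next_id + 1)
            next_id += 2
            alive.extend(daughters)
            events.append((t, "birth", lineage, daughters))
        else:
            # death: the lineage goes extinct
            events.append((t, "death", lineage, ()))
    return events, alive


# Repeated generation -> empirical distribution of the number of survivors at t = 1.
rng = random.Random(1)
sizes = [len(simulate_bd(1.0, 0.5, 1.0, rng)[1]) for _ in range(10000)]
print(sum(sizes) / len(sizes), math.exp(1.0 - 0.5))  # sample mean vs. e^((lam - mu) t)
```

Because each birth records its two daughter ids, the generated tree can be rebuilt from the event list. A sample mean close to e^((λ - μ)t), the expected number of lineages under the linear birth-death process, is a quick sanity check.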
Generation vs reconstruction
– the birth-death model
• Generation of data from the model
– Example: BD(λ, μ) → tree T
– Repeated generation → distribution
– Statistical tests, e.g., 'parametric bootstrapping'
• Reconstruction: probability of given data
– Example: given T, compute Pr[T | λ, μ] under BD
– Compare different trees → 'reconstruction'
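The slides' Pr[T | λ, μ] is the probability of a whole tree. As a simpler illustration of the same 'probability of given data' direction, the sketch below evaluates the classical closed-form distribution of the number of lineages N(t) of a linear birth-death process started from one lineage. Here p0 is the probability that the lineage and all its descendants are extinct by time t. The function names are hypothetical, and λ = μ is handled by its limiting form.

```python
import math


def bd_p0_u(lam, mu, t):
    """For one lineage over time t: p0 = Pr[N(t) = 0] and u such that
    Pr[N(t) = n] = (1 - p0) * (1 - u) * u**(n - 1) for n >= 1."""
    if math.isclose(lam, mu):
        p0 = u = lam * t / (1.0 + lam * t)
        return p0, u
    e = math.exp(-(lam - mu) * t)
    denom = lam - mu * e
    return mu * (1.0 - e) / denom, lam * (1.0 - e) / denom


def bd_prob_n(n, lam, mu, t):
    """Pr[N(t) = n | lam, mu]: probability of observing exactly n lineages at time t."""
    p0, u = bd_p0_u(lam, mu, t)
    if n == 0:
        return p0
    return (1.0 - p0) * (1.0 - u) * u ** (n - 1)


# 'Reconstruction' in miniature: the same observation (3 lineages at t = 1)
# is more or less probable under different parameter values.
for lam, mu in [(1.0, 0.5), (2.0, 0.5), (0.5, 0.5)]:
    print(lam, mu, bd_prob_n(3, lam, mu, 1.0))
```

Comparing such probabilities across (λ, μ), or across candidate trees in the full model, is what the slide calls reconstruction.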
Extinct lineages
[Figure: birth-death tree including extinct lineages; time axis from 0 to 1]
Isomorphisms
Gene evolution model
• Gene-affecting events: losses, duplications, and speciations
• A gene tree evolves inside the species tree
1. A gene starts at time t before the root
2. Along each edge, gene copies evolve by a linear birth-death process (birth = duplication, death = loss)
3. At a speciation, each gene lineage splits into two
4. Finally, losses are pruned
• Biologically sound (Nei et al. 1997)
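The four generation steps can be sketched as a recursive simulation. This is a minimal illustration under stated assumptions, not the PrIME implementation. The species tree is a hypothetical binary `(name, [(child, edge_length), ...])` structure. The duplication rate λ and loss rate μ are the same on every edge (λ + μ > 0). Losses are pruned on the fly by dropping lost copies and suppressing vertices left with a single child.

```python
import random


def join(event, left, right):
    """Prune losses: drop lost copies and suppress vertices left with one child."""
    if left is None:
        return right
    if right is None:
        return left
    return (event, left, right)


def evolve(node, remaining, lam, mu, rng):
    """Evolve one gene copy with `remaining` time left on the edge above species `node`.
    Returns a pruned gene tree (species name at leaves, (event, left, right) inside),
    or None if every descendant copy was lost."""
    # Copies evolve independently, so this copy's next event is Exp(lam + mu) away.
    wait = rng.expovariate(lam + mu)
    if wait < remaining:
        if rng.random() < lam / (lam + mu):
            # duplication: both copies continue along the same species edge
            return join("duplication",
                        evolve(node, remaining - wait, lam, mu, rng),
                        evolve(node, remaining - wait, lam, mu, rng))
        return None  # loss
    name, children = node
    if not children:
        return name  # the copy survives in an extant species
    # speciation: the copy splits into one copy per daughter species edge
    (left, t_left), (right, t_right) = children
    return join("speciation",
                evolve(left, t_left, lam, mu, rng),
                evolve(right, t_right, lam, mu, rng))


def simulate_gem(species_root, t, lam, mu, rng):
    """Step 1: the gene starts at time t before the root of the species tree."""
    return evolve(species_root, t, lam, mu, rng)


# Hypothetical ultrametric species tree ((A, B), C) with edge lengths.
species = ("root", [(("AB", [(("A", []), 1.0), (("B", []), 1.0)]), 0.5),
                    (("C", []), 1.5)])
print(simulate_gem(species, 0.5, 0.8, 0.4, random.Random(7)))
```

With very small rates the simulated gene tree simply copies the species tree topology; larger λ adds duplications, and larger μ prunes more copies, possibly the whole family (returned as None).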
Generating a scenario
[Figure: step-by-step generation of a gene tree inside the species tree; legend: speciation, duplication, loss]
Reconciled tree (G, )
Reconciliation – losses have been pruned
÷
Reconstruction
• Reconstruction Pr[G,|S, l, m, t]
non-trivial
–
–
–
–
BD over edges
Ghosts
Isomorphisms
Sum over scenarios
• No. scenarios is exponential in
tree size
– Dynamic programming (DP)
– Efficient algorithm
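One ingredient of such a computation is the probability that a gene copy leaves no descendants in the extant species. These are lineages that are never observed, which is presumably what 'Ghosts' refers to. This probability can be computed bottom-up over the species tree, visiting each vertex once, which gives a flavour of the dynamic programming mentioned above. The sketch covers only that ingredient, not the full algorithm for Pr[G, γ | S, λ, μ, t]. It assumes the same rates on every edge, a binary species tree written as a hypothetical `(name, [(child, edge_length), ...])` structure, and hypothetical function names.

```python
import math


def bd_p0_u(lam, mu, t):
    """Linear birth-death quantities for one lineage over time t: p0 = Pr[N(t) = 0]
    and u such that Pr[N(t) = n] = (1 - p0) * (1 - u) * u**(n - 1) for n >= 1."""
    if math.isclose(lam, mu):
        p0 = u = lam * t / (1.0 + lam * t)
        return p0, u
    e = math.exp(-(lam - mu) * t)
    denom = lam - mu * e
    return mu * (1.0 - e) / denom, lam * (1.0 - e) / denom


def lost_from_edge_top(node, t, lam, mu):
    """Probability that one copy at the top of the edge (length t) above `node` leaves
    no extant descendants. Each of the N(t) copies reaching the bottom is lost
    independently with probability s, so this is the generating function of N(t) at s."""
    p0, u = bd_p0_u(lam, mu, t)
    s = lost_at_vertex(node, lam, mu)
    return p0 + (1.0 - p0) * (1.0 - u) * s / (1.0 - u * s)


def lost_at_vertex(node, lam, mu):
    """A copy at an extant species is observed; a copy at a speciation is lost
    only if the copies sent down every daughter edge are all lost."""
    _, children = node
    if not children:
        return 0.0
    prob = 1.0
    for child, t in children:
        prob *= lost_from_edge_top(child, t, lam, mu)
    return prob


# Hypothetical species tree ((A, B), C); the gene starts 0.5 time units above the root.
species = ("root", [(("AB", [(("A", []), 1.0), (("B", []), 1.0)]), 0.5),
                    (("C", []), 1.5)])
print(lost_from_edge_top(species, 0.5, 0.8, 0.4))
```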
GEM
• Generation is relatively simple
• Probability of reconciled tree:
• Probability of gene tree:
• Posterior of reconciliation:
• Max likelihood reconciliation:
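The formulas on this slide did not survive extraction. Using the notation of the surrounding slides (gene tree G, reconciliation γ, species tree S, rates λ and μ, time t before the root), the standard relations these four labels refer to can be written as follows. This is a hedged reconstruction, not a copy of the original slide.

```latex
\begin{aligned}
&\text{Probability of reconciled tree:} && \Pr[G,\gamma \mid S,\lambda,\mu,t] \\
&\text{Probability of gene tree:} && \Pr[G \mid S,\lambda,\mu,t] = \sum_{\gamma} \Pr[G,\gamma \mid S,\lambda,\mu,t] \\
&\text{Posterior of reconciliation:} && \Pr[\gamma \mid G,S,\lambda,\mu,t] = \frac{\Pr[G,\gamma \mid S,\lambda,\mu,t]}{\Pr[G \mid S,\lambda,\mu,t]} \\
&\text{Max likelihood reconciliation:} && \hat{\gamma} = \arg\max_{\gamma} \Pr[G,\gamma \mid S,\lambda,\mu,t]
\end{aligned}
```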
Example application:
Probabilistic orthology analysis
Sennblad & Lagergren, submitted
MHC example: MPR orthology
[Figure: MPR orthology for the MHC gene family; label 1(b)]
Three other reconciliations
Reconciliation probabilities
[Figure: reconciliation probabilities; labels 2 and 1(b)]
Posterior & posterior mean
Modification of DP for Pr[G | S, λ, μ]
MCMC
Speciation probabilities
ABCA
Speciation probabilities
MHC birth-death parameter posterior
When is MP-reconciliation incorrect?
• 1000 (G,) per square
Performance of probabilistic orthology analysis
• Draw from posterior distribution
– Biological realism
• Generate synthetic (G, γ)
– Speciations are positives
• Analyze
– Classify gene tree vertices as duplication/speciation based on probability and threshold
• ROC curves
– Sensitivity = TP/(TP+FN)
– Specificity = TN/(TN+FP)
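The evaluation on this slide reduces to counting, for each probability threshold, how many true speciations and duplications are classified correctly. The sketch below is minimal and assumes that each gene tree vertex has a posterior speciation probability and a known true label, with speciation as the positive class. The function name and the example data are made up for illustration.

```python
def roc_points(speciation_probs, is_speciation, thresholds):
    """For each threshold, call a vertex a speciation if its probability is >= threshold,
    then return (threshold, sensitivity, specificity) with speciations as positives."""
    points = []
    for threshold in thresholds:
        tp = fn = tn = fp = 0
        for prob, truth in zip(speciation_probs, is_speciation):
            predicted = prob >= threshold
            if truth and predicted:
                tp += 1
            elif truth:
                fn += 1
            elif predicted:
                fp += 1
            else:
                tn += 1
        sensitivity = tp / (tp + fn) if tp + fn else float("nan")
        specificity = tn / (tn + fp) if tn + fp else float("nan")
        points.append((threshold, sensitivity, specificity))
    return points


# Made-up example: five gene tree vertices, three of them true speciations.
probs = [0.9, 0.8, 0.4, 0.3, 0.6]
truth = [True, True, True, False, False]
for threshold, sens, spec in roc_points(probs, truth, [0.25, 0.5, 0.75]):
    print(f"threshold={threshold:.2f}  sensitivity={sens:.2f}  specificity={spec:.2f}")
```

Sweeping the threshold from high to low traces the ROC curve: sensitivity rises while specificity falls.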
ROC for MHC-like data
• Speciation? Y axis: sensitivity, X axis: specificity
ROC for ABCA-like data
• Speciation? Y axis: sensitivity, X axis: specificity
Programs
• primeGEM
– Probability of gene tree
– Orthology analysis
• xprimeGEM
– Probability of reconciled tree
– Posterior of reconciliation
• xprimeGEM-max
– Max likelihood reconciliation
• xprimeGEM-enum
– Enumerates all reconciliations γ with their probabilities
'Comparison'
• Coalescence -- allele evolution
– Alleles evolve by random sampling from the parental population
– Underlying pure-birth process
• Gene duplication and loss -- gene family evolution
– Gene copies are created by duplication events
– Gene copies are lost by loss events
– Underlying birth-death process
Later similar work -- gene copy number
Later similar work -- coalescence