Download Slides - Lirmm

Approximate Bayesian Computation
Studying demographic parameters
Joao Lopes, Mark Beaumont
University of Reading
[email protected]
1. ABC algorithm

Assumptions:
- Discordance between gene and species trees is not expected
- Mutation rate is variable in space, but not in time

Features:
- Based on the construction of gene trees under the coalescent model
- Easily applied to 4 or 5 populations/species
- Some tweaks are necessary for use with more populations

But most importantly:
- Handles large datasets (typically hundreds of samples per population/species)
- Complex population/species models can be used (e.g. presence of gene flow)
- Assumptions can be greatly relaxed (e.g. variable mutation rate over time)
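To make the "gene trees under the coalescent model" feature concrete, here is a minimal sketch of drawing coalescence waiting times for one gene tree. It assumes a single diploid population of constant size and no migration, which is much simpler than the two-population model used in these slides; the function name and the values 45 and 10 000 are illustrative only.

```python
import numpy as np

def simulate_coalescent_times(n_samples, ne, rng):
    """Waiting times (in generations) between successive coalescent events.

    Assumption: diploid, constant-size population. While k lineages remain,
    the waiting time is exponential with rate k*(k-1)/2 in units of 2*Ne
    generations.
    """
    times = []
    for k in range(n_samples, 1, -1):
        rate = k * (k - 1) / 2.0
        times.append(rng.exponential(1.0 / rate) * 2.0 * ne)
    return np.array(times)

rng = np.random.default_rng(1)
# Time to the most recent common ancestor for 2000 independent gene trees
tmrca = np.array([simulate_coalescent_times(45, 10_000, rng).sum()
                  for _ in range(2000)])
print(tmrca.mean())  # theory: 4 * Ne * (1 - 1/n) generations on average
```

Each simulated tree is cheap to generate, which is what makes simulation-based inference of this kind practical for large samples.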
ABC algorithm:

Parameters: F = {Ne1, Ne2, NeA, m1, m2, t}

1. Sample from prior(s): Fi ~ p(F)
2. Simulate data, given Fi: Di ~ p(D | Fi)
3. Summarize Di with a set of summary statistics, obtaining Si; go to 1 until N points (S, F) have been created.
4. Accept the points whose S is within a distance d from s', the real data summarized by the same set.
5. Correct the values F according to their distance from the real data by performing a local linear regression.

[Figure: The population model, showing Popanc (NeA), the time t, Pop1 (Ne1, m1) and Pop2 (Ne2, m2)]
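The five steps of the ABC algorithm can be sketched on a toy problem. This is not the authors' implementation: the model (inferring the mean of a normal distribution), the helper names, the Epanechnikov regression weights, and the reading of `tol` as the fraction of simulations accepted are all assumptions made for illustration.

```python
import numpy as np

def abc_regression(s_obs, prior_sampler, simulator, summarize, n_sims, tol, rng):
    # Steps 1-3: sample from the prior, simulate data, summarize
    params = np.array([prior_sampler(rng) for _ in range(n_sims)])
    stats = np.array([summarize(simulator(p, rng)) for p in params])
    # Scale each statistic so that none dominates the distance
    scale = stats.std(axis=0)
    scale[scale == 0] = 1.0
    dist = np.sqrt((((stats - s_obs) / scale) ** 2).sum(axis=1))
    # Step 4: accept points within distance d (d chosen so a fraction tol is kept)
    d = np.quantile(dist, tol)
    keep = dist <= d
    acc_params, acc_stats, acc_dist = params[keep], stats[keep], dist[keep]
    # Step 5: weighted local linear regression of parameters on statistics
    w = 1.0 - (acc_dist / d) ** 2            # Epanechnikov kernel weights
    X = np.column_stack([np.ones(keep.sum()), acc_stats - s_obs])
    sw = np.sqrt(w)[:, None]
    beta, *_ = np.linalg.lstsq(X * sw, acc_params * sw, rcond=None)
    # Project each accepted point onto the observed summary statistics
    adjusted = acc_params - (acc_stats - s_obs) @ beta[1:]
    return adjusted, w

rng = np.random.default_rng(0)
observed = rng.normal(2.0, 1.0, size=50)      # toy "real" data
summarize = lambda x: np.array([x.mean()])    # one summary statistic
adjusted, w = abc_regression(
    s_obs=summarize(observed),
    prior_sampler=lambda r: r.uniform(-5, 5, size=1),
    simulator=lambda p, r: r.normal(p[0], 1.0, size=50),
    summarize=summarize,
    n_sims=20000, tol=0.02, rng=rng)
post_mean = np.average(adjusted[:, 0], weights=w)
print(post_mean, observed.mean())
```

In the toy run the regression-adjusted posterior mean should land close to the observed sample mean, which is what the exact posterior would give under a flat prior.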
2. Simulated data
DNA sequence data (1 locus)
Pop1: 45 samples
Pop2: 55 samples
ABC: 200 data sets
Comparison with MCMC: 10 data sets
Relative Mean Integrated Square Error (relMISE):

\[
\mathrm{relMISE} = \frac{1}{n} \sum_{i=1}^{n} \frac{(f_i - f')^2}{f'^2},
\]

where n is the number of accepted points, f_i is the value of a given parameter for the i-th point, and f' is the true value of the parameter.
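The relMISE definition translates directly into a few lines of code. The function name and the example values below are illustrative, not taken from the slides.

```python
import numpy as np

def rel_mise(accepted_values, true_value):
    """Mean squared deviation of accepted values from the true value,
    divided by the squared true value."""
    accepted_values = np.asarray(accepted_values, dtype=float)
    return np.mean((accepted_values - true_value) ** 2) / true_value ** 2

# Three hypothetical accepted values for a parameter whose true value is 10000
print(rel_mise([9000, 10000, 11000], 10000))
```

Because the error is divided by the squared true value, parameters on very different scales (population sizes versus times) can be compared on the same footing.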
Summary statistics used:
1. Mean of pairwise differences
   a) in each population
   b) in both populations joined together
2. Number of segregating sites
   a) in each population
   b) in both populations joined together
3. Number of haplotypes
   a) in each population
   b) in both populations joined together
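The three statistics, each computed for Pop1, Pop2 and the pooled sample, give nine values in total. A minimal sketch, assuming aligned sequences stored as equal-length strings (the function names and the toy sequences are made up for illustration):

```python
from itertools import combinations

def mean_pairwise_differences(seqs):
    # Average number of differing sites over all pairs of sequences
    pairs = list(combinations(seqs, 2))
    return sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

def segregating_sites(seqs):
    # Number of alignment columns showing more than one state
    return sum(len(set(col)) > 1 for col in zip(*seqs))

def n_haplotypes(seqs):
    # Number of distinct sequences
    return len(set(seqs))

def summary_statistics(pop1, pop2):
    stats = []
    for f in (mean_pairwise_differences, segregating_sites, n_haplotypes):
        # each population separately, then both joined together
        stats += [f(pop1), f(pop2), f(pop1 + pop2)]
    return stats

pop1 = ["ACGT", "ACGT", "ACGA"]   # toy data, not from the slides
pop2 = ["TCGT", "TCGA"]
stats = summary_statistics(pop1, pop2)
print(stats)
```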
2. Simulated data: 'real' data and prior information

[Figure: panels for Ne1, Ne2, NeA, m1, m2 and t; legend: ABC, "real" data, MCMC, prior distribution]
ABC (500 000 iter, tol=0.02, logit transf, sstats=9):

[Figure: Simulation 8; panels for Ne1, Ne2, Mig1, Neanc, Mig2 and Tev]

Average relMISE (10 data sets). The second value reported for each entry on the slide is unlabeled in the source and is given in parentheses.

         Ne1            Ne2            NeA            m1                     m2                    t
ABC      0.05 (0.011)   0.22 (0.035)   0.27 (0.034)   23.00E-09 (3.28E-09)   8.74E-09 (1.62E-09)   0.24 (0.020)
MCMC     0.04 (0.007)   0.11 (0.029)   0.16 (0.015)   1.28E-09 (0.25E-09)    0.60E-09 (0.18E-09)   0.05 (0.013)
Priors   0.27 (-)       0.33 (-)       0.33 (-)       83.33E-09 (-)          83.33E-09 (-)         0.33 (-)
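One possible reading of the Priors row, offered as an assumption rather than something the slides state: if a parameter has a uniform prior on [0, b] and its true value sits at the midpoint f' = b/2, then the relative mean squared error of draws from the prior is

\[
\frac{E\left[(f - f')^2\right]}{f'^2} = \frac{b^2/12}{b^2/4} = \frac{1}{3} \approx 0.33,
\]

which agrees with the 0.33 entries. Under the same assumption with b = 0.001 for the migration rates, the unscaled error would be

\[
\frac{b^2}{12} = \frac{10^{-6}}{12} \approx 8.33 \times 10^{-8},
\]

which agrees with the 83.33E-09 entries and suggests that the m1 and m2 columns may report an error that is not divided by f'^2.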
2. Simulated data: optimized ABC method
ABC (2 500 000 iter, tol=0.004, log transf, sstats=9):

[Figure: Simulation 8; panels for Ne1, Ne2, Mig1, Neanc, Mig2 and Tev]

Average relMISE (10 data sets). The second value reported for each entry on the slide is unlabeled in the source and is given in parentheses.

         Ne1            Ne2            NeA            m1                     m2                    t
ABC      0.05 (0.011)   0.22 (0.035)   0.27 (0.034)   23.00E-09 (3.28E-09)   8.74E-09 (1.62E-09)   0.24 (0.020)
ABC*     0.06 (0.012)   0.18 (0.033)   0.24 (0.035)   10.10E-09 (2.11E-09)   3.07E-09 (0.92E-09)   0.18 (0.019)
MCMC     0.04 (0.007)   0.11 (0.029)   0.16 (0.015)   1.28E-09 (0.25E-09)    0.63E-09 (0.18E-09)   0.05 (0.013)
Priors   0.27 (-)       0.33 (-)       0.33 (-)       83.33E-09 (-)          83.33E-09 (-)         0.33 (-)
2. Simulated data: adding summary stats
ABC (2 500 000 iter, tol=0.004, log transf, sstats=21):

[Figure: Simulation 8; panels for Ne1, Ne2, Mig1, Neanc, Mig2 and Tev]

Average relMISE (10 data sets). The second value reported for each entry on the slide is unlabeled in the source and is given in parentheses.

         Ne1            Ne2            NeA            m1                     m2                    t
ABC      0.05 (0.011)   0.22 (0.035)   0.27 (0.034)   23.00E-09 (3.28E-09)   8.74E-09 (1.62E-09)   0.24 (0.020)
ABC*     0.06 (0.012)   0.18 (0.033)   0.24 (0.035)   10.10E-09 (2.11E-09)   3.07E-09 (0.92E-09)   0.18 (0.019)
ABC**    0.05 (0.003)   0.11 (0.005)   0.23 (0.006)   6.21E-09 (0.26E-09)    1.87E-09 (0.08E-09)   0.15 (0.005)
MCMC     0.04 (0.007)   0.11 (0.029)   0.16 (0.015)   1.28E-09 (0.25E-09)    0.60E-09 (0.18E-09)   0.05 (0.013)
Priors   0.27 (-)       0.33 (-)       0.33 (-)       83.3E-09 (-)           83.33E-09 (-)         0.33 (-)
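Model-choice: migration present/absent
ABC (1 000 000 iter, tol=0.004, log transf, sstats=21):

[Figure: two candidate population models, each showing Popanc, Pop1 and Pop2. Population model 1 (M = M1), or Population model 2 (M = M2), in which an x is drawn between Pop1 and Pop2]

Model probabilities (10 data sets):
pM1 = 2%
pM2 = 98%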
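One common way to estimate model probabilities in ABC, which may or may not be what was done for this slide, is to draw the model index from its prior along with the parameters and then take the share of accepted simulations that came from each model. The sketch below uses a toy pair of models (a free mean versus a mean fixed at zero) as a stand-in for migration present/absent; every name and number in it is hypothetical.

```python
import numpy as np

def abc_model_choice(s_obs, n_sims, tol, rng, n_obs=50):
    # Draw the model index (1 or 2) with equal prior probability
    model = rng.integers(1, 3, size=n_sims)
    # Model 1 has a free mean; model 2 fixes it at zero (the "absent" case)
    mu = np.where(model == 1, rng.uniform(-3, 3, size=n_sims), 0.0)
    # Summary statistic: the mean of n_obs unit-variance normal draws,
    # simulated directly from its sampling distribution
    stats = rng.normal(mu, 1.0 / np.sqrt(n_obs))
    dist = np.abs(stats - s_obs)
    keep = dist <= np.quantile(dist, tol)
    # Estimated posterior probability = share of accepted points per model
    return {m: float(np.mean(model[keep] == m)) for m in (1, 2)}

probs = abc_model_choice(s_obs=0.05, n_sims=20000, tol=0.02,
                         rng=np.random.default_rng(0))
print(probs)
```

With an observed summary close to zero, most accepted points come from the simpler model, mirroring the way a nested model can receive most of the posterior weight.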
2. Simulated data: using model-choice step
ABC (2 500 000 iter, tol=0.004, log transf, sstats=21):

[Figure: Simulation 8; panels for Ne1, Ne2, Mig1, Neanc, Mig2 and Tev]

Average relMISE (10 data sets). The second value reported for each entry on the slide is unlabeled in the source and is given in parentheses.

         Ne1            Ne2            NeA            m1                     m2                    t
ABC      0.05 (0.011)   0.22 (0.035)   0.27 (0.034)   23.00E-09 (3.28E-09)   8.74E-09 (1.62E-09)   0.24 (0.020)
ABC*     0.06 (0.012)   0.18 (0.033)   0.24 (0.035)   10.10E-09 (2.11E-09)   3.07E-09 (0.92E-09)   0.18 (0.019)
ABC**    0.05 (0.003)   0.11 (0.005)   0.23 (0.006)   6.21E-09 (0.26E-09)    1.87E-09 (0.08E-09)   0.15 (0.005)
ABC***   0.03 (0.001)   0.12 (0.007)   0.19 (0.005)   -                      -                     0.07 (0.007)
MCMC     0.04 (0.007)   0.11 (0.029)   0.16 (0.015)   1.28E-09 (2.53E-10)    6.03E-10 (1.84E-10)   0.05 (0.013)
Priors   0.27 (-)       0.33 (-)       0.33 (-)       8.33E-08 (-)           8.33E-08 (-)          0.33 (-)
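2. Simulated data: 10 vs 200 datasets
ABC (2 500 000 iter, tol=0.004, log transf, sstats=21):

[Figure: Simulation 8; panels for Ne1, Ne2, Mig1, Neanc, Mig2 and Tev]

Average relMISE for 10 data sets and for 200 data sets. The second value reported for each entry on the slide is unlabeled in the source and is given in parentheses.

                         Ne1            Ne2            NeA            m1             m2             t
ABC*** (10 data sets)    0.03 (0.001)   0.12 (0.007)   0.19 (0.005)   -              -              0.07 (0.007)
ABC*** (200 data sets)   0.04 (0.002)   0.09 (0.005)   0.19 (0.005)   -              -              0.06 (0.003)
Priors                   0.27 (-)       0.33 (-)       0.33 (-)       8.33E-08 (-)   8.33E-08 (-)   0.33 (-)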
3. Conclusions

Comparison between ABC and MCMC methods:
- ABC is up to 2 orders of magnitude faster than the MCMC method for a single locus
- ABC modes are similar to those of MCMC (a full-likelihood method)
- ABC can easily incorporate more complex population models with relaxed assumptions
- A model-choice framework follows naturally from the ABC approach
- ABC easily handles multi-modal posterior distributions
- ABC does not suffer from problems associated with local maxima in the likelihood

ABC improves with:
- parameter transformation
- more iterations
- more summary statistics
- a model-choice framework
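The parameter transformations named in the slide settings (log and logit) are typically applied before the regression step so that adjusted values cannot leave the prior range. A minimal sketch of a bounded logit transform follows; the bounds 0 and 20 000 and the shift of 2.0 are hypothetical stand-ins for a prior range and a regression correction.

```python
import numpy as np

def logit_transform(x, lower, upper):
    # Map a value in (lower, upper) onto the whole real line
    return np.log((x - lower) / (upper - x))

def inverse_logit_transform(y, lower, upper):
    # Map a real number back into (lower, upper)
    return lower + (upper - lower) / (1.0 + np.exp(-y))

x = np.array([100.0, 5000.0, 19900.0])      # values inside the prior range
y = logit_transform(x, 0.0, 20000.0)
# A correction applied on the transformed scale...
back = inverse_logit_transform(y - 2.0, 0.0, 20000.0)
# ...always maps back to a value strictly inside the bounds
print(back)
```

Applying the same correction on the original scale could push a value near the lower bound below zero, which is the failure the transformation avoids.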
Take home message:
- Phylogenetic methods based on gene trees using the coalescent are being actively explored.
- These methods will be available in the near future.
Acknowledgements
I would like to thank David Balding for frequent meetings on the subject,
and to give special thanks to Mark Beaumont for advice and comments on the work.
Support for this work was provided by EPSRC.
[email protected]
http://www.rdg.ac.uk/~sar05sal