The Neolithic transition in Europe:
different views from population genetics
(a tentative discussion around
some methodological questions)
Lounès Chikhi
Evolution et Diversité Biologique
CNRS
Université Paul Sabatier, Toulouse
Inference in population genetics
• Data collection
• Genetic typing
• Description of patterns of genetic variability
• Analysis and interpretation
• Test (simulations)
Inference in population genetics
• Sampling of “populations”
• “Choice” of the markers (genome sampling)
• mitochondrial DNA : female demography
• Y Chromosome : male demography
• nuclear genes (markers: allozymes, microsatellites,
RFLP, AFLP, SNPs, etc.)
• Description of the patterns :
– Diversity within samples
– Diversity between samples
– Are there spatial patterns ?
A similar pattern with Y
chromosome data
Semino et al. (2000)
Science
What should we do with the patterns ?
How should we interpret them ?
Inference in population genetics
• Are the patterns, if any, compatible with
hypotheses or demographic scenarios from other
areas (archaeology, linguistics, etc.) ?
A possible scheme of
population movements since Paleolithic
18,000 BP
45,000 BP
10,000 BP
Inference in population genetics
• Is there a link between these images (archeogenetico-linguistic) ?
• Can we estimate demographic parameters ?
– Population : stable ? growing ? bottleneck ?
– Admixture between populations ?
• Can we date these events ?
• Can we detect selection ?
Effect of population size changes on some measures of genetic diversity
[Figure: allele size distributions (allele sizes in number of repetitions) before and after a bottleneck]
Before the bottleneck: nA = 7, r = 8, nA/r = 0.88, He = 0.74 (one gap)
After the bottleneck: nA = 4, r = 7, nA/r = 0.57, He = 0.71 (several gaps)
• nA drops quicker than He because rare alleles are eliminated and contribute little to He
– He = 1 − Σ pi^2
• gappy allelic size distributions
• range varies little (r = range)
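The summary statistics on this slide can be sketched in a few lines of Python. This is only an illustration: the allele counts below are hypothetical (they are not the data behind the figure), the function name `diversity_summary` is invented here, and the range is assumed to be r = largest size − smallest size + 1, which is consistent with nA = 7, r = 8 for a distribution with one gap.

```python
def diversity_summary(allele_counts):
    """Summarise a microsatellite sample.

    allele_counts maps allele size (number of repeats) to the number of
    sampled copies. Returns (nA, r, nA/r, He) with He = 1 - sum(p_i^2).
    """
    present = {size: c for size, c in allele_counts.items() if c > 0}
    total = sum(present.values())
    n_alleles = len(present)                         # nA
    size_range = max(present) - min(present) + 1     # r (assumed definition)
    he = 1.0 - sum((c / total) ** 2 for c in present.values())
    return n_alleles, size_range, n_alleles / size_range, he


# Hypothetical sample before a bottleneck: 7 alleles, one gap at size 13.
before = {10: 2, 11: 10, 12: 25, 14: 30, 15: 20, 16: 10, 17: 3}
# Hypothetical sample after a bottleneck: the rarest alleles are lost.
after = {11: 15, 12: 35, 14: 40, 17: 10}

for label, sample in (("before", before), ("after", after)):
    n_a, r, ratio, he = diversity_summary(sample)
    print(f"{label}: nA={n_a} r={r} nA/r={ratio:.2f} He={he:.2f}")
```

With these made-up counts, nA and nA/r fall sharply while He and the range change much less, which is the qualitative point of the slide.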
Inference in population genetics
• Thus there is some information in genetic data
about ancient demographic events.
• However, this information may be qualitative
rather than quantitative, and it does not allow us
to determine whether other scenarios (or selection)
could have played a role.
Recent data from the Y
chromosome have been
interpreted as indicating
a Neolithic contribution
of 22% (Semino et al.,
2000).
This figure (22%) is
the sum of the
frequencies of 4
haplotypes called
Eu4, 9, 10 and 11
Question : why should
the proportion of
haplotypes exhibiting a
clinal distribution today
represent the so-called
“Neolithic” contribution?
There are two problems with this “estimation”:
1. Clines are only expected for alleles that were
present in different frequencies in the populations when
they mixed (dilution problem).
Moreover, drift in the last 4000-8000 years may have
blurred clines that were visible at the time.
Many haplotypes are observed only 1, 2 or 3 times in
each sample (i.e. no cline is going to be as visible by eye
as those observed for the 4 selected haplotypes)
2. Even if it were estimated properly it would be
meaningless for understanding the processes of
European colonization. A single number cannot
summarise a cline.
PN = proportion of farmers in any admixed population
n = number of admixture events
Geometric decrease of the Neolithic contribution from PN to PN^n
Ex: PN = 0.9 and n = 25, then PN^n = 0.07
[Figure legend: +: n = 10, Δ: n = 25, O: n = 50]
Average = 100 (PN + PN^2 + … + PN^n)/n.
Same value of PN = 0.9 (90% farmers + 10% hunter-gatherers).
Horizontal lines are averages:
n = 10: average = 62%
n = 25: average = 36%
n = 50: average = 21%
Thus, a lack of pattern or a low average can correspond to a high PN value.
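The geometric dilution can be checked numerically with a short sketch. The function names are invented for illustration, and the average is computed literally as printed on the slide (exponents 1 to n, divided by n). The exact percentages depend on that convention, for instance on whether the source population (exponent 0) is included, so the printed values need not coincide with the horizontal lines quoted above.

```python
def neolithic_contributions(pn, n):
    """Neolithic contribution after 1, 2, ..., n successive admixture events."""
    return [pn ** k for k in range(1, n + 1)]


def average_percent(pn, n):
    """Average contribution across the n admixed populations, in percent."""
    return 100 * sum(neolithic_contributions(pn, n)) / n


pn = 0.9  # 90% farmers + 10% hunter-gatherers at every step
for n in (10, 25, 50):
    last = neolithic_contributions(pn, n)[-1]
    print(f"n={n}: PN^n={last:.3f}, average={average_percent(pn, n):.1f}%")
```

Whatever the convention, the average falls quickly as n grows even though PN stays at 0.9 throughout.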
Two major models have been proposed
(or at least structure current debate)
The demic diffusion model: significant correlations between
archaeological and genetic maps are explained by a movement of people
entering Europe from the Levant and Anatolia during the Neolithic. We
would expect a significant genetic contribution.
The cultural diffusion model: the spread of agriculture in Europe
involved the movement of ideas, not of people. The genetic contribution of
Near East farmers to the European gene pool should be limited.
Demic diffusion
Cultural diffusion
Average
Large Genetic contribution
Small Genetic contribution
Inference in population genetics
What kind of inference ?
– Qualitative versus quantitative ?
– Detection versus estimation.
– Models and underlying assumptions.
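Admixture model
[Diagram: in the past, Parent 1 (hunter-gatherers) contributes a proportion p1 and Parent 2 (farmers) a proportion 1 − p1 to a Hybrid population. The three populations then drift independently for T generations (scaled drift times T/N1, T/Nh, T/N2) until the present, where they are represented by Parent 1 (Basques), Hybrid (Europe) and Parent 2 (Near East).]
Separates the effects of drift and admixture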
Evolution of allele frequencies under genetic drift
[Figure: two plots of allele frequency (0 to 1) against number of generations (1 to 8) for populations P1, H and P2]
The effect of drift is that the « hybrid » population may not even be intermediate after a limited number of generations.
In other words:
(i) the information on admixture decreases with time.
(ii) It is risky to analyse single locus data when demographic events are ancient.
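A minimal sketch of drift acting on the three populations of the admixture model, assuming a simple Wright-Fisher resampling of a fixed number of gene copies per generation. All names and parameter values are illustrative assumptions; this is not the simulation program behind the figures.

```python
import random


def drift_one_generation(freq, n_copies, rng):
    """Resample n_copies gene copies from the current allele frequency."""
    count = sum(1 for _copy in range(n_copies) if rng.random() < freq)
    return count / n_copies


def simulate_admixture(freq_p1, freq_p2, p1, n_copies, generations, seed=0):
    """Return allele-frequency trajectories for P1, H and P2.

    At generation 0 the hybrid H is formed with a proportion p1 from
    parent P1 and 1 - p1 from parent P2; afterwards the three populations
    drift independently.
    """
    rng = random.Random(seed)
    freqs = {
        "P1": freq_p1,
        "H": p1 * freq_p1 + (1 - p1) * freq_p2,
        "P2": freq_p2,
    }
    trajectory = [dict(freqs)]
    for _generation in range(generations):
        freqs = {name: drift_one_generation(f, n_copies, rng)
                 for name, f in freqs.items()}
        trajectory.append(dict(freqs))
    return trajectory


# Hypothetical values: small populations, so drift is strong.
for generation, freqs in enumerate(simulate_admixture(0.8, 0.2, 0.3, 50, 8)):
    print(generation, {name: round(f, 2) for name, f in freqs.items()})
```

Running it with different seeds shows how often H ends up outside the interval spanned by P1 and P2 after only a few generations when the number of gene copies is small.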
[Figure: distributions obtained for p1 from simulated data, true value p1 = 0.3, with little drift and with more drift]
1) We simulate data according to the model (figure above), varying some parameters (here drift).
2) The outputs are given to the program implementing the method.
3) One distribution is obtained for each simulation.
[Figure: true value p1 = 0.3]
1) We note that for the VERY SAME scenario inference can be extremely different !
2) This inference varies from one locus to the other.
3) When two loci produce different estimates, we cannot conclude that they had a different demographic history.
4) Worse : we are in an optimal situation : we « know » the real p and the data were simulated according to the model. This NEVER happens in real life.
1) One solution : multi-locus data.
2) Increasing sample sizes is NOT very useful.
3) Better to have multilocus data much later than one locus just after:
Ex: 5 loci after 100 generations versus 1 locus after 1 generation (for N=1000)
4) Don’t throw your allozyme data away.
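The benefit of multi-locus data can be illustrated with a toy simulation. This sketch does not use the method discussed in the talk: it swaps in a simple least-squares moment estimator of p1, draws ancestral allele frequencies uniformly at random, and assumes Wright-Fisher drift; every name and parameter value is an assumption made for illustration.

```python
import random
import statistics


def drift(freq, n_copies, generations, rng):
    """Wright-Fisher drift of one allele frequency over several generations."""
    for _generation in range(generations):
        freq = sum(1 for _copy in range(n_copies) if rng.random() < freq) / n_copies
    return freq


def simulate_locus(p1, n_copies, generations, rng):
    """Present-day frequencies (P1, H, P2) at one locus under the admixture model."""
    x1, x2 = rng.random(), rng.random()   # ancestral frequencies (assumed uniform)
    xh = p1 * x1 + (1 - p1) * x2          # hybrid at the time of admixture
    return tuple(drift(x, n_copies, generations, rng) for x in (x1, xh, x2))


def estimate_p1(loci):
    """Least-squares estimate of p1 from h - y2 = p1 * (y1 - y2) across loci."""
    num = sum((h - y2) * (y1 - y2) for y1, h, y2 in loci)
    den = sum((y1 - y2) ** 2 for y1, h, y2 in loci)
    return num / den if den > 0 else None


def median_abs_error(n_loci, p1=0.3, n_copies=100, generations=10,
                     replicates=200, seed=1):
    """Median absolute error of the estimator over simulated replicates."""
    rng = random.Random(seed)
    errors = []
    for _replicate in range(replicates):
        loci = [simulate_locus(p1, n_copies, generations, rng)
                for _locus in range(n_loci)]
        estimate = estimate_p1(loci)
        if estimate is not None:
            errors.append(abs(estimate - p1))
    return statistics.median(errors)


print("1 locus :", median_abs_error(1))
print("5 loci  :", median_abs_error(5))
```

Single-locus estimates scatter widely around the true p1 = 0.3 even though the data follow the model exactly; pooling loci narrows the scatter, in line with point 1 above.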
What if we re-analyse
Semino et al.’s data ?
Y chromosome data (Semino et al., 2000).
p1 represents the hunter-gatherer contribution (descendants = Basques).
Each curve corresponds to the analysis of a European population.
Significant cline observed for (1 − p1) values (i.e. Near Eastern contribution) against geographic distance calculated from the Near East.
[Figure: after Semino et al., 2000, Science]
As a test we can analyse the same data considering Sardinians as descendants of the hunter-gatherers.
We find an extremely similar result.
The « Neolithic » contribution is even slightly higher: on the order of 65% instead of 50%.
Model-based results (i)
(mostly on Y chromosome data)
1) There are significant clines across Europe for the parameter representing the Neolithic contribution.
2) This “trend” is significantly different from that “obtained” by Semino et al. (2000).
3) The Neolithic contribution appears to be around 50% rather than 22%.
4) Re-analysis of all European populations using the Sardinian population as P1 shows very similar results, with a higher Neolithic contribution (average of 65%).
Conclusion:
The cultural diffusion model is unlikely to explain
the patterns observed using the Y chrom. data.
• The tests performed are partial and the model is simplistic, but it is a first step towards the quantification of clearly identified demographic parameters.
• Qualitative approach :
– Easy and useful BUT imprecise or misleadingly precise
• Quantitative approach:
– Assumptions are explicit
– Results can be precise (or not) BUT often complicated to interpret and (maybe) model-dependent.
Inference in population genetics
• Data collection
• Genetic typing
• Description of patterns of genetic variability
• Analysis and interpretation
• Test (simulations)
Inference in population genetics
In case I was not specific enough :
Beware the use of any method whose assumptions
you do not understand or which has not been
extensively tested on simulations :
– Nested Clade Analysis
– Median-network
Thank you
AND MANY THANKS TO
Mark Beaumont, Mike Bruford,
Guido Barbujani, Richard Nichols, etc.