Population structure
• The evolutionary significance of structure
• Detecting and describing structure
– Wright’s F statistics
• Implications for genetic variability
– Inbreeding effects of structure
– The Wahlund effect
– Drift and founder effects
• Island models of population structure
– Identity by descent
– Diffusion methods
– The coalescent with structure
• Selection in subdivided populations
– Local adaptation
– Clines
– Wright’s Shifting-Balance theory
Copyright: Gilean McVean, 2001
Population structure
[Figure: distribution of the surname Hannah]
• Non-random location
• Non-random mating
Goodacre and Sykes
Genetic and phenotypic divergence due to:
– Chance
– Selection
– Selection plus chance
Detecting and describing genetic structure
Wright’s FST statistic
$$F_{ST} = \frac{H_T - H_S}{H_T}$$
where $H_T$ is the heterozygosity over all populations and $H_S$ is the average heterozygosity within subpopulations
Testing by permutation
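The FST definition and the permutation test can be sketched in Python as below. This is a minimal sketch under stated assumptions: heterozygosity is taken as the expected value $1 - \sum p_i^2$, subpopulations are weighted equally, gene copies (rather than individuals) are shuffled among subpopulations with sample sizes held fixed, and the two samples are hypothetical.

```python
import random


def heterozygosity(freqs):
    """Expected heterozygosity: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def fst(subpops):
    """Wright's F_ST = (H_T - H_S) / H_T, with subpopulations weighted equally."""
    alleles = sorted({a for pop in subpops for a in pop})
    sub_freqs = [[pop.count(a) / len(pop) for a in alleles] for pop in subpops]
    k = len(subpops)
    h_s = sum(heterozygosity(f) for f in sub_freqs) / k
    mean_freqs = [sum(f[i] for f in sub_freqs) / k for i in range(len(alleles))]
    h_t = heterozygosity(mean_freqs)
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


def permutation_test(subpops, n_perm=999, seed=1):
    """Observed F_ST and a one-sided permutation p-value, obtained by shuffling
    gene copies among subpopulations while keeping the sample sizes fixed."""
    rng = random.Random(seed)
    observed = fst(subpops)
    pooled = [a for pop in subpops for a in pop]
    sizes = [len(pop) for pop in subpops]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for n in sizes:
            shuffled.append(pooled[start:start + n])
            start += n
        if fst(shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical samples of 50 gene copies from each of two subpopulations
pop1 = ["A"] * 40 + ["a"] * 10   # frequency of A = 0.8
pop2 = ["A"] * 15 + ["a"] * 35   # frequency of A = 0.3
observed, p_value = permutation_test([pop1, pop2])
print(f"F_ST = {observed:.3f}, permutation p = {p_value:.3f}")
```

Shuffling destroys any association between allele and subpopulation, so the fraction of shuffled data sets with an FST at least as large as the observed one approximates a p-value for the null hypothesis of no structure.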
The hierarchical nature of F statistics
• F statistics can be used to contrast structure at different levels, e.g.
$$F_{IS} = \frac{H_S - H_I}{H_S}$$
where $H_I$ is the average within-individual heterozygosity: FIS is a measure of inbreeding
$$H_{\text{Individual}} < H_{\text{Subpopulation}} < H_{\text{Population}} < H_{\text{Region}} < H_{\text{Total}}$$
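A short sketch of how the levels combine, using hypothetical heterozygosities. $F_{IT} = (H_T - H_I)/H_T$ is not defined on the slide; it is added here by analogy with FIS and FST, and with these definitions $(1 - F_{IT}) = (1 - F_{IS})(1 - F_{ST})$ holds exactly, because $H_I/H_T = (H_I/H_S)(H_S/H_T)$.

```python
def f_statistics(h_i, h_s, h_t):
    """Hierarchical F statistics from heterozygosities at three levels:
    individual (h_i), subpopulation (h_s) and total (h_t)."""
    f_is = (h_s - h_i) / h_s
    f_st = (h_t - h_s) / h_t
    f_it = (h_t - h_i) / h_t
    return f_is, f_st, f_it


# Hypothetical heterozygosities, increasing from individuals to the total
f_is, f_st, f_it = f_statistics(h_i=0.30, h_s=0.36, h_t=0.40)
# The levels combine multiplicatively: (1 - F_IT) = (1 - F_IS)(1 - F_ST)
assert abs((1 - f_it) - (1 - f_is) * (1 - f_st)) < 1e-12
print(round(f_is, 4), round(f_st, 4), round(f_it, 4))  # → 0.1667 0.1 0.25
```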
FST in natural populations
Allozymes (Nei 1975)
Organism                      HT       HS       FST
Human (major races)           0.130    0.121    0.069
Human (Yanomama)              0.039    0.036    0.077
House mouse                   0.097    0.086    0.113
Jumping rodent                0.037    0.012    0.676

SNPs
Organism                      HT       HS       FST
Human (major races)           0.195    0.201    0.067
Drosophila melanogaster (a)   0.0154   0.0151   0.023
(a) Based on pairwise diversity
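As a quick arithmetic check, the allozyme rows can be recomputed from $F_{ST} = (H_T - H_S)/H_T$; each agrees with the reported value to three decimal places.

```python
# Allozyme rows of the table: (organism, H_T, H_S, reported F_ST)
allozymes = [
    ("Human (major races)", 0.130, 0.121, 0.069),
    ("Human (Yanomama)", 0.039, 0.036, 0.077),
    ("House mouse", 0.097, 0.086, 0.113),
    ("Jumping rodent", 0.037, 0.012, 0.676),
]


def fst_from_h(h_t, h_s):
    """F_ST = (H_T - H_S) / H_T."""
    return (h_t - h_s) / h_t


for organism, h_t, h_s, reported in allozymes:
    print(f"{organism:22s} computed {fst_from_h(h_t, h_s):.3f}  reported {reported:.3f}")
```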
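The inbreeding effect of population structure
• Differences in allele frequency between populations lead to an excess of homozygotes
[Figure: subpopulations, each in Hardy–Weinberg equilibrium (HW eqm), and the combined sample]
Combined samples:
Expected homozygosity $= \bar{p}^2 + \bar{q}^2$
Observed homozygosity $= \bar{p}^2 + \bar{q}^2 + \sigma_p^2 + \sigma_q^2$
Heterozygosity = 1 − Homozygosity
$$F_{ST} = \frac{F_S - F_T}{1 - F_T} = \frac{\sigma_p^2 + \sigma_q^2}{1 - \bar{p}^2 - \bar{q}^2}$$
where $\bar{p}$ and $\bar{q}$ are the mean allele frequencies across subpopulations, $\sigma_p^2$ and $\sigma_q^2$ their variances, and $F_S$ and $F_T$ the observed (within-subpopulation) and expected (total) homozygosities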
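The homozygote excess can be checked numerically. This sketch assumes two alleles and equally sized subpopulations, each in Hardy–Weinberg equilibrium, and uses the population (not sample) variance of allele frequencies; the frequencies 0.2 and 0.8 are hypothetical.

```python
def structure_summary(p_subpops):
    """Pooled-sample homozygosity for two alleles in equally sized subpopulations,
    each assumed to be in Hardy-Weinberg equilibrium."""
    k = len(p_subpops)
    q_subpops = [1 - p for p in p_subpops]
    p_bar = sum(p_subpops) / k
    q_bar = 1 - p_bar
    var_p = sum((p - p_bar) ** 2 for p in p_subpops) / k   # population variance
    var_q = sum((q - q_bar) ** 2 for q in q_subpops) / k
    expected_hom = p_bar ** 2 + q_bar ** 2                   # F_T: HW expectation for the pooled sample
    observed_hom = sum(p * p + q * q for p, q in zip(p_subpops, q_subpops)) / k   # F_S
    fst_from_hom = (observed_hom - expected_hom) / (1 - expected_hom)
    fst_from_var = (var_p + var_q) / (1 - p_bar ** 2 - q_bar ** 2)
    return expected_hom, observed_hom, var_p + var_q, fst_from_hom, fst_from_var


print(structure_summary([0.2, 0.8]))
```

For subpopulation frequencies 0.2 and 0.8 the pooled sample has expected homozygosity 0.5 but observed homozygosity 0.68; the excess of 0.18 equals $\sigma_p^2 + \sigma_q^2$, and both forms of the FST expression give 0.36.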
The Wahlund effect
• Increase in heterozygosity following mixing of isolated populations
[Figure: isolated populations combined, followed by random mating]
• Medical implications for disease incidence in admixed populations
– Recessive disease reduced by mixing
Disease              High-risk population   Disease allele frequency
Cystic fibrosis      Caucasians             0.022
Albinism             Hopi                   0.07
Tay-Sachs disease    Ashkenazi Jews         0.013
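The reduction in recessive disease after mixing follows from the same variance term. A sketch, assuming a hypothetical 50:50 mixture of a population with disease allele frequency 0.07 (the Hopi albinism value in the table) and a hypothetical population with frequency 0.01, each in Hardy–Weinberg equilibrium before mixing:

```python
def recessive_incidence(q_freqs, weights):
    """Incidence of a recessive disease before mixing (weighted mean of q^2)
    and after one generation of random mating in the combined population."""
    before = sum(w * q * q for w, q in zip(weights, q_freqs))
    q_mix = sum(w * q for w, q in zip(weights, q_freqs))
    after = q_mix ** 2
    return before, after


before, after = recessive_incidence([0.07, 0.01], [0.5, 0.5])
print(f"{before:.4f} {after:.4f}")  # → 0.0025 0.0016
```

Incidence falls from 0.0025 to 0.0016; the difference, 0.0009, is the variance in allele frequency between the two source populations.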
Differences between allozymes and DNA?
• American oysters (Crassostrea virginica)
[Figure: allele frequencies for allozymes (top) and mtDNA (bottom) in samples from MA, SC, GA, several FL sites and LA]
Avise (1994)
Differences between allozymes?
Locus     FST
pgm       0.028
pgi       0.052
got       0.017
ak        0.062
bdh       0.034
α-gpdh    0.035
to        0.291 (unusually high differentiation)
hk        0.027
Checkerspot butterfly Euphydryas editha (McKechnie et al. 1975)
Problems with FST
• Arbitrary a priori choice of structure to test
• High sampling variance when polymorphism low
• Throws away much information
Population genetics models of structure
• Quantify the relationship between genetic drift, selection and population differentiation
[Figure: the island model and the n-island model]
• Assumptions
– Infinite mainland population (island)
– Equal population size (n-island)
– Constant population size
– Proportion m of population replaced by migrants each generation
– Symmetric migration (n-island)
Identity by descent in the island model
Event               Probability            Identity
Same parent         1/(2Ne)                1
Different parents   1 − 1/(2Ne) − 2m       f(t−1)
Migration           2m                     0
so that $f_t = \frac{1}{2N_e} + \left(1 - \frac{1}{2N_e} - 2m\right) f_{t-1}$
At equilibrium
$$f = \frac{1}{1 + 4N_e m}$$
$4N_e m$ = 2 × number of migrants per generation
Only a few migrants each generation are required to prevent a build-up of identity within the island population
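Iterating the identity recursion from the table shows that it settles at the stated equilibrium. A sketch with hypothetical values Ne = 1000 and Ne m = 0.5, 1 and 5:

```python
def identity_at(ne, m, generations, f0=0.0):
    """Iterate f_t = 1/(2Ne) + (1 - 1/(2Ne) - 2m) * f_(t-1), starting from f0."""
    f = f0
    for _ in range(generations):
        f = 1 / (2 * ne) + (1 - 1 / (2 * ne) - 2 * m) * f
    return f


ne = 1000
for nem in (0.5, 1, 5):   # hypothetical values of Ne*m
    m = nem / ne
    print(nem, round(identity_at(ne, m, 20000), 4), round(1 / (1 + 4 * nem), 4))
```

Even with Ne m = 1, equilibrium identity is only 0.2, which illustrates why a few migrants per generation are enough to limit the build-up of identity.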
Relationship between FST and migration rate
$$E[F_{ST}] \approx \frac{1}{1 + 4N_e m}$$
• Can estimate the scaled migration rate from the estimated FST (assuming equilibrium, etc.)
[Figure: Nem (log scale, 0.01 to 100) against FST (0 to 1)]
E.g. in humans, FST ≈ 0.067 gives $N_e m \approx 3.5$
NB: This is NOT a good estimator – do not trust the answer!
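Inverting the approximation gives $N_e m = (1/F_{ST} - 1)/4$. The sketch below reproduces the human example and, for hypothetical nearby values of FST, shows how strongly the estimate moves, which is one reason not to trust it:

```python
def nem_from_fst(fst):
    """Invert E[F_ST] ≈ 1 / (1 + 4 Ne m):  Ne m = (1 / F_ST - 1) / 4."""
    return (1.0 / fst - 1.0) / 4.0


print(round(nem_from_fst(0.067), 1))  # → 3.5
for fst in (0.05, 0.067, 0.1):        # 0.05 and 0.1 are hypothetical
    print(fst, round(nem_from_fst(fst), 2))
```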
Wright’s diffusion model for allele frequencies with migration
[Figure: a mainland with allele frequency $x_m$ sending migrants to an island with allele frequency $x$]
Deterministic change (migration): $M_{\delta x} = m(x_m - x)$
Drift: $V_{\delta x} = \dfrac{x(1 - x)}{2N_e}$
Wright (1951)
[Figure: probability density of the allele frequency on the island, with allele frequency on the mainland = 0.5, for 4Nem = 10 and 4Nem = 0.2]
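The stationary distribution of this diffusion is a standard result not written on the slide: the island frequency follows a Beta distribution with parameters $4N_e m x_m$ and $4N_e m(1 - x_m)$. A sketch that evaluates it for the two plotted cases, with $x_m = 0.5$:

```python
import math


def wright_island_density(x, four_nem, x_m):
    """Stationary density of the island allele frequency x under mean change
    m(x_m - x) and variance x(1 - x)/(2Ne): Beta(4Nem*x_m, 4Nem*(1 - x_m))."""
    a = four_nem * x_m
    b = four_nem * (1 - x_m)
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


for four_nem in (10, 0.2):
    densities = [round(wright_island_density(x, four_nem, 0.5), 3) for x in (0.05, 0.5, 0.95)]
    print(four_nem, densities)
```

With 4Nem = 10 the density is highest at x = 0.5; with 4Nem = 0.2 both Beta parameters are below 1, so the density piles up near 0 and 1 and drift dominates migration.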
Example: SNP frequencies in African Americans
• Goddard et al. (2000)
– 114 SNPs in 33 genes
– 190 African Americans sampled
[Figure: African American frequency (0 to 1) against worldwide frequency (0 to 1)]
• Likelihood estimation of Nem from sample
– assume independence between SNPs
[Figure: ∆Ln(L) against Nem (0 to 15), with the maximum at $N_e m = 5.0$]
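The likelihood approach can be sketched by treating the worldwide frequency as the mainland frequency $x_m$ and multiplying the stationary Beta densities of the island model across SNPs, assumed independent. This is an illustration under strong assumptions and not necessarily the method used by Goddard et al.: sample frequencies are treated as exact (a real analysis would also model binomial sampling), Nem is found by grid search, and the data are simulated with a hypothetical Nem = 5.

```python
import math
import random


def log_beta_density(x, a, b):
    """Log density of a Beta(a, b) distribution at x (0 < x < 1)."""
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def log_likelihood(nem, snps):
    """Sum of log stationary densities over SNPs assumed independent;
    each SNP is a (mainland frequency, island frequency) pair."""
    four_nem = 4 * nem
    return sum(log_beta_density(x, four_nem * xm, four_nem * (1 - xm)) for xm, x in snps)


def estimate_nem(snps, grid):
    """Grid-search maximum-likelihood estimate of Nem."""
    return max(grid, key=lambda nem: log_likelihood(nem, snps))


# Hypothetical data simulated with Nem = 5; NOT the Goddard et al. (2000) data
rng = random.Random(42)
true_nem = 5.0
snps = []
for _ in range(114):
    xm = rng.uniform(0.1, 0.9)
    snps.append((xm, rng.betavariate(4 * true_nem * xm, 4 * true_nem * (1 - xm))))

grid = [0.5 * k for k in range(1, 41)]   # candidate Nem values 0.5, 1.0, ..., 20.0
nem_hat = estimate_nem(snps, grid)
print(nem_hat)
```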
The coalescent in structured populations
• Two-island model
[Figure: lineages in Population 1 and Population 2]
For $n_i$ lineages in population $i$, per generation:
$$\Pr\{\text{coalescence}\} = \frac{n_i(n_i - 1)}{4N_e}, \qquad \Pr\{\text{migration}\} = n_i m$$
The time to coalescence for two sequences sampled from the same population
$$\Pr\{\text{1st event is a coalescence}\} = \frac{1/2N_e}{1/2N_e + 2m} = \frac{1}{1 + 4N_e m}$$
$$\Pr\{\text{1st event is a migration}\} = \frac{2m}{1/2N_e + 2m} = \frac{4N_e m}{1 + 4N_e m}$$
Expected time to coalescence $= 4N_e$ generations, independent of m
• For expected pairwise diversity (within population), two islands of size $N_e$ ≡ a single population of size $2N_e$
• BUT the variance is affected by population structure
[Figure: distribution of average pairwise differences (0 to 24) for a single population and a subdivided population with 4Nem = 0.2]
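A small simulation of the structured coalescent makes the "same mean, larger variance" point concrete. It is a sketch assuming continuous-time rates (coalescence at rate $1/2N_e$ while the two lineages share an island, migration of either lineage at rate $2m$ in both states) and hypothetical values Ne = 100 and 4Nem = 0.2; the comparison is a single panmictic population of size 2Ne.

```python
import random
import statistics


def two_island_pair_time(ne, m, rng):
    """Coalescence time (generations) for two lineages sampled from the same island
    of a two-island model. While together they coalesce at rate 1/(2Ne); in either
    state a migration of either lineage (rate 2m) moves them apart or back together."""
    t, together = 0.0, True
    while True:
        if together:
            rate_c, rate_m = 1 / (2 * ne), 2 * m
            t += rng.expovariate(rate_c + rate_m)
            if rng.random() < rate_c / (rate_c + rate_m):
                return t          # the event was a coalescence
            together = False      # the event was a migration
        else:
            t += rng.expovariate(2 * m)   # no coalescence possible while apart
            together = True


def panmictic_pair_time(n, rng):
    """Coalescence time for two lineages in one population of size n (rate 1/(2n))."""
    return rng.expovariate(1 / (2 * n))


rng = random.Random(7)
ne = 100
m = 0.2 / (4 * ne)   # 4Nem = 0.2
structured = [two_island_pair_time(ne, m, rng) for _ in range(20000)]
single = [panmictic_pair_time(2 * ne, rng) for _ in range(20000)]
print(round(statistics.mean(structured)), round(statistics.mean(single)))
print(round(statistics.variance(structured)), round(statistics.variance(single)))
```

Both means come out close to $4N_e = 400$ generations, but the subdivided case has a much larger variance: with probability $4N_e m/(1 + 4N_e m)$ the lineages are separated and must wait for a migration before they can coalesce.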
Effect on allele frequency spectrum
[Figure: genealogy showing rapid coalescence within population, slow coalescence between populations, and a mutation at high frequency]
[Figure: distribution of the frequency of the derived allele (1 to 19) for a single population and a subdivided population with 4Nem = 0.1]
Effect on neutrality statistics within populations
• Tajima’s D statistic
[Figure: distribution of Tajima’s D (−4 to 4) for a single population and a subdivided population with 4Nem = 0.2]
• Fu and Li D statistic
[Figure: distribution of Fu and Li’s D (−4.5 to 3) for a single population and a subdivided population with 4Nem = 0.2]
Main effect is to increase the variance
Other statistics (e.g. Fay and Wu, 2000) are more sensitive
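Tajima's D compares two estimates of diversity: the mean number of pairwise differences π and Watterson's estimate $S/a_1$, based on the number of segregating sites S. The sketch below uses the standard normalising constants for D; the toy alignments are hypothetical and far smaller than any real sample.

```python
import math
from itertools import combinations


def tajimas_d(seqs):
    """Tajima's D for aligned sequences (strings of equal length).
    Returns None when there are no segregating sites, where D is undefined."""
    n = len(seqs)
    length = len(seqs[0])
    # S: number of segregating (polymorphic) sites
    s = sum(1 for j in range(length) if len({seq[j] for seq in seqs}) > 1)
    if s == 0:
        return None
    # pi: mean number of differences over all pairs of sequences
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Standard normalising constants
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical toy alignments of four sequences
excess_singletons = ["1000", "0100", "0010", "0001"]
intermediate = ["0000", "0000", "1111", "1111"]
print(tajimas_d(excess_singletons), tajimas_d(intermediate))
```

An excess of rare variants pushes D below zero and an excess of intermediate-frequency variants pushes it above zero; population structure can do either, which is why its main effect is to widen the distribution of D.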
Effect on polymorphism between populations
• Tajima’s D statistic
[Figure: distribution of Tajima’s D (−4 to 4) for a subdivided population with 4Nem = 0.2 and a single population]
• Frequency distribution
[Figure: frequency distribution (1 to 19) for a single population and a subdivided population with 4Nem = 0.2]
Effect on linkage disequilibrium
• Linkage disequilibrium measures correlations between alleles at different loci
$$D = f_{AB} - f_A f_B$$
• Population structure increases linkage disequilibrium between linked loci
[Figure: distribution of r² (0.05 to 0.95) for a single population and a subdivided population with 4Nem = 0.1, with 4Ner = 1]
• Population structure creates linkage disequilibrium between unlinked loci when samples from different populations are combined
Population 1: $f_A = 0.2$, $f_B = 0.8$, $D = 0$
Population 2: $f_A = 0.8$, $f_B = 0.2$, $D = 0$
Admixture, naive analysis of the pooled sample: $D = -0.09$
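The pooled value can be reproduced directly. A sketch, assuming the two populations contribute equally to the pooled sample and each is in linkage equilibrium:

```python
def pooled_ld(pops, weights):
    """D = f_AB - f_A * f_B in a pooled sample of populations that are each in
    linkage equilibrium (f_AB = f_A * f_B within each population)."""
    f_a = sum(w * fa for w, (fa, fb) in zip(weights, pops))
    f_b = sum(w * fb for w, (fa, fb) in zip(weights, pops))
    f_ab = sum(w * fa * fb for w, (fa, fb) in zip(weights, pops))
    return f_ab - f_a * f_b


d = pooled_ld([(0.2, 0.8), (0.8, 0.2)], [0.5, 0.5])
print(round(d, 4))  # → -0.09
```

D is negative here because A is common where B is rare; its magnitude, 0.09, equals $\frac{1}{4}|\delta_A \delta_B|$ with $\delta_A = -0.6$ and $\delta_B = 0.6$.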
Admixture dynamics
• Combination of two previously separated populations
$$f_A^1 - f_A^2 = \delta_A, \qquad f_B^1 - f_B^2 = \delta_B$$
• Over time random mating returns the population to equilibrium
$$D_0 = \tfrac{1}{4}\delta_A\delta_B \ \text{(equal proportions of each population)}, \qquad D_t = D_0(1 - r)^t$$
• Disequilibrium between unlinked loci can persist for several generations, while Hardy–Weinberg equilibrium is restored after a single generation of random mating
[Figure: $D_t/D_0$ over 10 generations for loci 1 cM apart and for unlinked loci]
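The decay curves follow from $D_t = D_0(1 - r)^t$. A sketch, assuming the usual approximation that loci 1 cM apart recombine with $r \approx 0.01$ and that unlinked loci have $r = 0.5$:

```python
import math


def ld_ratio(r, t):
    """D_t / D_0 = (1 - r)^t."""
    return (1 - r) ** t


def half_life(r):
    """Generations for D to halve: ln 2 / -ln(1 - r)."""
    return math.log(2) / -math.log(1 - r)


for label, r in (("1 cM (r = 0.01)", 0.01), ("unlinked (r = 0.5)", 0.5)):
    ratios = [round(ld_ratio(r, t), 3) for t in range(0, 11, 2)]
    print(label, ratios, round(half_life(r), 1))
```

Disequilibrium between unlinked loci halves every generation, whereas at 1 cM it takes about 69 generations to halve.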
Selection in a subdivided population
• Maruyama (1970)
– The fixation probability of an unconditionally
beneficial mutation is unaffected by population
structure (Pfix ≈ 2s)
• Levene (1953)
– Environmental heterogeneity can maintain
genetic polymorphism
[Figure: different alleles favoured on different habitat types]
• BUT
– If migration high, selection has to be strong and
finely balanced to habitat frequencies to
maintain polymorphism
• Low migration rates can promote local adaptation
– Heavy metal tolerance in plants
– Melanism in the peppered moth
– Milk tolerance in humans
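Levene's argument can be explored with a deterministic sketch. It assumes the standard form of Levene's model (selection acts separately in each habitat, then each habitat contributes a fixed proportion $c_i$ of adults to a single randomly mating pool, i.e. complete mixing every generation) and hypothetical additive fitnesses with s = 0.2, favouring A in one habitat and a in the other.

```python
def levene_step(p, habitats):
    """One generation of Levene's model: selection within each habitat, then a
    common mating pool to which habitat i contributes a fixed proportion c_i."""
    q = 1.0 - p
    p_next = 0.0
    for c, (w_AA, w_Aa, w_aa) in habitats:
        w_bar = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
        # frequency of A among adults after selection in this habitat
        p_next += c * (p * p * w_AA + p * q * w_Aa) / w_bar
    return p_next


def run_levene(p0, habitats, generations=5000):
    p = p0
    for _ in range(generations):
        p = levene_step(p, habitats)
    return p


s = 0.2                                # hypothetical selection coefficient
favours_A = (1 + s, 1 + s / 2, 1.0)    # fitnesses of AA, Aa, aa
favours_a = (1.0, 1 + s / 2, 1 + s)

balanced = [(0.5, favours_A), (0.5, favours_a)]
skewed = [(0.8, favours_A), (0.2, favours_a)]
p_balanced = run_levene(0.01, balanced)
p_skewed = run_levene(0.01, skewed)
print(round(p_balanced, 3), round(p_skewed, 3))
```

With equal habitat proportions the polymorphism is maintained at p = 0.5, but shifting the proportions to 0.8 and 0.2 lets A go to fixation, which illustrates why, under high migration, selection must be finely balanced against habitat frequencies.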
Selection at different scales
• Evidence for local adaptation from gradients in allele frequency: clines
• Continental clines in Adh activity and allozyme variation in Drosophila
[Figure: frequency of the F/S allozyme against latitude (22 to 47)]
Driven by scale of environmental heterogeneity
Berry & Kreitman (1993)
• Clines in genetic and morphological characters in the toad Bombina
– Balance between selection against hybrids and migration, following secondary contact
[Figure: frequency of B. variegata genetic and morphological characters (0 to 1) against distance (√km, −10 to 10)]
Szymura & Barton (1991)
Indirect evidence for local adaptation?
Microsatellite diversity
• Local hitch-hiking?
[Figure: microsatellite diversity at each locus in samples from India, China, Zimbabwe and the Antilles]
Schlötterer et al. (1997)
• But the structured coalescent also leads to variation in coalescence times
The interaction between selection, gene flow and genetic drift
• Wright’s Shifting Balance theory
• Epistasis between alleles at different loci
[Table: two-locus fitnesses for genotypes AA, Aa, aa at locus 1 and BB, Bb, bb at locus 2, ranging from least fit to most fit]
• The adaptive landscape
– Epistasis creates adaptive valleys between peaks of fitness
[Figure: population fitness over the frequencies of allele A and allele B, showing the starting point of the population and an adaptive valley]
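An adaptive valley can be seen by computing mean fitness along a path between two peaks. A sketch with a hypothetical epistatic fitness matrix (fitness reduced by s for each copy of mismatch between the A and B allele counts, with aabb set slightly fitter than AABB), assuming Hardy–Weinberg proportions and linkage equilibrium:

```python
from math import comb


def genotype_probs(p):
    """Hardy-Weinberg probabilities of carrying 0, 1 or 2 copies of an allele at frequency p."""
    return [comb(2, k) * p ** k * (1 - p) ** (2 - k) for k in range(3)]


def mean_fitness(p_a, p_b, fitness):
    """Population mean fitness assuming Hardy-Weinberg proportions and linkage
    equilibrium; fitness[i][j] is the fitness with i copies of A and j copies of B."""
    gi, gj = genotype_probs(p_a), genotype_probs(p_b)
    return sum(gi[i] * gj[j] * fitness[i][j] for i in range(3) for j in range(3))


# Hypothetical epistatic fitnesses: AABB (i = j = 2) is a peak at 1.0,
# aabb (i = j = 0) a higher peak at 1.05, mismatched genotypes are less fit.
s = 0.2
fitness = [[1 - s * abs(i - j) for j in range(3)] for i in range(3)]
fitness[0][0] = 1.05

for x in (1.0, 0.5, 0.0):   # path with p_A = p_B = x, from AABB to aabb
    print(x, round(mean_fitness(x, x, fitness), 4))
```

Along this path mean fitness is 1.0 at AABB and 1.05 at aabb but only about 0.853 halfway, so selection alone cannot carry a population from the lower peak to the higher one; crossing the valley is the step the Shifting Balance theory attributes to drift.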
The Shifting Balance theory
• Drift (stochastic processes in finite populations) allows a population to cross an adaptive valley
Subpopulations are natural experiments, allowing species to evolve across complex adaptive landscapes
• Evidence for widespread epistasis?
– F2 hybrid breakdown
– Coadapted gene complexes
• Theoretical issues
– Very difficult for a subpopulation that has crossed a valley to spread throughout the rest of the population
– The interaction between epistatic selection and
genetic drift may be important in reproductive
isolation
• e.g. recessive epistatic interactions important in
Haldane’s rule of unisexual hybrid sterility
Future directions
• Theoretical and statistical issues
– Methods for discriminating between local
adaptation and chance effects of
coalescence in a structured population
– The relationship between population
structure and linkage disequilibrium
– Selection on polygenic traits in subdivided
populations
• Empirical challenges
– Describing patterns of gene diversity at many loci across genomes (from a well-chosen sample)
– Comparing differentiation for different types
of mutation (e.g. silent v replacement)
– Mapping genetic variation to phenotypic
variation