Genomic Conflict and
DNA Sequence Variation
Marcy K. Uyenoyama
Department of Biology
Duke University
Overview
• Population genetics
Historically model-rich
Present need: model-based interpretation of observed
patterns of genomic variation
What are hallmarks of each model?
• Self-incompatibility systems in plants
Recognizing genomic conflict due to sexual
antagonism
Canonical models
• Neutral evolution
Pure neutrality: distribution of offspring number is
independent of any trait in parent
Demographic history: deme founding, gene flow
Purifying selection: maintain functioning state
against random deleterious mutations
• Selection
Balancing selection: maintenance of different forms
Selective sweeps: substitution of most fit for less fit
Hallmarks of evolution
• How do we know it when we see it?
Patterns evident in genome variation
• Model selection
Choosing among a small number of canonical models
for any particular system
A random sample of genes
Observed
Sample
Ancestral sequence
Allele and mutation spectra
Site frequency spectrum
[Figure: number of mutations plotted against multiplicity]
a = {a1 = 6, a3 = 1, a5 = 1, a6 = 1}, for ai
the number of alleles with multiplicity i
The neutral coalescent
Sample root from stationary distribution of P,
the mutation transition matrix, and bifurcate
After an interval $t \sim \exp(1 + 2\theta)$,
choose a lineage at random
– Replace it by two identical copies
with probability $1/(1 + 2\theta)$
– Mutate it according to P with probability
$2\theta/(1 + 2\theta)$
Evolutionary rates
• Events on level k
Bifurcation at rate $\binom{k}{2}\,\frac{1}{N}$
Mutation at rate $2ku$
• Population parameters: ratios of rates
Next event is a bifurcation/coalescence with probability
$$\lim_{u,\,1/N \to 0} \frac{\binom{k}{2}/2N}{\binom{k}{2}/2N + ku} = \frac{k-1}{k-1+2\theta} \quad \text{for } \theta = \lim_{u,\,1/N \to 0} \frac{u}{1/2N}$$
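The forward construction on the preceding slides can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation: mutation follows an infinite-alleles scheme (each mutation creates a new integer label) in place of a transition matrix P, the bifurcation probability on level k is taken to be (k − 1)/(k − 1 + 2θ), and the process stops at the bifurcation that would create lineage n + 1. The name `simulate_sample` and its arguments are hypothetical.

```python
import random


def simulate_sample(n, theta, rng):
    """Grow a genealogy forward from the root until it holds n lineages.

    Returns a list of n allele labels (integers). Label 0 is the root type;
    every mutation introduces a new label (infinite-alleles assumption).
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    lineages = [0, 0]          # the root has already bifurcated
    next_label = 1
    while True:
        k = len(lineages)
        # Probability that the next event on level k is a bifurcation.
        p_bifurcate = (k - 1) / (k - 1 + 2 * theta)
        chosen = rng.randrange(k)          # choose a lineage at random
        if rng.random() < p_bifurcate:
            if k == n:
                # The sample is the state just before lineage n + 1 appears.
                return lineages
            lineages.append(lineages[chosen])   # two identical copies
        else:
            lineages[chosen] = next_label       # mutate to a novel type
            next_label += 1


if __name__ == "__main__":
    print(simulate_sample(10, 0.5, random.Random(1)))
```

Stopping at the next bifurcation, rather than at the moment the n-th lineage appears, lets mutations accumulate on level n as well.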
Infinite-alleles model
• Mutation
Novel allelic types formed at rate u per gene per generation
• Reproduction
Frequency of allele i in the parental population: pi
Multinomial sampling of N genes to form the offspring
To find: probability of the sample of n genes
(n1, n2, …, nk) or (a1, a2, …, an)
for k the number of distinct haplotypes (alleles)
ni the number of replicates of allele i
ai the number of alleles with i replicates
Ewens sampling formula
$$p(\mathbf{a}) = \frac{n!}{\theta(\theta+1)\cdots(\theta+n-1)} \prod_{i=1}^{n} \frac{\theta^{a_i}}{i^{a_i}\, a_i!}$$
a = (a1, a2, …, an), for ai the number of alleles represented
by i replicates in a sample of size n
θ = 2Nu, for N the effective number of genes and
u the per-locus, per-generation rate of mutation
Ewens (1972, Theoretical Population Biology)
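As a hedged illustration, the sketch below evaluates the Ewens sampling formula in its standard form, p(a) = n! / [θ(θ+1)···(θ+n−1)] · ∏ θ^{a_i} / (i^{a_i} a_i!), and checks that the probabilities of all allele spectra for a small sample sum to one. The function names `esf_probability` and `allele_spectra` are illustrative, not taken from the slides.

```python
from math import factorial


def esf_probability(a, theta):
    """Ewens sampling formula; a[i-1] is the number of alleles with i replicates."""
    n = sum(i * a_i for i, a_i in enumerate(a, start=1))
    rising = 1.0
    for j in range(n):                      # theta (theta+1) ... (theta+n-1)
        rising *= theta + j
    prob = factorial(n) / rising
    for i, a_i in enumerate(a, start=1):
        prob *= theta ** a_i / (i ** a_i * factorial(a_i))
    return prob


def allele_spectra(n):
    """Yield every a = (a_1, ..., a_n) satisfying sum_i i * a_i = n."""
    def fill(i, remaining):
        if i > n:
            if remaining == 0:
                yield ()
            return
        for a_i in range(remaining // i + 1):
            for rest in fill(i + 1, remaining - i * a_i):
                yield (a_i,) + rest
    yield from fill(1, n)


if __name__ == "__main__":
    theta = 0.7
    total = sum(esf_probability(a, theta) for a in allele_spectra(6))
    print("sum over all spectra for n = 6:", total)
```

The spectrum shown earlier, a = {a1 = 6, a3 = 1, a5 = 1, a6 = 1}, corresponds to a tuple of length 20 with those four entries nonzero.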
Population genomics
About 750 accessions isolated from natural populations worldwide
Summary statistics for sample of 19 entire genomes
http://www.arabidopsis.org
Arabidopsis SNP spectra
[Figure: SNP spectra; axis: minor allele counts, 2 to 8]
Site frequency spectra differ among functional classes
Kim et al. (2007 Nature Genetics 39: 1151)
ESF conditioned on two alleles
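• Biallelic sample of size m
$$P(K = 2 \mid m) = \sum_{l=1}^{m-1} \frac{\theta}{l} \prod_{j=2}^{m} \frac{j-1}{\theta+j-1}$$
• Multiplicities i and (m – i)
$$P(a_i = 1,\, a_{m-i} = 1 \mid K = 2, m) = \frac{1/i + 1/(m-i)}{\sum_{j=1}^{m-1} 1/j} \quad \text{for } i \ne m/2$$
$$P(a_{m/2} = 2 \mid K = 2, m) = \frac{2/m}{\sum_{j=1}^{m-1} 1/j}$$
independent of θ!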
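A short Python check of the independence claim, offered as a sketch. It assumes the conditional probabilities read [1/i + 1/(m − i)] / Σ_{j=1}^{m−1} 1/j for i ≠ m/2 and (2/m) / Σ_{j=1}^{m−1} 1/j for i = m/2, and compares them with ratios of Ewens sampling formula terms at two values of θ. All function names are illustrative.

```python
def biallelic_multiplicity_probs(m):
    """Closed form: P(multiplicities i and m - i | two alleles), i <= m / 2."""
    h = sum(1.0 / j for j in range(1, m))          # sum_{j=1}^{m-1} 1/j
    probs = {}
    for i in range(1, m // 2 + 1):
        if 2 * i == m:
            probs[i] = (2.0 / m) / h
        else:
            probs[i] = (1.0 / i + 1.0 / (m - i)) / h
    return probs


def conditional_from_esf(m, theta):
    """Same distribution from ESF terms, normalised over biallelic samples.

    The factor m! / [theta (theta+1) ... (theta+m-1)] is shared by every
    biallelic configuration and is therefore omitted.
    """
    weights = {}
    for i in range(1, m // 2 + 1):
        if 2 * i == m:
            weights[i] = theta ** 2 / (i * i * 2)   # a_{m/2} = 2
        else:
            weights[i] = theta ** 2 / (i * (m - i)) # a_i = a_{m-i} = 1
    total = sum(weights.values())
    return {i: w / total for i, w in weights.items()}


if __name__ == "__main__":
    for theta in (0.1, 5.0):
        print(theta, conditional_from_esf(9, theta))
    print("closed form", biallelic_multiplicity_probs(9))
```

The factor θ² appears in every weight and cancels on normalisation, which is why the conditional distribution carries no information about θ.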
Actual site frequency spectra
Excess of rare and common types, deficiency of intermediate types
Data from NIEHS Environmental Genome Project
Direct resequencing of loci considered environmentally-sensitive
Global representation of ethnicities
Hernandez, Williamson, and Bustamante (2007)
Spectrum shape
Signature of expansion?
Expansions maintain more rare mutations
Signature of selective sweep?
Neutral variants experience selection as
a population bottleneck
Braverman et al. (1995)
Black: constant population size
Grey: recent expansion from small population size
Modelling a SNP data set
Nordborg (2001 Handbook of Statistical Genetics)
• Single segregating mutation in the sample genealogy
Conditional on exactly one segregating site, determine the
distribution of the size (number of descendants) of the
branch on which the mutation occurs
• Exactly two alleles in the sample
Conditional on two haplotypes, bearing any number of
segregating sites, determine the distribution of numbers of
the two alleles
Conditioning
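• Two alleles
$$P(K = 2 \mid m) = \sum_{l=1}^{m-1} \frac{\theta}{l} \prod_{j=2}^{m} \frac{j-1}{\theta+j-1}$$
• One segregating site
$$P(S = 1 \mid m) = \sum_{l=1}^{m-1} \frac{\theta}{\theta+l} \prod_{j=2}^{m} \frac{j-1}{\theta+j-1}$$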
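The two conditioning events can be compared numerically. The sketch below assumes the probabilities take the forms P(K = 2 | m) = Σ_{l=1}^{m−1} (θ/l) ∏_{j=2}^{m} (j−1)/(θ+j−1) and P(S = 1 | m) = Σ_{l=1}^{m−1} θ/(θ+l) ∏_{j=2}^{m} (j−1)/(θ+j−1); the function names are hypothetical.

```python
def common_product(m, theta):
    """prod_{j=2}^{m} (j - 1) / (theta + j - 1), shared by both formulas."""
    prod = 1.0
    for j in range(2, m + 1):
        prod *= (j - 1) / (theta + j - 1)
    return prod


def prob_two_alleles(m, theta):
    """P(K = 2 | m): exactly two alleles in a sample of size m."""
    return common_product(m, theta) * sum(theta / l for l in range(1, m))


def prob_one_segregating_site(m, theta):
    """P(S = 1 | m): exactly one segregating site in a sample of size m."""
    return common_product(m, theta) * sum(theta / (theta + l) for l in range(1, m))


if __name__ == "__main__":
    for theta in (0.1, 1.0, 5.0):
        print(theta, prob_two_alleles(20, theta), prob_one_segregating_site(20, theta))
```

Because a single segregating site implies exactly two alleles but not the reverse, the second probability never exceeds the first.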
Multiplicity conditioned on a SNP
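• Single segregating site in a sample of size m
$$P(S = 1 \mid m) = \sum_{l=1}^{m-1} \frac{\theta}{\theta+l} \prod_{j=2}^{m} \frac{j-1}{\theta+j-1}$$
• Multiplicity i
$$f(i \mid m, \theta) = \frac{\dfrac{1}{i} \sum_{l=2}^{m-i+1} \dfrac{l-1}{\theta+l-1} \dbinom{m-l}{i-1}}{\dbinom{m-1}{i} \sum_{j=2}^{m} \dfrac{1}{\theta+j-1}}$$
dependent on θ !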
Ganapathy and Uyenoyama (2009 Theoretical Population Biology)
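A hedged numerical sketch of the multiplicity distribution. It assumes the formula reads f(i | m, θ) = [(1/i) Σ_{l=2}^{m−i+1} (l−1)/(θ+l−1) · C(m−l, i−1)] / [C(m−1, i) Σ_{j=2}^{m} 1/(θ+j−1)], with i the number of copies of the mutation in the sample. The function name `snp_multiplicity` is illustrative.

```python
from math import comb


def snp_multiplicity(i, m, theta):
    """f(i | m, theta): P(the single mutation is carried by i of m genes)."""
    numerator = sum(
        (l - 1) / (theta + l - 1) * comb(m - l, i - 1)
        for l in range(2, m - i + 2)
    ) / i
    denominator = comb(m - 1, i) * sum(
        1.0 / (theta + j - 1) for j in range(2, m + 1)
    )
    return numerator / denominator


if __name__ == "__main__":
    m = 8
    for theta in (0.0, 2.0, 20.0):
        spectrum = [snp_multiplicity(i, m, theta) for i in range(1, m)]
        print(theta, [round(p, 4) for p in spectrum])
```

At θ = 0 the expression reduces to the familiar (1/i) / Σ_{j=1}^{m−1} 1/j, and the spectrum changes shape as θ grows, in contrast to the two-allele conditioning.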
Genomic conflict
• Phenotypes
Multiple genes generally influence a given phenotype
• Conflict
Target trait value differs among genes that control
phenotype
Sexual antagonism
Male and female function collaborate in reproduction
Genes influencing each function may come into conflict
Conflict and genomic variation
• Mating type regions as a battleground
S-locus controls self-incompatibility in flowering
plants
How does sexual antagonism affect the pattern of
molecular-level variation within the S-locus?
What are hallmarks of conflict?
• Develop a basis for inference
Model-based approach to the analysis of genetic
variation
• Flower development
Basic perfect flower includes
both male and female
components
Mariana Ruiz
http://commons.wikimedia.org/wiki/File:Mature_flower_diagram.svg
• Fertilization
Pollen grains deposited on
stigma germinate and pollen
tubes grow down style to
the ovary
• Gametophytic SI (GSI)
Specificity expressed by
individual pollen grain or
tube determined by own S-allele
• Pollen rejection
Growth of pollen tube
arrested in style
Norbert Holstein
http://commons.wikimedia.org/wiki/File:Gametophytic_self-incompatibility.png
• Sporophytic SI (SSI)
Specificity expressed by
individual pollen grain or
tube determined by the S-locus genotype of its parent
• Pollen rejection
Germination of pollen grain
may be arrested at stigma
surface
Norbert Holstein
http://commons.wikimedia.org/wiki/File:Sporophytic_self-incompatibility.png
Pistil (A) component: rejection of
recognized specificities
Pollen (B) component: declaration of
specificity
An Bn
Sn
Mating type regions
Uyenoyama (2005)
Human Y chromosome
Skaletsky et al. (2003 Nature 423: 825)
• Non-recombining male-specific Y (MSY)
Euchromatic region ~ 23 Mb
Differences between two random Ys every 3 – 4 kb
• Mammalian sex determinant SRY
Y-linked regulator of transcription of many male-specific
Y-linked genes
Mating type regions
Linkage between pistil (A) and pollen (B)
components is essential to SI function
• Pollen: declaration of specificity
• Pistil: rejection of recognized specificities
Uyenoyama (2005)
Brassica S-locus
Pollen component
Pistil component
Nasrallah (2000 Curr. Opin. Plant Biol.)
Natural populations often contain 30 – 50
S-alleles
Ubiquitin tags proteins for degradation
• Style: S-RNase disrupts pollen tube growth
Upon entering a pollen tube, S-RNases initially sequestered in a vacuole
In incompatible crosses, vacuole breaks down, releasing S-RNases into
cytoplasm of pollen tube
• Pollen: SLF (S-locus F-box)
Mediator of ubiquitinylation (attachment of ubiquitin)
Disables all S-RNases except those of the same specificity
Vierstra (2009, Nature Reviews Molecular Cell Biology)
Sexual antagonism
• Pistil: why reject fertilization?
Screening of potential mates may improve offspring
quality
Cost under incomplete reproductive compensation:
ovules may go unfertilized
• Pollen: why provoke rejection?
Self-rejection may improve quality of own ovules
Rejection by other plants reduces siring success
Hide behind another S-specificity in sporophytic SI?
Decline to declare S-specificity altogether?
GSI model
• Basic discrete time recursion
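$$P'_{ij} = \frac{1}{2}\left[\, q_i \sum_{k \ne i,j} \frac{P_{jk}}{1 - q_j - q_k} + q_j \sum_{k \ne i,j} \frac{P_{ik}}{1 - q_i - q_k} \right]$$
• Symmetries in genotype and allele frequencies
Model change in frequency of focal allele i, assuming
all other alleles in equal frequency
$$P_{ij} = P \quad \text{for } j \ne i$$
$$q_i = q = P(n-1)/2$$
$$P_{jk} = [1 - P(n-1)] \Big/ \binom{n-1}{2} \quad \text{for } j, k \ne i$$
$$q_j = (1-q)/(n-1) \quad \text{for } j \ne i$$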
Wright (1937, Genetics)
Diffusion approximation
• Change in allele frequency
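$$\Delta q = \frac{q(1 - qn)}{n - 3 + 2q} \quad \text{for } n \text{ the number of common S-alleles}$$
$$\Delta q \approx \frac{nq(1 - nq)}{(n-1)(n-2)} \quad \text{for } q \approx 1/n$$
• Diffusion equation coefficients
$$\mu(x) = \frac{nx(1 - nx)}{(n-1)(n-2)} - ux$$
$$\sigma^2(x) = \frac{x(1 - 2x)}{2N}$$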
holds for large population size (N) and u (rate of
mutation to new S-alleles) of order 1/N
Wright (1937, Genetics)
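A brief numerical comparison of the two expressions for the change in allele frequency, as a sketch. It assumes the exact change is q(1 − nq)/(n − 3 + 2q) and the near-equilibrium form is nq(1 − nq)/[(n − 1)(n − 2)]; the function names are illustrative.

```python
def delta_q_exact(q, n):
    """Change in frequency of the focal S-allele under the symmetric model."""
    return q * (1 - n * q) / (n - 3 + 2 * q)


def delta_q_near_equilibrium(q, n):
    """Approximation obtained by setting q = 1/n in the denominator."""
    return n * q * (1 - n * q) / ((n - 1) * (n - 2))


if __name__ == "__main__":
    n = 10
    for q in (0.01, 0.09, 0.10, 0.11, 0.20):
        print(q, delta_q_exact(q, n), delta_q_near_equilibrium(q, n))
```

Both expressions vanish at q = 1/n, are positive for rarer alleles and negative for more common ones, which is the rare-allele advantage underlying balancing selection at the S-locus.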
Number of S-alleles
Wright’s diffusion model
• Diffusion with jumps
$$\mu(x) = \frac{nx(1 - nx)}{(n-1)(n-2)} - ux$$
$$\sigma^2(x) = \frac{x(1 - 2x)}{2N}$$
• Turnover rate
Frequency in population
4 Nun
(n 1)(n 2)
Expansion of time scale
under balancing selection
• High rate of invasion of
rare alleles
Promotes invasion of new
and retention of rare types
Maintains high numbers of
alleles
• Genealogical relationships
Tree shape similar under
symmetric balancing
selection and neutrality
Greatly expanded time scale
Takahata (1993, Mechanisms of Molecular Evolution)
S-allele turnover
• Quasi-equilibrium of S-alleles
Invasion of new, rare S-alleles balanced by extinction
of common S-alleles
• Expansion of time scale
Rate of divergence among S-allele classes similar to
rate among neutral lineages, but in a population of
size fN:
j
j
(1 1/ n)
2
2
2
n
n(n 1)(n 2)
f
2 Nf
4N
16 N 2u
n
2
Gametophytic SI models
• Basic discrete time recursion
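$$P'_{ij} = \frac{1}{2}\left[\, q_i \sum_{k \ne i,j} \frac{P_{jk}}{1 - q_j - q_k} + q_j \sum_{k \ne i,j} \frac{P_{ik}}{1 - q_i - q_k} \right]$$
• Diffusion approximation
$$\mu(x) = \frac{nx(1 - nx)}{(n-1)(n-2)} - ux$$
$$\sigma^2(x) = \frac{x(1 - 2x)}{2N}$$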
Parameters:
Effective population size (N)
Rate of mutation to new S-specificities (u)
Simulation results
• Stationary distribution of
allele frequency
Most time spent close to
deterministic equilibrium (1/n)
or in boundary layer close to
extinction
• Number of S-alleles
Analytical expectation for
number of common S-alleles
Vallejo-Marín and Uyenoyama (2008)
Pollen specificity in GSI
• Each pollen expresses its
own specificity
Rarer specificities are
incompatible with fewer plants
• Incompatible matings
For n S-alleles in equal
frequencies, a pollen type is
incompatible with a proportion
2/n of all plants
Norbert Holstein
http://commons.wikimedia.org/wiki/File:Gametophytic_self-incompatibility.png
Fate of style-part mutant
[Figure: fate of the style-part mutant Sa = An+1 Bn. Axes: self-pollen fraction (s) and relative viability of inbred offspring, each from 0.0 to 1.0. Regions: Full SC, Polymorphism, Full SI.]
Fate of pollen-part mutant
[Figure: fate of the pollen-part mutant Sb = An Bn+1, for n = 10. Axes: self-pollen fraction (s) and relative viability of inbred offspring, each from 0.0 to 1.0. Regions: Full SC, Disruption, Polymorphism, Full SI.]
Uyenoyama, Zhang, and Newbigin (2001)
Sn = An Bn
Sb = An Bn+1
Sa = An+1 Bn
Sn+1 = An+1 Bn+1
[Diagram: direction of pollen flow among the four haplotypes]
Uyenoyama, Zhang, and Newbigin (2001)
Sn = An Bn
TURN OFF: partial breakdown of SI by pollen disablement, giving Sb = An Bn+1
TURN ON: restoration of SI by stylar recognition, giving Sn+1 = An+1 Bn+1
Evolutionarily unlikely: steps through Sa = An+1 Bn
Uyenoyama, Zhang, and Newbigin (2001)
Joint genealogies
Solanaceae and Plantaginaceae
Rosaceae
Unlike S-RNase genes, SLF genes show
– Low divergence between allelic types
– No trans-specific sharing of lineages
Newbigin, Paape, and Kohn (2008)
Cycles of loss/restoration of SI?
• Family-specific genealogies
Rosaceae: do highly-diverged, ancient SFB lineages
reflect continuous operation or restoration of same
F-box genes?
Solanaceae, Plantaginaceae: Recruitment of new F-box genes?
• Turnover of pollen-specificity loci
Expression and recognition of a paralogue of the
former pollen specificity gene?
Can homologues be distinguished from paralogues
with new function?
An appeal for inference methods
• Sexual antagonism in mating type regions
Neutral variation in linked regions
Rates of substitution at determinants of mating type
• Inference
Goal: use the pattern of variation in population
samples of genomic regions as a basis for inference
about the evolutionary process
Detection
• genomic conflict and other forms of selection
• mating systems and population structure
Pollen specificity in SSI
• Codominance
Both specificities expressed
Almost twice as many incompatible
styles under SSI as under GSI for the same
number of S-alleles
• Complete dominance
One specificity expressed
Norbert Holstein
http://commons.wikimedia.org/wiki/File:Sporophytic_self-incompatibility.png
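The "almost twice" comparison can be checked by enumeration. The sketch below is a simplified illustration, assuming n S-alleles with all plants heterozygous and all genotypes equally frequent, and assuming a style rejects pollen whenever it carries an expressed pollen specificity. These simplifications and the function names are assumptions, not part of the slides.

```python
from fractions import Fraction
from itertools import combinations


def incompatible_fraction_gsi(n):
    """Fraction of plants rejecting pollen that expresses its own allele 0."""
    genotypes = list(combinations(range(n), 2))
    rejecting = [g for g in genotypes if 0 in g]
    return Fraction(len(rejecting), len(genotypes))


def incompatible_fraction_ssi_codominant(n):
    """Fraction of plants rejecting pollen from a codominant (0, 1) parent."""
    genotypes = list(combinations(range(n), 2))
    parent = {0, 1}
    rejecting = [g for g in genotypes if parent & set(g)]
    return Fraction(len(rejecting), len(genotypes))


if __name__ == "__main__":
    for n in (5, 10, 50):
        gsi = incompatible_fraction_gsi(n)
        ssi = incompatible_fraction_ssi_codominant(n)
        print(n, gsi, ssi, float(ssi / gsi))
```

Under these assumptions the GSI fraction is 2/n and the ratio of SSI to GSI fractions is (2n − 3)/(n − 1), which approaches 2 from below as n grows.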
SRK genealogies
• Sporophytic SI
Diploid genotype of pollen parent
determines S-specificity of each
pollen grain
Class I is dominant over Class II,
with codominance within class
• Class II: pollen-recessive
Lower number of segregating
alleles, each with relatively
higher frequency in population
Greater genealogical relationship
within class?
Edh, Widén and Ceplitis (2009)
Is class II
younger
than class I?
• MRCA ages
Class I: 25.5 ± 8.1 MY
Class II: 3.1 ± 0.9 MY
I/II: 41.4 ± 12.7 MY
• Origin of SLG/SRK
system
42.1 ± 9.0 MY
Uyenoyama (1995)