Joint Linkage and Linkage Disequilibrium Mapping
Key Reference
Li, Q., and R. L. Wu, 2009 A multilocus model for
constructing a linkage disequilibrium map in
human populations. Statistical Applications in
Genetics and Molecular Biology 8 (1): Article 18.
Genetic Designs for Mapping
• Controlled crosses – Backcross, F2, full-sib
family, … (linkage)
• Unrelated (random) individuals from a natural
population (linkage disequilibrium)
• Cases and controls from a natural population
• Unrelated (random) families from a natural
population (linkage and LD)
• Related (non-random) families from a natural
population (linkage, LD and identical-by-descent)
Family designs are increasingly used in genetic studies because of the rich information they contain.
Natural Population
• Consider two SNPs, 1 (with alleles A and a) and 2 (with alleles B and b)
• The two SNPs are linked with recombination fraction r
• The two SNPs form four haplotypes, AB, Ab,
aB, and ab
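• Prob(A) = p, Prob(B) = q, linkage disequilibrium = D. We then have the haplotype frequencies
Prob(AB) = pq + D, Prob(Ab) = p(1 - q) - D,
Prob(aB) = (1 - p)q - D, Prob(ab) = (1 - p)(1 - q) + D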
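Here is a minimal Python sketch of this parameterization. It assumes the usual definition D = Prob(AB) - pq, and the function names are hypothetical. It also checks the identity D = Prob(AB)Prob(ab) - Prob(Ab)Prob(aB), which recovers D from the four haplotype frequencies:

```python
def two_locus_haplotype_freqs(p, q, D):
    """Return the four haplotype frequencies for SNP 1 (A/a) and SNP 2 (B/b).

    p = Prob(A), q = Prob(B), D = linkage disequilibrium, defined as Prob(AB) - p*q.
    """
    freqs = {
        "AB": p * q + D,
        "Ab": p * (1 - q) - D,
        "aB": (1 - p) * q - D,
        "ab": (1 - p) * (1 - q) + D,
    }
    # D is bounded by p and q; a negative frequency means D is out of range.
    if min(freqs.values()) < 0:
        raise ValueError("D is outside its admissible range for these p and q")
    return freqs


def ld_from_haplotypes(freqs):
    """Recover D via the identity D = Prob(AB)Prob(ab) - Prob(Ab)Prob(aB)."""
    return freqs["AB"] * freqs["ab"] - freqs["Ab"] * freqs["aB"]


freqs = two_locus_haplotype_freqs(p=0.6, q=0.7, D=0.05)
print(freqs)                      # AB ≈ 0.47, Ab ≈ 0.13, aB ≈ 0.23, ab ≈ 0.17
print(ld_from_haplotypes(freqs))  # ≈ 0.05
```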
Diagrammatic Presentation
Family Design: family number and size
Mating frequencies of families and
offspring genotype frequencies per family
Hardy-Weinberg equilibrium (HWE) is assumed.
Can you figure out where this assumption is
needed?
Segregation of double heterozygote
• The overall haplotype frequencies produced by this parent are ω1/2 for AB or ab, and ω2/2 for Ab or aB
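This transcript does not define ω1 and ω2. The sketch below assumes one reading that fits the ½ω1 / ½ω2 structure: ω1 = φ(1 - r) + (1 - φ)r and ω2 = 1 - ω1. Here φ is the probability that the AaBb parent carries diplotype AB|ab. Under HWE, φ = Prob(AB)Prob(ab) / [Prob(AB)Prob(ab) + Prob(Ab)Prob(aB)]. The function name is hypothetical:

```python
def double_het_gamete_freqs(hap, r):
    """Haplotype (gamete) frequencies transmitted by an AaBb parent.

    hap: population haplotype frequencies with keys "AB", "Ab", "aB", "ab".
    r:   recombination fraction between the two SNPs.
    Assumes HWE, so the parent is AB|ab with probability phi and Ab|aB otherwise.
    """
    coupling = hap["AB"] * hap["ab"]   # proportional to Prob(AB|ab)
    repulsion = hap["Ab"] * hap["aB"]  # proportional to Prob(Ab|aB)
    phi = coupling / (coupling + repulsion)
    # AB/ab are non-recombinant from AB|ab but recombinant from Ab|aB.
    omega1 = phi * (1 - r) + (1 - phi) * r
    omega2 = 1 - omega1
    return {"AB": omega1 / 2, "ab": omega1 / 2, "Ab": omega2 / 2, "aB": omega2 / 2}


hap = {"AB": 0.47, "Ab": 0.13, "aB": 0.23, "ab": 0.17}
print(double_het_gamete_freqs(hap, r=0.1))
```

When D = 0 the two diplotypes are equally likely (φ = 1/2). All four gametes then have frequency 1/4, whatever the value of r.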
A Joint Probability
• Mother genotypes (Mm)
• Father genotypes (Mf)
• Offspring genotypes (Mo)
P(Mm, Mf, Mo) = P(Mm, Mf) P(Mo | Mm, Mf)
= P(Mm) P(Mf) P(Mo | Mm, Mf) (under random mating)
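The factorization can be made concrete for a single SNP. This is a simplification, since the slides use two-marker genotypes. The sketch assumes HWE for parental genotypes, random mating so that P(Mm, Mf) = P(Mm)P(Mf), and Mendelian transmission. All names are hypothetical:

```python
from itertools import product


def genotype_freqs_hwe(p):
    """Single-SNP genotype frequencies under HWE, with p = P(A)."""
    return {"AA": p * p, "Aa": 2 * p * (1 - p), "aa": (1 - p) ** 2}


def transmit_a(genotype):
    """Probability that a parent with this genotype transmits allele A (Mendel)."""
    return genotype.count("A") / 2


def offspring_prob(mother, father, child):
    """P(Mo | Mm, Mf) for a single SNP."""
    pm, pf = transmit_a(mother), transmit_a(father)
    probs = {"AA": pm * pf,
             "Aa": pm * (1 - pf) + (1 - pm) * pf,
             "aa": (1 - pm) * (1 - pf)}
    return probs[child]


def trio_prob(mother, father, child, p):
    """P(Mm, Mf, Mo) = P(Mm) P(Mf) P(Mo | Mm, Mf), assuming random mating and HWE."""
    g = genotype_freqs_hwe(p)
    return g[mother] * g[father] * offspring_prob(mother, father, child)


GENOTYPES = ("AA", "Aa", "aa")
total = sum(trio_prob(m, f, o, p=0.3) for m, f, o in product(GENOTYPES, repeat=3))
print(total)  # sums to 1 up to rounding
```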
A joint two-stage log-likelihood
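Let the unknown parameters be Θ (the haplotype frequencies, estimated from the parents) and r (the recombination fraction, estimated from the offspring)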
Upper-stage Likelihood
EM algorithm for Θ
• E step
• M step
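The slide's E-step and M-step formulas are not reproduced here. The sketch below illustrates the same idea with the classic two-SNP haplotype EM, assuming HWE and unrelated parents. The E step resolves the phase of double heterozygotes, and the M step re-estimates haplotype frequencies by counting. Genotypes are coded by allele counts. The function name and input format are hypothetical:

```python
def em_two_locus(counts, tol=1e-10, max_iter=1000):
    """EM estimates of two-SNP haplotype frequencies from unphased genotypes.

    counts: {(nA, nB): n}, where nA (nB) is the number of A (B) alleles, 0-2,
            and n is how many individuals carry that genotype.
    Assumes HWE; only the double heterozygote (1, 1) has ambiguous phase.
    """
    haps = ("AB", "Ab", "aB", "ab")
    freq = dict.fromkeys(haps, 0.25)
    total_haps = 2 * sum(counts.values())
    for _ in range(max_iter):
        expected = dict.fromkeys(haps, 0.0)
        for (n_a, n_b), n in counts.items():
            if n_a == 1 and n_b == 1:
                # E step: posterior probability that an AaBb individual is AB|ab
                cis = freq["AB"] * freq["ab"]
                trans = freq["Ab"] * freq["aB"]
                phi = cis / (cis + trans)
                expected["AB"] += n * phi
                expected["ab"] += n * phi
                expected["Ab"] += n * (1 - phi)
                expected["aB"] += n * (1 - phi)
            else:
                # At most one locus is heterozygous, so the two haplotypes are known.
                locus_a = "A" * n_a + "a" * (2 - n_a)
                locus_b = "B" * n_b + "b" * (2 - n_b)
                for i in range(2):
                    expected[locus_a[i] + locus_b[i]] += n
        # M step: frequency = expected haplotype count / total haplotypes
        new = {h: expected[h] / total_haps for h in haps}
        if max(abs(new[h] - freq[h]) for h in haps) < tol:
            return new
        freq = new
    return freq


counts = {(2, 2): 30, (1, 1): 20, (0, 0): 25, (1, 2): 10, (0, 1): 15}
print(em_two_locus(counts))
```

Only AaBb is phase-ambiguous, so each E step reduces to a single posterior probability φ of diplotype AB|ab.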
Lower-stage Likelihood
EM algorithm for r
• E step - calculate the probability with which a
considered haplotype produced by a double
heterozygote parent is the recombinant type
using
E step (cont’d)
• Calculate the probability with which a double
heterozygote offspring carries recombinant
haplotypes by
M step
where m equals the sum of the following terms:
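The formulas for this stage are not reproduced in the transcript. The sketch below simplifies in two ways. It assumes each haplotype transmitted by an AaBb parent has already been identified, so it omits the double-heterozygous-offspring case from the continued E step. It also assumes the prior φ that a parent is AB|ab (for example, from the upper stage) is known. All offspring of one parent share that parent's diplotype, so the E step is computed per parent. Names and inputs are hypothetical:

```python
def em_recombination(families, phi, tol=1e-10, max_iter=1000):
    """EM estimate of r from double-heterozygote (AaBb) parents.

    families: list of (k, l) per AaBb parent, where k counts transmitted AB or ab
              haplotypes and l counts transmitted Ab or aB haplotypes.
    phi:      prior probability that a parent is AB|ab.
    Simplifying assumption: each transmitted haplotype has already been identified.
    """
    r = 0.25
    n_total = sum(k + l for k, l in families)
    for _ in range(max_iter):
        expected_rec = 0.0
        for k, l in families:
            # E step: posterior probability that this parent is AB|ab
            cis = phi * (1 - r) ** k * r ** l
            trans = (1 - phi) * r ** k * (1 - r) ** l
            w = cis / (cis + trans)
            # Under AB|ab the l haplotypes Ab/aB are recombinant;
            # under Ab|aB the k haplotypes AB/ab are.
            expected_rec += w * l + (1 - w) * k
        # M step: r = expected recombinant haplotypes / all transmitted haplotypes
        new_r = expected_rec / n_total
        if abs(new_r - r) < tol:
            return new_r
        r = new_r
    return r


print(em_recombination([(9, 1), (8, 2), (7, 3)], phi=0.8))
```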
Hypothesis tests
Linkage and Linkage disequilibrium
H0: r = 0.5 and D = 0 (no linkage and no LD)
H1: At least one equality does not hold
LR = -2(log L0 - log L1)
Critical threshold: χ² (df = 2)
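For 2 degrees of freedom, the χ² upper tail has the closed form exp(-x/2), so this test needs no statistics library. Below is a minimal sketch; the log-likelihood values in the usage lines are made up for illustration:

```python
import math


def lr_statistic(loglik_null, loglik_alt):
    """LR = -2 (log L0 - log L1)."""
    return -2.0 * (loglik_null - loglik_alt)


def chi2_df2_pvalue(lr):
    """P(chi-square with 2 df > lr) = exp(-lr / 2)."""
    return math.exp(-lr / 2.0)


def chi2_df2_critical(alpha):
    """Threshold c with P(chi-square with 2 df > c) = alpha."""
    return -2.0 * math.log(alpha)


# Illustrative (made-up) maximized log-likelihoods under H0 and H1.
lr = lr_statistic(loglik_null=-1234.5, loglik_alt=-1228.1)
print(lr, chi2_df2_pvalue(lr), chi2_df2_critical(0.05))
```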
Hypothesis tests
Sex-specific difference in population structure
Hypothesis test
• Sex-specific difference in the recombination
fraction
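The transcript gives no formulas for this test. One simple version, assumed here, compares the male and female recombination fractions with a 1-df likelihood ratio test. It treats the recombinant counts for each sex as directly observed, which is a hypothetical simplification; in the model, they are inferred through the EM algorithm. For 1 df, the χ² upper tail equals erfc(√(x/2)):

```python
import math


def binom_loglik(k, n, r):
    """Binomial log-likelihood of k recombinants in n meioses (constant dropped)."""
    ll = 0.0
    if k > 0:
        ll += k * math.log(r)
    if n - k > 0:
        ll += (n - k) * math.log(1 - r)
    return ll


def sex_specific_r_test(k_m, n_m, k_f, n_f):
    """LR test of H0: r_male = r_female against separate recombination fractions."""
    r_m, r_f = k_m / n_m, k_f / n_f
    r_pooled = (k_m + k_f) / (n_m + n_f)
    ll_alt = binom_loglik(k_m, n_m, r_m) + binom_loglik(k_f, n_f, r_f)
    ll_null = binom_loglik(k_m, n_m, r_pooled) + binom_loglik(k_f, n_f, r_pooled)
    lr = max(-2.0 * (ll_null - ll_alt), 0.0)  # guard against tiny negative rounding
    p_value = math.erfc(math.sqrt(lr / 2.0))   # chi-square (1 df) upper tail
    return lr, p_value


# Hypothetical counts: 5 of 100 male and 25 of 100 female meioses recombinant.
print(sex_specific_r_test(5, 100, 25, 100))
```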
Simulation
[Figure: estimated recombination fraction r̂ versus true r]
Power
Conclusions
The model can jointly estimate the linkage and linkage disequilibrium between two markers:
- LD from parents
- Linkage from offspring
The model can be used to construct an LD map for studying the evolution of populations and for high-resolution mapping of traits
Three-locus Analysis
Marker segregation in a natural population:
Three markers produce eight haplotypes: ABC, ABc, AbC, Abc, aBC, aBc, abC, and abc.
Haplotype frequencies are expressed in terms of:
P(A) = p, P(a) = 1 - p
P(B) = q, P(b) = 1 - q
P(C) = r, P(c) = 1 - r
DAB = LD between markers A and B, DBC = LD between markers B and C,
DAC = LD between markers A and C, DABC = LD among markers A, B, and C
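The transcript omits the haplotype-frequency formulas. The sketch assumes the standard Bennett-style decomposition, which is our assumption: Prob(ABC) = pqr + pDBC + qDAC + rDAB + DABC. The other seven haplotypes follow from two rules. Replace each allele frequency with its complement for a lower-case allele. Flip the sign of each LD term once for every lower-case allele it involves. A sketch that generates all eight:

```python
from itertools import product


def three_locus_haplotype_freqs(p, q, r, D_AB, D_AC, D_BC, D_ABC):
    """Eight haplotype frequencies from allele frequencies and LD coefficients.

    p = P(A), q = P(B), r = P(C) (here r is an allele frequency, not a
    recombination fraction). Each LD term enters with sign +1 per upper-case
    allele and -1 per lower-case allele at the loci it involves.
    """
    freqs = {}
    for a, b, c in product("Aa", "Bb", "Cc"):
        sa, sb, sc = (1 if x.isupper() else -1 for x in (a, b, c))
        pa = p if a == "A" else 1 - p
        qb = q if b == "B" else 1 - q
        rc = r if c == "C" else 1 - r
        freqs[a + b + c] = (pa * qb * rc
                            + pa * sb * sc * D_BC
                            + qb * sa * sc * D_AC
                            + rc * sa * sb * D_AB
                            + sa * sb * sc * D_ABC)
    return freqs


f = three_locus_haplotype_freqs(0.6, 0.7, 0.4, D_AB=0.05, D_AC=0.02, D_BC=-0.03, D_ABC=0.005)
print(f)
```

Summing over the third locus recovers the two-locus result. For example, Prob(ABC) + Prob(ABc) = pq + DAB.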
Three-locus Analysis: Marker segregation in a family
Consider a triple heterozygote AaBbCc
AaBbCc produces 8 types of gametes (haplotypes) which
are classified into four groups
Gametes          Recombinant # between A and B   Recombinant # between B and C   Frequency
ABC and abc      0                               0                               g00
ABc and abC      0                               1                               g01
aBC and Abc      1                               0                               g10
AbC and aBc      1                               1                               g11
Matrix notation
                                   Markers B and C
Markers A and B      Recombinant   Non-recombinant   Total
Recombinant          g11           g10               rAB
Non-recombinant      g01           g00               1-rAB
Total                rBC           1-rBC             1
What is the recombination fraction between A and C?
rAC = g01 + g10
Thus, we have
rAB = g11 + g10
rBC = g11 + g01
rAC = g01 + g10
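Adding the first two equations gives rAB + rBC = g01 + g10 + 2g11 = rAC + 2g11. Writing g11 = c·rAB·rBC turns this into rAC = rAB + rBC - 2c·rAB·rBC. Here c = g11/(rAB·rBC) is the coefficient of coincidence, with c = 1 when crossovers in the two intervals are independent. A short sketch; the function name is hypothetical:

```python
def recombination_from_g(g00, g01, g10, g11):
    """Pairwise recombination fractions and coincidence from the g's.

    First index of g: recombinant between A and B; second: between B and C.
    """
    r_ab = g10 + g11
    r_bc = g01 + g11
    r_ac = g01 + g10
    # r_ab + r_bc = r_ac + 2*g11, so r_ac = r_ab + r_bc - 2*c*r_ab*r_bc
    # holds with c = g11 / (r_ab * r_bc).
    c = g11 / (r_ab * r_bc) if r_ab * r_bc > 0 else float("nan")
    return r_ab, r_bc, r_ac, c


# Independent crossovers (no interference): g11 = r_ab * r_bc
print(recombination_from_g(g00=0.72, g01=0.18, g10=0.08, g11=0.02))
```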
A triple heterozygote may have four possible diplotypes, each producing eight
haplotypes with the frequencies given below:
AaBbCC may have two possible diplotypes, each producing four
haplotypes with frequencies given below:
How about
AaBbcc
AaBBCc
AabbCc
AABbCc
aaBbCc
How about AaBBCC and other genotypes with one marker being heterozygous?
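The tables of gamete frequencies are not reproduced in this transcript, but they follow one rule. A gamete from diplotype h1|h2 takes each allele from either h1 or h2. Its probability is g_xy/2, where x records whether the source switches between A and B, and y whether it switches between B and C. The sketch below applies this rule; its names are hypothetical. For genotypes such as AaBbCC, gametes that coincide simply have their probabilities added. The same function therefore covers the "How about" genotypes listed above:

```python
from itertools import product


def gamete_freqs(h1, h2, g):
    """Gamete (haplotype) frequencies transmitted by a parent with diplotype h1|h2.

    g maps (x, y) -> g_xy, where x (y) = 1 if the gamete is recombinant between
    markers A and B (B and C). Each source pattern has probability g_xy / 2.
    """
    freqs = {}
    for src in product((0, 1), repeat=3):  # which parental haplotype each locus comes from
        gamete = "".join((h1, h2)[s][i] for i, s in enumerate(src))
        x, y = int(src[0] != src[1]), int(src[1] != src[2])
        freqs[gamete] = freqs.get(gamete, 0.0) + g[(x, y)] / 2
    return freqs


g = {(0, 0): 0.72, (0, 1): 0.18, (1, 0): 0.08, (1, 1): 0.02}
print(gamete_freqs("ABC", "abc", g))  # triple heterozygote: eight gametes
print(gamete_freqs("ABC", "abC", g))  # AaBbCC: four gametes
```

For ABC|abC (genotype AaBbCC), the result has four gametes. For example, ABC has frequency (g00 + g01)/2 = (1 - rAB)/2.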
Study design
For a parent with the triple heterozygous genotype AaBbCc, there are four
possible diplotypes, ABC|abc, ABc|abC, AbC|aBc and Abc|aBC, whose relative
frequencies in the natural population are:
These diplotypes produce haplotypes ABC, ABc, AbC, Abc, aBC, aBc, abC,
and abc, with the frequencies:
For a parent with a double heterozygous genotype, the possible diplotypes and their corresponding relative frequencies are listed here:
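Under HWE, a diplotype h1|h2 with h1 ≠ h2 has population frequency 2·Prob(h1)·Prob(h2). The relative frequencies of the diplotypes compatible with one genotype are these products normalized to sum to 1. The sketch below enumerates the compatible diplotypes for any number of biallelic loci; its names are hypothetical:

```python
from itertools import product


def diplotype_weights(genotype, hap_freq):
    """Relative frequencies of the diplotypes compatible with an unphased genotype.

    genotype: per-locus allele pairs, e.g. ["Aa", "Bb", "Cc"] or ["Aa", "Bb", "CC"].
    hap_freq: population haplotype frequencies, e.g. {"ABC": 0.3, ...}.
    Under HWE a diplotype h1|h2 (h1 != h2) has frequency 2 * P(h1) * P(h2).
    """
    het = [i for i, alleles in enumerate(genotype) if alleles[0] != alleles[1]]
    raw = {}
    for bits in product((0, 1), repeat=max(len(het) - 1, 0)):
        # Fixing the first heterozygous locus avoids listing h1|h2 and h2|h1 twice.
        choice = dict(zip(het, (0,) + bits))
        h1 = "".join(al[choice.get(i, 0)] for i, al in enumerate(genotype))
        h2 = "".join(al[1 - choice[i]] if i in choice else al[0]
                     for i, al in enumerate(genotype))
        mult = 1 if h1 == h2 else 2
        raw[(h1, h2)] = mult * hap_freq.get(h1, 0.0) * hap_freq.get(h2, 0.0)
    total = sum(raw.values())
    return {dip: w / total for dip, w in raw.items()}


freqs = {"ABC": 0.4, "abC": 0.1, "AbC": 0.2, "aBC": 0.3}
print(diplotype_weights(["Aa", "Bb", "CC"], freqs))
```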
Let
Note: the θ's are the recombination fractions.
Upper-level Likelihood
EM algorithm
E step: calculate the probability that a double heterozygous parent, or a triple heterozygous parent, carries a particular diplotype
M step: estimate haplotype frequencies by
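The transcript does not reproduce the M-step formula. Below is a generic multilocus sketch of this EM, which is our own illustration and assumes HWE. It handles a mix of double and triple heterozygotes by enumerating every diplotype compatible with each genotype. Function names and the genotype encoding are hypothetical:

```python
from itertools import product


def compatible_diplotypes(genotype):
    """All unordered diplotypes (h1, h2) consistent with an unphased genotype.

    genotype: list of per-locus allele pairs, e.g. ["Aa", "Bb", "CC"].
    """
    het = [i for i, alleles in enumerate(genotype) if alleles[0] != alleles[1]]
    # Fixing the first heterozygous locus avoids listing h1|h2 and h2|h1 twice.
    for bits in product((0, 1), repeat=max(len(het) - 1, 0)):
        choice = dict(zip(het, (0,) + bits))
        h1 = "".join(al[choice.get(i, 0)] for i, al in enumerate(genotype))
        h2 = "".join(al[1 - choice[i]] if i in choice else al[0]
                     for i, al in enumerate(genotype))
        yield h1, h2


def em_haplotypes(genotypes, tol=1e-10, max_iter=2000):
    """EM haplotype-frequency estimates from unphased genotypes, assuming HWE."""
    dips = [list(compatible_diplotypes(g)) for g in genotypes]
    haps = sorted({h for ds in dips for d in ds for h in d})
    freq = {h: 1.0 / len(haps) for h in haps}
    for _ in range(max_iter):
        counts = dict.fromkeys(haps, 0.0)
        for ds in dips:
            # E step: posterior probability of each compatible diplotype.
            # The HWE factor 2 is shared by all diplotypes of a heterozygous
            # genotype, so it cancels and is omitted.
            w = [freq[h1] * freq[h2] for h1, h2 in ds]
            total = sum(w)
            for (h1, h2), wi in zip(ds, w):
                counts[h1] += wi / total
                counts[h2] += wi / total
        # M step: each individual contributes two haplotypes
        new = {h: c / (2 * len(genotypes)) for h, c in counts.items()}
        if max(abs(new[h] - freq[h]) for h in haps) < tol:
            return new
        freq = new
    return freq


genos = [["AA", "BB", "CC"]] * 3 + [["aa", "bb", "cc"]] * 3 + [["Aa", "Bb", "Cc"]] * 2
print(em_haplotypes(genos))
```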
Lower-level Likelihood
EM algorithm
In the E step: The probabilities with which a
considered haplotype produced by a double
heterozygote or triple heterozygote parent is
the recombinant type are calculated.
In the M step: The estimates of crossover
probabilities g's are obtained.
Very complex – omitted here.
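The slides omit this step. The sketch below is much simplified and rests on two hypothetical inputs: the transmitted gametes of each AaBbCc parent are observed, and that parent's diplotype probabilities are known from the upper level. All gametes from one parent share its diplotype, so the E step runs per parent. The M step sets each g to the expected share of gametes with that recombination pattern. Names are hypothetical:

```python
def recomb_pattern(h1, h2, gamete):
    """(x, y): recombination between A-B and B-C for a gamete from diplotype h1|h2.

    Assumes a triple heterozygote, so each allele identifies its source haplotype.
    """
    src = [0 if gamete[i] == h1[i] else 1 for i in range(3)]
    return int(src[0] != src[1]), int(src[1] != src[2])


def em_crossover(parents, tol=1e-10, max_iter=1000):
    """EM estimates of g00, g01, g10, g11 from triple-heterozygote parents.

    parents: list of (prior, gametes) per AaBbCc parent, where prior maps each
             diplotype (h1, h2) to its probability and gametes lists the
             haplotypes that parent transmitted to its offspring.
    """
    keys = [(0, 0), (0, 1), (1, 0), (1, 1)]
    g = dict.fromkeys(keys, 0.25)
    n_gametes = sum(len(gametes) for _, gametes in parents)
    for _ in range(max_iter):
        expected = dict.fromkeys(keys, 0.0)
        for prior, gametes in parents:
            # E step: posterior probability of each diplotype given the gametes
            post = {}
            for dip, w in prior.items():
                lik = w
                for gam in gametes:
                    lik *= g[recomb_pattern(*dip, gam)] / 2
                post[dip] = lik
            total = sum(post.values())
            for dip, lik in post.items():
                for gam in gametes:
                    expected[recomb_pattern(*dip, gam)] += lik / total
        # M step: g_xy = expected share of gametes with recombination pattern (x, y)
        new = {key: expected[key] / n_gametes for key in keys}
        if max(abs(new[key] - g[key]) for key in keys) < tol:
            return new
        g = new
    return g


parents = [({("ABC", "abc"): 1.0}, ["ABC", "abc", "ABc", "aBC"])]
print(em_crossover(parents))
```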
Simulation
Conclusions
Three-point analysis provides estimates of high-order LD and of pairwise linkage (this helps to model genetic interference):
rAC = rAB + rBC - 2c·rAB·rBC, where c is the coefficient of coincidence (interference = 1 - c)
Three-point analysis estimates linkage and linkage disequilibria as precisely as two-point analysis, even though it has more parameters to estimate
Three-point analysis can estimate the linkage between two markers even when they are not associated (LD = 0).
Quantitative Genetic Analysis
We now consider the genetic effects of haplotypes on a complex phenotype
Study Design
Notation
Unifying Likelihood
The first part
This part can be estimated with the algorithm developed above
The second part
Risk Haplotype
Genetic effects
EM algorithm
M step
Hypothesis tests
Model selection
Simulation
Simulation with three markers
Power
Part of this lecture comes from Dr. Qin Li's dissertation.