Slide 1
Linkage Disequilibrium
Joe Mychaleckyj
Center for Public Health Genomics
982-1107
Slide 2
Today we’ll cover…
• Haplotypes
• Linkage Disequilibrium
• Visualizing LD
• HapMap
Slide 3
References
Principles of Population Genetics, Fourth Edition, by Daniel L. Hartl and Andrew G. Clark
Genetic Data Analysis II, by Bruce S. Weir
Slide 4
References
Statistical Genetics: Gene Mapping Through Linkage and Association, eds. Benjamin M. Neale, Manuel A.R. Ferreira, Sarah E. Medland, Danielle Posthuma
Slide 5
SNP1 [A / T]    SNP2 [C / G]    SNP3 [A / G]
A               C               G
A               C               A
T               G               G
Haplotype: specific combination of alleles occurring (cis) on the same
chromosome (segment of chromosome)
N SNPs: how many haplotypes are possible?
$2^N$ (i.e. very large diversity possible)
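The $2^N$ count can be checked by brute force. The sketch below enumerates every allele combination for the three biallelic SNPs shown on this slide; the function name `possible_haplotypes` is illustrative, not from the lecture.

```python
from itertools import product

def possible_haplotypes(snp_alleles):
    """Enumerate every allele combination across a list of biallelic SNPs."""
    return ["".join(combo) for combo in product(*snp_alleles)]

# The three SNPs from the slide: SNP1 [A/T], SNP2 [C/G], SNP3 [A/G]
snps = [("A", "T"), ("C", "G"), ("A", "G")]
haplotypes = possible_haplotypes(snps)

print(len(haplotypes))  # 2**3 = 8 possible haplotypes
print(haplotypes)
```

Only three of these eight combinations appear in the slide's example, which is the point developed later under haplotype blocks.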
Slide 6
Terminology
• Haplotype: specific combination (phasing) of alleles occurring (cis) on the same chromosomal segment
• Linkage/Linked markers: physical co-location of markers on the same chromosome
• Diplotype: haplogenotype, i.e. a pair of phased haplotypes, one maternally and one paternally inherited
Slide 7
SNP1 [ A / a ]    SNP2 [ B / b ]
Major allele frequencies: p(A), p(B)
Minor allele frequencies: p(a), p(b)
Independently segregating SNPs:
Haplotype Frequency p(ab) = p(a) x p(b)
LINKAGE EQUILIBRIUM
(How many haplotypes in total ?)
LINKAGE DISEQUILIBRIUM
Haplotype Frequency p(ab)≠ p(a) x p(b)
Slide 8
Linkage Disequilibrium
• Non-random assortment of alleles at 2
(or more) loci
• The closer the markers, the stronger
the LD since recombination will have
occurred at a low rate
• Markers co-segregate within and
between families
Slide 9
* LINKAGE EQUILIBRIUM *
Not a Punnett Square!

              SNP2 allele
SNP1 allele   B           b           Total
A             p(A)p(B)    p(A)p(b)    p(A)
a             p(a)p(B)    p(a)p(b)    p(a)
Total         p(B)        p(b)

Example (column B):
p(A)p(B) + p(a)p(B) = p(B){ p(A) + p(a) } = p(B)
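As a quick numerical check of the equilibrium table, the sketch below builds the four expected haplotype frequencies from the two minor allele frequencies and confirms that a column sums back to its allele frequency. The values p(a) = 0.2 and p(b) = 0.3 are illustrative, and `equilibrium_table` is a hypothetical helper name.

```python
def equilibrium_table(p_a, p_b):
    """Expected haplotype frequencies when SNP1 and SNP2 segregate independently.

    p_a and p_b are the minor allele frequencies p(a) and p(b).
    """
    p_A, p_B = 1.0 - p_a, 1.0 - p_b
    return {
        "AB": p_A * p_B,
        "Ab": p_A * p_b,
        "aB": p_a * p_B,
        "ab": p_a * p_b,
    }

table = equilibrium_table(p_a=0.2, p_b=0.3)

# Column total for allele B: p(AB) + p(aB) should recover p(B) = 0.7
col_B = table["AB"] + table["aB"]
print(table)
print(col_B)
```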
Slide 10
SNP1 [ A / a ]    SNP2 [ B / b ]
Major allele frequencies: p(A), p(B)
Minor allele frequencies: p(a), p(b)
LINKAGE DISEQUILIBRIUM
Haplotype frequency p(ab) = p(a) p(b) + D
(sign of D is generally arbitrary, unless comparing D values between populations or studies)
D: Lewontin’s LD Parameter (Lewontin 1960)
Slide 11
* LINKAGE DISEQUILIBRIUM *
              SNP2 allele
SNP1 allele   B             b             Total
A             p(A)p(B)+D    p(A)p(b)-D    p(A)
a             p(a)p(B)-D    p(a)p(b)+D    p(a)
Total         p(B)          p(b)

Example (column B):
p(A)p(B)+D + p(a)p(B)-D = p(B){ p(A) + p(a) } = p(B)
Slide 12
What is the LD?

       b       B
a      0.16    0.04    p(a)=0.20
A      0.14    0.66    p(A)=0.80
       p(b)=0.30   p(B)=0.70

p(ab) ≠ p(a) p(b), so D ≠ 0
p(ab) = p(a) p(b) + D
0.16 = 0.2 x 0.3 + D
D = 0.1
Since we defined p(ab) = p(a)p(b) + D, D is +ve here, but the sign is arbitrary: e.g. we could relabel A, B as the minor alleles
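The arithmetic on this slide can be reproduced in a few lines. The sketch below reads the table as p(ab) = 0.16, p(aB) = 0.04, p(Ab) = 0.14, p(AB) = 0.66, which is the assignment consistent with the stated margins p(a) = 0.20 and p(b) = 0.30; the function name `lewontin_d` is illustrative.

```python
def lewontin_d(hap):
    """D = p(ab) - p(a) p(b), from a dict of the four haplotype frequencies."""
    p_a = hap["ab"] + hap["aB"]  # row total for allele a
    p_b = hap["ab"] + hap["Ab"]  # column total for allele b
    return hap["ab"] - p_a * p_b

slide12 = {"ab": 0.16, "aB": 0.04, "Ab": 0.14, "AB": 0.66}
d = lewontin_d(slide12)
print(round(d, 4))  # 0.1

# The same D (with the signs from the LD table) is recovered from the other cells.
p_a, p_b = 0.20, 0.30
print(round(slide12["AB"] - (1 - p_a) * (1 - p_b), 4))  # +D
print(round((1 - p_a) * p_b - slide12["Ab"], 4))        # p(A)p(b) - p(Ab) = D
```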
Slide 13
Range of D values (-ve to +ve)
D has a minimum and maximum value that depends on the allele
frequencies of the markers
Since haplotype frequencies cannot be -ve:
p(aB) = p(a)p(B) - D ≥ 0, so D ≤ p(a)p(B)
p(Ab) = p(A)p(b) - D ≥ 0, so D ≤ p(A)p(b)
Both must hold, so D ≤ min( p(a)p(B), p(A)p(b) )
p(ab) = p(a)p(b) + D ≥ 0, so D ≥ -p(a)p(b)
p(AB) = p(A)p(B) + D ≥ 0, so D ≥ -p(A)p(B)
Both must hold, so D ≥ max( -p(a)p(b), -p(A)p(B) )
* Similar equations if we had defined p(ab) = p(a)p(b) - D
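The two bounds translate directly into code. This is a minimal sketch using the convention p(ab) = p(a)p(b) + D from the slides; `d_bounds` is a hypothetical helper and the allele frequencies are illustrative.

```python
def d_bounds(p_a, p_b):
    """Return (D_min, D_max) allowed by the allele frequencies p(a) and p(b).

    Uses the convention p(ab) = p(a)p(b) + D, so every haplotype
    frequency in the LD table must stay non-negative.
    """
    p_A, p_B = 1.0 - p_a, 1.0 - p_b
    d_min = max(-p_a * p_b, -p_A * p_B)
    d_max = min(p_a * p_B, p_A * p_b)
    return d_min, d_max

lower, upper = d_bounds(p_a=0.2, p_b=0.3)
print(lower, upper)  # approximately -0.06 and 0.14
```

With p(a) = 0.2 and p(b) = 0.3 the range is asymmetric, which is why raw D values are hard to compare between marker pairs with different allele frequencies.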
Slide 14
Limits of D LD Parameter
Limits of D are a function of allele
frequencies
Standardize D by rescaling to a
proportion of its maximal value for the
given allele frequencies (D')
D’ = D / Dmax
Slide 15
D’ (Lewontin, 1964)
D’ = D / Dmax
Dmax = min( p(A)p(B), p(a)p(b) ) when D < 0
Dmax = min( p(A)p(b), p(a)p(B) ) when D > 0
Again, sign of D’ depends on definition
D’ = 1 or -1 if one of the haplotype frequencies p(AB), p(Ab), p(aB), p(ab) = 0
= Complete LD (i.e. only 3 haplotypes seen)
D’ = 1 or -1 suggests that no recombination has taken place between markers
Beware rare markers: may not have enough power/sample size to detect the 4th haplotype
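A minimal sketch of the D’ calculation, following the sign-dependent Dmax above. Returning 0 when D is (numerically) zero is an assumption made here for convenience, and `d_prime` and the example dictionaries are illustrative names.

```python
def d_prime(hap, tol=1e-12):
    """Lewontin's D' from the four haplotype frequencies.

    Convention: D = p(ab) - p(a)p(b). When D is zero within `tol`
    the function returns 0.0 (an assumption of this sketch).
    """
    p_a = hap["ab"] + hap["aB"]
    p_b = hap["ab"] + hap["Ab"]
    p_A, p_B = 1.0 - p_a, 1.0 - p_b
    d = hap["ab"] - p_a * p_b
    if abs(d) < tol:
        return 0.0
    if d > 0:
        d_max = min(p_A * p_b, p_a * p_B)
    else:
        d_max = min(p_A * p_B, p_a * p_b)
    return d / d_max

# Only three haplotypes observed: p(aB) = 0
three_haplotypes = {"ab": 0.2, "aB": 0.0, "Ab": 0.1, "AB": 0.7}
# Haplotype frequencies equal to the products of allele frequencies
equilibrium = {"ab": 0.06, "aB": 0.14, "Ab": 0.24, "AB": 0.56}

print(d_prime(three_haplotypes))  # close to 1: complete LD
print(d_prime(equilibrium))       # 0.0: no LD
```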
Slide 16
D’ Interpretation
Example 1:
       b       B
a      0.06    0.14    p(a)=0.20
A      0.24    0.56    p(A)=0.80
       p(b)=0.30   p(B)=0.70
D = 0 ; Dmax undefined

Example 2:
       b       B
a      0.2     0       p(a)=0.20
A      0.1     0.7     p(A)=0.80
       p(b)=0.30   p(B)=0.70
D = Dmax = 0.14 ; D’ = +1

D’ = 1 with p(a) = 0.2, p(b) = 0.3 (perfect LD using the D’ measure)
- No recombination between markers
- Only 3 haplotypes are seen
Slide 17
Creation of LD
• Easiest to understand when markers are
physically linked
• Creation of LD
– Mutation
– Founder effect
– Admixture
– Inbreeding / non-random mating
– Selection
– Population bottleneck or stratification
– Epistatic interaction
• LD can occur between unlinked markers
• Gametic phase disequilibrium is a more
general term
Slide 18
[Figure: haplotypes at SNP1 and SNP2, with panels labelled n=2 haplotypes, n=3 haplotypes and, after recombination, n=4 haplotypes (A-B, A-b, a-B, a-b)]
Slide 19
Destruction of LD
• Main force is recombination
• Gene conversion may also act at short
distances (~ 100-1,000 bases)
• LD decays over time (generations of
interbreeding)
Slide 20
SNP1 - SNP2
Probability recombination occurs = θ
Probability recombination does not occur = 1-θ
Initial LD between SNP1 - SNP2: $D_0$
After 1 generation, preservation of LD:
$D_1 = D_0 (1-\theta)$
After t generations:
$D_t = D_0 (1-\theta)^t$
NB: Overly simple model does not account for allele frequency drift over time
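To get a feel for how fast LD decays under this simple model, the sketch below evaluates D after t generations and inverts the formula to find how many generations are needed for D to fall to a given fraction of its starting value. The θ values are illustrative (θ = 0.5 corresponds to unlinked loci), and both function names are hypothetical.

```python
import math

def ld_after(d0, theta, t):
    """D after t generations: D_t = D_0 * (1 - theta)**t."""
    return d0 * (1.0 - theta) ** t

def generations_to_fraction(theta, fraction):
    """Generations needed for D to fall to `fraction` of D_0 (0 < theta < 1)."""
    return math.log(fraction) / math.log(1.0 - theta)

d0 = 0.14
for theta in (0.5, 0.01, 0.0001):
    half_life = generations_to_fraction(theta, 0.5)
    print(theta, ld_after(d0, theta, 100), round(half_life, 1))
```

For unlinked loci D halves every generation, while for tightly linked markers the same decay takes thousands of generations, consistent with strong LD being seen mostly at short distances.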
Slide 21
$D_t = D_0 (1-\theta)^t$
Slide 22
r2 LD Parameter (Hill & Robertson, 1968)
$r^2 = \frac{D^2}{p(a)\,p(b)\,p(A)\,p(B)}$
• Squared correlation coefficient varies 0 - 1
• Frequency dependent
• Better LD measure for allele correlation
between markers - predictive power of SNP1
alleles for those at SNP2
• Used extensively in disease gene or
phenotype mapping through association
testing
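A minimal sketch of the r2 calculation from the four haplotype frequencies, applied to the Slide 12 table (read as p(ab) = 0.16, p(aB) = 0.04, p(Ab) = 0.14, p(AB) = 0.66). It assumes both SNPs are polymorphic, otherwise the denominator is zero; `r_squared` is an illustrative name.

```python
def r_squared(hap):
    """r^2 = D^2 / (p(a) p(A) p(b) p(B)) from haplotype frequencies.

    Assumes both SNPs are polymorphic (no allele frequency of 0 or 1).
    """
    p_a = hap["ab"] + hap["aB"]
    p_b = hap["ab"] + hap["Ab"]
    d = hap["ab"] - p_a * p_b
    return d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))

slide12 = {"ab": 0.16, "aB": 0.04, "Ab": 0.14, "AB": 0.66}
print(round(r_squared(slide12), 3))  # about 0.298
```

Here D = 0.1 is well above zero, yet the alleles at one SNP predict those at the other only weakly, which is the sense in which r2 measures correlation.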
Slide 23
r2 Interpretation
Example 1:
       b       B
a      0.06    0.14    p(a)=0.20
A      0.24    0.56    p(A)=0.80
       p(b)=0.30   p(B)=0.70
D = 0 ; Dmax undefined
r2 = 0

Example 2:
       b       B
a      0.2     0       p(a)=0.20
A      0.1     0.7     p(A)=0.80
       p(b)=0.30   p(B)=0.70
D = Dmax = 0.14 ; D’ = +1
r2 = 0.14/0.24 = 0.58

With p(a) = 0.2 and p(b) = 0.3:
r2 ≠ 1. Correlation is not perfect, even though D’ = 1
r2 = 1 if D’ = 1 and p(a) = p(b) = 0.3
Slide 24
r2 Interpretation
p(a) = 0.3, p(b) = 0.3
Only 2 haplotypes:
r2 = 1 Correlation is perfect
D’ = 1 (fewer than 4 haplotypes)
p(a) = p(b) (= 0.3 in this example)
• r2 = 1 when there is perfect correlation between markers and one genotype predicts the other exactly
– Only 2 haplotypes present
• D’ = 1 does not imply r2 = 1
• r2 = 1 requires no recombination AND markers with identical allele frequency
– SNPs are of similar age
• Corollary
– Low r2 values do not necessarily mean high recombination
– They can also reflect discrepant allele frequencies
Slide 25
Common Measures of Linkage Disequilibrium
D’: ranges from -1 to 1; reflects recombination
r2: ranges from 0 to 1; reflects correlation
Other LD measures exist, but are less commonly used
Slide 26
Visualizing LD metrics
Slide 27
[Figure: pairwise LD plot for SNP1 to SNP6, shaded by | D’ | on a scale from 0 to 1.0]
Not usually worried about sign of D’
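The numbers behind a plot like this are a matrix of pairwise | D’ | values. The sketch below computes that matrix from phased haplotypes coded as strings of '0' (major allele) and '1' (minor allele). The input format, the toy data and the function name are assumptions for illustration; real tools start from genotype data and handle phasing and missing calls, which this sketch does not.

```python
def pairwise_abs_d_prime(haplotypes):
    """Matrix of |D'| for every SNP pair; diagonal entries are left as None.

    haplotypes: list of equal-length strings of '0'/'1', one per chromosome.
    """
    n = len(haplotypes)
    m = len(haplotypes[0])
    # Frequency of allele '1' at each SNP
    freq = [sum(h[i] == "1" for h in haplotypes) / n for i in range(m)]
    matrix = [[None] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            p_ab = sum(h[i] == "1" and h[j] == "1" for h in haplotypes) / n
            p_a, p_b = freq[i], freq[j]
            d = p_ab - p_a * p_b
            if abs(d) < 1e-12:
                value = 0.0  # also covers monomorphic SNPs
            elif d > 0:
                value = d / min(p_a * (1 - p_b), (1 - p_a) * p_b)
            else:
                value = -d / min(p_a * p_b, (1 - p_a) * (1 - p_b))
            matrix[i][j] = matrix[j][i] = value
    return matrix

# Toy data: SNP1 and SNP2 always travel together, SNP3 is independent of both.
toy = ["000", "001", "110", "111"]
ld_matrix = pairwise_abs_d_prime(toy)
for row in ld_matrix:
    print(row)
```

Taking the absolute value matches the note on the slide that the sign of D’ is usually not of interest when plotting.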
Slide 29
Haploview: TCN2 (r2)
Slide 30
http://www.hapmap.org
Launched October 2002
Slide 31
International HapMap
Project
• Initiated Oct 2002
• Collaboration of scientists worldwide
• Goal: describe common patterns of human
DNA sequence variation
• Identify LD and haplotype distributions
• Populations of different ancestry (European,
African, Asian)
– Identify common haplotypes and population-specific
differences
• Has had major impact on:
– Understanding of human population history as reflected in
genetic diversity and similarity
– Design and analysis of genetic association studies
Slide 32
HapMap samples
• 90 Yoruba individuals (30 parent-parent-offspring
trios) from Ibadan, Nigeria (YRI)
• 90 individuals (30 trios) of European descent from
Utah (CEU)
• 45 Han Chinese individuals from Beijing (CHB)
• 44 Japanese individuals from Tokyo (JPT)
Slide 33
Project feasible
because of:
• The availability of the human genome sequence
• Databases of common SNPs (subsequently enriched by
HapMap) from which genotyping assays could be
designed
• Development of inexpensive, accurate technologies for
high-throughput SNP genotyping
• Web-based tools for storing and sharing data
• Frameworks to address associated ethical and cultural
issues
Slide 34
HapMap goals
• Define patterns of genetic variation across human
genome
• Guide selection of SNPs efficiently to “tag” common
variants
• Public release of all data (assays, genotypes)
• Phase I: 1.3 M markers in 269 people
1 SNP/5kb (1.3M markers)
Minor allele frequency (MAF) >5%
• Phase II: +2.8 M markers in 270 people
Slide 35
http://www.hapmap.org/
Slide 38
HapMap publications
• The International HapMap Consortium. A Haplotype Map of the Human Genome. Nature 437, 1299-1320. 2005.
• The International HapMap Consortium. The International HapMap Project. Nature 426, 789-796. 2003.
• The International HapMap Consortium. Integrating Ethics and Science in the International HapMap Project. Nature Reviews Genetics 5, 467-475. 2004.
• Thorisson, G.A., Smith, A.V., Krishnan, L., and Stein, L.D. The International HapMap Project Web site. Genome Research 15:1591-1593. 2005.
Slide 39
ENCODE project
• Aim: To compare the genome-wide resource
to a more complete database of common
variation—one in which all common SNPs
and many rarer ones have been discovered
and tested
• Selected a representative collection of ten
regions, each 500 kb in length
• Each 500-kb region was sequenced in 48
individuals, and all SNPs in these regions
(discovered or in dbSNP) were genotyped in
the complete set of 269 DNA samples
Slide 40
Comparison of linkage
disequilibrium and
recombination for two ENCODE
regions
Nature 437, 1299-1320. 2005
Slide 41
LD in Human Populations
Slide 42
Haplotype Blocks
N SNPs = $2^N$ haplotypes possible, i.e. very large diversity possible
But: we do not see the full extent of haplotype diversity in human populations
Extensive LD, especially at short distances, e.g. ~20 kb
Haplotypes are broken into blocks of markers with high mutual LD, separated by recombination hotspots
Non-uniform LD across genome
Slide 43
Haplotype Blocks
Haplotype blocks: at least 80% of observed haplotypes
with frequency >= 5% could be grouped into common
patterns
Whole Genome Patterns of Common DNA Variation
in Three Human Populations, Science 2005, Hinds et al.
Slide 44
Length of LD spans
r2
Slide 45
Example: Large block of LD on chromosome 17
Cluster of common (frequent) SNPs in high LD
518 SNPs, spanning 800 kb
25% in EUR, 9% in AFR, missing in CHN
Genes:
Microtubule-associated protein tau
Mutations associated with a variety of
neurodegenerative disorders
Gene coding for a protease similar to
presenilins
Mutations result in Alzheimer’s disease
Gene for corticotropin-releasing hormone
receptor
• Immune, endocrine, autonomic, behavioral response to
stress
Slide 46
Chromosome 17 LD Region
Prevalent inversion in EUR
human population
~25%