Association analysis
Shaun Purcell
Boulder Twin Workshop 2004
Overview
• Candidate gene association
• Haplotypes and linkage disequilibrium
• Linkage and association
• Family-based association
What is association?
• Categorical traits
– disease susceptibility genes
• Continuous traits
– quantitative trait loci, QTL
Disease traits
Is there a difference in allele/genotype frequency between cases and controls?
      Case   Control
AA    n1     n2
Aa    n3     n4
aa    n5     n6
Disease traits
Is there a difference in allele/genotype frequency between cases and controls?
Genotypes: AA, Aa, aa
Columns: Case, Control
Counts: 30, 50, 20; 25, 50, 25
Genotype frequencies: p^2, 2p(1-p), (1-p)^2
Test for independence: χ², p-value
Disease traits
General model (2 df)
      Case   Control
AA    n1     n2
Aa    n3     n4
aa    n5     n6
Additive model (1 df)
      Case      Control
A     2n1+n3    2n2+n4
a     2n5+n3    2n6+n4
Dominant model for A (1 df)
      Case     Control
A*    n1+n3    n2+n4
aa    n5       n6
Effect sizes calculated as odds ratios
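The three tables above can each be tested with a Pearson chi-square statistic. The following is a minimal sketch in Python, not the method used in the slides: the counts are made up for illustration, and the helper names (`chi2_stat`, `chi2_pvalue`, `odds_ratio`) are hypothetical.

```python
import math

def chi2_stat(table):
    """Pearson chi-square statistic for a table of counts (rows x [case, control])."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    total = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_tot[i] * col_tot[j] / total
            stat += (obs - expected) ** 2 / expected
    return stat

def chi2_pvalue(stat, df):
    """Upper-tail p-value; closed forms for the 1 df and 2 df cases used here."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2))
    if df == 2:
        return math.exp(-stat / 2)
    raise ValueError("only df = 1 or 2 handled in this sketch")

def odds_ratio(table):
    """Odds ratio ad / bc for a 2 x 2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)

# Illustrative counts (assumed, not from the slides)
n1, n3, n5 = 30, 50, 20   # cases:    AA, Aa, aa
n2, n4, n6 = 20, 50, 30   # controls: AA, Aa, aa

general = [[n1, n2], [n3, n4], [n5, n6]]                          # 2 df
additive = [[2 * n1 + n3, 2 * n2 + n4], [2 * n5 + n3, 2 * n6 + n4]]  # 1 df, allele counts
dominant = [[n1 + n3, n2 + n4], [n5, n6]]                         # 1 df, A* = AA or Aa

for name, table, df in [("general", general, 2),
                        ("additive", additive, 1),
                        ("dominant", dominant, 1)]:
    stat = chi2_stat(table)
    print(name, round(stat, 3), chi2_pvalue(stat, df))
print("allelic odds ratio:", odds_ratio(additive))
```

The additive table counts alleles rather than people, so each individual contributes two observations.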
Quantitative traits
[Figure: trait values (axis from -2 to 4) for genotypes aa, Aa and AA]
Y = aA + dD + e
ID     Y      G     A     D
001    0.34   aa    -1    0
002    1.23   Aa    0     1
003    1.66   Aa    0     1
004    2.74   AA    1     0
005    1.33   AA    1     0
…      …      …     …     …
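The coding of A and D and the estimation of a and d can be sketched as follows. This is an illustration only: it adds an intercept m (not shown in the slide equation), uses just the five listed rows, and relies on the fact that with all three genotype classes present the least-squares estimates reduce to contrasts of the genotype means. The function names are hypothetical.

```python
def code_genotype(g):
    """Return (A, D): A = -1/0/1 for aa/Aa/AA, D = 1 for heterozygotes only."""
    additive = {"aa": -1, "Aa": 0, "AA": 1}[g]
    dominance = 1 if g == "Aa" else 0
    return additive, dominance

def fit_additive_dominance(trait, genotypes):
    """Estimate m, a, d in Y = m + a*A + d*D + e from genotype group means."""
    groups = {"aa": [], "Aa": [], "AA": []}
    for y, g in zip(trait, genotypes):
        groups[g].append(y)
    mean = {g: sum(v) / len(v) for g, v in groups.items()}
    m = (mean["AA"] + mean["aa"]) / 2   # midpoint between the homozygotes
    a = (mean["AA"] - mean["aa"]) / 2   # additive effect
    d = mean["Aa"] - m                  # dominance deviation
    return m, a, d

y = [0.34, 1.23, 1.66, 2.74, 1.33]
g = ["aa", "Aa", "Aa", "AA", "AA"]
print([code_genotype(x) for x in g])
print(fit_additive_dominance(y, g))
```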
Some web resources
• BGIM
http://statgen.iop.kcl.ac.uk/bgim/
Introductory tutorials on twin analysis, primer on maximum likelihood, Mx language.
• GxE moderator models
http://statgen.iop.kcl.ac.uk/gxe/
• Power calculation
http://statgen.iop.kcl.ac.uk/gpc/
• Case/control association tools
http://statgen.iop.kcl.ac.uk/gpc/model/
Relative risk
Genotype   P(D|G)     RR
AA         P(D|AA)    P(D|AA)/P(D|aa)
Aa         P(D|Aa)    P(D|Aa)/P(D|aa)
aa         P(D|aa)    1
P(D|AA) / P(D|aa) labelled RR(AA)
P(D|Aa) / P(D|aa) labelled RR(Aa)
Genetic models
Model            RR(Aa)   RR(AA)
General          x        y
Multiplicative   x        x^2
Dominant         x        x
Recessive        1.000    x
No effect        1.000    1.000
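The constraints in the table can be written as a small lookup. This is a sketch with hypothetical function names; x and y are the free relative-risk parameters from the table and `baseline` stands for P(D|aa).

```python
def genotype_relative_risks(model, x=1.0, y=1.0):
    """Return (RR(Aa), RR(AA)), both relative to aa, under the named model."""
    if model == "general":
        return x, y
    if model == "multiplicative":
        return x, x ** 2
    if model == "dominant":
        return x, x
    if model == "recessive":
        return 1.0, x
    if model == "none":
        return 1.0, 1.0
    raise ValueError(f"unknown model: {model}")

def penetrances(baseline, rr_het, rr_hom):
    """P(D|G) for aa, Aa and AA given the baseline risk P(D|aa)."""
    return {"aa": baseline, "Aa": baseline * rr_het, "AA": baseline * rr_hom}

rr_het, rr_hom = genotype_relative_risks("multiplicative", x=2.0)
print(penetrances(0.05, rr_het, rr_hom))
```

The general model has two free parameters and each constrained model has one, which is where the degrees of freedom of the model comparisons come from.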
Tests
Test                                          Alternate        Null
Any effect?                                   General          No effect
Any effect assuming a multiplicative gene?    Multiplicative   No effect
Any effect assuming a dominant gene?          Dominant         No effect
Any effect assuming a recessive gene?         Recessive        No effect
Can we assume a multiplicative effect?        General          Multiplicative
Can we assume a dominant effect?              General          Dominant
Can we assume a recessive effect?             General          Recessive
Multiple samples
• Constrain frequencies across samples
• Constrain effects across samples
– Can test genetic models with effects and/or frequencies constrained to be equal
– Can perform tests of homogeneity of effects and/or frequencies across samples
An example
2 case/control samples
• Population frequency 5%
First sample
      Case   Control
AA    17     11
Aa    35     59
aa    24     40
Second sample
      Case   Control
AA    37     10
Aa    67     43
aa    37     20
Homogeneous effects across samples
Homogeneous allele frequencies across samples
Model   p                RR(Aa)           RR(AA)           -2LL
-----   -----            ------           ------           ----
Gen     0.367   0.367    1.979   1.979    3.663   3.663    793.143
Mult    0.367   0.367    1.911   1.911    3.651   3.651    793.199
Dom     0.401   0.401    1.990   1.990    1.990   1.990    802.927
Rec     0.405   0.405    1.000   1.000    1.921   1.921    805.064
None    0.442   0.442    1.000   1.000    1.000   1.000    815.628
Heterogeneous effects across samples
Homogeneous allele frequencies across samples
Model   p                RR(Aa)           RR(AA)           -2LL
-----   -----            ------           ------           ----
Gen     0.367   0.367    1.235   2.890    2.136   5.547    786.498
Mult    0.367   0.367    1.440   2.282    2.073   5.208    788.262
Dom     0.401   0.401    1.216   2.936    1.216   2.936    796.422
Rec     0.405   0.405    1.000   1.000    1.519   2.195    803.849
None    0.443   0.443    1.000   1.000    1.000   1.000    815.628
TESTS OF GENETIC MODELS -- ASSUMING EQ EFFECTS & EQ FREQS
=========================================================
Gen  vs None (2 df) : 22.485   p = 0.000
Mult vs None (1 df) : 22.429   p = 0.000
Dom  vs None (1 df) : 12.701   p = 0.000
Rec  vs None (1 df) : 10.564   p = 0.001
Gen  vs Mult (1 df) :  0.056   p = 0.813
Gen  vs Dom  (1 df) :  9.784   p = 0.002
Gen  vs Rec  (1 df) : 11.921   p = 0.001
TESTS OF GENETIC MODELS -- ASSUMING UNEQ EFFECTS & EQ FREQS
===========================================================
Gen  vs None (4 df) : 29.130   p = 0.000
Mult vs None (2 df) : 27.366   p = 0.000
Dom  vs None (2 df) : 19.205   p = 0.000
Rec  vs None (2 df) : 11.779   p = 0.003
Gen  vs Mult (2 df) :  1.764   p = 0.414
Gen  vs Dom  (2 df) :  9.925   p = 0.007
Gen  vs Rec  (2 df) : 17.351   p = 0.000
TESTS OF EQUAL EFFECTS -- ASSUMING EQ FREQS
===========================================
w/ Gen model  (2 df) : 6.645   p = 0.036
w/ Mult model (1 df) : 4.938   p = 0.026
w/ Dom model  (1 df) : 6.505   p = 0.011
w/ Rec model  (1 df) : 1.215   p = 0.270
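Each test statistic in the output above is a difference in -2LL between a null model and an alternate model, referred to a chi-square distribution. A minimal sketch, with hypothetical helper names and closed-form tail probabilities only for the degrees of freedom that occur here (1, 2 and 4):

```python
import math

def chi2_sf(x, df):
    """Upper-tail chi-square probability for df = 1, 2 or 4."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    if df == 4:
        return math.exp(-x / 2) * (1 + x / 2)
    raise ValueError("df not handled in this sketch")

def lrt(minus2ll_null, minus2ll_alt, df):
    """Likelihood ratio statistic and p-value from two -2LL values."""
    stat = minus2ll_null - minus2ll_alt
    return stat, chi2_sf(stat, df)

# -2LL values from the equal-effects, equal-frequencies fits
minus2ll = {"Gen": 793.143, "Mult": 793.199, "Dom": 802.927,
            "Rec": 805.064, "None": 815.628}

print(lrt(minus2ll["None"], minus2ll["Gen"], 2))   # any effect?
print(lrt(minus2ll["Mult"], minus2ll["Gen"], 1))   # can we assume multiplicative?
```

Running the second comparison reproduces the 0.056 statistic and p = 0.813 listed for Gen vs Mult.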
Indirect association
Genotyped markers
QTL
Ungenotyped markers
Recombination
Homologous chromosomes in one parent
Paternal chromosome
Maternal chromosome
Recombination event during meiosis
Recombinant gamete transmitted, harboring mutation
Recombination
Homologous chromosomes in one parent
Paternal chromosome
Maternal chromosome
No recombination event during meiosis
Nonrecombinant gamete transmitted, not harboring mutation
Linkage: affected sib pairs
Paternal chromosome
Maternal chromosome
First affected offspring, no recombination
Second affected offspring, recombinant gamete
IBD sharing from this one parent (0 or 1)
1   0
Association analysis
• Mutation occurs on a ‘red’ chromosome
Association analysis
• Association due to ‘linkage disequilibrium’
Haplotypes
      M     m
A     AM    Am
a     aM    am
This individual has aa and Mm genotypes and am and aM haplotypes
Haplotypes
      M     m
A     AM    Am
a     aM    am
This individual has Aa and Mm genotypes and AM and am haplotypes
… but given only genotype data, consistent with Am/aM as well as AM/am
Haplotypes
      M     m
A     AM    Am
a     aM    am
This individual has AA and Mm genotypes and AM and Am haplotypes
Equilibrium haplotype frequencies
      M     m
A     pr    ps    p
a     qr    qs    q
      r     s
Linkage disequilibrium
      M         m
A     pr + D    ps - D    p
a     qr - D    qs + D    q
      r         s
DMAX = Min(ps, qr) if D > 0; Min(pr, qs) if D < 0
D’ = D / DMAX
r^2 = D^2 / (pqrs)
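The LD measures can be computed directly from the AM haplotype frequency and the two allele frequencies. A minimal sketch with a hypothetical function name; it reports the absolute value of D’, which is a common convention but an assumption here, and it picks DMAX according to the sign of D so that no haplotype frequency goes negative.

```python
def ld_measures(f_AM, p, r):
    """Return (D, |D'|, r^2) given P(AM), p = P(A) and r = P(M)."""
    q, s = 1 - p, 1 - r
    d = f_AM - p * r                  # deviation from the equilibrium frequency pr
    if d >= 0:
        d_max = min(p * s, q * r)     # keeps Am and aM frequencies non-negative
    else:
        d_max = min(p * r, q * s)     # keeps AM and am frequencies non-negative
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d ** 2 / (p * q * r * s)
    return d, d_prime, r2

print(ld_measures(0.4, 0.5, 0.5))
```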
Haplotype analysis
1. Estimate haplotypes from genotypes
2. Associate haplotypes with trait
Haplotype   Freq.   Odds Ratio
AAGG        40%     1.00*
AAGT        30%     2.21
CGCG        25%     1.07
AGCT        5%      0.92
* baseline, fixed to 1.00
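Step 1 is often done with an EM algorithm; the slides do not say which method was used, so the following is only an illustrative sketch for two biallelic SNPs (A/a and M/m) with a hypothetical function name and made-up genotype counts. Only double heterozygotes are phase-ambiguous, so the E-step splits them between AM/am and Am/aM in proportion to the current haplotype frequencies.

```python
def em_haplotype_freqs(geno_counts, n_iter=50):
    """EM estimate of AM, Am, aM, am frequencies from unphased genotypes.

    geno_counts maps (copies of A, copies of M) -> number of individuals.
    """
    freqs = {"AM": 0.25, "Am": 0.25, "aM": 0.25, "am": 0.25}
    n_haps = 2 * sum(geno_counts.values())
    for _ in range(n_iter):
        counts = dict.fromkeys(freqs, 0.0)
        for (a, m), n in geno_counts.items():
            if a == 1 and m == 1:
                # E-step: weight the two possible phases of a double heterozygote
                cis = freqs["AM"] * freqs["am"]
                trans = freqs["Am"] * freqs["aM"]
                w = cis / (cis + trans) if cis + trans > 0 else 0.5
                for h in ("AM", "am"):
                    counts[h] += n * w
                for h in ("Am", "aM"):
                    counts[h] += n * (1 - w)
            else:
                # Phase is known when at least one locus is homozygous
                alleles_a = "A" * a + "a" * (2 - a)
                alleles_m = "M" * m + "m" * (2 - m)
                for x, y in zip(alleles_a, alleles_m):
                    counts[x + y] += n
        # M-step: new frequencies are expected haplotype counts over the total
        freqs = {h: c / n_haps for h, c in counts.items()}
    return freqs

# Made-up counts: 4 AA/MM, 4 aa/mm and 2 Aa/Mm individuals
print(em_haplotype_freqs({(2, 2): 4, (0, 0): 4, (1, 1): 2}))
```

In this toy data set the unambiguous individuals carry only AM and am, so the double heterozygotes are resolved almost entirely as AM/am.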
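Linkage and association
[Figure, Linkage panel: sib correlation against IBD at the QTL (0, 1, 2)]
[Figure, Association panel: trait against QTL genotype (aa, Aa, AA)]
[Figure, Linkage at a marker: sib correlation against IBD at the marker (0, 1, 2), connected to IBD at the QTL by RF]
[Figure, Association at a marker: trait against marker genotype (aa, Aa, AA), connected to QTL genotype by LD]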
Variance Components
• Means (ASSOCIATION)
M1    M2
• Variance-covariance matrix (LINKAGE)
V1     C12
C21    V2
Variance Components
• Means (ASSOCIATION)
M1 + bG1    M2 + bG2
b = regression coef.
G = individual’s genotype
• Variance-covariance matrix (LINKAGE)
V1                C12 + q(π - ½)
C21 + q(π - ½)    V2
q = regression coef.
π = IBD sharing: 0, ½, 1
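The slide separates the two signals: association enters the expected means and linkage enters the sib covariance. A minimal sketch of the expected moments for one sib pair, with a hypothetical function name; it assumes the symbol multiplying q is the proportion of alleles shared IBD (written `pi_hat` below), centred at ½.

```python
def sib_pair_moments(m1, m2, b, g1, g2, v1, v2, c, q, pi_hat):
    """Expected means and covariance matrix for a sib pair.

    b * g shifts each mean by the sib's genotype (association);
    q * (pi_hat - 0.5) shifts the covariance by IBD sharing (linkage).
    """
    means = [m1 + b * g1, m2 + b * g2]
    cov = c + q * (pi_hat - 0.5)
    sigma = [[v1, cov], [cov, v2]]
    return means, sigma

# Made-up values: sibs with genotypes coded 1 and -1 who share both alleles IBD
print(sib_pair_moments(0.0, 0.0, 0.5, 1, -1, 1.0, 1.0, 0.3, 0.2, 1.0))
```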
Components of a Genetic Theory
• POPULATION MODEL
– Allele & genotype frequencies
– Demographics & population history
– Linkage disequilibrium, haplotype structure
• TRANSMISSION MODEL
– Mendelian segregation
– Identity by descent & genetic relatedness
• PHENOTYPE MODEL
– Biometrical model of quantitative traits
– Additive & dominance components
[Figure: genotypes (G) and phenotypes (P) over Time]
Linkage without association
[Figure: two pedigrees with marker genotypes 3/5, 3/6, 2/6, 5/6, 3/5, 3/2, 2/6, 5/2]
Both families are ‘linked’ with the marker…
…but a different allele is involved.
Linkage and association
[Figure: pedigrees with marker genotypes 3/5, 3/6, 2/6, 5/6, 3/6, 3/2, 2/4, 6/2, 4/6, 6/6, 2/6, 6/6]
All families are ‘linked’ with the marker…
… and allele 6 is ‘associated’ with disease
Linkage is just association within families
Association without linkage
[Figure: Controls and Cases with marker genotypes 6/6, 6/2, 3/5, 3/4, 3/6, 2/4, 3/2, 5/6, 3/6, 4/6, 2/2, 2/6, 5/2, 2/5]
Allele 6 is more common in the GREEN population
The disease is more common in the GREEN population
… a ‘spurious association’
TDT
• Transmission disequilibrium test
– test for linkage and association
[Figure: pedigrees with genotypes AA, Aa, Aa, AA, AA, AA, aa, AA, Aa, Aa, Aa, Aa]
TDT: “A” disease allele
Parents      AA x Aa   AA x Aa   aa x Aa   aa x Aa
Offspring    AA        Aa        Aa        aa
Additive     +         -         +         -
Dominant     0.5       0.5       +         -
Recessive    +         -         0.5       0.5
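The slides do not give the test statistic. In its standard form the TDT counts how often heterozygous parents transmit A (b) versus a (c) to affected offspring and uses (b - c)^2 / (b + c), which is chi-square with 1 df under the null. A minimal sketch with a hypothetical function name and made-up trios; it assumes genotypes are coded as copies of A and does not check for Mendelian errors.

```python
def tdt(trios):
    """Transmission counts and TDT statistic from parent-offspring trios.

    Each trio is (father, mother, child), coded as copies of A (0, 1 or 2).
    """
    b = 0  # A transmitted by heterozygous parents
    c = 0  # a transmitted by heterozygous parents
    for father, mother, child in trios:
        parents = (father, mother)
        n_het = sum(1 for g in parents if g == 1)
        if n_het == 0:
            continue  # no heterozygous parent: uninformative
        from_hom = sum(1 for g in parents if g == 2)  # an AA parent always transmits A
        a_from_het = child - from_hom
        b += a_from_het
        c += n_het - a_from_het
    chi2 = (b - c) ** 2 / (b + c) if b + c > 0 else 0.0
    return b, c, chi2

# Made-up trios matching the four mating/offspring columns in the table
trios = ([(2, 1, 2)] * 30 + [(2, 1, 1)] * 10 +
         [(0, 1, 1)] * 25 + [(0, 1, 0)] * 15)
print(tdt(trios))
```

This corresponds to the Additive row of the table, where every transmission from a heterozygous parent counts as + or -.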
Between and within components
Sib1
Sib2
Sib1 = B + W
Sib2 = B - W
Between and within components
• Fulker et al (1999)
Genotype     Coding
S1    S2     S1    S2     B      W      S1     S2
AA    AA     1     1      1      0      B+W    B-W
AA    Aa     1     0      0.5    0.5    B+W    B-W
AA    aa     1     -1     0      1      B+W    B-W
Note: W = S1 – B
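The decomposition in the table amounts to taking the sib-pair mean as B and each sib's deviation from it as W. A minimal sketch with a hypothetical function name, using genotypes coded -1, 0, 1 for aa, Aa, AA:

```python
def between_within(g1, g2):
    """Return (B, W1, W2): the pair mean and each sib's deviation from it."""
    b = (g1 + g2) / 2
    return b, g1 - b, g2 - b

print(between_within(1, 0))    # AA, Aa
print(between_within(1, -1))   # AA, aa
```

W2 is always -W1 for a pair, so only B and one W carry information.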
Parental genotypes
• Use parental genotypes to generate B
• Examples
– AA from AAxAA: W = 0
– Aa from AAxAa: W = -0.5
– Aa from AaxAa: W = 0
Pat    Mat    B
1      1      1
1      0      0.5
1      -1     0
0      1      0.5
0      0      0
0      -1     -0.5
-1     1      0
-1     0      -0.5
-1     -1     -1
assoc.mx
• Sibling pair sample
• B and W components precalculated in input file
• Single SNP genotype
• Quantitative trait
assoc.dat
s1        s2        g1    g2    b       w1      w2
-0.007    -0.972    -1    0     -0.5    -0.5    0.5
-0.829    -0.196    1     1     1       0       0
0.369     0.645     1     1     1       0       0
0.318     1.55      0     1     0.5     -0.5    0.5
1.52      0.910     0     0     0       0       0
-0.948    -1.55     1     1     1       0       0
0.596     -0.394    1     0     0.5     0.5     -0.5
-1.91     -0.905    0     1     0.5     -0.5    0.5
0.499     0.940     1     0     0.5     0.5     -0.5
-1.17     -1.29     1     0     0.5     0.5     -0.5
-0.16     -1.81     1     1     1       0       0
! Mx script for QTL association: sib pairs, univariate
Group 1 :
Calc NG=2
Begin Matrices;
! ** Parameters
B Full 1 1 free    ! association : between component
W Full 1 1 free    ! association : within component
M Full 1 1 free    ! mean
S Full 1 1 free    ! Shared residual variance
N Full 1 1 free    ! Nonshared residual variance
! ** Definition variables **
C Full 1 1         ! association : between
X Full 1 1         ! association : within, sib 1
Y Full 1 1         ! association : within, sib 2
End Matrices;
! ** Uncomment for B=W model
! Equate W 1 1 1 B 1 1 1
! Starting values
Matrix B 0
Matrix W 0
Matrix M 0
Matrix S 0.5
Matrix N 0.5
End

Group2 : Data Group
Data NI=7 NO=0
RE file=assoc.dat
Labels Sib1 Sib2 g1 g2 b w1 w2
Select Sib1 Sib2 b w1 w2 /
Definition b w1 w2 /
Matrices = Group 1
Means M + B*C + W*X | M + B*C + W*Y /
Covariance
S + N | S _
S | S + N /
Specify C b /
Specify X w1 /
Specify Y w2 /
End
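The likelihood that the script asks Mx to maximise can be sketched in Python for fixed parameter values. This is only an illustration of the model (means M + B*b + W*w for each sib, covariance matrix with S + N on the diagonal and S off it), not a reimplementation of Mx; the function name is hypothetical and the constant term is included as for a standard bivariate normal.

```python
import math

def minus2ll(pairs, M, B, W, S, N):
    """-2 log-likelihood of sib pairs under the between/within model.

    pairs: iterable of (sib1, sib2, b, w1, w2), as in the assoc.dat columns
    Sib1 Sib2 b w1 w2.
    """
    var = S + N                   # trait variance for each sib
    det = var * var - S * S       # determinant of the 2 x 2 covariance matrix
    total = 0.0
    for y1, y2, b, w1, w2 in pairs:
        r1 = y1 - (M + B * b + W * w1)   # residual, sib 1
        r2 = y2 - (M + B * b + W * w2)   # residual, sib 2
        quad = (var * (r1 * r1 + r2 * r2) - 2 * S * r1 * r2) / det
        total += 2 * math.log(2 * math.pi) + math.log(det) + quad
    return total

# First three rows of assoc.dat, evaluated at the script's starting values
rows = [(-0.007, -0.972, -0.5, -0.5, 0.5),
        (-0.829, -0.196, 1, 0, 0),
        (0.369, 0.645, 1, 0, 0)]
print(minus2ll(rows, M=0.0, B=0.0, W=0.0, S=0.5, N=0.5))
```

Fixing W at 0, or forcing B and W to be equal, gives the constrained models whose -2LL values are then compared.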
Models
B&W
B Full 1 1 free
W Full 1 1 free
!Equate W 1 1 1 B 1 1 1
B=W
B Full 1 1 free
W Full 1 1 free
Equate W 1 1 1 B 1 1 1
B
B Full 1 1 free
W Full 1 1
!Equate W 1 1 1 B 1 1 1
B=W=0
B Full 1 1
W Full 1 1
!Equate W 1 1 1 B 1 1 1
Tests
Test                         HA     H0
Standard association test    B=W    B=W=0
Test of stratification       B&W    B=W
Robust association test      B&W    B
assoc.mx
Model    B          W         -2LL       df
B&W      -0.478     -0.365    2103.96    795
B=W      -0.420     -0.420    2105.05    796
B        -0.4778              2127.01    796
B=W=0                         2163.34    797
Test of total association
HA: B=W (-2LL = 2105.05)
H0: B=W=0 (-2LL = 2163.34)
Δ-2LL = 58.29, df = 1, p < 1e-13
assoc.mx
Test of stratification
HA: B&W (-2LL = 2103.96)
H0: B=W (-2LL = 2105.05)
Δ-2LL = 1.09, df = 1, p = 0.29
assoc.mx
Test of within association
HA: B&W (-2LL = 2103.96)
H0: B (-2LL = 2127.01)
Δ-2LL = 23.06, df = 1, p < 1e-5
Implementation
• QTDT
– Abecasis et al (2001) AJHG
– extends between/within model to general pedigrees
– multiple alleles
– covariates
– combined test of linkage and association
– discrete as well as quantitative traits
Linkage
• families
• detectable over large distances (>10 cM)
• large effects (OR > 3, variance > 10%)
Association
• unrelateds or families
• detectable over small distances (<1 cM)
• small effects (OR < 2, variance < 1%)