Linkage Analysis:
An Introduction
Pak Sham
Twin Workshop 2001
Linkage Mapping
Compares inheritance pattern of trait with the
inheritance pattern of chromosomal regions
First gene-mapping in 1913 (Sturtevant)
Uses naturally occurring DNA variation
(polymorphisms) as genetic markers
>400 Mendelian (single gene) disorders mapped
Current challenge is to map QTLs (quantitative trait loci)
Linkage = Co-segregation
[Figure: pedigree segregating a dominant disease, with each member typed for a marker with alleles A1 to A4]
Marker allele A1 cosegregates with dominant disease
Recombination
Parental genotypes (haplotypes): A1 Q1 / A2 Q2
Likely gametes (non-recombinants): A1 Q1, A2 Q2
Unlikely gametes (recombinants): A1 Q2, A2 Q1
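The gamete frequencies behind this picture can be sketched in Python, assuming a parent with haplotypes A1 Q1 and A2 Q2 and a recombination fraction θ between the marker (A) and trait (Q) loci; the function name `gamete_frequencies` is illustrative, not from the slides.

```python
def gamete_frequencies(theta):
    """Expected gamete frequencies from a parent with haplotypes A1-Q1 / A2-Q2.

    theta is the recombination fraction between the marker (A) and trait (Q) loci.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    return {
        ("A1", "Q1"): (1 - theta) / 2,  # non-recombinant
        ("A2", "Q2"): (1 - theta) / 2,  # non-recombinant
        ("A1", "Q2"): theta / 2,        # recombinant
        ("A2", "Q1"): theta / 2,        # recombinant
    }


for gamete, p in gamete_frequencies(0.1).items():
    print(gamete, round(p, 3))
```

At θ = 0.5 all four gametes are equally frequent, which is the unlinked case.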
Recombination of three linked loci
1
2
(1-1)(1-2)
(1-1)2
1(1-2)
12
Map distance
Map distance between two loci (Morgans)
= Expected number of crossovers between them per meiosis (counted on the transmitted gamete)
Note: map distances are additive; recombination fractions are not
Recombination & map distance
[Figure: recombination fraction (0 to 0.5) plotted against map distance m in Morgans (0 to 1)]
Haldane map function: $\theta = \frac{1}{2}\left(1 - e^{-2m}\right)$
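A sketch of the Haldane map function and its inverse, m = -½ ln(1 - 2θ), assuming no interference; the function names are illustrative. Adding map distances and converting back reproduces the rule θ13 = θ1 + θ2 - 2θ1θ2, and θ approaches 0.5 as m grows.

```python
import math


def haldane_theta(m):
    """Recombination fraction for map distance m (Morgans), Haldane map function."""
    return 0.5 * (1 - math.exp(-2 * m))


def haldane_distance(theta):
    """Inverse Haldane map function: map distance (Morgans) for 0 <= theta < 0.5."""
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must lie in [0, 0.5)")
    return -0.5 * math.log(1 - 2 * theta)


m1, m2 = 0.1, 0.25
t1, t2 = haldane_theta(m1), haldane_theta(m2)
# Map distances add; the corresponding recombination fractions do not.
t12 = haldane_theta(m1 + m2)
print(round(t12, 6), round(t1 + t2 - 2 * t1 * t2, 6))
print(round(haldane_distance(t12), 6))  # recovers m1 + m2
for m in (0.5, 1.0, 2.0, 5.0):
    print(m, round(haldane_theta(m), 4))  # approaches 0.5 for large m
```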
Methods of Linkage Analysis
Model-based lod scores
Assumes explicit trait model
Model-free allele sharing methods
Affected sib pairs
Affected pedigree members
Quantitative trait loci
Variance-components models
Double Backcross: Fully Informative Gametes
Cross: aabb × AABB → AaBb
Backcross: AaBb × aabb
Non-recombinant offspring: AaBb, aabb
Recombinant offspring: Aabb, aaBb
Linkage Analysis: Fully Informative Gametes
Count data: recombinant gametes R, non-recombinant gametes N
Parameter: recombination fraction $\theta$
Likelihood: $L(\theta) = \theta^{R}(1-\theta)^{N}$
Parameter estimate: $\hat\theta = R/(N+R)$
Chi-square: $\chi^2 = 2\left[R\log\hat\theta + N\log(1-\hat\theta) - (R+N)\log(0.5)\right]$
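The double backcross and the count-data formulas can be combined in a short sketch: classify offspring of AaBb (phase AB/ab) × aabb, then compute θ̂ = R/(N + R) and the likelihood-ratio chi-square with natural logs. The offspring counts are made up for illustration, and `classify_backcross` is a hypothetical helper.

```python
import math


def classify_backcross(offspring):
    """Count recombinants among offspring of AaBb (phase AB/ab) x aabb.

    The aabb parent always transmits ab, so each offspring genotype reveals
    the gamete received from the doubly heterozygous parent.
    """
    recombinant_types = {"Aabb", "aaBb"}
    non_recombinant_types = {"AaBb", "aabb"}
    R = sum(1 for g in offspring if g in recombinant_types)
    N = sum(1 for g in offspring if g in non_recombinant_types)
    if R + N != len(offspring):
        raise ValueError("unexpected offspring genotype")
    return R, N


def theta_hat(R, N):
    return R / (R + N)


def chi_square(R, N):
    """Likelihood-ratio chi-square for theta = theta_hat against theta = 0.5."""
    t = theta_hat(R, N)
    # Treat 0 * log(0) as 0 when R or N is zero.
    loglik = (R * math.log(t) if R else 0.0) + (N * math.log(1 - t) if N else 0.0)
    return 2 * (loglik - (R + N) * math.log(0.5))


offspring = ["AaBb"] * 9 + ["aabb"] * 9 + ["Aabb", "aaBb"]
R, N = classify_backcross(offspring)
print(R, N, theta_hat(R, N), round(chi_square(R, N), 3))
```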
Phase Unknown Meioses
Backcross AaBb × aabb, phase of the AaBb parent unknown
Either: AaBb, aabb non-recombinant; Aabb, aaBb recombinant
Or: AaBb, aabb recombinant; Aabb, aaBb non-recombinant
Linkage Analysis: Phase-unknown Meioses
Count data: either X recombinant and Y non-recombinant gametes, or Y recombinant and X non-recombinant gametes, each phase having prior probability 1/2
Likelihood: $L(\theta) = \frac{1}{2}\left[\theta^{X}(1-\theta)^{Y} + \theta^{Y}(1-\theta)^{X}\right]$
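A sketch comparing phase-known and phase-unknown likelihoods by grid search over θ, using the ½-weighted mixture over the two phases. The counts (2 versus 18) are invented; in this example not knowing the phase costs about log10 2 ≈ 0.3 lod units, because the second phase contributes almost nothing near θ̂.

```python
import math


def loglik_phase_known(theta, R, N):
    return R * math.log(theta) + N * math.log(1 - theta)


def loglik_phase_unknown(theta, X, Y):
    """ln of 0.5 * [theta^X (1-theta)^Y + theta^Y (1-theta)^X]."""
    a = X * math.log(theta) + Y * math.log(1 - theta)
    b = Y * math.log(theta) + X * math.log(1 - theta)
    top = max(a, b)  # log-sum-exp for numerical stability
    return math.log(0.5) + top + math.log(math.exp(a - top) + math.exp(b - top))


def lod(loglik, *counts, grid=None):
    """Best theta on a grid in (0, 0.5] and its lod against theta = 0.5."""
    grid = grid or [i / 1000 for i in range(1, 501)]
    best = max(grid, key=lambda t: loglik(t, *counts))
    return best, (loglik(best, *counts) - loglik(0.5, *counts)) / math.log(10)


print(lod(loglik_phase_known, 2, 18))    # phase known: 2 recombinants, 18 not
print(lod(loglik_phase_unknown, 2, 18))  # same split, phase unknown
```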
An example of incomplete data: mixture distribution likelihood function
Parental genotypes unknown; offspring genotypes AaBb, aabb, Aabb, aaBb
Likelihood will be a function of allele frequencies (population parameters) and $\theta$ (transmission parameter)
Trait phenotypes
Penetrance parameters
Genotype:   AA       Aa       aa
Disease:    f2       f1       f0
Normal:     1 - f2   1 - f1   1 - f0
Each phenotype is compatible with multiple genotypes.
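To see how one phenotype mixes over several genotypes, the sketch below computes P(disease) and P(genotype | affected) under a hypothetical dominant model (f2 = f1 = 0.8, f0 = 0.05) with disease allele frequency 0.01, assuming Hardy-Weinberg equilibrium. The function names are illustrative.

```python
def genotype_frequencies(p):
    """Hardy-Weinberg genotype frequencies (assumption); p = frequency of allele A."""
    q = 1 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


def posterior_genotypes(p, penetrance, affected=True):
    """P(phenotype) and P(genotype | phenotype); penetrance = P(disease | genotype)."""
    freqs = genotype_frequencies(p)
    joint = {
        g: freqs[g] * (penetrance[g] if affected else 1 - penetrance[g])
        for g in freqs
    }
    total = sum(joint.values())  # P(phenotype): a sum over compatible genotypes
    return total, {g: v / total for g, v in joint.items()}


# Hypothetical dominant model: f2 = f1 = 0.8, f0 = 0.05
pen = {"AA": 0.8, "Aa": 0.8, "aa": 0.05}
prevalence, post = posterior_genotypes(0.01, pen, affected=True)
print(round(prevalence, 6), {g: round(v, 3) for g, v in post.items()})
```

With these numbers most affected individuals are aa phenocopies, which is why the likelihood has to sum over genotypes rather than read them off the phenotype.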
General Pedigree Likelihood
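Likelihood is a sum of products (mixture distribution likelihood):
$L = \sum_{G} \prod_{i=1}^{n} \mathrm{pen}(x_i \mid g_i) \prod_{i=1}^{f} \mathrm{pop}(g_i) \prod_{i=f+1}^{n} \mathrm{trans}(g_i \mid g_{i_f}, g_{i_m})$
where the sum runs over all joint genotype configurations G, individuals 1 to f are founders, individuals f+1 to n are non-founders, and $g_{i_f}, g_{i_m}$ are the genotypes of the father and mother of individual i
Number of terms $= (m_1 \times m_2 \times \cdots \times m_k)^{2n}$
where $m_j$ is the number of alleles at locus j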
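A quick sketch of why brute-force evaluation fails, counting the terms in the sum over G and assuming ordered (phase-resolved) genotypes, so each of the n individuals carries 2 alleles at each of the k loci. `number_of_terms` is a hypothetical name.

```python
from math import prod


def number_of_terms(alleles_per_locus, n_individuals):
    """Terms in the pedigree likelihood sum: (m1 * m2 * ... * mk) ** (2n)."""
    return prod(alleles_per_locus) ** (2 * n_individuals)


for n in (2, 5, 10):
    # one biallelic locus, then a biallelic trait locus plus a 10-allele marker
    print(n, number_of_terms([2], n), number_of_terms([2, 10], n))
```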
Elston-Stewart algorithm
Reduces computations by peeling:
[Figure: pedigree in which individual X connects family 1 and family 2]
Step 1: condition likelihoods of family 1 on genotype of X
Step 2: joint likelihood of families 2 and 1
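A minimal sketch of peeling for a single nuclear family, the building block that Elston-Stewart chains together. Given the parents' genotypes the children are conditionally independent, so each child is summed out separately instead of enumerating all joint genotypes. The disease model (allele frequency 0.05, penetrances 0.01/0.7/0.7) and all names are hypothetical; in a larger pedigree the peeled result would be kept as a function of the genotype of the connecting individual X.

```python
from itertools import product

P_A = 0.05                        # hypothetical disease allele frequency
PEN = {0: 0.01, 1: 0.7, 2: 0.7}   # hypothetical P(affected | copies of A)


def pop(g):
    """Hardy-Weinberg founder genotype probability (g = copies of A)."""
    return {0: (1 - P_A) ** 2, 1: 2 * P_A * (1 - P_A), 2: P_A ** 2}[g]


def trans(child, father, mother):
    """Mendelian transmission probability of the child's genotype."""
    pf, pm = father / 2, mother / 2  # chance each parent transmits A
    return {0: (1 - pf) * (1 - pm), 1: pf * (1 - pm) + (1 - pf) * pm, 2: pf * pm}[child]


def pen(affected, g):
    return PEN[g] if affected else 1 - PEN[g]


def brute_force(father, mother, children):
    """Sum over every joint genotype assignment: 3 ** (2 + k) terms."""
    total = 0.0
    for gs in product((0, 1, 2), repeat=2 + len(children)):
        gf, gm, kids = gs[0], gs[1], gs[2:]
        term = pen(father, gf) * pop(gf) * pen(mother, gm) * pop(gm)
        for aff, gc in zip(children, kids):
            term *= pen(aff, gc) * trans(gc, gf, gm)
        total += term
    return total


def peeled(father, mother, children):
    """Condition on the parents, then sum each child out independently."""
    total = 0.0
    for gf, gm in product((0, 1, 2), repeat=2):
        term = pen(father, gf) * pop(gf) * pen(mother, gm) * pop(gm)
        for aff in children:
            term *= sum(pen(aff, gc) * trans(gc, gf, gm) for gc in (0, 1, 2))
        total += term
    return total


fam = (True, False, [True, True, False, True])  # affection: father, mother, children
print(brute_force(*fam), peeled(*fam))
```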
Lod Score: Morton (1955)
$\mathrm{Lod} = \log_{10} \frac{L(\theta)}{L(\theta = 0.5)}$
Lod > 3: conclude linkage
Prior odds 1:50 × linkage ratio 1000 = posterior odds 20:1
Lod < -2: exclude linkage
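The lod score and the posterior-odds arithmetic can be checked directly; the recombinant counts are invented and the function names are illustrative. Because lod uses log10 while the likelihood-ratio chi-square uses natural logs, chi-square = 2 ln(10) × lod ≈ 4.6 × lod.

```python
import math


def lod_score(R, N, theta):
    """log10 L(theta) / L(0.5) for R recombinant and N non-recombinant meioses."""
    return (R * math.log10(theta) + N * math.log10(1 - theta)
            - (R + N) * math.log10(0.5))


def posterior_odds(prior_odds, lod):
    """Posterior odds of linkage = prior odds x likelihood ratio (10 ** lod)."""
    return prior_odds * 10 ** lod


print(round(lod_score(2, 18, 0.1), 3))
print(posterior_odds(1 / 50, 3))  # 1:50 prior with lod 3 gives about 20:1
```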
Linkage Analysis
Admixture Test
Model
Probability of linkage in a family = $\alpha$
Likelihood: $L(\alpha, \theta) = \alpha L(\theta) + (1-\alpha) L(\theta = 1/2)$
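A grid-search sketch of the admixture test, assuming phase-known (R, N) counts per family; the family data are invented and the grid resolution (0.01) is arbitrary. The maximised log10 likelihood ratio against no linkage is often called a heterogeneity lod (HLOD).

```python
import math


def family_lik(theta, R, N):
    """Phase-known likelihood theta^R (1 - theta)^N for one family."""
    return theta ** R * (1 - theta) ** N


def admixture_loglik(alpha, theta, families):
    return sum(
        math.log(alpha * family_lik(theta, R, N) + (1 - alpha) * family_lik(0.5, R, N))
        for R, N in families
    )


def hlod(families):
    """Grid-search maximum of L(alpha, theta) against no linkage."""
    alphas = [i / 100 for i in range(101)]
    thetas = [j / 100 for j in range(1, 51)]
    best = max(((a, t) for a in alphas for t in thetas),
               key=lambda at: admixture_loglik(at[0], at[1], families))
    null = admixture_loglik(0.0, 0.5, families)
    return best, (admixture_loglik(*best, families) - null) / math.log(10)


# Hypothetical (R, N) counts: two families look linked, two do not
families = [(0, 10), (1, 9), (5, 5), (6, 4)]
(alpha_hat, theta_hat), h = hlod(families)
print(alpha_hat, theta_hat, round(h, 3))
```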
Allele sharing
(non-parametric) methods
Penrose (1935): Sib Pair linkage
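For a rare disease, sib pairs can be concordant affected, concordant normal or discordant; concordant affected pairs carry the most linkage information
Therefore: affected sib pair design
Test H0: proportion of alleles shared identical by descent (IBD) = 1/2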
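One simple way to test H0 is the mean IBD sharing test, sketched here assuming IBD is known for every affected sib pair: under H0 each pair's proportion IBD is 0, ½ or 1 with probabilities ¼, ½, ¼ (mean ½, variance 1/8). The counts are invented and `asp_mean_test` is a hypothetical name.

```python
import math


def asp_mean_test(n0, n1, n2):
    """One-sided z test of mean proportion IBD = 1/2 in affected sib pairs.

    n0, n1, n2: numbers of pairs sharing 0, 1, 2 alleles IBD (IBD known).
    Under H0 the per-pair proportion IBD has mean 1/2 and variance 1/8.
    """
    n = n0 + n1 + n2
    mean_pi = (0.5 * n1 + 1.0 * n2) / n
    z = (mean_pi - 0.5) / math.sqrt(1 / (8 * n))
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper tail of N(0, 1)
    return mean_pi, z, p_value


print(asp_mean_test(20, 50, 30))
```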
Affected sib pairs:
incomplete marker information
Parameters: IBD sharing probabilities $Z = (z_0, z_1, z_2)$
Marker genotype data M: finite mixture likelihood
$L(Z) = \sum_{i=0}^{2} z_i \, P(M \mid \mathrm{IBD} = i)$
Programs: SPLINK, ASPEX
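EM is one standard way to maximise this finite mixture likelihood; the sketch below is unconstrained and is not a description of how SPLINK or ASPEX work internally. Each pair contributes the three probabilities P(M | IBD = i); the example uses fully informative pairs, so the estimates equal the observed proportions. All names are illustrative.

```python
import math

PRIOR = (0.25, 0.5, 0.25)  # IBD 0/1/2 probabilities for sibs under no linkage


def em_ibd_sharing(pair_likelihoods, iterations=200):
    """Maximise prod over pairs of sum_i z_i P(M | IBD = i) by EM (unconstrained).

    pair_likelihoods: list of (P(M|IBD=0), P(M|IBD=1), P(M|IBD=2)) per pair.
    """
    z = list(PRIOR)
    for _ in range(iterations):
        post_sum = [0.0, 0.0, 0.0]
        for lik in pair_likelihoods:
            w = [z[i] * lik[i] for i in range(3)]
            total = sum(w)
            for i in range(3):
                post_sum[i] += w[i] / total  # posterior P(IBD = i | M)
        z = [s / len(pair_likelihoods) for s in post_sum]
    return z


def lod_vs_null(pair_likelihoods, z):
    """log10 likelihood ratio of Z against (0.25, 0.5, 0.25)."""
    return sum(
        math.log10(sum(z[i] * lik[i] for i in range(3))
                   / sum(PRIOR[i] * lik[i] for i in range(3)))
        for lik in pair_likelihoods
    )


# Fully informative pairs: 20 share 0, 50 share 1, 30 share 2 alleles IBD
data = [(1, 0, 0)] * 20 + [(0, 1, 0)] * 50 + [(0, 0, 1)] * 30
z_hat = em_ibd_sharing(data)
print([round(v, 3) for v in z_hat], round(lod_vs_null(data, z_hat), 3))
```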
Joint distribution of Pedigree IBD
IBD of relative pairs are not independent
e.g. if IBD(1,2) = 2 and IBD(1,3) = 2,
then IBD(2,3) = 2
Inheritance vector gives joint IBD distribution
Each element indicates, for one parent-to-child transmission, whether the parent's
paternally inherited allele is transmitted (1)
or maternally inherited allele is transmitted (0)
Vector of 2N elements (N = number of non-founders)
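Enumerating inheritance vectors makes the dependence concrete. This sketch assumes full sibs with distinct founder alleles, encodes each child as (paternal bit, maternal bit) with 1 meaning the parent's paternally inherited allele was transmitted, and derives IBD counts from the bits; `ibd` is a hypothetical helper.

```python
from itertools import product


def ibd(child_a, child_b):
    """IBD count between two full sibs from their inheritance-vector bits.

    Each child is (paternal_bit, maternal_bit): which of the father's and of
    the mother's two alleles was transmitted. Founder alleles assumed distinct.
    """
    return int(child_a[0] == child_b[0]) + int(child_a[1] == child_b[1])


# Two sibs: 2 non-founders x 2 meioses = 4 bits, 16 equally likely vectors
counts = {0: 0, 1: 0, 2: 0}
for bits in product((0, 1), repeat=4):
    counts[ibd(bits[0:2], bits[2:4])] += 1
print(counts)

# Three sibs (6 bits): pairwise IBD values are not independent
implied = set()
for bits in product((0, 1), repeat=6):
    s1, s2, s3 = bits[0:2], bits[2:4], bits[4:6]
    if ibd(s1, s2) == 2 and ibd(s1, s3) == 2:
        implied.add(ibd(s2, s3))
print(implied)
```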
Pedigree allele-sharing methods
APM: Affected family members; uses IBS (problem)
ERPA: Extended Relative Pairs Analysis
Genehunter NPL: Non-Parametric Linkage; dodgy statistic (conservative)
Genehunter-PLUS: likelihood (“tilting”)
All these methods consider affected members only
Convergence of parametric and
non-parametric methods
Curtis and Sham (1995)
MFLINK: Treats penetrance as parameter
Terwilliger et al. (2000)
Complex recombination fractions
Parameters with no simple biological interpretation
Quantitative Sib Pair Linkage
X, Y = trait values of the two sibs, standardised to mean 0, variance 1
r = sib correlation
$V_A$ = additive QTL variance
$\hat\pi$ = estimated proportion of alleles shared IBD
Haseman-Elston Regression (1972)
$(X-Y)^2 = 2(1-r) - 2V_A(\hat\pi - 0.5) + \varepsilon$
Haseman-Elston Revisited (2000)
$XY = r + V_A(\hat\pi - 0.5) + \varepsilon$
Improved Haseman-Elston
Sham and Purcell (2001)
Use as dependent variable
$\frac{(X+Y)^2}{(1+r)^2} - \frac{(X-Y)^2}{(1-r)^2}$
Gives equivalent power to variance components model
for sib pair data
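A simulation sketch of the three regressions, assuming r = 0.3, V_A = 0.4, IBD known exactly (π̂ = π) and sib covariance r + V_A(π - 0.5). The slope of (X - Y)² on π̂ should be near -2V_A, the slope of XY near V_A, and the slope of the combined variable (X + Y)²/(1 + r)² - (X - Y)²/(1 - r)² near 2V_A[1/(1 + r)² + 1/(1 - r)²]. All names and values are illustrative.

```python
import math
import random


def simulate_sib_pairs(n, r, va, seed=1):
    """Standardised sib-pair traits with cov = r + va * (pi - 0.5).

    pi is 0, 0.5 or 1 with probabilities 1/4, 1/2, 1/4 and treated as known.
    This construction needs r - va / 2 >= 0.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        pi = rng.choices([0.0, 0.5, 1.0], weights=[1, 2, 1])[0]
        c = r + va * (pi - 0.5)
        shared = rng.gauss(0, 1)
        x = math.sqrt(c) * shared + math.sqrt(1 - c) * rng.gauss(0, 1)
        y = math.sqrt(c) * shared + math.sqrt(1 - c) * rng.gauss(0, 1)
        pairs.append((x, y, pi))
    return pairs


def ols_slope(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx


r, va = 0.3, 0.4
pairs = simulate_sib_pairs(20000, r, va)
pis = [p for _, _, p in pairs]
he_orig = ols_slope(pis, [(x - y) ** 2 for x, y, _ in pairs])  # near -2 * va
he_rev = ols_slope(pis, [x * y for x, y, _ in pairs])          # near va
sp = ols_slope(pis, [(x + y) ** 2 / (1 + r) ** 2 - (x - y) ** 2 / (1 - r) ** 2
                     for x, y, _ in pairs])
print(round(he_orig, 3), round(he_rev, 3), round(sp, 3))
```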
Variance components linkage
Models trait values of pedigree members jointly
Assumes multivariate normality conditional on IBD
Covariance between relative pairs $= Vr + V_A[\hat\pi - E(\pi)]$
where
V = trait variance
r = correlation (depends on relationship)
$V_A$ = QTL additive variance
$E(\pi)$ = expected proportion IBD
$\hat\pi$ = estimated proportion IBD
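A sketch of the variance-components likelihood for one pedigree: build the covariance matrix from estimated pairwise IBD proportions and evaluate a zero-mean multivariate normal log-likelihood by Cholesky decomposition. The sibship, trait values (already centred) and parameter values are hypothetical, and r and E(π) = 0.5 are taken to be the same for every pair.

```python
import math


def vc_covariance(V, r, va, pihat, expected_pi):
    """V on the diagonal, V * r + va * (pihat - E(pi)) off the diagonal."""
    n = len(pihat)
    return [[V if j == k else V * r + va * (pihat[j][k] - expected_pi)
             for k in range(n)] for j in range(n)]


def mvn_loglik(x, cov):
    """Log density of a zero-mean multivariate normal, via Cholesky."""
    n = len(x)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = cov[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    # Forward substitution solves L z = x, so x' cov^-1 x = z'z.
    z = []
    for i in range(n):
        z.append((x[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    log_det = 2 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2 * math.pi) + log_det + sum(v * v for v in z))


# Hypothetical sibship of three (diagonal pihat entries are unused)
pihat = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.2], [0.1, 0.2, 1.0]]
x = [1.1, 1.3, -0.4]
for va in (0.0, 0.3):
    cov = vc_covariance(1.0, 0.3, va, pihat, 0.5)
    print(va, round(mvn_loglik(x, cov), 4))
```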
QTL linkage model for sib-pair data
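[Figure: path diagram for sib-pair phenotypes PT1 and PT2; each loads on a QTL factor Q (path q), a shared environment factor S (path s) and a non-shared factor N (path n); the S factors correlate 1 and the Q factors correlate 0, 0.5 or 1 according to IBD; panels labelled "No linkage" and "Under linkage"]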
Incomplete Marker Information
IBD sharing cannot be deduced from marker genotypes with certainty
Obtain probabilities of all possible IBD values
Finite mixture likelihood: $L = \sum_{i} Z_i \, L(X \mid \mathrm{IBD} = i; V_A)$
Pi-hat likelihood: $L = L(X \mid \mathrm{IBD} = 2\hat\pi; V_A)$
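The two likelihoods differ for a sib pair whose IBD is uncertain. This sketch assumes standardised traits with covariance r + V_A(i/2 - 0.5) given IBD = i, and π̂ = z1/2 + z2; the trait values and IBD probabilities are invented. The two coincide when IBD is known or when V_A = 0.

```python
import math


def bvn_density(x, y, c):
    """Standard bivariate normal density (unit variances) with covariance c."""
    det = 1 - c * c
    q = (x * x - 2 * c * x * y + y * y) / det
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))


def mixture_lik(x, y, z, r, va):
    """sum_i Z_i L(X | IBD = i; VA), with cov = r + va * (i / 2 - 0.5)."""
    return sum(z[i] * bvn_density(x, y, r + va * (i / 2 - 0.5)) for i in range(3))


def pihat_lik(x, y, z, r, va):
    """L(X | IBD = 2 * pihat; VA), where pihat = z1 / 2 + z2."""
    pihat = z[1] / 2 + z[2]
    return bvn_density(x, y, r + va * (pihat - 0.5))


pair, z = (1.2, 0.9), (0.3, 0.4, 0.3)  # hypothetical trait values and IBD probabilities
print(mixture_lik(*pair, z, 0.3, 0.4), pihat_lik(*pair, z, 0.3, 0.4))
```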
QTL linkage model for sib-pair data
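[Figure: the same sib-pair path diagram, with the two QTL factors correlated $\hat\pi$ (the estimated proportion IBD) and the shared-environment factors correlated 1]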
Conditioning on Trait Values
Usual test
$\ln \mathrm{LR} = \max_{V_A} \ln \left[ \sum_i Z_i \, L(X \mid \mathrm{IBD} = i; V_A) \right] - \ln L(X; V_A = 0)$
Conditional test
$\ln \mathrm{LR} = \max_{V_A} \ln \left[ \sum_i Z_i \, L(X \mid \mathrm{IBD} = i; V_A) \right] - \max_{V_A} \ln \left[ \sum_i P_i \, L(X \mid \mathrm{IBD} = i; V_A) \right]$
$Z_i$ = IBD probability estimated from marker genotypes
$P_i$ = IBD probability given relationship
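A grid-search sketch of both statistics, assuming standardised sib-pair traits with r held fixed at 0.3 (a full analysis would also re-estimate the other parameters) and invented data. Because V_A = 0 lies inside the grid, the conditional statistic can never exceed the usual one.

```python
import math


def bvn_density(x, y, c):
    """Standard bivariate normal density (unit variances) with covariance c."""
    det = 1 - c * c
    return math.exp(-0.5 * (x * x - 2 * c * x * y + y * y) / det) / (
        2 * math.pi * math.sqrt(det))


def marker_weights(z):
    return z                   # Z_i estimated from marker genotypes


def sib_prior_weights(z):
    return (0.25, 0.5, 0.25)   # P_i for full sibs, ignoring the markers


def loglik(pairs, weights_of, r, va):
    """Sum over pairs of ln sum_i w_i L(X | IBD = i; VA)."""
    total = 0.0
    for x, y, z in pairs:
        w = weights_of(z)
        total += math.log(sum(w[i] * bvn_density(x, y, r + va * (i / 2 - 0.5))
                              for i in range(3)))
    return total


def linkage_tests(pairs, r, grid):
    max_z = max(loglik(pairs, marker_weights, r, va) for va in grid)
    max_p = max(loglik(pairs, sib_prior_weights, r, va) for va in grid)
    null = loglik(pairs, sib_prior_weights, r, 0.0)  # ln L(X; VA = 0)
    return max_z - null, max_z - max_p              # usual, conditional ln LR


pairs = [(1.2, 1.0, (0.1, 0.3, 0.6)), (-0.8, -1.1, (0.0, 0.2, 0.8)),
         (1.5, -1.3, (0.7, 0.3, 0.0)), (0.2, -0.1, (0.25, 0.5, 0.25))]
grid = [i / 100 for i in range(121)]  # VA from 0 to 1.2
print(linkage_tests(pairs, r=0.3, grid=grid))
```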
QTL linkage: some problems
Sensitivity to misspecification of marker allele frequencies and positions
Sensitivity to non-normality / phenotypic selection
Heavy computational demand for large pedigrees or
many marker loci
Sensitivity to marker genotype and relationship errors
Low power and poor localisation for minor QTL