Linkage and LOD score
Manuel AR Ferreira
Massachusetts General Hospital
Harvard Medical School
Boston
Egmond, 2006
Outline
1. Aim
2. The Human Genome
3. Principles of Linkage Analysis
4. Parametric Linkage Analysis
Practical
5. Nonparametric Linkage Analysis
1. Aim
QTL mapping
LOCALIZE and then IDENTIFY a locus that regulates a trait (QTL)
Nucleotide or sequence of nucleotides with variation in the population,
with different variants associated with different trait levels.
For a heritable trait...
Linkage:
localize region of the genome where a QTL that
regulates the trait is likely to be harboured
Family-specific phenomenon:
Affected individuals in a family share the same
ancestral predisposing DNA segment at a given QTL
Association: identify a QTL that regulates the trait
Population-specific phenomenon:
Affected individuals in a population share the same
ancestral predisposing DNA segment at a given QTL
2. Human Genome
DNA structure
A DNA molecule is a linear backbone
of alternating sugar residues and
phosphate groups
Attached to carbon atom 1’ of each sugar
is a nitrogenous base: A, C, G or T
Two DNA molecules are held together
in anti-parallel fashion by hydrogen
bonds between bases [Watson-Crick
rules] Antiparallel double helix
A gene is a segment of DNA which
is transcribed to give a protein or
RNA product
Only one strand is read during gene
transcription
Nucleotide: 1 phosphate group + 1 sugar + 1 base
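The Watson-Crick pairing rules above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the 20-base strand is the example sequence drawn in the slide figures.

```python
# Watson-Crick pairing rules: A pairs with T, C pairs with G.
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base complement, in the same left-to-right order."""
    return "".join(PAIRING[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Return the partner strand read in its own direction (anti-parallel)."""
    return complement(strand)[::-1]


top_strand = "CAATGCTTTAAGCGATGTAC"  # example strand from the slide figure
print(complement(top_strand))
print(reverse_complement(top_strand))
```

Because the two strands are anti-parallel, the partner strand read in its own direction is the reversed complement, which is why both helpers are shown.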
[Figure: double-stranded DNA; strands CAATGCTTTAAGCGATGTAC...ACCTAGGCGATACTAAA and GTTACGAAATTCGCTACATG...TGGATCCGCTATGATTT paired base by base]
DNA polymorphisms
Microsatellites
>100,000
Many alleles, (CA)n, very
informative, even, easily automated
SNPs
11,961,761 (11 Sept ‘06)
Most with 2 alleles (up to 4), not very
informative, even, easily automated
[Figure: DNA sequence with two polymorphic loci, A and B, annotated with a (CA)n repeat and base substitutions C-G, G-C, T-G]
DNA organization
[Figure: a haploid paternal gamete (♂, 22 + 1 chromosomes) and a haploid maternal gamete (♀, 22 + 1 chromosomes) combine into a diploid cell, 2 (22 + 1); chr1 is shown carrying loci A and B on each homolog]
Mitosis
[Figure: haploid gametes form a diploid zygote (1 cell); chr1 with loci A and B is shown in G1 phase, S phase and M phase, giving a diploid zygote of more than 1 cell]
DNA recombination
[Figure: meiosis. A diploid gamete precursor cell, 2 (22 + 1), carries paternal (♂) and maternal (♀) copies of chr1 with loci A and B; it gives haploid gamete precursors and then haploid gametes (22 + 1), each labelled recombinant (R) or nonrecombinant (NR) for A and B]
DNA recombination between linked loci
[Figure: meiosis with loci A and B close together on the same chromosome (AB); the diploid gamete precursor gives haploid gamete precursors and then haploid gametes, all labelled nonrecombinant (NR) for A and B]
Human Genome - summary
DNA is a linear sequence of nucleotides partitioned into 23 chromosomes
Two copies of each chromosome (2x22 autosomes + XY), from
paternal and maternal origins. During meiosis in gamete precursors,
recombination can occur between maternal and paternal homologs
Recombination fraction between loci A and B (θ)
Proportion of gametes produced that are recombinant for A and B
If A and B are very far apart: 50% R : 50% NR, so θ = 0.5
If A and B are very close together: <50% R, so 0 ≤ θ < 0.5
Recombination fraction (θ) can be converted to genetic distance (cM)
Haldane: $\mathrm{cM} = 100 \times \left[-\tfrac{1}{2}\ln(1 - 2\theta)\right]$, e.g. θ = 0.17, cM = 20.8
Kosambi: $\mathrm{cM} = 100 \times \tfrac{1}{4}\ln\left(\dfrac{1 + 2\theta}{1 - 2\theta}\right)$, e.g. θ = 0.17, cM = 17.7
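A small Python sketch of the two map functions, reproducing the θ = 0.17 examples from the slide. The function names are illustrative, and the range check on θ is an added assumption (both formulas are undefined at θ = 0.5).

```python
import math


def _check(theta: float) -> None:
    # Both map functions diverge as theta approaches 0.5.
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must satisfy 0 <= theta < 0.5")


def haldane_cm(theta: float) -> float:
    """Haldane map distance in cM: 100 * -0.5 * ln(1 - 2*theta)."""
    _check(theta)
    return 100.0 * -0.5 * math.log(1.0 - 2.0 * theta)


def kosambi_cm(theta: float) -> float:
    """Kosambi map distance in cM: 100 * 0.25 * ln((1 + 2*theta) / (1 - 2*theta))."""
    _check(theta)
    return 100.0 * 0.25 * math.log((1.0 + 2.0 * theta) / (1.0 - 2.0 * theta))


print(round(haldane_cm(0.17), 1))  # 20.8
print(round(kosambi_cm(0.17), 1))  # 17.7
```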
3. Principles of Linkage Analysis
Linkage Analysis requires genetic markers
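[Figure: a trait locus Q and markers M1, M2, ..., Mn along a chromosome, each marker annotated with its recombination fraction θ to Q; values range from about 0.1 for nearby markers up to 0.5 for distant ones]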
Linkage Analysis: Parametric vs. Nonparametric
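[Figure: diagram relating a marker M and a gene Q on the same chromosome (recombination), the gene Q and the phenotype Phe (mode of inheritance), and the marker M and the phenotype (correlation); other genetic factors (A, D) and environmental factors (C, E) also act on the phenotype]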
Adapted from Weiss & Terwilliger 2000
4. Parametric Linkage Analysis
Linkage with informative phase known meiosis
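Autosomal dominant trait, Q1 is the predisposing allele
Estimate θ between M and Q
[Figure: three-generation pedigree; marker M (alleles M1..6) and trait gene Q (alleles Q1, Q2) lie on the same chromosome]
Grandparents: M2M5 Q2Q2 and M1M6 Q1Q?
Parent (informative, phase known): M1Q1/M2Q2
Spouse: M3M4 Q2Q2
Offspring: M1Q1/M3Q2, M2Q2/M3Q2, M1Q1/M4Q2, M1Q1/M4Q2, M2Q2/M4Q2, M2Q1/M3Q2
Gametes from the informative parent:
NR: M1Q1
NR: M2Q2
R: M1Q2
R: M2Q1
5 NR and 1 R among 6 gametes, so θMQ = 1/6 = 0.17
(~20.8 cM)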
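The counting argument for a phase-known meiosis can be sketched as follows. The data layout (haplotypes as `(marker, trait)` tuples) and the function name are assumptions made for illustration; the haplotypes themselves are the six transmitted by the informative parent on the slide.

```python
def count_recombinants(parent_phase, transmitted):
    """Count recombinant (R) and nonrecombinant (NR) gametes.

    parent_phase: the two known parental haplotypes, each a (marker, trait) tuple.
    transmitted:  haplotypes passed by that parent to each offspring.
    Any transmitted haplotype that is not one of the two parental haplotypes
    is counted as recombinant.
    """
    parental = set(parent_phase)
    n_nr = sum(1 for hap in transmitted if hap in parental)
    n_r = len(transmitted) - n_nr
    return n_r, n_nr


phase = (("M1", "Q1"), ("M2", "Q2"))
gametes = [("M1", "Q1"), ("M2", "Q2"), ("M1", "Q1"),
           ("M1", "Q1"), ("M2", "Q2"), ("M2", "Q1")]

n_r, n_nr = count_recombinants(phase, gametes)
theta_hat = n_r / (n_r + n_nr)
print(n_r, n_nr, round(theta_hat, 2))  # 1 5 0.17
```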
Linkage with informative phase unknown meiosis
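Grandparents (no marker genotypes): Q2Q2 and Q1Q?
Parent (informative, phase unknown): M1M2 Q1Q2
Spouse: M3M4 Q2Q2
Offspring: M1Q1/M3Q2, M2Q2/M3Q2, M1Q1/M4Q2, M1Q1/M4Q2, M2Q2/M4Q2, M2Q1/M3Q2
Two phases are possible for the informative parent, each with probability 1/2.

Phase M1Q1/M2Q2 (probability 1/2)
Gamete      P      N
NR: M1Q1    1-θ    3
NR: M2Q2    1-θ    2
R: M1Q2     θ      0
R: M2Q1     θ      1

Phase M1Q2/M2Q1 (probability 1/2)
Gamete      P      N
R: M1Q1     θ      3
R: M2Q2     θ      2
NR: M1Q2    1-θ    0
NR: M2Q1    1-θ    1

$L(X \mid \theta) = \tfrac{1}{2}\,\theta^{1}(1-\theta)^{5} + \tfrac{1}{2}\,\theta^{5}(1-\theta)^{1}$
$L(X \mid \theta = 0.5) = \tfrac{1}{2}\,0.5^{1}(1-0.5)^{5} + \tfrac{1}{2}\,0.5^{5}(1-0.5)^{1} = 0.5^{6}$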
Parametric LOD score calculation
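$\mathrm{LOD} = \log_{10}\dfrac{L(X \mid \theta)}{L(X \mid \theta = 0.5)}$
$\mathrm{LOD} = \log_{10}\dfrac{\tfrac{1}{2}\,\theta^{1}(1-\theta)^{5} + \tfrac{1}{2}\,\theta^{5}(1-\theta)^{1}}{0.5^{6}}$
[Figure: LOD score (y axis, from about 1 down to -5) plotted against θ (x axis, 0 to 0.5); the odds are $\mathrm{OD} = L(X \mid \theta) / L(X \mid \theta = 0.5)$]
Overall LOD score for a given θ is the sum of all family LOD scores at θ
e.g. LOD = 3 for θ = 0.28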
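The phase-unknown LOD score can be evaluated numerically with a short sketch like the one below. The function names and the `(n_r, n_nr)` family encoding are assumptions for illustration; the likelihood is the two-phase mixture from the slide, with the counts taken under one phase and swapped under the other.

```python
import math


def likelihood_phase_unknown(theta, n_r, n_nr):
    """Average the likelihood over the two equally likely phases.

    Under one phase there are n_r recombinants and n_nr nonrecombinants;
    under the other phase the two counts swap.
    """
    return (0.5 * theta ** n_r * (1 - theta) ** n_nr
            + 0.5 * theta ** n_nr * (1 - theta) ** n_r)


def lod_phase_unknown(theta, n_r, n_nr):
    """LOD = log10 of L(X | theta) / L(X | theta = 0.5). Requires 0 < theta <= 0.5."""
    return math.log10(likelihood_phase_unknown(theta, n_r, n_nr)
                      / likelihood_phase_unknown(0.5, n_r, n_nr))


def total_lod(theta, families):
    """Sum the family-specific LOD scores at a common theta."""
    return sum(lod_phase_unknown(theta, n_r, n_nr) for n_r, n_nr in families)


# The slide's family: 1 R and 5 NR gametes under phase M1Q1/M2Q2.
for theta in (0.05, 0.1, 0.2, 0.3, 0.4, 0.5):
    print(theta, round(lod_phase_unknown(theta, 1, 5), 3))
```

A single small family contributes a LOD well below 3, which is why scores are summed across families.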
Parametric Linkage Analysis - summary
[Figure: trait locus Q and markers M1, M2, ..., Mn along a chromosome, each marker annotated with its estimated θ to Q]
For each marker, estimate the θ that yields the highest LOD score across all
families
This θ (and the LOD) will depend upon the mode of inheritance (MOI) assumed
The MOI determines the genotype at the trait locus Q and thus determines the
number of meioses that are recombinant or nonrecombinant. Limited to
Mendelian diseases.
Markers with a significant parametric LOD score (>3) are said to be linked
to the trait locus with recombination fraction θ
Practical: what is the most likely θ between M and Q?
Parents: M1M2 Q1Q1 and M3M4 Q1Q2
Offspring: M2M3 Q1Q1, M1M4 Q1Q2, M1M4 Q1Q1, M2M4 Q1Q2, M2M4 Q1Q2
1. Identify informative individual with offspring in the pedigree
2. Reconstruct possible phases of that individual and of all offspring
3. Classify the gametes that individual produces as R or NR
4. Count the number of R and NR gametes effectively produced
5. Express L(X | θ) and L(X | θ = 0.5)
6. Express the LOD score as a function of θ, f(θ)
Practical: answers
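1. Identify informative individual with offspring in the pedigree
Parents: M1M2 Q1Q1 and M3M4 Q1Q2
The informative individual is M3M4 Q1Q2 (heterozygous at both M and Q)
2. Reconstruct possible phases of that individual and of all offspring
Informative parent: M3Q1/M4Q2 or M3Q2/M4Q1, each with probability 1/2
Offspring: M2Q1/M3Q1, M1Q1/M4Q2, M1Q1/M4Q1, M2Q1/M4Q2, M2Q1/M4Q2
3. Classify the gametes that individual produces as R or NR
4. Count the number of R and NR gametes effectively produced

Phase M3Q1/M4Q2 (probability 1/2)
Gamete      P      N
NR: M3Q1    1-θ    1
NR: M4Q2    1-θ    3
R: M3Q2     θ      0
R: M4Q1     θ      1

Phase M3Q2/M4Q1 (probability 1/2)
Gamete      P      N
R: M3Q1     θ      1
R: M4Q2     θ      3
NR: M3Q2    1-θ    0
NR: M4Q1    1-θ    1

5. Express L(X | θ) and L(X | θ = 0.5)
$L(X \mid \theta) = \tfrac{1}{2}\,\theta^{1}(1-\theta)^{4} + \tfrac{1}{2}\,\theta^{4}(1-\theta)^{1}$
$L(X \mid \theta = 0.5) = \tfrac{1}{2}\,0.5^{1}(1-0.5)^{4} + \tfrac{1}{2}\,0.5^{4}(1-0.5)^{1} = 0.5^{5}$
6. Express the LOD score as a function of θ, f(θ)
$\mathrm{LOD} = \log_{10}\dfrac{\tfrac{1}{2}\,\theta^{1}(1-\theta)^{4} + \tfrac{1}{2}\,\theta^{4}(1-\theta)^{1}}{0.5^{5}}$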
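One way to answer the practical's question numerically is a grid search over θ. The sketch below is an illustration, not part of the original slides: the function names and the grid step are assumptions, and the counts (1 R, 4 NR under phase M3Q1/M4Q2) come from the practical's pedigree.

```python
import math


def likelihood(theta, n_r, n_nr):
    """Phase-unknown likelihood: average over the two equally likely phases."""
    return (0.5 * theta ** n_r * (1 - theta) ** n_nr
            + 0.5 * theta ** n_nr * (1 - theta) ** n_r)


def lod(theta, n_r, n_nr):
    """LOD score relative to free recombination (theta = 0.5)."""
    return math.log10(likelihood(theta, n_r, n_nr) / likelihood(0.5, n_r, n_nr))


def best_theta(n_r, n_nr, step=0.001):
    """Return the grid value of theta in (0, 0.5] with the highest LOD."""
    n_points = int(round(0.5 / step))
    grid = [i * step for i in range(1, n_points + 1)]
    return max(grid, key=lambda t: lod(t, n_r, n_nr))


theta_hat = best_theta(1, 4)
print(round(theta_hat, 3), round(lod(theta_hat, 1, 4), 3))
```

With these counts the maximum falls near θ ≈ 0.21, where the LOD is only about 0.12, far below the threshold of 3; a single five-child family carries little linkage information on its own.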
5. Nonparametric Linkage Analysis
Approach
Parametric: genotypes marker locus & genotypes trait locus
(latter inferred from phenotype according to a specific disease model)
Parameter of interest: θ between marker and trait loci
Nonparametric: genotypes marker locus & phenotype
If a trait locus truly regulates the expression of a phenotype, then two
relatives with similar phenotypes should have similar genotypes at a
marker in the vicinity of the trait locus, and vice-versa.
Interest: correlation between phenotypic similarity and marker genotypic
similarity
No need to specify mode of inheritance, allele frequencies, etc...
Phenotypic similarity between relatives
Squared trait differences: $(X_1 - X_2)^2$
Squared trait sums: $(X_1 + X_2)^2$
Trait cross-product: $(X_1 - \mu)(X_2 - \mu)$
Trait variance-covariance matrix: $\begin{pmatrix} \mathrm{Var}(X_1) & \mathrm{Cov}(X_1, X_2) \\ \mathrm{Cov}(X_1, X_2) & \mathrm{Var}(X_2) \end{pmatrix}$
Affection concordance
[Figure with labels T1 and T2]
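The pairwise similarity measures listed above are simple to compute. In this sketch the function names are illustrative, the trait values 3.4 and 1.6 are one sib pair from the Haseman-Elston example later in the slides, and the mean of 2.5 is a made-up value for the demonstration. The squared sum is taken on the raw values here; in practice traits are often mean-centred first.

```python
def squared_difference(x1, x2):
    """(X1 - X2)^2: large when the two relatives are phenotypically dissimilar."""
    return (x1 - x2) ** 2


def squared_sum(x1, x2):
    """(X1 + X2)^2, computed on the values as given."""
    return (x1 + x2) ** 2


def cross_product(x1, x2, mean):
    """(X1 - mean)(X2 - mean): positive when both deviate in the same direction."""
    return (x1 - mean) * (x2 - mean)


x1, x2 = 3.4, 1.6
hypothetical_mean = 2.5  # assumption for illustration only
print(round(squared_difference(x1, x2), 2))
print(round(squared_sum(x1, x2), 2))
print(round(cross_product(x1, x2, hypothetical_mean), 2))
```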
Genotypic similarity between relatives
IBS
Alleles shared Identical By State "look the same"; they may have the
same DNA sequence but are not necessarily derived from a
known common ancestor
IBD
Alleles shared Identical By Descent are a copy of the
same ancestor allele
[Figure: parents with haplotypes M1Q1/M2Q2 and M3Q3/M3Q4; one sib carries M1Q1/M3Q3 (inheritance vector 0 0) and the other M1Q1/M3Q4 (inheritance vector 0 1). At marker M the sibs share 2 alleles IBS but only 1 allele IBD.]
Genotypic similarity between relatives
Sib 1 (M / Q)    Sib 2 (M / Q)    Inheritance vector (M)    Number of alleles    Proportion of alleles
                                  sib 1 / sib 2             shared IBD           shared IBD ($\hat{\pi}$)
M1M3 / Q1Q3      M2M3 / Q2Q4      0 0 / 1 1                 0                    0
M1M3 / Q1Q3      M1M3 / Q1Q4      0 0 / 0 1                 1                    0.5
M1M3 / Q1Q3      M1M3 / Q1Q3      0 0 / 0 0                 2                    1
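The relation between inheritance vectors and IBD sharing in the table can be sketched as below. The encoding is an assumption for illustration: each sib's vector is a pair of bits, one per parent, recording which of that parent's two alleles was transmitted.

```python
def ibd_sharing(vector_sib1, vector_sib2):
    """Return (number of alleles shared IBD, proportion shared IBD).

    Each vector is a (first_parent_bit, second_parent_bit) pair. Two sibs
    share an allele IBD from a parent when their bits for that parent match.
    """
    n_ibd = sum(1 for a, b in zip(vector_sib1, vector_sib2) if a == b)
    return n_ibd, n_ibd / 2


print(ibd_sharing((0, 0), (1, 1)))  # (0, 0.0)
print(ibd_sharing((0, 0), (0, 1)))  # (1, 0.5)
print(ibd_sharing((0, 0), (0, 0)))  # (2, 1.0)
```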
Genotypic similarity between relatives
[Figure: nuclear family with parents A (1) and B (2) and sibs C (3) and D (4), shown for three marker genotype configurations (genotype labels include A1A2, A1A3 and A3A2). There are $2^{2n}$ inheritance vectors, 16 for the two sibs; each configuration gives one posterior column below.]

Inheritance vector              IBD   Prior         Posterior     Posterior     Posterior
Sib C    Sib D    Bits                probability   probability   probability   probability
x0/x0    x0/x0    0000          2     1/16          0             0             0
x0/x0    x0/x1    0001          1     1/16          1/12          1/4           1
x0/x0    x1/x0    0010          1     1/16          1/12          0             0
x0/x0    x1/x1    0011          0     1/16          1/12          0             0
x0/x1    x0/x0    0100          1     1/16          1/12          1/4           0
x0/x1    x0/x1    0101          2     1/16          0             0             0
x0/x1    x1/x0    0110          0     1/16          1/12          0             0
x0/x1    x1/x1    0111          1     1/16          1/12          0             0
x1/x0    x0/x0    1000          1     1/16          1/12          0             0
x1/x0    x0/x1    1001          0     1/16          1/12          0             0
x1/x0    x1/x0    1010          2     1/16          0             0             0
x1/x0    x1/x1    1011          1     1/16          1/12          1/4           0
x1/x1    x0/x0    1100          0     1/16          1/12          0             0
x1/x1    x0/x1    1101          1     1/16          1/12          0             0
x1/x1    x1/x0    1110          1     1/16          1/12          1/4           0
x1/x1    x1/x1    1111          2     1/16          0             0             0

P (IBD=0)                             1/4           1/3           0             0
P (IBD=1)                             1/2           2/3           1             1
P (IBD=2)                             1/4           0             0             0

$\hat{\pi} = \tfrac{0}{2}\,P(\mathrm{IBD}=0) + \tfrac{1}{2}\,P(\mathrm{IBD}=1) + \tfrac{2}{2}\,P(\mathrm{IBD}=2)$
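The step from a probability distribution over inheritance vectors to P(IBD = k) and $\hat{\pi}$ can be sketched as follows. The function names and the 4-bit tuple encoding (sib C's two bits, then sib D's two bits) are assumptions for illustration; the probabilities are the prior and the first posterior column of the table.

```python
from fractions import Fraction
from itertools import product


def ibd_count(vector):
    """Number of alleles shared IBD for a 4-bit vector (C bits, then D bits)."""
    c_first, c_second, d_first, d_second = vector
    return int(c_first == d_first) + int(c_second == d_second)


def ibd_distribution(probabilities):
    """Collapse vector probabilities into [P(IBD=0), P(IBD=1), P(IBD=2)]."""
    dist = [Fraction(0), Fraction(0), Fraction(0)]
    for vector, prob in probabilities.items():
        dist[ibd_count(vector)] += prob
    return dist


def pi_hat(dist):
    """Expected proportion of alleles shared IBD: 0.5 * P(IBD=1) + P(IBD=2)."""
    return Fraction(1, 2) * dist[1] + dist[2]


vectors = list(product((0, 1), repeat=4))
prior = {v: Fraction(1, 16) for v in vectors}
# First posterior column: IBD = 2 vectors excluded, the other 12 equally likely.
posterior = {v: Fraction(0) if ibd_count(v) == 2 else Fraction(1, 12)
             for v in vectors}

print(ibd_distribution(prior), pi_hat(ibd_distribution(prior)))
print(ibd_distribution(posterior), pi_hat(ibd_distribution(posterior)))
```

Exact fractions are used so the output can be compared directly with the 1/4, 1/2, 1/4 and 1/3, 2/3, 0 rows of the table.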
Statistics that incorporate both phenotypic and
genotypic similarities
[Figure: phenotypic similarity (y axis) plotted against genotypic similarity $\hat{\pi}$ (x axis: 0, 0.5, 1)]
Haseman-Elston regression – Quantitative traits
$E[(X_1 - X_2)^2 \mid \hat{\pi}] = E[X_1^2 + X_2^2 - 2X_1X_2 \mid \hat{\pi}]$
$= \mathrm{Var}(X_1) + \mathrm{Var}(X_2) - 2\,\mathrm{Cov}(X_1, X_2 \mid \hat{\pi})$
$\mathrm{Var}(X_1) = \mathrm{Var}(X_2) = V_Q + V_A + V_C + V_E$
$\mathrm{Cov}(X_1, X_2 \mid \hat{\pi}) = \hat{\pi}\,V_Q + \tfrac{1}{2}V_A + V_C$
$E[(X_1 - X_2)^2 \mid \hat{\pi}] = -2V_Q\,\hat{\pi} + (2V_Q + V_A + 2V_E)$
Phenotypic (dis)similarity = b × Genotypic similarity + c, with b = -2 VQ and c = 2 VQ + VA + 2 VE
[Figure: $(X_1 - X_2)^2$ plotted against $\hat{\pi}$ (x axis: 0, 0.5, 1)]

Pair    X1     X2     (X1-X2)^2    $\hat{\pi}$
1       2.2    2.1    0.01         0.9
2       1.9    2.3    0.16         0.6
3       2.3    2.6    0.09         0.7
4       3.4    1.6    3.24         0.1
5       2.5    2.3    0.04         0.8
…
1000    2.4    2.4    0            0.9
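A minimal sketch of the Haseman-Elston regression, assuming ordinary least squares on the squared trait differences. Only the six sib pairs printed in the table are used (the slide elides pairs 6 to 999), so the fitted values are purely illustrative; the function name is hypothetical.

```python
def ols_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# The six pairs shown on the slide (pairs 1-5 and pair 1000).
x1 = [2.2, 1.9, 2.3, 3.4, 2.5, 2.4]
x2 = [2.1, 2.3, 2.6, 1.6, 2.3, 2.4]
pi_hat = [0.9, 0.6, 0.7, 0.1, 0.8, 0.9]

squared_diff = [(a - b) ** 2 for a, b in zip(x1, x2)]
b, c = ols_line(pi_hat, squared_diff)

# Under the model above, b = -2 * VQ, so a negative slope suggests linkage.
vq_estimate = -b / 2
print(round(b, 2), round(c, 2), round(vq_estimate, 3))
```

A negative slope means pairs sharing more alleles IBD have more similar trait values, which is the pattern expected near a QTL.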
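VC ML – Quantitative & Categorical traits
$\hat{\pi}$ method
[Figure: $\mathrm{Cov}(X_1, X_2)$ plotted against $\hat{\pi}$ (x axis: 0, 0.5, 1)]
H1:
$\mathrm{Var}(X_1) = \mathrm{Var}(X_2) = V_Q + V_A + V_C + V_E$
$\mathrm{Cov}(X_1, X_2 \mid \hat{\pi}) = \hat{\pi}\,V_Q + 2\Phi\,V_A + l\,V_C$
H0:
$\mathrm{Var}(X_1) = \mathrm{Var}(X_2) = V_A + V_C + V_E$
$\mathrm{Cov}(X_1, X_2 \mid \hat{\pi}) = 2\Phi\,V_A + l\,V_C$
$\mathrm{LOD} = \log_{10}\dfrac{L(H_1)}{L(H_0)}$, e.g. LOD = 3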
Genome-wide linkage analysis (e.g. VC)
Individual LOD scores can be expressed as P values (Pointwise)
LOD 2.1 (× 4.6) = Chi-sq (n-df) 9.67, P value 0.0009
[Figure: genome-wide LOD score profile distinguishing a true positive from Type I error]
Theoretical (Lander & Kruglyak 1995): LOD = 3.6, Chi-sq = 16.7, P = 0.000022
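The LOD to P value conversion can be sketched in Python. The factor 4.6 is 2 ln(10). The slide writes "n-df"; the sketch assumes 1 degree of freedom and a one-sided test (half the chi-square tail), an assumption chosen because it reproduces both P values quoted on the slide. Function names are illustrative.

```python
import math


def lod_to_chisq(lod):
    """Convert a LOD score to a chi-square statistic: 2 * ln(10) * LOD."""
    return 2.0 * math.log(10.0) * lod


def chisq_to_pointwise_p(chisq):
    """One-sided pointwise P value, assuming 1 degree of freedom.

    The upper tail of a 1-df chi-square at x equals erfc(sqrt(x / 2));
    it is halved here because the test is treated as one-sided.
    """
    return 0.5 * math.erfc(math.sqrt(chisq / 2.0))


chisq = lod_to_chisq(2.1)
print(round(chisq, 2), round(chisq_to_pointwise_p(chisq), 4))
print(chisq_to_pointwise_p(16.7))
```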
Nonparametric Linkage Analysis - summary
No need to specify mode of inheritance
Models phenotypic and genotypic similarity of relatives
Expression of phenotypic similarity, calculation of IBD
HE and VC are the most popular statistics used for linkage of
quantitative traits
Other statistics are available, especially for affection traits