IBD Estimation in Pedigrees
Gonçalo Abecasis
University of Oxford
3 Stages of Genetic Mapping

Are there genes influencing this trait?
  Epidemiological studies
Where are those genes?
  Linkage analysis
What are those genes?
  Association analysis
Relationship Checking
Where are those genes?
Tracing Chromosomes
Sometimes it is easy…
[Figure: pedigree with marker alleles labelled 1 and 2]
Sharing, or Not?
[Figure: pedigree with marker alleles labelled 1, several of them unknown (?)]
Data

Polymorphic markers
  E.g. microsatellite repeats, SNPs
  Allele frequency
  Location
Task
  Phase markers
  Place recombinants
Complexity of the Problem

For each meiosis
  In a pedigree with n non-founders, there are 2n meioses, each with 2 possible outcomes
For each location
  One for each of m markers
Up to $4^{nm}$ distinct outcomes
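The count above follows from simple arithmetic, sketched here in Python. The function name `outcome_counts` and its argument names are illustrative and do not come from the slides.

```python
def outcome_counts(n_nonfounders: int, m_markers: int) -> dict:
    """Count possible meiotic outcomes for a pedigree and a marker map."""
    meioses = 2 * n_nonfounders        # one paternal and one maternal meiosis per non-founder
    per_marker = 2 ** meioses          # 2 outcomes per meiosis, i.e. 4 ** n
    total = per_marker ** m_markers    # independent choice at each marker, i.e. 4 ** (n * m)
    return {"meioses": meioses, "per_marker": per_marker, "total": total}


counts = outcome_counts(n_nonfounders=2, m_markers=3)
print(counts)
```

Even two non-founders and three markers give 4096 outcomes, which is why none of the algorithms below enumerates them all at once.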
Elston-Stewart Algorithm

Factorize likelihood by individual
Each step assigns phase
  For all markers
  For one individual
Complexity $\propto n \cdot 4^m$
Small number of markers
Large pedigrees
  With little inbreeding
Lander-Green Algorithm

Factorize likelihood by marker
Each step assigns phase
  For one marker
  For all individuals in the pedigree
Complexity $\propto m \cdot 4^n$
Large number of markers
  Assumes no interference
Relatively small pedigrees
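To make the trade-off between the two factorizations concrete, the sketch below evaluates the two cost expressions, n * 4^m and m * 4^n, up to a constant factor. The function names are illustrative, and the numbers are operation counts in arbitrary units, not timings.

```python
def elston_stewart_cost(n: int, m: int) -> int:
    """Cost proportional to n * 4**m: linear in individuals, exponential in markers."""
    return n * 4 ** m


def lander_green_cost(n: int, m: int) -> int:
    """Cost proportional to m * 4**n: linear in markers, exponential in individuals."""
    return m * 4 ** n


# A large pedigree with few markers favours Elston-Stewart ...
large_pedigree = (elston_stewart_cost(100, 5), lander_green_cost(100, 5))
# ... while a small pedigree with many markers favours Lander-Green.
dense_map = (elston_stewart_cost(5, 100), lander_green_cost(5, 100))

print(large_pedigree[0] < large_pedigree[1])
print(dense_map[1] < dense_map[0])
```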
Markov-Chain Monte-Carlo

Approximate solutions
  Explore only most likely outcomes
Remove restrictions
  Pedigree size
  Number of markers
  Inbreeding
Assuming no interference
Computationally intensive
Popular Packages

Elston-Stewart Algorithm
  LINKAGE / FASTLINK (Lathrop et al, 1985)
  VITESSE (O’Connell and Weeks, 1995)
Lander-Green Algorithm
  Genehunter (Kruglyak et al, 1995)
  Allegro (Gudbjartsson et al, 2000)
MCMC
  Simwalk2 (Sobel et al, 1996)
  LOKI (Heath, 1998)
1. Enumerate Possibilities

Enumerate gene-flow patterns
Gene-flow pattern:
  Sets transmitted allele for each meiosis
  Implies founder allele for each individual

[Figure: tree branching on Meiosis 1 and then Meiosis 2, with leaves V1, V2, V3 and V4]
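Enumerating gene-flow patterns amounts to listing every combination of binary meiotic outcomes. A minimal sketch, assuming each meiosis is coded as 0 or 1 (which grandparental allele each value stands for is an arbitrary labelling, and `gene_flow_patterns` is an illustrative name):

```python
from itertools import product


def gene_flow_patterns(n_meioses: int) -> list:
    """Return every gene-flow pattern as a tuple with one 0/1 entry per meiosis."""
    return list(product((0, 1), repeat=n_meioses))


patterns = gene_flow_patterns(2)
for label, pattern in enumerate(patterns, start=1):
    print(f"V{label}: {pattern}")
```

With two meioses this gives four patterns, matching the four leaves V1 to V4 in the figure; the order in which they are listed here is an assumption.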
2. Founder Allele Sets

For each gene flow pattern v
  Enumerate set A(G,v)
    All allele states $a = [a_1, \ldots, a_{2f}]$
    Compatible with both:
      Gene flow v
      Genotypes G
The likelihood is $L(v \mid G) = 2^{-2n} \sum_{a \in A(G,v)} \prod_i f(a_i)$
  $f(a_i)$ is the frequency of allele $a_i$
For example ...
[Figure: three pedigree panels labelled Genotypes, Gene Flow and Founder Alleles; observed alleles are 1, unknown alleles are shown as ?]

Four meioses.
Three founder alleles must be allele 1.
Likelihood = $(1/2)^4 \, f(a_1)^3$
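The founder-allele calculation can be sketched by brute force for a small family. Everything below is an illustrative assumption rather than the slides' own pedigree: two untyped founders, two children both typed 1/1, founder k carrying founder alleles 2k and 2k+1, and a gene-flow pattern given as one pair of bits per child (which allele the father and the mother transmit).

```python
from itertools import product
from math import prod


def descent(founders, nonfounders, v):
    """Map each individual to the two founder-allele indices it carries under v."""
    carried = {p: (2 * k, 2 * k + 1) for k, p in enumerate(founders)}
    # nonfounders must list parents before their children
    for (child, father, mother), (bit_f, bit_m) in zip(nonfounders, v):
        carried[child] = (carried[father][bit_f], carried[mother][bit_m])
    return carried


def likelihood(founders, nonfounders, v, genotypes, freq):
    """L(v|G) = 2^(-2n) * sum over compatible founder allele states of prod f(a_i)."""
    carried = descent(founders, nonfounders, v)
    total = 0.0
    for a in product(list(freq), repeat=2 * len(founders)):
        compatible = all(
            sorted((a[carried[p][0]], a[carried[p][1]])) == sorted(g)
            for p, g in genotypes.items()
        )
        if compatible:
            total += prod(freq[x] for x in a)
    return 0.5 ** (2 * len(nonfounders)) * total


founders = ["father", "mother"]
nonfounders = [("child1", "father", "mother"), ("child2", "father", "mother")]
genotypes = {"child1": (1, 1), "child2": (1, 1)}   # parents untyped
freq = {1: 0.3, 2: 0.7}

# child1 gets the father's first allele, child2 his second; both get the mother's first.
v = ((0, 0), (1, 0))
print(likelihood(founders, nonfounders, v, genotypes, freq))
```

Under this pattern three distinct founder alleles must be allele 1 and the fourth is unconstrained, so the result equals (1/2)^4 f(1)^3, the same form as the example above.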
Single Marker Probabilities

We now have ...
Likelihood for each gene flow pattern
  Conditional on genotypes
  Conditional on allele frequencies
  Conditional on a single marker
Probability for each gene-flow pattern
  $P(v) = L(v) / \sum_v L(v)$
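Turning likelihoods into probabilities is a normalization over all gene-flow patterns. A minimal sketch, with the illustrative name `gene_flow_probabilities` and made-up input values:

```python
def gene_flow_probabilities(likelihoods):
    """Normalize per-pattern likelihoods L(v) so that they sum to one."""
    total = sum(likelihoods)
    if total == 0:
        raise ValueError("all gene-flow patterns have zero likelihood")
    return [value / total for value in likelihoods]


probabilities = gene_flow_probabilities([1.0, 1.0, 2.0, 0.0])
print(probabilities)
```

A total of zero means no gene-flow pattern is compatible with the genotypes, which usually points to a Mendelian inconsistency in the data.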
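3. Allowing for Recombination

Transition Probability
  $T(v_a \to v_b, \theta) = (1-\theta)^{n - r(v_a, v_b)} \, \theta^{r(v_a, v_b)}$
  Here $n$ is the number of meioses and $r(v_a, v_b)$ is the number of meioses whose outcome differs between $v_a$ and $v_b$.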
Transition Matrix (rows: Location A; columns: Location B)

        v1                                v2                                …
v1      (1-θ)^n_meiosis                   θ (1-θ)^(n_meiosis-1)             …
v2      θ (1-θ)^(n_meiosis-1)             (1-θ)^n_meiosis                   …
…       …                                 …                                 …
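The transition matrix can be built directly from the number of meioses that differ between two gene-flow patterns. The sketch below assumes patterns are tuples of 0/1 values, one per meiosis; the function names are illustrative.

```python
from itertools import product


def transition(va, vb, theta):
    """(1 - theta)^(n - r) * theta^r, with r the number of meioses that differ."""
    n = len(va)
    r = sum(x != y for x, y in zip(va, vb))
    return (1 - theta) ** (n - r) * theta ** r


def transition_matrix(n_meioses, theta):
    """Dense matrix with rows indexed by patterns at A and columns by patterns at B."""
    patterns = list(product((0, 1), repeat=n_meioses))
    return [[transition(va, vb, theta) for vb in patterns] for va in patterns]


T = transition_matrix(n_meioses=2, theta=0.1)
for row in T:
    print(["%.2f" % entry for entry in row])
```

Each row sums to one, because every meiosis either recombines (probability θ) or does not (probability 1 - θ).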
Moving along chromosome

Input
  Vector v of likelihoods at location A
  Matrix T of transition probabilities A→B
Output
  Vector v' of likelihoods at location B
    Conditional on likelihoods at A
For k vectors, requires $k^2$ operations
$L(v'_i \mid v) = \sum_j L(v_j) \, T(v_j \to v'_i, \theta)$
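The update from location A to location B is a vector-matrix product. A minimal sketch for a single meiosis, where the 2 x 2 matrix is written out by hand and `step` is an illustrative name:

```python
def step(likelihoods_a, T):
    """Likelihoods at B: for each pattern i, sum over patterns j at A of L(v_j) * T[j][i]."""
    k = len(likelihoods_a)
    return [sum(likelihoods_a[j] * T[j][i] for j in range(k)) for i in range(k)]


theta = 0.1
T = [[1 - theta, theta],
     [theta, 1 - theta]]

likelihoods_b = step([1.0, 0.0], T)
print(likelihoods_b)
```

The nested loop over i and j is where the k^2 operations come from.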
Elston and Idury Algorithm

Requires $k \log_2 k$ operations

[Figure: the product of T with a 16-element likelihood vector, split into products with elements 1-8 and elements 9-16; each half of the result combines the two partial products with weights (1-θ) and θ]
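The divide-and-conquer idea can be sketched recursively: split the vector in half on the first meiosis, transform each half, then mix the halves with weights (1 - θ) and θ. This is an illustrative reconstruction under the assumption that patterns are indexed in binary with the first meiosis as the most significant bit; it is not taken from any package's source. A dense reference product is included for comparison.

```python
def dense_step(v, theta):
    """Reference O(k^2) product using the count of differing meioses."""
    k = len(v)
    n = k.bit_length() - 1                     # number of meioses, k = 2**n
    out = []
    for i in range(k):
        total = 0.0
        for j in range(k):
            r = bin(i ^ j).count("1")          # meioses that differ between i and j
            total += v[j] * (1 - theta) ** (n - r) * theta ** r
        out.append(total)
    return out


def fast_step(v, theta):
    """Same product in O(k log2 k) by recursing on halves of the vector."""
    k = len(v)
    if k == 1:
        return list(v)
    half = k // 2
    top = fast_step(v[:half], theta)           # first meiosis outcome 0
    bottom = fast_step(v[half:], theta)        # first meiosis outcome 1
    return (
        [(1 - theta) * t + theta * b for t, b in zip(top, bottom)]
        + [theta * t + (1 - theta) * b for t, b in zip(top, bottom)]
    )


v = [float(x) for x in range(1, 17)]
fast = fast_step(v, 0.1)
dense = dense_step(v, 0.1)
print(max(abs(a - b) for a, b in zip(fast, dense)))
```

The two results agree up to floating-point rounding, while the recursive version does work proportional to k at each of log2 k levels.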
Moving Along Chromosome
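[Figure: four likelihood vectors over gene flow patterns v1, v2, …, vk. The single-marker likelihoods at location 1 given G1 are multiplied by the transition matrix T between locations 1 and 2, giving likelihoods at location 2 given G1 and θ12. These are multiplied (*) element by element with the single-marker likelihoods at location 2 given G2, giving the likelihoods at location 2 given G1, θ12 and G2.]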
Markov-Chains

Single Marker

Left Conditional

Right Conditional

Full Likelihood
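One common way to combine these four quantities is a forward and backward pass along the markers. The sketch below is an assumption about how they fit together, not something the slides spell out: the left conditional carries information from markers to the left, the right conditional from markers to the right, and the full likelihood at a marker is the element-wise product of left conditional, single-marker likelihood and right conditional. The input values are made up.

```python
def step(v, theta):
    """Dense transition step; patterns are indexed in binary, one bit per meiosis."""
    k = len(v)
    n = k.bit_length() - 1
    out = []
    for i in range(k):
        total = 0.0
        for j in range(k):
            r = bin(i ^ j).count("1")
            total += v[j] * (1 - theta) ** (n - r) * theta ** r
        out.append(total)
    return out


def multipoint(single, thetas):
    """Return left conditionals, right conditionals and full likelihoods per marker."""
    m, k = len(single), len(single[0])
    left = [[1.0] * k for _ in range(m)]
    right = [[1.0] * k for _ in range(m)]
    for i in range(1, m):                      # sweep left to right
        carried = [l * s for l, s in zip(left[i - 1], single[i - 1])]
        left[i] = step(carried, thetas[i - 1])
    for i in range(m - 2, -1, -1):             # sweep right to left
        carried = [r * s for r, s in zip(right[i + 1], single[i + 1])]
        right[i] = step(carried, thetas[i])
    full = [
        [l * s * r for l, s, r in zip(left[i], single[i], right[i])]
        for i in range(m)
    ]
    return left, right, full


# Three markers, two meioses (four patterns); thetas are between adjacent markers.
single = [[0.5, 0.5, 0.0, 0.0], [0.25, 0.25, 0.25, 0.25], [0.0, 0.5, 0.0, 0.5]]
thetas = [0.1, 0.2]
left, right, full = multipoint(single, thetas)
totals = [sum(row) for row in full]
print(totals)
```

Summing the full likelihood over patterns gives the same total at every marker, which is a useful sanity check on an implementation.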
MERLIN

Fast multipoint calculations
Non-parametric linkage analyses
Error detection
  E.g., unlikely obligate recombinants
Haplotyping
  Most likely, exhaustive lists, sampling
Sparse Gene Flow Trees
[Figure: a packed tree and a sparse tree with leaf likelihoods L1 and L2. Legend: node with zero likelihood; node identical to sibling; likelihood for this branch]
Dense maps

Computational challenge
  Require more memory
  Require Lander-Green algorithm
    Limited pedigree size
Computational advantages
  Reduced recombination between markers
    Approximate solutions possible if steps with many recombinants are ignored
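To see why ignoring steps with many recombinants costs little on a dense map, one can compute how much of the transition probability lies in steps with at most a given number of recombinants. This is a binomial sum, sketched below with the illustrative name `retained_mass`. The example values (40 meioses, θ = 0.01) are assumptions chosen so that the expected number of recombinants per step is 0.4; they describe no particular pedigree.

```python
from math import comb


def retained_mass(n_meioses, theta, max_recombinants):
    """Probability that a step between adjacent markers has at most max_recombinants."""
    return sum(
        comb(n_meioses, r) * theta ** r * (1 - theta) ** (n_meioses - r)
        for r in range(max_recombinants + 1)
    )


for limit in range(4):
    print(limit, retained_mass(n_meioses=40, theta=0.01, max_recombinants=limit))
```

This measures only the transition probability that is kept at each step, not the accuracy of the final likelihood, which also depends on the genotype data.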
MERLIN: Example Pedigrees
MERLIN: Timings
Timings for Simultaneous Linkage Analysis, Haplotyping and IBD Estimation

Pedigree                                 A (x1000)   B        C           D
Genehunter (exact)                       36s         59m44s   -           -
Allegro (exact)                          17s         2m06s    4h29m02s*   -
Merlin (exact)                           10s         44s      42m37s      -
Merlin approximation (2 recombinants)    13s         2s       5s          32s

Grandparents Genotyped

Simulations generated a map of 50 microsatellite markers at 1 cM spacing. The expected number of recombinants between consecutive markers is 0.4 (pedigree D).
All timings are for a 700 MHz Pentium computer, using 2 GB of RAM.
* Also using 20 GB of RAID storage for disk swapping
MERLIN: Memory Usage
Timings for Haplotyping the Data of Keavney et al (1998)

Method                                   Time            Memory
Allegro (exact)                          21 min 25 sec   1500 MB
Merlin (exact)                           48 sec          128 MB
Merlin approximation, 0 recombinants     <1 sec          4 MB
Merlin approximation, 1 recombinant      3 sec           4 MB
Merlin approximation, 2 recombinants     23 sec          32 MB
Merlin approximation, 3 recombinants     1 min 50 sec    64 MB
Command Line Options
Effect of Genotyping Error

Modest levels are likely
  Up to 1% may be typical
Mendelian inheritance checks
  Detect up to 30% of errors for SNPs
Effect on power
  Linkage vs. Association
  SNPs vs. Microsatellites
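A Mendelian inheritance check for one parent-offspring trio can be sketched as follows. The function name and the genotype encoding (unordered pairs of allele labels) are illustrative assumptions; the last example shows why such checks miss many SNP errors.

```python
def mendelian_consistent(child, father, mother):
    """True if one child allele can come from the father and the other from the mother."""
    c1, c2 = child
    return (c1 in father and c2 in mother) or (c2 in father and c1 in mother)


# Consistent: the child inherits 1 from the father and 2 from the mother.
print(mendelian_consistent((1, 2), (1, 1), (2, 2)))
# Inconsistent: the father cannot transmit allele 2, so an error is detected.
print(mendelian_consistent((2, 2), (1, 1), (1, 2)))
# A 1/1 child miscalled as 1/2 with two 1/2 parents still passes the check.
print(mendelian_consistent((1, 2), (1, 2), (1, 2)))
```

With only two alleles, many erroneous genotypes remain compatible with the parents, so the check catches only a fraction of errors.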
Affected Sib Pair Sample
[Figure: plot with vertical axis "Average LOD"]
Unselected Sample
[Figure: plot of average LOD retained (% of maximum, 0% to 100%) against map position (cM, 0 to 100)]
Association Analysis
[Figure: plot of average LOD retained (% of maximum, 0% to 100%) against error rate (0% to 10%)]
Error Detection

Genotype errors can introduce unlikely recombinants
Change likelihood
  Replace $(1-\theta)$ with $\theta$
Test sensitivity of likelihood to each genotype
  Detects errors that have largest effect on linkage

[Figure: pedigree haplotypes with alleles labelled 1 and 2; X marks appear on some haplotypes]
Practical Exercise


Lon Cardon
Stacey Cherny