Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Linkage and LOD score Manuel AR Ferreira Massachusetts General Hospital Harvard Medical School Boston Egmond, 2006 Outline 1. Aim 2. The Human Genome 3. Principles of Linkage Analysis 4. Parametric Linkage Analysis Practical 5. Nonparametric Linkage Analysis 1. Aim QTL mapping LOCALIZE and then IDENTIFY a locus that regulates a trait (QTL) Nucleotide or sequence of nucleotides with variation in the population, with different variants associated with different trait levels. For a heritable trait... Linkage: localize region of the genome where a QTL that regulates the trait is likely to be harboured Family-specific phenomenon: Affected individuals in a family share the same ancestral predisposing DNA segment at a given QTL Association: identify a QTL that regulates the trait Population-specific phenomenon: Affected individuals in a population share the same ancestral predisposing DNA segment at a given QTL 2. Human Genome DNA structure A DNA molecule is a linear backbone of alternating sugar residues and phosphate groups Attached to carbon atom 1’ of each sugar is a nitrogenous base: A, C, G or T Two DNA molecules are held together in anti-parallel fashion by hydrogen bonds between bases [Watson-Crick rules] Antiparallel double helix A gene is a segment of DNA which is transcribed to give a protein or RNA product Only one strand is read during gene transcription Nucleotide: 1 phosphate group + 1 sugar + 1 base C A A T G C T T T A A G C G A T G T A C A C C T A G G C G A T A C T A A A - G T T A C G A A A T T C G C T A C A T G T G G A T C C G C T A T G A T T T DNA polymorphisms Microsatellites >100,000 Many alleles, (CA)n, very informative, even, easily automated SNPs 11,961,761 (11 Sept ‘06) Most with 2 alleles (up to 4), not very informative, even, easily automated A B C A A T G C T T T G T A C G A C A C A G G C G A T A C T A A A - G T T A C G A A A C A T G C T G T G T C C G C T A T G A T T T (CA)n C -G G -C T -G DNA organization 22 + 1 ♂ A- B- C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G 2 (22 + 1) ♁ chr1 A- B- - G T T A C G A A A T T C G C T A C A T G B- C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G ♂ ♁ ♂ ♁ C A A T G C T T T A A G C G A T G T A C 2 (22 + 1) ♁ ♂ A- A- 2 (22 + 1) A- B- C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G A- B- C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G C A A T G C T T T A A G C G A T G T A C ♁ - G T T A C G A A A T T C G C T A C A T G -A A- C A A T G C T T T A A G C G A T G T A C -B B- - G T T A C G A A A T T C G C T A C A T G C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G Mitosis -A -B chr1 G1 phase Haploid gametes Diploid zygote 1 cell S phase B- C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G A- B- - - G T T A C G A A A T T C G C T A C A T G ♁ ♂ C A A T G C T T T A A G C G A T G T A C C A A T G C T T T A A G C G A T G T A C G T T A C G A A A T T C G C T A C A T G -A -B C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G -A -B M phase Diploid zygote >1 cell 22 + 1 DNA recombination 22 + 1 A- (♂) A- 2 (22 + 1) 2 (22 + 1) B- ♁ Meiosis ♂ A- B- C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G C A A T G C T T T A A G C G A T G T A C (♂) ♁ - G T T A C G A A A T T C G C T A C A T G chr1 -A A- -B B- C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G -A -B chr1 C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G C A A T G C T T T A A G C G A T G T A C - C A A T G C T T T A A G C G A T G T A C - C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G -A B- -B C A A T G C T T T A A G C G A T G T A C (♁) G T T A C G A A A T T C G C T A C A T G C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G - G T T A C G A A A T T C G C T A C A T G NR chr1 chr1 - G T T A C G A A A T T C G C T A C A T G -A R -B chr1 A- chr1 chr1 (♁) A- Diploid gamete precursor cell G T T A C G A A A T T C G C T A C A T G C A A T G C T T T A A G C G A T G T A C B- C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G -A -B chr1 Haploid gamete precursors B- C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G R chr1 C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G -A NR -B chr1 Hap. gametes DNA recombination between linked loci 22 + 1 (♂) AB- 2 (22 + 1) ♁ ♂ AB- C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G C A A T G C T T T A A G C G A T G T A C Meiosis (♂) ♁ - G T T A C G A A A T T C G C T A C A T G -A A-B B- C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G -A -B C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G C A A T G C T T T A A G C G A T G T A C - C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G -A -B (♁) G T T A C G A A A T T C G C T A C A T G C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G AB- (♁) AB- Diploid gamete precursor AB- C A A T G C T T T A A G C G A T G T A C C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G -A -B Haploid gamete precursors - G T T A C G A A A T T C G C T A C A T G C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G NR -A -B NR NR -A -B NR Hap. gametes Human Genome - summary DNA is a linear sequence of nucleotides partitioned into 23 chromosomes Two copies of each chromosome (2x22 autosomes + XY), from paternal and maternal origins. During meiosis in gamete precursors, recombination can occur between maternal and paternal homologs Recombination fraction between loci A and B (θ) Proportion of gametes produced that are recombinant for A and B If A and B are very far apart: 50%R:50%NR - θ = 0.5 If A and B are very close together: <50%R - 0 ≤ θ < 0.5 Recombination fraction (θ) can be converted to genetic distance (cM) Haldane: cM 100 0.5 ln 1 2 eg. θ=0.17, cM=20.8 Kosambi: cM 100 0.25 ln 1 2 1 2 eg. θ=0.17, cM=17.7 3. Principles of Linkage Analysis Linkage Analysis requires genetic markers Q M1 θ Mn M2 0.5 0.5 .4 .3 .3 .4 0.5 .15 M1 M2 θ 0.5 0.5 M1 M2 Mn .35 .22 .26 .35 .4 .3 .1 .3 .4 0.5 Mn Linkage Analysis: Parametric vs. Nonparametric Gene M Recombination Genetic factors Q A Mode of inheritance Correlation Chromosome Phe D C Environmental factors E Adapted from Weiss & Terwilliger 2000 4. Parametric Linkage Analysis Linkage with informative phase known meiosis Gene ♂ ♁ M1..6 Chromosome Q1,2 Autosomal dominant, Q1 predisposing allele Estimate θ between M and Q M2M5Q2Q2 M1 Q 1 Informative Phase known M2 Q 2 M1M6Q1Q? Q21Q /M12QQ22 M11M M3M4Q2Q2 M1Q1/M3Q2 M2Q2/M3Q2 M1Q1/M4Q2 M1Q1/M4Q2 M2Q2/M4Q2 M2Q1/M3Q2 NR: M1Q1 NR: M2Q2 R: M1Q2 R: M2Q1 θMQ = 1/6 = 0.17 (~20.8 cM) Linkage with informative phase unknown meiosis Q2Q2 Q1Q? Informative Phase unknown M21/M Q12Q M1Q Q212 M1 Q 1 M1 Q 2 M2 Q 2 M2 Q 1 M3M4Q2Q2 M1Q1/M3Q2 M2Q2/M3Q2 M1Q1/M4Q2 M1Q1/M4Q2 M2Q2/M4Q2 M2Q1/M3Q2 M1Q1/M2Q2 NR: M1Q1 NR: M2Q2 R: M1Q2 R: M2Q1 L X | L X | 0.5 P 1-θ R: M1Q1 R: M2Q2 NR: M1Q2 NR: M2Q1 0 1 θ 1 1 5 1 2 M1Q2/M2Q1 N 3 2 1 5 0.51 1 0.5 2 P N 3 2 θ 0 1 1-θ + 1 5 1 1 2 + 1 1 0.55 1 0.5 2 0.5 6 Parametric LOD score calculation LOD log 10 LOD log 10 L( X | ) L( X | 0.5) 1 1 1 5 1 1 5 1 2 2 0.56 3 2 LOD score L( X | ) OD L( X | 0.5) 1 0 0.1 0.2 0.3 0.4 θ -1 -2 R -3 -4 -5 Overall LOD score for a given θ is the sum of all family LOD scores at θ eg. LOD=3 for θ=0.28 0.5 Parametric Linkage Analysis - summary Q M1 θ Mn M2 0.5 0.5 .4 .3 .3 .4 0.5 .1 For each marker, estimate the θ that yields highest LOD score across all families This θ (and the LOD) will depend upon the mode of inheritance assumed MOI determines the genotype at the trait locus Q and thus determines the number of meiosis which are recombinant or nonrecombinant. Limited to Mendelian diseases. Markers with a significant parametric LOD score (>3) are said to be linked to the trait locus with recombination fraction θ Practical: what is the most likely θ between M and Q? M1M2Q1Q1 M2M3Q1Q1 M1M4Q1Q2 M3M4Q1Q2 M1M4Q1Q1 M2M4Q1Q2 M2M4Q1Q2 1. Identify informative individual with offspring in the pedigree 2. Reconstruct possible phases of that individual and of all offspring 3. Classify the gametes that individual produces as R or NR 4. Count the number of R and NR gametes effectively produced 5. Express L X | L X | 0.5 6. Express LOD score f ( ) Practical: answers 1. 3. 4. 5. 6. Express Identify Classify Count the informative the LOD number R that with produces effectively in the as Rpedigree or produced NR f |(individual )NR 2. Reconstruct possible that individual and all offspring Xscore Lgametes | ofindividual Xand offspring Lphases of 0.5gametes M1M2Q1Q1 M3M4Q1Q2 M22QM13/M Q31Q1 M1Q Q22 M1M Q14/M Q22 M M14/M Q14Q Q14QQ11 M M22Q M14/M Q14Q M3Q1/M4Q2 NR: M3Q1 NR: M4Q2 R: M3Q2 R: M4Q1 L X | L X | 0.5 1 1 4 1 2 1 4 0.51 1 0.5 2 P 1-θ M Q22 M22Q M14/M Q14Q M3Q2/M4Q1 N 1 3 R: M3Q1 θ R: M4Q2 NR: M3Q2 1-θ NR: M4Q1 0 1 θ P 0 1 + 1 4 1 1 2 + 1 1 0.54 1 0.5 2 N 1 3 0.5 5 Outline 1. Aim 2. The Human Genome 3. Principles of Linkage Analysis 4. Parametric Linkage Analysis 5. Nonparametric Linkage Analysis 5. Nonparametric Linkage Analysis Approach Parametric: genotypes marker locus & genotypes trait locus (latter inferred from phenotype according to a specific disease model) Parameter of interest: θ between marker and trait loci Nonparametric: genotypes marker locus & phenotype If a trait locus truly regulates the expression of a phenotype, then two relatives with similar phenotypes should have similar genotypes at a marker in the vicinity of the trait locus, and vice-versa. Interest: correlation between phenotypic similarity and marker genotypic similarity No need to specify mode of inheritance, allele frequencies, etc... Phenotypic similarity between relatives Squared trait differences X 1 X 2 2 Squared trait sums X 1 X 2 2 Trait cross-product Trait variance-covariance matrix X1 X 2 Var X 1 Cov X 1 X 2 Cov X X Var X 1 2 2 Affection concordance T2 T1 Genotypic similarity between relatives IBS Alleles shared Identical By State “look the same”, may have the same DNA sequence but they are not necessarily derived from a known common ancestor IBD M1 Q1 Alleles shared M2 Q2 M3 Q3 M3 Q4 Identical By Descent are a copy of the same M1 M2 Q1 Q2 M3 M 3 Q 3 Q4 ancestor allele M1 M3 Q1 Q 3 Inheritance vector (M) 0 0 M1 M3 Q1 Q4 0 1 IBS IBD 2 1 1 Genotypic similarity between relatives Inheritance vector (M) Number of alleles shared IBD Proportion of alleles shared IBD - M1 M3 Q1 Q3 M2 M3 Q2 Q4 0 0 1 1 0 0 M1 M3 Q1 Q3 M1 M3 Q1 Q 4 0 0 0 1 1 0.5 M1 M3 Q 1 Q3 M1 M3 Q1 Q 3 0 0 0 0 2 1 ˆ Genotypic similarity between relatives A B A1A2 2 x0/x1 1 x0/x1 22n A1A2 A1A3 D C A1/A3 A3A2 A1/A2 A1/A2 A1/A3 A3/A2 A1/A2 3 4 Inheritance vector IBD Prior probability Posterior probability Posterior probability Posterior probability x0/x0 x0/x0 x0/x0 x0/x0 x0/x1 x0/x1 x0/x1 x0/x1 x1/x0 x1/x0 x1/x0 x1/x0 x1/x1 x1/x1 x1/x1 x1/x1 x0/x0 x0/x1 x1/x0 x1/x1 x0/x0 x0/x1 x1/x0 x1/x1 x0/x0 x0/x1 x1/x0 x1/x1 x0/x0 x0/x1 x1/x0 x1/x1 0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111 2 1 1 0 1 2 0 1 1 0 2 1 0 1 1 2 1/16 1/16 1/16 1/16 1/16 1/16 1/16 1/16 1/16 1/16 1/16 1/16 1/16 1/16 1/16 1/16 0 1/12 1/12 1/12 1/12 0 1/12 1/12 1/12 1/12 0 1/12 1/12 1/12 1/12 0 0 1/4 0 0 1/4 0 0 0 0 0 0 1/4 0 0 1/4 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 P (IBD=0) P (IBD=1) P (IBD=2) 1/4 1/2 1/4 1/3 2/3 0 0 1 0 0 1 0 0 1 2 1 ˆ 0 1 2 2 2 2 2 2 Phenotypic similarity Statistics that incorporate both phenotypic and genotypic similarities 0 0.5 1 Genotypic similarity ( ˆ) Haseman-Elston regression – Quantitative traits E X 1 X 2 | ˆ 2 X 1 X 2 2 E X 12 X 22 2 X 1 X 2 | ̂ Var X1 Var X 2 2Cov X1 X 2 | ̂ 0 Var X 1 Var X 2 VQ VA VC VE 1 Cov X 1 , X 2 | ˆ ˆ VQ VA VC 2 E X 1 X 2 | ˆ 2 VQ ˆ 2 VQ VA 2 VE 2 Phenotypic (dis)similarity = b × Genotypic similarity + c 1 2 3 4 5 … 1000 0.5 1 ˆ 2 ˆ X1 2.2 1.9 2.3 3.4 2.5 X2 2.1 2.3 2.6 1.6 2.3 (X1-X2) 0.01 0.16 0.09 3.24 0.04 0.9 0.6 0.7 0.1 0.8 2.4 2.4 0 0.9 VC ML – Quantitative & Categorical traits ˆ method Cov X1 , X 2 0 H1: Var X 1 Var X 2 VQ VA VC VE Cov X 1 , X 2 | ˆ ˆ VQ 2 VA l VC H0: Var X Var X 1 2 Cov X1, X 2 | ̂ VA VC VE 2 VA l VC LOD log 10 e.g. LOD=3 L( H1 ) L( H 0 ) 0.5 ˆ 1 Genome-wide linkage analysis (e.g. VC) Individual LOD scores can be expressed as P values (Pointwise) LOD 2.1 (x4.6) Chi-sq (n-df) 9.67 P value 0.0009 LOD True positive Theoretical (Lander & Kruglyak 1995) k Type I error LOD = 3.6, Chi-sq = 16.7, P = 0.000022 Nonparametric Linkage Analysis - summary No need to specify mode of inheritance Models phenotypic and genotypic similarity of relatives Expression of phenotypic similarity, calculation of IBD HE and VC are the most popular statistics used for linkage of quantitative traits Other statistics available, specially for affection traits