Survey
* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
Genetic Theory Manuel AR Ferreira Massachusetts General Hospital Harvard Medical School Boston Boulder, 2007 Outline 1. Aim of this talk 2. Genetic concepts 3. Very basic statistical concepts 4. Biometrical model 5. Introduction to linkage analysis 1. Aim of this talk Gene mapping LOCALIZE and then IDENTIFY a locus that regulates a trait Linkage analysis Association analysis Linkage: If a locus regulates a trait, Trait Variance and Covariance between individuals can be expressed as a function of this locus. Association: If a locus regulates a trait, Trait Mean in the population can be expressed as a function of this locus. Revisit common genetic parameters - such as allele frequencies, genetic effects, dominance, variance components, etc Use these parameters to construct a biometric genetic model Model that expresses the: (1) Mean (2) Variance (3) Covariance between individuals for a quantitative phenotype as a function of the genetic parameters of a given locus. See how the biometric model provides a useful framework for linkage and association methods. 2. Genetic concepts G T T A C G A A A T T C G C T A C A T G T G G A T C C G C T A T G A T T T C A A T G C T T T A A G C G A T G T A C A C C T A G G C G A T A C T A A A A. DNA level DNA structure, organization recombination G B. Population level Allele and genotype frequencies G G G G G G G G G G G G G G G G G G C. Transmission level Mendelian segregation Genetic relatedness G G G G P P D. Phenotype level Biometrical model Additive and dominance components G A. DNA level A DNA molecule is a linear backbone of alternating sugar residues and phosphate groups Attached to carbon atom 1’ of each sugar is a nitrogenous base: A, C, G or T Two DNA molecules are held together in anti-parallel fashion by hydrogen bonds between bases [Watson-Crick rules] Antiparallel double helix A gene is a segment of DNA which is transcribed to give a protein or RNA product Only one strand is read during gene transcription Nucleotide: 1 phosphate group + 1 sugar + 1 base C A A T G C T T T A A G C G A T G T A C A C C T A G G C G A T A C T A A A - G T T A C G A A A T T C G C T A C A T G T G G A T C C G C T A T G A T T T DNA polymorphisms Microsatellites >100,000 Many alleles, (CA)n repeats, very informative, even, easily automated SNPs 11,961,761 (build 126, 03 Mar ‘07) Most with 2 alleles (up to 4), not very informative, even, easily automated A Copy Number polymorphisms ~2000-3000 (?) Many alleles, even, not yet automated B C A A T G C T T T G T A C G A C A C A G G C G A T A C T A A A - G T T A C G A A A C A T G C T G T G T C C G C T A T G A T T T (CA)n C -G G -C T -G DNA organization 22 + 1 ♂ A- B- C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G 2 (22 + 1) ♁ chr1 A- B- - G T T A C G A A A T T C G C T A C A T G B- C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G ♂ ♁ ♂ ♁ C A A T G C T T T A A G C G A T G T A C 2 (22 + 1) ♁ ♂ A- A- 2 (22 + 1) A- B- C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G A- B- C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G C A A T G C T T T A A G C G A T G T A C ♁ - G T T A C G A A A T T C G C T A C A T G -A A- C A A T G C T T T A A G C G A T G T A C -B B- - G T T A C G A A A T T C G C T A C A T G C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G Mitosis -A -B chr1 G1 phase Haploid gametes Diploid zygote 1 cell S phase B- C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G A- B- - - G T T A C G A A A T T C G C T A C A T G ♁ ♂ C A A T G C T T T A A G C G A T G T A C C A A T G C T T T A A G C G A T G T A C G T T A C G A A A T T C G C T A C A T G -A -B C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G -A -B M phase Diploid zygote >1 cell 22 + 1 DNA recombination 22 + 1 A- (♂) A- 2 (22 + 1) 2 (22 + 1) B- ♁ Meiosis ♂ A- B- C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G C A A T G C T T T A A G C G A T G T A C (♂) ♁ - G T T A C G A A A T T C G C T A C A T G chr1 -A A- -B B- C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G -A -B chr1 C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G C A A T G C T T T A A G C G A T G T A C - C A A T G C T T T A A G C G A T G T A C - C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G -A B- -B C A A T G C T T T A A G C G A T G T A C (♁) G T T A C G A A A T T C G C T A C A T G C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G - G T T A C G A A A T T C G C T A C A T G NR chr1 chr1 - G T T A C G A A A T T C G C T A C A T G -A R -B chr1 A- chr1 chr1 (♁) A- Diploid gamete precursor cell G T T A C G A A A T T C G C T A C A T G C A A T G C T T T A A G C G A T G T A C B- C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G -A -B chr1 Haploid gamete precursors B- C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G R chr1 C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G -A NR -B chr1 Hap. gametes DNA recombination between linked loci 22 + 1 (♂) AB- 2 (22 + 1) ♁ ♂ AB- C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G C A A T G C T T T A A G C G A T G T A C Meiosis (♂) ♁ - G T T A C G A A A T T C G C T A C A T G -A A-B B- C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G -A -B C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G C A A T G C T T T A A G C G A T G T A C - C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G -A -B (♁) G T T A C G A A A T T C G C T A C A T G C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G AB- (♁) AB- Diploid gamete precursor AB- C A A T G C T T T A A G C G A T G T A C C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G -A -B Haploid gamete precursors - G T T A C G A A A T T C G C T A C A T G C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G C A A T G C T T T A A G C G A T G T A C - G T T A C G A A A T T C G C T A C A T G NR -A -B NR NR -A -B NR Hap. gametes B. Population level 1. Allele frequencies A a A single locus, with two alleles - Biallelic - Single nucleotide polymorphism, SNP Alleles A and a - Frequency of A is p - Frequency of a is q = 1 – p Every individual inherits two alleles - A genotype is the combination of the two alleles - e.g. AA, aa (the homozygotes) or Aa (the heterozygote) A a B. Population level 2. Genotype frequencies (Random mating) Allele 1 Allele 2 A (p) a (q) A (p) AA (p2) Aa (pq) a (q) aA (qp) aa (q2) Hardy-Weinberg Equilibrium frequencies P (AA) = p2 P (Aa) = 2pq P (aa) = q2 p2 + 2pq + q2 = 1 C. Transmission level Mendel’s law of segregation Mother (A3A4) Segregation (Meiosis) Father (A1A2) A3 (½) A4 (½) A1 (½) A1A3 (¼) A1A4 (¼) A2 (½) A2A3 (¼) A2A4 (¼) Gametes G D. Phenotype level G G G G G G G G G G G G G G G G G G G 1. Classical Mendelian traits Dominant trait - AA, Aa - aa 1 0 Recessive trait - AA, Aa - aa 0 1 Codominant trait - AA - Aa - aa X Y Z G G G G P P D. Phenotype level 2. Quantitative traits g==-1 g==0 .128205 .072 Fraction AA g==-1 .128205 .128205 g==-1 g==0 g==1 g==0 -3.90647 Fraction .128205 0 Fraction Fraction Aa 0 g==1 .128205 0 0 g==1 -3.90647 -3.90647 .128205 -3.90647 2.7156 2.7156 qt Histograms by g aa 0 -3.90647 2.7156 qt e.g. cholesterol levels 0 -3.90647 0 -3.90647 2.7156 qt Histograms by g 2.7156 qt Histograms by g D. Phenotype level P(X) Aa aa Biometric Model AA X aa Aa AA m D. Phenotype level P(X) Aa aa Biometric Model AA X aa Aa AA m –a m–a d m+d +a Genotypic effect m+a Genotypic means 3. Very basic statistical concepts Mean, variance, covariance 1. Mean (X) X x i i n xi f xi i X x1 x2 x3 x4 … xn Mean, variance, covariance 2. Variance (X) x 2 Var ( X ) i i n 1 xi f xi 2 i X X-μ (X-μ )2 x1 x1-μ (x1-μ )2 x2 x2-μ (x2-μ ) x3 x3-μ (x3-μ )2 x4 … xn x4-μ … xn-μ (x4-μ )2 … (xn-μ )2 2 Mean, variance, covariance 3. Covariance (X,Y) Cov( X , Y ) x y i X i Y i n 1 xi X yi Y f xi , yi i X Y X-μ X Y-μ Y x1 y1 x1-μ X y1-μ Y x2 y2 x2-μ X y2-μ Y x3 y3 x3-μ X y3-μ Y x4 y4 … … xn yn x4-μ X … xn-μ X y4-μ Y … yn-μ Y 4. Biometrical model Biometrical model for single biallelic QTL Biallelic locus - Genotypes: AA, Aa, aa - Genotype frequencies: p2, 2pq, q2 Alleles at this locus are transmitted from P-O according to Mendel’s law of segregation Genotypes for this locus influence the expression of a quantitative trait X (i.e. locus is a QTL) Biometrical genetic model that estimates the contribution of this QTL towards the (1) Mean, (2) Variance and (3) Covariance between individuals for this quantitative trait X Biometrical model for single biallelic QTL xi f xi 1. Contribution of the QTL to the Mean (X) i e.g. cholesterol levels in the population Genotypes AA Aa aa Effect, x a d -a Frequencies, f(x) p2 2pq q2 Mean (X) = a(p2) + d(2pq) – a(q2) = a(p-q) + 2pqd Biometrical model for single biallelic QTL Var xi f xi 2 2. Contribution of the QTL to the Variance (X) i Genotypes AA Aa aa Effect, x a d -a Frequencies, f(x) p2 2pq q2 Var (X) = (a-m)2p2 + (d-m)22pq + (-a-m)2q2 = VQTL Broad-sense heritability of X at this locus = VQTL / V Total Biometrical model for single biallelic QTL Var (X) = (a-m)2p2 + (d-m)22pq + (-a-m)2q2 m = a(p-q) + 2pqd = 2pq[a+(q-p)d]2 + (2pqd)2 = VAQTL + VDQTL Demonstration: final 3 slides Additive effects: the main effects of individual alleles Dominance effects: represent the interaction between alleles Biometrical model for single biallelic QTL d=0 –a +a d +a +a m=0 aa Aa AA Additive Biometrical model for single biallelic QTL d>0 –a +a+d d +a-d +a m=0 aa Aa AA Dominant Biometrical model for single biallelic QTL d<0 –a d +a-d +a+d +a m=0 aa Aa AA Recessive Statistical definition of dominance is scale dependent +4 +4 +0.7 +0.4 log (x) aa Aa AA No departure from additivity aa Aa AA Significant departure from additivity Genotypic mean Biometrical model for single biallelic QTL a 0 -a aa Aa AA aa Aa AA Additive aa Aa AA Dominant aa Aa AA Recessive Var (X) = Regression Variance + Residual Variance = Additive Variance + Dominance Variance = VAQTL + VDQTL Practical H:\ferreira\GeneticTheory\sgene.exe Practical Aim Visualize graphically how allele frequencies, genetic effects, dominance, etc, influence trait mean and variance Ex1 a=0, d=0, p=0.4, Residual Variance = 0.04, Scale = 2. Vary a from 0 to 1. Ex2 a=1, d=0, p=0.4, Residual Variance = 0.04, Scale = 2. Vary d from -1 to 1. Ex3 a=1, d=0, p=0.4, Residual Variance = 0.04, Scale = 2. Vary p from 0 to 1. Look at scatter-plot, histogram and variance components. Some conclusions 1. Additive genetic variance depends on allele frequency p & additive genetic value a as well as dominance deviation d 2. Additive genetic variance typically greater than dominance variance Biometrical model for single biallelic QTL 1. Contribution of the QTL to the Mean (X) 2. Contribution of the QTL to the Variance (X) 3. Contribution of the QTL to the Covariance (X,Y) Biometrical model for single biallelic QTL 3. Contribution of the QTL to the Cov (X,Y) Cov( X , Y ) xi X yi Y f xi , yi i AA (a-m) Aa (d-m) AA (a-m) (a-m)2 Aa (d-m) (a-m) (d-m) (d-m)2 aa (-a-m) (a-m) (-a-m) (d-m)(-a-m) aa (-a-m) (-a-m)2 Biometrical model for single biallelic QTL 3A. Contribution of the QTL to the Cov (X,Y) – MZ twins Cov( X , Y ) xi X yi Y f xi , yi i AA (a-m) AA (a-m) p2(a-m)2 Aa (d-m) 0 (a-m) (d-m) aa (-a-m) 0 (a-m) (-a-m) Cov(X,Y) Aa (d-m) aa (-a-m) 2pq (d-m)2 0 (d-m)(-a-m) q2 (-a-m)2 = (a-m)2p2 + (d-m)22pq + (-a-m)2q2 = 2pq[a+(q-p)d]2 + (2pqd)2 = VAQTL + VDQTL Biometrical model for single biallelic QTL 3B. Contribution of the QTL to the Cov (X,Y) – Parent-Offspring AA (a-m) AA (a-m) Aa (d-m) aa (-a-m) p3(a-m)2 Aa (d-m) p2q (a-m) (d-m) aa (-a-m) 0 (a-m) (-a-m) pq (d-m)2 pq2 (d-m)(-a-m) q3 (-a-m)2 • e.g. given an AA father, an AA offspring can come from either AA x AA or AA x Aa parental mating types AA x AA will occur p2 × p2 = p4 and have AA offspring Prob()=1 AA x Aa will occur p2 × 2pq = 2p3q and have AA offspring Prob()=0.5 and have Aa offspring Prob()=0.5 Therefore, P(AA father & AA offspring) = p4 + p3q = p3(p+q) = p3 Biometrical model for single biallelic QTL 3B. Contribution of the QTL to the Cov (X,Y) – Parent-Offspring AA (a-m) AA (a-m) aa (-a-m) p3(a-m)2 Aa (d-m) p2q (a-m) (d-m) aa (-a-m) 0 (a-m) (-a-m) Cov (X,Y) Aa (d-m) pq (d-m)2 pq2 (d-m)(-a-m) = (a-m)2p3 + … + (-a-m)2q3 = pq[a+(q-p)d]2 = ½VAQTL q3 (-a-m)2 Biometrical model for single biallelic QTL 3C. Contribution of the QTL to the Cov (X,Y) – Unrelated individuals AA (a-m) AA (a-m) Aa (d-m) aa (-a-m) p4(a-m)2 Aa (d-m) 2p3q (a-m) (d-m) 4p2q2 (d-m)2 aa (-a-m) p2q2(a-m) (-a-m) 2pq3 (d-m)(-a-m) Cov (X,Y) = (a-m)2p4 + … + (-a-m)2q4 =0 q4 (-a-m)2 Biometrical model for single biallelic QTL 3D. Contribution of the QTL to the Cov (X,Y) – DZ twins and full sibs ¼ genome # identical alleles inherited from parents ¼ genome 2 ¼ (2 alleles) 1 1 (father) (mother) + ½ (1 allele) + MZ twins Cov (X,Y) ¼ genome P-O ¼ genome 0 ¼ (0 alleles) Unrelateds = ¼ Cov(MZ) + ½ Cov(P-O) + ¼ Cov(Unrel) = ¼(VAQTL+VDQTL) + ½ (½ VAQTL) + ¼ (0) = ½ VAQTL + ¼VDQTL Summary so far… Biometrical model predicts contribution of a QTL to the mean, variance and covariances of a trait Association analysis Mean (X) = a(p-q) + 2pqd Linkage analysis Var (X) = VAQTL + VDQTL Cov (MZ) = VAQTL + VDQTL On average! Cov (DZ) = ½VAQTL + ¼VDQTL 0, 1/2 or 1 0 or 1 For a sib-pair, do the two sibs have 0, 1 or 2 alleles in common? IBD estimation / Linkage 5. Introduction to Linkage Analysis For a heritable trait... Linkage: localize region of the genome where a QTL that regulates the trait is likely to be harboured Family-specific phenomenon: Affected individuals in a family share the same ancestral predisposing DNA segment at a given QTL Association: identify a QTL that regulates the trait Population-specific phenomenon: Affected individuals in a population share the same ancestral predisposing DNA segment at a given QTL Linkage Analysis: Parametric vs. Nonparametric Gene M Recombination Genetic factors Q A Mode of inheritance Correlation Chromosome Phe D Dominant trait 1 - AA, Aa 0 - aa C Environmental factors E Adapted from Weiss & Terwilliger 2000 Approach Parametric: genotypes marker locus & genotypes trait locus (latter inferred from phenotype according to a specific disease model) Parameter of interest: θ between marker and trait loci Nonparametric: genotypes marker locus & phenotype If a trait locus truly regulates the expression of a phenotype, then two relatives with similar phenotypes should have similar genotypes at a marker in the vicinity of the trait locus, and vice-versa. Interest: correlation between phenotypic similarity and marker genotypic similarity No need to specify mode of inheritance, allele frequencies, etc... Phenotypic similarity between relatives Squared trait differences X 1 X 2 2 Squared trait sums X 1 X 2 2 Trait cross-product Trait variance-covariance matrix X1 X 2 Var X 1 Cov X 1 X 2 Cov X X Var X 1 2 2 Affection concordance T2 T1 Genotypic similarity between relatives IBS Alleles shared Identical By State “look the same”, may have the same DNA sequence but they are not necessarily derived from a known common ancestor IBD M1 Q1 Alleles shared M2 Q2 M3 Q3 M3 Q4 Identical By Descent are a copy of the same M1 M2 Q1 Q2 M3 M 3 Q 3 Q4 ancestor allele M1 M3 Q1 Q 3 Inheritance vector (M) 0 0 M1 M3 Q1 Q4 0 1 IBS IBD 2 1 1 Genotypic similarity between relatives Inheritance vector (M) Number of alleles shared IBD Proportion of alleles shared IBD - M1 M3 Q1 Q3 M2 M3 Q2 Q4 0 0 1 1 0 0 M1 M3 Q1 Q3 M1 M3 Q1 Q 4 0 0 0 1 1 0.5 M1 M3 Q 1 Q3 M1 M3 Q1 Q 3 0 0 0 0 2 1 ˆ Genotypic similarity between relatives A B A1A2 2 x0/x1 1 x0/x1 22n A1A2 A1A3 D C A1/A3 A3A2 A1/A2 A1/A2 A1/A3 A3/A2 A1/A2 3 4 Inheritance vector IBD Prior probability Posterior probability Posterior probability Posterior probability x0/x0 x0/x0 x0/x0 x0/x0 x0/x1 x0/x1 x0/x1 x0/x1 x1/x0 x1/x0 x1/x0 x1/x0 x1/x1 x1/x1 x1/x1 x1/x1 x0/x0 x0/x1 x1/x0 x1/x1 x0/x0 x0/x1 x1/x0 x1/x1 x0/x0 x0/x1 x1/x0 x1/x1 x0/x0 x0/x1 x1/x0 x1/x1 0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111 2 1 1 0 1 2 0 1 1 0 2 1 0 1 1 2 1/16 1/16 1/16 1/16 1/16 1/16 1/16 1/16 1/16 1/16 1/16 1/16 1/16 1/16 1/16 1/16 0 1/12 1/12 1/12 1/12 0 1/12 1/12 1/12 1/12 0 1/12 1/12 1/12 1/12 0 0 1/4 0 0 1/4 0 0 0 0 0 0 1/4 0 0 1/4 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 P (IBD=0) P (IBD=1) P (IBD=2) 1/4 1/2 1/4 1/3 2/3 0 0 1 0 0 1 0 0 1 2 1 ˆ 0 1 2 2 2 2 2 2 Var (X) = VAQTL + VDQTL Cov (MZ) = VAQTL + VDQTL Cov (DZ) = ½VAQTL + ¼VDQTL Cov (DZ) = ̂ VAQTL + 2 VDQTL On average! For a given twin pair Cov (DZ) = ̂ VA Cov (DZ) = VAQTL ̂ Phenotypic similarity QTL Slope ~ VAQTL 0 0.5 1 Genotypic similarity ( ˆ) Statistics that incorporate both phenotypic and genotypic similarities to test VQTL Regression-based methods Haseman-Elston, MERLIN-regress Variance components methods Mx, MERLIN, SOLAR, GENEHUNTER Biometrical model for single biallelic QTL 1/3 2A. Average allelic effect (α) The deviation of the allelic mean from the population mean Mean (X) Allele a Population Allele A ? a(p-q) + 2pqd ? a A a AA Aa aa a d -a p q p q αa αA A Allelic mean Average allelic effect (α) ap+dq dp-aq q(a+d(q-p)) -p(a+d(q-p)) Biometrical model for single biallelic QTL Denote the average allelic effects - αA = q(a+d(q-p)) - αa = -p(a+d(q-p)) If only two alleles exist, we can define the average effect of allele substitution - α = αA - αa - α = (q-(-p))(a+d(q-p)) = (a+d(q-p)) Therefore: - αA = qα - αa = -pα 2/3