Download AA - Virginia Institute for Psychiatric and Behavioral Genetics
Intro to Quantitative Genetics
HGEN502, 2011
Hermine H. Maes

Intro to Quantitative Genetics
- 1/18: Course introduction; Introduction to Quantitative Genetics & Genetic Model Building
- 1/20: Study Design and Genetic Model Fitting
- 1/25: Basic Twin Methodology
- 1/27: Advanced Twin Methodology and Scope of Genetic Epidemiology
- 2/1: Quantitative Genetics Problem Session

Aims of this talk
- Historical Background
- Genetical Principles
  - Genetic Parameters: additive, dominance
  - Biometrical Model
- Statistical Principles
  - Basic concepts: mean, variance, covariance
  - Path Analysis
  - Likelihood

Quantitative Genetics Principles
- Analysis of patterns and mechanisms underlying variation in continuous traits, to resolve and identify their genetic and environmental causes
- Continuous traits have a continuous phenotypic range; they are often polygenic and influenced by environmental effects
- Ordinal traits are expressed in whole numbers; they can be treated as approximately continuous or as threshold traits
- Some qualitative traits can be treated as having an underlying quantitative basis, expressed as a threshold trait (or multiple thresholds)

Types of Genetic Influence
- Mendelian Disorders: single gene, highly penetrant, severe, small % affected (e.g., Huntington’s Disease)
- Chromosomal Disorders: insertions, deletions of chromosomal sections, severe, small % affected (e.g., Down’s Syndrome)
- Complex Traits: multiple genes (of small effect), environment, large % of the population, susceptibility rather than destiny (e.g., depression, alcohol dependence)

Genetic Disorders

Great 19th Century Biologists
- Gregor Mendel (1822-1884): mathematical rules of particulate inheritance (“Mendel’s Laws”)
- Charles Darwin (1809-1882): evolution depends on differential reproduction of inherited variants
- Francis Galton (1822-1911): systematic measurement of family resemblance
- Karl Pearson (1857-1936): “Pearson Correlation”; protégé of Galton

Family Measurements

Standardize Measurement
- Pearson and Lee’s diagram for measurement of “span” (finger-tip
to finger-tip distance)

Parent Offspring Correlations
[Figure: from Pearson and Lee (1903), p. 378]

Sibling Correlations
[Figure: from Pearson and Lee (1903), p. 387]

Nuclear Family Correlations
[Figure: © Lindon Eaves, 2009]

Quantitative Genetic Strategies

Family Studies
- Does the trait aggregate in families?
- The (Really!) Big Problem: families are a mixture of genetic and environmental factors

Twin Studies
- Galton’s solution: twins
- One (ideal) solution: twins separated at birth
- But unfortunately MZAs (MZ twins reared apart) are rare
- Easier solution: MZ & DZ twins reared together

Twin Studies Reared Apart
- Minnesota Study of Twins Reared Apart (T. Bouchard et al, 1979)
- >100 sets of reared-apart twins from across the US & UK
- All pairs spent their formative years apart (but vary tremendously in amount of contact prior to the study)
- 56 MZAs participated

Types of Twins
- Monozygotic (MZ; “identical”): result from fertilization of a single egg by a single sperm; share 100% of their genetic material
- Dizygotic (DZ; “fraternal” or “nonidentical”): result from independent fertilization of two eggs by two sperm; share on average 50% of their genes

Logic of Classical Twin Study
- MZs share 100% of their genes, DZs (on average) 50%
- Both twin types share 100% of their environment
- If rMZ > rDZ, then genetic factors are important
- If rDZ > ½ rMZ, then growing up in the same home is important
- If rMZ < 1, then non-shared environmental factors are important

Causes of Twinning
- For MZs, appears to be random
- For DZs:
  - Increases with mother’s age (follicle stimulating hormone, FSH, levels increase with age)
  - Hereditary factors (FSH)
  - Fertility treatment
- Rates of twins/multiple births are increasing, currently ~3% of all births

Zygosity of Twins

Chorionicity of Twins
- 100% of DZ twins are dichorionic
- ~1/3 of MZ twins are dichorionic and ~2/3 are monochorionic

Twin Correlations
Virginia Twin Study of Adolescent Behavioral Development
[Figure: scatterplots of age- and sex-corrected stature, HTDEV2 against HTDEV1. MZ stature: r = 0.924; DZ stature: r = 0.535. © Lindon Eaves, 2009]

Ronald Fisher (1890-1962)
- 1918: On the Correlation Between Relatives on the Supposition of Mendelian Inheritance
- 1921: Introduced the concept of “likelihood”
- 1930: The Genetical Theory of Natural Selection
- 1935: The Design of Experiments
- Fisher developed the mathematical theory that reconciled Mendel’s work with Galton and Pearson’s correlations

Fisher (1918): Basic Ideas
- Continuous variation is caused by lots of genes (polygenic inheritance)
- Each gene follows Mendel’s laws
- Environment smooths out genetic differences
- Genes may show different degrees of dominance
- Genes may have many forms (multiple alleles)
- Mating may not be random (assortative mating)
- Showed that the correlations obtained by Pearson & Lee were explained well by polygenic inheritance

[“Mendelian” Crosses with Quantitative Traits]

Biometrical Genetics
Lots of credit to: Manuel Ferreira, Shaun Purcell, Pak Sham, Lindon Eaves

Building a Genetic Model
- Revisit common genetic parameters, such as allele frequencies, genetic effects, dominance, variance components, etc.
- Use these parameters to construct a biometrical genetic model
- The model expresses the (1) mean, (2) variance and (3) covariance between individuals for a quantitative phenotype as a function of genetic parameters

Genetic Concepts
- Population level: allele and genotype frequencies
- Transmission level: Mendelian segregation, genetic relatedness
- Phenotype level: biometrical model, additive and dominance components

Population level 1. Allele frequencies
- A single locus, with two alleles
  - Biallelic / diallelic
  - Single nucleotide polymorphism, SNP
- Alleles A and a
  - Frequency of A is $p$
  - Frequency of a is $q = 1 - p$
- Every individual inherits two alleles
  - A genotype is the combination of the two alleles
  - e.g. AA, aa (the homozygotes) or Aa (the heterozygote)

Population level 2.
Genotype frequencies (random mating)

Allele 1 \ Allele 2 | A ($p$) | a ($q$)
A ($p$) | AA ($p^2$) | Aa ($pq$)
a ($q$) | aA ($qp$) | aa ($q^2$)

Hardy-Weinberg Equilibrium frequencies
- $P(AA) = p^2$
- $P(Aa) = 2pq$
- $P(aa) = q^2$
- $p^2 + 2pq + q^2 = 1$

Transmission level: Mendel’s experiments
- Pure lines: AA × aa
- F1: Aa × Aa (intercross)
- Offspring of the intercross: AA, Aa, Aa, aa (3:1 segregation ratio)

Transmission level: back cross
- F1 × pure line: Aa × aa
- Offspring of the back cross: Aa, aa (1:1 segregation ratio)

Transmission level: Mendel’s law of segregation
Gametes produced by segregation at meiosis, father (A1A2) × mother (A3A4):

Father \ Mother | A3 (½) | A4 (½)
A1 (½) | A1A3 (¼) | A1A4 (¼)
A2 (½) | A2A3 (¼) | A2A4 (¼)

Phenotype level 1. Classical Mendelian traits
- Dominant trait (D: presence, R: absence): AA, Aa give D; aa gives R
- Recessive trait (D: absence, R: presence): AA, Aa give D; aa gives R
- Codominant trait (X, Y, Z): AA gives X; Aa gives Y; aa gives Z

Phenotype level 2. Dominant Mendelian inheritance

Father (Dd) \ Mother (Dd) | D (½) | d (½)
D (½) | DD (¼) | Dd (¼)
d (½) | dD (¼) | dd (¼)

Phenotype level 3. Dominant Mendelian inheritance with incomplete penetrance and phenocopies
- Same Dd × Dd cross: DD (¼), Dd (¼), dD (¼), dd (¼)
- Incomplete penetrance
- Phenocopies

Phenotype level 4. Recessive Mendelian inheritance
- Same Dd × Dd cross: DD (¼), Dd (¼), dD (¼), dd (¼)

Phenotype level: two kinds of differences
- Continuous: graded, no distinct boundaries, e.g.
height, weight, blood-pressure, IQ, extraversion
- Categorical
  - Yes/No
  - Normal/Affected (dichotomous)
  - None/Mild/Severe (multicategory)
  - Often called “threshold traits” because people are “affected” if they fall above some level of a measured or hypothesized continuous trait

Phenotype level: Polygenic Traits
- Mendel’s Experiments in Plant Hybridization showed how discrete particles (particulate theory of inheritance) behaved mathematically: all-or-nothing states (round/wrinkled, green/yellow), “Mendelian” disease
- How do these particles produce a continuous trait like stature or liability to a complex disorder?

Genes | Genotypes | Phenotypes
1 | 3 | 3
2 | 9 | 5
3 | 27 | 7
4 | 81 | 9

Phenotype level: Quantitative traits
[Figure: histograms of the quantitative trait (qt) by genotype g = -1, 0, 1, for genotypes aa, Aa and AA]

Phenotype level: Biometric Model
[Figure: distributions P(X) of trait X for genotypes aa, Aa and AA]

Genotype | aa | Aa | AA
Genotypic effect | $-a$ | $d$ | $+a$
Genotypic means | $m - a$ | $m + d$ | $m + a$

Very Basic Statistical Concepts
1. Mean (X)
2. Variance (X)
3. Covariance (X,Y)
4. Correlation (X,Y)

Mean, variance, covariance (& correlation)
1. Mean (X): $\mu_X = E(X) = \frac{\sum_i x_i}{n} = \sum_i x_i f(x_i)$
2. Variance (X): $\mathrm{Var}(X) = E[(X - \mu_X)^2] = \frac{\sum_i (x_i - \mu_X)^2}{n - 1} = \sum_i (x_i - \mu_X)^2 f(x_i)$
3. Covariance (X,Y): $\mathrm{Cov}(X,Y) = E[(X - \mu_X)(Y - \mu_Y)] = \frac{\sum_i (x_i - \mu_X)(y_i - \mu_Y)}{n - 1} = \sum_i (x_i - \mu_X)(y_i - \mu_Y) f(x_i, y_i)$
4. Correlation (X,Y): $r_{X,Y} = \frac{\mathrm{cov}_{X,Y}}{s_X s_Y}$

Biometrical model for single biallelic QTL
- Biallelic locus
  - Genotypes: AA, Aa, aa
  - Genotype frequencies: $p^2$, $2pq$, $q^2$
- Alleles at this locus are transmitted from parent to offspring (P-O) according to Mendel’s law of segregation
- Genotypes for this locus influence the expression of a quantitative trait X (i.e.
locus is a QTL)
- The biometrical genetic model estimates the contribution of this QTL towards the (1) mean, (2) variance and (3) covariance between individuals for this quantitative trait X

Biometrical model for single biallelic QTL
1. Contribution of the QTL to the Mean (X)

$\mu_X = \sum_i x_i f(x_i)$

Genotypes | AA | Aa | aa
Effect, $x$ | $a$ | $d$ | $-a$
Frequencies, $f(x)$ | $p^2$ | $2pq$ | $q^2$

Mean (X) $= a p^2 + d(2pq) - a q^2 = a(p - q) + 2pqd$

Biometrical model for single biallelic QTL
2. Contribution of the QTL to the Variance (X)

$\mathrm{Var}(X) = \sum_i (x_i - \mu_X)^2 f(x_i)$

Genotypes | AA | Aa | aa
Effect, $x$ | $a$ | $d$ | $-a$
Frequencies, $f(x)$ | $p^2$ | $2pq$ | $q^2$

Var (X) $= (a - m)^2 p^2 + (d - m)^2 2pq + (-a - m)^2 q^2 = V_{QTL}$, where $m$ here denotes Mean (X)
- Broad-sense heritability of X at this locus $= V_{QTL} / V_{Total}$
- Broad-sense total heritability of X $= \sum V_{QTL} / V_{Total}$

Biometrical model for single biallelic QTL
Var (X) $= (a - m)^2 p^2 + (d - m)^2 2pq + (-a - m)^2 q^2 = 2pq[a + (q - p)d]^2 + (2pqd)^2 = V_{A_{QTL}} + V_{D_{QTL}}$
[Figure: genotypic values of aa ($-a$), Aa ($d$) and AA ($+a$) around the midpoint $m$, shown for $d = 0$ and for $d > 0$]
- Additive effects: the main effects of individual alleles
- Dominance effects: represent the interaction between
alleles
[Figure: genotypic values of aa ($-a$), Aa ($d$) and AA ($+a$) around the midpoint $m$, shown for $d < 0$]

Biometrical model for single biallelic QTL
[Figure: genotypic values $-a$, $d$, $+a$ plotted against genotypes aa, Aa, AA, with the midpoint $m$]
Var (X) = Regression Variance + Residual Variance = Additive Variance + Dominance Variance

Biometrical model for single biallelic QTL
Var (X) $= 2pq[a + (q - p)d]^2 + (2pqd)^2$
Demonstrate that this equals $V_{A_{QTL}} + V_{D_{QTL}}$:
- 2A. Average allelic effect
- 2B. Additive genetic variance
NOTE: Additive genetic variance depends on allele frequency ($p$) and additive genetic value ($a$) as well as dominance deviation ($d$)
Additive genetic variance is typically greater than dominance variance

Biometrical model for single biallelic QTL (1/3)
2A. Average allelic effect (α)
The deviation of the allelic mean from the population mean
Population mean: Mean (X) $= a(p - q) + 2pqd$

Allele | Paired with A (frequency $p$) | Paired with a (frequency $q$) | Allelic mean | Average allelic effect (α)
A | AA: $a$ | Aa: $d$ | $ap + dq$ | $\alpha_A = q(a + d(q - p))$
a | Aa: $d$ | aa: $-a$ | $dp - aq$ | $\alpha_a = -p(a + d(q - p))$

Biometrical model for single biallelic QTL (2/3)
Denote the average allelic effects
- $\alpha_A = q(a + d(q - p))$
- $\alpha_a = -p(a + d(q - p))$
If only two alleles exist, we can define the average effect of allele substitution
- $\alpha = \alpha_A - \alpha_a$
- $\alpha = (q - (-p))(a + d(q - p)) = a + d(q - p)$
Therefore:
- $\alpha_A = q\alpha$
- $\alpha_a = -p\alpha$

Biometrical model for single biallelic QTL (3/3)
2A. Average allelic effect (α): $\alpha_A = q\alpha$, $\alpha_a = -p\alpha$
2B. Additive genetic variance: the variance of the average allelic effects

Genotype | Freq. | Additive effect
AA | $p^2$ | $2\alpha_A = 2q\alpha$
Aa | $2pq$ | $\alpha_A + \alpha_a = (q - p)\alpha$
aa | $q^2$ | $2\alpha_a = -2p\alpha$

$V_{A_{QTL}} = (2q\alpha)^2 p^2 + ((q - p)\alpha)^2 2pq + (-2p\alpha)^2 q^2 = 2pq\alpha^2 = 2pq[a + d(q - p)]^2$
- $d = 0$: $V_{A_{QTL}} = 2pqa^2$
- $p = q$: $V_{A_{QTL}} = \tfrac{1}{2}a^2$

Biometrical model for single biallelic QTL
1. Contribution of the QTL to the Mean (X)
2. Contribution of the QTL to the Variance (X)
   2A. Average allelic effect (α)
   2B. Additive genetic variance
3. Contribution of the QTL to the Covariance (X,Y)

Biometrical model for single biallelic QTL
3.
Contribution of the QTL to the Cov (X,Y)

$\mathrm{Cov}(X,Y) = \sum_i (x_i - \mu_X)(y_i - \mu_Y) f(x_i, y_i)$

Cross products of genotypic deviations for a pair of individuals (lower triangle shown; the table is symmetric):

 | AA $(a-m)$ | Aa $(d-m)$ | aa $(-a-m)$
AA $(a-m)$ | $(a-m)^2$ | |
Aa $(d-m)$ | $(a-m)(d-m)$ | $(d-m)^2$ |
aa $(-a-m)$ | $(a-m)(-a-m)$ | $(d-m)(-a-m)$ | $(-a-m)^2$

Biometrical model for single biallelic QTL
3A. Contribution of the QTL to the Cov (X,Y): MZ twins
Each cell is the joint genotype frequency times the cross product:

 | AA $(a-m)$ | Aa $(d-m)$ | aa $(-a-m)$
AA $(a-m)$ | $p^2 (a-m)^2$ | |
Aa $(d-m)$ | $0 \cdot (a-m)(d-m)$ | $2pq (d-m)^2$ |
aa $(-a-m)$ | $0 \cdot (a-m)(-a-m)$ | $0 \cdot (d-m)(-a-m)$ | $q^2 (-a-m)^2$

Cov $(X_i, X_j) = (a-m)^2 p^2 + (d-m)^2 2pq + (-a-m)^2 q^2 = 2pq[a + (q-p)d]^2 + (2pqd)^2 = V_{A_{QTL}} + V_{D_{QTL}}$

Biometrical model for single biallelic QTL
3B. Contribution of the QTL to the Cov (X,Y): Parent-Offspring

 | AA $(a-m)$ | Aa $(d-m)$ | aa $(-a-m)$
AA $(a-m)$ | $p^3 (a-m)^2$ | |
Aa $(d-m)$ | $p^2 q (a-m)(d-m)$ | $pq (d-m)^2$ |
aa $(-a-m)$ | $0 \cdot (a-m)(-a-m)$ | $pq^2 (d-m)(-a-m)$ | $q^3 (-a-m)^2$

How the joint frequencies arise, e.g. given an AA father, an AA offspring can come from either an AA × AA or an AA × Aa parental mating type:
- AA × AA occurs with probability $p^2 \times p^2 = p^4$ and has AA offspring with probability 1
- AA × Aa occurs with probability $p^2 \times 2pq = 2p^3 q$ and has AA offspring with probability 0.5 and Aa offspring with probability 0.5
- Therefore, $P(\text{AA father \& AA offspring}) = p^4 + p^3 q = p^3 (p + q) = p^3$

Cov $(X_i, X_j) = (a-m)^2 p^3 + \dots + (-a-m)^2 q^3 = pq[a + (q-p)d]^2 = \tfrac{1}{2} V_{A_{QTL}}$

Biometrical model for single biallelic QTL
3C. Contribution of the QTL to the Cov (X,Y): Unrelated individuals

 | AA $(a-m)$ | Aa $(d-m)$ | aa $(-a-m)$
AA $(a-m)$ | $p^4 (a-m)^2$ | |
Aa $(d-m)$ | $2p^3 q (a-m)(d-m)$ | $4p^2 q^2 (d-m)^2$ |
aa $(-a-m)$ | $p^2 q^2 (a-m)(-a-m)$ | $2pq^3 (d-m)(-a-m)$ | $q^4 (-a-m)^2$

Cov $(X_i, X_j) = (a-m)^2 p^4 + \dots + (-a-m)^2 q^4 = 0$

Biometrical model for single biallelic QTL
3D.
Contribution of the QTL to the Cov (X,Y): DZ twins and full sibs

# identical alleles inherited from parents | Share of genome | Covariance as for
2 | ¼ (2 alleles) | MZ twins
1 (father) or 1 (mother) | ½ (1 allele) | P-O
0 | ¼ (0 alleles) | Unrelateds

Cov $(X_i, X_j) = \tfrac{1}{4}\mathrm{Cov(MZ)} + \tfrac{1}{2}\mathrm{Cov(P\text{-}O)} + \tfrac{1}{4}\mathrm{Cov(Unrel)}$
$= \tfrac{1}{4}(V_{A_{QTL}} + V_{D_{QTL}}) + \tfrac{1}{2}(\tfrac{1}{2} V_{A_{QTL}}) + \tfrac{1}{4}(0)$
$= \tfrac{1}{2} V_{A_{QTL}} + \tfrac{1}{4} V_{D_{QTL}}$

Summary
The biometrical model predicts the contribution of a QTL to the mean, variance and covariances of a trait
- 1 QTL
  - Var (X) $= V_{A_{QTL}} + V_{D_{QTL}}$
  - Cov (MZ) $= V_{A_{QTL}} + V_{D_{QTL}}$
  - Cov (DZ) $= \tfrac{1}{2} V_{A_{QTL}} + \tfrac{1}{4} V_{D_{QTL}}$
- Multiple QTL
  - Var (X) $= \sum V_{A_{QTL}} + \sum V_{D_{QTL}} = V_A + V_D$
  - Cov (MZ) $= \sum V_{A_{QTL}} + \sum V_{D_{QTL}} = V_A + V_D$
  - Cov (DZ) $= \sum \tfrac{1}{2} V_{A_{QTL}} + \sum \tfrac{1}{4} V_{D_{QTL}} = \tfrac{1}{2} V_A + \tfrac{1}{4} V_D$

Summary
The biometrical model underlies the variance components estimation performed in Mx
- Var (X) $= V_A + V_D + V_E$
- Cov (MZ) $= V_A + V_D$
- Cov (DZ) $= \tfrac{1}{2} V_A + \tfrac{1}{4} V_D$

Path Analysis
HGEN502, 2011
Hermine H. Maes

Model Building
- Write equations for the means, variances and covariances of different types of relatives, or
- Draw path diagrams for easy derivation of expected means, variances and covariances and translation to a mathematical formulation

Method of Path Analysis
- Allows us to represent linear models for the relationship between variables in diagrammatic form, e.g.
a genetic model; a factor model; a regression model
- Makes it easy to derive expectations for the variances and covariances of variables in terms of the parameters of the proposed linear model
- Permits easy translation into the matrix formulation used by statistical programs

Path Diagram Variables
- Squares or rectangles denote observed variables
- Circles or ellipses denote latent (unmeasured) variables
- Upper-case letters are used to denote variables
- Lower-case letters (or numeric values) are used to denote covariances or path coefficients
[Figure: latent variables and observed variables]

Path Diagram Arrows
- Single-headed arrows or paths (–>) are used to represent causal relationships between variables under a particular model, where the variable at the tail is hypothesized to have a direct influence on the variable at the head
- Double-headed arrows (<–>) represent a covariance between two variables, which may arise through common causes not represented in the model. They may also be used to represent the variance of a variable
[Figure: double-headed arrows and single-headed arrows]

Path Analysis Tracing Rules
- Trace backwards, change direction at a two-headed arrow, then trace forwards (this implies that a chain can never pass through more than one two-headed arrow)
- The expected covariance between two variables, or the expected variance of a variable, is computed by multiplying together all the coefficients in a chain, and then summing over all possible chains
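The tracing rules can be checked numerically. The sketch below is only an illustration under stated assumptions: it assumes a small hypothetical diagram in which observed variables A, B and C are each influenced by unit-variance latent variables (named F1, F2, F3 here, names that are not in the slides), with single-headed paths k, l, m, n, o and latent covariances p and q. The function name `expected_covariances` and all numeric values are likewise made up for the example.

```python
from itertools import product


def expected_covariances(loadings, latent_cov):
    """Apply the tracing rules to a one-level path diagram.

    loadings[v][f]   : single-headed path from latent f to observed v
    latent_cov[f][g] : double-headed arrow between latents f and g
                       (f == g gives the variance of f)
    """
    observed = list(loadings)
    latents = list(latent_cov)
    cov = {}
    for x, y in product(observed, repeat=2):
        total = 0.0
        # One chain: trace back from x to latent f, cross exactly one
        # double-headed arrow (f <-> g), then trace forward from g to y.
        for f, g in product(latents, repeat=2):
            total += (loadings[x].get(f, 0.0)
                      * latent_cov[f][g]
                      * loadings[y].get(g, 0.0))
        cov[(x, y)] = total
    return cov


# Hypothetical path values, chosen only for illustration.
k, l, m, n, o, p, q = 0.5, 0.4, 0.3, 0.6, 0.7, 0.2, 0.1

# Assumed diagram: A <- F1 (k), A <- F2 (m), B <- F1 (l), B <- F3 (n),
# C <- F3 (o), F1 <-> F2 (p), F2 <-> F3 (q), all latent variances 1.
loadings = {
    "A": {"F1": k, "F2": m},
    "B": {"F1": l, "F3": n},
    "C": {"F3": o},
}
latent_cov = {
    "F1": {"F1": 1.0, "F2": p, "F3": 0.0},
    "F2": {"F1": p, "F2": 1.0, "F3": q},
    "F3": {"F1": 0.0, "F2": q, "F3": 1.0},
}

cov = expected_covariances(loadings, latent_cov)
# Summing over chains by hand gives kl + mqn + mpl for Cov(A, B).
print(cov[("A", "B")], k * l + m * q * n + m * p * l)
```

Summing over every (latent, latent) pair is the same as the matrix product of loadings, latent covariance matrix and transposed loadings, which is the kind of matrix formulation statistical programs use. The shortcut only holds for diagrams with a single layer of latent causes, as assumed here.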
Non-genetic Example
[Figure: path diagram for variables A, B and C]
Cov AB = kl + mqn + mpl

Expectations
- Cov AB $= kl + mqn + mpl$
- Cov BC $= no$
- Cov AC $= mqo$
- Var A $= k^2 + m^2 + 2kpm$
- Var B $= l^2 + n^2$
- Var C $= o^2$

Genetic Examples
- MZ Twins Reared Together
- DZ Twins Reared Together
- MZ Twins Reared Apart
- DZ Twins Reared Apart
- Parents & Offspring

MZ Twins Reared Together
MZ Twins RT Expected Covariance

 | Twin 1 | Twin 2
Twin 1 | $a^2 + c^2 + e^2$ (variance) | $a^2 + c^2$
Twin 2 | $a^2 + c^2$ (covariance) | $a^2 + c^2 + e^2$

DZ Twins Reared Together
DZ Twins RT Expected Covariance

 | Twin 1 | Twin 2
Twin 1 | $a^2 + c^2 + e^2$ | $0.5a^2 + c^2$
Twin 2 | $0.5a^2 + c^2$ | $a^2 + c^2 + e^2$

MZ Twins Reared Apart

DZ Twins Reared Apart

Twins and Parents

Role of model mediating between theory and data
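The expected covariances for MZ and DZ twins reared together can be written out as a short sketch. It only restates the expectations above (variance a² + c² + e², MZ covariance a² + c², DZ covariance 0.5a² + c²) and then inverts them algebraically. The function names and the numeric values are hypothetical, and the inversion is a simple method-of-moments illustration, not the model-fitting procedure used in Mx.

```python
def expected_twin_cov(a2, c2, e2, zygosity):
    """2x2 expected covariance matrix for a twin pair reared together.

    a2, c2, e2 are the squared path coefficients (a^2, c^2, e^2).
    """
    if zygosity not in ("MZ", "DZ"):
        raise ValueError("zygosity must be 'MZ' or 'DZ'")
    # MZ twins share all additive genetic effects, DZ twins half on average.
    genetic_share = 1.0 if zygosity == "MZ" else 0.5
    variance = a2 + c2 + e2
    covariance = genetic_share * a2 + c2
    return [[variance, covariance], [covariance, variance]]


def solve_components(variance, cov_mz, cov_dz):
    """Invert the three expectations for a^2, c^2 and e^2.

    cov_mz - cov_dz = 0.5 * a^2, so a^2 = 2 * (cov_mz - cov_dz);
    c^2 = cov_mz - a^2 = 2 * cov_dz - cov_mz; e^2 = variance - cov_mz.
    """
    a2 = 2.0 * (cov_mz - cov_dz)
    c2 = 2.0 * cov_dz - cov_mz
    e2 = variance - cov_mz
    return a2, c2, e2


# Hypothetical values, chosen only for illustration.
mz = expected_twin_cov(0.5, 0.2, 0.3, "MZ")
dz = expected_twin_cov(0.5, 0.2, 0.3, "DZ")
recovered = solve_components(mz[0][0], mz[0][1], dz[0][1])
print(mz)
print(dz)
print(recovered)
```

With standardized variables the covariances are correlations, so the same algebra expresses the logic of the classical twin study: a larger MZ than DZ correlation points to genetic factors, and a DZ correlation above half the MZ correlation points to the shared environment.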