PSYC 5102 Genetic Variance

Partitioning Genetic Variance

Introduction

Table 1 presents the notation used here for deriving the genetic variances that a single locus contributes to a quantitative phenotype. For simplicity's sake, only two alleles for this locus are used, but the same substantive results would be obtained for more than two alleles (Crow & Kimura, 197X). Let the two alleles be denoted as A and a, giving the three genotypes aa, Aa, and AA. Algebraic quantities will be subscripted by the genotypes. Although this notation looks quite cumbersome, it actually makes the equations easier to read in English.

Table 1. Notation for a two allele locus.

                        Genotypic Values                                                          Contrast Codes
Genotype   Frequency    Raw              Deviations from the mean   Deviations from the midpoint   Linear   Quadratic
aa         $f_{aa}$     $\gamma_{aa}$    $\mu + g_{aa}$             $m - \alpha$                   -1       -1
Aa         $f_{Aa}$     $\gamma_{Aa}$    $\mu + g_{Aa}$             $m + \delta\alpha$              0        2
AA         $f_{AA}$     $\gamma_{AA}$    $\mu + g_{AA}$             $m + \alpha$                    1       -1

To be general, the model does not assume Hardy-Weinberg equilibrium. Hence, the genotypic frequencies are simply denoted by $f$ in place of the more familiar notation of $p$ and $q$.

There are three different ways to model the genotypic or genetic values of the three genotypes. The genotypic or genetic value for a genotype is simply the mean phenotypic value of that genotype. The first way of modeling this is to let each genotypic value equal an algebraic quantity. In Table 1, $\gamma$ is used to denote the genotypic values and is given in the column labeled "Raw".

The second method is to express the genotypic values as deviations from the population mean. Here, let $\mu$ denote the population phenotypic mean and let the quantity $g$ denote the deviation of the genotypic value from the population mean. For example, if the population mean is 100 and the genotypic value for genotype aa is 97.3, then $g_{aa}$ is 97.3 - 100 = -2.7. The genotypic value for genotype aa is then $\mu + g_{aa}$. The algebraic expressions for these genotypic means are given in the column labeled "Deviations from the mean" in Table 1.
The third way is to express the genotypic values in terms of displacements from the midpoint between the two homozygotes. For example, if the genetic value for aa is 97.3 and the genetic value for AA is 104.9, then the midpoint is simply the average of these two quantities, or 101.1. Let $m$ denote this midpoint. Then the genetic value for genotype AA may be written as $m + \alpha$ and that for aa may be written as $m - \alpha$, where $\alpha$ is the difference between the genotypic value of AA and the midpoint (3.8 in this example). The genotypic value for the heterozygote may be written as $m + \delta\alpha$, where $\delta$ denotes a dominance parameter. When $\delta = 1$, allele A is completely dominant, and when $\delta = -1$, allele a is completely dominant. When $\delta = 0$ there is no dominance, and when $\delta > 1$ or $\delta < -1$ there is overdominance (footnote 1). These algebraic quantities are given in the column "Deviations from the midpoint" in Table 1.

Because these three different parameterizations are effectively "saying the same thing but with different words," the three different quantities for a single genotype will be mathematically identical. For example, for genotype aa,

$$\gamma_{aa} = \mu + g_{aa} = m - \alpha .$$

Phenotypic Mean

The phenotypic mean equals a weighted mean of the three genotypic means, the weights being the frequencies of the genotypes. Thus, using the first parameterization,

$$\mu = f_{aa}\gamma_{aa} + f_{Aa}\gamma_{Aa} + f_{AA}\gamma_{AA} .$$

For the second parameterization,

$$
\begin{aligned}
\mu &= f_{aa}(\mu + g_{aa}) + f_{Aa}(\mu + g_{Aa}) + f_{AA}(\mu + g_{AA}) \\
    &= (f_{aa} + f_{Aa} + f_{AA})\mu + f_{aa}g_{aa} + f_{Aa}g_{Aa} + f_{AA}g_{AA} .
\end{aligned}
$$

Now, the sum of the frequencies of the genotypes must be 1.0, so $f_{aa} + f_{Aa} + f_{AA} = 1.0$. Also, it is a mathematical necessity that the weighted sum of the deviations from the mean must equal 0, so $f_{aa}g_{aa} + f_{Aa}g_{Aa} + f_{AA}g_{AA} = 0$. Substituting these quantities into the equation gives the identity $\mu = \mu$.
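The equivalence of the three parameterizations can be checked numerically. The following is a minimal sketch rather than part of the original notes: the genotypic values for aa and AA are the ones used in the example above, while the heterozygote value and the genotype frequencies are hypothetical numbers chosen only for illustration.

```python
# Genotypic values for aa and AA come from the worked example in the text.
# The Aa value and all three frequencies are hypothetical (assumptions).
gamma = {"aa": 97.3, "Aa": 103.0, "AA": 104.9}   # raw genotypic values
freq = {"aa": 0.25, "Aa": 0.50, "AA": 0.25}      # need not be Hardy-Weinberg

# Parameterization 1: the mean is the frequency-weighted mean of the raw values.
mu = sum(freq[j] * gamma[j] for j in gamma)

# Parameterization 2: deviations of each genotypic value from the mean.
g = {j: gamma[j] - mu for j in gamma}

# Parameterization 3: displacements from the midpoint of the two homozygotes.
m = (gamma["aa"] + gamma["AA"]) / 2
alpha = gamma["AA"] - m
delta = (gamma["Aa"] - m) / alpha   # undefined if both homozygotes are equal

# Rebuild the genotypic values from parameterizations 2 and 3.
mean_form = {j: mu + g[j] for j in gamma}
midpoint_form = {"aa": m - alpha, "Aa": m + delta * alpha, "AA": m + alpha}

# The frequency-weighted sum of the deviations should be zero.
weighted_dev_sum = sum(freq[j] * g[j] for j in gamma)

for j in gamma:
    print(j, gamma[j], mean_form[j], midpoint_form[j])
print("weighted sum of deviations:", weighted_dev_sum)
```

Up to floating-point rounding, the three columns printed for each genotype agree and the weighted sum of deviations is zero, whatever frequencies are supplied.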
For the third parameterization,

$$\mu = f_{aa}(m - \alpha) + f_{Aa}(m + \delta\alpha) + f_{AA}(m + \alpha),$$

which reduces to

$$\mu = m + \alpha\,(f_{AA} - f_{aa} + f_{Aa}\delta).$$

Footnote 1: This parameterization is not applicable for the extremely unlikely case where the genotypic values for both homozygotes are equal but different from the value of the heterozygote. For this case simply let $\delta$ be expressed in absolute units (instead of as a fraction of $\alpha$) so that the genotypic value for the heterozygote is $m + \delta$.

Phenotypic Variance

The equation for the phenotypic value for the ith person with the jth genotype (or $P_{ij}$) may be written in a general form as

$$P_{ij} = \gamma_j + R_{ij},$$

where $R_{ij}$ denotes a residual deviation from the genotypic value $\gamma_j$. This residual value will include all environmental factors as well as the influence of all loci other than the A locus. For example, the phenotypic value for the ith person with genotype aa will be

$$P_{i.aa} = \gamma_{aa} + R_{i.aa} = \mu + g_{aa} + R_{i.aa} = m - \alpha + R_{i.aa}.$$

In English, this equation states that an individual's phenotypic value equals the genotypic mean for his or her genotype plus the effects of "all other genetic and environmental factors."

At this point, we note that this model assumes no gene-gene interaction (aka epistasis) involving the A locus and no gene-environment interaction involving the A locus. (There may indeed be epistasis and/or gene-environment interaction involving other loci; this assumption applies only to the A locus.) We also introduce a second assumption of no covariance between the genotypic values at the A locus and any other genes and environments. Because of this assumption, the mean value of R for each of the three genotypes will be 0.

Just as the phenotypic mean was the weighted sum of the genotypic means, the phenotypic variance will equal a weighted sum of squared deviations from the mean.
To see how this is so, we first write the phenotypic variance in summation notation, where $N_j$ is the number of individuals with genotype $j$ and $N$ is the total number of individuals:

$$
\begin{aligned}
V_P &= \frac{1}{N}\left[\sum_{i=1}^{N_{aa}} (\mu + g_{aa} + R_{i.aa} - \mu)^2 + \sum_{i=1}^{N_{Aa}} (\mu + g_{Aa} + R_{i.Aa} - \mu)^2 + \sum_{i=1}^{N_{AA}} (\mu + g_{AA} + R_{i.AA} - \mu)^2\right] \\
    &= \frac{1}{N}\left[\sum_{i=1}^{N_{aa}} (g_{aa} + R_{i.aa})^2 + \sum_{i=1}^{N_{Aa}} (g_{Aa} + R_{i.Aa})^2 + \sum_{i=1}^{N_{AA}} (g_{AA} + R_{i.AA})^2\right] \\
    &= \frac{1}{N}\left[\sum_{i=1}^{N_{aa}} (g_{aa}^2 + 2g_{aa}R_{i.aa} + R_{i.aa}^2) + \sum_{i=1}^{N_{Aa}} (g_{Aa}^2 + 2g_{Aa}R_{i.Aa} + R_{i.Aa}^2) + \sum_{i=1}^{N_{AA}} (g_{AA}^2 + 2g_{AA}R_{i.AA} + R_{i.AA}^2)\right].
\end{aligned}
$$

Now expressions such as $\sum_{i=1}^{N_{Aa}} g_{Aa}^2$ will equal $N_{Aa}g_{Aa}^2$ because the constant $g_{Aa}^2$ is simply being summed $N_{Aa}$ times. Quantities such as

$$\sum_{i=1}^{N_{Aa}} 2g_{Aa}R_{i.Aa} = 2g_{Aa}\sum_{i=1}^{N_{Aa}} R_{i.Aa} = 0 .$$

This occurs because the mean of the Rs for each genotype equals 0, so the sum of the Rs, or $\sum_{i=1}^{N_{Aa}} R_{i.Aa}$ for example, must also equal 0. Substituting these quantities into the above equation gives

$$
\begin{aligned}
V_P &= \frac{1}{N}\left[N_{aa}g_{aa}^2 + N_{Aa}g_{Aa}^2 + N_{AA}g_{AA}^2 + \sum_{i=1}^{N_{aa}} R_{i.aa}^2 + \sum_{i=1}^{N_{Aa}} R_{i.Aa}^2 + \sum_{i=1}^{N_{AA}} R_{i.AA}^2\right] \\
    &= \frac{N_{aa}}{N}g_{aa}^2 + \frac{N_{Aa}}{N}g_{Aa}^2 + \frac{N_{AA}}{N}g_{AA}^2 + \frac{1}{N}\left[\sum_{i=1}^{N_{aa}} R_{i.aa}^2 + \sum_{i=1}^{N_{Aa}} R_{i.Aa}^2 + \sum_{i=1}^{N_{AA}} R_{i.AA}^2\right] \\
    &= f_{aa}g_{aa}^2 + f_{Aa}g_{Aa}^2 + f_{AA}g_{AA}^2 + \frac{1}{N}\sum_{j=aa}^{AA}\sum_{i=1}^{N_j} R_{ij}^2 .
\end{aligned}
$$

The quantity $f_{aa}g_{aa}^2 + f_{Aa}g_{Aa}^2 + f_{AA}g_{AA}^2$ can be shown to equal the variance of the genotypic values, which will be denoted here as $V_G$. Also, the quantity $\frac{1}{N}\sum_{j=aa}^{AA}\sum_{i=1}^{N_j} R_{ij}^2$ equals the variance of the residuals, or say $V_R$. Thus, the phenotypic variance is the sum of two variances: the variance of the genotypic values (or, as it is most commonly called, the total genetic variance) and the variance of the residuals,

$$V_P = V_G + V_R .$$

This result is almost intuitive to those who have had quantitative genetics.
The purpose of this exercise was not to demonstrate the obvious, but to demonstrate the techniques whereby many further equations may be derived (footnote 2).

Footnote 2: The summation notation will always give the correct result, but it is much more cumbersome than using mathematical expectations. Students who wish to pursue this topic are urged to express the algebra in terms of expectations.

Additive genetic variance

For a single locus, the total genetic variance is partitioned into two types of variance, the additive genetic variance and the dominance variance. Here we give the derivation for additive genetic variance.

We begin by noting the orthogonal contrast codes for the three genotypes at the right hand side of Table 1. There are two of these, a linear contrast code of -1, 0, and 1 for genotypes aa, Aa, and AA, respectively, and a quadratic contrast code of -1, 2, and -1. The variance associated with the linear contrast code is the additive genetic variance. To find the algebraic formula for this variance, we use a simple linear regression to regress the phenotypic values on the contrast codes. The general equation for this is

$$P_{ij} = a + bX_{l_j} + U_{ij},$$

where $P_{ij}$, as before, is the phenotypic value for the ith person with the jth genotype, $X_{l_j}$ is the value of the linear contrast code for the jth genotype, $U_{ij}$ is a residual, and $a$ and $b$ are respectively the intercept and slope of the regression line. The equations for the two regression parameters are

$$b = \frac{\operatorname{cov}(P, X_l)}{V_{X_l}} \quad\text{and}\quad a = \mu - b\bar{X}_l .$$

The additive genetic variance is the variance associated with the slope of the regression line, or

$$V_A = b^2 V_{X_l} = \frac{\operatorname{cov}(P, X_l)^2}{V_{X_l}} .$$

To derive this quantity, it is necessary first to obtain expressions for the mean and variance of $X_l$, the linear contrast codes. As before, we find the mean as a weighted sum, here of the contrast codes weighted by the genotypic frequencies, or

$$\bar{X}_l = f_{aa}(-1) + f_{Aa}(0) + f_{AA}(1) = f_{AA} - f_{aa} .$$
The variance equals a weighted sum of the squared deviations of the contrast codes from the contrast code mean,

$$V_{X_l} = f_{aa}(-1 - \bar{X}_l)^2 + f_{Aa}(0 - \bar{X}_l)^2 + f_{AA}(1 - \bar{X}_l)^2,$$

which reduces to

$$V_{X_l} = f_{aa} + f_{AA} - (f_{aa} - f_{AA})^2 .$$

The final quantity is the covariance between the phenotypic values and the contrast codes. First, write the phenotypic values according to the linear model given in Equation X.X. The calculation of the covariance begins with multiplying the deviation of an individual's phenotypic value from the phenotypic mean (or $P_{ij} - \mu$) by the deviation of the individual's contrast code from the contrast code mean (or $X_{l_j} - \bar{X}_l$). Once this has been done for all individuals, these "cross products" are then summed over all individuals. Dividing by the total number of individuals gives the covariance. The algebraic formula, expressed in summation notation, is

$$
\operatorname{cov}(P, X_l) = \frac{1}{N}\left[\sum_{i=1}^{N_{aa}} (\mu + g_{aa} + R_{i.aa} - \mu)(-1 - \bar{X}_l) + \sum_{i=1}^{N_{Aa}} (\mu + g_{Aa} + R_{i.Aa} - \mu)(0 - \bar{X}_l) + \sum_{i=1}^{N_{AA}} (\mu + g_{AA} + R_{i.AA} - \mu)(1 - \bar{X}_l)\right],
$$

which reduces (mercifully) to

$$\operatorname{cov}(P, X_l) = f_{AA}g_{AA} - f_{aa}g_{aa} .$$

Substituting this expression into that for the slope gives

$$b = \frac{\operatorname{cov}(P, X_l)}{V_{X_l}} = \frac{f_{AA}g_{AA} - f_{aa}g_{aa}}{f_{aa} + f_{AA} - (f_{aa} - f_{AA})^2} .$$

Hence, the additive genetic variance equals

$$V_A = b^2 V_{X_l} = \frac{(f_{AA}g_{AA} - f_{aa}g_{aa})^2}{f_{aa} + f_{AA} - (f_{aa} - f_{AA})^2} .$$

The numerator of this expression is important. In English, it equals the square of the weighted difference between the two homozygote means (expressed as deviations from the population mean). Hence, even if the two homozygotes had identical genetic values (admittedly, an implausible case), there could still be additive genetic variance, provided that their frequencies differ. A more reasonable situation where additive genetic variance is small is when $g_{AA}$ and $g_{aa}$ are of the same sign and their respective frequencies are such that $f_{AA}g_{AA}$ almost equals $f_{aa}g_{aa}$.
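The closed-form expressions for the slope and the additive genetic variance can be compared against a direct computation from the definitions of a weighted covariance and variance. This is only a sketch: the function names are made up for illustration, and the frequencies and genotypic values are hypothetical.

```python
CODES = {"aa": -1, "Aa": 0, "AA": 1}   # linear contrast codes from Table 1


def additive_variance(freq, gamma):
    """Slope b and V_A from the closed-form expressions derived in the text."""
    mu = sum(freq[j] * gamma[j] for j in CODES)
    g = {j: gamma[j] - mu for j in CODES}          # deviations from the mean
    v_x = freq["aa"] + freq["AA"] - (freq["aa"] - freq["AA"]) ** 2
    cov = freq["AA"] * g["AA"] - freq["aa"] * g["aa"]
    b = cov / v_x
    return b, b * b * v_x


def additive_variance_direct(freq, gamma):
    """Same quantities from the definitions of weighted covariance and variance."""
    mu = sum(freq[j] * gamma[j] for j in CODES)
    x_bar = sum(freq[j] * CODES[j] for j in CODES)
    v_x = sum(freq[j] * (CODES[j] - x_bar) ** 2 for j in CODES)
    cov = sum(freq[j] * (gamma[j] - mu) * (CODES[j] - x_bar) for j in CODES)
    b = cov / v_x
    return b, b * b * v_x


# Hypothetical frequencies (deliberately not in Hardy-Weinberg proportions).
freq_example = {"aa": 0.30, "Aa": 0.50, "AA": 0.20}
gamma_example = {"aa": 97.3, "Aa": 103.0, "AA": 104.9}
b_formula, va_formula = additive_variance(freq_example, gamma_example)
b_direct, va_direct = additive_variance_direct(freq_example, gamma_example)

# Homozygotes with identical values: V_A is nonzero only if frequencies differ.
equal_homozygotes = {"aa": 0.0, "Aa": 1.0, "AA": 0.0}
_, va_unequal = additive_variance({"aa": 0.10, "Aa": 0.30, "AA": 0.60}, equal_homozygotes)
_, va_equal = additive_variance({"aa": 0.35, "Aa": 0.30, "AA": 0.35}, equal_homozygotes)

print(va_formula, va_direct)
print(va_unequal, va_equal)
```

The second pair of values illustrates the point about identical homozygotes: with unequal homozygote frequencies the regression line has a nonzero slope, and with equal frequencies the slope and the additive variance vanish.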
This is the classic situation of overdominance, where the heterozygote genotypic value is much greater than (or much less than) the average value of the two homozygotes.

The final, and indeed most important, case of small additive genetic variance occurs with a rare recessive gene. For the sake of exposition, assume that aa is the recessive genotype. Because aa is very rare, the quantity $f_{aa}$ will be very small, so the term $f_{aa}g_{aa}$ will be quite small. On the other hand, the population mean will be very close to the genotypic value of AA, so the difference between the population mean and the genetic value of AA (i.e., the quantity $g_{AA}$) will be very small. This makes the expression $f_{AA}g_{AA}$ small. Consequently, the numerator in Equation X.X will be tiny and there will be little additive genetic variance.

Dominance Variance

The term "dominance" variance is unfortunate because it is often misinterpreted as dominant transmission of a trait. We shall see that a rare dominant allele actually has very little dominance variance. A better term would be something akin to "nonadditive main effect variance," but the usage of dominance variance is so widespread that custom dictates its use here.

Dominance variance is literally the difference between the total genetic variance and the additive genetic variance. In terms of a regression model, one would estimate dominance variance as the explanatory variance gained after entering the quadratic term into the model. That is, one would perform two regressions. In the first, one would enter only the linear contrast. In the second, one would enter both the linear and the quadratic contrasts. The $R^2$ (i.e., the squared multiple correlation) from the first model is the additive genetic heritability for the locus. The $R^2$ from the second model is the total heritability. The dominance heritability is simply the difference between the two $R^2$s.

Tedious algebra shows that the dominance variance will equal

$$V_D = \frac{f_{aa}f_{Aa}f_{AA}\,(2g_{Aa} - g_{aa} - g_{AA})^2}{f_{aa} + f_{AA} - (f_{AA} - f_{aa})^2} .$$

The numerator of this expression reveals what dominance variance is. The important term is $(2g_{Aa} - g_{aa} - g_{AA})^2$. Because this is squared (and because the expression involving all the $f$s cannot be negative), the result must always be greater than or equal to 0. There will be no dominance variance when $g_{aa} + g_{AA} = 2g_{Aa}$. This will occur only when the genetic value for the heterozygote is exactly midway between the genetic values of the two homozygotes.

It is instructive to view the relationship between additive and dominance variance with dominant and recessive alleles. Let allele A be the dominant allele and let the genotypic values for aa, Aa, and AA be respectively 0, 1, and 1. Then the phenotypic mean is simply $f_{Aa} + f_{AA}$, and the genotypic values expressed as deviations from the mean become $f_{aa} - 1$, $f_{aa}$, and $f_{aa}$. A bit of algebra reveals the numerator for $V_A$ as $f_{aa}^2(f_{Aa} + 2f_{AA})^2$ and the numerator for $V_D$ as $f_{aa}f_{Aa}f_{AA}$. Because the two variances share the same denominator, the ratio of additive to dominance variance is

$$\frac{V_A}{V_D} = \frac{f_{aa}^2\,(f_{Aa} + 2f_{AA})^2}{f_{aa}f_{Aa}f_{AA}} .$$

At this point, it will be convenient to express the genotypic frequencies in terms of Hardy-Weinberg frequencies. Let $p$ denote the frequency of the dominant allele A and $q = 1 - p$ the frequency of allele a. Thus, $f_{aa} = q^2$, $f_{Aa} = 2pq$, and $f_{AA} = p^2$. Because $f_{Aa} + 2f_{AA} = 2pq + 2p^2 = 2p$, substitution and algebraic reduction of the equation gives

$$\frac{V_A}{V_D} = \frac{q^4\,(2p)^2}{q^2 \cdot 2pq \cdot p^2} = \frac{2q}{p} .$$

This equation shows that when there is complete dominance or recessivity, the ratio of additive to dominance variance depends only on the allele frequencies! When there is a rare dominant, $p$ is very small, $q$ must be large, so the ratio is very large. Hence, a rare dominant gives a large amount of additive genetic variance relative to dominance variance. When the locus is a rare recessive, which, of course, is the same as a common dominant, then $p$ is large, $q$ is small, and the ratio is very small.
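The behaviour of the two variance components for a completely dominant allele can be examined numerically. The sketch below is illustrative only: the function names and the allele frequencies tried are assumptions, while the formulas for the additive and dominance variance are the ones given in the text. It also compares the ratio with $2q/p$, the closed form obtained by substituting the Hardy-Weinberg frequencies into the two numerators.

```python
def variance_components(freq, gamma):
    """Return (V_G, V_A, V_D) for one locus using the formulas in the text."""
    mu = sum(freq[j] * gamma[j] for j in gamma)
    g = {j: gamma[j] - mu for j in gamma}           # deviations from the mean
    denom = freq["aa"] + freq["AA"] - (freq["AA"] - freq["aa"]) ** 2
    v_g = sum(freq[j] * g[j] ** 2 for j in gamma)   # total genetic variance
    v_a = (freq["AA"] * g["AA"] - freq["aa"] * g["aa"]) ** 2 / denom
    v_d = (freq["aa"] * freq["Aa"] * freq["AA"]
           * (2 * g["Aa"] - g["aa"] - g["AA"]) ** 2 / denom)
    return v_g, v_a, v_d


def hardy_weinberg(p):
    """Genotype frequencies when A has frequency p and mating is random."""
    q = 1 - p
    return {"aa": q * q, "Aa": 2 * p * q, "AA": p * p}


# Complete dominance of A, as in the text: values 0, 1, 1 for aa, Aa, AA.
dominant = {"aa": 0.0, "Aa": 1.0, "AA": 1.0}

results = {}
for p in (0.05, 0.50, 0.95):                        # hypothetical frequencies
    v_g, v_a, v_d = variance_components(hardy_weinberg(p), dominant)
    results[p] = (v_g, v_a, v_d)
    print(f"p={p:.2f}  V_A/V_D={v_a / v_d:.4f}  2q/p={2 * (1 - p) / p:.4f}")

# The partition V_G = V_A + V_D does not require Hardy-Weinberg proportions.
non_hw = {"aa": 0.30, "Aa": 0.50, "AA": 0.20}       # hypothetical frequencies
vg_n, va_n, vd_n = variance_components(non_hw, dominant)
```

With a rare dominant (small $p$) the ratio is large, and with a rare recessive (large $p$) the ratio is very small.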
Hence, a rare recessive will have little additive genetic variance but large dominance variance.

Relationship between additive and dominance variance

The relationship between additive and dominance variance is depicted in Figure 1. The solid squares give the genetic values for the three genotypes. The straight line represents the line of best fit when regressing these genotypic values upon the linear contrast codes. The variance associated with this straight line is the additive genetic variance. The deviations of the actual genotypic values from the values predicted by the regression line are depicted by the double headed arrows. These are prediction errors from the simple additive model. The variance associated with these prediction errors is the dominance variance. Literally, computation of dominance variance would begin by measuring the length of a double headed arrow. Then square that length and multiply this squared length by the frequency of the genotype. Summing these "weighted square lengths" over the three genotypes gives the dominance variance.

[Figure 1: Additivity and Dominance. Vertical axis: genotypic value; horizontal axis: genotype (aa, Aa, AA).]

Epistasis

Epistasis occurs when genes and/or gene products interact, and epistatic variance is the statistical interaction variance. It is important to emphasize the term statistical in this definition. It is entirely possible for biochemical products of loci to physically interact, but this does not necessarily lead to a statistical interaction (footnote 3).

Classic examples of epistasis for behavior can be seen in many rare genetic disorders. A Tay-Sachs genotype, for example, interacts with those loci that contribute to individual differences in normal cognitive development during infancy. If an infant has Tay-Sachs disease, the expression of these normal loci is inhibited. For those without Tay-Sachs disease, the other loci will be expressed.
However, epistatic variance for the Tay-Sachs locus will be very small because the disorder is very rare.

Footnote 3: The same must be said of gene-environment interaction. In casual discourse, this term often implies that both genes and environment are important for behavior. In quantitative genetics, however, the term implies a statistical interaction. Hence, when heritability is less than 1.0, there is always gene-environment interaction in the loose sense, but there may be no gene-environment interaction in the strict sense.

Statistical epistasis has the same meaning as the statistical interaction in ANOVA or regression. We can think of the linear and the dominance terms as the main effects of an ANOVA. With two or more loci, epistatic variance is equivalent to the interaction terms in ANOVA. Visually, epistatic variance occurs when the regression lines for one locus are not parallel when they are plotted as a function of another locus. Figures 2 and 3 illustrate cases of, respectively, no epistatic variance and epistatic variance. In Figure 2, the regression lines for genotypes bb, Bb, and BB are parallel; hence there is no interaction. In Figure 3, however, the A locus has no influence on individual differences in the presence of genotype bb. With genotype Bb, allele a is recessive, but with genotype BB, the A locus is totally additive. The fact that the three lines for genotypes at the B locus are not parallel indicates a statistical interaction between the A and the B locus.

[Figure 2: No epistasis. Vertical axis: genotypic value; horizontal axis: genotype at the A locus (aa, Aa, AA); one line for each of bb, Bb, and BB.]

[Figure 3: Epistasis. Vertical axis: genotypic value; horizontal axis: genotype at the A locus (aa, Aa, AA); one line for each of bb, Bb, and BB.]

The total epistatic genetic variance is subdivided into several statistical components. To examine the statistical components, let us return to the contrast codes for additive and dominance effects as they would apply to two loci, the W and the B locus.
These are given in the left hand columns of Table 2 under the label "Main Effects." In actually performing an analysis of this type, one would regress the phenotype on the contrast codes. The first regression would enter the additive codes for both locus W and locus B at the same time, i.e., a multiple regression with Add.W and Add.B as the independent variables. The $R^2$ for this regression gives the additive genetic variance for the trait. The second regression would add the two dominance contrast codes, Dom.W and Dom.B. The $R^2$ from this regression less the initial $R^2$ would give the dominance variance.

Table 2. Contrast codes for two loci.

            Main Effects                        Interactions
            Additive         Dominance          Additive*Additive   Additive*Dominance            Dominance*Dominance
Genotype    Add.W   Add.B    Dom.W   Dom.B      Add.W*Add.B         Add.W*Dom.B   Dom.W*Add.B     Dom.W*Dom.B
WWBB          1       1       -1      -1          1                   -1            -1              1
WWBb          1       0       -1       2          0                    2             0             -2
WWbb          1      -1       -1      -1         -1                   -1             1              1
WwBB          0       1        2      -1          0                    0             2             -2
WwBb          0       0        2       2          0                    0             0              4
Wwbb          0      -1        2      -1          0                    0            -2             -2
wwBB         -1       1       -1      -1         -1                    1            -1              1
wwBb         -1       0       -1       2          0                   -2             0             -2
wwbb         -1      -1       -1      -1          1                    1             1              1

The epistatic or interaction variance is literally the result of multiplying the additive and dominance contrast codes for the W locus with those for the B locus. Multiplying the additive code for W (Add.W) with that for B (Add.B) gives the contrast code for the first epistatic component, additive by additive epistasis. The variance associated with this contrast code is called additive by additive epistatic variance and is usually abbreviated as $V_{AA}$. One would estimate this component by entering the contrast code into the regression equation. There are now five independent variables (Add.W, Add.B, Dom.W, Dom.B, and Add.W*Add.B). The $R^2$ for this model less the $R^2$ for the model containing dominance gives the estimate of $V_{AA}$.

The second epistatic component involves multiplying the additive contrast codes for one locus and the dominance codes for the other locus. There are two ways of doing this.
The first way is to multiply Add.W with Dom.B, and the second is to multiply Dom.W with Add.B. Entering both of these contrast codes into the last regression equation and subtracting the previous $R^2$ from the new $R^2$ gives what is called the additive by dominance epistatic variance, or $V_{AD}$.

The final epistatic component is $V_{DD}$, or dominance by dominance epistasis. This is estimated by multiplying the two dominance contrast codes together, entering the resulting contrast code into the last regression equation, and subtracting the $R^2$s.

Epistatic components for additional loci will be formed in a similar way, but there will be more epistatic components. For example, let us add the C locus to the above problem. There would now be three additive by additive contrast codes: Add.W*Add.B, Add.W*Add.C, and Add.B*Add.C. Entering these three contrast codes simultaneously into the regression and subtracting the $R^2$s would now give the estimate of $V_{AA}$, the additive by additive epistatic variance. Following this logic would give 6 contrast codes for estimating $V_{AD}$ (all the two way interactions between the additive and dominance contrast codes of different loci), and 3 for $V_{DD}$ (all the two way interactions among the dominance contrast codes).

With the three locus case, however, there is also the possibility of three way interactions among the loci. Just as the two way interactions were subdivided into components, so are the three way epistatic interactions subdivided into individual components that reflect the products of the additive and dominance contrast codes. The first of these would be $V_{AAA}$, or the additive by additive by additive epistatic variance. The contrast code for this is simply Add.W*Add.B*Add.C, or the product of the three additive contrast codes.
The second component would be $V_{AAD}$ (additive by additive by dominance epistatic variance), the next component would be $V_{ADD}$ (additive by dominance by dominance epistatic variance), and the final three way interaction would be $V_{DDD}$ (the dominance by dominance by dominance epistatic variance). Once again, the contrast codes may be found by multiplying all the relevant additive and dominance main effect contrast codes, and the variance components would be estimated by hierarchical multiple regression.

The total epistatic variance for the three locus case is simply the sum of all the individual components of variance. Let $V_I$ denote the total epistatic variance. Then,

$$V_I = V_{AA} + V_{AD} + V_{DD} + V_{AAA} + V_{AAD} + V_{ADD} + V_{DDD} .$$

Additional loci may be accommodated using identical logic. With n loci, the variance component $V_{AA}$ is simply the sum of all two way additive by additive interactions among all n loci, $V_{AAD}$ is the sum of all possible additive by additive by dominance interactions among the n loci, and so on.
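The construction of the epistatic contrast codes by multiplication can be sketched in a few lines. This is an illustration rather than the procedure used to produce Table 2: the function names are hypothetical, and each genotype is represented, as an assumption of this sketch, by the number of upper-case alleles it carries at each locus (2, 1, or 0).

```python
from itertools import combinations, product

# Main-effect codes indexed by the number of upper-case alleles at a locus.
ADD = {2: 1, 1: 0, 0: -1}     # additive (linear) contrast code
DOM = {2: -1, 1: 2, 0: -1}    # dominance (quadratic) contrast code


def epistatic_terms(loci):
    """Yield (component, label, term) for every interaction among the loci.

    A term is a tuple of (locus, kind) pairs, where kind is "Add" or "Dom".
    """
    for order in range(2, len(loci) + 1):
        for subset in combinations(loci, order):
            for kinds in product(("Add", "Dom"), repeat=order):
                n_add = kinds.count("Add")
                component = "V" + "A" * n_add + "D" * (order - n_add)
                term = tuple(zip(subset, kinds))
                label = "*".join(f"{kind}.{locus}" for locus, kind in term)
                yield component, label, term


def code_value(term, genotype):
    """Multiply the main-effect codes named in the term for one genotype."""
    value = 1
    for locus, kind in term:
        table = ADD if kind == "Add" else DOM
        value *= table[genotype[locus]]
    return value


def component_counts(loci):
    """Number of contrast codes that feed each epistatic variance component."""
    counts = {}
    for component, _, _ in epistatic_terms(loci):
        counts[component] = counts.get(component, 0) + 1
    return counts


# Two loci, genotypes in the row order of Table 2 (WWBB, WWBb, ..., wwbb).
genotypes = [{"W": w, "B": b} for w in (2, 1, 0) for b in (2, 1, 0)]
table2 = {label: [code_value(term, gt) for gt in genotypes]
          for _, label, term in epistatic_terms(("W", "B"))}

for label, column in table2.items():
    print(label, column)
print(component_counts(("W", "B", "C")))
```

For two loci the four generated columns correspond to the interaction columns of Table 2, and for three loci the counts correspond to the numbers of contrast codes described in the text for each component.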