The first thing to consider is how to deal with $Q_i$ and $g(Q)_{j(i)}$, since $g(Q)_{j(i)}$ can be assumed to be related to $Q_i$. To do this, we can use a single factor for the genetic effect, $G$, which accounts for both of these genetic effect variables:

$$G = Q_i + g(Q)_{j(i)}$$

The ANOVA equation can then be written as:

$$y_{ij} = \mu + G_i + \varepsilon_{ij}$$

for $y \sim NID(\mu, \sigma^2_p)$, $G \sim NID(0, \sigma^2_g)$ and $\varepsilon \sim NID(0, \sigma^2_e)$, where $y_{ij}$ is the trait value for genotype $i$ in replication $j$, $\mu$ is the mean, $G_i$ is the genetic effect for genotype $i$ and $\varepsilon_{ij}$ are the errors. The trait is assumed to be normally distributed, with phenotypic, genotypic and error variances $\sigma^2_p$, $\sigma^2_g$ and $\sigma^2_e$ respectively, which gives:

$$\sigma^2_p = \sigma^2_g + \sigma^2_e$$

To relate this equation to the data we are given, we must include the phenotype data from four environments, in blocks of 149 replications (one cell line is missing). To adapt the one-way ANOVA equation, we include Environment and Blocks to give the theoretical form of the ANOVA for the genetic effect:

$$y_{ijk} = \mu + G_i + E_j + (GE)_{ij} + \varepsilon_{ijk}$$

where $G_i$ and $(GE)_{ij}$ are the genotypic effects measured within blocks.

| Source | dof | Expected MSQ | F value |
|---|---|---|---|
| Environment (E) | $e-1$ | $\sigma^2_g + b\sigma^2_{ge} + be\sigma^2_e$ | $MSQ_E/MSQ_e$; $MSQ_E/MSQ_G$ |
| Blocks | $(b-1)e$ | | |
| Genotypes (G) | $g-1$ | $\sigma^2_e + b\sigma^2_{ge} + be\sigma^2_g$ | $MSQ_G/MSQ_e$ |
| GE | $(g-1)(e-1)$ | $\sigma^2_e + b\sigma^2_{ge}$ | $MSQ_{GE}/MSQ_e$ |
| Error (e) | $(b-1)(g-1)e$ | $\sigma^2_e$ | |

Table 1: ANOVA table for randomized blocks within environments, where the number of sets/blocks within each environment is b, the number of replications. The focus is on the genotype effect.

The F values compare each component, as a ratio, to the error component. There is also an F ratio that compares Environment to Genotype. The tests involve measuring the variance in the data sets of each environment location to calculate the environment variance. The variance $\sigma^2_g$ also has to be calculated for the genotypes, as:

$$\sigma^2_g = \frac{1}{4}\left(\frac{1}{n_{QQ}} + \frac{1}{n_{Qq}}\right)$$

where $n_{QQ}$ is the number of individuals with genotype QQ and $n_{Qq}$ is the number with genotype Qq.
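As a rough check of the genotype variance formula and the variance decomposition above, the following Python sketch evaluates both. The function names are illustrative, not taken from the original analysis; the example counts (82 and 58) are those of the first marker in Table A of Appendix 1, and the sketch assumes both genotype classes are non-empty.

```python
def genotype_variance(n_hom: int, n_het: int) -> float:
    """Genotype variance (1/n_QQ + 1/n_Qq) / 4 for the two class counts."""
    if n_hom <= 0 or n_het <= 0:
        raise ValueError("both genotype classes must contain at least one individual")
    return (1.0 / n_hom + 1.0 / n_het) / 4.0


def error_variance(var_p: float, var_g: float) -> float:
    """Error variance obtained by difference from var_p = var_g + var_e."""
    return var_p - var_g


# Counts for the first marker in Table A (82 ones, 58 twos).
var_g = genotype_variance(82, 58)
print(f"genotype variance: {var_g:.2e}")
```

For these counts the result rounds to 7.36E-03, which agrees with the value listed for that marker in Table A.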
The F-test, the t-test and the pooled variance of Environment and Genotype are the main tests. The pooled estimate $\sigma^2_{EG}$ can be calculated from the individual variances $\sigma^2_E$ and $\sigma^2_G$:

$$\sigma^2_{EG} = \frac{(b_E - 1)\sigma^2_E + (b_G - 1)\sigma^2_G}{b_E + b_G - 2}$$

The pooled estimate $\sigma^2_{EG}$ is used to calculate the pooled variance t-test statistic [1]:

$$t = \frac{(\bar{E} - \bar{G}) - (\mu_E - \mu_G)}{\sqrt{\sigma^2_{EG}\left(\frac{1}{b_E} + \frac{1}{b_G}\right)}}$$

where:
$\bar{E}$ is the sample mean of the Environment.
$\bar{G}$ is the sample mean of the Genotype.
$\mu_E$ is the population mean of the Environment.
$\mu_G$ is the population mean of the Genotype.
$\sigma^2_{EG}$ is the pooled variance estimate from above.
$b_E$ is the number of replications for the Environment samples.
$b_G$ is the number of replications for the Genotype samples.

The t-test statistic can then be used to test the sample means of the two factors, environment and genotype, against their population means.

The equation for the analysis of variance for a single marker using backcross progeny is given as:

$$Y_{i(j)k} = \mu + M_i + g(M)_{j(i)} + \varepsilon_{i(j)k}$$

where:
$Y_{i(j)k}$ = trait value for an individual $j$ with marker genotype $i$ in replication $k$.
$\mu$ = population mean.
$M_i$ = effect of marker genotype $i$.
$g(M)_{j(i)}$ = the unexplained marker genotypic effect.
$\varepsilon_{i(j)k}$ = the error term.

Similarly to part (a), we assume $M_i$ and $g(M)_{j(i)}$ additively represent $G$, the genetic effect:

$$G = M_i + g(M)_{j(i)}$$

The ANOVA table from Table 1 is drawn up again for the genotypic effect in the blocks for the Environment and for the Genotype:

$$y_{ijk} = \mu + G_i + E_j + (GE)_{ij} + \varepsilon_{ijk}$$

The variances for the marker genotype can be calculated identically to those of the QTL, since the definition of genetic effect is the same for the QTL and for the marker, so the concept of additive and dominance effects $\mu_1$, $\mu_2$ etc. still applies.
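The pooled variance and the pooled t-test statistic can be sketched in Python as follows. This is a minimal illustration of the two formulas only; the function and argument names are assumptions introduced here, and the numbers in the usage lines are made up for demonstration rather than taken from the data set.

```python
import math


def pooled_variance(var_e: float, var_g: float, b_e: int, b_g: int) -> float:
    """Pooled estimate of variance from two sample variances."""
    return ((b_e - 1) * var_e + (b_g - 1) * var_g) / (b_e + b_g - 2)


def pooled_t_statistic(mean_e: float, mean_g: float,
                       mu_e: float, mu_g: float,
                       var_e: float, var_g: float,
                       b_e: int, b_g: int) -> float:
    """Pooled variance t statistic for the difference of two sample means."""
    s2 = pooled_variance(var_e, var_g, b_e, b_g)
    standard_error = math.sqrt(s2 * (1.0 / b_e + 1.0 / b_g))
    return ((mean_e - mean_g) - (mu_e - mu_g)) / standard_error


# Illustrative (made-up) inputs: two groups of 10 replications each.
s2 = pooled_variance(2.0, 4.0, 10, 10)
t = pooled_t_statistic(5.0, 3.0, 0.0, 0.0, 2.0, 4.0, 10, 10)
print(f"pooled variance = {s2:.3f}, t = {t:.3f}")
```

The statistic would then be compared against a t distribution with $b_E + b_G - 2$ degrees of freedom, the usual choice for a pooled test.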
The pooled estimate of variance is still:

$$\sigma^2_{EG} = \frac{(b_E - 1)\sigma^2_E + (b_G - 1)\sigma^2_G}{b_E + b_G - 2}$$

The t-test statistic is also:

$$t = \frac{(\bar{E} - \bar{G}) - (\mu_E - \mu_G)}{\sqrt{\sigma^2_{EG}\left(\frac{1}{b_E} + \frac{1}{b_G}\right)}}$$

The expected difference between the marker genotype classes is found by calculating the expected trait value for each genotype and then taking the difference between the two genotypes:

$$\mu_{AA} = (1 - r)\mu_1 + r\mu_2$$
$$\mu_{Aa} = r\mu_1 + (1 - r)\mu_2$$
$$E(\mathrm{diff}) = \mu_{AA} - \mu_{Aa}$$

Q2(a)

The t-tests were calculated for each marker, taking:
The environment population mean as the average of the environment sample means.
The genotype population mean as the average of the 26 marker genetic effects.
The environment sample mean as the average phenotype in the replication block.
The genotype sample mean as the genetic effect calculated for each marker.

The t-test was calculated using the formula in Q1(b), and the raw data for this are summarised in Appendix 1, Table A, as follows.

The data contained some zeros representing missing data. These were disregarded in the calculations, because replacing a zero with either 1 or 2 would bias the data; disregarding them represents the sampling as accurately as possible.

The genetic effect of genotype Steptoe was calculated as the ratio of the Steptoe genotype to the total genotype, that is, the number of 1's divided by the total number, where the total number is the sum of the 1's and 2's for each marker. Hence $u_1$ and $u_2$ were calculated as proportions of the total genotype. The mean and variance of the genetic effect were calculated from the formula described in Q1(a). The coefficient of variation (CV) was calculated to show how the variance compares with the mean genetic effect. The population mean for the genetic effect was calculated as the overall mean of the genetic means taken from each marker.

The means and variances for the phenotype were calculated from the replication blocks in the four environments, and the population mean was calculated as the average of the phenotype means.
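The handling of missing data and the expected-difference formula can be sketched as follows. The coding (1 for Steptoe, 2 for Morex, 0 for missing) follows the description above; the function names and the synthetic list of calls are assumptions made for illustration, not the original data.

```python
def genotype_proportions(calls):
    """Return (u1, u2), the shares of codes 1 and 2 among non-missing calls.

    Zeros are treated as missing data and excluded from the total.
    """
    n1 = sum(1 for c in calls if c == 1)
    n2 = sum(1 for c in calls if c == 2)
    total = n1 + n2
    if total == 0:
        raise ValueError("no non-missing genotype calls")
    return n1 / total, n2 / total


def expected_difference(mu1: float, mu2: float, r: float) -> float:
    """E(diff) = mu_AA - mu_Aa for recombination fraction r."""
    mu_aa = (1 - r) * mu1 + r * mu2
    mu_ab = r * mu1 + (1 - r) * mu2
    return mu_aa - mu_ab


# Synthetic marker column: 82 ones, 58 twos and 9 missing calls.
calls = [1] * 82 + [2] * 58 + [0] * 9
u1, u2 = genotype_proportions(calls)
print(f"u1 = {u1:.2f}, u2 = {u2:.2f}")
print(f"E(diff) at r = 0: {expected_difference(u1, u2, 0.0):.3f}")
```

Algebraically the difference simplifies to $(1 - 2r)(\mu_1 - \mu_2)$, so it equals $\mu_1 - \mu_2$ at $r = 0$ and vanishes at $r = 0.5$.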
The population variance of environment was calculated as the average of the phenotype variances. From these data and the equation in Q1(b), the pooled estimate of variance and the degrees of freedom were calculated for the t-test that follows.

Table 2. The results of the t-tests, showing t-test statistics and p-values for each respective environment and marker.

Table 3. The results for the expected mean differences for each marker.

The estimated mean differences of trait values are given by the formula in Q1. The values of $\mu_1$ and $\mu_2$ are taken from the previous calculations. The recombination fraction $r$ was not readily computable, so it was assumed to be zero, which corresponds to treating each marker as completely linked to the QTL. For each marker, $\mu_{AA}$ denotes genotype Steptoe and $\mu_{Aa}$ denotes genotype Morex. In terms of the difference, it is not important which is which, since we are only measuring the difference between the two. The expected difference is then the difference between $\mu_1$ and $\mu_2$, where Steptoe is $\mu_1$ and Morex is $\mu_2$. The total E(diff) over all 26 markers is given as the average of the 26 individual marker results:

$$\overline{E(\mathrm{diff})} = \frac{\sum E(\mathrm{diff})}{n}$$

where $n$ is the total number of markers, 26. The average E(diff) equals 0.07.

The null hypothesis here is that the test statistic for the pooled Genotype and Environment means and variance falls within the 95% region. Any p-value below 5% should lead to rejection of the null hypothesis. The t-test statistics in Table 2 all fall outside the 5% critical region, indicating that there is a large variation in the pooled Genotype and Environment data, most likely due to the large differences found going from one location to the next.

Q2(b)

| ANOVA | Degrees of freedom | Expected MSQ | F value |
|---|---|---|---|
| Environment | 3 | 1.54E+03 | 656.02; 9.33 |
| Blocks | 447 | | |
| Genotypes | 25 | 1.65E+02 | 70.33 |
| G x E | 75 | 1.43E+02 | 61.02 |
| Error | 300 | 2.34789 | |

Table 4. The ANOVA results for the trait values given the genotype and phenotype (environment) data.

In Table 4 above, the ANOVA was calculated from the theoretical form of the ANOVA found in Q1(a). The variances used for genotype and environment were calculated as averages of the variances calculated for each marker and each location respectively. The pooled variance for the Genotype and Environment interaction term GE was calculated from the pooled estimate of variance found in Q1(b). The error variance was calculated from the equation in Q1(a): if the phenotype and genotype variances are known, the error variance is the difference between the phenotypic (environment) and genotypic variances. In calculating the MSQ, the variances for each variable were used, together with the number of replications (149) and the number of environments (4).

The F values compare the MSQs of each component. Most of the F values compare the MSQ of a variable to the error MSQ. The results show that most of the variation is found in the main variables Environment, Genotype and the Genotype-Environment interaction, as shown by the large F ratio found for each (656.02, 70.33 and 61.02 respectively). The F ratio was also calculated between the Environment MSQ and the Genotype MSQ to show where most of the variation lies. The result is 9.33, which shows that the environment factor accounts for about 9 times more variation than the genotypic effect factor.

The t-test results showed something similar. For any one location, the t-test results are reproducible across markers. The phenotype variation at each individual location is low, as shown by low coefficients of variation (CV) in the range of 2-4% (see Appendix 1, Table A).
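The F ratios discussed above can be recomputed from the MSQ values printed in Table 4, as in the Python sketch below. The dictionary keys and the helper name are illustrative. Because the table prints the MSQ values to three significant figures, the ratios against the error MSQ reproduce the reported F values only approximately.

```python
def f_ratio(msq_numerator: float, msq_denominator: float) -> float:
    """Ratio of two mean squares."""
    if msq_denominator <= 0:
        raise ValueError("the denominator mean square must be positive")
    return msq_numerator / msq_denominator


# MSQ values as printed in Table 4 (rounded in the source).
msq = {
    "environment": 1.54e3,
    "genotype": 1.65e2,
    "gxe": 1.43e2,
    "error": 2.34789,
}

# Each component against the error MSQ.
f_vs_error = {name: f_ratio(value, msq["error"])
              for name, value in msq.items() if name != "error"}

# Environment against Genotype.
f_env_vs_gen = f_ratio(msq["environment"], msq["genotype"])

for name, value in f_vs_error.items():
    print(f"F({name}/error) = {value:.2f}")
print(f"F(environment/genotype) = {f_env_vs_gen:.2f}")
```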
Further, the t-test results are shown in Table 2, and it can be seen that the individual locations have broadly similar t-test statistics for each marker, and the CV calculated is reasonable, below 20% for all locations. The biggest variance comes from changing locations, where the t-test statistic varies greatly. The high F value for $MSQ_E/MSQ_G$ in the ANOVA and the large shift in the t-test statistics going from one location to another are due to one factor: the use of an environmental population mean calculated from the individual location means, which themselves vary quite significantly around the calculated population mean.

Q2(c)

The ratio of Steptoe genotype occurrences to the total genotype number was calculated by counting the number of Steptoe occurrences in each marker, totalling over all markers, and dividing by the total number of possible markers (not including the zeros, which are missing data). The Morex genotype ratio to the total genotype number was calculated likewise. The probability of Steptoe occurring was taken as the ratio of Steptoe occurrences in the total data, and the probability of Morex was calculated likewise. Given the raw data, the occurrences of Steptoe and Morex in each cell line were plotted.

[Figure 1: plot titled "Steptoe and Morex"; x-axis: cell line number; y-axis: count; series: Steptoe, Morex]

Figure 1. A scatter plot showing the total occurrences of both Steptoe and Morex over all the markers for the genotype experiments.

[Figure 2: x-axis: cell line; y-axis: pdf; series: Steptoe, Morex]

Figure 2. Predicted binomial probability distribution function of the genotype data.

Given the probability of both Steptoe and Morex occurring, a binomial distribution can be drawn up to allow prediction of genotype occurrences. For both Steptoe and Morex, the variation of the genotype count can be predicted from the curves in Figure 2.
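A binomial prediction of this kind can be sketched in Python as below. The number of trials (26, one per marker) comes from the data description; the probability p = 0.5 is a placeholder assumption, to be replaced by the Steptoe or Morex proportion estimated from the data, and the function name is illustrative.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k occurrences in n independent trials."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


n_markers = 26   # one genotype call per marker for each cell line
p_steptoe = 0.5  # placeholder value, not estimated from the data

# Predicted distribution of the Steptoe count in a single cell line.
pmf = [binomial_pmf(k, n_markers, p_steptoe) for k in range(n_markers + 1)]
most_likely = max(range(n_markers + 1), key=lambda k: pmf[k])
print(f"most likely Steptoe count: {most_likely}")
```

The Morex distribution follows from the same function with the complementary probability, provided missing calls are excluded.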
The binomial prediction can be related to the ANOVA and t-test results earlier, in that the genetic effect of both genotypes is measured for each genotype as $\mu_1$ and $\mu_2$. The probability of either Steptoe or Morex occurring directly affects the calculation of the genetic effect $g$ and its variance $\sigma^2_g$, which are calculated from $\mu_1$ and $\mu_2$.

Conclusion

In the analysis of the t-test statistics, based on the assumption in the null hypothesis (Q2b), any p-value of less than 0.05 should lead to rejection of the null hypothesis. This means that the effect of the genotype-environment interaction is greater than that expected to occur by chance, and indicates that this interaction has a significant effect on phenotype.

Appendix 1:

| Marker | Number of 1's | Number of 2's | u1 | u2 | g | σ²g | CV | σ²EG | Degrees of freedom |
|---|---|---|---|---|---|---|---|---|---|
| WG622 | 82 | 58 | 0.59 | 0.41 | 0.086 | 7.36E-03 | 8.6 | 0.95 | 149 |
| ABG313B | 82 | 61 | 0.59 | 0.44 | 0.075 | 7.15E-03 | 9.5 | 0.94 | 149 |
| CDO669 | 81 | 56 | 0.58 | 0.40 | 0.089 | 7.55E-03 | 8.5 | 0.96 | 149 |
| BCD402B | 65 | 68 | 0.46 | 0.49 | -0.011 | 7.52E-03 | -70.2 | 0.98 | 149 |
| BCD351D | 78 | 70 | 0.56 | 0.50 | 0.029 | 6.78E-03 | 23.7 | 0.93 | 149 |
| TubA1 | 71 | 69 | 0.51 | 0.49 | 0.007 | 7.14E-03 | 100.0 | 0.95 | 149 |
| Dhn6 | 81 | 66 | 0.58 | 0.47 | 0.054 | 6.87E-03 | 12.8 | 0.93 | 149 |
| WG1026B | 80 | 68 | 0.57 | 0.49 | 0.043 | 6.80E-03 | 15.9 | 0.93 | 149 |
| Adh4 | 75 | 67 | 0.54 | 0.48 | 0.029 | 7.06E-03 | 24.7 | 0.95 | 149 |
| ABA003 | 80 | 64 | 0.57 | 0.46 | 0.057 | 7.03E-03 | 12.3 | 0.94 | 149 |
| ABG484 | 81 | 65 | 0.58 | 0.46 | 0.057 | 6.93E-03 | 12.1 | 0.93 | 149 |
| WG464 | 73 | 67 | 0.52 | 0.48 | 0.021 | 7.16E-03 | 33.4 | 0.95 | 149 |
| BCD453B | 85 | 62 | 0.61 | 0.44 | 0.082 | 6.97E-03 | 8.5 | 0.93 | 149 |
| ABG472 | 79 | 62 | 0.56 | 0.44 | 0.061 | 7.20E-03 | 11.9 | 0.95 | 149 |
| ABG500B | 91 | 55 | 0.65 | 0.39 | 0.129 | 7.29E-03 | 5.7 | 0.93 | 149 |
| ABG366 | 82 | 54 | 0.59 | 0.39 | 0.100 | 7.68E-03 | 7.7 | 0.97 | 149 |
| ABG397 | 83 | 63 | 0.59 | 0.45 | 0.071 | 6.98E-03 | 9.8 | 0.93 | 149 |
| BCD351F | 74 | 70 | 0.53 | 0.50 | 0.014 | 6.95E-03 | 48.6 | 0.94 | 149 |
| ABG008 | 68 | 62 | 0.49 | 0.44 | 0.021 | 7.71E-03 | 36.0 | 0.99 | 149 |
| KFP195 | 78 | 71 | 0.56 | 0.51 | 0.025 | 6.73E-03 | 26.9 | 0.92 | 149 |
| ABR337 | 54 | 75 | 0.39 | 0.54 | -0.075 | 7.96E-03 | -10.6 | 0.99 | 149 |
| Ica1 | 57 | 86 | 0.41 | 0.61 | -0.104 | 7.29E-03 | -7.0 | 0.94 | 149 |
| ABG499 | 73 | 68 | 0.52 | 0.49 | 0.018 | 7.10E-03 | 39.8 | 0.95 | 149 |
| WG110 | 71 | 72 | 0.51 | 0.51 | -0.004 | 6.99E-03 | -195.8 | 0.94 | 149 |
| ABG004 | 74 | 76 | 0.53 | 0.54 | -0.007 | 6.67E-03 | -93.3 | 0.92 | 149 |
| ABG395 | 86 | 62 | 0.61 | 0.44 | 0.086 | 6.94E-03 | 8.1 | 0.93 | 149 |

Population mean for genetic effect: 0.036676
Population mean for environment effect: 74.08075

Means for each environment location:

| | Idaho | Montana | Oregon | Washington | Average |
|---|---|---|---|---|---|
| Mean | 74.3 | 73.1 | 73.65 | 75.23 | 74.08 |
| Variance (σ²E) | 1.84 | 2.79 | 2.44 | 2.32 | 2.35 |
| CV | 2.476818 | 3.819478 | 3.313153 | 3.081314 | |

Table A. Raw data used to calculate the t-test statistic.

References:
[1] Teh Sin Yin and Abdul Rahman Othman. "When does the pooled variance t-test fail?". African Journal of Mathematics and Computer Science Research, Vol. 2(4), pp. 056-062, May 2009.