Statistical Power for Computational Mapping

Given a set of measurements divided into $k$ groups, ANOVA tests the null hypothesis that all group means are equal. The total variance consists of within-group variance and between-group variance. For a power analysis using one-way ANOVA, one standard way to define the effect size is (17):

$$\eta^2 = \frac{\sigma^2_{\mathrm{between}}}{\sigma^2_{\mathrm{between}} + \sigma^2_{\mathrm{within}}}$$

In our case, the groups are defined by haplotypes, and $\eta^2$ is the genetic effect of the haplotypes on the trait value. Let $n$ be the total sample size and $k$ be the number of groups. When the group sizes are equal, the F statistic for samples with effect size $\eta^2$ follows the noncentral F distribution $F(k - 1, n - k, \lambda)$ with the noncentrality parameter

$$\lambda = \frac{n\,\eta^2}{1 - \eta^2}$$

Therefore, the power of the one-way ANOVA test with significance level $\alpha$ is given by:

$$\mathrm{Power} = P\big(F(k - 1, n - k, \lambda) > F_{\mathrm{crit}}\big) \qquad (2)$$

where $F_{\mathrm{crit}} = F(1 - \alpha, k - 1, n - k)$ is the $(1 - \alpha)$ quantile of the F distribution with $k - 1$ and $n - k$ degrees of freedom.

Note that within a haplotype block, the number of strains with each different haplotype is usually not the same. Therefore, an equal group size cannot be obtained for this analysis. The power for unequal group sizes is expected to be lower. Table 3 shows the power as a function of effect size for $\alpha = 0.01$, $n$ = 13, 14, 15, 16, and $k$ = 2, 3. When there are two different haplotypes within a locus, 80% power can be achieved using 16 strains when the effect size is greater than 0.49, or using 13 strains when the effect size is greater than 0.56.

With this background, we can now analyze the performance of this haplotype-based computational mapping method. It correctly predicted the genetic basis for strain-specific differences in several biologically important traits (4). In one published example, haplotypic blocks associated with categorical MHC phenotypes for the class Ia K, class III S, and class Ib Qa2 loci were correctly identified. The identified blocks were contained within regions of 27, 51, and 100 kb, respectively, which contained the actual MHC genes corresponding to the trait. The MHC phenotypes represent a diverse class of categorical phenotypes.
The 16 strains were grouped into five phenotypic categories for the class Ia K and class III S traits, and into two categories for the class Ib Qa2 phenotype. Most importantly, there were no false positives among the top predictions for these traits.

In another example, a binary response phenotype, the induction of cytochrome P450 enzymes after treatment with aromatic hydrocarbons (the AH response), was computationally analyzed using data obtained from 13 inbred mouse strains. Two adjacent haplotypic blocks within a 27-kb region, each containing three haplotypes, were computationally identified. The identified region contained the Ahr locus, a gene that contributes to the AH response phenotype. The functional genetic element that contributes to the phenotype was easily identified by analysis of the polymorphisms within the region.

In one other example, the pattern of expression of a differentially expressed gene within the lungs of 10 inbred mouse strains was computationally analyzed to identify a novel cis-acting allele-specific enhancer element (4).

Table 3
The Power of the One-Way ANOVA as a Function of the Genetic Effect Size for Different Total Sample Size (n) and Number of Groups (k)

| Effect size | n = 13, k = 2 | n = 13, k = 3 | n = 14, k = 2 | n = 14, k = 3 | n = 15, k = 2 | n = 15, k = 3 | n = 16, k = 2 | n = 16, k = 3 |
|---|---|---|---|---|---|---|---|---|
| 0.2 | 0.15 | 0.09 | 0.17 | 0.10 | 0.19 | 0.11 | 0.21 | 0.13 |
| 0.25 | 0.21 | 0.13 | 0.24 | 0.15 | 0.27 | 0.17 | 0.30 | 0.19 |
| 0.3 | 0.29 | 0.17 | 0.32 | 0.20 | 0.36 | 0.23 | 0.40 | 0.26 |
| 0.35 | 0.37 | 0.23 | 0.42 | 0.27 | 0.46 | 0.31 | 0.51 | 0.35 |
| 0.4 | 0.47 | 0.30 | 0.52 | 0.35 | 0.57 | 0.40 | 0.62 | 0.45 |
| 0.45 | 0.58 | 0.39 | 0.63 | 0.45 | 0.68 | 0.50 | 0.73 | 0.56 |
| 0.5 | 0.68 | 0.49 | 0.74 | 0.56 | 0.79 | 0.62 | 0.83 | 0.67 |
| 0.55 | 0.79 | 0.60 | 0.84 | 0.67 | 0.87 | 0.73 | 0.90 | 0.78 |
| 0.6 | 0.88 | 0.72 | 0.91 | 0.78 | 0.94 | 0.83 | 0.96 | 0.88 |
| 0.65 | 0.94 | 0.83 | 0.96 | 0.88 | 0.98 | 0.92 | 0.99 | 0.94 |
| 0.7 | 0.98 | 0.91 | 0.99 | 0.95 | 0.99 | 0.97 | 1.00 | 0.98 |
| 0.75 | 1.00 | 0.97 | 1.00 | 0.99 | 1.00 | 0.99 | 1.00 | 1.00 |

Wang and Peltz, Haplotype-Based Computational Genetic Analysis in Mice, pp. 64-65
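The power values tabulated in Table 3 can be approximated directly from the noncentral F distribution. The sketch below is a minimal pure-Python version, not the authors' code. It assumes that the effect size is the between-group share of the total variance and that the noncentrality parameter is n × effect / (1 − effect), a choice made here because it appears to reproduce the table entries. All function names are illustrative.

```python
import math

def _betacf(a, b, x, max_iter=1000, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        # Even and odd coefficients of the continued-fraction recurrence.
        for aa in (
            m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m)),
            -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0)),
        ):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def f_cdf(x, d1, d2, nc=0.0):
    """CDF of the F distribution; nc > 0 gives the noncentral case
    as a Poisson-weighted sum of incomplete beta functions."""
    if x <= 0.0:
        return 0.0
    y = d1 * x / (d1 * x + d2)
    mu = nc / 2.0
    weight = math.exp(-mu)  # Poisson weight for j = 0
    total, j = 0.0, 0
    while True:
        total += weight * betainc(d1 / 2.0 + j, d2 / 2.0, y)
        j += 1
        weight *= mu / j
        if j > mu and weight < 1e-16:
            break
    return total

def f_quantile(p, d1, d2):
    """Quantile of the central F distribution, found by bisection."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, d1, d2) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f_cdf(mid, d1, d2) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def anova_power(effect, n, k, alpha=0.01):
    """Power of a one-way ANOVA with n samples in k equal-sized groups.
    Assumes noncentrality = n * effect / (1 - effect)."""
    nc = n * effect / (1.0 - effect)
    f_crit = f_quantile(1.0 - alpha, k - 1, n - k)
    return 1.0 - f_cdf(f_crit, k - 1, n - k, nc)

if __name__ == "__main__":
    for n in (13, 14, 15, 16):
        print(n, [round(anova_power(0.5, n, k), 2) for k in (2, 3)])
```

Where SciPy is available, `scipy.stats.f.ppf` and `scipy.stats.ncf.sf` should give the same quantities without the hand-written series. The pure-Python version is used here only to keep the sketch free of dependencies.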
For this analysis, the level of expression of the H2-E gene in the lungs of 10 inbred strains was measured. The log-transformed gene expression data were computationally analyzed, which identified a 1-kb region within the first intron of the H2-E gene. This computational prediction led to the discovery of a novel functional element regulating H2-E expression. Of note, only 10 strains were used in this computational mapping. When gene expression data from only eight or nine strains were used for the computational analysis, the same region was predicted, and no false-positive predictions were obtained. These examples demonstrate that computational mapping can be achieved using phenotypic data from a relatively limited number of inbred strains.

To illustrate how haplotype-based computational mapping is performed, we provide a detailed description of how the H2-E gene expression data were analyzed. For this analysis, the level of H2-E gene expression in female lung was measured three times for each of the 10 inbred strains analyzed (Table 4). An important assumption of haplotype-based analysis is that the residuals are normally distributed and the standard deviation is the same for groups of strains sharing haplotypes. However, the level of expression of this gene was quite different among the strains analyzed, and the standard deviation within each mouse strain was proportional to its level of expression. Therefore, the log transform was used, which brought the error distribution closer to normal. The plot of the residuals against the strain averages (Fig. 2) shows that the assumptions used for the linear model are approximately true for the log-transformed data.

When replicate measurements are available, the data can be evaluated prospectively to determine whether the computational analysis is likely to correctly identify a genetic locus. The genetic influence of a single locus should be larger than the threshold determined by the power analysis.
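The log transformation and the haplotype-grouped ANOVA described above can be sketched as follows. This is only an illustration under stated assumptions: the expression values and haplotype labels are invented and are not the Table 4 data, the function names are hypothetical, and the effect size is taken to be the between-group share of the total sum of squares.

```python
import math
from statistics import mean, stdev

# Hypothetical triplicate expression values (arbitrary units) for six strains.
# These numbers are invented for illustration; they are NOT the Table 4 data.
expression = {
    "strain_1": [210.0, 260.0, 235.0],
    "strain_2": [190.0, 240.0, 205.0],
    "strain_3": [2300.0, 2900.0, 2500.0],
    "strain_4": [2600.0, 2100.0, 2450.0],
    "strain_5": [230.0, 180.0, 215.0],
    "strain_6": [2700.0, 3100.0, 2400.0],
}

# Hypothetical haplotype of each strain within a single haplotype block.
haplotype = {
    "strain_1": "h1", "strain_2": "h1", "strain_5": "h1",
    "strain_3": "h2", "strain_4": "h2", "strain_6": "h2",
}

def strain_log_means(data):
    """Average the log10-transformed replicates of each strain."""
    return {s: mean(math.log10(v) for v in vals) for s, vals in data.items()}

def one_way_anova(groups):
    """Return (F statistic, between-group share of the total sum of squares)."""
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f_stat, ss_between / (ss_between + ss_within)

# On the raw scale the replicate SD grows with the mean;
# on the log scale it is roughly constant across strains.
for strain, vals in expression.items():
    raw_sd = stdev(vals)
    log_sd = stdev(math.log10(v) for v in vals)
    print(f"{strain}: mean={mean(vals):8.1f} raw SD={raw_sd:7.1f} log SD={log_sd:.3f}")

# Group the per-strain log means by haplotype; each strain is one observation.
log_means = strain_log_means(expression)
groups = {}
for strain, hap in haplotype.items():
    groups.setdefault(hap, []).append(log_means[strain])

f_stat, effect = one_way_anova(list(groups.values()))
print(f"F = {f_stat:.1f}, effect size = {effect:.2f}")
```

Treating each strain mean as a single observation keeps the sample size equal to the number of strains, which matches how n is used in the power analysis. The estimated effect size can then be compared with the threshold read from the power table.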
Even though it is impossible to estimate the contribution of the primary locus based on the raw data, it is possible to estimate the total genetic effect. This is Table 4