Statistical Power for Computational Mapping
Given a set of measurements divided into k groups, ANOVA tests the
null hypothesis that the group means are equal. The total variance
consists of within-group variance and between-group variance. For
a power analysis using one-way ANOVA, one standard way to define
the effect size is (17):

$$\eta^2 = \frac{\sigma^2_{\text{between}}}{\sigma^2_{\text{between}} + \sigma^2_{\text{within}}} \qquad (1)$$

In our case, the groups are defined by haplotypes, and $\eta^2$ is the
genetic effect of the haplotypes on the trait value. Let n be the total
sample size and k be the number of groups. When the group sizes are
equal, the F statistic for samples with effect size $\eta^2$ follows the
noncentral F distribution $F(k-1,\, n-k,\, \lambda)$ with the
noncentrality parameter

$$\lambda = \frac{n\,\eta^2}{1 - \eta^2}$$

Therefore, the power of the one-way ANOVA test with significance level
$\alpha$ is given by:

$$\text{Power} = P\left[ F(k-1,\, n-k,\, \lambda) > F_{\text{crit}} \right] \qquad (2)$$

where $F_{\text{crit}} = F(1-\alpha,\, k-1,\, n-k)$ is the $(1-\alpha)$
quantile of the F distribution with k – 1 and n – k degrees of
freedom. Note that within a haplotype block, the number of strains with
each different haplotype is usually not the same. Therefore, an equal
group size cannot be obtained for this analysis. The power for unequal
group sizes is expected to be lower. Table 3 shows the power as a
function of effect size for $\alpha$ = 0.01, n = 13, 14, 15, 16, and k = 2, 3. When
there are two different haplotypes within a locus, 80% power can be
achieved using 16 strains when effect size is greater than 0.49 or using
13 strains when the effect size is greater than 0.56. With this
background, we can now analyze the performance of this
haplotype-based computational mapping method. It correctly predicted
the genetic basis for strain-specific differences in several biologically
important traits (4). In one published example, haplotypic blocks
associated with categorical MHC phenotypes for the class Ia K, class III
S, and the class Ib Qa2 loci were correctly identified. The identified
blocks were contained within regions of 27, 51, and 100 kb, respectively,
which contained the actual MHC genes corresponding to the trait. The
MHC phenotypes represent a diverse class of categorical phenotypes.
The 16 strains were grouped into five phenotypic categories for the
class Ia K and class III S traits, and into two categories for the
class Ib Qa2 phenotype. Most importantly, there
were no false positives among the top predictions for these traits. In
another example, a binary phenotype was computationally analyzed using
data obtained from 13 inbred mouse strains: the aromatic hydrocarbon
(AH) response, which measures the induction of cytochrome P450 enzymes
after treatment with aromatic hydrocarbons. Two adjacent haplotypic
blocks within a 27-kb region, each containing three haplotypes, were
computationally identified. The identified region contained the Ahr
locus, a gene that contributes to the AH response phenotype. The
functional genetic element that contributes to the phenotype was
easily identified by analysis of the polymorphisms
within the region.

Wang and Peltz, Haplotype-Based Computational Genetic Analysis in Mice

Table 3
The Power of the One-Way ANOVA as a Function of the Genetic Effect Size
for Different Total Sample Sizes (n) and Number of Groups (k)

Effect    n = 13        n = 14        n = 15        n = 16
size      k = 2  k = 3  k = 2  k = 3  k = 2  k = 3  k = 2  k = 3
0.2       0.15   0.09   0.17   0.10   0.19   0.11   0.21   0.13
0.25      0.21   0.13   0.24   0.15   0.27   0.17   0.30   0.19
0.3       0.29   0.17   0.32   0.20   0.36   0.23   0.40   0.26
0.35      0.37   0.23   0.42   0.27   0.46   0.31   0.51   0.35
0.4       0.47   0.30   0.52   0.35   0.57   0.40   0.62   0.45
0.45      0.58   0.39   0.63   0.45   0.68   0.50   0.73   0.56
0.5       0.68   0.49   0.74   0.56   0.79   0.62   0.83   0.67
0.55      0.79   0.60   0.84   0.67   0.87   0.73   0.90   0.78
0.6       0.88   0.72   0.91   0.78   0.94   0.83   0.96   0.88
0.65      0.94   0.83   0.96   0.88   0.98   0.92   0.99   0.94
0.7       0.98   0.91   0.99   0.95   0.99   0.97   1.00   0.98
0.75      1.00   0.97   1.00   0.99   1.00   0.99   1.00   1.00

In one other example, the pattern of expression of a
differentially expressed gene within the lungs of 10 inbred mouse
strains was computationally analyzed to identify a novel cis-acting
allele-specific enhancer element (4). For this analysis, the level of
expression of the H2-E gene in the lungs of 10 inbred strains was
measured. A log transformation of this gene expression data was
computationally analyzed to identify a 1-kb region within the first
intron of the H2-E gene. This computational
prediction led to the discovery of a novel functional element regulating
the H2-E expression. Of note, only 10 strains were used in this
computational mapping.
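
The core test behind these examples, a one-way ANOVA in which strains are grouped by the haplotype they carry in a block, can be sketched as follows. This is a minimal illustration, not the published implementation: the trait values, the block names (`block_a`, `block_b`), the haplotype labels, and the idea of ranking blocks by their raw F statistic are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical log-scale trait values for 10 strains (illustration only).
TRAIT = [2.1, 2.3, 1.9, 2.2, 4.0, 4.2, 3.9, 4.1, 2.0, 4.3]

# Hypothetical haplotype carried by each of the same 10 strains in two blocks.
BLOCKS = {
    "block_a": ["h1", "h1", "h1", "h1", "h2", "h2", "h2", "h2", "h1", "h2"],
    "block_b": ["h1", "h2", "h1", "h2", "h1", "h2", "h1", "h2", "h1", "h2"],
}


def one_way_anova(values, labels):
    """Return (F statistic, between-group fraction of total variance)."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    ss_within = ss_total - ss_between
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f_stat, ss_between / ss_total


def rank_blocks(values, blocks):
    """Sort blocks by F statistic, strongest association first."""
    scored = [(name, *one_way_anova(values, haps)) for name, haps in blocks.items()]
    return sorted(scored, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for name, f_stat, fraction in rank_blocks(TRAIT, BLOCKS):
        print(f"{name}: F = {f_stat:.2f}, variance fraction = {fraction:.2f}")
```

Ranking by the raw F statistic is only meaningful when the blocks have the same number of haplotypes, as in this toy case. Blocks with different numbers of haplotypes have different degrees of freedom and would need to be compared through p-values instead.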
When gene expression data from only eight or nine strains were used
for the computational analysis, the same region was predicted, and no
false-positive predictions were obtained. These examples demonstrate
that computational mapping can be achieved using phenotypic data
from a relatively limited number of inbred strains.
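
How much power such small strain panels provide can be explored numerically. The sketch below computes the power of a one-way ANOVA with equal group sizes from the noncentral F distribution, using only the Python standard library. It assumes that the effect size is the between-group share of the total variance, so that the noncentrality parameter is n × effect / (1 − effect); the function names and the series-based evaluation of the noncentral F distribution are choices made for this example, not taken from the chapter.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=1e-12):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        even = m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0))
        for coeff in (even, odd):
            d = 1.0 + coeff * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + coeff / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(
        math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
        + a * math.log(x) + b * math.log1p(-x)
    )
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_cdf(x, dfn, dfd, nc=0.0, terms=200):
    """CDF of the (noncentral) F distribution as a Poisson-weighted sum."""
    if x <= 0.0:
        return 0.0
    y = dfn * x / (dfn * x + dfd)
    weight = math.exp(-nc / 2.0)  # Poisson weight for j = 0
    total = 0.0
    for j in range(terms):
        if weight == 0.0:
            break
        total += weight * betai(dfn / 2.0 + j, dfd / 2.0, y)
        weight *= (nc / 2.0) / (j + 1)
    return total


def f_quantile(p, dfn, dfd):
    """Quantile of the central F distribution, found by bisection."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, dfn, dfd) < p:
        hi *= 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if f_cdf(mid, dfn, dfd) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def anova_power(effect_size, n, k, alpha=0.01):
    """Power of a one-way ANOVA with k equal-sized groups and n samples.

    Assumption: effect_size is the between-group share of total variance.
    """
    dfn, dfd = k - 1, n - k
    nc = n * effect_size / (1.0 - effect_size)
    f_crit = f_quantile(1.0 - alpha, dfn, dfd)
    return 1.0 - f_cdf(f_crit, dfn, dfd, nc)


if __name__ == "__main__":
    for n in (13, 16):
        for k in (2, 3):
            print(n, k, round(anova_power(0.5, n, k), 2))
```

Under the stated assumption, the values should track Table 3 closely. Where SciPy is available, `scipy.stats.f.ppf` and `scipy.stats.ncf.sf` provide the critical value and the tail probability directly; the fixed 200-term series used here is adequate for the effect sizes in the table but would need more terms for very large noncentrality.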
To illustrate how haplotype-based computational mapping is
performed, we provide a detailed description of how H2-E gene
expression data was analyzed. For this analysis, the level of H2-E gene
expression in female lung was measured three times for each of the 10
inbred strains analyzed (Table 4). An important assumption of
haplotype-based analysis is that the residuals are normally distributed
and the standard deviation is the same for groups of strains sharing
haplotypes. However, the level of expression of this gene was quite
different among the strains analyzed, and the standard deviation
within each mouse strain was proportional to its level of expression.
Therefore, the log transform was used, and the error distribution was
closer to normal. The plot of the residual against the strain average
(Fig. 2) shows that the assumptions used for the linear model are
approximately true for the log-transformed data. When replicate
measurements are available, the data can be evaluated prospectively to
determine whether the computational analysis is likely to correctly
identify a genetic locus. The genetic influence of a single locus should be
larger than the threshold determined by the power analysis. Even
though it is impossible to estimate the contribution of the primary locus
based on the raw data, it is possible to estimate the total genetic effect.
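
The reason for working on the log scale, namely that a within-strain standard deviation proportional to the expression level becomes roughly constant after a log transform, can be checked on replicate data with a few lines of code. The sketch below uses made-up replicate values (three per strain, with hypothetical strain names) constructed so that the spread grows with the mean; it is not the H2-E data.

```python
import math
from statistics import mean, stdev

# Hypothetical replicate measurements (three per strain), built so that the
# spread is proportional to the expression level. Not the H2-E data.
RAW = {
    "strain_A": [90.0, 100.0, 110.0],
    "strain_B": [450.0, 500.0, 550.0],
    "strain_C": [1800.0, 2000.0, 2200.0],
}


def identity(x):
    return x


def per_strain_sd(data, transform=identity):
    """Standard deviation of the (transformed) replicates for each strain."""
    return {
        strain: stdev([transform(v) for v in values])
        for strain, values in data.items()
    }


def residuals(data, transform=identity):
    """(strain, strain mean, residual) for every transformed replicate."""
    rows = []
    for strain, values in data.items():
        transformed = [transform(v) for v in values]
        strain_mean = mean(transformed)
        rows.extend((strain, strain_mean, t - strain_mean) for t in transformed)
    return rows


if __name__ == "__main__":
    print("raw SD:", per_strain_sd(RAW))
    print("log SD:", per_strain_sd(RAW, math.log))
    for strain, strain_mean, resid in residuals(RAW, math.log):
        print(f"{strain}: mean = {strain_mean:.3f}, residual = {resid:+.3f}")
```

Plotting the residuals against the strain means, as in a residual-versus-average plot, gives the same check visually: on the raw scale the spread fans out with the mean, while on the log scale it stays level.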
This is
Table 4