Supplement to Fraser et al., “Systematic detection of polygenic cis-regulatory evolution”
Notes on the test of selection
We note that the F2/eQTL mapping version of our test is readily applicable to any species in which
F2 populations can be produced and genotyped. This generally requires inbred parental lines derived from
independent populations, or a haploid/diploid life cycle (such as S. cerevisiae). At present this includes
nearly all major model organisms (mouse, rat, nematode, fruit fly, Arabidopsis, S. pombe, etc.), underscoring
the general applicability of our test. In addition, outbred individuals could be used as the parents in the F1
RNA-seq version of our method, as long as these individuals come from distinct populations (so that
adaptive differences could have accumulated) and a sufficient number of sequence differences are known for
the two parents (or at least the two parental populations) to allow measurement of allele-specific expression
for a significant fraction of the genome.
As stated in the main text, nearly all previous tests of selection require either 1) information from
neutral sites in order to assess when neutrality can be rejected; 2) assumptions about population demography
that are violated by bottlenecks or other common demographic scenarios; 3) assumptions about mutation
rates or the distribution of fitness effects of mutations; or 4) some combination of these. To name just a
handful of these tests that depend on such assumptions: dN/dS, McDonald-Kreitman, and others require
assuming neutrality of synonymous sites, which is often violated in real data; polymorphism frequency
spectrum-based tests (such as Tajima's D, Fay and Wu's H, Fst, etc.), haplotype-based methods (e.g. iHS; ref. 2)
and the McDonald-Kreitman test are sensitive to bottlenecks and other irregular population demographics
(e.g. refs 3-4); and the Poisson Random Field approach is sensitive to many assumptions about demography and the
distribution of selection coefficients (ref. 5). Because the present test (like Orr’s test, ref. 1) focuses only on the directionality
of differences between lineages, it requires no such assumptions. Put another way, there is no known
mechanism by which changes in population size, mutation rate, or fitness effects of new mutations could
cause some gene sets to accumulate an excess (compared to all other genes) of independent cis-regulatory
mutations that act in the same direction, aside from effects on the selective forces acting on those gene sets
(which is what we are measuring). Although most new mutations are in fact down-regulating (see the section
“Note on positive selection vs. relaxed negative selection” below), so that a simple increase in mutation rate
is expected to increase the number of down-regulating mutations in any gene set, it will increase this number
for all other genes as well. Our method of choosing an equal number of B6-upregulated and CAST-upregulated
genes for use in our scan will therefore not be affected by changes in mutation rate, even in the presence
of a bias in the directionality of new mutations.
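To make the balancing argument concrete, the following Python sketch simulates a genome in which a lineage-wide mutational bias makes about 70% of cis-eQTL CAST-upregulated, with no gene-set-specific selection. It then randomly downsamples the majority direction so that equal numbers of B6- and CAST-upregulated genes remain. The random downsampling rule, the function names, and the simulated proportions are illustrative assumptions, not necessarily the exact procedure used in our scan. A gene set defined without regard to eQTL direction shows roughly the genome-wide bias (about 30% B6-upregulated) before balancing, and approximately 50% afterwards.

```python
import random

def balance_directions(genes, rng):
    """Keep equal numbers of B6- and CAST-upregulated genes.

    genes: list of (gene_id, direction) pairs, direction in {"B6", "CAST"}.
    The majority direction is randomly downsampled (an illustrative choice).
    """
    b6 = [g for g in genes if g[1] == "B6"]
    cast = [g for g in genes if g[1] == "CAST"]
    n = min(len(b6), len(cast))
    return rng.sample(b6, n) + rng.sample(cast, n)

def fraction_b6(genes, gene_set):
    """Fraction of the gene set's cis-eQTL targets that are B6-upregulated."""
    directions = [d for g, d in genes if g in gene_set]
    return sum(d == "B6" for d in directions) / len(directions)

rng = random.Random(1)
# Simulated genome with no gene-set-specific selection: a lineage-wide
# mutational bias makes ~70% of cis-eQTL CAST-upregulated.
genes = [(f"g{i}", "CAST" if rng.random() < 0.7 else "B6") for i in range(20000)]
balanced = balance_directions(genes, rng)
# An arbitrary gene set chosen without regard to eQTL direction.
gene_set = {f"g{i}" for i in range(0, 20000, 7)}
print(round(fraction_b6(genes, gene_set), 2))
print(round(fraction_b6(balanced, gene_set), 2))
```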
The difference in the number of cis-upregulated genes from B6 vs. CAST is an unbiased estimate of
the number of genes in a particular gene set that are under selection, under the assumption that selection was
not also acting to upregulate genes in the lineage with fewer cis-upregulated genes (in which case our
estimate would be conservative). The rationale for this is that under neutrality, given that we use an equal
number of B6-cis-upregulated and CAST-cis-upregulated genes as input to our scan, the expectation is for an
equal number of cis-upregulating alleles from each lineage. Since our scan searches for the largest
deviations from this expectation, applying it to a single cohort would over-estimate the true difference,
due to the Beavis effect (or “winner’s curse”). However, when similar results are seen across all four cohorts
(or across both cohorts of one sex, in the case of sex-specific effects), any such bias becomes
extremely unlikely, for the same reason that replication studies of QTLs are not subject to the Beavis effect.
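The unbiasedness claim can be checked with a short calculation. Suppose, as a simplifying assumption, that a gene set contributes n independent cis-eQTL to the scan. Of these, s carry B6-upregulating alleles fixed by selection. The remaining n - s are neutral, each equally likely to be up-regulated by either parental allele. N_B6 and N_CAST denote the numbers of B6- and CAST-upregulated genes (notation introduced here for illustration only). The second block adds t genes selected for CAST-upregulation, which shows why the estimate is conservative in that case.

```latex
% s genes selected for B6-upregulation; n - s neutral genes (probability 1/2 each way)
\begin{aligned}
\mathbb{E}[N_{\mathrm{B6}}] &= s + \tfrac{1}{2}(n - s), \qquad
\mathbb{E}[N_{\mathrm{CAST}}] = \tfrac{1}{2}(n - s),\\
\mathbb{E}[N_{\mathrm{B6}} - N_{\mathrm{CAST}}] &= s.
\end{aligned}
% Adding t genes selected for CAST-upregulation (n - s - t neutral genes remain)
\begin{aligned}
\mathbb{E}[N_{\mathrm{B6}}] &= s + \tfrac{1}{2}(n - s - t), \qquad
\mathbb{E}[N_{\mathrm{CAST}}] = t + \tfrac{1}{2}(n - s - t),\\
\mathbb{E}[N_{\mathrm{B6}} - N_{\mathrm{CAST}}] &= s - t \le s.
\end{aligned}
```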
The Morris Water Maze
The Morris Water Maze is a widely used tool for testing learning and memory in mice. In this test,
mice are placed in a circular pool of water, in which an invisible submerged platform is the only place where
the mice can stay above water without swimming. In one version of the test, the mice must find the platform
by chance during the initial training trials, after which the platform is removed from the pool. Mice
that recall the location of the platform tend to spend more time swimming near its former location,
whereas those with poor memory swim randomly. Two measures of memory accuracy are the fraction
of time the mice spend in the correct quadrant of the pool, and the number of times the mice swim over the
former location of the platform, during a one-minute trial period.
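Both probe-trial measures can be computed from a tracked swim path. The sketch below makes several assumptions. Positions are sampled at equal time intervals, in a hypothetical coordinate system with the pool centred at (0, 0). The target quadrant is the one containing the former platform position. A "crossing" is counted each time the path enters a circle of radius `platform_radius` around that position. Because the four quadrants have equal area, a mouse swimming randomly would be expected to spend about 25% of its time in the target quadrant.

```python
import math

def probe_trial_metrics(path, platform, platform_radius):
    """Return (fraction of samples in the target quadrant, number of crossings).

    path: list of (x, y) positions at equal time steps, pool centred at (0, 0).
    platform: (x, y) of the removed platform; its quadrant is the target quadrant.
    """
    px, py = platform
    in_target = 0
    crossings = 0
    was_on_platform = False
    for x, y in path:
        # Same quadrant as the platform: x and y share the platform's signs.
        if (x >= 0) == (px >= 0) and (y >= 0) == (py >= 0):
            in_target += 1
        on_platform = math.hypot(x - px, y - py) <= platform_radius
        # Count entries into the former platform area, not time spent in it.
        if on_platform and not was_on_platform:
            crossings += 1
        was_on_platform = on_platform
    return in_target / len(path), crossings

path = [(0.5, 0.5), (0.6, 0.6), (-0.5, 0.5), (0.5, 0.5), (0.5, -0.5)]
fraction, crossings = probe_trial_metrics(path, platform=(0.5, 0.5), platform_radius=0.15)
print(fraction, crossings)  # 0.6 2
```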
CAST and SPRET each showed essentially no memory in this test, spending 28% and 27% of their
time, respectively, in the correct quadrant (compared to 25% expected by chance; ref. 6). B6 spent 42% of its time
in the correct quadrant, which is significantly greater than 25%. During the trial period B6 crossed the
platform's former location an average of 5.9 times, while CAST and SPRET crossed it well under half as often,
at 2.4 and 2.0 times, respectively. B6 therefore clearly outperforms CAST and SPRET in this memory test.
In fact, B6 outperformed all 12 other strains tested by over 50% in the number of platform location
crossings, a significant difference (ref. 6).
Another version of the MWM does not remove the platform, but instead records the time required for
the mice to find the platform, after initial training. In this test, B6 mice took an average of 41 seconds to find
the platform on the first day, but this dropped to 28 seconds on the second day and 23 seconds by the third (ref. 7).
Therefore B6 showed increasing memory of the platform location as the experiment progressed. In contrast,
CAST again showed no capacity for learning or memory, spending an average of 56 – 58 seconds swimming
on all three days of the trial (and this is actually misleadingly low, since the experiment was stopped and the
mice were placed on the platform by the experimenter after 60 seconds of swimming), with no improvement
and thus no apparent memory of the platform's location (ref. 7).
Testing effects of SNPs on microarray and RNA-seq data
SNPs overlapping microarray probes could disrupt hybridization, leading to false cis-eQTL with a
directional bias towards higher B6 expression (since the arrays were designed to the B6 genome sequence).
If this occurred preferentially in some gene sets, it could lead to a false-positive result in our test. We
addressed this possibility in two ways. First, we compiled a list of our array probes that overlap B6/CAST
SNPs, and asked whether this set of probes was enriched in any of our significant gene sets (Table 1). No
enrichment was seen (hypergeometric p > 0.1 for all gene sets). Second, we excluded these probes from our
analysis, and tested whether the same enrichments were observed. In all cases this exclusion had only a
negligible effect (the same gene sets were found at high- and medium-confidence). Together these analyses
indicate that SNPs disrupting probe hybridization are unlikely to explain our results.
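For reference, a one-sided hypergeometric enrichment test of the kind reported above can be written directly with Python's `math.comb`. The argument names are ours. `N` is the number of probes analyzed and `K` the number overlapping B6/CAST SNPs. `n` is the number of probes in a gene set and `k` the number of those overlapping SNPs. This is a sketch of the standard test, not the code used for the analysis. If SciPy is available, `scipy.stats.hypergeom.sf(k - 1, N, K, n)` should return the same upper-tail probability.

```python
from math import comb

def hypergeom_enrichment_p(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N total, K SNP-overlapping, n drawn).

    N: probes analyzed; K: probes overlapping SNPs;
    n: probes in the gene set; k: gene-set probes overlapping SNPs.
    """
    total = comb(N, n)
    # Sum the probabilities of observing k or more SNP-overlapping probes.
    upper = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return upper / total

print(hypergeom_enrichment_p(10, 5, 5, 5))  # 1/252, about 0.00397
```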
For RNA-seq the situation is somewhat different. When measuring allele-specific expression in an
F1 hybrid, annotated SNPs are key to differentiating alleles, but unannotated SNPs will appear to be
sequencing errors when the non-reference (CAST) allele is observed. While a small number (1-2) of
errors/unannotated SNPs can be tolerated when aligning short reads, this effect may still cause CAST alleles
to be under-represented. To test whether this may cause the B6-upregulated gene sets we observed, we
reasoned that while a gene set may well be enriched for SNPs, it is very unlikely to be enriched only for
unannotated SNPs (considering that millions of SNPs are already known, and were discovered without
regard to the functions of genes they are in, so statistical power and SNP ascertainment bias should not pose
problems). Therefore we used annotated SNP enrichment as a proxy for unannotated SNP enrichment, and
tested whether either of the two significant gene sets from our RNA-seq analysis (memory and calmodulin
binding) was enriched for known SNPs. Neither was (hypergeometric p > 0.2 for both), indicating that their
B6-upregulation is unlikely to be an artifact due to unannotated SNPs.
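The mechanism behind this concern can be sketched with a simple binomial model. The model assumes, hypothetically, that:

- an aligner accepts a read with at most `max_mismatches` mismatches to the B6 reference;
- sequencing errors occur independently at each base with probability `error_rate`;
- every CAST read at a locus overlaps all of that locus's unannotated SNPs.

Under these assumptions, the ratio below is the expected under-representation of CAST reads, and values below 1 would mimic B6-upregulation. In this simplification the loss depends only on how many unannotated SNPs a gene carries. This is consistent with using SNP density as the quantity to test for enrichment.

```python
from math import comb

def binom_cdf(m, n, p):
    """P(X <= m) for X ~ Binomial(n, p); returns 0.0 when m < 0."""
    if m < 0:
        return 0.0
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(min(m, n) + 1))

def cast_to_b6_mapping_ratio(read_len, error_rate, max_mismatches, unannotated_snps):
    """Relative mapping probability of CAST vs. B6 reads at one locus.

    B6 reads mismatch the reference only through sequencing errors; each CAST
    read also carries `unannotated_snps` mismatches that the aligner cannot
    distinguish from errors (errors at those SNP positions are ignored).
    """
    p_b6 = binom_cdf(max_mismatches, read_len, error_rate)
    p_cast = binom_cdf(max_mismatches - unannotated_snps,
                       read_len - unannotated_snps, error_rate)
    return p_cast / p_b6

for snps in range(4):
    print(snps, round(cast_to_b6_mapping_ratio(50, 0.01, 2, snps), 3))
```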
Detecting subtle phenotypic effects
In Figure 4 we show several examples where eQTLs for growth regulators colocalize with QTLs for
naso-anal length. This approach can only detect major-effect QTL, given our sample size of 442 F2 mice.
One approach to detect more subtle effects is to sum the total number of each F2 individual's B6 alleles at all
growth regulator cis-eQTL, and compare this sum to the mass of each F2 individual; a relationship might be
found even when each locus is not predictive in isolation. Although this sum was a significant predictor of
mass, this was also true for randomly chosen genetic markers, suggesting that too many minor-effect loci
exist for this approach to be informative.
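A minimal sketch of this allele-sum comparison follows. It makes three illustrative choices, and the function names are also ours; these are not the exact analysis performed:

- genotypes are coded as the number of B6 alleles (0, 1 or 2) per locus;
- Pearson correlation measures predictive value;
- the cis-eQTL score is compared against scores built from randomly drawn marker sets of the same size.

The random-marker control is the key step. When many minor-effect loci influence mass, almost any large set of markers will correlate with it, so a significant correlation for the growth-regulator cis-eQTL alone is uninformative.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def b6_allele_sum(genotypes, loci):
    """Per-individual sum of B6 allele counts (0, 1 or 2) over the given loci.

    genotypes: one dict per F2 individual, mapping locus name -> B6 allele count.
    """
    return [sum(ind[locus] for locus in loci) for ind in genotypes]

def compare_to_random_markers(genotypes, mass, eqtl_loci, all_loci, n_random=1000, seed=0):
    """Correlate the cis-eQTL allele sum with mass, and compare it with allele
    sums over equally sized sets of randomly chosen markers."""
    rng = random.Random(seed)
    observed = pearson(b6_allele_sum(genotypes, eqtl_loci), mass)
    null = [pearson(b6_allele_sum(genotypes, rng.sample(all_loci, len(eqtl_loci))), mass)
            for _ in range(n_random)]
    # Fraction of random marker sets that predict mass at least as well.
    empirical_p = sum(r >= observed for r in null) / n_random
    return observed, null, empirical_p

toy_genotypes = [{"a": 0, "c": 2}, {"a": 1, "c": 0}, {"a": 2, "c": 1}]
print(compare_to_random_markers(toy_genotypes, [10, 20, 30], ["a"], ["a", "c"], n_random=5))
```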
Note on trait ascertainment bias
For our previous test of lineage-specific selection on gene expression, choosing genes for the test
based on having large expression differences between parental lines would introduce another type of
ascertainment bias, since it would enrich for genes that are targeted by multiple “reinforcing” eQTLs (see
supplement of ref. 8). We note that this type of filtering is not as much of a problem for the current gene set-based test, since the genes used in this test are ranked based on the strength of their cis-eQTL only.
Nevertheless, we recommend not using this type of filtering for this test, as there are some scenarios where
requiring a strong parental difference could bias results.
It may appear that for any given gene set, the neutral expectation should be based upon the difference
in parental expression levels for each gene in that set, following Orr’s method for adjusting for ascertainment
bias (ref. 1). However, this is not the case, as the following example demonstrates. Flips of unbiased and
biased coins can represent neutral and selected gene sets, respectively. For a neutral gene set, the
cis-upregulating allele is equally likely to come from each parent, so it is perfectly modeled by a fair coin.
For sets of unbiased coin flips (with any number of flips per set), the number of heads follows a
binomial distribution with 50% expected heads and 50% tails. Biased coin flips will instead yield a
biased result: for example, 80% heads are expected if the coin is 80% biased towards heads
(e.g., ~80 heads and ~20 tails after 100 flips). The goal of our method is to distinguish between these
biased and unbiased sets. If we were to adjust each set for its total bias (analogous to the parental difference
in expression levels in the absence of trans-acting changes) before applying our method, we would be
controlling for the very signal we wish to detect. For example, adjusting for the 80 heads/20 tails result by
declaring that the neutral (unbiased) expectation for this set is 80/20 would guarantee that we would not
detect the bias in this set. This illustrates why it is not appropriate to adjust each gene set based on the parental
gene expression levels; as long as these are free of ascertainment bias as discussed above, the neutral
expectation is a binomial distribution with 50% expected cis-upregulation from each strain’s alleles.
For further discussion of ascertainment bias issues in sign tests, see the supplement of ref. 8.
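The coin example can be quantified with an exact one-sided binomial tail probability (function name ours). Under the correct neutral expectation of a fair coin, 80 heads in 100 flips is extremely unlikely, with a tail probability below 1e-8. Once the null is "adjusted" to the observed 80% bias, the same outcome becomes unremarkable, with a tail probability of roughly one half. This is exactly how such an adjustment would hide the signal.

```python
from math import comb

def binom_upper_tail(k, n, p):
    """One-sided P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

heads, flips = 80, 100
# Correct neutral expectation: 50% cis-upregulation from each parent (fair coin).
p_fair = binom_upper_tail(heads, flips, 0.5)
# "Adjusted" expectation that absorbs the observed bias into the null.
p_adjusted = binom_upper_tail(heads, flips, 0.8)
print(f"fair-coin null: {p_fair:.1e}   adjusted null: {p_adjusted:.2f}")
```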
Note on positive selection vs. relaxed negative selection
The following paragraph is reproduced (with some modifications) from the supplement of ref. 8.
It is important to note that this test of lineage-specific selection cannot distinguish between positive
selection for altered gene expression levels vs. a relaxation of negative selection, combined with a bias in the
directionality of mutational effects. The following scenario illustrates how relaxed negative selection can
lead to a pattern of cis-eQTL with biased directionality in a gene set. Imagine a gene set whose expression is
under strong negative selection in one lineage, so that no eQTL accumulate in this lineage, but (for whatever
reason) is under no selection in another lineage. In the unselected lineage, mutations causing cis-eQTL will
accumulate. If these neutral mutations are equally likely (under the null) to be up- or down-regulating, then the selection test will be a faithful indicator of positive selection. However, if they are
biased in one direction, this will appear as an excess of cis-eQTL acting in one direction. New mutations (prior
to selection) are likely to be biased toward down-regulating gene expression. This can be seen in
two ways. First, since the vast majority of random nucleotide sequences do not drive significant levels of
transcription, it stands to reason that mutations bringing a cis-regulatory region closer to a random sequence
will tend, on average, to down-regulate any transcribed gene. More direct evidence for this comes from
saturation mutagenesis studies where every possible base substitution or single-base deletion is engineered
into a promoter region, and the resulting gene expression is measured. For both mammalian and
bacteriophage promoters, the vast majority of mutations that affect expression result in down-regulation
(96.1% in three mammalian promoters and 99.8% in three bacteriophage promoters, at a 2-fold change
cutoff; ref. 9). Although occasional mutations can result in up-regulation, the observation of consistent cis-acting
up-regulation of genes in a gene set along one lineage likely indicates positive selection. We note that this
relaxation of selection effect applies equally to Orr's test; thus Orr’s test may be more appropriately referred
to as a test of lineage-specific selection, rather than a test of positive selection.
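The confound can be illustrated with a deliberately simple simulation. It assumes, hypothetically, that:

- each gene in the set acquires exactly one cis-regulatory mutation in the unconstrained lineage;
- the constrained lineage retains the ancestral expression level;
- no mutation is positively selected.

Using the 96.1% down-regulating fraction reported for mammalian promoters as the mutational bias, about 96% of the resulting cis-eQTL have the higher-expressing allele in the constrained lineage. A sign test would read this pattern as lineage-specific up-regulation. With an unbiased mutation spectrum, the fraction stays near 50%.

```python
import random

def fraction_constrained_higher(n_genes, p_down, seed=0):
    """Fraction of cis-eQTL in which the constrained lineage carries the
    higher-expressed allele, with no positive selection in either lineage.

    Simplifying assumptions: each gene gets exactly one cis-acting mutation in
    the unconstrained lineage (down-regulating with probability p_down), and the
    constrained lineage keeps the ancestral expression level.
    """
    rng = random.Random(seed)
    constrained_higher = sum(rng.random() < p_down for _ in range(n_genes))
    return constrained_higher / n_genes

# Biased spectrum (96.1% down-regulating) versus an unbiased spectrum.
print(fraction_constrained_higher(5000, 0.961))
print(fraction_constrained_higher(5000, 0.5))
```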
We note that we do not expect that fixed mutations will necessarily be so overwhelmingly down-regulating, since these are the very biased subset that has survived the gauntlet of selection and drift. For the
purposes of our arguments, the relevant quantity is the fraction of new mutations that are down-regulating,
and so our use of saturation mutagenesis data is appropriate.
The McDonald-Kreitman test (ref. 10), a widely used test of selection, is quite similar to ours in that either
relaxed negative selection or positive selection can result in the same effects. For the MK test, this fact has
been noted [e.g. ref. 11 and references therein], but it is almost universally ignored when applying the test; it is
nearly always assumed to reflect positive, and not relaxed negative, selection. In any case, it is fair to say
that our test reflects positive selection in much the same way as the MK test, in the sense that neither can
distinguish positive from relaxed negative selection.
Note on overlap between cis-eQTLs from separate cohorts
In describing Figure 2 we note that the specific genes implicated as cis-eQTLs within the gene sets
shown (mitochondria in Fig 2a and adult locomotory behavior in Fig 2b) show extensive overlap. In the
mitochondria gene set, there are 112-126 B6-upregulated genes in each cohort (some of these are within 2
Mb of one another in the genome, so were excluded from analysis to ensure independence of cis-eQTLs; thus
Fig 2a shows numbers lower than this). Of these, 83 are found as targets of cis-eQTLs in all four cohorts, and
an additional 29 are sex-specific (11 specific to females and 18 to males), as they appear in both cohorts of
one sex but in neither cohort of the other. In the adult locomotory behavior gene set, there are 12-14
CAST-upregulated genes. Eight of these are found as targets of cis-eQTLs in all four cohorts, and another two are
sex-specific (one in males and one in females). For a complete list of genes from Fig 2, see Supplemental
Table 1.
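The classification of genes into shared and sex-specific cis-eQTL targets can be expressed with set operations. In this sketch, the four cohorts are represented as hypothetical sets of gene identifiers (`female1`, `female2`, `male1`, `male2`). A gene counts as sex-specific when it is found in both cohorts of one sex but in neither cohort of the other, matching the definition above.

```python
def classify_overlap(female1, female2, male1, male2):
    """Return (shared, female_specific, male_specific) cis-eQTL target genes."""
    shared = female1 & female2 & male1 & male2
    # Found in both cohorts of one sex and in neither cohort of the other sex.
    female_specific = (female1 & female2) - (male1 | male2)
    male_specific = (male1 & male2) - (female1 | female2)
    return shared, female_specific, male_specific

shared, fem, mal = classify_overlap(
    {"a", "b", "c"}, {"a", "b", "d"}, {"a", "d", "e"}, {"a", "e"}
)
print(sorted(shared), sorted(fem), sorted(mal))  # ['a'] ['b'] ['e']
```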
Supplemental References
1. Orr HA. Genetics 149, 2099-2104 (1998).
2. Voight BF, et al. PLoS Biol 4, e72 (2006).
3. Eyre-Walker A. Genetics 162, 2017 (2002).
4. Macpherson JM, et al. Mol Biol Evol 25, 1025 (2008).
5. Sawyer SA, Hartl DL. Genetics 132, 1161 (1992).
6. Brown RE, Wong AA. Learn Mem 14, 134-144 (2007).
7. Le Roy I, et al. Behav Brain Res 95, 135-142 (1998).
8. Fraser HB, et al. PNAS 107, 2977 (2010).
9. Patwardhan RP, et al. Nat Biotechnol 27, 1173 (2009).
10. McDonald JH, Kreitman M. Nature 351, 652 (1991).
11. Hughes AL. Heredity 99, 364 (2007).