Analytical approaches to RNA profiling data for the
identification of genes enriched in specific cells
Joseph D. Dougherty, Eric F. Schmidt, Miho Nakajima, Nathaniel Heintz
Laboratory of Molecular Biology,
Howard Hughes Medical Institute,
The Rockefeller University,
New York, NY 10065
ABSTRACT
We have recently developed a novel method for the affinity purification of the complete suite
of translating mRNA from genetically labeled cell populations. This method permits
comprehensive quantitative comparisons of the genes employed by each specific cell type. We
provide a detailed description of tools for analysis of data generated with this and related
methodologies. An essential question that arises from these data is how to identify those genes
that are enriched in each cell type relative to all others. Genes relatively specifically employed
by a cell type may contribute to the unique functions of that cell, and thus may become useful
targets for development of pharmacological tools for cell specific manipulations. We describe
here a novel statistic, the Specificity Index, which can be used for comparative quantitative
analysis to identify genes enriched in specific cell populations across a large number of profiles.
This measure correctly predicts in situ hybridization patterns for many cell types. We apply this
measure to a large survey of CNS cell specific microarray data to identify those genes that are
significantly enriched in each population.
INTRODUCTION
The mammalian brain is the most complex organ of the body, containing hundreds of
intermingled cell populations. These cells can be classified into types according to their
morphology, projections, functions and gene expression profiles. Currently, in vivo analysis of
gene expression and translation in particular cell types is often performed with methodologies
that are non-parallel and difficult to quantify. Because of this, it remains a challenge to
determine the complete set of proteins employed by a given cell type, determine which genes are
expressed in or specific to a particular cell type relative to all others, or establish the degree to
which a given cell population is unique in the nervous system.
Previously, we have described a method, translating ribosome affinity purification (TRAP,
Supplemental Figure 1) for the isolation of translating mRNA from individual, genetically
defined, cell types (1,2). In this method, transgenic mice are generated which express a fusion
of eGFP and a ribosomal protein under the control of a Bacterial Artificial Chromosome
(BAC)(3) for a cell-specific 'driver' gene. A complete translational profile of all ribosome bound
mRNAs is then generated from these labeled cells via brain homogenization and affinity
purification with anti-eGFP antibodies. Relative quantities of the purified mRNAs are assessed
via microarray or related technologies. Thus, for any cell type for which a driver gene can be
identified, the methodology permits a comprehensive translational profile to be prepared for all
genes. The TRAP protocol is rapid, simple, and requires no specialized equipment. This method
permits the deconstruction of the complexity of the nervous system, allowing researchers access
to individual cell types within the context of the whole brain, with sensitivity sufficient to study
whole animal manipulations such as drug treatments, experimental injuries, or genetic
manipulations(2).
The fundamental impetus for the development of the TRAP methodology was to allow the
rapid and reproducible cell-specific assessment of RNA translation. Microarray analysis, as
traditionally applied to the nervous system, results in data representing the aggregate RNA from
all of the cell types present in the tissue(4), proportional to the percentage of those cells present
and the relative amount of RNA they produce. This has several implications regarding the
interpretation of these data(5). As the observed signal on the array represents an averaging of the
levels of the transcript in each of these cell types, RNAs present in all cell types, even at
moderate levels, will have fairly high observed values compared to RNAs present at high levels,
but in rare cell types. In fact, such mRNAs may even be undetectable because they represent a
small fraction of the total tissue RNA(1). Furthermore, as the RNAs from all the cell types are
measured in aggregate, any changes in RNA levels measured in the whole tissue are not easily
attributed to any particular cell type. Detected perturbations in RNA levels could be due to the
death of one cell type, the arrival of another, and/or changes within some or all of the cell types
present. Likewise, changes in one cell type could be masked by changes of opposite direction in
another cell type. All of these factors clearly complicate the application of microarrays to assess
changes in RNA due to experimental manipulations, especially those that may have their primary
influence on rare cells. TRAP provides not only the ability to detect changes in rare cell types,
but also enhanced ability to interpret the results, as it is known a priori which cells contain the
tagged ribosomes. In addition, TRAP has the advantage over other approaches to cell specific
RNA profiling as it assesses translation, rather than expression, providing a better correlate of
actual protein levels (6).
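The averaging argument above can be made concrete with a small numerical sketch. The cell-type fractions and transcript levels below are invented for illustration and are not taken from the dataset; `whole_tissue_signal` is a hypothetical helper that treats the whole-tissue signal as an RNA-weighted average across cell types.

```python
def whole_tissue_signal(levels, fractions):
    """Weighted average of per-cell-type transcript levels.

    levels:    transcript level in each cell type (arbitrary units)
    fractions: share of total tissue RNA contributed by each cell type
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    return sum(level * frac for level, frac in zip(levels, fractions))


# Hypothetical tissue: a rare cell type (1% of RNA) and everything else (99%).
fractions = [0.01, 0.99]

# A transcript present at a moderate level in every cell.
ubiquitous = whole_tissue_signal([200.0, 200.0], fractions)

# A transcript present at a high level, but only in the rare cell type.
rare_specific = whole_tissue_signal([5000.0, 0.0], fractions)

print(ubiquitous, rare_specific)
```

Under these assumed numbers the rare-cell transcript, although 25-fold more abundant within its own cells, yields a lower whole-tissue signal than the ubiquitous one.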
There are distinctions between microarray experiments from TRAP RNA compared to whole
tissue RNA, and these distinctions can have important impact on the assumptions regarding
experimental design, normalization, analysis and interpretation. To aid researchers implementing
cell-specific RNA-analysis technologies(1,7-10), we present here a preferred analytical method
for TRAP translational profiling data. Importantly, the TRAP methodology provides in vivo
quantitative comparative analysis of multiple cell types. Here, we have developed a robust
analytical method for identifying and quantifying cell-specific and enriched mRNAs across
multiple cell populations, referred to as the Specificity Index. We apply this to a large survey of
CNS cell types and provide a simple perusable archive of plots of this measure across all cell
types, for each gene.
MATERIALS AND METHODS
Data
TRAP data were generated as described (1,2), and are available for download from GEO:
GSE13379. Etv1 data were not plotted because of known contamination with endothelial or
lymphoblast cells(1). Other cell types and drivers are listed in Table 1.
Translating Ribosome Affinity Purification
Additional TRAP experiments on wildtype mouse brains were conducted as described
(2). RNA was quantified using the Ribogreen assay, according to the manufacturer’s instructions
(Invitrogen, Carlsbad, CA), and a Modulus single tube fluorometer from Turner Biosystems
(Sunnyvale, CA) with the blue optical kit.
R Code
The scripts used for calculation of the Specificity Index are available from the Heintz Laboratory
Website (address:___).
SI for a given gene (n), in a given cell type (#1), compared to cell types, k = 2...m, is given by
the formula...
...where IP1,n is the expression value for gene n in cell type one, and rank(IP1,n / IPk,n) is the
position of gene n in a descending-ordered list of 'fold-change' (IP1 / IPk) values for all genes.
Note that SI is only calculated for those genes in cell type k with an absolute expression above
50 in IPk, and with Log2(IPk/Totalk) values above a threshold...
...where 1..j is a set of negative control genes known not to be expressed in this cell type, IPk,p is
the expression value for gene p in the immunoprecipitate from cell type k. Totalk,p is the
expression value for gene p in the total tissue RNA from the tissue cell type k was isolated from.
Any gene for which log2(IPk/Totalk) < Thresholdk is excluded, with the caveat that Thresholdk
was not allowed to exceed zero (Supplemental Materials and Supplemental Figure 5).
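The SI formula itself is elided in this copy. Read together with the Results description, in which the ranks of each probeset are averaged across all comparisons, one plausible form is given below. This is a reconstruction under that assumption, not the authors' verbatim equation, and the threshold formula is left as elided.

```latex
\[
  \mathrm{SI}_{n,1} \;=\; \frac{1}{m-1} \sum_{k=2}^{m}
  \operatorname{rank}\!\left( \frac{\mathrm{IP}_{1,n}}{\mathrm{IP}_{k,n}} \right)
\]
```

Here $\mathrm{IP}_{k,n}$ denotes the expression value for gene $n$ in cell type $k$, and the rank is taken over all genes retained after filtering, with rank 1 assigned to the largest fold-change. Under this reading, a smaller SI indicates greater specificity.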
Scoring Allen Brain Atlas (ABA) in situ hybridizations
For comparative analysis of TRAP data, we developed a blinded, unbiased scoring method
(the SENU method). For the first application, for each of four cell types, fifty probesets were
selected at random from the top five hundred most enriched genes (IP/Total). For each cell-type,
an additional fifty probesets were selected from the array at random, irrespective of IP/Total
value. For each cell type, the fifty random and the fifty cell-enriched probesets were scrambled
together and presented to three blinded judges, previously trained in the heuristics below until
inter-rater reliability was above 60% on training sets.
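The selection and scrambling step described above can be sketched as follows. This is a minimal illustration, not the authors' script: `build_blinded_list`, the probeset names, and the fixed seed are all hypothetical, and the input list is assumed to be already sorted by IP/Total in descending order.

```python
import random


def build_blinded_list(ranked, all_probesets, top_n=500, n_pick=50, seed=0):
    """Mix n_pick probesets drawn from the top_n enriched with n_pick drawn at random.

    Returns (presented, key): `presented` is what the judges see,
    `key` keeps each probeset's origin for unblinding after scoring.
    """
    rng = random.Random(seed)
    enriched = rng.sample(ranked[:top_n], n_pick)
    # Random picks ignore IP/Total, so they may occasionally overlap the enriched set.
    background = rng.sample(all_probesets, n_pick)
    key = [(p, "enriched") for p in enriched] + [(p, "random") for p in background]
    rng.shuffle(key)
    presented = [probeset for probeset, _ in key]
    return presented, key


# Hypothetical array of 2000 probesets, assumed already ranked by IP/Total.
all_probesets = [f"probe_{i}" for i in range(2000)]
ranked = list(all_probesets)

presented, key = build_blinded_list(ranked, all_probesets)
print(len(presented))
```

Keeping the origin labels in a separate key, rather than alongside the names shown to the judges, is what keeps the scoring blind.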
Judges searched for each probeset in the ABA using the gene symbol and name. If no gene
symbol or synonym could be found, the probeset was scored as absent. For probesets present in
the ABA, judges first assessed overall quality of the in situ hybridization (ISH). If the ISH had
no detectable signal or was of low quality for the given gene, the gene was scored as “U”
(Unscorable).
For probesets not scored U, judges evaluated potential expression in the four cell types. For
each cell type the judges could assign one of three scores, “S” (Specific for cell-type within
region), “E” (Expressed in cell type), and “N” (clearly Not expressed). Detailed heuristics for
each are:
S: In situ must be of very good quality and show clear signal in cell type of interest that is at least 3 color levels
with the Allen ‘expression viewer’ above any other cells in the same region.
E: In situ shows expression in cell type of interest but overall signal is weak or there is clear signal in
surrounding cells as well. In situ may be moderate or good quality.
N: In situ must be of very good quality and clearly have 1) no signal in cell-type of interest and 2) very good
signal somewhere else in tissue.
As cell type is difficult to assign from colorimetric ISH alone, for each cell type, the pattern assayed was:
Purkinje Cells: ISH pattern in cerebellum with evenly spaced large cells in the PCL.
Motor Neurons: ISH pattern in brain stem in large cells at the approximate locations of the 3rd, 5th and 7th motor
nuclei.
Layer V Cortical Neurons: A laminar ISH pattern in cortex at approximately the position of layer 5, with, at
most, labeling in one other layer.
Oligodendrocytes: Strong specific ISH pattern in the corpus callosum. Scattered labeling in cortex also
permitted. Note - color criteria for ‘S’ had to be relaxed as oligodendrocytes are often too small to be recognized as
cells by ABA expression viewer.
For the second round of SENU analysis, two additional cell types were added, Granule cells,
and Cortical Interneurons. For each of the six cell types, one hundred and fifty ISH were scored,
fifty each from the top two hundred and fifty of IP/Total, SI, and random lists. If multiple ISH
sets were available for the same gene, only the most recent sagittal ISH set was used. Heuristics
for an ISH pattern consistent with expression in granule cells or interneurons are:
Granule cells: Clear expression exclusively in granule cell layer of cerebellum, in at least 50% of the cells.
Cortical interneurons: Scattered, non-laminar expression in the cortex, with a cell number in the range between
two reference ISH patterns, Cort, and the GABA transporter Slc32a1.
For the third round of SENU analysis, all remaining cell types were evaluated (glial cells
were only scored in cerebellum). For each cell type, all genes with SI p-values < .00001 were
scrambled with an equal number of randomly selected genes and up to forty genes per cell line
were scored blindly as above, using the driver gene ISH as a reference pattern. It is worth
noting, however, that several cell types had difficult-to-interpret ISH patterns (Cck), lacked
appropriate signal even for the driver (Grp), represented small and scattered cells (Olig2,
Aldh1L1), or were found in very cell-dense regions (Neurod1). For many of these cell types,
inter-rater reliability was correspondingly lower.
Immunofluorescence
Adult mice were perfused transcardially with PBS followed by 4% paraformaldehyde in
PBS, cryoprotected in 30% sucrose PBS, frozen, and sliced to 40 microns on a cryostat. Floating
sections were blocked with 5% normal donkey serum in 0.25% Triton X100 PBS and incubated
overnight with chicken anti-GFP antibody (Abcam, Cambridge, MA), and/or antibodies against Grm1 (AB1551,
Chemicon, Temecula, CA) and Calb2 (6B3, Swant, Bellinzona, Switzerland), then incubated for ninety
minutes with appropriate Alexa-conjugated secondary antibodies (Invitrogen, Carlsbad, CA),
and counterstained with DAPI. Images were acquired with a Zeiss LSM 510 inverted confocal
microscope.
RESULTS
Dataset
The microarray data employed for these studies are from a published survey of CNS cell types
generated with the TRAP methodology (1). This dataset contains samples representing a variety
of pure and mixed cell types from different structures of the mouse brain, as well as samples
from the corresponding whole tissue (Table 1). The purified samples are referred to as
immunoprecipitates (IP). In parallel, RNA which did not bind to the antibody was also harvested
to provide an assessment of the gene expression of the tissue as a whole. These samples are
referred to as unbound RNA. Microarray analysis, as traditionally applied to the nervous system,
results in samples that are most similar to unbound samples. As the immunoprecipitation does
not lead to significant depletion of cell-specific RNAs, here we use the unbound samples as a
measure for the total tissue homogenate RNA (referred to as Total).
IPvTotal plots
As an initial assessment of TRAP data, described in detail below, we generated scatterplots
with the log signal intensity for the IP on the X axis and the Total on the Y axis (IPvTotal plot)
for each cell population. Systematic examination of these plots revealed they could be used for
quick visual assessment of the quality of the TRAP experiment, particularly for the level of nonspecific background (Figure 1, Supplemental Figure 2). We first applied these plots for
assessment of different metrics of normalization (Supplemental Figure 3, and Materials). It also
became apparent that these plots may indicate the rarity and/or uniqueness of the cell type within
its tissue (Supplemental Figure 4 and Materials). Finally, we assessed IP/Total as a measure to
identify those RNAs that may be specific or enriched in a given cell type.
Figure 1a shows an example of this plot for Purkinje cells. A list of genes known from the
literature to be glial specific (and thus not in Purkinje neurons) has been marked in red, and a
variety of genes determined from in situ hybridization databases(11,12) to be highly expressed in
Purkinje cell layer, including the driver for this mouse line, Pcp2, have been marked in blue
(Supplemental Table 1). From this plot it is clear that RNAs known to be enriched in Purkinje
cells have high ratios of IP/Total. Genes that are known not to be expressed in Purkinje cells,
such as those that are specific to glia, are highly enriched in the Total RNA. They have low
IP/Total ratios. Based on the locations of these positive and negative controls, we have
developed a heuristic for the interpretation of IPvTotal plots, illustrated in Figure 1b. Essentially,
from the top left corner of the plot to the bottom right, one has increasing confidence, first that
the RNA derives from the targeted cell type, and then that it is highly enriched in that type. Note
that probesets with low signal (bottom left corner) should be considered with caution, as they
tend to have higher variability(13).
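The IP/Total comparison that underlies these plots can be sketched in a few lines. The gene names and signal values below are invented placeholders, not measurements from the dataset, and `log2_ratio` is a hypothetical helper.

```python
import math


def log2_ratio(ip, total):
    """log2(IP/Total): positive means enriched in the IP, negative means depleted."""
    return math.log2(ip / total)


# Hypothetical normalized signals as (IP, Total) pairs.
signals = {
    "enriched_gene": (12000.0, 1500.0),   # high in IP: candidate cell-type gene
    "ubiquitous_gene": (900.0, 900.0),    # similar in IP and Total
    "glial_like_gene": (150.0, 2400.0),   # high in Total only: likely background
}

ratios = {gene: log2_ratio(ip, total) for gene, (ip, total) in signals.items()}

# Rank from most enriched to most depleted in the IP.
ranked_by_enrichment = sorted(ratios, key=ratios.get, reverse=True)
print(ranked_by_enrichment)
```

Working in log2 space makes enrichment and depletion symmetric around zero, which matches reading the scatterplot along its diagonal.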
IP/Total for identification of enriched genes
As previously shown, if an RNA is specifically translated in the targeted cell type within a
tissue, it should have a very high IP/Total ratio (1). As an independent, qualitative measure of
the expression of specific mRNAs within a cell of interest, we compared our data to in situ
hybridization (ISH) data from the Allen Brain Atlas (ABA)(11). Since it is often difficult to
establish cell identity by ISH data alone, we chose for this first comparative study four cell types
that are relatively simple to identify by size and localization in colorimetric ISH (brainstem motor
neurons, cerebellar Purkinje cells, layer 5 cortical pyramidal cells, oligodendrocytes). For each
cell type, a list of fifty ‘high IP/Total’ probesets was selected at random from the top five
hundred probesets, as ranked by IP/Total. Many of these mRNAs are only moderately enriched:
minimum IP/Total ratios range from around two (motor neurons, layer V cortical neurons) to
around four (Purkinje cells). For comparison, an additional 50 probesets were selected at random
from the array, and scrambled with the list above. These lists were then presented to three
blinded judges and the ISH for all genes were scored as specific (S), expressed (E), clearly not
expressed (N), or unscorable (U) in the cell type of interest. Figure 2a shows examples of S, E,
N, and U scores for brain stem motor neurons. After excluding the ISH scored U, probesets
for genes with high IP/Total were highly enriched by ISH in the cell type of interest (S), and less
likely to appear not expressed (N) than the random list of fifty genes (Chi square, p < .0005 for
each cell type). Typically, probesets with high IP/Total ratios were three to four times more
likely to be scored S than random genes (Figure 2b). Although this analysis demonstrated that
TRAP analysis results are concordant with the easily scored ISH data, the level of enrichment
varied substantially between the cell types assessed. Given this fact, and the many factors that
limit the utility of ISH data for detection of cell specific changes in gene expression in complex
tissues, we sought to develop an independent method for the quantitative measurement of the
specificity of expression of any gene in a given cell type or condition relative to a large number
of other cell types using comparative analysis of TRAP data from a variety of specific CNS cell
types.
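The Chi square comparison of the high IP/Total list against the random list can be illustrated with a 2x2 table. The text does not state the exact table layout or the software used, so the layout (list origin by S versus not S), the counts, and the helper `chi_square_2x2` are all assumptions made for this sketch.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts after excluding U scores:
#                    scored S   not scored S
# high IP/Total         20          25
# random                 5          40
chi2, p_value = chi_square_2x2(20, 25, 5, 40)
print(chi2, p_value)
```

A table with more score categories (for example S, E and N as separate columns) would need the general chi-square distribution rather than the 1-degree-of-freedom shortcut used here.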
The Specificity Index to identify cell-specific and enriched genes
As described above, the IP/Total metric can be used as a simple method to suggest cell-specific and enriched genes. However, there are three drawbacks to the method. First, there are
cell types where logically it would be ineffective, such as granule cells of cerebellum or medium
spiny neurons of striatum. Over 90% of the cells in the cerebellum are granule cells(14). As
such, a comparison of a granule cell IP to Total cerebellum will yield little enrichment of granule
cell genes, as shown in Supplemental Figure 4b. In contrast, comparison of the granule cell IP
data to the IP data obtained from Purkinje cells clearly reveals a high enrichment of the granule
cell driver gene, Neurod1 (Supplemental Figure 4c). This demonstrates that the granule cell IP
was robust, and illustrates the value of comparative analysis of TRAP derived from specific cell
types. Likewise, comparison of the Drd1a+ or Drd2+ medium spiny neurons to Total striatum,
which is made primarily of medium spiny neurons, will identify very few striatally enriched
genes(2). The second drawback is that a comparison of IP to Total will only yield information
about enrichment relative to one particular dissected structure, and not the rest of the brain. To
accurately determine the suite of cell-specific genes, one needs to make multiple comparisons
across all available cell types and structures. Finally, IP/Total alone does not give a sense of how
likely a particular ratio is to appear by chance, and at what threshold a gene should be considered
enriched. Indeed, from the four cell types scored above, there were clear differences in fraction
of specific genes found in the top five hundred IP/Total (Figure 2b).
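The first drawback can be expressed as a simple bound. As a sketch, assume that the Total signal for gene $n$ is an RNA-weighted average over cell types, that the IP recovers the level in the targeted cell type exactly, and that $f_c$ is the share of tissue RNA contributed by cell type $c$ (which need not equal its share of cells). These symbols are introduced here for illustration only.

```latex
\[
  \mathrm{Total}_n = \sum_{c} f_c \, x_{c,n}, \qquad \sum_{c} f_c = 1 .
\]
For a gene expressed only in the targeted cell type ($c = 1$):
\[
  \frac{\mathrm{IP}_{1,n}}{\mathrm{Total}_n}
  = \frac{x_{1,n}}{f_1 \, x_{1,n}}
  = \frac{1}{f_1} .
\]
```

Under these assumptions, a cell type contributing 90% of the tissue RNA ($f_1 = 0.9$) cannot exceed an IP/Total of about 1.1 even for a perfectly specific gene, whereas a cell type contributing 1% could in principle reach 100.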
To overcome these problems, we developed a generic algorithm, the Specificity Index (SI),
to assess the specificity of a given RNA in one sample relative to all other samples analyzed.
For each cell type, the SI is calculated in three steps, as illustrated in Figure 3a-e. First, following
GCRMA normalization within replicates and global normalization across samples, the IP was
compared to the Total to filter out the non-specific background by setting a simple threshold
based on negative controls (Supplemental Figure 5, and Materials). For those cell types known
to have significant background contamination, this threshold was left at one, so as to not filter
too many probesets and create false negatives. Probesets with low signal were also removed,
following standard practice with microarray data. Second, for the remaining probesets, this
filtered IP was iteratively compared to each other (unfiltered) sample in the dataset and a ratio
was calculated for each probeset. To prevent extreme outliers from skewing the subsequent
analysis, and to make the analysis more robust for difficult to normalize datasets, the probesets
were ranked from highest to lowest ratio within each comparison. Third, for each probeset, its
ranks across all comparisons are averaged to give the SI. Thus, the SI is a measure of the
specificity of expression for each probeset in a given cell type relative to all other cell types
included in the analysis: how highly ranked on a gene list is this probeset, on average, in this
cell type compared to all others.
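The ratio, rank and average steps described above can be sketched as follows. This is not the authors' R code: the function names and toy expression values are hypothetical, the set of probesets passing the background and low-signal filters is assumed to be supplied by the caller, and ties are broken by input order.

```python
def rank_descending(values):
    """Return 1-based ranks; the largest value receives rank 1."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for position, idx in enumerate(order, start=1):
        ranks[idx] = position
    return ranks


def specificity_index(target_ip, other_samples, keep):
    """Average rank of each kept gene's target/other ratio across all other samples.

    target_ip:     dict gene -> expression in the cell type of interest
    other_samples: list of dicts gene -> expression (unfiltered)
    keep:          genes that passed the filtering step
    """
    genes = [g for g in target_ip if g in keep]
    rank_sums = dict.fromkeys(genes, 0)
    for sample in other_samples:
        ratios = [target_ip[g] / sample[g] for g in genes]
        for gene, rank in zip(genes, rank_descending(ratios)):
            rank_sums[gene] += rank
    return {g: rank_sums[g] / len(other_samples) for g in genes}


# Toy data: gene "d" is assumed to have failed the low-signal filter.
target = {"a": 1000.0, "b": 200.0, "c": 300.0, "d": 40.0}
others = [
    {"a": 100.0, "b": 200.0, "c": 600.0, "d": 400.0},
    {"a": 50.0, "b": 400.0, "c": 300.0, "d": 80.0},
]
si = specificity_index(target, others, keep={"a", "b", "c"})
print(si)
```

Because rank 1 is assigned to the highest ratio, a smaller SI in this sketch means the gene is more specific to the target cell type.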
Validation of Specificity Index and comparison to IP/Total
To determine if the SI succeeds in selecting cell specific genes in those cases where IP/Total
comparisons fail, we first examined the expression of genes predicted by each method to be
translated in granule cells. Figure 4a shows a comparison of eGFP immunohistochemistry for
GENSAT BAC transgenics (15) for two genes selected by IP/Total and two selected for a high
SI. The genes selected by SI clearly have an expression pattern that is more consistent with
highly enriched expression in cerebellar granule cells: labeling of many cell bodies in the
cerebellar granule cell layer, with fibers filling the molecular layer, where granule cell axons
project.
To determine how effectively the SI performs in general compared to IP/Total in
selecting cell specific and enriched genes, we repeated our SENU analysis of ABA ISH patterns
for one hundred and fifty probesets for each of six cell types (Figure 4b). Fifty probesets were
chosen randomly each from the top two hundred and fifty probesets of SI and IP/Total, as well as
fifty random probesets from the array. These were scrambled and scored by three blinded
judges, as above. As before, Chi square tests revealed TRAP data were performing significantly better
than chance at predicting specific gene expression (p<.01 to p<10^-99, across either metric in
each cell type). As expected, SI outperforms IP/Total for those cases where the TRAPed cell
type makes up a significant fraction of the total, such as Neurod1 positive granule cells. Quite
surprisingly SI also out-performed IP/Total with Purkinje cells (Cb.Pcp2), cortical
oligodendrocytes (Ctx.Cmtm5) and cortical interneurons (Ctx.cort). In the worst case, that of
layer V cortical projection neurons (Ctx.Glt25d2), IP/Total and SI both yielded approximately
50% more specific patterns than expected by chance. There were no cell types
where IP/Total clearly performed significantly better than SI. Thus we determined that the SI is
a useful and robust metric for identifying cell specific and enriched genes.
The Specificity Index as a Statistical Measure
The SI is influenced by both the variations in the number of transcripts that are enriched in
each cell type being analyzed, and the purity and recovery of TRAP mRNA collected for each
cell type. The range of the rankings is dependent on the number of probesets in the comparison,
and that number depends on the number of genes expressed and the level of filtering in each
particular cell type. Consequently, raw SI values are not directly comparable across cell types.
In addition, the SI alone does not provide a sense of how likely a given rank is to occur by
chance. Therefore, for each SI we calculate a p-value via permutation testing as illustrated in
Supplemental Figure 6a: for each IP, the filtered expression values are randomly shuffled many
times and SIs are calculated for all probesets, to determine the frequency of a particular SI value
appearing. This creates a simulated probability distribution. The probability of any given SI from
the true distribution can be assigned from the simulated distribution. Thus one can derive a list of
genes that are significantly specific to, or enriched in, any particular cell type, with a known
probability (Supplemental Figure 6b). We note that for each cell type, the number of genes that
reach a given statistical threshold is different. However, since these probabilities are comparable
across cell types, they can be plotted to permit assessment of the specificity for a given probeset
across all cell types analyzed, as illustrated for the granule cell driver Neurod1 in Figure 4c.
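The permutation procedure can be sketched as below. It is an illustration under stated assumptions rather than the authors' implementation: the expression values of the target IP are shuffled across genes, SI is recomputed for every gene, and all shuffled SIs are pooled into one null distribution. The function names, toy data, number of permutations, and the add-one correction (a common convention that avoids p-values of exactly zero) are assumptions.

```python
import random
from bisect import bisect_right


def rank_descending(values):
    """Return 1-based ranks; the largest value receives rank 1."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for position, idx in enumerate(order, start=1):
        ranks[idx] = position
    return ranks


def specificity_index(target, others):
    """Average rank of each gene's target/other ratio across the other samples."""
    genes = list(target)
    sums = dict.fromkeys(genes, 0)
    for sample in others:
        ratios = [target[g] / sample[g] for g in genes]
        for gene, rank in zip(genes, rank_descending(ratios)):
            sums[gene] += rank
    return {g: sums[g] / len(others) for g in genes}


def si_p_values(target, others, n_perm=200, seed=0):
    """Fraction of shuffled SIs that are at least as small as each observed SI."""
    rng = random.Random(seed)
    genes = list(target)
    observed = specificity_index(target, others)
    values = [target[g] for g in genes]
    null = []
    for _ in range(n_perm):
        shuffled = values[:]
        rng.shuffle(shuffled)
        null.extend(specificity_index(dict(zip(genes, shuffled)), others).values())
    null.sort()
    # bisect_right counts null SIs <= observed; smaller SI means more specific.
    return {g: (bisect_right(null, observed[g]) + 1) / (len(null) + 1) for g in genes}


# Toy data: gene g0 is far higher in the target IP than in any other sample.
genes = [f"g{i}" for i in range(20)]
target = {g: 100.0 for g in genes}
target["g0"] = 5000.0
others = [{g: 100.0 + i for i, g in enumerate(genes)} for _ in range(3)]

p_values = si_p_values(target, others)
print(min(p_values, key=p_values.get))
```

With only twenty genes the smallest attainable p-value in this toy example is about 0.05; thresholds as low as those used in the text require an array-sized gene list and many more pooled null values.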
To determine whether the SI is an accurate relative measure of the specificity of expression
of each gene relative to all others for the cell types analyzed, we next performed a post-hoc
analysis of our judges’ ratings in the SENU analysis pooled across all six cell types from Figure
4b. For p-values of < .00001, over 75% of scorable ISH were scored ‘Specific,’ compared to
approximately 15% of those p >.1 (Figure 4d). Even with extensive training in detailed
heuristics and blind scoring there is substantial subjectivity in the interpretation of ISH, and only
55% of the nine hundred ISH had identical scores from all three judges (for 95% however, at
least two judges agreed on the score). Of these ISH on which all three judges agreed, 100% of
the genes with p < .00001 were scored as specific (not shown). This analysis provides a
potential heuristic for the interpretation of various SI p-values for a gene across cell types: while
any p-value <.1 suggests some enrichment, as p-values continue to decrease, enrichment
increases until the majority, if not all, of the genes at extremely low p-values are highly specific (Figure
4d).
Finally, to generalize this finding to all remaining cell types, we examined the ISH pattern
for all cell types for those genes with p<.00001. This represented a challenge as most of these
cell types cannot be unambiguously identified by position information alone. For each cell type,
the p<.00001 genes were scrambled with an equal number of randomly selected genes and up to
forty genes per cell type were scored blindly by three judges. For this analysis, genes were
scored as specific if their ISH pattern matched that of the driver for the TRAP line. Across
nearly all cell types most of these p<.00001 genes had patterns consistent with specific
expression in the correct cell types (Figure 5a), representing in all cases a highly significant
enrichment relative to randomly selected genes (Chi square p< .0005 to p<10^-21).
However, for three cell types, SI did not perform well at predicting ISH patterns, and we will
discuss these briefly because they are each illustrative of an important point regarding this
analysis (Figure 5b). First, for the ISH patterns for genes from the line Etv1, which expresses the
eGFP-L10a transgene in layer 5b projection neurons, over 70% of the p<.00001 genes were specific to
blood vessels. This strongly suggests that the bacTRAP construct is also expressed in
endothelial cells or some component of the blood in this line. This illustrates the point that
careful anatomical characterization of TRAP lines is essential. Minor contamination by rare cell
types will be very apparent following SI analysis, and confirmation with ISH databases.
Second, for the Cck TRAP line, which is expressed broadly in multiple layers of cortex and
in both pyramidal cells and interneurons, we observed no significant enrichment for specific ISH
patterns (Chi square, p=.3). We believe this reflects the fact that this line includes so many
neuron types that nearly any gene expressed in neurons will be present in the IP. This illustrates
the difficulty in assessing ISH results for a TRAP driver that is broadly expressed.
Third, in the Cb.Grp data, representing a mix of unipolar brush cells (UBC) and Bergmann
glia, SI identified genes did not show enrichment for specific ISH patterns (Chi Square p =.14).
As Bergmann glia are represented in both the Cb.Aldh1L1 and Cb.Sept4 datasets, the most
specific genes for these data should come from the UBCs, a small excitatory interneuron found
primarily in the granule cell layer of the posterior lobules of the cerebellum(16). However, since
even the driver, Grp, did not have a specific ISH pattern, we were suspicious that ISH may have
reduced sensitivity for detecting messages in this scattered population of small neurons in the
cell dense cerebellar granule layer. Some SI identified genes, such as Nmb, did show a scattered
precipitate in lobule X of the cerebellum, but the particles were too small to be clearly
identifiable as cells by our judges. To provide an independent dataset, we examined the
GENSAT database(15) for the SI identified genes in the 5 lines for which adult data were
available online (Grp, Nmb, Ntf3, Otx2, and Eomes). Three of these five lines clearly expressed
GFP in cells with the distinct morphology and position of UBCs in the online database. We
further confirmed, in the Nmb line, that these were indeed UBCs by confocal triple
immunofluorescence for GFP and the UBC markers Calb2 and Grm1 (Figure 5c)(17).
In general, we note that, despite the concordance between the TRAP data and ISH results for
genes whose expression is easily detected by ISH, a significant fraction of the genes determined
to be enriched in a specific cell type by TRAP analysis could not be scored from the ISH data
(Figure 2b). RT-PCR on genes without detectable signal (U) by ISH reveals that in these cases
the RNA is indeed present in the brain, and enriched in the TRAP samples from the cell types of
interest (1). This is not surprising, since successful ISH is dependent on many factors, including
expression level of the gene, hybridization kinetics of the probe, and availability of unique
sequence for probe design. We conclude that negative results on ISH should be interpreted with
caution. Of course, for any specific case, differences in ISH patterns and TRAP measurements
could also indicate there is a difference between transcription and translation of a given gene.
Finally, the mixed oligodendrocyte data (Olig2 line) only showed 43% specific ISH patterns.
While this is still significantly better (Chi Square, p <10^-5) than the 12% identified by chance,
we were curious if this reflected the fact that of the twenty six cell types included in the SI
calculation, at least 4 contained information primarily from oligodendrocytes (ctx.olig2, cb.olig2,
ctx.cmtm5, cb.cmtm5). Thus, we repeated the SI analysis on our data set, but excluded three of
these four samples collected from oligodendrocytes so only one, unique oligodendrocyte sample
remained. As shown in Figure 6, this resulted in two major effects: first, as the oligodendrocyte
data became more unique in the analysis, there were now twice as many genes with p<.00001 by
SI; second, when the ISH patterns for these genes were scored blindly as above, 70% of them showed
specific ISH patterns. This demonstrates that SI is a relative measure that is influenced by
the composition of the entire dataset, and that one should carefully consider which datasets to
include for the specific experimental question being addressed.
An archive of SI for all genes
To provide a resource to permit researchers to examine the specificity of the translation of
any gene across all cell types included in this analysis, we have created SI plots for all genes on
the array using updated chip definition files (18) that provide one measure per ENTREZ gene ID
(19) (Supplemental Figure 7). Figure 7 illustrates Specificity Index p-values, as well as IP/Total
values for 6 representative genes, across the 24 cell populations from the original studies. This
includes examples for genes known to be enriched in a few cell types (Slc18a3, the vesicular
acetylcholine transporter, in cholinergic cells, Dlx1 in interneurons); metabolic genes expressed
ubiquitously, though not equally, across the brain (Actb, Rpl8); and two genes implicated in
autism, Nrxn2 and Nrxn1, which show broad but variable expression, or enrichment in a limited
set of cortical neurons and granule cells, respectively. SI plots for all genes are available as a
downloadable archive from: (Heintz lab website). Simply browsing through images can
highlight remarkable biology. For example, the GalNAc transferase family is a group of golgi
apparatus enzymes that catalyze the addition of oligosaccharides to protein receptors destined for
the cell surface. Supplemental Figure 8 shows plots for six members of this family,
Galnt2,3,4,6,14,L2, five of which show remarkable cellular specificity to either oligodendrocyte
progenitors, astroglia, mature oligodendrocytes, layer 5 cortical projection neurons, or granule
cells. As many of these enzymes have affinities for distinct donors and acceptors (20), cell-specific expression of these proteins may result in distinct cell surface moieties.
DISCUSSION
We present here a set of analytical procedures that have been developed for analysis of
TRAP translational profiling data. These approaches are specifically designed to accommodate
features of TRAP translational profiling data that arise from the cell specific nature of the TRAP
data, and to provide a robust framework for comparative analysis of data obtained from large
numbers of cell types. In particular, we report the development of a Specificity Index to provide
a relative and quantitative measure for the specificity of expression of all genes across the cell
types being studied. In general, results of this analysis are concordant with easily evaluated ISH
data from the Allen Brain Atlas(11). However, our data also indicate that TRAP translational
profiling can reveal cell enriched expression for a large number of genes and cell types that are
not easily assessed by ISH.
The Specificity Index
The impetus for the development of the Specificity Index was to accommodate the facts that
there are dramatic differences in mRNA profiles between different cell types, and that it was
evident that a new method for comparative and quantitative analysis of TRAP data was needed.
Although analysis with standard tools for identifying statistically significant differences between
samples also can apply to TRAP data (21-25), comparisons of widely divergent cell types, such
as astrocytes and Purkinje neurons, result in greater than 60% of probesets reaching statistical
significance (p<.05) using the empirical Limma (24) module of Bioconductor (25) with FDR
multiple testing correction. This number of statistically significant changes demonstrates the
limited utility of such methods for selecting small numbers of targets for biological follow up
studies from such dramatically different cell types. The Specificity Index we have described here is
robust, and uses a permutation based statistical approach to compensate for any irregularities in
the distributions of the data, allowing direct comparison of p-values across samples with quite
varied distributions. As shown above, this measure provides results consistent with published
data, and independent assays of gene expression provided in the Allen Brain Atlas. However, the
Specificity Index is clearly dependent on the number and nature of the samples included in the
analysis (Figure 6). Consequently, the design of the Specificity Index analysis should be tailored
to the biological question at hand. However, as we anticipate that many researchers will be
interested primarily in the output of this analysis, rather than its implementation, we provide an
archive of Specificity Index histograms for all genes across all cell types included in this study to
permit in silico interrogation of cell-specific and enriched mRNA translation.
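The calculation summarized in Figure 3 can be illustrated with a minimal sketch. It assumes the data have already been normalized and filtered, and it uses one plausible permutation scheme (shuffling the target cell type's values across probesets) that may differ from the procedure in Supplemental Figure 6a. The function names, the toy data, and the number of permutations are illustrative assumptions only.

```python
import bisect
import random

def specificity_index(expr, target):
    """expr: list of rows (one per probeset) of positive expression values,
    one column per cell type. Returns, for each probeset, the mean rank of
    its target/other fold change (rank 1 = highest fold change)."""
    n_probes = len(expr)
    others = [j for j in range(len(expr[0])) if j != target]
    rank_sums = [0.0] * n_probes
    for j in others:
        fold = [row[target] / row[j] for row in expr]  # IP/IP comparison
        order = sorted(range(n_probes), key=lambda i: fold[i], reverse=True)
        for rank, i in enumerate(order, start=1):
            rank_sums[i] += rank
    return [s / len(others) for s in rank_sums]

def si_pvalues(expr, target, n_perm=100, seed=0):
    """Assumed permutation scheme: shuffle the target column across
    probesets, recompute SI, and pool the results as a null distribution."""
    rng = random.Random(seed)
    observed = specificity_index(expr, target)
    target_values = [row[target] for row in expr]
    null = []
    for _ in range(n_perm):
        shuffled_values = target_values[:]
        rng.shuffle(shuffled_values)
        shuffled = [row[:target] + [v] + row[target + 1:]
                    for row, v in zip(expr, shuffled_values)]
        null.extend(specificity_index(shuffled, target))
    null.sort()
    # A low SI means high specificity, so p is the fraction of null SI <= observed.
    pvals = [(bisect.bisect_right(null, si) + 1) / (len(null) + 1)
             for si in observed]
    return observed, pvals

# Toy data: 200 probesets x 5 cell types; probeset 0 is made specific to type 0.
rng = random.Random(1)
expr = [[rng.lognormvariate(0, 1) for _ in range(5)] for _ in range(200)]
expr[0] = [1e6, 1.0, 1.0, 1.0, 1.0]
si, p = si_pvalues(expr, target=0)
```

In this toy example the artificially specific probeset receives the lowest possible SI and the smallest p-value. Removing or adding columns of `expr` changes every rank, which is one way to see why the outcome depends on the composition of the dataset.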
TRAP compared to In Situ Hybridization, Immunohistochemistry, and BAC Transgenesis
for assessment of gene expression
The TRAP methodology is complementary to other methods of examining gene expression
and protein translation in the CNS, though it has several distinct advantages. ISH and
immunohistochemistry both require the laborious development and optimization of gene specific
reagents, and depending on the size, location, expression level, and subcellular localization of the
target, may not provide sufficient information to unambiguously identify the cell type labeled.
BAC Transgenesis with an eGFP transgene provides comprehensive information about
morphology and projections of the labeled cells, as well as a living reagent for further study(3),
but requires a substantial time investment. Of the four methods, only TRAP can provide, in a
single experiment, interrogation of the entire translated genome. TRAP data is quantitative, with
the highest potential sensitivity and dynamic range. However, for the examination of a single
gene across the entire CNS, ISH, immunohistochemistry, or BAC transgenesis will remain the
method of choice.
Conclusions
A variety of different tools have been generated for the analysis of microarray data (13,21-23). For those interested in developing or applying other array analysis methods to TRAP data,
it is advisable to first test those methods on our most robust datasets, such as the Purkinje cell
data, where a variety of positive and negative control genes can be used as standards
(Supplemental Table 1), and the cells can be more easily identified by ISH. Furthermore, for
most experiments, there are two important considerations for data analysis: first, quantile
normalization should only be applied to samples that are expected to have similar mRNA
distributions (from the same cell type or region; see Supplemental Figure 3 and Materials);
second, comparisons of IPvTotal data can be used to remove non-specific background prior to
IPvIP comparisons, regardless of the source of this background (Supplemental Figures 2, 5, 9 and
Materials). Ongoing improvements in the molecular methodology are likely to remove most of
the non-specific background deriving from interaction of purification reagents with untagged
ribosomes seen in this first survey, which may in many cases make the filtering steps for calculating SI
unnecessary in the future, though this filtering approach may still be applicable for dealing with
low level expression of the transgene in cell types of secondary interest to the study. To aid in
this normalization, lists of recommended negative control probesets are included in
Supplemental Table 2 (although any genes known not to be expressed in the cell type of interest
may be used). Standard statistical methods (21-25) remain essential for detecting more subtle
differences, such as the changes within a single cell type following exposure of the animal to a
drug (2).
Given the improved sensitivity and anatomic specificity obtained using TRAP and related
methodologies, we anticipate wide application of these methodologies for gene expression
studies in the mouse nervous system. The methods outlined here provide analytical tools for
those researchers employing these methodologies, as well as those interested in mining
published TRAP datasets for cell specific and enriched mRNAs. Continued experimental and
analytical developments will enhance the value of the methods and data provided here.
Nonetheless, this methodology provides a systematic approach to identifying the expressed genes that
determine the unique properties of specific neural cell types, as well as candidate genes to
serve as markers and pharmacological targets.
FUNDING
This work was funded by the Howard Hughes Medical Institute, the Adelson Program in
Neural Rehabilitation and Repair, the Simons Foundation, the Conte Center (NIH/NIMH
5P50MH074866 P2), and the Croll Charitable Trust.
ACKNOWLEDGEMENTS
We would like to thank P. Greengard, J.P. Doyle, J.C. Earnhart, S. Gayawali, M. Heiman, S.
Kriaucionis and members of the Geschwind and Heintz Laboratories for discussions and advice,
and R. Shah for blinded scoring of ISH data. We would also like to thank The Rockefeller
University Bioimaging facility, and Genomics Resource Center.
REFERENCES
1. Doyle, J.P., Dougherty, J.D., Heiman, M., Schmidt, E.F., Stevens, T.R., Ma, G., Bupp, S., Shrestha, P., Shah, R.D., Doughty, M.L. et al. (2008) Application of a translational profiling approach for the comparative analysis of CNS cell types. Cell, 135, 749-762.
2. Heiman, M., Schaefer, A., Gong, S., Peterson, J.D., Day, M., Ramsey, K.E., Suarez-Farinas, M., Schwarz, C., Stephan, D.A., Surmeier, D.J. et al. (2008) A translational profiling approach for the molecular characterization of CNS cell types. Cell, 135, 738-748.
3. Yang, X.W., Model, P. and Heintz, N. (1997) Homologous recombination based modification in Escherichia coli and germline transmission in transgenic mice of a bacterial artificial chromosome. Nat Biotechnol, 15, 859-865.
4. Sandberg, R., Yasuda, R., Pankratz, D.G., Carter, T.A., Del Rio, J.A., Wodicka, L., Mayford, M., Lockhart, D.J. and Barlow, C. (2000) Regional and strain-specific gene expression mapping in the adult mouse brain. Proc Natl Acad Sci U S A, 97, 11038-11043.
5. Geschwind, D.H. (2000) Mice, microarrays, and the genetic diversity of the brain. Proc Natl Acad Sci U S A, 97, 10676-10678.
6. Ingolia, N.T., Ghaemmaghami, S., Newman, J.R. and Weissman, J.S. (2009) Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science, 324, 218-223.
7. Arlotta, P., Molyneaux, B.J., Chen, J., Inoue, J., Kominami, R. and Macklis, J.D. (2005) Neuronal subtype-specific genes that control corticospinal motor neuron development in vivo. Neuron, 45, 207-221.
8. Cahoy, J.D., Emery, B., Kaushal, A., Foo, L.C., Zamanian, J.L., Christopherson, K.S., Xing, Y., Lubischer, J.L., Krieg, P.A., Krupenko, S.A. et al. (2008) A transcriptome database for astrocytes, neurons, and oligodendrocytes: a new resource for understanding brain development and function. J Neurosci, 28, 264-278.
9. Sugino, K., Hempel, C.M., Miller, M.N., Hattox, A.M., Shapiro, P., Wu, C., Huang, Z.J. and Nelson, S.B. (2006) Molecular taxonomy of major neuronal classes in the adult mouse forebrain. Nat Neurosci, 9, 99-107.
10. Emmert-Buck, M.R., Bonner, R.F., Smith, P.D., Chuaqui, R.F., Zhuang, Z., Goldstein, S.R., Weiss, R.A. and Liotta, L.A. (1996) Laser capture microdissection. Science, 274, 998-1001.
11. Lein, E.S., Hawrylycz, M.J., Ao, N., Ayres, M., Bensinger, A., Bernard, A., Boe, A.F., Boguski, M.S., Brockway, K.S., Byrnes, E.J. et al. (2007) Genome-wide atlas of gene expression in the adult mouse brain. Nature, 445, 168-176.
12. Magdaleno, S., Jensen, P., Brumwell, C.L., Seal, A., Lehman, K., Asbury, A., Cheung, T., Cornelius, T., Batten, D.M., Eden, C. et al. (2006) BGEM: an in situ hybridization database of gene expression in the embryonic and adult mouse nervous system. PLoS Biol, 4, e86.
13. Irizarry, R.A., Hobbs, B., Collin, F., Beazer-Barclay, Y.D., Antonellis, K.J., Scherf, U. and Speed, T.P. (2003) Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics, 4, 249-264.
14. Eccles, J.C., Ito, M. and Szentágothai, J. (1967) The cerebellum as a neuronal machine. Springer-Verlag, Berlin, New York [etc.].
15. Gong, S., Zheng, C., Doughty, M.L., Losos, K., Didkovsky, N., Schambra, U.B., Nowak, N.J., Joyner, A., Leblanc, G., Hatten, M.E. et al. (2003) A gene expression atlas of the central nervous system based on bacterial artificial chromosomes. Nature, 425, 917-925.
16. Mugnaini, E. and Floris, A. (1994) The unipolar brush cell: a neglected neuron of the mammalian cerebellar cortex. J Comp Neurol, 339, 174-180.
17. Nunzi, M.G., Shigemoto, R. and Mugnaini, E. (2002) Differential expression of calretinin and metabotropic glutamate receptor mGluR1alpha defines subsets of unipolar brush cells in mouse cerebellum. J Comp Neurol, 451, 189-199.
18. Dai, M., Wang, P., Boyd, A.D., Kostov, G., Athey, B., Jones, E.G., Bunney, W.E., Myers, R.M., Speed, T.P., Akil, H. et al. (2005) Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data. Nucleic Acids Res, 33, e175.
19. Maglott, D., Ostell, J., Pruitt, K.D. and Tatusova, T. (2007) Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res, 35, D26-31.
20. Wandall, H.H., Hassan, H., Mirgorodskaya, E., Kristensen, A.K., Roepstorff, P., Bennett, E.P., Nielsen, P.A., Hollingsworth, M.A., Burchell, J., Taylor-Papadimitriou, J. et al. (1997) Substrate specificities of three members of the human UDP-N-acetyl-alpha-D-galactosamine:Polypeptide N-acetylgalactosaminyltransferase family, GalNAc-T1, -T2, and -T3. J Biol Chem, 272, 23503-23514.
21. Li, C. and Hung Wong, W. (2001) Model-based analysis of oligonucleotide arrays: model validation, design issues and standard error application. Genome Biol, 2, RESEARCH0032.
22. Tusher, V.G., Tibshirani, R. and Chu, G. (2001) Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci U S A, 98, 5116-5121.
23. Sabatti, C., Karsten, S.L. and Geschwind, D.H. (2002) Thresholding rules for recovering a sparse signal from microarray experiments. Math Biosci, 176, 17-34.
24. Smyth, G.K. (2004) Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol, 3, Article3.
25. Gentleman, R.C., Carey, V.J., Bates, D.M., Bolstad, B., Dettling, M., Dudoit, S., Ellis, B., Gautier, L., Ge, Y., Gentry, J. et al. (2004) Bioconductor: open software development for computational biology and bioinformatics. Genome Biol, 5, R80.
26. Johnson, W.E., Li, C. and Rabinovic, A. (2007) Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics, 8, 118-127.
27. Cameron, R.S. and Rakic, P. (1991) Glial cell lineage in the cerebral cortex: a review and synthesis. Glia, 4, 124-137.
28. Perlson, E., Hanz, S., Ben-Yaakov, K., Segal-Ruder, Y., Seger, R. and Fainzilber, M. (2005) Vimentin-dependent spatial translocation of an activated MAP kinase in injured nerve. Neuron, 45, 715-726.
29. Kriaucionis, S. and Heintz, N. (2009) 5-hydroxymethylcytidine, a novel mammalian nuclear DNA base is present in brain and enriched in Purkinje neurons. Science, 324, 929-930.
30. Geschwind, D.H. and Gregg, J.P. (2002) Microarrays for the neurosciences: an essential guide. MIT Press, Cambridge, Mass.
Figure 1. Assessment of IPvTotal plots. a) Scatterplot of immunoprecipitated (Purkinje cells, IP) vs. unbound RNA
(from whole cerebellum, Total) provides a basic measure of experiment quality. RNA for non-Purkinje cell genes
(glial genes, red) are highly enriched in Total RNA, while RNAs determined to be in Purkinje cells (blue,
Supplemental Table 1) are enriched in IP RNA. b) Illustration of the interpretation of IPvTotal plots based on the
locations of positive and negative control genes.
Figure 2. High IP/Total can identify cell-specific genes. a) Examples of in situ patterns from the Allen Brain Atlas
scored as Specific, Expressed, Not Expressed, or Unscorable for brainstem motor neurons (Allen Mouse Brain Atlas
[Internet]. Seattle (WA): Allen Institute for Brain Science. ©2008. Available from: http://mouse.brain-map.org). b)
For each of four cell types, fifty from the top five hundred highest IP/Total ratio genes, and fifty random genes, were
scrambled together and scored blindly by three judges trained in the rubric illustrated in a. Genes with high ratio for
each cell type (gray bars) were more likely to be categorized as specific (center panel) and less likely to be
categorized as Not Expressed (right panel); p<.0005, chi-square test, all cell types. Genes with absent or unscorable ISH
patterns (b, left panel) were not included in analysis.
Figure 3. Illustration of algorithm for calculation of specificity index (SI) to identify cell specific and enriched
genes for a single cell type (Purkinje Cells, pink). a) SI is a comparative analysis, thus multiple bacTRAP
experiments are conducted for several classical cell types shown in this illustration from Cajal. b) Data from each
cell type are normalized and filtered to remove background, as illustrated in Supplemental Figure 5, prior to IP/IP
calculation. c) Normalized and filtered Purkinje Cell data are compared to each other cell type (IP/IP). For each
comparison (2..M), probesets are ranked from highest to lowest 'fold change.' SI for each probeset is calculated as
the average rank across all comparisons. d) A p-value is assigned to a given SI value via permutation testing, as
illustrated in Supplemental Figure 6a. e) A list of genes significantly enriched in Purkinje cells can be selected
based on p-value.
Figure 4. The Specificity Index provides a robust method for identifying cell specific and enriched mRNAs. a)
Specificity index performs better at selecting granule cell-specific mRNAs. Right panels: examples of GENSAT
eGFP expression patterns for two mRNAs with Specificity Index p<10^-5, but low IP/Total (<2 fold) show robust
expression in cerebellar granule cells. Left panels: examples of two mRNAs with high IP/Total (> 3 fold), but non-significant specificity indices show little expression in cerebellum. b) Blind scoring by three judges of Allen Brain
Atlas ISH for fifty random, fifty high IP/Total, and fifty high SI genes, across six cell types, reveals that SI generally
performs better than IP/Total in predicting specific ISH patterns, among those ISH that are scorable. c) Plot for
combined Specificity Index p values (blue bars, -log 10 scale) and IP/Total values (red bars, log 2) across all cell
populations for a probeset of Neurod1. Specificity index clearly identifies Neurod1 as a marker for granule cells
(p<10^-5) across all cell types. Axis on left shows log scale. Axes on right show corresponding p-values in blue, and
IP/Total ratio in red. Cell types are in same order, and with same abbreviations, as Table 1. d) Posthoc analysis
across all judges and cell types reveals that more than 75% of those genes with SI p values < 10^-5 are scored as
specific, compared to 15% of those with p > 0.1.
Figure 5. Specificity Index concurs well with ISH pattern for most cell types. a) A higher % of scorable ISH
patterns are scored as Specific for those genes with SI p<10^-5 for these cell types, relative to randomly selected genes
(chi-square tests, from p<0.0004 to p<10^-21). 'n' is the number of p<10^-5 genes for a given cell type. b) For three cell
types, ISH analysis did not show significant enrichment in specific genes by specificity index. Left panel: Etv1 data
are known to be contaminated with blood or endothelial cells, reflected here in the large fraction of Not Expressed
scores. Middle panel: Cck data are from a mix of many different cortical neuronal cell types, limiting
interpretability of both TRAP and ISH data. Right Panel: Unipolar Brush cells are difficult to identify by ISH. c)
Confocal immunofluorescence on GENSAT Nmb eGFP line reveals clear GFP expression in both Calb2+ and
Grm1+ unipolar brush cells(17). (Grm1 labels only the brush of these cells. Calb2 labels cytoplasm. Z stacks, not
shown, confirm all GFP+ cells are positive for either Grm1 or Calb2). Nmb was scored as "Not Expressed" by
two of three judges based on ISH alone, suggesting ISH may have limited sensitivity for some cell types.
Figure 6. Outcome of SI analysis depends on composition of dataset. When SI is calculated with only one
oligodendrocyte sample in the dataset (b), more genes arrive at p<10^-5 (n of 80 instead of 40), and a higher fraction
of those genes show a clear specific ISH pattern, compared to SI calculated with 4 oligodendrocyte samples in the
dataset (a).
Figure 7. Specificity index p-values and IP/Totals for a selection of representative mRNAs. a-f) Combined
Specificity Index p values (blue bars, -log 10 scale) and IP/Total values (red bars, log 2) across all cell populations.
a) The acetylcholine transporter, Slc18a3 is significantly specific to all four cholinergic cell populations assessed.
b) The interneuronal marker Dlx1 is translated specifically in the Cort and Pnoc bacTRAP lines. c,d) ubiquitously
expressed genes B-actin(Actb) and Ribosomal Protein L8(Rpl8) are not specific to any cell type, though translation
does vary across cell types. e,f) The Neurexin autism candidate genes Nrxn1, and Nrxn2, have differential patterns
of translation. Nrxn2 is more broadly translated, while Nrxn1 has low to moderate enrichment in cerebellar granule
cells and some cortical neuron types.
Table 1. List of the cell populations, relevant drivers, and abbreviations (used for Figures 4, 5, 7 and Supplemental
Figures 7,8).
SUPPLEMENTAL MATERIAL
Considerations for normalization and analysis across cell type specific data
There is a major distinction between microarray experiments of TRAP RNA compared to
whole tissue, unbound or Total RNA, and this distinction has an important impact on the
assumptions regarding normalization: any given cell only translates those mRNAs required for
its functions, and at the levels required by that particular cell type. Thus any given cell will have
a smaller number of detectable RNA species than a whole tissue sample, which consists of an
aggregate of RNAs from cells with a variety of roles. Therefore the distribution of measurable
RNAs between IP and Total samples should be different. This is shown in the histograms of
Supplemental Figure 3a. Total samples show more RNAs with detectable signal, consistent with
the measurement of a more complex population of mRNAs from a mixture of cells. This is an
important consideration because some normalization and analysis methods assume only minimal
differences in the distributions between samples, and may by default filter to remove those
probesets with signal in a small number of samples(26), or force all samples to have identical
distributions(13). This is clearly inappropriate when comparing widely divergent cell types in
which most genes are expected to vary in expression, with many genes being highly enriched in
a certain cell type.
The IPvTotal plot was used to examine the impact of different normalization methodologies.
Proper normalization should minimize IP/Total ratios for negative control genes, and maximize
it for positive controls. Among the most common methods for normalization of Affymetrix data
are the robust multi-array average (RMA) and GeneChip RMA (GCRMA), both of which
apply quantile normalization to all data sets using the assumption that all samples should have
the same RNA distribution (13). However, the assumption that any one cell type should express
the same number of mRNAs at similar proportions as any other cell type and/or that the
distribution of the aggregate of many cell types (Total) should be similar to the distribution of a
single cell type is not supported by our data (Supplemental Figure 3a and (1,2)). Consequently,
quantile normalization across IP and Total samples resulted in forcing both samples into an
artificial distribution that represents neither. Thus, RNAs that are present in the Total sample will
have their signals reduced and RNAs that are not present in the TRAP RNA will be artificially
inflated.
The impact of these considerations for specific genes is shown in Supplemental Figure 3b, a
scatterplot of Purkinje cell IP vs. cerebellar Total for all probesets on the array, with quantile
normalization performed either within groups (separately) or across groups (together). On
average, normalization across groups results in a decrease in the ability to detect enriched
messages (IP/Total for positive controls 8.17 separate vs. 6.92 together), and higher signal in
negative controls (0.23 separate vs. 0.37 together). In the case of specific probesets, particularly
those for negative controls with low signal, the difference can be quite dramatic. For example,
those from the Cnp1 gene change from a 0.1 IP/Total to a 0.5 IP/Total when the samples are
normalized together, and the glial genes Mog and Plp, which are not detected in the IP when
samples are normalized separately, appear as if they are present in the IP in Purkinje cells.
Quantile normalization still functions well in removing non-biological variability from
biological replicates (multiple independent TRAP samples from the same line and tissue). Thus
we first GCRMA normalized within replicate samples. Then, to correct for any global biases in
hybridization or scanning conditions, we performed global normalization to the biotinylated
spike in controls provided by Affymetrix across all cell populations.
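The effect of normalizing within versus across groups can be illustrated with a minimal sketch of the quantile normalization step alone. This is not the GCRMA implementation used here; the function name and the six-probeset toy values are invented for illustration, with probeset 0 standing in for a negative control (low in IP, high in Total).

```python
def quantile_normalize(samples):
    """Force every sample (a list of values, one per probeset) onto a common
    distribution: the mean of the sorted values across samples."""
    n = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    reference = [sum(s[r] for s in sorted_samples) / len(samples)
                 for r in range(n)]
    normalized = []
    for s in samples:
        order = sorted(range(n), key=lambda i: s[i])  # probesets by rank
        out = [0.0] * n
        for r, i in enumerate(order):
            out[i] = reference[r]
        normalized.append(out)
    return normalized

# Toy replicates (hypothetical values). Probeset 0 is a negative control.
ip_a = [1.0, 2.0, 3.0, 4.0, 100.0, 200.0]
ip_b = [1.5, 2.5, 2.0, 5.0, 90.0, 210.0]
total_a = [50.0, 40.0, 60.0, 30.0, 20.0, 25.0]
total_b = [55.0, 35.0, 65.0, 28.0, 22.0, 24.0]

# Within groups: IP replicates and Total replicates are normalized separately.
ip_sep = quantile_normalize([ip_a, ip_b])
total_sep = quantile_normalize([total_a, total_b])
ratio_separate = ip_sep[0][0] / total_sep[0][0]

# Across groups: all four samples are forced onto one distribution.
together = quantile_normalize([ip_a, ip_b, total_a, total_b])
ratio_together = together[0][0] / together[2][0]
```

With these toy values the IP/Total ratio of the negative control rises from about 0.02 when groups are normalized separately to about 0.15 when they are normalized together, the same direction of change as described above for negative controls.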
Background: sources and removal
Following normalization, for each cell population we plotted IPvTotal and displayed positive
and negative controls to make initial judgments regarding the quality of a particular TRAP
dataset. In Figure 1a, it is apparent there are some glial RNAs with detectable signal in the IP,
though they are enriched eight to ten fold in the Total RNA (red genes, 0.12 average IP/Total).
There are three possible explanations, which are not mutually exclusive.
First, it is possible that neurons are translating a very low level of glial genes. For example,
it is known that Vimentin, expressed highly in glial progenitors(27), can be translated in adult
neurons following injury(28). Second, in some cases there may be low levels of eGFP-L10a
transgene expression in another cell type. For example, anatomical analysis (Supplemental
Figure 9c) demonstrates that low levels of transgene expression in ‘non-targeted’ cell types can
contribute signal to the TRAP microarray from the Lypd6 JP48 line. Though such expression is rare, careful driver
selection can avoid this complexity. It is also possible to exclude data from these contaminating
cell types in many cases posthoc by comparative analysis (1). Finally, as TRAP is an affinity
purification method, there may be a small amount of RNA binding to the affinity purification
reagents that is not derived from the labeled cells. To test this possibility, we performed TRAP
on a wildtype brain and determined that the affinity purification reagents can bind a very small
amount of RNA (Supplemental Figure 9a) in a manner proportional to the concentration of the
lysates (Supplemental Figure 9b). For most cell populations, this background represents a small
fraction of the TRAP yield. However, in TRAP experiments with exceptionally low yield (<10
ng), non-specific background can become problematic. Consistent with this, Supplemental
Figure 2 shows increasing relative levels of negative control probesets as yield decreases for
examples of experiments with good (Pcp2), low-moderate (Cmtm5), and very low yield (Cort).
Since the low yield IPs contain a larger proportion of non-specific background RNA that
comes from unlabeled cell types in the tissue, it is more difficult in these samples to make the
distinction between non-specific background and broadly translated messages. In spite of this
difficulty, even low yield samples (e.g. Cort), which have a substantial contribution from non-specific background, still show remarkable enrichment of the positive control (Cort)
(Supplemental Figure 2b). Thus, these experiments also provide valid information (see also
Figure 4b), although not of the same quality as those with minimal background.
We quantified this level of non-specific background as the average IP/Total ratio of those
negative control genes that have measurable signal. Thus, from the examples in Supplemental
Figure 2, Cort has an average non-specific background of 1.1, while Cmtm5 has .48, and Pcp2
has .05. We then tested if the background could be removed with a relatively simple filter using
this measure. We excluded those probesets falling below this average non-specific background,
plus two standard deviations. Assuming a linear contribution of non-specific background to
TRAP signal, and a normal distribution of background signal intensities, theoretically this should
remove the vast majority (96%) of those probesets that derive signal uniquely from background
RNA. This threshold is shown as the red lines on Supplemental Figure 2. Filtering to remove
these probesets prior to further analysis has the added advantage of reducing the number of
probesets tested, thus reducing the requisite number of multiple testing corrections for
downstream statistical analyses.
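The filter described above can be sketched as follows. This is a minimal illustration, assuming that each probeset has a single summarized IP and Total value; the function names, the `min_signal` detection floor, and the toy expression values are hypothetical and are not taken from the supplemental tables.

```python
from statistics import mean, stdev

def background_threshold(ip, total, negative_controls, min_signal=0.0):
    """Average IP/Total of negative controls with measurable signal,
    plus two standard deviations."""
    ratios = [ip[g] / total[g] for g in negative_controls
              if ip[g] > min_signal and total[g] > min_signal]
    return mean(ratios) + 2 * stdev(ratios)

def filter_probesets(ip, total, negative_controls, min_signal=0.0):
    """Keep only probesets whose IP/Total exceeds the background threshold."""
    thr = background_threshold(ip, total, negative_controls, min_signal)
    kept = {g for g in ip if total[g] > 0 and ip[g] / total[g] > thr}
    return kept, thr

# Toy values (hypothetical): three glial negative controls and three other genes.
ip = {"Mog": 5.0, "Plp1": 10.0, "Cnp": 8.0,
      "Pcp2": 800.0, "Calb1": 600.0, "Gapdh": 30.0}
total = {g: 100.0 for g in ip}
kept, thr = filter_probesets(ip, total, ["Mog", "Plp1", "Cnp"])
```

Here the threshold comes out near 0.13, so the three negative controls are removed while the remaining probesets, including one with a modest IP/Total of 0.3, are retained for IPvIP comparisons.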
To determine if this filter is effective, we examined comparisons of two cell types from
different tissues (Supplemental Figure 5b), or with differential levels of non-specific background
contamination (Supplemental Figure 5a). Comparing cerebellar Purkinje cell IP data to Drd1+
medium spiny neuron IP data, without accounting for background, results in the apparent
expression of the cerebellar granule cell-specific gene Neurod1 in Purkinje cells (Supplemental
Figure 5b, left panel). Simple filtering prior to the IPvIP comparison successfully removed this
false positive result (Supplemental Figure 5b, right panel). Thus, regardless of the source of the
non-specific background, simple filters based on negative controls can be used as a generic
method to remove most probesets deriving from non-specific background.
As previously reported, there are also a group of mRNAs that apparently specifically bind the
affinity reagents even in the absence of eGFP-L10a protein(1). These probesets have extremely
high IP/Total ratios in every IP, including those from control, non-transgenic mouse brains.
These may represent specific interactions between anti-eGFP antibodies or protein G beads and
nascent peptides on the ribosomes. They were removed from subsequent analysis.
IPvTotal may indicate rarity of a cell type
The magnitude of IP/Total for positive controls can be used as a crude measure of the
contribution of the targeted cell type’s mRNAs to the total mRNA pool in the tissue of interest.
Supplemental Figure 4 shows IPvTotal plots for a less frequent cell type (a) and an extremely
common cell type (b), with similar levels of non-specific background (red line). One can see that
the IP/Total ratio for the driver gene (blue) increases with the rarity of the cell. Logically, if a
cell contributes 5% of the RNA in the total tissue, then its cell-specific genes should be 20-fold
enriched (1/0.05). By the same reasoning, we can estimate that Purkinje cells, with an average
enrichment of 8-fold for their positive control genes (Figure 1a), contribute roughly 12% (1/8) of
the RNA in the cerebellum. While this is disproportionately high relative to their numbers
(0.3-0.4% of cerebellum (29)), it is not unreasonable given their relatively large cytoplasmic
compartments (estimated at 40x the volume of most common cerebellar cells). In addition to the
magnitude of the ratio, there is a broad difference in the number of RNAs with high IP/Total
values across the samples in Supplemental Figure 4a and 4b. In general, the number of RNAs that
are differentially expressed in this analysis correlates with the distinctive properties of that cell
type relative to its tissue. Of course, the exact magnitude of these fold changes can depend on the
level of background signal in the IP, so for now these rules serve as useful heuristics rather than
precise measures.
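The arithmetic behind this heuristic can be written out as a short sketch. It assumes that the fold enrichment of cell-specific genes is simply the reciprocal of the cell type's share of tissue RNA and that non-specific background is negligible; the function names are illustrative only.

```python
def rna_fraction_from_enrichment(fold_enrichment):
    """Estimate the share of tissue RNA contributed by a cell type.

    Assumes cell-specific genes are enriched by 1/share in the IP relative
    to Total, and ignores non-specific background (a simplification).
    """
    if fold_enrichment <= 0:
        raise ValueError("fold enrichment must be positive")
    return 1.0 / fold_enrichment

def relative_rna_per_cell(rna_fraction, cell_fraction):
    """RNA share divided by cell-number share, i.e. RNA per cell relative to the tissue average."""
    if cell_fraction <= 0:
        raise ValueError("cell fraction must be positive")
    return rna_fraction / cell_fraction

print(rna_fraction_from_enrichment(20))  # 0.05
print(rna_fraction_from_enrichment(8))   # 0.125

# Purkinje cells: 8-fold enrichment, 0.3-0.4% of cerebellar cells.
purkinje_share = rna_fraction_from_enrichment(8)
for cell_fraction in (0.003, 0.004):
    print(relative_rna_per_cell(purkinje_share, cell_fraction))
```

Under these assumptions a Purkinje cell would carry roughly 31 to 42 times the RNA of an average cerebellar cell, which is of the same order as the 40x cytoplasmic volume estimate quoted above.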
Anatomical Considerations
Careful characterization of transgene expression is essential to the interpretation of the
TRAP data. We typically characterize the eGFP levels both with and without antibody staining.
Those mouse lines with more robust expression have better yield and hence lower non-specific
background. If there is visible eGFP without antibody in mouse brain sections, yields will
generally be sufficient for microarray experiments. However, it is important to detect the
presence of trace labeling in additional populations using anti-eGFP antibodies, as some signal
from these populations would be detectable in the microarray data (Supplemental Figure 9c).
Most mouse lines will express in multiple cell populations. Normally, these populations are
present in distinct structures, and can thus be separated by careful dissection. When they cannot,
microarray data from mixed populations can be resolved post hoc: for example, a
Bergman glial IP can be compared to a mixed Bergman glial/Unipolar Brush cell IP to identify
Unipolar Brush cell specific genes (1).
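One way to picture this post hoc comparison is the sketch below, which flags probesets that are higher in the mixed-population IP than in the single-population IP. It assumes both IPs have already been normalized comparably and filtered for background; the two-fold cutoff (log2 ratio of 1), the signal floor, the function name, and the example values are illustrative assumptions, not the procedure used in (1).

```python
import math

def enriched_in_mixed(mixed_ip, single_ip, min_log2_fold=1.0, floor=1.0):
    """Probesets higher in the mixed-population IP than in the single-population IP.

    mixed_ip, single_ip: dicts mapping probeset ID -> expression.
    floor: value substituted for very low signals to avoid division by zero.
    Returns a dict of probeset ID -> log2(mixed / single) for the hits.
    """
    hits = {}
    for probeset, mixed_value in mixed_ip.items():
        if probeset not in single_ip:
            continue
        ratio = max(mixed_value, floor) / max(single_ip[probeset], floor)
        log2_fold = math.log2(ratio)
        if log2_fold >= min_log2_fold:
            hits[probeset] = log2_fold
    return hits

# Hypothetical probesets: one shared glial gene, one Unipolar Brush cell gene.
mixed = {"glial_gene": 500.0, "ubc_gene": 400.0}
glial_only = {"glial_gene": 520.0, "ubc_gene": 50.0}
print(enriched_in_mixed(mixed, glial_only))  # {'ubc_gene': 3.0}
```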
Finally, it is important to consider whether the experimental manipulation will affect the
expression of the transgene itself. If so, this could have a substantial impact on microarray results,
particularly if the manipulation dramatically changes the populations expressing the
transgene or the level of transgene expression. This will need to be considered in the
interpretation of the data.
Recommendations for design of TRAP experiments
Supplemental Figure 10 provides an example of good TRAP study design. For TRAP,
standard good practices for microarray experimental design, execution, and analysis should be
followed (30). Among these, it is particularly important to include careful checks of RNA
quality and quantity before amplification. We recommend fluorometric measures for
quantification, such as the Ribogreen assay, when measuring RNA concentrations of less than
fifty nanograms per microliter, as well as Agilent Bioanalyzer assays to determine RNA integrity.
Also, when amplifying RNA, it is important to start with the same amount of RNA from each
sample, and to use identical protocols. It is essential that experimental and control
samples be collected and amplified simultaneously or in balanced pairs, to control for
non-specific amplification biases and batch effects, which afflict all microarray experiments
(30). This is especially important when investigating more subtle manipulations such as drug
treatments or the impact of knockouts on specific cell types. Finally, it is frequently advisable
to pool tissue from multiple animals for each condition to increase yield in the case of small
structures, as well as to help average out minor variations in dissection or treatment from animal
to animal. We conduct at least three replicate affinity purifications per experimental condition,
and typically pool from three to six animals per replicate. However, the amount of
background clearly depends on the concentration of tissue homogenized (Supplemental Figure 9b).
Maintaining a ratio of tissue to homogenization buffer of approximately 100 mg/ml (or less) is
recommended when pooling tissue, to reduce non-specific background.
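As a small worked example of the tissue-to-buffer recommendation, the sketch below computes the smallest buffer volume that keeps pooled tissue at or below 100 mg/ml. The function name and the per-animal tissue weight used in the example are hypothetical.

```python
def min_buffer_volume_ml(tissue_mg, max_mg_per_ml=100.0):
    """Smallest homogenization-buffer volume (ml) that keeps the tissue
    concentration at or below max_mg_per_ml."""
    if tissue_mg < 0:
        raise ValueError("tissue mass cannot be negative")
    if max_mg_per_ml <= 0:
        raise ValueError("maximum concentration must be positive")
    return tissue_mg / max_mg_per_ml

# Hypothetical pool: six animals, 60 mg of dissected tissue each.
pooled_mg = 6 * 60
print(min_buffer_volume_ml(pooled_mg))  # 3.6
```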
Future improvements of the TRAP methodology may eliminate the need to collect a
Total measure and subtract non-specific background, and several strategies are actively being
pursued to allow this. Currently, low level transgene expression in alternate cell types can be
controlled by selecting more specific drivers, and weak drivers can be replaced with stronger ones.
TRAP data with high background are often mined to select stronger drivers, yielding lines that
target the same cell type but with better yield and lower background, such as replacing the
Cmtm5 line with the Cnp1 line (1) for mature oligodendrocytes.
Supplemental Figure 1. Illustration of the TRAP method. BAC transgenic mice are generated to target the
expression of a fusion of eGFP and a ribosomal protein (L10a) to a specific population of cells in the mouse brain
(shown here are motor neurons, targeted using a BAC containing the motor neuron specific Choline Acetyl
Transferase gene). To isolate cell specific translational profiles, the entire tissue is homogenized, treated with
detergents to solubilize the endoplasmic reticulum, and centrifuged to prepare a crude homogenate containing a mix
of eGFP tagged and untagged polysomes. Importantly, the tagged polysomes come uniquely from the motor
neurons. Tagged polysomes, and associated mRNA, are then purified from homogenate using antibodies against
eGFP, which are bound to protein G coated magnetic beads. Both purified and unpurified RNA are then amplified
and hybridized to microarrays to profile the mRNA populations.
Supplemental Figure 2. Assessment of IPvTotal plots. a) A ratio threshold (red line) can be set between non-specific
background and broadly translated genes, based on the values of the negative control genes (red; glial genes
for neurons, and neuronal genes for glia). High yielding lines (Pcp2) generally have low background, while low
yielding lines (Cmtm5) have a correspondingly higher background. b) In contrast to higher yielding lines, with a
very low yielding line (Cort), broadly translated RNAs cannot be easily distinguished from non-specific
background, though RNAs representing driver genes (blue) are consistently enriched. Black lines in all plots: IP/Total
ratios of 0.5, 1 and 2.
Supplemental Figure 3. Improper normalization can create false positive signals. a) Average histogram of probeset
signal intensities for all IP samples (dotted lines) compared to all Total samples (solid lines) reveals differences
between distributions. In particular, IP samples have more undetectable probesets (first bin, red arrow), consistent
with RNA purified from discrete cell populations compared to RNA from a mix of cell types. b) As illustrated with
IPvTotal plots for Purkinje cells, quantile normalization (forcing identical distributions) of IP and Total samples
together (right panel) produces artificial signal in negative controls (red genes, in yellow circle, shifted right). Black
line: 1 fold. Red line, line of best fit through negative controls. Blue line, line of best fit through positive controls.
Supplemental Figure 4. IPvTotal is dependent on the composition of the Total. a) IPvTotal for an astroglial sample,
under control of the driver Aldh1L1 (blue), shows the enrichment of many genes, as illustrated by the number of
probesets falling above the two fold line, suggesting glia contribute a relatively small fraction of the RNA pool of
the whole cerebellum. b) IPvTotal for an extremely common cell type, the granule cell of the cerebellum, fails to
show enrichment of the granule cell-specific driver Neurod1, in spite of low background (red line), suggesting granule
cells contribute a significant fraction of whole cerebellar RNA. c) A scatterplot of an IPvIP comparison of the
Neurod1 IP to the Pcp2 IP reveals clear enrichment of the Neurod1 probeset in the Neurod1 IP, demonstrating that the
Neurod1 IP is enriched in granule cell RNA. Black lines: 0.5, 1, 2 fold. Red genes: cell-specific negative controls, as in
Figure 1.
Supplemental Figure 5. Simple thresholds improve IPvIP comparisons. a) An illustration of how to filter data
using simple thresholds for background and low expressed genes. Background threshold was set at mean plus two
standard deviations of the detectable negative control genes. Probesets with expression below 50 were also
removed. b) An IPvIP comparison of cerebellar Purkinje cells (y-axis) to cerebellar Granule cells (x-axis), which have
slightly different levels of background, shows a corresponding ‘differential’ expression of glial mRNAs (red genes, left
panel), which can be removed by applying simple thresholds (right panel). c) Likewise, an IPvIP comparison of cerebellar
Purkinje cells to Drd1a medium spiny neurons, which have background from different tissues, shows a
corresponding ‘differential’ expression of background mRNAs from other tissue specific cell types (red; Drd2 from
the striatal Drd2+ medium spiny neuron, and Neurod1 from the cerebellar granule cell), which can be removed with
simple thresholds (right panel).
Supplemental Figure 6. a) Illustration of method for determining p-values for specificity index for one cell type.
b) Illustration of analytical flow to identify cell specific and enriched genes for all cell types.
Supplemental Figure 7. Updated chip definitions improve accuracy and interpretability of TRAP experiments. a)
IP/Total (log2, red) and specificity p-value (-log 10, blue), for all cell types (y-axis), for the probeset representing
the known oligodendrocyte gene MBP as measured using custom chip definition files (cdf) which remove
misaligned probes(18). b) Four examples of probesets for MBP using Affymetrix cdf files. Oligodendrocyte
populations marked with *. c) Probeset for Purkinje cell-specific gene, PCP2, using custom cdf. d) Four of the
probesets for PCP2 using Affymetrix cdf files.
Supplemental Figure 8. Dramatic differential translation of the GalNT gene family suggests cellular specialization
of the Golgi apparatus. a-f) Combined Specificity Index p-values (blue bars, -log 10 scale) and IP/Total values (red bars,
log 2) across all cell populations for a selection of the UDP-N-acetyl-alpha-D-galactosamine:polypeptide
N-acetylgalactosaminyltransferase Golgi protein family. a) Galnt3 shows specific translation in oligodendrocyte
progenitors. b) Galnt4 shows translation in astrocytic cell types. c) Galnt6 shows specific translation in mature
oligodendrocytes. d) Galnt14 shows specific translation in Corticospinal/Corticopontine neurons. e) GalntL2 shows
specific translation in granule cells of cerebellum.
Supplemental Figure 9. Two potential sources of non-specific background. a) Representative Picochip capillary
electrophoretic traces from Agilent Bioanalyzer for RNA from a TRAP experiment on two cerebellums from
Bergman glial (Sept4) bacTRAP mice (left panel) or wild type mice (center panel) suggest a small amount of RNA
may derive from non-specific interactions of unlabeled RNA with affinity purification reagents. Arrows: 18S and 28S
ribosomal RNA peaks. IPvTotal plot (right panel) shows low level of signal in known negative control genes
(neuronal genes, red). Driver genes known to be highly expressed in Bergman glia (Sept4, Aldh1L1, blue) show
strong enrichment, while drivers for other cerebellar cell types (Neurod1, Pcp2, Lypd6, blue) show IP/Total ratios
similar to negative controls. Black lines: 0.5, 1, 2 fold lines. Red line: average IP/Total ratio of negative controls.
Green line: background IP/Total ratio level suggested by non-specific yield (4.5ng) divided by TRAP yield. b)
Amount of non-specific RNA binding to affinity purification reagents depends on amount of tissue. Various
amounts of brain tissue from wild type mice were homogenized in a consistent volume of homogenization buffer,
and TRAP methodology was carried forward. Increasing amount of tissue increases non-specific background. A
1:10 w/v ratio, or less, is recommended to minimize this. c) Confocal immunofluorescence for eGFP in
Stellate/Basket neuronal (Lypd6) bacTRAP line shows low level transgene expression in additional cell types. Top
left, DAPI nuclear counterstain delineates layers of cerebellum (WM, white matter, GCL, granule cell layer, PCL,
Purkinje cell layer, ML, molecular layer). Stellate and Basket cells of molecular layer clearly contain eGFP-L10a
(top center, top right). The same eGFP image shown in range scale (blue pixels: no signal, red pixels: saturated)
with excessive gain (bottom center), or normal gain (bottom left) shows trace eGFP-L10a in white matter glia (red
arrow). IPvTotal (bottom right) shows clear enrichment of driver (Lypd6, blue), but moderate levels of signal from
negative control genes (glial genes, red) or drivers for other cell types showing trace expression (Sept4, blue). Note
that granule cell driver, Neurod1 (green), has low signal, consistent with lack of expression in granule cells. Green
line: average IP/Total ratio of Neurod1 probesets. Red line: average IP/Total ratio of glial genes.
Supplemental Figure 10. Recommendations and examples for TRAP experimental design.
Supplemental Table 1. Positive and negative controls. a) List of genes scored as specific to the Purkinje cell layer
in the cerebellum by three independent reviewers, based on online ISH atlases (11,12). b) List of known markers
for glial cell types, which may be used as negative control genes for neuronal samples. c) List of markers for
neurons (neurofilaments and synaptic proteins), which may be used as negative controls for glial samples.
Table 1
Cell populations | Driver | Abbreviations used*
Drd1+ medium spiny neurons of neostriatum | Drd1 | CS.Drd1
Drd2+ medium spiny neurons of neostriatum | Drd2 | CS.Drd2
Cholinergic Interneurons of corpus striatum | Chat | CS.Chat
Motor neurons of brain stem | Chat | BS.Chat
Cholinergic neurons of basal forebrain | Chat | BF.Chat
Mature oligodendrocytes of cerebellum | Cmtm5 | Cb.Cmtm5
Astroglia of cerebellum | Aldh1l1 | Cb.Aldh1L1
Golgi neurons of cerebellum | Grm2 | Cb.Grm2
Unipolar brush cells and Bergman glia of cerebellum | Grp | Cb.Grp
Stellate and basket cells of cerebellum | Lypd6 | Cb.Lypd6
Granule cells of cerebellum | Neurod1 | Cb.Neurod1
Oligodendroglia of cerebellum | Olig2 | Cb.Olig2
Purkinje cells of cerebellum | Pcp2 | Cb.Pcp2
Bergman glia and mature oligos. of cerebellum | Sept4 | Cb.Sept4
Cck+ neurons of cortex | Cck | Ctx.Cck
Mature oligodendrocytes of cortex | Cmtm5 | Ctx.Cmtm5
Cort+ interneurons of cortex | Cort | Ctx.Cort
Astrocytes of cortex | Aldh1l1 | Ctx.AldhL1
Corticospinal, corticopontine neurons | Glt25d2 | Ctx.Glt25d2
Corticothalamic neurons | Ntsr1 | Ctx.Ntsr1
Oligodendroglia of cortex | Olig2 | Ctx.Olig2
Pnoc+ neurons of cortex | Pnoc | Ctx.Pnoc
Motor neurons of the spinal cord | Chat | SC.Chat
Figure 1
[Plots not reproduced. Panels a, b. Axis labels: Cerebellum Total; Purkinje Cell IP (Pcp2).]
Figures 2-4
[Charts not reproduced. Surviving labels: % Not Expressed; % Specific; % of scored ISH; % Specific ISH; High IP/Total; High IPvTotal; High Specificity Index; Random; Specific; Absent, Unscorable, Scorable; Expressed, Not Expressed; IPvTotal; Specificity Index; Specificity Index p-value bins < 10^-5, 10^-3 to 10^-5, 10^-2 to 10^-3, 10^-2 to 10^-1, > .1. Cell types labeled: Oligodendrocytes, Layer V Cortical Neurons, Purkinje Cells, Motor Neurons. Genes labeled: Crym, Neurod1 (probeset 1426412_at), En2, Gng4. Cell populations are labeled with the abbreviations of Table 1.]
Figure 5
[Graphic not reproduced. Panels a, b, c.]
Figures 6 and 7
[Plots not reproduced. Panel letters a-f. Genes plotted across the cell populations of Table 1: Rpl8 (Entrez ID: 26961, Ribosomal protein L8); Nrxn2 (Entrez ID: 18190, Neurexin II); Slc18a3 (Entrez ID: 20508, Solute carrier family 18, member 3); Dlx1 (Entrez ID: 13390, Distal-less homeobox 1); Actb (Entrez ID: 11461, Actin, beta); Nrxn1 (Entrez ID: 18189, Neurexin I).]
Supplemental Table 1
A. Positive controls for Purkinje cells: A930006D11, 3110001A13Rik, 4933428A15Rik, 4933432P15Rik, A730030A06, Adprt1, Bcl11a, Capn10, Cck, Dgkz, Eprs, Grik1, Gtf2f2, Hsp105, Kcnab1, Letm1, Lhx5, Ndufs3, Nef3, Pcp2, Sec61a1, Zdhhc14
B. Negative controls for Neurons: Mbp, Aldh1l1, Cspg4, Galc, Glul, Mag, Mobp, Mog, Olig2, Plp1
C. Negative controls for Glia: Snap25, Cplx1, Nefh, Nefl, Nefm
Supplemental Figure 1
[Diagram not reproduced. Labels: Generate and Characterize BAC transgenic mice; Dissect and Homogenize Tissue of Interest; Polysomes from Motor Neurons; Polysomes from All other cells; will stick to affinity matrix; will not stick to affinity matrix; Harvest enriched mRNA (IP); Harvest whole tissue mRNA (Total); Hybridize Microarrays.]
Supplemental Figure 2
[Plots not reproduced. Panel a axis labels: Cerebellum Total; Cortex Total; Purkinje Cell IP (Pcp2); Oligodendrocyte IP (Cmtm5). Panel b axis labels: Cortex Total; Cort Cell IP (Cort).]
Supplemental Figure 3
[Plots not reproduced. a) Number of Probesets against Expression Level, for All IPs and All Totals. b) Axis labels: Cerebellum Total; Purkinje Cell IP.]
Supplemental Figure 4
[Plots not reproduced. a) Axis labels: Cerebellum Total; Astrocytes IP. b) Axis labels: Cerebellum Total; Granule Cell IP. c) Axis labels: Purkinje Cell IP; Granule Cell IP.]
Supplemental Figure 5
[Plots not reproduced. a) Workflow labels: For each cell compare IP to Total; Establish background threshold based on IP/Total of negative control probesets; Remove probesets below threshold, and low expressed; Using filtered list for any IP versus IP comparisons removes majority of background. Axis labels: Cerebellum Total; Purkinje Cell IP (Pcp2). b, c) Pre-Filter and Post-Filter plots. Axis labels: Purkinje Cell IP; Granule Cell IP; Drd1a MSN IP.]
Supplemental Figure 6
[Illustration not reproduced. Panels a, b.]
Supplemental Figure 7
[Plots not reproduced. Values are plotted across the cell populations of Table 1. Genes labeled: Mbp (Entrez ID: 17196, Myelin basic protein) and Pcp2 (Entrez ID: 18545, Purkinje cell protein 2 (L7)). Mbp probesets labeled: 1451961_a_at, 1425264_s_at, 1425263_a_at. Pcp2 probesets labeled: 1419084_a_at, 1453207_at, 1424944_at, 1429583_at.]
Supplemental Figure 8
[Plots not reproduced. Values are plotted across the cell populations of Table 1. a) Galnt3 (Entrez ID: 14425, UDP-N-acetyl-alpha-D-galactosamine:polypeptide N-acetylgalactosaminyltransferase 3). b) Galnt4 (Entrez ID: 14426, UDP-N-acetyl-alpha-D-galactosamine:polypeptide N-acetylgalactosaminyltransferase 4). c) Galnt6 (Entrez ID: 207839, UDP-N-acetyl-alpha-D-galactosamine:polypeptide N-acetylgalactosaminyltransferase 6). d) Galnt14 (Entrez ID: 71685, UDP-N-acetyl-alpha-D-galactosamine:polypeptide N-acetylgalactosaminyltransferase 14). e) Galntl2 (Entrez ID: 78754, UDP-N-acetyl-alpha-D-galactosamine:polypeptide N-acetylgalactosaminyltransferase-like 2). f) Galnt2 (Entrez ID: 108148, UDP-N-acetyl-alpha-D-galactosamine:polypeptide N-acetylgalactosaminyltransferase 2).]
Supplemental Figure 9
[Graphics not reproduced. a) Traces labeled 4.4 ngs and 74.1 ngs. Axis labels: Cerebellum Total; Bergman Glial IP. b) ngs RNA from wildtype brain tissue (axis 0 to 14) against Tissue input per ml homogenization buffer (125 mg/ml, 275 mg/ml, 525 mg/ml). c) Image panels: Dapi, Egfp, Overlay, Excess Gain, Normal Gain; layers marked ML, PCL, GCL, WM. Axis labels: Cerebellum Total; Lypd6 IP.]
Supplemental Figure 10
Principles

Design
1) Define question.
2) Select line appropriate for cell type.
3) Plan to balance conditions across batches.

Anatomy
1) Confirm transgene expression.
2) Does manipulation alter expression?

Immunoprecipitation and Microarrays
1) Harvest paired conditions together in one batch and collect total polysome sample.
2) Quantify RNA carefully, and start with identical amounts of all samples for amplification.
3) Amplify and hybridize all batches together, if feasible.

Analysis
1) GCRMA normalize together only those samples that should have the same distribution. Global normalize subsequently to biotinylated spike-ins.
2) Compare IPs to Total. Calculate a background threshold using the IP/Total ratio for negative controls. Remove from further analysis those probesets with IP/Total below this threshold.
3) Conduct statistical analysis on remaining probesets.

Example

Design: MECP2 in glia
1) What is the impact of MECP2 knockout on cortical astrocytes in vivo?
2) The previously generated Aldh1L1 JD130 line is expressed exclusively in astrocytes [1]. Cross MECP2 KO with Aldh1L1 line to generate breeders.
3) Plan for three batches. Each batch is three bacTRAP/MECP2 null mice and three littermate bacTRAP-only controls.

Anatomy: MECP2 in glia
1) Aldh1L1 bacTRAP line was previously and thoroughly characterized [1]. Skip this step.
2) In first litter of bacTRAP/MECP2 null mice, confirm transgene is still expressed uniquely in astrocytes, and at comparable levels to littermate controls.

IP and Microarrays: MECP2 in glia
1) Day 1 (3-4 hours): Harvest cortices from three MECP2 null/bacTRAP mice and three bacTRAP littermate controls. Pool MECP2 null and control tissue separately. Homogenize and prepare polysomes. Prior to immunoaffinity step, set aside 20 µl for Total polysome sample. Complete immunoaffinity purification, and RNA extraction of Total and IP’d RNA until isopropanol precipitation. Store RNA at -80.
Day 2: Repeat day 1 with second batch of three MECP2 null and three controls.
Day 3: Repeat day 1 with third batch of three MECP2 null and three controls.
2) Day 4 (2 hours): Complete purification of all three batches of frozen RNA. Quantify with Ribogreen assay. Confirm with Bioanalyzer assay that RNA integrity is above 8 for all samples.
3) Day 4-6: From each sample, take 20 ng of RNA, and begin Affymetrix two cycle amplification for all twelve samples. Carry through amplification and hybridization of all samples together.

Analysis: MECP2 in glia
1) GCRMA all Total samples together. GCRMA together all IP’d samples (from MECP2 null and controls). Normalize all samples (Total and IP) together to spike-in controls.
2) Calculate IP/Total for a list (Supplemental Table 2) of non-astrocyte probesets (i.e. neuron specific genes). Remove all probesets below the Mean + 2 S.D. of the ratios on this list from further analysis for astrocytes.
3) Use the Limma module of Bioconductor to detect those genes that change significantly between MECP2 null IP and control IP. These represent the astrocyte’s response to the knockout. Genes that change significantly between the MECP2 null Total and control Total samples will represent the response of the other cells in the tissue. These can be compared listwise or statistically to determine the astrocyte specific response.
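The listwise comparison in the final analysis step can be sketched as simple set operations. This is one plausible reading of "listwise", assuming the two lists of significantly changed genes have already been produced elsewhere (for example by Limma); the function name and the gene identifiers are hypothetical.

```python
def listwise_comparison(changed_in_ip, changed_in_total):
    """Partition significantly changed genes by where the change was detected.

    changed_in_ip: genes changed between knockout IP and control IP.
    changed_in_total: genes changed between knockout Total and control Total.
    """
    ip_genes = set(changed_in_ip)
    total_genes = set(changed_in_total)
    return {
        # Changed only in the IP: candidate cell type specific response.
        "ip_only": sorted(ip_genes - total_genes),
        # Changed in both: response shared with, or dominated by, the whole tissue.
        "shared": sorted(ip_genes & total_genes),
        # Changed only in Total: response of other cells in the tissue.
        "total_only": sorted(total_genes - ip_genes),
    }

# Hypothetical gene lists.
result = listwise_comparison(["GeneA", "GeneB", "GeneC"], ["GeneB", "GeneD"])
print(result["ip_only"])     # ['GeneA', 'GeneC']
print(result["shared"])      # ['GeneB']
print(result["total_only"])  # ['GeneD']
```

A statistical alternative, such as testing the interaction between genotype and sample type, would need the underlying expression values rather than the gene lists alone.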