The “Chip” as a Specific Genetic Tool
Stanley J. Watson, Fan Meng, Robert C. Thompson, and Huda Akil
DNA microarrays are powerful tools for the analysis of
the organization and regulation of the brain, in both
illness and health. Such messenger RNA expression methods are outgrowths of a marriage between the several
genome sequencing projects and a wide variety of physical, chemical, optical, and electronic systems. The advantages of microarray analyses include the ability to study
the regulation of several genes or even the entire genome
in a single experiment. However, there are substantive
issues associated with the use of these tools that need to be
considered before drawing conclusions about the genomic
regulation of the brain. These issues include the loss of
most anatomic (i.e., cellular and circuit) specificity, only
fair sensitivity, lack of absolute quantitative data, poor
comparability between studies, and high variability in
sample values, to mention the most obvious. In this review
we point to some of the solutions proposed for these
problems and to novel techniques and approaches for newer
methods. Among these are methods for making arrays
more sensitive, as well as nonarray messenger RNA expression systems. The future of this field and its links to
deeper protein and cell biology are both emphasized.
Biol Psychiatry 2000;48:1147–1156 © 2000 Society of
Biological Psychiatry
Key Words: Microarray, gene expression, genomics,
equipment, data analysis, brain
From the Department of Psychiatry and The Mental Health Research Institute,
University of Michigan, Ann Arbor, Michigan.
Address reprint requests to Stanley J. Watson, M.D., Ph.D., The University of
Michigan, Mental Health Research Institute, 205 Zina Pitcher Place, Ann Arbor,
MI 48109-0720.
Received March 10, 2000; revised September 28, 2000; accepted October 2, 2000.
Current Status of Microarray Systems
Much of the progress in treating mental illness over
the last half century has come from classic pharmacologic approaches. In the last quarter century the speed of
knowledge acquisition in the neurobiology of the brain has
radically increased as we have begun to see the nature of
those early treatments in the light of neural circuits,
cellular pathways, and gene regulation. The last decade
has placed us on the cusp of the first full version of the
human genome, mainly from the Human Genome Project
and most recently from several major industrial efforts.
Finally, it has become apparent that the major mental
illnesses are in fact complex genetic diseases involving
many genes, and most likely many circuits in the brain. It
is also quite likely that our classic treatments are not
targeted at most of the genes actually responsible for these
illnesses but rather are fortuitously discovered medications
that act on symptoms and only indirectly modify the
disease processes themselves. Thus, as a field we are faced
with the need for tools to address this enormous complexity at the neural and genomic levels. In this review we
describe one such approach that holds promise for increasing our understanding of the biology of these disorders and
thus our likelihood of designing better treatments. This
approach involves the use of DNA microarrays for studying messenger RNA (mRNA) expression in the brain. It is
likely to be one of the best tools for the study of the human
genome in its full depth in mental illness. Once it is refined
it may well offer significant clues not only about the initial
gene defects that may lead to vulnerability to these
disorders, but also about the response of the brain to these
illnesses, which in turn could increase the likelihood of
future relapse. Optimally, these insights into the molecular
basis of the disease process will lead us to design therapies
that can treat the disorders and prevent the “scarring” that
results from the expression of the disease process. Here we
hope to accomplish the following three goals: 1) provide a
sense of the actual methods used to date and the types of
data they generate, 2) describe the problems currently
associated with microarrays, and 3) point to technical
advances and novel strategies that may be applicable in the
near future in this fast-paced academic and industrial field.
For years researchers have attempted to develop strategies that address true biological complexity. One such
strategy, the microarray, permits the analysis of gene expression patterns by asking relatively simple
questions. For example, following a given
stimulus, what RNAs (used as an index of gene activation/
inactivation) are increased in one biological sample relative to another? A hypothesis behind many of these
approaches states that RNAs more abundant (or less
abundant) in one biological sample than another are
descriptive of the phenotypic differences between the two
biological samples. Such microarray strategies permit
studies that go beyond those where individual genes and
their mRNA expression are evaluated one at a time. They
enable the examination of thousands of genes simultaneously and the identification of patterns or profiles of
gene expression that may be the hallmark of the specific
phenotype under study.
PII S0006-3223(00)01080-5
Early efforts to evaluate complex gene expression
patterns were led by groups performing large scale DNA
sequencing/cloning, differential hybridization methodologies (e.g., subtractive hybridization [Ermolaeva and Sverdlov 1996]), polymerase chain reaction (PCR)-based
methods (e.g., differential display PCR [Liang and Pardee
1992]), or serial analysis of gene expression (SAGE;
Velculescu et al 1995). These methods have continued to be
refined and have proven very productive. However, these procedures are multistep processes,
which makes them challenging for most laboratories to
adopt. They are prone to false positives and can be
expensive, tedious, and time consuming. Partly in
response to these challenges, alternative gene expression
analysis strategies have emerged. Some of these methods
are solution based and often require one to predefine
which genes are evaluated. One example of such a
solution-based method is the TaqMan assay (Livak et al
1995), in which many gene expression patterns can be
quantitatively evaluated in a high throughput fashion. To
execute this assay, an investigator must predefine which
genes are to be monitored and prepare materials specific to
these genes (e.g., specific oligonucleotides for each gene
to be assayed). Thus, though this assay is extraordinarily
sensitive and quantitative, it is quite expensive (due to the
required synthesis of gene-specific oligonucleotides) and
is limited by the investigator-defined list of genes evaluated in a given biological sample.
By comparison, arrays (micro- or macro-) rely upon a
simple strategy whereby the genes of interest are deposited
or synthesized upon a matrix (glass, silicon, or nylon
membrane). The genes applied to the matrix can be double
stranded or single stranded DNA that is either synthesized
or derived from tissue extracts (e.g., using PCR). The
experimenter knows exactly which specific gene (or fragment thereof) is present in each location on the matrix.
This matrix can then be used to compare or contrast the
relative quantity of mRNA found in two biological samples (Watson and Akil 1999). In this method, the DNA
first applied to the matrix represents the known probes.
The material derived from the tissue represents the “target” of the assay and contains an unknown mixture of
complementary DNAs (cDNAs) depending on the pattern
of gene expression in that tissue at the time of extraction.
The goal is to use the probes on the matrix to quantitate the
specific genes expressed in the target material. Some of
the array approaches (largely microarrays on glass or
silicon) utilize fluorescent-tagged first-strand cDNA (derived from tissue RNA) to determine the relative amounts
of gene expression in a biological sample. In these
situations a two-color scheme is utilized whereby the
control and experimental samples are tagged uniquely
with a specific fluor (e.g., control tagged with Cy3 [green]
and experimental tagged with Cy5 [red]). These tagged
cDNAs can then be mixed and allowed to hybridize
(interact) with a glass slide-based matrix containing the
genes (DNAs) of interest. The power of this approach
arises from the ability to purchase or produce arrays
with hundreds to thousands of genes (DNAs) on them.
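As a minimal sketch of how the two-color readout becomes a relative expression value, the Python code below computes a background-subtracted log2(Cy5/Cy3) ratio for each spot, with Cy5 as the experimental channel and Cy3 as the control, as in the example above. The field names, the local-background subtraction, and the floor that keeps near-empty spots from producing log(0) are illustrative assumptions, not a description of any particular scanner's software.

```python
import math
from dataclasses import dataclass

@dataclass
class Spot:
    gene_id: str    # identity known from the array layout
    cy5_fg: float   # experimental channel (red), spot foreground intensity
    cy5_bg: float   # local background around the spot, red channel
    cy3_fg: float   # control channel (green), spot foreground intensity
    cy3_bg: float   # local background around the spot, green channel

def log2_ratio(spot: Spot, floor: float = 1.0) -> float:
    """Background-subtracted log2(Cy5/Cy3) for one spot.

    Net signals are clipped at `floor` (an arbitrary assumption) so a spot
    whose background exceeds its foreground does not give log(0).
    """
    red = max(spot.cy5_fg - spot.cy5_bg, floor)
    green = max(spot.cy3_fg - spot.cy3_bg, floor)
    return math.log2(red / green)

spots = [
    Spot("gene_A", 2200.0, 200.0, 1200.0, 200.0),  # net 2000 vs 1000
    Spot("gene_B", 700.0, 200.0, 1200.0, 200.0),   # net 500 vs 1000
]
for s in spots:
    print(s.gene_id, round(log2_ratio(s), 2))  # gene_A 1.0, gene_B -1.0
```

Because both channels are read from the same spot, spot size and the amount of DNA printed largely cancel in the ratio; working on the log2 scale makes increases and decreases of the same magnitude symmetric around 0.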
With the development of array strategies, it is possible
to predefine what genes are evaluated in each experiment, as gene expression comparisons are based on
quantitation with DNA probes of known identity (although not necessarily known function). This approach
combines the advantages of broad-based strategies that
examine general patterns of change in gene expression
with the advantages of very targeted techniques that use
specific probes for single genes. Thus, array gene
expression analyses permit the broad evaluation of gene
expression patterns of thousands of defined genes in
single experiments.
Multiple examples in the past several years have highlighted the advantages of broadly evaluating gene expression patterns. A number of investigators have utilized
microarray approaches to investigate alterations in gene
expression in several models including yeast, T cells,
tumor cell lines, and fibroblasts (Amundson et al 1999;
DeRisi et al 1996; Heller et al 1997; Iyer et al 1999;
Jelinsky and Samson 1999; Khan et al 1998, 1999;
Lashkari et al 1997; Perou et al 1999; Schena et al 1995,
1996; Spellman et al 1998; Wang et al 1999a, 1999b;
Welford et al 1998; Whitney et al 1999; Wilson et al 1999;
Wolfsberg et al 1999). What these studies share is that
they reveal the dramatic orchestration of cellular responses
to given stimuli. A large number of genes appear to
respond in a coordinated manner to particular signals,
revealing previously unsuspected connections between
cellular pathways and implicating novel genes in certain
functions. In this way, investigators can gain insight into
those gene activation or inactivation processes that occur
concurrently. The mechanistic relationship between the
activation or inactivation of a given set of genes may
suggest a coordinated response at the cellular or system
level following specific stimuli. Using the timing of gene activation and inactivation, investigators can attempt to distinguish primary response
genes from secondary response genes. Although such assignments still
require rigorous testing, the functions and interdependencies of the genes involved can begin to be inferred. Microarray analysis of gene expression
is now being applied to many additional models and systems.
These additional models/systems include Drosophila (development [White et al 1999]), skeletal muscle (aging/caloric restriction [Lee et al 1999]), isolated neurons
(distinctive phenotypes/functions [Luo et al 1999]), and
many others. Through this type of work, few fields of
biology will remain unaffected by the emerging microarray-generated gene expression data.

Table 1. Microarray Equipment and Complementary DNA (cDNA) Clone Production Costs, and Laboratory Tasks

Equipment and materials                                      Approximate cost (US$)
[label missing]                                              45,000–125,000
[label missing]                                              40,000–225,000
Clone production                                             6 per clone
Liquid handling (96-well pipettes, cherry picking)           50,000–150,000
PCR cycler (96 or 384 well)                                  5,000–8,000 each
Microtiter dish reader (Escherichia coli growth,
  DNA quantitation)                                          20,000–50,000
Gel documentation (PCR reaction QC/DNA quantitation)         10,000–30,000
Microtiter dish centrifuges and evaporators                  10,000–15,000
Bar code printers/scanners                                   10,000–15,000
Freezers/refrigerators/sample racks                          10,000–20,000
Microarray data analysis software                            0–50,000
Computers and computer network/data base                     10,000–50,000

Laboratory tasks                                             Staff skills
Clone processing (Escherichia coli growth, PCR, PCR
  clean-up, gel characterization, DNA concentration)         General laboratory
Array fabrication (arraying)                                 General laboratory
Tissue processing (RNA extractions, RNA
  amplifications [if necessary])                             General laboratory
Generation of fluorescent-tagged cDNA and
  hybridization of chips                                     General laboratory
Data collection and analysis; data base management           General laboratory/computer specialist

Approximate cost figures (U.S. dollars) for the major equipment needed to produce and use
microarrays, together with the microarray-related laboratory tasks and the staff skills each
requires. PCR, polymerase chain reaction; QC, quality control.
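The idea of separating primary from secondary response genes by timing, described earlier in this section, can be sketched as follows. This is a hypothetical scheme, assuming replicate-averaged log2 ratios at a few time points after a stimulus: each gene is labeled by the first time point at which its change crosses a threshold. The threshold and the early/late cutoff are placeholders that would need statistical justification rather than being fixed in advance.

```python
from typing import Dict, List, Optional

def onset_time(log_ratios: List[float], times: List[float],
               threshold: float = 1.0) -> Optional[float]:
    """First time point at which |log2 ratio| reaches threshold, or None."""
    for t, value in zip(times, log_ratios):
        if abs(value) >= threshold:
            return t
    return None

def classify_by_onset(profiles: Dict[str, List[float]], times: List[float],
                      threshold: float = 1.0,
                      early_cutoff: float = 1.0) -> Dict[str, str]:
    """Label each gene 'early', 'late', or 'unchanged' by its onset time."""
    labels = {}
    for gene, values in profiles.items():
        t = onset_time(values, times, threshold)
        if t is None:
            labels[gene] = "unchanged"
        elif t <= early_cutoff:
            labels[gene] = "early"  # candidate primary response gene
        else:
            labels[gene] = "late"   # candidate secondary response gene
    return labels

times = [0.5, 1.0, 2.0, 4.0]  # hours after the stimulus (illustrative)
profiles = {
    "gene_A": [1.5, 2.0, 1.0, 0.2],
    "gene_B": [0.1, 0.3, 1.2, 2.0],
    "gene_C": [0.1, -0.2, 0.1, 0.0],
}
print(classify_by_onset(profiles, times))
# {'gene_A': 'early', 'gene_B': 'late', 'gene_C': 'unchanged'}
```

An early onset only makes a gene a candidate primary response gene; confirming the assignment still requires independent experiments.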
Currently, investigators have two major choices in the
application of microarray studies: commercially
available microarray materials/services or homegrown
microarrays. The main obstacle to the broad use of commercial microarrays (e.g., from Affymetrix or Incyte) is the cost of these prepared matrices. Microarrays or DNA chips can cost thousands
of dollars each and much more than that for an entire
experiment with many subjects and conditions. Although
these commercial products also offer access to the substantial experience base of the vendors, the costs remain
largely beyond the reach of most academic research
laboratories. However, the costs of developing homegrown arrays can in some cases be equally daunting. In
addition, homegrown arrays require sufficient expertise to implement the technology once the equipment and
DNA sources have been obtained. The costs of the equipment
to produce and analyze homegrown arrays can vary
tremendously. Table 1 describes some of the equipment
necessary for these purposes, as well as approximate costs.
Beyond the equipment, one needs to obtain the DNA to
place onto these chips and/or membranes. Once again,
options abound, from cDNA clones (commercial as well
as private sources), PCR-amplified materials from clones,
mRNA, and/or genomic DNA to synthetic oligonucleotides. Depending upon the number of genes evaluated in a
given research study, the clones, PCR fragments, or
oligonucleotides can also represent significant costs. Approximate costs for some of these materials and related
processes are shown in Table 1. Beyond the cost of
the materials themselves, one must be able to maintain, process,
and reliably track clones, DNAs, experimental samples,
and array data. Substantial efforts are required to develop
computer-based tracking capability and to maintain a
data base structure sufficiently flexible to keep pace with
emerging strategies and new developments. As it stands,
many academic institutions around the country have developed or are developing microarray core facilities,
thereby providing this powerful technology to many investigators.
Beyond purchased or homegrown glass arrays, nylon
membrane arrays or macroarrays can be easily produced in
most laboratories. Several advantages to such macroarrays
exist, including improved DNA binding properties of this
matrix, the ease and cost of macroarray production, and
improved cDNA labeling procedures used to produce
radioactive cDNA tools. Several commercial suppliers of such
macroarrays also report that these products can be reused for several gene
expression comparisons. Further evidence is emerging
suggesting that nylon arrays probed with radioactive
cDNA tools are more sensitive than glass
slide arrays probed with fluorescent cDNA tools
(Bertucci et al 1999). Disadvantages of nylon-based macroarrays include the inability to compare two samples
on a single membrane, as each comparison requires
a minimum of two theoretically identical membranes (e.g.,
control vs. experimental), and the difficulty of generating high-density arrays containing tens of
thousands of genes per unit area.
Table 1 also generically describes some of the laboratory tasks to consider when developing a complete microarray laboratory. Beyond the general tasks
listed, it is very important to recognize the vast quantities
of quality control data generated during the array fabrication process.
These data can describe
Escherichia coli growth, PCR reactions, PCR cleanup,
DNA concentration, and chip printing, and this information
may be critical when one is analyzing array results. Lastly,
even though they occupy only one line in Table 1,
skilled computer support personnel and
networks are so valuable in this microarray process that their importance should not be underestimated.
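One way to keep the fabrication quality control data attached to the array results is a per-clone record that downstream analysis can consult. The sketch below is hypothetical: the field names, the pass/fail checks, and the minimum DNA concentration are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class CloneQC:
    clone_id: str
    ecoli_growth_ok: bool   # culture grew in its microtiter well
    pcr_single_band: bool   # gel showed a single PCR product
    dna_ng_per_ul: float    # DNA concentration after PCR clean-up
    printed_ok: bool        # spot present after printing

def failed_checks(qc: CloneQC, min_conc: float = 100.0) -> List[str]:
    """Return the QC steps a clone failed (min_conc is an illustrative cutoff)."""
    reasons = []
    if not qc.ecoli_growth_ok:
        reasons.append("growth")
    if not qc.pcr_single_band:
        reasons.append("pcr")
    if qc.dna_ng_per_ul < min_conc:
        reasons.append("concentration")
    if not qc.printed_ok:
        reasons.append("printing")
    return reasons

records = [
    CloneQC("clone_001", True, True, 250.0, True),
    CloneQC("clone_002", True, False, 40.0, True),
]
# Per-clone flags that can travel with the expression data into analysis.
flags = {r.clone_id: failed_checks(r) for r in records}
print(flags)  # {'clone_001': [], 'clone_002': ['pcr', 'concentration']}
```

Keeping the reasons rather than a single pass/fail bit lets an analyst later decide, for example, to exclude spots with failed PCR but only down-weight spots with low DNA concentration.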
Current Problems in Microarray-Based
Expression Analysis
One of the major challenges specific to the application of
microarrays to the study of gene expression patterns in the
central nervous system (CNS) will be to determine the
consequences of evaluating gene expression patterns in
such a complex tissue. Many previous microarray studies
have utilized comparatively less complex biological samples (e.g., tissue culture cells, yeast cells, tumor cells). Because
the CNS comprises many complex cellular phenotypes, it
remains to be seen which procedures can be applied generically to studies of regional brain differences and of disease-specific gene
expression changes within defined brain regions, and when
single-cell gene expression profiling (e.g.,
single-cell PCR or laser dissection-based strategies) will be necessary. One
aspect of microarray analysis yet to be fully studied is the
consequence of diluting RNAs of interest with RNA from
surrounding neurons and glia. For example, particular
neurons may be found in the brain in very limited
numbers. Further, they may be found diffusely throughout
a given brain region. It is likely that general brain region
dissection approaches would dilute the concentration of
RNAs from such neurons with RNAs from millions of
potentially unrelated neurons and glia. Thus, cell or tissue
isolation strategies as a rule affect one’s ability to detect
specific RNAs and thus overall sensitivity. If one wishes
to study particular neurons in a single brain region, how
will one isolate these cells of interest and expect to detect
the relevant mRNAs of interest? Will general brain region
dissection strategies (e.g., hypothalamus) yield sufficient
expression information for comparisons or will more
neuroanatomically refined methods like single-cell PCR or
laser dissection methods be required?
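The dilution problem can be made concrete with simple arithmetic. Assuming, for illustration, that a transcript is present only in a sparse neuronal population and absent from the surrounding cells, the average abundance in a dissected region is the population's abundance weighted by the fraction of cells it contributes. The copy numbers and fractions below are hypothetical.

```python
def diluted_abundance(copies_per_target_cell, target_fraction,
                      copies_per_other_cell=0.0):
    """Average copies per cell in a dissected region (simple mixing model)."""
    return (target_fraction * copies_per_target_cell
            + (1.0 - target_fraction) * copies_per_other_cell)

def fold_dilution(copies_per_target_cell, target_fraction,
                  copies_per_other_cell=0.0):
    """How many times weaker the transcript appears in the regional sample."""
    return copies_per_target_cell / diluted_abundance(
        copies_per_target_cell, target_fraction, copies_per_other_cell)

# Hypothetical: 100 copies/cell in a population making up 0.1% of the cells
print(round(diluted_abundance(100, 0.001), 6))  # 0.1
print(round(fold_dilution(100, 0.001)))         # 1000
```

In this simple model a population making up 0.1% of the dissected cells yields a signal 1000-fold weaker than the same transcript measured in the isolated cells, which is the motivation for single-cell PCR or laser dissection approaches.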
Beyond the number of complex phenotypes in the CNS,
there is the related issue of the sensitivity of microarrays that use fluorescent-tagged nucleic acid tools. One of the
many possible explanations for the improved sensitivity of
nylon arrays is the improved efficiency of radioactive
labeling procedures relative to fluorescent strategies. For
example, it remains to be seen what level of gene expression in a given brain region can be detected using these
fluorescent detection approaches. Further, what will be the
consequences of brain region analysis versus single cell
analysis? Which of these two approaches would be more
advantageous for the specific detection of weakly expressed genes in many cells and/or weakly expressed
genes in a limited number of cells within a given brain
region? These represent significant issues actively pursued
in several research laboratories. Several alternatives or
refinements are being developed. These improvements
include refinements in the synthesis of fluorescent-tagged
nucleic acids (e.g., fluorescent nucleotide analogs with
improved enzymatic properties or post-cDNA synthesis
coupling of fluorescent tags), development of fluors with
improved fluorescent properties (e.g., stronger signals),
and novel hybridization instrumentation that can discriminate radioactively tagged cDNA probes (35S vs. 32P, rather
than Cy3 vs. Cy5). Beyond these improvements, novel
developments in the design of high-throughput gene arrays
are described below.
In addition to the tissue complexity and the sensitivity
issues associated with brain samples, obtaining reliable
data is also a major challenge for the microarray-based
expression analysis. There are probably two main sources
of the observed variability in microarray data: the
normal variation in gene expression among samples and
the noise introduced during the microarray assay process.
Few systematic studies of normal gene
expression variation exist, although data from in situ hybridization suggest that normal variance for many
tightly regulated tissue-specific genes can be within 20%
to 30%. In contrast, many genes in yeast show two- to fourfold random
fluctuations (Cho et al 1998;
Klevecz et al 1984). A recent article from Affymetrix
(Santa Clara, CA) suggested that for most of the “housekeeping” genes in human tissues, differences of less than fourfold are probably not biologically significant (Warrington et al 2000), as those relatively abundant housekeeping genes are probably less tightly regulated. As a
result, a significant portion of microarray data variability
for high- or medium-abundance mRNAs may be due to
their normal expression variation.
For the tightly regulated (mostly low abundance)
mRNA species, noise introduced in various stages of the
microarray-based assay process may be the predominant
factor. Due to the miniaturization and the number of genes
involved in the assay, it is very difficult to maintain
consistent processing conditions across multiple assays for
each gene; thus, obtaining accurate absolute signals is
unlikely. For radioactively labeled probe-based microarray
assays, noise from slide heterogeneities, pin-to-pin variation, spotting volume fluctuation, and so on are described
in great detail by Schuchhardt et al (2000). Although some
of the systematic variations may be reduced by including
various controls (Schuchhardt et al 2000), random fluctuations in various stages cannot be controlled and can
accumulate quickly in a complicated assay. Furthermore,
the complexity of brain tissue will also lead to significant
variations in brain tissue dissection, further reducing the
accuracy of microarray experiments.
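Why independent random fluctuations accumulate quickly can be shown with a small simulation. Assuming each stage of the assay multiplies the signal by an independent random factor (normally distributed on the log2 scale), the stage errors add in quadrature, so the overall log2 standard deviation is the square root of the sum of the squared stage SDs. The stage names and SD values are hypothetical.

```python
import math
import random
import statistics

def combined_log_sd(stage_sds):
    """Overall log2 SD when independent stage errors add in quadrature."""
    return math.sqrt(sum(sd * sd for sd in stage_sds))

def simulate_log_signals(true_signal, stage_sds, n=20000, seed=1):
    """Apply one random log2-scale error per stage to each simulated measurement."""
    rng = random.Random(seed)
    results = []
    for _ in range(n):
        log_value = math.log2(true_signal)
        for sd in stage_sds:
            log_value += rng.gauss(0.0, sd)  # error added at this stage
        results.append(log_value)
    return results

# Hypothetical log2 SDs for RNA extraction, labeling, hybridization, scanning
stages = [0.2, 0.3, 0.3, 0.1]
sim = simulate_log_signals(1000.0, stages)
print(round(combined_log_sd(stages), 2))  # 0.48
print(round(statistics.stdev(sim), 2))    # close to the analytic value
```

With these assumed values the combined log2 SD is about 0.48, a typical one-SD spread of roughly 1.4-fold, which is larger than any single stage contributes on its own.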
The two-color assays should produce more accurate
results because variations in spot size and cDNA probe
amount on the chip should not change the signal ratio, as
signals are derived from the same spot. However, this is
only true if signals are well above the background in both
signal detection channels. The signal level for most of the
tightly regulated genes will be close to the background
level, and the fluctuation in spot size and probe amount in
a spot will still significantly change the signal ratio from
two samples. In addition, the background level on a slide can
also vary significantly from spot to spot due to factors
such as unevenness in slide surface property, dust contamination, and incomplete washing, leading to high signal
variability for low-abundance mRNA species even in the
two-color assay system.
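The effect of background fluctuation on low-signal spots can be illustrated by simulating a gene whose true ratio is 1 (no change). Each channel's net signal is the true signal minus an error in the estimated local background; the background level and its SD are assumed values standing in for slide unevenness, dust, or incomplete washing. Although both channels are read from the same spot, the spread of log2 ratios is small for a strong spot and large for a spot near background.

```python
import math
import random
import statistics

def simulated_log_ratios(net_signal, background=200.0, bg_sd=30.0,
                         n=5000, seed=7, floor=1.0):
    """log2(red/green) for a spot whose true ratio is 1 (no change).

    Each channel reads net_signal + background; the local background estimate
    subtracted from it has its own random error (bg_sd, an assumed value).
    """
    rng = random.Random(seed)
    ratios = []
    for _ in range(n):
        red = net_signal + background - rng.gauss(background, bg_sd)
        green = net_signal + background - rng.gauss(background, bg_sd)
        # Clip at a floor so spots at or below background still give a finite log.
        ratios.append(math.log2(max(red, floor) / max(green, floor)))
    return ratios

for signal in (5000.0, 100.0):
    spread = statistics.stdev(simulated_log_ratios(signal))
    print(f"net signal {signal:>6.0f}: SD of log2 ratio = {spread:.2f}")
```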
Unfortunately, despite the high variability of the microarray data, most of the published studies using microarray-based expression analysis included only a very limited
number of replicates, and many studies conducted the assay
only once. Furthermore, many investigators use the arbitrary
“twofold change” criterion to judge whether an observed gene
expression change is significant. However, the twofold
threshold is not statistically valid even for duplicate
experiments (Claverie 1999), and it is critical to have
enough replicate microarray assays to reach reliable conclusions (Lee et al 2000). In essence, the microarray
experiment should be held to the same statistical standards
that apply to other biological experiments. It is also critical
that the results from the microarray experiment be
verified independently by other mRNA quantitation methods such as in situ hybridization, Northern blot, RNase protection, TaqMan, or
Invader assays.
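The weakness of duplicate experiments can be illustrated with an exact sign-flip (permutation) test on replicate log2 ratios. This is a swapped-in technique chosen because it needs only the standard library; it is not the method used in the studies cited above. Under the null hypothesis of no change, each replicate's log ratio is equally likely to be positive or negative, so with n replicates there are 2^n equally likely sign patterns.

```python
import itertools
import statistics

def sign_flip_p_value(log_ratios):
    """Exact two-sided sign-flip test of 'mean log2 ratio = 0'."""
    observed = abs(statistics.mean(log_ratios))
    n = len(log_ratios)
    extreme = 0
    # Enumerate all 2**n equally likely sign patterns under the null hypothesis.
    for signs in itertools.product((1, -1), repeat=n):
        flipped = [s * x for s, x in zip(signs, log_ratios)]
        if abs(statistics.mean(flipped)) >= observed - 1e-12:
            extreme += 1
    return extreme / 2 ** n

duplicate = [1.1, 1.3]                    # both replicates above "twofold"
six_replicates = [1.1, 1.3, 0.9, 1.2, 1.0, 1.4]
print(sign_flip_p_value(duplicate))       # 0.5
print(sign_flip_p_value(six_replicates))  # 0.03125
```

With two replicates this test cannot give a two-sided p value below 0.5 no matter how large the fold change, whereas six consistent replicates reach 0.03125 (2 of 64 patterns).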
The high variability of the microarray data also means
subtle changes in experimental conditions may significantly alter the results, and thus it is very difficult for
different labs to compare experimental data. The lack of
standard controls, the predominant use of relative signals
(ratios), and the adoption of incompatible data formats
also contribute to poor comparability between studies.
Some of these issues, such as the use of standard controls
and the design of compatible data formats, have been
discussed in several recent meetings to improve data comparability.
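A simple form of control-based normalization can be sketched as follows, assuming standard control spots (for example, transcripts spiked into both samples in equal amounts) are present on every array: all log2 ratios are shifted so that the median control ratio is 0. The control names are hypothetical, and median centering is only one of several possible normalization choices.

```python
import statistics

def normalize_to_controls(log_ratios, control_ids):
    """Shift all log2 ratios so the control spots have a median log2 ratio of 0.

    control_ids name spots for standard controls expected to be unchanged;
    the names used below are hypothetical.
    """
    offset = statistics.median(log_ratios[c] for c in control_ids)
    return {gene: value - offset for gene, value in log_ratios.items()}

raw = {"ctrl_1": 0.4, "ctrl_2": 0.6, "ctrl_3": 0.5, "gene_A": 1.5, "gene_B": 0.5}
norm = normalize_to_controls(raw, ["ctrl_1", "ctrl_2", "ctrl_3"])
print({g: round(v, 2) for g, v in norm.items()})
# {'ctrl_1': -0.1, 'ctrl_2': 0.1, 'ctrl_3': 0.0, 'gene_A': 1.0, 'gene_B': 0.0}
```

If laboratories used the same controls, ratios normalized this way would at least share a common reference point across studies.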
New Directions in Microarray Technology
Microarray technology has undergone extensive development in array format, detection, and printing methods in
recent years. In addition to the flat-surface glass or silicon
chips, supporting materials such as microscopic beads,
nanochannel glass, 96-well microtiter plates, microelectrode arrays, and phototransistor arrays are also used for
depositing nucleic acid material. One of the most promising approaches is the microscopic bead– based array, as it
offers high sensitivity, flexibility, and many replicates in
one assay (Walt 2000). Bead-based approaches do not use
spatial location as the key for oligonucleotide probe
identity, in contrast to flat-surface chips. Different oligonucleotides are covalently attached to beads coded by
unique fluorescent dye combinations. Fluorescent-labeled
nucleic acid samples are then used to interrogate a mixture
of dye-coded beads. After that, there are two approaches to
obtain the experimental results. BeadArray (Illumina, San
Diego) assembles bead arrays by sedimentation on an
optical fiber substrate containing 5000–50,000 individual
fibers. Each fiber contains a well that will accommodate
one bead. The identity of the beads and the hybridization
results can be easily decoded by analyzing images recorded from several excitation-emission wavelength combinations. In contrast, the Suspension Arrays developed by
Luminex (Austin, TX) use a microsphere-based flow
cytometric assay and beads are read one by one by laser
beams. Up to 20,000 beads can be read in 1 sec. Another
interesting development is the use of nanochannel glass
slides for array printing. Nanochannel glass materials are
unique glass structures containing a regular geometric
array of parallel holes or channels as small as 33 nm in
diameter or as large as several micrometers in diameter
(Beattie 1998). As a result, the surface area of nanochannel
glass is much greater than that of regular glass,
enabling larger amounts of DNA material to be deposited
in each spot. The hybridization kinetics are also greatly
improved due to the “flow through” property of the chip.
Furthermore, the wave-guide effect of the nanochannels
enables fluorescent signals inside the nanochannels to be
detected, particularly if the scanner has good depth of
field. Gene Logic (Gaithersburg, MD) reported that on a
CCD camera-based scanner their nanochannel glass-based
“Flow-Thru” chip increases hybridization signals by up to
Although the predominant method for microarray signal
detection is still based on fluorescence, many other new
methods also show promise. Radioactive probes have the
advantage of high incorporation efficiency, high sensitivity, and low cost. They were not used in high-density
microarrays due to the lack of high-resolution imaging
methods. However, the new Micro Imagers manufactured
by Biospace (Paris) have spatial resolutions from 15 to 20
µm for radioactive probes, suitable for detecting signals
from microarrays with dot sizes around 150 µm (BIOSPACE 1999). Ratio analysis between two samples on the
same chip is possible because those Micro Imagers also
have the ability to separate the 3H signal from those
produced by 14C, 35S, or 33P. Detection methods based on
oxidation-reduction reaction (CMS 1999), resonance light
scattering (Yguerabide and Yguerabide 1998), capacitance
change after hybridization, resonance ionization mass
spectrometry methods (Whitaker 1999), and nanoparticle-promoted silver staining detection (Taton et al 2000) have
also been reported, and some of them have already found
niches in low-density clinical genotyping chips.
Probably the most significant advance in the area of
array making is the use of programmable digital light
processors for directing in situ oligonucleotide synthesis.
Similar to the Affymetrix approach (cf. Watson and Akil
1999), light is used to direct the addition of specific
nucleotides at defined locations on a chip. However, the
costly masks used in the Affymetrix method are replaced
by a light projector based on the digital light processor,
which projects mask images created by computer
programs onto the chip surface. These methods have the
potential of greatly reducing the cost and increasing the
flexibility of high-density oligonucleotide chips (Garner
2000; Singh-Gasson et al 1999; Zhou et al 1999). Another
increasingly important chip-making method involves the
adoption of inkjet printing technology (Blanchard et al
1996). When used for oligonucleotide deposition, it can
work with many different chip surfaces due to its
noncontact nature. It can also be used for in situ
oligonucleotide synthesis, thus greatly increasing the
flexibility while reducing the cost of oligonucleotide-based microarrays.
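How computer-generated masks might be planned for light-directed synthesis can be sketched as follows. The assumptions are hypothetical simplifications: each synthesis cycle couples one nucleotide type, cycles run in a fixed A, C, G, T order, and a feature is illuminated only when the next base it needs matches the current cycle. Each returned set of lit positions corresponds to one projected mask image.

```python
def synthesis_masks(probes, base_order="ACGT"):
    """Plan one illumination mask per coupling cycle for light-directed synthesis.

    probes maps a feature position (row, col) to the sequence to build there,
    written in the order the bases are added. Returns (base, lit_positions)
    pairs; each lit set stands for one projected mask image.
    """
    for seq in probes.values():
        if set(seq) - set(base_order):
            raise ValueError(f"unexpected base in {seq!r}")
    progress = {pos: 0 for pos in probes}  # bases added so far at each feature
    masks = []
    while any(progress[pos] < len(seq) for pos, seq in probes.items()):
        for base in base_order:
            lit = frozenset(
                pos for pos, seq in probes.items()
                if progress[pos] < len(seq) and seq[progress[pos]] == base)
            if lit:  # skip cycles that would illuminate nothing
                masks.append((base, lit))
                for pos in lit:
                    progress[pos] += 1
    return masks

probes = {(0, 0): "ACG", (0, 1): "AGT", (1, 0): "CCA"}
for base, lit in synthesis_masks(probes):
    print(base, sorted(lit))
```

For the three probes shown, this plan uses six masks. A fixed cycle order is the simplest choice; other orders can change the number of cycles needed, and changing a probe only requires regenerating the masks in software rather than fabricating a new physical mask.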
Data Analysis Issues
Experiments using microarray technology generate vast
amounts of data, and methods for the management and
analysis of these data are under intensive investigation.
The data management and analysis problems for microarrays can be divided roughly into four stages: data collection, data storage, image analysis, and knowledge discovery. The data collection stage will record the biological
properties of the samples, the sample tracking file created
during the chip-making process, and the image file generated by the scanner, along with data on other experimental conditions and procedures collected by the laboratory
information management software systems. Ideally, various controls and standards included in the experimental
process should allow downstream data analysis programs
to compare experimental results across different sources.
During the data storage stage, the above data, as well as
data from downstream analysis, should be stored in a
format and location that will allow easy comparison and
analysis among different groups. The image analysis stage
will convert the image file produced by microarray scanners or phosphoimagers to numerical signal intensity
values that can be used for knowledge discovery. Usually,
the sample-tracking file is also merged with the processed
image data to assign gene identity to each data point. In the
knowledge discovery stage, the goal is to extract from the
massive microarray data the significant information on the
changes in individual genes and alteration in patterns and
relationships among the various genes. In the case of
expression arrays, the most frequently asked questions
concern the identification of up- and downregulated genes,
patterns of gene expression, suggested functional role of
unknown genes, and correlation among different genes
and experimental conditions. There are already tools at
different development stages for these problems. Ideally,
one should also be able to compare experimental results
from different platforms and different systems, to integrate
knowledge from literature or clinical data bases, and to
automatically update the data mining results with new
information from various data bases from time to time.
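The step of merging the sample-tracking file with the processed image data can be sketched as a join on grid position. The CSV column names (block, row, col, gene_id) and the intensity dictionary keyed by position are hypothetical; real scanner and tracking formats differ. Positions missing from the tracking file are flagged rather than silently dropped.

```python
import csv
import io

def assign_gene_identity(tracking_csv, intensities):
    """Join spot intensities (keyed by grid position) to gene identities."""
    layout = {}
    for row in csv.DictReader(io.StringIO(tracking_csv)):
        position = (int(row["block"]), int(row["row"]), int(row["col"]))
        layout[position] = row["gene_id"]
    merged = []
    for position, value in sorted(intensities.items()):
        merged.append({
            "position": position,
            "gene_id": layout.get(position, "UNMAPPED"),  # flag untracked spots
            "intensity": value,
        })
    return merged

tracking = "block,row,col,gene_id\n1,1,1,gene_A\n1,1,2,gene_B\n"
intensities = {(1, 1, 1): 1520.0, (1, 1, 2): 310.0, (1, 1, 3): 95.0}
for record in assign_gene_identity(tracking, intensities):
    print(record["position"], record["gene_id"], record["intensity"])
```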
Generally speaking, data collection and image analysis
are the relatively mature parts of the microarray data
analysis problem. Many commercial and free software
packages already handle these tasks reasonably well. Our discussion
below will largely focus on data storage and the knowledge discovery issues.
Currently, there are dozens of proprietary microarray
data base schemes in different laboratories, in addition to
three major public microarray data repositories: the Gene
Expression Omnibus at the National Center for Biotechnology Information (NCBI), the GeneX data base at the
National Center for Genome Resources (NCGR), and the
ArrayExpress data base at the European Bioinformatics
Institute (EBI). Few of them can “talk to” each other, and
it is impossible to conduct data analysis across different
data bases. The adoption of identical data structure and
sample description language for microarray experiments is
critical for the comparability of data from different
sources. Equally important is the use of common standards
for data normalization, quality control, and cross-platform
comparison. An international conference at EBI has already been devoted to these topics. A series of recommendations has been published, and five working groups were
set up to develop 1) standards in experiment description
and data representation; 2) microarray data extensible
markup language (XML) exchange format; 3) ontologies
for sample description; 4) normalization, quality control,
and cross-platform comparison; and 5) data-query language and data-mining approaches (MGED 2000). The
use of XML for the formal description and annotation of
microarray data and experiments will greatly facilitate
data exchange in the future. Currently there are at least two
XML formats, developed by NCBI (NCBI 1999) and
NCGR-EBI (NCGR 2000). Hopefully we will see a
unified standard for data representation, description, and
exchange in the near future after the differences among
these major public data bases are settled.
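The sketch below makes the idea of an XML exchange format concrete. It uses Python's standard `xml.etree.ElementTree` module to serialize one experiment and read it back. The element and attribute names (`experiment`, `sample`, `measurement`) are hypothetical placeholders. They do not follow GEML, the NCBI format or any other published schema.

```python
import xml.etree.ElementTree as ET


def experiment_to_xml(exp_id, sample, values):
    """Serialize one experiment; all tag and attribute names are hypothetical."""
    root = ET.Element("experiment", id=exp_id)
    ET.SubElement(root, "sample", tissue=sample["tissue"], species=sample["species"])
    data = ET.SubElement(root, "data")
    for gene, value in sorted(values.items()):
        # XML attributes are strings; repr() of a float round-trips exactly.
        ET.SubElement(data, "measurement", gene=gene, value=repr(float(value)))
    return ET.tostring(root, encoding="unicode")


def xml_to_values(text):
    """Read the gene -> value measurements back out of such a document."""
    root = ET.fromstring(text)
    return {m.get("gene"): float(m.get("value")) for m in root.iter("measurement")}


doc = experiment_to_xml("EXP-001", {"tissue": "hippocampus", "species": "rat"},
                        {"GENE_A": 2.5, "GENE_B": 0.4})
print(xml_to_values(doc))  # {'GENE_A': 2.5, 'GENE_B': 0.4}
```

A shared schema matters more than the parsing code itself. Two laboratories can exchange such files only if they agree on the element names and on the vocabulary used to describe samples.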
Discovering new knowledge from the microarray data
can be pursued on several different levels. At the simplest
level, the up- and downregulated genes can be easily
identified from microarray experiments. Indeed, this may
be the question that is most frequently asked when
biologists conduct experiments using microarray technology. Almost all the microarray data analysis software
packages have this capability, and the major difference
among various packages is in the way they visualize the
analysis results. In fact, one can also use desktop spreadsheet or data base programs to implement such analysis
and generate graphic reports easily.
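A fold-change screen of this kind takes only a few lines. The sketch assumes two already-normalized intensity tables, one for control and one for treated samples. The twofold threshold and the intensity floor are illustrative choices, not recommendations from the studies cited here.

```python
import math


def regulated_genes(control, treated, threshold=2.0, floor=1.0):
    """Split genes into up- and downregulated sets by fold change.

    control, treated: dicts of gene ID -> normalized intensity.
    floor: illustrative minimum intensity, so a near-zero control value
    cannot produce an arbitrarily large ratio.
    Returns (up, down), each a dict of gene ID -> log2(treated / control).
    """
    cutoff = math.log2(threshold)
    up, down = {}, {}
    for gene in control.keys() & treated.keys():
        log_ratio = math.log2(max(treated[gene], floor) / max(control[gene], floor))
        if log_ratio >= cutoff:
            up[gene] = log_ratio
        elif log_ratio <= -cutoff:
            down[gene] = log_ratio
    return up, down


control = {"A": 100.0, "B": 400.0, "C": 250.0}
treated = {"A": 400.0, "B": 100.0, "C": 300.0}
up, down = regulated_genes(control, treated)
print(up, down)  # {'A': 2.0} {'B': -2.0}
```

Working with log2 ratios makes up- and downregulation symmetric. A twofold increase gives +1 and a twofold decrease gives -1.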
However, although simple fold-change analysis
is useful in many situations, it hardly touches the rich
information embedded in the microarray data. Since microarray experiments determine expression levels of thousands of genes, it would be useful to find higher order
relationships or hidden patterns among these genes. Clustering together genes that exhibit similar expression patterns across multiple experiments is one way of revealing
such a relationship. Such an analysis would help us understand the regulatory mechanisms underlying the change in
expression levels. One may also infer something about the
functional role of unknown genes from the known genes in
the same cluster. Since the first cluster analysis of microarray data by Eisen et al (1998), it has become a routine
method for grouping coregulated and/or functionally similar
genes based on microarray data. Currently, many
variations of clustering algorithms are being used in
microarray data analysis. The most popular method,
hierarchic clustering, makes no assumption about the biological
properties of the genes involved, and it has been shown to
be a good tool for discovering genes with similar functions
and inferring functional roles of unknown genes both in
yeast and in mammalian cells. However, this method is not
statistically very robust, and its results can be strongly
influenced by outliers. In addition, gene expression patterns are not inherently hierarchic. Another clustering
method that has been successfully used in the analysis of
hematopoietic differentiation is the self-organizing map
method, which allows the experimenter to impose partial
structure on the data and test different hypotheses. The
self-organizing map method is also reported to be more
robust and accurate (Toronen et al 1999). Yet another way
to group genes by expression is the support vector machine
method (Brown et al 2000). Strictly speaking, this is a
supervised classification technique rather than clustering. It
uses training sets of genes, assembled from current
knowledge, to specify in advance which genes belong to a
given functional class and which do not. This method is
reported to be more accurate in identifying genes with
common functions based on microarray expression data. It
can also identify outliers and is reported to be more robust for large data
sets. A systematic comparison of various clustering algorithms, including K-means (a type of partitional
clustering in which the number of clusters [K] is defined
before the calculation) and Bayesian clustering methods (which allow one to factor expertise and prior knowledge into computation), and of different ways of calculating
the similarity matrix between genes for expression data
analysis, is urgently needed.
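The hierarchic clustering described above can be sketched in a few dozen lines of Python. This is a minimal, unoptimized illustration. It assumes a distance of 1 - Pearson correlation and average linkage. It is broadly in the spirit of Eisen et al (1998), but the exact similarity metric used in that work may differ. The expression profiles are made-up values.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_linkage(profiles):
    """Agglomerative clustering with distance 1 - r and average linkage.

    profiles: dict of gene ID -> expression values across experiments.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    each cluster being a frozenset of gene IDs; drawing this history as a
    tree gives the familiar dendrogram.
    """
    genes = sorted(profiles)
    d = {}
    for g, h in combinations(genes, 2):
        d[g, h] = d[h, g] = 1.0 - pearson(profiles[g], profiles[h])

    def linkage(a, b):
        # Average linkage: mean distance over all cross-cluster gene pairs.
        return sum(d[g, h] for g in a for h in b) / (len(a) * len(b))

    clusters = [frozenset([g]) for g in genes]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((a, b, linkage(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


profiles = {
    "G1": [1, 2, 3, 4], "G2": [2, 4, 6, 8],   # rise together
    "G3": [4, 3, 2, 1], "G4": [8, 6, 4, 2],   # fall together
}
for a, b, dist in average_linkage(profiles):
    # Merges G1+G2 at 0.0, then G3+G4 at 0.0, then the two pairs at 2.0.
    print(sorted(a), sorted(b), round(dist, 3))
```

Each merge step rescans every pair of clusters, so this sketch scales poorly. It is meant only to show the logic of correlation-based, average-linkage merging.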
Principal component analysis is a way to reduce a large
data set to a more meaningful, smaller set of variables.
Genes that are coexpressed with one another and are also
largely independent of other subsets of expression patterns
are combined into factors. They are thought to be representative of the underlying biological process that has
created the correlation among different genes. For example, genes that share common promoter elements may
largely be coregulated in some situations. The identified
principal components can also be used for other analyses,
such as clustering. Classification is another way of
analyzing microarray data. It uses microarray analysis
data sets from multiple samples in known categories,
such as cancer and normal tissues, to extract unique and
reliable expression patterns or “predictors” for samples
in a particular category. The unique patterns discovered
in such analyses can then be used to predict the
properties of unknown samples. These patterns will also
be helpful for understanding the molecular mechanisms
underlying the differences among tested samples
(Golub et al 1999).
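The classification idea can be sketched with a simplified version of the weighted-voting scheme described by Golub et al (1999). Each gene is scored by a signal-to-noise statistic, and the most informative genes cast weighted votes for one class or the other. Several details are assumptions of this sketch rather than details taken from that paper: the use of the population standard deviation, the top-|score| gene selection, the absence of any prediction-strength threshold and the toy data.

```python
import statistics


def signal_to_noise(v1, v2):
    """Per-gene score (mu1 - mu2) / (sigma1 + sigma2).

    Uses the population standard deviation (an assumption); the score is
    undefined if a gene is constant within both classes.
    """
    return (statistics.mean(v1) - statistics.mean(v2)) / (
        statistics.pstdev(v1) + statistics.pstdev(v2))


def train(class_1, class_2, n_genes):
    """class_1, class_2: lists of samples, each a dict of gene ID -> value.

    Keeps the n_genes genes with the largest |score| (a simplified gene
    selection) and stores (weight, midpoint) for each of them.
    """
    model = {}
    for gene in class_1[0]:
        v1 = [s[gene] for s in class_1]
        v2 = [s[gene] for s in class_2]
        midpoint = (statistics.mean(v1) + statistics.mean(v2)) / 2
        model[gene] = (signal_to_noise(v1, v2), midpoint)
    ranked = sorted(model.items(), key=lambda item: abs(item[1][0]), reverse=True)
    return dict(ranked[:n_genes])


def predict(model, sample):
    """Each gene votes weight * (x - midpoint); a positive total means class 1."""
    total = sum(w * (sample[g] - b) for g, (w, b) in model.items())
    return 1 if total > 0 else 2


class_1 = [{"A": 10, "B": 1, "C": 5}, {"A": 12, "B": 2, "C": 6}, {"A": 11, "B": 1.5, "C": 5}]
class_2 = [{"A": 2, "B": 9, "C": 5}, {"A": 3, "B": 10, "C": 6}, {"A": 2.5, "B": 11, "C": 5}]
model = train(class_1, class_2, n_genes=2)
print(sorted(model))                              # ['A', 'B']
print(predict(model, {"A": 11, "B": 2, "C": 5}))  # 1
print(predict(model, {"A": 3, "B": 10, "C": 6}))  # 2
```

Gene C does not separate the two classes and therefore drops out of the model. This mirrors how the method discards uninformative genes before voting.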
Uncovering the possible connections between gene
expression level and genomic structure will be the next
level of challenge. Finding the relationship between promoter structure and the expression pattern of the corresponding gene will be among the first steps to understanding the gene regulation process. A computational method
for finding the consensus promoter elements in the promoter regions of coregulated genes has already been
reported (van Helden et al 1998; Zhang 1999). It was
tested in yeast, but human and mouse genomes may
quickly be amenable to such analyses, given the rapid
progress of the genome projects. There are also efforts
devoted to deducing the intracellular regulatory circuits in
gene transcription based on microarray data (Kyoto University, Institute for Chemical Research 2000; McAdams
and Arkin 1998; Yuh et al 1998). Although it will be
extremely difficult to conduct such analyses in multicellular organisms, knowledge gained from unicellular organisms will certainly help us understand basic transcriptional programming in higher organisms. Analyzing the
relationships between gene expression data and protein
expression data and/or single nucleotide polymorphism
analysis results may also help to elucidate various regulatory mechanisms and interactions in biological processes.
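The sketch below is a much-simplified version of the oligonucleotide-frequency approach of van Helden et al (1998). It counts short words (k-mers) in the upstream regions of a cluster of coregulated genes and compares their frequencies with a background set. Unlike a real analysis, it uses a plain observed-to-expected ratio rather than a statistical significance test and ignores the reverse strand. It also runs on toy sequences, and the thresholds are arbitrary assumptions.

```python
from collections import Counter


def kmer_counts(seqs, k):
    """Count overlapping k-mers on the given strand only."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def overrepresented(cluster_seqs, background_seqs, k=6, min_ratio=3.0, min_count=2):
    """Return (k-mer, count, observed/expected) for enriched k-mers, best first.

    Expected count = background frequency * number of k-mer positions in the
    cluster's upstream sequences. An add-one pseudocount over all 4**k
    possible k-mers keeps unseen k-mers from giving a zero denominator.
    """
    observed = kmer_counts(cluster_seqs, k)
    background = kmer_counts(background_seqs, k)
    bg_total = sum(background.values())
    positions = sum(max(len(s) - k + 1, 0) for s in cluster_seqs)
    hits = []
    for kmer, count in observed.items():
        if count < min_count:
            continue
        freq = (background[kmer] + 1) / (bg_total + 4 ** k)
        ratio = count / (freq * positions)
        if ratio >= min_ratio:
            hits.append((kmer, count, ratio))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


# Toy upstream regions that share one word, plus a word-poor background.
cluster = ["AAGATAGGCC", "TTGATAGCAT", "CCCGATAGTT"]
background = ["ACGT" * 8]
for kmer, count, ratio in overrepresented(cluster, background, k=4):
    print(kmer, count, round(ratio, 1))
```

Real promoter analyses need a background drawn from many upstream regions of the same genome. Without one, almost any repeated word looks enriched.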
A bigger challenge will be the discovery of hidden
patterns in or associations between gene expression data
and data in existing knowledge data bases such as the
current scientific literature and the medical data bases.
Although there are extensive efforts to mine literature data
bases, it will be a daunting task to implement such
analyses for new microarray data on a regular basis.
Enormous computing power, extremely fast Internet connections, and data bases that can communicate with each
other will certainly be required. When that goal is
achieved, it is conceivable that computers, rather than
scientists, will be the main contributors of new ideas.
Distant Future Perspectives
The current status of the “genomics revolution” might be
compared to the first day that Columbus landed in the New
World. He could see that there was indeed land there, but
he had no idea of the size of the Americas or of the impact
these continents would eventually have on the old world
he knew. The seismic change in the world of biology is
also at its earliest stage, with an enormous future and real
complexities yet to come. We are just now beginning to
see the first elements of the monumental biological puzzle—the actual sequences of the entire genomes of several
species, including man. Within a very few years, not only
will all of these genomes be fully sequenced, but the
motifs and patterns within them will also lay out the
enormous similarities and differences of life. On the way
to the goal of such deep understanding we will have to
pass through several stages. These include the first stage of
actual sequencing and now the second stage, in which we
attempt to use the full-scale sequence data to begin to see
how these thousands of genes actually respond to the
entire range of life events, and how they in turn control
adaptation to internal and external demands throughout the
life cycle. We will also begin to appreciate the subtle
choices made by nature with respect to which genes and
which forms of those genes are normally found in specific
tissues and under what conditions they are regulated. It
then becomes possible to define the actual functions of the
unique gene products, to elucidate how they interact with
each other, and to describe the ways in which they are
affected by the environment. These last steps are, in a
sense, the real goal of biology and the key to uncovering
the causes of all human illnesses, including mental ones.
It is clear that the brain is the ultimate genetic system,
as it is the organ that expresses the highest proportion of
the genome. But it has used its genetic complexity to
elaborate a level of higher order organization involving
neuronal circuits that mediate much more complex functions that can vary from moment to moment and that
encompass the broadest range of biological controls, from
breathing to cognition. In addition, these circuits are
highly dynamic, and the brain is, in essence, a learning
machine. It alters its circuits and their functioning throughout the course of life, and in response to changing
demands. Thus, its ultimate physical structure, its pattern
of gene expression, and the parameters of its function are
all vastly different from the parameters set early in life.
There is much to be learned from studying the individual
genes, the specific circuits, and the unique behavioral
responses triggered by specific conditions. But there is
little question that much of the information processing in
the brain resides in patterns of activity (be it electrophysiologic, transcriptional/translational, or secretory activity)
and that we have not yet begun to fathom these patterns
and their coordination. The use of genomics to study one
level of organization and patterning is likely to prove
extremely useful, though it needs to be incorporated with
other approaches (Akil and Watson 2000). Our task as
molecular neuroscientists and biological psychiatrists is to
take on this enormously daunting problem of understanding brain function and dysfunction, while bearing in mind
that the brain, because of its circuitry and its plasticity, is
a great deal more than patterns of gene expression. As
genetic and cellular studies mature in specific technical
ways, we should see much more precise and sensitive
studies being carried out on the brain in normal and
abnormal states. However, much effort will be needed to
not only uncover static patterns of gene expression that
appear associated with a disorder, but also discover the
dynamics of these patterns, and how they contribute to the
appearance of an illness and its course. Just now the
outlines of this enormous undertaking are beginning to
take shape. The diagnostic and therapeutic options we will
begin to see in the next few years will likely grow in large
part from these efforts and, for the first time, offer both
real insights into the human brain and better tools for
helping those suffering from brain-related illnesses.
This study was supported in part by National Institute of Mental Health
(NIMH) Grant No. 5 PO1 MH422521 (SJW, HA), NIMH Conte Grant
No. L99-MH60398-2 (SJW, HA), National Institute on Drug Abuse
Grants Nos. 5 RO1 DA8920 and 5 RO1 DA02265 (SJW, FM, HA),
grants from the Nancy Pritzker Depression Research Network (SJW,
RCT, HA), and National Institute of Diabetes and Digestive and Kidney
Diseases Grant No. RO1-DK54232 (RCT).
Aspects of this work were presented at the conference “Genetics and
Brain Function: Implications for the Treatment of Anxiety,” March
22–23, 2000, Washington, DC. The conference was jointly sponsored by
the Anxiety Disorders Association of America (ADAA), the ADAA
Scientific Advisory Board, and the National Institute of Mental Health.
Akil H, Watson SJ (2000): Science and the future of psychiatry.
Arch Gen Psychiatry 57:86–87.
Amundson SA, Bittner M, Chen Y, Trent J, Meltzer P, Fornace
AJ Jr (1999): Fluorescent cDNA microarray hybridization
reveals complexity and heterogeneity of cellular genotoxic
stress responses. Oncogene 18:3666–3672.
Beattie KL, inventor; Houston Advanced Research Center, assignee (1998, December 1): Microfabricated, flowthrough
porous apparatus for discrete detection of binding reactions.
U.S. Patent 5,843,767.
Bertucci F, Bernard K, Loriod B, Chang YC, Granjeaud S,
Birnbaum D, et al (1999): Sensitivity issues in DNA array-based expression measurements and performance of nylon
microarrays for small samples. Hum Mol Genet 8:1715–1722.
BIOSPACE. Micro Imager. Available at: http://www.biospace.fr/Versionfr/microimager/tech_spec.html.
Accessed November 28, 2000.
Blanchard AP, Kaiser RJ, Hood LE (1996): High-density oligonucleotide arrays. Biosens Bioelectron 11:687–690.
Brown MP, Grundy WN, Lin D, Cristianini N, Sugnet CW,
Furey TS, et al (2000): Knowledge-based analysis of microarray gene expression data by using support vector machines.
Proc Natl Acad Sci U S A 97:262–267.
Cho RJ, Campbell MJ, Winzeler EA, Steinmetz L, Conway A,
Wodicka L, et al (1998): A genome-wide transcriptional
analysis of the mitotic cell cycle. Mol Cell 2:65–73.
Claverie JM (1999): Computational methods for the identification of differential and coordinated gene expression. Hum
Mol Genet 8:1821–1832.
CMS. Technology: How the system works. Available at: http://
Accessed November 28, 2000.
DeRisi J, Penland L, Brown PO, Bittner ML, Meltzer PS, Ray M,
et al (1996): Use of a cDNA microarray to analyse gene
expression patterns in human cancer. Nat Genet 14:457–460.
Eisen MB, Spellman PT, Brown PO, Botstein D (1998): Cluster
analysis and display of genome-wide expression patterns.
Proc Natl Acad Sci U S A 95:14863–14868.
Ermolaeva OD, Sverdlov ED (1996): Subtractive hybridization,
a technique for extraction of DNA sequences distinguishing
two closely related genomes: Critical analysis. Genet Anal
13:49–58.
Garner HR. Digital Optical Chemistry. Available at: http:// Accessed November 1, 2000.
Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M,
Mesirov JP, et al (1999): Molecular classification of cancer:
Class discovery and class prediction by gene expression
monitoring. Science 286:531–537.
Heller RA, Schena M, Chai A, Shalon D, Bedilion T, Gilmore J,
et al (1997): Discovery and analysis of inflammatory disease-related genes using cDNA microarrays. Proc Natl Acad Sci U
S A 94:2150–2155.
Iyer VR, Eisen MB, Ross DT, Schuler G, Moore T, Lee JCF, et
al (1999): The transcriptional program in the response of
human fibroblasts to serum. Science 283:83–87.
Jelinsky SA, Samson LD (1999): Global response of Saccharomyces cerevisiae to an alkylating agent. Proc Natl Acad Sci U
S A 96:1486–1491.
Khan J, Bittner ML, Saal LH, Teichmann U, Azorsa DO, Gooden
GC, et al (1999): cDNA microarrays detect activation of a
myogenic transcription program by the PAX3-FKHR fusion
oncogene. Proc Natl Acad Sci U S A 96:13264–13269.
Khan J, Simon R, Bittner M, Chen Y, Leighton SB, Pohida T, et
al (1998): Gene expression profiling of alveolar rhabdomyosarcoma with cDNA microarrays. Cancer Res 58:5009–5013.
Klevecz RR, Kauffman SA, Shymko RM (1984): Cellular clocks
and oscillators. Int Rev Cytol 86:97–128.
Kyoto University, Institute for Chemical Research. KEGG expression map. Available at:
kegg2.html. Accessed November 1, 2000.
Lashkari DA, DeRisi JL, McCusker JH, Namath AF, Gentile C,
Hwang SY, et al (1997): Yeast microarrays for genome wide
parallel genetic and gene expression analysis. Proc Natl Acad
Sci U S A 94:13057–13062.
Lee C-K, Klopp RG, Weindruch R, Prolla TA (1999): Gene
expression profile of aging and its retardation by caloric
restriction. Science 285:1390–1393.
Lee ML, Kuo FC, Whitmore GA, Sklar J (2000): Importance of
replication in microarray gene expression studies: Statistical
methods and evidence from repetitive cDNA hybridizations.
Proc Natl Acad Sci U S A 97:9834–9839.
Liang P, Pardee AB (1992): Differential display of eukaryotic
messenger RNA by means of the polymerase chain reaction.
Science 257:967–971.
Livak KJ, Marmaro J, Todd JA (1995): Towards fully automated
genome-wide polymorphism screening. Nat Genet 9:341–
Luo L, Salunga RC, Guo H, Bittner A, Joy KC, Galindo JE, et al
(1999): Gene expression profiles of laser-captured adjacent
neuronal subtypes. Nat Med 5:117–122.
McAdams HH, Arkin A (1998): Simulation of prokaryotic
genetic circuits. Annu Rev Biophys Biomol Struct 27:199–
MGED. Microarray Gene Expression Database group. Available
at: Accessed November 1, 2000.
NCBI. Gene expression omnibus. Available at: http://www. Accessed November 28, 2000.
NCGR. The Gene Expression Markup Language (GEML).
Available at:
Accessed November 1, 2000.
Perou CM, Jeffrey SS, van de Rijn M, Rees CA, Eisen MB,
Ross DT, et al (1999): Distinctive gene expression patterns in
human mammary epithelial cells and breast cancers. Proc
Natl Acad Sci U S A 96:9212–9217.
Schena M, Shalon D, Davis RW, Brown PO (1995): Quantitative
monitoring of gene expression patterns with a complementary
DNA microarray. Science 270:467–470.
Schena M, Shalon D, Heller R, Chai A, Brown PO, Davis RW
(1996): Parallel human genome analysis: Microarray-based
expression monitoring of 1,000 genes. Proc Natl Acad Sci U
S A 93:10614–10619.
Schuchhardt J, Beule D, Malik A, Wolski E, Eickhoff H, Lehrach
H, et al (2000): Normalization strategies for cDNA microarrays. Nucleic Acids Res 28:E47.
Singh-Gasson S, Green RD, Yue Y, Nelson C, Blattner F,
Sussman MR, et al (1999): Maskless fabrication of light-directed oligonucleotide microarrays using a digital micromirror array. Nat Biotechnol 17:974–978.
Spellman PT, Sherlock G, Zhang MQ, Iyer VR, Anders K, Eisen
MB, et al (1998): Comprehensive identification of cell
cycle-regulated genes of the yeast Saccharomyces cerevisiae
by microarray hybridization. Mol Biol Cell 9:3273–3297.
Taton TA, Mirkin CA, Letsinger RL (2000): Scanometric DNA
array detection with nanoparticle probes. Science 289:1757–
Toronen P, Kolehmainen M, Wong G, Castren E (1999):
Analysis of gene expression data using self-organizing maps.
FEBS Lett 451:142–146.
van Helden J, Andre B, Collado-Vides J (1998): Extracting
regulatory sites from the upstream region of yeast genes by
computational analysis of oligonucleotide frequencies. J Mol
Biol 281:827–842.
Velculescu VE, Zhang L, Vogelstein B, Kinzler KW (1995):
Serial analysis of gene expression. Science 270:484–487.
Walt DR (2000): Bead-based fiber-optic arrays. Science 287:451–452.
Wang K, Gan L, Jeffery E, Gayle M, Gown AM, Skelly M, et al
(1999): Monitoring gene expression profile changes in ovarian carcinomas using cDNA microarray. Gene 229:101–108.
Wang Y, Rea T, Bian J, Gray S, Sun Y (1999): Identification of
the genes responsive to etoposide-induced apoptosis: Application of DNA chip technology. FEBS Lett 445:269–273.
Warrington JA, Nair A, Mahadevappa M, Tsyganskaya M
(2000): Comparison of human adult and fetal expression and
identification of 535 housekeeping/maintenance genes.
Physiol Genomics 2:143–147.
Watson SJ, Akil H (1999): Gene chips and arrays revealed: A
primer on their power and their uses. Biol Psychiatry 45:533–
Welford SM, Gregg J, Chen E, Garrison D, Sorensen PH, Denny
CT, et al (1998): Detection of differentially expressed genes
in primary tumor tissues using representational differences
analysis coupled to microarray hybridization. Nucleic Acids
Res 26:3059–3065.
Whitaker TJ (1999, November): Novel methods for detection of
hybridization on DNA chips. Paper presented at 6th Annual
Chips to Hits, Berkeley, California.
White KP, Rifkin SA, Hurban P, Hogness DS (1999): Microarray
analysis of Drosophila development during metamorphosis.
Science 286:2179–2184.
Whitney LW, Becker KG, Tresser NJ, Caballero-Ramos CI,
Munson PJ, Prabhu VV, et al (1999): Analysis of gene
expression in multiple sclerosis lesions using cDNA microarrays. Ann Neurol 46:425–428.
Wilson M, DeRisi J, Kristensen HH, Imboden P, Rane S, Brown
PO, et al (1999): Exploring drug-induced alterations in gene
expression in Mycobacterium tuberculosis by microarray
hybridization. Proc Natl Acad Sci U S A 96:12833–12838.
Wolfsberg TG, Gabrielian AE, Campbell MJ, Cho RJ, Spouge
JL, Landsman D (1999): Candidate regulatory sequence
elements for cell cycle-dependent transcription in Saccharomyces cerevisiae. Genome Res 9:775–792.
Yguerabide J, Yguerabide EE (1998): Light-scattering submicroscopic particles as highly fluorescent analogs and their use as
tracer labels in clinical and biological applications. Anal
Biochem 262:137–156.
Yuh CH, Bolouri H, Davidson EH (1998): Genomic cis-regulatory logic: Experimental and computational analysis of a sea
urchin gene. Science 279:1896–1902.
Zhang MQ (1999): Promoter analysis of co-regulated genes in
the yeast genome. Comput Chem 23:233–250.
Zhou X, Gao X, LeProust E, Pellois JP, Yu P, Zhang H, et al
(1999): Light-directed, programmable microarray synthesis.
Nat Genet 23:84.