REVIEWS

The “Chip” as a Specific Genetic Tool

Stanley J. Watson, Fan Meng, Robert C. Thompson, and Huda Akil

DNA microarrays are powerful tools for analyzing the organization and regulation of the brain, in both illness and health. Such messenger RNA expression methods grew out of a marriage between the several genome sequencing projects and a wide variety of physical, chemical, optical, and electronic systems. The advantages of microarray analyses include the ability to study the regulation of multiple genes, or even the entire genome, in a single experiment. However, substantive issues associated with these tools need to be considered before drawing conclusions about the genomic regulation of the brain. The most obvious are the loss of most anatomic (i.e., cellular and circuit) specificity, only fair sensitivity, lack of absolute quantitative data, poor comparability between studies, and high variability in sample values. In this review we point to some of the solutions proposed for these problems and to novel techniques and approaches that may lead to newer methods. Among these are methods for making arrays more sensitive, including nonarray messenger RNA expression systems. The future of this field and its links to deeper protein and cell biology are both emphasized.

Biol Psychiatry 2000;48:1147–1156 © 2000 Society of Biological Psychiatry

Key Words: Microarray, gene expression, genomics, equipment, data analysis, brain

Current Status of Microarray Systems

Much of the progress in treating mental illness over the last half century has come from classic pharmacologic approaches. In the last quarter century the pace of knowledge acquisition in the neurobiology of the brain has increased radically as we have begun to see the nature of those early treatments in the light of neural circuits, cellular pathways, and gene regulation.
The last decade has placed us on the cusp of the first full version of the human genome, mainly from the Human Genome Project and, most recently, from several major industrial efforts. Finally, it has become apparent that the major mental illnesses are in fact complex genetic diseases involving many genes, and most likely many circuits in the brain. It is also quite likely that our classic treatments are not targeted at most of the genes actually responsible for these illnesses; rather, they are fortuitously discovered medications that act on symptoms and only indirectly modify the disease processes themselves. Thus, as a field we need tools to address this enormous complexity at the neural and genomic levels. In this review we describe one such approach that holds promise for increasing our understanding of the biology of these disorders, and thus our likelihood of designing better treatments: the use of DNA microarrays to study messenger RNA (mRNA) expression in the brain. It is likely to be one of the best tools for studying the human genome in its full depth in mental illness. Once refined, it may well offer significant clues not only about the initial gene defects that may lead to vulnerability to these disorders, but also about the response of the brain to these illnesses, which in turn could increase the likelihood of future relapse.

From the Department of Psychiatry and The Mental Health Research Institute, University of Michigan, Ann Arbor, Michigan. Address reprint requests to Stanley J. Watson, M.D., Ph.D., The University of Michigan, Mental Health Research Institute, 205 Zina Pitcher Place, Ann Arbor MI 48109-0720. Received March 10, 2000; revised September 28, 2000; accepted October 2, 2000.
Optimally, these insights into the molecular basis of the disease process will lead us to design therapies that can treat the disorders and prevent the “scarring” that results from the expression of the disease process. Here we hope to accomplish three goals: 1) provide a sense of the methods actually used to date and the types of data they generate, 2) describe the problems currently associated with microarrays, and 3) point to technical advances and novel strategies that may become applicable in the near future in this fast-paced academic and industrial arena.

For years researchers have attempted to develop strategies that address true biological complexity. One such strategy, the microarray, permits the analysis of gene expression patterns by asking relatively simple questions. For example, following a given stimulus, which RNAs (used as an index of gene activation/inactivation) are increased in one biological sample relative to another? A hypothesis behind many of these approaches is that RNAs more (or less) abundant in one biological sample than in another describe the phenotypic differences between the two samples. Such microarray strategies go beyond studies in which individual genes and their mRNA expression are evaluated one at a time. They enable the examination of thousands of genes simultaneously and the identification of patterns, or profiles, of gene expression that may be the hallmark of the specific phenotype under study.
0006-3223/00/$20.00 PII S0006-3223(00)01080-5

Early efforts to evaluate complex gene expression patterns were led by groups performing large-scale DNA sequencing/cloning, differential hybridization methodologies (e.g., subtractive hybridization [Ermolaeva and Sverdlov 1996]), polymerase chain reaction (PCR)-based methods (e.g., differential display PCR [Liang and Pardee 1992]), or serial analysis of gene expression (SAGE; Velculescu et al 1995). These methods have continued to be refined and have proven very productive. However, they are multistep processes, which makes them challenging for most laboratories to adopt. They are prone to false positives and can be expensive, tedious, and time consuming. Partly in response to these challenges, alternative gene expression analysis strategies have emerged. Some of these methods are solution based and often require one to predefine which genes are evaluated. One example of such a solution-based method is the TaqMan assay (Livak et al 1995), in which the expression of many genes can be quantified in a high-throughput fashion. To execute this assay, an investigator must predefine which genes are to be monitored and prepare materials specific to these genes (e.g., specific oligonucleotides for each gene to be assayed). Thus, although this assay is extraordinarily sensitive and quantitative, it is quite expensive (due to the required synthesis of gene-specific oligonucleotides) and is limited to the investigator-defined list of genes evaluated in a given biological sample.

By comparison, arrays (micro- or macro-) rely on a simple strategy whereby the genes of interest are deposited or synthesized on a matrix (glass, silicon, or nylon membrane). The genes applied to the matrix can be double-stranded or single-stranded DNA that is either synthesized or derived from tissue extracts (e.g., using PCR).
The experimenter knows exactly which gene (or gene fragment) is present at each location on the matrix. This matrix can then be used to compare the relative quantity of mRNA found in two biological samples (Watson and Akil 1999). In this method, the DNA first applied to the matrix represents the known probes. The material derived from the tissue represents the “target” of the assay; it contains an unknown mixture of complementary DNAs (cDNAs) that reflects the pattern of gene expression in that tissue at the time of extraction. The goal is to use the probes on the matrix to quantitate the specific genes expressed in the target material. Some array approaches (largely microarrays on glass or silicon) use fluorescent-tagged first-strand cDNA (derived from tissue RNA) to determine the relative amounts of gene expression in a biological sample. In these situations a two-color scheme is used whereby the control and experimental samples are each tagged with a specific fluor (e.g., control tagged with Cy3 [green] and experimental tagged with Cy5 [red]). These tagged cDNAs can then be mixed and allowed to hybridize (interact) with a glass slide-based matrix containing the genes (DNAs) of interest. The power of this approach arises from the ability to purchase or produce arrays with hundreds to thousands of genes (DNAs) on them. Because gene expression comparisons are based on quantitation with DNA probes of known identity (although not necessarily known function), array strategies make it possible to predefine which genes are evaluated in each experiment. This approach combines the advantages of broad-based strategies that examine general patterns of change in gene expression with the advantages of very targeted techniques that use specific probes for single genes.
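To make the two-color comparison concrete, here is a minimal sketch of how a per-spot expression ratio might be computed, assuming a local background value is available for each channel. The function name `log2_ratio` and the `min_signal` cutoff of 100 intensity units are hypothetical choices for illustration, not values taken from the studies cited here. Following the labeling example above, Cy5 carries the experimental sample and Cy3 the control.

```python
import math
from typing import Optional


def log2_ratio(cy5_fg: float, cy5_bg: float,
               cy3_fg: float, cy3_bg: float,
               min_signal: float = 100.0) -> Optional[float]:
    """Return log2(experimental/control) for one spot, or None if unreliable.

    cy5_* is the experimental channel and cy3_* the control channel.
    min_signal is a hypothetical cutoff on background-subtracted intensity.
    """
    experimental = cy5_fg - cy5_bg  # background-subtracted Cy5 (red) signal
    control = cy3_fg - cy3_bg       # background-subtracted Cy3 (green) signal
    # Ratios of signals close to background are dominated by noise, so flag them.
    if experimental < min_signal or control < min_signal:
        return None
    return math.log2(experimental / control)


# A spot whose experimental signal is twice the control signal.
print(log2_ratio(2200, 200, 1200, 200))  # 1.0
# A spot barely above background in the experimental channel.
print(log2_ratio(350, 300, 1200, 200))   # None
```

Log2 ratios make twofold increases and decreases symmetric (+1 and -1), which simplifies later comparisons across genes.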
Thus, array analyses permit the broad evaluation of the expression patterns of thousands of defined genes in single experiments. Multiple examples in the past several years have highlighted the advantages of broadly evaluating gene expression patterns. A number of investigators have used microarray approaches to investigate alterations in gene expression in several models, including yeast, T cells, tumor cell lines, and fibroblasts (Amundson et al 1999; DeRisi et al 1996; Heller et al 1997; Iyer et al 1999; Jelinsky and Samson 1999; Khan et al 1998, 1999; Lashkari et al 1997; Perou et al 1999; Schena et al 1995, 1996; Spellman et al 1998; Wang et al 1999a, 1999b; Welford et al 1998; Whitney et al 1999; Wilson et al 1999; Wolfsberg et al 1999). What these studies share is that they reveal the dramatic orchestration of cellular responses to given stimuli. A large number of genes appear to respond in a coordinated manner to particular signals, revealing previously unsuspected connections between cellular pathways and implicating novel genes in certain functions. In this way, investigators can gain insight into gene activation or inactivation processes that occur concurrently. The mechanistic relationship between the activation or inactivation of a given set of genes may suggest a coordinated response at the cellular or system level following specific stimuli. Using the temporal order of gene activation/inactivation, investigators can attempt to distinguish primary response genes from secondary response genes. Although such inferences still require rigorous testing, the functions and interdependency of gene expression changes can begin to be determined in this way. Microarray analysis of gene expression is now being applied to many additional models and systems.
These additional models and systems include Drosophila (development [White et al 1999]), skeletal muscle (aging/caloric restriction [Lee et al 1999]), isolated neurons (distinctive phenotypes/functions [Luo et al 1999]), and many others. Through this type of work, few fields of biology will remain unaffected by the emerging microarray-generated gene expression data.

Table 1. Microarray Equipment and Complementary DNA (cDNA) Clone Production Costs, and Laboratory Tasks

Equipment: approximate costs
  Arrayer: $45,000–125,000
  Scanner: $40,000–225,000
  Clone production
    Clones: $6 per clone
    Liquid handling (96-well pipettes, cherry picking): $50,000–150,000
    PCR cycler (96 or 384 well): $5,000–8,000 each
    Microtiter dish reader (Escherichia coli growth, DNA quantitation): $20,000–50,000
    Gel documentation (PCR reaction QC/DNA quantitation): $10,000–30,000
    Microtiter dish centrifuges and evaporators: $10,000–15,000
    Bar code printers/scanners: $10,000–15,000
    Freezers/refrigerators/sample racks: $10,000–20,000
  Microarray data analysis software: $0–50,000
  Computers and computer network/data base: $10,000–50,000

Laboratory tasks: staff skills
  Clone processing (Escherichia coli growth, PCR, PCR clean-up, gel characterization, DNA concentration): General laboratory
  Array fabrication (arraying): General laboratory
  Tissue processing (RNA extractions, RNA amplifications [if necessary]): General laboratory
  Generation of fluorescent-tagged cDNA and hybridization of chips: General laboratory
  Data collection and analysis: General laboratory/bioinformatics
  Data base management: Computer specialist

Includes approximate cost figures (U.S. dollars) for major equipment necessary for the production and utilization of microarrays. In addition, specific types of personnel required to perform various microarray-related laboratory tasks and related skills to accomplish these tasks are listed. PCR, polymerase chain reaction; QC, quality control.
Currently, investigators have two major choices in applying such microarray studies: commercially available microarray materials/services versus homegrown microarrays. The challenges to broad implementation of commercial microarrays relate largely to the costs of these prepared matrices (e.g., Affymetrix or Incyte). Microarrays or DNA chips can cost thousands of dollars each, and an entire experiment with many subjects and conditions can cost much more. Although these commercial products also offer access to the substantial experience base of the vendors, their costs remain largely beyond the reach of most academic research laboratories. However, the costs of developing the homegrown variety can in some cases be equally daunting. In addition, homegrown arrays require sufficient expertise to implement the technology once equipment and DNA sources have been obtained. The costs of the equipment to produce and analyze homegrown arrays vary tremendously. Table 1 describes some of the equipment necessary for these purposes, as well as approximate costs. Beyond the equipment, one needs to obtain the DNA to place onto these chips and/or membranes. Once again, options abound, ranging from cDNA clones (from commercial as well as private sources), PCR-amplified material from clones, mRNA, and/or genomic DNA to synthetic oligonucleotides. Depending on the number of genes evaluated in a given research study, the clones, PCR fragments, or oligonucleotides can also represent significant costs. Approximate costs for some of these materials and related processes are shown in Table 1. Beyond the cost of the materials themselves, one must be able to maintain, process, and reliably track clones, DNAs, experimental samples, and array data. Substantial effort is required to develop computer-based tracking capability and to maintain a data base structure flexible enough to keep pace with emerging strategies and new developments.
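As a rough worked example of the arithmetic behind Table 1, the sketch below sums the low and high equipment estimates and adds the per-clone cost for a clone set of a given size. It assumes one unit of each item (including a single PCR cycler) and ignores personnel, consumables, and maintenance. The 10,000-clone set in the example is a hypothetical size, not a figure from the text.

```python
# Approximate equipment costs from Table 1, in U.S. dollars (low, high).
# Assumption: one unit of each item, including a single PCR cycler.
EQUIPMENT_COSTS = {
    "Arrayer": (45_000, 125_000),
    "Scanner": (40_000, 225_000),
    "Liquid handling": (50_000, 150_000),
    "PCR cycler": (5_000, 8_000),
    "Microtiter dish reader": (20_000, 50_000),
    "Gel documentation": (10_000, 30_000),
    "Microtiter dish centrifuges and evaporators": (10_000, 15_000),
    "Bar code printers/scanners": (10_000, 15_000),
    "Freezers/refrigerators/sample racks": (10_000, 20_000),
    "Microarray data analysis software": (0, 50_000),
    "Computers and computer network/data base": (10_000, 50_000),
}
COST_PER_CLONE = 6  # Table 1: about $6 per clone


def setup_cost_range(n_clones: int) -> tuple:
    """Return (low, high) total cost for the equipment plus n_clones clones."""
    low = sum(lo for lo, _ in EQUIPMENT_COSTS.values())
    high = sum(hi for _, hi in EQUIPMENT_COSTS.values())
    clones = n_clones * COST_PER_CLONE
    return low + clones, high + clones


print(setup_cost_range(0))       # (210000, 738000)
print(setup_cost_range(10_000))  # (270000, 798000)
```

The equipment total is fixed, while the clone term grows linearly with the number of genes arrayed, which is why the choice of gene set matters for the budget.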
As it stands, many academic institutions around the country have developed or are developing microarray core facilities, thereby making this powerful technology available to many investigators. Beyond purchased or homegrown glass arrays, nylon membrane arrays (macroarrays) can be easily produced in most laboratories. Such macroarrays offer several advantages, including the improved DNA binding properties of this matrix, the ease and low cost of macroarray production, and the improved labeling procedures available for producing radioactive cDNA. Several commercial suppliers of such macroarrays also report that these products can be reused for several gene expression comparisons. Further, emerging evidence suggests that nylon arrays hybridized with radioactive cDNA provide improved sensitivity relative to glass slide arrays hybridized with fluorescent cDNA (Bertucci et al 1999). Disadvantages of nylon-based macroarrays include the inability to compare gene expression values within a single membrane, as comparisons require a minimum of two theoretically identical membranes (e.g., control vs. experimental), and the difficulty of generating high-density arrays containing tens of thousands of genes per unit area. Table 1 also describes, in general terms, some of the laboratory tasks to consider when developing a complete microarray laboratory. Beyond the general tasks listed, it is very important to recognize the vast quantities of quality control data generated during the array fabrication process. These data can describe Escherichia coli growth, PCR reactions, PCR cleanup, DNA concentration, and chip printing, and this information may be critical when one is analyzing array results.
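Because a macroarray comparison needs two separate membranes, the signals from each membrane usually have to be put on a common scale before a ratio is taken. The text does not prescribe a method. The sketch below uses one simple option, global (total-signal) normalization, which assumes that most genes do not change between the two samples. The gene names and intensities are hypothetical.

```python
from typing import Dict


def normalize_total(signals: Dict[str, float], scale: float = 1e6) -> Dict[str, float]:
    """Rescale one membrane's spot signals so that they sum to `scale`."""
    total = sum(signals.values())
    return {gene: value / total * scale for gene, value in signals.items()}


def membrane_ratios(control: Dict[str, float],
                    experimental: Dict[str, float]) -> Dict[str, float]:
    """Experimental/control ratio per gene after total-signal normalization."""
    c = normalize_total(control)
    e = normalize_total(experimental)
    # Only genes measured on both membranes (with nonzero control signal) are compared.
    return {gene: e[gene] / c[gene] for gene in c if gene in e and c[gene] > 0}


control = {"g1": 100.0, "g2": 300.0, "g3": 600.0}
experimental = {"g1": 400.0, "g2": 600.0, "g3": 1000.0}
print(membrane_ratios(control, experimental))
```

Here the experimental membrane has twice the total signal, so raw ratios would overstate every gene. After normalization, g1 appears about twofold higher, g2 unchanged, and g3 slightly lower.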
Lastly, even though they occupy only one line in Table 1, the value of skilled computer support personnel and networks in the microarray process should not be underappreciated.

Current Problems in Microarray-Based Expression Analysis

One of the major challenges specific to applying microarrays to the study of gene expression patterns in the central nervous system (CNS) will be to determine the consequences of evaluating gene expression in such a complex tissue. Many previous microarray studies have used comparatively less complex biological samples (e.g., tissue culture cells, yeast cells, tumor cells). Because the CNS is made up of very complex cellular phenotypes, it remains to be seen which procedures can be applied generically to the study of regional brain differences and of disease-specific gene expression changes within defined brain regions, and when single-cell gene expression profiling (e.g., single-cell PCR or laser dissection-based strategies) will be necessary. One microarray issue yet to be fully studied is the consequence of diluting RNAs of interest with RNA from surrounding neurons and glia. For example, particular neurons may be present in the brain in very limited numbers. Further, they may be distributed diffusely throughout a given brain region. General brain region dissection approaches would likely dilute the RNAs from such neurons with RNAs from millions of potentially unrelated neurons and glia. Thus, the cell or tissue isolation strategy will generally affect one's ability to detect specific RNAs, and hence overall sensitivity. If one wishes to study particular neurons in a single brain region, how will one isolate these cells and expect to detect the relevant mRNAs?
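The dilution problem can be illustrated with a short calculation. This is a sketch under simplifying assumptions not stated in the text: every cell contributes the same amount of mRNA, the array signal is linear in concentration, and only the target cells change expression. The 1% fraction and fourfold change are illustrative numbers.

```python
def bulk_fold_change(fraction_cells: float, fold_in_target: float,
                     expr_target: float = 1.0, expr_other: float = 1.0) -> float:
    """Fold change seen in a whole-region sample when only the target cells change.

    fraction_cells: share of dissected cells belonging to the target population.
    fold_in_target: true fold change of the transcript within those cells.
    expr_target, expr_other: per-cell expression before the change (arbitrary units).
    Assumes equal mRNA per cell and a linear signal.
    """
    other = (1.0 - fraction_cells) * expr_other  # unchanged contribution of other cells
    before = fraction_cells * expr_target + other
    after = fraction_cells * expr_target * fold_in_target + other
    return after / before


# Target neurons are 1% of the dissection and increase the transcript fourfold.
print(round(bulk_fold_change(0.01, 4.0), 3))                  # 1.03 (other cells also express it)
print(round(bulk_fold_change(0.01, 4.0, expr_other=0.0), 3))  # 4.0 (only target cells express it)
```

In the first case the surrounding cells compress a fourfold change to about 3%. In the second case the ratio survives, but under the same assumptions the transcript's concentration in the extracted RNA is only 1% of what a pure population would give. That lower concentration is the sensitivity problem described above.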
Will general brain region dissection strategies (e.g., hypothalamus) yield sufficient expression information for comparisons, or will more neuroanatomically refined methods such as single-cell PCR or laser dissection be required? Beyond the number of complex phenotypes in the CNS is the related issue of the sensitivity of microarrays that use fluorescent-tagged nucleic acids. One of many possible explanations for the improved sensitivity of nylon arrays is the greater efficiency of radioactive labeling procedures relative to fluorescent strategies. It remains to be seen what level of gene expression in a given brain region can be detected using these fluorescent detection approaches. Further, what will be the consequences of brain region analysis versus single-cell analysis? Which of these two approaches would be more advantageous for detecting weakly expressed genes in many cells and/or weakly expressed genes in a limited number of cells within a given brain region? These are significant issues being actively pursued in several research laboratories. Several alternatives or refinements are being developed. These include refinements in the synthesis of fluorescent-tagged nucleic acids (e.g., fluorescent nucleotide analogs with improved enzymatic properties or post-cDNA synthesis coupling of fluorescent tags), the development of fluors with improved fluorescent properties (e.g., stronger signals), and novel hybridization instrumentation that can discriminate radioactively tagged cDNA probes (³⁵S vs. ³²P, rather than Cy3 vs. Cy5). Beyond these improvements, novel developments in the design of high-throughput gene arrays are described below. In addition to the tissue complexity and sensitivity issues associated with brain samples, obtaining reliable data is also a major challenge for microarray-based expression analysis.
There are probably two main sources for the observed variability of microarray data: normal variation in gene expression between samples and noise introduced in the microarray assay process. Few systematic studies of normal gene expression variation exist, although data from in situ hybridization seem to suggest that normal variance for many tightly regulated tissue-specific genes can be within 20% to 30%. However, many genes in yeast show two- to fourfold random fluctuations (Cho et al 1998; Klevecz et al 1984). A recent article from Affymetrix (Santa Clara, CA) suggested that for most “housekeeping” genes in human tissues, differences of less than fourfold are probably not biologically significant (Warrington et al 2000), as these relatively abundant housekeeping genes are probably less tightly regulated. As a result, a significant portion of microarray data variability for high- or medium-abundance mRNAs may be due to their normal expression variation. For tightly regulated (mostly low-abundance) mRNA species, noise introduced at various stages of the microarray-based assay process may be the predominant factor. Because of the miniaturization and the number of genes involved in the assay, it is very difficult to maintain consistent processing conditions for each gene across multiple assays; thus, obtaining accurate absolute signals is unlikely. For radioactively labeled probe-based microarray assays, noise from slide heterogeneities, pin-to-pin variation, spotting volume fluctuation, and so on is described in great detail by Schuchhardt et al (2000). Although some of the systematic variation may be reduced by including various controls (Schuchhardt et al 2000), random fluctuations at various stages cannot be controlled and can accumulate quickly in a complicated assay.
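One way to see how random fluctuations accumulate is to treat each assay stage as an independent multiplicative noise factor with mean 1. For independent factors, the variance of the product is the product of (1 + CV²) terms minus 1. The sketch below assumes that model. The five stages and the 10% per-stage coefficient of variation (CV) are hypothetical, not measurements from the cited studies.

```python
import math
from typing import Iterable


def combined_cv(stage_cvs: Iterable[float]) -> float:
    """Coefficient of variation of a product of independent mean-1 noise factors.

    For independent X_i with E[X_i] = 1 and CV c_i,
    Var(prod X_i) = prod(1 + c_i**2) - 1; the combined CV is its square root.
    """
    product = 1.0
    for cv in stage_cvs:
        product *= 1.0 + cv ** 2
    return math.sqrt(product - 1.0)


# Hypothetical stages: dissection, RNA extraction, labeling, spotting, hybridization.
print(round(combined_cv([0.10] * 5), 3))  # 0.226
print(round(combined_cv([0.10]), 3))      # 0.1
```

Five modest 10% steps already combine into roughly 23% variability for a single measurement, before any biological variation is added.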
Furthermore, the complexity of brain tissue will also lead to significant variation in brain tissue dissection, further reducing the accuracy of microarray experiments. Two-color assays should produce more accurate results because variations in spot size and in the amount of cDNA probe on the chip should not change the signal ratio, as both signals are derived from the same spot. However, this is only true if signals are well above background in both detection channels. The signal level for most tightly regulated genes will be close to background, and fluctuations in spot size and in the amount of probe in a spot will still significantly change the signal ratio between the two samples. In addition, the background level on a slide can vary significantly from spot to spot due to factors such as unevenness in slide surface properties, dust contamination, and incomplete washing, leading to high signal variability for low-abundance mRNA species even in the two-color assay system. Unfortunately, despite the high variability of microarray data, most published studies using microarray-based expression analysis included only a very limited number of replicates, and many studies conducted the assay only once. Furthermore, many investigators use the arbitrary “twofold change” criterion to judge whether an observed gene expression change is significant. However, the twofold threshold is not statistically valid even for duplicate experiments (Claverie 1999), and it is critical to have enough replicate microarray assays to reach reliable conclusions (Lee et al 2000). In essence, microarray experiments should be held to the same statistical standards that apply to other biological experiments. It is also critical that results from microarray experiments be verified independently by other mRNA quantitation methods such as in situ hybridization, Northern blotting, RNase protection, TaqMan, or Invader assays.
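To show why a fixed twofold cutoff can mislead, the sketch below applies a one-sample t-test to replicate log2 ratios instead. This is one common statistical choice, not the specific procedure of Claverie (1999) or Lee et al (2000). It assumes independent replicates and roughly normal log ratios. The replicate values are hypothetical. The critical value 3.182 is the two-sided 5% point of Student's t with 3 degrees of freedom, which matches four replicates. Screening thousands of genes would additionally require a multiple-testing correction.

```python
import math
from statistics import mean, stdev
from typing import List

T_CRIT_DF3 = 3.182  # two-sided 5% critical value of Student's t, 3 degrees of freedom


def one_sample_t(log2_ratios: List[float]) -> float:
    """t statistic for the hypothesis that the mean log2 ratio is 0."""
    n = len(log2_ratios)
    standard_error = stdev(log2_ratios) / math.sqrt(n)
    return mean(log2_ratios) / standard_error


replicates = {
    # hypothetical log2(experimental/control) values from four arrays
    "geneA": [1.8, -0.2, 2.4, 0.4],     # mean above 1 (twofold) but inconsistent
    "geneB": [0.62, 0.58, 0.60, 0.60],  # about 1.5-fold, very consistent
}
for gene, values in replicates.items():
    t = one_sample_t(values)
    passes_twofold = abs(mean(values)) >= 1.0
    significant = abs(t) > T_CRIT_DF3
    print(gene, round(mean(values), 2), round(t, 1), passes_twofold, significant)
```

geneA passes the twofold rule but its t statistic is well below 3.182. geneB fails the twofold rule yet is highly consistent across replicates. The two criteria therefore rank these genes in opposite ways.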
The high variability of microarray data also means that subtle changes in experimental conditions may significantly alter the results, making it very difficult to compare data between laboratories. The lack of standard controls, the predominant use of relative signals (ratios), and the adoption of incompatible data formats also contribute to poor comparability between studies. Some of these issues, such as the use of standard controls and the design of compatible data formats, have been discussed at several recent meetings aimed at improving data compatibility.

New Directions in Microarray Technology

Microarray technology has undergone extensive development in array format, detection, and printing methods in recent years. In addition to flat-surface glass or silicon chips, supports such as microscopic beads, nanochannel glass, 96-well microtiter plates, microelectrode arrays, and phototransistor arrays are also used for depositing nucleic acid material. One of the most promising approaches is the microscopic bead-based array, as it offers high sensitivity, flexibility, and many replicates in one assay (Walt 2000). In contrast to flat-surface chips, bead-based approaches do not use spatial location as the key to oligonucleotide probe identity. Different oligonucleotides are covalently attached to beads coded by unique fluorescent dye combinations. Fluorescent-labeled nucleic acid samples are then used to interrogate a mixture of dye-coded beads. Two approaches are then available for reading out the results. BeadArray (Illumina, San Diego) assembles bead arrays by sedimentation onto an optical fiber substrate containing 5000–50,000 individual fibers. Each fiber contains a well that accommodates one bead. The identity of the beads and the hybridization results can be easily decoded by analyzing images recorded at several excitation-emission wavelength combinations.
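The key idea of dye-coded beads is that a bead's color code, not its position, identifies the probe it carries. The sketch below illustrates only that principle. The number of dyes, the number of intensity levels, the level spacing, and the probe names are all hypothetical. Real systems calibrate their dye measurements and do not rely on this simple rounding.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

Code = Tuple[int, ...]


def build_codebook(n_dyes: int, levels: int) -> List[Code]:
    """All dye-intensity combinations except the all-zero (undyed) code."""
    return [code for code in product(range(levels), repeat=n_dyes) if any(code)]


def decode_bead(measured: Sequence[float], level_step: float) -> Code:
    """Snap each dye's measured intensity to the nearest intensity level."""
    return tuple(round(value / level_step) for value in measured)


codebook = build_codebook(n_dyes=2, levels=10)
print(len(codebook))  # 99 distinguishable bead types

# Hypothetical assignment of oligonucleotide probes to the first few codes.
probe_for_code: Dict[Code, str] = {code: f"probe_{i}" for i, code in enumerate(codebook[:3])}
bead_code = decode_bead([0.1, 2.9], level_step=1.0)
print(bead_code, probe_for_code.get(bead_code))  # (0, 3) probe_2
```

Once each bead's code is read, its hybridization signal can be attributed to the right probe, however the beads happen to be arranged.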
The second approach, the Suspension Arrays developed by Luminex (Austin, TX), instead uses a microsphere-based flow cytometric assay in which beads are read one by one by laser beams. Up to 20,000 beads can be read in 1 sec. Another interesting development is the use of nanochannel glass slides for array printing. Nanochannel glass materials are unique glass structures containing a regular geometric array of parallel holes or channels as small as 33 nm or as large as several micrometers in diameter (Beattie 1998). As a result, the surface area of nanochannel glass is much greater than that of regular glass, enabling larger amounts of DNA material to be deposited in each spot. Hybridization kinetics are also greatly improved by the “flow-through” property of the chip. Furthermore, the wave-guide effect of the nanochannels enables fluorescent signals inside the nanochannels to be detected, particularly if the scanner has good depth of field. Gene Logic (Gaithersburg, MD) reported that on a CCD camera-based scanner its nanochannel glass-based “Flow-Thru” chip increases hybridization signals by up to 44-fold. Although the predominant method for microarray signal detection is still based on fluorescence, many other new methods also show promise. Radioactive probes have the advantages of high incorporation efficiency, high sensitivity, and low cost. They had not been used in high-density microarrays because of the lack of high-resolution imaging methods. However, the new Micro Imagers manufactured by Biospace (Paris) have spatial resolutions of 15 to 20 μm for radioactive probes, suitable for detecting signals from microarrays with dot sizes around 150 μm (BIOSPACE 1999). Ratio analysis between two samples on the same chip is possible because these Micro Imagers can also separate the ³H signal from those produced by ¹⁴C, ³⁵S, or ³³P.
Detection methods based on oxidation-reduction reactions (CMS 1999), resonance light scattering (Yguerabide and Yguerabide 1998), capacitance change after hybridization, resonance ionization mass spectrometry (Whitaker 1999), and nanoparticle-promoted silver staining (Taton et al 2000) have also been reported, and some of them have already found niches in low-density clinical genotyping chips. Probably the most significant advance in array making is the use of programmable digital light processors to direct in situ oligonucleotide synthesis. As in the Affymetrix approach (cf. Watson and Akil 1999), light is used to direct the addition of specific nucleotides at defined locations on a chip. However, the costly masks used in the Affymetrix method are replaced by a light projector based on the digital light processor, which projects computer-generated mask images onto the chip surface. These methods have the potential to greatly reduce the cost and increase the flexibility of high-density oligonucleotide chips (Garner 2000; Singh-Gasson et al 1999; Zhou et al 1999). Another increasingly important chip-making method is the adoption of inkjet printing technology (Blanchard et al 1996). When used for oligonucleotide deposition, it can work with many different chip surfaces because of its noncontact nature. It can also be used for in situ oligonucleotide synthesis, greatly increasing the flexibility and reducing the cost of oligonucleotide-based microarrays.

Data Analysis Issues

Experiments using microarray technology generate vast amounts of data, and methods for managing and analyzing these data are under intensive investigation. The data management and analysis problems for microarrays can be divided roughly into four stages: data collection, data storage, image analysis, and knowledge discovery.
The data collection stage records the biological properties of the samples, the sample-tracking file created during the chip-making process, and the image file generated by the scanner, along with data on other experimental conditions and procedures collected by laboratory information management software. Ideally, the controls and standards included in the experimental process should allow downstream data analysis programs to compare experimental results across different sources. During the data storage stage, these data, as well as data from downstream analysis, should be stored in a format and location that allow easy comparison and analysis among different groups. The image analysis stage converts the image file produced by microarray scanners or phosphoimagers into numerical signal intensity values that can be used for knowledge discovery. Usually, the sample-tracking file is also merged with the processed image data to assign a gene identity to each data point. In the knowledge discovery stage, the goal is to extract from the massive microarray data the significant information on changes in individual genes and on alterations in patterns and relationships among genes. For expression arrays, the most frequently asked questions concern the identification of up- and downregulated genes, patterns of gene expression, the likely functional roles of unknown genes, and correlations among different genes and experimental conditions. Tools for these problems already exist at different stages of development. Ideally, one should also be able to compare experimental results from different platforms and systems, to integrate knowledge from the literature or clinical data bases, and to update data mining results automatically as new information appears in various data bases. Generally speaking, data collection and image analysis are the relatively mature parts of the microarray data analysis problem.
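The merge of the sample-tracking file with image-analysis output amounts to a join on spot coordinates. Here is a minimal sketch, assuming both files key each spot by a hypothetical (print block, row, column) tuple and that the tracking file maps spots to clone identifiers. Real tracking formats vary between laboratories and software packages.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Spot = Tuple[int, int, int]  # hypothetical (print block, row, column) on the slide


def merge_tracking(intensities: Dict[Spot, float],
                   tracking: Dict[Spot, str]) -> Dict[str, List[float]]:
    """Attach gene identities to image-analysis output.

    Replicate spots of the same gene are collected into one list; spots that
    are missing from the tracking file are left out rather than guessed.
    """
    by_gene: Dict[str, List[float]] = defaultdict(list)
    for spot, value in sorted(intensities.items()):
        gene = tracking.get(spot)
        if gene is not None:
            by_gene[gene].append(value)
    return dict(by_gene)


tracking = {(1, 1, 1): "clone_0001", (1, 1, 2): "clone_0002", (1, 2, 1): "clone_0001"}
intensities = {(1, 1, 1): 1520.0, (1, 1, 2): 310.0, (1, 2, 1): 1480.0, (1, 2, 2): 95.0}
print(merge_tracking(intensities, tracking))
# {'clone_0001': [1520.0, 1480.0], 'clone_0002': [310.0]}
```

Keeping replicate spots together at this step makes it straightforward to apply the replicate-based statistics that the variability discussion calls for.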
Many commercial and free software packages already handle these issues reasonably well. Our discussion below will focus largely on data storage and knowledge discovery. Currently, there are dozens of proprietary microarray data base schemes in different laboratories, in addition to three major public microarray data repositories: the Gene Expression Omnibus at the National Center for Biotechnology Information (NCBI), the GeneX data base at the National Center for Genome Resources (NCGR), and the ArrayExpress data base at the European Bioinformatics Institute (EBI). Few of them can “talk to” each other, and it is currently impossible to conduct data analysis across different data bases. Adopting an identical data structure and sample description language for microarray experiments is critical for making data from different sources comparable. Equally important is the use of common standards for data normalization, quality control, and cross-platform comparison. An international conference at EBI has already been devoted to these topics. A series of recommendations has been published, and five working groups were set up to develop 1) standards in experiment description and data representation; 2) a microarray data extensible markup language (XML) exchange format; 3) ontologies for sample description; 4) normalization, quality control, and cross-platform comparison; and 5) data-query language and data-mining approaches (MGED 2000). The use of XML for the formal description and annotation of microarray data and experiments should greatly facilitate data exchange in the future. Currently there are at least two XML formats, developed by NCBI (NCBI 1999) and by NCGR-EBI (NCGR 2000). Hopefully a unified standard for data representation, description, and exchange will emerge once the differences among these major public data bases are settled.
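To illustrate why a shared XML format eases exchange, the sketch below serializes one hybridization (a sample description plus gene-level signals) and parses it back with Python's standard xml.etree.ElementTree module. The element and attribute names (experiment, sample, attribute, data, spot), the record identifier, and the values are hypothetical: they do not follow GEML, the NCBI format, or any MGED recommendation. The only point is that any program that knows the agreed schema can recover exactly the same record.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple


def to_xml(experiment_id: str, sample: Dict[str, str],
           values: Dict[str, float]) -> str:
    """Serialize one hybridization using a hypothetical, illustrative schema."""
    root = ET.Element("experiment", id=experiment_id)
    desc = ET.SubElement(root, "sample")
    for name, value in sample.items():
        ET.SubElement(desc, "attribute", name=name).text = value
    data = ET.SubElement(root, "data")
    for gene, signal in values.items():
        # repr() keeps enough digits for the float to round-trip exactly
        ET.SubElement(data, "spot", gene=gene).text = repr(signal)
    return ET.tostring(root, encoding="unicode")


def from_xml(text: str) -> Tuple[str, Dict[str, str], Dict[str, float]]:
    """Parse a record produced by to_xml back into Python objects."""
    root = ET.fromstring(text)
    sample = {a.get("name"): a.text for a in root.find("sample")}
    values = {s.get("gene"): float(s.text) for s in root.find("data")}
    return root.get("id"), sample, values


record = to_xml("hyb-001", {"tissue": "hippocampus", "condition": "control"},
                {"GENE_X": 1532.5, "GENE_Y": 87.25})
print(record)
print(from_xml(record))
```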
Discovering new knowledge from microarray data can be pursued at several different levels. At the simplest level, up- and downregulated genes can be easily identified from microarray experiments. Indeed, this may be the question most frequently asked when biologists conduct experiments using microarray technology. Almost all microarray data analysis software packages have this capability, and the major difference among packages lies in how they visualize the results. One can also use desktop spreadsheet or data base programs to implement such analyses and generate graphic reports easily. However, although simple fold-change analysis is useful in many situations, it barely touches the rich information embedded in microarray data. Because microarray experiments determine the expression levels of thousands of genes, it would be useful to find higher order relationships or hidden patterns among these genes. Clustering together genes that exhibit similar expression patterns across multiple experiments is one way of revealing such relationships. Such an analysis could help in understanding the regulatory mechanisms underlying changes in expression levels. One may also get some idea of the functional role of unknown genes from the known genes in the same cluster. Since the first cluster analysis of microarray data by Eisen et al (1998), clustering has become a routine method for grouping coregulated and/or functionally similar genes based on microarray data. Currently, many variations of clustering algorithms are used in microarray data analysis. The most popular, hierarchical clustering, makes no assumptions about the biological properties of the genes involved, and it has been shown to be a good tool for discovering genes with similar functions and inferring the functional roles of unknown genes in both yeast and mammalian cells.
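Both levels of analysis described above can be sketched in plain Python. The first two functions perform a simple fold-change screen on log2 ratios; the 2-fold cutoff is an illustrative convention, not a recommendation of this review, and no replicate-based statistics are used. The clustering function is a minimal agglomerative (hierarchical) method that uses average linkage with 1 − r (r being the Pearson correlation) as the distance between gene profiles. It is similar in spirit to the approach of Eisen et al (1998) but is not a reproduction of their software; gene names and values are made up.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Profile = List[float]


def log2_ratios(test: Dict[str, float], ref: Dict[str, float]) -> Dict[str, float]:
    """log2(test/ref) per gene; assumes positive, background-corrected signals."""
    return {g: math.log2(test[g] / ref[g]) for g in test if g in ref}


def regulated(ratios: Dict[str, float],
              min_fold: float = 2.0) -> Tuple[List[str], List[str]]:
    """Split genes into up- and downregulated lists at an illustrative cutoff."""
    cut = math.log2(min_fold)
    up = sorted(g for g, r in ratios.items() if r >= cut)
    down = sorted(g for g, r in ratios.items() if r <= -cut)
    return up, down


def pearson(a: Profile, b: Profile) -> float:
    """Pearson correlation; raises ZeroDivisionError for a constant profile."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)


def average_linkage(profiles: Dict[str, Profile]) -> List[Tuple[tuple, tuple, float]]:
    """Agglomerative clustering with distance 1 - r and average linkage.

    Returns the merge history as (cluster_a, cluster_b, distance) per step.
    """
    genes = list(profiles)
    dist = {(g, h): 1.0 - pearson(profiles[g], profiles[h])
            for g in genes for h in genes}
    clusters = [(g,) for g in genes]
    merges = []
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            a, b = clusters[i], clusters[j]
            # average linkage: mean of all gene-to-gene distances between clusters
            d = sum(dist[g, h] for g in a for h in b) / (len(a) * len(b))
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merges[-1][0] + merges[-1][1])
    return merges


print(regulated(log2_ratios({"G1": 400.0, "G2": 50.0, "G3": 110.0},
                            {"G1": 100.0, "G2": 100.0, "G3": 100.0})))
# (['G1'], ['G2'])
```

The merge history records which genes or clusters were joined at each step and at what distance, which is the information a dendrogram displays.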
However, this method is not statistically very robust, and its results can be strongly influenced by outliers. In addition, gene expression patterns are not inherently hierarchical. Another clustering method, which has been used successfully in the analysis of hematopoietic differentiation, is the self-organizing map, which allows the experimenter to impose partial structure on the data and to test different hypotheses. The self-organizing map method has also been reported to be more robust and accurate (Toronen et al 1999). Yet another way to group genes by expression is the support vector machine method (Brown et al 2000), a supervised learning technique that uses training gene sets, assembled from current knowledge, to specify in advance which genes should cluster together and which genes should not be assigned to a given functional class. This method is reported to be more accurate in identifying genes with common functions based on microarray expression data. It can also identify outliers and is more robust for large data sets. A systematic comparison is urgently needed of the various clustering algorithms, including K-means (a type of partitional clustering in which the number of clusters, K, is defined before the calculation) and Bayesian clustering methods (which allow expertise and prior knowledge to be factored into the computation), and of the different ways of calculating the similarity matrix between genes for expression data analysis. Principal component analysis is a way to reduce a large data set to a smaller, more meaningful set of variables. Genes that are coexpressed with one another, and are also largely independent of other subsets of expression patterns, are combined into factors. These factors are thought to represent the underlying biological processes that created the correlations among genes. For example, genes that share common promoter elements may be largely coregulated in some situations.
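A minimal sketch of the principal component idea follows, under stated assumptions: genes are treated as variables and samples as observations (one common convention among several), the gene-by-gene covariance matrix is formed, and the first principal component is found by power iteration. Only the leading factor is computed; further components would need deflation or a full eigendecomposition. The function names and toy data are illustrative. Genes with large loadings of the same sign on the component vary together across samples, which is the sense in which they are "combined into a factor."

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def gene_covariance(expr: Matrix) -> Matrix:
    """Sample covariance between genes; rows are genes, columns are samples."""
    n = len(expr[0])
    centered = []
    for row in expr:
        mean = sum(row) / n
        centered.append([x - mean for x in row])
    return [[sum(a * b for a, b in zip(ri, rj)) / (n - 1) for rj in centered]
            for ri in centered]


def leading_component(cov: Matrix, iters: int = 200,
                      seed: int = 0) -> Tuple[float, List[float]]:
    """First principal component by power iteration.

    Returns (variance along the component, unit-length gene loadings).
    Assumes the largest eigenvalue is strictly larger than the others.
    """
    rng = random.Random(seed)
    v = [rng.random() for _ in cov]  # random positive starting vector
    for _ in range(iters):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]  # w = cov * v
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v' * cov * v (v has unit length)
    eigenvalue = sum(vi * sum(c * x for c, x in zip(row, v))
                     for vi, row in zip(v, cov))
    return eigenvalue, v


expr = [[1, 2, 3, 4],   # gene A
        [2, 4, 6, 8],   # gene B, coexpressed with A
        [5, 5, 5, 5]]   # gene C, flat across samples
var, loadings = leading_component(gene_covariance(expr))
print(round(var, 3), [round(x, 3) for x in loadings])  # 8.333 [0.447, 0.894, 0.0]
```

Here genes A and B load on the same factor while the flat gene C does not; the component scores could then be fed into other analyses such as clustering.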
The identified principal components can also be used in other analyses, such as clustering. Classification is another way of analyzing microarray data. It uses microarray data sets from multiple samples in known categories, such as cancer and normal tissues, to extract unique and reliable expression patterns, or “predictors,” for samples in a particular category. The patterns discovered in such analyses can then be used to predict the properties of unknown samples. They will also help in understanding the molecular mechanisms underlying the differences among the tested samples (Golub et al 1999). Uncovering the possible connections between gene expression levels and genomic structure will be the next level of challenge. Finding the relationship between promoter structure and the expression pattern of the corresponding gene will be among the first steps toward understanding the gene regulation process. Computational methods for finding consensus promoter elements in the promoter regions of coregulated genes have already been reported (van Helden et al 1998; Zhang 1999). These were tested in yeast, but the human and mouse genomes may quickly become amenable to such analyses, given the rapid progress of the genome projects. There are also efforts devoted to deducing the intracellular regulatory circuits of gene transcription from microarray data (Kyoto University, Institute for Chemical Research 2000; McAdams and Arkin 1998; Yuh et al 1998). Although such analyses will be extremely difficult to conduct in multicellular organisms, knowledge gained from unicellular organisms will certainly help in understanding basic transcriptional programming in higher organisms. Analyzing the relationships between gene expression data and protein expression data and/or single nucleotide polymorphism analysis results may also help to elucidate various regulatory mechanisms and interactions in biological processes.
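The classification idea can be sketched as a simplified predictor in the spirit of the weighted-voting scheme of Golub et al (1999). Each informative gene g casts a vote a_g (x_g − b_g), where a_g = (μ1 − μ2)/(σ1 + σ2) is a signal-to-noise statistic computed from the two training classes and b_g is the midpoint of the two class means; a positive sum of votes favors class 1. This is an illustration under loud assumptions, not a reproduction of their method: genes are simply ranked by |a_g| (their procedure balanced genes favoring each class), there is no prediction-strength threshold, no normalization, and no cross-validation, and the gene names and data are toy values.

```python
import statistics
from typing import Dict, List, Tuple

Sample = Dict[str, float]               # gene id -> expression value
Model = Dict[str, Tuple[float, float]]  # gene id -> (a_g, b_g)


def signal_to_noise(x1: List[float], x2: List[float]) -> float:
    """(mu1 - mu2) / (sigma1 + sigma2); needs >= 2 samples per class and nonzero spread."""
    return ((statistics.mean(x1) - statistics.mean(x2))
            / (statistics.stdev(x1) + statistics.stdev(x2)))


def train(class1: List[Sample], class2: List[Sample], n_genes: int) -> Model:
    params: Model = {}
    for g in class1[0]:
        x1 = [s[g] for s in class1]
        x2 = [s[g] for s in class2]
        a = signal_to_noise(x1, x2)
        b = (statistics.mean(x1) + statistics.mean(x2)) / 2  # midpoint of class means
        params[g] = (a, b)
    # Simplification: keep the n_genes genes with the largest |a_g|.
    top = sorted(params, key=lambda g: abs(params[g][0]), reverse=True)[:n_genes]
    return {g: params[g] for g in top}


def predict(model: Model, sample: Sample) -> Tuple[int, float]:
    """Sum the weighted votes a_g * (x_g - b_g); a positive total favors class 1."""
    total = sum(a * (sample[g] - b) for g, (a, b) in model.items())
    return (1 if total > 0 else 2), total


class1 = [{"G1": 10, "G2": 1, "G3": 5}, {"G1": 12, "G2": 2, "G3": 6},
          {"G1": 11, "G2": 1.5, "G3": 4}]
class2 = [{"G1": 2, "G2": 9, "G3": 5}, {"G1": 3, "G2": 10, "G3": 6},
          {"G1": 1, "G2": 11, "G3": 4}]
model = train(class1, class2, n_genes=2)
print(predict(model, {"G1": 11, "G2": 2, "G3": 5})[0])  # 1
```

With real data one would also need to validate such a predictor on samples not used for training before trusting its calls on unknown samples.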
A bigger challenge will be the discovery of hidden patterns in, or associations between, gene expression data and data in existing knowledge data bases such as the current scientific literature and medical data bases. Although there are extensive efforts to mine literature data bases, it will be a daunting task to implement such analyses for new microarray data on a regular basis. Enormous computing power, extremely fast Internet connections, and data bases that can communicate with each other will certainly be required. When that goal is achieved, it is conceivable that computers, rather than scientists, will be the main contributors of new ideas.
Distant Future Perspectives
The current status of the “genomics revolution” might be compared to the first day that Columbus landed in the New World. He could see that there was indeed land there, but he had no idea of the size of the Americas or of the impact these continents would eventually have on the Old World he knew. The seismic change in the world of biology is likewise at its earliest stage, with an enormous future and real complexities yet to come. We are just now beginning to see the first elements of the monumental biological puzzle: the actual sequences of the entire genomes of several species, including man. Within a very few years, not only will all of these genomes be fully sequenced, but the motifs and patterns within them will also lay out the enormous similarities and differences of life. On the way to such deep understanding we will have to pass through several stages. These include the first stage, actual sequencing, and now the second stage, in which we attempt to use the full-scale sequence data to begin to see how these thousands of genes actually respond to the entire range of life events, and how they in turn control adaptation to internal and external demands throughout the life cycle.
We will also begin to appreciate the subtle choices made by nature with respect to which genes and which forms of those genes are normally found in specific tissues and under what conditions they are regulated. It then becomes possible to define the actual functions of the unique gene products, to elucidate how they interact with each other, and to describe the ways in which they are affected by the environment. These last steps are, in a sense, the real goal of biology and the key to uncovering the causes of all human illnesses, including mental ones. It is clear that the brain is the ultimate genetic system, as it is the organ that expresses the highest proportion of the genome. But it has used its genetic complexity to elaborate a level of higher order organization involving neuronal circuits that mediate much more complex functions that can vary from moment to moment and that encompass the broadest range of biological controls, from breathing to cognition. In addition, these circuits are highly dynamic, and the brain is, in essence, a learning machine. It alters its circuits and their functioning throughout the course of life, and in response to changing demands. Thus, its ultimate physical structure, its pattern of gene expression, and the parameters of its function are all vastly different from the parameters set early in life. There is much to be learned from studying the individual genes, the specific circuits, and the unique behavioral responses triggered by specific conditions. But there is little question that much of the information processing in the brain resides in patterns of activity (be it electrophysiologic, transcriptional/translational, or secretory activity) and that we have not yet begun to fathom these patterns and their coordination. The use of genomics to study one level of organization and patterning is likely to prove extremely useful, though it needs to be incorporated with other approaches (Akil and Watson 2000). 
Our task as molecular neuroscientists and biological psychiatrists is to take on this enormously daunting problem of understanding brain function and dysfunction, while bearing in mind that the brain, because of its circuitry and its plasticity, is a great deal more than patterns of gene expression. As genetic and cellular studies mature in specific technical ways, we should see much more precise and sensitive studies being carried out on the brain in normal and abnormal states. However, much effort will be needed not only to uncover static patterns of gene expression that appear to be associated with a disorder, but also to discover the dynamics of these patterns and how they contribute to the appearance of an illness and its course. Just now the outlines of this enormous undertaking are beginning to take shape. The diagnostic and therapeutic options we will begin to see in the next few years will likely grow in large part from these efforts and, for the first time, offer both real insights into the human brain and better tools for helping those suffering from brain-related illnesses.
This study was supported in part by National Institute of Mental Health (NIMH) Grant No. 5 PO1 MH422521 (SJW, HA), NIMH Conte Grant No. L99-MH60398-2 (SJW, HA), National Institute on Drug Abuse Grants Nos. 5 RO1 DA8920 and 5 RO1 DA02265 (SJW, FM, HA), grants from the Nancy Pritzker Depression Research Network (SJW, RCT, HA), and National Institute of Diabetes and Digestive and Kidney Diseases Grant No. RO1-DK54232 (RCT). Aspects of this work were presented at the conference “Genetics and Brain Function: Implications for the Treatment of Anxiety,” March 22–23, 2000, Washington, DC. The conference was jointly sponsored by the Anxiety Disorders Association of America (ADAA), the ADAA Scientific Advisory Board, and the National Institute of Mental Health.
References
Akil H, Watson SJ (2000): Science and the future of psychiatry. Arch Gen Psychiatry 57:86–87.
Amundson SA, Bittner M, Chen Y, Trent J, Meltzer P, Fornace AJ Jr (1999): Fluorescent cDNA microarray hybridization reveals complexity and heterogeneity of cellular genotoxic stress responses. Oncogene 18:3666–3672.
Beattie KL, inventor; Houston Advanced Research Center, assignee (1998, December 1): Microfabricated, flow-through porous apparatus for discrete detection of binding reactions. U.S. Patent 5,843,767.
Bertucci F, Bernard K, Loriod B, Chang YC, Granjeaud S, Birnbaum D, et al (1999): Sensitivity issues in DNA array-based expression measurements and performance of nylon microarrays for small samples. Hum Mol Genet 8:1715–1722.
BIOSPACE. Micro Imager. Available at: http://www.biospace.fr/Versionfr/microimager/tech_spec.html. Accessed November 28, 2000.
Blanchard AP, Kaiser RJ, Hood LE (1996): High-density oligonucleotide arrays. Biosens Bioelectron 11:687–690.
Brown MP, Grundy WN, Lin D, Cristianini N, Sugnet CW, Furey TS, et al (2000): Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc Natl Acad Sci U S A 97:262–267.
Cho RJ, Campbell MJ, Winzeler EA, Steinmetz L, Conway A, Wodicka L, et al (1998): A genome-wide transcriptional analysis of the mitotic cell cycle. Mol Cell 2:65–73.
Claverie JM (1999): Computational methods for the identification of differential and coordinated gene expression. Hum Mol Genet 8:1821–1832.
CMS. Technology: How the system works. Available at: http://www.microsensor.com/TechnologySystem.html. Accessed November 28, 2000.
DeRisi J, Penland L, Brown PO, Bittner ML, Meltzer PS, Ray M, et al (1996): Use of a cDNA microarray to analyse gene expression patterns in human cancer. Nat Genet 14:457–460.
Eisen MB, Spellman PT, Brown PO, Botstein D (1998): Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A 95:14863–14868.
Ermolaeva OD, Sverdlov ED (1996): Subtractive hybridization, a technique for extraction of DNA sequences distinguishing two closely related genomes: Critical analysis. Genet Anal 13:49–58.
Garner HR. Digital Optical Chemistry. Available at: http://pompous.swmed.edu/. Accessed November 1, 2000.
Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, et al (1999): Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science 286:531–537.
Heller RA, Schena M, Chai A, Shalon D, Bedilion T, Gilmore J, et al (1997): Discovery and analysis of inflammatory disease-related genes using cDNA microarrays. Proc Natl Acad Sci U S A 94:2150–2155.
Iyer VR, Eisen MB, Ross DT, Schuler G, Moore T, Lee JCF, et al (1999): The transcriptional program in the response of human fibroblasts to serum. Science 283:83–87.
Jelinsky SA, Samson LD (1999): Global response of Saccharomyces cerevisiae to an alkylating agent. Proc Natl Acad Sci U S A 96:1486–1491.
Khan J, Bittner ML, Saal LH, Teichmann U, Azorsa DO, Gooden GC, et al (1999): cDNA microarrays detect activation of a myogenic transcription program by the PAX3-FKHR fusion oncogene. Proc Natl Acad Sci U S A 96:13264–13269.
Khan J, Simon R, Bittner M, Chen Y, Leighton SB, Pohida T, et al (1998): Gene expression profiling of alveolar rhabdomyosarcoma with cDNA microarrays. Cancer Res 58:5009–5013.
Klevecz RR, Kauffman SA, Shymko RM (1984): Cellular clocks and oscillators. Int Rev Cytol 86:97–128.
Kyoto University, Institute for Chemical Research. KEGG expression map. Available at: http://www.genome.ad.jp/kegg/kegg2.html. Accessed November 1, 2000.
Lashkari DA, DeRisi JL, McCusker JH, Namath AF, Gentile C, Hwang SY, et al (1997): Yeast microarrays for genome wide parallel genetic and gene expression analysis. Proc Natl Acad Sci U S A 94:13057–13062.
Lee C-K, Klopp RG, Weindruch R, Prolla TA (1999): Gene expression profile of aging and its retardation by caloric restriction. Science 285:1390–1393.
Lee ML, Kuo FC, Whitmore GA, Sklar J (2000): Importance of replication in microarray gene expression studies: Statistical methods and evidence from repetitive cDNA hybridizations. Proc Natl Acad Sci U S A 97:9834–9839.
Liang P, Pardee AB (1992): Differential display of eukaryotic messenger RNA by means of the polymerase chain reaction. Science 257:967–971.
Livak KJ, Marmaro J, Todd JA (1995): Towards fully automated genome-wide polymorphism screening. Nat Genet 9:341–342.
Luo L, Salunga RC, Guo H, Bittner A, Joy KC, Galindo JE, et al (1999): Gene expression profiles of laser-captured adjacent neuronal subtypes. Nat Med 5:117–122.
McAdams HH, Arkin A (1998): Simulation of prokaryotic genetic circuits. Annu Rev Biophys Biomol Struct 27:199–224.
MGED. Microarray Gene Expression Database group. Available at: http://www.ebi.ac.uk/microarray/MGED/index.html. Accessed November 1, 2000.
NCBI. Gene expression omnibus. Available at: http://www.ncbi.nlm.nih.gov/geo. Accessed November 28, 2000.
NCGR. The Gene Expression Markup Language (GEML). Available at: http://www.ncgr.org/research/genex/geml.html. Accessed November 1, 2000.
Perou CM, Jeffrey SS, van de Rijn M, Rees CA, Eisen MB, Ross DT, et al (1999): Distinctive gene expression patterns in human mammary epithelial cells and breast cancers. Proc Natl Acad Sci U S A 96:9212–9217.
Schena M, Shalon D, Davis RW, Brown PO (1995): Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270:467–470.
Schena M, Shalon D, Heller R, Chai A, Brown PO, Davis RW (1996): Parallel human genome analysis: Microarray-based expression monitoring of 1,000 genes. Proc Natl Acad Sci U S A 93:10614–10619.
Schuchhardt J, Beule D, Malik A, Wolski E, Eickhoff H, Lehrach H, et al (2000): Normalization strategies for cDNA microarrays. Nucleic Acids Res 28:E47.
Singh-Gasson S, Green RD, Yue Y, Nelson C, Blattner F, Sussman MR, et al (1999): Maskless fabrication of light-directed oligonucleotide microarrays using a digital micromirror array. Nat Biotechnol 17:974–978.
Spellman PT, Sherlock G, Zhang MQ, Iyer VR, Anders K, Eisen MB, et al (1998): Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol Biol Cell 9:3273–3297.
Taton TA, Mirkin CA, Letsinger RL (2000): Scanometric DNA array detection with nanoparticle probes. Science 289:1757–1760.
Toronen P, Kolehmainen M, Wong G, Castren E (1999): Analysis of gene expression data using self-organizing maps. FEBS Lett 451:142–146.
van Helden J, Andre B, Collado-Vides J (1998): Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. J Mol Biol 281:827–842.
Velculescu VE, Zhang L, Vogelstein B, Kinzler KW (1995): Serial analysis of gene expression. Science 270:484–487.
Walt DR (2000): Bead-based fiber-optic arrays. Science 287:451–452.
Wang K, Gan L, Jeffery E, Gayle M, Gown AM, Skelly M, et al (1999): Monitoring gene expression profile changes in ovarian carcinomas using cDNA microarray. Gene 229:101–108.
Wang Y, Rea T, Bian J, Gray S, Sun Y (1999): Identification of the genes responsive to etoposide-induced apoptosis: Application of DNA chip technology. FEBS Lett 445:269–273.
Warrington JA, Nair A, Mahadevappa M, Tsyganskaya M (2000): Comparison of human adult and fetal expression and identification of 535 housekeeping/maintenance genes. Physiol Genomics 2:143–147.
Watson SJ, Akil H (1999): Gene chips and arrays revealed: A primer on their power and their uses. Biol Psychiatry 45:533–543.
Welford SM, Gregg J, Chen E, Garrison D, Sorensen PH, Denny CT, et al (1998): Detection of differentially expressed genes in primary tumor tissues using representational differences analysis coupled to microarray hybridization. Nucleic Acids Res 26:3059–3065.
Whitaker TJ (1999, November): Novel methods for detection of hybridization on DNA chips. Paper presented at the 6th Annual Chips to Hits, Berkeley, California.
White KP, Rifkin SA, Hurban P, Hogness DS (1999): Microarray analysis of Drosophila development during metamorphosis. Science 286:2179–2184.
Whitney LW, Becker KG, Tresser NJ, Caballero-Ramos CI, Munson PJ, Prabhu VV, et al (1999): Analysis of gene expression in multiple sclerosis lesions using cDNA microarrays. Ann Neurol 46:425–428.
Wilson M, DeRisi J, Kristensen HH, Imboden P, Rane S, Brown PO, et al (1999): Exploring drug-induced alterations in gene expression in Mycobacterium tuberculosis by microarray hybridization. Proc Natl Acad Sci U S A 96:12833–12838.
Wolfsberg TG, Gabrielian AE, Campbell MJ, Cho RJ, Spouge JL, Landsman D (1999): Candidate regulatory sequence elements for cell cycle-dependent transcription in Saccharomyces cerevisiae. Genome Res 9:775–792.
Yguerabide J, Yguerabide EE (1998): Light-scattering submicroscopic particles as highly fluorescent analogs and their use as tracer labels in clinical and biological applications. Anal Biochem 262:137–156.
Yuh CH, Bolouri H, Davidson EH (1998): Genomic cis-regulatory logic: Experimental and computational analysis of a sea urchin gene. Science 279:1896–1902.
Zhang MQ (1999): Promoter analysis of co-regulated genes in the yeast genome. Comput Chem 23:233–250.
Zhou X, Gao X, LeProust E, Pellois JP, Yu P, Zhang H, et al (1999): Light-directed, programmable microarray synthesis. Nat Genet 23:84.