Download Visualizing Chromosomes as Transcriptome

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
Research Article
Visualizing Chromosomes as Transcriptome Correlation Maps:
Evidence of Chromosomal Domains Containing Co-expressed
Genes—A Study of 130 Invasive Ductal Breast Carcinomas
1,2
1
1
3
Fabien Reyal, Nicolas Stransky, Isabelle Bernard-Pierrot, Anne Vincent-Salomon,
4
9
9
9
9
Yann de Rycke, Paul Elvin, Andrew Cassidy, Alexander Graham, Carolyn Spraggon,
1
5
2
6
7
Yoann Désille, Alain Fourquet, Claude Nos, Pierre Pouillart, Henri Magdelénat,
3
3
3
Dominique Stoppa-Lyonnet, Jérôme Couturier, Brigitte Sigal-Zafrani,
4
3
8
Bernard Asselain, Xavier Sastre-Garau, Olivier Delattre,
1,7
1
Jean Paul Thiery, and Franc¸ois Radvanyi
1
Unité Mixte de Recherche 144, Centre National de la Recherche Scientifique; Departments of 2Surgery, 3Tumor Biology, 4Biostatistics,
Radiotherapy, 6Medical Oncology, and 7Translational Research; and 8U509, Institut National de la Santé et de la Recherche Médicale,
Institut Curie, Paris, France and 9Astra Zeneca, Alderley Park, United Kingdom
5
Abstract
Introduction
Completion of the working draft of the human genome has
made it possible to analyze the expression of genes
according to their position on the chromosomes. Here, we
used a transcriptome data analysis approach involving for
each gene the calculation of the correlation between its
expression profile and those of its neighbors. We used the
U133 Affymetrix transcriptome data set for a series of 130
invasive ductal breast carcinomas to construct chromosomal
maps of gene expression correlation (transcriptome correlation map). This highlighted nonrandom clusters of genes
along the genome with correlated expression in tumors.
Some of the gene clusters identified by this method
probably arose because of genetic alterations, as most of
the chromosomes with the highest percentage of correlated
genes (1q, 8p, 8q, 16p, 16q, 17q, and 20q) were also the
most frequent sites of genomic alterations in breast cancer.
Our analysis showed that several known breast tumor
amplicons (at 8p11-p12, 11q13, and 17q12) are located
within clusters of genes with correlated expression. Using
hierarchical clustering on samples and a Treeview representation of whole chromosome arms, we observed a
higher-order organization of correlated genes, sometimes
involving very large chromosomal domains that could
extend to a whole chromosome arm. Transcription correlation maps are a new way of visualizing transcriptome data.
They will help to identify new genes involved in tumor
progression and new mechanisms of gene regulation in
tumors. (Cancer Res 2005; 65(4): 1376-83)
Large-scale transcriptome analyses based on DNA microarrays
have facilitated the classification of cancers into biologically
distinct categories, some of which may explain the clinical
behaviors of tumors (1–3). Such analyses may also help to find
new prognostic and predictive markers (4, 5). Completion of the
initial working draft of the human genome has made it possible
to interpret transcriptome data in a new way, by directly
assigning the genome-wide, high-throughput gene expression
profiles to the human genome sequence (6). A few recent studies
have explored the relationship between the transcriptome and
the positions of genes on chromosomes. Using data from serial
analysis of gene expression (SAGE) in a range of human normal
and tumor tissues, Caron et al. (7) showed that highly expressed
genes are often found in clusters in specific chromosomal
regions, called regions of increased gene expression. By
expressed sequence tag collection data analysis, Zhou et al. (8)
have compared normal and tumor tissues in 10 different tissue
types and showed clusters of genes exhibiting increased
expression along chromosomes in tumors. Many of these
genomic regions corresponded to known amplicons. By comparing serial analysis of gene expression data for normal bronchial
epithelium, adenocarcinomas, and squamous cell carcinomas of
the lung, Fujii et al. (9) identified clusters of overexpressed and
underexpressed genes. They also showed that in squamous cell
carcinomas of the lung, about half of these clusters were located
in imbalanced chromosomal regions previously identified by
cytogenetic, comparative genomic hybridization, or loss of
heterozygosity studies. Two studies have directly explored the
relationship between DNA copy number alterations and gene
expression in breast tumors using cDNA microarrays and found
that 40% to 60% of the highly amplified genes were overexpressed (10, 11). Therefore, clustering of overexpressed genes
could be attributable, at least in part, to underlying gene copy
number alterations. The different approaches mentioned above
established a relationship between the transcriptome and the
chromosomal location of the genes by analyzing the transcriptome considering the expression of each gene individually.
The genes are subsequently grouped into chromosomal domains
according to the expression data. A second approach to explore
the relationship between the transcriptome and the organization
Note: F. Reyal, N. Stransky, and I. Bernard-Pierrot contributed equally to this work.
B. Sigal-Zafrani contributed on behalf of the Institut Curie Breast Cancer Group (see
Acknowledgments).
Supplementary data for this article are available at Cancer Research Online
(http://cancerres.aacrjournals.org/). The original figures of the article as well as the
supplementary data can be found at http://microarrays.curie.fr/publications/
oncologie_moleculaire/breast_TCM/.
Requests for reprints: François Radvanyi, Unité Mixte de Recherche 144, Institut
Curie-CNRS, 26 rue dVUlm, 75248 Paris Cedex 05, France. Phone: 33-1-42-34-63-39; Fax:
33-1-42-34-63-49; E-mail: [email protected].
I2005 American Association for Cancer Research.
Cancer Res 2005; 65: (4). February 15, 2005
1376
www.aacrjournals.org
Downloaded from cancerres.aacrjournals.org on June 15, 2017. © 2005 American Association for Cancer
Research.
Transcriptome Correlation Map of Breast Carcinomas
of the genome was developed first on budding yeast transcriptome data (12) and subsequently on Drosophila transcriptome data (13). This transcriptome data analysis approach
searches for groups of neighboring genes that show correlated
expression profiles. Both groups identified chromosomal
domains of similarly expressed neighboring genes. In normal
cells, these chromosomal domains of co-regulated genes may
represent chromatin-regulated regions or groups of genes
regulated by the same transcription factor(s). We developed a
similar computational method for the analysis of transcriptome
data to identify chromosomal domains containing co-expressed
genes in cancer and applied it to a series of 130 invasive ductal
breast carcinomas.
Materials and Methods
Patients and Breast Tumor Samples. We analyzed the gene
expression profiles of 130 infiltrating ductal primary breast carcinomas.
These carcinomas were obtained from 130 patients who were included
in the prospective database initiated in 1981 by the Institut Curie Breast
Cancer Group between 1989 and 1999. The flash-frozen tumor samples
were stored at 80jC immediately after lumpectomy or mastectomy. All
tumor samples contained >50% cancer cells, as assessed by H&E
staining of histologic sections adjacent to the samples used for the
transcriptome analysis. The clinical data for the patients and the
histologic characteristics of the tumors are summarized in Supplementary Table S1. This study was approved by the institutional review
boards of Institut Curie.
RNA Extraction and Microarray Data Collection. RNA was extracted
from all tumor samples by the cesium chloride protocol (14, 15). The
concentration and the integrity/purity of each RNA sample were measured
using RNA 6000 LabChip kit (Agilent Technologies, Palo Alto, CA) and the
Agilent 2100 bioanalyzer. The DNA microarrays used in this study were the
Human Genome U133 set (HG-U133, Affymetrix, Santa Clara, CA),
consisting of two GeneChip arrays (A and B), and containing almost
45,000 probe sets. Each probe set consisted of 22 different oligonucleotides
(11 of which are a perfect match with the target transcript and 11 of which
harbor a one-nucleotide mismatch in the middle). These 22 oligonucleotides were used to measure the level of a given transcript. Details of the RNA
amplification, labeling, and hybridization steps are available from http://
www.affymetrix.com. Chips were scanned and the intensities for each probe
set were calculated using Affymetrix MAS5.0 default settings. The mean
intensity of the probe sets for each array was set to a constant target value
(500) by linearly scaling the array signal intensities.
Selection of Probe Sets: Attribution of a Unique Probe Set to Each
Gene. The fact that some genes are represented by several probe sets could
introduce artifacts when looking for neighboring genes with similar
expression patterns because the intensities of these probe sets are highly
correlated. To avoid this artifact, probe sets corresponding to uncharacterized expressed sequence tags were removed from the analysis, and when
several probe sets corresponded to the same gene (i.e., if different probe sets
had the same title, the same GenBank ID or belonged to the same UniGene
Cluster), a single probe set was kept for the analysis. Probe sets with an
‘‘_at’’ extension were preferentially kept because they tend to be more
specific according to Affymetrix probe set design algorithms. Probe sets
with an ‘‘_s_at’’ extension were the second best choice followed by all other
extensions. When several probe sets with the same extension were available
for one gene, the one with the highest median value was kept. From an
initial list of f45,000 probe sets, we kept 16,215 ‘‘unique’’ probe sets
corresponding to a unique gene. Due to the univocal correspondence
between these 16,215 probe sets and genes, we will use the terms ‘‘probe
set’’ and ‘‘gene’’ indifferently.
Chromosomal Location of the Probe Sets. Each of the 16,215 probe
sets represents a unique gene. Their genomic locations were obtained
from the U133 annotation files from Affymetrix. When the position of the
www.aacrjournals.org
probe set was not available, it was obtained using the basic local
alignment search tool–like alignment tool program (16) for sequencematching searches of probe set–specific target sequences (an f600
nucleotide sequence from which probe set oligonucleotides are derived)
against the University of California Santa Cruz Human Genome Working
Draft sequence, or by using the position of their corresponding UniGene
Cluster (Homo sapiens UniGene Build 164). All positions refer to the July
2003 Human Genome Working Draft.
Calculation of Transcription Correlation Scores and Identification
of Neighboring Genes with Correlated Expression. A similar method to
those described in yeast and Drosophila (12, 13), based on transcriptome
array data, has been developed in our laboratory (by N. Stransky) to evaluate
the correlation between the expression profile of each gene and those of its
neighbors. For each probe set (gene), we calculated a score, which we called
the transcriptome correlation score. This score is the sum of the Spearman
rank order correlation values in the tumor samples between the RNA levels
of this gene and the RNA levels of each of the physically nearest 2n genes (n
centromeric genes and n telomeric genes).
To determine a significance threshold (i.e., a score above which a gene is
considered to have a similar expression pattern to its neighbors), we created
100 random data sets of the same size by randomly ordering the 16,215
probe sets on the genome. For each random set, transcriptome correlation
scores were calculated for each probe set. The significance threshold was
the 500th quantile of the distribution (i.e., the value for which 1 of 500 probe
sets in the random data sets were above this value). Probe sets with a score
exceeding the threshold are consequently significantly correlated with their
neighbors within a number of 2n probe sets at P < 0.002 and are called
‘‘correlated probe sets.’’
To determine the appropriate number (2n) of neighboring genes needed
to calculate the transcriptome correlation score for each gene of our data
set, the total number of genes with a score above the threshold was
calculated as a function of 2n (13) for several values ranging from 2 to 34
(Supplementary Fig. S1). Above 2n = 20, this number reaches a plateau.
Therefore, 20 neighboring genes were used to calculate the transcriptome
correlation score.
For each of the 16,215 probe sets used, we calculated the transcriptome
correlation score. For each chromosome, we obtained a diagram (the
transcriptome correlation map) representing this score for each probe set,
organized according to its chromosomal position.
Correlation between Adjacent Groups of Correlated Genes. To
determine if there was a correlation between adjacent groups of correlated
genes, we analyzed the correlated probe sets by one-way unsupervised
hierarchical clustering on samples (17) and by leaving the probe sets
organized according to their chromosomal position.
Software Used for Data Analyses. Statistical analysis and all
calculations were done using R 1.9.0 (http://www.r-project.org). We used
the HKIS software (http://isoft.free.fr/hkis/) to look up gene positions in
public databases and for data formatting. We used Java TreeView 1.0.5
(http://jtreeview.sourceforge.net/) to make representations of Eisen clusters
(17) obtained with Cluster 3.0 (http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/
software/cluster/).
Results
Generation of a Transcriptome Map Based on the
Correlated Expression of Neighboring Genes (Transcriptome
Correlation Map) in Infiltrating Ductal Breast Carcinoma.
The RNA expression data of 130 ductal invasive breast
carcinomas were obtained using the Affymetrix U133 set. Of the
45,000 probes in the initial set, 16,215 probe sets, each
corresponding to a unique gene, were kept to establish
chromosomal transcriptome maps (see Materials and Methods).
These maps highlight genes that show a correlated expression
with their neighbors. The transcriptome correlation maps for
genes located on chromosomes 1 and 2 are shown in Fig. 1. The
significance threshold was defined as the 500th quantile of the
1377
Cancer Res 2005; 65: (4). February 15, 2005
Downloaded from cancerres.aacrjournals.org on June 15, 2017. © 2005 American Association for Cancer
Research.
Cancer Research
Figure 1. Transcriptome correlation map of chromosomes 1 and 2 in infiltrating ductal breast carcinoma. The transcriptome correlation score (TC score) for a
given gene indicates the strength of the correlation between the expression of this gene and the expression of the 20 neighboring genes (see Materials and Methods).
The transcriptome correlation map shows the scores for the different genes as a function of their position along the chromosome. Left, transcriptome correlation
maps of chromosome 1 and chromosome 2. Dashed line, significance threshold: Genes with scores above this threshold are considered to have a significantly
correlated expression with that of their neighbors. This significant threshold was calculated using random data sets (see Materials and Methods).
Right, examples of random data set for chromosome 1 and chromosome 2.
distribution of the resulting 1.6 million transcriptome correlation
scores in the random data sets and was equal to 3.38. Examples
of a random data set for chromosomes 1 and 2 are given in Fig. 1
(right panels). The transcriptome correlation maps of the different
chromosomes, including chromosomes 1 and 2, showed that
genes with a transcriptome correlation score above the threshold
were not distributed uniformly along the chromosomes: groups of
adjacent genes with correlated expression could be seen (Fig. 1
and Supplementary Fig. S2 and Table S2). Interestingly, the genes
corresponding to the three well-known amplification regions in
breast cancer [8p11-p12 (FGFR1 locus), 11q13 (CCDN1 locus), and
17p12 (ERBB2 locus)] were present in chromosomal regions
containing genes with transcriptome correlation scores higher
than the threshold (Fig. 2).
Large Chromosomal Domains of Co-expressed Genes.
Overall, 20% of the genes had a significant transcriptome correlation
score (i.e., their expression was correlated with that of their
neighbors). The percentage of genes with a significant transcriptome
Cancer Res 2005; 65: (4). February 15, 2005
correlation score differed significantly between the different
chromosome arms (Fig. 3). The chromosome arms with the highest
percentages (higher than 30%) were 1q (243 of 750 genes), 8p (82 of
207 genes), 8q (185 of 364 genes), 14q (174 of 527 genes), 16p (179 of
379 genes), 16q (110 of 318 genes), 17q (303 of 704 genes), and 20q
(101 of 304 genes). Beside chromosome arm 14q, all these
chromosome arms are also the most frequent locations of genomic
alterations in breast cancer. We analyzed chromosome arms 1q, 8p,
8q, 16p, 16q, and 17q in more detail (Figs 4–6 and Supplementary
Fig. S4). For this analysis, we considered only genes with a
transcriptome correlation score above the threshold (Figs. 4A-6A
and Supplementary Fig. S4A). The expression patterns of these
genes, ordered according to their chromosomal location, were
examined in the 130 breast cancer samples by using an unsupervised
hierarchical clustering representation (one-way clustering on
samples). Very similar expression patterns were obtained for the
243 genes on chromosome 1q with a significant transcriptome
correlation score (Fig. 4B). These genes extended from PEX11B
1378
www.aacrjournals.org
Downloaded from cancerres.aacrjournals.org on June 15, 2017. © 2005 American Association for Cancer
Research.
Transcriptome Correlation Map of Breast Carcinomas
(1q21.1) to ELYS (1q44), spanning a 100,677-kb region. A group of
genes with a similar expression pattern was also apparent on the
long arm of chromosome 8 (Fig. 5B) and extended to the
centromeric region of chromosome 8p [213 genes, spanning
108,140 kb from FLJ14299 (8p12) to KIAA0014 (8q24.3)]. The 8p11p12 amplified region containing the FGFR1 locus was located at the
edge of this domain. A very different expression pattern was
observed for the genes telomeric to the FGFR1 locus [54 genes,
spanning 25,203 kb from LOC84549 (8p12) to DKFZp761P0423
(8p23.1)]. The 289 genes on chromosome 16 (Supplementary Fig.
S4B) with a transcriptome correlation score higher than the
threshold corresponded to two different sets of genes. One was a
group of 110 genes all localized on 16q and spanning 43,127 kb
from VPS35 (16q11.2) to FANCA (16q24.3) and the second was a
group of 179 genes all localized on 16p, spanning 30,901 kb from
BCKDK (16p11.2) to RGS11 (16p13.3). In contrast, numerous
different expression patterns were observed on chromosome 17q
(Fig. 6B). At least six different sets of genes were found: A group of
36 genes (a) spanning 4,034 kb from TNFAIP1 (17q11.2) to ZNF207
(17q11.2), a group of 37 genes (b) spanning 4,110 kb from FLJ22865
(17q12) to SMARCE1 (17q21.2), a group of 64 genes (c) spanning
7,045 kb from KRTAP4-15 (17q21.2) to SCAP1 (17q21.32), a group of
four genes (d) spanning 78 kb from HOXB2 (17q21.32) to HOXB7
(17q21.32), a group of 74 genes (e) spanning 19,006 kb from
KIAA0924 (17q21.32) to LOC51321 (17q24.2), and a group of 88
genes ( f ) spanning 14,219 kb from SLC16A6 (17q24.2) to MGC4368
(17q25.3 (17925.3) (Fig. 6).
Discussion
Microarray technology makes it possible to monitor the
expression of thousand of genes simultaneously. Completion of
the initial working draft of the human genome has made it
possible to analyze the expression of the genes according to
their position on the genome. Comparison of the expression
patterns of adjacent genes in different expression array data sets
in yeast (12, 18), Drosophila (13), and worms (19) has led to the
identification of groups of physically adjacent genes that share
similar expression profiles. In this work, we used this new
approach to analyze large-scale transcriptome data concerning
cancer.
To identify systematically co-expressed adjacent genes along the
chromosomes, we calculated an expression correlation score for
each given gene. This score was the sum of the correlation values
between the RNA expression levels of this gene and the RNA
expression levels of each of the 20 neighboring genes (10
centromeric genes and 10 telomeric genes). Using the U133
Affymetrix transcriptome data for a series of 130 invasive ductal
Figure 2. Transcriptome correlation maps of regions localized on chromosomes 8, 11, and 17 containing known amplicons in infiltrating ductal breast carcinoma.
The transcriptome correlation maps of chromosome 8 (region p21.2 to q11.21; left), chromosome 11 (region q11 to q14.1; middle ), and chromosome 17 (region q11.2
to q21.31; right ) are shown. Vertical bars, position of the centromeres. Bold squares, transcriptome correlation scores (TC score) for FGFR1, CCND1,
and ERBB2.
www.aacrjournals.org
1379
Cancer Res 2005; 65: (4). February 15, 2005
Downloaded from cancerres.aacrjournals.org on June 15, 2017. © 2005 American Association for Cancer
Research.
Cancer Research
Figure 3. Percentage of genes with a
transcriptome correlation score (TC score)
higher than the threshold for each
chromosomal arm in infiltrating ductal
breast carcinoma. For each chromosomal
arm, the number of genes with a
transcriptome correlation score higher than
the significant correlation score divided
by the total number of considered genes
on the chromosome arm was calculated
and expressed as a percentage.
breast carcinomas, we constructed a genome-wide map that we
called the transcriptome correlation map. This map highlights the
‘‘correlated genes’’ (i.e., genes that share a similar expression profile
with their neighbors).
We have found in this series of breast tumors that f20% of
the genes showed a significant correlation with their neighbors.
The transcription correlation maps of the different chromosomes revealed regions with a high percentage of correlated
genes separated by regions containing genes not significantly
correlated with their neighbors. The precise physical definition
of the correlated regions was not always straightforward as a
continuum between the correlated regions was observed in
some cases.
We addressed the effect of the gene densities on the percentage
of correlated genes in the transcription correlation map. Unlike
for the regions of increased gene expression described by Caron
et al. (7), we observed no systematic correlation between gene
density and the percentage of genes that are over the threshold
on the transcription correlation map (Supplementary Fig. S3).
Genetic or nongenetic mechanisms could account for the
correlation in expression between neighboring genes. Aneusomy
will affect in the same way the expression of a series of adjacent
genes not subjected to gene dosage compensation. This has been
described both for DNA losses and gains in yeast (20) and in
humans (10, 11, 21, 22). Nongenetic mechanisms could also affect
the expression of neighboring genes. Several models or combinations of models have been proposed: long range effect of
transcription factors, chromatin structure modification, and
increased concentration of components of the transcriptional
machinery due to a particular subnuclear location of chromosomal segments (23). Genomic alterations most likely explain part
of the correlation between neighboring genes in breast tumors
because, except for chromosome arm 14q, the chromosome arms
presenting the highest percentage of correlated genes (8q, 51%;
16p, 47%; 17q, 43%; 8p, 40%; 16q, 35%; 20q, 33%; 1q, 32%) were
also known to harbor frequent chromosome imbalance, as shown
by karyotypic, comparative genomic hybridization, or loss of
heterozygosity studies (24–30). Two well-characterized amplicons,
the FGFR1 amplicon at 8p11-p12 and the ERBB2 amplicon at
17q12, corresponded to regions presenting a high percentage of
correlated genes. Very recent data on breast cancer cell lines
Cancer Res 2005; 65: (4). February 15, 2005
suggest that FGFR1 is not the driving oncogene in the 8p11-p12
amplicon. Several new candidate oncogenes located in this region
have been suggested to play a causal role in breast cancer
progression in tumors with an amplification in the 8p11-p12
Figure 4. Chromosomal domains of co-expressed genes on chromosome 1q. A ,
transcriptome correlation map of chromosome 1. B, unsupervised hierarchical
cluster analysis of 130 infiltrating ductal breast carcinomas using the genes on
the long arm of chromosome 1 with a significant transcriptome correlation
score (TC score ). Each row corresponds to one tumor sample and each column
corresponds to one gene. The genes are arranged in cytogenetic order from
the centromere to the q telomere: red, high level of expression relative to the mean
expression level in the 130 tumor samples; green, low level of gene expression.
The genes under line a correspond to the genes contained in rectangle a of A.
1380
www.aacrjournals.org
Downloaded from cancerres.aacrjournals.org on June 15, 2017. © 2005 American Association for Cancer
Research.
Transcriptome Correlation Map of Breast Carcinomas
correlated in these regions. This percentage is in good
agreement with the results of Hyman et al. (10) and Pollack
et al. (11), who showed that f50% of the genes within an
amplified region are overexpressed.
We found chromosomal regions harboring genes with
correlated expression in regions known to contain amplicons
like 8p11-p12, 11q13, and 17q12 in regions exhibiting frequent
single chromosome arm gain like 1q, 8q, and 16p, and in
regions exhibiting chromosomal losses like 16q. To determine
the exact genetic mechanisms responsible for the correlation in
expression between adjacent genes, it will be necessary to
obtain, in parallel with transcriptome data, genome data like
those obtained by comparative genomic hybridization arrays
(35). The combination of comparative genomic hybridization
array data and transcription correlation map analysis should
help us to identify genes involved in tumor progression. Groups
of genes with correlated expression were rarer but also present
in regions not affected by genetic alterations in breast cancer
like chromosome arms 2p or 9q. Our new approach for
analyzing the transcriptome in tumors may pinpoint genes
involved in tumor progression, the expression of which is
altered by nongenetic mechanisms.
Using hierarchical clustering and a Treeview representation,
we were able to show that regions with genes with similar
expression patterns can extend over very large chromosomal
Figure 5. Chromosomal domains of co-expressed genes on chromosome 8.
A, transcriptome correlation map of chromosome 8. B, unsupervised
hierarchical cluster analysis of 130 infiltrating ductal breast carcinomas using
the genes of chromosome 8 with a significant transcriptome correlation
score (TC score). Each row, one tumor sample; each column, one gene.
The genes are arranged in cytogenetic order from the p telomere to the
q telomere: red, high level of expression relative to the mean expression level in
the 130 tumor samples; green, low level of gene expression. The genes
under line a (or b) correspond to the genes contained in
rectangle a (or b) of A.
region. Of the 15 genes with a significant transcriptome
correlation score from this region, nine (HTPAP, KIAA0725,
FGFR1, BAG4, LSM1, RCP, BRF2, PROSC, and FLJ14299) have
already been proposed to be potential oncogenes (31). The ERBB2
region at 17q12 is of particular interest in breast cancer.
Amplification and overexpression of ERBB2 is associated with
poor clinical outcome and is found in 15% of infiltrating breast
carcinomas. Moreover, ERBB2 gene amplification determines the
response to specific antibody-based therapy (trastuzumab; ref. 32).
All seven of the genes located in the minimal ERBB2
amplification region (280 kb) that presented an overexpression
associated with amplification (STARD3 , PNMT, TCAP,
CAB2 , ERBB2 , MGC14832, and GRB7; ref. 33) corresponded to
correlated genes in the 17q12 region. The 11q13 region is often
amplified in breast cancer, and CCND1 was initially thought to be
the main oncogene in this region. Although CCND1 was one of
the correlated genes, many other genes in this region had similar
or higher transcriptome correlation scores, possibly because
multiple mechanisms for increasing the expression of CCND1
occur frequently in addition to amplification (33) and/or due
to the complexity of the 11p13 amplification region, which
probably contains several different amplicons (34). It should be
noted that within the different regions rich in correlated genes,
in particular 11q13 and 17q12, f50% of the genes were found
to be correlated, meaning that the other 50% of genes are not
www.aacrjournals.org
Figure 6. Chromosomal domains of co-expressed genes on chromosome
17q. A, transcriptome correlation map of chromosome 17. B, unsupervised
hierarchical cluster analysis of 130 infiltrating ductal breast carcinomas using
the genes on the long arm of chromosome 17 with a significant transcriptome
correlation score (TC score ). Each row corresponds to one tumor sample
and each column corresponds to one gene. The genes are arranged
in cytogenetic order from the centromere to the q telomere: red, high level
of expression relative to the mean expression level in the 130 tumor samples;
green, low level of gene expression. The genes under line a (or b , c , d, e, f)
correspond to the genes under line a (or b, c , d, e, f ) of A .
1381
Cancer Res 2005; 65: (4). February 15, 2005
Downloaded from cancerres.aacrjournals.org on June 15, 2017. © 2005 American Association for Cancer
Research.
Cancer Research
domains, sometimes involving a whole chromosome arm. This
phenomenon could be due to genetic or nongenetic mechanisms. Changes in whole chromosome gene expression due to
aneuploidy have already been described in yeast and in human
(20–22). Modification of gene expression involving large chromosomal domains or even entire chromosomes could also be
due to epigenetic mechanisms (36).
The ‘‘transcriptome correlation map’’ approach has two strong
points: (a) although potentially very useful, knowledge about gene
expression in normal tissue is not compulsory. This is particularly
useful for tumors for which the normal counterparts are difficult
to obtain, such as breast or ovarian carcinomas, or when the
cellular origin of the tumor is unknown, like Ewing tumors,
synovialosarcoma, or medulloblastomas; (b) it will be possible to
compare the transcriptome correlation maps between different
groups of samples even if the data for the different groups are
obtained on different platforms because transcriptome correlation
map does not compare the gene expression in different groups
but the correlation of expression between neighboring genes
within a group of samples. Additionally, subsets of genes with a
significant correlation score could be used to classify tumors into
meaningful anatomoclinical groups. It should be noted that the
relationship between regions of correlated genes can occur
not only between adjacent regions but also between regions
on different chromosome arms or even different chromosomes (e.g., higher levels of expression of the genes included
in the ERBB2 cluster are associated with lower expression
levels of the genes included in the CCND1 cluster). The
systematic investigation of such relationship could help to
identify combinations of events that occur during tumor
progression.
Transcriptome correlation maps are a new way of interpreting
transcriptome data. Combined with other molecular data (i.e.,
chromosomal alteration data obtained by comparative genomic
hybridization arrays or large-scale methylation analysis; refs.
35, 37, 38), transcriptome correlation map can be applied to any
tumor type and will help to identify new genes involved in
tumor progression and new mechanisms of gene regulation in
tumors.
Acknowledgments
Received 7/29/2004; revised 10/13/2004; accepted 12/9/2004.
Grant support: Centre National de la Recherche Scientifique, Institut Curie Breast
Cancer program, European FP5 IST HKIS project, and Comité de Paris Ligue Nationale
Contre le Cancer (Laboratoire associé); Ligue Nationale Contre le Cancer fellowship
(F. Reyal and I. Bernard-Pierrot) and French Ministry of Education and Research
fellowship (N. Stransky).
The Institut Curie Breast Cancer Group: Bernard Asselain, Alain Aurias, Emmanuel
Barillot, François Campana, Patricia De Cremoux, Olivier Delattre, Veronique Diéras,
Jean-Marc Extra, Alain Fourquet, Henri Magdelénat, Martine Meunier, Claude Nos,
Thao Palangié, Pierre Pouillart, Marie-France Poupon, François Radvanyi, Xavier
Sastre-Garau, Brigitte Sigal-Zafrani, Dominique Stoppa-Lyonnet, Anne Tardivon,
Fabienne Thibault, Jean Paul Thiery, and Anne Vincent-Salomon.
The costs of publication of this article were defrayed in part by the payment of page
charges. This article must therefore be hereby marked advertisement in accordance
with 18 U.S.C. Section 1734 solely to indicate this fact.
1. Perou CM, Sorlie T, Eisen MB, et al. Molecular
portraits of human breast tumours. Nature 2000;
406:747–52.
2. Sorlie T, Perou CM, Tibshirani R, et al. Gene
expression patterns of breast carcinomas distinguish
tumor subclasses with clinical implications. Proc Natl
Acad Sci U S A 2001;98:10869–74.
3. Sorlie T, Tibshirani R, Parker J, et al. Repeated
observation of breast tumor subtypes in independent
gene expression data sets. Proc Natl Acad Sci U S A
2003;100:8418–23.
4. van’t Veer LJ, Dai H, van de Vijver MJ, et al. Gene
expression profiling predicts clinical outcome of breast
cancer. Nature 2002;415:530–6.
5. van de Vijver MJ, He YD, van’t Veer LJ, et al. A geneexpression signature as a predictor of survival in breast
cancer. N Engl J Med 2002;347:1999–2009.
6. Collins FS, Morgan M, Patrinos A. The Human
Genome Project: lessons from large-scale biology.
Science 2003;300:286–90.
7. Caron H, van Schaik B, van der Mee M, et al. The
human transcriptome map: clustering of highly
expressed genes in chromosomal domains. Science
2001;291:1289–92.
8. Zhou Y, Luoh SM, Zhang Y, et al. Genome-wide
identification of chromosomal regions of increased
tumor expression by transcriptome analysis. Cancer Res
2003;63:5781–4.
9. Fujii T, Dracheva T, Player A, et al. A preliminary
transcriptome map of non-small cell lung cancer.
Cancer Res 2002;62:3340–6.
10. Hyman E, Kauraniemi P, Hautaniemi S, et al. Impact
of DNA amplification on gene expression patterns in
breast cancer. Cancer Res 2002;62:6240–5.
11. Pollack JR, Sorlie T, Perou CM, et al. Microarray
analysis reveals a major direct role of DNA copy
number alteration in the transcriptional program of
human breast tumors. Proc Natl Acad Sci U S A 2002;99:
12963–8.
12. Cohen BA, Mitra RD, Hughes JD, Church GM. A
computational analysis of whole-genome expression
data reveals chromosomal domains of gene expression.
Nat Genet 2000;26:183–6.
13. Spellman PT, Rubin GM. Evidence for large domains
of similarly expressed genes in the Drosophila genome.
J Biol 2002;1:5.
14. Coombs LM, Pigott D, Proctor A, Eydmann M,
Denner J, Knowles MA. Simultaneous isolation of
DNA, RNA, and antigenic protein exhibiting kinase
activity from small tumor samples using guanidine
isothiocyanate. Anal Biochem 1990;188:338–43.
15. Chirgwin JM, Przybyla AE, MacDonald RJ, Rutter WJ.
Isolation of biologically active ribonucleic acid from
sources enriched in ribonuclease. Biochemistry
1979;18:5294–9.
16. Kent WJ. BLAT—the BLAST-like alignment tool.
Genome Res 2002;12:656–64.
17. Eisen MB, Spellman PT, Brown PO, Botstein D.
Cluster analysis and display of genome-wide expression
patterns. Proc Natl Acad Sci U S A 1998;95:14863–8.
18. Kruglyak S, Tang H. Regulation of adjacent yeast
genes. Trends Genet 2000;16:109–11.
19. Lercher MJ, Blumenthal T, Hurst LD. Coexpression of
neighboring genes in Caenorhabditis elegans is mostly
due to operons and duplicate genes. Genome Res
2003;13:238–43.
20. Hughes TR, Roberts CJ, Dai H, et al. Widespread
aneuploidy revealed by DNA microarray expression
profiling. Nat Genet 2000;25:333–7.
21. Virtaneva K, Wright FA, Tanner SM, et al. Expression
profiling reveals fundamental biological differences in
acute myeloid leukemia with isolated trisomy 8 and
normal cytogenetics. Proc Natl Acad Sci U S A 2001;98:
1124–9.
22. Phillips JL, Hayward SW, Wang Y, et al. The
consequences of chromosomal aneuploidy on gene
Cancer Res 2005; 65: (4). February 15, 2005
1382
References
expression profiles in a cell line model for prostate
carcinogenesis. Cancer Res 2001;61:8143–9.
23. Oliver B, Parisi M, Clark D. Gene expression
neighborhoods. J Biol 2002;1:4.
24. Tirkkonen M, Tanner M, Karhu R, Kallioniemi A,
Isola J, Kallioniemi OP. Molecular cytogenetics of
primary breast cancer by CGH. Genes Chromosomes
Cancer 1998;21:177–84.
25. Forozan F, Mahlamaki EH, Monni O, et al.
Comparative genomic hybridization analysis of 38
breast cancer cell lines: a basis for interpreting
complementary DNA microarray data. Cancer Res
2000;60:4519–25.
26. Cingoz S, Altungoz O, Canda T, Saydam S,
Aksakoglu G, Sakizli M. DNA copy number changes
detected by comparative genomic hybridization and
their association with clinicopathologic parameters in
breast tumors. Cancer Genet Cytogenet 2003;145:
108–14.
27. Janssen EA, Baak JP, Guervos MA, van Diest PJ, Jiwa
M, Hermsen MA. In lymph node-negative invasive
breast carcinomas, specific chromosomal aberrations
are strongly associated with high mitotic activity and
predict outcome more accurately than grade, tumour
diameter, and oestrogen receptor. J Pathol 2003;201:
555–61.
28. Jong YJ, Li LH, Tsou MH, et al. Chromosomal
comparative genomic hybridization abnormalities in
early- and late-onset human breast cancers: correlation
with disease progression and TP53 mutations. Cancer
Genet Cytogenet 2004;148:55–65.
29. Kirchweger R, Zeillinger R, Schneeberger C, Speiser P,
Louason G, Theillet C. Patterns of allele losses suggest
the existence of five distinct regions of LOH on
chromosome 17 in breast cancer. Int J Cancer
1994;56:193–9.
30. Dutrillaux B, Gerbault-Seureau M, Zafrani B. Characterization of chromosomal anomalies in human
breast cancer. A comparison of 30 paradiploid cases
www.aacrjournals.org
Downloaded from cancerres.aacrjournals.org on June 15, 2017. © 2005 American Association for Cancer
Research.
Transcriptome Correlation Map of Breast Carcinomas
with few chromosome changes. Cancer Genet Cytogenet 1990;49:203–17.
31. Ray ME, Yang ZQ, Albertson D, et al. Genomic and
expression analysis of the 8p11-12 amplicon in
human breast cancer cell lines. Cancer Res 2004;
64:40–7.
32. Nahta R, Hung MC, Esteva FJ. The HER-2-targeting
antibodies trastuzumab and pertuzumab synergistically
inhibit the survival of breast cancer cells. Cancer Res
2004;64:2343–6.
www.aacrjournals.org
33. Kauraniemi P, Kuukasjarvi T, Sauter G, Kallioniemi A.
Amplification of a 280-kilobase core region at the
ERBB2 locus leads to activation of two hypothetical
proteins in breast cancer. Am J Pathol 2003;163:
1979–84.
34. Ormandy CJ, Musgrove EA, Hui R, Daly RJ, Sutherland RL. Cyclin D1, EMS1 and 11q13 amplification in
breast cancer. Breast Cancer Res Treat 2003;78:323–35.
35. Pinkel D, Segraves R, Sudar D, et al. High resolution
analysis of DNA copy number variation using compar-
ative genomic hybridization to microarrays. Nat Genet
1998;20:207–11.
36. Grewal SI, Moazed D. Heterochromatin and epigenetic control of gene expression. Science 2003;301:798–802.
37. Zardo G, Tiirikainen MI, Hong C, et al. Integrated
genomic and epigenomic analyses pinpoint biallelic
gene inactivation in tumors. Nat Genet 2002; 32:453–8.
38. Huang TH, Perry MR, Laux DE. Methylation profiling
of CpG islands in human breast cancer cells. Hum Mol
Genet 1999;8:459–70.
1383
Cancer Res 2005; 65: (4). February 15, 2005
Downloaded from cancerres.aacrjournals.org on June 15, 2017. © 2005 American Association for Cancer
Research.
Visualizing Chromosomes as Transcriptome Correlation
Maps: Evidence of Chromosomal Domains Containing
Co-expressed Genes−−A Study of 130 Invasive Ductal Breast
Carcinomas
Fabien Reyal, Nicolas Stransky, Isabelle Bernard-Pierrot, et al.
Cancer Res 2005;65:1376-1383.
Updated version
Supplementary
Material
Cited articles
Citing articles
E-mail alerts
Reprints and
Subscriptions
Permissions
Access the most recent version of this article at:
http://cancerres.aacrjournals.org/content/65/4/1376
Access the most recent supplemental material at:
http://cancerres.aacrjournals.org/content/suppl/2005/10/12/65.4.1376.DC1
This article cites 38 articles, 18 of which you can access for free at:
http://cancerres.aacrjournals.org/content/65/4/1376.full#ref-list-1
This article has been cited by 17 HighWire-hosted articles. Access the articles at:
http://cancerres.aacrjournals.org/content/65/4/1376.full#related-urls
Sign up to receive free email-alerts related to this article or journal.
To order reprints of this article or to subscribe to the journal, contact the AACR Publications
Department at [email protected].
To request permission to re-use all or part of this article, contact the AACR Publications
Department at [email protected].
Downloaded from cancerres.aacrjournals.org on June 15, 2017. © 2005 American Association for Cancer
Research.