ARTICLES © 2008 Nature Publishing Group http://www.nature.com/naturemedicine

Stromal gene expression predicts clinical outcome in breast cancer

Greg Finak1–3, Nicholas Bertos2,7, Francois Pepin1–3,7, Svetlana Sadekova2, Margarita Souleimanova2, Hong Zhao2, Haiying Chen2, Gulbeyaz Omeroglu2, Sarkis Meterissian4, Atilla Omeroglu6, Michael Hallett1,3 & Morag Park2,3,5

Although it is increasingly evident that cancer is influenced by signals emanating from tumor stroma, little is known regarding how changes in stromal gene expression affect epithelial tumor progression. We used laser capture microdissection to compare gene expression profiles of tumor stroma from 53 primary breast tumors and derived signatures strongly associated with clinical outcome. We present a new stroma-derived prognostic predictor (SDPP) that stratifies disease outcome independently of standard clinical prognostic factors and published expression-based predictors. The SDPP predicts outcome in several published whole tumor–derived expression data sets, identifies poor-outcome individuals from multiple clinical subtypes, including lymph node–negative tumors, and shows increased accuracy with respect to previously published predictors, especially for HER2-positive tumors. Prognostic power increases substantially when the predictor is combined with existing outcome predictors. Genes represented in the SDPP reveal the strong prognostic capacity of differential immune responses as well as angiogenic and hypoxic responses, highlighting the importance of stromal biology in tumor progression.

Breast cancer is a major cause of morbidity and mortality in Western countries1. Disease-related mortality has declined as a result of earlier diagnoses and the availability of adjuvant therapies. The identification of individuals at increased risk of recurrence, who may then be selected for aggressive systemic therapy, remains a considerable challenge.
Recent technological advances permit the genomic characterization of tumors, enhancing our understanding of cancer initiation and progression2–6. Gene expression signatures have been identified that classify breast tumors into subtypes showing distinct expression profiles associated with specific clinical characteristics2–8. Transcriptional signatures have been identified for estrogen receptor (ER)–positive (luminal), HER2-positive (ERBB2-amplified), and ER-, progesterone receptor (PR)- and HER2–negative (basal) breast cancer4,6, indicating that there exist molecularly distinct subtypes of breast cancer with distinct clinical outcomes. Genomics-based predictors of metastasis in breast cancer have been developed for clinical use9. Although some investigators have focused their analyses on specific cell types (primarily fibroblasts) or derived gene expression signatures from defined cell types studied in vitro10–14, most prognostic gene expression signatures and predictors have been derived from whole tissue consisting of tumor epithelial cells and the surrounding stroma. Samples judged to possess insufficient tumor epithelial cell content are generally excluded2,3,5,6. Under normal physiological conditions, stroma serves as an important barrier to epithelial cell transformation; the interplay between epithelial cells and the microenvironment maintains epithelial polarity and modulates growth inhibition15. However, the stromal compartment undergoes changes in response to emerging epithelial lesions and can have a key role in cancer initiation and progression15,16. These changes may include the recruitment of immune and endothelial cells providing growth and matrix remodeling factors, as well as a new blood supply promoting tumor growth and metastasis15–17. To date, the mechanisms involved are poorly understood and their relationship to disease outcome has not been addressed. 
Gene expression in isolated tumor stroma from clinical breast cancer samples has not been rigorously examined. Given the importance of stromal elements in modulating and driving cancer progression in animal models, it is crucial to elucidate the contribution of stroma and understand the consequences of changes in this tissue compartment as they relate to breast cancer outcome. In this study, we used laser capture microdissection (LCM) to isolate tumor stroma and matched normal stroma from human breast cancers and used microarrays to identify stromal gene expression signatures. To our surprise, whereas gene expression signatures derived from whole tumors generate clusters associated with ER and HER2 status2,4,6, differential gene expression from the tumor stroma generates clusters linked to clinical outcome.

1McGill Centre for Bioinformatics, 3775 University Street, McGill University, Québec H3A 2B4, Canada.
2Molecular Oncology Group, 687 Pine Avenue West, McGill University Health Centre, Québec H3A 1A1, Canada.
3Department of Biochemistry, 3655 Promenade Sir William Osler, McGill University, Québec H3G 1Y6, Canada.
4Department of Surgery and 5Department of Oncology, 687 Pine Avenue West, McGill University, Québec H3A 1A1, Canada.
6Department of Pathology, 3775 University Street, McGill University, Québec, H3A 2B4, Canada.
7These authors contributed equally to this work.
Correspondence should be addressed to M.P. ([email protected]).
Received 3 October 2007; accepted 2 April 2008; published online 27 April 2008; doi:10.1038/nm1764

[Figure 1 (image not reproduced). Panel a: flow chart of SDPP construction: class discovery (identify genes, n = 31 matched samples; identify clusters, n = 53 IDC-stroma samples); class distinction (differentially expressed genes between clusters); logistic regression ordering genes by predictive power in a multivariate model with LN, ER, PR and HER2 status; class prediction (train predictors with increasing numbers of genes on 27 random samples, test on the remaining 26, cross-validate over 50 iterations); choose the minimal-size classifier with maximal AUC; retrain on all 53 samples; test on independent data (NKI, n = 295; Rotterdam, n = 286; JRH, n = 251; Uppsala, n = 189). Panel b: clustering tree with clinical annotations. Panel c: fraction good outcome versus recurrence-free good-outcome survival time (months).]

Figure 1 Class discovery in tumor stroma. (a) Flow chart outlining principal steps in the construction of the SDPP. LN, lymph node. (b) Class discovery in tumor stroma samples over a basis set of the 200 most variable genes from matched normal versus tumor stroma gene expression data. Clusters in the tree are labeled with the percentage of times they were observed in 1,000 bootstrap iterations. Clinical characteristics of each tumor sample are presented in the boxes below each sample, with a shaded box representing a positive status and a strikethrough indicating that no information is available. Poor outcome is defined as dead of disease or alive with disease as of last follow-up. Significant associations of each cluster with clinical characteristics are presented below the relevant cluster. (c) Kaplan-Meier survival curves for samples belonging to clusters 1, 2 and 3 in b.

RESULTS

Tumor stroma gene expression identifies outcome-linked clusters

To investigate changes in breast tumor stroma, we combined LCM-based tissue isolation and RNA amplification with gene expression profiling on DNA microarrays, as previously described18. We isolated cells from the stromal compartment within the tumor bed, and from morphologically normal tissue at least 2 mm from the tumor, from individuals with invasive breast carcinoma (Supplementary Fig. 1a and Supplementary Table 1 online). We have previously shown that such morphologically normal tissue has not undergone significant gene expression changes compared to breast reduction tissue18.

We next asked whether gene expression profiles derived from tumor stroma can identify cancer subtypes using a class-discovery approach. We identified the genes whose expression varied most between tumor and normal stroma across the 31 tissue-matched pairs. The 200 most variable genes were used to cluster our complete data set of 53 tumor stroma samples (Fig. 1a), generating three clusters (Fig. 1b). Cluster 1 (Fig.
1b,c) shows a significantly reduced rate of recurrence (P = 7.26 × 10⁻³, χ² test for association) and longer relapse-free survival (P = 4.17 × 10⁻³) compared to the remaining samples, whereas cluster 2 (Fig. 1b,c) has a significantly increased rate of recurrence (P = 2.04 × 10⁻⁵) and shorter relapse-free survival (P = 2.87 × 10⁻⁴). Cluster 3 (Fig. 1b,c) contains a mixture of outcomes. Multivariate Cox regression indicates that the poor outcome–associated cluster 2 is independent of ER, HER2 and lymph node status, as well as age, grade and tumor size, whereas the good outcome–associated cluster 1 is independent of these variables as well as of radiotherapy, chemotherapy and hormonal therapy (Supplementary Tables 1 and 2 online). Hence, tumor-associated changes in stromal gene expression can be used to identify sample clusters distinct from previously identified breast tumor subtypes2,4,6.

Outcome-linked stroma shows distinct biological responses

The tripartition of samples by stromal expression profiles may represent three subtypes of breast tumor stroma (Fig. 1b). To investigate whether differences between these groups reflect distinct biological responses, we performed pairwise comparisons of gene expression between the three clusters (Fig. 1a). From this class distinction, we identified 163 unique genes that show the greatest differential expression between clusters (Fig. 2 and Supplementary Data online) and that group tumors by outcome in a manner similar to the class-discovery result (Fig. 2a,b). We used the 163-gene set as a starting point to characterize the differences between the good outcome– and poor outcome–associated stroma subtypes. Each subtype is distinguished by differential expression of genes involved in distinct biological responses, providing evidence that the subtypes reflect different combinations of underlying biologies (Fig. 2a).
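The class-discovery step and the χ² association test described above can be sketched in Python. This is a minimal sketch under assumptions that are not taken from the paper: "most variable" is read as the largest variance of tumor-minus-normal log ratios across matched pairs, samples are clustered by average linkage on correlation distance, and the function names (`select_variable_genes`, `discover_clusters`, `cluster_association`) are hypothetical. The bootstrap support values shown in Figure 1b are not reproduced.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.stats import chi2_contingency


def select_variable_genes(tumor_matched, normal_matched, k=200):
    """Indices of the k genes whose tumor-minus-normal log ratio varies most.

    Both inputs are genes x matched-pairs arrays in log space. Using the
    variance of the log ratio is an assumption about 'most variable'.
    """
    log_ratio = tumor_matched - normal_matched
    variability = log_ratio.var(axis=1)
    return np.argsort(variability)[::-1][:k]


def discover_clusters(tumor_all, gene_idx, n_clusters=3):
    """Cluster samples (columns) on the selected genes; returns labels 1..n_clusters."""
    x = tumor_all[gene_idx]
    x = x - x.mean(axis=1, keepdims=True)  # mean-center each gene across samples
    tree = linkage(x.T, method="average", metric="correlation")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


def cluster_association(labels, cluster, status):
    """Chi-squared P value for membership in `cluster` versus a boolean status.

    scipy applies Yates' continuity correction to 2 x 2 tables by default.
    """
    member = labels == cluster
    table = np.array([
        [np.sum(member & status), np.sum(member & ~status)],
        [np.sum(~member & status), np.sum(~member & ~status)],
    ])
    _, p_value, _, _ = chi2_contingency(table)
    return p_value
```

For the study's data, `tumor_matched` and `normal_matched` would hold the 31 matched pairs and `tumor_all` the 53 tumor stroma profiles; `status` would be, for example, a boolean recurrence indicator per sample.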
Gene ontology analysis of the genes with elevated expression in the poor-outcome sample cluster (Fig. 2c) identifies an enrichment for genes involved in functions and processes associated with poor outcome19,20 (Supplementary Fig. 1b and Supplementary Data). These include factors linked to angiogenic, hypoxic and tumor-associated macrophage (TAM) responses, such as adrenomedullin (ADM), interleukin-8 (IL8), endothelin-1 (EDN1), osteopontin (SPP1) and the chemokine CXCL1 (ref. 7,21–25). Further supporting a link with angiogenesis, samples within the poor-outcome cluster show the highest endothelial content, as established by immunohistochemical staining for CD31 (Supplementary Fig. 2a,b online). Genes whose expression is decreased in subjects in the poor-outcome group (Fig. 2a,d) include elements involved in negative regulation of WNT receptor signaling (FRZB) and in development (HOXA10) (Supplementary Data); loss of either of these genes has been linked to epithelial-to-mesenchymal transition and poor outcome in some cancers26,27. The gene set expressed predominantly in the good-outcome cluster (Fig. 2e) is functionally enriched for elements of the T helper type 1 (TH1) immune response, including the T cell receptor complex (CD8A, CD247, CD3D), MHC class I protein binding (CD8A) and granzyme A and granzyme B activity (GZMA) (Supplementary Fig. 1b)28–30. We observed elevated levels of CD8A- and CD247-positive cells in tumor stroma from individuals in the good outcome–linked cluster compared to those in the poor outcome–linked cluster (Supplementary Fig. 2c,d), suggesting increased recruitment of activated T cells and natural killer cells to tumors in individuals from the good-outcome cluster (Fig. 2a,e). Immunohistochemistry directed against osteopontin (SPP1), which is elevated in the poor-outcome cluster, confirmed the microarray results (Supplementary Fig. 2e). Expression of selected genes was also validated by quantitative real-time PCR (Supplementary Fig. 2f). Thus, we identify distinct sets of genes reflecting hypoxia and angiogenesis (linked to poor outcome) and a TH1-like immune response (linked to good outcome).

[Figure 2 (image not reproduced). Panel a: heat map of the 163 class-distinction genes across the 53 tumor stroma samples, grouped into good-, mixed- and poor-outcome clusters, with clinical annotations (ER, PR, HER2, LN, grade III, recurrence, poor outcome). Panel b: fraction good outcome versus recurrence-free good-outcome survival time (months). Panels c–e: expanded gene lists for each cluster.]

Figure 2 Class distinction of tumor stroma. (a) Hierarchical clustering of tumor stroma samples with the 163 genes differentially expressed between clusters 1, 2 and 3 of Figure 1b. Gene clusters are labeled with significance from bootstrap analysis, and color bars represent the three gene clusters described in the text. Heat map colors represent mean-centered fold change expression in log-space. (b) Kaplan-Meier curves for each of the three clusters. (c) Expanded view of the genes expressed predominantly in samples of the poor-outcome cluster. (d) Genes expressed predominantly in samples of the mixed-outcome cluster. (e) Genes expressed predominantly in samples of the good-outcome cluster. *, member of the SDPP gene set.

Construction of a stroma-derived prognostic predictor

No prognostic predictor had previously been derived from clinical samples of breast tumor stroma. We generated a predictor of minimal size and maximum accuracy from the 163 genes identified above. Many factors that are prognostic for breast cancer outcome, such as ER or HER2 status, can substantially affect tumor gene expression profiles2,4,6. To limit the influence of these variables, we identified genes predictive of outcome independently of these factors. We used multivariate logistic regression with ER, PR, HER2 and lymph node status as covariates to rank genes by their independent prognostic ability (Fig. 1a and Supplementary Data). To construct a predictor, we trained a multivariate naive Bayes classifier31 on incrementally larger gene sets from this ordered list (Fig. 1a).
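The gene-ranking step can be sketched in Python. This is a minimal sketch, not the authors' code: it assumes each gene is scored by the absolute Wald z-statistic of its coefficient in a logistic model of outcome on the gene plus clinical covariates (ER, PR, HER2, lymph node status), since the ranking statistic is not stated here. The names `logistic_fit` and `rank_genes` are hypothetical, and the plain Newton-Raphson fit assumes no gene perfectly separates the outcome classes.

```python
import numpy as np


def logistic_fit(X, y, n_iter=50, tol=1e-8):
    """Newton-Raphson logistic regression; returns coefficients and standard errors."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        information = X.T @ (X * (p * (1.0 - p))[:, None])  # Fisher information
        step = np.linalg.solve(information, X.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    return beta, np.sqrt(np.diag(cov))


def rank_genes(expr, covariates, outcome):
    """Order genes by |Wald z| of the gene term in outcome ~ gene + covariates.

    expr: genes x samples; covariates: samples x k clinical indicators
    (for example ER, PR, HER2 and lymph node status); outcome: 0/1 per sample.
    """
    n = expr.shape[1]
    scores = np.empty(expr.shape[0])
    for g in range(expr.shape[0]):
        X = np.column_stack([np.ones(n), expr[g], covariates])
        beta, se = logistic_fit(X, outcome)
        scores[g] = abs(beta[1] / se[1])
    return np.argsort(scores)[::-1]  # strongest covariate-adjusted predictor first
```

Adjusting for the covariates inside each per-gene model is what makes the ranking reflect prognostic information beyond ER, PR, HER2 and lymph node status.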
After generating ROC curves, we assessed the performance of the classifiers by comparing the areas under the curve (AUC), where a larger AUC indicates better performance. Although a number of predictors showed similar performance and composition (Supplementary Fig. 3a online), the predictor that maximized the AUC contained 26 genes (the SDPP; Fig. 1a) and performed well on data derived from tumor stroma (Fig. 3a–c) but not on data derived from tumor epithelium (Supplementary Fig. 3a). This result, together with the inability of the SDPP to cluster tumor epithelium by outcome (Supplementary Fig. 3b–e and Supplementary Results online), shows that the prognostic pattern of SDPP genes is specific to tumor stroma. Notably, the SDPP contains representatives from each of the three gene clusters identified in the 163-gene set, highlighting the importance of each of these biological functions in tumor stroma (Figs. 2a and 3b).

[Figure 3 (image not reproduced). Panel a: ROC curve, average true-positive rate versus false-positive rate (AUC = 0.93). Panel b: heat map of SDPP genes with clinical annotations and class-membership probability. Panel c: fraction event free versus recurrence-free good-outcome survival time (months); P = 1.11 × 10⁻¹⁶.]

Figure 3 Construction and performance of the SDPP. (a) ROC curve for the SDPP applied to tumor stroma samples, showing the true-positive and false-positive rate, as well as the AUC. The AUC corresponds to the probability of the SDPP ranking a randomly selected positive example higher than a randomly selected negative example. The dotted diagonal line shows the performance of a random predictor. (b) Predictions made by the SDPP in the stroma data set. Samples are ordered by the probability of membership in each of the three classes, whereas genes are arranged by hierarchical clustering. Each gene cluster is color-coded as in Figure 2a. Heat map colors represent mean-centered fold change expression in log-space. (c) Kaplan-Meier curves for the three tumor groups identified by the SDPP.

SDPP performance in whole-tumor data sets

Previous analyses have identified predictors of outcome with gene expression data derived from whole breast tumor tissue comprising both tumor epithelium and stroma2,3,5,6. To establish whether the SDPP could predict outcome from such data, we analyzed three publicly available breast cancer data sets: the Netherlands Cancer Institute (NKI)3, John Radcliffe Hospital (JRH)32 and Uppsala33 data sets (Fig. 1a). These comprise 295, 251 and 189 invasive ductal carcinoma (IDC) samples, respectively, with mixed ER, PR, grade and lymph node status. We also analyzed a fourth data set, the Rotterdam set5, which contains 286 lymph node–negative IDC samples with mixed clinical characteristics. Only a subset of the 26 SDPP genes was available in each data set (NKI, 15; JRH, 16; Uppsala, 11; Rotterdam, 19). However, genes from each of the three biological classes were present in every data set (Fig. 4a). Despite the missing expression data for some SDPP genes, the SDPP predicted outcome in each tumor data set significantly better than random (accuracy: NKI, 75.6%, P = 3.5 × 10⁻¹¹; JRH, 73.3%, P = 1.2 × 10⁻⁴; Uppsala, 59.8%, P = 4.8 × 10⁻²; Rotterdam, 60%, P = 8.0 × 10⁻³). This indicates that a stromal signal can be detected in expression data generated from whole tumors.

In all data sets, individuals assigned to the poor-outcome group had a substantially increased risk of recurrence or death from disease compared to individuals in the other groups (Table 1, Fig. 4b and Supplementary Fig. 4a,b online), demonstrating the utility and robustness of the SDPP in whole tumor–derived data. Because all samples in the Rotterdam set were node negative5, our analysis shows that gene expression in tumor stroma is predictive of outcome before detectable lymph node involvement.

The SDPP is an independent prognostic factor

To test whether the SDPP is an independent prognostic factor in whole tumor–derived data sets, we performed multivariate Cox regression of available prognostic factors in all four data sets. The SDPP was independent of ER and HER2 status, lymph node involvement, grade, age, chemotherapy and hormonal therapy (Table 1 and Supplementary Fig. 4a,c). Of note, the hazard ratio (HR) for the poor-outcome group identified by the SDPP in the HER2-positive cohorts was, on average, 2.6 times greater than for the complete populations, indicating increased utility of the predictor in this cohort (Fig. 4b; HR_HER2/HR_total = 3.33, 2.53 and 1.35 for overall survival and time to metastasis in the NKI data set and relapse-free survival in the Rotterdam data set, respectively).

Other expression-based prognostic signatures and predictors for breast cancer outcome have been identified. The US Food and Drug Administration (FDA)–approved 70-gene predictor, developed from a subset of the NKI cohort9, predicts metastatic progression and contains genes involved in proliferation, angiogenesis and invasion2,34. The SDPP predicted outcome with greater accuracy (75.6%, P = 3.5 × 10⁻¹¹) than the 70-gene predictor (61.0%, P = 0.457) in the NKI data set and was 5.96 times more likely to identify a true poor-outcome individual than the 70-gene predictor in the HER2-positive cohort (positive diagnostic likelihood ratio of 6.86 for the SDPP versus 1.15 for the 70-gene predictor; Supplementary Fig. 4d).

Signatures have also been developed that reflect in vitro biological responses, including the response of cultured stromal fibroblasts to serum stimulation8 (wound response signature, 380 genes), a hypoxia-associated transcriptional response7 (123 genes), a signature derived from good- versus poor-outcome fibroblastic tumors (solitary fibrous tumor versus desmoid-type fibromatosis; SFT/DTF)10 (656 genes) and an interferon (IFN) response signature from cocultured fibroblasts14 (49 genes). Given that these predictors and signatures arise from different sources and probably reflect different biologies, we compared their performance with that of the SDPP in the NKI data set. Notably, all of the large gene signatures showed good accuracy in predicting outcome: hypoxia (68%, P = 2.1 × 10⁻³), wound response (68.5%, P = 9.0 × 10⁻⁴), IFN (63.4%, P = 0.16) and SFT/DTF (73.6%, P = 1.48 × 10⁻⁸). Although the SFT/DTF signature shows accuracy similar to that of the SDPP, the SDPP is independent of all other predictors and signatures, including the SFT/DTF (Table 1) and the IFN signature (data not shown).
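The class-prediction loop summarized in Figure 1a (train on 27 randomly chosen stroma samples, test on the remaining 26, repeat 50 times, keep the smallest gene set with maximal AUC) can be sketched as below. Assumptions: a two-class (good versus poor outcome) Gaussian naive Bayes model stands in for the paper's three-class classifier, and `auc`, `GaussianNaiveBayes` and `choose_gene_count` are hypothetical names. The AUC is computed directly from the definition in the Figure 3 legend: the probability that a randomly selected positive example is scored higher than a randomly selected negative one.

```python
import numpy as np


def auc(scores, labels):
    """P(random positive scores higher than random negative); ties count one half."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))


class GaussianNaiveBayes:
    """Two-class naive Bayes with independent Gaussian likelihoods per gene."""

    def fit(self, X, y):  # X: samples x genes, y: 0 (good) / 1 (poor outcome)
        self.mu = np.array([X[y == c].mean(axis=0) for c in (0, 1)])
        self.var = np.array([X[y == c].var(axis=0) for c in (0, 1)]) + 1e-6
        self.log_prior = np.log([np.mean(y == 0), np.mean(y == 1)])
        return self

    def log_odds(self, X):
        """log P(poor | x) - log P(good | x) for each row of X."""
        ll = [
            self.log_prior[c]
            - 0.5 * np.sum(np.log(2 * np.pi * self.var[c])
                           + (X - self.mu[c]) ** 2 / self.var[c], axis=1)
            for c in (0, 1)
        ]
        return ll[1] - ll[0]


def choose_gene_count(X, y, ranked, max_genes, n_train=27, n_iter=50, seed=0):
    """Mean test AUC for the top-1..top-max_genes ranked genes over random splits."""
    rng = np.random.default_rng(seed)
    mean_auc = []
    for k in range(1, max_genes + 1):
        cols = ranked[:k]
        aucs = []
        for _ in range(n_iter):
            perm = rng.permutation(len(y))
            train, test = perm[:n_train], perm[n_train:]
            if len(np.unique(y[train])) < 2 or len(np.unique(y[test])) < 2:
                continue  # skip splits lacking one of the classes
            model = GaussianNaiveBayes().fit(X[train][:, cols], y[train])
            aucs.append(auc(model.log_odds(X[test][:, cols]), y[test]))
        mean_auc.append(float(np.mean(aucs)))
    best = int(np.argmax(mean_auc)) + 1  # argmax picks the smallest k at the maximum
    return best, mean_auc
```

In this sketch, `ranked` would come from the covariate-adjusted gene ranking; following Figure 1a, the selected gene set would then be refit on all 53 samples before being applied to independent data.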
Consistent with these findings, only three genes of the 656-gene SFT/DTF signature overlap with the 26-gene SDPP (SLC40A1, OGN and CXCL14). Moreover, in accordance with the fibroblast-derived nature of the SFT/DTF signature, gene ontology analysis shows that it encompasses neither an immune response linked to the good-outcome SFT component nor enhanced angiogenesis or hypoxia linked to the poor-outcome DTF component (data not shown). Thus, the SDPP reflects biological processes beyond those captured by the fibroblast-derived SFT/DTF signature, the IFN response signature and other signatures.

Integration of multiple molecular predictors

Published prognostic predictors for breast cancer are considered to be of limited value when used independently34,35. To test whether the SDPP adds value, we applied a graphical modeling approach (Supplementary Results and Supplementary Fig. 4e). Using the NKI data set and predictions from the 70-gene predictor, the wound response, hypoxia and SFT/DTF signatures, and the SDPP, we constructed a Bayes’ classifier of metastasis31. Taken individually, the first four predictors each yield a posterior probability of metastasis below 50%, whereas the SDPP yields a posterior probability of 55.8% (Fig. 4c).
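A Bayes’ classifier over the binary calls of several predictors can be sketched as follows. This is a simplified naive Bayes sketch, assuming that predictor calls are conditionally independent given metastasis status and using Laplace smoothing; it is not the graphical model of Supplementary Fig. 4e, and `fit_predictor_nb` and `posterior_metastasis` are hypothetical names. Leaving a predictor out of `calls` mirrors the "predictor not used" case in Figure 4c, so with no calls the posterior equals the prior probability of metastasis.

```python
import numpy as np


def fit_predictor_nb(preds, metastasis, alpha=1.0):
    """Prior P(metastasis) and P(poor-outcome call | class) for each predictor.

    preds: samples x predictors, 1 = poor-outcome call, 0 = good-outcome call.
    metastasis: 0/1 per sample. alpha is a Laplace smoothing pseudocount.
    """
    prior = float(np.mean(metastasis))
    p_poor = np.empty((2, preds.shape[1]))
    for c in (0, 1):
        rows = preds[metastasis == c]
        p_poor[c] = (rows.sum(axis=0) + alpha) / (len(rows) + 2 * alpha)
    return prior, p_poor


def posterior_metastasis(prior, p_poor, calls):
    """Posterior P(metastasis) given {predictor index: 0/1 call}.

    Predictors missing from `calls` are treated as 'not used'.
    """
    log_odds = np.log(prior) - np.log(1.0 - prior)
    for j, call in calls.items():
        like_met = p_poor[1, j] if call else 1.0 - p_poor[1, j]
        like_free = p_poor[0, j] if call else 1.0 - p_poor[0, j]
        log_odds += np.log(like_met) - np.log(like_free)
    return float(1.0 / (1.0 + np.exp(-log_odds)))
```

Working in log-odds makes each predictor's contribution an additive likelihood-ratio term, which is why agreeing poor-outcome calls from complementary predictors push the posterior further than any single call.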
[Figure 4 (image not reproduced). Panel a: SDPP gene heat maps with outcome predictions and metastasis or relapse events for the NKI and Rotterdam data sets. Panel b: Kaplan-Meier curves (fraction surviving, fraction metastasis free or fraction relapse free versus overall, metastasis-free or relapse-free survival time in years) for the SDPP outcome groups. Panel c: posterior probability of metastasis for combinations of SFT/DTF, hypoxia, 70-gene, wound and SDPP predictions.]

Figure 4 Performance of the SDPP in publicly available breast cancer gene expression data sets. (a) Predictions of good, poor and mixed outcome for individuals in the NKI (top) and Rotterdam (bottom) data sets using the SDPP. Samples are ordered by their SDPP score, and genes are ordered by hierarchical clustering. Tick marks below the heat map represent metastasis or relapse events. (b) Kaplan-Meier survival analysis of the SDPP classes for overall survival (top left) and time to metastasis (middle left) in the complete NKI data set. Overall survival (top right) and time to metastasis (middle right) are also shown for the HER2-positive cohort of the NKI data set. Relapse-free survival is shown for the complete Rotterdam data set (bottom left) and its HER2-positive cohort (bottom right). Relevant hazard ratios, median survival times and P values are indicated. (c) Posterior probability of metastasis calculated from the Bayes’ classifier trained on good- and poor-outcome predictions from the SDPP, the 70-gene predictor, and the wound, hypoxia and SFT/DTF signatures in the NKI data set. A black box indicates a poor-outcome prediction from a signature, a gray box indicates a good-outcome prediction and an empty box indicates that information from that predictor was not used. Red boxes and dots, cases where the SDPP predicted good outcome; green boxes and dots, cases where the SDPP predicted poor outcome; black boxes and dots, cases where SDPP predictions were not used. Gray dotted line, prior probability of metastasis when no predictor information is used.

Notably, combining the SDPP with any of these predictors improves the posterior probability in the NKI data set beyond any combination of poor-outcome predictors that does not include the SDPP (maximum posterior probability 86.6% with the SDPP versus 59.1% without it; Fig. 4c). We observed comparable improvements when we combined the other predictors with the SDPP to predict good outcome in the NKI data set (posterior probability of metastasis 28.2% with the SDPP versus 42.7% without it; Fig. 4c). These results demonstrate an interaction between the biological processes underlying the predictors and highlight the increased prognostic power that can be gained by characterizing the tumor microenvironment.

DISCUSSION

Here we identify previously unrecognized subtypes of tumor stroma corresponding to good- and poor-outcome breast cancers. Although there is increasing awareness that stromal interactions contribute to tumor progression, previous studies have not addressed how changes occurring in tumor stroma affect disease outcome. Such insight is essential for the development of new therapeutic strategies.
On the basis of differential gene expression patterns in breast tumor stroma, we have developed a 26-gene predictor (Table 2) that forecasts disease outcome with greater accuracy than predictors or signatures derived from whole tissue. This result strongly suggests that changes in breast tumor stroma have a crucial role in disease progression and outcome. A key finding of this study is that multiple, distinct biological responses are differentially present within the stroma of individuals in outcome-linked categories (Fig. 2). For example, tumor stroma samples from the good-outcome cluster overexpress a distinct set of immune-related genes, including T cell and NK cell markers indicative of a TH1-type immune response (GZMA, CD52, CD247, CD8A). This is consistent with studies correlating increased memory and activated TH1 cell content in the tumor with good outcome in lung and colon cancers36,37. Therefore, individuals with this gene expression pattern may benefit from treatments targeting tumor cells via the immune response, such as vaccine therapies in the adjuvant setting.

In contrast, the immune cell presence within the tumor stroma is diminished in individuals in the poor-outcome cluster. Instead, stroma from individuals in the poor-outcome cluster shows markers of an increased hypoxic and angiogenic response, as well as a decrease in chemokines that stimulate natural killer cell migration (CXCL14) and mediate prosurvival signals in T lymphocytes (GIMAP5)38,39. Tumor hypoxia elicits changes in gene expression that act to modify the tumor microenvironment25,40. Individuals in the poor-outcome cluster have elevated stromal expression of hypoxia-associated genes. IL-8 enhances endothelial cell proliferation41, which is consistent with the increased density of CD31-positive endothelial cells in the tumor stromal bed of these individuals. Monocytes and TAMs accumulate in hypoxic areas of tumors, in part recruited by the action of chemoattractants such as CXCL1 and ET-1 induced in the hypoxic environment24,42. Consistent with this mechanism, we observe elevated expression of genes found in the hypoxic transcriptome of monocytes and macrophages, including ADM, MMP1 and SPP1 (ref. 25). Several matrix metallopeptidases are also highly expressed in poor-outcome as compared to good-outcome sample clusters, including MMP12 and MMP1, which are known to be involved in tissue remodeling by macrophages; MMP1 is also involved in angiogenesis, invasion and metastasis43,44. Although these biological responses (hypoxia, angiogenesis and differential immune response) have been associated with poor outcome, and although TAM recruitment correlates with poor prognosis in multiple cancers, including breast cancer45, their utility as independent prognostic factors is poor19,20 or restricted to specific clinical subtypes46.

Our study reveals that integrating the output of these multiple biological responses generates an effective and independent predictor of outcome (Fig. 3). These observations reflect the heterogeneity of breast cancer, wherein outcome is affected by genetic and epigenetic changes in tumor epithelium as well as stroma, the combined outputs of which are predictive35. The gene expression profile of tumor stroma, distilled as a 26-gene SDPP, functions as a robust predictor whose output is correlated with clinical outcome in our bank of breast cancer samples. The performance of the predictor in independent, whole tumor–derived data sets indicates that although the prognostic ability of the SDPP is specific to tumor stroma, the signal can be detected in, and the SDPP can accurately predict outcome from, whole-tissue data sets (Fig. 4).

Notably, the identification of good or poor outcome by the SDPP in individuals in the NKI, Rotterdam and other public data sets is independent of ER and HER2 status, indicating that the SDPP identifies distinct biological processes rather than those associated with clinical breast cancer subtypes. The independence of good- and poor-outcome categories with respect to treatment suggests that these classes reflect host responses that are not targeted by conventional adjuvant therapies (Table 1). The hazard ratio for the SDPP poor-outcome group is greater than, and independent of, the hazard ratio for lymph node involvement, the single most clinically accepted factor currently used to predict breast cancer outcome47. Moreover, the SDPP has higher accuracy than all other signatures and predictors in the NKI data set, and it shows no gene overlap with, and adds prognostic information when combined with, the FDA-approved 70-gene predictor. Of note, in the HER2-positive cohort of the NKI data set, the SDPP is more likely than the 70-gene predictor to identify a true poor-outcome sample, providing increased specificity at the cost of sensitivity (Supplementary Fig. 4d). In light of the SDPP's improved performance in HER2-positive tumors relative to the total population, and in consideration of the immune-related functions of genes associated with good outcome, we speculate that the SDPP may provide information necessary to predict response to trastuzumab or other anti-HER2 therapies.

The markedly improved prognostic power obtained by combining the SDPP with other signatures and predictors demonstrates that the SDPP captures biological processes in a prognostic space orthogonal to existing predictors (Fig. 4c). The SDPP is complementary to the SFT/DTF signature. Furthermore, the small size of the SDPP (26 genes) makes it more amenable to clinical use than the 656-gene SFT/DTF signature.

Distinct gene expression signatures in breast tumor stroma reflect different clinical outcomes that are not restricted to a specific clinical subtype. The stroma signature presented here, alone or in combination with other molecular prognostic predictors, promises to improve molecular classification and outcome prediction in breast cancer, specifically by aiding the identification of patients who may benefit from aggressive therapies, or by stratifying breast cancer subjects for clinical trials. The SDPP provides prognostic information beyond classical prognostic factors and published molecular signatures. This finding, in combination with the improved accuracy that we obtained with a combinatorial approach, highlights the need to integrate all aspects of the tumor microenvironment into prognostic prediction and suggests future directions for the development of novel diagnostic or therapeutic modalities.

Table 1 Multivariate Cox regression for overall survival in the complete NKI data set3
Variable | P value | Hazard ratio | Lower 95% CI | Upper 95% CI
Stroma predictor (poor outcome) | 0.0043** | 3.055 | 1.419 | 6.58
Stroma predictor (mixed outcome) | 0.014* | 2.343 | 1.185 | 4.63
ER (positive) | 0.83 | 0.93 | 0.487 | 1.78
HER2 (positive) | 0.12 | 1.536 | 0.888 | 2.66
Grade (poorly differentiated) | 0.82 | 1.065 | 0.615 | 1.84
Grade (well differentiated) | 0.12 | 0.414 | 0.138 | 1.24
Lymph nodes positive (>4) | 0.021* | 2.343 | 1.139 | 4.82
Lymph nodes positive (0) | 0.55 | 0.801 | 0.389 | 1.65
Age (<40 years) | 0.0042** | 2.122 | 1.268 | 3.55
Hormonal therapy (yes) | 0.83 | 0.905 | 0.353 | 2.32
Chemotherapy (yes) | 0.23 | 0.65 | 0.321 | 1.32
70 genes (poor outcome) | 0.0029** | 3.847 | 1.587 | 9.32
Hypoxia signature | 0.033* | 1.922 | 1.055 | 3.5
Wound signature (intermediate) | 0.94 | 1.033 | 0.411 | 2.6
Wound signature (quiescent) | 0.5 | 0.848 | 0.523 | 1.37
SFT/DTF (DTF-like) | 0.18 | 1.679 | 0.791 | 3.56
SFT/DTF (unclassified) | 0.15 | 1.562 | 0.848 | 2.88
*P < 0.05; **P < 0.01. CI, confidence interval.
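Table 1 reports, for each covariate, a hazard ratio, its 95% confidence interval and a P value. For Cox models these P values are commonly Wald tests on the log-hazard scale, which makes the three numbers in each row mutually checkable. The sketch below assumes exactly that (Wald-type intervals symmetric on the log scale; the table itself does not state how its intervals were computed) and back-calculates SE(log HR), the z statistic and a two-sided P value for three rows of Table 1. The function name `wald_from_ci` is hypothetical.

```python
import math

Z_975 = 1.959963984540054  # standard normal 97.5% quantile

def wald_from_ci(hr, lower, upper):
    """Back-calculate SE(log HR), Wald z and two-sided P from an HR and its 95% CI.

    Assumes the interval is exp(log HR +/- 1.96 * SE), i.e. symmetric on the log scale.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * Z_975)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return se, z, p

# (hazard ratio, lower 95% CI, upper 95% CI) copied from Table 1
table1_rows = {
    "Stroma predictor (poor outcome)": (3.055, 1.419, 6.58),
    "Age (<40 years)": (2.122, 1.268, 3.55),
    "Hypoxia signature": (1.922, 1.055, 3.5),
}

results = {name: wald_from_ci(*vals) for name, vals in table1_rows.items()}
for name, (se, z, p) in results.items():
    print(f"{name}: SE(log HR) = {se:.3f}, z = {z:.2f}, two-sided P = {p:.4f}")
```

Because the hazard ratios and limits in the table are rounded, the back-calculated P values can only be expected to match the reported ones (0.0043, 0.0042 and 0.033) to within rounding; a large discrepancy in such a check would point to a transcription error or to a non-Wald interval.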
Table 2 Description of genes in the 26-gene SDPP
Gene name | Chromosome | P value (logistic regression) | Transcript description | KEGG pathways | Gene ontology processes
GZMA | 5q11-q12 | 0.023 | Granzyme A (granzyme 1; cytotoxic T lymphocyte–associated serine esterase-3) | Neuroactive ligand-receptor interaction | Apoptosis, cleavage of lamin, cytolysis, immune response, proteolysis
CD8A | 2p12 | 0.036 | CD8a (CD8; MAL; p32; Leu2) | Antigen processing and presentation, cell adhesion molecules (CAMs), hematopoietic cell lineage, T cell receptor signaling pathway | T cell activation, antigen processing and presentation, immune response, transmembrane receptor protein tyrosine kinase signaling pathway
TRBV5-4 | 7q34 | 0.048 | T cell receptor β variable 5-4 (TRBV5-4) | |
CD52 | 1p36 | 0.017 | CD52 (CDW52; CAMPATH-1 antigen) | | Elevation of cytosolic calcium ion concentration, respiratory burst
CD247 | 1q22-q23 | 0.028 | CD247 (CD3H; CD3Q; CD3Z; TCRZ; CD3-ZETA) | Natural killer cell–mediated cytotoxicity; T cell receptor signaling pathway | Cell surface receptor–linked signal transduction
CD48 | 1q21.3-q22 | 0.032 | CD48 (BCM1; BLAST; hCD48; mCD48; BLAST1; SLAMF2; MEM-102) | Natural killer cell–mediated cytotoxicity | Defense response
PLEK | 2p14 | 0.055 | Pleckstrin (P47; FLJ27168) | | Intracellular signaling cascade
RUNX3 | 1p36 | 0.058 | Runt-related transcription factor-3 (AML2; CBFA3; PEBP2αC; FLJ34510; MGC16070) | | Cell proliferation, induction of apoptosis, negative regulation of epithelial cell proliferation, negative regulation of progression through cell cycle, regulation of transcription (DNA-dependent), transcription, transcription from RNA polymerase II promoter
GIMAP5 | 7q36.1 | 0.050 | GTPase, IMAP family member 5 (IAN4; IAN5; IMAP3; hIAN5; HIMAP3; IAN4L1; FLJ11296) | |
LCP1 | 13q14.3 | 0.075 | Lymphocyte cytosolic protein-1 (L-plastin) (CP64; PLS2; LC64P; FLJ25423; FLJ26114; FLJ39956; L-PLASTIN; DKFZp781A23186) | | Actin filament bundle formation
F2RL2 | 5q13 | 0.028 | Coagulation factor II (thrombin) receptor-like 2 (PAR3) | Neuroactive ligand–receptor interaction | G protein–coupled receptor protein signaling pathway, blood coagulation, response to wounding, signal transduction
SLC40A1 | 2q32 | 0.020 | Solute carrier family 40 (iron-regulated transporter) member-1 (FPN1; HFE4; MTP1; IREG1; MST079; MSTP079; SLC11A3) | | Anatomical structure morphogenesis, cellular iron ion homeostasis, ion transport, iron ion transport
FRZB | 2qter | 0.059 | Frizzled-related protein (FRE; FZRB; hFIZ; FRITZ; FRP-3; FRZB1; SFRP3; SRFP3; FRZB-1; FRZB-PEN) | | Wnt receptor signaling pathway, cell differentiation, multicellular organismal development, negative regulation of Wnt receptor signaling pathway, skeletal development
RAI2 | Xp22 | 0.027 | Retinoic acid–induced-2 | | Embryonic development
HOXA10 | 7p15-p14 | 0.045 | Homeobox A10 (PL; HOX1; HOX1H; HOX1.8; MGC12859) | | Multicellular organismal development, regulation of transcription (DNA-dependent), spermatogenesis, transcription
ITGBL1 (AL359052) | 13q33 | 0.066 | Homo sapiens mRNA full-length insert cDNA clone EUROIMAGE 1968422 OR integrin, beta-like 1 (with EGF-like repeat domains) (OSCP; TIED) | | Cell adhesion, cell-matrix adhesion, integrin-mediated signaling pathway
OGN | 9q22 | 0.038 | Osteoglycin (mimecan; OIF; SLRR3A; DKFZP586P2421) | |
C21orf34 | 21q21.1 | 0.047 | Chromosome 21 open reading frame 34 (C21orf35; FLJ38295; hypothetical protein LOC388815) | |
ADRA2A | 10q24-q26 | 0.044 | Adrenergic, α-2A-, receptor (ADRA2; ADRAR; ZNF32; ADRA2R; ALPHA2AAR) | Neuroactive ligand-receptor interaction | G protein–coupled receptor protein signaling pathway, Rho protein signal transduction, actin cytoskeleton organization and biogenesis, activation of MAPK activity, cell motility, fear response, negative regulation of adenylate cyclase activity, positive regulation of cell proliferation, signal transduction
CXCL14 | 5q31 | 0.051 | Chemokine (C-X-C motif) ligand-14 (KS1; Kec; BMAC; BRAK; NJAC; MIP-2γ; SCYB14; MGC10687; bolekine) | Cytokine-cytokine receptor interaction, leukocyte transendothelial migration | Cell-cell signaling, chemotaxis, immune response, inflammatory response, signal transduction
SPP1 | 4q21-q25 | 0.037 | Secreted phosphoprotein-1 (osteopontin, bone sialoprotein I, early T lymphocyte activation 1) (OPN; BNSP; BSPI; ETA-1; MGC110940) | Cell communication, ECM-receptor interaction, focal adhesion | Cell adhesion, cell-matrix adhesion, negative regulation of bone mineralization, ossification
HRASLS | 3q29 | 0.028 | HRAS-like suppressor (A-C1; HSD28; HRASLS1; H-REV107) | |
VGLL1 | Xq26.3 | 0.038 | Vestigial-like-1 (Drosophila) (TDU; VGL1) | | Regulation of transcription (DNA-dependent)
ADM | 11p15.4 | 0.022 | Adrenomedullin (AM) | | cAMP biosynthetic process, cell-cell signaling, circulation, excretion, female pregnancy, heart development, positive regulation of cell proliferation, progesterone biosynthetic process, response to wounding, signal transduction
C6orf168 (A_32_P74964) | 6q16.2 | 0.021 | Chromosome 6 open reading frame 168 | |
SNTG2 (A_32_P49764) | 2p25.3 | 0.017 | Syntrophin, γ2 (SYN5; G2SYN; MGC133174) | | Central nervous system development
KEGG, Kyoto Encyclopedia of Genes and Genomes.

METHODS
Description of samples. We collected tissue samples from 73 women presenting with invasive breast carcinoma. We froze the samples in liquid nitrogen within 30 min of surgical removal and subsequently subjected them to LCM. From this cohort, we obtained 53 samples of tumor stroma (50 IDC and three invasive lobular carcinoma cases); in 31 cases, we were able to obtain individual-matched normal adjacent stroma (Supplementary Methods online). The median follow-up time was 3.58 years. We determined recurrence (local or distant) by examination of medical records after diagnosis.
We defined poor outcome as alive with disease or dead of disease as of the time of the latest follow-up. No subject in the study received neoadjuvant therapy. This study was approved by the McGill University Health Centre (MUHC) Research Ethics Board (Protocols SUR-99-780 and SUR-00-966). All human subjects provided written, informed consent.

Laser capture microdissection, RNA isolation and microarray hybridization. We carried out LCM, sample isolation and preparation and microarray hybridization as previously described18 (Supplementary Methods). After we performed normalization and model fitting as previously described18, our microarray data set contained 111 distinct expression experiments.

Stroma subtype identification. We fit a linear model48 to the individual-matched tumor versus normal stroma data and identified the genes with variance at or above the 99.2nd percentile of the variance distribution across all matched samples that were also significantly differentially expressed in at least three samples (P < 1 × 10⁻⁵). We clustered the tumor stroma data using these genes and evaluated the resulting clusters as described (Supplementary Methods).

Identification of differentially expressed genes. We used pair-wise class distinction to identify genes differentially expressed between the stroma clusters as described (Supplementary Methods). The resulting gene list contained 163 members.

Predictor construction and evaluation. We used logistic regression to score and rank each gene in the expression profile on the basis of its statistical significance (P value) in predicting recurrence in a model that included the gene expression level, lymph node status, and ER, PR and HER2 status. This model ensured that the predictive strength of a gene was not confounded with those variables. We trained naive Bayes classifiers to predict outcome using the ranked gene expression profile of the recurrence-positive stroma cluster and compared them using ROC curves as described (Supplementary Methods). We used the AUC to evaluate the performance of the SDPP in tumor stroma, tumor epithelium, normal stroma and normal epithelium.

SDPP performance in tumor epithelium. We evaluated the SDPP in gene expression data from LCM-derived tumor epithelium (Supplementary Methods).

Gene ontology analysis. We cross-referenced genes differentially expressed in each stroma outcome subtype against gene ontology annotations49 to identify overrepresented gene ontology categories using a test against the hypergeometric distribution, at a significance threshold of P ≤ 0.05.

Comparison with publicly available breast cancer data sets. We downloaded four publicly available breast cancer data sets and used the SDPP to predict the outcome for each sample. In the NKI3, JRH32, Uppsala33 and Rotterdam5 data sets, the poor-, mixed- and good-outcome categories of samples identified by the SDPP were treated as categorical variables in Cox proportional hazards regression models that also included the other prognostic factors available for each data set. The accuracy of a predictor was measured as the total true positives over the total number of samples in a cohort. Details are provided in Supplementary Methods.

Predictor integration. We combined the independent predictions of the SDPP, the 70-gene predictor and the wound-response, SFT/DTF and hypoxia signatures in the NKI data set to construct a Bayes classifier of metastasis as described (Supplementary Methods).

Annotation of unknown genes. We obtained probe annotations from the Bioconductor hgug4112a Agilent annotation package. We annotated unknown probes by using BLAST to align probe sequences against the ENSEMBL human genome assembly (release 49) or by searching other databases. For details, see Supplementary Methods. Assignments for unknown members of the set of 163 genes are found in Supplementary Data.
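The gene-selection rule under Stroma subtype identification combines a variance cut-off with a per-sample significance count. A minimal sketch follows, assuming that each gene's expression values across the matched samples and its per-sample differential-expression P values (for example, from the linear model) are already available. The function and variable names are hypothetical, and reading "99.2 percentile" as "the top 0.8% of genes by variance" is an assumption.

```python
from statistics import pvariance

def select_stroma_genes(expr, pvals, pct=99.2, alpha=1e-5, min_samples=3):
    """Select genes for clustering tumor stroma (hypothetical helper).

    expr:  gene -> expression values across all matched samples
    pvals: gene -> per-sample differential-expression P values
    Keeps genes whose variance lies in the top (100 - pct)% of all genes and that
    reach P < alpha in at least min_samples samples.
    """
    variance = {g: pvariance(v) for g, v in expr.items()}
    n_keep = max(1, round(len(variance) * (100 - pct) / 100))  # round() absorbs float noise
    cutoff = sorted(variance.values(), reverse=True)[n_keep - 1]
    return sorted(
        g for g in expr
        if variance[g] >= cutoff and sum(p < alpha for p in pvals[g]) >= min_samples
    )

# Hypothetical toy input: 1,000 genes whose variance increases with their index.
expr = {f"g{i:03d}": [0.0, float(i)] for i in range(1000)}
pvals = {g: [0.5, 0.5, 0.5, 0.5] for g in expr}
pvals["g999"] = [1e-7, 1e-6, 1e-8, 0.2]   # high variance, 3 significant samples
pvals["g998"] = [1e-7, 1e-6, 0.3, 0.2]    # high variance, only 2 significant samples
pvals["g005"] = [1e-7, 1e-6, 1e-8, 1e-9]  # significant, but low variance
selected = select_stroma_genes(expr, pvals)
print(selected)
```

With 1,000 genes the cut-off keeps the 8 most variable genes, of which only `g999` also passes the significance count.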
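Under Predictor construction and evaluation, each gene is ranked by the significance of its expression term in a logistic model that also contains lymph node, ER, PR and HER2 status. The sketch below shows one way to do this. It assumes a Wald test on the expression coefficient (the text does not name the test) and 0/1-coded clinical covariates, and it fits the model by Newton-Raphson in plain Python so that it has no third-party dependencies. `rank_genes` and the toy data are hypothetical.

```python
import math
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_logistic(X, y, max_iter=50, tol=1e-8):
    """Newton-Raphson logistic regression; returns coefficients and Fisher information."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        info = [[0.0] * p for _ in range(p)]
        score = [0.0] * p
        for x, yi in zip(X, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, x))))
            w = mu * (1.0 - mu)
            for j in range(p):
                score[j] += (yi - mu) * x[j]
                for k in range(p):
                    info[j][k] += w * x[j] * x[k]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta, info

def gene_p_value(expr, clinical, y):
    """Two-sided Wald P for the expression term in logit P(y=1) ~ expr + clinical."""
    X = [[1.0, e] + list(c) for e, c in zip(expr, clinical)]  # intercept, gene, covariates
    beta, info = fit_logistic(X, y)
    unit = [0.0] * len(beta)
    unit[1] = 1.0
    var_gene = solve(info, unit)[1]  # diagonal entry of the inverse information matrix
    return math.erfc(abs(beta[1]) / math.sqrt(var_gene) / math.sqrt(2.0))

def rank_genes(profile, clinical, y):
    """Return genes ordered by increasing covariate-adjusted P value, plus the P values."""
    pvals = {g: gene_p_value(v, clinical, y) for g, v in profile.items()}
    return sorted(pvals, key=pvals.get), pvals

# Hypothetical toy data: 80 samples, clinical = [lymph node, ER, PR, HER2] coded 0/1.
rng = random.Random(0)
n = 80
clinical = [[rng.randint(0, 1) for _ in range(4)] for _ in range(n)]
gene_a = [rng.gauss(0.0, 1.0) for _ in range(n)]  # drives recurrence in this toy
gene_b = [rng.gauss(0.0, 1.0) for _ in range(n)]  # unrelated to recurrence
recurrence = [1 if rng.random() < 1.0 / (1.0 + math.exp(-2.0 * a)) else 0 for a in gene_a]
order, pvals = rank_genes({"GENE_A": gene_a, "GENE_B": gene_b}, clinical, recurrence)
print(order, pvals)
```

Putting the clinical variables in the same model means a gene ranks highly only if its expression carries information about recurrence beyond those variables, which is the confounding control the paragraph describes.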
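The same Methods paragraph trains naive Bayes classifiers on the ranked profile and compares them by ROC AUC. The actual protocol is given only in the Supplementary Methods, so the sketch below is a stand-in under stated assumptions: Gaussian class-conditional densities, leave-one-out scoring, and nested gene sets made of the k top-ranked genes. All names and the toy profile are hypothetical.

```python
import math
from statistics import mean, pvariance

def fit_gnb(X, y):
    """Gaussian naive Bayes: class prior plus per-gene mean and variance for each class."""
    model = {}
    for cls in (0, 1):
        rows = [x for x, label in zip(X, y) if label == cls]
        cols = list(zip(*rows))
        model[cls] = (
            len(rows) / len(X),
            [mean(c) for c in cols],
            [pvariance(c) + 1e-6 for c in cols],  # small floor avoids zero variance
        )
    return model

def prob_poor(model, x):
    """P(poor outcome | x), treating genes as independent given the outcome class."""
    logp = {}
    for cls, (prior, mus, variances) in model.items():
        lp = math.log(prior)
        for xi, mu, v in zip(x, mus, variances):
            lp -= 0.5 * math.log(2 * math.pi * v) + (xi - mu) ** 2 / (2 * v)
        logp[cls] = lp
    top = max(logp.values())  # log-sum-exp for numerical stability
    return math.exp(logp[1] - top) / sum(math.exp(v - top) for v in logp.values())

def auc(scores, labels):
    """ROC AUC as the probability that a random positive outscores a random negative."""
    pos = [s for s, label in zip(scores, labels) if label == 1]
    neg = [s for s, label in zip(scores, labels) if label == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def loo_auc_top_k(X, y, k):
    """Leave-one-out AUC of a classifier built on the k top-ranked genes (columns 0..k-1)."""
    Xk = [row[:k] for row in X]
    scores = []
    for i in range(len(Xk)):
        model = fit_gnb(Xk[:i] + Xk[i + 1:], y[:i] + y[i + 1:])
        scores.append(prob_poor(model, Xk[i]))
    return auc(scores, y)

# Hypothetical toy profile: columns already ordered by gene rank; 1 = poor outcome.
X = [[0.1, 0.5], [0.3, -0.2], [0.2, 0.9], [0.0, 0.1],
     [1.1, 0.4], [0.9, -0.1], [1.3, 0.8], [1.0, 0.2]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
aucs = {k: loo_auc_top_k(X, y, k) for k in (1, 2)}
print(aucs)
```

Computing the AUC as a pairwise ranking probability avoids building the ROC curve explicitly; ties between a positive and a negative score count as half a win, matching the trapezoidal area under the empirical curve.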
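The Gene ontology analysis above tests each category for overrepresentation against the hypergeometric distribution at P ≤ 0.05. For a category with K annotated genes among N genes in the universe, a list of n genes that contains k of them has upper-tail P value P(X ≥ k) = Σ_{i ≥ k} C(K, i) C(N − K, n − i) / C(N, n). A minimal sketch follows. The toy universe and category names are hypothetical, and the choice of universe (here, every gene passed in) is an assumption that strongly affects the result.

```python
from math import comb

def hypergeom_upper_tail(total, in_category, drawn, overlap):
    """P(X >= overlap) for X ~ Hypergeometric(total, in_category, drawn)."""
    top = min(in_category, drawn)
    tail = sum(comb(in_category, k) * comb(total - in_category, drawn - k)
               for k in range(overlap, top + 1))
    return tail / comb(total, drawn)

def overrepresented_terms(annotations, gene_list, universe, alpha=0.05):
    """GO terms whose overlap with gene_list is significant at P <= alpha."""
    universe = set(universe)
    genes = set(gene_list) & universe
    hits = {}
    for term, members in annotations.items():
        members = set(members) & universe
        overlap = len(members & genes)
        if overlap == 0:
            continue  # no overlap cannot indicate overrepresentation
        p = hypergeom_upper_tail(len(universe), len(members), len(genes), overlap)
        if p <= alpha:
            hits[term] = p
    return dict(sorted(hits.items(), key=lambda item: item[1]))

# Hypothetical toy input: a 20-gene universe and two annotation categories.
universe = [f"g{i}" for i in range(20)]
annotations = {
    "immune response": ["g0", "g1", "g2", "g3", "g4"],
    "ossification": ["g10", "g11"],
}
hits = overrepresented_terms(annotations, ["g0", "g1", "g2", "g15"], universe)
print(hits)
```

Here 3 of the 4 listed genes fall in a 5-gene category out of 20, giving P = 155/4845 ≈ 0.032, so only "immune response" passes the threshold. No multiple-testing correction is applied, which mirrors the single threshold stated in the text.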
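Under Predictor integration, the discrete calls of the SDPP and the other signatures are combined into a Bayes classifier of metastasis. The structure of that classifier is described only in the Supplementary Methods, so the sketch below substitutes the simplest variant: a naive Bayes classifier over categorical predictor calls with Laplace smoothing. It uses hypothetical toy calls for three of the five predictors and is not presented as the authors' model.

```python
import math
from collections import Counter, defaultdict

def fit_categorical_nb(calls, outcome, alpha=1.0):
    """Naive Bayes over categorical predictor calls.

    calls:   list of dicts mapping predictor name -> call (e.g. "poor", "good")
    outcome: list of 0/1 labels (1 = metastasis)
    alpha:   Laplace pseudo-count added to every (predictor, call, class) cell
    """
    classes = sorted(set(outcome))
    prior = {c: outcome.count(c) / len(outcome) for c in classes}
    counts = defaultdict(Counter)  # (predictor, class) -> Counter of calls
    levels = defaultdict(set)      # predictor -> set of observed calls
    for row, c in zip(calls, outcome):
        for pred, call in row.items():
            counts[(pred, c)][call] += 1
            levels[pred].add(call)

    def likelihood(pred, call, c):
        table = counts[(pred, c)]
        return (table[call] + alpha) / (sum(table.values()) + alpha * len(levels[pred]))

    return classes, prior, likelihood

def metastasis_probability(model, row):
    """Posterior P(metastasis | calls), assuming calls are independent given outcome."""
    classes, prior, likelihood = model
    logp = {
        c: math.log(prior[c]) + sum(math.log(likelihood(p, call, c)) for p, call in row.items())
        for c in classes
    }
    top = max(logp.values())
    return math.exp(logp[1] - top) / sum(math.exp(v - top) for v in logp.values())

# Hypothetical toy calls (not NKI data) and metastasis labels.
calls = [
    {"SDPP": "poor", "70-gene": "poor", "hypoxia": "high"},
    {"SDPP": "poor", "70-gene": "poor", "hypoxia": "low"},
    {"SDPP": "mixed", "70-gene": "poor", "hypoxia": "high"},
    {"SDPP": "poor", "70-gene": "good", "hypoxia": "high"},
    {"SDPP": "good", "70-gene": "poor", "hypoxia": "low"},
    {"SDPP": "good", "70-gene": "good", "hypoxia": "low"},
    {"SDPP": "good", "70-gene": "good", "hypoxia": "high"},
    {"SDPP": "mixed", "70-gene": "good", "hypoxia": "low"},
]
metastasis = [1, 1, 1, 0, 0, 0, 0, 0]
model = fit_categorical_nb(calls, metastasis)
risk_poor = metastasis_probability(model, {"SDPP": "poor", "70-gene": "poor", "hypoxia": "high"})
risk_good = metastasis_probability(model, {"SDPP": "good", "70-gene": "good", "hypoxia": "low"})
print(round(risk_poor, 3), round(risk_good, 3))
```

The conditional-independence assumption is what lets each predictor contribute a separate likelihood factor. It is reasonable only to the extent that the predictors capture non-overlapping biology, which the text argues for the SDPP relative to the other signatures.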
Immunohistochemistry. We validated protein expression of selected members of the SDPP gene set (CD8A, CD247 and SPP1) by immunohistochemistry on sections from formalin-fixed, paraffin-embedded blocks obtained from the MUHC Pathology archive. CD31 expression was evaluated on frozen tissue sections. Procedures were carried out according to the manufacturer’s instructions (Supplementary Methods).

Quantitative real-time PCR. We used amplified RNAs prepared from microdissected material as templates for quantitative RT-PCR validation on a LightCycler system (Roche Applied Science) according to the manufacturer’s instructions (Supplementary Methods).

Accession codes. Microarray results have been submitted to the Gene Expression Omnibus with accession code GSE9014.

Note: Supplementary information is available on the Nature Medicine website.

ACKNOWLEDGMENTS
We thank D. Fleiszer, A. Loutfi, C. Milne, D. Owen, G. Pearl, R. Salasidis, F. Tremblay and M. Wexler (surgeons); F. Halwani and K. Khetani (pathologists); H. Barwick, A. Cuellar, D. Hori, S. Eng, L. Pasyuk, T. Vilhena and C. Palko-Condron (Pathology staff); C. Loiselle (Nursing); the MUHC Anaesthesia Department; and A. Dedhar and A. Viquez (tissue and data collectors) for their assistance. We also thank C. Mihalcioiu, P. Siegel and members of the Park lab for their critical review of this manuscript. This work was supported by grants to M.P. from the Québec Breast Cancer Foundation, Genome Canada–Génome Québec, Valorisation-Recherche Québec and the Fonds de la Recherche en Santé du Québec and a Canadian Institutes of Health Research (CIHR) Team Grant; a Natural Sciences and Engineering Research Council of Canada Discovery Grants Program grant to M.H.; a CIHR McGill University Cancer Consortium Training Award to G.F.; a US Department of Defense Breast Cancer Predoctoral Traineeship Award to F.P.; MUHC Research Institute and MUHC Department of Medicine Fellowships to N.B.; and Cedars Cancer Institute Fellowships to S.S. and N.B. M.P.
holds the Diane and Sal Guerrera Chair in Cancer Genetics at McGill University.

AUTHOR CONTRIBUTIONS
G.F. designed and implemented the data analysis pipeline for the data generated for this study, developed methods and software for data analysis, analyzed and interpreted the data, and contributed to manuscript preparation. N.B. coordinated experiments, supervised the quantitative RT-PCR and immunohistochemical validation aspects of this study, participated in discussions of data analysis and interpretation, and contributed to manuscript preparation. F.P. contributed to methods and software development and participated in discussions of data analysis and interpretation. S.S. developed protocols for tissue storage, LCM, linear amplification and labeling, and supervised these applications. M.S. performed LCM and immunohistochemistry. H.Z. performed quantitative RT-PCR and isolated RNA after LCM. H.C. prepared samples and conducted gene expression profiling. G.O. performed pathological and histological analysis of samples and gave advice regarding immunohistochemistry. S.M. contributed to clinical analyses and tissue procurement. A.O. performed pathological and histological analyses on tissue samples before LCM. M.H. supervised the bioinformatics and biostatistics aspects of the project, designed and coordinated analyses, and contributed to manuscript preparation. M.P. initiated and supervised the tissue collection and microarray preparation, supervised the expression profiling aspect of this project, designed and coordinated experiments and contributed to manuscript preparation.

Published online at http://www.nature.com/naturemedicine
Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions

1. Kamangar, F., Dores, G.M. & Anderson, W.F. Patterns of cancer incidence, mortality, and prevalence across five continents: defining priorities to reduce cancer disparities in different geographic regions of the world. J. Clin. Oncol. 24, 2137–2150 (2006).
2. van ’t Veer, L.J. et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature 415, 530–536 (2002).
3. van de Vijver, M.J. et al. A gene-expression signature as a predictor of survival in breast cancer. N. Engl. J. Med. 347, 1999–2009 (2002).
4. Perou, C.M. et al. Molecular portraits of human breast tumours. Nature 406, 747–752 (2000).
5. Wang, Y. et al. Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet 365, 671–679 (2005).
6. Sorlie, T. et al. Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc. Natl. Acad. Sci. USA 98, 10869–10874 (2001).
7. Chi, J.T. et al. Gene expression programs in response to hypoxia: cell type specificity and prognostic significance in human cancers. PLoS Med. 3, e47 (2006).
8. Chang, H.Y. et al. Robustness, scalability and integration of a wound-response gene expression signature in predicting breast cancer survival. Proc. Natl. Acad. Sci. USA 102, 3738–3743 (2005).
9. Glas, A.M. et al. Converting a breast cancer microarray signature into a high-throughput diagnostic test. BMC Genomics 7, 278 (2006).
10. West, R.B. et al. Determination of stromal signatures in breast carcinoma. PLoS Biol. 3, e187 (2005).
11. Allinen, M. et al. Molecular characterization of the tumor microenvironment in breast cancer. Cancer Cell 6, 17–32 (2004).
12. Iyer, V.R. et al. The transcriptional program in the response of human fibroblasts to serum. Science 283, 83–87 (1999).
13. Singer, C.F. et al. Differential gene expression profile in breast cancer–derived stromal fibroblasts. Breast Cancer Res. Treat. published online, doi:10.1007/s10549-007-9725-2 (27 September 2007).
14. Buess, M. et al. Characterization of heterotypic interaction effects in vitro to deconvolute global gene expression profiles in cancer. Genome Biol. 8, R191 (2007).
15. Bhowmick, N.A. & Moses, H.L. Tumor-stroma interactions. Curr. Opin. Genet. Dev. 15, 97–101 (2005).
16. Kim, J.B., Stein, R. & O’Hare, M.J. Tumour-stromal interactions in breast cancer: the role of stroma in tumourigenesis. Tumour Biol. 26, 173–185 (2005).
17. Tlsty, T.D. & Coussens, L.M. Tumor stroma and regulation of cancer development. Annu. Rev. Pathol. 1, 119–150 (2006).
18. Finak, G. et al. Gene expression signatures of morphologically normal breast tissue identify basal-like tumors. Breast Cancer Res. 8, R58 (2006).
19. Uzzan, B., Nicolas, P., Cucherat, M. & Perret, G.Y. Microvessel density as a prognostic factor in women with breast cancer: a systematic review of the literature and meta-analysis. Cancer Res. 64, 2941–2955 (2004).
20. Gruber, G. et al. Hypoxia-inducible factor 1α in high-risk breast cancer: an independent prognostic parameter? Breast Cancer Res. 6, R191–R198 (2004).
21. Nikitenko, L.L., Fox, S.B., Kehoe, S., Rees, M.C. & Bicknell, R. Adrenomedullin and tumour angiogenesis. Br. J. Cancer 94, 1–7 (2006).
22. Bobrovnikova-Marjon, E.V., Marjon, P.L., Barbash, O., Vander Jagt, D.L. & Abcouwer, S.F. Expression of angiogenic factors vascular endothelial growth factor and interleukin-8/CXCL8 is highly responsive to ambient glutamine availability: role of nuclear factor-κB and activating protein-1. Cancer Res. 64, 4858–4869 (2004).
23. Wang, D. et al. CXCL1 induced by prostaglandin E2 promotes angiogenesis in colorectal cancer. J. Exp. Med. 203, 941–951 (2006).
24. Murdoch, C., Giannoudis, A. & Lewis, C.E. Mechanisms regulating the recruitment of macrophages into hypoxic areas of tumors and other ischemic tissues. Blood 104, 2224–2234 (2004).
25. Bosco, M.C. et al. Hypoxia modifies the transcriptome of primary human monocytes: modulation of novel immune-related genes and identification of CC-chemokine ligand 20 as a new hypoxia-inducible gene. J. Immunol. 177, 1941–1955 (2006).
26. Yoshida, H., Broaddus, R., Cheng, W., Xie, S. & Naora, H. Deregulation of the HOXA10 homeobox gene in endometrial carcinoma: role in epithelial-mesenchymal transition. Cancer Res. 66, 889–897 (2006).
27. Lee, A.Y. et al. Expression of the secreted frizzled-related protein gene family is downregulated in human mesothelioma. Oncogene 23, 6672–6676 (2004).
28. Dunn, G.P., Koebel, C.M. & Schreiber, R.D. Interferons, immunity and cancer immunoediting. Nat. Rev. Immunol. 6, 836–848 (2006).
29. Ellyard, J.I., Simson, L. & Parish, C.R. TH2-mediated anti-tumour immunity: friend or foe? Tissue Antigens 70, 1–11 (2007).
30. Mills, C.D., Kincaid, K., Alt, J.M., Heilman, M.J. & Hill, A.M. M-1/M-2 macrophages and the TH1/TH2 paradigm. J. Immunol. 164, 6166–6173 (2000).
31. Pearl, J. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. 116–226 (Morgan Kaufmann Publishers, San Mateo, California, 1988).
32. Miller, L.D. et al. An expression signature for p53 status in human breast cancer predicts mutation status, transcriptional effects and patient survival. Proc. Natl. Acad. Sci. USA 102, 13550–13555 (2005).
33. Sotiriou, C. et al. Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis. J. Natl. Cancer Inst. 98, 262–272 (2006).
34. Nuyten, D.S. & van de Vijver, M.J. Gene expression signatures to predict the development of metastasis in breast cancer. Breast Dis. 26, 149–156 (2006).
35. Massague, J. Sorting out breast-cancer gene signatures. N. Engl. J. Med. 356, 294–297 (2007).
36. Pages, F. et al. Effector memory T cells, early metastasis and survival in colorectal cancer. N. Engl. J. Med. 353, 2654–2666 (2005).
37. Hiraoka, K. et al. Concurrent infiltration by CD8+ T cells and CD4+ T cells is a favourable prognostic factor in non–small-cell lung carcinoma. Br. J. Cancer 94, 275–280 (2006).
38. Dalberg, U., Markholst, H. & Hornum, L. Both Gimap5 and the diabetogenic BBDP allele of Gimap5 induce apoptosis in T cells. Int. Immunol. 19, 447–453 (2007).
39. Starnes, T. et al. The chemokine CXCL14 (BRAK) stimulates activated NK cell migration: implications for the downregulation of CXCL14 in malignancy. Exp. Hematol. 34, 1101–1105 (2006).
40. Boudreau, N. & Myers, C. Breast cancer–induced angiogenesis: multiple mechanisms and the role of the microenvironment. Breast Cancer Res. 5, 140–146 (2003).
41. Li, A., Dubey, S., Varney, M.L., Dave, B.J. & Singh, R.K. IL-8 directly enhanced endothelial cell survival, proliferation and matrix metalloproteinase production and regulated angiogenesis. J. Immunol. 170, 3369–3376 (2003).
42. Sica, A., Schioppa, T., Mantovani, A. & Allavena, P. Tumour-associated macrophages are a distinct M2 polarised population promoting tumour progression: potential targets of anti-cancer therapy. Eur. J. Cancer 42, 717–727 (2006).
43. Gupta, G.P. et al. Mediators of vascular remodelling co-opted for sequential steps in lung metastasis. Nature 446, 765–770 (2007).
44. Hofmann, H.S. et al. Matrix metalloproteinase-12 expression correlates with local recurrence and metastatic disease in non–small cell lung cancer patients. Clin. Cancer Res. 11, 1086–1092 (2005).
45. Lewis, C.E. & Pollard, J.W. Distinct role of macrophages in different tumor microenvironments. Cancer Res. 66, 605–612 (2006).
46. Teschendorff, A.E., Miremadi, A., Pinder, S.E., Ellis, I.O. & Caldas, C. An immune response gene expression module identifies a good prognosis subtype in estrogen receptor negative breast cancer. Genome Biol. 8, R157 (2007).
47. Fitzgibbons, P.L. et al. Prognostic factors in breast cancer. College of American Pathologists Consensus Statement 1999. Arch. Pathol. Lab. Med. 124, 966–978 (2000).
48. Smyth, G.K. Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Stat. Appl. Genet. Mol. Biol. 3, Article 3 (2004).
49. Ashburner, M. et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25, 25–29 (2000).