Survey
* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
COMBINING BIOMARKERS AND CLINICOPATHOLOGIC FACTORS FOR PREDICTION OF RESPONSE TO ADJUVANT CHEMOTHERAPY FOR BREAST CANCER: COX MODEL AND SUPPORT VECTOR MACHINE (SVM) METHODS by Xudong Liu A thesis submitted to the Department of Community Health and Epidemiology In conformity with the requirements for the degree of Master of Science Queen’s University Kingston, Ontario, Canada April, 2010 Copyright ©Xudong Liu, 2010 Abstract Background: Breast cancer is a complex disease, both phenotypically and etiologically. Accordingly, the responses to various treatments in the adjuvant setting among individuals vary considerably. There is a demand for tools that can distinguish patients who may benefit or may suffer from particular systemic treatments. We hypothesized that combination of data on genetic biomarkers with data from traditional clinical and pathophysiological (clinicopathologic) factors using traditional Cox model or Support Vector Machine (SVM) method, a new machine learning method, may provide a better tool for prediction of benefits to chemotherapy for the treatment of early breast cancer than using either biomarker or clinicopathologic data alone. Methods: This project included 531 patients from NCIC-CTG MA.5 trial who had data on both clinicopathologic factors, such as age, tumor size, ER status, type of surgery, tumor grade and lymph node involvement, and biomarkers assayed on tissue microarrays (TMAs), including HER2, p53, CA9, MEP21, clusterin, pAKT, COX2 and TOP2A. The Cox model and SVM methods were used to develop prognostic indices for relapse-free or overall survival with either data from TMAs and clinicopathologic assessments alone or their combination. The prognostic indices developed were then examined for their value as predictive classifiers for benefits from CEF treatment. The power of the predictive classifiers derived was evaluated and compared using the bootstrap approach. Results: None of the prognostic indices developed were found to have significant predictive value, although the prognostic index developed using SVM method based on i only biomarkers yielded a marginal significant p-value (p=0.0527) for the interaction between classifier and treatment. In accordance with results published previously, the interaction between the classifier developed based on HER2 or TOP2A and treatment was significant (p=0.02 and 0.04 respectively). Comparisons based on the bootstrap approach indicate classifiers developed based on SVM performed better than those based on the Cox model method. Conclusions: Combination of data using biomarkers and clinical-pathological factors, and using either the traditional COX model method or the new machine learning method was not shown to perform better than two single previously known biomarkers in prediction of response to CEF treatment for early breast cancer. ii Acknowledgements First, I would like to express my sincerest gratitude to my primary supervisor, Dr. Dongsheng Tu, for his encouragement, guidance, assistance, motivation, enthusiasm, and most of all, patience throughout the development and execution of this research. Your faith in me and your expertise and guidance through the challenges of this project have been invaluable. Secondly, I would also like to sincerely thank my secondary supervisor Dr. Lois Shepherd for her valuable advice from medical and oncologic prospective during the development of my thesis project and for her important feedback in preparation of my thesis proposal and final thesis. My sincere thanks extend to the NCIC Clinical Trials Group for allowing me to use their data for this project. I would like to acknowledge the faculty and staff in the Department of Community Health and Epidemiology for providing such a friendly and supportive learning environment. Special thanks to Lee Watkins and Katherine Cook for their patient assistance. I also greatly appreciate the generous support from my mentor Dr. Jeanette Holden, whose encouragement and consideration is important for me to start and fulfill this program. Last but not least, I would like to thank my dearest son Kevin whose love and amusement constantly keeps my heart warm and spirit high; to my wife Jianhua who has been a cheerful and loving supporter and encourager in my life; to my parents whose love and encouragement keeps supporting me spiritually throughout my life. iii Table of Contents Abstract ................................................................................................................................ i Acknowledgements............................................................................................................ iii Table of Contents............................................................................................................... iv List of Figures .................................................................................................................... vi List of Tables .................................................................................................................... vii Abbreviations................................................................................................................... viii Chapter 1 Introduction ........................................................................................................ 1 1.1 Overview ................................................................................................................... 1 1.2 Breast Cancer and Its Treatment ............................................................................... 3 1.3 The Need to Identify Optimal Treatment for Women with Breast Cancer............... 4 1.4 Prognostic and Predictive Factors of Breast Cancer ................................................. 6 1.4.1 Traditional clinical and pathologic (clinicopathologic) factors.......................... 8 1.4.2 Biomarkers from Tissue Microarrays (TMAs) Assays ...................................... 9 1.5 Need for Combining Clinicopathologic Characteristics and Molecular Biomarkers ....................................................................................................................................... 14 1.6 Objectives of the Project ......................................................................................... 15 Chapter 2 A Review of Statistical Techniques ................................................................. 17 2.1 Traditional Cox model for censored survival data (COX model method).............. 17 2.2 Support Vector Machine (SVM) Method................................................................ 18 2.3 Verification, validation and comparison of predictive classifiers........................... 23 Chapter 3 Patients and Methods ....................................................................................... 25 3.1 NCIC CTG MA.5 Clinical Trial ............................................................................. 25 3.2 Data ......................................................................................................................... 26 3.3 Analysis strategies................................................................................................... 27 3.3.1 Analysis of Prognostic and Predictive Values of Each Individual Variable .... 27 3.3.2 Development of Prognostic Indices.................................................................. 28 3.3.2.1 Cox Model Method .................................................................................... 28 3.3.2.2 Support Vector Machine (SVM) method................................................... 29 iv 3.3.3 Verification for the Prognostic Indices as Predictive Classifier for the Treatment................................................................................................................... 30 3.3.4 Validation of Predictive Classifier using Bootstrap Methods .......................... 31 Chapter 4 Results .............................................................................................................. 32 4.1 Data Sets.................................................................................................................. 32 4.2 Summary of Baseline Characteristics ..................................................................... 32 4.3 Prognostic and Predictive Values of Each Individual Variable .............................. 35 4.4 Prognostic Indices ................................................................................................... 38 4.4.1 Classification Rules based on COX Model Method......................................... 38 4.4.2 Classification Rules based on SVM Method.................................................... 42 4.5 Prognostic Indices as Predictive Classifiers............................................................ 45 4.5.1 Treatment Effects in High and Low Risk Groups Defined by Prognostic Indices ................................................................................................................................... 45 4.5.2 Hazard Ratio and Confidence Interval for Interaction between Treatment and Prognostic Index ........................................................................................................ 71 4.6 Validation of Predictive Classifier using Bootstrap Methods................................. 74 Chapter 5 Conclusions and Discussion............................................................................. 79 References......................................................................................................................... 84 v List of Figures Figure 1: Recurrence-free Survival Curves by COX Model and SVM Approach – Index 1…………………………………………………………….……….….51 Figure 2: Recurrence-free Survival Curves by COX Model and SVM Approach – Index 2…………………………..………………………………………….…53 Figure 3: Recurrence-free Survival Curves by COX Model and SVM Approach – Index 3………………………………………………………….………….….55 Figure 4: Overall Survival Curves by COX Model and SVM Approach – Index 1….…57 Figure 5: Overall Survival Curves by COX Model and SVM Approach – Index 2….…59 Figure 6: Overall Survival Curves by COX Model and SVM Approach – Index 3….…61 Figure 7: Recurrence free survival curves by COX model and SVM approach – HER2/TOP2A…………………………………………………………………63 Figure 8: Overall survival curves by COX model and SVM approach – HER2/TOP2A…………………………………………………………………65 Figure 9: Recurrence free and overall survival curves by COX model – HER2………..67 Figure 10: Recurrence free and overall survival curves by COX model – TOP2A……..69 vi List of Tables Table 1: Baseline Characteristics of Patients with both Clinical and Biomarker Available………………………………………………………………......…..34 Table 2: Prognostic and Predictive Factors (P-values from COX model)……….…….37 Table 3.1: Summary of Three Indices in Recurrence-free Survival by COX model….....40 Table 3.2: Summary of Three Indices in Recurrence-free Survival by SVM…………....41 Table 4.1: Summary of Classification Results in Recurrence-free Survival by SVM...…43 Table 4.2: Summary of Classification Results in Overall Survival by SVM…………….44 Table 5.1: Summary of Treatment Effect for Poor Prognostic Groups (Recurrence-free Survival)………………………………………….….….….47 Table 5.2: Summary of Treatment Effect for Good Prognostic Groups (Recurrence-free Survival)………………………………………….….….….48 Table 5.3: Summary of Treatment Effect for Poor Prognostic Groups (Overall Survival)……………………………..…………………….…….…..49 Table 5.4: Summary of Treatment Effect for Good Prognostic Groups (Overall Survival)……………….……………………………………..….…..50 Table 5.5: Summary of Prognostic and Predictive Indices (Recurrence-free Survival)…72 Table 5.6: Summary of Prognostic and Predictive Indices (Overall Survival)…………..73 Table 6.1: Summary of Proportion Interaction Term with Significant p-value by Bootstrap Method (Recurrence-free Survival)……………………………….75 Table 6.2: Summary of Proportion Interaction Term with Significant p-value by Bootstrap Method (Overall Survival)………………………………………..76 Table 7.1: Summary of Hazard Ratio and CI by Bootstrap Method (Recurrence-free Survival)……………………………………………………………….….…..77 Table 7.2: Summary of Hazard Ratio and CI by Bootstrap Method (Overall Survival)...78 vii Abbreviations AMP CA9 cDNA CEF CLU CMF COX-2 COX-PH DNA ER HER2 MEP21 (PODO) mRNA NCIC-CTG p53 pAKT SD SVM TMA TOP2A Amplification Carbonic anhydrase IX (CAIX or CA9) Complementary Deoxyribonucleic acid Cyclophosphamide, Epirubicin, and S-Fluorouracil Clusterin Cyclophosphamide, Methotrexate, and S-Fluorouracil Cyclooxygenases -2 COX proportional hazards model Deoxyribonucleic acid Estrogen receptor The human epidermal growth factor receptor type 2 Podocalyxin Messenger RNA Clinical Trials Group Tumor protein 53 phosphorylated Akt Standard Deviation Support Vector Machine Tissue Microarrays Assays Topoisomerase II alpha gene viii Chapter 1 Introduction 1.1 Overview One of the biggest challenges for basic life science and clinical research, clinical practice, and the pharmaceutical industry in the 21st century is to develop and deliver treatments for disease that are tailored to an individual patient's biology and pathophysiology. This is especially important for complex diseases, such as breast cancer. Breast cancer is a complex disease, both phenotypically and etiologically1. Accordingly, the response to various treatments among individuals varies considerably. Adjuvant therapies which include chemo and hormonal therapies have been proven to substantially improve disease-free and overall survival in women with breast cancer, are effective forms of treatment after surgery and widely used for the treatment of breast cancer following the primary surgery2, 3. Yet, all who are involved in the medical management of breast cancer patients know that the same treatment does not work in the same way in each patient. Often, only a proportion of patients will have delayed disease recurrence after treatment with a particular drug, and among these patients, only a proportion of patients will continue to remain free of disease. Also different patients may suffer from different side effect of the same drug2, 4, 5, 6 . Thus there is a clinical demand for tools that could distinguish patients who may benefit or suffer from particular systemic treatments. The ultimate goal is to administer the adjuvant systemic therapies in a ‘tailored” or “personalized” fashion and cost-effective way that offers the most benefit and fewest side effects to the breast cancer patients. 1 Patients who remain disease free for a long period of time after treatment may be considered as having responded well to the treatment. Response prediction and efficient administration of various types of treatment is a huge challenge for both clinicians and basic researchers. The selection of adjuvant therapies for breast cancer patients has been primarily directed by the long-term results of clinical outcome studies from previous clinical trials. The treatment guidelines and evidence-based decision-making tools have long been based on clinical and pathological features of the patients and are far from being satisfactory and accurate7. Recently, classical pharmacogenetic studies identified a limited number of genetic candidate predictive markers which would be used to predict response to certain treatment. These limited markers often failed to reflect the complexity of the treatment response. The development of genomic technology, such as gene expression arrays, for the analysis of human tumor samples has provided a comprehensive approach and enhanced our understanding of the complexity of the biology of breast cancer. This technology has resulted in the development of genomic profiles that more accurately assess the risk of recurrence and predict clinical outcomes from particular treatments in this disease. Incorporating genetic information, such as gene expression signatures, with clinical, pathophysiological, and other risk factors for breast cancer may be better than any of them alone for evaluating the prognosis and predicting response to chemotherapy, yet few studies have examined the value in integrating new genomic information with the traditional factors to provide a more detailed assessment of clinical risk and an improved prediction of response to therapy. The work described in this thesis explores this 2 integration approach using the NCIC CTG MA.5 study, a randomized trial comparing the efficacy of Cyclophosphamide, Methotrexate, and S-Fluorouracil (CMF) and the more dose intensive Cyclophosphamide, Epirubicin, and S-Fluorouracil (CEF) for the treatment of early breast cancer in node positive pre and perimenopausal women. 1.2 Breast Cancer and Its Treatment The incidence of breast cancer and mortality internationally has been stable or has declined over the past decades8, 9. More women with breast cancer are surviving longer than in the past. In Canada, Canadian Cancer Statistics 2007 shows the breast cancer death rate among Canadian women has dropped by 25 per cent in the last two decades and that 86 per cent of women diagnosed with the disease are surviving at least five years10. Yet, breast cancer remains the most common cancer occurring in women worldwide and it is the leading cancer-related cause of death for women in Canada10 and in other industrialized countries9, 11-13. In Canada, more than 22,000 women will be newly diagnosed with breast cancer every year; breast cancer kills more than 5,000 Canadian women every year, more than any other type of cancer except lung cancer; one in nine Canadian women will be diagnosed with and 1 in 27 will die of breast cancer in their lifetime; it accounts for an estimated 95,300 potential years of life lost10. Women diagnosed with early-stage breast cancer often undergo surgery for removal of the tumor and then, typically, are treated with adjuvant systemic therapy. Since all detectable cancer in these women with ‘early’ breast cancer is restricted to the breast and, in the case of node-positive patients, the local axilliary lymph nodes, it can be removed surgically. However, undetected micro-metastatic deposits of the disease may 3 remain and, perhaps after a delay of several years, develop into a clinically detectable recurrence that eventually causes death. The aim of adjuvant systemic therapy is to eradicate these micro-metastases. Adjuvant systemic therapies, including chemo and hormonal therapies, have been the most effective approach to treatment after surgery and radiotherapy if appropriate. Therefore they are widely used for the treatment of breast cancer following the primary surgery and have been proven to substantially improve disease-free and overall survival in both pre-menopausal and postmenopausal women with breast cancer2, 3. It is largely due to early detection and advances in adjuvant therapy, such as the improved management exemplified by the introduction of screening programs and new treatment regimes, that breast cancer mortality rates are now declining. 1.3 The Need to Identify Optimal Treatment for Women with Breast Cancer Although adjuvant therapy is useful in the treatment of women with early breast cancer, the absolute survival gain with the postoperative adjuvant therapy has been modest (5~10%); a substantial proportion of patients continue to relapse with their disease; and all of the therapies have mild to severe toxicity4. Many women with early breast cancer may be cured with surgery alone or may not benefit from adjuvant systemic therapy. Breast cancer is a complex and heterogeneous disease, phenotypically as well as genotypically1. There exist wide-ranging subsets of breast cancer patients who have different prognoses and who respond differently to treatments. Therefore the identification of patients who need treatment and the definition of the best therapy for an individual have become one of the priorities in breast cancer care and research. 4 Breast cancer patients with the same stage of disease can have markedly different treatment responses and overall outcome after the treatment. Systemic adjuvant therapy, chemotherapy or hormonal therapy, reduces the risk of distant metastasis by approximately one-third; however, many patients receiving adjuvant systemic therapy would have survived without it2, 3, while many women receiving adjuvant systemic therapy will still develop widespread or metastatic disease. Even poor prognosis patients receiving no adjuvant therapy have only a 50%–60% risk for developing distant metastases, indicating that 40%–50% of these patients will remain free of disease without any adjuvant systemic therapy2, 5, 6 . Studies have shown a large proportion of women do not benefit from adjuvant systemic therapy as some relapse despite adjuvant therapy and others are never destined to relapse4. All these reflect the fact that breast cancer is a heterogeneous disease encompassing different biological mechanisms with distinct prognoses, unavoidably raising the question about which patients will benefit from treatment, and what treatment should be given. There is currently a great variety in the adjuvant therapy prescribed for women with breast cancer, with the consequence that some women are likely being under-treated whereas others are being over-treated. Existing chemotherapies have in general severe side effects for patients, and they may also be very expensive. The severity of possible side effects of aggressive chemotherapies cannot be overemphasized. Some patients may die as a result of the treatment, not of the disease. Therefore, breast cancer patients and their clinicians need better information to determine which patients are most likely to benefit from adjuvant therapy, which are very 5 unlikely to benefit, and which particular type of adjuvant therapies are most appropriate for individual women. With this information, optimal and personalized adjuvant systemic therapy could be administered in a cost-effective way offering the most benefit and fewest side effects to the breast cancer patients. This would be beneficial to both individual patients and society in general. 1.4 Prognostic and Predictive Factors of Breast Cancer In early breast cancer, as with other tumors, the first step towards individualized therapy is to distinguish those patients who need treatment from those who do not. The second step is to tailor treatment to the characteristics of the specific tumor, given that certain cancers respond better to certain regimens or individual drugs than do others. In general, breast cancer patients with a poor prognosis for survival benefit the most from adjuvant chemotherapy14, 15. In fact, the decision to administer adjuvant therapy for breast cancer has been mainly based on prognostic features. It is critical to identify reliable factors that are associated with clinical outcome regardless of treatment. These are called prognostic factors. The factors that can identify patients more or less likely to benefit from the therapy are called predictive factors. The terms prognostic and predictive factors are often imprecisely defined. A prognostic factor is usually defined as a patient characteristic that identifies subgroups of untreated patients having different outcomes, and a factor predictive of treatment effect as a patient characteristic that identifies subgroups of treated patients having different (as a consequence of treatment) outcomes16-18. The prognostic factors are measurements at diagnosis associated with outcome (i.e., recurrence rate, death rate, and other clinical 6 outcomes), while predictive factors are associated with the degree of response to specific therapy. A prognostic factor is most often, but not necessarily, also predictive of the effect of specific treatments. Whether a prognostic factor is also predictive of treatment effect or not can only be assessed in a valid comparative setting such as in a randomized trial. A factor that is predictive for one treatment effect may not be predictive for another treatment. Also, if a factor predicts (for a specific treatment) the effect on one outcome, it may or may not predict the effect for another outcome16. Good prognostic and predictive factors need to be accurate and reproducible, and can help distinguish between patients at high risk and those at low risk for developing recurrent disease. Sometimes, a prognostic or predictive factor may be constructed from several different types of measurements at the diagnosis of breast cancer. In this case, a prognostic factor consisting of multiple variables is often called a prognostic index (or signature) and a predictive factor as a predictive classifier. The primary objective of this thesis project is to identify predictive classifiers from data from NCIC CTG MA.5 trial. These may include demographic characteristics, clinical characteristics, and pathological features at diagnosis and some molecular biomarkers of the tumor. Statistical and machine learning techniques will be used to assess and compare the ability of these classifiers in the prediction of patients’ response to anthracycline-containing chemotherapy for early breast cancer. In the following section, I will first provide a description and review of some traditional clinical factors and pathologic characteristics 7 of breast cancer tumors and specific molecular biomarkers which will be included in this thesis project. 1.4.1 Traditional clinical and pathologic (clinicopathologic) factors There is a vast literature which has investigated prognostic effects of many demographic and tumor characteristics of breast cancer patients. Age, race, tumor size, histological tumor type, auxiliary nodal status, standardized pathological grade, lymphatic and vascular invasion, and hormone-receptor (eg. estrogen and/or progesterone) status are some of the clinical and pathologic factors which have been considered as having prognostic value in connection with disease recurrence and/or survival. Among them, only hormone-receptor status is accepted as an established predictive factor for selection of systemic adjuvant treatment of breast cancer19. The presence of estrogen and/or progesterone receptors in the primary tumor is the strongest predictor for response and benefit to tamoxifen therapy for the breast cancer. Although the above clinicopathologic factors are used currently by clinicians to inform decision making regarding the need for, and type of, adjuvant chemotherapies, these conventional “macro-scale” clinical, environmental and behavioral parameters have only limited predictive power7. The information arising from these conventional “macroscale” clinicopathologic factors is nowhere near as precise and accurate as is needed to provide robust predictions for the treatment responses. Aside from perhaps the hormone receptor status, these factors do not appear to reflect or take into account the underlying biology of breast cancer. 8 1.4.2 Biomarkers from Tissue Microarrays (TMAs) Assays Recent advances in genome research and technologies have changed the landscape of biomedical research. New technologies are not only important for fundamental research, but will also be useful for diagnostic, prognostic or therapeutic purposes in individual patients. In cancer research, the use of Tissue Microarray (TMA) and the emerging advanced gene expression profiling technology, such as DNA Microarray, makes it possible to construct more accurate classifiers for predicting drug responses in personalized medicine settings using factors more closely related to the biology of the breast cancer. In a landmark study, Perou et al.20 identified four subtypes (luminal ER+, basal, HER-2 and normal breast) of breast cancer based on complementary DNA (cDNA) microarray expression patterns of 8,102 genes and later demonstrated prognostic values of a revised subtype classification which divided ER+ tumors further into two subtypes (luminal A and luminal B/C). While DNA microarrays, which include oligonucleotide (eg. Affymetrix platform) or complementary DNA (cDNA), allow for high-throughput analyses of the mRNA expression of thousands of genes simultaneously, and have been leading to the identification of new genes and pathways with importance in cancer development and progression, or as targets for new therapies, the validation and prioritization of genes emerging from genome screening analyses in large series of clinical tumors has become a new bottleneck in research. Tissue microarray (TMA)21 technology is a histology based approach that permits the study of the expression of a single gene in hundreds of tissue samples, and has been developed to further understand the importance of the genes 9 identified on the profiling platforms. TMAs are tissue sections mounted in a slide containing samples from hundreds of individual tumor specimens. They can be used for large-scale analysis of gene expression (protein) level using immunohistochemical staining on hundreds of tumor specimens at a time. TMAs, coupled with DNA microarrays, provide a powerful approach to identify large numbers of new candidate genes, and rapidly validate their clinical impact in large series of human tumors. They have been frequently and successfully used in cancer diagnosis and detection, and more recently have been applied to cancer prognosis and prediction22-26. With TMAs and immunohistochemical analysis, several molecular biomarkers showed prognostic value in breast cancer regarding outcome irrespective of therapy, and, more recently, have been shown to be predictive factors for outcome after conventional or high-dose chemotherapy27-31. In the following, I am going to review some of these molecular markers which have been studied and analyzed by the Genetic Pathology Examine Centre, Vancouver General Hospital and are included in my thesis project. P53 p53 is a tumor suppressor gene and an important factor for cell cycle regulation, p53 also induces apoptosis after toxic damage32, 33 to the cell. Abrogation of p53 function results in a more aggressive tumor biology and a worse outcome for patient with breast cancer34. Considering that many anticancer agents kill cancer cells primarily by inducing apoptosis and p53 is a key determinant to induce either growth arrest or apoptosis in response to cytotoxic stress, p53 has been considered as one of the most important 10 candidate prognostic and predictive factor for breast cancer. The prognostic and predictive values of p53 has been extensively studied in breast cancer (reviewed by Elledge35). p53 has consistently been shown to have predictive value for breast cancer patients treated with standard chemotherapy or high-dose chemotherapy regimens for recurrent dosease36-38. HER2 The human epidermal growth factor receptor (HER) family plays a key role in regulation of mammalian cell survival, proliferation, migration, adhesion, and differentiation. The HER family of receptor tyrosine kinases comprises four structurally related transmembrane receptors: HER1 (EGFR EGFR (HER-1), HER-2, HER-3, and HER-4. All members have an extracellular ligandbinding domain, a transmembrane domain, and a cytoplasmic tyrosine kinase domain. When responding to ligand specific binding, these receptors act by forming hetero- or homodimers to initiate tyrosine kinase activity in the intracellular domain. A number of ligands were identified for HER1 and HER-3 and -4, yet no direct ligand for HER-2 has yet been discovered. HER-2 was found to be the preferred heterodimerisation partner for all other HER family members, suggesting a more central role of HER2 in the HER family due to its the primary function as a co-receptor. HER2, also known as c-erbB-2, encodes the protein HER2/neu antigen. Although the signaling pathways activated by HER2 are not completely characterized, HER2 is clearly established as oncogenic factor associated with a more malignant phenotype,39 cell proliferation,40 invasion,41 and metastasis.42 Numerous studies 36, 37, 43-50 have established HER2/neu not only as an important independent prognostic factor but 11 also as a reliable predictive marker for projecting response to chemotherapeutics, antiestrogens, and therapeutic anti-HER2/neu antibodies in breast cancer. It has been shown that amplification of HER2/neu, overexpression of its product, or both in breast tumors not only predicts responsiveness to trastuzumab but also identifies patients who may benefit from high-dose chemotherapy or from anthracycline-containing regimens (summarized by Pritchard et al, 2006)45. CA9 Carbonic anhydrase IX (CAIX or CA9) protein is a member of the carbonic anhydrase family, and is thought to play a role in the regulation of cell proliferation in response to hypoxic conditions and may be involved in oncogenesis and tumor progression51-53. CA9 expression has been show to be linked to poor prognosis in a number of human tumours54. It has been identified as an independent predictor of survival in advanced renal clear cell carcinoma55, 56, and more recently was shown to have promising predictive value for the efficacy and outcome of primary epirubicin/tamoxifen therapy for breast cancer57. COX-2 COX-2 is a member of the family of Cyclooxygenases (COXs) enzymes that catalyze the rate-limiting step in the conversion of arachidonic acid to prostaglandins58-60, which have been implicated in breast carcinogenesis61-63. While multiple studies have shown prognostic value of COX2 in breast cancer64-66, the evidence for it as predictive marker has been limited67. 12 Clusterin (CLU) Clusterin (CLU) is a secretory heterodimeric disulphide-linked glycoprotein expressed in virtually all tissues and found in all human fluids68, 69. It is involved in numerous physiological processes important for carcinogenesis and tumor growth, including apoptotic cell death, cell cycle regulation, DNA repair, cell adhesion, tissue remodeling and so on68-70. Several studies have shown clusterin was associated with prognosis of breast cancer71-74, yet no study has shown its predictive value in adjuvant therapy. pAKT pAKT, also known as PKB, is a serine/threonine protein kinase and a crucial regulator of widely divergent cellular processes, including apoptosis, proliferation and differentiation. Disruption of normal Akt/PKB signaling occurs frequently in several human cancers, and the enzyme appears to play an important role in cancer progression and cell survival75. Besides prognostic value in breast cancer76, pAKT have also been shown to predict therapeutic response in the breast cancer adjuvant setting. The pAKT signaling pathway was shown to decrease the efficacy of hormone therapy and was associated with a poorer outcome in patients who receive adjuvant hormone therapy77, 78. In addition, constitutive and inducible pAKT activity promotes resistance to chemotherapy, trastuzumab or tamoxifen in breast cancer cells79, suggesting that pAKT activation could be used as a predictive marker for sensitivity to various therapies, which was confirmed by several studies80. 13 MEP21(Podo) MEP21, or Podocalyxin, is a member of the CD34 family of integral membrane proteins with anti-adhesive qualities81-83. Its overexpression is an independent predictor of breast cancer progression84. There is no study yet on it role as a prognostic or predictive factor in breast cancer. TOP2A Topoisomerase II alpha (TOP2A) is an enzyme that is integrally associated with the cytotoxic action of anthracyclines and whose gene is close to HER2/neu on chromosome 17q. Several recent articles have suggested that TOP2A gene amplification and deletion are predictive markers for the benefit of adjuvant anthracycline containing therapy in primary breast cancer85-88. It has been suggested in fact that the predictive value of HER2 is most likely related to the concomitant amplification of the topo II alpha gene88. 1.5 Need for Combining Clinicopathologic Characteristics and Molecular Biomarkers Since tissue microarray analysis is done individually for a biomarker of interest, research had been focusing on the study of prognostic and predictive effects of single biological markers. Like their clinicopathologic counterparts, the predictive power for most of these molecular biomarkers alone is very limited. Recently, there have been several studies which showed that combinations of multiple molecular biomarkers are more predictive than single biomarkers89-91. Moreover, since the microarray technology suffers from a low signal-to-noise ratio, integration of other sources of information, such as macro-scale clinicopathologic data, could be more important to counter randomly 14 generated differences in expression levels. Nevertheless, the focus in most current studies is on gene expression while the traditional clinicopathologic data is often underused when DNA microarray data are available. Recent studies have shown ‘old’ clinicopathologic markers are by no means outperformed by gene expression profiles in breast cancer prognosis and prediction92-94. Combining gene expression data using IHC analysis with clinicopathologic data might lead to a more robust and accurate assessment of cancer prognosis and predictions. Shedden et al (2003) used a pathological framework and showed that this information significantly lowered the number of genes required in their model95. Recent successful cases also showed superiority of clinical and genetic data integration rather than the single type data classifier94, 96, 97. 1.6 Objectives of the Project In this project, multiple molecular markers (reviewed above) will be combined with multiple clinicopathologic characteristics of patients to develop predictive classifiers which would be more robust and predictive than that developed from single biomarkers or clinicopathologic factors to provide tailored prescription of cyclophosphamide, epirubicin, and fluorouracil (CEF) or cyclophosphamide, methotrexate, and fluorouracil (CMF) in the adjuvant therapeutic treatment of premenopausal women with node-positive breast cancer. In the next chapter, I will provide a review of major statistical techniques which will be used in this project. Specifically the followings are the objectives of this project: • The primary objective of this thesis project is to use traditional Cox proportional 15 hazards model and new machine learning methods to develop predictive classifiers which would be used to identify patients who are responsive to the anthracycline-containing chemotherapy for early breast cancer based on the tissue microarray and clinicopathologic data from NCIC CTG MA.5 study. • In addition, this project will also evaluate and compare the predictive powers of the classifiers developed. 16 Chapter 2 A Review of Statistical Techniques Currently, no statistical procedure has been developed specifically for the construction of predictive classifiers. The common approach in the literature is to first construct a prognostic index using statistical modeling or machine learning techniques and then consider this prognostic index as also a predictive classifier after its ability in prediction of response to treatment is demonstrated in a formal test of interaction between this classifier and treatment. The same approach will be used in this thesis project. In the following, I will provide an overview of two techniques, COX model and SVM methods, which can be used in the first step of this approach, i.e., construction of a prognostic index, and are explored in this thesis. 2.1 Traditional Cox model for censored survival data (COX model method) In studies with survival data, traditional statistical models, either directly or through its hazard function, on a set of covariates are main techniques for the identification of independent prognostic factors. These models can also be used to construct a prognostic index. Cox proportional hazards model, the most widely used regression model for censored survival data98, 99, is usually used as the first choice of the models. Suppose that there are a total of N patients under study and, for i-th patient (i=1, 2, …, N), let hi(t) be the hazard function of disease recurrence or death for an individual with a set of covariates x_i = (xi1, . . . , xip), which, in the context of this thesis, are the 17 values of clincopathologic and biomarkers for this patient. Cox proportional hazard model assumes the following relationship between the hazard function and covariates: hi ( t ) = h0 ( t ) exp ( b1 xi1 + b2 xi 2 + ... + bd xid ) where h0(t) is an unspecified baseline hazard function. Based on data, we can estimate the regression coefficients β = (β1, . . . , βp) in the model. Denote these estimates as β̂ = ( β̂1 , . . . , βˆ p ). With these estimates, we can define a linear risk score function f (X) = βˆ ' X for a patient with covariates X=(x1, …, xp). Based on this function, we can divide patients into various risk groups once a cut-off point c is identified. For example, a patient with covariates X may be classified into the high risk group (and, therefore, may benefit from treatment) if f(X)>c. A well known prognostic index used in early breast cancer, the Nottingham prognostic index, was constructed based on this method100 The main advantage of the Cox model is that it is a well-understood and welltested framework in survival analysis. But a crucial assumption underlying the Cox model, i.e., the proportional hazards assumption, may not be satisfied by the data in practice. 2.2 Support Vector Machine (SVM) Method Statistical models have been the main statistical techniques used to construct prognostic indicies. They focus on explicitly modeling the underlying probabilistic mechanism of the phenomenon under study. Recently, new machine learning methods have been applied to develop prognostic indicies, especially when the number of the covariates is large. Machine learning techniques take a rather different perspective from 18 the traditional statistical modeling methods. Their main focus is merely to learn a predictive rule from the current data, which are called training samples, using a variety of statistical, probabilistic and optimization techniques without explicitly assuming a statistical model for these data. These rules are then applied to classify new data, identify new patterns or predict novel trends101. Machine learning methods are frequently used for cancer diagnosis and detection and more recently, they have been applied to study cancer prognosis and prediction. Relative to traditional statistical methods, machine learning methods were shown to substantially (15-25%) improve the accuracy of outcome prediction102. Machine learning techniques, especially the supervised machine learning methods are a promising approach for cancer prognosis and prediction based on integrated data sources. In the following, I will provide a detailed description and review of one specific machine learning method, called support vector machine method, which will be explored in my thesis project. The support vector machine or SVM103, 104 technique is a relatively new but popular method in machine learning. The basic principle of SVM is to find an equation for a line (or hyperplane when there are more than one covariate) which can separate two classes (such as patients with good or bad prognosis) maximally. Figure 1 illustrates a simple SVM that discriminates two separable classes. The hyperplane is determined by a subset of the points of the two classes, called support vectors. The SVM algorithm creates a hyperplane that separates the data into two classes with the maximum margin – meaning that the distance between the hyperplane and the closest examples, defined as the margin, is maximized. If the two classes are overlapping and not linearly separable, the SVM can 19 projects the data into a higher-dimensional space using kernel function, and solves the classification problem in the higher dimension. Commonly used kernel functions are the linear kernel, the radial basic kernel, and the polynomial kernel. Class Y Class X Figure 1: Separation by Support Vectors (source: The Elements of Statistical Learning (2nd edition) by Hastie, Tibshirani and Friedman (2008). Springer-Verlag.) SVM possesses a flexible structure and thus are especially useful when the data structure is complex and from different sources. SVM was successfully applied in tumor classification and prognosis prediction102 and in the study of cancer treatment prediction105. To deal with censored survival time data, we need to divide the survival times into a number of predefined intervals or classes or to treat it as a rank regression problem although new algorithm was recently proposed to handle the censored survival data directly using SVM106, 107 but software for these algorithm are not readily available. Since our goal is to classify the patients into “bad’ and “good’ prognostic groups, we can 20 divide patients into two groups based on their survival times using the approach described below and develop a two-class SVM with a linear kernel function, which was suggested to be superior to other more complex forms of SVM for class prediction when dealing with complex data source with larger number of covariates 108. To classify patients into two classes during the training process of SVM, we can adopt the strategy proposed by Liu et al 109, using the only patients with extreme survival times. The training dataset will only consist of two types of extreme patients: short-term survivors and long-term survivors. The reason for not considering patients whose survival times were not extreme during the training steps is that the sharp contrast between the short-term and long-term survivors may be more informative and reliable for building and understanding the relation between the covariates and outcomes, while considering other patients sample would bring in more noise and confusion signals. The following are steps during the construction of a SVM classifier based on censored survival times: 1). Selecting extreme samples for SVMs training: For a patient T, if his/her follow-up time is F(T) and status at the end of follow-up time is E(T), we will define where E(T)=1 stands for “dead” and E(T)=0 stands for “alive” or “censored”; c1 and c2 are two thresholds of survival time for defining short-term and long-term survivors, respectively. The thresholds may vary and the guideline is that the selected training 21 dataset should contain enough patients in original training sample. Patients in the selected training dataset will be assigned with class label “1” for “short-term survivors” and “-1” for “long-term survivors”. 2). Supervised training of SVM. Using all data on covariates as input and the assigned classes as output, a twoclass SVM with linear kernel function can be trained on the selected training dataset. The SVM regression function G(T) can be expressed as a linear combination of the values of covariates: G (T ) = ∑ α i yi K (T , x(i )) + b , i where x(i ) are support vectors samples, yi are the class labels of x(i ) , vector T represents a test sample. The parameter α i and b are determined from the training data during the supervised training. 3). Calculation of risk score index based on the trained SVM For a given new test sample, it is more likely a shorter survivor if G(T) >0; and is more likely to be longer survivor if G(T) <0. The predicted output of G(T) can be transformed into a probability that can be expressed as so called risk score index as following: S (T ) = 1 1 + e − G (T ) 4). Classification of samples into high and low risk groups based on the risk score index 22 The S(T) is in the range of (0,1), with the smaller value indicating for the better prognosis and the larger value for poorer prognosis. Since we only want to classify patients into two groups: high risk and low risk, the cut-off value for S(T) would be selected as 0.5. That is, if S(T) >0.5 then the patient will be assigned to high risk group, and the patient will be assigned to low risk group if S(T) <0.5. R e1071 package can be used in the development a SVM index. 2.3 Verification, validation and comparison of predictive classifiers In order to determine whether the prognostic indices developed from the methods reviewed in last section are also predictive classifiers, we need first to verify it has truly the ability to predict the response to treatment. This is usually done by testing the significance of the interaction in a Cox model with the subgroup (high risk or low risk) defined by a given index, treatment and the subgroup by treatment interaction terms. The indices with significant interactions will be considered as (significant) predictive classifiers. For a predictive classifier developed from above, its usefulness in predicting treatment responses for future patients is also important to assess. This step is usually called validation. Validation of a prognostic index has been discussed extensively in the literature 110 . Validation of a predictive classifier is, however, much more complicated. The ideal method to validate a predictive classifier is carrying out a new prospective randomized clinical trial. Several designs for these prospective clinical trials have recently been suggested for the validation of predictive classifiers111, 112 but they are not feasible for this thesis project. Instead, we will use an approach based on bootstrap 23 method which generates B bootstrap samples from the original sample and then calculates the proportion of the bootstrap samples with significant interaction terms. This proportion will be used as a measure for the generalizibility of a predictive classifier to new population, which can also be used as an informal measure to compare two predictive classifiers. Another measure based on hazard ratio of interaction, which is defined as the ratio of treatment hazard ratios between the two classes defined by the predictive classifiers, and associated 95% confidence interval will also be computed using the bootstrap method. These two measures will be used to compare and rank the predictive classifiers developed. We did not reconstruct the classification indices based on the bootstrap samples because our objective is to assess the value of the indices as the predictive classifiers and not to compare the approaches which generated the indices. 24 Chapter 3 Patients and Methods 3.1 NCIC CTG MA.5 Clinical Trial The NCIC Clinical Trials Group (NCIC CTG) MA.5 trial was a randomized multi-institutional trial comparing cyclophosphamide, epirubicin, and fluorouracil (CEF) to cyclophosphamide, methotrexate, and fluorouracil (CMF) in premenopausal women with node-positive breast cancer. A total of 716 women with early stage breast cancer were randomized to receive chemotherapy with CEF (epirubicin 60 mg/m2 and 5-FU 500 mg/m2 both IV days 1 & 8 and cyclophosphamide 75 mg/m2 p.o. days 1-14) or CMF (methotrexate 40 mg/m2 and 5-FU 600 mg/m2 both IV days 1 & 8 and cyclophosphamide 100 mg/m2 p.o. days 1-14) all given in a 28 day cycle. Main eligibility criteria for the MA.5 trial included: 1) pre- or perimenopausal women with node positive breast cancer, 2) histologically proven breast cancer treated with total or partial mastectomy, 3) no distant metastases, 4) no prior hormonal or cytotoxic therapy, 5) LVEF ≥45% 6) Adequate hematological, renal and liver function. The primary endpoint for the MA.5 study was relapse-free survival (RFS) with overall survival (OS) as the secondary endpoint. RFS was defined as time from random assignment to recurrence including local breast chest wall and regional and/or distant recurrence. OS was based on death from any cause (refer TOP II and MA5). During and after the MA.5 trial, pathologists were asked to submit a representative formalin-fixed, paraffin-embedded block of tumor tissue from each woman, 25 or if tumor blocks were unavailable, 20 4-μm unstained sections, to the central office of the NCIC CTG. Paraffin blocks were stored at room temperature, and unstained sections were kept at 4°C. Samples were identified only by an identification number assigned to each patient at randomization. CEF was shown to be superior to CMF and remained superior in terms of relapsefree survival and overall survival after a median follow-up of 10 years113, 114 . As compared with CMF, however, CEF was associated with increased rates of alopecia, nausea, vomiting, stomatitis, and febrile neutropenia; a temporary reduction in the quality of life; and an increase in the risk of congestive heart failure (1.1 percent) and acute leukemia(1.4 percent) 45, 113, 114. 3.2 Data Information on the clinicopathologic factors including age, tumor size, ER status, type of surgery, tumor grade and status of lymph nodes was collected from all patients as part of data collection for the original clinical trial. Tumor tissues were available from 639 of 710 eligible women (90 percent)45, 113, 114 and tissue microarrays (TMAs) were constructed from 531 patients with paraffin embedded specimens. TMA assays were done on the following biomarkers by Dr. Blake Gilks and Dr. David Huntsman at the Genetic Pathology Evaluation Centre (GPEC) of Vancouver General Hospital: HER2, p53, CA9, MEP21 (podo,), clusterin (CLU), pAKT, COX2 and TOP2A. The algorithms to determine these biomarkers as positive or negative were also pre-specified by them. Assay results were reported to the NCIC CTG central office. All the data required for this thesis are maintained by NCIC CTG. 26 3.3 Analysis strategies We first analyzed the prognostic and predictive value for each clinicopathologic factor and biomarker to understand the contribution from each factor and biomarker. Two methods described above, Cox model and SVM methods, were then used to develop prognostic indices based directly on the censored relapse-free or overall survival in MA.5 trial. The prognostic indices developed were verified to see whether they are also significant predictive classifiers. Finally, the predictive powers of the predictive classifiers derived from these two different approaches were evaluated and compared. In the following, we will describe each of these steps in detail. 3.3.1 Analysis of Prognostic and Predictive Values of Each Individual Variable For each individual clinicopathologic factor and biomarker, its prognostic value was assessed using a Cox model with each factor or biomarker as a single covariate and recurrence-free survival and overall survival times as outcome variables. Factors with a p-value <0.05 from the Cox model were considered as having significant prognostic value. Predictive value of each clinicopathologic and biomarker was assessed by evaluating the treatment by factor interaction term in the Cox model by fitting a Cox model with treatment, factor and treatment*factor interaction term as covariates and recurrence-free survival and overall survival times as outcome variables. A factor with a significant p-value <0.05 for the interaction term was considered as having significant predictive value. 27 3.3.2 Development of Prognostic Indices Three different combinations of clinicopathologic factors and biomarkers were explored to develop prognostic indices with the COX model and SVM methods: 1) clinicopathologic factors only, 2) molecular biomarkers assayed with TMA only, and 3) clinicopathologic factors and all molecular biomarkers assayed with TMA combined. This generated three indices for each of the methods: • Index1 (all clinicopathologic factors): age, surgery type, node, grade, tumor size and EST (clinical assessment of ER protein level) • Index2 (all biomarkers): CA9, p53, Podo, CLU, COX2, pAKT, TOP2A, HER2 and ER • Index3 (all clinicopathologic factors and biomarkers combined): age, surgery type, node, grade, size, EST, CA9, p53, Podo, CLU, COX2, pAKT, TOP2A, HER2 and ER Both the recurrence-free survival time and overall survival time were used separately as outcome variables for constructing these indices with COX model and SVM methods. In the following, I give some details for each of these two methods. 3.3.2.1 Cox Model Method With Cox model method, we first fit a Cox proportional hazards model with a set of covariates (as defined above, all clinicopathologic factors only, all biomarkers from TMA assay only or the combination of the two types) with either the recurrence-free survival time or the overall survival time as outcome variable. A series of estimates of regression coefficients β̂ = ( β̂1 , . . . , βˆ p ) for the input covariates were then generated 28 from the fitted Cox model. With these estimated coefficients, a linear risk score was then formed by the index function f (X) = βˆ ' X for each given patient based on their values on all the covariates used to fit the COX model. The mean value of the index scores from all patients is taken as the cut-off C-value. If for a new patient, her risk score calculated based on the index function is greater than or equal to the mean of the index (cut-off Cvalue), she will be classified into the high risk or poor prognostic group and into the low risk or good prognostic group otherwise. 3.3.2.2 Support Vector Machine (SVM) method Supervised learning approach was used for support vector machine method. The training dataset were first defined. In contrast to fitting the COX model, in which all the available samples were used, the training data set used for supervised training of the SVMs included only the extreme samples, i.e., the samples with longer or shorter survival times. The samples with mid-range survival times were excluded from the training process. Descriptive analysis showed that the mean for recurrent free survival time and overall survival time in this study were 6.2 and 7.4 years respectively. Based on these two numbers, the cut-off criteria for the longer or shorter term survivors were defined as follows: 1) If a patient had a recurrence-free survival time or overall survival time less than 5 years and died, then we treated this patient as shorter survivor. 2) On the other hand, if a patient was still alive and had recurrence-free survival time or overall survival time greater than 6 years, then we treated this patient as longer survivor. 29 A two-class SVM with linear kernel function was then trained with supervised learning approach on the training dataset, where class consisted of longer or shorter survivors identified using above criteria. Because missing data was not allowed for support vector machine method, patients with missing data on any input variable were excluded in SVM model. The supervised training of SVM was accomplished using the following code within the e1071 package in R: svm.model <- svm(CLASS ~. , data=trainset, cost=100, gamma=1, kernel="linear") With the trained SVM, a classification rule was then formed and applied to classify any patient into two predicted groups, shorter survivor or longer survivor, based on her values on the input variables (covariates). We assumed these two groups can be regarded as respectively high risk group and low risk group. This prediction classifying process was realized with the following code within the e1071 package in R with the two-class SVM model: svm.pred <- predict(svm.model, testset[, -X]) , where X denotes the position of class variable. X depended on the number of clinical and/or biomarkers in each index. 3.3.3 Verification for the Prognostic Indices as Predictive Classifier for the Treatment To asses whether a prognostic index constructed based on respectively Cox model and SVM methods can also be used as a predictive classifier for prediction of responses to CEF and CMF treatment, we first assess the difference of two treatments in each of the 30 risk groups defined by the index based on log-rank test and Kaplan-Meier plots. Cox proportional hazards models were then fitted using recurrence-free survival time or overall survival time as outcome variables with treatment, risk group (high risk or low risk) defined either by the index, and treatment by risk group interaction terms in the model. An index was considered to have predictive value if the treatment by risk group term was significant at 0.05 level in the fitted COX model. 3.3.4 Validation of Predictive Classifier using Bootstrap Methods In order to assess the usefulness of a predictive classifier in the prediction of treatment responses for future patients, validation using bootstrap approach was adopted for this project. With this approach, 1,000 bootstrap samples from the original sample were first generated. Cox proportional hazard models were then fitted with treatment, risk group and treatment by risk group interaction terms in the model using either recurrencefree survival or overall survival time as out come variable. The proportion of the bootstrap samples with significant interaction terms were calculated for each index. In addition, with the bootstrap samples, the mean and the confidence interval for the difference in the logarithms of interaction hazard ratios (Ratio of Hazard Ratio, RHR) between the two risk groups classified by the index were also computed. 31 Chapter 4 Results 4.1 Data Sets Data on clinical and pathologic factors in the MA.5 clinical trial were available for a total of 716 participating patients, including age, surgical type (Total vs Partial Mastectomy), tumor size (T1, T2, T3), tumor grade (1, 2, 3), number of positive lymph nodes and EST (clinical assessment of ER protein levels, i.e., local ER measurements from each center [ ≥ 10, <10 fmol protein/mg tissue]). Data on biomarkers from Tissue Microarrays (TMAs) profiling were available for 531 patients. Biomarkers that were assessed in this trial include CA9, p53, Podo, CLU, COX2, pAKT, TOP2A, HER2 and ER (estrogen receptor status). 4.2 Summary of Baseline Characteristics The distributions of baseline characteristic on the 531 patients included in the analysis were summarized in Table 1. Among the 531 patients included, 255 patients were treated with CEF and 276 with CMF. 270 (50.8%) of these patients had a recurrence of breast cancer and 213 died during the trial period. The average age of all included was 41.9 years old with no difference between patients in the two treatment groups. The distributions of surgery type, tumor grade, tumor size and number of positive nodes in the two treatment groups were similar. The percentages of subjects with data available on each biomarker were also similar between the two treatment groups. Analyses for the prognostic and predictive effects of HER2 and TOP2A were reported 32 previously but the published results on HER2 were based on the data from RT-PCR quantification. HER2 level was assayed using TMA in this analysis and data were available on a total of 434 patients, among which 57 were HER2 amplified, of whom 23 were in CEF group and 34 in CMF arm. Among the 423 subjects with TOP2A tested, 24 patients had TOP2A deleted and 53 were TOP2A amplified with no significant differences among the CEF and CMF treatment groups. 33 Table 1: Baseline Characteristics of Patients Age (years) Surgery type, No. (%) Grade, No. (%) Tumor Size, No. (%) Node positive, No. (%) EST, No. (%) HER2 status, No. (%) CA9, No. (%) COX2, No. (%) Mean ± SD Partial mastectomy Mastectomy 1 2 CEF (N=255) 41.6 ± 11.0 129 (50.6%) CMF (N=276) 42.3 ±9 .9 140 (50.7%) Total (N=531) 41.9±10.5 269(50.7%) 126 (49.4%) 27 (10.6%) 80 (31.4%) 136 (49.3%) 28 (10.1% ) 86 (31.2% ) 3 143 (56.1%) 152 (55.1% ) missing T1 T2 T3 missing 0 5 (1.9%) 92 (36.1%) 134 (52.6%) 9 (3.5%) 20 (7.8%) 0 (0.0%) 10 (3.6) 110 (39.9%) 135 (48.9%) 16 (5.8%) 15 (5.4%) 0 (0.0%) 262 (49.3%) 55 (10.4% ) 166 (31.3% ) 295 (55.6% ) 15 (2.8%) 202(38.0%) 269 50.7%) 125 (4.7%) 35 (6.6%) 0 (0.0%) 1<=Node<=3 >3 <=10 >10 missing Amplified 154 (60.4%) 101 (39.6%) 67 (26.3%) 150 (58.8%) 38 (14.9%) 23 (9.0%) 166 (60.1%) 110 (39.9%) 71 (25.7%) 160 (58.0%) 45 (16.3%) 34 (12.3%) 320 (60.3%) 211 (39.7%) 138(26.0%) 310 (58.4%) 83 (15.6%) 57 (10.7%) Non-amplified Missing 0 1 3 Missing 0 1 missing 184 (72.1%) 48 (18.8%) 203 (79.6%) 10 (3.9%) 28 (11.0%) 14 (5.5%) 144 (56.5%) 93 (36.5%) 18 (7.0%) 193 (69.9%) 49 (17.7%) 213 (77.2%) 21 (7.6%) 27 (9.8%) 15 (5.4%) 146 (52.9%) 106 (38.4%) 26 (8.7%) 377 (71.0%) 97 (18.3%) 416 (78.3%) 31 (5.8%) 55 (10.4%) 29 (5.5%) 290 (54.6%) 199(37.5%) 44 (8.3%) 34 Table 1 (continued): Baseline Characteristics of Patients CEF CMF (N=255) (N=276) P53, No. (%) 0 191(74.9%) 192 (69.6%) 1 26(10.2%) 28 (10.1%) 3 17 (6.7%) 33 (12.0%) missing 21 (8.2%) 23 (8.3%) PODO, No. (%) 0 23 (9.0%) 16 (5.8%) 1 137 (53.7%) 138 (50.0%) 2 60 (23.5%) 77 (27.9%) 3 10 (3.9%) 16 (5.8%) missing 25 (9.8%) 29 (10.5%) CLU, No. (%) 0 21 (8.2%) 22 (8.0%) 1 161 (63.1%) 153 (55.4%) 3 51 (20.0%) 71 (25.7%) Missing 22 (8.6%) 30 (10.8%) PAKT, No. (%) 0 4 (1.6%) 4 (1.5%) 1 86 (33.7%) 101 (36.6%) 2 118 (46.3%) 115 (41.7%) 3 11 (4.3%) 16 (5.8%) missing 36 (14.1%) 40 (14.5%) TOP2A, No. (%) 0 161 (63.1%) 185 (67.0%) 1 11 (4.3%) 13 (4.7%) 2 32 (12.5%) 21 (7.6%) missing 51 (20%) 57 (20.6%) ER LEVEL, No. (%) 0 119 (46.6%) 132 (47.8%) 1 71 (27.8%) 63 (22.8%) 2 37 (14.5%) 43 (15.6%) 3 16 (6.3%) 18 (6.5%) missing 12 (4.7%) 20 (7.2%) Total (N=531) 383 (72.1%) 54 (10.2%) 50(9.4%) 44 (8.3%) 39(7.3%) 275 (51.8%) 137(25.8%) 26 (4.9%) 54 (10.1%) 43(8.1%) 314 (59.1%) 122(23.0%) 52 (9.8%) 8(1.5%) 187 (35.2%) 233(43.9%) 27 (5.1%) 76 (14.3%) 346 (65.2%) 24 (4.5%) 53 (10.0%) 108 (20.3%) 251 (47.3%) 134 (25.2%) 80 (15.1%) 34 (6.4%) 32 (6.0%) 4.3 Prognostic and Predictive Values of Each Individual Variable Table 2 lists all the p-values for assessment of both prognostic and predictive values for all the factors and biomarkers included in this study. Five clinicopathologic factors including age, surgery type, number of positive lymph node, grade and tumor size and three biomarkers including HER2, P53 and ER were found to be of significant prognostic factors when recurrence-free survival was used 35 as outcome variable. When assessing prognostic value using overall survival, in addition to the above clinical and biomarkers, clinicopathologic factor EST and biomarker PODO were also found to have significant prognostic values. No clinicopathologic factor was found to have predictive value when either recurrence-free survival time or overall survival time was used as a response variable. HER2 was found to have a statistically significant interaction with treatment with both recurrence-free survival time (p=0.0154) and overall survival time (p=0.0409) being as the outcome variable, indicating that HER2 has strong predictive value in predicting treatment response differences between the CEF and CMF treatment groups. The interaction between TOP2A and treatment was also found to be statistically significant (p=0.0277) when the overall survival was the response variable and only a trend toward significant (p=0.0904) was found with recurrence-free survival time as outcome variable. 36 Table 2: Analysis of Single Factor and Biomarker as Prognostic and Predictive Factors (p-value of treatment or interaction effects from Cox model) Recurrence-free Survival _____________________ Prognostic Predictive Overall Survival _______________________ Prognostic Predictive Clinical factors Age 0.0178 0.7060 0.0915 0.1359 Surgery type 0.0008 0.8166 <.0001 0.9889 EST 0.5642 0.7907 0.0088 0.3598 Node <.0001 0.5734 <.0001 0.7727 Grade <.0001 0.5704 <.0001 0.3757 Size <.0001 0.3702 <.0001 0.8922 0.0119 0.0154 0.0004 0.0409 CA9 0.8765 0.5368 0.1575 0.2136 P53 0.0297 0.8231 <.0001 0.3005 PODO 0.8421 0.3801 0.0424 0.4718 CLU 0.2079 0.5298 0.1074 0.1306 ER 0.0016 0.1712 <.0001 0.8130 COX2 0.7117 0.4685 1.0000 0.5409 PAKT 0.4427 0.5444 0.5895 0.9181 TOP2A 0.4126 0.0904 0.0672 0.0277 Biomarkers HER2 37 4.4 Prognostic Indices 4.4.1 Classification Rules based on COX Model Method The following are the three index functions constructed based on Cox model using recurrence-free survival time as outcome variable: 1) Index1 = (-0.01654)*age + 0.33840*surgtype + 0.43112*grade + 0.08767*node + 0.50894*size + 0.0002281*EST 2) Index2 = ( -0.12721)*CA9 + 0.00759*P53 + ( -0.08475)*PODO + 0.09723*CLU + ( -0.09908)*COX2 + ( 0.01352)*PAKT+ ( -0.07040)*TOP2A + 0.34702*HER2 + ( -0.23602)*ER 3) Index3 = (-0.01396)*age + 0.54277*surgtype + 0.22260*grade + 0.09461*node + 0.34215*size + (0.14113)*ER + 0.0009667*EST + ( -0.04800)*CA9 + 0.06618*P53 + ( -0.19417)*PODO + ( -0.01023)*CLU + ( -0.12836)*COX2 + ( 0.05839)*PAKT + ( -0.13051)*TOP2A + 0.28897*HER2 The three index functions constructed using overall survival time as outcome variable are as following: 4) Index1 = (-0.01345)*age + 0.46177*surgtype + 0.75219*grade + 0.07922*node + 0.55648*size + (0.00215)*EST 5) Index2 = ( -0.08191)*CA9 + 0.08725*P53 + ( 0.07409)*PODO + 0.16671*CLU + ( -0.08720)*COX2 + ( -0.08785)*PAKT+( 0.01080)*TOP2A + 0.50923*HER2 + (-0.32309)*ER 6) Index3 = ( -0.01242)*age + 0.63033*surgtype + 0.40067*grade + 0.08967*node + 0.43195*size + 0.0005004*EST + ( 0.03415)*CA9 + 0.16667*P53 + ( -0.06177)*PODO + ( 0.05415)*CLU + ( -0.00332)*COX2 + ( 0.00217)*PAKT + ( -0.08976)*TOP2A + 0.49786*HER2 + ( -0.15314)*ER Table 3.1 summarized the cut-off C-value for each index with the recurrence-free survival time as outcome variable and the number and percentage of patients classified into high and low risk groups. For Index 1, which was constructed with only the clinicopathologic factors being used, 40.1% of patients were classified into high risk group and 59.9% in good (low risk) prognostic group. With Index 2 developed based on only biomarkers, 174 (32.8%) and 357 (67.2%) patients were classified into poor and good prognostic group respectively. When using all clinicopathologic factors and 38 biomarkers in Index 3, there were about 137 (25.8%) patients assigned to the poor prognostic group and 394 (74.2%) to the good prognostic group. Results are similar when overall survival was used as outcome variable (Table 3.2). For comparison purposes, we also presented results for indices developed based on HER2 and TOP2A only and their combination. It seems fewer patients were classified into the high risk group if these two biomarkers were used alone or in combination. 39 Table 3.1: Summary of Three Indices for Recurrence-free Survival based on COX Model method ____Low Risk Group____ CEF CMF Total 151 167 318 (28.4) (31.6%) (59.9%) Index 1 1.74 ____High Risk Group___ CEF CMF Total 104 109 213 (19.5%) (20.5%) (40.1%) Index2 -0.20 171 185 356 84 91 175 (15.8%) (17.1%) (32.9%) (32.2%) (34.8%) (67.0%) Index3 0.92 59 76 135 196 200 396 (11.1%) (14.3%) (25.4%) (36.9%) (37.6%) (74.6%) HER2 0.06 23 (4.33%) 34 (6.4%) 57 232 242 474 (10.7%) (43.7%) (45.6%) (89.3%) TOP2A 0.02 43 (8.1%) 34 (6.4%) 77 212 242 454 (14.5%) (39.9%) (45.6%) (85.5%) HER2+TOP2A 0.06 30 21 (3.95%) (5.65%) Index Cut-off C-value 40 51 (9.6%) 234 246 480 (44.1%) (46.3%) (90.4%) Table 3.2: Summary of Three Indices for Overall Survival based COX model Method Index Index 1 Index2 Index3 HER2 TOP2A HER2+TOP2A ___High Risk Group____ CEF CMF Total ___Low Risk Group___ CEF CMF Total 2.62 112 (21.1%) 115 (21.6%) 227 (42.7%) 143 (26.9%) 161 (30.4%) 304 (57.3%) -0.015 84 (15.8%) 100 (18.8%) 184 (34.7%) 171 (32.2%) 176 (33.2%) 347 (65.3%) 64 84 (15.8%) 148 (27.9%) 191 (36.0%) 192 (36.1%) 383 (72.1%) 0.09 (12.1%) 23 (4.3%) 34 (6.4%) 57 (10.7%) 232 (43.7%) 242 (45.6%) 474 (89.3%) 0.06 43 (8.1%) 34 (6.4%) 77 (14.5%) 212 (39.9%) 242 (45.6%) 454 (85.5%) 0.08 21 (3.9%) 30 (5.6%) 51 (9.6%) 234 (44.1%) 246 (46.3%) 480 (90.4%) Cutoff Cvalue 1.93 41 4.4.2 Classification Rules based on SVM Method The numbers and percentages of patients who were classified into lower/higher risk groups based on prognostic indices developed with SVM method are presented in Table 4.1 for recurrence-free survival and in Table 4.2 for overall survival. Numbers and percentages are also displayed for each treatment group. In these two tables, SVM1, SVM2 and SVM3 represent indices developed by SVM method with clinicopathologic factors, only biomarkers, and all clinicopathologic factors and biomarkers respectively. Compared to the classification results based on COX model method, smaller numbers of patients were classified into poor (low risk) group under SVM approach. 42 Table 4.1: Summary of Classification Results for Recurrence-free Survival based on SVM method ______Low Risk Group______ CEF CMF Total SVM1 ______High Risk Group_____ CEF CMF Total 67 69 136 (15.6%) (16.0%) (31.6%) 144 (33.4%) 151 (35.0%) 295 (68.4%) SVM2 20 (5.7%) 28 (7.9%) 48 (13.6%) 143 (40.6%) 161 (45.8%) 304 (86.4%) SVM3 43 (15.0%) 60 (20.9%) 103 (36.0%) 87 (30.4%) 96 (33.6%) 183 (64.0%) HER2+TOP2A 21 (5.6%) 30 (8.0%) 51 (13.6%) 156 (41.6%) 168 (44.8%) 324 (86.4%) 43 Table 4.2: Summary of Classification Results for Overall Survival based on SVM Method ______Low Risk Group______ CEF CMF Total SVM1 ______High Risk Group_____ CEF CMF Total 50 55 105 (11.6%) (12.8%) (24.4%) 161 (37.3%) 165 (38.3%) 326 (75.6%) SVM2 20 (5.7%) 28 (7.9%) 48 (13.6%) 143 (40.6%) 161 (45.8%) 304 (86.4%) SVM3 28 (9.8%) 37 (12.9%) 65 (22.7%) 102 (35.7%) 109 (41.6%) 221 (77.3%) HER2+TOP2A 21 (5.6%) 30 (8.0%) 51 (13.6%) 156 (41.6%) 168 (44.8%) 324 (86.4%) 44 4.5 Prognostic Indices as Predictive Classifiers 4.5.1 Treatment Effects in High and Low Risk Groups Defined by Prognostic Indices To check whether the prognostic indices developed above can be considered also as predictive classifiers, we first studied the difference in treatment with CEF and CMF for each of the two risk groups defined by a given prognostic index to see whether treatment effects are different for patients in these two groups. Table 5.1~5.4 and KaplanMeier survival curves in Figure 1~6 summarize the results of these investigation results for all prognostic indices developed based on Cox model and SVM methods. When only clinicopathologic factors were included, no significant treatment difference between CEF and CMF was found in low risk groups defined by the prognostic indices developed based on either Cox model or by SVM method. This can be seen also from Kaplan-Meier survival curves (graphs b and d of Figure 1 and 4), which showed that the CEF and CMF curves were very close although the CEF curve was always above the CMF curve. For the high risk group defined by these indices, although the difference between two treatments did not reach significance, patients treated by CEF had apparently better response to those treated by CMF in terms of either recurrence-free survival or overall survival. It was worth noticing that index developed based on SVM performed better than that based on Cox model method since p-value for the treatment difference almost reached significance (p=0.0592). 45 When all biomarkers were used, the results for low risk groups defined by prognostic indices developed by Cox model and SVM methods were very similar to that with only clinicopathologic factors. For the high risk group defined by the SVM prognostic index, patients treated by CEF had a significant better relapse-free survival (p=0.0396) and marginally better overall survival (p=0.0742) than those treated by CMF. In contrast, no significant treatment difference was found for patients in the high risk group defined by the prognostic index developed based on Cox model method. This shows an apparent better performance of SVM method to the COX model method. We have hypothesized that combination of clinicopathologic factors and biomarkers may improve performance of the predictive classifiers based on only clinicopathologic factors or biomarkers. But, our hypothesis was not confirmed in this study. Actually, unexpected results were observed, which showed that patients in the low risk group, instead of high risk group, defined by the SVM prognostic index had a significantly different treatment effect in relapse free survival (p=0.0482). For comparison purposes, results for indices developed based on HER2 and TOP2A alone and their combination are also presented in tables and figures. Consistent with results published, no significant difference between treatments was found for patients in the low risk group defined by these indices, where significant difference in both relapse free and overall survivals was found for patients in the high risk group defined by either HER2 or TOP2A alone. For the index defined by combination of HER2 and TOP2A, significant difference was found on relapse free survival (p=0.0386) and only marginally significant difference on overall survival. 46 Table 5.1 Summary of Treatment Effect (CEF/CMF) in Recurrence-Free Survival for Good Prognostic (Low Risk) Group Classifiers Index1/SVM1 = age surgtype grade node size EST Index2 /SVM2 = CA9 P53 PODO CLU COX2 PAKT TOP2A HER2 ER Index3 /SVM3= age surgtype grade node size EST CA9 P53 PODO CLU COX2 PAKT TOP2A HER2 ER HER2 Cox Model Hazard ratios P-value (95%CI) for treatment 0.2775 0.83 (0.59, 1.17) SVM Hazard ratios P-value (95%CI) for treatment 0.1817 0.78 (0.54, 1.12) 0.2257 0.83 (0.61, 1.12) 0.8114 0.96 (0.70, 1.33) 0.2174 0.83 (0.62, 1.12) 0.0482 0.62 (0.38, 0.99) 0.5607 0.93 (0.72, 1.20) 0.91 (0.70, 1.18) 0.90 (0.70, 1.16) 0.9013 0.98 (0.72, 1.34) TOP2A 0.4728 HER2+TOP2A 0.4132 47 Table 5.2 Summary of Treatment Effect (CEF/CMF) in Recurrence-Free Survival for Poor Prognostic (Good Risk) Groups Classifiers Index1/SVM1 = age surgtype grade node size EST Index2 /SVM2 = CA9 P53 PODO CLU COX2 PAKT TOP2A HER2 ER Index3 /SVM3= age surgtype grade node size EST CA9 P53 PODO CLU COX2 PAKT TOP2A HER2 ER HER2 Cox Model Hazard ratios P-value (95%CI) for treatment 0.2145 0.81 (0.58, 1.13) SVM Hazard ratios P-value (95%CI) for treatment 0.0592 0.68 (0.46, 1.02) 0.3592 0.83 (0.56, 1.23) 0.0396 0.44 (0.20, 0.96) 0.6711 0.92 (0.61, 1.38) 0.7617 0.93 (0.59, 1.45) 0.0111 0.37 (0.18, 0.81) 0.48 (0.26, 0.88) 0.45 (0.21, 0.96) 0.0386 0.45 (0.21, 0.96) TOP2A 0.0173 HER2+TOP2A 0.0386 48 Table 5.3 Summary of Treatment Effect (CEF/CMF) in Overall Survival for Good Prognostic (Low Risk) Group Classifiers Index1/SVM1 = age surgtype grade node size EST Index2 /SVM2 = CA9 P53 PODO CLU COX2 PAKT TOP2A HER2 ER Index3 /SVM3= age surgtype grade node size EST CA9 P53 PODO CLU COX2 PAKT TOP2A HER2 ER HER2 Cox Model Hazard ratios P-value (95%CI) for treatment 0.726 1.08 (0.70, 1.66) SVM Hazard ratios P-value (95%CI) for treatment 0.1800 0.76 (0.50, 1.14) 0.6017 0.91 (0.64, 1.30) 0.5710 1.11 (0.77, 1.60) 0.6415 0.92 (0.65, 1.30) 0.0906 0.65 (0.39, 1.07) 0.9277 0.98 (0.73, 1.32) 0.99 ( 0.74, 1.34) 0.96 (0.72, 1.28) 0.5157 1.12 (0.79, 1.60) TOP2A 0.9716 HER2+TOP2A 0.7706 49 Table 5.4 Summary of Treatment Effect (CEF/CMF) in Overall Survival for Poor Prognostic (High Risk) Group Classifiers Index1/SVM1 = age surgtype grade node size EST Index2 /SVM2 = CA9 P53 PODO CLU COX2 PAKT TOP2A HER2 ER Index3 /SVM3= age surgtype grade node size EST CA9 P53 PODO CLU COX2 PAKT TOP2A HER2 ER HER2 Cox Model Hazard ratios P-value (95%CI) for treatment 0.048 0.70 (0.50, 0.99) SVM Hazard ratios P-value (95%CI) for treatment 0.0615 0.65 (0.41, 1.02) 0.4779 0.86 (0.57, 1.30) 0.0742 0.48 (0.22, 1.07) 0.5837 0.89 (0.58, 1.37) 0.7206 0.90 (0.51, 1.59) 0.0349 0.45 (0.21, 0.95) 0.41 (0.22, 0.79) 0.51 (0.24, 1.09) 0.0847 0.51 (0.24, 1.10) TOP2A 0.0074 HER2+TOP2A 0.0847 50 Figure 1: Kaplan-Meier Curves of Recurrence Free Survival by Risk Group defined by Prognostic Indices Developed based on COX model and SVM methods and All Clininopathologic factors [a] [b] 51 Figure 1: Kaplan-Meier Curves of Recurrence Free Survival by Risk Group defined by Prognostic Indices Developed based on COX model and SVM methods and All Clininopathologic factors (continued) [c] [d] 52 Figure 2: Kaplan-Meier Curves of Recurrence Free Survival by Risk Group defined by Prognostic Indices Developed based on COX model and SVM methods and All Biomarkers [ a] [b] 53 Figure 2: Kaplan-Meier Curves of Recurrence Free Survival by Risk Group defined by Prognostic Indices Developed based on COX model and SVM methods and All Biomarkers (continued) [c] [d] 54 Figure 3: Kaplan-Meier Curves of Recurrence Free Survival by Risk Group defined by Prognostic Indices Developed based on COX model and SVM methods and All Clininopathologic Factors and Biomarkers [a] [b] 55 Figure 3: Kaplan-Meier Curves of Recurrence Free Survival by Risk Group defined by Prognostic Indices Developed based on COX model and SVM methods and All Clininopathologic Factors and Biomarkers (continued) [c] [d] 56 Figure 4: Kaplan-Meier Curves of Overall Survival by Risk Group defined by Prognostic Indices Developed based on COX model and SVM methods and All Clininopathologic factors [a] [b] 57 Figure 4: Kaplan-Meier Curves of Overall Survival by Risk Group defined by Prognostic Indices Developed based on COX model and SVM methods and All Clininopathologic factors (continued) [c] [d] 58 Figure 5: Kaplan-Meier Curves of Overall Survival by Risk Group defined by Prognostic Indices Developed based on COX model and SVM methods and All Biomarkers [a] [b] 59 Figure 5: Kaplan-Meier Curves of Overall Survival by Risk Group defined by Prognostic Indices Developed based on COX model and SVM methods and All Biomarkers (continued) [c] [d] 60 Figure 6: Kaplan-Meier Curves of Overall Survival by Risk Group defined by Prognostic Indices Developed based on COX model and SVM methods and All Clininopathologic factors and Biomarker [a] [b] 61 Figure 6: Kaplan-Meier Curves of Overall Survival by Risk Group defined by Prognostic Indices Developed based on COX model and SVM methods and All Clininopathologic factors and Biomarkers (continued) [c] [d] 62 Figure 7: Recurrence free survival curves by COX model and SVM approach – HER2/TOP2A [a] [b] 63 Figure 7: Recurrence free survival curves by COX model and SVM approach – HER2/TOP2A (continued) [c] [d] 64 Figure 8: Overall survival curves by COX model and SVM approach – HER2/TOP2A [a] [b] 65 Figure 8: Overall survival curves by COX model and SVM approach – HER2/TOP2A (continued) [c] [d] 66 Figure 9: Recurrence free and overall survival curves by COX model – HER2 [a] [b] 67 Figure 9: Recurrence free and overall survival curves by COX model – HER2 (continued) [c] [d] 68 Figure 10: Recurrence free and overall survival curves by COX model – TOP2A [a] [b] 69 Figure 10: Recurrence free and overall survival curves by COX model – TOP2A (continued) [c] [d] 70 4.5.2 Hazard Ratio and Confidence Interval for Interaction between Treatment and Prognostic Index To formally confirm whether a prognostic index is also a predictive classifier, the hazard ratios and associated 95% confidence intervals for the interaction between treatment and risk group defined by prognostic index along with the p-values are presented respectively in Table 5.5 with recurrence-free survival time as outcome and Table 5.6 with overall survival time as outcome. Consistent with observations made above based on treatment effects in respect to risk groups and Kaplan-Meier curves, only prognostic index developed based on SVM method and all biomarkers had marginally significant p-values for interaction between treatment and risk groups in terms of either relapse-free survival (p-value=0.0527, hazard ratio=2.32) or overall survival (pvalue=0.0573, hazard ratio=2.34). In contrast, the p-value of interaction was significant for both recurrence-free and overall survivals when either HER2 or TOP2A was used alone and marginally significant when they were combined. 71 Table 5.5: Summary of Predictive Results in Recurrence-free Survival Classifiers Index1/SVM1 = age surgtype grade node size EST Index2 /SVM2 = CA9 P53 PODO CLU COX2 PAKT TOP2A HER2 ER Index3 /SVM3= age surgtype grade node size EST CA9 P53 PODO CLU COX2 PAKT TOP2A HER2 ER HER2 Cox Model SVM Hazard P-value of Hazard P-value of Ratio for treatment Ratio for treatment by index Interaction* by index Interaction* Term interaction Term interaction (95% CI) (95% CI) 0.9480 1.02 0.5939 1.16 (0.63, 1.64) (0.68, 1.98) 0.9955 1.01 (0.61, 1.64) 0.0527 2.32 (0.99, 5.43) 0.6691 0.90 (0.54,1.49) 0.3448 0.73 (0.38,1.41) 0.0196 2.51 (1.16, 5.45) TOP2A 0.0386 2.0 (1.04, 3.86) HER2+TOP2A 0.0623 2.12 0.0438 (0.96, 4.68) *Ratio of the treatment hazard ratios between the two classes defined. 72 2.30 (1.02, 5.17) Table 5.6: Summary of prognostic and predictive indices (overall survival) Cox Model Classifiers Index1/SVM1 = age surgtype grade node size EST Index2 /SVM2 = CA9 P53 PODO CLU COX2 PAKT TOP2A HER2 ER Index3 /SVM3= age surgtype grade node size EST CA9 P53 PODO CLU COX2 PAKT TOP2A HER2 ER HER2 SVM P-value of treatment by index interaction 0.1262 Hazard Ratio for Interaction Term* (95% CI) 1.54 (0.89, 2.68) P-value of treatment by index interaction 0.5745 Hazard Ratio for Interaction Term* (95% CI) 1.19 (0.65, 2.19) 0.8357 1.06 (0.61, 1.83) 0.0573 2.34 (0.97, 5.61) 0.9261 1.03 (0.59, 1.79) 0.3646 0.71 (0.33, 1.50) 0.0459 2.25 (1.01, 4.98) 2.51 (1.24, 5.07) 1.95 (0.87, 4.38) 0.0540 2.27 (0.99, 5.22) TOP2A 0.0108 HER2+TOP2A 0.1055 *Ratio of the treatment hazard ratios between the two classes defined. 73 4.6 Validation of Predictive Classifier using Bootstrap Methods Although no significant predictive classifier was found by either Cox model or SVM method for all clinicopathologic factors, biomarkers or their combination, we still evaluated the performance of each classifier based on proportion of significant interaction p-values and interaction hazard ratio and associated confidence intervals estimated from bootstrap samples. Summary of the proportion of significant interaction p-values were presented in Table 6.1 for recurrence-free survival and Table 6.2 for overall survival. We can see from these two tables that the proportions were quite low for prognostic index developed by Cox model method: all were lower than 10% except for index based on all clinicopathologic factors with overall survival as outcome variable. Classifiers based on SVM method performed better than these based on COX model method in all cases except in the same case where Cox model had higher than 10% proportion as mentioned before. SVM method performed best when either all biomarkers or only combination of HER2 and TOP2A were used but it was not superior to the classifiers based on HER2 and TOP2A alone. Tables 7.1 and 7.2 present the mean and the confidence interval for the hazard ratio of interaction between treatment and risk groups defined by the prognostic indices developed based on Cox model and SVM method respectively for recurrence-free survival and overall survival. The conclusions from these results were the same with what were observed above based on proportion of significant p-values. 74 Table 6.1: Summary for Proportion of Significant Interaction between Treatment and Risk Group Estimated by Bootstrap Method for Recurrence-Free Survival Classifiers Index1/SVM1 = age surgtype grade node size EST Index2 /SVM2= CA9 P53 PODO CLU COX2 PAKT TOP2A HER2 ER Index3 /SVM3= age surgtype grade node size EST CA9 P53 PODO CLU COX2 PAKT TOP2A HER2 ER Cox Model SVM 6.3% 9.5% 5.2% 50.3% 6.9% 25.2% HER2 65.7% TOP2A 50.9% HER2+TOP2A 49.5% 75 54.5% Table 6.2: Summary for Proportion of Significant Interaction between Treatment and Risk Group Estimated by Bootstrap Method for Overall Survival Classifiers Index1/SVM1 = age surgtype grade node size EST Index2 /SVM2= CA9 P53 PODO CLU COX2 PAKT TOP2A HER2 ER Index3 /SVM3= age surgtype grade node size EST CA9 P53 PODO CLU COX2 PAKT TOP2A HER2 ER Cox Model SVM 35.3% 12.6% 6.2% 46.2% 5.7% 13.5% HER2 54.5% TOP2A 75.3% HER2+TOP2A 36.7% 76 46.2% Table 7.1: Summary of Hazard Ratio of Interaction in Recurrence-free survival) and 95% CI Estimated by Bootstrap Method Cox Model Classifiers SVM Index1/SVM1 = age surgtype grade node size 1.01 (0.99-1.03) 1.17 (1.15-1.20) EST Index2 /SVM2= CA9 P53 PODO CLU COX2 PAKT TOP2A HER2 ER 1.01(0.99-1.02) 2.34 (2.25-2.42) Index3 /SVM3= age surgtype grade node size 0.90 (0.88-0.91) 0.66 (0.64-0.68) EST CA9 P53 PODO CLU COX2 PAKT TOP2A HER2 ER HER2 2.51 (2.44, 2.56) TOP2A 1.95 (1.91, 1.99) HER2+TOP2A 2.17 (2.12, 2.23) 77 2.31 (2.23, 2.39) Table 7.2: Summary of Hazard Ratio of Interaction in Recurrence-free survival) and 95% CI Estimated by Bootstrap Method Cox Model Classifiers Index1/SVM1 = age surgtype grade node size 1.57 (1.54-1.60) EST Index2 /SVM2= CA9 P53 PODO CLU COX2 PAKT TOP2A HER2 ER 1.04 (1.02-1.06) Index3 /SVM3= age surgtype grade node size 1.02 (1.00-1.04) EST CA9 P53 PODO CLU COX2 PAKT TOP2A HER2 ER HER2 2.27 (2.22, 2.33) 2.53 (2.48, 2.59) 1.97 (1.92, 2.02) TOP2A HER2+TOP2A 78 SVM 1.21 ( 1.181.24) 2.33 (2.24, 2.42) 0.71 (0.69, 0.75) 2.25 (2.17, 2.33) Chapter 5 Conclusions and Discussion As suggested in other studies96, 97 on breast cancer and other types of cancer, we hypothesized that the combination of immunohistochemical information from biomarkers assessed by Tissue Microarrays (TMAs) with data on clinical and pathophysiological factors may provide a better classifier in predicting response to chemotherapy than using data on biomarkers or clinicopathologic factors alone. We have explored two different methods, a traditional COX model method and a new machine learning method, SVM, for development of classifiers based on combination of the data. Our hypothesis was, however, not confirmed by the analysis of data from NCIC CTG MA.5, a clinical trial comparing CEF vs. CMF in the treatment of early breast cancer. The only significant predictive classifier we identified was based on two single biomarkers alone. Even classifiers developed based on combination of these two biomarkers were not better than that based on these two biomarkers alone. There could be several explanations for these findings. It could be that two biomarkers, HER2 and TOP2A, are just too strong for prediction of response to these particular treatments. Therefore, adding information on other biomarkers and clinicopathologic factors, none of them proven as being predictive alone, would only dilute their effects. Another reason may be the statistical approach we used. As mentioned before, we chose to find a prognostic index first and then to verify whether it is a significant predictive classifier. A direct approach to identify predictive classifiers 79 which take into account the biomarkers and clinicopathologic factors may yield different results but this kind of approach has not been yet available in statistical literature. The biological and clinical significance of HER2 and TOP2A, especially HER2, for treatment decision making in the therapy for early breast cancer is well established both in work done by others and described in previous publications and by us in this study. HER2 amplification is however quite rare (around 10% in this study and variably described between 10-25% in different series). Although HER2 has been shown to be the most reliable predictor for response to CEF treatment, this biomarker is limited to a small group of breast cancer patients. The importance of the identification of more sensitive predictive classifiers becomes apparent. The results from two statistical methods used to develop predictive classifiers demonstrated that the SVM method outperformed the COX model method. When using the bootstrap method to evaluate the performance of each classifier based on proportion of significant interaction p-values, the estimation of the interaction hazard ratio and associated confidence intervals, classifiers developed with SVM method were demonstrated to be more stable. The proportion of significant p-values was higher for the classifiers developed based on SVM method except in one case when overall survival was the outcome variable and only clinicopathologic data were used. One of the strengths of this project is to use clinical trial data to test our hypothesis. Data from randomized controlled clinical trials, such as NCI-CTG MA.5, provide highest level of evidence in evaluating treatments115. The prospective nature of clinical trials gives the researchers control over the data collection, which is superior to 80 retrospective studies in terms of completeness, integrity and quality of the data. In addition, the randomization in the clinical trial protect against bias in the allocation of subjects to the different study groups, thus eliminating systematic differences between study groups. The other strength of this thesis is that we used two different types of methods to identify predictive classifiers. Therefore, we are assured that negative conclusion we had was not due only to the statistical method we used. Using different methods also offered an opportunity to compare and contrast these two methods. Our study has several potential limitations as listed below: (1). Potential measurement errors of the data: TMA data may be subject to errors in the measurement of true expression level of a gene in a tissue. Construction of TMAs is complex and relies on accurate identification of appropriate tumor tissue. In addition the distribution of individual molecular markers within the malignant tissue is often quite heterogeneous and, thus, part of tissue used for construction of TMAs may not properly represent the biological features of the entire tumor cell population; (2). Missing tumor tissue and assessments: Among the 710 patients who participated in the trial, tumor tissue were only available for 639 individuals and among them, only 531 had results available from tissue microarrays (TMAs). This may limit the generalization of our results to patients who participated in MA.5. Further to this, not all markers worked on all patient samples. The smaller sample size may limit the power of our study to detect significant interaction, which in general requires a large sample size; 81 (3). Generalizability of results: This study used data only from one clinical trial, NCIC-CTG MA.5, which tested CEF in pre-menopausal women with early breast cancer. It is difficult to say whether the results from this study can be generalized to other treatments or other cancer types. In fact, many studies in other areas have shown that combination of traditional clinical data with gene expression data provided better prediction in treatments95-97; (4). Specific cut-off value in Cox model: When classifying patient into different risk group using COX model, different cut-off values used may yield different results. In this study, the mean value of the index scores from all patients is taken as the cut-off Cvalue. Other cut-off values, such as weighted mean and median of the index scores from all patients can also be used, which may result different conclusions; (5). Use of only extreme survival times in SVM method: For SVM approach we adopt the strategy proposed by Liu et al 109, using the only patients with extreme survival times. The training dataset consisted of only two types of extreme patients: short-term survivors and long-term survivors. The reason for not considering patients whose survival times were not extreme during the training steps is that the sharp contrast between the short-term and long-term survivors may be more informative and reliable for building and understanding the relation between the covariates and outcomes, while considering other patients sample would bring in more noise and confusion signals. In this study, if a patient had a recurrence-free survival time or overall survival time less than 5 years and died, then classify this patient as shorter survivor; if a patient was still alive and had recurrence-free survival time or overall survival time greater than 6 years, then we treated 82 this patient as longer survivor. This is somewhat arbitrary and some other cut-off values may lead different results. In summary, the results in this project confirmed the strong predictive effects of the previously identified biomarkers HER2 and TOP2A. When using either of them alone for the prediction of response to CEF treatment in early breast cancer, there remained approximately 30% of cases where inappropriate decisions might be made. Identification of more reliable predictive classifiers, by new genetic investigations or through new statistical techniques, is an interesting and open area for further research. 83 References 1. Aubele,M. & Werner,M. Heterogeneity in breast cancer and the problem of relevance of findings. Anal. Cell Pathol. 19, 53-58 (1999). 2. Polychemotherapy for early breast cancer: an overview of the randomised trials. Early Breast Cancer Trialists' Collaborative Group. Lancet 352, 930942 (1998). 3. Tamoxifen for early breast cancer: an overview of the randomised trials. Early Breast Cancer Trialists' Collaborative Group. Lancet 351, 1451-1467 (1998). 4. Williams,C. et al. Cost-effectiveness of using prognostic information to select women with breast cancer for adjuvant systemic therapy. Health Technol. Assess. 10, iii-xi, 1 (2006). 5. Robain,M. et al. Predictive factors of response to first-line chemotherapy in 1426 women with metastatic breast cancer. Eur. J. Cancer 36, 2301-2312 (2000). 6. Chang,J. et al. Survival of patients with metastatic breast carcinoma: importance of prognostic markers of the primary tumor. Cancer 97, 545-553 (2003). 84 7. Isaacs,C., Stearns,V., & Hayes,D.F. New prognostic factors for breast cancer recurrence. Semin. Oncol. 28, 53-67 (2001). 8. Jemal,A., Thomas,A., Murray,T., & Thun,M. Cancer statistics, 2002. CA Cancer J. Clin. 52, 23-47 (2002). 9. 10. Jemal,A. et al. Cancer statistics, 2007. CA Cancer J. Clin. 57, 43-66 (2007). Canadian Cancer Statistics 2007. (http://www.cancer.ca/vgn/images/portal/ cit_86751114/36/15/1816216925cw_2007stats_en.pdf ). 2007. 11. DOH 2002 NHS Cancer Screening Programmes.[Cancer Screening Programmes]. URL:www.cancerscreening.nhs.uk. Accessed 6 September, 2002. 2002. 12. Bray,F., Sankila,R., Ferlay,J., & Parkin,D.M. Estimates of cancer incidence and mortality in Europe in 1995. Eur. J. Cancer 38, 99-166 (2002). 13. American Cancer Society. [Breast cancer facts and figures]. URL: www.cancer.org. Accessed 6 September 2002. 2002. 14. Goldhirsch,A., Glick,J.H., Gelber,R.D., Coates,A.S., & Senn,H.J. Meeting highlights: International Consensus Panel on the Treatment of Primary Breast Cancer. Seventh International Conference on Adjuvant Therapy of Primary Breast Cancer. J. Clin. Oncol. 19, 3817-3827 (2001). 85 15. Eifel,P. et al. National Institutes of Health Consensus Development Conference Statement: adjuvant therapy for breast cancer, November 1-3, 2000. J. Natl. Cancer Inst. 93, 979-989 (2001). 16. Adolfsson,J. & Steineck,G. Prognostic and treatment-predictive factors-is there a difference? Prostate Cancer Prostatic. Dis. 3, 265-268 (2000). 17. Pusztai,L. & Hess,K.R. Clinical trial design for microarray predictive marker discovery and assessment. Ann Oncol. 15, 1731-1737 (2004). 18. Steineck,G., Adolfsson,J., Scher,H.I., & Whitmore,W.F., Jr. Distinguishing prognostic and treatment-predictive information for localized prostate cancer. Urology 45, 610-615 (1995). 19. Aapro,M.S. Adjuvant therapy of primary breast cancer: a review of key findings from the 7th international conference, St. Gallen, February 2001. Oncologist. 6, 376-385 (2001). 20. Perou,C.M. et al. Molecular portraits of human breast tumours. Nature 406, 747-752 (2000). 21. Kononen,J. et al. Tissue microarrays for high-throughput molecular profiling of tumor specimens. Nat. Med. 4, 844-847 (1998). 86 22. Aaltomaa,S. et al. Expression of Ki-67, cyclin D1 and apoptosis markers correlated with survival in prostate cancer patients treated by radical prostatectomy. Anticancer Res. 26, 4873-4878 (2006). 23. Sebastiani,V. et al. Tissue microarray analysis of FAS, Bcl-2, Bcl-x, ER, PgR, Hsp60, p53 and Her2-neu in breast carcinoma. Anticancer Res. 26, 2983-2987 (2006). 24. Park,K., Han,S., Shin,E., Kim,H.J., & Kim,J.Y. Cox-2 expression on tissue microarray of breast cancer. Eur. J. Surg Oncol. 32, 1093-1096 (2006). 25. Bueso-Ramos,C. et al. Protein expression of a triad of frequently methylated genes, p73, p57Kip2, and p15, has prognostic value in adult acute lymphocytic leukemia independently of its methylation status. J. Clin. Oncol. 23, 3932-3939 (2005). 26. Zellweger,T. et al. Tissue microarray analysis reveals prognostic significance of syndecan-1 expression in prostate cancer. Prostate 55, 20-29 (2003). 27. Kroger,N. et al. Prognostic and predictive effects of immunohistochemical factors in high-risk primary breast cancer patients. Clin. Cancer Res. 12, 159168 (2006). 87 28. Allred,D.C., Harvey,J.M., Berardo,M., & Clark,G.M. Prognostic and predictive factors in breast cancer by immunohistochemical analysis. Mod. Pathol. 11, 155-168 (1998). 29. Chang,J. et al. Biologic markers as predictors of clinical outcome from systemic therapy for primary operable breast cancer. J. Clin. Oncol. 17, 30583063 (1999). 30. Fitzgibbons,P.L. et al. Prognostic factors in breast cancer. College of American Pathologists Consensus Statement 1999. Arch. Pathol. Lab Med. 124, 966-978 (2000). 31. Hamilton,A. & Piccart,M. The contribution of molecular markers to the prediction of response in the treatment of breast cancer: a review of the literature on HER-2, p53 and BCL-2. Ann Oncol. 11, 647-663 (2000). 32. Bergh,J., Norberg,T., Sjogren,S., Lindgren,A., & Holmberg,L. Complete sequencing of the p53 gene provides prognostic information in breast cancer patients, particularly in relation to adjuvant systemic therapy and radiotherapy. Nat. Med. 1, 1029-1034 (1995). 33. Kirsch,D.G. & Kastan,M.B. Tumor-suppressor p53: implications for tumor development and prognosis. J. Clin. Oncol. 16, 3158-3168 (1998). 88 34. Rudin,C.M. & Thompson,C.B. Apoptosis and cancer in The Genetic Basis of Human Cancer (eds. Vogelstein,B. & Kinzler,K.W.) 193-204 (The McGrawHill companies, Inc, New York, 1998). 35. Elledge,R.M. & Allred,D.C. Prognostic and predictive value of p53 and p21 in breast cancer. Breast Cancer Res. Treat. 52, 79-98 (1998). 36. Nieto,Y. et al. Evaluation of the predictive value of Her-2/neu overexpression and p53 mutations in high-risk primary breast cancer patients treated with high-dose chemotherapy and autologous stem-cell transplantation. J. Clin. Oncol. 18, 2070-2080 (2000). 37. Thor,A.D. et al. erbB-2, p53, and efficacy of adjuvant therapy in lymph nodepositive breast cancer. J. Natl. Cancer Inst. 90, 1346-1360 (1998). 38. Somlo,G. et al. Predictors of long-term outcome following high-dose chemotherapy in high-risk primary breast cancer. Br. J. Cancer 87, 281-288 (2002). 39. Hudziak,R.M., Schlessinger,J., & Ullrich,A. Increased expression of the putative growth factor receptor p185HER2 causes transformation and tumorigenesis of NIH 3T3 cells. Proc. Natl. Acad. Sci. U. S. A 84, 7159-7163 (1987). 89 40. Aguilar,Z. et al. Biologic effects of heregulin/neu differentiation factor on normal and malignant human breast and ovarian epithelial cells. Oncogene 18, 6050-6062 (1999). 41. Ignatoski,K.M. et al. ERBB-2 overexpression confers PI 3' kinase-dependent invasion capacity on human mammary epithelial cells. Br. J. Cancer 82, 666674 (2000). 42. Spencer,K.S., Graus-Porta,D., Leng,J., Hynes,N.E., & Klemke,R.L. ErbB2 is necessary for induction of carcinoma cell invasion by ErbB family receptor tyrosine kinases. J. Cell Biol. 148, 385-397 (2000). 43. Muss,H.B. et al. c-erbB-2 expression and response to adjuvant therapy in women with node-positive early breast cancer. N. Engl. J. Med. 330, 12601266 (1994). 44. Paik,S. et al. HER2 and choice of adjuvant chemotherapy for invasive breast cancer: National Surgical Adjuvant Breast and Bowel Project Protocol B-15. J. Natl. Cancer Inst. 92, 1991-1998 (2000). 45. Pritchard,K.I. et al. HER2 and responsiveness of breast cancer to adjuvant chemotherapy. N. Engl. J. Med. 354, 2103-2111 (2006). 90 46. Arpino,G. et al. Predictive value of apoptosis, proliferation, HER-2, and topoisomerase IIalpha for anthracycline chemotherapy in locally advanced breast cancer. Breast Cancer Res. Treat. 92, 69-75 (2005). 47. Willsher,P.C. et al. C-erbB2 expression predicts response to preoperative chemotherapy for locally advanced breast cancer. Anticancer Res. 18, 36953698 (1998). 48. Fritz,P. et al. c-erbB2 and topoisomerase IIalpha protein expression independently predict poor survival in primary human breast cancer: a retrospective study. Breast Cancer Res. 7, R374-R384 (2005). 49. Houston,S.J. et al. Overexpression of c-erbB2 is an independent marker of resistance to endocrine therapy in advanced breast cancer. Br. J. Cancer 79, 1220-1226 (1999). 50. Jarvinen,T.A., Holli,K., Kuukasjarvi,T., & Isola,J.J. Predictive value of topoisomerase IIalpha and other prognostic factors for epirubicin chemotherapy in advanced breast cancer. Br. J. Cancer 77, 2267-2273 (1998). 51. Ivanov,S. et al. Expression of hypoxia-inducible cell-surface transmembrane carbonic anhydrases in human cancer. Am. J. Pathol. 158, 905-919 (2001). 52. Wykoff,C.C., Pugh,C.W., Maxwell,P.H., Harris,A.L., & Ratcliffe,P.J. Identification of novel hypoxia dependent and independent target genes of the 91 von Hippel-Lindau (VHL) tumour suppressor by mRNA differential expression profiling. Oncogene 19, 6297-6305 (2000). 53. Pastorek,J. et al. Cloning and characterization of MN, a human tumorassociated protein with a domain homologous to carbonic anhydrase and a putative helix-loop-helix DNA binding segment. Oncogene 9, 2877-2888 (1994). 54. Potter,C.P. & Harris,A.L. Diagnostic, prognostic and therapeutic implications of carbonic anhydrases in cancer. Br. J. Cancer 89, 2-7 (2003). 55. Bui,M.H. et al. Carbonic anhydrase IX is an independent predictor of survival in advanced renal clear cell carcinoma: implications for prognosis and therapy. Clin. Cancer Res. 9, 802-811 (2003). 56. Atkins,M. et al. Carbonic anhydrase IX expression predicts outcome of interleukin 2 therapy for renal cancer. Clin. Cancer Res. 11, 3714-3721 (2005). 57. Generali,D. et al. Role of carbonic anhydrase IX expression in prediction of the efficacy and outcome of primary epirubicin/tamoxifen therapy for breast cancer. Endocr. Relat Cancer 13, 921-930 (2006). 92 58. Holmes,D.R., Wester,W., Thompson,R.W., & Reilly,J.M. Prostaglandin E2 synthesis and cyclooxygenase expression in abdominal aortic aneurysms. J. Vasc. Surg 25, 810-815 (1997). 59. Topley,N. et al. Human peritoneal mesothelial cell prostaglandin synthesis: induction of cyclooxygenase mRNA by peritoneal macrophage-derived cytokines. Kidney Int. 46, 900-909 (1994). 60. Kulkarni,P.S. Synthesis of cyclooxygenase products by human anterior uvea from cyclic prostaglandin endoperoxide (PGH2). Exp. Eye Res. 32, 197-204 (1981). 61. Kundu,N., Yang,Q., Dorsey,R., & Fulton,A.M. Increased cyclooxygenase-2 (cox-2) expression and activity in a murine model of metastatic breast cancer. Int. J. Cancer 93, 681-686 (2001). 62. Gilhooly,E.M. & Rose,D.P. The association between a mutated ras gene and cyclooxygenase-2 expression in human breast cancer cell lines. Int. J. Oncol. 15, 267-270 (1999). 63. Liu,X.H. & Rose,D.P. Differential expression and regulation of cyclooxygenase-1 and -2 in two human breast cancer cell lines. Cancer Res. 56, 5125-5127 (1996). 93 64. Zerkowski,M.P., Camp,R.L., Burtness,B.A., Rimm,D.L., & Chung,G.G. Quantitative analysis of breast cancer tissue microarrays shows high cox-2 expression is associated with poor outcome. Cancer Invest 25, 19-26 (2007). 65. Mohammad,A.M. et al. Expression of cyclooxygenase-2 and 12-lipoxygenase in human breast cancer and their relationship with HER-2/neu and hormonal receptors: impact on prognosis and therapy. Indian J. Cancer 43, 163-168 (2006). 66. Grudzinski,M. et al. [Cox-2 and CD105 expression in breast cancer and disease-free survival]. Rev. Assoc. Med. Bras. 52, 275-280 (2006). 67. Park,K., Han,S., Shin,E., Kim,H.J., & Kim,J.Y. Cox-2 expression on tissue microarray of breast cancer. Eur. J. Surg Oncol. 32, 1093-1096 (2006). 68. Trougakos,I.P., So,A., Jansen,B., Gleave,M.E., & Gonos,E.S. Silencing expression of the clusterin/apolipoprotein j gene in human cancer cells using small interfering RNA induces spontaneous apoptosis, reduced growth ability, and cell sensitization to genotoxic and oxidative stress. Cancer Res. 64, 18341842 (2004). 69. Trougakos,I.P. & Gonos,E.S. Clusterin/apolipoprotein J in human aging and cancer. Int. J. Biochem. Cell Biol. 34, 1430-1448 (2002). 94 70. Pucci,S., Bonanno,E., Pichiorri,F., Angeloni,C., & Spagnoli,L.G. Modulation of different clusterin isoforms in human colon tumorigenesis. Oncogene 23, 2298-2304 (2004). 71. Kruger,S., Ola,V., Fischer,D., Feller,A.C., & Friedrich,M. Prognostic significance of clusterin immunoreactivity in breast cancer. Neoplasma 54, 46-50 (2007). 72. Zhang,S. et al. Clusterin expression and univariate analysis of overall survival in human breast cancer. Technol. Cancer Res. Treat. 5, 573-578 (2006). 73. Shannan,B. et al. Challenge and promise: roles for clusterin in pathogenesis, progression and therapy of cancer. Cell Death. Differ. 13, 12-19 (2006). 74. Redondo,M. et al. Overexpression of clusterin in human breast carcinoma. Am. J. Pathol. 157, 393-399 (2000). 75. Nicholson,K.M. & Anderson,N.G. The protein kinase B/Akt signalling pathway in human malignancy. Cell Signal. 14, 381-395 (2002). 76. Schmitz,K.J. et al. Prognostic relevance of activated Akt kinase in nodenegative breast cancer: a clinicopathological study of 99 cases. Mod. Pathol. 17, 15-21 (2004). 95 77. Tokunaga,E. et al. Akt is frequently activated in HER2/neu-positive breast cancers and associated with poor prognosis among hormone-treated patients. Int. J. Cancer 118, 284-289 (2006). 78. Perez-Tenorio,G. & Stal,O. Activation of AKT/PKB in breast cancer predicts a worse outcome among endocrine treated patients. Br. J. Cancer 86, 540-545 (2002). 79. Clark,A.S., West,K., Streicher,S., & Dennis,P.A. Constitutive and inducible Akt activity promotes resistance to chemotherapy, trastuzumab, or tamoxifen in breast cancer cells. Mol. Cancer Ther. 1, 707-717 (2002). 80. Stal,O. et al. Akt kinases in breast cancer and the results of adjuvant therapy. Breast Cancer Res. 5, R37-R44 (2003). 81. McNagny,K.M. et al. Thrombomucin, a novel cell surface protein that defines thrombocytes and multipotent hematopoietic progenitors. J. Cell Biol. 138, 1395-1407 (1997). 82. Kershaw,D.B. et al. Molecular cloning, expression, and characterization of podocalyxin-like protein 1 from rabbit as a transmembrane protein of glomerular podocytes and vascular endothelium. J. Biol. Chem. 270, 2943929446 (1995). 96 83. Kerjaschki,D., Sharkey,D.J., & Farquhar,M.G. Identification and characterization of podocalyxin--the major sialoprotein of the renal glomerular epithelial cell. J. Cell Biol. 98, 1591-1596 (1984). 84. Somasiri,A. et al. Overexpression of the anti-adhesin podocalyxin is an independent predictor of breast cancer progression. Cancer Res. 64, 50685073 (2004). 85. Arriola,E. et al. Topoisomerase II alpha amplification may predict benefit from adjuvant anthracyclines in HER2 positive early breast cancer. Breast Cancer Res. Treat. 106, 181-189 (2007). 86. Tanner,M. et al. Topoisomerase IIalpha gene amplification predicts favorable treatment response to tailored and dose-escalated anthracycline-based adjuvant chemotherapy in HER-2/neu-amplified breast cancer: Scandinavian Breast Group Trial 9401. J. Clin. Oncol. 24, 2428-2436 (2006). 87. Knoop,A.S. et al. retrospective analysis of topoisomerase IIa amplifications and deletions as predictive markers in primary breast cancer patients randomly assigned to cyclophosphamide, methotrexate, and fluorouracil or cyclophosphamide, epirubicin, and fluorouracil: Danish Breast Cancer Cooperative Group. J. Clin. Oncol. 23, 7483-7490 (2005). 88. Di,L.A. et al. HER-2 amplification and topoisomerase IIalpha gene aberrations as predictive markers in node-positive breast cancer patients 97 randomly treated either with an anthracycline-based therapy or with cyclophosphamide, methotrexate, and 5-fluorouracil. Clin. Cancer Res. 8, 1107-1116 (2002). 89. Duffy,M.J. Predictive markers in breast and other cancers: a review. Clin. Chem. 51, 494-503 (2005). 90. Vendrell,E., Morales,C., Risques,R.A., Capella,G., & Peinado,M.A. Genomic determinants of prognosis in colorectal cancer. Cancer Lett. 221, 1-9 (2005). 91. Savage,S.A. & Chanock,S.J. Using germ-line genetic variation to investigate and treat cancer. Drug Discov. Today 9, 610-618 (2004). 92. Cecilia Ritz. Clinical markers are by no means outperformed by gene expression profiles in breast cancer prognosis. 2004. 93. Eden,P., Ritz,C., Rose,C., Ferno,M., & Peterson,C. "Good Old" clinical markers have similar power in breast cancer prognosis as microarray gene expression profilers. Eur. J. Cancer 40, 1837-1841 (2004). 94. Nimeus-Malmstrom,E. et al. Gene expression profilers and conventional clinical markers to predict distant recurrences for premenopausal breast cancer patients after adjuvant chemotherapy. Eur. J. Cancer 42, 2729-2737 (2006). 98 95. Shedden,K.A. et al. Accurate molecular classification of human cancers based on gene expression using a simple classifier with a pathological tree-based framework. Am. J. Pathol. 163, 1985-1995 (2003). 96. Li,L. et al. Integration of clinical information and gene expression profiles for prediction of chemo-response for ovarian cancer. Conf. Proc. IEEE Eng Med. Biol. Soc. 5, 4818-4821 (2005). 97. Nevins,J.R. et al. Towards integrated clinico-genomic models for personalized medicine: combining gene expression signatures and clinical factors in breast cancer outcomes prediction. Hum. Mol. Genet. 12 Spec No 2, R153-R157 (2003). 98. Cox,D.R. Regression models and life tables. Journal of theRoyal Statistical Society B[34], 187-202. 1972. 99. Lunn,M. & McNeil,D. Applying Cox regression to competing risks. Biometrics 51, 524-532 (1995). 100. Galea,M.H., Blamey,R.W., Elston,C.E., & Ellis,I.O. The Nottingham Prognostic Index in primary breast cancer. Breast Cancer Res. Treat. 22, 207219 (1992). 101. Mitchell,T. Machine Learning(McGraw Hill, New York, 1997). 99 102. Cruz,J.A. & Wishart,D.S. Applications of Machine Learning in Cancer Predictionand Prognosis. Cancer Informatics 2, 59-78. 2006. 103. Cortes,C. & Vapnik,V. Support-vector networks. Machine Learning 20, 273297. 1995. 104. Duda,R.O., Hart,P.E., & Stork,D.G. Pattern classification (Wiley, New York, 2001). 105. Lee,Y.J., Mangasarian,O.L., & Wolberg,W.H. Breast cancer survival and chemotherapy: A support vector machine analysis. DIMACS Series in Discrete Mathematics and Theoretical Computer Science 55, 1-20 (2000). 106. ftp://ftp.esat.kuleuven.ac.be/sista/vanbelle/reports/07-70.pdf. 2008. 107. ftp://ftp.esat.kuleuven.ac.be/sista/vanbelle/reports/07-213.pdf. 2008. 108. Ben-Dor,A. et al. Tissue classification with gene expression profiles. J. Comput. Biol. 7, 559-583 (2000). 109. Liu,H., Li,J., & Wong,L. Use of extreme patient samples for outcome prediction from gene expression data. Bioinformatics. 21, 3377-3384 (2005). 110. Hausmann,B., Hach,H., Voigt,B., & Simon,R. [Echocardiography online quantification of left and right ventricular function by automatic boundary detection: reference values and reproducibility in healthy probands]. Z. Kardiol. 83, 556-561 (1994). 100 111. Simon,R. Development and validation of biomarker classifiers for treatment selection. Journal of Statistical Planning and Inference 138, 308-320 (2008). 112. Simon,R. Validation of pharmacogenomic biomarker classifiers for treatment selection. Cancer Biomark. 2, 89-96 (2006). 113. Levine,M.N. et al. Randomized trial of intensive cyclophosphamide, epirubicin, and fluorouracil chemotherapy compared with cyclophosphamide, methotrexate, and fluorouracil in premenopausal women with node-positive breast cancer. National Cancer Institute of Canada Clinical Trials Group. J. Clin. Oncol. 16, 2651-2658 (1998). 114. Levine,M.N. et al. Randomized trial comparing cyclophosphamide, epirubicin, and fluorouracil with cyclophosphamide, methotrexate, and fluorouracil in premenopausal women with node-positive breast cancer: update of National Cancer Institute of Canada Clinical Trials Group Trial MA5. J. Clin. Oncol. 23, 5166-5170 (2005). 115. Furberg,B.D. & Furberg,C.D. Evaluating Clinical Research: All that glitters is not gold (Springer,2007). 101