Download COMBINING BIOMARKERS AND CLINICOPATHOLOGIC FACTORS FOR PREDICTION OF RESPONSE TO ADJUVANT

Document related concepts
no text concepts found
Transcript
COMBINING BIOMARKERS AND CLINICOPATHOLOGIC
FACTORS FOR PREDICTION OF RESPONSE TO ADJUVANT
CHEMOTHERAPY FOR BREAST CANCER: COX MODEL AND
SUPPORT VECTOR MACHINE (SVM) METHODS
by
Xudong Liu
A thesis submitted to the Department of Community Health and Epidemiology
In conformity with the requirements for
the degree of Master of Science
Queen’s University
Kingston, Ontario, Canada
April, 2010
Copyright ©Xudong Liu, 2010
Abstract
Background: Breast cancer is a complex disease, both phenotypically and etiologically.
Accordingly, the responses to various treatments in the adjuvant setting among
individuals vary considerably. There is a demand for tools that can distinguish patients
who may benefit or may suffer from particular systemic treatments. We hypothesized that
combination of data on genetic biomarkers with data from traditional clinical and
pathophysiological (clinicopathologic) factors using traditional Cox model or Support
Vector Machine (SVM) method, a new machine learning method, may provide a better
tool for prediction of benefits to chemotherapy for the treatment of early breast cancer
than using either biomarker or clinicopathologic data alone.
Methods: This project included 531 patients from NCIC-CTG MA.5 trial who had data
on both clinicopathologic factors, such as age, tumor size, ER status, type of surgery,
tumor grade and lymph node involvement, and biomarkers assayed on tissue microarrays
(TMAs), including HER2, p53, CA9, MEP21, clusterin, pAKT, COX2 and TOP2A. The
Cox model and SVM methods were used to develop prognostic indices for relapse-free or
overall survival with either data from TMAs and clinicopathologic assessments alone or
their combination. The prognostic indices developed were then examined for their value
as predictive classifiers for benefits from CEF treatment. The power of the predictive
classifiers derived was evaluated and compared using the bootstrap approach.
Results: None of the prognostic indices developed were found to have significant
predictive value, although the prognostic index developed using SVM method based on
i
only biomarkers yielded a marginal significant p-value (p=0.0527) for the interaction
between classifier and treatment. In accordance with results published previously, the
interaction between the classifier developed based on HER2 or TOP2A and treatment
was significant (p=0.02 and 0.04 respectively). Comparisons based on the bootstrap
approach indicate classifiers developed based on SVM performed better than those based
on the Cox model method.
Conclusions: Combination of data using biomarkers and clinical-pathological factors,
and using either the traditional COX model method or the new machine learning method
was not shown to perform better than two single previously known biomarkers in
prediction of response to CEF treatment for early breast cancer.
ii
Acknowledgements
First, I would like to express my sincerest gratitude to my primary supervisor, Dr.
Dongsheng Tu, for his encouragement, guidance, assistance, motivation, enthusiasm, and
most of all, patience throughout the development and execution of this research. Your
faith in me and your expertise and guidance through the challenges of this project have
been invaluable. Secondly, I would also like to sincerely thank my secondary supervisor
Dr. Lois Shepherd for her valuable advice from medical and oncologic prospective
during the development of my thesis project and for her important feedback in
preparation of my thesis proposal and final thesis.
My sincere thanks extend to the NCIC Clinical Trials Group for allowing me to
use their data for this project.
I would like to acknowledge the faculty and staff in the Department of
Community Health and Epidemiology for providing such a friendly and supportive
learning environment. Special thanks to Lee Watkins and Katherine Cook for their
patient assistance.
I also greatly appreciate the generous support from my mentor Dr. Jeanette
Holden, whose encouragement and consideration is important for me to start and fulfill
this program.
Last but not least, I would like to thank my dearest son Kevin whose love and
amusement constantly keeps my heart warm and spirit high; to my wife Jianhua who has
been a cheerful and loving supporter and encourager in my life; to my parents whose love
and encouragement keeps supporting me spiritually throughout my life.
iii
Table of Contents
Abstract ................................................................................................................................ i
Acknowledgements............................................................................................................ iii
Table of Contents............................................................................................................... iv
List of Figures .................................................................................................................... vi
List of Tables .................................................................................................................... vii
Abbreviations................................................................................................................... viii
Chapter 1 Introduction ........................................................................................................ 1
1.1 Overview ................................................................................................................... 1
1.2 Breast Cancer and Its Treatment ............................................................................... 3
1.3 The Need to Identify Optimal Treatment for Women with Breast Cancer............... 4
1.4 Prognostic and Predictive Factors of Breast Cancer ................................................. 6
1.4.1 Traditional clinical and pathologic (clinicopathologic) factors.......................... 8
1.4.2 Biomarkers from Tissue Microarrays (TMAs) Assays ...................................... 9
1.5 Need for Combining Clinicopathologic Characteristics and Molecular Biomarkers
....................................................................................................................................... 14
1.6 Objectives of the Project ......................................................................................... 15
Chapter 2 A Review of Statistical Techniques ................................................................. 17
2.1 Traditional Cox model for censored survival data (COX model method).............. 17
2.2 Support Vector Machine (SVM) Method................................................................ 18
2.3 Verification, validation and comparison of predictive classifiers........................... 23
Chapter 3 Patients and Methods ....................................................................................... 25
3.1 NCIC CTG MA.5 Clinical Trial ............................................................................. 25
3.2 Data ......................................................................................................................... 26
3.3 Analysis strategies................................................................................................... 27
3.3.1 Analysis of Prognostic and Predictive Values of Each Individual Variable .... 27
3.3.2 Development of Prognostic Indices.................................................................. 28
3.3.2.1 Cox Model Method .................................................................................... 28
3.3.2.2 Support Vector Machine (SVM) method................................................... 29
iv
3.3.3 Verification for the Prognostic Indices as Predictive Classifier for the
Treatment................................................................................................................... 30
3.3.4 Validation of Predictive Classifier using Bootstrap Methods .......................... 31
Chapter 4 Results .............................................................................................................. 32
4.1 Data Sets.................................................................................................................. 32
4.2 Summary of Baseline Characteristics ..................................................................... 32
4.3 Prognostic and Predictive Values of Each Individual Variable .............................. 35
4.4 Prognostic Indices ................................................................................................... 38
4.4.1 Classification Rules based on COX Model Method......................................... 38
4.4.2 Classification Rules based on SVM Method.................................................... 42
4.5 Prognostic Indices as Predictive Classifiers............................................................ 45
4.5.1 Treatment Effects in High and Low Risk Groups Defined by Prognostic Indices
................................................................................................................................... 45
4.5.2 Hazard Ratio and Confidence Interval for Interaction between Treatment and
Prognostic Index ........................................................................................................ 71
4.6 Validation of Predictive Classifier using Bootstrap Methods................................. 74
Chapter 5 Conclusions and Discussion............................................................................. 79
References......................................................................................................................... 84
v
List of Figures
Figure 1: Recurrence-free Survival Curves by COX Model and SVM Approach –
Index 1…………………………………………………………….……….….51
Figure 2: Recurrence-free Survival Curves by COX Model and SVM Approach –
Index 2…………………………..………………………………………….…53
Figure 3: Recurrence-free Survival Curves by COX Model and SVM Approach –
Index 3………………………………………………………….………….….55
Figure 4: Overall Survival Curves by COX Model and SVM Approach – Index 1….…57
Figure 5: Overall Survival Curves by COX Model and SVM Approach – Index 2….…59
Figure 6: Overall Survival Curves by COX Model and SVM Approach – Index 3….…61
Figure 7: Recurrence free survival curves by COX model and SVM approach –
HER2/TOP2A…………………………………………………………………63
Figure 8: Overall survival curves by COX model and SVM approach –
HER2/TOP2A…………………………………………………………………65
Figure 9: Recurrence free and overall survival curves by COX model – HER2………..67
Figure 10: Recurrence free and overall survival curves by COX model – TOP2A……..69
vi
List of Tables
Table 1: Baseline Characteristics of Patients with both Clinical and Biomarker
Available………………………………………………………………......…..34
Table 2:
Prognostic and Predictive Factors (P-values from COX model)……….…….37
Table 3.1: Summary of Three Indices in Recurrence-free Survival by COX model….....40
Table 3.2: Summary of Three Indices in Recurrence-free Survival by SVM…………....41
Table 4.1: Summary of Classification Results in Recurrence-free Survival by SVM...…43
Table 4.2: Summary of Classification Results in Overall Survival by SVM…………….44
Table 5.1: Summary of Treatment Effect for Poor Prognostic Groups
(Recurrence-free Survival)………………………………………….….….….47
Table 5.2: Summary of Treatment Effect for Good Prognostic Groups
(Recurrence-free Survival)………………………………………….….….….48
Table 5.3: Summary of Treatment Effect for Poor Prognostic Groups
(Overall Survival)……………………………..…………………….…….…..49
Table 5.4: Summary of Treatment Effect for Good Prognostic Groups
(Overall Survival)……………….……………………………………..….…..50
Table 5.5: Summary of Prognostic and Predictive Indices (Recurrence-free Survival)…72
Table 5.6: Summary of Prognostic and Predictive Indices (Overall Survival)…………..73
Table 6.1: Summary of Proportion Interaction Term with Significant p-value by
Bootstrap Method (Recurrence-free Survival)……………………………….75
Table 6.2: Summary of Proportion Interaction Term with Significant p-value by
Bootstrap Method (Overall Survival)………………………………………..76
Table 7.1: Summary of Hazard Ratio and CI by Bootstrap Method (Recurrence-free
Survival)……………………………………………………………….….…..77
Table 7.2: Summary of Hazard Ratio and CI by Bootstrap Method (Overall Survival)...78
vii
Abbreviations
AMP
CA9
cDNA
CEF
CLU
CMF
COX-2
COX-PH
DNA
ER
HER2
MEP21 (PODO)
mRNA
NCIC-CTG
p53
pAKT
SD
SVM
TMA
TOP2A
Amplification
Carbonic anhydrase IX (CAIX or CA9)
Complementary Deoxyribonucleic acid
Cyclophosphamide, Epirubicin, and S-Fluorouracil
Clusterin
Cyclophosphamide, Methotrexate, and S-Fluorouracil
Cyclooxygenases -2
COX proportional hazards model
Deoxyribonucleic acid
Estrogen receptor
The human epidermal growth factor receptor type 2
Podocalyxin
Messenger RNA
Clinical Trials Group
Tumor protein 53
phosphorylated Akt
Standard Deviation
Support Vector Machine
Tissue Microarrays Assays
Topoisomerase II alpha gene
viii
Chapter 1
Introduction
1.1 Overview
One of the biggest challenges for basic life science and clinical research, clinical
practice, and the pharmaceutical industry in the 21st century is to develop and deliver
treatments for disease that are tailored to an individual patient's biology and
pathophysiology. This is especially important for complex diseases, such as breast cancer.
Breast cancer is a complex disease, both phenotypically and etiologically1.
Accordingly, the response to various treatments among individuals varies considerably.
Adjuvant therapies which include chemo and hormonal therapies have been proven to
substantially improve disease-free and overall survival in women with breast cancer, are
effective forms of treatment after surgery and widely used for the treatment of breast
cancer following the primary surgery2, 3. Yet, all who are involved in the medical
management of breast cancer patients know that the same treatment does not work in the
same way in each patient. Often, only a proportion of patients will have delayed disease
recurrence after treatment with a particular drug, and among these patients, only a
proportion of patients will continue to remain free of disease. Also different patients may
suffer from different side effect of the same drug2,
4, 5, 6
. Thus there is a clinical demand
for tools that could distinguish patients who may benefit or suffer from particular
systemic treatments. The ultimate goal is to administer the adjuvant systemic therapies in
a ‘tailored” or “personalized” fashion and cost-effective way that offers the most benefit
and fewest side effects to the breast cancer patients.
1
Patients who remain disease free for a long period of time after treatment may be
considered as having responded well to the treatment. Response prediction and efficient
administration of various types of treatment is a huge challenge for both clinicians and
basic researchers. The selection of adjuvant therapies for breast cancer patients has been
primarily directed by the long-term results of clinical outcome studies from previous
clinical trials. The treatment guidelines and evidence-based decision-making tools have
long been based on clinical and pathological features of the patients and are far from
being satisfactory and accurate7. Recently, classical pharmacogenetic studies identified a
limited number of genetic candidate predictive markers which would be used to predict
response to certain treatment.
These limited markers often failed to reflect the
complexity of the treatment response. The development of genomic technology, such as
gene expression arrays, for the analysis of human tumor samples has provided a
comprehensive approach and enhanced our understanding of the complexity of the
biology of breast cancer. This technology has resulted in the development of genomic
profiles that more accurately assess the risk of recurrence and predict clinical outcomes
from particular treatments in this disease.
Incorporating genetic information, such as gene expression signatures, with
clinical, pathophysiological, and other risk factors for breast cancer may be better than
any of them alone for evaluating the prognosis and predicting response to chemotherapy,
yet few studies have examined the value in integrating new genomic information with the
traditional factors to provide a more detailed assessment of clinical risk and an improved
prediction of response to therapy. The work described in this thesis explores this
2
integration approach using the NCIC CTG MA.5 study, a randomized trial comparing the
efficacy of Cyclophosphamide, Methotrexate, and S-Fluorouracil (CMF) and the more
dose intensive Cyclophosphamide, Epirubicin, and S-Fluorouracil (CEF) for the
treatment of early breast cancer in node positive pre and perimenopausal women.
1.2 Breast Cancer and Its Treatment
The incidence of breast cancer and mortality internationally has been stable or has
declined over the past decades8, 9. More women with breast cancer are surviving longer
than in the past. In Canada, Canadian Cancer Statistics 2007 shows the breast cancer
death rate among Canadian women has dropped by 25 per cent in the last two decades
and that 86 per cent of women diagnosed with the disease are surviving at least five
years10. Yet, breast cancer remains the most common cancer occurring in women
worldwide and it is the leading cancer-related cause of death for women in Canada10 and
in other industrialized countries9, 11-13. In Canada, more than 22,000 women will be newly
diagnosed with breast cancer every year; breast cancer kills more than 5,000 Canadian
women every year, more than any other type of cancer except lung cancer; one in nine
Canadian women will be diagnosed with and 1 in 27 will die of breast cancer in their
lifetime; it accounts for an estimated 95,300 potential years of life lost10.
Women diagnosed with early-stage breast cancer often undergo surgery for
removal of the tumor and then, typically, are treated with adjuvant systemic therapy.
Since all detectable cancer in these women with ‘early’ breast cancer is restricted to the
breast and, in the case of node-positive patients, the local axilliary lymph nodes, it can be
removed surgically. However, undetected micro-metastatic deposits of the disease may
3
remain and, perhaps after a delay of several years, develop into a clinically detectable
recurrence that eventually causes death. The aim of adjuvant systemic therapy is to
eradicate these micro-metastases. Adjuvant systemic therapies, including chemo and
hormonal therapies, have been the most effective approach to treatment after surgery and
radiotherapy if appropriate. Therefore they are widely used for the treatment of breast
cancer following the primary surgery and have been proven to substantially improve
disease-free and overall survival in both pre-menopausal and postmenopausal women
with breast cancer2, 3. It is largely due to early detection and advances in adjuvant therapy,
such as the improved management exemplified by the introduction of screening programs
and new treatment regimes, that breast cancer mortality rates are now declining.
1.3 The Need to Identify Optimal Treatment for Women with Breast Cancer
Although adjuvant therapy is useful in the treatment of women with early breast
cancer, the absolute survival gain with the postoperative adjuvant therapy has been
modest (5~10%); a substantial proportion of patients continue to relapse with their
disease; and all of the therapies have mild to severe toxicity4. Many women with early
breast cancer may be cured with surgery alone or may not benefit from adjuvant systemic
therapy.
Breast cancer is a complex and heterogeneous disease, phenotypically as well as
genotypically1. There exist wide-ranging subsets of breast cancer patients who have
different prognoses and who respond differently to treatments. Therefore the
identification of patients who need treatment and the definition of the best therapy for an
individual have become one of the priorities in breast cancer care and research.
4
Breast cancer patients with the same stage of disease can have markedly different
treatment responses and overall outcome after the treatment. Systemic adjuvant therapy,
chemotherapy or hormonal therapy, reduces the risk of distant metastasis by
approximately one-third; however, many patients receiving adjuvant systemic therapy
would have survived without it2, 3, while many women receiving adjuvant systemic
therapy will still develop widespread or metastatic disease. Even poor prognosis patients
receiving no adjuvant therapy have only a 50%–60% risk for developing distant
metastases, indicating that 40%–50% of these patients will remain free of disease without
any adjuvant systemic therapy2,
5, 6
. Studies have shown a large proportion of women do
not benefit from adjuvant systemic therapy as some relapse despite adjuvant therapy and
others are never destined to relapse4. All these reflect the fact that breast cancer is a
heterogeneous disease encompassing different biological mechanisms with distinct
prognoses, unavoidably raising the question about which patients will benefit from
treatment, and what treatment should be given. There is currently a great variety in the
adjuvant therapy prescribed for women with breast cancer, with the consequence that
some women are likely being under-treated whereas others are being over-treated.
Existing chemotherapies have in general severe side effects for patients, and they may
also be very expensive. The severity of possible side effects of aggressive
chemotherapies cannot be overemphasized. Some patients may die as a result of the
treatment, not of the disease.
Therefore, breast cancer patients and their clinicians need better information to
determine which patients are most likely to benefit from adjuvant therapy, which are very
5
unlikely to benefit, and which particular type of adjuvant therapies are most appropriate
for individual women.
With this information, optimal and personalized adjuvant
systemic therapy could be administered in a cost-effective way offering the most benefit
and fewest side effects to the breast cancer patients. This would be beneficial to both
individual patients and society in general.
1.4 Prognostic and Predictive Factors of Breast Cancer
In early breast cancer, as with other tumors, the first step towards individualized
therapy is to distinguish those patients who need treatment from those who do not. The
second step is to tailor treatment to the characteristics of the specific tumor, given that
certain cancers respond better to certain regimens or individual drugs than do others. In
general, breast cancer patients with a poor prognosis for survival benefit the most from
adjuvant chemotherapy14, 15. In fact, the decision to administer adjuvant therapy for breast
cancer has been mainly based on prognostic features.
It is critical to identify reliable factors that are associated with clinical outcome
regardless of treatment. These are called prognostic factors. The factors that can identify
patients more or less likely to benefit from the therapy are called predictive factors. The
terms prognostic and predictive factors are often imprecisely defined. A prognostic factor
is usually defined as a patient characteristic that identifies subgroups of untreated patients
having different outcomes, and a factor predictive of treatment effect as a patient
characteristic that identifies subgroups of treated patients having different (as a
consequence of treatment) outcomes16-18. The prognostic factors are measurements at
diagnosis associated with outcome (i.e., recurrence rate, death rate, and other clinical
6
outcomes), while predictive factors are associated with the degree of response to specific
therapy.
A prognostic factor is most often, but not necessarily, also predictive of the effect
of specific treatments. Whether a prognostic factor is also predictive of treatment effect
or not can only be assessed in a valid comparative setting such as in a randomized trial. A
factor that is predictive for one treatment effect may not be predictive for another
treatment. Also, if a factor predicts (for a specific treatment) the effect on one outcome, it
may or may not predict the effect for another outcome16. Good prognostic and predictive
factors need to be accurate and reproducible, and can help distinguish between patients at
high risk and those at low risk for developing recurrent disease.
Sometimes, a prognostic or predictive factor may be constructed from several
different types of measurements at the diagnosis of breast cancer.
In this case, a
prognostic factor consisting of multiple variables is often called a prognostic index (or
signature) and a predictive factor as a predictive classifier. The primary objective of
this thesis project is to identify predictive classifiers from data from NCIC CTG MA.5
trial. These may include demographic characteristics, clinical characteristics, and
pathological features at diagnosis and some molecular biomarkers of the tumor.
Statistical and machine learning techniques will be used to assess and compare the ability
of these classifiers in the prediction of patients’ response to anthracycline-containing
chemotherapy for early breast cancer. In the following section, I will first provide a
description and review of some traditional clinical factors and pathologic characteristics
7
of breast cancer tumors and specific molecular biomarkers which will be included in this
thesis project.
1.4.1 Traditional clinical and pathologic (clinicopathologic) factors
There is a vast literature which has investigated prognostic effects of many
demographic and tumor characteristics of breast cancer patients. Age, race, tumor size,
histological tumor type, auxiliary nodal status, standardized pathological grade,
lymphatic and vascular invasion, and hormone-receptor (eg. estrogen and/or progesterone)
status are some of the clinical and pathologic factors which have been considered as
having prognostic value in connection with disease recurrence and/or survival. Among
them, only hormone-receptor status is accepted as an established predictive factor for
selection of systemic adjuvant treatment of breast cancer19. The presence of estrogen
and/or progesterone receptors in the primary tumor is the strongest predictor for response
and benefit to tamoxifen therapy for the breast cancer.
Although the above clinicopathologic factors are used currently by clinicians to
inform decision making regarding the need for, and type of, adjuvant chemotherapies,
these conventional “macro-scale” clinical, environmental and behavioral parameters have
only limited predictive power7. The information arising from these conventional “macroscale” clinicopathologic factors is nowhere near as precise and accurate as is needed to
provide robust predictions for the treatment responses. Aside from perhaps the hormone
receptor status, these factors do not appear to reflect or take into account the underlying
biology of breast cancer.
8
1.4.2 Biomarkers from Tissue Microarrays (TMAs) Assays
Recent advances in genome research and technologies have changed the
landscape of biomedical research. New technologies are not only important for
fundamental research, but will also be useful for diagnostic, prognostic or therapeutic
purposes in individual patients. In cancer research, the use of Tissue Microarray (TMA)
and the emerging advanced gene expression profiling technology, such as DNA
Microarray, makes it possible to construct more accurate classifiers for predicting drug
responses in personalized medicine settings using factors more closely related to the
biology of the breast cancer. In a landmark study, Perou et al.20 identified four subtypes
(luminal ER+, basal, HER-2 and normal breast) of breast cancer based on complementary
DNA (cDNA) microarray expression patterns of 8,102 genes and later demonstrated
prognostic values of a revised subtype classification which divided ER+ tumors further
into two subtypes (luminal A and luminal B/C).
While DNA microarrays, which include oligonucleotide (eg. Affymetrix platform)
or complementary DNA (cDNA), allow for high-throughput analyses of the mRNA
expression of thousands of genes simultaneously, and have been leading to the
identification of new genes and pathways with importance in cancer development and
progression, or as targets for new therapies, the validation and prioritization of genes
emerging from genome screening analyses in large series of clinical tumors has become a
new bottleneck in research. Tissue microarray (TMA)21 technology is a histology based
approach that permits the study of the expression of a single gene in hundreds of tissue
samples, and has been developed to further understand the importance of the genes
9
identified on the profiling platforms. TMAs are tissue sections mounted in a slide
containing samples from hundreds of individual tumor specimens. They can be used for
large-scale analysis of gene expression (protein) level using immunohistochemical
staining on hundreds of tumor specimens at a time.
TMAs, coupled with DNA microarrays, provide a powerful approach to identify
large numbers of new candidate genes, and rapidly validate their clinical impact in large
series of human tumors. They have been frequently and successfully used in cancer
diagnosis and detection, and more recently have been applied to cancer prognosis and
prediction22-26.
With TMAs and immunohistochemical analysis, several molecular biomarkers
showed prognostic value in breast cancer regarding outcome irrespective of therapy, and,
more recently, have been shown to be predictive factors for outcome after conventional
or high-dose chemotherapy27-31. In the following, I am going to review some of these
molecular markers which have been studied and analyzed by the Genetic Pathology
Examine Centre, Vancouver General Hospital and are included in my thesis project.
P53
p53 is a tumor suppressor gene and an important factor for cell cycle regulation,
p53 also induces apoptosis after toxic damage32, 33 to the cell. Abrogation of p53 function
results in a more aggressive tumor biology and a worse outcome for patient with breast
cancer34. Considering that many anticancer agents kill cancer cells primarily by inducing
apoptosis and p53 is a key determinant to induce either growth arrest or apoptosis in
response to cytotoxic stress, p53 has been considered as one of the most important
10
candidate prognostic and predictive factor for breast cancer.
The prognostic and
predictive values of p53 has been extensively studied in breast cancer (reviewed by
Elledge35). p53 has consistently been shown to have predictive value for breast cancer
patients treated with standard chemotherapy or high-dose chemotherapy regimens for
recurrent dosease36-38.
HER2
The human epidermal growth factor receptor (HER) family plays a key role in
regulation of mammalian cell survival, proliferation, migration, adhesion, and
differentiation. The HER family of receptor tyrosine kinases comprises four structurally
related transmembrane receptors: HER1 (EGFR EGFR (HER-1), HER-2, HER-3, and
HER-4. All members have an extracellular ligandbinding domain, a transmembrane
domain, and a cytoplasmic tyrosine kinase domain. When responding to ligand specific
binding, these receptors act by forming hetero- or homodimers to initiate tyrosine kinase
activity in the intracellular domain. A number of ligands were identified for HER1 and
HER-3 and -4, yet no direct ligand for HER-2 has yet been discovered. HER-2 was
found to be the preferred heterodimerisation partner for all other HER family members,
suggesting a more central role of HER2 in the HER family due to its the primary function
as a co-receptor. HER2, also known as c-erbB-2, encodes the protein HER2/neu antigen.
Although the signaling pathways activated by HER2 are not completely characterized,
HER2 is clearly established as oncogenic factor associated with a more malignant
phenotype,39 cell proliferation,40 invasion,41 and metastasis.42 Numerous studies 36, 37, 43-50
have established HER2/neu not only as an important independent prognostic factor but
11
also as a reliable predictive marker for projecting response to chemotherapeutics,
antiestrogens, and therapeutic anti-HER2/neu antibodies in breast cancer. It has been
shown that amplification of HER2/neu, overexpression of its product, or both in breast
tumors not only predicts responsiveness to trastuzumab but also identifies patients who
may benefit from high-dose chemotherapy or from anthracycline-containing regimens
(summarized by Pritchard et al, 2006)45.
CA9
Carbonic anhydrase IX (CAIX or CA9) protein is a member of the carbonic
anhydrase family, and is thought to play a role in the regulation of cell proliferation in
response to hypoxic conditions and may be involved in oncogenesis and tumor
progression51-53. CA9 expression has been show to be linked to poor prognosis in a
number of human tumours54. It has been identified as an independent predictor of
survival in advanced renal clear cell carcinoma55, 56, and more recently was shown to
have promising predictive value for the efficacy and outcome of primary
epirubicin/tamoxifen therapy for breast cancer57.
COX-2
COX-2 is a member of the family of Cyclooxygenases (COXs) enzymes that
catalyze the rate-limiting step in the conversion of arachidonic acid to prostaglandins58-60,
which have been implicated in breast carcinogenesis61-63. While multiple studies have
shown prognostic value of COX2 in breast cancer64-66, the evidence for it as predictive
marker has been limited67.
12
Clusterin (CLU)
Clusterin (CLU) is a secretory heterodimeric disulphide-linked glycoprotein
expressed in virtually all tissues and found in all human fluids68, 69. It is involved in
numerous physiological processes important for carcinogenesis and tumor growth,
including apoptotic cell death, cell cycle regulation, DNA repair, cell adhesion, tissue
remodeling and so on68-70. Several studies have shown clusterin was associated with
prognosis of breast cancer71-74, yet no study has shown its predictive value in adjuvant
therapy.
pAKT
pAKT, also known as PKB, is a serine/threonine protein kinase and a crucial
regulator of widely divergent cellular processes, including apoptosis, proliferation and
differentiation. Disruption of normal Akt/PKB signaling occurs frequently in several
human cancers, and the enzyme appears to play an important role in cancer progression
and cell survival75. Besides prognostic value in breast cancer76, pAKT have also been
shown to predict therapeutic response in the breast cancer adjuvant setting. The pAKT
signaling pathway was shown to decrease the efficacy of hormone therapy and was
associated with a poorer outcome in patients who receive adjuvant hormone therapy77, 78.
In addition, constitutive and inducible pAKT activity promotes resistance to
chemotherapy, trastuzumab or tamoxifen in breast cancer cells79, suggesting that pAKT
activation could be used as a predictive marker for sensitivity to various therapies, which
was confirmed by several studies80.
13
MEP21(Podo)
MEP21, or Podocalyxin, is a member of the CD34 family of integral membrane
proteins with anti-adhesive qualities81-83. Its overexpression is an independent predictor of
breast cancer progression84. There is no study yet on it role as a prognostic or predictive
factor in breast cancer.
TOP2A
Topoisomerase II alpha (TOP2A) is an enzyme that is integrally associated with the
cytotoxic action of anthracyclines and whose gene is close to HER2/neu on chromosome
17q. Several recent articles have suggested that TOP2A gene amplification and deletion
are predictive markers for the benefit of adjuvant anthracycline containing therapy in
primary breast cancer85-88. It has been suggested in fact that the predictive value of HER2 is most likely related to the concomitant amplification of the topo II alpha gene88.
1.5 Need
for
Combining
Clinicopathologic
Characteristics
and
Molecular
Biomarkers
Since tissue microarray analysis is done individually for a biomarker of interest,
research had been focusing on the study of prognostic and predictive effects of single
biological markers. Like their clinicopathologic counterparts, the predictive power for
most of these molecular biomarkers alone is very limited. Recently, there have been
several studies which showed that combinations of multiple molecular biomarkers are
more predictive than single biomarkers89-91. Moreover, since the microarray technology
suffers from a low signal-to-noise ratio, integration of other sources of information, such
as macro-scale clinicopathologic data, could be more important to counter randomly
14
generated differences in expression levels. Nevertheless, the focus in most current studies
is on gene expression while the traditional clinicopathologic data is often underused when
DNA microarray data are available. Recent studies have shown ‘old’ clinicopathologic
markers are by no means outperformed by gene expression profiles in breast cancer
prognosis and prediction92-94. Combining gene expression data using IHC analysis with
clinicopathologic data might lead to a more robust and accurate assessment of cancer
prognosis and predictions. Shedden et al (2003) used a pathological framework and
showed that this information significantly lowered the number of genes required in their
model95. Recent successful cases also showed superiority of clinical and genetic data
integration rather than the single type data classifier94, 96, 97.
1.6 Objectives of the Project
In this project, multiple molecular markers (reviewed above) will be combined
with multiple clinicopathologic characteristics of patients to develop predictive classifiers
which would be more robust and predictive than that developed from single biomarkers
or clinicopathologic factors to provide tailored prescription of cyclophosphamide,
epirubicin, and fluorouracil (CEF) or cyclophosphamide, methotrexate, and fluorouracil
(CMF) in the adjuvant therapeutic treatment of premenopausal women with node-positive
breast cancer. In the next chapter, I will provide a review of major statistical techniques
which will be used in this project. Specifically the followings are the objectives of this
project:
•
The primary objective of this thesis project is to use traditional Cox proportional
15
hazards model and new machine learning methods to develop predictive
classifiers which would be used to identify patients who are responsive to the
anthracycline-containing chemotherapy for early breast cancer based on the tissue
microarray and clinicopathologic data from NCIC CTG MA.5 study.
•
In addition, this project will also evaluate and compare the predictive powers of
the classifiers developed.
16
Chapter 2
A Review of Statistical Techniques
Currently, no statistical procedure has been developed specifically for the
construction of predictive classifiers. The common approach in the literature is to first
construct a prognostic index using statistical modeling or machine learning techniques
and then consider this prognostic index as also a predictive classifier after its ability in
prediction of response to treatment is demonstrated in a formal test of interaction between
this classifier and treatment. The same approach will be used in this thesis project. In the
following, I will provide an overview of two techniques, COX model and SVM methods,
which can be used in the first step of this approach, i.e., construction of a prognostic
index, and are explored in this thesis.
2.1 Traditional Cox model for censored survival data (COX model method)
In studies with survival data, traditional statistical models, either directly or
through its hazard function, on a set of covariates are main techniques for the
identification of independent prognostic factors. These models can also be used to
construct a prognostic index. Cox proportional hazards model, the most widely used
regression model for censored survival data98, 99, is usually used as the first choice of the
models. Suppose that there are a total of N patients under study and, for i-th patient (i=1,
2, …, N), let hi(t) be the hazard function of disease recurrence or death for an individual
with a set of covariates x_i = (xi1, . . . , xip), which, in the context of this thesis, are the
17
values of clincopathologic and biomarkers for this patient.
Cox proportional hazard
model assumes the following relationship between the hazard function and covariates:
hi ( t ) = h0 ( t ) exp ( b1 xi1 + b2 xi 2 + ... + bd xid )
where h0(t) is an unspecified baseline hazard function. Based on data, we can estimate the
regression coefficients β = (β1, . . . , βp) in the model. Denote these estimates as β̂ =
( β̂1 , . . . , βˆ p ). With these estimates, we can define a linear risk score function f (X) =
βˆ ' X for a patient with covariates X=(x1, …, xp). Based on this function, we can divide
patients into various risk groups once a cut-off point c is identified. For example, a
patient with covariates X may be classified into the high risk group (and, therefore, may
benefit from treatment) if f(X)>c. A well known prognostic index used in early breast
cancer, the Nottingham prognostic index, was constructed based on this method100
The main advantage of the Cox model is that it is a well-understood and welltested framework in survival analysis. But a crucial assumption underlying the Cox
model, i.e., the proportional hazards assumption, may not be satisfied by the data in
practice.
2.2 Support Vector Machine (SVM) Method
Statistical models have been the main statistical techniques used to construct
prognostic indicies.
They focus on explicitly modeling the underlying probabilistic
mechanism of the phenomenon under study. Recently, new machine learning methods
have been applied to develop prognostic indicies, especially when the number of the
covariates is large. Machine learning techniques take a rather different perspective from
18
the traditional statistical modeling methods. Their main focus is merely to learn a
predictive rule from the current data, which are called training samples, using a variety of
statistical, probabilistic and optimization techniques without explicitly assuming a
statistical model for these data. These rules are then applied to classify new data, identify
new patterns or predict novel trends101. Machine learning methods are frequently used
for cancer diagnosis and detection and more recently, they have been applied to study
cancer prognosis and prediction. Relative to traditional statistical methods, machine
learning methods were shown to substantially (15-25%) improve the accuracy of
outcome prediction102. Machine learning techniques, especially the supervised machine
learning methods are a promising approach for cancer prognosis and prediction based on
integrated data sources. In the following, I will provide a detailed description and review
of one specific machine learning method, called support vector machine method, which
will be explored in my thesis project.
The support vector machine or SVM103, 104 technique is a relatively new but popular
method in machine learning. The basic principle of SVM is to find an equation for a line
(or hyperplane when there are more than one covariate) which can separate two classes
(such as patients with good or bad prognosis) maximally. Figure 1 illustrates a simple
SVM that discriminates two separable classes. The hyperplane is determined by a subset
of the points of the two classes, called support vectors. The SVM algorithm creates a
hyperplane that separates the data into two classes with the maximum margin – meaning
that the distance between the hyperplane and the closest examples, defined as the margin,
is maximized. If the two classes are overlapping and not linearly separable, the SVM can
19
projects the data into a higher-dimensional space using kernel function, and solves the
classification problem in the higher dimension. Commonly used kernel functions are the
linear kernel, the radial basic kernel, and the polynomial kernel.
Class Y
Class X
Figure 1: Separation by Support Vectors (source: The Elements of Statistical Learning
(2nd edition) by Hastie, Tibshirani and Friedman (2008). Springer-Verlag.)
SVM possesses a flexible structure and thus are especially useful when the data
structure is complex and from different sources. SVM was successfully applied in tumor
classification and prognosis prediction102 and in the study of cancer treatment
prediction105.
To deal with censored survival time data, we need to divide the survival times
into a number of predefined intervals or classes or to treat it as a rank regression problem
although new algorithm was recently proposed to handle the censored survival data
directly using SVM106,
107
but software for these algorithm are not readily available.
Since our goal is to classify the patients into “bad’ and “good’ prognostic groups, we can
20
divide patients into two groups based on their survival times using the approach
described below and develop a two-class SVM with a linear kernel function, which was
suggested to be superior to other more complex forms of SVM for class prediction when
dealing with complex data source with larger number of covariates 108.
To classify patients into two classes during the training process of SVM, we can
adopt the strategy proposed by Liu et al 109, using the only patients with extreme survival
times. The training dataset will only consist of two types of extreme patients: short-term
survivors and long-term survivors. The reason for not considering patients whose survival
times were not extreme during the training steps is that the sharp contrast between the
short-term and long-term survivors may be more informative and reliable for building
and understanding the relation between the covariates and outcomes, while considering
other patients sample would bring in more noise and confusion signals.
The following are steps during the construction of a SVM classifier based on
censored survival times:
1). Selecting extreme samples for SVMs training:
For a patient T, if his/her follow-up time is F(T) and status at the end of follow-up
time is E(T), we will define
where E(T)=1 stands for “dead” and E(T)=0 stands for “alive” or “censored”; c1 and c2
are two thresholds of survival time for defining short-term and long-term survivors,
respectively. The thresholds may vary and the guideline is that the selected training
21
dataset should contain enough patients in original training sample. Patients in the selected
training dataset will be assigned with class label “1” for “short-term survivors” and “-1”
for “long-term survivors”.
2). Supervised training of SVM.
Using all data on covariates as input and the assigned classes as output, a twoclass SVM with linear kernel function can be trained on the selected training dataset. The
SVM regression function G(T) can be expressed as a linear combination of the values of
covariates:
G (T ) = ∑ α i yi K (T , x(i )) + b ,
i
where x(i ) are support vectors samples, yi are the class labels of x(i ) , vector T represents
a test sample. The parameter α i and b are determined from the training data during the
supervised training.
3). Calculation of risk score index based on the trained SVM
For a given new test sample, it is more likely a shorter survivor if G(T) >0; and is
more likely to be longer survivor if G(T) <0. The predicted output of G(T) can be
transformed into a probability that can be expressed as so called risk score index as
following:
S (T ) =
1
1 + e − G (T )
4). Classification of samples into high and low risk groups based on the risk
score index
22
The S(T) is in the range of (0,1), with the smaller value indicating for the better
prognosis and the larger value for poorer prognosis. Since we only want to classify
patients into two groups: high risk and low risk, the cut-off value for S(T) would be
selected as 0.5. That is, if S(T) >0.5 then the patient will be assigned to high risk group,
and the patient will be assigned to low risk group if S(T) <0.5.
R e1071 package can be used in the development a SVM index.
2.3 Verification, validation and comparison of predictive classifiers
In order to determine whether the prognostic indices developed from the methods
reviewed in last section are also predictive classifiers, we need first to verify it has truly
the ability to predict the response to treatment.
This is usually done by testing the
significance of the interaction in a Cox model with the subgroup (high risk or low risk)
defined by a given index, treatment and the subgroup by treatment interaction terms. The
indices with significant interactions will be considered as (significant) predictive
classifiers.
For a predictive classifier developed from above, its usefulness in predicting
treatment responses for future patients is also important to assess. This step is usually
called validation. Validation of a prognostic index has been discussed extensively in the
literature
110
. Validation of a predictive classifier is, however, much more complicated.
The ideal method to validate a predictive classifier is carrying out a new prospective
randomized clinical trial. Several designs for these prospective clinical trials have
recently been suggested for the validation of predictive classifiers111, 112 but they are not
feasible for this thesis project. Instead, we will use an approach based on bootstrap
23
method which generates B bootstrap samples from the original sample and then
calculates the proportion of the bootstrap samples with significant interaction terms. This
proportion will be used as a measure for the generalizibility of a predictive classifier to
new population, which can also be used as an informal measure to compare two
predictive classifiers. Another measure based on hazard ratio of interaction, which is
defined as the ratio of treatment hazard ratios between the two classes defined by the
predictive classifiers, and associated 95% confidence interval will also be computed
using the bootstrap method. These two measures will be used to compare and rank the
predictive classifiers developed.
We did not reconstruct the classification indices based on the bootstrap samples
because our objective is to assess the value of the indices as the predictive classifiers and
not to compare the approaches which generated the indices.
24
Chapter 3
Patients and Methods
3.1 NCIC CTG MA.5 Clinical Trial
The NCIC Clinical Trials Group (NCIC CTG) MA.5 trial was a randomized
multi-institutional trial comparing cyclophosphamide, epirubicin, and fluorouracil (CEF)
to cyclophosphamide, methotrexate, and fluorouracil (CMF) in premenopausal women
with node-positive breast cancer.
A total of 716 women with early stage breast cancer were randomized to receive
chemotherapy with CEF (epirubicin 60 mg/m2 and 5-FU 500 mg/m2 both IV days 1 & 8
and cyclophosphamide 75 mg/m2 p.o. days 1-14) or CMF (methotrexate 40 mg/m2 and
5-FU 600 mg/m2 both IV days 1 & 8 and cyclophosphamide 100 mg/m2 p.o. days 1-14)
all given in a 28 day cycle.
Main eligibility criteria for the MA.5 trial included: 1) pre- or perimenopausal
women with node positive breast cancer, 2) histologically proven breast cancer treated
with total or partial mastectomy, 3) no distant metastases, 4) no prior hormonal or
cytotoxic therapy, 5) LVEF ≥45% 6) Adequate hematological, renal and liver function.
The primary endpoint for the MA.5 study was relapse-free survival (RFS) with
overall survival (OS) as the secondary endpoint. RFS was defined as time from random
assignment to recurrence including local breast chest wall and regional and/or distant
recurrence. OS was based on death from any cause (refer TOP II and MA5).
During and after the MA.5 trial, pathologists were asked to submit a
representative formalin-fixed, paraffin-embedded block of tumor tissue from each woman,
25
or if tumor blocks were unavailable, 20 4-μm unstained sections, to the central office of
the NCIC CTG. Paraffin blocks were stored at room temperature, and unstained sections
were kept at 4°C. Samples were identified only by an identification number assigned to
each patient at randomization.
CEF was shown to be superior to CMF and remained superior in terms of relapsefree survival and overall survival after a median follow-up of 10 years113,
114
. As
compared with CMF, however, CEF was associated with increased rates of alopecia,
nausea, vomiting, stomatitis, and febrile neutropenia; a temporary reduction in the quality
of life; and an increase in the risk of congestive heart failure (1.1 percent) and acute
leukemia(1.4 percent) 45, 113, 114.
3.2 Data
Information on the clinicopathologic factors including age, tumor size, ER status, type of
surgery, tumor grade and status of lymph nodes was collected from all patients as part of
data collection for the original clinical trial. Tumor tissues were available from 639 of
710 eligible women (90 percent)45,
113, 114
and tissue microarrays (TMAs) were
constructed from 531 patients with paraffin embedded specimens. TMA assays were
done on the following biomarkers by Dr. Blake Gilks and Dr. David Huntsman at the
Genetic Pathology Evaluation Centre (GPEC) of Vancouver General Hospital: HER2,
p53, CA9, MEP21 (podo,), clusterin (CLU), pAKT, COX2 and TOP2A. The algorithms
to determine these biomarkers as positive or negative were also pre-specified by them.
Assay results were reported to the NCIC CTG central office. All the data required for
this thesis are maintained by NCIC CTG.
26
3.3 Analysis strategies
We first analyzed the prognostic and predictive value for each clinicopathologic
factor and biomarker to understand the contribution from each factor and biomarker.
Two methods described above, Cox model and SVM methods, were then used to develop
prognostic indices based directly on the censored relapse-free or overall survival in MA.5
trial.
The prognostic indices developed were verified to see whether they are also
significant predictive classifiers. Finally, the predictive powers of the predictive
classifiers derived from these two different approaches were evaluated and compared. In
the following, we will describe each of these steps in detail.
3.3.1 Analysis of Prognostic and Predictive Values of Each Individual Variable
For each individual clinicopathologic factor and biomarker, its prognostic value
was assessed using a Cox model with each factor or biomarker as a single covariate and
recurrence-free survival and overall survival times as outcome variables. Factors with a
p-value <0.05 from the Cox model were considered as having significant prognostic
value. Predictive value of each clinicopathologic and biomarker was assessed by
evaluating the treatment by factor interaction term in the Cox model by fitting a Cox
model with treatment, factor and treatment*factor interaction term as covariates and
recurrence-free survival and overall survival times as outcome variables. A factor with a
significant p-value <0.05 for the interaction term was considered as having significant
predictive value.
27
3.3.2 Development of Prognostic Indices
Three different combinations of clinicopathologic factors and biomarkers were
explored to develop prognostic indices with the COX model and SVM methods: 1)
clinicopathologic factors only, 2) molecular biomarkers assayed with TMA only, and 3)
clinicopathologic factors and all molecular biomarkers assayed with TMA combined.
This generated three indices for each of the methods:
•
Index1 (all clinicopathologic factors): age, surgery type, node, grade, tumor
size and EST (clinical assessment of ER protein level)
•
Index2 (all biomarkers): CA9, p53, Podo, CLU, COX2, pAKT, TOP2A,
HER2 and ER
•
Index3 (all clinicopathologic factors and biomarkers combined): age, surgery
type, node, grade, size, EST, CA9, p53, Podo, CLU, COX2, pAKT, TOP2A,
HER2 and ER
Both the recurrence-free survival time and overall survival time were used
separately as outcome variables for constructing these indices with COX model and SVM
methods. In the following, I give some details for each of these two methods.
3.3.2.1 Cox Model Method
With Cox model method, we first fit a Cox proportional hazards model with a set
of covariates (as defined above, all clinicopathologic factors only, all biomarkers from
TMA assay only or the combination of the two types) with either the recurrence-free
survival time or the overall survival time as outcome variable. A series of estimates of
regression coefficients β̂ = ( β̂1 , . . . , βˆ p ) for the input covariates were then generated
28
from the fitted Cox model. With these estimated coefficients, a linear risk score was then
formed by the index function f (X) = βˆ ' X for each given patient based on their values on
all the covariates used to fit the COX model. The mean value of the index scores from all
patients is taken as the cut-off C-value. If for a new patient, her risk score calculated
based on the index function is greater than or equal to the mean of the index (cut-off Cvalue), she will be classified into the high risk or poor prognostic group and into the low
risk or good prognostic group otherwise.
3.3.2.2 Support Vector Machine (SVM) method
Supervised learning approach was used for support vector machine method. The
training dataset were first defined. In contrast to fitting the COX model, in which all the
available samples were used, the training data set used for supervised training of the
SVMs included only the extreme samples, i.e., the samples with longer or shorter
survival times. The samples with mid-range survival times were excluded from the
training process. Descriptive analysis showed that the mean for recurrent free survival
time and overall survival time in this study were 6.2 and 7.4 years respectively. Based on
these two numbers, the cut-off criteria for the longer or shorter term survivors were
defined as follows:
1) If a patient had a recurrence-free survival time or overall survival time less
than 5 years and died, then we treated this patient as shorter survivor.
2) On the other hand, if a patient was still alive and had recurrence-free survival
time or overall survival time greater than 6 years, then we treated this patient as
longer survivor.
29
A two-class SVM with linear kernel function was then trained with supervised
learning approach on the training dataset, where class consisted of longer or shorter
survivors identified using above criteria. Because missing data was not allowed for
support vector machine method, patients with missing data on any input variable were
excluded in SVM model. The supervised training of SVM was accomplished using the
following code within the e1071 package in R:
svm.model
<-
svm(CLASS
~.
,
data=trainset,
cost=100,
gamma=1,
kernel="linear")
With the trained SVM, a classification rule was then formed and applied to
classify any patient into two predicted groups, shorter survivor or longer survivor, based
on her values on the input variables (covariates). We assumed these two groups can be
regarded as respectively high risk group and low risk group. This prediction classifying
process was realized with the following code within the e1071 package in R with the
two-class SVM model:
svm.pred <- predict(svm.model, testset[, -X]) ,
where X denotes the position of class variable. X depended on the number of clinical
and/or biomarkers in each index.
3.3.3 Verification for the Prognostic Indices as Predictive Classifier for the
Treatment
To asses whether a prognostic index constructed based on respectively Cox model
and SVM methods can also be used as a predictive classifier for prediction of responses
to CEF and CMF treatment, we first assess the difference of two treatments in each of the
30
risk groups defined by the index based on log-rank test and Kaplan-Meier plots. Cox
proportional hazards models were then fitted using recurrence-free survival time or
overall survival time as outcome variables with treatment, risk group (high risk or low
risk) defined either by the index, and treatment by risk group interaction terms in the
model. An index was considered to have predictive value if the treatment by risk group
term was significant at 0.05 level in the fitted COX model.
3.3.4 Validation of Predictive Classifier using Bootstrap Methods
In order to assess the usefulness of a predictive classifier in the prediction of
treatment responses for future patients, validation using bootstrap approach was adopted
for this project. With this approach, 1,000 bootstrap samples from the original sample
were first generated. Cox proportional hazard models were then fitted with treatment, risk
group and treatment by risk group interaction terms in the model using either recurrencefree survival or overall survival time as out come variable. The proportion of the
bootstrap samples with significant interaction terms were calculated for each index.
In addition, with the bootstrap samples, the mean and the confidence interval for
the difference in the logarithms of interaction hazard ratios (Ratio of Hazard Ratio, RHR)
between the two risk groups classified by the index were also computed.
31
Chapter 4
Results
4.1 Data Sets
Data on clinical and pathologic factors in the MA.5 clinical trial were available
for a total of 716 participating patients, including age, surgical type (Total vs Partial
Mastectomy), tumor size (T1, T2, T3), tumor grade (1, 2, 3), number of positive lymph
nodes and EST (clinical assessment of ER protein levels, i.e., local ER measurements
from each center [ ≥ 10, <10 fmol protein/mg tissue]). Data on biomarkers from Tissue
Microarrays (TMAs) profiling were available for 531 patients. Biomarkers that were
assessed in this trial include CA9, p53, Podo, CLU, COX2, pAKT, TOP2A, HER2 and
ER (estrogen receptor status).
4.2 Summary of Baseline Characteristics
The distributions of baseline characteristic on the 531 patients included in the
analysis were summarized in Table 1. Among the 531 patients included, 255 patients
were treated with CEF and 276 with CMF.
270 (50.8%) of these patients had a
recurrence of breast cancer and 213 died during the trial period. The average age of all
included was 41.9 years old with no difference between patients in the two treatment
groups. The distributions of surgery type, tumor grade, tumor size and number of positive
nodes in the two treatment groups were similar. The percentages of subjects with data
available on each biomarker were also similar between the two treatment groups.
Analyses for the prognostic and predictive effects of HER2 and TOP2A were reported
32
previously but the published results on HER2 were based on the data from RT-PCR
quantification. HER2 level was assayed using TMA in this analysis and data were
available on a total of 434 patients, among which 57 were HER2 amplified, of whom 23
were in CEF group and 34 in CMF arm. Among the 423 subjects with TOP2A tested, 24
patients had TOP2A deleted and 53 were TOP2A amplified with no significant
differences among the CEF and CMF treatment groups.
33
Table 1: Baseline Characteristics of Patients
Age (years)
Surgery type, No.
(%)
Grade, No. (%)
Tumor Size, No. (%)
Node positive, No.
(%)
EST, No. (%)
HER2 status, No.
(%)
CA9, No. (%)
COX2, No. (%)
Mean ± SD
Partial
mastectomy
Mastectomy
1
2
CEF
(N=255)
41.6 ± 11.0
129 (50.6%)
CMF
(N=276)
42.3 ±9 .9
140 (50.7%)
Total
(N=531)
41.9±10.5
269(50.7%)
126 (49.4%)
27 (10.6%)
80 (31.4%)
136 (49.3%)
28 (10.1% )
86 (31.2% )
3
143 (56.1%)
152 (55.1% )
missing
T1
T2
T3
missing
0
5 (1.9%)
92 (36.1%)
134 (52.6%)
9 (3.5%)
20 (7.8%)
0 (0.0%)
10 (3.6)
110 (39.9%)
135 (48.9%)
16 (5.8%)
15 (5.4%)
0 (0.0%)
262 (49.3%)
55 (10.4% )
166
(31.3% )
295
(55.6% )
15 (2.8%)
202(38.0%)
269 50.7%)
125 (4.7%)
35 (6.6%)
0 (0.0%)
1<=Node<=3
>3
<=10
>10
missing
Amplified
154 (60.4%)
101 (39.6%)
67 (26.3%)
150 (58.8%)
38 (14.9%)
23 (9.0%)
166 (60.1%)
110 (39.9%)
71 (25.7%)
160 (58.0%)
45 (16.3%)
34 (12.3%)
320 (60.3%)
211 (39.7%)
138(26.0%)
310 (58.4%)
83 (15.6%)
57 (10.7%)
Non-amplified
Missing
0
1
3
Missing
0
1
missing
184 (72.1%)
48 (18.8%)
203 (79.6%)
10 (3.9%)
28 (11.0%)
14 (5.5%)
144 (56.5%)
93 (36.5%)
18 (7.0%)
193 (69.9%)
49 (17.7%)
213 (77.2%)
21 (7.6%)
27 (9.8%)
15 (5.4%)
146 (52.9%)
106 (38.4%)
26 (8.7%)
377 (71.0%)
97 (18.3%)
416 (78.3%)
31 (5.8%)
55 (10.4%)
29 (5.5%)
290 (54.6%)
199(37.5%)
44 (8.3%)
34
Table 1 (continued): Baseline Characteristics of Patients
CEF
CMF
(N=255)
(N=276)
P53, No. (%)
0
191(74.9%)
192 (69.6%)
1
26(10.2%)
28 (10.1%)
3
17 (6.7%)
33 (12.0%)
missing
21 (8.2%)
23 (8.3%)
PODO, No. (%)
0
23 (9.0%)
16 (5.8%)
1
137 (53.7%)
138 (50.0%)
2
60 (23.5%)
77 (27.9%)
3
10 (3.9%)
16 (5.8%)
missing
25 (9.8%)
29 (10.5%)
CLU, No. (%)
0
21 (8.2%)
22 (8.0%)
1
161 (63.1%)
153 (55.4%)
3
51 (20.0%)
71 (25.7%)
Missing
22 (8.6%)
30 (10.8%)
PAKT, No. (%)
0
4 (1.6%)
4 (1.5%)
1
86 (33.7%)
101 (36.6%)
2
118 (46.3%)
115 (41.7%)
3
11 (4.3%)
16 (5.8%)
missing
36 (14.1%)
40 (14.5%)
TOP2A, No. (%)
0
161 (63.1%)
185 (67.0%)
1
11 (4.3%)
13 (4.7%)
2
32 (12.5%)
21 (7.6%)
missing
51 (20%)
57 (20.6%)
ER LEVEL, No. (%)
0
119 (46.6%)
132 (47.8%)
1
71 (27.8%)
63 (22.8%)
2
37 (14.5%)
43 (15.6%)
3
16 (6.3%)
18 (6.5%)
missing
12 (4.7%)
20 (7.2%)
Total
(N=531)
383 (72.1%)
54 (10.2%)
50(9.4%)
44 (8.3%)
39(7.3%)
275 (51.8%)
137(25.8%)
26 (4.9%)
54 (10.1%)
43(8.1%)
314 (59.1%)
122(23.0%)
52 (9.8%)
8(1.5%)
187 (35.2%)
233(43.9%)
27 (5.1%)
76 (14.3%)
346 (65.2%)
24 (4.5%)
53 (10.0%)
108 (20.3%)
251 (47.3%)
134 (25.2%)
80 (15.1%)
34 (6.4%)
32 (6.0%)
4.3 Prognostic and Predictive Values of Each Individual Variable
Table 2 lists all the p-values for assessment of both prognostic and predictive
values for all the factors and biomarkers included in this study.
Five clinicopathologic factors including age, surgery type, number of positive
lymph node, grade and tumor size and three biomarkers including HER2, P53 and ER
were found to be of significant prognostic factors when recurrence-free survival was used
35
as outcome variable. When assessing prognostic value using overall survival, in addition
to the above clinical and biomarkers, clinicopathologic factor EST and biomarker PODO
were also found to have significant prognostic values.
No clinicopathologic factor was found to have predictive value when either
recurrence-free survival time or overall survival time was used as a response variable.
HER2 was found to have a statistically significant interaction with treatment with both
recurrence-free survival time (p=0.0154) and overall survival time (p=0.0409) being as
the outcome variable, indicating that HER2 has strong predictive value in predicting
treatment response differences between the CEF and CMF treatment groups. The
interaction between TOP2A and treatment was also found to be statistically significant
(p=0.0277) when the overall survival was the response variable and only a trend toward
significant (p=0.0904) was found with recurrence-free survival time as outcome variable.
36
Table 2: Analysis of Single Factor and Biomarker as Prognostic and Predictive Factors
(p-value of treatment or interaction effects from Cox model)
Recurrence-free Survival
_____________________
Prognostic
Predictive
Overall Survival
_______________________
Prognostic
Predictive
Clinical factors
Age
0.0178
0.7060
0.0915
0.1359
Surgery type
0.0008
0.8166
<.0001
0.9889
EST
0.5642
0.7907
0.0088
0.3598
Node
<.0001
0.5734
<.0001
0.7727
Grade
<.0001
0.5704
<.0001
0.3757
Size
<.0001
0.3702
<.0001
0.8922
0.0119
0.0154
0.0004
0.0409
CA9
0.8765
0.5368
0.1575
0.2136
P53
0.0297
0.8231
<.0001
0.3005
PODO
0.8421
0.3801
0.0424
0.4718
CLU
0.2079
0.5298
0.1074
0.1306
ER
0.0016
0.1712
<.0001
0.8130
COX2
0.7117
0.4685
1.0000
0.5409
PAKT
0.4427
0.5444
0.5895
0.9181
TOP2A
0.4126
0.0904
0.0672
0.0277
Biomarkers
HER2
37
4.4 Prognostic Indices
4.4.1 Classification Rules based on COX Model Method
The following are the three index functions constructed based on Cox model
using recurrence-free survival time as outcome variable:
1) Index1 = (-0.01654)*age + 0.33840*surgtype + 0.43112*grade + 0.08767*node + 0.50894*size +
0.0002281*EST
2) Index2 = ( -0.12721)*CA9 + 0.00759*P53 + ( -0.08475)*PODO + 0.09723*CLU + ( -0.09908)*COX2 +
( 0.01352)*PAKT+ ( -0.07040)*TOP2A + 0.34702*HER2 + ( -0.23602)*ER
3) Index3 = (-0.01396)*age + 0.54277*surgtype + 0.22260*grade + 0.09461*node + 0.34215*size + (0.14113)*ER +
0.0009667*EST + ( -0.04800)*CA9 + 0.06618*P53 + ( -0.19417)*PODO + ( -0.01023)*CLU +
( -0.12836)*COX2 + ( 0.05839)*PAKT + ( -0.13051)*TOP2A + 0.28897*HER2
The three index functions constructed using overall survival time as outcome
variable are as following:
4) Index1 = (-0.01345)*age + 0.46177*surgtype + 0.75219*grade + 0.07922*node + 0.55648*size + (0.00215)*EST
5) Index2 = ( -0.08191)*CA9 + 0.08725*P53 + ( 0.07409)*PODO + 0.16671*CLU + ( -0.08720)*COX2 +
( -0.08785)*PAKT+( 0.01080)*TOP2A + 0.50923*HER2 + (-0.32309)*ER
6) Index3 = ( -0.01242)*age + 0.63033*surgtype + 0.40067*grade + 0.08967*node + 0.43195*size +
0.0005004*EST + ( 0.03415)*CA9 + 0.16667*P53 + ( -0.06177)*PODO + ( 0.05415)*CLU +
( -0.00332)*COX2 + ( 0.00217)*PAKT + ( -0.08976)*TOP2A + 0.49786*HER2 + ( -0.15314)*ER
Table 3.1 summarized the cut-off C-value for each index with the recurrence-free
survival time as outcome variable and the number and percentage of patients classified
into high and low risk groups. For Index 1, which was constructed with only the
clinicopathologic factors being used, 40.1% of patients were classified into high risk
group and 59.9% in good (low risk) prognostic group. With Index 2 developed based on
only biomarkers, 174 (32.8%) and 357 (67.2%) patients were classified into poor and
good prognostic group respectively. When using all clinicopathologic factors and
38
biomarkers in Index 3, there were about 137 (25.8%) patients assigned to the poor
prognostic group and 394 (74.2%) to the good prognostic group. Results are similar when
overall survival was used as outcome variable (Table 3.2). For comparison purposes, we
also presented results for indices developed based on HER2 and TOP2A only and their
combination. It seems fewer patients were classified into the high risk group if these two
biomarkers were used alone or in combination.
39
Table 3.1: Summary of Three Indices for Recurrence-free Survival based on COX Model
method
____Low Risk Group____
CEF
CMF
Total
151
167
318
(28.4) (31.6%) (59.9%)
Index 1
1.74
____High Risk Group___
CEF
CMF
Total
104
109
213
(19.5%) (20.5%) (40.1%)
Index2
-0.20
171
185
356
84
91
175
(15.8%) (17.1%) (32.9%) (32.2%) (34.8%) (67.0%)
Index3
0.92
59
76
135
196
200
396
(11.1%) (14.3%) (25.4%) (36.9%) (37.6%) (74.6%)
HER2
0.06
23
(4.33%)
34
(6.4%)
57
232
242
474
(10.7%) (43.7%) (45.6%) (89.3%)
TOP2A
0.02
43
(8.1%)
34
(6.4%)
77
212
242
454
(14.5%) (39.9%) (45.6%) (85.5%)
HER2+TOP2A
0.06
30
21
(3.95%) (5.65%)
Index
Cut-off
C-value
40
51
(9.6%)
234
246
480
(44.1%) (46.3%) (90.4%)
Table 3.2: Summary of Three Indices for Overall Survival based COX model Method
Index
Index 1
Index2
Index3
HER2
TOP2A
HER2+TOP2A
___High Risk Group____
CEF
CMF
Total
___Low Risk Group___
CEF
CMF
Total
2.62
112
(21.1%)
115
(21.6%)
227
(42.7%)
143
(26.9%)
161
(30.4%)
304
(57.3%)
-0.015
84
(15.8%)
100
(18.8%)
184
(34.7%)
171
(32.2%)
176
(33.2%)
347
(65.3%)
64
84
(15.8%)
148
(27.9%)
191
(36.0%)
192
(36.1%)
383
(72.1%)
0.09
(12.1%)
23
(4.3%)
34
(6.4%)
57
(10.7%)
232
(43.7%)
242
(45.6%)
474
(89.3%)
0.06
43
(8.1%)
34
(6.4%)
77
(14.5%)
212
(39.9%)
242
(45.6%)
454
(85.5%)
0.08
21
(3.9%)
30
(5.6%)
51
(9.6%)
234
(44.1%)
246
(46.3%)
480
(90.4%)
Cutoff
Cvalue
1.93
41
4.4.2 Classification Rules based on SVM Method
The numbers and percentages of patients who were classified into lower/higher
risk groups based on prognostic indices developed with SVM method are presented in
Table 4.1 for recurrence-free survival and in Table 4.2 for overall survival. Numbers and
percentages are also displayed for each treatment group. In these two tables, SVM1,
SVM2 and SVM3 represent indices developed by SVM method with clinicopathologic
factors, only biomarkers, and all clinicopathologic factors and biomarkers respectively.
Compared to the classification results based on COX model method, smaller numbers of
patients were classified into poor (low risk) group under SVM approach.
42
Table 4.1: Summary of Classification Results for Recurrence-free Survival based on
SVM method
______Low Risk Group______
CEF
CMF
Total
SVM1
______High Risk
Group_____
CEF
CMF
Total
67
69
136
(15.6%) (16.0%) (31.6%)
144
(33.4%)
151
(35.0%)
295
(68.4%)
SVM2
20
(5.7%)
28
(7.9%)
48
(13.6%)
143
(40.6%)
161
(45.8%)
304
(86.4%)
SVM3
43
(15.0%)
60
(20.9%)
103
(36.0%)
87
(30.4%)
96
(33.6%)
183
(64.0%)
HER2+TOP2A
21
(5.6%)
30
(8.0%)
51
(13.6%)
156
(41.6%)
168
(44.8%)
324
(86.4%)
43
Table 4.2: Summary of Classification Results for Overall Survival based on SVM
Method
______Low Risk Group______
CEF
CMF
Total
SVM1
______High Risk
Group_____
CEF
CMF
Total
50
55
105
(11.6%) (12.8%) (24.4%)
161
(37.3%)
165
(38.3%)
326
(75.6%)
SVM2
20
(5.7%)
28
(7.9%)
48
(13.6%)
143
(40.6%)
161
(45.8%)
304
(86.4%)
SVM3
28
(9.8%)
37
(12.9%)
65
(22.7%)
102
(35.7%)
109
(41.6%)
221
(77.3%)
HER2+TOP2A
21
(5.6%)
30
(8.0%)
51
(13.6%)
156
(41.6%)
168
(44.8%)
324
(86.4%)
44
4.5 Prognostic Indices as Predictive Classifiers
4.5.1 Treatment Effects in High and Low Risk Groups Defined by Prognostic
Indices
To check whether the prognostic indices developed above can be considered also
as predictive classifiers, we first studied the difference in treatment with CEF and CMF
for each of the two risk groups defined by a given prognostic index to see whether
treatment effects are different for patients in these two groups. Table 5.1~5.4 and KaplanMeier survival curves in Figure 1~6 summarize the results of these investigation results
for all prognostic indices developed based on Cox model and SVM methods.
When only clinicopathologic factors were included, no significant treatment
difference between CEF and CMF was found in low risk groups defined by the
prognostic indices developed based on either Cox model or by SVM method. This can be
seen also from Kaplan-Meier survival curves (graphs b and d of Figure 1 and 4), which
showed that the CEF and CMF curves were very close although the CEF curve was
always above the CMF curve. For the high risk group defined by these indices, although
the difference between two treatments did not reach significance, patients treated by CEF
had apparently better response to those treated by CMF in terms of either recurrence-free
survival or overall survival. It was worth noticing that index developed based on SVM
performed better than that based on Cox model method since p-value for the treatment
difference almost reached significance (p=0.0592).
45
When all biomarkers were used, the results for low risk groups defined by
prognostic indices developed by Cox model and SVM methods were very similar to that
with only clinicopathologic factors. For the high risk group defined by the SVM
prognostic index, patients treated by CEF had a significant better relapse-free survival
(p=0.0396) and marginally better overall survival (p=0.0742) than those treated by CMF.
In contrast, no significant treatment difference was found for patients in the high risk
group defined by the prognostic index developed based on Cox model method. This
shows an apparent better performance of SVM method to the COX model method.
We have hypothesized that combination of clinicopathologic factors and
biomarkers may improve performance of the predictive classifiers based on only
clinicopathologic factors or biomarkers. But, our hypothesis was not confirmed in this
study. Actually, unexpected results were observed, which showed that patients in the low
risk group, instead of high risk group, defined by the SVM prognostic index had a
significantly different treatment effect in relapse free survival (p=0.0482).
For comparison purposes, results for indices developed based on HER2 and
TOP2A alone and their combination are also presented in tables and figures. Consistent
with results published, no significant difference between treatments was found for
patients in the low risk group defined by these indices, where significant difference in
both relapse free and overall survivals was found for patients in the high risk group
defined by either HER2 or TOP2A alone. For the index defined by combination of
HER2 and TOP2A, significant difference was found on relapse free survival (p=0.0386)
and only marginally significant difference on overall survival.
46
Table 5.1 Summary of Treatment Effect (CEF/CMF) in Recurrence-Free Survival for
Good Prognostic (Low Risk) Group
Classifiers
Index1/SVM1 = age surgtype
grade node size EST
Index2 /SVM2 = CA9 P53
PODO CLU COX2
PAKT
TOP2A HER2 ER
Index3 /SVM3= age surgtype
grade node size EST
CA9 P53 PODO CLU
COX2 PAKT TOP2A
HER2 ER
HER2
Cox Model
Hazard ratios
P-value
(95%CI)
for
treatment
0.2775
0.83
(0.59, 1.17)
SVM
Hazard ratios
P-value
(95%CI)
for
treatment
0.1817
0.78
(0.54, 1.12)
0.2257
0.83
(0.61, 1.12)
0.8114
0.96
(0.70, 1.33)
0.2174
0.83
(0.62, 1.12)
0.0482
0.62
(0.38, 0.99)
0.5607
0.93
(0.72, 1.20)
0.91
(0.70, 1.18)
0.90
(0.70, 1.16)
0.9013
0.98
(0.72, 1.34)
TOP2A
0.4728
HER2+TOP2A
0.4132
47
Table 5.2 Summary of Treatment Effect (CEF/CMF) in Recurrence-Free Survival for
Poor Prognostic (Good Risk) Groups
Classifiers
Index1/SVM1 = age surgtype
grade node size EST
Index2 /SVM2 = CA9 P53
PODO CLU COX2
PAKT
TOP2A HER2 ER
Index3 /SVM3= age surgtype
grade node size EST
CA9 P53 PODO CLU
COX2 PAKT TOP2A
HER2 ER
HER2
Cox Model
Hazard ratios
P-value
(95%CI)
for
treatment
0.2145
0.81
(0.58, 1.13)
SVM
Hazard ratios
P-value
(95%CI)
for
treatment
0.0592
0.68
(0.46, 1.02)
0.3592
0.83
(0.56, 1.23)
0.0396
0.44
(0.20, 0.96)
0.6711
0.92
(0.61, 1.38)
0.7617
0.93
(0.59, 1.45)
0.0111
0.37
(0.18, 0.81)
0.48
(0.26, 0.88)
0.45
(0.21, 0.96)
0.0386
0.45
(0.21, 0.96)
TOP2A
0.0173
HER2+TOP2A
0.0386
48
Table 5.3 Summary of Treatment Effect (CEF/CMF) in Overall Survival for Good
Prognostic (Low Risk) Group
Classifiers
Index1/SVM1 = age surgtype
grade node size EST
Index2 /SVM2 = CA9 P53
PODO CLU COX2
PAKT
TOP2A HER2 ER
Index3 /SVM3= age surgtype
grade node size EST
CA9 P53 PODO CLU
COX2 PAKT TOP2A
HER2 ER
HER2
Cox Model
Hazard ratios
P-value
(95%CI)
for
treatment
0.726
1.08
(0.70, 1.66)
SVM
Hazard ratios
P-value
(95%CI)
for
treatment
0.1800
0.76
(0.50, 1.14)
0.6017
0.91
(0.64, 1.30)
0.5710
1.11
(0.77, 1.60)
0.6415
0.92
(0.65, 1.30)
0.0906
0.65
(0.39, 1.07)
0.9277
0.98
(0.73, 1.32)
0.99
( 0.74, 1.34)
0.96
(0.72, 1.28)
0.5157
1.12
(0.79, 1.60)
TOP2A
0.9716
HER2+TOP2A
0.7706
49
Table 5.4 Summary of Treatment Effect (CEF/CMF) in Overall Survival for Poor
Prognostic (High Risk) Group
Classifiers
Index1/SVM1 = age surgtype
grade node size EST
Index2 /SVM2 = CA9 P53
PODO CLU COX2
PAKT
TOP2A HER2 ER
Index3 /SVM3= age surgtype
grade node size EST
CA9 P53 PODO CLU
COX2 PAKT TOP2A
HER2 ER
HER2
Cox Model
Hazard ratios
P-value
(95%CI)
for
treatment
0.048
0.70
(0.50, 0.99)
SVM
Hazard ratios
P-value
(95%CI)
for
treatment
0.0615
0.65
(0.41, 1.02)
0.4779
0.86
(0.57, 1.30)
0.0742
0.48
(0.22, 1.07)
0.5837
0.89
(0.58, 1.37)
0.7206
0.90
(0.51, 1.59)
0.0349
0.45
(0.21, 0.95)
0.41
(0.22, 0.79)
0.51
(0.24, 1.09)
0.0847
0.51
(0.24, 1.10)
TOP2A
0.0074
HER2+TOP2A
0.0847
50
Figure 1: Kaplan-Meier Curves of Recurrence Free Survival by Risk Group defined by
Prognostic Indices Developed based on COX model and SVM methods and All
Clininopathologic factors
[a]
[b]
51
Figure 1: Kaplan-Meier Curves of Recurrence Free Survival by Risk Group defined by
Prognostic Indices Developed based on COX model and SVM methods and All
Clininopathologic factors (continued)
[c]
[d]
52
Figure 2: Kaplan-Meier Curves of Recurrence Free Survival by Risk Group defined by
Prognostic Indices Developed based on COX model and SVM methods and All
Biomarkers
[ a]
[b]
53
Figure 2: Kaplan-Meier Curves of Recurrence Free Survival by Risk Group defined by
Prognostic Indices Developed based on COX model and SVM methods and All
Biomarkers (continued)
[c]
[d]
54
Figure 3: Kaplan-Meier Curves of Recurrence Free Survival by Risk Group defined by
Prognostic Indices Developed based on COX model and SVM methods and All
Clininopathologic Factors and Biomarkers
[a]
[b]
55
Figure 3: Kaplan-Meier Curves of Recurrence Free Survival by Risk Group defined by
Prognostic Indices Developed based on COX model and SVM methods and All
Clininopathologic Factors and Biomarkers (continued)
[c]
[d]
56
Figure 4: Kaplan-Meier Curves of Overall Survival by Risk Group defined by Prognostic
Indices Developed based on COX model and SVM methods and All Clininopathologic
factors
[a]
[b]
57
Figure 4: Kaplan-Meier Curves of Overall Survival by Risk Group defined by Prognostic
Indices Developed based on COX model and SVM methods and All Clininopathologic
factors (continued)
[c]
[d]
58
Figure 5: Kaplan-Meier Curves of Overall Survival by Risk Group defined by Prognostic
Indices Developed based on COX model and SVM methods and All Biomarkers
[a]
[b]
59
Figure 5: Kaplan-Meier Curves of Overall Survival by Risk Group defined by Prognostic
Indices Developed based on COX model and SVM methods and All Biomarkers
(continued)
[c]
[d]
60
Figure 6: Kaplan-Meier Curves of Overall Survival by Risk Group defined by Prognostic
Indices Developed based on COX model and SVM methods and All Clininopathologic
factors and Biomarker
[a]
[b]
61
Figure 6: Kaplan-Meier Curves of Overall Survival by Risk Group defined by Prognostic
Indices Developed based on COX model and SVM methods and All Clininopathologic
factors and Biomarkers (continued)
[c]
[d]
62
Figure 7: Recurrence free survival curves by COX model and SVM approach –
HER2/TOP2A
[a]
[b]
63
Figure 7: Recurrence free survival curves by COX model and SVM approach –
HER2/TOP2A (continued)
[c]
[d]
64
Figure 8: Overall survival curves by COX model and SVM approach – HER2/TOP2A
[a]
[b]
65
Figure 8: Overall survival curves by COX model and SVM approach – HER2/TOP2A
(continued)
[c]
[d]
66
Figure 9: Recurrence free and overall survival curves by COX model – HER2
[a]
[b]
67
Figure 9: Recurrence free and overall survival curves by COX model – HER2 (continued)
[c]
[d]
68
Figure 10: Recurrence free and overall survival curves by COX model – TOP2A
[a]
[b]
69
Figure 10: Recurrence free and overall survival curves by COX model – TOP2A
(continued)
[c]
[d]
70
4.5.2 Hazard Ratio and Confidence Interval for Interaction between Treatment and
Prognostic Index
To formally confirm whether a prognostic index is also a predictive classifier, the
hazard ratios and associated 95% confidence intervals for the interaction between
treatment and risk group defined by prognostic index along with the p-values are
presented respectively in Table 5.5 with recurrence-free survival time as outcome and
Table 5.6 with overall survival time as outcome. Consistent with observations made
above based on treatment effects in respect to risk groups and Kaplan-Meier curves, only
prognostic index developed based on SVM method and all biomarkers had marginally
significant p-values for interaction between treatment and risk groups in terms of either
relapse-free survival (p-value=0.0527, hazard ratio=2.32) or overall survival (pvalue=0.0573, hazard ratio=2.34). In contrast, the p-value of interaction was significant
for both recurrence-free and overall survivals when either HER2 or TOP2A was used
alone and marginally significant when they were combined.
71
Table 5.5: Summary of Predictive Results in Recurrence-free Survival
Classifiers
Index1/SVM1 = age surgtype
grade
node size EST
Index2 /SVM2 = CA9 P53
PODO
CLU COX2
PAKT
TOP2A HER2
ER
Index3 /SVM3= age surgtype
grade
node size EST CA9
P53
PODO CLU COX2
PAKT
TOP2A HER2 ER
HER2
Cox Model
SVM
Hazard
P-value of
Hazard
P-value of
Ratio for
treatment
Ratio for
treatment
by index Interaction* by index Interaction*
Term
interaction
Term
interaction
(95% CI)
(95% CI)
0.9480
1.02
0.5939
1.16
(0.63, 1.64)
(0.68,
1.98)
0.9955
1.01
(0.61, 1.64)
0.0527
2.32
(0.99, 5.43)
0.6691
0.90
(0.54,1.49)
0.3448
0.73
(0.38,1.41)
0.0196
2.51
(1.16, 5.45)
TOP2A
0.0386
2.0
(1.04, 3.86)
HER2+TOP2A
0.0623
2.12
0.0438
(0.96, 4.68)
*Ratio of the treatment hazard ratios between the two classes defined.
72
2.30
(1.02, 5.17)
Table 5.6: Summary of prognostic and predictive indices (overall survival)
Cox Model
Classifiers
Index1/SVM1 = age surgtype grade
node size EST
Index2 /SVM2 = CA9 P53 PODO
CLU COX2 PAKT
TOP2A HER2 ER
Index3 /SVM3= age surgtype grade
node size EST CA9 P53
PODO CLU COX2
PAKT TOP2A HER2 ER
HER2
SVM
P-value of
treatment by
index
interaction
0.1262
Hazard Ratio
for Interaction
Term*
(95% CI)
1.54
(0.89, 2.68)
P-value of
treatment
by index
interaction
0.5745
Hazard Ratio
for Interaction
Term*
(95% CI)
1.19
(0.65, 2.19)
0.8357
1.06
(0.61, 1.83)
0.0573
2.34
(0.97, 5.61)
0.9261
1.03
(0.59, 1.79)
0.3646
0.71
(0.33, 1.50)
0.0459
2.25
(1.01, 4.98)
2.51
(1.24, 5.07)
1.95
(0.87, 4.38)
0.0540
2.27
(0.99, 5.22)
TOP2A
0.0108
HER2+TOP2A
0.1055
*Ratio of the treatment hazard ratios between the two classes defined.
73
4.6 Validation of Predictive Classifier using Bootstrap Methods
Although no significant predictive classifier was found by either Cox model or
SVM method for all clinicopathologic factors, biomarkers or their combination, we still
evaluated the performance of each classifier based on proportion of significant interaction
p-values and interaction hazard ratio and associated confidence intervals estimated from
bootstrap samples. Summary of the proportion of significant interaction p-values were
presented in Table 6.1 for recurrence-free survival and Table 6.2 for overall survival. We
can see from these two tables that the proportions were quite low for prognostic index
developed by Cox model method: all were lower than 10% except for index based on all
clinicopathologic factors with overall survival as outcome variable. Classifiers based on
SVM method performed better than these based on COX model method in all cases
except in the same case where Cox model had higher than 10% proportion as mentioned
before. SVM method performed best when either all biomarkers or only combination of
HER2 and TOP2A were used but it was not superior to the classifiers based on HER2 and
TOP2A alone.
Tables 7.1 and 7.2 present the mean and the confidence interval for the hazard
ratio of interaction between treatment and risk groups defined by the prognostic indices
developed based on Cox model and SVM method respectively for recurrence-free
survival and overall survival. The conclusions from these results were the same with what
were observed above based on proportion of significant p-values.
74
Table 6.1: Summary for Proportion of Significant Interaction between Treatment and
Risk Group Estimated by Bootstrap Method for Recurrence-Free Survival
Classifiers
Index1/SVM1 = age surgtype grade node size
EST
Index2 /SVM2= CA9 P53 PODO CLU COX2
PAKT TOP2A HER2 ER
Index3 /SVM3= age surgtype grade node size
EST
CA9 P53 PODO CLU COX2
PAKT TOP2A HER2 ER
Cox Model
SVM
6.3%
9.5%
5.2%
50.3%
6.9%
25.2%
HER2
65.7%
TOP2A
50.9%
HER2+TOP2A
49.5%
75
54.5%
Table 6.2: Summary for Proportion of Significant Interaction between Treatment and
Risk Group Estimated by Bootstrap Method for Overall Survival
Classifiers
Index1/SVM1 = age surgtype grade node size
EST
Index2 /SVM2= CA9 P53 PODO CLU COX2
PAKT TOP2A HER2 ER
Index3 /SVM3= age surgtype grade node size
EST
CA9 P53 PODO CLU COX2
PAKT TOP2A HER2 ER
Cox Model
SVM
35.3%
12.6%
6.2%
46.2%
5.7%
13.5%
HER2
54.5%
TOP2A
75.3%
HER2+TOP2A
36.7%
76
46.2%
Table 7.1: Summary of Hazard Ratio of Interaction in Recurrence-free survival) and 95%
CI Estimated by Bootstrap Method
Cox Model
Classifiers
SVM
Index1/SVM1 = age surgtype grade node size 1.01 (0.99-1.03) 1.17 (1.15-1.20)
EST
Index2 /SVM2= CA9 P53 PODO CLU COX2
PAKT TOP2A HER2 ER
1.01(0.99-1.02)
2.34 (2.25-2.42)
Index3 /SVM3= age surgtype grade node size
0.90 (0.88-0.91) 0.66 (0.64-0.68)
EST
CA9 P53 PODO CLU COX2
PAKT TOP2A HER2 ER
HER2
2.51 (2.44,
2.56)
TOP2A
1.95 (1.91,
1.99)
HER2+TOP2A
2.17 (2.12,
2.23)
77
2.31 (2.23,
2.39)
Table 7.2: Summary of Hazard Ratio of Interaction in Recurrence-free survival) and 95%
CI Estimated by Bootstrap Method
Cox Model
Classifiers
Index1/SVM1 = age surgtype grade node size 1.57 (1.54-1.60)
EST
Index2 /SVM2= CA9 P53 PODO CLU COX2
PAKT TOP2A HER2 ER
1.04 (1.02-1.06)
Index3 /SVM3= age surgtype grade node size
1.02 (1.00-1.04)
EST
CA9 P53 PODO CLU COX2
PAKT TOP2A HER2 ER
HER2
2.27 (2.22,
2.33)
2.53 (2.48,
2.59)
1.97 (1.92,
2.02)
TOP2A
HER2+TOP2A
78
SVM
1.21 ( 1.181.24)
2.33 (2.24,
2.42)
0.71 (0.69,
0.75)
2.25 (2.17,
2.33)
Chapter 5
Conclusions and Discussion
As suggested in other studies96, 97 on breast cancer and other types of cancer, we
hypothesized that the combination of immunohistochemical information from biomarkers
assessed by Tissue Microarrays (TMAs) with data on clinical and pathophysiological
factors may provide a better classifier in predicting response to chemotherapy than using
data on biomarkers or clinicopathologic factors alone. We have explored two different
methods, a traditional COX model method and a new machine learning method, SVM,
for development of classifiers based on combination of the data. Our hypothesis was,
however, not confirmed by the analysis of data from NCIC CTG MA.5, a clinical trial
comparing CEF vs. CMF in the treatment of early breast cancer. The only significant
predictive classifier we identified was based on two single biomarkers alone. Even
classifiers developed based on combination of these two biomarkers were not better than
that based on these two biomarkers alone.
There could be several explanations for these findings. It could be that two
biomarkers, HER2 and TOP2A, are just too strong for prediction of response to these
particular treatments.
Therefore, adding information on other biomarkers and
clinicopathologic factors, none of them proven as being predictive alone, would only
dilute their effects.
Another reason may be the statistical approach we used.
As
mentioned before, we chose to find a prognostic index first and then to verify whether it
is a significant predictive classifier. A direct approach to identify predictive classifiers
79
which take into account the biomarkers and clinicopathologic factors may yield different
results but this kind of approach has not been yet available in statistical literature.
The biological and clinical significance of HER2 and TOP2A, especially HER2,
for treatment decision making in the therapy for early breast cancer is well established
both in work done by others and described in previous publications and by us in this
study. HER2 amplification is however quite rare (around 10% in this study and variably
described between 10-25% in different series). Although HER2 has been shown to be the
most reliable predictor for response to CEF treatment, this biomarker is limited to a small
group of breast cancer patients. The importance of the identification of more sensitive
predictive classifiers becomes apparent.
The results from two statistical methods used to develop predictive classifiers
demonstrated that the SVM method outperformed the COX model method. When using
the bootstrap method to evaluate the performance of each classifier based on proportion
of significant interaction p-values, the estimation of the interaction hazard ratio and
associated confidence intervals, classifiers developed with SVM method were
demonstrated to be more stable. The proportion of significant p-values was higher for the
classifiers developed based on SVM method except in one case when overall survival
was the outcome variable and only clinicopathologic data were used.
One of the strengths of this project is to use clinical trial data to test our
hypothesis. Data from randomized controlled clinical trials, such as NCI-CTG MA.5,
provide highest level of evidence in evaluating treatments115. The prospective nature of
clinical trials gives the researchers control over the data collection, which is superior to
80
retrospective studies in terms of completeness, integrity and quality of the data. In
addition, the randomization in the clinical trial protect against bias in the allocation of
subjects to the different study groups, thus eliminating systematic differences between
study groups.
The other strength of this thesis is that we used two different types of methods to
identify predictive classifiers. Therefore, we are assured that negative conclusion we had
was not due only to the statistical method we used. Using different methods also offered
an opportunity to compare and contrast these two methods.
Our study has several potential limitations as listed below:
(1). Potential measurement errors of the data: TMA data may be subject to errors
in the measurement of true expression level of a gene in a tissue. Construction of TMAs
is complex and relies on accurate identification of appropriate tumor tissue. In addition
the distribution of individual molecular markers within the malignant tissue is often quite
heterogeneous and, thus, part of tissue used for construction of TMAs may not properly
represent the biological features of the entire tumor cell population;
(2). Missing tumor tissue and assessments: Among the 710 patients who
participated in the trial, tumor tissue were only available for 639 individuals and among
them, only 531 had results available from tissue microarrays (TMAs). This may limit the
generalization of our results to patients who participated in MA.5. Further to this, not all
markers worked on all patient samples. The smaller sample size may limit the power of
our study to detect significant interaction, which in general requires a large sample size;
81
(3). Generalizability of results: This study used data only from one clinical trial,
NCIC-CTG MA.5, which tested CEF in pre-menopausal women with early breast cancer.
It is difficult to say whether the results from this study can be generalized to other
treatments or other cancer types. In fact, many studies in other areas have shown that
combination of traditional clinical data with gene expression data provided better
prediction in treatments95-97;
(4). Specific cut-off value in Cox model: When classifying patient into different
risk group using COX model, different cut-off values used may yield different results. In
this study, the mean value of the index scores from all patients is taken as the cut-off Cvalue. Other cut-off values, such as weighted mean and median of the index scores from
all patients can also be used, which may result different conclusions;
(5). Use of only extreme survival times in SVM method: For SVM approach we
adopt the strategy proposed by Liu et al 109, using the only patients with extreme survival
times. The training dataset consisted of only two types of extreme patients: short-term
survivors and long-term survivors. The reason for not considering patients whose survival
times were not extreme during the training steps is that the sharp contrast between the
short-term and long-term survivors may be more informative and reliable for building
and understanding the relation between the covariates and outcomes, while considering
other patients sample would bring in more noise and confusion signals. In this study, if a
patient had a recurrence-free survival time or overall survival time less than 5 years and
died, then classify this patient as shorter survivor; if a patient was still alive and had
recurrence-free survival time or overall survival time greater than 6 years, then we treated
82
this patient as longer survivor. This is somewhat arbitrary and some other cut-off values
may lead different results.
In summary, the results in this project confirmed the strong predictive effects of
the previously identified biomarkers HER2 and TOP2A. When using either of them
alone for the prediction of response to CEF treatment in early breast cancer, there
remained approximately 30% of cases where inappropriate decisions might be made.
Identification of more reliable predictive classifiers, by new genetic investigations or
through new statistical techniques, is an interesting and open area for further research.
83
References
1.
Aubele,M. & Werner,M. Heterogeneity in breast cancer and the problem of
relevance of findings. Anal. Cell Pathol. 19, 53-58 (1999).
2.
Polychemotherapy for early breast cancer: an overview of the randomised
trials. Early Breast Cancer Trialists' Collaborative Group. Lancet 352, 930942 (1998).
3.
Tamoxifen for early breast cancer: an overview of the randomised trials.
Early Breast Cancer Trialists' Collaborative Group. Lancet 351, 1451-1467
(1998).
4.
Williams,C. et al. Cost-effectiveness of using prognostic information to select
women with breast cancer for adjuvant systemic therapy. Health Technol.
Assess. 10, iii-xi, 1 (2006).
5.
Robain,M. et al. Predictive factors of response to first-line chemotherapy in
1426 women with metastatic breast cancer. Eur. J. Cancer 36, 2301-2312
(2000).
6.
Chang,J. et al. Survival of patients with metastatic breast carcinoma:
importance of prognostic markers of the primary tumor. Cancer 97, 545-553
(2003).
84
7.
Isaacs,C., Stearns,V., & Hayes,D.F. New prognostic factors for breast cancer
recurrence. Semin. Oncol. 28, 53-67 (2001).
8.
Jemal,A., Thomas,A., Murray,T., & Thun,M. Cancer statistics, 2002. CA
Cancer J. Clin. 52, 23-47 (2002).
9.
10.
Jemal,A. et al. Cancer statistics, 2007. CA Cancer J. Clin. 57, 43-66 (2007).
Canadian Cancer Statistics 2007. (http://www.cancer.ca/vgn/images/portal/
cit_86751114/36/15/1816216925cw_2007stats_en.pdf ). 2007.
11.
DOH
2002
NHS
Cancer
Screening
Programmes.[Cancer
Screening
Programmes]. URL:www.cancerscreening.nhs.uk. Accessed 6 September, 2002.
2002.
12.
Bray,F., Sankila,R., Ferlay,J., & Parkin,D.M. Estimates of cancer incidence
and mortality in Europe in 1995. Eur. J. Cancer 38, 99-166 (2002).
13.
American Cancer Society. [Breast cancer facts and figures]. URL:
www.cancer.org. Accessed 6 September 2002. 2002.
14.
Goldhirsch,A., Glick,J.H., Gelber,R.D., Coates,A.S., & Senn,H.J. Meeting
highlights: International Consensus Panel on the Treatment of Primary Breast
Cancer. Seventh International Conference on Adjuvant Therapy of Primary
Breast Cancer. J. Clin. Oncol. 19, 3817-3827 (2001).
85
15.
Eifel,P. et al. National Institutes of Health Consensus Development
Conference Statement: adjuvant therapy for breast cancer, November 1-3,
2000. J. Natl. Cancer Inst. 93, 979-989 (2001).
16.
Adolfsson,J. & Steineck,G. Prognostic and treatment-predictive factors-is
there a difference? Prostate Cancer Prostatic. Dis. 3, 265-268 (2000).
17.
Pusztai,L. & Hess,K.R. Clinical trial design for microarray predictive marker
discovery and assessment. Ann Oncol. 15, 1731-1737 (2004).
18.
Steineck,G., Adolfsson,J., Scher,H.I., & Whitmore,W.F., Jr. Distinguishing
prognostic and treatment-predictive information for localized prostate cancer.
Urology 45, 610-615 (1995).
19.
Aapro,M.S. Adjuvant therapy of primary breast cancer: a review of key
findings from the 7th international conference, St. Gallen, February 2001.
Oncologist. 6, 376-385 (2001).
20.
Perou,C.M. et al. Molecular portraits of human breast tumours. Nature 406,
747-752 (2000).
21.
Kononen,J. et al. Tissue microarrays for high-throughput molecular profiling
of tumor specimens. Nat. Med. 4, 844-847 (1998).
86
22.
Aaltomaa,S. et al. Expression of Ki-67, cyclin D1 and apoptosis markers
correlated with survival in prostate cancer patients treated by radical
prostatectomy. Anticancer Res. 26, 4873-4878 (2006).
23.
Sebastiani,V. et al. Tissue microarray analysis of FAS, Bcl-2, Bcl-x, ER, PgR,
Hsp60, p53 and Her2-neu in breast carcinoma. Anticancer Res. 26, 2983-2987
(2006).
24.
Park,K., Han,S., Shin,E., Kim,H.J., & Kim,J.Y. Cox-2 expression on tissue
microarray of breast cancer. Eur. J. Surg Oncol. 32, 1093-1096 (2006).
25.
Bueso-Ramos,C. et al. Protein expression of a triad of frequently methylated
genes, p73, p57Kip2, and p15, has prognostic value in adult acute
lymphocytic leukemia independently of its methylation status. J. Clin. Oncol.
23, 3932-3939 (2005).
26.
Zellweger,T. et al. Tissue microarray analysis reveals prognostic significance
of syndecan-1 expression in prostate cancer. Prostate 55, 20-29 (2003).
27.
Kroger,N. et al. Prognostic and predictive effects of immunohistochemical
factors in high-risk primary breast cancer patients. Clin. Cancer Res. 12, 159168 (2006).
87
28.
Allred,D.C., Harvey,J.M., Berardo,M., & Clark,G.M. Prognostic and
predictive factors in breast cancer by immunohistochemical analysis. Mod.
Pathol. 11, 155-168 (1998).
29.
Chang,J. et al. Biologic markers as predictors of clinical outcome from
systemic therapy for primary operable breast cancer. J. Clin. Oncol. 17, 30583063 (1999).
30.
Fitzgibbons,P.L. et al. Prognostic factors in breast cancer. College of
American Pathologists Consensus Statement 1999. Arch. Pathol. Lab Med.
124, 966-978 (2000).
31.
Hamilton,A. & Piccart,M. The contribution of molecular markers to the
prediction of response in the treatment of breast cancer: a review of the
literature on HER-2, p53 and BCL-2. Ann Oncol. 11, 647-663 (2000).
32.
Bergh,J., Norberg,T., Sjogren,S., Lindgren,A., & Holmberg,L. Complete
sequencing of the p53 gene provides prognostic information in breast cancer
patients, particularly in relation to adjuvant systemic therapy and
radiotherapy. Nat. Med. 1, 1029-1034 (1995).
33.
Kirsch,D.G. & Kastan,M.B. Tumor-suppressor p53: implications for tumor
development and prognosis. J. Clin. Oncol. 16, 3158-3168 (1998).
88
34.
Rudin,C.M. & Thompson,C.B. Apoptosis and cancer in The Genetic Basis of
Human Cancer (eds. Vogelstein,B. & Kinzler,K.W.) 193-204 (The McGrawHill companies, Inc, New York, 1998).
35.
Elledge,R.M. & Allred,D.C. Prognostic and predictive value of p53 and p21
in breast cancer. Breast Cancer Res. Treat. 52, 79-98 (1998).
36.
Nieto,Y. et al. Evaluation of the predictive value of Her-2/neu overexpression
and p53 mutations in high-risk primary breast cancer patients treated with
high-dose chemotherapy and autologous stem-cell transplantation. J. Clin.
Oncol. 18, 2070-2080 (2000).
37.
Thor,A.D. et al. erbB-2, p53, and efficacy of adjuvant therapy in lymph nodepositive breast cancer. J. Natl. Cancer Inst. 90, 1346-1360 (1998).
38.
Somlo,G. et al. Predictors of long-term outcome following high-dose
chemotherapy in high-risk primary breast cancer. Br. J. Cancer 87, 281-288
(2002).
39.
Hudziak,R.M., Schlessinger,J., & Ullrich,A. Increased expression of the
putative growth factor receptor p185HER2 causes transformation and
tumorigenesis of NIH 3T3 cells. Proc. Natl. Acad. Sci. U. S. A 84, 7159-7163
(1987).
89
40.
Aguilar,Z. et al. Biologic effects of heregulin/neu differentiation factor on
normal and malignant human breast and ovarian epithelial cells. Oncogene
18, 6050-6062 (1999).
41.
Ignatoski,K.M. et al. ERBB-2 overexpression confers PI 3' kinase-dependent
invasion capacity on human mammary epithelial cells. Br. J. Cancer 82, 666674 (2000).
42.
Spencer,K.S., Graus-Porta,D., Leng,J., Hynes,N.E., & Klemke,R.L. ErbB2 is
necessary for induction of carcinoma cell invasion by ErbB family receptor
tyrosine kinases. J. Cell Biol. 148, 385-397 (2000).
43.
Muss,H.B. et al. c-erbB-2 expression and response to adjuvant therapy in
women with node-positive early breast cancer. N. Engl. J. Med. 330, 12601266 (1994).
44.
Paik,S. et al. HER2 and choice of adjuvant chemotherapy for invasive breast
cancer: National Surgical Adjuvant Breast and Bowel Project Protocol B-15.
J. Natl. Cancer Inst. 92, 1991-1998 (2000).
45.
Pritchard,K.I. et al. HER2 and responsiveness of breast cancer to adjuvant
chemotherapy. N. Engl. J. Med. 354, 2103-2111 (2006).
90
46.
Arpino,G. et al. Predictive value of apoptosis, proliferation, HER-2, and
topoisomerase IIalpha for anthracycline chemotherapy in locally advanced
breast cancer. Breast Cancer Res. Treat. 92, 69-75 (2005).
47.
Willsher,P.C. et al. C-erbB2 expression predicts response to preoperative
chemotherapy for locally advanced breast cancer. Anticancer Res. 18, 36953698 (1998).
48.
Fritz,P. et al. c-erbB2 and topoisomerase IIalpha protein expression
independently predict poor survival in primary human breast cancer: a
retrospective study. Breast Cancer Res. 7, R374-R384 (2005).
49.
Houston,S.J. et al. Overexpression of c-erbB2 is an independent marker of
resistance to endocrine therapy in advanced breast cancer. Br. J. Cancer 79,
1220-1226 (1999).
50.
Jarvinen,T.A., Holli,K., Kuukasjarvi,T., & Isola,J.J. Predictive value of
topoisomerase
IIalpha
and
other
prognostic
factors
for
epirubicin
chemotherapy in advanced breast cancer. Br. J. Cancer 77, 2267-2273 (1998).
51.
Ivanov,S. et al. Expression of hypoxia-inducible cell-surface transmembrane
carbonic anhydrases in human cancer. Am. J. Pathol. 158, 905-919 (2001).
52.
Wykoff,C.C., Pugh,C.W., Maxwell,P.H., Harris,A.L., & Ratcliffe,P.J.
Identification of novel hypoxia dependent and independent target genes of the
91
von Hippel-Lindau (VHL) tumour suppressor by mRNA differential
expression profiling. Oncogene 19, 6297-6305 (2000).
53.
Pastorek,J. et al. Cloning and characterization of MN, a human tumorassociated protein with a domain homologous to carbonic anhydrase and a
putative helix-loop-helix DNA binding segment. Oncogene 9, 2877-2888
(1994).
54.
Potter,C.P. & Harris,A.L. Diagnostic, prognostic and therapeutic implications
of carbonic anhydrases in cancer. Br. J. Cancer 89, 2-7 (2003).
55.
Bui,M.H. et al. Carbonic anhydrase IX is an independent predictor of survival
in advanced renal clear cell carcinoma: implications for prognosis and
therapy. Clin. Cancer Res. 9, 802-811 (2003).
56.
Atkins,M. et al. Carbonic anhydrase IX expression predicts outcome of
interleukin 2 therapy for renal cancer. Clin. Cancer Res. 11, 3714-3721
(2005).
57.
Generali,D. et al. Role of carbonic anhydrase IX expression in prediction of
the efficacy and outcome of primary epirubicin/tamoxifen therapy for breast
cancer. Endocr. Relat Cancer 13, 921-930 (2006).
92
58.
Holmes,D.R., Wester,W., Thompson,R.W., & Reilly,J.M. Prostaglandin E2
synthesis and cyclooxygenase expression in abdominal aortic aneurysms. J.
Vasc. Surg 25, 810-815 (1997).
59.
Topley,N. et al. Human peritoneal mesothelial cell prostaglandin synthesis:
induction of cyclooxygenase mRNA by peritoneal macrophage-derived
cytokines. Kidney Int. 46, 900-909 (1994).
60.
Kulkarni,P.S. Synthesis of cyclooxygenase products by human anterior uvea
from cyclic prostaglandin endoperoxide (PGH2). Exp. Eye Res. 32, 197-204
(1981).
61.
Kundu,N., Yang,Q., Dorsey,R., & Fulton,A.M. Increased cyclooxygenase-2
(cox-2) expression and activity in a murine model of metastatic breast cancer.
Int. J. Cancer 93, 681-686 (2001).
62.
Gilhooly,E.M. & Rose,D.P. The association between a mutated ras gene and
cyclooxygenase-2 expression in human breast cancer cell lines. Int. J. Oncol.
15, 267-270 (1999).
63.
Liu,X.H.
&
Rose,D.P.
Differential
expression
and
regulation
of
cyclooxygenase-1 and -2 in two human breast cancer cell lines. Cancer Res.
56, 5125-5127 (1996).
93
64.
Zerkowski,M.P., Camp,R.L., Burtness,B.A., Rimm,D.L., & Chung,G.G.
Quantitative analysis of breast cancer tissue microarrays shows high cox-2
expression is associated with poor outcome. Cancer Invest 25, 19-26 (2007).
65.
Mohammad,A.M. et al. Expression of cyclooxygenase-2 and 12-lipoxygenase
in human breast cancer and their relationship with HER-2/neu and hormonal
receptors: impact on prognosis and therapy. Indian J. Cancer 43, 163-168
(2006).
66.
Grudzinski,M. et al. [Cox-2 and CD105 expression in breast cancer and
disease-free survival]. Rev. Assoc. Med. Bras. 52, 275-280 (2006).
67.
Park,K., Han,S., Shin,E., Kim,H.J., & Kim,J.Y. Cox-2 expression on tissue
microarray of breast cancer. Eur. J. Surg Oncol. 32, 1093-1096 (2006).
68.
Trougakos,I.P., So,A., Jansen,B., Gleave,M.E., & Gonos,E.S. Silencing
expression of the clusterin/apolipoprotein j gene in human cancer cells using
small interfering RNA induces spontaneous apoptosis, reduced growth ability,
and cell sensitization to genotoxic and oxidative stress. Cancer Res. 64, 18341842 (2004).
69.
Trougakos,I.P. & Gonos,E.S. Clusterin/apolipoprotein J in human aging and
cancer. Int. J. Biochem. Cell Biol. 34, 1430-1448 (2002).
94
70.
Pucci,S., Bonanno,E., Pichiorri,F., Angeloni,C., & Spagnoli,L.G. Modulation
of different clusterin isoforms in human colon tumorigenesis. Oncogene 23,
2298-2304 (2004).
71.
Kruger,S., Ola,V., Fischer,D., Feller,A.C., & Friedrich,M. Prognostic
significance of clusterin immunoreactivity in breast cancer. Neoplasma 54,
46-50 (2007).
72.
Zhang,S. et al. Clusterin expression and univariate analysis of overall survival
in human breast cancer. Technol. Cancer Res. Treat. 5, 573-578 (2006).
73.
Shannan,B. et al. Challenge and promise: roles for clusterin in pathogenesis,
progression and therapy of cancer. Cell Death. Differ. 13, 12-19 (2006).
74.
Redondo,M. et al. Overexpression of clusterin in human breast carcinoma.
Am. J. Pathol. 157, 393-399 (2000).
75.
Nicholson,K.M. & Anderson,N.G. The protein kinase B/Akt signalling
pathway in human malignancy. Cell Signal. 14, 381-395 (2002).
76.
Schmitz,K.J. et al. Prognostic relevance of activated Akt kinase in nodenegative breast cancer: a clinicopathological study of 99 cases. Mod. Pathol.
17, 15-21 (2004).
95
77.
Tokunaga,E. et al. Akt is frequently activated in HER2/neu-positive breast
cancers and associated with poor prognosis among hormone-treated patients.
Int. J. Cancer 118, 284-289 (2006).
78.
Perez-Tenorio,G. & Stal,O. Activation of AKT/PKB in breast cancer predicts
a worse outcome among endocrine treated patients. Br. J. Cancer 86, 540-545
(2002).
79.
Clark,A.S., West,K., Streicher,S., & Dennis,P.A. Constitutive and inducible
Akt activity promotes resistance to chemotherapy, trastuzumab, or tamoxifen
in breast cancer cells. Mol. Cancer Ther. 1, 707-717 (2002).
80.
Stal,O. et al. Akt kinases in breast cancer and the results of adjuvant therapy.
Breast Cancer Res. 5, R37-R44 (2003).
81.
McNagny,K.M. et al. Thrombomucin, a novel cell surface protein that defines
thrombocytes and multipotent hematopoietic progenitors. J. Cell Biol. 138,
1395-1407 (1997).
82.
Kershaw,D.B. et al. Molecular cloning, expression, and characterization of
podocalyxin-like protein 1 from rabbit as a transmembrane protein of
glomerular podocytes and vascular endothelium. J. Biol. Chem. 270, 2943929446 (1995).
96
83.
Kerjaschki,D.,
Sharkey,D.J.,
&
Farquhar,M.G.
Identification
and
characterization of podocalyxin--the major sialoprotein of the renal
glomerular epithelial cell. J. Cell Biol. 98, 1591-1596 (1984).
84.
Somasiri,A. et al. Overexpression of the anti-adhesin podocalyxin is an
independent predictor of breast cancer progression. Cancer Res. 64, 50685073 (2004).
85.
Arriola,E. et al. Topoisomerase II alpha amplification may predict benefit
from adjuvant anthracyclines in HER2 positive early breast cancer. Breast
Cancer Res. Treat. 106, 181-189 (2007).
86.
Tanner,M. et al. Topoisomerase IIalpha gene amplification predicts favorable
treatment response to tailored and dose-escalated anthracycline-based
adjuvant chemotherapy in HER-2/neu-amplified breast cancer: Scandinavian
Breast Group Trial 9401. J. Clin. Oncol. 24, 2428-2436 (2006).
87.
Knoop,A.S. et al. retrospective analysis of topoisomerase IIa amplifications
and deletions as predictive markers in primary breast cancer patients
randomly assigned to cyclophosphamide, methotrexate, and fluorouracil or
cyclophosphamide, epirubicin, and fluorouracil: Danish Breast Cancer
Cooperative Group. J. Clin. Oncol. 23, 7483-7490 (2005).
88.
Di,L.A. et al. HER-2 amplification and topoisomerase IIalpha gene
aberrations as predictive markers in node-positive breast cancer patients
97
randomly treated either with an anthracycline-based therapy or with
cyclophosphamide, methotrexate, and 5-fluorouracil. Clin. Cancer Res. 8,
1107-1116 (2002).
89.
Duffy,M.J. Predictive markers in breast and other cancers: a review. Clin.
Chem. 51, 494-503 (2005).
90.
Vendrell,E., Morales,C., Risques,R.A., Capella,G., & Peinado,M.A. Genomic
determinants of prognosis in colorectal cancer. Cancer Lett. 221, 1-9 (2005).
91.
Savage,S.A. & Chanock,S.J. Using germ-line genetic variation to investigate
and treat cancer. Drug Discov. Today 9, 610-618 (2004).
92.
Cecilia Ritz. Clinical markers are by no means outperformed by gene
expression profiles in breast cancer prognosis. 2004.
93.
Eden,P., Ritz,C., Rose,C., Ferno,M., & Peterson,C. "Good Old" clinical
markers have similar power in breast cancer prognosis as microarray gene
expression profilers. Eur. J. Cancer 40, 1837-1841 (2004).
94.
Nimeus-Malmstrom,E. et al. Gene expression profilers and conventional
clinical markers to predict distant recurrences for premenopausal breast
cancer patients after adjuvant chemotherapy. Eur. J. Cancer 42, 2729-2737
(2006).
98
95.
Shedden,K.A. et al. Accurate molecular classification of human cancers based
on gene expression using a simple classifier with a pathological tree-based
framework. Am. J. Pathol. 163, 1985-1995 (2003).
96.
Li,L. et al. Integration of clinical information and gene expression profiles for
prediction of chemo-response for ovarian cancer. Conf. Proc. IEEE Eng Med.
Biol. Soc. 5, 4818-4821 (2005).
97.
Nevins,J.R. et al. Towards integrated clinico-genomic models for personalized
medicine: combining gene expression signatures and clinical factors in breast
cancer outcomes prediction. Hum. Mol. Genet. 12 Spec No 2, R153-R157
(2003).
98.
Cox,D.R. Regression models and life tables. Journal of theRoyal Statistical
Society B[34], 187-202. 1972.
99.
Lunn,M. & McNeil,D. Applying Cox regression to competing risks.
Biometrics 51, 524-532 (1995).
100.
Galea,M.H., Blamey,R.W., Elston,C.E., & Ellis,I.O. The Nottingham
Prognostic Index in primary breast cancer. Breast Cancer Res. Treat. 22, 207219 (1992).
101.
Mitchell,T. Machine Learning(McGraw Hill, New York, 1997).
99
102.
Cruz,J.A. & Wishart,D.S. Applications of Machine Learning in Cancer
Predictionand Prognosis. Cancer Informatics 2, 59-78. 2006.
103.
Cortes,C. & Vapnik,V. Support-vector networks. Machine Learning 20, 273297. 1995.
104.
Duda,R.O., Hart,P.E., & Stork,D.G. Pattern classification (Wiley, New York,
2001).
105.
Lee,Y.J., Mangasarian,O.L., & Wolberg,W.H. Breast cancer survival and
chemotherapy: A support vector machine analysis. DIMACS Series in
Discrete Mathematics and Theoretical Computer Science 55, 1-20 (2000).
106.
ftp://ftp.esat.kuleuven.ac.be/sista/vanbelle/reports/07-70.pdf. 2008.
107.
ftp://ftp.esat.kuleuven.ac.be/sista/vanbelle/reports/07-213.pdf. 2008.
108.
Ben-Dor,A. et al. Tissue classification with gene expression profiles. J.
Comput. Biol. 7, 559-583 (2000).
109.
Liu,H., Li,J., & Wong,L. Use of extreme patient samples for outcome
prediction from gene expression data. Bioinformatics. 21, 3377-3384 (2005).
110.
Hausmann,B., Hach,H., Voigt,B., & Simon,R. [Echocardiography online
quantification of left and right ventricular function by automatic boundary
detection: reference values and reproducibility in healthy probands]. Z.
Kardiol. 83, 556-561 (1994).
100
111.
Simon,R. Development and validation of biomarker classifiers for treatment
selection. Journal of Statistical Planning and Inference 138, 308-320 (2008).
112.
Simon,R. Validation of pharmacogenomic biomarker classifiers for treatment
selection. Cancer Biomark. 2, 89-96 (2006).
113.
Levine,M.N. et al. Randomized trial of intensive cyclophosphamide,
epirubicin, and fluorouracil chemotherapy compared with cyclophosphamide,
methotrexate, and fluorouracil in premenopausal women with node-positive
breast cancer. National Cancer Institute of Canada Clinical Trials Group. J.
Clin. Oncol. 16, 2651-2658 (1998).
114.
Levine,M.N. et al. Randomized trial comparing cyclophosphamide,
epirubicin, and fluorouracil with cyclophosphamide, methotrexate, and
fluorouracil in premenopausal women with node-positive breast cancer:
update of National Cancer Institute of Canada Clinical Trials Group Trial
MA5. J. Clin. Oncol. 23, 5166-5170 (2005).
115.
Furberg,B.D. & Furberg,C.D. Evaluating Clinical Research: All that glitters is
not gold (Springer,2007).
101