Variation in prostate cancer survival explained by significant

Vol. 158, 1487-1490. October 1997
Printed in U.S.A.
From the Departments of Urology, Medicine, and Epidemiology and Public Health, and the Sylvester Comprehensive Cancer Center,
University of Miami School of Medicine, and Geriatric Research, Education, and Clinical Center, Veterans Affairs Medical Center,
Miami, Florida
Purpose: Traditional survival analytical tools do not reveal the ability of significant prognostic
factors to predict (that is, explain variation in) survival. We used survival data in patients with
prostate cancer to illustrate how the association of factors with survival diverges from their
ability to explain variation in survival; bladder cancer was included as a point of general
Materials and Methods: We used the 1973 to 1990 records of the Surveillance Epidemiology
and End Results program. Multivariate proportional hazards models were used to identify
factors that significantly associated with survival. The proportion of variation explained by
these factors was estimated with the Schemper method.
Results: The dataset included 10,636 patients with prostate cancer and 1,070 with bladder
cancer. Median survival was significantly longer in prostate than bladder cancer; other characteristics were similarly distributed. Age, stage and marital status were associated with survival
in both cancers (p value range 0.0001 to 0.0009). The total proportion of variation explained was
7.1% and 32.1% for prostate and bladder cancer, respectively. In prostate cancer, age, stage and
marital status explained 0.6, 5.5 and 0.4% of the adjusted proportion of variation explained,
respectively, and in bladder cancer, they explained 14.7, 8.9 and 0.6%, respectively.
Conclusions: Proportional hazards models identified but did not reveal the ability of significant
prognostic factors to explain variations in survival. The proportion of variation explained analyses illustrate why predicting survival is so difficult, especially in prostate cancer. The prognostic
factors used do not possess the ability to explain variation in survival; new prognostic factors
must be identified.
KEY WORDS: prognosis; mortality; prostatic neoplasms; bladder neoplasms; models, statistical.
figure, R2 quantifies the ability of age to predict specific end
points. R2 cannot be derived from the statistical association
(p value) of the putative predictive factor with the study end
point. As shown in figure 1, the associations of age with PSA
and year of birth are highly significant (p <0.001). This
means that the relationships of age with year of birth and
PSA are not attributable to chance alone; that is, we can
observe reproducibly that PSA goes up and year of birth goes
down as we examine older patients. The significant association does not mean that we can gauge equally precisely PSA
and year of birth by knowing a patient’s age. This point is
exemplified in figure 1, in that age very precisely predicts
year of birth (because its relationship with year of birth is
perfect; R2 = 1.0) but not PSA (because its relationship with
PSA is far from perfect; R2 = 0.02). As a practical validation
of the example, try using a prostate cancer patient’s age to
predict his year of birth and PSA. To more precisely predict
PSA, one must examine age and other independently associated factors, such as disease stage.
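
The contrast between significance and goodness of fit can be reproduced with simulated data. The following is a minimal sketch, not the authors' analysis: it assumes hypothetical patients aged 50 to 90 years, reuses the two artificial models from figure 1, adds Gaussian noise to PSA with a standard deviation (18.6) chosen only so that R2 lands near 0.02, and uses a normal approximation for the p value. The function name `simple_regression` and all variable names are illustrative.

```python
import math
import random


def simple_regression(x, y):
    """Least-squares fit of y on x.

    Returns the slope, R^2 and an approximate two-sided p value for the
    slope (normal approximation to the t statistic, adequate for large n).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    r2 = (sxy * sxy) / (sxx * syy)
    if r2 >= 1.0:
        p = 0.0  # perfect fit: the association cannot be due to chance
    else:
        t = math.sqrt(r2 * (n - 2) / (1.0 - r2))
        p = math.erfc(t / math.sqrt(2.0))  # two-sided tail probability
    return slope, r2, p


random.seed(0)
n = 5000  # assumed sample size, chosen only for illustration
ages = [random.randint(50, 90) for _ in range(n)]

# Model A of figure 1: year of birth is fully determined by age.
year_of_birth = [1997 - a for a in ages]

# Model B of figure 1 plus assumed noise: PSA depends only weakly on age.
psa = [-6.3 + 0.23 * a + random.gauss(0.0, 18.6) for a in ages]

slope_birth, r2_birth, p_birth = simple_regression(ages, year_of_birth)
slope_psa, r2_psa, p_psa = simple_regression(ages, psa)

print(f"year of birth: R2 = {r2_birth:.3f}, p = {p_birth:.2g}")
print(f"PSA:           R2 = {r2_psa:.3f}, p = {p_psa:.2g}")
```

Under these assumptions both associations are highly significant, yet only year of birth can be predicted from age with any precision.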
To identify factors that predict survival, one often uses
proportional hazards models. Proportional hazards models
generate risk ratios, confidence intervals and p values but
not a descriptor of the variation in survival explained by the
prognostic factor. In this regard, proportional hazards models are somewhat handicapped relative to the linear regression model. In recent years, an approach has been introduced to survival analyses to quantify the proportion of variation explained by significant prognostic factors.2-4
* Accepted for publication February 21, 1997.
Funded by the Health Foundation of South Florida.
Requests for reprints: Urology (112), Veterans Affairs Medical Center, 1201 NW 16 St., Miami, Florida 33125.
Like the R2 of
linear models the proportion of variation explained of propor-
Predicting clinical outcomes can be very difficult. Theoretically, precise predictions depend on variables that can explain large variations in outcome. However, one might find it
ironic and frustrating that prediction of specific outcomes,
death of patients with prostate cancer, for example, is difficult when statistically significant prognostic variables (for
example, disease stage) have been identified. The origins of
the problem may reside partly in the blurred distinction
between the term “significant” as used in precise statistical
language and in everyday language.1 In statistical terminology, “significant” denotes that an association is not attributable to chance alone. In everyday language, “significant”
connotes that an observed association is large. Blurring semantic distinctions can generate the false expectation that
statistically significant prognostic variables can predict a
large part of an associated outcome. Unless a variable is not
only statistically significantly associated with an outcome
but also able to explain variation in the outcome, our efforts
to apply it in precise predictions will be frustrated.
To identify factors (for example, age) that predict end
points other than survival, for example, serum prostate specific antigen (PSA) level, one often uses linear regression
models. Regression models are generally characterized with
a descriptor of the variation in the end point (“goodness of
fit”) explained by the tested factor; this descriptor is R2, the
square of the multiple correlation coefficient. As shown in the
We used the 1973 to 1990 public-use tapes of the Surveillance Epidemiology and End Results program, which we
have described.7 We examined the importance of
clinical stage in untreated nonmetastatic prostate or bladder
cancer, in which stage was defined by the historic staging
systems: in situ (T1), localized (T2) or regional (T3 and/or
N+). In this analysis, bladder cancer was studied to allow a
point of comparison. Because there were few in situ cases,
they were grouped with localized cases in the analyses. Survival time was calculated from the date of diagnosis to death
or December 31, 1990.
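
The survival-time rule above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: it assumes full calendar dates are available, converts days to months with an average month length of 30.4375 days, and treats any patient not known to have died by December 31, 1990 as censored. The function name `survival_months` and the example dates are hypothetical.

```python
from datetime import date

STUDY_END = date(1990, 12, 31)  # end of follow-up stated in the text
DAYS_PER_MONTH = 365.25 / 12    # assumed average month length


def survival_months(diagnosis, death=None, study_end=STUDY_END):
    """Return (months of follow-up, died) for one patient.

    A patient with no recorded death, or with a death recorded after the
    end of follow-up, is censored at study_end (died is False).
    """
    died = death is not None and death <= study_end
    end = death if died else study_end
    months = (end - diagnosis).days / DAYS_PER_MONTH
    return months, died


# Hypothetical patients: one death during follow-up, one censored.
months_dead, died_dead = survival_months(date(1985, 1, 1), date(1987, 1, 1))
months_cens, died_cens = survival_months(date(1988, 12, 31))

print(f"died:     {months_dead:.1f} months, event = {died_dead}")
print(f"censored: {months_cens:.1f} months, event = {died_cens}")
```

The pair (months, died) is the usual input to survival models such as the proportional hazards models described next.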
Cox proportional hazards models8 were used to examine
the relative risk of overall mortality, corresponding 95% confidence intervals and p values by age, disease stage and
marital status. In these models age was treated as a continuous variable, whereas stage and marital status were dichotomized in accordance with our previous
The proportion of variation explained by the full model (total
proportion of variation explained) and for each factor was
calculated with the methods of Schemper.4 Marginal “unadjusted” proportion of variation explained for each of the factors was obtained from separate regression for each of the
factors, and partial “adjusted” proportion of variation explained was calculated as the difference of the total proportion of variation explained and proportion of variation explained for a model with the factor of interest excluded.
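
The distinction between marginal and partial proportions can be illustrated with a deliberately simplified stand-in. The sketch below does not implement the Schemper measure, which is designed for censored survival data; it assumes an uncensored continuous outcome and uses the ordinary least-squares R2 of a two-factor linear model instead. The simulated factors named `age` and `stage`, their coefficients and the helper functions are all hypothetical. Only the bookkeeping follows the text: marginal is the single-factor model, and partial is the total minus the model that excludes the factor.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def r2_two_predictors(r_y1, r_y2, r_12):
    """R^2 of a least-squares model with two predictors, from correlations."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


random.seed(1)
n = 2000
# Two correlated, standardized-scale factors (simulated, not SEER data).
age = [random.gauss(0.0, 1.0) for _ in range(n)]
stage = [0.5 * a + random.gauss(0.0, 1.0) for a in age]
outcome = [1.0 * a + 0.5 * s + random.gauss(0.0, 2.0) for a, s in zip(age, stage)]

r_ya = pearson(outcome, age)
r_ys = pearson(outcome, stage)
r_as = pearson(age, stage)

total = r2_two_predictors(r_ya, r_ys, r_as)  # full model
marginal_age = r_ya ** 2                     # model with age only
marginal_stage = r_ys ** 2                   # model with stage only
partial_age = total - marginal_stage         # total minus model without age
partial_stage = total - marginal_age         # total minus model without stage

print(f"total           {total:.3f}")
print(f"age:   marginal {marginal_age:.3f}, partial {partial_age:.3f}")
print(f"stage: marginal {marginal_stage:.3f}, partial {partial_stage:.3f}")
```

When the factors are correlated, as in this simulation, each partial proportion is smaller than the corresponding marginal one, and the partial proportions do not sum to the total.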
[Figure 1, panels A and B. Horizontal axis: Patient Age (years).]
Goodness of fit (R2) in artificial linear regression models. A, year of
birth is plotted as function of patient age and provides model of year
of birth described by equation (1997) - (1 x age). B, PSA is plotted as
function of patient age and provides model of PSA described by
(-6.3) + (0.23 x age). Models show highly significant associations of
year of birth and PSA with age (p <0.001), but goodness of fit (R2;
possible range 0.00 to 1.00) differs substantially.
tional hazards models is held as a descriptor that is distinct
from measures of relative risk, confidence intervals and p
value. One reason to use proportion of variation explained is
the direct comparability of quantitative prognostic factors
that it allows. With proportion of variation explained, one
can establish a ranking of the relative “weight” of independent prognostic variables that appear based on proportional
hazards models to have similar risk ratios and significance.
One can also learn that the presumed “weight” of a prognostic variable is wrong. For example, CD4 count is a frequently
applied clinical measure in the assessment of the severity of
human immunodeficiency virus (HIV) infection and analyses
have shown that a CD4 count of 200 or more is a highly
significant predictor of mortality in HIV infection (risk ratio = 2.67; p = 0.02).5 However, using proportion of variation
explained analysis, we have recently shown that CD4 accounts for only 4% of the variation in individual mortality. To
predict prognosis in HIV, one must also take into account
variables that explain more of the variation.
To our knowledge, proportion of variation explained analyses have not been previously applied to analysis of survival
in prostate cancer. Our purpose was to illustrate the use of
proportion of variation explained analyses and generate hypotheses regarding the ability of prognostic factors to explain
variation in survival. To provide a point of comparison, we
also included bladder cancer in the analysis.
The dataset included 10,636 patients with untreated prostate cancer and 1,070 patients with bladder cancer whose
survival times were known. The median survival for patients
with prostate cancer was 50 months (with 95% confidence
interval 49 to 52 months), which was significantly ( p <0.001)
longer than the median survival of 40 months (with 95%
confidence interval 31 to 47 months) for patients with bladder cancer. Other distributions were quite similar and are
presented in table 1. About 72% of patients with prostate
cancer were above 70 years old, whereas only 56% of patients
with bladder cancer were above 70 years old.
In proportional hazards models, age, disease stage and
marital status were highly significantly ( p value range
0.0001 to 0.0009) associated with overall survival in prostate
and bladder cancer (tables 2 and 3).
The total proportion of variation explained for the model
for prostate cancer is 7.1%, whereas the total proportion of
variation explained for the model for bladder cancer is 32.1%.
The proportion of variation explained for individual prognostic factors for prostate cancer is dramatically different from
TABLE 1. Demographic characteristics of patients with untreated prostate or bladder cancer
No. Prostate (%)
No. Bladder (%)
50 or less
51 to 60
61 to 70
71 to 80
80 or more
In situ
Marital status:
72 (0.7)
527 (5.0)
2419 (22.7)
4657 (43.8)
2961 (27.8)
60 (5.6)
282 (26.4)
357 (33.4)
237 (22.1)
8931 (84.0)
1128 (10.6)
577 (5.5)
951 (88.9)
82 (5.7)
19 (0.2)
8485 (79.8)
2132 (20.0)
0 (0.0)
888 (83.0)
182 (17.0)
7106 (66.8)
3530 (33.2)
37 (3.5)
710 (66.4)
360 (33.6)
those for bladder cancer. For example, although the relative risk for age is almost identical in the prostate and bladder cancer models, the proportion of variation explained is much higher for bladder cancer.

TABLE 2. Relative risk (and 95% confidence intervals) of overall mortality in patients with prostate cancer
Age                 1.04 (1.036-1.043)
Stage:
  Localized         1.00
  Regional          1.28 (1.21-1.36)
Marital status      0.84 (0.79-0.88)
Total proportion of variation explained 7.1%.

TABLE 3. Relative risk (95% confidence interval) of overall mortality in patients with bladder cancer
Age                 1.06 (1.05-1.07)
Stage:
  Regional          3.55 (2.91-4.32)
Marital status      0.75 (0.63-0.89)
Localized also included in situ cases.
Total proportion of variation explained 32.1%.

Identification of factors that predict clinical outcomes is important. When prognosis is poorly understood, it is very difficult to advise and treat patients intelligently. Given the importance of understanding prognostic factors, how do we assess the predictive power of a given prognostic model? Theoretically, a high predictive power depends on the correct integration of factors that explain large variations in outcomes. However, one might find it ironic and frustrating that prediction of specific outcomes (for example, survival in prostate cancer) is difficult even when many statistically significant prognostic factors have been identified from conventional models, for example, Cox proportional hazards models.
Because analysis of prognostic factors in survival studies is usually based on Cox proportional hazards models, the results allow only point estimates and confidence intervals for the relative risk associated with prognostic factors (and corresponding p values). The predictive power of the prognostic factor is rarely evaluated and should not be confused with statistical significance. The fact that a model is correctly specified for a set of prognostic variables does imply that, on average, predictions will be precise but not that for an individual patient the prediction will be very precise.2
By limiting analysis of a data set solely to the Cox models, we may miss important information on the extent to which prognostic factors determine the outcome for a patient. This point is made vividly in the present study. Inspection of the p values generated in proportional hazards models did not allow meaningful assessment of the relative ability of significant input variables to explain variations in survival; even the relative risk estimates lead to misleading conclusions about predictive power. For example, proportion of variation explained analysis shows that the variation explained by age is larger for bladder cancer than prostate cancer (14.7 versus 0.6%). However, the Cox model showed identical p values and nearly identical relative risks for bladder and prostate cancer. Why age explains more of the variation in survival in bladder than prostate cancer is hard to explain.
Readers may be surprised that prognostic factors to which they have become accustomed performed poorly. The reader is asked to consider that the factors applied were found to be independently associated with survival in the relative absence of other measures, such as illness. Thus, they may altogether cease to associate independently with survival on the introduction into the models of other measures. One might even argue that the goal of clinical medicine is to substitute such nonremediable factors as age with such remediable factors as illness, in our models of patient outcome so that we may better effect improved outcomes; this goal is in essence a rendering of such factors as age irrelevant to our prognostic models. Consider also that having prostate cancer is associated with many other factors (physical dysfunction, illness, cognitive impairment) that may better explain variation in survival. Given that in prostate cancer, median survival is relatively long and patients are relatively older (medically and psychosocially diverse), the opportunity for associated factors to exert an effect on survival is relatively great. Accordingly, one might hypothesize that in younger, simpler patient cohorts (for example, patients with osteosarcoma), such factors as disease stage would better explain variation in survival. One might also hypothesize that other measures of disease severity (for example, cancer grade) will explain greater proportions of survival variation than observed with the measures applied in this study.
How much explanation of survival is enough? The utopian answer is 100% (maximal theoretical explanation), although "the more the better" is more useful, given that the objectives are to make precise predictions of outcomes and develop interventions that redistribute survival. When proposing a new prognostic factor, one should be prepared first to illustrate its independence from other known prognostic factors. One might also wish to quantify its ability to explain variation in outcome. If this ability is meager, interventions designed to change the factor may effect minimally the improved outcomes. On the other hand, a great ability to explain variation suggests that interventions aimed at this factor have merit and will effect improved outcomes.
Our observations fit with the clinical impression that predicting survival in prostate cancer is difficult and strongly suggest that current efforts to predict survival in prostate cancer are dependent on variables that are poorly suited to them. More poignantly, the observations suggest that prostate cancer stage, as assessed in the Surveillance Epidemiology and End Results program data, is in fact a very poor predictor (5.5% of variation in this analysis). Substitution with other stage ascertainments may or may not change ability to explain survival variation, especially because clinical stage ascertainment is notoriously unreliable.9 To predict precisely survival in prostate cancer will require supplementation with variables that can explain larger proportions of survival variation.
The present analysis is not meant to be a definitive and final analysis of the variation in survival of prostate and bladder cancer. Importantly, we have used a simple but unusual sample (untreated patients) to help illustrate concepts; because the sample is unusual, the results may not be generalizable. The relevance of the proportions of variation explained that are generated in the present analyses to predictive models that might include other prognostic factors, such as grade and illness, can only be determined through future analyses.

The present study is meant as an illustration of principle from which we can learn to define more precisely our objectives and match our methods to those objectives. The analysis is also meant as a reminder that, as in Babylon, our plans can be unraveled by semantic imprecision and segregation of the clinical and analytical worlds. To develop the predictive models that our patients deserve, we will have to accept the limitations of variables to which we have become accustomed and seek variables that can explain progressively larger proportions of variations in survival. Our present results, if generalizable, suggest that we have a long way to go before we can effect better survival for patients with prostate cancer.

1. Significance of significant. New Engl. J. Med., 278: 1232, 1968.
2. Korn, E. L. and Simon, R.: Measures of explained variation for survival data. Stat. Med., 9: 487, 1990.
3. Korn, E. L. and Simon, R.: Explained residual variation, explained risk and goodness of fit. Amer. Stat., 45: 201, 1991.
4. Schemper, M.: The relative importance of prognostic factors in studies of survival. Stat. Med., 12: 2377, 1993.
5. Page, J. B., Lai, S., Fletcher, M. A., Patarca, R., Smith, P. C., Lai, H. C. and Klimas, N. G.: Predictors of survival in human immunodeficiency virus type 1-seropositive intravenous drug users. Clin. Diag. Lab. Immunol., 3: 51, 1996.
6. Krongrad, A., Lai, H., Lamm, S. H. and Lai, S.: Mortality in prostate cancer. J. Urol., 156: 1084, 1996.
7. Krongrad, A., Lai, H., Burke, M. A., Goodkin, K. and Lai, S.: Marriage and mortality in prostate cancer. J. Urol., 156: 1696, 1996.
8. Cox, D. R.: Regression models and life-tables. J. Roy. Stat. Soc., 34: 187, 1972.
9. Smith, D. S. and Catalona, W. J.: Interexaminer variability of digital rectal examination in detecting prostate cancer. Urology, 45: 70, 1995.
This article provides important insight for nonstatisticians into
one of the vagaries of statistics in clinical papers. The urology journals
are replete with studies using Cox regression analysis to show
that various clinical parameters have a significant correlation with
survival. The authors used the large data base from the Surveillance
Epidemiology and End Results program to show that age, clinical
stage and marital status were all associated with survival for patients with prostate and bladder cancer. Among the potential uses
for such correlations is the ability to counsel patients about their
relative risks from a disease and help guide them toward optimal
therapy. The important lesson from this study is that finding a
significant statistical correlation using Cox regression does not necessarily have much clinical significance. The authors have introduced to the urology community the proportion of variation explained, which is a measure of how much outcome can be explained
by a particular parameter. Despite finding similar p values for the
relationship between stage and survival in bladder and prostate
cancer, the adjusted proportion of variation explained for this variable in prostate cancer was only 5.5%. In other words, this variable,
although statistically significant, has very little ability to predict
outcome. Other factors not assessed in this study need to be studied
to determine which factor predicts survival more effectively.
The importance of this paper is not that stage is a poor predictor of
outcome. Rather it demonstrates that when reading papers on prognostic factors, there is a need to know the proportion of variation
explained in order to truly assess the potential usefulness of the
Perhaps future papers that use Cox regression should be required
to also provide the proportion of variation explained so that the
reader can assess whether statistically significant variables have
any clinical usefulness. What is so interesting about this study is
that the 3 highly significant factors only explain 6.5% of the mortality. By using proportion of variation explained, perhaps, the most
clinically useful prognostic factors now can be defined.
Gerald W. Chodak
Prostate and Urology Center
Louis A. Weiss Memorial Hospital
University of Chicago Hospitals
Chicago, Illinois