American Journal of Epidemiology
Copyright © 1998 by The Johns Hopkins University School of Hygiene and Public Health
All rights reserved
Vol. 147, No. 9
Printed in U.S.A.
Population Attributable Fraction Estimation for Established Breast Cancer
Risk Factors: Considering the Issues of High Prevalence and Unmodifiability
Beverly Rockhill,1 Clarice R. Weinberg,2 and Beth Newman3
Established breast cancer risk factors, in addition to being relatively unmodifiable, are highly prevalent
among US women. Previous reports of population attributable fraction for the established risk factors have
used definitions that resulted in 75-100% of women in the source population labeled exposed. The practical
value of such estimates has not been discussed; further, the estimates have frequently been misinterpreted.
In the context of examining the interpretation and public health value of such estimates, the authors
demonstrate the sensitivity of the population attributable fraction to changes in exposure cutpoints. They use
data from the Carolina Breast Cancer Study, a case-control study of breast cancer conducted in North
Carolina between 1993 and 1996. For the four established risk factors (menarche before age 14 years, first
birth at age 20 years or later/nulliparity, family history of breast cancer, and history of benign breast biopsy),
the estimated population attributable fraction was 0.25 (95% confidence interval 0.06-0.48). Over 98% of the
source population was exposed to at least one of these risk factors. The population attributable fraction
estimate was reduced to 0.15 when more restrictive definitions of early menarche (less than age 12 years) and
late age at first full-term pregnancy (30 years or more) were used (proportion exposed, 0.62). Population
attributable fractions for established breast cancer risk factors probably have little public health value because
of both the high proportions exposed and the relative unmodifiability of the risk factor distributions. Am J
Epidemiol 1998; 147:826-33.
breast neoplasms; risk factors
Many of the established breast cancer risk factors
are highly prevalent in industrialized societies such as
the United States. They are largely unmodifiable from
a population perspective and are associated with only
modest relative risks (1, 2). At least three reports (3-5)
have estimated the proportion of breast cancer cases
that would be prevented if established breast cancer
risk factors were to be eliminated; however, none
discuss the issue of scientific or public health value of
their estimates. Using data from the large American
Cancer Society volunteer cohort study begun in 1959,
Seidman et al. (3) estimated summary population attributable
fractions of 0.21 among white women aged
30-54 years and 0.29 among white women aged
55-84 years for 10 risk factors: first-degree family
history of breast cancer, history of benign breast biopsy,
Jewish ethnicity, menopause at age 50 years or
older, menarche before age 12 years, ever married, no
first birth by age 30 years, college education, daily
alcohol consumption, and relative weight index
greater than or equal to 110. Using case-control data
from the Breast Cancer Detection and Demonstration
Project (BCDDP), Bruzzi et al. (4) considered four
risk factors: menarche before age 14 years, no full-term
pregnancy by age 20, history of breast cancer in
mother/sister(s), and history of benign breast biopsy
and estimated a population attributable fraction of 0.55
for pre- and postmenopausal white women. Most recently,
Madigan et al. (5) used data on white women
aged 25-74 years who were part of the First National
Health and Nutrition Examination Survey Epidemiologic
Follow-up Study to calculate a population attributable
fraction of 0.41 for the three risk factors (no
first birth by age 20 years, history of breast cancer in
a first-degree relative, and family income level in the
upper two tertiles of the US population).
Neither Seidman et al. (3) nor Bruzzi et al. (4)
reported the precision of their population attributable
Received for publication May 17, 1997, and accepted for publication October 25, 1997.
Abbreviations: BCDDP, Breast Cancer Detection and Demonstration Project; CBCS, Carolina Breast Cancer Study.
1 Channing Laboratory, Brigham and Women's Hospital, Boston, MA.
2 National Institute of Environmental Health Sciences, Research Triangle Park, NC.
3 Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC.
Reprint requests to Dr. Beverly Rockhill, Channing Laboratory,
181 Longwood Avenue, Boston, MA 02115.
Presented as the 1997 Abraham Lilienfeld Student Prize paper at
the 30th Annual Meeting of the Society for Epidemiologic Research,
Edmonton, Alberta, Canada, June 12-14, 1997.
fractions. Madigan et al. (5) assumed independent
Poisson distributions of cases in the various risk factor
strata and computed a fairly wide 95 percent confidence interval of 0.16-0.80 around their point estimate of 0.41. The precision of a population attributable fraction estimate computed in a multivariable
setting depends on several factors, including the precision of the relative risks (odds ratios) used in estimation, the number of parameters estimated (including
covariates), and the prevalence of the exposure(s). In
particular, the precision of a population attributable
fraction worsens sharply as the proportion exposed
falls below 0.20 or rises above 0.80 (6); in the analysis
by Madigan et al., 90 percent of the population was
exposed to at least one of the considered risk factors.
In the analysis by Bruzzi et al. (4), 98 percent of
controls fell into the exposed category. This suggests
that the precision of their estimate is low. Although
Bruzzi et al. considered only four risk factors while
Seidman et al. considered 10, the broader exposure
definitions used in the BCDDP for two of the risk
factors (menarche before age 14 years rather than
before age 12 and first full-term pregnancy at age 20
or later rather than at age 30 or later) likely account for
the larger proportion exposed as well as for the higher
population attributable fraction. Along with the standard error, the population attributable fraction increases with an increasingly broad definition of exposure, provided that the group added to the exposed
category has a risk greater than 1.0 relative to the
remaining unexposed group (7).
In this paper, we focus on conceptual and methodological issues raised by population attributable fraction estimation for the highly prevalent, relatively unmodifiable, established breast cancer risk factors.
Many of the issues we discuss are relevant to population attributable fraction estimation in other settings as
well. Specifically, we use data from the Carolina
Breast Cancer Study (CBCS) to estimate population
attributable fraction for the four established risk
factors (early menarche, late age at first full-term
pregnancy/nulliparity, history of breast cancer in
mother/sister, and history of benign breast biopsy). We
demonstrate the sensitivity of the population attributable fraction and its precision to different, commonly
used cutpoints for early menarche and late age at first
full-term pregnancy. We also examine the assumptions underlying valid estimation and discuss these
assumptions in light of our analysis and the analyses
previously mentioned (3-5).
MATERIALS AND METHODS
The CBCS is a population-based, case-control study
of breast cancer in African-American and Caucasian
women aged 20-74 years living in eastern and central
North Carolina (8). Women between ages 20 and 74
years who resided in a 24-county area and were diagnosed with invasive breast cancer for the first time
between May 1993 and May 1996 were eligible for
inclusion as cases. Population-based controls were selected from one of two computerized sources: for
women younger than age 65 years, a list from the
North Carolina Division of Motor Vehicles was used;
for women aged 65-74 years, a list from the US
Health Care Financing Administration was used.
Predetermined sampling probabilities based on race
and age were applied to both eligible cases and controls to ensure adequate representation of black and
younger women and to ensure approximate frequency
matching of cases and controls on race and age. The
use of predetermined sampling probabilities in the
selection of cases and controls ("randomized recruitment") was developed as a design alternative to
matching (9-11). The inverse of a participant's sampling probability can be thought of intuitively as the
number of women in the general population in the
same age/race/disease category who are "represented"
by the participant. After randomized recruitment, a
modified analysis that takes into account the sampling
probabilities allows unbiased estimation of effects associated with all variables studied, including the variables associated with sampling (i.e., the "matching"
variables) (9-11). For the logistic model, this requires
inclusion of offset terms, which was accomplished
with PROC GENMOD in SAS (12).
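The intuition that each participant "represents" a number of women equal to the inverse of her sampling probability can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `weighted_percentages`, the stratum labels, and the sampling probabilities are hypothetical, and the sketch does not reproduce the offset-based logistic analysis performed in SAS.

```python
def weighted_percentages(participants):
    """Estimate source-population percentages by inverse-probability weighting.

    participants: list of (stratum_label, sampling_probability) pairs.
    Each participant contributes a weight of 1 / sampling_probability.
    """
    totals = {}
    for label, prob in participants:
        totals[label] = totals.get(label, 0.0) + 1.0 / prob
    grand_total = sum(totals.values())
    return {label: 100.0 * weight / grand_total for label, weight in totals.items()}


# Hypothetical controls: two "exposed" women sampled with probability 0.5
# (weight 2 each) and one "unexposed" woman sampled with probability 0.25 (weight 4).
example = [("exposed", 0.5), ("exposed", 0.5), ("unexposed", 0.25)]
percentages = weighted_percentages(example)
print(percentages)
```

In this toy input, two thirds of the sampled women are exposed, but the weighted estimate for the source population is one half, because the unexposed woman was sampled with a lower probability.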
The total target sample size of the CBCS was 1,600,
with 400 women (200 cases and 200 controls) in each
of the following cells: white, aged 20-49 years;
African American, aged 20-49 years; white, aged
50-74 years; and African American, aged 50-74
years. Because of the stochastic nature of the sampling
design, actual cell sizes upon completion of recruitment deviated somewhat from the targets. In this paper, we present analyses based on white women only
to allow more direct comparison to previous analyses.
Seventy-nine percent of white women who met the
eligibility requirements for cases agreed to
participate in the study; 70 percent of eligible white
controls agreed to participate. Each participant was
interviewed in person by a trained nurse-interviewer
according to a pretested standardized questionnaire.
There were a total of 1,026 white women (538 cases
and 488 controls) in the study. We limited our analyses to the 958 white women (513 cases and 445
controls) who had complete data for the four risk
factors of interest.
To estimate population attributable fraction in our
analyses, we used a modification of the formula
$1 - \sum_j (pd_j / RR_j)$ (formula 1).
This formula enables computation of population attributable fraction
in a general multivariable setting in which both confounding and interaction may be addressed (4). In this
formula, j indexes the mutually exclusive strata
formed by cross-classifying the risk factor(s) of interest in the population attributable fraction estimation.
The j = 0 stratum consists of those unexposed to all
the considered risk factors. $RR_j$ is the adjusted relative
risk (odds ratio) comparing those in the jth exposure
stratum with the unexposed (j = 0) baseline stratum.
The $RR_j$ can be obtained from a stratified analysis or
a logistic model. The model may contain covariates
that are used to obtain unbiased effect estimates for the
risk factors under study, but that are not of interest
themselves as factors in the population attributable
fraction (that is, these covariates are "adjustment variables" and are not among the cross-classified factors).
A population attributable fraction computed with adjusted relative risks is interpretable as the proportional
reduction in average population disease risk that
would occur if the risk factors of interest were eliminated from the population, assuming that the distributions of the adjustment variables remain unchanged. In
the above formula, $pd_j$ represents the proportion of
cases who fall into the jth exposure stratum. When the
cases in a sample represent a complete enumeration or
a random sample of all cases in the relevant population, $pd_j$ can be directly estimated as the number of
cases in the jth exposure stratum divided by the total
number of cases in the study.
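As a minimal sketch of how formula 1 is evaluated, the following Python function takes the proportion of cases in each exposure stratum and the corresponding adjusted relative risks, and returns one minus the sum of their ratios. The function name and the three-stratum input values are hypothetical and chosen only so that the arithmetic is easy to follow; they are not CBCS data.

```python
def paf_from_strata(case_proportions, relative_risks):
    """Population attributable fraction as 1 - sum_j(pd_j / RR_j).

    case_proportions[j]: proportion of cases in exposure stratum j (must sum to 1).
    relative_risks[j]:   adjusted relative risk for stratum j versus stratum 0,
                         so relative_risks[0] is 1.0 for the unexposed baseline.
    """
    if len(case_proportions) != len(relative_risks):
        raise ValueError("one relative risk is needed per stratum")
    if abs(sum(case_proportions) - 1.0) > 1e-9:
        raise ValueError("case proportions must sum to 1")
    return 1.0 - sum(pd / rr for pd, rr in zip(case_proportions, relative_risks))


# Hypothetical strata: baseline (j = 0) plus two exposed strata.
case_proportions = [0.10, 0.60, 0.30]
relative_risks = [1.0, 1.2, 1.5]
paf = paf_from_strata(case_proportions, relative_risks)
print(round(paf, 3))
```

With these inputs the three ratios are 0.10, 0.50, and 0.20, so the estimate is 1 - 0.80 = 0.20.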
Because of the randomized recruitment used in the
CBCS, our sample of cases did not represent a random
sample of cases in the population. We thus had to
modify the above formula to take into account the
sampling probabilities. We discuss in Appendix 1 the
algebraic modification that generalizes formula 1 for
use with two-stage sampling designs.
To determine the most appropriate logistic model
form for the four risk factors, we conducted extensive
preliminary analyses to examine the strength of statistical interactions of the four factors with each other
and with other variables, and we also examined potential confounding by these other variables. None of
the variables (exogenous hormone use, body mass
index, alcohol consumption, history of lactation, or
physical activity level) were found to be effect modifiers on the multiplicative scale (p > 0.30 for all
interactions examined) or confounders (as determined
by magnitude of change in the effect estimates of
interest) of the four associations of concern in this
analysis. In addition, there was no effect modification
among the four factors themselves. Thus, our final
model included the design variable age (coded as an
ordinal 11-level variable, with each level reflecting a
5-year age group, i.e., 0 = 20-24 years, 1 = 25-29
years, ..., 10 = 70-74 years) and the four factors of
interest.
We used a method based on the bootstrap to estimate standard errors of the population attributable
fractions (13). Because of the complicated nonlinearities involved in population attributable fraction computation, there are no exact formulas for its standard
error. We developed an SAS macro that repeatedly
drew samples of 958 women, with replacement, from
the original data set of 958. (Each sampling was stratified on case status, with 513 cases and 445 controls).
For each sample drawn, the attributable fraction was
computed exactly as described above. One thousand
repeat samples were drawn from the original data set;
from the 1,000 resulting population attributable fraction estimates, the 2.5th and 97.5th quantiles of the
frequency distribution formed an approximate percentile-based 95 percent confidence interval around the
original estimate (13, 14).
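The resampling scheme described above (stratified on case status, with a percentile-based interval) can be sketched in Python. This is a simplified illustration and not the SAS macro used in the study: it assumes a single binary exposure, a crude (unadjusted) odds ratio, and hypothetical counts, whereas the analysis in this paper used a logistic model with offset terms and four risk factors.

```python
import random


def crude_paf(case_flags, control_flags):
    """Formula 1 with two strata: pd * (1 - 1/OR), using a crude odds ratio."""
    a = sum(case_flags)               # exposed cases
    b = len(case_flags) - a           # unexposed cases
    c = sum(control_flags)            # exposed controls
    d = len(control_flags) - c        # unexposed controls
    odds_ratio = (a * d) / (b * c)
    return (a / len(case_flags)) * (1.0 - 1.0 / odds_ratio)


def percentile_ci(case_flags, control_flags, n_boot=1000, seed=12345):
    """Resample cases and controls separately, with replacement, and take
    the 2.5th and 97.5th percentiles of the bootstrap estimates."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        boot_cases = rng.choices(case_flags, k=len(case_flags))
        boot_controls = rng.choices(control_flags, k=len(control_flags))
        estimates.append(crude_paf(boot_cases, boot_controls))
    estimates.sort()
    lower = estimates[(25 * n_boot) // 1000]
    upper = estimates[(975 * n_boot) // 1000 - 1]
    return lower, upper


# Hypothetical exposure flags (1 = exposed) for 513 cases and 445 controls.
cases = [1] * 330 + [0] * 183
controls = [1] * 250 + [0] * 195
point = crude_paf(cases, controls)
lower, upper = percentile_ci(cases, controls)
print(f"PAF = {point:.3f}, 95% percentile interval = ({lower:.3f}, {upper:.3f})")
```

Resampling cases and controls separately keeps the numbers of cases and controls fixed in every replicate, which mirrors the stratification on case status described in the text.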
RESULTS
Table 1 presents frequencies of the cases and controls in the various risk factor strata, as well as estimates of the percentage of the source population of
white women in the strata. These estimates of population prevalence were derived by weighting controls
by the inverse of their sampling probability.

TABLE 1. Frequencies and estimated percent of source population in strata,* white women in the Carolina Breast Cancer Study aged 20-74 years, 1993-1996

Risk factor                                    Cases        Controls     Estimated % of
                                               (n = 513)    (n = 445)    source population
Age at menarche (years)
  ≥14                                          111          104          31.8
  12-13                                        294          260          46.6
  <12                                          108           81          21.6
Agefftp† (years)
  <20                                           95           94          18.0
  20-24                                        167          163          35.4
  25-29                                         99           97          18.2
  ≥30                                           61           42           7.8
  Nulliparous                                   91           49          20.0
History of breast cancer in mother/sister(s)
  No                                           433          389          78.0
  Yes                                           80           56          22.0
History of benign breast biopsy
  No                                           412          354          86.6
  Yes                                          101           91          13.4

* Obtained by weighting control respondents by the inverse of their sampling probability.
† Agefftp, age at first full-term pregnancy.
On the basis of our sample data, we estimated that
approximately 98 percent of women with breast cancer
as well as 98 percent of nondiseased women in our
source population were exposed to at least one of these
four breast cancer risk factors; the corresponding estimates for exposure to more than one of the risk
factors were 79 and 76 percent for cases and controls,
respectively. There were no substantial differences by
case status in the number of the four factors to which
women were exposed.
Table 2 presents age- and multivariable-adjusted
odds ratios and confidence intervals for the factors of
interest, with age at menarche of 14 years or more as
the reference category for early menarche and age at
first full-term pregnancy of less than 20 years as the
reference category for late age at first full-term pregnancy. The multivariable-adjusted odds ratios were
derived from a logistic model containing 5-year age
group, age at menarche, age at first full-term pregnancy, family history of breast cancer, and history of
benign biopsy. Age at menarche and age at first full-term pregnancy/nulliparity were each coded as a series
of indicator variables (i.e., no ordered structure was
imposed). These four factors were associated only
modestly with breast cancer risk in these data. Adjustment
for multiple variables changed the effect estimates very little beyond simple age adjustment.
The population attributable fraction estimate for the
four risk factors (menarche before age 14 years, first
full-term pregnancy at age 20 years or later/nulliparity,
history of breast cancer in mother/sister, and history of
benign breast biopsy) was 0.25 (figure 1), with an
approximate percentile-based 95 percent confidence
interval of 0.06-0.48. The frequency distribution of
the 1,000 bootstrapped population attributable fraction
estimates was somewhat skewed in shape (distribution
not shown). However, a percentile-based interval
agrees well with a standard normal interval that is
constructed on an appropriate transformation of the
nonnormal distribution, and that then is mapped to the
original scale (13).
We examined the effect on the population attributable fraction and on its precision of changing the
exposure cutpoints for the two risk factors early menarche and late age at first full-term pregnancy (figure
1). Table 3 shows the adjusted odds ratios for the four
factors of interest, according to the changing reference
category definitions for early menarche and late age at
first full-term pregnancy. The proportion of the source
population exposed to at least one risk factor changes
sharply with the changing cutpoints (figure 1). As
figure 1 shows, as the exposure definitions became
more conservative, the population attributable fraction
declined, and the confidence intervals narrowed. We
note that the proportion of cases exposed did not differ
greatly from the proportion of controls exposed under
any set of exposure definitions, and the number of risk
factors present differed little between cases and controls (data not shown).

TABLE 2. Adjusted odds ratios and 95% confidence intervals,* Carolina Breast Cancer Study, 1993-1996

Risk factor                                    Age-adjusted               Multivariable-adjusted
                                               OR      95% CI†            OR      95% CI
Age at menarche (years)
  ≥14                                          1.00                       1.00
  12-13                                        1.07    0.77-1.47          1.08    0.78-1.49
  <12                                          1.23    0.83-1.84          1.24    0.83-1.86
Agefftp† (years)
  <20                                          1.00                       1.00
  20-24                                        1.03    0.71-1.49          1.08    0.74-1.56
  25-29                                        0.98    0.65-1.48          1.02    0.67-1.54
  ≥30                                          1.33    0.80-2.20          1.35    0.81-2.25
  Nulliparous                                  1.50    0.94-2.40          1.53    0.96-2.46
History of breast cancer in mother/sister(s)
  No                                           1.00                       1.00
  Yes                                          1.38    0.95-2.01          1.36    0.93-1.98
History of benign breast biopsy
  No                                           1.00                       1.00
  Yes                                          1.10    0.80-1.53          1.06    0.76-1.47

* Odds ratios (OR) are adjusted for age (coded as ordinal 11-level variable). Multivariable-adjusted odds ratios
simultaneously adjusted for age and all factors in table: age at menarche (two indicator variables), history of benign
breast biopsy, history of breast cancer in mother/sister, and age at first full-term pregnancy (four indicator
variables).
† CI, confidence interval; Agefftp, age at first full-term pregnancy.

FIGURE 1. Attributable fraction estimates and 95% confidence intervals for different exposure cutpoints, Carolina Breast Cancer Study,
1993-1996. History of breast cancer in mother/sister and history of benign breast biopsy are included in all calculations.
[Plot not reproduced. Vertical axis: attributable fraction estimate (scale 0 to 0.45). Horizontal axis: risk factor definitions (proportion source population exposed), in the order menarche <12 with agefftp ≥30 or nulliparous; menarche <12 with agefftp ≥25 or nulliparous; menarche <12 with agefftp ≥20 or nulliparous; menarche <14 with agefftp ≥30 or nulliparous; menarche <14 with agefftp ≥25 or nulliparous; menarche <14 with agefftp ≥20 or nulliparous. Agefftp, age at first full-term pregnancy.]
DISCUSSION
A central goal of this paper has been to demonstrate
that estimates of population attributable fraction for
established breast cancer risk factors can be made
"high" only by defining risk factors in such a way that
virtually the entire population must be labeled "exposed," and therefore, "at risk." To demonstrate this,
we estimated population attributable fraction for the
four established breast cancer risk factors early age at
menarche, late age at first full-term pregnancy/
nulliparity, history of breast cancer in mother/sister,
and history of benign breast biopsy and examined the
sensitivity of the population attributable fraction and
its precision to changes in exposure cutpoints. Using
the broad exposure definitions for early age at menarche (<14 years) and for late age at first full-term
pregnancy (≥20 years/nulliparous), we found a high proportion exposed among white cases and controls (0.98
and 0.98, respectively). This is comparable with the
proportions exposed in the BCDDP analysis by Bruzzi
et al. (4), which used the same risk factors and cutpoints. However, our population attributable fraction
estimate (0.25) was considerably lower than that of
Bruzzi et al. (0.55). This is not surprising, since Bruzzi
et al. selected these four factors on the basis of the
strength of their associations in their own data. Among
white women in the CBCS, these four factors were
only modestly associated with risk. The confidence
interval around our population attributable fraction
estimate of 0.25 was wide. It is likely that the BCDDP
estimate was also imprecise, due to the high proportions considered exposed.
When we used more restrictive exposure definitions
for early age at menarche and late age at first full-term
pregnancy, the proportions exposed dropped, the population attributable fractions fell, and confidence intervals narrowed. The population attributable fraction
estimate was reduced (from 0.25 to 0.15) when the
most "restrictive" exposure definitions of early age at
menarche (<12 years) and late age at first full-term
pregnancy (≥30 years/nulliparous) were used. The
patterns in figure 1 demonstrate the distributive property of the population attributable fraction, as discussed
in detail by Wacholder et al. (7): the population
attributable fraction will increase as the definition of
exposure becomes more inclusive, provided that each
group added to the exposed segment has a risk greater
than 1.0 relative to the remaining unexposed group.
This increase will occur even if the overall relative risk
(that is, the relative risk comparing all exposed with
the unexposed) declines with a change in exposure
cutpoint(s).

TABLE 3. Adjusted* odds ratios and 95% confidence intervals under changing exposure cutpoints for early age at menarche and late age at first full-term pregnancy, Carolina Breast Cancer Study, 1993-1996. Cells show odds ratio (95% CI†).

                                Menarche at <12 years                                      Menarche at <14 years
Risk factor                     Agefftp† ≥30 years  Agefftp ≥25 years   Agefftp ≥20 years   Agefftp ≥30 years   Agefftp ≥25 years
                                or nulliparous      or nulliparous      or nulliparous      or nulliparous      or nulliparous
Age at menarche (years)
  ≥14                           1.00‡               1.00‡               1.00‡               1.00                1.00
  12-13                         1.00‡               1.00‡               1.00‡               1.07 (0.78-1.48)    1.07 (0.78-1.48)
  <12                           1.18 (0.85-1.64)    1.17 (0.84-1.63)    1.17 (0.84-1.63)    1.24 (0.83-1.86)    1.23 (0.82-1.85)
Agefftp (years)
  <20                           1.00‡               1.00‡               1.00                1.00‡               1.00‡
  20-24                         1.00‡               1.00‡               1.04 (0.73-1.50)    1.00‡               1.00‡
  25-29                         1.00‡               0.91 (0.65-1.28)    0.94 (0.62-1.41)    1.00‡               0.91 (0.65-1.27)
  ≥30                           1.30 (0.85-2.00)    1.27 (0.82-1.97)    1.30 (0.80-2.14)    1.30 (0.85-2.00)    1.27 (0.82-1.97)
  Nulliparous                   1.57 (1.06-2.31)    1.53 (1.03-2.20)    1.57 (1.00-2.50)    1.57 (1.06-2.31)    1.53 (1.03-2.29)
History of breast cancer in mother/sister(s)
  No                            1.00                1.00                1.00                1.00                1.00
  Yes                           1.36 (0.93-1.98)    1.36 (0.93-1.98)    1.36 (0.93-1.99)    1.35 (0.93-1.98)    1.36 (0.93-1.98)
History of benign breast biopsy
  No                            1.00                1.00                1.00                1.00                1.00
  Yes                           1.06 (0.76-1.47)    1.05 (0.76-1.46)    1.05 (0.76-1.46)    1.06 (0.76-1.47)    1.06 (0.76-1.47)
Proportion exposed
(to at least one factor)        0.62                0.72                0.88                0.91                0.94

* Adjusted for age and all other factors in table.
† Agefftp, age at first full-term pregnancy; CI, confidence interval.
‡ Two or more categories were combined as the reference group.
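The distributive property described by Wacholder et al. (7) can be checked numerically with a small Python sketch. The three exposure levels, their population proportions, and their relative risks below are hypothetical values chosen for illustration, and the function name `paf_and_rr` is likewise an assumption of this sketch. The function computes the population attributable fraction as the proportional excess of overall risk over the risk in the group labeled unexposed, together with the overall relative risk comparing all exposed with the unexposed.

```python
def paf_and_rr(proportions, risks, exposed_flags):
    """Return (PAF, overall relative risk) for one definition of 'exposed'.

    proportions[k]:   share of the population at exposure level k.
    risks[k]:         disease risk at level k (any common scale).
    exposed_flags[k]: True if level k is labeled exposed under this definition.
    """
    overall = sum(p * r for p, r in zip(proportions, risks))
    unexposed_share = sum(p for p, e in zip(proportions, exposed_flags) if not e)
    unexposed_risk = sum(
        p * r for p, r, e in zip(proportions, risks, exposed_flags) if not e
    ) / unexposed_share
    exposed_share = 1.0 - unexposed_share
    exposed_risk = sum(
        p * r for p, r, e in zip(proportions, risks, exposed_flags) if e
    ) / exposed_share
    paf = (overall - unexposed_risk) / overall
    return paf, exposed_risk / unexposed_risk


# Hypothetical levels: none, moderate, high exposure.
proportions = [0.3, 0.5, 0.2]
risks = [1.0, 1.2, 2.0]

# Narrow definition: only the high level counts as exposed.
narrow_paf, narrow_rr = paf_and_rr(proportions, risks, [False, False, True])
# Broad definition: the moderate level is added to the exposed group.
broad_paf, broad_rr = paf_and_rr(proportions, risks, [False, True, True])

print(f"narrow: PAF = {narrow_paf:.3f}, RR = {narrow_rr:.2f}")
print(f"broad:  PAF = {broad_paf:.3f}, RR = {broad_rr:.2f}")
```

In this example the moderate group has a risk of 1.2 relative to the remaining unexposed group, so adding it to the exposed category raises the population attributable fraction even though the overall relative risk falls.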
The idealized interpretation of our findings is that
approximately 25 percent of breast cancer cases
among white women in our population would be prevented if all white women in the population were to
undergo menarche at age 14 years or later, had no
genetic or cultural/lifestyle predisposition to disease as
reflected in family history of breast cancer, had no
benign breast conditions detected by a biopsy, and had
their first full-term pregnancy before age 20 years, and
if no other risk factors for breast cancer were to change
in distribution as a result of "elimination" of these four
factors. This interpretation appears inappropriate and
of little practical value when discussing risk factors
that are largely unmodifiable and for which nearly 100
percent of the population would need to be "shifted" to
achieve the estimated benefit. Further, one cannot
even state that the practical value of such an estimate
lies in the information it conveys about the other,
unspecified risk factors and the "unexplained" portion
of disease risk; the population attributable fraction
cannot be partitioned into "chunks" that sum to 1.0
(15). We acknowledge the limitations in the interpretation of our population attributable fraction estimate,
yet we give this interpretation for two reasons. First,
the practical scientific or public health value of population attributable fraction estimates for such factors
has been largely overlooked, despite several analyses
and numerous citations. By stating what a population
attributable fraction estimate means, we see more
clearly whether it has practical, scientific, or conceptual value. Second, we provide the strict interpretation
of a population attributable fraction to emphasize what
it is not, as there has been confusion and miscommunication surrounding such estimates, particularly in the
breast cancer literature.
Importantly, and contrary to common misinterpretations of population attributable fractions, these estimates convey no information on the proportions of
women with breast cancer who have any of the considered factors (15). For instance, citations of population attributable fraction estimates of Seidman et al.
(0.21 and 0.29) have included the following: "Although various risk factors have been identified as
causes of breast cancer, the fact remains that in 75
percent of all breast cancer no identifiable risk factor
can be found (16, p. 2567)"; "...only 21 per cent of the
cancers occurring in women from 30 to 54 years of age
and 29 per cent in the women over 50 could be
attributed to one or more risk factors, meaning that the
majority of cancers occur in women with no risk
factors" (17, p. 608). An information brochure published recently by the San Francisco Breast Cancer
Coalition stated that "75 percent of breast cancers
occur in women with none of the currently identified
risk factors." In reality, under the risk factor definitions used by Seidman et al., the percentage of breast
cancer cases who had none of the considered factors
was on average about 20 percent, considerably lower
than the miscited 75 percent. The population attributable fraction estimate of Bruzzi et al. (4) has similarly
been misinterpreted (18).
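The distinction between a population attributable fraction and the proportion of cases who carry a risk factor can be made concrete with Levin's single-exposure formula. The sketch below is illustrative only: the exposure prevalence of 0.98 and the relative risk of 1.3 are hypothetical values, not estimates from any of the studies cited, and the function names are assumptions of this example.

```python
def levin_paf(prevalence, relative_risk):
    """Levin's formula: p(RR - 1) / [1 + p(RR - 1)]."""
    excess = prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


def proportion_of_cases_exposed(prevalence, relative_risk):
    """Share of all cases that arise among exposed people: p*RR / [1 + p(RR - 1)]."""
    return prevalence * relative_risk / (1.0 + prevalence * (relative_risk - 1.0))


# Hypothetical setting: nearly everyone exposed, modest relative risk.
paf = levin_paf(0.98, 1.3)
cases_exposed = proportion_of_cases_exposed(0.98, 1.3)
print(f"PAF = {paf:.2f}; proportion of cases exposed = {cases_exposed:.2f}")
```

Under these assumed values the attributable fraction is roughly 0.23, yet more than 98 percent of cases have the exposure. Reading "1 minus the attributable fraction" as the share of cases with no risk factor would therefore be badly wrong.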
It is also common for researchers to equate the
population attributable fraction with the proportion of
disease cases "explained" by the risk factors. For instance, after computing an attributable fraction of 0.41
for the three risk factors no first birth by age 20 years,
family history of breast cancer in a first-degree relative, and family income level in the upper two tertiles
of the United States, Madigan et al. state that their
estimates "suggest that a substantial proportion of
breast cancer cases in the United States are explained
by well-established risk factors" (5, p. 1680). This use
of the word "explain" is misleading. According to the
data of Madigan et al., nearly the entire population of
women in the United States has at least one of the
considered risk factors. Since the vast majority of such
exposed women will not develop breast cancer, stating
that such factors explain a large proportion of breast
cancer risk is misleading and even alarmist.
These analyses for established breast cancer risk
factors support a premise put forth by Rose (19) in
reference to chronic diseases in general: Susceptibility
to breast cancer does not appear to be confined to a
high-risk minority within our population. The majority
of breast cancer cases arise from the mass of the
population with established risk factor values around
the population average, and the majority of women
with one or more recognized risk factors do not develop breast cancer. Stating that a certain percentage
of cases could be prevented if nearly the entire population of women were to change its profile on a variety
of relatively unmodifiable factors unfortunately does
not lead to practical preventive public health strategies. However, such statements do provide impetus to
develop new frameworks to understand the distribution of breast cancer within our high-risk population
and to evaluate realistically the possibility of primary
prevention strategies, by turning attention to factors
other than the "major," "established" ones.
ACKNOWLEDGMENTS
The authors thank Drs. R. Millikan and D. Savitz for their
helpful comments.
Supported in part by National Cancer Institute funding
for a Specialized Program of Research Excellence (SPORE)
in breast cancer (P50-CA58223).
REFERENCES
1. Kelsey JL. Breast cancer epidemiology: summary and future directions. Epidemiol Rev 1993;15:256-63.
2. Harris JR, Lippman ME, Veronesi U, et al. Breast cancer. (Part 1). N Engl J Med 1992;327:319-28.
3. Seidman H, Stellman SD, Mushinski MH. A different perspective on breast cancer risk factors: some implications of the nonattributable risk. CA Cancer J Clin 1982;32:301-12.
4. Bruzzi P, Green SB, Byar DP, et al. Estimating the population attributable risk for multiple risk factors using case-control data. Am J Epidemiol 1985;122:904-14.
5. Madigan MP, Ziegler RG, Benichou J, et al. Fraction of breast cancer cases in the United States explained by well-established risk factors. J Natl Cancer Inst 1995;87:1681-5.
6. Walter SD. The estimation and interpretation of attributable fraction in health research. Biometrics 1976;32:829-49.
7. Wacholder S, Benichou J, Heineman EF, et al. Attributable risk: advantages of a broad definition of exposure. Am J Epidemiol 1994;140:303-9.
8. Newman B, Moorman PG, Millikan R, et al. The Carolina Breast Cancer Study: integrating population-based epidemiology and molecular biology. Breast Cancer Res Treat 1995;35:51-60.
9. Wacholder S, Weinberg CR. Flexible maximum likelihood methods for assessing joint effects in case-control studies with complex sampling. Biometrics 1994;50:350-7.
10. Weinberg CR, Sandler DP. Randomized recruitment in case-control studies. Am J Epidemiol 1991;134:421-32.
11. Weinberg CR, Wacholder S. The design and analysis of case-control studies with biased sampling. Biometrics 1990;46:963-75.
12. SAS technical report P-243. SAS/STAT software: the GENMOD Procedure. Cary, NC: SAS Institute, Inc., 1993.
13. Efron B, Tibshirani R. An introduction to the bootstrap. New York, NY: Chapman and Hall, 1993.
14. Kooperberg C, Petitti DB. Using logistic regression to estimate the adjusted attributable risk of low birthweight in an unmatched case-control study. Epidemiology 1991;2:363-6.
15. Rockhill B, Newman B, Weinberg C. Use and misuse of population attributable fraction. Am J Public Health 1998;88:15-19.
16. Freeman HP, Wasfie TJ. Cancer of the breast in poor black women. Cancer 1989;63:2562-9.
17. Love SM. Use of risk factors in counseling patients. Hematol Oncol Clin N Am 1989;3:599-610.
18. Garfinkel L. Perspectives on cancer prevention. (Editorial). CA Cancer J Clin 1995;45:5-7.
19. Rose G. Sick individuals and sick populations. Int J Epidemiol 1985;14:32-8.
APPENDIX
It can be shown through simple arithmetic operations that formula 1 can be reexpressed as
$1 - (1/x) \sum_i (1/rr_i)$ (formula 2) (4).
Here, x is the total number of cases in the population (or in the
random sample of cases), and $rr_i$ is the adjusted relative risk for the ith case. (Each individual has a risk
relative to a person in the lowest-risk stratum (stratum
j = 0); $rr_i = RR_j$ for all individuals in the jth exposure
stratum.) This more general expression allows for continuous modeling of the exposure effect, for example,
with splines. When the cases in a sample represent a
complete enumeration or a random sample of all cases
in the relevant population, formula 2 can be used without modification. This formula can be rewritten more
generally as
$1 - \left[\sum_i 1/(w_i \, rr_i)\right] / \left[\sum_i 1/w_i\right]$ (formula 3),
where $w_i$ represents the sampling probability for the ith
case, and $rr_i$ is again the adjusted relative risk for the ith
case. The summation is again over cases only. The equivalence of formulas 2 and 3 can be seen intuitively by
considering $1/w_i$ as the number of cases in the population
represented by individual i; thus, the sum over all i (that
is, all cases) of $1/w_i$ estimates the total number of cases in
the sampling frame. This more general expression for
attributable fraction enables estimation when studies
have used two-stage sampling (of which randomized
recruitment is a specific type) or stratified sampling. We
note that in the special case where all the $w_i$ are the same
for cases (e.g., where the study cases represent a random
sample of all cases), formula 3 reduces to formula 2.
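Formulas 2 and 3 translate directly into code. The following Python sketch is an illustration rather than the study's SAS implementation: the function names and the per-case sampling probabilities and relative risks are hypothetical. It also checks the special case noted above, in which equal sampling probabilities make formula 3 reduce to formula 2.

```python
def paf_formula_2(relative_risks):
    """Formula 2: 1 - (1/x) * sum_i(1 / rr_i), summing over cases only."""
    x = len(relative_risks)
    return 1.0 - sum(1.0 / rr for rr in relative_risks) / x


def paf_formula_3(sampling_probs, relative_risks):
    """Formula 3: 1 - [sum_i 1/(w_i * rr_i)] / [sum_i 1/w_i], over cases only."""
    numerator = sum(1.0 / (w * rr) for w, rr in zip(sampling_probs, relative_risks))
    denominator = sum(1.0 / w for w in sampling_probs)
    return 1.0 - numerator / denominator


# Hypothetical cases: adjusted relative risk for each case versus stratum j = 0.
case_rr = [1.0, 2.0, 1.5, 1.0]

# Equal sampling probabilities: formula 3 should agree with formula 2.
equal_weights = paf_formula_3([0.4, 0.4, 0.4, 0.4], case_rr)
unweighted = paf_formula_2(case_rr)

# Unequal sampling probabilities: cases sampled with lower probability
# represent more cases in the population and so count for more.
two_case_weighted = paf_formula_3([0.5, 0.25], [1.0, 2.0])
two_case_unweighted = paf_formula_2([1.0, 2.0])

print(equal_weights, unweighted, two_case_weighted, two_case_unweighted)
```

In the two-case example the higher-risk case was sampled with the lower probability, so weighting raises the estimate from 0.25 to one third.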