Download Population and Individual Based Approaches to the Design and

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Epidemiology of HIV/AIDS wikipedia , lookup

Hospital-acquired infection wikipedia , lookup

Hepatitis B wikipedia , lookup

Hepatitis C wikipedia , lookup

Eradication of infectious diseases wikipedia , lookup

Microbicides for sexually transmitted diseases wikipedia , lookup

Oesophagostomum wikipedia , lookup

Sexually transmitted infection wikipedia , lookup

Cross-species transmission wikipedia , lookup

Transcript
Population and Individual Based Approaches to the Design
and Analysis of Epidemiological Studies of STD
Transmission
Stephen Shiboski
Nancy Padian
University of California, San Francisco
Abstract
Epidemiological studies of Sexually Transmitted Disease (STD) transmission present a
number of unique challenges in design and analysis. These arise from both the social nature
of STD transmission, and from inherent difficulties in collecting accurate and informative data
on exposure and infection. Risk of acquiring an STD depends on both individual level factors
and the behavior and infectiousness of other individuals. Consequently, study designs and
analysis methods developed for studying chronic disease risk in individuals or groups may
not directly apply. In this article we use simple models of STD transmission to investigate
these issues, focusing on how the interplay between individual and population level factors
influences design and interpretation of epidemiological studies. Particular attention is paid to
interpretation of common measures of association and to common sources of bias in epidemiologic data. Modifications of existing methods for investigating risk factors which allow these
issues to be addressed directly are presented. In addition, examples from studies of transmission of Human Immunodeficiency Virus (HIV) and other STDs are provided.
1
Epidemiology of STD Transmission
2
Introduction
Epidemiologists have long recognized that the causal processes relevant to understanding the epidemiology of sexually transmitted diseases (STDs) represent a complex interplay between factors
acting on the population and individual level. In contrast to the etiology of many chronic diseases, STDs have a strong behavioral component in the causal pathway, with individual disease
risk depending directly on the exposure of susceptible to infected individuals. Within a sexual
partnership, population/group and individual level factors play an important role in partner choice
and patterns of sexual behavior, while individual level factors in both partners mediate whether or
not transmission actually occurs. Factors on both the individual and group levels must be taken
into consideration in the design and interpretation of epidemiological studies of transmission risk.
Figure 1 is a schematic representation of the transmission process within a single partnership,
progressing from formation of the partnership through sexual contact to transmission. The white
boxes represent the physical elements of the process, and the gray boxes and connecting lines indicate the influence of population/group level and individual/partnership level factors on the stages.
Population and group level factors can refer either to characteristics which have no clear counterpart at the individual level and vary primarily between groups (e.g. environmental conditions, laws
and social norms), or features which represent measured summaries or aggregates of individual
level characteristics (e.g. the prevalence of infection). Susser [1] has termed these integral and
contextual factors, respectively. Individual and partnership level factors refer to behavioral and
biological features of individuals in a sexual partnership.
From the figure, both individual behavior and societal factors play a role in partner selection
(e.g. demographic factors and individual preferences) and the nature of sexual contact (e.g. mode
of contact and contraceptive decisions). Once sexual contact is initiated, transmission can only
occur if one partner is infectious and the other susceptible. Thus, the prevalence of infectiousness
among partners (a group property) directly influences the probability of transmission because it
Epidemiology of STD Transmission
3
influences the likelihood of selecting an infected partner. Conditional on potentially infectious
contact occurring, transmission depends on the nature (e.g. frequency and mode) of contact as
well as biological factors in both partners, including those influencing susceptibility (e.g. immunity) and infectiousness (e.g. stage of disease). The scope and validity of conclusions drawn
from a study of transmission depends critically on the level of detail of the accompanying data
on exposure and other risk factors. For example, if factors influencing partner choice are measured but not variables describing contact, conclusions about transmission depend implicitly on
assumptions regarding exposure, susceptibility and infectiousness. By contrast, if data include
detailed information on contact patterns and biological risk factors but are limited to monogamous infective/susceptible pairs, infectiousness and susceptibility issues can be addressed, but no
conclusions about risk associated with exposure to multiple partners (i.e. about transmission in
non-monogamous individuals) are possible.
In this paper we discuss design and interpretation issues for epidemiologic studies of STDs,
emphasizing the importance of the interplay between individual and population level factors. A
number of recent articles have focused on related issues, including bias in standard measures
of association applied to STD transmission data [2], interpretation of risk factors derived from
individual and population level data [3] and implications for design of interventions [4]. The first
section of this paper describes basic designs for epidemiological studies of STD transmission in
individuals and groups, with examples from existing studies. The second section presents methods
of analysis for epidemiologic data arising on both group and individual levels. The methods are
based on very simple mathematical models of the transmission process, and their relationship to
standard techniques such as logistic regression is described. The third section discusses biases
that arise in analyses of transmission data. Sources of bias are shown to be related to incorrect
assumptions regarding the relationship between transmission risk and exposure, and the nature of
infectiousness and prevalence. Finally, we discuss implications for design and analysis of future
epidemiologic studies and interventions.
4
Epidemiology of STD Transmission
Study design issues
Traditional designs for epidemiological studies can be roughly classified into individual and population/group level studies depending on how the outcome is measured and data are analyzed. In
this scheme, individual studies encompass designs such as case-control and cohort studies which
measure disease outcomes and exposure in individuals, and treat outcomes as independent in analyses. Outcomes in group level studies are generally based on the average number/rate of outcomes
in a group and risk factors which are also defined on the group level. Analysis of data from such
studies is generally based on regression techniques similar to those used in individual studies.
Unlike most studies of chronic diseases, infectious disease studies do not cleanly separate into
these levels because individuals are linked together by exposure [3, 4]. STDs are unique among
infectious diseases in that exposure in the form of sexual contact is required for transmission to
occur. Thus, studies investigating risk factors for transmission among individuals implicitly involve exposure information from partners, and group level characteristics such as prevalence of
infection. Similarly, infection rates in groups are strongly linked to patterns of individual behavior. These features have important implications for design and interpretation of studies. In this
section we review study designs commonly used in investigating STD transmission in individuals and groups. In addition, recent approaches to investigating transmission among networks of
interacting individuals [5] are discussed.
Individual level studies
Most existing studies of risk factors for transmission are based on observation of disease outcomes
in individuals. The major outcome considered is the infection status of an individual following
exposure to potentially infectious partners through sexual contact. Risk factors focus on behavioral
and biological characteristics of both the respondent and of partners, and analyses are usually
based on standard epidemiological methods, including contingency table methods and logistic
Epidemiology of STD Transmission
5
regression which treat all individuals as independent. The major difference between these studies
and traditional designs most commonly applied to chronic disease outcomes is the need to collect
exposure data involving other individuals. Thus although the individual is taken as the unit of
analysis, sexual partnerships are the basis for measurement of exposure. Data analyses must rely
on assumptions about other partners whether or not they have been recruited into the study.
A common study design for investigating individual risk factors for transmission is provided
by partner studies of HIV transmission in sex partners of known infected individuals [6, 7, 8]. The
main outcome is infection status of the (susceptible) partner following exposure to the infective.
Partnerships are generally recruited on a volunteer basis, with eligibility depending on the ability
of investigators to identify the primary infected partner (index case), and to rule out outside sources
of infection for the secondary partner. Exposure and risk factor information is collected retrospectively by some combination of interview, physical examination and laboratory data. Because of
the difficulties in prospective follow-up of couples and the necessity for counseling on safe sexual
practices (which lead to reductions in exposure over time), most partner studies are retrospective
in design [9]. Prospective studies of couples with discordant HIV status have been carried out,
but generally few seroconversions are observed [10]. A second type of individual-based design
is illustrated by recent studies of risk factors for HIV transmission among homosexual men in
San Francisco [11, 12]. These studies recruited men from high risk locations and examined associations between past sexual behaviors and current HIV serostatus. Exposure was measured
as numbers of partners and frequencies of sexual contacts in a period prior to recruitment, and
between visits during prospective follow-up. Another example of a prospective design is a study
of gonorrhea transmission among sailors on leave in the far east [13]. Men were tested for infection before and after their leave, and sex workers in the port-of-call were concurrently tested for
prevalence of infection. The association between exposure (sexual contacts with sex workers) and
outcome (new infection on return to the ship) was investigated, and estimates made of the perpartner infection probability or infectivity. A shared feature of all these examples is that although
Epidemiology of STD Transmission
6
observed data includes information from sexual partners, the outcome and analysis are based on
the susceptible individual. Further, in addition to measuring exposure in terms of frequency of
contact, prevalence of infectiousness among partners is a critical consideration in both design and
interpretation of results: for partner studies, prevalence is fixed by design and only partnerships
with known infected partners are admitted; in the studies of gay men, recruitment was restricted to
high prevalence areas to insure that infected individuals appeared in the sample; in the final example, determination of prevalence among the sex workers allowed estimation of the risk per-partner
of transmission.
Correct inferences about transmission risk factors requires some knowledge of exposure of
susceptible to infective individuals. As discussed above, this may be in the form of detailed contact information at the partnership level, population level information about the prevalence of
infectiousness, or both. To illustrate these points further, Figure 2 presents a hypothetical exposure history for a study subject prior to recruitment into a study. Only one of the three partners
contacted was infectious, so contacts with the other two carry no transmission risk. Complete
exposure data would consist of the number of partners in addition to the infectiousness, timing
and frequency of contacts with each. Additional information on the nature of sexual behaviors
with each partner such as use of contraception and mode of contact are also important. In practice, exposure data from individual level studies is usually incomplete; the degree of completeness
determines the nature of conclusions that can be drawn about transmission risk. Sources of incompleteness range from no information on infectiousness and susceptibility at the time contacts are
made to inaccurate information on numbers of partners or frequency and nature of contacts.
If complete exposure information (similar to that presented in Figure 2) is available for a
sample of individuals, at least two different approaches to risk factor analysis are possible. First,
transmission risk conditional on contact with each known infectious partner can be investigated
(i.e. non-infectious partners are omitted). This allows behavioral risk factors within a partnership
(e.g. condom use, mode of intercourse) to be examined minimizing the concern that exposure data
Epidemiology of STD Transmission
7
are misclassified. No assumptions about prevalence of infectiousness in the population are needed
in this case. If data are sufficiently detailed, properties of infectiousness and susceptibility can also
be investigated. Studies of HIV transmission risk in partners of known infected individuals provide
an example, although data on infectiousness are typically very limited [14]. Second, if factors
associated with partner selection are of primary interest, the association between transmission and
collective properties of all partnerships can be investigated.
By contrast, if the observed data are limited to current infection status of the susceptible,
total numbers of partners and contacts, and summaries of sexual behaviors across partnerships,
very little can be concluded about the association between exposure and transmission within a
single partnership. In particular, infectiousness and susceptibility properties cannot be investigated
directly since periods of infectiousness of partners are not observed. Risk factor information in this
situation must be averaged or summed across partners, and conclusions about transmission risk are
limited accordingly. Conclusions depend critically on knowledge of the prevalence of infection in
the underlying population. For example, if no information is available about the infectiousness of
the three partners in Figure 2, the inclusion of ”unexposed” contacts and non-infectious partners
in an analysis clearly over-represents the true degree of exposure. Erroneous inferences about the
association between exposure and outcome will result unless the assumption that all partners are
picked randomly from a ”pool” of individuals (i.e. homogeneous mixing) with known prevalence
of infectiousness is adopted. Thus, valid conclusions about risk associated with exposure require
use of population level information on infectiousness (i.e. prevalence). Because detailed data
on the epidemic at this level are rarely available, prevalence is usually assumed to be constant.
In addition, infection outcomes among individuals in analyses are assumed to be independent
[3]. Biases which can arise when these assumptions are false are discussed in the third section.
Examples of studies of this type include the San Francisco studies of HIV transmission among
homosexual men mentioned above.
To summarize, outcomes and exposure data from studies of individual level risk of STD trans-
Epidemiology of STD Transmission
8
mission depend on individuals other than the study respondent. Thus any assessment of risk factors
in this setting implicitly involves group level information such as the prevalence of infection in the
community from which the sample is drawn. Studies which focus on single partnerships with
known exposure histories require few assumptions about individuals outside of the partnership,
but often lack detailed information about infectiousness and susceptibility. In comparison, exposure information from studies of individuals reporting contact with multiple partners is often too
incomplete to draw conclusions about risk according to the nature of contact within partnerships.
Such studies can provide valuable information about risks associated with multiple partnerships,
but only at the cost of additional assumptions about the nature of contact within partnerships and
infectiousness of partners.
Group level studies
Unlike studies of individuals, studies based on group level information relate transmission outcomes for entire groups to measures of exposure and risk factors also defined at the group level.
Although the outcome measures in such studies are aggregations of individual level observations,
variation in levels of exposure and risk factors within groups are ignored in data analyses (i.e.
individuals within groups are assumed to share the same level of exposure). This is the source of
many of the limitations associated with group level analyses.
Perhaps the most common group level studies of STD transmission compare observed disease
occurrence among groups distinguished by geographic area such as census tract, and by other factors assumed to be homogeneous within areas such as socioeconomic status. The limitations of
such ecological studies are well described in the epidemiological literature [15] and will not be
covered in detail here. Perhaps the most notable is the ecological fallacy, which addresses bias
which results when associations observed on the group level are assumed to apply at the individual
level. Several recent articles have noted that in the case of infectious diseases, population level
information such as that obtained in ecological studies may provide insights about epidemic pro-
Epidemiology of STD Transmission
9
cesses not easily observable in individual level studies [1, 3, 4, 16]. In the case of HIV, geographic
trends in disease rates reveal much about patterns of disease occurrence and provide critical information for intervention and control programs. In the U.S., case surveillance reveals that AIDS
cases and HIV infection are clustered by geography, risk behavior, and demographic characteristics [17]. Sixty percent (190,000) of AIDS cases reported through June 1993, were registered
from only 20 (21%) of the 97 metropolitan areas of the U.S. with populations greater than 500,000
persons. In a multivariate analysis, race, urbanicity, and geographic region were independently associated with seroprevalence [18]. Eighty-eight percent (88%) of the estimated HIV seropositive
women in this study came from only 8 states, and the great majority of these women resided in the
major population centers of these states. Analyses of AIDS case rates by zip codes in New Jersey
(8), and by census tracts in Philadelphia [19] demonstrated an association with low socioeconomic
status. Finally, cases are also concentrated among groups characterized by their risk behaviors. Of
the 85,155 AIDS cases reported between July 1992 and June 1993, 79% were in men who had sex
with other men (53%) and injecting drug users (23%). Studies of group level effects can also be
valuable in detecting secondary or indirect effects of changes in risk factors, such as reductions
in transmission rates due to herd immunity [20]. These are particularly relevant in evaluating the
benefits of vaccination programs [4].
Because group level studies like those described above frequently include large numbers of
individuals representing a range of geographic locations, they offer opportunities to investigate
factors which are difficult to measure in individual level samples, which are often homogeneous
with respect to such characteristics. However, in addition to interpretation problems like the ecological fallacy, these studies have several other shortcomings for investigating STD transmission:
Even within fairly well defined geographic and/or social strata, factors controlling exposure (e.g.
mode and frequency of sexual contact, and prevalence of infectiousness) may be very heterogeneous within groups, varying between and within partnerships. Thus, even if such studies yield
information on risk factors not observable in individual level studies, the potential for misclassi-
Epidemiology of STD Transmission
10
fication of exposure and outcome, and the presence of uncontrolled confounding factors remains
high.
Transmission network studies
A major shortcoming of the more ”traditional” study designs discussed above is that transmission
outcomes and exposure information either focus on an individual, or are assumed to be homogeneous within larger groups. In real populations, STDs spread through networks of partnerships
strongly influenced by both social and geographic factors. Because of the heterogeneity in behavior between partnerships, many of these factors are not easily observable in studies of individual
or group level risk. An example of such a factor is the presence in the population of core groups
of individuals characterized by high levels of risk behavior and prevalence of infection. Initially
suggested by mathematical modeling of gonorrhea transmission [21], the presence of core groups
in a population can profoundly affect the development of an epidemic. However, most results
about the importance of core groups come from theoretical studies of mathematical models, with
a few exceptions [22].
Recent interest in the group level properties of disease spread described above has led to consideration of study designs better suited to their measurement. One approach examines transmission within and/or between groups of susceptible individuals linked by exposure to one or more
infectious index cases in a contact network. Originally conceived for investigation of social interactions [23], these studies focus on factors which influence partnership formation and on patterns
of interaction between individuals and groups. One of the major applications for the results is in
improving mathematical models of epidemic spread by incorporating more realistic information
on factors influencing the heterogeneity of sexual behavior. Linking these descriptions of behavior
with empirical data on transmission outcomes is challenging and very little work has been done in
this area. The basic problem has similarities to studies of infectious disease transmission within
small groups or households [24, 25, 26], but is complicated by the fact that sexual partnerships
Epidemiology of STD Transmission
11
linked to known index cases may be very difficult to trace.
Figure 3 presents graphs representing sexual networks between individuals and groups. A
complete description of these networks involves detailed knowledge of exposure (i.e. frequency
and timing of contacts), infectiousness and directions of transmission for all partner and group
interactions. Since observed data in actual studies are very limited, assumptions about mixing
patterns and exposure within and between groups must be made for data analyses to proceed.
Consider the network of contacts linked to a single index case shown in Figure 3A. If observed
data describing this network are limited to the identity of the index case and infection status of all
partners following a fixed time period, a very large number of infection patterns in addition to that
shown could equally well explain the observations, and must be accounted for in the analysis. If no
information on exposure within partnerships is available, all must be assumed to share an average
degree of exposure in the analysis. Figure 3B represents possible interactions between 5 groups.
Individuals within groups are assumed to be homogeneous with respect to key factors affecting
transmission such as geography, level of sexual activity and prevalence. In order to estimate the
transmission probability associated with mixing between individuals in any two groups, detailed
information on contact patterns and rates, and infection status on the individual level are needed.
Study designs for collecting such information are not well developed. A major difficulty in conducting such studies is that neither group composition nor interactions between groups is static
[24]. Because of these difficulties, most studies have focused on describing the mixing patterns in
very restricted networks without examining the relationship to transmission [27, 28, 29].
Analysis of STD transmission data
In this section we present a few simple models of STD transmission and discuss their application
to analysis and interpretation of data from epidemiological studies. The assumptions underlying
these models reflect the difficulties in design and interpretation outlined for the study designs pre-
Epidemiology of STD Transmission
12
sented in the previous section, and help clarify the integral role of both population and individual
level factors in determining transmission risk. Although these assumptions lead to models which
are extremely simplistic, they provide useful ”null” models against which more complex alternatives can be compared. In addition, since relationship between exposure and transmission is
clearly specified these models provide a better basis for investigations of the impact of additional
risk factors on risk than more conventional approaches such as logistic regression. More detail on
the derivation of the models and their properties is given in the Appendix.
Models for individual responses
Mathematical models relating STD transmission risk in susceptible individuals to individual measures of exposure might be more appropriately called partnership level models, since the sexual
partnership forms the smallest unit necessary for transmission to occur. These models represent
transmission risk conditional on the susceptible being in sexual contact with one or more infective
individuals and can be used as the basis for analysis of data arising in studies of individual risk
described above.
As shown in the previous section, given complete data on exposure patterns and infectiousness of all partners (see Figure 2), a fairly realistic (and correspondingly complex) transmission
model could be constructed. (See equation (A-1) in the Appendix.) Studies of HIV transmission
in monogamous partners of identified index cases with known time of infection may come the
closest to achieving this ideal [7, 8]. Since the period of potential infectiousness of the index case
is known, ”exposed” contacts occurring after this time can be estimated. However, without knowledge of factors controlling infectiousness and susceptibility, there is no basis for modeling how
these factors may modify transmission risk. Thus, in quantitative representations of transmission
risk, the per-contact transmission probability or infectivity is typically assumed to be constant, and
successive contacts assumed to occur independently. With these assumptions, the transmission
Epidemiology of STD Transmission
13
probability can be written
Pr(infected j k contacts ) = 1 , (1 , λ)k ;
(1)
where k gives the number of (exposed) contacts and the constant λ represents the infectivity. Note
that although no assumption about prevalence of infection is required - only partners of known
index cases are included in the sample - the infectivity is assumed to be constant across contacts
within a partnership, and across all partnerships in the population. Thus, the model specifies that
transmission risk depends on the infected partner only through the number of contacts. This implies that both susceptibility and infectiousness are also constant. More realistic generalizations
would allow the infectivity to vary within and across partnerships according to other measured
and/or unmeasured characteristics of both partners [30, 31]. Below, we discuss a regression framework for controlling for the effects of observed risk factors.
When multiple partners are a possibility, observed data about exposure and infectiousness are
typically less detailed and more assumptions become necessary to construct models. In the examples of HIV transmission among San Francisco gay men, data were limited to infection status
of the susceptible, and number of potentially infectious partners contacted in an interval of time
before enrollment. Thus, no detailed information was available about sexual behavior with particular partners. A model for the transmission probability in this case requires assumptions about
the partner selection process, in addition to the degree of infectiousness and exposure associated
with each partner. Specifically, it is assumed that the infectivity per-partner is a constant, that partners are selected independently (i.e. homogeneous mixing [20]), and that the degree of exposure
to each partner is the same. These assumptions lead to the following well known model for the
infection probability associated with exposure to m partners:
Pr(infected j m partners) = 1 , (1 , ρθ)m :
(2)
The per-partner transmission probability in this model is made up of two components: ρ represents
the probability of selecting an infectious partner (i.e. the prevalence of infectiousness in the pool of
Epidemiology of STD Transmission
14
potential partners), and θ is the per-partner infectivity (the probability of infection conditional on
contact with an infectious partner). Notice that this model depends on population level information
ρ and individual level data m. The infectivity θ (and often its dependence on additional risk
factors) is the object of estimation. As shown in the Appendix, the fact that θ and ρ appear
together in the above equation means that their respective effects on the transmission probability
cannot be separated unless one of them is known or can be separately estimated. The assumption
that θ and ρ are constants have strong implications about the nature of exposure, infectiousness
and susceptibility within and across partnerships. In fact, this model subsumes the assumptions
required in modeling transmission within a single partnership (above), and compounds them with
the additional assumptions about partner selection and prevalence.
In the case where no exposure information is available, the assumption that each susceptible
contacts on the average of ν partners leads from model (2) to the following expression for the
transmission probability:
Pr (infected) = 1 , exp(,νρθ)
:
(3)
The complete lack of knowledge about exposure and infectiousness is reflected in the fact that
parameters for these factors appear together. As shown in the Appendix, omitting exposure information from an analysis of transmission risk factors is equivalent to assuming that such a model
applies, and that all individuals are subject to average levels of exposure and infectiousness.
Models for group level data
When the focus of an investigation shifts from the relationship between exposure and transmission
risk for an individual to the mechanisms affecting spread of an infectious agent through a group,
collecting detailed exposure and outcome data is virtually impossible. The corresponding models
take on an added layer of complexity reflecting the need to realistically account for individual level
characteristics contributing to exposure and transmission.
Epidemiology of STD Transmission
15
STD transmission within a group of individuals defined by linked partnerships has similar
features to spread of an infectious disease through groups of individuals in close proximity, such
as members of a household. Thus, models for household epidemics should provide some insights
into how models of STD transmission in groups should be constructed. The earliest transmission
models were constructed to represent the spread of an infectious disease through a closed group
or household of susceptibles initiated by one or multiple infectives, and are called chain binomial
models [26, 32]. The basic data are summarized by generations of cases, often called epidemic
chains; the first generation consists of the initial index cases(s), and each successive generation
is made up of the new cases resulting from contact with the infectives of the previous generation.
Although very simple in nature, these models capture many important features of transmission,
and form the basis for a range of more complex models for the spread of a disease in an entire
population. The simplest such model states that the number of new infectives in generation j + 1
is related to the current number of susceptibles and the number of infectives of the jth generation
as follows:
h
I j+1 = S j 1 , (1 , p)I j
i
:
(i.e. the number of infectives in the j + 1st generation follows a binomial distribution with pa-
rameters S j and 1 , (1 , p)I j .) The expression in brackets is the Reed-Frost model [32], which
states that the probability of a susceptible becoming infected following simultaneous exposure to
all I j infectives of the previous generation is the same as the infection probability following successive independent exposures to the same number of infectives from different generations. The
constant p represents the probability of transmission given exposure to one infective. This model
typifies the assumptions about individual level behavior needed in quantitative interpretation of
transmission data.
There are several difficulties in generalizing this framework to accommodate STD transmission. First, the notion of a group of susceptibles who share exposure by being in close proximity to
one or more infectives needs to be modified to accommodate sexual partnerships. Thus, a ”house-
Epidemiology of STD Transmission
16
hold” needs to be replaced with a group of susceptibles and infectives linked by a network of
sexual partnerships (e.g. Figure 3A). In principle, this group could extend to an entire community.
As with household epidemics, the possibility of infection from outside the group must be allowed.
Second, the assumption implicit in Reed-Frost transmission probability that each susceptible is
equally exposed to each infective needs to be modified to account for the likelihood of forming a
partnership with an infective individual. One possibility is to substitute the transmission probability (3) in for the Reed-Frost probability (in brackets), and let ρ j denote the average prevalence in
the group at the jth generation or time period:
I j+1 = S j [1 , exp(,νρ j θ)]
:
(4)
Note that this model retains the homogeneous mixing assumption, and also assumes that susceptible individuals share the same (average) level of sexual activity, and that both this level and the
per-partner infectivity are fixed constants. It has been used as the basis for a discrete mathematical
model for simulating the course of an epidemic [33], but to our knowledge has not been applied
to epidemiological data. A number of mathematical models have been constructed to generalize
the above approach to account for non-random mixing networks of partnerships [33]. However,
the inherent complexity of these models coupled with the paucity of data on transmission in these
settings precludes applying them to results from epidemiological studies.
Regression models for additional risk factors
The models presented above represent the link between exposure and the probability of transmission in an extremely simple way. However, transmission risk on either individual or group levels
depends in a complex way on many behavioral and biological factors not represented. Epidemic
models can account for the additional complexities introduced by these factors by increasing the
number of parameters and adding assumptions about how they relate to risk. By contrast, inferences from epidemiological studies are confined to the risk factors which can actually be observed.
Epidemiology of STD Transmission
17
These are limited, and typically do not support fitting of complex models. A standard epidemiological approach to investigating the impact of such factors is to use regression models. For disease
outcomes, these models postulate a mechanism whereby risk factors impact the background or
”baseline” disease risk which applies in the absence of risk factors. Although the form of the
models is not thought to accurately represent true mechanisms, they are extremely useful in investigating the importance of risk factors. For example, in the logistic regression model risk factors
act multiplicatively on the background disease odds [34]. Because of the ease of interpretation of
regression coefficients and the wide availability of software, this is the method usually chosen for
analysis of data from individual level studies of STD transmission. However, regression models
can also be constructed from the transmission models presented above [30]. The resulting models
are examples of proportional hazards models [35], and offer a number of advantages over the
logistic approach. For example, the influence of an additional risk factor z on the transmission
probability (2) can be expressed in the proportional hazards form:
log[, log(1 , P)] = log[, log(1 , ρθ)] + log m + βz
:
(5)
This is an additive linear model, with intercept dependent on infectivity and prevalence, and two
independent variables: the logarithm of the number of partners, and z. (Note that the value of the
coefficient for the former is fixed at one and is not estimated.) If the prevalence ρ can be estimated
separately, the intercept provides an estimate of the infectivity θ. The regression coefficient β gives
the (constant) increase in the logarithm of the infection rates induced by each unit increase in the
level of the z; the effect does not depend on the level of exposure, prevalence or on the values of
other risk factors unless the assumptions underlying the model are incorrect. If this is the case, the
effects on estimated coefficients can be examined using techniques discussed below.
The logistic model for risk factors does not share the natural links with the transmission process discussed above for the proportional hazards model. In particular, it is not clear how exposure
should be introduced into the logistic model. Further, the interpretation of logistic regression co-
Epidemiology of STD Transmission
18
efficients is less clear in the transmission context. For example, if a logistic regression model is fit
to transmission data generated by model (2), the regression coefficients (and corresponding odds
ratios) will depend on the levels of prevalence, infectivity and exposure. This is demonstrated in
the Appendix, and is illustrated in Figure 4. The figure shows that the discrepancy between the
logistic regression coefficient and the corresponding coefficient from (5) increases with increasing
levels of infectivity and prevalence. (For very small values of the product ρθ , estimated regression
coefficients from (5) and the logistic regression model are indistinguishable [36].) Thus, if investigating risk factor effects is the primary goal, the logistic model should provide similar conclusions
to transmission models such as (6). However, due to the dependence on key epidemic parameters,
assessments of risk factor effects may be unstable and not generalizable outside of the epidemic
conditions which obtained when the study was conducted. This phenomenon has also been noted
by other authors in the context of epidemic modeling [2, 33]. In the next section we will discuss
a number of other factors which can lead to similar problems in interpreting the effects of risk
factors.
Regression methods for transmission data arising in group level studies and in partner networks
are a current research topic. Data from ecological studies is usually analyzed using standard linear regression or Poisson regression [16]. The binomial form of model (4) suggests that binary
regression of grouped response data may yield parameter estimates more interpretable in the transmission context. Estimation and inference could be based on methods similar to those developed
for household epidemics [26, 37]. Presumably if detailed data on transmission networks were
available, models such as (4) could be used as well. However, modifications to account for the
more complex dependencies between individuals in the network would be needed. Magder and
Brookmeyer [38] present a regression model for the special case of single partnerships where the
identity of the index case is unknown. The complexities which arise in their methods indicate the
difficulties likely to be encountered in developing methods for larger networks.
Epidemiology of STD Transmission
19
Bias in analyses of STD transmission data
In their recent book on infectious disease modeling, Anderson and May [20] state (in Chapter
11) that ”... most sexually transmitted diseases cannot be understood without acknowledging the
marked heterogeneity in the degrees of sexual activity within the overall population.” This statement could be amended to include another important source of heterogeneity: individual variations
in infectiousness and susceptibility. As discussed above, many important sources of heterogeneity
may not be accounted for in the design and analysis of epidemiological studies. Ignoring this
heterogeneity in analyses of individual or group level data can lead to biased estimates of risk
factor effects and incorrect inferences about the importance of risk factors in influencing transmission. Even if simple models are thought to provide an adequate description of the transmission
process, complete and accurate exposure data are extremely difficult to obtain. Measurement error in exposure data is another likely sources of bias. Although new study designs and analysis
techniques may improve this situation somewhat, it is likely that inferences made from epidemiological studies will continue to be plagued by these problems. A major challenge in analyzing
transmission data is to draw valid inferences about the impact of risk factors is spite of the wide
array of potential biases.
In any study of STD transmission, observed data on exposure and important factors influencing infectiousness and susceptibility are likely to be very limited or completely unavailable. For
example, in the partner studies of HIV transmission discussed earlier, exposure data are limited
to retrospective assessments of the frequency and duration of contact, and the time of infection of
the index case can rarely be determined. Further, biological measures of infectiousness (e.g. viral
titers) and susceptibility are usually unavailable for the period of exposure. The constant infectivity model (1) for transmission within a partnership is a useful tool in understanding the impact of
heterogeneity and measurement error on inferences about risk factors [30]. For a single additional
Epidemiology of STD Transmission
20
risk factor z, the regression form of this model
log [, log (1 , P)] = log[, log (1 , λ)] + log k + βz ;
(6)
assumes that the infectivity λ is constant between and within partnerships. If λ in fact varies
across partnerships reflecting variations in infectiousness and/or susceptibility, but the analysis is
incorrectly based on (6) (or on a logistic regression model with constant intercept), the estimated
regression coefficient for z will be attenuated from its true value, with the degree of attenuation
increasing with increasing levels of variability [30, 36]. Estimates of the infectivity and corresponding transmission probabilities will be biased as well. In extreme cases, harmful risk factors
will appear protective or vice-versa [36]. (This is discussed further in the Appendix.) This phenomenon is well known in demographic applications of survival analysis [39], and the statistical
literature contains many approaches to estimating effects of heterogeneity in this context [40, 41].
The effects of measurement error in exposure information will have similar effects on estimated
coefficients, but generally to a lesser degree [14, 30]. Results from several investigations of partner study data have indicated that heterogeneity and measurement error are likely to be important
factors [36, 42, 43]. This implies that the degree of risk associated with reported risk factors may
be understated, and that conclusions regarding the importance of controlling for exposure in such
studies should be re-evaluated [43, 44, 45, 46]. These results also mean that estimates of infectivity
and transmission probabilities from such studies are likely to be biased. An additional impact of
ignoring these sources of bias is in underestimation of variability of risk factor effects. This may
lead to incorrect inferences about the importance of risk factors. Although very few quantitative
results are available on this topic, related research in other fields [47] indicates that it is likely to
be important for STD transmission models.
A second example is provided by the studies of HIV transmission among homosexual men reporting multiple exposures with multiple partners discussed in the first section. Figure 5 displays
approximate bias in regression coefficients from fitting model (5) to data when the infectivity
Epidemiology of STD Transmission
21
parameter varies across partnerships according a known distribution. In the example presented,
the true value of the regression coefficient is assumed to be one, and exposure and prevalence
are fixed. Clearly, substantial attenuation in estimated coefficients can be expected as infectivity increases, and with increasing variation across partnerships. For example, if the distribution
characterizing variation in infectivity across partnerships had a squared coefficient of variation
of 8 (corresponding to the standard deviation being almost three times larger than the mean) and
mean 0.1, the estimated coefficient would be attenuated to 0.4. In addition to the sources of bias
encountered in monogamous partner studies, these studies are subject to biases from uncontrolled
variations in the prevalence, and from deviations from the assumption of homogeneous mixing.
The prevalence is assumed to be a constant ρ in model (5), which has been applied to data from
several studies [13, 48]. If prevalence varies in time, or due to selection of partners from pools of
differing prevalence, similar effects on regression parameters can be expected. To control for this
in an analysis of risk factors, model (5) could be modified to incorporate stratification by different
levels of prevalence and/or variation of prevalence with chronological time. A further possibility
would be to separately model the prevalence as a function of population level variables, using
”multilevel” regression methods [49, 50]. These methods allow parameters in individual level
models (e.g. ρ in (5)) to depend on a separate set of explanatory variables representing population
level characteristics. Extensive data and custom software are required for their application.
Bias in regression models arising from non-random selection of partners (inhomogeneous mixing) has not been addressed in the statistical literature. Koopman, et al [2] used epidemic models
to investigate bias in common measures of association induced by non-random mixing. Their results show that measures of association such as the odds ratio and relative risk can be very unstable
if mixing patterns are not properly controlled for. This bias is likely to be important in analyses of
transmission in individuals reporting multiple partners when partners are drawn differentially from
groups with varying levels of sexual activity and/or infectiousness. To take an extreme example,
consider a community divided into two non-mixing subgroups with identical levels of prevalence
Epidemiology of STD Transmission
22
and infectiousness, but different levels of sexual activity (exposure). Assume that individuals mix
homogeneously within the subgroups. If a random sample of individuals is drawn from the community and the data analyzed using model (5) under the assumption of homogeneous mixing, the
regression coefficient of an additional risk factor z will be attenuated from its true value, with the
degree of attenuation increasing with increasing levels of infectiousness and with increasing difference between exposure levels in the subgroups. This is illustrated in Figure 6. For example, if
the exposure level for second group is ten times higher than the first group and the infectivity is
10%, an estimated regression coefficient will be approximately 0.7 when the actual value is one.
This bias can be viewed as arising from the failure to control for community subgroup (i.e. a
omitted covariate). Since the effects of the dependence induced by non-random mixing should in
most circumstances be similar to effects observed for longitudinal and clustered binary outcomes,
statistical methods for incorporating this dependence could possibly be based on modifications of
existing techniques [51, 52].
The models for transmission in groups discussed in the last section rely heavily on assumptions about homogeneity of behavior and infectiousness among individuals. Analyses of data from
group level epidemiological studies must implicitly or explicitly make similar assumptions, and
are thus subject to many of the same biases discussed for individual level analyses. The addition of
possible heterogeneity due to factors operating at the group level makes the interpretation of such
data particularly difficult. Biases arising from ecological analyses of group level data have been
widely discussed in the epidemiological and statistical literature [15, 16, 53]. As mentioned in the
first section, perhaps the most serious problem with such analyses in that the necessary assumption of homogeneity of groups with respect to risk factors can lead to measures of effect which
do not agree with corresponding measures obtained from individual level data. Greenland and
Robbins [16] call this cross-level bias and provide a number of examples of how this might arise
in studies of chronic diseases. Often, the disagreement is related to the presence of unmeasured
confounding variables which vary in level across groups. As we have argued above, the potential
Epidemiology of STD Transmission
23
for the existence of such variables is very high in epidemiological studies of STDs. Thus, even
though ecological effects may themselves be of interest [1, 16], extreme caution must be used in
interpreting analyses of grouped data.
Regression models based on grouped outcomes suffer from additional sources of biases similar
to those discussed above for individual level outcomes. For example, unmeasured heterogeneity
among individuals within groups may cause group level outcomes to be more highly variable (i.e.
over-dispersed [34]) than the distribution assumed to hold in the analysis (e.g. normal or Poisson),
leading to underestimation of the variability of estimates and incorrect inferences about risk factor
effects. Becker [26] presents methods for introducing heterogeneity into models for infectious
disease transmission among groups. McCullagh and Nelder [34] provide simple methods for controlling overdispersion in regression without introducing models for the source of the underlying
heterogeneity.
Conclusions
Our goal in this article has been to show that epidemiological studies of STD transmission present
a number of unique challenges in design, analysis and interpretation, and that many of these arise
from the complex relationships between group and individual level factors. From the examples
presented, a clear understanding of these relationships is critical for valid inferences from both
individual and group level studies. In addition, studies on both levels are important for a thorough
understanding of factors controlling transmission.
Recent articles have raised several issues regarding current approaches to design and analysis
of epidemiological studies of STD transmission. Much of the discussion centers around the inadequacy of traditional approaches to risk assessment, the value of individual level studies, and the
need for new approaches which account for the complexities introduced by interactions between
individuals [1, 2, 3, 54]. Clearly, studies based on individuals and monogamous partnerships are
24
Epidemiology of STD Transmission
of limited value in identifying factors influencing disease spread on the group level. However,
because these studies exercise more control over exposure they are ideally suited for investigating
biological and behavioral factors for transmission risk, and for estimating parameters such as the
per-contact infectivity. Unfortunately, even when fairly accurate exposure information is available, estimates of risk factor effects and infectivity are subject to numerous sources of bias and
uncertainty [14, 36]. Studies of transmission in groups or networks offer unique opportunities to
investigate factors controlling disease spread which would be difficult or impossible to study on
an individual level. However, for reasons discussed above, measures of association derived from
group level data will in general not be representative of counterparts measured at the individual
level. Thus the ecological fallacy is as much of a concern for infectious as for chronic disease investigations. Further, measured associations in group level studies which do not properly control
for individual heterogeneity of exposure, infectiousness and susceptibility will be subject to substantial bias, even for factors without counterparts on the individual level. Thus, although group
level studies have great potential, they must be designed and interpreted with extreme caution.
Recent interest in studies of transmission networks promise to give new insights into STD spread,
but pose many unsolved methodological and analytic problems.
Research in epidemic models has traditionally played a key role in understanding of the spread
of infectious disease, and will continue to be important in guiding investigations of STDs. The
complexities of sexual behavior and new data on biological factors influencing transmission has
led to a large number of new approaches to constructing such models. However, the current level
of sophistication of mathematical and statistical models for STD transmission far outstrips the
availability of data necessary to validate their use. The need for reliable data on mixing patterns
and key parameters such as the infectivity is well recognized [55]. In addition, although parameters in many current models are based on information from epidemiological studies, the techniques
for ascertaining the impact of uncertainty in these estimates on model predictions are poorly developed. Recently developed statistical methods may assist in matching parameter values and
Epidemiology of STD Transmission
25
predictions with observed data from surveillance and epidemiological studies and deserve further
investigation [56].
Although new study designs, models and analysis techniques may improve our ability to draw
conclusions from epidemiological data, because of the inherent complexities in the transmission
process, studies will continue to be subject to a large number of sources of potential bias. For
this reason, we believe that the most important advances in STD epidemiology will follow from
increased understanding of biological and behavioral determinants of transmission. We agree with
the recommendations of Koopman et al [54] that studies of STD transmission should focus on
collecting detailed exposure and outcome information at the partnership level. In addition further
research into study designs and analysis techniques which provide better control of bias is needed.
Epidemiology of STD Transmission
26
References
[1]
Susser, M. The logic in ecological: I. the logic of analysis. Am J Public Health 1994;84:825829.
[2]
Koopman JS, Longini IM, Jacquez JA, Simon CP, Ostrow DG, Martin, WR and Woodcock
DM. Assessing risk factors for transmission of infection. Am J Epidemiol 1991;133:11991209.
[3]
Koopman JH, Longini, IM. The ecological effects of individual exposures and nonlinear
disease dynamics in populations. Am J Public Health 1994;84:836-842.
[4]
Halloran ME, Struchiner CJ. Study designs for dependent happenings. Epidemiology
1991;2:331-338.
[5]
Altmann M, Beeling CW, Willard K, Peterson D, Gatewood LC. Network analytic methods
for epidemiological risk assessment, Stat Med 1994;13:53-60.
[6]
Padian N, Marquis L, Francis DP, Anderson RE, Rutherford GW, O’Malley PM, Winkelstein
W. Male-to-female transmission of human immunodeficiency virus. JAMA 1987;258:788790.
[7]
Peterman TA, Stoneburner RL, Allen JR, Jaffe HW, Curran JW. Risk of HIV transmission
from heterosexual adults with transfusion-associated infections. JAMA 1988;259:55-63.
[8]
O’Brien TR, Busch MP, Donegan E, et al. Heterosexual transmission of human immunodeficiency virus type I from transfusion recipients to their sexual partners. J AIDS 1994;7:705710.
[9]
Kim MY, Lagakos SW. Estimating the infectivity of HIV from partner studies. Ann Epidemiol 1990;1:117-128.
Epidemiology of STD Transmission
27
[10] De Vincenzi, I. A longitudinal study of human immunodeficiency virus transmission by heterosexual partners. N Engl J Med 1994;331:341-346.
[11] Winkelstein W, Lyman DM, Padian N, et al. Sexual practices and risk of infection by the human immunodeficiency virus. The San Francisco Men’s Health Study. JAMA 1987;257:321325.
[12] Moss AR, Osmond DH, Bacchetti PR, Chermann J-C, Barre-Sinoussi F, Carlson J. Risk factors for AIDS and HIV seropositivity in homosexual men. Am J Epidemiol 1987;125:10351047.
[13] Hooper RR, Reynolds GH, Jones OG, et al. Cohort study of venereal disease. I: The risk of
gonorrhea transmission from infected women to men. Am J Epidemiol 1978;108:136-145.
[14] Jewell, NP, Shiboski SC. The design and analysis of partner studies of HIV transmission.
In: Ostrow DG, Kessler R, eds. Methodological Issues in AIDS Behavioral Research. New
York: Plenum Publishing Company, 1993:291-342.
[15] Morgenstern H. Uses of ecologic analysis in epidemiologic research. Am J Public Health
1982;72:1336-1344.
[16] Greenland S, Robins J. Ecologic studies - biases, misconceptions and counterexamples. Am
J Epidemiol 1994;139:747-760.
[17] Centers for Disease Control. HIV/AIDS Surveillance report. 1993; 5.
[18] Wasser S, Gwinn M, Fleming P. Urban-nonurban distribution of HIV infection in childbearing women in the United States. J AIDS 1993;6:1035-1042.
[19] Fife D, Mode C. AIDS prevalence by income group in Philadelphia. J AIDS 1992;5:11111115.
Epidemiology of STD Transmission
28
[20] Anderson RM, May RM. Infectious diseases of humans: dynamics and control. New York:
Oxford University Press, 1992.
[21] Hethcote HW, Yorke JA. Gonnorhea transmission dynamics and control. Lecture Notes in
Biomathematics 1984;56:1-105.
[22] Rothenberg RB. The geography of gonorrhea: empirical demonstration of core group transmission, Am J Epidemiol 1983;117:688-694.
[23] Holland P, Leinhardt S. A new method for detecting structure in sociometric data. Am J
Sociol Med 1970;76:492-513.
[24] Morris M. Data driven network models for the spread of infectious disease. In: Mollison D
ed. Epidemic models: their structure and relation to data. London: Cambridge University
Press, 1994.
[25] Heasman MA, Read DD. Theory and observation in family epidemics of the common cold,
Brit J Prev Soc Med 1961;15:12-16.
[26] Becker, NG. Analysis of infectious disease data. New York: Chapman and Hall, 1989.
[27] Haraldsdottir S, Gupta S, Anderson RM. Preliminary studies of sexual networks in a male
homosexual community in Iceland. J AIDS 1992;5:374-381.
[28] Schmitz S-FH, Castillo-Chavez C. Parameter estimation in nonclosed social networks related
to dynamics of sexually transmitted diseases. In: Kaplan E, Brandeau, M eds. Modeling the
AIDS Epidemic. New York: Raven Press, 1994:533-559.
[29] Klovdahl AS, Potterat JJ, Woodhouse DE, et al. Social networks and infectious disease: the
Colorado Springs Study. Soc Sci Med 1994;1:79-88.
Epidemiology of STD Transmission
29
[30] Jewell NP, Shiboski SC. Statistical analysis of HIV infectivity based on partner studies. Biometrics 1990;46:1133-1150.
[31] Shiboski SC, Jewell NP. Statistical analysis of the time dependence of HIV infectivity based
on partner study data. J Am Stat Assoc 1992;87:360-372.
[32] Dietz K, Schenzle D. Mathematical Models for Infectious Disease Statistics . In: Fienberg S ed. Celebration of Statistics: the ISI centenary volume. New York: Springer Verlag
1985;167-204.
[33] Brookmeyer R and Gail MH. AIDS epidemiology: a quantitative approach. New York: Oxford University Press, 1993.
[34] McCullagh P, Nelder, JA. Generalized linear models, 2nd ed. New York: Chapman and Hall,
1989.
[35] Kalbfleisch JD, Prentice RL. The statistical analysis of failure time data. New York: John
Wiley & Sons, 1980.
[36] Shiboski S. Statistical interpretation of data from partner studies of heterosexual HIV transmission. In: Kaplan E, Brandeau, M eds. Modeling the AIDS Epidemic. New York: Raven
Press, 1994:387-422.
[37] Longini IM, Koopman JS, Haber M, Costonis GA. Statistical inference for infectious diseases: risk-specific household and community transmission parameters. Am J Epidemiol
1988;128:845-859.
[38] Magder L, Brookmeyer R. Analysis of infectious disease data from partner studies with
unknown source of infection. Biometrics 1993;49, 1110-1116.
Epidemiology of STD Transmission
30
[39] Manton KG, Singer BM, Woodbury MA. Some issues in the quantitative characterization of
heterogeneous populations. In: Trussel J, Tilton J eds. Demographic Applications of Event
History Analysis. New York: Oxford University Press, 1991.
[40] Aalen OO. Heterogeneity in survival analysis, Stat Med 1988;7:1121-1137.
[41] Omori Y, Johnson RA. The influence of random effects on the unconditional hazard rate and
survival functions. Biometrika 1993;80:910-914.
[42] Wiley JA, Herschkorn SJ, Padian NS. Heterogeneity in the probability of HIV transmission
per sexual contact: the case of male-to-female transmission in penile-vaginal intercourse,
Stat Med 1989;8:93-102.
[43] Kaplan EH. Modeling HIV infectivity: must sex acts be counted? J AIDS 1990;3:55-61.
[44] Anderson RM, May RM. Epidemiological parameters of HIV transmission. Nature
1988;333:514-519.
[45] Blower SM. Re: ”A method for estimating HIV transmission rates among female sex partners
of male intravenous drug users” (letter to the editor). Am J Epidemiol 1993;136:1532-153.
[46] Padian N, Shiboski S, Jewell NP. The association between number of exposures and
rates of heterosexual transmission of Human Immunodeficiency Virus (HIV). J Infect Dis
1990;161:883-887.
[47] Gail MH, Wieand S, Piantadosi S. Biased estimates of treatment effect in randomized experiments with nonlinear regressions and omitted covariates. Biometrika 1984;71:431-444.
[48] Grant RM, Wiley JA, Winkelstein W. Infectivity of the Human Immunodeficiency Virus:
estimates from a prospective study of homosexual men. J Infect Dis 1987;156:189-193.
Epidemiology of STD Transmission
31
[49] Wong GY, Mason, WM. The hierarchical logistic regression model for multilevel analysis. J
Am Stat Assoc 1985;80:513-524.
[50] Von Korf M, Koepsell T, Curry S, and Diehr P. Multi-level analysis in epidemiologic research
on health behaviors and outcomes. Am J Epidemiol 1992;135:1077-1082.
[51] Liang KY, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika
1986;73:13-22.
[52] Neuhaus, J. Statistical methods for longitudinal and clustered designs with binary outcomes.
Stat Meth Med Res 1992;1:249-273.
[53] Kronmal R. Spurious correlation and the fallacy of the ratio standar revisited. J Roy Stat Soc
A 1993;156:379-392.
[54] Koopman J, Simon CP, Jaquez JA. Assessing effects of vaccines on contagiousness and risk
factors for transmission. In Modeling the AIDS Epidemic (eds. Kaplan E, Brandeau, M),
New York: Raven Press, 1994:439-460.
[55] Mollison D, Isham V, Grenfell B. Epidemics: models and data (with discussion). J Roy Stat
Soc A 1994;157:115-149.
[56] Givens, G and Hughes JP. A method for determining uncertainty of predictions from deterministic epidemic models. In: Proceedings of the 1995 SCS Western Multiconference on
Health Science, Physiology and Pharmacological Simulation Studies, Las Vegas, 1995.
[57] Allard, R. A family of mathematical models to describe the risk of infection by a sexually
transmitted agent. Epidemiology 1990;1:30-33.
[58] Longini IM, Clark WS, Haber M, Horsburgh R, The stages of HIV infection: waiting times
and infection transmission probabilities”, In: C Castillo-Chavez, ed. Mathematical and Sta-
Epidemiology of STD Transmission
32
tistical Approaches to AIDS Epidemiology. New York: Springer-Verlag (Lecture Notes in
Biomathematics); 1989:83:111-137.
[59] Samuel MC, Mohr MS, Speed TP, Winkelstein W. Infectivity of HIV by anal and oral intercourse among homosexual men. In: Kaplan E, Brandeau, M eds. Modeling the AIDS
Epidemic. New York: Raven Press, 1994.
[60] Neuhaus J, Jewell, NP. A geometric approach to assess bias due to omitted covariates in
generalized linear models. Biometrika 1993;80:807-815.
Epidemiology of STD Transmission
33
Appendix
This appendix provides a brief introduction to several simple models of STD transmission within
partnerships, along with statistical considerations which arise in their estimation and interpretation.
Transmission models
For a susceptible individual with a number of potentially infectious partners, the basic data on
transmission consist of a binary indicator Y of infection status of the susceptible following a known
exposure to each partner. Complete exposure data consist of the total number of contacts with
each partner, the time each contact occurred and the level of susceptibility of the susceptible and
infectiousness of the partner at the time of each contact. For simplicity, assume that partners are
chosen independently and without regard to their infectiousness, and that partners are selected
and successive contacts made independently (i.e. homogeneous mixing). In addition, assume that
partners are either infectious or non-infectious throughout the period where contacts are made (i.e.
no new infections take place among partners). For a susceptible reporting contact with m partners,
let (X1 ; : : : ; Xm ) denote binary infection status indicators of the partners, (K1 ; : : : ; Km ) be numbers
of contacts with each, and X j λi j denote the (random) infection probability associated with the
ith contact with the jth partner. (Thus X j λi j
0 when X j = 0.)
Under these assumptions, if the
exposure history of a given susceptible were completely observed, the conditional probability of a
susceptible remaining uninfected can be written
Pr (Y
j
= 0 (x j ; k j ) ; i = 1; : : : ;k j ;
m
kj
j = 1; : : : ;m ) = ∏ ∏ (1 , x j λi j ) :
j=1 i=1
(A-1)
(Here, lower case letters correspond to observed values of random variables.) Since partnerspecific infection probabilities associated with each contact are virtually impossible to observe,
it is usually assumed that all infectives are equally infectious as long as they remain infected and
Epidemiology of STD Transmission
34
that susceptibility is constant throughout the exposure period. These assumptions imply that the
infection probability associated with each contact is a constant λ. With this modification, the
above expression simplifies to:
Pr(Y
=0
j x1
m
xm ; k1 ; : : : ;km ) = ∏ (1 , x j λ)k j
;::: ;
j =1
:
In practice, the infection status of partners is often unknown; the appropriate model for this situation is obtained by taking expectations of the above expression with respect to the joint distribution
of the indicators x j ( j = 1; : : : ; m). Assume that the binary indicator X j of prevalence status for the
jth parter follows a Bernoulli distribution with parameter ρ j . Then, with the independence assumptions made above,
Pr(Y
=0
j k1
m
;::: ;
n
h
km ) = ∏ 1 , ρ j 1 , (1 , λ)k j
io
:
j=1
(A-2)
As it stands, the above model still retains partnership-level exposure characteristics in the form of
the prevalence probabilities (ρ j ) and contact counts. Typically these are assumed to be identical,
i.e. ρ j
ρ.
This modification of (A-2) is a special case of the model described by Allard [57],
who provides further insight into its interpretation and application. In the case where numbers of
contacts are also not observed, a further averaging over the distribution of contacts (assumed to be
identically distributed Poisson random variables with mean µ) gives:
Pr(Y
=0
j M = m ) = f1 , ρ[1 , exp(,µλ)]gm
:
(Note that other distributions for contact counts such as the negative binomial distribution can be
used and yield similar results.) Setting 1 , exp(,µλ) = θ gives
Pr(Y
=0
j M = m ) = (1 , ρθ)m
;
(A-3)
where θ represents the per-partner probability of infection given an average degree of exposure and
constant per contact infectivity λ. This model has been applied to estimation of the per-partner
Epidemiology of STD Transmission
35
level infectivity of HIV by Grant et al [48]. Finally, in the case where even the number of partners
is unknown, a final averaging over the distribution of the numbers of partners (assumed to well
approximated by the Poisson distribution with parameter ν) gives ([33, 32]):
νm exp(,ν)
m
(1 , ρθ)
m!
m=0
∞
Pr(Y
= 0) =
∑
,νρθ)
= exp(
:
(A-4)
This model represents the situation where no partnership-level exposure information is available.
As before, the Poisson assumption is chosen for convenience, and could be replaced by a number
of other discrete distributions. In particular, since the development above only makes sense for
m > 1, a ”zero-truncated” discrete distribution may be more appropriate in situations where the
Poisson approximation above does not hold.
Models for single susceptible/infective pairs are obtained by setting m = 1 in the above equations. For example, for a single known infective, ρ = 1 in (A-3), which leads to the familiar
model:
Pr(Y
=0
j k) = (1 , λ)k
;
(A-5)
the complementary probability to equation (1). Under these same conditions (A-4) reduces to
exp(,µλ).
Note that all of the above models above can be generalized to relax some of the simplifying
assumptions regarding infectivity and exposure. For example, the constant per-contact infectivity assumption can be modified to accommodate several sources of variation: in time following
infection of the infective partner [31]; with stage of disease of the infective [58]; and across partnerships ([42], [30]). In addition, intermittent exposure patterns and multiple types of infectious
contacts can also be modeled [59].
Regression models
The general form of models (A-3) through (A-5) above suggests the proportional hazards model
familiar from the analysis of survival data, with contacts forming the ”time” scale and the infec-
Epidemiology of STD Transmission
36
tivity taking the place of the hazard function [31]. For example, the proportional hazards form of
model (A-3) can be written
log
,
, log S
,
mi zi ; β;ρ;θ
;
, log (1 , ρθ)] + log mi + βzi
= log [
;
(A-6)
where S mi zi ; β;ρ;θ represents the probability of escaping infection on the left-hand side of (A;
3). The logarithm of the number of partners mi is called an offset term, and has unit regression
coefficient. The parameters ρ and θ cannot be separately estimated. However, given an estimate
of the average prevalence ρ of infection among potential infectives, an estimate of the per-partner
infectivity is possible [48]. For cross-sectional transmission data, this is a standard generalized
linear model [34] for the binary outcome Y , and can be fitted using a wide variety of existing
software packages. As in conventional proportional hazards models, the regression coefficient β
can be interpreted as the logarithm of the ”relative hazard” for infection. The intercept term is
the complementary log-log transformation of the baseline escape probability, and for the models
above can be separated into exposure information and transmission parameters. When exposure
information is unavailable, analyses can be based on (A-4). In this case, the infectivity cannot be
estimated separately. Similarly, omitting exposure information from an analysis is equivalent to
assuming that all partnerships share the same (average) level of exposure.
To examine the effects on regression coefficients of ignoring heterogeneity of infectivity, assume that data are generated under the random infectivity model
exp (,ρθW m)e
βz
;
where W is an absolutely continuous positive random variable with density g(w) and unit mean.
Here we have used the fact that the escape probability S in (A-6) is well approximated by exp (,ρθm)e
βz
([36],[33]). Because the factors controlling the heterogeneity are not observed, the data then follow the model
S(m z; β;ρ;θ ) =
;
Z
∞
0
exp (,ρθwm)e g(w) dw :
βz
Epidemiology of STD Transmission
37
If model (A-6) is fit rather than the above model (which still depends on properties of the distribution controlling the heterogeneity), standard techniques ([36]) can be used to compute the
expected bias in the estimated regression coefficient. If the proportional hazards form in (A-6) is
assumed to hold, the estimated coefficient will be given by β = ,d log [, log S(m z; β;ρ;θ )] dz.
;
The estimated coefficient will be attenuated from the true value β, with the degree of attenuation
increasing with increasing infectivity and exposure. The attenuation can be explicitly calculated if
the mixing variable W is assumed to follow a gamma distribution (see Figure 5). In cases where
factors controlling the heterogeneity depend on variables included in the fitted model (e.g. z or
m), the resulting bias may cause protective risk factors to appear harmful, or vice-versa [36]. The
bias in the estimated hazard function (infectivity) and transmission probability can be investigated
similarly [40].
To investigate the interpretation of logistic regression in the transmission context, assume that
the following logistic regression model is fit to cross-sectional transmission data generated from
model (A-6):
1 , S(z) =
exp(α + β z)
1 + exp(α + β z)
;
where α and β are the regression coefficients from the fitted logistic model, and S refers to the
escape probability in (A-6). The following results hold whether or not the number of partners m is
included in the model: The relationship between the logistic coefficient β and the coefficient in
the true model is given by the expression
1 , S(z + 1)
β = log
S(z + 1)
, log 1 ,S(Sz)(z)
:
Using methods described in Neuhaus and Jewell [60], a first order Taylor series approximation
about β = 0 gives the following relationship between the two coefficients:
β
β
,m log(1 , ρθ) 1 , (1 , ρθ)m
:
The term in brackets approaches one in the limit as θ approaches zero (see Figure 4). Thus, the
estimated coefficient for the logistic model will be systematically larger than the coefficient β,
38
Epidemiology of STD Transmission
with the difference increasing with increasing values of the product ρθ and with increasing values
of m. (A similar calculation can be made for the constant infectivity model (A-5).)