Revision Jenny 20100310
When is a sampled network a good enough descriptor for epidemic
predictions?
Jenny Lennartsson1, Annie Jonsson1, Nina Håkansson1 and Uno Wennergren2,*
1 Systems Biology Research Centre, Skövde University, Box 408, 541 28, Skövde, Sweden
2 IFM Theory and Modeling, Linköping University, 581 83 Linköping, Sweden
*Corresponding author
ABSTRACT
The lack of complete data sets can be a limitation when using network analysis. In this paper, we
analyse how link density affects the properties of a network. The methodology is general, yet it is
applied here to one property, the spread of disease. We used networks with weighted links to run scenarios
with distance dependent probabilities of transmission, and compared them with
scenarios with randomly drawn probabilities of transmission, i.e. no distance dependence. In these
scenarios, we also included link sampling procedures equivalent to the two transmission scenarios,
one that samples by distance dependence and one at random. We could then study the effect of link
density in sampled networks on spread of disease under different link sampling procedures and
transmission scenarios. We conclude that under the assumption of distance dependence for both
link sampling procedure and disease transmission, predictions about the extent of an epidemic can
be drawn from a network even with low link density, although the required density is still higher than in most empirical
studies. In reality, neither sampling procedures nor disease transmission perfectly fit a distance
dependency. Our results show that this demands a still higher link density in sampled
networks to achieve reasonable predictions.
Keywords: network, missing links, link density, infectious diseases, disease transmission
1. INTRODUCTION
The use of, and interest in, network analysis has grown in many scientific areas during
the last decade, for example in biology, epidemiology, economics and social science. A network
consists of interacting units, here denoted nodes, and these units connect to each other through
relations we term links. Nodes could for example be individual animals or persons, animal holdings,
species habitats, or schools, and links could be visits, animal transports or links between web pages.
The pattern of links between the nodes gives rise to networks with different contact structures.
These structures depend on both the number of links and how they are organized. Here we classify
networks into three categories: first, the complete network (Wasserman & Faust 1994) (figure 1a),
where all theoretically possible links are included; second, the real world network (figure 1b), where all
realizations of links during a specified time period are included; and third, the sampled network
(figure 1c), where all the measured links during the sample period are included. In addition to
estimating the link structure of networks, one can measure probabilities of occurrence or disease
transmission per unit time for the individual links. The network is then called a weighted network
(Barrat et al. 2004). In contrast to classical epidemiological models such as SI/SIR/SEIR models,
network models relax the assumption of homogeneous mixing (mass-action type of assumptions),
since not all nodes are linked to all other nodes, or the links are weighted, for example depending on
distance. Network analysis, on the other hand, requires huge amounts of data, which today's
computational power makes it possible to handle.
The sampled network can be estimated through sample surveys, literature studies, contact tracing,
or by databases such as national databases for animal movements. The estimation is cumbersome
and one may expect that estimated networks most probably will lack some links and even some of
the nodes (Clauset et al. 2008). The real world network is the network one would like to find but one
has to represent it with the sampled network. Hence, there is a need to evaluate the effect of
missing links to assess or possibly reduce errors when networks are applied. The focus in this study is
on how the number of links in sampled networks affects predictions of epidemic size. We simulate
spread of disease in networks with different link densities and different scenarios that mimic
sampling procedures.
A real world weighted contact network consists of all contacts with probabilities during a specific
time period. Another time period is yet another event with its specific contacts and may very well
result in a different network with another set of links. The question then arises whether the
properties of the two real networks will differ. Furthermore, will the properties of the
sampled networks for these two events differ? Can a property, such as spread of disease, of the first
sampled network be valid as an approximation of the spread of disease during the second time
period? It is obvious that too short a period will result in a bad approximation of the two events. A
measured period with a very long time frame, on the other hand, will result in a nearly complete
network and will be an almost perfect approximation. Beyond reality is the infinite sampling
procedure that results in a complete network where all links exist and have their specified probabilities.
Somewhere in between the short time period and the excessive sampling, there is a sampled
network with enough links and sufficiently estimated probabilities to generate an adequate
approximation.
In the present study, we will focus on networks with weighted links since one may expect higher
probability of contact for some links compared to others. We will run scenarios with the assumptions
of distance dependent probabilities and compare with scenarios with randomly drawn probabilities.
The distance dependence is tested for disease transmission and link sampling separately and in
combination. In a worst-case scenario, there is a mismatch, with distance dependent transmission
probabilities yet random link sampling procedure. We thus combine our study on the necessary
amount of measured links with how such an amount depends on the mismatch between real world
network and sampling procedure. We will refer to networks in veterinary medicine, with spread of
infectious diseases among animal holdings, since network analysis is an increasingly applied tool in
this area (Barthélemy et al. 2005; Ortiz-Pelaez et al. 2006).
The potential use of network analysis and modeling in veterinary medicine is to predict spread of
disease and epidemic size and to examine effects of different intervention methods such as
vaccination, stand still and stamping out. For example, Corner et al. (2003) studied the transmission
of Mycobacterium bovis among a network of wild brushtail possums and the social contacts between
them. In another study, Kiss et al. (2006) analysed networks of sheep movements within Great
Britain. They showed that during an epidemic it is most efficient to concentrate control interventions
to highly connected nodes. Despite the increased use of network analysis in epidemiology, there are
shortcomings in the analysis of missing links as well as how to represent a structure given a single
sample. Collected network data is often incomplete (Christley et al. 2005; Ortiz-Pelaez et al. 2006;
Clauset et al. 2008; Heath et al. 2008; Eames et al. 2009; Guimerà & Sales-Pardo 2009). It could for
example be missing animal movements or unknown locations of herds in databases. Accordingly
Perkins et al. (2009) demonstrated that network structures are only approximations of contacts and
that it is almost impossible to identify all contacts when collecting data.
Depending on the structure of the network, properties such as spread of disease can vary (Newman et al.
2001; Keeling 2005; Shirley & Rushton 2005; Kiss et al. 2006). Since disease transmission depends on
the network's structure, results based on networks with missing links may be misleading. In practice,
this means that there will be problems with missing data resulting in lost links in the representation
of a network. These lost links are the result of errors during the sampling period or a consequence of
the finite length of the sampling period. Guimerà and Sales-Pardo (2009) introduced a method that uses
a single measure of a network, a sampled network, to generate a more correct representation, i.e.
an approximation of the real world network. Their method focuses on networks that are reduced because of
errors during sampling. By measuring and classifying the structure of the sampled network, they
could identify either missing or spurious links. In our study, the focus is more general and concerns the
relation between link density and estimates of properties such as spread of disease and specific network
measures. Yet our results may be combined with their study by indicating when to expect missing links.
During a survey to achieve a sampled network, it is important to consider the time window of the
sampling period. For example, Kao et al. (2007) studied the relation between UK livestock movement
network and disease dynamics over different time-scales. They simulated transmission of two
diseases, foot-and-mouth disease and scrapie, which have very different time-scales regarding
incubation time as well as infectious period. They concluded that for network analysis to be a
valuable tool in epidemiological modeling, it is important to consider the time-scale as well as the
potentially infectious contacts. In another study, Robinson et al. (2007) investigated animal
movement networks evolving over time in Great Britain and their findings point out the importance
of temporal scale. With increased time-period, the networks became more and more connected and
in that way fueled the disease transmission. They also found a seasonal pattern with a peak in spring
and August. Thus depending on the question to be examined or when comparing different networks
it is important to choose the appropriate temporal scale (Vernon & Keeling 2009). Our study will
relate this time window problem to the difficulty of achieving enough links during a chosen period.
2. METHODS
2.1 The model
Figure 2 illustrates the network generation and simulation process. The first step is the placement of
animal holdings in the landscape and the second is the link forming processes. Link forming is here
related to empirical sampling. The third step is the disease transmission with simulation runs. The
network generation algorithm, simulations and calculations were implemented and run in MATLAB
(version R2009a).
2.1.1 Landscape of animal holdings
The number of animal holdings was arbitrarily set to 500 and these were randomly placed into a
landscape of size 34 x 34. See figure 2. The holding density was chosen according to farm density in
southern Sweden. Each animal holding was considered as a node, which implies that each animal was
not individually modeled.
2.1.2 Link sampling process and link density
Animal holdings were connected by two different processes, either by distance dependence (eq. 1
and Håkansson et al. 2010; Lindström et al. 2008) between the holdings, Dl, or completely at random,
Rl.
$$P(l_{ij}) = K e^{-\left(\frac{d_{ij}}{a}\right)^{b}} \qquad (1)$$
The probability that a link between nodes i and j is formed is P(lij), and dij is the Euclidean distance
between holdings i and j. Parameters a and b are set by the kurtosis, κ, and the standard
deviation, σ (see Lindström et al. 2008). The constant K normalizes the distribution such that the
probabilities of all possible links sum to one. For distance dependent link sampling, Dl, we used a
kurtosis value of 10/3, meaning an exponential distribution, and a standard deviation of one. The
links were randomly sampled from this probability distribution (eq. 1), one at a time, until the desired
link density was achieved. Since stochasticity was included in this method, links between holdings
that were more distant from each other could also be sampled, even if a low link density was used. For
random distribution of links, Rl, the links were drawn one at a time, with the same probability for all
links to be sampled. To avoid edge effects, periodic boundaries were used (Lindström et al. 2008)
along the edges of the 34 x 34 landscape.
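The link sampling step can be sketched in Python as follows. This is a minimal illustration, not the authors' MATLAB implementation. The function names, the scale value `a=1.0` and the small example landscape are assumptions made for the sketch; how a and b follow from kurtosis and standard deviation is given in Lindström et al. (2008) and is not reproduced here. Drawing with replacement and discarding repeats is used as a stand-in for drawing links one at a time, which is practical only at low link densities.

```python
import math
import random
from itertools import accumulate, combinations


def torus_distance(p, q, size):
    """Euclidean distance between two points with periodic boundaries."""
    dx = abs(p[0] - q[0])
    dy = abs(p[1] - q[1])
    return math.hypot(min(dx, size - dx), min(dy, size - dy))


def sample_links(coords, density, size, a=None, b=1.0, rng=random):
    """Draw undirected links one at a time until the target density is reached.

    a=None gives random sampling (Rl). Otherwise each candidate link gets the
    weight exp(-(d/a)**b) as in eq. 1 (Dl). The constant K is implicit because
    the weights are normalised when drawing.
    """
    pairs = list(combinations(range(len(coords)), 2))
    if a is None:
        weights = [1.0] * len(pairs)
    else:
        weights = [
            math.exp(-((torus_distance(coords[i], coords[j], size) / a) ** b))
            for i, j in pairs
        ]
    cum_weights = list(accumulate(weights))
    target = round(density * len(pairs))
    links = set()
    while len(links) < target:
        # A repeated draw leaves the set unchanged, so this is equivalent to
        # sampling without replacement from the renormalised distribution.
        links.add(rng.choices(pairs, cum_weights=cum_weights, k=1)[0])
    return links


if __name__ == "__main__":
    rng = random.Random(0)
    size = 34.0
    holdings = [(rng.uniform(0, size), rng.uniform(0, size)) for _ in range(100)]
    dl = sample_links(holdings, 0.01, size, a=1.0, b=1.0, rng=rng)  # hypothetical a
    rl = sample_links(holdings, 0.01, size, rng=rng)
    print(len(dl), len(rl))
```

Both calls return the same number of links; only the distribution of link lengths differs between the two sampling processes.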
Link density is the number of actual connections in the network as a proportion of all theoretically possible links,
Cn, in the network (Wasserman & Faust 1994, eq. 2).
$$C_n = \frac{n(n-1)}{2} \qquad (2)$$
Here n is the number of animal holdings in the network. In the simulations we varied link density
between 0.001 and 1.0. A link density of 1.0 means a complete network (figure 1a) where all
theoretical connections are included. Because the link density of the networks was set when
generating the networks, the mean link degree was also given from the start (table 1).
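As a worked illustration of eq. 2, the short sketch below converts link densities into link counts for n = 500 holdings. The function names are hypothetical. The last printed column is the number of links per holding (links divided by n); this is the quantity that reproduces the value of almost 7.5 quoted in the discussion for a density of 0.03. It is only an observation about the arithmetic, and the conventional mean degree, 2 × links / n, would be twice as large.

```python
def possible_links(n):
    """Number of theoretically possible undirected links, C_n = n(n-1)/2."""
    return n * (n - 1) // 2


def links_for_density(n, density):
    """Number of links implied by a link density for n holdings."""
    return density * possible_links(n)


if __name__ == "__main__":
    n = 500
    for density in (0.001, 0.01, 0.03, 0.04, 1.0):
        links = links_for_density(n, density)
        # columns: link density, number of links, links per holding
        print(density, round(links), round(links / n, 2))
```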
2.1.3 Disease transmission
As for the link sampling process we assume two different processes for the transmission probabilities
of a disease, one distance dependent, Dt, and one completely random, Rt. The two processes could
represent two diseases with different behaviour. Transmission rates were determined with the same
processes as for the link sampling, section 2.1.2. Hence, Dt was set by equation 1 with the same
parameter values as Dl, and the transmission probability of Rt was arbitrarily set to 0.01.
2.1.4 Model scenarios
The combination of two link sampling processes and two disease transmission processes leads to
four different scenarios: DlDt, DlRt, RlDt and RlRt (figure 2). The RlRt scenario is an example of a mass
action mixing model (Keeling 2005), which assumes that all links have the same probability of
transmitting the disease, combined with a matching random link sampling procedure. The match is in the
sense of process, not necessarily in the occurrence of events, i.e. they are two different realizations
of the randomization from the same process. The DlDt scenario, with linking and transmission
probabilities for each link, is the distance dependent scenario where there is a match between the
process that gives the probability of measuring a link and the probability of transmission. The remaining two scenarios, DlRt
and RlDt, are combinations where the link sampling procedure does not match the actual process that
generates the probability of transmission. For example, RlDt is a scenario where transmission is
distance dependent yet the link sampling procedure is random and hence is expected to be an
ineffective procedure. Because the link sampling is random and independent of distance, some of the first
connections detected will have low probabilities while some of the connections with high probability
of transmission will not be detected within the sampling time frame.
2.1.5 Simulation model
To simulate disease transmission in the sampled networks, we used a general and very simple
epidemiological model, where the holdings could be in one of the two phases: susceptible (S) or
infectious (I) (eq. 3).
$$\frac{dS}{dt} = -\lambda S I, \qquad \frac{dI}{dt} = \lambda S I \qquad (3)$$
Parameter λ is the probability of disease transmission from an infectious holding through a link to a
susceptible one. The variables S(t) and I(t) are the numbers of holdings in the susceptible and
infectious phases, respectively, at time t. We did not incorporate incubation time, and hence animal holdings in contact
with an infected holding could infect other holdings as early as the next time step. Since no recovery
phase was included in the model either, an infected holding could never return to the
susceptible phase. That is, an infected holding remained in the infectious phase during the
remaining simulation time. Undirected links were used and diseases could thus be transmitted in
both directions along the links. Disease transmission could only occur between animal holdings that
were connected by a link. Note that the probability of a link in the sampled network is according to Rl
or Dl while the probability of transmission is according to Rt or Dt.
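One way to read this model on a sampled network is as a stochastic, discrete-time SI process. The sketch below assumes a synchronous update in which every link between an infectious and a susceptible holding transmits independently with its own per-step probability. The function name `simulate_si`, the dictionary `trans_prob` and the update rule are assumptions of the sketch; the paper does not spell out these details.

```python
import random


def simulate_si(links, trans_prob, steps, seed_node, rng=random):
    """Discrete-time SI spread on an undirected network.

    links      -- iterable of (i, j) node pairs
    trans_prob -- dict mapping each (i, j) in links to a per-step probability
    Returns the number of infected holdings at each time step, starting with
    the single seed holding at step 0.
    """
    links = list(links)
    infected = {seed_node}
    counts = [len(infected)]
    for _ in range(steps):
        newly_infected = set()
        for i, j in links:
            # Transmission is possible only when exactly one end is infected.
            if (i in infected) != (j in infected):
                if rng.random() < trans_prob[(i, j)]:
                    newly_infected.add(j if i in infected else i)
        # No recovery phase: infected holdings stay infected.
        infected |= newly_infected
        counts.append(len(infected))
    return counts


if __name__ == "__main__":
    path = [(0, 1), (1, 2), (2, 3)]
    rt = {link: 0.01 for link in path}  # Rt: same probability for every link
    print(simulate_si(path, rt, steps=300, seed_node=0, rng=random.Random(0)))
```

Because the infected set is updated only after all links have been visited, a holding infected in one step can first infect others in the next step, which matches the absence of an incubation time.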
2.2 Simulation runs
For each link density (table 1), simulations were run separately for all four scenarios (figure 2). In total, 100
different networks were generated per link sampling process, Dl and Rl, and link density (figure 2).
For each link density and each of the two link sampling procedures, 10 replicates of randomly
distributed holdings were created, and for each of these landscapes of holdings 10 replicate
networks were made by using one of the two sampling processes (see section 2.1.2). For each of
these sampled networks, 10 simulations were performed per transmission process, Dt and Rt, by
initiating the spread from a randomly picked animal holding. In total, 1000 simulations were
run per scenario and link density. Simulation time was set to 300 time steps and the number of infected
animal holdings was calculated at each time step.
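The nesting of replicates described above can be enumerated with a few lines of Python. This is only a bookkeeping sketch for one scenario and one link density, and `replicate_plan` is a hypothetical helper name.

```python
from itertools import product


def replicate_plan(n_landscapes=10, n_networks=10, n_simulations=10):
    """All (landscape, network, simulation) index triples for one scenario
    at one link density."""
    return list(product(range(n_landscapes), range(n_networks), range(n_simulations)))


if __name__ == "__main__":
    plan = replicate_plan()
    networks = {(landscape, network) for landscape, network, _ in plan}
    print(len(networks), len(plan))  # prints "100 1000"
```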
2.3 Analysis
To compare the different scenarios and the predictive power depending on link density, the extent of the
spread of disease was analysed as the mean number of infected holdings per time step and as the time until a
specified proportion of holdings was infected (here 10%, 50% and 90%). To characterize the networks
and to see how a change in link density affects the structure and function of the networks, we used
three network measures: degree assortativity, clustering coefficient and fragmentation index.
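The time until a specified proportion of holdings is infected can be read from a simulated trajectory as sketched below. The function name and the convention of returning `None` when the proportion is never reached are assumptions of the sketch.

```python
def time_to_proportion(counts, n, proportion):
    """First time step at which at least `proportion` of the n holdings are
    infected, or None if that level is never reached.

    counts -- number of infected holdings per time step (index 0 = start)
    """
    for step, infected in enumerate(counts):
        if infected >= proportion * n:
            return step
    return None


if __name__ == "__main__":
    trajectory = [1, 2, 5, 9, 10]  # hypothetical counts for n = 10 holdings
    for level in (0.1, 0.5, 0.9):
        print(level, time_to_proportion(trajectory, 10, level))
```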
Degree assortativity (Newman 2002) measures if nodes with equal degree are connected to each
other or not. Values range from minus one to one. A value near one indicates that a higher
proportion of holdings with equal degree are linked to each other. Assortativity near minus one
means that holdings with different degree have a higher probability of being connected. A value of
zero means that the connections between holdings are independent of node degree.
Clustering coefficient (Watts & Strogatz 1998) for a holding is the number of links that exists between
neighbors to that holding, divided by all possible links that could exist between the neighbors. Here
we have used the average clustering coefficient for the whole network. This measure ranges
between zero and one where one indicates that the network is highly clustered.
Fragmentation index (Borgatti 2003; Webb 2005) measures to what extent the network is
disconnected and it ranges from zero to one. A low value indicates that the network is highly
connected and a high value means that the network is very fragmented.
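The three measures can be computed from an adjacency structure as in the following sketch. It is written under stated assumptions rather than taken from the paper. Assortativity is computed as the Pearson correlation of the degrees at the two ends of each link. Holdings with fewer than two neighbours are given a clustering coefficient of zero. The fragmentation index is taken to be one minus the share of holding pairs that can reach each other, which is the form we assume for Borgatti's measure. All function names are hypothetical.

```python
def adjacency(n, links):
    """Neighbour sets for nodes 0..n-1 from a list of undirected links."""
    adj = {v: set() for v in range(n)}
    for i, j in links:
        adj[i].add(j)
        adj[j].add(i)
    return adj


def degree_assortativity(adj):
    """Pearson correlation between the degrees at both ends of each link."""
    xs, ys = [], []
    for i, nbrs in adj.items():
        for j in nbrs:  # every link is visited once in each direction
            xs.append(len(adj[i]))
            ys.append(len(adj[j]))
    if not xs:
        return float("nan")
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    if var == 0:
        return float("nan")  # all link ends have the same degree
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / len(xs)
    return cov / var


def average_clustering(adj):
    """Mean over all nodes of (links among neighbours) / (possible links)."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # assumption: such nodes contribute zero
        nb = list(nbrs)
        closed = sum(
            1 for x in range(k) for y in range(x + 1, k) if nb[y] in adj[nb[x]]
        )
        total += closed / (k * (k - 1) / 2)
    return total / len(adj)


def fragmentation_index(adj):
    """1 - (pairs of nodes in the same component) / (all pairs of nodes)."""
    n = len(adj)
    seen = set()
    connected_pairs = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            v = stack.pop()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        connected_pairs += size * (size - 1)
    return 1.0 - connected_pairs / (n * (n - 1))


if __name__ == "__main__":
    star = adjacency(4, [(0, 1), (0, 2), (0, 3)])
    print(degree_assortativity(star), average_clustering(star), fragmentation_index(star))
```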
3. RESULTS
The results show that, for the scenario with distance dependent link sampling and distance
dependent disease transmission (DlDt), a link density of around 0.04 gives the same number of
infected animal holdings as for networks with higher proportion of connections (figure 3). Under the
assumptions of our model, the results indicate that such low proportions of links in the network
could be enough to examine the extent of the disease transmission. The scenario with random link
sampling with a distance dependent disease transmission (RlDt) requires a higher link density until a
limit where more links have no influence. For the scenarios with random transmission (DlRt and RlRt)
the number of infected animal holdings increases with increased link density and no limit was
reached.
In figure 4, the median values of the 1000 replicate simulations of the DlDt scenario are plotted
with the first and third quartiles on each side of the median line. For a link density as low as 0.001
(figure 4a) the median is one for the whole time period because the disease is
transmitted to other holdings in only some cases. When link density increases to 0.01 (figure 4b), the difference
between the first and third quartiles also increases. The difference is small at the beginning of the
simulation time, when few holdings are infected, but increases during the time period. If link density
increases further, up to 0.02 (figure 4c) and 0.03 (figure 4d), the difference between the first
and third quartiles decreases. For link densities of 0.04 and higher (figures 4e and 4f), the shape of
the curves and the distances between them are almost the same. Here, too, the difference between
the quartiles is small at the beginning of the simulation time and then increases. In the last part of
the simulation time, when almost all holdings are infected, the difference between the first and
third quartiles decreases to almost zero.
The time until a given proportion of the holdings were infected differs depending on link sampling
scenario as well as on disease transmission scenario (figure 5). The random disease transmission scenarios
(DlRt and RlRt) require almost the same time to reach a given proportion of infected animal holdings.
In addition, they have a much faster disease spread than the distance dependent disease
transmission scenarios (DlDt and RlDt) (figures 3 and 5). Scenario RlDt, i.e. random link sampling and
distance dependent transmission, has the slowest transmission rate of all scenarios.
The number of infected holdings at a given link density is compared between the four scenarios
(figure 6). At low link densities, all scenarios gave different results. When link density increases, the
two distance dependent disease transmission scenarios (DlDt and RlDt) approach each other, as
do the two random disease transmission scenarios (DlRt and RlRt). The higher the link density, the
more similar are the results of the two distance dependent disease transmission
scenarios. As mentioned before, random transmission gives a much faster disease spread than
distance dependent transmission.
The average assortativity for the networks depends on the link creation method that is used (figure
7a). Distance dependent link creation ends up with higher values of assortativity compared to random
link creation. The networks made by random linking have as expected assortativity around zero for
all link densities.
The average clustering coefficient for all networks increases with increasing link density (figure 7b).
The clustering coefficients for the networks generated by distance dependent link sampling are
higher than the values for the networks made by random link creation. When link density increases,
the random link sampling approaches the distance dependent link sampling. The networks generated
by the random link sampling give clustering coefficients that are equal to the link density in question.
Of course, the clustering coefficients for all networks are one when the link density is one, and all
animal holdings are connected to each other.
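The agreement between clustering coefficient and link density under random link sampling follows from a short argument, sketched here under the assumption that L = pC_n links are drawn uniformly without replacement, where p is the link density. For a holding v with two neighbours u and w, the probability that u and w are also linked is

$$\Pr(l_{uw} \mid l_{vu}, l_{vw}) = \frac{L - 2}{C_n - 2} \approx \frac{L}{C_n} = p$$

so the expected clustering coefficient is approximately p. For n = 500 and p = 0.04, this gives 4988/124748 ≈ 0.03998.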
The fragmentation index for the networks shows that for both link sampling scenarios the index is
close to one when link density is 0.001 (table 2). When link density increases to 0.01, the
fragmentation index dramatically decreases. With both link sampling scenarios, the index has
reached zero when link density is 0.03 or higher.
4. DISCUSSION
Our aim with this study was to investigate the effects of using a network with missing links for
predictions of disease spread. We investigated if it was possible to predict anything about the size of
the spread of disease with only some proportion of all theoretically possible links realized. Our results
showed that a link density of 0.04 gave the same number of infected animal holdings as a higher link
density when simulating spread of disease in a scenario where the probability for identifying a link, as
well as for disease transmission, is distance dependent (DlDt-scenario). For distance dependent
disease transmission in a random link sampled network (RlDt), the numbers of infected animal
holdings converge to the same numbers as of DlDt, as expected. However, with random link sampling
many more links are needed to reach the rate of DlDt. With random link sampling, there are a larger
number of less probable (longer distance) links included than with distance dependent link sampling.
For random disease transmission (scenario DlRt and RlRt), the number of infected holdings increased
with increased link density. This implies a much higher link density to reach relevant approximations
of spread of disease. Below we discuss implications of our results in relation to sampling procedures
and the effects of using networks with missing links.
Empirical data have shown that only a small fraction of all possible connections in a network actually take
place (Webb 2006; Eames et al. 2009). When sampling data it is almost impossible to trace all
connections between nodes, even if they are a small fraction, and this often leads to incomplete data sets.
Therefore, it is important to consider link density when working with network modeling. Comparing
simulations in a scenario DlDt network with a link density of 0.04 or higher, to simulations in a
complete network, both would result in the same number of infected holdings. This implies that a
link density of 0.04 is sufficient and sampling beyond that is unnecessary. Another important issue to
consider when using empirical sampled networks is the time window for the sampling period. Using
the “wrong” time window can lead to missing links or unnecessary sampling. The period used affects
how complete the network will be: a longer time window could result in a more connected network
than one based on a very short time window. Different studies use time windows of different
lengths. For example, Kiss et al. (2006) used a 4-week time scale in their study of sheep movements
in Great Britain. In another study of animal transports, Robinson and Christley (2007) used periods of
10 weeks. Such studies may very well describe the network during the actual time period, yet it is not
clear whether they can be used for predictions.
Combining our results with the time frame of a study emphasizes the problem and puts the focus on the number
of actually measured links. A shorter period will by definition result in fewer measured links. The
question is whether such a sample is enough to test for spread of the disease during a longer period
than the measured one. In the DlDt case, our simulations indicate that a link density of 0.04 gives a correct
measure of the spread of disease for any time period. If the link density is below 0.02, our results
indicate that the measure of spread of disease is erroneous even for short time periods, while a
density of 0.03 may hold up to a time period of 50-100 time steps, which is when the 0.03 curve diverges
from the ones with higher density (see figure 3). These conclusions are only true for a perfect link
sampling procedure such as the DlDt scenario. If the sampling procedure is not as perfect
as in our DlDt scenario, we have to include even more links. Our RlDt scenario is the extreme at the
other end, i.e. a completely random link sampling procedure that is not at all related to the probability
of contacts. In such a case almost all links have to be sampled, even for estimations over short time
periods. In real life, link sampling procedures are somewhere in between these two extremes.
The better the link sampling procedure, the fewer links are needed, yet a density below 0.02 is not recommendable for any
time period. Of course, this conclusion depends on our setup: how we model the spread of
disease, what distance dependence we have used (eq. 1), and the spatial configuration of holdings.
Still, our results show that to achieve reliable measures one may have to include a higher link density
than expected. Furthermore, our methodology can be applied to any specific system to assess the
necessary level of link density. The link density can be achieved by a long single measure or by
repetitive sampling procedures.
Measured link densities in empirically investigated networks are often only a few parts per million or
just a few percent of the total number of theoretical connections in the networks. An example is
Ortiz-Pelaez et al. (2006) who have studied animal movements during the initial phase of the
epidemic of foot-and-mouth disease in Great Britain in 2001. Their network has an average link
degree of 1.22 and that corresponds to a link density as low as about 0.0019. Also in the Swedish
animal transport network a low link density is measured (Nöremark et al, submitted). It is important
to remember that the measured contacts in an empirical network only are subsets of realizations of
all possible contacts. That means that the number of links in these networks is a subset of the ones
that have been realized during the time for data collection. Actually, there are probabilities for a
huge number of additional connections but these were not even realized during the current time
period. For example when a link density of 0.01 is used in our study, all theoretical connections are
possible but only 1% of all of them are realized and these could differ between the replicates. Also
when modeling virtual networks it is important to consider the link density. Kiss et al. (2005)
used virtual networks with different mean degree in their epidemiological modeling. They
varied the mean degree between 5 and 20 to see how it affects the final epidemic size. These values
correspond to link density values from 0.0025 to 0.01, that is, rather low densities. Apart from the
study of Kiss et al. (2005), network studies focusing on link densities and missing links are rare and
more studies are needed in this field.
That a network with random disease transmission will spread diseases faster than a spatially clustered
network is well known (Watts & Strogatz 1998; Kiss et al. 2005). Such random networks are
generated in our study when we apply random transmission probabilities (RlRt and DlRt). For the DlRt
scenario with random transmissions and distance dependent link sampling the transmission rate is
slightly slower at any given link density. The link sampling procedure of DlRt assumes distance
dependent contact yet the contact structure is random. In such a case, the true rate is higher than the
rate in the sampled network, since the link sampling procedure will miss some important long
distance links. Hence an even higher link density must be sampled to reach the right rate of
spread of disease. Lindström et al. (2009) showed that the spatial kernel explaining the distance
dependence of contacts between holdings due to transport is a mix of distance independence, i.e. mass
action mixing, and distance dependence. The mass action mixing is represented by Rt in our setup
and once again reality is somewhere in between these two extremes, the DlRt and the DlDt
scenarios. Consequently, our study of the RlRt and DlRt scenarios implies that the 0.03 and 0.04
link density levels found above are expected to be too low, since the mass action component in the contact structure
creates even higher demands on link density.
It is recognized that random networks have a low level of clustering compared to other kinds of networks,
for example small-world networks (Watts & Strogatz 1998; Shirley & Rushton 2005). We have
measured the clustering coefficient of our networks and, as expected, the clustering coefficient was
lower in the random networks than in the networks generated by distance dependent link sampling.
How fragmented a network is influences how well diseases can spread between the holdings.
The fragmentation index measures to what extent a network is disconnected. Here, only
link densities below 0.03 resulted in disconnected networks. Link densities of 0.03 or
higher give rise to a connected graph, and it is then possible for a disease to spread between all
animal holdings in the network. Considering that a link density of 0.03 corresponds to an average link
degree of almost 7.5, the values of the fragmentation index are reasonable. A disconnection may
reduce the spread of the disease immensely. Hence, any disconnection that is apparent after a link
sampling procedure should be scrutinized. If the disconnection is a result of the specific realization,
and hence would not necessarily exist in the same manner in another realization, that is a new time
period, it will jeopardize any conclusions drawn from the study. This points out the
difference between a network that represents one time period with all its measures and a network
that can be used for predictions and estimations of rate for any time period of the same
length.
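The relation between link density, links per node and fragmentation can be made concrete with a small sketch. Two assumptions are made here and are not stated in this section: the values in Table 1 are consistent with networks of 500 holdings in which the mean degree is counted as links per node, and the fragmentation index is taken to be the share of holding pairs that cannot reach each other, in the sense of Borgatti (2003). The function names are hypothetical.

```python
from collections import defaultdict

def links_per_node(density, n_holdings=500):
    """Links per node for a given link density (n_holdings=500 is inferred, not stated)."""
    possible_links = n_holdings * (n_holdings - 1) / 2
    return density * possible_links / n_holdings

def component_sizes(n, links):
    """Sizes of the connected components of an undirected network, via union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in links:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j

    sizes = defaultdict(int)
    for v in range(n):
        sizes[find(v)] += 1
    return list(sizes.values())

def fragmentation_index(n, links):
    """Share of ordered node pairs with no path between them (0 = connected, 1 = no links)."""
    reachable_pairs = sum(s * (s - 1) for s in component_sizes(n, links))
    return 1.0 - reachable_pairs / (n * (n - 1))

print(links_per_node(0.03))
print(fragmentation_index(4, [(0, 1), (2, 3)]))
```

Under these assumptions a link density of 0.03 gives 0.03 × 124750 / 500 links per node, in line with the value of almost 7.5 quoted above, and a network split into two pairs of holdings has two thirds of its holding pairs disconnected.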
We were interested in how many animal holdings may become infected and in the rate of the
disease transmission. That is why no incubation time is included in the model. This is a simplification,
since the incubation time differs between diseases. Some diseases have an incubation
time of only a few days, while for others it may be as long as a couple of years. The model can easily
be extended to a more complex model by including a recovery phase and an incubation time. We
used the number of infected animal holdings as a measure of the spread of a possible epidemic. In
practice, this is perhaps not that relevant, because it is not desirable to let the disease transmission
go on for such a long time. Instead, it is of course desirable that control strategies are adopted as
soon as possible after an infection has been identified. Still, our study has implications for what link
density one ought to achieve when testing different strategies.
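A transmission model without incubation time or recovery can be sketched as follows. This is a minimal illustration under stated assumptions, not the simulation code of the study: links are treated as undirected, each link carries a per time step transmission probability, and the name simulate_si and its parameters are hypothetical.

```python
import random

def simulate_si(n, weighted_links, seed_node, steps, rng):
    """Return the number of infected holdings after each time step.

    weighted_links maps (i, j) to an assumed per-step transmission probability.
    Infected holdings stay infectious for the whole run (no incubation, no recovery).
    """
    neighbours = {v: [] for v in range(n)}
    for (i, j), p in weighted_links.items():
        neighbours[i].append((j, p))
        neighbours[j].append((i, p))

    infected = {seed_node}
    history = []
    for _ in range(steps):
        newly_infected = set()
        for v in infected:
            for u, p in neighbours[v]:
                if u not in infected and rng.random() < p:
                    newly_infected.add(u)
        # Holdings infected in this step start transmitting in the next step.
        infected |= newly_infected
        history.append(len(infected))
    return history

# Chain of four holdings where every contact transmits with certainty.
chain = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0}
print(simulate_si(4, chain, seed_node=0, steps=3, rng=random.Random(0)))
```

A recovery phase or an incubation time could be added by tracking a state and a timer per holding instead of a single set of infected holdings.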
4.1 Conclusion
Our results indicate that to estimate properties of networks such as spread of disease one may have
to construct link sampling procedures that reach high link densities. More specifically, our scenarios
show that with a perfect sampling procedure a link density of 0.02 is enough to estimate the spread of
disease during shorter time periods, while 0.04 is necessary for longer periods. Yet, in reality link
sampling procedures are not perfect and we also expect some mass action mixing component in the
contacts between holdings. Our results show that these two components of reality require an even
higher level of link density to achieve a relevant measure of the spread of disease.
Acknowledgements
References
Barrat, A., Barthélemy, M., Pastor-Satorras, R. and Vespignani, A., 2004. The architecture of complex
weighted networks. PNAS 101, 3747-3752. (doi: 10.1073/pnas.0400087101)
Barthélemy, M., Barrat, A., Pastor-Satorras, R. and Vespignani, A., 2005. Dynamic patterns of
epidemic outbreaks in complex heterogeneous networks. Journal of Theoretical Biology 235, 275-288.
(doi:10.1016/j.jtbi.2005.01.011)
Borgatti, S., 2003. The Key Player Problem. In: Dynamic Social Network Modeling and Analysis:
Workshop Summary and Papers, R. Breiger, K. Carley, P. Pattison (Eds). National Academy of
Sciences Press.
Christley, R.M., Robinson, S.E., Lysons, R. and French, N.P., 2005. Network analysis of cattle
movement in Great Britain. Proceedings of the Society for Veterinary Epidemiology and Preventive
Medicine (2005), 234-243.
Clauset, A., Moore, C. and Newman, M.E.J., 2008. Hierarchical structure and the prediction of missing
links in networks. Nature 453, 98-101. (doi:10.1038/nature06830)
Corner, L.A.L., Pfeiffer, D.U. and Morris, R.S., 2003. Social-network analysis of Mycobacterium bovis
transmission among captive brushtail possums (Trichosurus vulpecula). Preventive Veterinary
Medicine 59, 147-167. (doi:10.1016/S0167-5877(03)00075-8)
Eames, K.T.D., Read, J.M. and Edmunds, W.J., 2009. Epidemic prediction and control in weighted
networks. Epidemics 1, 70-76. (doi:10.1098/rspb.2003.2554)
Guimerà, R. & Sales-Pardo, M., 2009. Missing and spurious interactions and the reconstruction of
complex networks. PNAS 106, 22073-22078. (doi:10.1073/pnas.0908366106)
Heath, M.F., Vernon, M.C. and Webb, C.R., 2008. Construction of networks with intrinsic temporal
structure from UK cattle movement data. BMC Veterinary Research 4:11. (doi:10.1186/1746-6148-4-11)
Håkansson, N., Jonsson, A., Lennartsson, J., Lindström, T. and Wennergren, U., 2010. Generating
structure specific networks. Accepted for publication in Advances in Complex Systems.
Kao, R.R., Green, D.M., Johnson, J. and Kiss, I.Z., 2007. Disease dynamics over very different
time-scales: foot-and-mouth disease and scrapie on the network of livestock movements in the UK. J. R.
Soc. Interface 4, 907-916. (doi:10.1098/rsif.2007.1129)
Keeling, M., 2005. The implication of network structure for epidemic dynamics. Theoretical
Population Biology 67, 1-8. (doi:10.1016/j.tpb.2004.08.002)
Kiss, I.Z., Green, D.M. and Kao, R.R., 2005. Disease contact tracing in random and clustered networks.
Proc. R. Soc. B 272, 1407-1414. (doi:10.1098/rspb.2005.3092)
Kiss, I.Z., Green, D.M. and Kao, R.R., 2006. The network of sheep movements within Great Britain:
network properties and their implications for infectious disease spread. J. R. Soc. Interface 3, 669-677.
(doi:10.1098/rsif.2006.0129)
Lindström, T., Håkansson, N., Westerberg, L. and Wennergren, U., 2008. Splitting the tail of the
displacement kernel shows the unimportance of kurtosis. Ecology 89, 1784-1790. (doi:10.1890/07-1363.1)
Lindström, T., Sisson, S.A., Nöremark, M., Jonsson, A. and Wennergren, U., 2009. Estimation of
distance related probability of animal movements between holdings and implications for disease
spread modeling. Preventive Veterinary Medicine 91, 85-94. (doi:10.1016/j.prevetmed.2009.05.022)
Nöremark, M., Håkansson, N., Sternberg Lewerin, S., Lindberg, A. and Jonsson, A. Network analysis
of cattle and pig movements in Sweden: measures relevant for disease control and risk based
surveillance. Submitted to Preventive Veterinary Medicine.
Newman, M.E.J., Strogatz, S.H. and Watts, D.J., 2001. Random graphs with arbitrary degree
distributions and their applications. Phys. Rev. E 64, 026118. (doi:10.1103/PhysRevE.64.026118)
Newman, M. E. J., 2002. Assortative mixing in networks. Phys. Rev. Lett. 89 (20).
(doi:10.1103/PhysRevLett.89.208701)
Ortiz-Pelaez, A., Pfeiffer, D.U., Soares-Magalhães, R.J. and Guitian, F.J., 2006. Use of social network
analysis to characterize the pattern of animal movements in the initial phases of the 2001 foot and
mouth disease (FMD) epidemic in the UK. Prev. Vet. Med. 76, 40-55.
(doi:10.1016/j.prevetmed.2006.04.007)
Perkins, S.E., Cagnacci, F., Stradiotto, A., Arnoldi, D. and Hudson, P.J., 2009. Comparison of social
networks derived from ecological data: implications for inferring infectious disease dynamics. Journal
of Animal Ecology 78, 1015-1022. (doi:10.1111/j.1365-2656.2009.01557.x)
Robinson, S.E. and Christley, R.M., 2007. Exploring the role of auction markets in cattle movements
within Great Britain. Preventive Veterinary Medicine 81, 21-37.
(doi:10.1016/j.prevetmed.2007.04.011)
Shirley, M.D.F. and Rushton, S.P., 2005. The impacts of network topology on disease spread.
Ecological Complexity 2, 287-299. (doi:10.1016/j.ecocom.2005.04.005)
Vernon, M.C. and Keeling, M.J., 2009. Representing the UK's cattle herd as static and dynamic
networks. Proc. R. Soc. B 276, 469-476. (doi:10.1098/rspb.2008.1009)
Wasserman, S. and Faust, K., 1994. Social Network Analysis: Methods and Applications. Cambridge
University Press, Cambridge.
Watts, D.J. and Strogatz, S.H., 1998. Collective dynamics of ‘small-world’ networks. Nature 393,
440-442. (doi:10.1038/30918)
Webb, C.R., 2005. Farm animal networks: unraveling the contact structure of the British sheep
population. Preventive Veterinary Medicine 68, 3-17. (doi:10.1016/j.prevetmed.2005.01.003)
Webb, C.R., 2006. Investigating the potential spread of infectious diseases of sheep via agricultural
shows in Great Britain. Epidemiology and Infection 134, 31-40. (doi:10.1017/S095026880500467X)
Table captions
Table 1. Link densities used in simulations and the corresponding mean link degree for the networks.
link density    mean degree (nr of links/node)
0.001           0.250
0.005           1.248
0.01            2.495
0.02            4.990
0.03            7.485
0.04            9.980
0.05            12.48
0.07            17.47
0.10            24.95
0.25            62.38
0.50            124.8
0.75            187.1
1.00            249.5
Table 2. Fragmentation index depending on link density and the link forming method used.
link density    distance dependence    random
0.001           0.9983                 0.9981
0.005           0.9065                 0.1976
0.01            0.0385                 0.0133
0.02            0.0002                 0.0001
0.03            0.0000                 0.0000
Figure captions
Figure 1. Network categories: a) complete network, b) real world network, c) sampled network
[Flow chart: number of animal holdings; random placement; distance dependent linking or random
linking; distance dependent transmission or random transmission; resulting scenarios DlDt, DlRt, RlDt
and RlRt.]
Figure 2. Flow chart showing the different parts of the model and how these relate to each other.
Figure 3. Mean number of infected holdings per time step depending on linking and disease
transmission scenarios. Scenarios DlDt (a) and RlDt (b) have distance dependent disease transmission
while scenarios DlRt (c) and RlRt (d) have random transmission. In scenarios DlDt (a) and DlRt (c)
distance dependent link creation is used. In scenarios RlDt (b) and RlRt (d) random link creation is
used. Link densities used: 0.001 (---), 0.005 (…), 0.01 (--.--), 0.02 (__), 0.03 (-○-), 0.04 (-*-), 0.05 (-□-), 0.1 (-♦-), 0.25 (-◦-), 0.5 (-▼-), 0.75 (-x-) and 1.0 (-+-).
Figure 4. The solid line shows the median number of infected holdings per time step for the DlDt
scenario. Dashed lines represent the first and the third quartiles of the replicates. Link densities in the
sub graphs: a=0.001, b=0.01, c=0.02, d=0.03, e=0.04 and f=1.0. Notice that the scales of the y-axes
are not the same in all sub graphs.
Figure 5. Number of time steps until (a) 10%, (b) 50% and (c) 90% of all holdings in the network are
infected. The time depends on which of the four scenarios is used. Dashed line = DlDt, dotted line = RlDt,
solid line = DlRt and dash-dot line = RlRt. For scenario RlDt the number of infected holdings did not
reach any of the given proportions during the simulation time.
Figure 6. Mean number of infected holdings per time step for a given link density and the four scenarios.
Dashed line = DlDt, dotted line = RlDt, solid line = DlRt and dash-dot line = RlRt. Here, eight link
densities, one at a time, are used and compared. Link densities in the sub graphs: a=0.001, b=0.01,
c=0.03, d=0.05, e=0.07, f=0.1, g=0.5 and h=1.0. Notice that the scales of the y-axes are not the same in
all sub graphs.
[Panels: (a) assortativity versus link density; (b) clustering coefficient versus link density.]
Figure 7. Average (a) assortativity and (b) clustering coefficient for the networks, depending on the
way the holdings are connected to each other. Distance dependent linking = dashed line and random
linking = solid line.
Short title for page headings: Network sampling and epidemic predictions