Is a sampled network a good enough descriptor for epidemic predictions? Missing links and appropriate choice of representation

Jenny Lennartsson (a), Annie Jonsson (a), Nina Håkansson (a,b) and Uno Wennergren (b,*)

(a) Systems Biology Research Centre, Skövde University, Box 408, 541 28, Skövde, Sweden
(b) IFM Theory and Modeling, Linköping University, 581 83 Linköping, Sweden
* Corresponding author. IFM Theory and Modeling, Linköping University, 581 83 Linköping, Sweden. Phone: +46 13 281666. Fax: +46 13 281399
E-mail address: [email protected]

ABSTRACT
Lack of complete data sets can be a limitation in network analysis. Here, we studied how link density affects the properties of disease transmission networks. Networks with weighted links were used to run scenarios assuming distance dependent probabilities of disease transmission, which were subsequently compared with scenarios where probabilities of disease transmission were randomly drawn (i.e. non-distance dependence). In both types of scenarios, two link sampling methods were tested, one based on distance dependence and the other on a random approach. This allowed us to study how link density influences the spread of disease in networks generated using different link sampling methods and transmission scenarios. We conclude that, under the assumption of distance dependence of both link sampling and disease transmission, predictions about the extent of an epidemic can be drawn from a network, even at a link density that is low, albeit higher than in most empirical studies. In reality, neither sampling procedures nor disease transmissions fit distance dependence perfectly. Our results show that this requires an even higher link density in sampled networks to achieve reasonable predictions for disease transmission.

Keywords: epidemic modelling, link density, link sampling, network analysis, disease transmission

1. INTRODUCTION
During recent years, there has been growing interest in and use of network analysis in epidemiology. A network consists of interacting units, here denoted nodes, and these units are connected through relationships we call links. Examples of nodes are individual animals or animal holdings, and links can be visits or animal transports. The pattern of links between the nodes gives rise to networks with contact structures that differ depending on both the number of links and how the links are organized. Here, we classify networks into three categories: (i) the complete network (Wasserman and Faust, 1994), which includes all theoretically possible links (figure 1a); (ii) the real-world network, which comprises all realizations of links during a specified time period (figure 1b); (iii) the sampled network, which encompasses the links measured during the sample period (figure 1c). In addition to estimating the link structure of networks, probabilities of occurrence or disease transmission can be measured per time unit for individual links, in which case the network is referred to as being weighted (Barrat et al., 2004). In contrast to classical epidemiological models such as SI, SIR, and SEIR, network models relax the assumption of homogeneous mixing (mass-action type of assumptions), because not all nodes are linked to all other nodes, or the links are weighted, for example depending on distance.

The sampled network can be estimated through sample surveys, literature studies, or contact tracing, or by use of databases (e.g. national databases for animal movements). The estimation is cumbersome, and it might be expected that estimated networks will lack some links and even some nodes (Clauset et al., 2008). It is the real-world network that is of interest to register, but we have to use the sampled network to represent it. Hence, there is a need to evaluate the effect of missing links in order to assess or possibly reduce errors when networks are applied. The current study focused on how the number of links in sampled networks affects predictions of the size of epidemics. We simulated spread of disease in networks with different link densities and different scenarios that mimicked sampling procedures.

A real-world weighted contact network consists of all contacts that occur during a specific time period, where the link weights are estimated, as probabilities, from the contact frequencies. Another time period constitutes yet another event along with its specific contacts, which may very well lead to another network with a different set of links. Thus, the question arises as to whether the properties of the two real networks will differ. Furthermore, it must be asked whether the properties of the sampled networks for those two events will differ. Is it possible that a property of the first sampled network (e.g. the spread of disease) can be valid as an approximation of the spread of disease during a second event? Obviously, a time period that is too short will result in a poor approximation of either of the two events. On the other hand, a measured period with a very large time frame will yield a nearly complete network that is almost a perfect approximation. Beyond reality is the infinite sampling procedure that results in a complete network in which all links exist with specified probabilities. Somewhere in between the short time period and the excessive sampling, there is a sampled network that has enough links and sufficiently well estimated probabilities to generate an adequate approximation.

In the present study, we concentrated on disease transmission networks with weighted links, because it might be expected that the probability of contact can be higher for some links than for others. We ran scenarios with the assumptions of distance dependent probabilities and compared the results with scenarios based on randomly drawn probabilities. The distance dependence was tested for disease transmission and link sampling, both separately and in combination. In a worst-case scenario, there would be a mismatch when using distance dependent transmission probabilities together with random link sampling. Hence, in addition we studied how the necessary number of measured links also depended on the mismatch between a real-world network and the sampling procedure. We chose to focus our investigation on networks in veterinary medicine, specifically the spread of infectious diseases between animal holdings, because the use of network analysis is increasing in that field (Barthélemy et al., 2005; Ortiz-Pelaez et al., 2006).

In veterinary medicine, network analysis and modelling can be employed to predict the spread of disease and epidemic size, and also to examine the effects of various intervention methods, such as vaccination, stand still, and stamping out. An example of this is a study performed by Corner et al. (2003), which was aimed at examining a network of wild brushtail possums with regard to transmission of the pathogen Mycobacterium bovis and social contacts between the animals. In another investigation, Kiss et al. (2006) analysed networks of sheep movements within Great Britain and found that, during an epidemic, the most efficient strategy is to concentrate control interventions on highly connected nodes. Despite the increased use of network analysis in epidemiology, there are shortcomings related to missing links and how to represent a structure when only a single sample is available. Collected network data are often incomplete (Christley et al., 2005; Clauset et al., 2008; Eames et al., 2009; Guimerà and Sales-Pardo, 2009; Heath et al., 2008; Ortiz-Pelaez et al., 2006); for instance, there can be missing animal movements or unknown locations of herds in databases. Accordingly, Perkins et al. (2009) demonstrated that network structures are only approximations of contacts and that it is almost impossible to identify all contacts when collecting data.

Properties such as spread of disease can vary depending on the structure of the network (Keeling, 2005; Kiss et al., 2006; Newman et al., 2001; Shirley and Rushton, 2005). Thus, due to the relationship between disease transmission and network structure, results based on networks with missing links may be misleading. In practice, this means that there will be problems with missing data, which will lead to lost links in the representation of a network. The links are lost due to errors that occur during the sampling period or as a consequence of the finite length of the sampling period. Guimerà and Sales-Pardo (2009) introduced a method to use a single measure of a network, called a sampled network, to generate a more correct representation, i.e. an approximation of the real-world network. Those researchers focused on networks that were reduced during sampling, and by measuring and classifying the structure of the sampled network, they were able to identify either missing or spurious links. By comparison, our study was more general in nature and handled the relationship between link density and estimates of properties such as spread of disease and specific network measures. Our results can be combined with the findings of Guimerà and Sales-Pardo (2009) by stressing when to expect missing links.

When conducting a survey to obtain a sampled network, it is important to consider the time window of the sampling period. For example, Kao et al. (2007) studied the relationship between the network of UK livestock movement and disease dynamics on different time scales. This was achieved by simulating transmission of two diseases, scrapie and foot-and-mouth disease, which differ greatly regarding the time scales of the incubation and infectious periods. Kao and colleagues concluded that, in order for network analysis to be a valuable tool in epidemiological modelling, it is important to consider the time scale as well as the potentially infectious contacts. In another study, Robinson et al. (2007) investigated animal movement networks evolving over time in Great Britain, and their findings point out the importance of temporal scale. With increasing length of the time period under consideration, the networks became progressively more connected, and in that way fuelled the spread of disease. Those authors also found a seasonal pattern with a peak in spring and August. Thus, depending on the question to be examined or when comparing different networks, it is important to choose the appropriate temporal scale (Vernon and Keeling, 2009). In the current study, we discuss this time window problem in relation to the difficulty of achieving a sufficient number of links during a selected period.

2. METHODS

2.1 The model
Figure 2 illustrates the process of network generation and simulation in our study. The first step involved placement of animal holdings in the landscape, and the second the link-forming procedures, which in this case were related to empirical sampling. The third step comprised simulation runs of disease transmission. The network-generating algorithm, simulations, and calculations were implemented and run in MATLAB (version R2009a).

2.1.1 Landscape of animal holdings
The number of animal holdings was set to 500, and these entities were randomly placed in a landscape of size 34 x 34 (see figure 2). The holding density was chosen according to actual farm density in southern Sweden. Each animal holding was considered to be a node, which implies that individual animals were not modelled.

2.1.2 Link sampling and link density
The animal holdings were connected either according to the distance between them (Dl; eq. 1; Håkansson et al., 2010; Lindström et al., 2008) or completely at random (Rl).

$$P(l_{ij}) = K \exp\left(-\left(\frac{d_{ij}}{a}\right)^{b}\right) \qquad (1)$$

In the equation, P(l_ij) is the probability that a link is formed between nodes i and j, and d_ij is the Euclidean distance between holdings i and j. Parameters a and b are set by the kurtosis, κ, and the standard deviation, σ (see Lindström et al., 2008). The constant K normalized the distribution such that the probabilities of all possible links summed to one. For distance dependent link sampling, Dl, we used a kurtosis value of 10/3, which corresponds to an exponential distribution, and a standard deviation of one. The links were sampled randomly and successively from this probability distribution (eq. 1), until the desired link density was achieved. Since stochasticity was included in the method, it was also possible to sample links between holdings that were more distant from each other, even at a low link density. For random distribution of links, Rl, the links were sampled one at a time, with the same probability for all links. To avoid edge effects, periodic boundaries were used (Lindström et al., 2008) along the edges of the 34 x 34 landscape.

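The two link sampling procedures can be illustrated with a minimal Python sketch. It is not the authors' MATLAB implementation. The function names, the scale parameter `a` (whose mapping from κ and σ in Lindström et al., 2008, is not reproduced here), and the use of an exponential race to draw links without replacement are all assumptions made for illustration. Setting `b = 1` gives the exponential shape named in the text, and only relative weights matter, so the constant K never has to be computed explicitly.

```python
import math
import random


def torus_distance(p, q, size):
    """Euclidean distance between p and q on a square landscape with periodic boundaries."""
    dx = abs(p[0] - q[0])
    dy = abs(p[1] - q[1])
    dx = min(dx, size - dx)  # wrap around the horizontal edge if that is shorter
    dy = min(dy, size - dy)  # wrap around the vertical edge if that is shorter
    return math.hypot(dx, dy)


def sample_links(points, density, size, a=None, b=1.0, rng=random):
    """Sample undirected links until the requested link density is reached.

    a=None  -> random sampling (Rl): every pair has the same weight.
    a=value -> distance dependent sampling (Dl): weight exp(-(d/a)**b), as in eq. 1.
    The parameter a is a hypothetical scale value chosen by the caller.
    """
    n = len(points)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    target = round(density * len(pairs))
    keyed = []
    for i, j in pairs:
        if a is None:
            w = 1.0
        else:
            w = math.exp(-(torus_distance(points[i], points[j], size) / a) ** b)
        # Exponential race: taking the smallest keys is equivalent to drawing pairs
        # one at a time, without replacement, with probability proportional to w.
        key = rng.expovariate(1.0) / w if w > 0 else math.inf
        keyed.append((key, (i, j)))
    keyed.sort()
    return {pair for _, pair in keyed[:target]}
```

Because the draw is stochastic, long links can still be selected at low density under Dl, only with low probability, which matches the behaviour described above.
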
Link density, D, represents the actual connections, L, in a network as a proportion of all theoretically possible links in that network (Wasserman and Faust, 1994, eq. 2).

$$D = \frac{2L}{n(n-1)} \qquad (2)$$

In our study, n represented the number of animal holdings in the network. In the simulations, we varied link density between 0.001 and 1.0, and a density of 1.0 indicated a complete network (figure 1a) that included all theoretical connections. Inasmuch as the link density of the networks was set when generating the networks, the mean link degree was also given from the start (table 1).

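The relation between link density and mean degree follows directly from eq. 2: with L undirected links the mean degree is 2L/n, which equals D(n − 1). The short Python sketch below works through this arithmetic for n = 500. The helper names are illustrative, and the densities in the loop are those mentioned in the text, not a reproduction of table 1.

```python
def link_density(num_links, n):
    """Eq. 2: realized links as a proportion of all n(n-1)/2 possible undirected links."""
    return 2 * num_links / (n * (n - 1))


def mean_degree(density, n):
    """Mean degree 2L/n rewritten with eq. 2 as D(n-1)."""
    return density * (n - 1)


def links_for_density(density, n):
    """Number of links needed to reach a given density (rounded to a whole link)."""
    return round(density * n * (n - 1) / 2)


n = 500
for d in (0.001, 0.01, 0.02, 0.03, 0.04, 1.0):
    print(d, links_for_density(d, n), round(mean_degree(d, n), 2))
```

For a fixed mean degree, the corresponding density shrinks as n grows, since D = (mean degree)/(n − 1).
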
2.1.3 Disease transmission
As in the link sampling process, we assumed two different processes for the transmission probabilities of a disease, one distance dependent, Dt, and the other completely random, Rt. The two processes could represent two diseases with different behaviours. Transmission rates were determined using the same processes as applied in the link sampling (see §2.1.2). Hence, Dt was set by equation 1 and the same parameter values as Dl, and the transmission probabilities of Rt were arbitrarily set to 0.01.

2.1.4 Model scenarios
Combining two link sampling processes and two disease transmission processes yielded four different scenarios that we designated DlDt, DlRt, RlDt and RlRt (figure 2), and these can be described as follows. The RlRt scenario is an example of a mass action mixing model (Keeling, 2005) that assumes that all links have the same probability of transmitting disease, combined with a matching random procedure for link sampling. Matching in this context is considered in the sense of process but not necessarily with respect to the occurrence of events, i.e. two different realizations of the randomization from the same process. The DlDt scenario, which comprises linking and transmission probabilities for each link, is a distance dependent scenario that involves matching between the process that sets the probability of a link being measured and the process that sets the probability of transmission. In the remaining two scenarios, DlRt and RlDt, the link sampling procedure does not match the actual process that generates the probability of transmission. For example, in RlDt, transmission is distance dependent and yet the link sampling procedure is random and hence expected to be ineffective. In this case, because link sampling is random regardless of distance, some of the first connections detected will have low transmission probabilities, while some that have a high probability of transmission will not be detected within the sampling time frame.

2.1.5 Simulation model
To simulate disease transmission in the sampled networks, we used a general and very simple epidemiological model, where the holdings could be in either of two phases: susceptible (S) or infectious (I) (eq. 3).

$$\frac{dS}{dt} = -\lambda S I, \qquad \frac{dI}{dt} = \lambda S I \qquad (3)$$

Parameter λ in the equation is the probability of disease transmission from an infectious holding through a link to a susceptible holding, and the variables S(t) and I(t) are the number of holdings in the susceptible and the infectious phase, respectively, at time t. We did not incorporate incubation time, and hence a holding infected through contact with an infectious holding was already able to infect other holdings during the next time step. Furthermore, a recovery phase was not included in the model, and thus an infected holding remained in the infectious phase during the remaining simulation time. Undirected links were used, and the disease could thus be transmitted in both directions along the links. Disease transmission could occur only between animal holdings that were connected by a link. It should be noted that the probability of a link in the sampled network was according to Rl or Dl, whereas the probability of transmission was according to Rt or Dt.

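A minimal Python sketch of one way to run such an SI process on a sampled network is given below. It assumes a discrete-time, per-link stochastic reading of eq. 3, in which each link between an infectious and a susceptible holding transmits with its own probability once per time step. The function name and the dictionary layout of `link_prob` are hypothetical and are not taken from the authors' MATLAB code.

```python
import random


def simulate_si(n, link_prob, steps, seed_node, rng=random):
    """Discrete-time stochastic SI spread on an undirected network.

    link_prob maps a pair (i, j) to the per-step transmission probability of that
    link (distance dependent for Dt, a constant such as 0.01 for Rt).
    Returns the number of infected holdings at each time step, starting at step 0.
    """
    neighbours = {i: [] for i in range(n)}
    for (i, j), p in link_prob.items():
        neighbours[i].append((j, p))  # undirected: transmission possible both ways
        neighbours[j].append((i, p))
    infected = {seed_node}
    counts = [1]
    for _ in range(steps):
        newly = set()
        for i in infected:
            for j, p in neighbours[i]:
                if j not in infected and rng.random() < p:
                    newly.add(j)
        # No incubation and no recovery: new cases are infectious from the next
        # step onwards and stay infectious for the rest of the run.
        infected |= newly
        counts.append(len(infected))
    return counts
```

With every link probability set to one, the infection advances exactly one link per time step, which gives a simple sanity check of the update rule.
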
2.2 Simulation runs
For each link density presented in table 1, simulations were run separately for all four scenarios illustrated in figure 2. One hundred different networks were generated for each of the two link sampling processes, Dl and Rl, and each link density (figure 2). Specifically, for each density and link sampling procedure, 10 replicates of randomly distributed holdings were created, and, for each of these landscapes of holdings, 10 replicates of networks were made by using one of the two sampling processes (see §2.1.2). For each of these sampled networks, 10 simulations were performed per transmission process, Dt and Rt, by initiating the spread from a randomly chosen animal holding. In all, 1000 simulations were run per scenario and link density. The simulation period was set to 300 time steps, and the number of infected animal holdings was calculated for each time step.

2.3 Analysis
To compare the different scenarios and the predictive power determined by link density, we analysed the extent of the spread of disease as the mean number of infected holdings per time step, and also as the mean number of time steps elapsed until a specified proportion of holdings was infected (here 10%, 50% and 90%). To characterize the networks and to ascertain how a change in link density would affect the structure and function of the networks, we used the following network measures: degree assortativity, clustering coefficient and fragmentation index.

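The two summary statistics described above can be sketched in Python as follows. The function names are illustrative. How runs that never reach a given proportion were treated in the study is not stated, so returning `None` in that case is an assumption of this sketch.

```python
def time_to_proportion(counts, n, proportion):
    """First time step at which at least `proportion` of the n holdings are infected.

    counts[t] is the number of infected holdings at time step t.
    Returns None if the proportion is never reached (an assumed convention).
    """
    threshold = proportion * n
    for t, infected in enumerate(counts):
        if infected >= threshold:
            return t
    return None


def mean_infected_per_step(runs):
    """Mean number of infected holdings at each time step over runs of equal length."""
    return [sum(step) / len(step) for step in zip(*runs)]
```

Averaging `time_to_proportion` over the replicates of one scenario and link density then gives the kind of quantity compared between scenarios.
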
Degree assortativity (Newman, 2002) measures the extent to which nodes of equal or unequal degree are connected to each other. Values range from minus one to one. A value near one indicates that a larger proportion of holdings with equal degree are linked to each other. Assortativity near minus one corresponds to a network where holdings with a different degree have a higher probability of being connected. A value of zero implies that the connections between holdings are not dependent on node degree.

The clustering coefficient (Watts and Strogatz, 1998) for a holding is the number of links that exist between neighbours of that holding divided by the number of all possible links between the neighbours. Here, we used the average clustering coefficient for the whole network; this measure ranges between zero and one, where one indicates that the network is highly clustered.

The fragmentation index (Borgatti, 2003; Webb, 2005) measures to what extent a network is disconnected. This index ranges from zero to one; a low value indicates that the network is highly connected, and a high value means that the network is very fragmented.

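The three network measures can be computed with a short pure-Python sketch such as the one below. Several choices are assumptions, since the text does not specify them: nodes with fewer than two neighbours contribute zero to the average clustering coefficient, the fragmentation index is taken in its component-size form, 1 − Σ s_k(s_k − 1) / (n(n − 1)) over components of size s_k, and assortativity is computed as the Pearson correlation of the degrees at the two ends of each link. All function names are illustrative.

```python
def adjacency(n, links):
    """Build an undirected adjacency map from a collection of (i, j) links."""
    adj = {i: set() for i in range(n)}
    for i, j in links:
        adj[i].add(j)
        adj[j].add(i)
    return adj


def average_clustering(adj):
    """Mean over nodes of (links among neighbours) / (possible links among neighbours)."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # assumed convention: such nodes contribute zero
        nb = list(nbrs)
        closed = sum(1 for x in range(k) for y in range(x + 1, k) if nb[y] in adj[nb[x]])
        total += closed / (k * (k - 1) / 2)
    return total / len(adj)


def fragmentation_index(adj):
    """Proportion of node pairs that cannot reach each other (component-size form)."""
    n = len(adj)
    seen = set()
    connected_pairs = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:  # depth-first search over one component
            node = stack.pop()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        connected_pairs += size * (size - 1)
    return 1 - connected_pairs / (n * (n - 1))


def degree_assortativity(adj):
    """Pearson correlation of end-point degrees, counting each link in both directions."""
    xs, ys = [], []
    for i, nbrs in adj.items():
        for j in nbrs:
            xs.append(len(adj[i]))
            ys.append(len(adj[j]))
    if not xs:
        return float("nan")  # undefined without links
    mean = sum(xs) / len(xs)
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return cov / var if var > 0 else float("nan")  # undefined if all degrees are equal
```

As a usage example, a star network (one hub linked to every other node) is maximally disassortative, while a triangle with one isolated node has half of its node pairs disconnected.
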
3. RESULTS
The results show that, for the scenario with distance dependent link sampling and disease transmission (DlDt), a link density of around 0.04 gave the same number of infected animal holdings as networks with a larger proportion of connections (figure 3). Under the assumptions of our model, these findings suggest that such low proportions of links in the network were sufficient to examine the extent of the disease transmission. The scenario comprising random link sampling and distance dependent disease transmission (RlDt) required a higher link density until a limit was reached where additional links had no influence. For the scenarios involving random transmission (DlRt and RlRt), the number of infected animal holdings increased with increasing link density, and no limit was reached.

Since the spread of disease is stochastic, we also studied the variation between different realizations. We found variation between realizations in both cases, that is, in incomplete networks as well as in complete networks with a link density of 1.0. Hence, it is important to assess both the expected and the measured variation. Figure 4 shows the median values of the 1000 replicates of the simulations of the DlDt scenario plotted with the first and third quartiles on each side of the median curve. For a link density as low as 0.001 (figure 4a), the median was one for the whole time period, because only in some cases was the disease transmitted to other holdings. When link density increased to 0.01 (figure 4b), the difference between the first and third quartiles also increased. The difference was small at the beginning of the simulation time when few holdings were infected but increased over the time period. Moreover, when link density increased further, up to 0.02 (figure 4c) and 0.03 (figure 4d), the difference between the first and third quartiles decreased. For link densities of 0.04 and higher (figures 4e and 4f), the shape of the curves and the distances between them were almost the same, which implies that measured and expected stochastic processes generate equal variation between realizations. Of course, in the last part of the simulation time, when almost all holdings were infected, the variation between realizations decreased towards zero.

The time until a given proportion of the holdings was infected differed depending on the link sampling scenario and the disease transmission scenario (figure 5). The random disease transmission scenarios (DlRt and RlRt) required almost the same length of time to reach a given proportion of infected animal holdings. In addition, disease spread at a much faster rate in these scenarios than in the distance dependent disease transmission scenarios (DlDt and RlDt) (figures 3 and 5). Considering all the scenarios, the slowest transmission rate was found for RlDt, i.e. random link sampling and distance dependent transmission.

Figure 6 shows a comparison of the four scenarios with regard to the number of infected holdings at a given link density. At low link densities, all methods gave different results. When link density increased, the two distance dependent disease transmission scenarios (DlDt and RlDt) approached each other, as did the two random disease transmission scenarios (DlRt and RlRt). It can be seen that the higher the link density, the greater the similarity between the results for the different distance dependent disease transmission scenarios. As mentioned above, disease spread was much faster with random transmission than with distance dependent transmission.

The average assortativity for the networks depended on the link creation method that was used (figure 7a). Distance dependent link creation led to higher values of assortativity compared to random link creation. As expected, the networks produced by random linking had assortativity close to zero at all link densities.

The average clustering coefficient for all networks increased with increasing link density (figure 7b). The clustering coefficients for the networks generated by distance dependent link sampling were higher than the values for the networks made by random link creation. When link density was increased, the values for random link sampling approached those for distance dependent link sampling. The networks generated by random link sampling gave clustering coefficients that were equal to the link density in question. Of course, the clustering coefficient was one for all networks when the link density was one and all animal holdings were thus connected to each other.

For both link sampling scenarios, the fragmentation index for the networks was close to one when link density was 0.001 (table 2). When we increased link density to 0.01, the fragmentation index decreased dramatically. In both link sampling scenarios, the index reached zero when link density was 0.03 or higher.

4. DISCUSSION
Our aim was to study the effects of using a disease transmission network with missing links to predict the spread of disease. We investigated whether it is possible to predict anything about the size of such dissemination using only a proportion of all theoretically possible links. According to the results, a link density of 0.04 gave the same mean number of infected animal holdings as a higher link density when spread of disease was simulated in a scenario in which both the probability of identifying a link and the probability of disease transmission were distance dependent (the DlDt scenario). Also, the variation between different realizations of disease spread converged to the expected variation at link density 0.04. When considering distance dependent disease transmission and random link sampling (RlDt), as expected, the numbers of infected animal holdings reached the same level as in the DlDt scenario. However, most of the links were needed to attain that level, because more of the less probable (longer distance) links are included when using random sampling than when using distance dependent sampling. For random disease transmission (scenarios DlRt and RlRt), the number of infected holdings increased with increased link density, which implies that a much higher link density is required to reach relevant approximations of spread of disease. The discussion below addresses the implications of our results in relation to sampling procedures and the effects of using networks with missing links.

Studies using empirical data have shown that only a small fraction of all possible connections in a network actually occur (Webb, 2006; Eames et al., 2009). When sampling data, it is almost impossible to trace all connections between nodes, even if the number is small, and this often leads to incomplete data sets. Therefore, it is important to consider link density, or mean link degree, when modelling networks. If simulations in a DlDt network with a link density of 0.04 or higher are compared with simulations in a complete network, both will result in the same mean number of infected holdings and the same variation in that mean. This implies that a link density of 0.04 is sufficient and further sampling is unnecessary. Another important issue to consider when using empirically sampled networks is the time window for the sampling period. Using an “incorrect” time window can lead to missing links or unnecessary sampling. The period chosen has an impact on how complete the network will be: a longer time window can result in a more connected network compared to one that is based on a very short time window. The lengths of the time windows used in different studies have varied. For example, Kiss et al. (2006) chose a four-week time scale in their investigation of sheep movements in Great Britain, and Robinson and Christley (2007) used periods of 10 weeks to analyse animal transports. Such studies may indeed provide a very good description of a network during the actual time period under consideration, but that information may not be suitable for making predictions.

Combining our results with the time frame of a study can help emphasize the problem and
384
focus on number of links that are actually measured. By definition, a shorter period will
17
385
result in fewer measured links, but the question is whether such a sample can suffice to test
386
for spread of disease over a period that is longer than the one that is actually measured.
387
Obviously, in the DlDt scenario, a link density of 0.04 is a guarantee for correct
388
measurement of disease transmission during any given time period. However, at a link
389
density below 0.02, our results indicate that measuring disease dissemination will be
390
erroneous even during short periods. By comparison, a density of 0.03 may hold until a
391
period comprising 50–100 time steps is reached, i.e. when the 0.03 curve diverges from the
392
curves for higher densities (see figure 3) and there is a large overlap in variation between
393
realizations (figure 4). These conclusions are true only when considering a perfect link
394
sampling procedure such as in the DlDt scenario. On the other hand, if the sampling
procedure is less than perfect, even more links must be included. Our RlDt scenario
represents the opposite extreme, i.e. a completely random link sampling procedure that is
in no way related to the probability of contacts. In such a case, almost all links have to
be sampled, even to make estimations over short time periods. In real life, a link sampling
procedure falls somewhere in between these two extremes. Regardless of time period or
sampling procedure, it is not advisable to base an analysis on link densities below 0.02.
Of course, this conclusion applies to our setup: how we modelled the spread of disease,
what distance dependence we used (eq. 1), and the number and spatial configuration of the
holdings. Still, our results show that to achieve reliable measurements, it may be necessary
to include a higher link density than expected. Link density is a measure that depends
strongly on the node density; the more nodes, the lower the link density that suffices. Yet
link density is a relevant measure when determining how large a proportion of all links is
needed to get a fair estimate of disease spread. Mean degree, on the other hand, is a more
general measure, and our study shows that a mean degree of at least 10 links is required.
Our method includes periodic boundaries, which eliminate boundary effects, and hence our
results apply to larger, even infinite, sets of nodes, provided that the configuration of
nodes (animal holdings) is represented by our setup of 500 randomly distributed nodes.
Hence, a mean degree of 10 links should hold for larger sets of randomly distributed animal
holdings, although link density will consequently decrease. Furthermore, our methodology
can be applied to any specific system, with other spatial configurations, latency periods
etc., to assess the necessary level of link density. The required link density can be
achieved by making a single measurement over a sufficiently long period of time or by
repeated sampling over shorter periods.
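
The relation between link density and mean degree can be illustrated with a short sketch. It assumes the convention that appears to underlie the values quoted in this paper for 500 nodes (0.02 and 5, 0.03 and almost 7.5, 0.04 and 10): link density is the fraction of the N(N-1)/2 node pairs that are linked, and mean link degree is the number of links per node. The function names are illustrative and not taken from the study.

```python
def mean_degree_from_density(density: float, n_nodes: int) -> float:
    """Links per node when `density` is the fraction of all node pairs linked."""
    n_pairs = n_nodes * (n_nodes - 1) / 2
    return density * n_pairs / n_nodes


def density_from_mean_degree(mean_degree: float, n_nodes: int) -> float:
    """Inverse of mean_degree_from_density (assumed convention, see above)."""
    n_pairs = n_nodes * (n_nodes - 1) / 2
    return mean_degree * n_nodes / n_pairs


# Mean degree for the densities discussed in the text, with 500 holdings.
for d in (0.02, 0.03, 0.04):
    print(d, mean_degree_from_density(d, 500))

# Link density needed to keep a mean degree of 10 as the node set grows.
for n in (500, 5000, 50000):
    print(n, density_from_mean_degree(10, n))
```

Under this convention the density needed for a fixed mean degree falls roughly in proportion to 1/N, which is why mean degree is the more transferable measure.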

Empirical investigations of networks have shown that link densities are often very small,
i.e. merely a few parts per thousand or a few per cent of the total number of theoretical
connections in the networks. An example of this is a study by Ortiz-Pelaez et al. (2006),
which was conducted to analyse animal movements during the initial phase of an epidemic of
foot-and-mouth disease that occurred in Great Britain in 2001. The network in that
investigation had a mean link degree of 1.22, which corresponds to a link density as low as
about 0.0019. A low link density was also found in the Swedish animal transport network
(Nöremark et al., submitted). It is important to remember that the measured contacts in an
empirical network are simply a subset of the realizations of all possible contacts, which
means that the links in such a network are in fact a subset of the links that had been
realized at the time of data collection. There are probabilities for a huge number of
additional connections, but these are never realized during the chosen time period of a
network study. For instance, when a link density of 0.01 was used in our investigation, all
theoretical connections were possible but only 1% of them were realized, and those may
have differed between the replicates. When modelling virtual networks, it is also important
to consider the link density and mean degree. Kiss et al. (2005) performed epidemiological
modelling using virtual networks with mean degrees varying between 5 and 20. Their results
indicate that a mean degree of 15-20 was enough to estimate final epidemic size. Besides
the investigation by Kiss et al. (2005), few network studies have focused on link densities
and missing links. When comparing different theoretical studies, or theoretical results
with empirical investigations, one has to use relevant measures. In general, our study
indicates that a mean link degree of at least 10 links is required and that empirical
studies tend to have too few links in their estimates. The results of Kiss et al. (2005)
also support our findings.
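
The point that a sampled network is only one realization of the possible contacts can be illustrated with a minimal sketch. It assumes uniformly random link sampling (the "Rl" type of procedure) at a link density of 0.01 among 500 nodes; the helper names and the seed are hypothetical and chosen only for illustration.

```python
import itertools
import random


def sample_links(n_nodes, density, rng):
    """Draw one realization: a uniformly random subset of all node pairs."""
    pairs = list(itertools.combinations(range(n_nodes), 2))
    n_links = round(density * len(pairs))
    return set(rng.sample(pairs, n_links))


def shared_fraction(links_a, links_b):
    """Fraction of the links in realization A that also occur in realization B."""
    return len(links_a & links_b) / len(links_a)


rng = random.Random(1)
replicate_a = sample_links(500, 0.01, rng)
replicate_b = sample_links(500, 0.01, rng)
print(len(replicate_a), shared_fraction(replicate_a, replicate_b))
```

With random sampling, two replicates at the same density are expected to share only about as large a fraction of their links as the density itself, so each replicate is a largely different network.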

Diseases spread faster in a network with randomly distributed links than in clustered
networks (Kiss et al., 2005; Watts and Strogatz, 1998). We generated such random networks
in our study when we applied random transmission probabilities (RlRt and DlRt), and RlRt
represents the fully random scenario with rapid spread of disease. For a scenario such as
DlRt, with random transmission and distance dependent link sampling, the transmission rate
is slightly lower at any given link density. The link sampling procedure of DlRt
erroneously assumes distance dependent contacts, whereas the contact structure is in fact
random. In a case like this, the rate of spread in the real network, i.e. with random
transmission probabilities, is higher than in the sampled network, since the link sampling
procedure will miss some important long distance links. Hence, with this mismatch between
sampling and transmission probabilities, an even higher link density is necessary when
sampling to reach the correct level of spread of disease (compare RlRt and DlRt).
Lindström et al. (2009) have shown that the spatial kernel explaining the distance
dependence of contacts between holdings due to transport is a mix of distance independence,
i.e. mass action mixing, and distance dependence. In our setup, mass action mixing is
represented by Rt, and, once again, reality is somewhere in between two extremes, the DlRt
and the DlDt scenarios. Consequently, our results for the RlRt and DlRt scenarios imply
that the link density levels of 0.03 and 0.04 that were found can be expected to be too
low, since the mass action component in contact structures places even higher demands on
link density.
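
How distance dependent link sampling can miss long distance links may be sketched as follows. The exponential kernel exp(-d/scale) is an assumption standing in for the paper's eq. 1, which is not reproduced here; the number of nodes, the scale and the 0.4 cut-off for a "long" link are likewise illustrative. Weighted sampling without replacement is done with the Efraimidis-Spirakis key method.

```python
import itertools
import math
import random


def periodic_distance(p, q, side=1.0):
    """Euclidean distance on a square with periodic (wrap-around) boundaries."""
    dx = abs(p[0] - q[0])
    dy = abs(p[1] - q[1])
    dx = min(dx, side - dx)
    dy = min(dy, side - dy)
    return math.hypot(dx, dy)


def sample_link_lengths(points, n_links, rng, scale=None):
    """Sample n_links node pairs without replacement and return their lengths.

    scale=None gives uniformly random sampling; otherwise each pair has
    weight exp(-d / scale), an ASSUMED kernel, not the one used in the paper.
    """
    keyed = []
    for i, j in itertools.combinations(range(len(points)), 2):
        d = periodic_distance(points[i], points[j])
        w = 1.0 if scale is None else math.exp(-d / scale)
        u = 1.0 - rng.random()  # uniform in (0, 1]
        # log(u) / w is a monotone transform of the key u ** (1 / w)
        keyed.append((math.log(u) / w, d))
    keyed.sort(reverse=True)  # the largest keys form the weighted sample
    return [d for _, d in keyed[:n_links]]


rng = random.Random(42)
points = [(rng.random(), rng.random()) for _ in range(200)]
n_links = 400  # about 2% of the 19,900 possible pairs
for label, scale in (("random", None), ("distance dependent", 0.1)):
    lengths = sample_link_lengths(points, n_links, rng, scale)
    long_links = sum(d > 0.4 for d in lengths)
    print(label, sum(lengths) / n_links, long_links)
```

At the same link density, the distance weighted sample contains far fewer long links than the random one, which is the mechanism behind the slower spread in a DlRt type of scenario.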

It is recognized that random networks have a low level of clustering compared to other
kinds of networks, such as small-world networks (Shirley and Rushton, 2005; Watts and
Strogatz, 1998). We measured the clustering coefficient for each of our networks and, as
expected, found lower values for those generated by random sampling than for those
generated by distance dependent link sampling. The degree of fragmentation of a network
influenced the extent to which diseases could spread between the holdings. The
fragmentation index is a measure of the extent of disconnection of a network, and, in our
study, only the networks with link density below 0.03 were disconnected. Link densities of
0.03 or higher gave rise to connected graphs, indicating that it is possible for a disease
to spread between all animal holdings in these networks. Since a link density of 0.03
corresponds to a mean link degree of almost 7.5, the values of the fragmentation index
seem reasonable. It is plausible that a disconnection in a network would reduce the spread
of the disease immensely, and hence any disconnection that is apparent after a link
sampling procedure should be scrutinized. If a disconnection is the result of a specific
realization, and thus would not necessarily recur in other realizations (i.e. a new time
period), it will jeopardize any conclusions drawn from the study. This is evident from the
observed variation in our results in the rate of spread for different link densities
(figure 4), which emphasizes the difference between a network that represents one specific
time period with all its measures and a network that can be used to predict and estimate
the rate of spread for any given time.
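
A check for disconnection in a sampled network can be sketched as below. The index computed here is one common definition, the proportion of node pairs that cannot reach each other (in the spirit of Borgatti, 2003); the exact index behind Table 2 is defined earlier in the paper and may differ. Function names are illustrative.

```python
def component_sizes(n_nodes, links):
    """Sizes of the connected components, found with union-find."""
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    sizes = {}
    for node in range(n_nodes):
        root = find(node)
        sizes[root] = sizes.get(root, 0) + 1
    return sorted(sizes.values(), reverse=True)


def fragmentation(n_nodes, links):
    """Share of ordered node pairs with no connecting path (0 = connected graph)."""
    reachable = sum(s * (s - 1) for s in component_sizes(n_nodes, links))
    return 1.0 - reachable / (n_nodes * (n_nodes - 1))


# Four holdings split into two linked pairs: 8 of 12 ordered pairs are cut off.
print(component_sizes(4, [(0, 1), (2, 3)]), fragmentation(4, [(0, 1), (2, 3)]))
```

A value of 0 means that a disease could in principle reach every holding, whereas any positive value flags a disconnection that should be checked against other realizations.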

We were interested in determining how many animal holdings could become infected and the
rate of disease transmission, and thus incubation time was not included in our model. This
is a simplification, because diseases differ with respect to incubation time, which can
vary from only a few days to as long as a number of years. However, our model can easily be
extended to encompass a more complex disease context by including a recovery phase and
incubation time. We calculated the number of infected animal holdings as a measure of the
spread of disease. In practice, this might not be particularly relevant, because it is not
desirable to allow disease transmission to proceed for such a long time. Obviously, it
would be preferable to adopt control strategies as soon as possible after identifying an
infection. Nevertheless, the findings of our study do have implications regarding what
link density ought to be achieved when testing different strategies.
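
As an illustration of how an incubation time could be added, the following is a minimal sketch and not the model used in this study. It assumes a discrete-time spread without recovery and a single transmission probability for all links (the study used weighted links); the `latency` parameter and all names are hypothetical.

```python
import random


def simulate(n_nodes, links, p_transmit, latency=0, steps=50, seed_node=0, rng=None):
    """Discrete-time spread without recovery; returns infected count per step.

    latency is the number of time steps an infected holding waits before it
    can infect others (0 corresponds to a model without incubation time).
    """
    rng = rng or random.Random()
    neighbours = {i: [] for i in range(n_nodes)}
    for a, b in links:
        neighbours[a].append(b)
        neighbours[b].append(a)

    infected_at = {seed_node: 0}  # holding -> time step of infection
    counts = []
    for t in range(1, steps + 1):
        newly = []
        for node, t_inf in infected_at.items():
            if t - t_inf <= latency:
                continue  # still incubating, not yet infectious
            for nb in neighbours[node]:
                if nb not in infected_at and rng.random() < p_transmit:
                    newly.append(nb)
        for nb in newly:
            infected_at.setdefault(nb, t)
        counts.append(len(infected_at))
    return counts


chain = [(0, 1), (1, 2), (2, 3)]
print(simulate(4, chain, p_transmit=1.0, latency=0, steps=6))
print(simulate(4, chain, p_transmit=1.0, latency=1, steps=6))
```

In this sketch a longer latency only delays the spread along each link; it does not change which holdings can eventually be reached, which is determined by the connectivity of the sampled network.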

4.1 Conclusions
Our results indicate that to estimate network properties such as spread of disease, it
might be necessary to construct link sampling procedures that yield high link densities.
More specifically, our scenarios based on Swedish farms show that, if the sampling
procedure is ideal, a density of 0.02 (mean degree of 5) can suffice to estimate disease
transmission over shorter time periods, whereas 0.04 (mean degree of 10) is required for
longer periods. Nevertheless, in reality, link sampling procedures are not perfect, and
some mass-action mixing component can be expected in the contacts between holdings. Our
results demonstrate that these two components of reality demand an even higher link
density for the sampled network to represent a relevant measure of spread of disease.

ACKNOWLEDGEMENTS

We would like to thank the Swedish Civil Contingencies Agency (MSB) for funding this
project. We would also like to thank Patricia Ödman for revising the English.

REFERENCES

Barrat, A., Barthélemy, M., Pastor-Satorras, R., Vespignani, A., 2004. The architecture of
complex weighted networks. PNAS 101, 3747-3752. (doi:10.1073/pnas.0400087101)

Barthélemy, M., Barrat, A., Pastor-Satorras, R., Vespignani, A., 2005. Dynamic patterns of
epidemic outbreaks in complex heterogeneous networks. Journal of Theoretical Biology
235, 275-288. (doi:10.1016/j.jtbi.2005.01.011)

Borgatti, S., 2003. The Key Player Problem. In: Dynamic Social Network Modeling and
Analysis: Workshop Summary and Papers, R. Breiger, K. Carley, P. Pattison (Eds).
National Academy of Sciences Press.

Christley, R.M., Robinson, S.E., Lysons, R., French, N.P., 2005. Network analysis of cattle
movement in Great Britain. Proceedings of the Society for Veterinary Epidemiology and
Preventive Medicine (2005), 234-243.

Clauset, A., Moore, C., Newman, M.E.J., 2008. Hierarchical structure and the prediction of
missing links in networks. Nature 453, 98-101. (doi:10.1038/nature06830)

Corner, L.A.L., Pfeiffer, D.U., Morris, R.S., 2003. Social-network analysis of
Mycobacterium bovis transmission among captive brushtail possums (Trichosurus
vulpecula). Preventive Veterinary Medicine 59, 147-167.
(doi:10.1016/S0167-5877(03)00075-8)

Eames, K.T.D., Read, J.M., Edmunds, W.J., 2009. Epidemic prediction and control in
weighted networks. Epidemics 1, 70-76. (doi:10.1098/rspb.2003.2554)

Guimerà, R., Sales-Pardo, M., 2009. Missing and spurious interactions and the
reconstruction of complex networks. PNAS 106, 22073-22078.
(doi:10.1073/pnas.0908366106)

Heath, M.F., Vernon, M.C., Webb, C.R., 2008. Construction of networks with intrinsic
temporal structure from UK cattle movement data. BMC Veterinary Research 4:11.
(doi:10.1186/1746-6148-4-11)

Håkansson, N., Jonsson, A., Lennartsson, J., Lindström, T., Wennergren, U., 2010.
Generating structure specific networks. Advances in Complex Systems 13:2, 239-250.
(doi:10.1142/S0219525910002517)

Kao, R.R., Green, D.M., Johnson, J., Kiss, I.Z., 2007. Disease dynamics over very different
time-scales: foot-and-mouth disease and scrapie on the network of livestock movements in
the UK. J. R. Soc. Interface 4, 907-916. (doi:10.1098/rsif.2007.1129)

Keeling, M., 2005. The implication of network structure for epidemic dynamics. Theoretical
Population Biology 67, 1-8. (doi:10.1016/j.tpb.2004.08.002)

Kiss, I.Z., Green, D.M., Kao, R.R., 2005. Disease contact tracing in random and clustered
networks. Proc. R. Soc. B 272, 1407-1414. (doi:10.1098/rspb.2005.3092)

Kiss, I.Z., Green, D.M., Kao, R.R., 2006. The network of sheep movements within Great
Britain: network properties and their implications for infectious disease spread. J. R. Soc.
Interface 3, 669-677. (doi:10.1098/rsif.2006.0129)

Lindström, T., Håkansson, N., Westerberg, L., Wennergren, U., 2008. Splitting the tail of
the displacement kernel shows the unimportance of kurtosis. Ecology 89, 1784-1790.
(doi:10.1890/07-1363.1)

Lindström, T., Sisson, S.A., Nöremark, M., Jonsson, A., Wennergren, U., 2009.
Estimation of distance related probability of animal movements between holdings and
implications for disease spread modeling. Preventive Veterinary Medicine 91, 85-94.
(doi:10.1016/j.prevetmed.2009.05.022)

Nöremark, M., Håkansson, N., Sternberg Lewerin, S., Lindberg, A., Jonsson, A.
Network analysis of cattle and pig movements in Sweden: measures relevant for disease
control and risk based surveillance. Submitted to Preventive Veterinary Medicine.

Newman, M.E.J., Strogatz, S.H., Watts, D.J., 2001. Random graphs with arbitrary
degree distributions and their applications. Phys. Rev. E 64, 026118.
(doi:10.1103/PhysRevE.64.026118)

Newman, M.E.J., 2002. Assortative mixing in networks. Phys. Rev. Lett. 89 (20).
(doi:10.1103/PhysRevLett.89.208701)

Ortiz-Pelaez, A., Pfeiffer, D.U., Soares-Magalhães, R.J., Guitian, F.J., 2006. Use of social
network analysis to characterize the pattern of animal movements in the initial phases of the
2001 foot and mouth disease (FMD) epidemic in the UK. Prev. Vet. Med. 76, 40-55.
(doi:10.1016/j.prevetmed.2006.04.007)

Perkins, S.E., Cagnacci, F., Straditto, A., Arnoldi, D., Hudson, P.J., 2009. Comparison of
social networks derived from ecological data: implications for inferring infectious disease
dynamics. Journal of Animal Ecology 78, 1015-1022.
(doi:10.1111/j.1365-2656.2009.01557.x)

Robinson, S.E., Christley, R.M., 2007. Exploring the role of auction markets in cattle
movements within Great Britain. Preventive Veterinary Medicine 81, 21-37.
(doi:10.1016/j.prevetmed.2007.04.011)

Shirley, M.D.F., Rushton, S.P., 2005. The impacts of network topology on disease spread.
Ecological Complexity 2, 287-299. (doi:10.1016/j.ecocom.2005.04.005)

Vernon, M.C., Keeling, M.J., 2009. Representing the UK's cattle herd as static and
dynamic networks. Proc. R. Soc. B 276, 469-476. (doi:10.1098/rspb.2008.1009)

Wasserman, S., Faust, K., 1994. Social Network Analysis: Methods and Applications.
Cambridge University Press, Cambridge.

Watts, D.J., Strogatz, S.H., 1998. Collective dynamics of ‘small-world’ networks. Nature
393, 440-442. (doi:10.1038/30918)

Webb, C.R., 2005. Farm animal networks: unraveling the contact structure of the British
sheep population. Preventive Veterinary Medicine 68, 3-17.
(doi:10.1016/j.prevetmed.2005.01.003)

Webb, C.R., 2006. Investigating the potential spread of infectious diseases of sheep via
agricultural shows in Great Britain. Epidemiology and Infection 134, 31-40.
(doi:10.1017/S095026880500467X)

TABLE CAPTIONS

Table 1. Link densities used in simulations and the corresponding mean link degree for the
networks.

Table 2. Fragmentation index according to link density and the link sampling method used.

FIGURE CAPTIONS

Figure 1. Network categories: (a) complete network, (b) real-world network, (c) sampled network.

Figure 2. Model flow chart. Flow chart showing relationships between the different components of
the model.

Figure 3. Mean number of infected holdings per time step in the four linking and disease
transmission scenarios. Disease transmission was distance dependent in scenarios DlDt (a) and RlDt
(b) but random in DlRt (c) and RlRt (d). Also, distance dependent link creation was applied in DlDt
(a) and DlRt (c), whereas links were generated randomly in RlDt (b) and RlRt (d). The link densities
were as follows: 0.001 (---), 0.005 (…), 0.01 (--.--), 0.02 (__), 0.03 (-○-), 0.04 (-*-), 0.05 (-□-), 0.1
(-♦-), 0.25 (-◦-), 0.5 (-▼-), 0.75 (-x-) and 1.0 (-+-). Corresponding mean link degrees can be found
in table 1.

Figure 4. The median values of the 1000 replicates of the simulations of the DlDt scenario plotted
with the first and third quartiles on each side of the median curve. The solid line shows the median
number of infected holdings per time step and the dashed lines represent the first and third quartiles
of the replicates. Link densities: (a) 0.001, (b) 0.01, (c) 0.02, (d) 0.03, (e) 0.04, (f) 1.0. Note that the
scales of the y-axes differ in (a) and (b). Corresponding mean link degrees can be found in table 1.

Figure 5. Number of time steps passed before 10% (a), 50% (b) and 90% (c) of all holdings in the
network were infected. The time depended on which of the four scenarios was used. The scenarios
are designated as follows: dashed line, DlDt; dotted line, RlDt; solid line, DlRt; dash-dot line, RlRt.
For scenario RlDt, the number of infected holdings did not reach any of the given proportions during
the simulation time.

Figure 6. Mean number of infected holdings per time step for a given link density and the four
scenarios, designated as follows: dashed line, DlDt; dotted line, RlDt; solid line, DlRt; dash-dot
line, RlRt. Link densities: (a) 0.001, (b) 0.01, (c) 0.03, (d) 0.05, (e) 0.07, (f) 0.1, (g) 0.5, (h) 1.0.
Note that the scale of the y-axis differs in (a). Corresponding mean link degrees can be found in
table 1.

Figure 7. Average assortativity (a) and clustering coefficient (b) illustrated for the networks
according to the connections of the holdings. Distance dependent linking is indicated by a dashed
line and random linking by a solid line.