Draft 091216

Contact density and the spread of infectious diseases in a landscape of animal holdings: How important is it to know all realizable links?

Jenny Lennartsson1, Annie Jonsson1, Nina Håkansson1 and Uno Wennergren2,*
1 Systems Biology Research Centre, Skövde University, Box 408, 541 28, Skövde, Sweden
2 IFM Theory and Modeling, Linköping University, 581 83 Linköping, Sweden
*Corresponding author

ABSTRACT

The lack of complete data sets is often a limitation when using network modeling. In this paper, we analyze how connected a network has to be for useful conclusions about the extent of a possible epidemic to be drawn from it. Virtual risk networks with different link densities are created and diseases are simulated to see how the number of infected animal holdings depends on the link density. We conclude that, when distance dependence is assumed for both link creation and disease transmission, predictions about the extent of an epidemic can be drawn from a network even with low link density.

Keywords: network, missing links, link density, infectious diseases, disease transmission

1. INTRODUCTION

The use of, and interest in, network analysis has grown in many scientific areas during the last decade, for example in biology, epidemiology, economics and social science. A network consists of interacting units, denoted nodes, and these units connect to each other through relations termed links. Nodes could for example be animals, animal holdings, habitats, persons or schools, and the links could be personal visits, animal transports or links between web pages. The links between the nodes give rise to networks with different contact structures, and these structures depend on the number of nodes and links and on how these are organized. Since estimating such structures can be cumbersome, one may expect that the estimated network will most probably lack some links and even some of the nodes (Clauset et al. 2008).
Hence, there is a need to evaluate the effect of missing links in order to reduce errors when networks are applied. Since data sampling can be costly, unnecessary sampling, including sampling of links that occur too seldom, should also be avoided. The focus of this study is on the number of links in the networks and its effect on properties such as the spread of disease and on network measures. We generate scenarios that mimic sampling procedures. The number of links in a network determines its connectedness. Three network categories can be defined according to how connected the networks are: firstly, the complete network (Wasserman & Faust 1994) (figure 1a), where all theoretically possible links are included; secondly, the real world network (figure 1b), where all realizations of links during a specified time period are included; and thirdly, the sampled network (figure 1c), where all the estimated links, given the same time period, are included. The sampled network can be estimated through sample surveys, literature studies, contact tracing or databases such as national databases for animal movement. The real world network is the network one would like to consider, but the sampled or complete network is what one has to represent it with. The real network is a single event occurring during a specific time period. Another event, even one of the same length, will most probably result in another set of links. The question then arises whether the properties of the two will differ. Can a property, such as the spread of disease, of the first sampled network serve as an approximation of that property in the second event? The same question applies to the two real world networks: will a property of the first real network be valid as an approximation of the same property of the second one?
It is obvious that too short a time frame will result in a bad approximation, and also that a very long time frame, giving an almost complete network with specified probabilities on all its links, is a perfect approximation. Somewhere in between there is an approximation that is sufficient yet not too time consuming to achieve. In contrast to classical models such as SI/SIR/SEIR epidemic models, network models relax the assumption of homogeneous mixing (mass-action type of assumptions). Network analysis, on the other hand, requires handling of huge amounts of data, which is possible with the computational power available today. In veterinary medicine, network analysis with explicit contact structures is an increasingly applied tool (Barthélemy et al. 2005; Ortiz-Pelaez et al. 2006). The potential use of network analysis and modeling in epidemiology is to predict the size and spread of epidemics and to examine the effects of different intervention methods such as vaccination, standstill and stamping out. For example, Corner et al. (2003) studied the transmission of Mycobacterium bovis in a network of wild brushtail possums and the social contacts between them. In another study, Kiss et al. (2006) analyzed networks of sheep movements within Great Britain. They showed that during an epidemic it is most efficient to concentrate control interventions on highly connected nodes. Despite the increased use of networks in epidemiology, there are shortcomings in the analysis of missing links as well as in how to represent a structure given a single sample. How connected measured networks are varies greatly, depending for example on the context in which they are sampled, the sampling method used and the time window of the sampling period. Unfortunately, it is not unusual that collected network data are incomplete (Christley et al. 2005; Ortiz-Pelaez et al. 2006; Clauset et al. 2008; Heath et al. 2008; Eames et al. 2009; Guimerà & Sales-Pardo 2009).
The gaps could for example be missing animal movements or unknown locations of herds in databases. Depending on the structure of the network, properties such as epidemic development can vary (Newman et al. 2001; Keeling 2005; Shirley & Rushton 2005; Kiss et al. 2006). Since disease transmission depends on the network's structure, results based on networks with missing links may be misleading. Perkins et al. (2009) demonstrated that network structures are only approximations of contacts and that it is almost impossible to identify all contacts when collecting data. In practice, this means that there will be problems with missing data, resulting in lost links in the representation of a network. These lost links are the result of errors during the sampling period or a consequence of the finite length of the sampling period. Guimerà and Sales-Pardo (2009) introduce a method that uses a single measurement of a network, a sampled network, to generate a more correct representation, i.e. an approximation of the real world network. Their method focuses on networks that are reduced because of errors during sampling. By measuring and classifying the structure of the sampled network, they could identify both missing and spurious links. In our study, the focus is more general and concerns the relation between link density and estimates of properties such as the spread of disease and specific network measures. This can also be viewed as a study of the consequences of too few or too many links in a network representation. During a survey to obtain a sampled network, it is important to consider the time window of the sampling period. For example, Kao et al. (2007) studied the relation between the UK livestock movement network and disease dynamics over different time-scales. They simulated transmission of two diseases, foot-and-mouth disease and scrapie, which have very different time-scales regarding incubation time as well as infectious period.
They concluded that for network analysis to be a valuable tool in epidemiological modeling, it is important to consider the time-scale as well as the potentially infectious contacts. In another study, Robinson et al. (2007) investigated animal movement networks evolving over time in Great Britain, and their findings point out the importance of temporal scale. With increasing time period, the networks became more and more connected and in that way fueled the disease transmission. They also found a seasonal pattern with peaks in spring and in August. Thus, depending on the question to be examined, or when comparing different networks, it is important to choose the appropriate temporal scale (Vernon & Keeling 2009). Otherwise, there could be too few or too many links involved in the analyses. A more probabilistic representation of networks has weighted links (ref???). With such a representation, one may estimate the network and link weights over longer (or shorter) time periods yet apply the network over time spans other than the measured one. Still, too few links will result in an underestimation of the probabilities while too many will result in an overestimation. In this study we tested how high a link density is necessary to achieve a network with correct properties. In addition, we asked whether this depends on the sampling procedure.

2. METHODS

2.1 The model

2.1.1 Landscape of animal holdings

The number of animal holdings was arbitrarily set to 500 and these were randomly placed in a landscape of size 34 x 34 (figure 2). The holding density was chosen to match a realistic farm density in southern Sweden. Each animal holding was considered as a node, which implies that individual animals were not modeled.

2.1.2 Virtual sampled networks

Animal holdings were connected to each other to generate virtually sampled networks.
Since we investigated how the extent of a possible epidemic depended on the number of links in the networks, we used the measure link density to control for that. Link density is the number of actual connections in the network as a proportion of all theoretically possible links in the network (Wasserman & Faust 1994). Link density was varied from 0.001 to 1.0. A link density of 1.0 means a complete network (figure 1a) where all theoretical connections are included, and the number of links, C_n, is given by eq. 1:

$$C_n = \frac{n(n-1)}{2} \tag{1}$$

where n is the number of animal holdings in the network. Because the link density of the networks was set when generating the networks, the mean link degree was also given from the start. Table 1 shows which mean degree each link density corresponds to.

Holdings were connected to each other in two different linking scenarios, either according to the distances between the holdings or completely at random (figure 2). To simplify the model we assumed that distance strongly affected both the realizable links and the probability of disease spread. Others (Keeling 2005; Le Menach et al. 2005) have also assumed distance dependent connection. This assumption implies more links between adjacent animal holdings than between holdings far from each other. With distance dependent link creation, links were randomly drawn from a frequency distribution where the probability of a link between two animal holdings depended on the Euclidean distance between them. Since stochasticity was included in this method, links could also exist between holdings that were more distant from each other, even if a low link density was used. The probability, P(l_{ij}), that a link exists between holdings i and j is given by the exponential distribution in eq. 2 (Håkansson et al. 2009; Lindström et al. 2008):

$$P(l_{ij}) = K e^{-(d_{ij}/a)^{b}} \tag{2}$$

where d_{ij} is the Euclidean distance between holdings i and j. To avoid edge effects, periodic boundaries were used (Lindström et al. 2008).
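As an illustration of eq. 1, eq. 2 and the periodic boundaries, the following minimal Python sketch draws a fixed number of distinct links for a given link density. It is not the authors' MATLAB implementation. The function names, the placeholder values of a and b, and the use of the Gumbel top-k trick for weighted sampling without replacement are all assumptions of this sketch. The constant K is not needed here because only the relative weights matter.

```python
import math
import random
from itertools import combinations


def possible_links(n):
    """Number of links in a complete undirected network (eq. 1)."""
    return n * (n - 1) // 2


def periodic_distance(p, q, size):
    """Euclidean distance on a size x size landscape with periodic boundaries."""
    dx = abs(p[0] - q[0])
    dy = abs(p[1] - q[1])
    dx = min(dx, size - dx)  # wrap around the edge if that is shorter
    dy = min(dy, size - dy)
    return math.hypot(dx, dy)


def log_kernel(d, a, b):
    """Logarithm of the unnormalized kernel in eq. 2, i.e. -(d/a)**b."""
    return -((d / a) ** b)


def sample_links(points, density, size, a=None, b=None, rng=random):
    """Draw round(density * C_n) distinct undirected links.

    With a and b given, pairs are drawn without replacement with weights
    proportional to eq. 2 (distance dependent linking). With a = b = None
    every pair is equally likely (random linking).
    """
    n = len(points)
    n_links = round(density * possible_links(n))
    keyed = []
    for i, j in combinations(range(n), 2):
        log_w = 0.0
        if a is not None and b is not None:
            d = periodic_distance(points[i], points[j], size)
            log_w = log_kernel(d, a, b)
        # Gumbel noise is -log(E) with E ~ Exp(1); guard against E == 0.
        gumbel = -math.log(max(rng.expovariate(1.0), 1e-300))
        keyed.append((log_w + gumbel, (i, j)))
    keyed.sort(reverse=True)  # the largest keys form the weighted sample
    return {pair for _, pair in keyed[:n_links]}


if __name__ == "__main__":
    rng = random.Random(1)
    size, n = 34.0, 500
    holdings = [(rng.uniform(0, size), rng.uniform(0, size)) for _ in range(n)]
    # a = 1.0 and b = 1.0 are placeholders, not values derived from the paper.
    links = sample_links(holdings, 0.01, size, a=1.0, b=1.0, rng=rng)
    print(possible_links(n), len(links))
```

For n = 500, eq. 1 gives 124750 possible links, so a link density of 0.03 corresponds to roughly 3740 links, or about 7.5 links per holding, with each link shared between two holdings.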
Parameters a and b are determined by the input parameters kurtosis, κ, and standard deviation, σ. Here, a kurtosis value of 10/3 and a standard deviation of one were used. The constant K normalizes the distribution so that the probabilities of all possible links sum to one. With the random linking scenario, all animal holdings have the same probability of connecting to each other, independent of the distance between them.

The two linking scenarios can be related to two different methods of data collection. The distance dependent linking scenario can mimic data collection where connections between nearby nodes are sampled first, and connections between distant holdings are found only once these shorter connections have been collected, or in rare cases. In contrast, the random linking scenario reflects a method of data sampling where connections between nodes are found completely at random and no systematic way of finding the connections is used.

2.1.3 Risk networks and disease transmission

To simulate disease transmission in the networks, we used a simple model in which the holdings could be in one of two phases, susceptible or infectious. The model did not incorporate incubation time, so animal holdings that had contact with an infected holding could infect other holdings already in the next time step. Since a recovery phase was not included in the model, an infected holding could never return to the susceptible phase. That is, an infected holding remained in the infectious phase for the rest of the simulation. Undirected links were used, which means that diseases could be transmitted in both directions along the links. Two different scenarios of disease transmission were tested, distance dependent and random transmission (figure 2). These two scenarios were combined with the two linking scenarios into four combined scenarios: DlDt, DlRt, RlDt and RlRt (figure 2).
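The susceptible-infectious update described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' MATLAB code. The function name simulate_si, its arguments and the synchronous update order are assumptions of this sketch. The per-link transmission probability is passed in as a function, so it can be constant (random transmission) or depend on distance (distance dependent transmission).

```python
import random


def simulate_si(links, transmit_prob, initial, steps, rng=random):
    """Susceptible-infectious spread on an undirected network.

    links          iterable of (i, j) pairs of holding indices
    transmit_prob  function (i, j) -> per-step transmission probability
    initial        holdings that are infectious at time zero
    steps          number of time steps to simulate
    Returns the number of infected holdings after each time step.
    """
    links = list(links)
    infected = set(initial)
    counts = []
    for _ in range(steps):
        newly = set()
        for i, j in links:
            # Undirected link: either end can infect the other, but only
            # if exactly one of the two holdings is currently infectious.
            if (i in infected) != (j in infected):
                if rng.random() < transmit_prob(i, j):
                    newly.add(j if i in infected else i)
        # No incubation and no recovery: newly infected holdings become
        # infectious from the next time step and stay so.
        infected |= newly
        counts.append(len(infected))
    return counts


if __name__ == "__main__":
    chain = [(0, 1), (1, 2), (2, 3)]
    # Constant probability per link and time step (hypothetical value).
    print(simulate_si(chain, lambda i, j: 0.5, [0], 10, random.Random(1)))
```

Because new infections are collected first and added at the end of each step, a holding infected in one time step can pass the disease on in the next step but not within the same step.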
The RlRt scenario is an example of a mass-action mixing model (Keeling 2005), which assumes that all links have the same probability of transmitting the disease. The DlDt scenario, with different linking and transmission probabilities for each link, is the opposite of the RlRt scenario. The remaining two scenarios, DlRt and RlDt, are combinations of the two previously mentioned scenarios, DlDt and RlRt. When simulating distance dependent transmission we arbitrarily used the same probability for transmission as given by the exponential probability distribution function in equation 2. The probability of disease transmission was therefore higher when the distance between the holdings was shorter. This high probability could also be interpreted as more contacts between animal holdings close to each other, although only one link could exist between any pair of holdings. When random transmission was simulated, the probability of disease transmission was the same regardless of the distance between the animal holdings. This probability was here arbitrarily set to 0.01. The two different kinds of transmission probabilities could be thought of as two different diseases. Whether or not the disease was transmitted along a link was then randomly determined. As before, disease transmission could only occur between animal holdings that were connected by a link.

2.2 Simulation runs

Since stochasticity was included in the model, replicates were needed. Therefore, 10 different spatial holding patterns were generated and, for each of them, links were added in 10 different ways. In total, 100 different networks were generated. For each simulation run, 10 randomly picked animal holdings, one at a time, were initially infected. Simulations were run for 300 time steps. The number of infected animal holdings was calculated at each time step. Simulations were run in MATLAB (version R2009a).

2.3 Analysis

To characterize the networks and to see how a change in link density affects the structure and function of the networks, we used network measures. Three network measures were calculated: degree assortativity, clustering coefficient and fragmentation index.

Degree assortativity (Newman 2002) measures the extent to which connected holdings have similar degree. Values range from minus one to one. A value near one indicates that holdings with equal degree are often linked to each other. Assortativity near minus one means that holdings with different degree are often connected. A value of zero means that the connections between holdings are random with respect to degree.

The clustering coefficient (Watts & Strogatz 1998) for a holding is the number of links that exist between the neighbors of the holding, divided by the number of links that could possibly exist between those neighbors. Here we have used the average clustering coefficient for the whole network. This measure ranges between zero and one, where one indicates that the network is highly clustered.

The fragmentation index (Borgatti 2003; Webb 2005) measures to what extent the network is disconnected, and it ranges from zero to one. A low value indicates that the network is highly connected and a high value means that the network is very fragmented. The network measures were implemented and calculated in MATLAB (version R2009a).

3. RESULTS

The results show that, for scenario DlDt, a link density of around 0.04 gives the same number of infected animal holdings as a network with a higher proportion of connections does (figure 3). Under the assumptions of our model, the results indicate that a low proportion of links in the network could be enough to say something about the extent of the disease transmission. For scenario RlDt, with distance dependent transmission in a randomly linked network, a higher link density is required to reach the same transmission rate as with scenario DlDt.
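Two of the network measures defined in Section 2.3 can be sketched in Python as follows. This is a minimal illustration, not the authors' MATLAB implementation. It assumes that the fragmentation index is the component-based measure, one minus the share of holding pairs that can reach each other, and that holdings with fewer than two neighbors contribute zero to the average clustering coefficient. The function names are hypothetical, and degree assortativity is left out for brevity.

```python
def neighbours(n, links):
    """Adjacency sets for an undirected network with holdings 0..n-1."""
    adj = [set() for _ in range(n)]
    for i, j in links:
        adj[i].add(j)
        adj[j].add(i)
    return adj


def average_clustering(n, links):
    """Mean over holdings of realized / possible links among neighbours."""
    adj = neighbours(n, links)
    total = 0.0
    for v in range(n):
        k = len(adj[v])
        if k < 2:
            continue  # assumed convention: such holdings contribute zero
        closed = sum(1 for u in adj[v] for w in adj[v]
                     if u < w and w in adj[u])
        total += closed / (k * (k - 1) / 2)
    return total / n


def fragmentation_index(n, links):
    """1 - sum_k s_k (s_k - 1) / (n (n - 1)), with s_k the component sizes.

    Assumes n >= 2.
    """
    adj = neighbours(n, links)
    seen = set()
    reachable_pairs = 0
    for start in range(n):
        if start in seen:
            continue
        # Depth-first search to find the size of this component.
        stack, size = [start], 0
        seen.add(start)
        while stack:
            v = stack.pop()
            size += 1
            for u in adj[v]:
                if u not in seen:
                    seen.add(u)
                    stack.append(u)
        reachable_pairs += size * (size - 1)
    return 1.0 - reachable_pairs / (n * (n - 1))


if __name__ == "__main__":
    triangle = [(0, 1), (1, 2), (0, 2)]
    print(average_clustering(3, triangle), fragmentation_index(3, triangle))
```

With these definitions a complete network has a clustering coefficient of one and a fragmentation index of zero, while a network without any links has a fragmentation index of one.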
For the scenarios with random transmission (DlRt and RlRt) the number of infected animal holdings increases with increased link density, and no limit was reached until a link density of 1.0 was used. The time until a given proportion of the holdings were infected differs depending on the linking scenario as well as on the disease transmission scenario (figure 4). The random disease transmission scenarios (DlRt and RlRt) require almost the same time to reach a given proportion of infected animal holdings. In addition, they show a much faster disease spread than the distance dependent disease transmission scenarios (DlDt and RlDt) (figures 3 and 4). Scenario RlDt, which is based on random link creation and distance dependent transmission, ends up with the slowest transmission rate of the scenarios compared. For this scenario (RlDt), the given proportions of holdings are infected only at high link densities (figure 4). For lower link densities, the number of infected holdings did not reach the given proportions during the simulation time.

The number of infected holdings at a given link density is compared between the four scenarios (figure 5). At low link densities, all scenarios gave different results. When link density increases, the two distance dependent disease transmission scenarios (DlDt and RlDt) approach each other, as do the two random disease transmission scenarios (DlRt and RlRt). The higher the link density, the more similar are the results of the two distance dependent disease transmission scenarios. As mentioned before, random transmission gives a much faster disease spread than distance dependent transmission.

The average assortativity for the networks depends on the link creation method that is used (figure 6a). Distance dependent link creation ends up with higher values of assortativity compared to random link creation. The networks made by random linking have, as expected, assortativity around zero for all link densities.

The average clustering coefficient for all networks increases with increasing link density (figure 6b). The clustering coefficients for the networks generated by distance dependent link creation are higher than the values for the networks made by random link creation. When link density increases, the random linking method approaches the distance dependent linking method. The networks generated by the random linking scenario give clustering coefficients that are equal to the link density in question. Naturally, the clustering coefficients for all networks are one when the link density is one and all animal holdings are connected to each other.

The fragmentation index for the networks shows that for both linking scenarios the index is close to one when link density is 0.001 (table 2). When link density increases to 0.01, the fragmentation index decreases dramatically. With both linking scenarios, the index has reached zero when link density is 0.03 or higher.

4. DISCUSSION

Our aim with this study was to investigate how wrong the results could be when using a network with too many or too few connections. We investigated whether it was possible to predict anything about the extent of a disease transmission with only some proportion of all theoretically realizable links. Our results showed that a link density of 0.04 gave the same number of infected animal holdings as a higher link density did. This result is obtained when the probability of link creation as well as of disease transmission follows the distance dependent scenario (DlDt). For distance dependent disease transmission in a randomly linked network (RlDt), we got the same transmission rate as when the links are distance dependent. However, with random linking many more links are needed to reach this rate. That is because with random linking a higher number of longer links are included than with distance dependent linking.
These long links have low transmission probabilities, which results in slower disease transmission. For random disease transmission (scenarios DlRt and RlRt), the number of infected holdings increased with increased link density. Below we discuss implications of our results in relation to the effects of using networks with missing or overrated links.

4.1 Missing links

Empirical data have shown that only a small fraction of all connections in a network actually take place (Webb 2006; Eames et al. 2009). When sampling data it is almost impossible to trace all connections between nodes, and this often leads to incomplete data sets. Therefore, it is important to consider link density when working with network modeling. Assuming a complete network when modeling disease transmission could result in an overestimation of the extent of an epidemic. Simulations in a scenario DlDt network with a link density of 0.04 or higher and simulations in a complete network would both result in the same number of infected holdings. Another important issue to consider when using empirical networks is the time window of the sampling period. Using the "wrong" time window can lead to missing links or too many links, both of which affect the link density and the spread of diseases in the networks. The length of the time window affects how complete the network will be; a longer time window could result in a more connected network than one based on a very short time window. Different studies use time windows of different lengths. For example, Kiss et al. (2006) used a 4-week time scale in their study of sheep movements in Great Britain. In another study of animal transports, Robinson and Christley (2007) used periods of 10 weeks. Measured link densities in empirically investigated networks are often only a few per mille or a few percent of the total number of theoretical connections in the networks. An example is Ortiz-Pelaez et al.
(2006) who have studied animal movements during the initial phase of the epidemic of foot-and-mouth disease in Great Britain in 2001. Their network has an average link degree of 1.22, which corresponds to a link density as low as about 0.0019. In the Swedish animal transport network a low link density is also measured (ref). It is important to remember that the measured connections in an empirical network are only realizations of all possible contacts. That means that the links in these networks are the ones that have been realized during the time of data collection. There are in fact probabilities for a huge number of additional connections, but these have not been realized during the time period in question. In our study this is mimicked by network replicates. For example, when a link density of 0.01 is used, all theoretical connections are possible but only 1% of them are realized, and these could differ between the replicates. Other examples of empirically investigated networks are found in Newman (2003). When modeling virtual networks it is also important to consider the connection level. Kiss et al. (2005) used virtual networks with different mean degree in their epidemiological modeling. They varied the mean degree between 5 and 20 to see how it affects the final epidemic size. These values correspond to link density values from 0.005 to 0.02, that is, rather low densities. It would have been interesting if they had also used higher values, to compare with our results based on higher link densities. Despite the study of Kiss et al. (2005), network studies focusing on link densities and missing links are rare and more work has to be done in this field. Our results indicate that, when simulating in a network generated by scenario DlDt, the time scale is perhaps not as important for measuring the extent of an epidemic as previously thought.
That is because a low link density (though over 0.04), according to the results shown here, is as good as a high link proportion for predicting the number of animal holdings that will be infected if a disease enters the network.

That random networks spread diseases faster than spatially clustered networks is well known (Watts & Strogatz 1998; Kiss et al. 2005). This is obtained here with the random network that uses random transmission probabilities (RlRt). For the DlRt scenario, with random transmissibility in a distance dependent network, the transmission rate is slightly slower. The reason is probably that this kind of network contains more clusters than the random networks. Therefore, it takes longer for the disease to spread between the clusters than it takes to transmit between holdings that are randomly connected. That random networks have a low level of clustering compared to other kinds of networks (for example small-world networks) is well known (Watts & Strogatz 1998; Shirley & Rushton 2005). We measured the clustering coefficient of our networks and the result was the same as mentioned above: the clustering coefficient was lower in the random networks than in the networks generated by distance dependent linking.

How fragmented a network is influences how well diseases can spread between the holdings. The fragmentation index measures to what extent the networks are disconnected. Here, only link densities below 0.03 resulted in disconnected networks. Link densities of 0.03 or higher give rise to a connected graph, and it is then possible for a disease to spread between all animal holdings in the network. Considering that a link density of 0.03 corresponds to an average link degree of almost 7.5, the values of the fragmentation index are reasonable. Distance dependent link creation results in a slightly higher fragmentation index than random link creation.
Therefore, distance dependent connections give rise to more disconnected networks than random connections do.

Whether a high link density in a network is an advantage or a disadvantage depends on the scientific area in question. In disease transmission networks, high link densities are not preferred because more links in the network also imply more possible transmission routes for the disease. Nevertheless, in other contexts, for example information spread or a metacommunity of a species that is vulnerable to extinction, a high link density is desirable. In this study, we have used the number of infected animal holdings as a measure of the extent of a possible epidemic. In practice, this is perhaps not that relevant because it is not desirable to let the disease transmission go on for a long time. Instead, it is desirable that control strategies are adopted as soon as possible after an infection has been identified. We are interested in how many animal holdings are infected and in the rate of the disease transmission. That is why no incubation time is included in the model. This is a simplification, since the incubation time differs between diseases. Some diseases have an incubation time of only a few days, while for others it may be as long as a couple of years. Our results are valid for the parameter values (volume, variance and kurtosis) used in this analysis. If other parameter values are used the results might be different. Perhaps the boundary (obtained with scenario DlDt) above which the extent of the epidemic becomes the same irrespective of link density lies at another level for parameter values other than those used here.

4.2 Overrated links

As mentioned above, using the wrong time window can lead to a sampled network with too many links compared to the real world network that was meant to be sampled. Using networks that are static over time could also result in too high link densities.
For static networks, once the link structure is created it does not change during the simulation time. Different kinds of network representations are discussed and compared in Vernon and Keeling (2009). They concluded that static networks overestimate the effects of a disease transmission, and that this network representation should therefore be used with caution when modeling epidemics. Since the purpose of this study is to investigate the effect of an incomplete network rather than to model the course of an epidemic, static networks are sufficient here.

4.3 [New heading; title not yet decided, and it may become several sections…]

Whether connections between animal holdings are distance dependent or not can be discussed. We assume that this is the case, so that adjacent holdings have more contacts than holdings that are more distant. Whether transmission of diseases in a network is distance dependent or not can also be discussed, but here too we assume distance dependence. One empirical example where most of the transmissions of a disease occurred between animal holdings near each other is the epidemic of foot-and-mouth disease in Great Britain in 2001 (Ferguson et al. 2001). In this epidemic, only a few transmissions occurred over long distances. Kiss et al. (2005) mention this as an additional indication that connections are clustered, with a connection probability that decreases with distance. Our method for link creation, which is based on distance dependence, can be an example of that. Knowing what proportion of links in a network gives a good representation of an empirical network is important because a number of network measures can be expected to depend on the link density. This study can therefore be seen as a basis for judging whether network measures of empirical networks are useful and whether they can describe the system in a relevant way. What is relevant depends on the aim at hand.
In addition to the number of links missed or overrated, individual links might have different importance in the network system. One way to find and identify the most important links in a network is to use network measures. There are several network measures available, but most of them consider how important the nodes are and only some of them focus on the importance of the links. More investigations in this field would be valuable. In the future, the simple model used here can easily be extended to a more complex model by including a recovery phase. In relation to real diseases, incubation time should also be included. To investigate the impact of the node density, it would be interesting to change the node structure to a more aggregated one and then see what impact that has on the disease transmission.

Acknowledgements

References

Barthélemy, M., Barrat, A., Pastor-Satorras, R. and Vespignani, A., 2005. Dynamic patterns of epidemic outbreaks in complex heterogeneous networks. Journal of Theoretical Biology 235, 275-288. (doi:)
Bell, D.C., Atkinson, J.S. and Carlson, J.W., 1999. Centrality measures for disease transmission networks. Social Networks 21, 1-21.
Borgatti, S., 2003. The Key Player Problem. In: Dynamic Social Network Modeling and Analysis: Workshop Summary and Papers, R. Breiger, K. Carley, P. Pattison (Eds). National Academy of Sciences Press.
Christley, R.M., Robinson, S.E., Lysons, R. and French, N.P., 2005. Network analysis of cattle movement in Great Britain. Proceedings of the Society for Veterinary Epidemiology and Preventive Medicine (2005), 234-243.
Clauset, A., Moore, C. and Newman, M.E.J., 2008. Hierarchical structure and the prediction of missing links in networks. Nature 453, 98-101.
Eames, K.T.D. and Keeling, M.J., 2003. Contact tracing and disease control. Proc. R. Soc. B 270, 2565-2571.
Eames, K.T.D., Read, J.M. and Edmunds, W.J., 2009. Epidemic prediction and control in weighted networks. Epidemics 1, 70-76.
Ferguson, N.M., Donnelly, C.A. and Anderson, R.M., 2001. Transmission intensity and impact of control policies on the foot and mouth epidemic in Great Britain. Nature 413, 542-548.
Guimerà, R. and Sales-Pardo, M., 2009. Missing and spurious interactions and the reconstruction of complex networks. PNAS ? (doi: 10.1073/pnas.0908366106)
Heath, M.F., Vernon, M.C. and Webb, C.R., 2008. Construction of networks with intrinsic temporal structure from UK cattle movement data. BMC Veterinary Research 4:11.
Håkansson, N., Jonsson, A., Lennartsson, J., Lindström, T. and Wennergren, U. Generating structure specific networks. Submitted to Advances in Complex Systems. [Or how should this be written?]
Kao, R.R., Green, D.M., Johnson, J. and Kiss, I.Z., 2007. Disease dynamics over very different timescales: foot-and-mouth disease and scrapie on the network of livestock movements in the UK. J. R. Soc. Interface 4, 907-916.
Keeling, M., 2005. The implications of network structure for epidemic dynamics. Theoretical Population Biology 67, 1-8.
Kiss, I.Z., Green, D.M. and Kao, R.R., 2005. Disease contact tracing in random and clustered networks. Proc. R. Soc. B 272, 1407-1414.
Kiss, I.Z., Green, D.M. and Kao, R.R., 2006. The network of sheep movements within Great Britain: network properties and their implications for infectious disease spread. J. R. Soc. Interface 3, 669-677.
Kiss, I.Z., Green, D.M. and Kao, R.R., 2008. The effect of network mixing patterns on epidemic dynamics and the efficacy of disease contact tracing. J. R. Soc. Interface 5, 791-799.
Le Menach, A., Legrand, J., Grais, R.F., Viboud, C., Valleron, A-J. and Flahault, A., 2005. Modeling spatial and temporal transmission of foot-and-mouth disease in France: identification of high-risk areas. Veterinary Research 36, 699-712. (doi:10.1051/vetres:2005025)
Lindström, T., Håkansson, N., Westerberg, L. and Wennergren, U., 2008. Splitting the tail of the displacement kernel shows the unimportance of kurtosis. Ecology 89, 1784-1790.
Newman, M.E.J., Strogatz, S.H. and Watts, D.J., 2001. Random graphs with arbitrary degree distributions and their applications. Phys. Rev. E 64, 026118.
Newman, M.E.J., 2002. Assortative mixing in networks. Phys. Rev. Lett. 89 (20). (doi:10.1103/PhysRevLett.89.208701)
Newman, M.E.J., 2003. The structure and function of complex networks. SIAM Rev. 45, 167-256.
Ortiz-Pelaez, A., Pfeiffer, D.U., Soares-Magalhães, R.J. and Guitian, F.J., 2006. Use of social network analysis to characterize the pattern of animal movements in the initial phases of the 2001 foot and mouth disease (FMD) epidemic in the UK. Prev. Vet. Med. 76, 40-55.
Perkins, S.E., Cagnacci, F., Stradiotto, A., Arnoldi, D. and Hudson, P.J., 2009. Comparison of social networks derived from ecological data: implications for inferring infectious disease dynamics. Journal of Animal Ecology 78, 1015-1022.
Robinson, S.E. and Christley, R.M., 2007. Exploring the role of auction markets in cattle movements within Great Britain. Preventive Veterinary Medicine 81, 21-37.
Shirley, M.D.F. and Rushton, S.P., 2005. The impacts of network topology on disease spread. Ecological Complexity 2, 287-299.
Vernon, M.C. and Keeling, M.J., 2009. Representing the UK's cattle herd as static and dynamic networks. Proc. R. Soc. B 276, 469-476.
Wasserman, S. and Faust, K., 1994. Social Network Analysis: Methods and Applications. Cambridge University Press, Cambridge.
Watts, D.J. and Strogatz, S.H., 1998. Collective dynamics of 'small-world' networks. Nature 393, 440-442.
Webb, C.R., 2005. Farm animal networks: unraveling the contact structure of the British sheep population. Preventive Veterinary Medicine 68, 3-17.
Webb, C.R., 2006. Investigating the potential spread of infectious diseases of sheep via agricultural shows in Great Britain. Epidemiology and Infection 134, 31-40.

Table captions

Table 1. Examples of link densities used in simulations and the corresponding mean link degree for the networks.
link density   mean degree (nr of links/node)
0.001          0.250
0.005          1.248
0.01           2.495
0.02           4.990
0.03           7.485
0.04           9.980
0.05           12.48
0.10           24.95
0.25           62.38
0.50           124.8
0.75           187.1
1.00           249.5

Table 2. Fragmentation index depending on link density and link creation method used.

link density   distance dependence   random
0.001          0.9983                0.9981
0.005          0.9065                0.1976
0.01           0.0385                0.0133
0.02           0.0002                0.0001
0.03           0.0000                0.0000

Figure captions

Figure 1. Network categories: a) complete network, b) real world network, c) sampled network.

[Figure 2 flow chart: number of animal holdings; random placement; distance dependent linking or random linking; distance dependent transmission or random transmission; resulting scenarios DlDt, DlRt, RlDt and RlRt.]

Figure 2. Flow chart showing the different parts of the model and how these relate to each other.

[Figure 3 panels: (a) Distance - Distance (DlDt), (b) Random - Distance (RlDt), (c) Distance - Random (DlRt), (d) Random - Random (RlRt). x-axis: Time (0 to 300); y-axis: Mean nr of infected holdings (0 to 500).]

Figure 3. Mean number of infected per time step depending on linking and disease transmission scenarios. Scenarios DlDt (a) and RlDt (b) have distance dependent disease transmission, while scenarios DlRt (c) and RlRt (d) have random transmission. In scenarios DlDt (a) and DlRt (c) distance dependent link creation is used. In scenarios RlDt (b) and RlRt (d) random link creation is used. Link densities used: 0.001 (---), 0.005 (…), 0.01 (--.--), 0.02 (__), 0.03 (-○-), 0.04 (-*-), 0.05 (-□-), 0.1 (-♦-), 0.25 (-◦-), 0.5 (-▼-), 0.75 (-x-) and 1.0 (-+-).
[Figure 4 panels a) to c). Legend: DlDt, RlDt, DlRt, RlRt. x-axis: Link Density (0 to 1); y-axis: Time (0 to 300).]

Figure 4. Number of time steps until (a) 10%, (b) 50% and (c) 90% of all holdings in the network are infected. The time depends on which of the four scenarios is used. For scenario RlDt the number of infected holdings did not reach any of the given proportions during the simulation time.

[Figure 5 panels a) to h). Legend: DlDt, RlDt, DlRt, RlRt. x-axis: Time (0 to 300); y-axis: Nr of inf holdings.]

Figure 5. Mean number of infected per time step for a given link density and the four scenarios. Here, eight link densities, one at a time, are used and compared. Link densities in the sub graphs: a = 0.001, b = 0.01, c = 0.03, d = 0.05, e = 0.07, f = 0.1, g = 0.5 and h = 1.0. Notice that the scales of the y-axes are not the same in all sub graphs.

[Figure 6 panels: (a) Assortativity, (b) Clustering coefficient. Legend: Distance dependent linking, Random linking. x-axis: Link Density (0 to 1).]

Figure 6. Average (a) assortativity and (b) clustering coefficient for the networks, depending on the way the holdings are connected to each other.

Short title for page headings: Contact density and spread of infectious diseases in networks