26th International Conference on Software Engineering and Knowledge Engineering (SEKE 2014), Hyatt Regency, Vancouver, Canada

Artificial neural networks for infectious diarrhea prediction using meteorological factors in Shanghai

Yongming Wang, Junzhong Gu and Zili Zhou
Department of Computer Science & Technology, East China Normal University
Institute of Computer Applications, Shanghai, China
http://www.ica.stc.sh.cn

OUTLINES
• Introduction
• Study area and dataset
• Prediction method and performance metrics
• Development of FFBPNN model
  – Input and output parameters
  – Data pre-processing and post-processing
  – Determination of optimum network and parameters
• Development of MLR model
• Experiment results and discussion
• Sensitivity analyses
• Conclusions

Introduction
Infectious diarrhea is a common and important infectious disease that poses a serious threat to human health, leading to one billion disease episodes and 1.8 million deaths each year (WHO, 2008). In Shanghai, China (the largest developing country), the incidence of infectious diarrhea shows significant seasonality throughout the year and has been particularly high in the summer and autumn of recent years. A robust short-term forecasting model for infectious diarrhea incidence is therefore needed to support decision-making in policy and public health.

Introduction
Infectious diseases are closely related to meteorological factors such as temperature and rainfall, which can affect them in a linear or nonlinear fashion. In recent years, there has been extensive scientific and public debate on climate change and its direct and indirect effects on human health. Many forecasting models for diarrheal diseases based on statistical methods have been reported in the literature.
Because many meteorological factors affect infectious diarrhea and the interrelations among them are very complicated, prediction models based on statistical methods may not be fully suitable for this type of problem.

Introduction
Artificial Neural Networks (ANNs) are now considered one of the intelligent tools for understanding complex problems and have been widely used in the medical and health fields. To the best of the authors' knowledge, no work has yet applied ANNs to predicting diarrheal disease.
Contribution: establish a new ANN model, a feed-forward back-propagation neural network (FFBPNN), to predict infectious diarrhea in Shanghai using a set of meteorological factors as predictors.

Study area and dataset - Study area
Shanghai is located in eastern China, the largest developing country in the world, and has a mild subtropical climate with four distinct seasons and abundant rainfall. It is the most populous city in China, comprising urban/suburban districts and counties, with a total area of 6,340.5 square kilometers and a population of more than 25.0 million by the end of 2013.
Study area and dataset - Dataset
[Figure: The infectious diarrhea cases for the period 2005.1.3-2009.1.4; x-axis: time (week), y-axis: weekly number of infectious diarrhea cases.]

Study area and dataset - Dataset
[Figure: The meteorological factors data for the period 2005.1.3-2009.1.4, nine panels (a)-(i) covering weekly average maximum temperature, weekly average minimum temperature, weekly average temperature, weekly average minimum relative humidity, weekly average relative humidity, weekly average atmospheric pressure, weekly average sunshine duration, weekly average wind speed and weekly average rainfall; x-axis: time (week), 2005-2010.]

Method and performance metrics
[Figure: The schematic flowchart of the proposed method. Step 1: Data collection (data gathering) → Step 2: Data pre-processing (data normalizing, data calculating) → Step 3: Data mining (models development, models testing and comparing) → Prediction model.]

Method and performance metrics
Three-layered feed-forward back-propagation artificial neural network model. For n inputs x_i and m hidden neurons, the network output is
$$ y = f(x_i) = w_0 + \sum_{j=1}^{m} w_j \, \varphi\left( \sum_{i=1}^{n} v_{ij} x_i + b_j \right) $$
where v_{ij} are the input-to-hidden weights, b_j the hidden-layer biases, w_j the hidden-to-output weights, w_0 the output bias and \varphi the hidden-layer transfer function.

Method and performance metrics
With y_t the actual value and \hat{y}_t the predicted value at time t, \bar{y} and \bar{\hat{y}} their means, and n the number of observations:
$$ \mathrm{MAE} = \frac{1}{n} \sum_{t=1}^{n} \left| y_t - \hat{y}_t \right| $$
$$ \mathrm{RMSE} = \sqrt{ \frac{1}{n} \sum_{t=1}^{n} (y_t - \hat{y}_t)^2 } $$
$$ \mathrm{MAPE} = \frac{1}{n} \sum_{t=1}^{n} \left| \frac{y_t - \hat{y}_t}{y_t} \right| \times 100\% $$
$$ R = \frac{ \sum_{t=1}^{n} (y_t - \bar{y})(\hat{y}_t - \bar{\hat{y}}) }{ \sqrt{ \sum_{t=1}^{n} (y_t - \bar{y})^2 \sum_{t=1}^{n} (\hat{y}_t - \bar{\hat{y}})^2 } } $$
$$ R^2 = 1 - \frac{ \sum_{t=1}^{n} (y_t - \hat{y}_t)^2 }{ \sum_{t=1}^{n} (y_t - \bar{y})^2 } $$
The models with the smallest RMSE, MAE and MAPE and the largest R and R² are considered the best models.
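The five performance metrics can be computed directly from a series of actual and predicted weekly counts. The sketch below is a plain-Python illustration of the formulas as written, not the authors' code; the function name `evaluate` and the toy numbers are hypothetical, and it assumes every actual value is non-zero because MAPE divides by y_t.

```python
import math


def evaluate(actual, predicted):
    """Return MAE, RMSE, MAPE (%), Pearson R and R^2 for two equal-length series.

    Hypothetical helper; assumes no actual value is zero (MAPE divides by y_t).
    """
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("actual and predicted must be non-empty and the same length")
    errors = [y - p for y, p in zip(actual, predicted)]
    sse = sum(e * e for e in errors)  # sum of squared errors
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sse / n)
    mape = 100.0 * sum(abs(e / y) for e, y in zip(errors, actual)) / n
    y_mean = sum(actual) / n
    p_mean = sum(predicted) / n
    cov = sum((y - y_mean) * (p - p_mean) for y, p in zip(actual, predicted))
    ss_y = sum((y - y_mean) ** 2 for y in actual)
    ss_p = sum((p - p_mean) ** 2 for p in predicted)
    r = cov / math.sqrt(ss_y * ss_p)  # Pearson correlation coefficient
    r2 = 1.0 - sse / ss_y  # coefficient of determination
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "R": r, "R2": r2}


# Made-up weekly counts, only to show the call.
metrics = evaluate([100, 200, 300, 400], [110, 190, 310, 390])
print({name: round(value, 4) for name, value in metrics.items()})
```

Note that R² here is computed as 1 - SSE/SST, so on held-out data it need not equal the square of R.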
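The network equation, combined with the training settings listed for the optimum model (9-4-1 architecture, Tansig hidden layer, Purelin output, TRAINGDM with learning rate 0.74 and momentum 0.9, 1500 epochs, error goal 1e-6), can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: the names `FFBPNN`, `minmax_scale` and `minmax_unscale` are hypothetical, the [-1, 1] min-max scaling is an assumed form of the "data normalizing" step, and the momentum update is assumed to follow the traingdm form (previous change times the momentum rate plus lr·(1 - momentum) times the gradient), applied to the batch mean squared error.

```python
import math
import random


def minmax_scale(values, lo, hi):
    """Map values linearly from [lo, hi] onto [-1, 1] (assumed normalizing step)."""
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


def minmax_unscale(value, lo, hi):
    """Inverse of minmax_scale: map a network output back to the original units."""
    return (value + 1.0) * (hi - lo) / 2.0 + lo


class FFBPNN:
    """Hypothetical n-m-1 network: y = w0 + sum_j w_j * tanh(sum_i v_ij x_i + b_j)."""

    def __init__(self, n_inputs=9, n_hidden=4, seed=0):
        rng = random.Random(seed)
        self.v = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]  # v[j][i]
        self.b = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]  # hidden biases b_j
        self.w = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]  # output weights w_j
        self.w0 = 0.0  # output bias w_0
        # Previous weight changes, reused by the momentum term.
        self.dv = [[0.0] * n_inputs for _ in range(n_hidden)]
        self.db = [0.0] * n_hidden
        self.dw = [0.0] * n_hidden
        self.dw0 = 0.0

    def forward(self, x):
        """Return (hidden activations h_j, linear network output y) for one input vector."""
        h = [math.tanh(sum(vji * xi for vji, xi in zip(vj, x)) + bj)
             for vj, bj in zip(self.v, self.b)]
        return h, self.w0 + sum(wj * hj for wj, hj in zip(self.w, h))

    def train(self, xs, ts, lr=0.74, momentum=0.9, epochs=1500, goal=1e-6):
        """Batch gradient descent with momentum on the mean squared error (MSE).

        Assumed update: dX = momentum * dX_prev - lr * (1 - momentum) * grad.
        Returns the MSE measured at the start of each epoch; stops once MSE <= goal.
        """
        n_hidden, n_inputs, n = len(self.v), len(self.v[0]), len(xs)
        step = lr * (1.0 - momentum)
        history = []
        for _ in range(epochs):
            gv = [[0.0] * n_inputs for _ in range(n_hidden)]
            gb, gw, gw0, mse = [0.0] * n_hidden, [0.0] * n_hidden, 0.0, 0.0
            for x, t in zip(xs, ts):
                h, y = self.forward(x)
                err = y - t
                mse += err * err / n
                g = 2.0 * err / n  # d(MSE)/dy contributed by this sample
                gw0 += g
                for j in range(n_hidden):
                    gw[j] += g * h[j]
                    # Back-propagate through tanh: d tanh(a)/da = 1 - tanh(a)^2.
                    d_hidden = g * self.w[j] * (1.0 - h[j] * h[j])
                    gb[j] += d_hidden
                    for i in range(n_inputs):
                        gv[j][i] += d_hidden * x[i]
            history.append(mse)
            if mse <= goal:
                break
            self.dw0 = momentum * self.dw0 - step * gw0
            self.w0 += self.dw0
            for j in range(n_hidden):
                self.dw[j] = momentum * self.dw[j] - step * gw[j]
                self.w[j] += self.dw[j]
                self.db[j] = momentum * self.db[j] - step * gb[j]
                self.b[j] += self.db[j]
                for i in range(n_inputs):
                    self.dv[j][i] = momentum * self.dv[j][i] - step * gv[j][i]
                    self.v[j][i] += self.dv[j][i]
        return history


# Toy, made-up data (not the Shanghai dataset): 9 inputs already scaled to [-1, 1].
rng = random.Random(1)
xs = [[rng.uniform(-1.0, 1.0) for _ in range(9)] for _ in range(50)]
ts = [0.5 * x[0] - 0.3 * x[1] for x in xs]
net = FFBPNN(n_inputs=9, n_hidden=4)
history = net.train(xs, ts, epochs=300)
print(f"MSE: {history[0]:.4f} -> {history[-1]:.4f}")
```

In practice the nine weekly meteorological series and the diarrhea counts would each be scaled with their own minimum and maximum before training, and `minmax_unscale` would convert predictions back to weekly case counts before computing the metrics.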
Development of FFBPNN model
The FFBPNN modeling consists of two steps:
1. Train the network using the training dataset:
   – Model input and output parameters
   – Data pre-processing and post-processing
   – Determination of optimum network and parameters
2. Test the network with the testing dataset.
[Figure: Hidden neurons and network errors.]

Development of FFBPNN model
The optimum model architecture and parameters for the diarrhea prediction:
Parameter | FFBPNN
Number of input layer units | 9
Number of hidden layers | 1
Number of hidden layer units | 4
Number of output layer units | 1
Momentum rate | 0.9
Learning rate | 0.74
Error after learning | 1e-6
Learning cycles | 1500 epochs
Transfer function in hidden layer | Tansig
Transfer function in output layer | Purelin
Training function | TRAINGDM

Development of MLR model
Dependent variable: diarrhea number, i.e. WNID, the weekly number of infectious diarrhea cases. Independent variables: the nine meteorological factors, namely weekly average maximum temperature (Tmax), minimum temperature (Tmin), average temperature (Tavg), minimum relative humidity (RHmin), average relative humidity (RHavg), atmospheric pressure (APavg), sunshine duration (SD), wind speed (WSavg) and rainfall (R).
The fitted model has the form
$$ \mathrm{WNID} = b_0 + b_1 T_{max} + b_2 T_{min} + b_3 T_{avg} + b_4 RH_{min} + b_5 RH_{avg} + b_6 AP_{avg} + b_7 SD + b_8 WS_{avg} + b_9 R $$
with coefficient magnitudes |b_0| = 1972.7903, |b_1| = 10.9619, |b_2| = 20.8158, |b_3| = 2.6208, |b_4| = 1.6506, |b_5| = 0.2993, |b_6| = 2.0902, |b_7| = 5.7734, |b_8| = 15.7205 and |b_9| = 1.6048.

Results and discussion
PECs | FFBPNN training | FFBPNN testing | MLR training | MLR testing
MAE | 20.7628 | 27.7547 | 29.8077 | 35.3774
RMSE | 28.3007 | 36.0526 | 39.3739 | 48.9395
MAPE (%) | 27.27 | 38.41 | 43.37 | 41.82
R | 0.8490 | 0.8089 | 0.6968 | 0.8783
R² | 0.9213 | 0.9125 | 0.8811 | 0.8388
The better performance of the FFBPNN model over the MLR model may be attributed to the complex nonlinear relationship between infectious diseases and meteorological factors.

Results and discussion
[Figure (a): Comparison curves plot of actual vs. predicted trends for the training dataset; panels FFBPNN and MLR; x-axis: time (week), y-axis: weekly number of infectious diarrhea cases.]

Results and discussion
[Figure (b): Comparison scatter plot of actual vs. predicted values for the training dataset; panels FFBPNN (fitted line y = 0.83x + 17, R² = 0.9385) and MLR.]

Results and discussion
[Figure (c): Comparison curves plot of actual vs. predicted trends for the testing dataset; panels FFBPNN and MLR; x-axis: time (week), y-axis: weekly number of infectious diarrhea cases.]

Results and discussion
[Figure (d): Comparison scatter plot of actual vs. predicted values for the testing dataset; panels FFBPNN (R² = 0.9125) and MLR (R² = 0.8388), with fitted lines y = 0.54x + 39 and y = 0.68x + 28.]

Sensitivity analyses
The ANN acts as a black box between the meteorological factors (inputs) and infectious diarrhea (output), so a sensitivity analysis based on the cosine amplitude method is applied. The strength of the relation between two series x_i and x_j, each with m data points, is
$$ r_{ij} = \frac{ \sum_{k=1}^{m} x_{ik} x_{jk} }{ \sqrt{ \sum_{k=1}^{m} x_{ik}^2 \sum_{k=1}^{m} x_{jk}^2 } } $$

Sensitivity analyses
Most effective meteorological factor: temperature.
Least effective meteorological factor: average rainfall.

Conclusions
1. The proposed method is more suitable for predicting infectious diarrhea than the statistical MLR method.
2. The feed-forward back-propagation neural network (FFBPNN) model with architecture 9-4-1 gives the most accurate predictions of the weekly number of infectious diarrhea cases.
3. The most effective meteorological factor on infectious diarrhea is weekly average temperature, whereas weekly average rainfall is the least effective.
Therefore, this technique can be used to predict infectious diarrhea, and the results can serve as a baseline against which other prediction techniques can be compared in the future.
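The cosine amplitude relation r_ij used in the sensitivity analyses can be sketched as a ranking of factors by their relation to the diarrhea series. This is a minimal illustration, assuming each factor and the target are equal-length lists of already-normalized, non-negative weekly values; the function names `cosine_amplitude` and `rank_factors` and the toy numbers are hypothetical and are not the paper's data.

```python
import math


def cosine_amplitude(x_i, x_j):
    """Cosine amplitude relation r_ij between two equal-length series of m values."""
    if len(x_i) != len(x_j) or not x_i:
        raise ValueError("series must be non-empty and of equal length")
    numerator = sum(a * b for a, b in zip(x_i, x_j))
    denominator = math.sqrt(sum(a * a for a in x_i) * sum(b * b for b in x_j))
    return numerator / denominator


def rank_factors(factors, target):
    """Return (factor name, r_ij) pairs sorted from strongest to weakest relation."""
    scores = {name: cosine_amplitude(series, target) for name, series in factors.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up, already-normalized weekly values (not the paper's data).
target = [0.2, 0.4, 0.9, 0.7]
factors = {
    "Tavg": [0.1, 0.5, 1.0, 0.8],
    "R": [0.9, 0.2, 0.3, 0.1],
}
for name, r in rank_factors(factors, target):
    print(f"{name}: r = {r:.3f}")
```

With non-negative inputs, r_ij lies between 0 and 1, and the factor with the largest value is read as the most influential one.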