Bayesian Neural Networks for Tornado Detection

THEODORE B. TRAFALIS, BUDI SANTOSA
School of Industrial Engineering
University of Oklahoma
202 West Boyd, Suite 124, Norman, OK 73019
USA

MICHAEL B. RICHMAN
School of Meteorology
University of Oklahoma
1100 East Boyd, Suite 1310, Norman, OK 73019
USA

Abstract: - In this paper, conventional feedforward artificial neural networks (ANNs) and Bayesian neural networks (BNNs) are applied to tornado detection. Both methods use radar-derived velocity data to distinguish pre-tornadic circulations (known as mesocyclones) that remain nontornadic from those that become tornadic. Computational results show that the BNNs are more skillful than conventional ANNs for this type of discrimination. The additional skill seen in BNNs is derived from the ability of the technique to forecast more accurately when mesocyclones remain nontornadic.

Key words: ANN, chaotic systems, classification, detection, error generalization, feedforward neural networks, machine learning, performance analysis, severe weather, training samples

1. INTRODUCTION

Detection of tornadoes with ample warning times has long been a goal of severe weather forecasters. Weather phenomena on the space scale of a tornado are thought to be deterministically chaotic, as the observational networks cannot resolve the small-scale circulations well. Moreover, there is no accepted physical model describing the range of atmospheric conditions leading to the formation of tornadoes. Radzicki [7] defines deterministic chaos as characterized by self-sustained oscillations whose period and amplitude are nonrepetitive and unpredictable, yet generated by a system devoid of randomness. Although the precise location and time at which tornadoes will strike cannot be known well in advance, it is known what conditions lead to their development, when and where they are most frequent, and their likely trajectories.
Operationally, when large-scale atmospheric conditions appear conducive to tornado development, the Storm Prediction Center issues tornado watches. When tornadoes are detected visually or when pre-tornadic circulations (known as mesocyclones) are sensed by Doppler radar, a tornado warning is issued. With state-of-the-science weather radar, high-speed computing and advanced signal processing algorithms, steady progress is being made on increasing the average lead time of such warnings. As evidenced in the spring of 2003 in the United States, with a record number of tornadoes and a relatively small number of deaths, an extra minute of lead time can translate into a number of lives saved. Hence, detecting as many pre-tornadic circulations as possible is an important aspect of a tornado detection algorithm. However, meeting this goal can result in an algorithm that predicts tornadoes when none are observed. This is known as a “false alarm”. A high false alarm rate can lead to the public ignoring warnings.

One of the severe weather detection algorithms, created by the National Severe Storms Laboratory (NSSL) and in use with the Weather Surveillance Radar 1988 Doppler (WSR-88D), is the Mesocyclone Detection Algorithm (MDA). This algorithm uses the outputs of the WSR-88D and is designed to detect storm-scale circulations associated with regions of rotation in thunderstorms. The MDA is used by meteorologists as one input in their decision to issue tornado warnings. Marzban and Stumpf [6] have shown that artificial neural network (ANN) postprocessing of the MDA output performs better than the MDA alone. In this paper, ANNs and Bayesian neural networks (BNNs) are applied to detect tornado circulations sensed by the WSR-88D radar. A Bayesian approach to machine learning, the evidence framework, is used to develop a variant of ANNs for discriminating mesocyclones that remain nontornadic from those that become tornadic. The paper is organized as follows.
In Section 2, the problem is defined. Section 3 describes the data, whereas Section 4 provides a brief overview of ANNs and BNNs and discusses the methodology used herein. In Section 5, the experimental setting is detailed. Section 6 provides a sensitivity analysis of ANNs and BNNs for several forecast evaluation indices. Finally, Section 7 concludes with specific recommendations.

2. PROBLEM STATEMENT

There are two classes of problems addressed in this research. One is physical and the other is methodological. The two are intimately entwined for the prediction of tornadoes.

There are two challenges involved in tornado warnings from the meteorological viewpoint. The first one is tornado detection: not every tornado that occurs is detected. The second one is false alarms: the algorithms detect tornado circulations more often than such circulations can be confirmed. This is insidious because the warnings have the potential to go unheeded by the public after a series of false alarms. Accordingly, it is desirable to develop a statistical algorithm that will maximize detection and minimize false alarms.

Prediction of tornadoes is a difficult task owing to the small scale of their circulation and the speed with which they develop in the atmosphere. They can form within minutes and disappear just as quickly. The best tool in the meteorologist’s arsenal to remotely sense tornadoes is Doppler radar. However, present-day operational radar takes approximately 6 minutes to complete one volume scan and the spatial resolution averages close to ¼ km for Doppler radar velocity. Many tornadoes and pre-tornadic circulations are smaller than that. Despite the challenges, lead times for tornadoes have increased from a few minutes (a decade ago) to approximately 11 minutes (with current radar), largely due to improvements in algorithms that use the radar data as inputs.

The second research problem is to develop an intelligent system that can generalize well with data that have a significant noise component. ANNs are considered robust classifiers in terms of input noise. Recent work shows that BNNs can be more effective classifiers with noisy data in terms of generalization ([4], [5]).

3. DATA AND ANALYSIS

The data set used for this research is the output of the WSR-88D radar. Tornadoes are one of the three categories of severe weather. The others are hail greater than 1.9 cm in diameter and nontornadic winds in excess of 25 m/s. Any circulation detected on a particular volume scan of the radar data can be associated with a report of a tornado. In the severe weather database, supplied by NSSL, there are two truth numbers, the first for tornado ground truth, and the second for severe weather ground truth [6]. Tornado ground truth is based on temporal and spatial proximity of the circulation to the radar. If there is a tornado reported between the beginning and ending of the volume scan, and the report is within a reasonable distance of a circulation detection (input manually), then the ground truth value is flagged. If a circulation detection falls within the prediction "time window" of -20 to +6 minutes of the ground truth report duration, then the ground truth value is also flagged. The key behind these timings is to determine whether a circulation will produce a tornado within the next 20 minutes, a suitable lead time for advanced severe weather warnings by the National Weather Service. Any data with the aforementioned flagged values are categorized as tornado cases with label 1. All other circulations are given the label -1, corresponding to a no-tornado case.

The predictor pool employed in this study consists of 23 attributes based on Doppler velocity data. These same attributes have been used successfully by Marzban and Stumpf [6] in their work on postprocessing radar data.

4. METHODOLOGY

4.1 Artificial Neural Network (ANN)

ANN models are algorithms for intellectual tasks such as learning, classification, recognition, estimation, and optimization that are based on the concept of how the human brain works [3]. An ANN model is composed of a large number of processing elements called neurons. Each neuron is connected to other neurons by links, each with an associated weight. Neurons with no links leading into them are called input neurons and those with no links leading out of them are called output neurons. The neurons are represented by state variables. State variables are functions of the weighted sum of input variables and other state variables. Each neuron performs a simple transformation, and all neurons do so simultaneously in a parallel distributed manner. The input-output relation of the transformation in a neuron is characterized by an activation function. The combination of input neurons, output neurons, and links between neurons with associated weights constitutes the architecture of the ANN.

One of the advantages of using ANNs is that they can extract patterns and detect trends that are often too complex to be noticed by either humans or other computer techniques. ANNs are appropriate for capturing existing patterns and trends from noisy data. The procedure involves training an ANN with a large sample of representative data and then testing it on data not included in the training, with the aim of predicting the outputs for new inputs. The training process involves choosing the number of layers (input, hidden, and output), the number of neurons, and the links between neurons with their associated weights. The last layer represents the output. The number of hidden layers is user-defined. The user can modify how many neurons each layer has. Training and testing error tolerances can also be adjusted by the user. After the network has been trained and tested to the user's satisfaction, it is ready for use.
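
The description of a neuron as an activation function applied to a weighted sum can be illustrated with a minimal Python sketch. It is not the authors' MATLAB implementation: the function names, the toy weights, the hyperbolic tangent output unit, and the rule that a positive output maps to label +1 are all assumptions made for illustration.

```python
import math


def neuron_state(inputs, weights, bias, activation):
    """State of one neuron: activation applied to the weighted sum of its inputs."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(weighted_sum)


def forward(x, hidden_layer, output_neuron):
    """Forward pass through one hidden layer and a single output neuron.

    hidden_layer is a list of (weights, bias) pairs, one per hidden neuron;
    output_neuron is a single (weights, bias) pair.
    The tanh activations are an assumption for this sketch.
    """
    hidden = [neuron_state(x, w, b, math.tanh) for w, b in hidden_layer]
    w_out, b_out = output_neuron
    return neuron_state(hidden, w_out, b_out, math.tanh)


# Hypothetical weights for a network with 2 inputs and 2 hidden neurons.
hidden_layer = [([0.5, -0.3], 0.1), ([-0.2, 0.8], 0.0)]
output_neuron = ([1.0, -1.0], 0.0)

y = forward([1.0, 2.0], hidden_layer, output_neuron)
# Assumed labeling rule: positive output -> tornado (+1), otherwise no tornado (-1).
label = 1 if y > 0.0 else -1
print(y, label)
```

In practice the weights would be obtained by training rather than set by hand; the sketch only shows how an input vector propagates through the architecture.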
New sets of input data can be presented to the network, and it will produce a forecast based on what it has learned. A trained ANN can be treated as an expert in the category of information it has been given to analyze. This expert can then be used to provide predictions for new situations and to answer what-if questions.

4.2 Bayesian Neural Network

The Bayesian evidence framework has been successfully applied to the design of multilayer perceptrons (MLPs) in the work of MacKay ([4], [5]). Bayesian methods have been proposed for neural networks to solve regression and classification problems. These methods claim to overcome some difficulties encountered in the standard approach, such as overfitting. In conventional approaches, the training of ANNs is based on the minimization of an error function, and is often motivated by some underlying principle such as maximum likelihood. The disadvantage of such approaches is that the designed networks can suffer from a number of deficiencies, including the problem of determining the appropriate level of model complexity. More complex models (e.g., ones with more hidden units, more layers, or smaller values of the regularization parameters) give better fits to the training data, but if the model is too complex it may generalize poorly (overfitting).

The Bayesian viewpoint provides a general and consistent framework for statistical pattern recognition and data analysis. In the context of neural networks, a Bayesian approach offers several important features, including the following [1]: (i) The technique of regularization arises in a natural way in the Bayesian framework. The corresponding regularization parameters can be treated consistently within the Bayesian setting, without the need for techniques such as cross-validation. (ii) For classification problems, the tendency of conventional approaches to make overconfident predictions in regions of sparse data can be avoided. (iii) Bayesian methods provide an objective and principled framework for dealing with the issue of model complexity (for example, how to select the number of hidden units in a feedforward network), and avoid many of the problems of overfitting that arise when using maximum likelihood.

4.3 Forecast Evaluation Indices for Tornado Detection

In the detection paradigm, the forecast results are evaluated by using a suite of forecast evaluation indices based on a contingency table (also known as a "confusion matrix", Table 1). The cell counts (a, b, c, d) from the confusion matrix can be used to form forecast evaluation indices [9].

Table 1. Confusion matrix.

              Observed Yes         Observed No             Total
Forecast Yes  Hits (a)             False alarms (b)        Total Forecast Yes
Forecast No   Misses (c)           Correct negatives (d)   Total Forecast No
Total         Total Observed Yes   Total Observed No

With this definition of the confusion matrix, one such index is the Probability of Detection, POD, which is defined as POD = a/(a+c). POD measures the fraction of observed events that were correctly forecast. Its range is 0 to 1 and a perfect score is 1 (or 100%). Note that POD is sensitive to hits and is therefore good for rare events. However, POD ignores false alarms and it can be improved artificially by issuing more "yes" forecasts to increase the number of hits.

False Alarm Rate, FAR, is defined as FAR = b/(a+b). FAR measures the fraction of "yes" forecasts in which the event did not occur. Its range is 0 to 1, and 0 is a perfect rate. FAR is sensitive to false alarms and it ignores misses. It can be improved artificially by issuing more "no" forecasts to reduce the number of false alarms.

The concept of skill is one where a forecast is superior to some known reference forecast (e.g., random chance). Skill ranges from -1 (antiskill) to 0 (no skill over the reference) to +1 (perfect skill). Heidke’s skill formulation is commonly used in meteorology since it uses all elements in the confusion matrix and works well for rare event forecasting (e.g., tornadoes) [2]:

Heidke’s Skill = 2(ad - bc) / [(a+b)(b+d) + (a+c)(c+d)].

5. EXPERIMENTS

In the experiments, the data are split into two sets: training and testing. For the training set, the numbers of tornadic and nontornadic observations are about the same. In the testing sets, the percentage of tornadic observations is varied from 2% to 10% in 2% increments and the sensitivity of each method to this ratio is determined. The cases used for training are different from those used in the testing sets. The same training and testing sets are applied to all methods. The data structure consists of an m by n matrix, where m refers to observations and n refers to attributes.

All experiments were performed on a Pentium IV computer. The BNN and ANN experiments are performed in the MATLAB environment. For the ANN, a feedforward neural network with two layers is utilized, in which the first is a hidden layer and the second is an output layer. The number of nodes in the hidden layer is set to 2, 5, and 10. The experiments show that 5 hidden nodes provide the best results. The network is trained by using a gradient descent with momentum algorithm. The ANN model is run 5 times. For each run, the testing set output is examined. The predicted labels are obtained by discretizing the average of the 5 sets of outputs into -1 (no tornado) or +1 (tornado).

For the BNN, the algorithm implements a two-layer feedforward neural network with a hyperbolic tangent function for the hidden layer and a logistic sigmoid function for the output layer [8]. The weights are optimized with a maximum a posteriori (MAP) approach, that is, a cross-entropy error function augmented with a Gaussian prior over the weights [5]. The regularization is determined by MacKay's ML-II scheme [4]. For ANNs the threshold is varied from 0 to 0.8 and for BNNs the threshold is kept fixed at 0.

6. RESULTS

Figures 1 through 3 depict different performance aspects through the abovementioned evaluation indices for each classification method. Each graph describes the relation between a specific statistic and the percentage of tornadic observations in the testing set for all methods. All the methods are trained with a fixed training set. Tuning parameters are determined in the training phase and these are used to make predictions for different testing sets based on the ratio of tornadic to nontornadic observations. The various testing sets are obtained by increasing the number of nontornadic observations while keeping the tornadic observations fixed. By investigating this ratio, the performance for each forecast validation statistic can be observed. In these experiments, the threshold, that is, the value of the output above which a case is categorized as tornadic (1) rather than nontornadic (-1), is varied. The threshold range is from 0 to 0.8.

The POD results (Fig. 1) indicate that the BNNs are not affected by the percentage of tornadoes retained in the testing set (Fig. 1a), in stark contrast to the results for ANNs, where the POD rises as a function of increasing percentages of tornadoes analyzed (Fig. 1b). At first glance, this would suggest that the POD of approximately 0.84 for the BNNs is not nearly as high as that of the best ANNs, which have PODs in the 0.85 to 0.9 range. However, a realistic climatological incidence for tornadic days over most of the United States is close to 2 percent, so the difference between the methods is minimal. Because the BNN threshold is fixed, Fig. 1a shows no dependence on threshold. However, ANNs were found to be sensitive to the threshold used to label the output as tornadic or nontornadic. Results indicate increased POD as the threshold gets smaller, with a maximum at 0.8.

The FAR results (Fig. 2) show the advantage of BNNs over ANNs. The BNNs (Fig. 2a) have a comparatively low FAR, ranging from about 30 percent to 18 percent, as the percentage of tornado events increases from 2 to 10. The ANNs (Fig. 2b) exhibit a much higher FAR, ranging from almost 60 to 65 percent for the 2 percent tornado case. When the percentage of tornadoes increases to 10 percent, the FAR drops into the 30 to 40 percent range. Recall that the lower percentages of tornadoes match the climatological probabilities most reasonably. Hence, the FAR is superior for BNNs. Additionally, the FAR exhibits a mild dependence upon the threshold used in the ANN models. The results indicate that a threshold of zero provides the lowest FAR.

Finally, examination of skill is useful as it combines the ideas of POD and FAR into one index and gives an idea of the gain over a random guess. The skill results for BNNs (Fig. 3a) depict high values in excess of 0.75 and ranging up to over 0.8 as a function of increasing percentages of tornadoes examined in the testing set. Note this would be a 75 percent (or better) improvement over a random guess. The ANNs (Fig. 3b) show positive skill with lower values than BNNs, ranging from close to 0.5 for the 2 percent tornado case up to around 0.6 when 10 percent tornadoes are used. There is also a dependence upon threshold, with the best results at larger thresholds. However, the best ANN results did not rise to the level of skill shown by the BNNs.

7. CONCLUSIONS

Postprocessing the output of radar-derived velocity data has significantly improved the detection of tornadoes and lowered the false alarm rate, compared to the raw mesocyclone detection algorithm currently in use. Based on the performance improvements of BNNs over ANNs, it is recommended that further research be performed on the application of Bayesian neural networks to tornado detection. The high level of skill shown is an improvement over previous research in terms of predictability (reducing chaos) and could result in a considerable reduction in loss of life if implemented operationally.

Acknowledgements: The present work has been partially supported by the NSF grant EIA-0205628.

References
[1] Bishop, C. M., Neural Networks for Pattern Recognition, Oxford University Press, Oxford, 1995.
[2] Doswell, C. A. III, Davies-Jones, R., and Keller, D., On summary measures of skill in rare event forecasting based on contingency tables, Weather and Forecasting, 5, 1990, pp. 576-585.
[3] Haykin, S., Neural Networks: A Comprehensive Foundation, 2nd edition, Prentice-Hall, Upper Saddle River, NJ, 1999.
[4] MacKay, D., The evidence framework applied to classification networks, Neural Computation, 4, 1992, pp. 720-736.
[5] MacKay, D., A practical Bayesian framework for backpropagation networks, Neural Computation, 4, 1992, pp. 448-472.
[6] Marzban, C. and Stumpf, G. J., A neural network for tornado prediction based on Doppler radar-derived attributes, Journal of Applied Meteorology, 35, 1996, pp. 617-626.
[7] Radzicki, M. J., Institutional dynamics, deterministic chaos, and self-organizing systems, Journal of Economic Issues, 24, 1990, pp. 57-102.
[8] Sigurdsson, S., Binary Neural Classifier, version 1.0, 2002, http://mole.imm.dtu.dk/toolbox/ann/
[9] Wilks, D. S., Statistical Methods in the Atmospheric Sciences, Academic Press, London, 1995.

Figure 1. POD for different thresholds for different percentage of tornado in the test set. (a) Bayesian NN: POD versus percent of tornado. (b) ANN: POD versus threshold and percent of tornado.

Figure 2. FAR for different thresholds for different percentage of tornado in the test set. (a) Bayesian NN: FAR versus percent of tornado. (b) ANN: FAR versus threshold and percent of tornado.

Figure 3. Skill for different thresholds for different percentage of tornado in the test set. (a) Bayesian NN: skill versus percent of tornado. (b) ANN: skill versus threshold and percent of tornado.
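
As a supplementary illustration of the forecast evaluation indices defined in Section 4.3, the following Python sketch computes POD, FAR and Heidke's skill from the confusion matrix counts (a, b, c, d). It is a sketch under stated assumptions, not the authors' code: the function names, the toy data, and the rule that an averaged output strictly above the threshold is labeled tornadic are all hypothetical.

```python
import math


def discretize(outputs, threshold=0.0):
    """Label each real-valued output +1 (tornado) if above the threshold, else -1.

    The strict '>' comparison is an assumption made for this sketch.
    """
    return [1 if y > threshold else -1 for y in outputs]


def confusion_counts(observed, predicted):
    """Return (a, b, c, d) = (hits, false alarms, misses, correct negatives)."""
    a = b = c = d = 0
    for obs, pred in zip(observed, predicted):
        if pred == 1 and obs == 1:
            a += 1  # forecast yes, observed yes
        elif pred == 1:
            b += 1  # forecast yes, observed no
        elif obs == 1:
            c += 1  # forecast no, observed yes
        else:
            d += 1  # forecast no, observed no
    return a, b, c, d


def _ratio(numerator, denominator):
    """Division that yields NaN when the index is undefined (zero denominator)."""
    return numerator / denominator if denominator else math.nan


def pod(a, b, c, d):
    """Probability of Detection: POD = a/(a+c)."""
    return _ratio(a, a + c)


def far(a, b, c, d):
    """False Alarm Rate as defined in Section 4.3: FAR = b/(a+b)."""
    return _ratio(b, a + b)


def heidke(a, b, c, d):
    """Heidke's skill: 2(ad - bc) / [(a+b)(b+d) + (a+c)(c+d)]."""
    return _ratio(2 * (a * d - b * c), (a + b) * (b + d) + (a + c) * (c + d))


# Toy example: 10 test cases, 3 of them tornadic (labels are hypothetical).
observed = [1, 1, 1, -1, -1, -1, -1, -1, -1, -1]
averaged_outputs = [0.9, 0.4, -0.2, 0.3, -0.8, -0.6, -0.9, -0.5, -0.7, -0.4]

counts = confusion_counts(observed, discretize(averaged_outputs, threshold=0.0))
print(counts, pod(*counts), far(*counts), heidke(*counts))
```

Raising the threshold argument of `discretize` in this toy setting issues fewer "yes" forecasts, which is one way to explore how the three indices trade off against one another.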