Bayesian Neural Networks for Tornado Detection
THEODORE B. TRAFALIS
BUDI SANTOSA
School of Industrial Engineering
University of Oklahoma
202 West Boyd, Suite 124
Norman, OK 73019 USA
MICHAEL B. RICHMAN
School of Meteorology
University of Oklahoma
1100 East Boyd, Suite 1310
Norman, OK 73019 USA
Abstract: In this paper, conventional feedforward artificial neural networks (ANNs) and Bayesian neural networks (BNNs) are applied to tornado detection. Both methods are employed, using radar-derived velocity data, to distinguish pre-tornadic circulations (known as mesocyclones) that remain nontornadic from those that become tornadic. Computational results show that the BNNs are more skillful than conventional ANNs for this type of discrimination. The additional skill of the BNNs derives from their ability to forecast more accurately when mesocyclones remain nontornadic.
Key words: ANN, chaotic systems, classification, detection, error generalization, feedforward neural networks,
machine learning, performance analysis, severe weather, training samples
1. INTRODUCTION
Detection of tornadoes with ample warning times has
long been a goal of severe weather forecasters.
Weather phenomena on the space scale of a tornado are thought to be deterministically chaotic, as the observational networks cannot resolve the small-scale circulations well. Moreover, there is no accepted physical model describing the range of atmospheric conditions that leads to the formation of tornadoes. Radzicki [7] defines deterministic chaos as characterized by self-sustained oscillations whose period and amplitude are nonrepetitive and unpredictable, yet generated by a system devoid of randomness. Although the precise location and time at which a tornado will strike cannot be known far in advance, the conditions that lead to tornado development, the times and places where tornadoes are most frequent, and their likely trajectories are known. Operationally, when large
scale atmospheric conditions appear conducive to
tornado development, the Storm Prediction Center
issues tornado watches. When tornadoes are detected
visually or when pre-tornadic circulations (known as
mesocyclones) are sensed by Doppler radar, a tornado
warning is issued. With state-of-the-science weather
radar, high speed computing and advanced signal
processing algorithms, steady progress is being made
on increasing the average lead-time of such warnings.
As evidenced in the spring of 2003 in the United
States, with a record number of tornadoes and a
relatively small number of deaths, an extra minute of
lead-time can translate into a number of lives saved.
Hence, detecting as many pre-tornadic circulations as possible is an important aspect of a tornado detection algorithm. However, meeting this
goal can result in an algorithm that predicts tornadoes
when none are observed. This is known as a “false
alarm”. A high false alarm rate can lead to the public
ignoring warnings. One of the severe weather
detection algorithms, created by the National Severe
Storms Laboratory and in use on the Weather Surveillance Radar-1988 Doppler (WSR-88D), is the
Mesocyclone Detection Algorithm (MDA). This
algorithm uses the outputs of the WSR-88D and is
designed to detect storm–scale circulations associated
with regions of rotation in thunderstorms. The MDA
is used by meteorologists as one input in their
decision to issue tornado warnings. Marzban and
Stumpf [6] have shown that the performance of the MDA alone is worse than that of artificial neural network (ANN) post-processing of the MDA output.
In this paper, ANNs and Bayesian neural
networks (BNNs) are applied to detect tornado
circulations sensed by the WSR-88D radar. A
Bayesian framework is utilized for machine learning,
using the evidence framework, to develop a variant of
ANNs for discriminating mesocyclones that remain nontornadic from those that become tornadic.
The paper is organized as follows. Section 2 defines the problem. Section 3 describes the data, and Section 4 provides a brief overview of ANNs and BNNs and discusses the methodology used herein. Section 5 details the experimental setting. Section 6 provides a sensitivity analysis of ANNs and BNNs for several forecast evaluation indices. Finally, Section 7 concludes with specific recommendations.
2. PROBLEM STATEMENT
There are two classes of problems addressed in this research. One is physical and the other is methodological. The two are intimately entwined for the prediction of tornadoes.
From the meteorological viewpoint, there are two challenges involved in tornado warnings. The first is tornado detection: some of the tornadoes that do occur go undetected. The second is false alarms, meaning that the algorithms detect tornadic circulations more often than such circulations can be confirmed. This is insidious because the warnings have the potential to go unheeded by the public after a series of false alarms. Accordingly, it is desirable to develop a statistical algorithm that will maximize detection and minimize false alarms. Prediction of tornadoes is a difficult task owing to the small scale of their circulations and their rapid production in the atmosphere. They can form within minutes and disappear just as quickly. The best tool in the meteorologist’s arsenal to remotely sense tornadoes is Doppler radar. However, present-day operational radar takes approximately 6 minutes to complete one volume scan, and the spatial resolution averages close to ¼ km for Doppler radar velocity. Many tornadoes and pre-tornadic circulations are smaller than that. Despite these challenges, lead times for tornadoes have increased from a few minutes (a decade ago) to approximately 11 minutes (with current radar), largely due to improvements in algorithms that use the radar data as inputs.
The second research problem is to develop an intelligent system that can generalize well with data that have a significant noise component. ANNs are considered robust classifiers in terms of input noise. Recent work in BNNs shows that BNNs can be more effective classifiers with noisy data in terms of generalization ([4], [5]).
3. DATA AND ANALYSIS
The data set used for this research consists of the outputs from the WSR-88D radar. Tornadoes are one of the three categories of severe weather; the others are hail greater than 1.9 cm in diameter and non-tornadic winds in excess of 25 m/s. Any circulation detected on a particular volume scan of the radar data can be associated with a report of a tornado. In the severe weather database supplied by NSSL there are two truth numbers, the first for tornado ground truth and the second for severe weather ground truth [6]. Tornado ground truth is based on temporal and spatial proximity of the circulation to the radar. If there is a tornado reported between the beginning and ending of the volume scan, and the report is within reasonable distance of a circulation detection (input manually), then the ground truth value is flagged. If a circulation detection falls within the prediction "time window" of -20 to +6 minutes of the ground truth report duration, then the ground truth value is also flagged. The key behind these timings is to determine whether a circulation will produce a tornado within the next 20 minutes, a suitable lead time for advanced severe weather warnings by the National Weather Service. Any data with the aforementioned flagged values are categorized as tornado cases with label 1. All other circulations are given label -1, corresponding to a no-tornado case.
The predictor pool employed in this study consists of 23 attributes based on Doppler velocity data. These same attributes have been used successfully by Marzban and Stumpf [6] in their work on post-processing radar data.
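To make the labeling rule above concrete, the following sketch (in Python; the data structures and the exact interpretation of the -20 to +6 minute window are illustrative assumptions, since the database format is not specified here) assigns the 1 / -1 labels from tornado report times:

from datetime import timedelta

def label_circulation(detection_time, tornado_reports):
    # Return 1 (tornado case) or -1 (no-tornado case) for one circulation
    # detection. tornado_reports is a list of (report_start, report_end)
    # pairs. A detection is flagged if it falls within the -20 to +6 minute
    # prediction window around a report's duration (one plausible reading
    # of the rule described above).
    for report_start, report_end in tornado_reports:
        window_open = report_start - timedelta(minutes=20)
        window_close = report_end + timedelta(minutes=6)
        if window_open <= detection_time <= window_close:
            return 1
    return -1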
4. METHODOLOGY
4.1 Artificial Neural Network (ANN)
ANN models are algorithms for intellectual tasks such
as learning, classification, recognition, estimation, and
optimization that are based on the concept of how the
human brain works [3]. An ANN model is composed
of a large number of processing elements called
neurons. Each neuron is connected to other neurons by links, each with an associated weight. Neurons
without links directed toward them are called input neurons, and those with no links leaving them are called output neurons. The neurons are represented by
state variables. State variables are functions of the
weighted-sum of input variables and other state
variables. Each neuron performs a simple
transformation at the same time in a parallel, distributed manner. The input-output relation of the
transformation in a neuron is characterized by an
activation function. The combination of input neurons,
output neurons, and links between neurons with
associated weights constitutes the architecture of the ANN.
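To illustrate the architecture just described (weighted links, and state variables computed as an activation function of a weighted sum), here is a minimal forward-pass sketch; the layer sizes, random weights, and tanh activation are illustrative assumptions rather than the specific configuration used later in the paper.

import numpy as np

def forward_pass(x, W_hidden, b_hidden, W_out, b_out):
    # Each neuron's state is an activation function applied to the
    # weighted sum of the states feeding into it.
    hidden_state = np.tanh(W_hidden @ x + b_hidden)        # hidden neurons
    output_state = np.tanh(W_out @ hidden_state + b_out)   # output neuron(s)
    return output_state

# Illustrative sizes: 23 input attributes -> 5 hidden neurons -> 1 output.
rng = np.random.default_rng(0)
x = rng.normal(size=23)
W_hidden, b_hidden = rng.normal(size=(5, 23)), np.zeros(5)
W_out, b_out = rng.normal(size=(1, 5)), np.zeros(1)
print(forward_pass(x, W_hidden, b_hidden, W_out, b_out))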
One of the advantages of using ANNs is that
they can extract patterns and detect trends that are
often too complex to be noticed by either humans or
other computer techniques. ANNs are appropriate for
capturing existing patterns and trends from noisy data.
The procedure involves training an ANN with a large sample of representative data and then testing it on data not included in the training, to assess how well it predicts new cases. The training
process involves different numbers of layers (inputs,
output, and hidden) and neurons and links between
neurons with associated weights. The last layer
represents the output. The number of hidden layers is
user-defined. The user can modify how many neurons
each layer has. Training and testing error tolerances
can also be adjusted by the user. After the network has been trained and tested to the user's satisfaction, it is ready for use. New sets of input data can be presented to the network, which will produce a forecast based on what it has learned. A trained ANN can be treated
as an expert in the category of information it has been
given to analyze. This expert can then be used to
provide predictions for new situations as they arise and to answer what-if questions.
4.2 Bayesian Neural Network
The Bayesian evidence framework has been successfully applied to the design of multilayer perceptrons (MLPs) in the work of MacKay ([4], [5]). Bayesian methods have been proposed for neural networks to solve regression and classification problems. These methods claim to overcome some difficulties encountered in the standard approach, such as overfitting.
In conventional approaches, the training of ANNs is based on the minimization of an error function, and is often motivated by some underlying principle such as maximum likelihood. The disadvantage of such approaches is that the designed networks can suffer from a number of deficiencies, including the problem of determining the appropriate level of model complexity. More complex models (e.g., ones with more hidden units, more layers, or smaller values of the regularization parameters) give better fits to the training data, but if the model is too complex it may give poor generalization (overfitting).
The Bayesian viewpoint provides a general and consistent framework for statistical pattern recognition and data analysis. In the context of neural networks, a Bayesian approach offers several important features, including the following [1]:
- The technique of regularization arises in a natural way in the Bayesian framework. The corresponding regularization parameters can be treated consistently within the Bayesian setting, without the need for techniques such as cross-validation (see the sketch after this list).
- For classification problems, the tendency of conventional approaches to make overconfident predictions in regions of sparse data can be avoided.
- Bayesian methods provide an objective and principled framework for dealing with the issue of model complexity (for example, how to select the number of hidden units in a feedforward network), and they avoid many of the problems of overfitting that arise when using maximum likelihood.
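As a minimal sketch of how regularization emerges from a prior (this is not the authors' implementation or the toolbox code cited later), the function below evaluates the negative log-posterior that MAP training of a binary classifier minimizes when a cross-entropy error is combined with a Gaussian prior over the weights; the prior precision alpha plays the role of the regularization parameter.

import numpy as np

def neg_log_posterior(weights, predicted_probs, targets, alpha):
    # Cross-entropy error (negative log-likelihood of 0/1 targets) plus a
    # weight-decay penalty that comes from a zero-mean Gaussian prior with
    # precision alpha over the weights.
    eps = 1e-12
    cross_entropy = -np.sum(targets * np.log(predicted_probs + eps)
                            + (1 - targets) * np.log(1 - predicted_probs + eps))
    prior_penalty = 0.5 * alpha * np.sum(weights ** 2)
    return cross_entropy + prior_penalty

In the evidence (ML-II) framework, alpha is not chosen by cross-validation but re-estimated from the training data itself, which is the sense in which the regularization parameters are treated consistently within the Bayesian setting.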
4.3 Forecast Evaluation Indices for Tornado Detection
In the detection paradigm, the forecast results are evaluated by using a suite of forecast evaluation indices based on a contingency table (also known as a "confusion matrix", Table 1).
The cell counts (a, b, c, d) from the confusion matrix can be used to form forecast evaluation indices [9]. With this definition of the confusion matrix, one such index is the Probability of Detection, POD, defined as POD = a/(a+c). POD measures the fraction of observed events that were correctly forecast. Its range is 0 to 1, and a perfect score is 1 (or 100%). Note that POD is sensitive to hits and is therefore useful for rare events. However, POD ignores false alarms, and it can be improved artificially by issuing more "yes" forecasts to increase the number of hits. The False Alarm Rate, FAR, is defined as FAR = b/(a+b). FAR measures the fraction of "yes" forecasts for which the event did not occur. Its range is 0 to 1, and 0 is a perfect rate. FAR is sensitive to false alarms and ignores misses; it can be improved artificially by issuing more "no" forecasts to reduce the number of false alarms. The concept of skill is one where a forecast is superior to some known reference forecast (e.g., random chance). Skill ranges from -1 (anti-skill) through 0 (no skill over the reference) to +1 (perfect skill). Heidke’s skill formulation is commonly used in meteorology since it uses all elements of the confusion matrix and works well for rare event forecasting (e.g., tornadoes) [2]. It is defined as

Heidke’s Skill = 2(ad - bc) / [(a+b)(b+d) + (a+c)(c+d)].
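To make these definitions concrete, the short sketch below (illustrative only, with hypothetical counts) computes POD, FAR, and Heidke's skill directly from the four confusion-matrix cells:

def forecast_indices(a, b, c, d):
    # a: hits, b: false alarms, c: misses, d: correct negatives.
    pod = a / (a + c)        # probability of detection
    far = b / (a + b)        # false alarm rate
    heidke = 2.0 * (a * d - b * c) / ((a + b) * (b + d) + (a + c) * (c + d))
    return pod, far, heidke

# Hypothetical counts, for illustration only:
print(forecast_indices(80, 20, 15, 885))   # -> (0.842..., 0.2, 0.801...)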
5. EXPERIMENTS
In the experiments, the data are split into two sets: training and testing. For the training set, the ratio between tornadic and nontornadic observations is about the same. In the testing sets, the ratio is varied from 2% to 10% in 2% increments, and the sensitivity of each method to this ratio is determined. The cases used for training are different from those used in the testing sets. The same training and testing sets are applied to all methods. All experiments were performed on a Pentium IV computer. The BNN and ANN experiments are performed in the MATLAB environment.
A feedforward neural network with two layers is utilized, in which the first is a hidden layer and the second is an output layer. The number of nodes in the hidden layer is varied over 2, 5, and 10. The experiments show that 5 hidden nodes provide the best results. The network is trained by using a gradient descent with momentum algorithm. The ANN model is run 5 times. For each run, the testing set output is examined. The predicted labels are obtained by discretizing the average of the 5 sets of outputs into -1 (no tornado) or +1 (tornado).
For the BNN, the algorithm implements a two-layer feedforward neural network with a hyperbolic tangent function for the hidden layer and a logistic sigmoid function for the output layer [8]. The weights are optimized with a maximum a posteriori (MAP) approach, using a cross-entropy error function augmented with a Gaussian prior over the weights [5]. The regularization is determined by MacKay's ML-II scheme [4]. For the ANNs the threshold is varied from 0 to 0.8, and for the BNNs the threshold is kept fixed at the value 0.
The data structure consists of an m by n matrix, where m refers to observations and n refers to attributes.
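The test-set construction and output thresholding described above can be summarized in a short sketch (Python rather than MATLAB, with assumed variable and function names; the trained networks themselves are not shown):

import numpy as np

def make_test_set(tornadic, nontornadic, tornado_fraction, rng):
    # tornadic, nontornadic: 2-D arrays of attribute vectors (rows are
    # observations). Keep all tornadic observations fixed and draw enough
    # nontornadic observations so that tornadic cases make up
    # tornado_fraction of the set (2%, 4%, ..., 10% in the experiments).
    n_tor = len(tornadic)
    n_non = int(round(n_tor * (1.0 - tornado_fraction) / tornado_fraction))
    chosen = rng.choice(len(nontornadic), size=n_non, replace=False)
    X = np.vstack([tornadic, nontornadic[chosen]])
    y = np.concatenate([np.ones(n_tor), -np.ones(n_non)])
    return X, y

def discretize(outputs, threshold=0.0):
    # Convert averaged network outputs into -1 / +1 predicted labels.
    return np.where(outputs > threshold, 1, -1)

For instance, tornado_fraction = 0.02 corresponds to the 2% testing set, and sweeping threshold from 0 to 0.8 over the averaged ANN outputs corresponds to the threshold sensitivity examined in the next section.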
6. RESULTS
Figures 1 through 3 depict different performance aspects, through the abovementioned evaluation indices, for each classification method. Each graph describes the relation between a specific statistic and the percentage of tornadic observations in the testing set for all methods. All the methods are trained with a fixed training set. Tuning parameters are determined in the training phase, and these are used to make predictions for different testing sets based on the ratio of tornadic to nontornadic observations. Increasing the number of nontornadic observations while keeping the tornadic observations fixed accounts for the various testing sets. By investigating this ratio, the performance for each forecast validation statistic can be observed. In these experiments the threshold, that is, the value of the output above which a case is categorized as tornadic (1) rather than nontornadic (-1), is varied over the range 0 to 0.8.
The POD results (Fig. 1) indicate that the BNNs are not affected by the percentage of tornadoes retained in the testing set (Fig. 1a), in stark contrast to the results for the ANNs, where the POD rises as a function of increasing percentages of tornadoes analyzed (Fig. 1b). At first glance, this would suggest that the POD of approximately 0.84 for the BNNs is not nearly as high as that of the best ANNs, which have PODs in the 0.85 to 0.9 range. However, a realistic climatological incidence for tornadic days over most of the United States is close to 2 percent, so the difference between the methods is minimal. Since the BNN results are not a function of threshold, Fig. 1a shows no threshold dependence. The ANNs, however, were found to be sensitive to the threshold used to label the output as tornadic or nontornadic; results indicate increased POD as the threshold gets smaller, with a maximum at 0.8.
The FAR results (Fig. 2) show the advantage of BNNs over ANNs. The BNNs (Fig. 2a) have a comparatively low FAR, ranging from about 30 percent to 18 percent as the percentage of tornado events increases from 2 to 10. The ANNs (Fig. 2b)
exhibit a much higher FAR, ranging from almost 60 to
65 percent for the 2 percent tornado case. When the
percentage of tornadoes increases to 10 percent, the
FAR drops into the 30 to 40 percent range. Recall that the lower percentages of tornadoes match the climatological probabilities most closely; hence, the FAR is superior for BNNs. Additionally, the FAR exhibits a mild dependence upon the threshold used in the ANN models, with the results indicating that a threshold of zero provides the lowest FAR.
Finally, examination of skill is useful as it
combines the idea of POD and FAR into one index
and gives an idea of a gain over a random guess. The
skill results for BNNs (Fig. 3a) depict high values in
excess of 0.75 and ranging up to over 0.8 as a function
of increasing percentages of tornadoes examined in
the testing set. Note this would be a 75 percent (or
better) improvement over a random guess. The ANNs
(Fig. 3b) suggest positive skill with lower values than
BNNs, ranging from close to 0.5 for the two percent
tornado case up to around 0.6 when 10 percent
tornadoes are used. There is also a dependence upon threshold, with the best results at larger thresholds.
However, the best ANN results did not rise to the
level of skill shown by the BNNs.
7. CONCLUSIONS
Post-processing the output of radar-derived velocity data has significantly improved the detection of tornadoes and lowered the false alarm rate compared with the raw mesocyclone detection algorithm currently in use.
Based on the performance improvements of BNNs over ANNs, it is recommended that further research be performed on the application of Bayesian neural networks for tornado detection. The high level of skill shown is an improvement over previous research in terms of predictability (reducing chaos) and could result in a considerable reduction in loss of life if implemented operationally.

Acknowledgements: The present work has been partially supported by the NSF grant EIA-0205628.

References
[1] Bishop, C.M., Neural Networks for Pattern Recognition, Oxford University Press, Oxford, 1995.
[2] Doswell, C.A. III, R. Davies-Jones, and D. Keller, On summary measures of skill in rare event forecasting based on contingency tables, Weather and Forecasting, 5, 1990, pp. 576-585.
[3] Haykin, S., Neural Networks: A Comprehensive Foundation, 2nd edition, Prentice-Hall, Upper Saddle River, NJ, 1999.
[4] MacKay, D., The evidence framework applied to classification networks, Neural Computation, 4, 1992, pp. 720-736.
[5] MacKay, D., A practical Bayesian framework for backpropagation networks, Neural Computation, 4, 1992, pp. 448-472.
[6] Marzban, C. and Stumpf, G.J., A neural network for tornado prediction based on Doppler radar-derived attributes, Journal of Applied Meteorology, 35, 1996, pp. 617-626.
[7] Radzicki, M.J., Institutional dynamics, deterministic chaos, and self-organizing systems, Journal of Economic Issues, 24, 1990, pp. 57-102.
[8] Sigurdsson, S., Binary Neural Classifier, version 1.0, 2002, http://mole.imm.dtu.dk/toolbox/ann/
[9] Wilks, D.S., Statistical Methods in the Atmospheric Sciences, Academic Press, London, 1995.

Table 1. Confusion matrix.
                 Observed Yes    Observed No
Forecast Yes     Hits (a)        False alarms (b)
Forecast No      Misses (c)      Correct negatives (d)
Figure 1. POD for different thresholds and different percentages of tornadoes in the test set: (a) Bayesian NN, (b) ANN.

Figure 2. FAR for different thresholds and different percentages of tornadoes in the test set: (a) Bayesian NN, (b) ANN.

Figure 3. Skill for different thresholds and different percentages of tornadoes in the test set: (a) Bayesian NN, (b) ANN.