Volca-Net: A Collaborative Learner Network System
Jillian Green
Abstract
Mount Erebus, located in Antarctica, is a perfect volcano for study because of its exposed lava
lake constantly swirling with fresh magma. A network of sensors has been set up to record events at the
volcano, but because of its environment, the sensors often record not only volcanic happenings, but also
icequakes and other non-volcanic events occurring in the surrounding Antarctic environment. The effort required for human classification of the
data has led scientists to turn to software for automatic event classification. Previous work, however, has
only analyzed the events at a central offsite data archive, after the events have occurred. Our system has
the potential for correct event analysis on the spot, meaning that each sensor’s data collection rate can be
adjusted as the event is occurring. Our project consists of a network of these sensors, each running
machine learning clustering algorithms, collaborating with each other for information regarding their data.
Our goal was to evaluate the usefulness of collaboration among sensors and to determine whether, by
querying its neighbors, each sensor can reach peak accuracy faster.
Introduction
The study of volcanoes has proven to be quite valuable, not only leading scientists to fascinating
discoveries about Earth’s formation and current state, but also offering significant information regarding
the nature and hazards of volcanoes here on Earth as well as on other planets. Mount Erebus in
Antarctica is an excellent volcano for study because of its harsh environmental surroundings, consistent
activity, and exposed lava lake (Aster et al., 2004). Scientists have been drawn to Erebus for its likeness
to other volcanoes, such as Vesuvius, Nyiragongo, and Erta’Ale, as well as its lava lake that constantly
bubbles with fresh magma. Because of this, a network of seismometers, broadband sensors, infrasonic
microphones, infrared sensors, tiltmeters, and video cameras is monitoring Mount Erebus and its lake at
all times (Aster et al., 2004).
One of the questions that emerge from this rich collection of data is exactly what we can
determine from it. One area that would prove extremely beneficial is the ability to classify different
events that are taking place at the volcano based on their characteristics. Understanding what features
distinguish an explosive eruption from an ash-vent eruption, for example, would prove highly useful. This
would increase our ability to predict such events at any given time, and identify changes that take place
prior to these events occurring. If we could determine the type of event taking place as it is happening or
even before it occurs, we might not only adjust the sensors’ responses accordingly, but also save the many
hours it takes for humans to classify these events manually.
Machine learning algorithms are well suited to the event classification described above, because
they analyze data in order to construct automated classifiers or to identify trends and patterns in
data. In general, machine learning systems are trained on a data set to either learn an appropriate model
from which to extract information, or learn to distinguish between certain relevant characteristics from the
data for classification. After training, the system is usually tested on a testing set to evaluate accuracy and
performance before being used on new data.
To date, only one class of machine learning algorithm, the neural network, has been applied to volcanic
data (Langer et al., 2003; Scarpetta et al., 2005). These experiments in classifying volcanic happenings
have always taken place post-event, on a collection of data previously gathered. Part of what makes our
project so valuable is that it offers event classification while the event is taking place, and possibly even
before if there is enough pre-activity. This not only saves the time and resources it takes to manually
classify data, it also supplies extra knowledge of the event early enough to adjust data collection
techniques and aid with other event-related experiments. The ability to correctly classify events taking
place at Erebus before or during their onset would open up a whole new realm of experimental
capabilities for the scientists currently studying its behavior.
Our project aims to correctly identify volcanic happenings through collaborative clustering
amongst the network of sensors at Erebus. The sensors will query each other for data relationships as
they are recording data, receiving information about the event that is currently taking place. We
developed methods for each learner to choose two of these events, query its neighbors for a relationship
between them, and create pairwise constraints for these events based on the responses received. The
motivation behind this collaboration is that perhaps with more information provided from neighboring
sensor locations, the sensors may more accurately analyze their own data, and thus reach correct
conclusions faster than when working in isolation. Overall, we found that incorporating collaboration
into a network of clusterers does yield improvement. The rate at which the network improves through collaborating
depends largely on the correctness of the constraints adopted, which is heavily dependent on the accuracy
of neighboring clusterers and the points actively selected for query. Collaboration has proven to be a
new and helpful tool for machine learning algorithms, and further improvement within a collaborating
network of clusterers is likely with the addition of variations on this method.
Theoretical Background
Java
Java is a programming language developed by Sun Microsystems that derives much of its syntax
from C and C++. It is most useful for its platform portability, which makes it ideal for networking, as
well as other projects that extend to multiple platforms. Most Java technologies are free software under
the GNU General Public License, making it easy to obtain and maintain. We chose to use Java to construct
a simulation of the sensor network currently in place at Mount Erebus because of its cross-platform
compatibility, and because Weka provides Java implementations of machine learning algorithms.
Weka
Weka is an open source Java software library developed by the University of Waikato for
machine learning and data mining (Witten and Frank, 2005). Its algorithms can be applied directly to an
input dataset or called from user-written Java code. Weka has
methods incorporated into it that support clustering, visualization, pre-processing, classification, and other
machine learning methods. Weka was helpful for us as a tool for representing the volcano sensor data as
a set of Instances with the ability to manipulate certain features. Its graphical user interface was also
useful for determining the most pertinent data features to extract, as being able to observe the feature
space helps make certain data relationships apparent.
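As a brief illustration of how Weka can be driven from Java (a sketch for orientation only, not code from our actual system), the following loads an .arff file, hides the class attribute from the algorithm, and runs Weka's standard SimpleKMeans clusterer. The file name "erebus.arff" and the choice of k are placeholders; our experiments used constrained variants of k-means discussed later rather than plain SimpleKMeans.

    import weka.clusterers.SimpleKMeans;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;
    import weka.filters.Filter;
    import weka.filters.unsupervised.attribute.Remove;

    public class ErebusClusterDemo {
        public static void main(String[] args) throws Exception {
            // Load the event data (hypothetical file name).
            Instances data = DataSource.read("erebus.arff");

            // Clustering is unsupervised, so remove the class attribute
            // (assumed here to be the last attribute) before clustering.
            Remove remove = new Remove();
            remove.setAttributeIndices("" + data.numAttributes()); // 1-based index of last attribute
            remove.setInputFormat(data);
            Instances unlabeled = Filter.useFilter(data, remove);

            // Cluster into k = 2 groups (e.g., eruption vs. icequake).
            SimpleKMeans kmeans = new SimpleKMeans();
            kmeans.setNumClusters(2);
            kmeans.setSeed(42);
            kmeans.buildClusterer(unlabeled);

            // Report each instance's assigned cluster.
            for (int i = 0; i < unlabeled.numInstances(); i++) {
                System.out.println("Instance " + i + " -> cluster "
                        + kmeans.clusterInstance(unlabeled.instance(i)));
            }
        }
    }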
The Data
Weka machine learning algorithms typically take in data organized into an .arff file (attribute-relation file format) that contains any number of attributes about the data and, if it is training data, a class
attribute. The class attribute is the true class of the data item. In the case of supervised machine learning
algorithms, the class attribute assists in training, but in unsupervised algorithms (such as clustering) the
class attribute is unknown to the algorithm and is used only for performance evaluation.
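For concreteness, a minimal .arff file for this domain might look like the following. The attribute names and values here are invented for illustration and are not taken from the actual Erebus data files:

    @relation erebus-events

    @attribute peakFrequency numeric
    @attribute offset numeric
    @attribute duration numeric
    @attribute class {eruption, icequake}

    @data
    2.31, 0.8, 14.2, eruption
    6.05, 3.1, 9.7, icequake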
We developed and tested our project using benchmark UCI data sets (Asuncion and Newman,
2007), to get an idea of performance before working on the actual volcanic data. We were able to obtain
some data from the sensor nodes at Erebus by working with the New Mexico Institute of Mining and
Technology. They offered us a set of data collected over the course of two months at the volcano,
containing recordings relating to 36 eruptions and 6 icequakes, already painstakingly classified
manually by their researchers. We used this labeled data set to evaluate our network of machine learner
clusterers, using measures of agreement between data partitions, which will be explained in further detail
later.
We analyzed the data from the volcano in order to extract the most relevant features and use them
for clustering. Figure 1 shows an explosive eruption from January 1st, 2006, as recorded by six different
stations. One can observe how each sensor node recorded the event from its own perspective, as each
graph differs to some degree even though they were all documenting the same event. These differences
arise from each sensor’s location relative to the event and from its method of data collection. Based on
these perspectives, we tried to localize where the events were happening, and also how an event’s features
and impact differed at different areas of the volcano.
Figure 1: Seismic data of a January 1st, 2006 explosive eruption as recorded at the six
different stations CON, EIS, LEH, NKB, RAY, and HOO. The x-axis is time, recorded in
seconds. Each sensor records the event differently, depending on the event’s location in
relation to the sensor.
We discovered by analyzing the data that certain attributes seemed to most clearly define the
difference between an icequake and an eruption (the two primary events we were trying to distinguish),
such as peak seismic frequency and offset (elapsed time from detection at EIS to detection at the current
node). However, these attributes are not always uniform, as not all of the sensors measure the same thing.
We received data that came predominantly from four seismic sensors and one infrasonic (acoustic)
sensor, with the former recording fluctuations in seismic frequencies and the latter documenting pitch
variations. One of the goals for our project was to determine the value of the acoustic sensor, by running
multiple clustering rounds with and without its data contributing. We wanted to determine whether the
acoustic data helped improve the ability of the network to distinguish between eruptions and icequakes, or
if it actually just confused the rest of the sensors in the network because of its dissimilar data. We expect
eruptions to have an acoustic signal but icequakes to be silent.
Algorithm Descriptions and Implementations
Our project focuses on collaborative machine learning and topology exploitation (Lane and
Wagstaff, 2007). The largest contribution of this project is experimenting with collaborative learners, as
previous work has only analyzed data from a single station in isolation (Langer et al., 2003; Scarpetta et
al., 2005). A machine learning system is generally trained on a portion of the data before reaching peak
accuracy, when it is then tested for correctness. The idea behind this collaboration of machine learners is
that each sensor node at Erebus will be an individual machine learner, running its algorithm on the data it
collects at its location, but maintaining contact with the other sensors (learners) at the volcano. Ideally,
each node will query the other learners in the network, obtain information regarding current events
occurring at the volcano from them, and individually reach peak accuracy faster because of this added
input. Another major contribution of this project is extending evaluation of volcanic data to include a
broader class of machine learning algorithms, particularly clustering.
Clustering is an unsupervised machine learning technique that groups data items based on similarity, even
when pre-existing labels are not available. It is especially useful for extracting trends, for it
determines natural relationships between the data, identifying populations that tend to group together. It
is expected that data relating to similar event types will cluster together based on their input similarities.
Since clustering algorithms group data together without prior labels, they have the potential to single out
irregular types of events, as well as new and different ones that we have never identified before.
We chose to incorporate two different versions of k-means clustering, which is the most
straightforward of clustering algorithms. K-means works as follows: first, the number of clusters k is
chosen, and k cluster center locations are placed randomly in the feature space (the space in which all of
the data is plotted). Next, each data point is assigned to the cluster of the centroid it is closest to. Each
centroid is then recomputed as the mean of all of the points assigned to it, and the centers jump to these
new locations. The assignment and update steps are repeated until no item assignments change, at which
point convergence is reached.
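A minimal Java sketch of these two alternating steps follows (illustrative only; our system used Weka's implementations rather than this hand-rolled version):

    import java.util.Arrays;
    import java.util.Random;

    /** Minimal k-means: assign each point to its nearest centroid, then
     *  recompute each centroid as the mean of its points, until no
     *  assignment changes. */
    public class KMeansSketch {
        public static int[] cluster(double[][] points, int k, long seed) {
            Random rnd = new Random(seed);
            int n = points.length, d = points[0].length;
            double[][] centroids = new double[k][];
            // Random initialization: copy k randomly chosen data points as centers.
            for (int j = 0; j < k; j++) centroids[j] = points[rnd.nextInt(n)].clone();
            int[] assign = new int[n];
            Arrays.fill(assign, -1);          // force at least one full iteration
            boolean changed = true;
            while (changed) {
                changed = false;
                // Assignment step: nearest centroid by squared Euclidean distance.
                for (int i = 0; i < n; i++) {
                    int best = 0;
                    double bestDist = Double.MAX_VALUE;
                    for (int j = 0; j < k; j++) {
                        double dist = 0;
                        for (int a = 0; a < d; a++) {
                            double diff = points[i][a] - centroids[j][a];
                            dist += diff * diff;
                        }
                        if (dist < bestDist) { bestDist = dist; best = j; }
                    }
                    if (assign[i] != best) { assign[i] = best; changed = true; }
                }
                // Update step: each centroid moves to the mean of its points.
                double[][] sums = new double[k][d];
                int[] counts = new int[k];
                for (int i = 0; i < n; i++) {
                    counts[assign[i]]++;
                    for (int a = 0; a < d; a++) sums[assign[i]][a] += points[i][a];
                }
                for (int j = 0; j < k; j++)
                    if (counts[j] > 0)        // leave a centroid in place if it owns no points
                        for (int a = 0; a < d; a++) centroids[j][a] = sums[j][a] / counts[j];
            }
            return assign;
        }
    }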
Figure 2: K-means clustering objective function
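For reference, the standard k-means objective is the total squared distance from each of the n data points x_i to the nearest of the k cluster centers \mu_j, which the procedure above locally minimizes:

    J = \sum_{i=1}^{n} \min_{1 \le j \le k} \lVert x_i - \mu_j \rVert^2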
The two different versions of k-means clustering are called PCKMeans and MPCKMeans. Both
of these incorporate different types of constraints into k-means, and provide significant improvement on
the original algorithm (Bilenko et al., 2004). A pairwise constraint takes two data items and assigns either
a “must-link” or a “cannot-link” relationship to them. Pairwise constraints are usually derived from
pre-existing knowledge of the data, but in our project they were derived from each neighboring
sensor’s answers to queries issued about the data. For example, sensor node 1 may make a query to the
other sensors in the network, “what kind of relationship did you obtain for data items 36 and 45?” The
other nodes will respond with either “I grouped them together in the same cluster,” or “I put them in
different clusters,” which translates to a must-link or a cannot-link, respectively. We implemented several
unification strategies for how the querying node incorporates this new information. One
unification strategy is majority vote (MV), where a node receives pairwise relationships from all other
nodes in the network, and adopts the constraint associated with the majority of the other learners. So,
continuing the above example, if sensor node 2 responds with a must-link, but nodes 3 and 4 both
respond with a cannot-link, then node 1 (the node that originally issued the query) will create a cannot-link constraint between items 36 and 45, since that link received more votes. With majority vote, there is the chance
for a tie if there are an odd number of sensors (which means an even number of voting neighbors). In the
case of a tie, the querying learner abstains for that round, and does not adopt a new constraint at all.
Another unification strategy is called consensus vote (CV), where the querying learner only adopts a new
constraint if every other learner in the network agrees on the type of link associated with the two data
points. The motivation behind consensus vote is that while CV learners may abstain more often and thus
accept fewer constraints, the constraints they do adopt are more likely to be correct.
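A small sketch of the two unification strategies, under the simplifying assumption that each neighbor returns one of two link types (names here are illustrative, not from our actual code):

    import java.util.List;

    /** Majority vote adopts the most common answer, abstaining on ties;
     *  consensus vote adopts a constraint only when every neighbor agrees. */
    public class Unification {
        public enum Link { MUST_LINK, CANNOT_LINK }

        /** Majority vote: returns null to signal abstention on a tie. */
        public static Link majorityVote(List<Link> votes) {
            int must = 0;
            for (Link v : votes) if (v == Link.MUST_LINK) must++;
            int cannot = votes.size() - must;
            if (must == cannot) return null;          // tie: abstain this round
            return must > cannot ? Link.MUST_LINK : Link.CANNOT_LINK;
        }

        /** Consensus vote: adopt only if all neighbors agree; otherwise abstain. */
        public static Link consensusVote(List<Link> votes) {
            if (votes.isEmpty()) return null;
            Link first = votes.get(0);
            for (Link v : votes) if (v != first) return null;
            return first;
        }
    }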
PCKMeans makes use of these pairwise constraints for clustering as well as centroid
initialization. It clusters with respect to the constraints, and calculates a penalty associated with any
constraint that must be violated. Sometimes constraints must be violated in order to achieve convergence,
and PCKMeans chooses the violations that will result in the smallest penalty, and still give the least
variance within its clusters. PCKMeans also uses the given pairwise constraints for choosing better initial
cluster centers (instead of random initialization as in regular k-means clustering), which can lead to faster
convergence and higher accuracy. For example, if there is a cannot-link constraint between two data
points, these two points may be initialized as two of the centroids themselves. In the same way, if there is
a must-link constraint between two data items, a cluster center may be initialized to be the point exactly
between these two points. Both of these scenarios improve on random initialization, potentially cutting
out a few iterations or saving a centroid from being badly misplaced (Bilenko et al., 2004).
Figure 3: Example of how a pairwise constraint (e.g., knowing that two particular points should be in the
same cluster) would force the clusters to be arranged differently from the original cluster assignments
obtained during unconstrained clustering.
Similarly, MPCKMeans uses pairwise constraints, but also incorporates different distance metrics
into the algorithm. Metric-learning generally adjusts the distance metric within the feature space to
satisfy the training data, or the data that has been correctly labeled (Bilenko et al., 2004). Since our
learners are not given any labeled data to begin with, MPCKMeans instead adjusts the distance metric
based on the pairwise constraints and the unlabeled data. For example, if the clusterer has a must-link
constraint between two points very far apart in the feature space, this suggests that the metric should be
adjusted in some way to make these points naturally lie closer together. The same idea occurs with
cannot-link constraints, where very near points that should not be in the same cluster suggest a stretching
of the feature space. Metric learning has proven to be an effective form of semi-supervised clustering,
as not all attributes affect distance measurements in the same way. In this report, however, we will focus
primarily on the results obtained from PCKMeans.
It has also been shown that actively choosing certain data points to query for constraints is a
significant improvement over random selection of these points (Basu et al., 2004; Xu et al., 2005). For
example, if a clusterer receives constraints about two items that are already in or near the centers of its
clusters, the constraint might not even change its clustering at all, but if it received a constraint regarding
two points that it was less confident about, it might rearrange its centers to adapt to the new information.
We developed several active selection strategies that a clusterer uses to choose which data points it
would like to query its neighbors about. The goal of this active selection is to choose points that are
most unknown to the clusterer, presumably the points that lie on the cluster borders, and not near a
centroid. The tricky part about this active selection within our network is that if the query is too difficult,
the neighboring sensors are more likely to get the link wrong, but if the query is too easy (or known), it
will not have any effect on the clustering.
To implement this active selection, after initial clustering, the clusterer calculates the distances
from all of the points to their nearest centroids. For each point, it then takes the difference between the
distances to the two closest cluster centers; the point with the smallest difference is the one situated most
evenly between two cluster centers. We developed a few forms of active selection, one of
them being actively choosing two of these in-between points (which we called “limbo” points), and
another form being choosing one “limbo” point, and pairing it with one point that is very near to a cluster
center. The motivation for this choice is that if a “limbo” point is paired with a better-known
point (one whose cluster membership is known with high confidence), the pairwise relationship
between the two may be very helpful in determining which cluster the “limbo” point
belongs to. Even with three or more clusters, this form of active selection would still inform the
clusterer of some important data relationships. These forms of active selection were compared with
random selection, in which the clusterer randomly chooses two of its data instances to query its neighbors
about.
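A sketch of how a “limbo” point might be identified, assuming the clusterer has already computed the distance from every point to every centroid (again, names are illustrative):

    /** "Limbo" point selection: the point whose margin between its two
     *  nearest centroids is smallest sits most evenly between two clusters,
     *  making it the least certain and a natural candidate for a query. */
    public class ActiveSelection {
        /** distances[i][j] = distance from point i to centroid j (k >= 2). */
        public static int pickLimboPoint(double[][] distances) {
            int limbo = -1;
            double smallestMargin = Double.MAX_VALUE;
            for (int i = 0; i < distances.length; i++) {
                // Find the two smallest centroid distances for point i.
                double d1 = Double.MAX_VALUE, d2 = Double.MAX_VALUE;
                for (double dist : distances[i]) {
                    if (dist < d1) { d2 = d1; d1 = dist; }
                    else if (dist < d2) { d2 = dist; }
                }
                double margin = d2 - d1;   // small margin = point "in limbo"
                if (margin < smallestMargin) { smallestMargin = margin; limbo = i; }
            }
            return limbo;
        }
    }

The activeLimboClose strategy would pair the returned point with one near a centroid (i.e., one with a large margin), while activeTwoLimbos would take the two smallest-margin points.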
There is also another option incorporated into our experiment, called broadcast. When a learner
puts together a new constraint based on its neighbors’ answers to its query, it can either keep the
constraint for itself only or broadcast it to the other learners. Broadcasting can be very effective in
boosting performance if the constraints being passed around are correct, or have a very negative effect if
the constraints are noisy.
Performance Evaluation
Clustering evaluation is usually done in terms of the Adjusted Rand Index, which measures
similarity between data partitions through pairs of data items (Hubert and Arabie, 1985). Since clustering
is done without labels, one cannot evaluate it in terms of the percent of items correctly classified. The
Adjusted Rand Index can be used to compare similarities between two partitions, or, if the true labels of
the data are known, it can be used to calculate similarity to the correct partition of the data. To calculate
the Rand Index, one goes through every pair of items in the data and checks whether those two items were
put into the same cluster or into different clusters. The appropriate count is then incremented
according to how the other clusterer being compared clustered those items. If both clusterers put them
together or both put them apart, the pair increments a or b, which count the pairs of items that were clustered similarly
by both clusterers. If a clusterer is compared against the true data labels, one simply checks whether
the pair of items falls in the same class or in different classes. After going through all of the pairs of data
items, the number of similarly paired items (a+b) is divided by the total number of pairs (see Figure 4
below).
a = pairs in the same cluster in X and the same cluster in Y
b = pairs in different clusters in X and different clusters in Y
c = pairs in the same cluster in X but different clusters in Y
d = pairs in different clusters in X but the same cluster in Y

RI = (a + b) / (a + b + c + d)

ARI = (RI - E[RI]) / (max(RI) - E[RI])

Figure 4: Variables defined for calculating the Rand Index between two clusterers, X and Y; the Rand
Index (RI); and the Adjusted Rand Index (ARI), which compensates for agreement by chance (E[RI]
denotes the index value expected by chance).
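As an illustration, the Rand Index can be computed directly from the pair counts defined in Figure 4. The sketch below (illustrative, not code from our system) also computes the adjusted form using a standard pair-counting formulation of the Hubert and Arabie (1985) index:

    /** Pair-counting Rand Index and Adjusted Rand Index for two clusterings
     *  x and y of the same n items, following the a/b/c/d definitions above. */
    public class RandIndex {
        public static double rand(int[] x, int[] y) {
            long[] c = pairCounts(x, y);
            return (double) (c[0] + c[1]) / (c[0] + c[1] + c[2] + c[3]);
        }

        /** Standard pair-counting form of the Hubert-Arabie ARI. */
        public static double adjustedRand(int[] x, int[] y) {
            long[] counts = pairCounts(x, y);
            long a = counts[0], b = counts[1], c = counts[2], d = counts[3];
            double num = 2.0 * (a * b - c * d);
            double den = (double) (a + c) * (c + b) + (double) (a + d) * (d + b);
            return den == 0 ? 1.0 : num / den;  // identical partitions give ARI = 1
        }

        /** Returns {a, b, c, d}: same/same, diff/diff, same/diff, diff/same. */
        private static long[] pairCounts(int[] x, int[] y) {
            long a = 0, b = 0, c = 0, d = 0;
            for (int i = 0; i < x.length; i++) {
                for (int j = i + 1; j < x.length; j++) {
                    boolean sameX = x[i] == x[j], sameY = y[i] == y[j];
                    if (sameX && sameY) a++;
                    else if (!sameX && !sameY) b++;
                    else if (sameX) c++;
                    else d++;
                }
            }
            return new long[] {a, b, c, d};
        }
    }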
The Adjusted Rand Index compensates for agreement by chance, while the regular Rand Index
simply calculates the percentage of correctly grouped pairs. While the Rand Index gives a number
between zero (not similar at all) and one (exactly the same partitions), the Adjusted Rand Index can
go below zero if the agreement is less than that expected by random chance. We used the
Adjusted Rand Index to evaluate our network, comparing our partitions against the true classes of the data
in the original data set from New Mexico Tech. Figure 5 compares the performance of four unification
strategies: oracle, in which every data pair queried is given the correct link (providing an idea of the
potential of each selection strategy); majority vote (MV); consensus vote (CV); and self, in which the clusterer does
not collaborate and only uses constraints from its own clustering. In this experiment, one pairwise
constraint is adopted by one clusterer at each round. These results show that
collaboration is indeed useful: the clusterers that have only themselves to rely on (instead of their
neighbors) cannot improve at all after initial clustering. These results are from a four-learner network
clustering the data from Mount Erebus, using the activeLimboClose selection strategy, in which each learner queries a
pair consisting of one “limbo” point (a point that falls directly between two cluster centers, creating doubt
about which cluster it belongs to) and one point that is close to a cluster centroid. It can be seen from
Figure 5 that there is significant room for improvement as far as majority and consensus vote are
concerned; however, the fact that the oracle can improve so much shows promise for collaborative machine
learning in general. It can be seen that collaboration still improves the performance of MV (even if
only for certain rounds), although not as much as oracle. It should also be noted that the data collected
from Mount Erebus is a particularly difficult dataset.
Figure 5: PCKMeans with a 4 Learner Network. Each unification strategy (oracle, MV,
CV, and self) is shown using the activeLimboClose selection strategy. Average over 10
trials.
One might also notice that majority vote tends to peak at some point, then drop lower than
where it began. This is possibly due to the network as a whole converging toward a wrong solution, driven
by incorrect constraints. As each learner adopts new constraints based on its neighboring sensors’
responses, it will learn according to their solutions, regardless of whether they are correct.
The different selection strategies had a great impact on the results, as can be seen in Figure 6.
Even among only three different strategies, both oracle and MV differ considerably in performance. It is
interesting to note that random and activeTwoLimbos perform best with oracle, whereas
activeLimboFar does not do as well. But with MV (where some of the constraints are incorrect),
activeLimboFar does much better than random selection.
Figure 6: PCKMeans, MEVO Data Set, 4 Learner Network showing 3 different selection
strategies: random, activeLimboFar, and activeTwoLimbos. Average over 10 trials.
The results obtained from the different selection strategies caused us to consider the correctness
of the constraints as a decisive factor in clustering performance. Because of this, we delved into constraint
cleanup, or trying to ensure that the constraints adopted by clusterers in the network do not contradict
each other. We developed a method that goes through each clusterer’s list of constraints and calculates all
the implied constraints from them. An implied constraint is a constraint that follows logically from
other constraints. For example, if a clusterer has a must-link constraint between items 1 and 2, and a
must-link constraint between items 2 and 3, then it follows that there is an implied must-link constraint
between items 1 and 3 as well. In this method, we calculate all of these implied constraints associated
with each newly generated constraint and delete the older constraint in the case of a conflict. (It is
assumed that, because learning is occurring, the newer constraints are more accurate.)
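One way to detect such conflicts efficiently is to treat must-links as a union-find structure, since must-links are transitive. The sketch below (illustrative, not our actual implementation) flags a new constraint that contradicts the constraints implied by the existing pool; the resolution policy, such as preferring the newer constraint as described above, is left to the caller:

    import java.util.ArrayList;
    import java.util.List;

    /** Items joined by must-links form union-find components; a cannot-link
     *  whose endpoints share a component conflicts with an implied must-link,
     *  and a must-link merging two cannot-linked components also conflicts. */
    public class ConstraintPool {
        private final int[] parent;
        private final List<int[]> cannotLinks = new ArrayList<>();

        public ConstraintPool(int numItems) {
            parent = new int[numItems];
            for (int i = 0; i < numItems; i++) parent[i] = i;
        }

        private int find(int i) {
            while (parent[i] != i) { parent[i] = parent[parent[i]]; i = parent[i]; }
            return i;
        }

        /** Returns false if the must-link conflicts with an existing cannot-link. */
        public boolean addMustLink(int i, int j) {
            int ri = find(i), rj = find(j);
            for (int[] cl : cannotLinks) {
                int a = find(cl[0]), b = find(cl[1]);
                if ((a == ri && b == rj) || (a == rj && b == ri)) return false;
            }
            parent[ri] = rj;  // merge the two must-link components
            return true;
        }

        /** Returns false if the cannot-link conflicts with an implied must-link. */
        public boolean addCannotLink(int i, int j) {
            if (find(i) == find(j)) return false;  // already (transitively) must-linked
            cannotLinks.add(new int[] {i, j});
            return true;
        }
    }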
Figure 7 shows PCKMeans running a 10-learner network on the Mount Erebus data, emphasizing
the differences made by constraint cleaning. The figure shows that tidying up the constraint pools
for each learner does make a difference, at times keeping the average per-learner ARI in the network just
a bit higher than it would otherwise be, and also smoothing out some dips in ARI. It is also interesting to note how
the cleaning has more or less of an impact at different times during the experiment according to the
selection strategy. For random selection, constraint cleaning barely changes any performance until at least
450 rounds, whereas for activeTwoLimbos it does indeed change the constraint pools enough to affect
performance by the 300th round. This indicates that certain selection strategies tend to generate more
conflicting constraints than others, and that there is value in reconciling them.
Figure 7: PCKMeans with a 10 Learner Network on the MEVO data. Shows the effects of
constraint cleaning compared to without constraint cleaning. Average over 10 trials.
The Mount Erebus data set is a very difficult one to cluster, as some of the events quite literally
map directly on top of each other in the feature space. In order to get a more accurate idea of the
contribution of collaborative machine learning, we experimented with removing a couple of events, thus
making the data set slightly easier to cluster. We hypothesized that if the initial ARI was higher it would
provide more room for learning, as a low start is particularly difficult to recover from. We developed this
idea by studying the results for testing the network on the UCI data sets (Asuncion and Newman, 2007),
and noticed more improvement when the initial ARI was higher, most likely because the constraints
are more likely to be correct. The modified data set caused the initial ARI (with no collaboration) to jump
from approximately 17 to 20, and for majority vote the network was able to reach a peak ARI of about 49
instead of around 25. This shows the potential for collaborative clustering networks in general, as much
learning is achieved within the network, without any outside source of information.
Another variation on our techniques is the number of clusterers placed at each sensor node in the
network. Even though they would be clustering the same data, their clusterings might still differ, because
k-means starts from randomly selected cluster centers, and the added clusterers might be
able to offer more information to the network. Figure 8 shows the results for placing one, two, and three
clusterers at each node. The contribution of these additional clusterers is minimal but present
nonetheless: the peak ARI reached with majority vote jumps from roughly
48 to over 50 with the added clusterers. In these figures, active selection refers to the activeTwoLimbos
strategy.
Figure 8: Results for PCKMeans clustering with one, two, and three clusterers at each
sensor node. Active selection refers to activeTwoLimbos. Average over 10 seeds.
Conclusions and Future Work
This project has shown that collaboration can be quite beneficial to a network of machine
learners. Even if the constraints adopted are faulty, learning can and does still occur. Future research in
this area should focus on ensuring correct constraints, perhaps by incorporating link or learner
confidences.
Further work can also be done on querying: each clusterer might choose its own high
confidence points to broadcast to the others in the network, rather than querying based on its own pool.
This would be in effect the reverse of the type of constraint adoption strategy that we have developed, but
might offer some different collaborative contributions. Other variations on constraint adoption might also
be researched further. This experiment should also be run on a larger dataset from Mount Erebus, when it
becomes available. The difficulty of the data being clustered has a large impact on the performance of the
collaborating clusterers, as can be seen when comparing the performance of the network on the Mount
Erebus data with the UCI datasets and the slightly smaller Erebus data set. It is probable that with more
data available for clustering, the clusterers would converge to better solutions as a
network.
Acknowledgements
This research was partly carried out at the Jet Propulsion Laboratory, California Institute of Technology,
and was partly sponsored by the Summer Undergraduate Research Fellowship program and the National
Aeronautics and Space Administration.
References

Aster, R., et al. (2004). "Real-time data received from Mount Erebus volcano, Antarctica." Eos, 85(10),
pp. 97-104.

Asuncion, A. and Newman, D.J. (2007). UCI Machine Learning Repository
[http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA: University of California, School of
Information and Computer Science.

Basu, S., Banerjee, A., and Mooney, R. J. (2004). "Active Semi-Supervision for Pairwise Constrained
Clustering." Proceedings of the SIAM International Conference on Data Mining (SDM-2004), Lake
Buena Vista, FL.

Bilenko, M., Basu, S., and Mooney, R. J. (2004). "Integrating Constraints and Metric Learning in
Semi-Supervised Clustering." Proceedings of the 21st International Conference on Machine Learning
(ICML-2004), pp. 81-88, Banff, Canada.

Hubert, L. and Arabie, P. (1985). "Comparing Partitions." Journal of Classification, vol. 2, pp. 193-218.

Lane, T. and Wagstaff, K. (2007). "Synergistic Machine Learning: Collaboration and Topology
Exploitation in Dynamic Environments." NSF proposal to the Division of Information and Intelligent
Systems, accepted July 24, 2007.

Langer, H., Falsaperla, S., and Thompson, G. (2003). "Application of Artificial Neural Networks for the
classification of the seismic transients at Soufriere Hills volcano, Montserrat." Geophysical Research
Letters, vol. 30, no. 21.

MacQueen, J.B. (1967). "Some Methods for Classification and Analysis of Multivariate Observations."
Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Berkeley,
University of California Press, pp. 281-297.

Scarpetta, S., Giudicepietro, F., Ezin, E. C., Petrosino, S., Del Pezzo, E., Martini, M., and Marinaro, M.
(2005). "Automatic Classification of Seismic Signals at Mt. Vesuvius Volcano, Italy, Using Neural
Networks." Bulletin of the Seismological Society of America, vol. 95, no. 1, pp. 185-196.

Witten, I. H. and Frank, E. (2005). Data Mining: Practical Machine Learning Tools and Techniques, 2nd
Edition. Morgan Kaufmann, San Francisco.

Xu, Q., desJardins, M., and Wagstaff, K. L. (2005). "Active Constrained Clustering by Examining
Spectral Eigenvectors." Proceedings of the Eighth International Conference on Discovery Science, pp.
294-307.