International Conference ‘Science in Technology’ SCinTE 2015

Spatio–temporal cluster analysis of seismicity using a modified density–based clustering algorithm

Dionysios MOUNTAKIS 1*
1 University of Peloponnese, Dpt. of Computer Science and Technology, Tripoli 22100
*[email protected]

Keywords: cluster-analysis, dbscan, seismicity, evaluation, validation

Abstract
Statistical pattern analysis is a common approach in modern seismology. Our main goal is to identify natural underlying structural patterns in seismicity using density-based cluster analysis. The main issue with seismic clustering is the evaluation of density-based, arbitrarily shaped clusters, since the literature offers almost no validation criterion designed for density-based cluster analysis or for distinguishing the presence of noise. A second problem is that different seismic clusters can overlap spatially and so appear as a single cluster. Such a cluster, however, usually encompasses families of events with distinct characteristics, i.e. “classification of detected clusters into several major types, generally corresponding to singles, burst-like and swarm-like sequences” (Zaliapin and Ben-Zion, 2013). This report compares the behavior of the Density Based Spatial Clustering of Applications with Noise algorithm (DBSCAN) (Daszykowski et al, 2001) with the performance of a modified DBSCAN, within the vicinity of the Hellenic seismic arc and the surrounding Hellenic area. The modified DBSCAN uses weighting parameters that weight seismic events according to their energy emission. The algorithm has been implemented in the Matlab™ suite. The results are compared and discussed in order to assess the extent of the claimed benefits.
Introduction
Recent results, even though they remain in question, have made it possible to delineate the behavior of seismic activity within the Hellenic seismic arc, which to this day remains a challenge for seismology. Until recent years, earthquakes were believed to be randomly occurring events depending on the movement among the colliding continental plates. Earthquakes occur when tectonic plates collide or when the elastic energy accumulated at a specific area along a regional fault exceeds a threshold, causing a rupture. The natural question is whether these events can be predicted in the space and time domains, and what their approximate magnitude would be. Recent studies have concluded that earthquakes are not randomly occurring events but follow a certain pattern (Vallianatos et al, 2013). Identifying and analyzing the spatiotemporal characteristics of such a pattern would provide a better understanding of the mechanics and underlying physics of the earthquake phenomenon. Modern computational techniques, such as neural networks and clustering algorithms, can assist in decoding such patterns. DBSCAN has the advantage of identifying clusters of arbitrary shape, improving cluster scalability and efficiency. The DBSCAN source code was written by Daszykowski (Daszykowski et al, 2001).

Methodology
A brief reference to the DBSCAN algorithm has already been made. The classic DBSCAN requires two input parameters: the Epsilon radius (Eps) of the Eps-neighborhood and the minimum number of points (MinPts) that must lie in this Eps-neighborhood. Points belonging to different neighborhoods can be density-connected, density-reachable or directly density-reachable.
The union of these relations forms a cluster, arbitrarily shaped or not (Ester et al, 1995, Ester et al, 1996, Daszykowski et al, 2001). One of DBSCAN’s major disadvantages is its poor handling of data with areas of different density, i.e. some areas of ‘thicker’ data than others (Drakatos and Latoussakis, 2001). Ideally, a new set of input parameters should be selected each time the data density changes, instead of global parameters. Since the combination of MinPts and Eps-radius cannot be chosen independently for the various density formations, a number of separate clusters cannot be individually identified, resulting in spatial overlapping and clusters within clusters.

In an effort to address this problem, our approach incorporates the relation between the aftershock duration T and the magnitude M of the main shock (Eq. 1), and the relation between the subsurface rupture length L and the magnitude M of the main shock (Eq. 2). These quantities are evaluated with the following expressions (Drakatos and Latoussakis, 2001):

log(T) = 0.51M - 1.15    (1)
log(L) = 0.35M - 0.62    (2)

The key part of the proposed approach is that the Minimum Points parameter (MinPts) has been replaced by the earthquakes’ magnitudes and times of occurrence, both normalized on the spatial and temporal planes, i.e. the algorithm calculates the normalized dimensionless values that lie inside a predefined Eps-radius. If the result exceeds a specified threshold, an Eps-neighborhood is formed; if not, the point is either a border point or a noise point. Cluster expansion then follows the same rules as in the traditional DBSCAN. Our model is inextricably associated with the emitted energies of main events and their aftershock sequences (Yang and Lee, 2004, Petersen et al, 2008, Vallianatos et al, 2013, Yeck et al, 2015).
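As an illustration of Eqs. 1 and 2 and of the thresholding step described above, the following Python sketch may help. It is not the Matlab implementation used in this work. The base-10 logarithm, the units (days for T, km for L), the function names, and the particular rule of summing per-event weights inside the Eps-radius are all assumptions made for the example; the text does not specify the exact weighting formula.

```python
import math


def aftershock_duration(magnitude):
    """Eq. 1: log(T) = 0.51 M - 1.15 (base-10 log and days are assumed)."""
    return 10 ** (0.51 * magnitude - 1.15)


def rupture_length(magnitude):
    """Eq. 2: log(L) = 0.35 M - 0.62 (base-10 log and km are assumed)."""
    return 10 ** (0.35 * magnitude - 0.62)


def is_core_event(index, events, eps, threshold):
    """Hypothetical weighted replacement for the MinPts test.

    events: list of (x, y, t, weight) tuples whose coordinates and
    weights are already normalized to dimensionless values.
    Instead of counting neighbours, the weights of all events within
    the Eps-radius (including the event itself) are summed and the
    sum is compared with a threshold.
    """
    xi, yi, ti, _ = events[index]
    total_weight = 0.0
    for x, y, t, weight in events:
        if math.dist((xi, yi, ti), (x, y, t)) <= eps:
            total_weight += weight
    return total_weight >= threshold


# Illustrative, made-up normalized events: (x, y, t, weight).
events = [
    (0.00, 0.00, 0.00, 0.9),  # strong event
    (0.05, 0.00, 0.00, 0.3),  # nearby weaker event
    (0.00, 0.05, 0.01, 0.2),  # nearby weaker event
    (0.90, 0.90, 0.90, 0.1),  # isolated weak event
]

if __name__ == "__main__":
    print(aftershock_duration(6.0), rupture_length(6.0))
    print([is_core_event(i, events, eps=0.1, threshold=1.0)
           for i in range(len(events))])
```

Under this assumed rule, a single energetic event with a few weak neighbours can seed a neighborhood, whereas an isolated weak event cannot, which is the intended effect of weighting by energy emission.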
The classic DBSCAN algorithm has been modified to fulfill those criteria, and the input data have been normalized so that the algorithm handles both spatial and temporal data. The most crucial part of this effort, however, is the evaluation and validation of the clustering scheme. We need to answer the questions: “How many clusters? How are they placed and distinguished in the spatial plane? Is the clustering reasonable?” Although a lot of literature has been written about validation indices for distance-based clustering, few appropriate criteria exist for density-based clustering validation. Most well-known classifiers have major drawbacks when it comes to arbitrarily shaped, non-convex clusters and noise, e.g. k-means can neither properly identify non-circular clusters nor classify noise as outliers. Such measures compute the ratio of within-cluster dispersion to between-cluster separation, and the results vary depending on the formulation (Cesca et al, 2014). Gaussian Mixture models, unlike k-means, cope well with the overlapping issue, but are still not ideal for density clusters. The Expectation-Maximization algorithm assigns points to clusters through a probability density estimate rather than strictly, as k-means does. Other measures for arbitrarily shaped clusters are the Minimum Spanning Tree and the Dunn index (Dunn, 1974), the Proximity Graph by Yang and Lee (2004), etc.

Searching the literature, three measures seem to stand out: CDbw (Halkidi and Vazirgiannis, 2001), the Density-Based Clustering Validation index DBCV (Moulavi et al, 2014) and Density-Based Silhouette diagnostics (Menardi, 2011, Contreras-Reyes, 2013). “Seem to” reflects that we did not test them properly: the DBCV output lies between -1 and 1, with more positive values indicating better clustering structures; however, it provides no output in the form of a cluster-assignment row vector.
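To make the drawback of separation-to-dispersion measures on non-compact clusters concrete, here is a minimal Python sketch of the Dunn index (smallest distance between points of different clusters divided by the largest cluster diameter). Excluding points labelled -1 as noise is an assumption added for this example and is not part of Dunn's definition; the function name and the sample data are likewise illustrative only.

```python
import math
from itertools import combinations


def dunn_index(points, labels, noise_label=-1):
    """Dunn index: min inter-cluster distance / max cluster diameter.

    Points carrying noise_label are ignored (an assumption made here,
    since the index itself has no notion of noise).
    """
    clusters = {}
    for point, label in zip(points, labels):
        if label != noise_label:
            clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")

    # Largest distance between two members of the same cluster.
    max_diameter = max(
        (math.dist(a, b)
         for members in clusters.values()
         for a, b in combinations(members, 2)),
        default=0.0,
    )
    if max_diameter == 0.0:
        raise ValueError("all clusters are single points")

    # Smallest distance between members of two different clusters.
    min_separation = min(
        math.dist(a, b)
        for first, second in combinations(clusters.values(), 2)
        for a in first
        for b in second
    )
    return min_separation / max_diameter


# Two elongated clusters (parallel lines, gap of 3) plus one noise point.
elongated_points = ([(float(x), 0.0) for x in range(10)]
                    + [(float(x), 3.0) for x in range(10)]
                    + [(50.0, 50.0)])
elongated_labels = [0] * 10 + [1] * 10 + [-1]

# Two compact clusters (unit squares) separated by the same gap of 3.
compact_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0),
                  (0.0, 4.0), (1.0, 4.0), (0.0, 5.0), (1.0, 5.0)]
compact_labels = [0] * 4 + [1] * 4

if __name__ == "__main__":
    print(dunn_index(elongated_points, elongated_labels))
    print(dunn_index(compact_points, compact_labels))
```

Both configurations have the same gap between clusters, yet the elongated pair scores far lower because the index penalizes large cluster diameters. This is the kind of behavior that makes such measures ill-suited to density-based, arbitrarily shaped clusters.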
Regarding the CDbw index, we were not able to find source code. The Density-Based Silhouette is implemented in R. The evaluation was carried out using the Gap statistic (Tibshirani et al, 2001) with the Linkage, Kmeans and Gaussian Mixture Distribution algorithms (Yeck et al, 2015).

Results and Discussion
The seismic catalogue used was extracted from the National Observatory of Athens (NOA) national catalogue, covering the eleven-year period 2000 to 2010, with a completeness magnitude of M 3.1 (Richter). The catalogue has been declustered using the Reasenberg and Uhrhammer methods. Applying the aforementioned techniques led to an optimal solution of 73 clusters, using the Gap criterion with the Gaussian Mixture Distribution, Kmeans and Linkage (Ward’s method) algorithms. However, the solution failed to converge within 100 iterations, producing empty clusters. The classifier row vector, however, holds the correct solution of 53 clusters. Similarly, the Reasenberg-declustered catalogue yielded 46 clusters, while the Uhrhammer-declustered catalogue yielded 50. The Cophenetic Correlation coefficient values and the DBCV validity indices are rather low, i.e. our identified structures are accurate, but they could have been even more solid.

Figure 1. Density-based clustering results from the NOA catalogue for the period 2000 – 2010, optimal solution 53 clusters, with (a) Gaussian Mixture Distribution, (b) Linkage (Ward’s method) and (c) Kmeans algorithms.

Concluding Remarks
As can readily be concluded, none of the evaluation methods used identified solid underlying cluster structures. That was expected, however, since very few validity measures in the literature are designed for density-based clustering: CDbw (Halkidi and Vazirgiannis, 2001), DBCV (Moulavi et al, 2014) and Density-Based Silhouette diagnostics (Menardi, 2011, Contreras-Reyes, 2013). They have to be properly implemented and tested thoroughly.
Only then could we have a clear view of whether our approach resolves, to some extent, the disadvantages of density-based clustering.

Acknowledgements
The work was supported by the THALES Program of the Ministry of Education of Greece and the European Union in the framework of the project entitled “Integrated understanding of Seismicity, using innovative Methodologies of Fracture mechanics along with Earthquake and non-extensive statistical physics – Application to the geodynamic system of the Hellenic Arc. SEISMO FEAR HELLARC” (MIS 380208).

References
[1] CESCA, S., et al., 2014, Seismicity monitoring by cluster analysis of moment tensors, Geophys. J. Int. (2014) 196, pp. 1813–1826, doi: 10.1093/gji/ggt492.
[2] CONTRERAS-REYES, E., J., 2013, Nonparametric Assessment of Aftershock Clusters of the Maule Earthquake Mw = 8.8, Journal of Data Science 11 (2013), pp. 623–638.
[3] DASZYKOWSKI, M., et al., 2001, Looking for Natural Patterns in Data. Part 1: Density Based Approach, Chemometrics and Intelligent Laboratory Systems, Volume 56, Issue 2, pp. 83–92.
[4] DRAKATOS, G., and LATOUSSAKIS, J., 2001, A catalog of aftershock sequences in Greece (1971–1997): Their spatial and temporal characteristics, Journal of Seismology, Volume 5, pp. 137–145.
[5] DUNN, J., C., 1974, Well separated clusters and optimal fuzzy partitions, Journal of Cybernetics, Volume 4, pp. 95–104.
[6] ESTER, M., et al., 1995, A Database Interface for Clustering in Large Spatial Databases, Proc. 1st Int. Conf. on Knowledge Discovery and Data Mining, Montreal, Canada, AAAI Press.
[7] ESTER, M., et al., 1996, A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise, KDD-96 Proceedings, AAAI (www.aaai.org).
[8] HALKIDI, M., and VAZIRGIANNIS, M., 2001, Clustering Validity Assessment: Finding the optimal partitioning of a data set, Data Mining, ICDM 2001, Proceedings IEEE International Conference on, San Jose CA, pp. 187–194.
[9] MENARDI, G., 2011, Density-based Silhouette diagnostics for clustering methods, Stat Comput (2011) 21, Springer Science and Business Media, LLC 2010, pp. 295–308, doi: 10.1007/s11222-010-9169-0.
[10] MOULAVI, D., et al., 2014, Density-Based Clustering Validation, Proceedings of the 14th SIAM International Conference on Data Mining (SDM), Philadelphia, PA, 2014.
[11] PETERSEN D., M., et al., 2008, Appendix J: Spatial Seismicity Rates and Maximum Magnitudes for Background Earthquakes, USGS Open File Report 2007-1437J, CGS Special Report 203J, SCEC Contribution #1138J, Version 1.0.
[12] TIBSHIRANI, R., et al., 2001, Estimating the number of clusters in a data set via the gap statistic, J. Royal Statistical Society B, 63, Part 2, pp. 411–423.
[13] VALLIANATOS, F., et al., 2013, A Non-Extensive Statistical Physics View in the Spatiotemporal Properties of the 2003 (Mw6.2) Lefkada, Ionian Island Greece, Aftershock Sequence, Pure Appl. Geophys. 171 (2014), pp. 1343–1354, doi: 10.1007/s00024-013-0706-6.
[14] YANG, J., and LEE, I., 2004, Cluster validity through graph-based boundary analysis, In IKE, pp. 204–210.
[15] YECK L., W., et al., 2015, Maximum magnitude estimations of induced earthquakes at Paradox Valley, Colorado, from cumulative injection volume and geometry of seismicity clusters, Geophys. J. Int. (2015) 200, pp. 322–336, doi: 10.1093/gji/ggu394.
[16] ZALIAPIN, I., and BEN-ZION, Y., 2013, Earthquake clusters in southern California I: Identification and stability, Journal of Geophysical Research: Solid Earth, Volume 118, pp. 2847–2864, doi: 10.1002/jgrb.50179.