International Journal of Computer Engineering and Applications, Volume VII, Issue I, Part II, July 14

A REVIEW ON SPATIAL DATA MINING METHODS AND APPLICATIONS

Aakunuri Manjula1, Dr. G. Narsimha2
1 Research Scholar, CSE, JNTUH, Telangana
2 Associate Professor, CSE, JNTUHCEJ, Telangana

ABSTRACT: Spatial Data Mining (SDM) is a complex task because it deals with data that carries both spatial and non-spatial correlations in spatial databases. SDM extracts latent and implicit trends from spatial data to acquire business intelligence that supports expert decision making. Spatial databases are very large, as they can hold spatial objects spread across the globe. Mining such databases has a plethora of real world uses, such as discovering cancer clusters, crime hotspots, warming of oceans, traffic risk analysis, agriculture land grading and analyzing forest extent changes, to mention a few. Various techniques are used to achieve this. This paper reviews various spatial data mining techniques and their applications. The essence of SDM is to have applications pertaining to pattern families such as location prediction, spatial interactions, and hotspots.

Keywords – Spatial data mining (SDM), SDM techniques, SDM applications

[1] INTRODUCTION

A spatial database is a database that has been specially optimized to store data pertaining to objects in the real world. In other words, spatial data represents objects in geometric space. The objects are stored in the database in the form of points, lines and polygons. A Relational Database Management System (RDBMS) with additional features can support spatial databases, which are extensively used in environmental studies, the Global Positioning System (GPS), and Geographic Information Systems (GIS). Spatial Data Mining (SDM) is the process of discovering trends or patterns from large spatial databases that hold geographical data.
Objects in space such as roads, rivers, forests, deserts, buildings and cities are stored in a spatial database. Spatial databases are complex, which makes SDM more difficult than mining traditional databases. The major applications of SDM are related to co-location mining, spatial outlier detection and location prediction. Products available for building spatial databases include PostGIS, Microsoft SQL Server, Oracle Spatial and SpatiaLite. Object relational models can support spatial data. Therefore any Object Relational Database Management System (ORDBMS) can support spatial data, provided it offers primitives such as point, line and region. Spatial objects on the earth's surface are typically identified by latitude and longitude. Generally a GIS is used to store, retrieve and manipulate spatial data. A spatial database can also include data from CAD and CAM systems that represents smaller scale objects, such as those on printed circuit boards and in automobile engines. When modeling a spatial database, geometric types are used to store spatial data. These types include point, line string, polygon, arc line string, arc polygon, compound polygon, compound line string, circle and rectangle, as shown in [Figure-1].

Figure: 1. Illustrates geometric types [1]

He et al. [2] presented uncertainty issues in SDM. Peng et al. [3] performed SDM experiments on POI databases. Thirumurugan and Suresh [4] focused on statistical spatial clustering. Spatial data mining on big data is explored in [5] and [6]. Geo SDM is the focus of [7]. Shengwu [8] focused on the problem of “data rich and knowledge lack” with respect to SDM. Moraes and Bastos [9] threw light on pattern recognition with SDM. The idea of a smart earth with SDM is explored in [10]. Bi et al. [11] focused on settlement archaeology through SDM. Wang et al. [12] contributed to cluster analysis in SDM by introducing two methods.
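As a minimal sketch of the geometric primitives mentioned in this section (point, line string and polygon), the following Python classes show how such types can carry basic spatial operations. The class and method names are hypothetical and chosen only for illustration; they do not reproduce the type system of Oracle Spatial or any other product named above, and the coordinates are assumed to be planar (x, y) rather than geodetic.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Tuple

Coord = Tuple[float, float]  # planar (x, y) pair; an assumption of this sketch


@dataclass(frozen=True)
class Point:
    x: float
    y: float


@dataclass(frozen=True)
class LineString:
    coords: List[Coord]

    def length(self) -> float:
        # Sum of straight-line distances between consecutive vertices.
        return sum(
            hypot(x2 - x1, y2 - y1)
            for (x1, y1), (x2, y2) in zip(self.coords, self.coords[1:])
        )


@dataclass(frozen=True)
class Polygon:
    ring: List[Coord]  # exterior ring; the first vertex is not repeated at the end

    def area(self) -> float:
        # Shoelace formula over the implicitly closed ring.
        n = len(self.ring)
        twice_area = sum(
            self.ring[i][0] * self.ring[(i + 1) % n][1]
            - self.ring[(i + 1) % n][0] * self.ring[i][1]
            for i in range(n)
        )
        return abs(twice_area) / 2.0

    def contains(self, p: Point) -> bool:
        # Ray casting: toggle on each edge crossed by a horizontal ray from p.
        inside = False
        n = len(self.ring)
        for i in range(n):
            (x1, y1), (x2, y2) = self.ring[i], self.ring[(i + 1) % n]
            if (y1 > p.y) != (y2 > p.y):
                x_cross = x1 + (p.y - y1) * (x2 - x1) / (y2 - y1)
                if p.x < x_cross:
                    inside = not inside
        return inside
```

A spatial database offers far richer versions of these operations, usually backed by spatial indexes; the sketch only shows why geometric types need behavior (length, area, containment) that ordinary scalar columns do not provide.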
Spatial data mining has plenty of real world applications such as traffic risk analysis, fire accident analysis, analysis of forest extent changes, grading of agriculture land and analysis of railways. In fact it can be integrated with GIS and with the existing MIS of organizations. More details on SDM can be found in the rest of this paper. Our contributions in this paper include the study and review of various spatial data mining techniques and real world applications.

The remainder of the paper is structured as follows. Section 2 focuses on the real world applications of SDM. Section 3 throws light on various spatial data mining techniques. Section 4 shows the importance of SDM with GIS. Section 5 describes the use of a data cube with SDM for effectiveness. Section 6 covers the importance of visualizing SDM results, while Section 7 concludes and provides directions for future work.

[2] APPLICATIONS OF SPATIAL DATA MINING

There are many real world applications that need spatial data mining. The following subsections describe such applications in some detail.

[2.1] IDENTIFYING FIRE HOT SPOTS IN FORESTS

Tay et al. [13] studied the problem of deriving knowledge from spatial data. They focused on fire hot spots in forests detected from satellite images. Detected fire hot spots can include false alarms. Their approach is based on the assumption that fires do not spread in a straight line. The fire hot spots are clustered using a region growing algorithm, while pattern recognition is used to identify possible false alarms. They used 30 datasets from SPOT and NOAA for their experiments. Their prototype application demonstrates the detection of false alarms.

Yu and Bian [14] studied fire cases that occur in geographical space.
They employed frequency theory and incremental spatial data mining to find the relation between the available fire related factors and the surrounding environment, mining association rules to achieve this. The important steps in the solution are data preparation, dataset joining (which combines multiple datasets), mining association rules and drawing conclusions, besides building a GIS based application known as “Forest Fireproof system”. The research results can be used in real time reaction systems for efficient decision support services.

[2.2] RAILWAY GEOGRAPHIC INFORMATION SYSTEM

Spatial data mining can be used in GIS pertaining to railways. Xu et al. [15] studied various techniques that can be used in building a Railway Geographic Information System (RGIS), which can be used for spatial data presentation and statistical analysis besides helping in making well informed decisions. Some of the techniques useful for RGIS include spatial analysis, induction, classification and clustering, trend or spatial characteristic analysis, pattern recognition and digital map image analysis. Other approaches that can also be applied to spatial data mining are visualization approaches, rough set and fuzzy set approaches, genetic algorithms, and artificial neural networks. Thus the intelligent RGIS has high utility in the real world when it is used along with intelligent transportation systems. Moreover the RGIS can be integrated with existing management information systems (MIS) pertaining to railways. Such integration makes the application more intuitive.

[2.3] EVALUATION OF FOREST EXTENT CHANGES

Forests play an important role in the ecosystem, which has a telling effect on living creatures. The extent of forest has an impact on ecosystem dynamics. SDM can be used to understand these dynamics, which will help nations make well informed decisions.
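To make the notion of an extent change concrete, the following minimal sketch compares two already classified maps of the same area, taken at different dates, pixel by pixel. It is an illustration only and not the method of any study cited in this paper; the class codes, function names and the pixel area parameter are assumptions of the sketch, and the maps are assumed to be co-registered grids of equal size.

```python
from typing import Dict, List

FOREST = 1      # hypothetical class code for forest pixels
NON_FOREST = 0  # hypothetical class code for every other land cover

Grid = List[List[int]]


def forest_area(grid: Grid, pixel_area_ha: float) -> float:
    """Total forest area in hectares for one classified map."""
    return sum(row.count(FOREST) for row in grid) * pixel_area_ha


def extent_change(earlier: Grid, later: Grid, pixel_area_ha: float) -> Dict[str, float]:
    """Compare two co-registered classified maps pixel by pixel."""
    same_shape = len(earlier) == len(later) and all(
        len(a) == len(b) for a, b in zip(earlier, later)
    )
    if not same_shape:
        raise ValueError("maps must have identical dimensions")
    lost = gained = 0
    for row_a, row_b in zip(earlier, later):
        for a, b in zip(row_a, row_b):
            if a == FOREST and b != FOREST:
                lost += 1      # forest at the earlier date, not at the later one
            elif a != FOREST and b == FOREST:
                gained += 1    # new forest at the later date
    return {
        "lost_ha": lost * pixel_area_ha,
        "gained_ha": gained * pixel_area_ha,
        "net_ha": (gained - lost) * pixel_area_ha,
    }
```

In practice the classified maps would come from a supervised or unsupervised classifier applied to multi-temporal satellite images, and the hard part is the classification itself rather than this final comparison.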
Jayasinghe and Yoshida [16] carried out significant research on evaluating changes in forest extent. Their research was based on multi-temporal satellite images. They used supervised and unsupervised classification methods to derive thematic maps that are integrated with a GIS. For the actual spatial data mining, the back propagation algorithm was used. The results revealed that the extent of forests decreased between 1992 and 2006.

[2.4] ANALYZING DISTRIBUTION OF REGIONAL ECONOMY

SDM can be used to analyze many social aspects through geographical data. For instance, Lian et al. [17] studied regional economic differences. They found that the regional economy has strong spatial correlation. For the rural regional economy, global spatial autocorrelation is computed over variables such as per capita values, agricultural total and output value, and a classification technique from spatial data mining is applied to discover knowledge about the rural regional economy. Both local and global spatial autocorrelation are used for the analysis. The results revealed that considering both global and local spatial autocorrelation yields better output.

[2.5] GRADING OF AGRICULTURE LAND THROUGH SDM

Grading of agriculture land has important utility in the real world. Jia [18] proposed a spatial data mining approach for agriculture land grading. Jia's empirical study uses a methodology that covers problems pertaining to land grading. The application can fill in missing land information and also identifies difficulties associated with the quantitative analysis of factors. The analysis includes economic, social and natural factors. These results are visualized with annotations.
The grading of agriculture land thus helps in understanding the importance of land in a given area socially, economically and naturally.

[2.6] CLUSTERING AREA GEOGRAPHICAL ENTITIES

Guang-xue et al. [19] studied the clustering of area geographical entities using clustering algorithms. The algorithms work on the concept of geometric space similarity, for which similarity criteria are used. The experiments were made on line segments, where space similarity criteria are adapted. Another similarity criterion considered is area geographical similarity.

[2.7] AGRICULTURE CROP YIELD PREDICTION

Crop yield prediction has important utility for precision agriculture, and many researchers have proposed techniques for it. According to Prasad et al. [29], many techniques have come into existence that use remote sensing data linked directly or indirectly to crop yield prediction. The techniques include the Temperature Condition Index (TCI), the Vegetation Condition Index (VCI) and the Normalized Difference Vegetation Index (NDVI). Stathakis et al. [30] discussed a methodology for crop yield prediction, exploring neural networks for predicting wheat yield from remote sensing data. They used an Adaptive Neuro-Fuzzy Inference System (ANFIS), which takes many parameters as input and generates a forecast of wheat yield.

Table 1 summarizes real world spatial data mining applications. However, it should be noted that the list is not exhaustive.

Table 1 - Summary of Real World Applications of SDM

| Author(s) | Year | Application | Remarks |
| Tay et al. [13] | 2003 | Identifying real forest fire events and disclosing false alarms | Experiments are made on satellite images |
| Xu, Qin, and Huang [15] | 2003 | Railway geographic information system that supports visual presentation, query and expert decision making | Data from railway MIS |
| Xingxing et al. [20] | 2005 | Integrated GIS with mining features | Data is used from existing GIS |
| Lian et al. [17] | 2008 | Distribution of regional economy analysis | Spatial data collected from agriculture lands |
| Jinlin et al. [21] | 2008 | Accident analysis | Data collected from GIS |
| Jia [18] | 2009 | Grading of agriculture land | Agriculture land dataset |
| Wang and Chen [22] | 2011 | Study of land use | Agriculture land dataset |
| Jayasinghe and Yoshida [16] | 2013 | Analyzing forest extent changes, which will help in taking steps to bring balance to the ecosystem | Experiments are made using satellite images |
| Ravikumar & Gnanabhaskaran [23] | - | Traffic risk analysis | Data collected from government |

[3] SPATIAL DATA MINING TECHNIQUES

This section focuses on the methods and techniques that researchers have used for spatial data mining. Tay et al. [13] used the region growing method and the Hough transform to identify forest fire hotspots and disclose false alarms. Xu, Qin, and Huang [15] applied association rule mining, classification and forecast, trend analysis and planning for a railway GIS. Xingxing et al. [20] used an SPMML based method for integrating SDM and GIS so as to make the GIS more effective in serving spatial information and supporting expert decisions. Zhang et al. [24] identified problems with current techniques that deal with data in a data warehouse and developed a new method of data representation, namely the “Spatial Data Cube”, which improves the quality of SDM. Yu and Bian [14] employed a frequency based method to build a GIS based fireproof system that helps in analyzing fire accidents in forests. Jayasinghe and Yoshida [16] used the back propagation algorithm to detect changes in forest extent. Ravikumar & Gnanabhaskaran [23] used the ant colony optimization technique for traffic risk analysis. Table 2 summarizes SDM techniques used in some of the data mining applications.
However, it should be noted that the list is not exhaustive.

Table 2 - Summary of SDM Techniques

| Author(s) | Year | Research Area | Technique | Remarks |
| Tay et al. [13] | 2003 | SDM to identify false alarms in forest fire hotspots | Region growing method, Hough transform | Satellite images are used for experiments |
| Xu, Qin, and Huang [15] | 2003 | SDM integrated with railway MIS to improve the effectiveness of railway operations such as monitoring railway tracks and material, besides analyzing its operations and geographical spread | Association rule mining, classification and forecast, trend analysis and planning | Data is taken from railway MIS |
| Xingxing et al. [20] | 2005 | Integration of SDM with GIS | SPMML based method | Data is taken from the existing GIS |
| Zhang et al. [24] | 2005 | Improving spatial data mining effectiveness | A new data model called spatial data cube | Spatial data from warehouse |
| Yu and Bian [14] | 2007 | GIS based fireproof system | Frequency theory based method | Sample spatial dataset |
| Jayasinghe and Yoshida [16] | 2013 | Spatial data mining to evaluate forest extent changes | Back propagation algorithm | Satellite images are taken as dataset |
| Ravikumar & Gnanabhaskaran [23] | - | Traffic risk analysis | Ant colony optimization | Data collected from government |

[4] SPATIAL DATA MINING WITH GIS

A Geographical Information System (GIS) is required in order to leverage the benefits of spatial data mining. Xu, Qin, and Huang [15] built an intelligent railway GIS that allows users to perform various data mining tasks pertaining to railways. Since there is a need to integrate GIS and SDM, Xingxing et al. [20] proposed a novel method that integrates the two. The solution was based on XML and its related markup languages. Thus the integration also works seamlessly with future enhancements in a heterogeneous environment.
The two aspects work in tandem with each other. For instance, spatial data mining can find the locations across the globe where breast cancer is prevalent, while the GIS presents them clearly to help users make decisions faster. This division of labor brings modularity and quality to both the work and the presentation. The prototype was tested by running spatial data mining on a spatial database built in PostgreSQL (PostGIS extension). The prototype application has components such as a task manager, a data adapter, data mining algorithms, and a user-friendly interface.

GIS can be used in various domains. Yu and Bian [14] built decision support services with a GIS that works as a fireproof system for forests. It shows the relation between fire events and their environment. Association rule mining was performed on spatial data to achieve this. The application is able to locate fire points and analyze the environment that caused them. This helps authorities make a preliminary estimate of fire incidents in forests. It is especially useful in places where forests are widespread and need to be monitored to resolve fire problems. Decision support services integrated with a forest fireproof system are very useful in making expert decisions. The forecast made by the GIS enabled system might be weak or strong. However, it makes significant progress towards a comprehensive solution for fireproof mechanisms.

Governments of various states or countries can make use of GIS. However, such a GIS needs to deal with a huge amount of geographical data. This is where SDM comes into the picture, enabling an efficient GIS that can render quality services. Li et al. [25] studied the uncertainty of government spatial data that affects the quality of SDM. There may be many kinds of uncertainty, such as immaturity uncertainty, inconsonance uncertainty, topology uncertainty, attribute uncertainty, location uncertainty, and error uncertainty. Li et al.
emphasized uncertainty pertaining to attributes and introduced control methods. The control methods include improving attribute definitions, choosing correct data sources, and improving the accuracy of the data model used in SDM.

Traffic accidents can also be analyzed using spatial data mining. Jinlin et al. [21] proposed a spatial data mining approach with GIS for traffic analysis. Accident analysis is done using spatial association rule mining, which discovers knowledge about the distribution of traffic accidents in the spatial domain. The mining process is carried out as follows. A GIS data inquiry is made in order to obtain the facts of interest to the user. A spatial association rule mining algorithm is then applied to these facts in order to generate association rules. The analysis identifies accident prone areas, which can help in taking the necessary steps to prevent accidents in future.

The power of grid computing can also be leveraged for SDM. Fan and Luo [26] proposed a model which exploits the power of grid based servers to perform SDM. The proposed application is a GIS that makes use of servers associated with a grid. The mining process is therefore decentralized, which makes it faster and provides other features such as fault tolerance, reliability, availability and scalability. [Figure-2] shows the architecture for Grid Spatial Data Mining.

Figure: 2. Illustrates GIS Based Grid Spatial Data Mining [26]

As can be seen in [Figure-2], the grid is used to discover information based on the client request. The data server provides data services, while the spatial data mining server performs information discovery, which is invoked by the data mining middleware.

[5] SPATIAL DATA MINING WITH CUBE

Spatial data mining needs innovative data models, as spatial data are complex in nature. Zhang et al.
[24] proposed a new data model known as the “Spatial Data Cube”, which leverages data modeling in data warehousing. As the current techniques could not handle spatial data mining well in data warehouses, the new data model resolves this issue by providing improved support for SDM. The spatial data cube is very flexible, as it supports both spatial and non-spatial data seamlessly. It makes use of the selective materialization concept in order to process queries efficiently. Not only the data, but also the measures and dimensions can be represented both spatially and non-spatially using the spatial data cube. The global climate data explored in [27] is used to build a spatial data cube which works with the underlying application.

[6] SPATIAL DATA MINING AND VISUALIZATION

Visualization is highly important in SDM since it brings quality and comprehensiveness to presentation. Ferrucci, Laurini, and Polese [28] built a tool that can present spatial relationships visually and perform mining activities on both spatial and non-spatial data through an intuitive user interface. The tool, named “VisMiner”, also guides users in performing various activities pertaining to SDM. The tool lets users organize their work by exploiting Windows OS concepts such as folders and files. Its characteristics include modularity, structured organization, interactivity, standardization and extensibility. The tool is extendable: its existing algorithms can be improved and new algorithms can be added to make it more flexible and intuitive. Tay et al. [13] also visualized false alarms in an application that classifies fire hot spots in forests.

[7] CONCLUSION AND FUTURE WORK

This paper reviews spatial data mining, its real world applications and mining techniques.
It studies the essence of data mining in terms of applications and techniques. Some of the applications of spatial data mining include hot spot classification of fire in forests, identifying false alarms of fire hotspots, building an effective GIS, GIS for railways, evaluation of forest extent changes, agriculture land grading, analysis of the distribution of the rural regional economy, clustering area geographic entities, GIS and SDM integration, SDM with the spatial data cube, and SDM with visualization. The techniques used in the literature for SDM include the region growing method, Hough transform, association rule mining, classification and forecast, planning, trend analysis, SPMML based method, frequency theory based method, back propagation algorithm and ant colony optimization. Traditional data mining techniques cannot be directly used for spatial data mining. As a future direction we are going to make an empirical study of SDM on location prediction, spatial interactions, and hotspots.

REFERENCES

[1] Oracle. Oracle Spatial User's Guide and Reference, Release 9.0.1. Available: http://docs.oracle.com/html/A88805_01/sdo_intr.htm. Last accessed 20th June 2014.
[2] Bin-Bin He, Tao Fang, Da-Zhi Guo. (2004). Uncertainty in Spatial Data Mining. IEEE, pp. 1152-1156.
[3] Sun Peng, Jinyun Fang, Chengde Han, Zhenlin Cheng. VegaMinerPOI: A Spatial Data Mining System for POI Datasets. pp. 1-4.
[4] S. Thirumurugan and L. Suresh. Statistical Spatial Clustering using Spatial Data mining. p14.
[5] Wang Shuliang, Ding Gangyi, Zhong Ming. (2013). Big Spatial Data Mining. IEEE, pp. 13-21.
[6] Shuliang Wang, Hanning Yuan. (2013). Spatial Data Mining in the Context of Big Data. IEEE, pp. 486-492.
[7] Mussarat Yasmin, COMSATS, Pakistan. (2012). Dynamic Referencing Rules Creation Using Intelligent Agent in Geo Spatial Data Mining. IEEE.
[8] Hu Shengwu. (2011).
Method Development about Spatial Data Mining and Its Problem Analysis. IEEE, vol. 2, pp. 144-147.
[9] André Fabiano de Moraes, Lia Caetano Bastos. (2011). Pattern Recognition with Spatial Data Mining in web: An infrastructure to engineering of the urban cadaster. IEEE, pp. 1331-1335.
[10] Shuliang Wang. (2011). Spatial Data Mining under Smart Earth. IEEE, pp. 717-722.
[11] Shuoben Bi et al. (2008). Spatial Data Mining in Settlement Archaeological Databases Based on Vector Features. IEEE, pp. 277-281.
[12] Zhongyuan Wang et al. (2008). Cluster Analysis Based on Spatial Feature Selecting in Spatial Data Mining. IEEE, pp. 386-389.
[13] SengChuan Tay et al. (2003). Spatial Data Mining: Clustering of Hot Spots and Pattern Recognition. IEEE, pp. 3685-3687.
[14] Liang Yu, Fuling Bian. (2007). An Incremental Data Mining Method for Spatial Association Rule in GIS Based Fireproof System. IEEE, pp. 5983-5986.
[15] Wei Xu, Yong Qin, Houkuan Huang. (2003). The Application of Spatial Data Mining in Railway Geographic Information Systems. IEEE, pp. 1467-1471.
[16] P.K.S.C. Jayasinghe, Masao Yoshida. (2013). Spatial data mining technique to evaluate forest extent changes using GIS and Remote Sensing. IEEE.
[17] Jian Lian, Xiaojuan Li et al. (2008). Spatial Data Mining and Analysis of the Distribution of Regional Economy. IEEE, pp. 145-148.
[18] Zelu Jia. (2009). An Expert System Based on Spatial Data Mining Used Decision Tree for Agriculture Land Grading. IEEE, pp. 1-4.
[19] Chen Guang-xue, Li Xiao-zhou, Chen Qi-feng, Li Xiao-zhou. (2010). Clustering Algorithms for Area Geographical Entities in Spatial Data Mining. IEEE, pp. 1630-1633.
[20] Jin Xingxing, Cai Yingkun, Xie Kunqing, Ma Xiujun, Sun Yuxiang, Cai Cuo. (2005). A Novel Method to Integrate Spatial Data Mining and Geographic Information System.
IEEE, pp. 764-768.
[21] Wang Jinlin, Chen Xi, Zhou Kefa, Wang Wei and Zhang Dan. (2008). Application of Spatial Data Mining in Accident Analysis System. IEEE Computer Society, pp. 1-4.
[22] Yong Wang, Xi Chen. (2011). Study on Land Use of Changping District with Spatial Data Mining Method. IEEE, pp. 218-222.
[23] K. Ravikumar, A. Gnanabaskaran. ACO based spatial Data Mining for Traffic Risk Analysis. pp. 1-6.
[24] Yuanzhi Zhang, Xie Kunqing, Ma Xiujun, Xu Dan, Cai Cuo, Tang Shiwei. (2005). Spatial Data Cube: Provides Better Support for Spatial Data Mining. IEEE, pp. 795-798.
[25] Bin Li, Lihong Shi, Jiping Liu. (2010). Research on Spatial Data Mining Based on Uncertainty in Government GIS. IEEE, pp. 2906-2909.
[26] Wenyou Fan, Wenfen Luo. (2009). The Key Technologies Research of Spatial Data Mining Based on the GIS Grid Services. IEEE.
[27] T.D. Mitchell, T.R. Carter, P.D. Jones, M. Hulme, M. New. (2003). A comprehensive set of high-resolution grids of monthly climate for Europe and the globe: the observed record (1901-2000) and 16 scenarios (2001-2100). Journal of Climate.
[28] S. Bimonte, F. Ferrucci, R. Laurini, G. Polese. (2003). Prototype of a Visual Language for Spatial Data Mining Based on the 'Miner Trip' Metaphor: VisMiner. IEEE, pp. 76-83.
[29] Anup K. Prasad, Lim Chai, Ramesh P. Singh, Menas Kafatos, “
[30] D. Stathakis, I. Savin and T. Nègre. Neuro-Fuzzy Modeling for Crop Yield Prediction. The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol. 34, Part XXX, pp. 1-4.