Name: Mohd Yousuf Ansari

Proposal Title: GIS based spatial data mining approach for spatio-temporal databases

Introduction:

Technological infrastructure such as GPS positioning, sensor and mobile device networks, and tracking facilities has made massive repositories of spatio-temporal data available, and data at this volume is hard to analyze manually. New techniques have therefore been developed to turn this huge volume of data automatically into useful knowledge that enables a better understanding of phenomena occurring in the environment. These techniques make up Knowledge Discovery in Databases (KDD), which is characterized as a multi-step process for discovering valid, novel and potentially useful information [4].

Civil and defence spatio-temporal applications now need multiple stages of analysis and manipulation of both data and patterns, in which the results of each stage become the input to the subsequent stage. The analytical questions posed by the end user need to be translated into several tasks: choosing analysis methods, preparing the data for these methods, applying the methods to the data, and interpreting and evaluating the results obtained.

The steps of the knowledge discovery process are as follows [2]:
o Data cleaning: Also known as data cleansing; in this phase noisy and irrelevant data are removed from the collection.
o Data integration: In this stage, multiple data sources, often heterogeneous, are combined into a common source.
o Data selection: The data relevant to the analysis is decided on and retrieved from the data collection.
o Data transformation: Also known as data consolidation; in this phase the selected data is transformed into forms appropriate for the mining procedure.
o Data mining: The crucial step, in which mining techniques are applied to extract potentially useful patterns.
o Pattern evaluation: In this step, interesting patterns representing knowledge are identified based on given measures.
o Knowledge representation: The final phase, in which the discovered knowledge is visually presented to the user. This essential step uses visualization techniques to help users understand and interpret the data mining results.

The results of data mining algorithms may not be directly useful for analysis purposes; they often need tuning. Moreover, most data mining tools and methods require deep technical and statistical knowledge on the part of the data analyst, as well as a clear comprehension of the data [8]. To support the process of progressively querying and mining spatio-temporal data (i.e. both movement data and patterns), a unifying framework is needed in which spatial data mining techniques can act as specific components of the knowledge discovery process. A Geographical Information System (GIS) can be integrated with the framework to leverage the benefits of spatial data mining.

Related Work:

Knowledge discovery in databases (KDD) is a dynamic research field. A well-known work in the literature is "A database perspective on knowledge discovery" [3]. There, the task of extracting useful and interesting knowledge from data is treated as an exploratory querying process, i.e., human-guided, iterative and interactive. The analyst, exploiting an expressive query language, drives the discovery process through a sequence of complex mining queries: extracting patterns, refining the queries, materializing the extracted patterns in the database, combining the patterns to produce more complex knowledge, and crossing over between the data and the patterns. A database mining system should therefore provide the following features:

Coupling with a DBMS: the analyst must be able to retrieve the portion of interesting data (by means of queries). Moreover, extracted patterns should also be stored in the DBMS so that they can be further queried or mined (closure principle).
Expressiveness of the query language: the analyst must be able to interact with the pattern discovery system through a high-level view of the data and the patterns.

The basic problem addressed by the KDD process is one of mapping low-level data (which are typically too voluminous to understand and digest easily) into other forms that might be more compact (for example, a short report), more abstract (for example, a descriptive approximation or model of the process that generated the data), or more useful (for example, a predictive model for estimating the value of future cases) [16]. Data mining is only one step of this general process. Indeed, using a data mining technique alone can lead to the discovery of patterns that are meaningless to experts. Other steps of the KDD process have been added to deal with this problem. All those steps, working together and integrating findings into a unified whole, produce new knowledge [12].

The advent of GIS (Geographical Information Systems) technology and the availability of large volumes of spatiotemporal data have increased the need for effective and efficient methods to extract unknown and unexpected information [4]. Unfortunately, in many situations a simple data mining method is limited in its ability to retrieve informative knowledge from complex spatiotemporal databases [13]. There is therefore a need to integrate spatial data mining techniques with GIS [1]. Xingxing et al. [14] proposed a novel method for integrating spatial data mining with GIS. The solution is based on XML and its related markup languages, so the integration is intended to keep working with future enhancements in a heterogeneous environment as well.

Some approaches in the literature target a single kind of pattern. The work presented in [7], for example, is specifically tailored to extracting association rules: it extends SQL with a new operator, MINE RULE, which computes the patterns and represents them in a relational format.
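As a rough illustration of the kind of pattern such rule-mining operators produce, the following Python sketch computes the support and confidence of one association rule over a toy transaction table. The data, the zone names and the function names are hypothetical; the sketch does not reproduce the MINE RULE operator or the system described in [7].

```python
from typing import FrozenSet, List

# Hypothetical toy data: each set lists the zones visited during one trip.
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"station", "market", "park"}),
    frozenset({"station", "market"}),
    frozenset({"station", "park"}),
    frozenset({"market", "park"}),
]


def support(itemset: FrozenSet[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(body: FrozenSet[str], head: FrozenSet[str],
               transactions: List[FrozenSet[str]]) -> float:
    """support(body and head together) / support(body); 0.0 if body never occurs."""
    body_support = support(body, transactions)
    if body_support == 0:
        return 0.0
    return support(body | head, transactions) / body_support


# Rule under test: trips that visit "station" also visit "market".
rule_body = frozenset({"station"})
rule_head = frozenset({"market"})
rule_support = support(rule_body | rule_head, TRANSACTIONS)
rule_confidence = confidence(rule_body, rule_head, TRANSACTIONS)
print(rule_support, rule_confidence)
```

Storing each rule as a row (body, head, support, confidence) is one simple way to keep such patterns in a relational format so that they can be queried again later.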
The Mining Views approach [11] has been proposed as a general-purpose system that manages different kinds of data mining patterns by integrating the data mining algorithms into a DBMS. The user interacts with the database using classic SQL, but when a query requests a table of patterns that has not yet been computed, the system executes the appropriate data mining algorithm to populate it. An innovative work is the Inductive Query Language (IQL) [6], which presents a relational calculus for data mining that allows functions in the query language, making it theoretically possible to include any data mining algorithm. However, realizing a running system based on this approach presents some computational difficulties. The data model approach called the 3W-Model describes the knowledge discovery process as the application of a sequence of operators that transform a set of tables. It is also interesting from a methodological point of view: the object representation of 3W entities and the implementation of a suitable set of operators are key elements in the design of a powerful tool for knowledge discovery analysis [9]. Other approaches propose Data Mining Query Languages (DMQL) to support the design of specific procedural workflows that integrate reasoning on the mining results and possibly define ad-hoc evaluation strategies and activations of the data mining tasks; see, for example, Weka [15].

Proposed Work:

The work aims to support the knowledge discovery process on spatio-temporal data, which involves several steps and different technologies. The main components of this discovery process are moving object database technology to store and manipulate data, spatiotemporal data mining algorithms, and a pattern interpretation approach based on automatic reasoning techniques.
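The chaining of these components, where each stage consumes the output of the previous one, can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the proposed system: the `Point` record, the grid-based "dense cell" pattern, the function names and the sample data are all hypothetical stand-ins for the database, mining and interpretation components.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set, Tuple


class Point(NamedTuple):
    obj_id: str  # identifier of the moving object
    t: int       # timestamp in arbitrary units
    x: float
    y: float


Cell = Tuple[int, int]


def select_window(points: List[Point], t_start: int, t_end: int) -> List[Point]:
    """Storage/query stage: keep points with t_start <= t <= t_end."""
    return [p for p in points if t_start <= p.t <= t_end]


def mine_dense_cells(points: List[Point], cell_size: float,
                     min_objects: int) -> Dict[Cell, int]:
    """Mining stage: grid cells visited by at least `min_objects` distinct objects."""
    visitors: Dict[Cell, Set[str]] = defaultdict(set)
    for p in points:
        cell = (int(p.x // cell_size), int(p.y // cell_size))
        visitors[cell].add(p.obj_id)
    return {c: len(ids) for c, ids in visitors.items() if len(ids) >= min_objects}


def interpret(dense_cells: Dict[Cell, int]) -> List[str]:
    """Interpretation stage: turn each pattern into a readable statement."""
    return [f"cell {c} visited by {n} objects" for c, n in sorted(dense_cells.items())]


# Hypothetical sample trajectory points.
POINTS = [
    Point("a", 1, 0.5, 0.5),
    Point("b", 2, 0.7, 0.2),
    Point("a", 3, 1.5, 0.5),
    Point("c", 9, 0.1, 0.1),  # falls outside the time window used below
]

window = select_window(POINTS, 1, 5)
patterns = mine_dense_cells(window, cell_size=1.0, min_objects=2)
report = interpret(patterns)
print(report)
```

Because the mined patterns are plain data (a mapping from cell to count), they can be stored and queried again, which is the property the closure principle asks of a database mining system.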
To support the knowledge discovery process, we need to create a theoretical and practical framework that integrates very different aspects of the process, each with its own assumptions and requirements. The result will be a system coupled with a Geographical Information System, in which spatial data mining techniques act as specific components of the knowledge discovery process.

The major components of the system would be an object-relational database (Oracle Spatial or PostGIS), data construction and transformation algorithms, spatial data mining algorithms, and a Geographical Information System. The object-relational database will support moving object data storage and manipulation, spatial object representation, mining model storage and semantic technology. The data construction and transformation algorithms will be used to construct the basic data objects and to manipulate data and model objects. The spatial data mining algorithms will extract models from data. The Geographical Information System will be used to visualize data and models. The system will make it possible to exploit the power of all components together. The system will further be used for experimentation in a real case study on civil and/or defence spatiotemporal data.

References:

[1] Aakunuri Manjula, G. Narsimha. A review on spatial data mining methods and applications. International Journal of Computer Engineering and Applications, Volume VII, Issue I, Part II, July 2014.
[2] S. P. Deshpande, V. M. Thakare. Data mining system and applications: A review. International Journal of Distributed and Parallel Systems (IJDPS), Vol. 1, No. 1, September 2010.
[3] T. Imielinski and H. Mannila. A database perspective on knowledge discovery. Communications of the ACM, Vol. 39, pages 58–64, 1996.
[4] H. Alatrista-Salas, J. Azé, S. Bringay, F. Cernesson, N. Selmaoui-Folcher, M. Teisseire. A knowledge discovery process for spatiotemporal data: Application to river water quality monitoring. Ecological Informatics, Volume 26, Part 2, March 2015, pages 127–139. http://www.elsevier.com/locate/ecolinf
[5] R. T. Ng, T. Calders, L. V. S. Lakshmanan, and J. Paredaens. Expressive power of an algebra for data mining. Transactions on Database Systems, Vol. 31, pages 1169–1214, 2006.
[6] L. De Raedt and S. Nijssen. IQL: A proposal for an inductive query language. In Proc. KDID, pages 189–207, 2006.
[7] Mirela Danubianu, Stefan Gheorghe Pentiuc, Iolanda Tobolcea. Mining Association Rules Inside a Relational Database – A Case Study. ICCGI 2011: The Sixth International Multi-Conference on Computing in the Global Information Technology.
[8] F. Giannotti, B. Kuijpers, A. Raffaeta, G. Manco, M. Baglioni, and C. Renso. Querying and reasoning for spatio-temporal data mining. In [23] chapter 12, page 335, 2008.
[9] L. V. S. Lakshmanan and T. Johnson. The 3W model and algebra for unified data mining. In Proc. ACM VLDB, pages 21–32, 2000.
[10] M. F. Mokbel. Continuous Query Processing in Spatio-temporal Databases. Current Trends in Database Technology, EDBT 2004 Workshop, LNCS 3268, pages 100–111, Springer-Verlag Berlin Heidelberg, 2004.
[11] E. Fromont, B. Goethals, A. Prado, H. Blockeel, and T. Calders. Mining views: Database views for data mining. In IEEE ICDE, pages 1608–1611, 2008.
[12] O. Brazhnik. Databases and the geometry of knowledge. Data Knowl. Eng. 61 (2), pages 207–227, 2007. http://www.sciencedirect.com/science/article/pii/S0169023X06000917
[13] L. Cao, H. Zhang, Y. Zhao, D. Luo, C. Zhang. Combined mining: discovering informative knowledge in complex data. IEEE Trans. Syst. Man Cybern. B 41 (3), pages 699–712, 2011. http://www.ncbi.nlm.nih.gov/pubmed/21592913
[14] Jin Xingxing, Cai Yingkun, Xie Kunqing, Ma Xiujun, Sun Yuxiang, Cai Cuo. A Novel Method to Integrate Spatial Data Mining and Geographic Information System. IEEE, pages 764–768, 2005.
[15] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten. The Weka data mining software. KDD Explorations, Volume 11, Issue 1, 2009.
[16] U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth. From Data Mining to Knowledge Discovery in Databases. American Association for Artificial Intelligence, 1996. www.csd.uwo.ca/faculty/ling/cs435/fayyad.pdf