Name: Mohd Yousuf Ansari
Proposal Title: GIS based spatial data mining approach for spatio-temporal databases
Introduction:
Technological infrastructure such as GPS positioning, sensor and mobile device
networks, and tracking facilities has made massive repositories of spatio-temporal
data available, far more than can be analyzed manually. New techniques have therefore
been developed to automatically turn this huge volume of data into useful
knowledge that enables a better understanding of phenomena occurring in the
environment. These techniques make up Knowledge Discovery in Databases (KDD), which is
characterized as a multi-step process for discovering valid, novel and potentially useful
information [4].
At present, civil and defence spatio-temporal applications need multiple
stages of analysis and manipulation of both data and patterns, in which the results of
each stage become the input to the next. The analytical questions posed
by the end user need to be translated into several tasks: choosing analysis methods,
preparing the data for these methods, applying the methods to the data, and
interpreting and evaluating the results obtained.
The steps of the knowledge discovery process are as follows [2]:
o Data cleaning: Also known as data cleansing; in this phase noisy and
irrelevant data are removed from the collection.
o Data integration: In this stage, multiple data sources, often heterogeneous, are
combined into a common source.
o Data selection: The data relevant to the analysis are identified and retrieved
from the data collection.
o Data transformation: Also known as data consolidation; in this phase the
selected data are transformed into forms appropriate for the mining procedure.
o Data mining: The crucial step, in which mining techniques are applied to
extract potentially useful patterns.
o Pattern evaluation: In this step, interesting patterns representing knowledge are
identified based on given measures.
o Knowledge representation: The final phase, in which the discovered
knowledge is visually presented to the user. This essential step uses visualization
techniques to help users understand and interpret the data mining results.
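The steps listed above can be sketched as a chain of small functions. The example below is only an illustration: the GPS fixes, the grid-cell transformation and the visit-count "mining" step are assumptions chosen to keep the sketch short, and they are not taken from [2] or from the proposed system.

```python
from collections import Counter

# Hypothetical raw GPS fixes from two sources: (object_id, x, y, t).
# None marks a missing coordinate (noise to be cleaned).
SOURCE_A = [("car1", 0.2, 0.1, 0), ("car1", 0.8, 0.3, 1), ("car1", None, 0.4, 2)]
SOURCE_B = [("car2", 0.4, 0.6, 0), ("car2", 1.5, 0.2, 1), ("car2", 1.7, 0.9, 2)]


def clean(rows):
    """Data cleaning: drop fixes with missing coordinates."""
    return [r for r in rows if r[1] is not None and r[2] is not None]


def integrate(*sources):
    """Data integration: merge several sources into one collection."""
    return [row for src in sources for row in src]


def select(rows, t_max):
    """Data selection: keep only fixes inside the analysis time window."""
    return [r for r in rows if r[3] <= t_max]


def transform(rows, cell=1.0):
    """Data transformation: map each fix to a grid cell (col, row)."""
    return [(int(x // cell), int(y // cell)) for _, x, y, _ in rows]


def mine(cells):
    """Data mining (toy stand-in): count visits per grid cell."""
    return Counter(cells)


def evaluate(counts, min_support):
    """Pattern evaluation: keep cells visited at least min_support times."""
    return {c: n for c, n in counts.items() if n >= min_support}


def present(patterns):
    """Knowledge representation: render patterns as readable text."""
    return [f"cell {c}: {n} visits" for c, n in sorted(patterns.items())]


rows = select(integrate(clean(SOURCE_A), clean(SOURCE_B)), t_max=2)
patterns = evaluate(mine(transform(rows)), min_support=3)
report = present(patterns)
```

Each function consumes the output of the previous one, which mirrors the requirement that the result of one stage becomes the input of the next. In a real system the last step would be a map display rather than text.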
The results of data mining algorithms may not be directly useful for analysis purposes;
they often need tuning. Moreover, most data mining tools and methods require deep
technical and statistical knowledge on the part of the data analyst, as well as a clear
comprehension of the data [8].
To support the process of progressively querying and mining spatio-temporal data
(i.e. both movement data and patterns), a unifying framework is needed, wherein
spatial data mining techniques can act as specific components of the knowledge discovery
process. A Geographical Information System (GIS) can be integrated with the framework to
leverage the benefits of spatial data mining.
Related Work:
Knowledge discovery in databases (KDD) is a dynamic research field. A well-known
work in the literature is "A database perspective on knowledge discovery" [3]. There, the
task of extracting useful and interesting knowledge from data is treated as an exploratory
querying process, i.e., human-guided, iterative and interactive. The analyst, exploiting an
expressive query language, drives the discovery process through a sequence of complex
mining queries: extracting patterns, refining the queries, materializing the extracted
patterns in the database, combining the patterns to produce more complex knowledge, and
crossing over between the data and the patterns. Therefore, a Database Mining System
should provide the following features:
Coupling with a DBMS: the analyst must be able to retrieve the interesting portion of the
data (by means of queries). Moreover, extracted patterns should also be stored in the DBMS
so that they can be further queried or mined (closure principle).
Expressiveness of the query language: the analyst must be able to interact with the pattern
discovery system through a high-level view of the data and the patterns.
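The closure principle can be illustrated with an in-memory SQLite database from the Python standard library. This is a minimal sketch under stated assumptions: the `visits` and `region_patterns` tables, their columns and the support threshold are hypothetical, and a simple grouped count stands in for a real mining algorithm.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (object_id TEXT, region TEXT)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [("car1", "port"), ("car1", "depot"), ("car2", "port"),
     ("car3", "port"), ("car3", "depot"), ("car4", "market")],
)

# Step 1: a query retrieves the data and a (deliberately trivial) mining
# step counts distinct objects per region. The result is stored as a
# table, i.e. in the same form as the data it was derived from.
conn.execute(
    """
    CREATE TABLE region_patterns AS
    SELECT region, COUNT(DISTINCT object_id) AS support
    FROM visits
    GROUP BY region
    """
)

# Step 2: closure. The stored patterns are queried like any other table,
# here joined back to the data to list objects seen in frequent regions.
frequent_objects = [
    row[0]
    for row in conn.execute(
        """
        SELECT DISTINCT v.object_id
        FROM visits AS v
        JOIN region_patterns AS p ON p.region = v.region
        WHERE p.support >= 2
        ORDER BY v.object_id
        """
    )
]
```

Because the patterns live in the same database as the data, the second query can combine both without leaving the query language, which is the point of the coupling requirement.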
The basic problem addressed by the KDD process is one of mapping low-level data
(which are typically too voluminous to understand and digest easily) into other forms that
might be more compact (for example, a short report), more abstract (for example, a
descriptive approximation or model of the process that generated the data), or more useful
(for example, a predictive model for estimating the value of future cases) [16]. Data mining
is only one step of this general process. Indeed, using a data mining technique alone can
lead to the discovery of patterns that are meaningless to experts. The other steps of the KDD
process have been added to deal with this problem. All those steps, working together and
integrating findings into a unified whole, produce new knowledge [12].
The advent of GIS (Geographical Information Systems) technology and the
availability of large volumes of spatio-temporal data have increased the need for effective
and efficient methods to extract unknown and unexpected information [4]. Unfortunately, in
many situations a simple data mining method is limited in its ability to retrieve
informative knowledge from complex spatio-temporal databases [13]. There is therefore a
need to integrate spatial data mining techniques with GIS [1]. Xingxing et al. [14] proposed
a novel method to integrate spatial data mining with GIS. The solution is based on
XML and its related markup languages, so the integration can also accommodate
future enhancements in heterogeneous environments.
Several approaches appear in the literature. The work presented in [7] is
specifically tailored to extracting association rules: it extends SQL with a new
operator, MINE RULE, which computes the patterns and represents them in a
relational format. The Mining Views approach [11] has been proposed as a general-purpose
system that manages different kinds of data mining patterns by integrating the data mining
algorithms in a DBMS. The user interacts with the database using classic SQL, but when
a query requests a table of patterns that has not yet been computed, the system executes the
appropriate data mining algorithm to populate it. An innovative work is the Inductive Query
Language (IQL) [6], which presents a relational calculus for data mining and allows functions
in the query language, making it theoretically possible to include every data mining
algorithm. However, the realization of a running system based on this approach presents
some computational difficulties.
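The on-demand behaviour described for Mining Views, where a pattern table is populated only when a query first asks for it, can be sketched in a few lines. This is not the implementation from [11]: the `LazyPatternTable` class, its `query` method and the pair-counting miner are hypothetical names and stand-ins used only to show the idea.

```python
from collections import Counter
from itertools import combinations


class LazyPatternTable:
    """A pattern table that is computed only when it is first queried."""

    def __init__(self, transactions, min_support):
        self._transactions = transactions
        self._min_support = min_support
        self._rows = None        # not yet materialized
        self.mining_runs = 0     # how many times the miner actually ran

    def _mine_frequent_pairs(self):
        # Hypothetical stand-in miner: count co-occurring item pairs.
        counts = Counter()
        for items in self._transactions:
            counts.update(combinations(sorted(set(items)), 2))
        self.mining_runs += 1
        return {p: n for p, n in counts.items() if n >= self._min_support}

    def query(self, predicate=lambda pair, support: True):
        """Return rows matching the predicate, mining first if needed."""
        if self._rows is None:
            self._rows = self._mine_frequent_pairs()
        return {p: n for p, n in self._rows.items() if predicate(p, n)}


transactions = [
    ["port", "depot", "market"],
    ["port", "depot"],
    ["port", "market"],
    ["depot", "port"],
]
pairs = LazyPatternTable(transactions, min_support=2)
with_port = pairs.query(lambda pair, support: "port" in pair)
strong = pairs.query(lambda pair, support: support >= 3)
```

The second query reuses the rows materialized by the first, so the mining step runs once however many queries follow.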
The data model approach called the 3W-Model treats a knowledge discovery
process as the application of a sequence of operators that transform a set of
tables. It is also interesting from a methodological point of view: the object
representation of 3W entities and the implementation of a suitable set of operators are key
elements in the design of a powerful tool for knowledge discovery analysis [9].
Other approaches propose Data Mining Query Languages (DMQL) to support the
design of specific procedural workflows that integrate reasoning on the mining results and
possibly define ad-hoc evaluation strategies and activations of the data mining tasks; see,
for example, Weka [15].
Proposed Work:
The work aims to support the knowledge discovery process on spatio-temporal data,
which involves several steps and different technologies. The main components of this
discovery process are moving object database technology to store and manipulate data,
spatio-temporal data mining algorithms, and a pattern interpretation approach based on
automatic reasoning techniques.
To support the knowledge discovery process, we need to create a theoretical
and practical framework that integrates very different aspects of the
process, each with its own assumptions and requirements.
The result will be a system coupled with a Geographical Information System, wherein
spatial data mining techniques can act as specific components of the knowledge
discovery process.
The major components of the system would be an Object-Relational Database
(Oracle Spatial or PostGIS), Data Construction and Transformation Algorithms, Spatial Data
Mining Algorithms and a Geographical Information System. The Object-Relational Database
will support moving object data storage and manipulation, spatial object representation,
mining model storage and semantic technology. The Data Construction and Transformation
Algorithms will be used to construct the basic data objects and to manipulate data and
model objects. The Spatial Data Mining Algorithms will extract models from the data. The
Geographical Information System will be used for the visualization of data and models.
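As an illustration of what the Data Construction and Transformation Algorithms might do, the sketch below builds basic moving-object trajectories from raw position fixes and derives one simple attribute from them. The tuple layout, the function names and the choice of path length are assumptions made for this example; the proposal does not prescribe them, and in the planned system the data would reside in the spatial database rather than in Python lists.

```python
from collections import defaultdict
from math import hypot

# Hypothetical raw fixes (object_id, t, x, y), arriving out of order.
fixes = [
    ("v1", 2, 3.0, 4.0), ("v2", 0, 0.0, 0.0), ("v1", 0, 0.0, 0.0),
    ("v2", 1, 1.0, 0.0), ("v1", 1, 3.0, 0.0),
]


def build_trajectories(fixes):
    """Data construction: group fixes per object, ordered by time."""
    grouped = defaultdict(list)
    for obj, t, x, y in fixes:
        grouped[obj].append((t, x, y))
    return {obj: sorted(points) for obj, points in grouped.items()}


def path_length(points):
    """Data transformation: Euclidean length of one trajectory."""
    return sum(
        hypot(x2 - x1, y2 - y1)
        for (_, x1, y1), (_, x2, y2) in zip(points, points[1:])
    )


trajectories = build_trajectories(fixes)
lengths = {obj: path_length(pts) for obj, pts in trajectories.items()}
```

The resulting trajectories are the kind of basic data object on which the mining algorithms would operate, and derived attributes such as length could be stored alongside them for later queries.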
The system will make it possible to exploit the power of all these components. Further,
the system will be used for experimentation on a real case study with civil and/or defence
spatio-temporal data.
References:
[1] Aakunuri Manjula, G. Narsimha. A review on spatial data mining methods and
applications, International Journal of Computer Engineering and Applications, Volume VII,
Issue I, Part II, July 2014.
[2] S. P. Deshpande, V. M. Thakare. Data mining system and applications: A review,
International Journal of Distributed and Parallel Systems (IJDPS) Vol.1, No.1, September
2010.
[3] T. Imielinski and H. Mannila. A database perspective on knowledge discovery. In
Communications of the ACM Vol.39, pages 58–64, 1996.
[4] H. Alatrista-Salas, J. Azé, S. Bringay, F. Cernesson, N. Selmaoui-Folcher, M. Teisseire. A
knowledge discovery process for spatiotemporal data: Application to river water quality
monitoring, Ecological Informatics Volume 26, Part 2, March 2015, Pages 127–139,
http://www.elsevier.com/locate/ecolinf
[5] R. T. Ng, T. Calders, L. V. S. Lakshmanan, and J. Paredaens. Expressive power of an
algebra for data mining. In Transactions on Database Systems Vol.31, pages 1169–1214,
2006.
[6] L. De Raedt, and S. Nijssen. Iql: A proposal for an inductive query language. In Proc. KDID,
pages 189-207, 2006.
[7] Mirela Danubianu, Stefan Gheorghe Pentiuc, Iolanda Tobolcea. Mining Association Rules
Inside a Relational Database – A Case Study, ICCGI 2011: The Sixth International
Multi-Conference on Computing in the Global Information Technology.
[8] F. Giannotti, B. Kuijpers, A. Raffaeta, G. Manco, M. Baglioni, and C. Renso. Querying and
reasoning for spatio-temporal data mining. In [23] chapter 12, page 335, 2008.
[9] L. V. S. Lakshmanan, and T. Johnson. The 3w model and algebra for unified data mining.
In Proc. ACM VLDB, pages 21–32, 2000.
[10] MF Mokbel. Continuous Query Processing in Spatio-temporal Databases. Current
Trends in Database Technology-EDBT 2004 Workshop, LNCS 3268, pp.100-111, 2004
Springer-Verlag Berlin Heidelberg
[11] E. Fromont, B. Goethals, A. Prado, H. Blockeel, and T. Calders. Mining views: Database
views for data mining. In IEEE ICDE, pages 1608-1611, 2008.
[12] Brazhnik, O., 2007. Databases and the geometry of knowledge. Data Knowl. Eng. 61 (2),
207–227 (URL http://www.sciencedirect.com/science/article/pii/S0169023X06000917).
[13] Cao, L., Zhang, H., Zhao, Y., Luo, D., Zhang, C., 2011. Combined mining: discovering
informative knowledge in complex data. IEEE Trans. Syst. Man Cybern. B 41 (3), 699–712.
(URL http://www.ncbi.nlm.nih.gov/pubmed/21592913)
[14] Jin Xingxing, Cai Yingkun, Xie Kunqing, Ma Xiujun, Sun Yuxiang, Cai Cuo. (2005). A Novel
Method to Integrate Spatial Data Mining and Geographic Information System. IEEE,
pp. 764-768.
[15] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten. The WEKA
data mining software. In SIGKDD Explorations, Volume 11, Issue 1, 2009.
[16] Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P., 1996. From Data Mining to Knowledge
Discovery in Databases. American Association for Artificial Intelligence
(URL www.csd.uwo.ca/faculty/ling/cs435/fayyad.pdf)