GEOINFO 2006
Use of the TerraLib library for clustering algorithms in Geographic Information Systems
Mauricio P. Guidini
Carlos H. C. Ribeiro
Supervisor
Nov 2006
“... 3000 unregistered flights, with origin and destination unknown to the authorities, invaded Brazilian airspace in the first ten months of this year. The Air Force estimates that about 30% of these flights were related to drug trafficking ...”
Translated from a note dated 25/10/2004
Data Mining in GIS
Objective
To present the integration of a Data Mining algorithm (k-means) into TerraLib/TerraView, forming a Geographic Information System for Unknown Air Traffic analysis (GisTAD).
Summary
Data Mining
Clustering Algorithms
Air Traffic
K-means Implementation
Results
Application
Data Mining definition:
“A non-trivial process of identifying valid, novel, useful patterns implicitly present in large volumes of data”
Knowledge Discovery in Databases (KDD), Fayyad et al. (1996)
How is DM carried out?
[Figure: the KDD process]
Clustering Algorithms
The clustering process tries to partition the data into groups whose members have highly similar features, which helps in understanding the information the data hold.
A good clustering algorithm is characterized by producing high-quality classes, in which intra-class similarity is high and inter-class similarity is low.
[Han & Kamber 2001]
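The criterion above can be made concrete with two summary numbers. The Python sketch below is illustrative only: it uses Euclidean distance as dissimilarity, measures intra-class similarity as the mean distance from each point to its own cluster centroid (lower means more similar), and inter-class similarity through the smallest distance between two centroids (higher means less similar). These two measures are illustrative choices, not ones prescribed by the presentation or by [Han & Kamber 2001].

```python
import math
from itertools import combinations

Point = tuple[float, ...]


def centroid(points: list[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def mean_intra_distance(clusters: list[list[Point]]) -> float:
    """Mean distance from each point to its own cluster centroid.

    A low value means high intra-class similarity.
    """
    total, count = 0.0, 0
    for pts in clusters:
        c = centroid(pts)
        for p in pts:
            total += math.dist(p, c)
            count += 1
    return total / count


def min_inter_distance(clusters: list[list[Point]]) -> float:
    """Smallest distance between two cluster centroids (needs >= 2 clusters).

    A high value means low inter-class similarity.
    """
    cents = [centroid(pts) for pts in clusters]
    return min(math.dist(a, b) for a, b in combinations(cents, 2))


# The same four points, grouped in two different ways.
good = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 0.0), (10.0, 1.0)]]
bad = [[(0.0, 0.0), (10.0, 1.0)], [(10.0, 0.0), (0.0, 1.0)]]

print(mean_intra_distance(good), min_inter_distance(good))  # 0.5 10.0
print(mean_intra_distance(bad), min_inter_distance(bad))    # ~5.02 0.0
```

The "good" grouping has tight clusters far apart; the "bad" one mixes distant points, so its intra-cluster distance rises and its centroids coincide.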
Major Categories
- Partitioning – k-means, k-medoids
- Hierarchical – CURE, BIRCH
- Density-based – DBSCAN, OPTICS
- Grid-based – STING
- Model-based
Others
- ANN – Kohonen network
- Incremental – Leader
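Of the categories listed, the incremental Leader algorithm is simple enough to sketch in a few lines. The Python below follows the common textbook formulation (a single pass in which each object joins the first existing leader within a distance threshold, or else becomes a new leader); the Euclidean distance and the `threshold` parameter are assumptions for illustration, not details taken from this work.

```python
import math

Point = tuple[float, ...]


def leader(points: list[Point], threshold: float) -> tuple[list[Point], list[int]]:
    """One-pass Leader clustering.

    Each point joins the first existing leader within `threshold`
    (Euclidean distance); otherwise it becomes a new leader.
    Returns (leaders, labels), where labels[i] is the index of the
    leader that points[i] was assigned to.
    """
    leaders: list[Point] = []
    labels: list[int] = []
    for p in points:
        for idx, ldr in enumerate(leaders):
            if math.dist(p, ldr) <= threshold:
                labels.append(idx)
                break
        else:  # no leader close enough: p starts a new cluster
            leaders.append(p)
            labels.append(len(leaders) - 1)
    return leaders, labels


pts = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0), (5.2, 4.9), (0.1, 0.3)]
found_leaders, found_labels = leader(pts, threshold=1.0)
print(found_leaders)  # [(0.0, 0.0), (5.0, 5.0)]
print(found_labels)   # [0, 0, 1, 1, 0]
```

Because it reads the data only once, its cost grows with the number of records times the number of leaders; the trade-off is that the result depends on the order in which records are presented.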
Air Traffic
Movement of aircraft, national or foreign, over the national territory.
Unknown Air Traffic
For aircraft that cannot be identified by a flight plan, two courses of action can be taken [Bernabeu 2004]:
1. Intercept; or
2. Generate an Unknown Air Traffic Report.
Traffic Representation
Line segments
Latitude (decimal degrees)
Longitude (decimal degrees)
Distance (miles)
Heading
Restrictions
Acceptable deviations
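A route and its acceptable deviations could be held in small record types. The Python sketch below is hypothetical: the class and field names, the use of one coordinate tolerance for both latitude and longitude, and the wrap-around treatment of headings (so that 350° and 10° differ by 20°) are assumptions, not details from the presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrafficSegment:
    """One unknown-traffic route represented as a line segment (hypothetical)."""
    lat: float       # latitude, decimal degrees
    long: float      # longitude, decimal degrees
    distance: float  # length, miles
    heading: float   # degrees, 0 <= heading < 360


@dataclass(frozen=True)
class AcceptableDeviation:
    """Maximum tolerated differences between two segments (hypothetical)."""
    coord: float     # decimal degrees, applied to lat and long separately
    distance: float  # miles
    heading: float   # degrees


def heading_diff(a: float, b: float) -> float:
    """Smallest angle between two headings, in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def within(seg: TrafficSegment, ref: TrafficSegment, dev: AcceptableDeviation) -> bool:
    """True if `seg` stays within every acceptable deviation from `ref`."""
    return (
        abs(seg.lat - ref.lat) <= dev.coord
        and abs(seg.long - ref.long) <= dev.coord
        and abs(seg.distance - ref.distance) <= dev.distance
        and heading_diff(seg.heading, ref.heading) <= dev.heading
    )


# Illustrative values only.
a = TrafficSegment(lat=-15.0, long=-47.0, distance=120.0, heading=350.0)
b = TrafficSegment(lat=-15.2, long=-47.1, distance=110.0, heading=10.0)
print(within(a, b, AcceptableDeviation(coord=0.5, distance=20.0, heading=30.0)))  # True
print(within(a, b, AcceptableDeviation(coord=0.5, distance=20.0, heading=15.0)))  # False
```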
K-means algorithm
Precondition: set maximum deviation values for coordinates, distance and route
Begin: K = 0
While stopping criterion not satisfied (deviation in clusters)
    Increase K
    Arbitrarily choose K centers (among the data objects)
    While centers change (k-means)
        (Re)assign routes to clusters based on the weighted distance
        Update center values
    End while (no movement between groups)
End while (deviation in groups OK)
Save results
End
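The loop above can be sketched in Python as follows. This is a simplified reading, not the GisTAD implementation: points are plain numeric tuples, plain Euclidean distance stands in for the weighted measure, and a single hypothetical threshold `max_dev` on each point's distance to its center replaces the per-attribute deviation checks; `seed` only makes the arbitrary choice of initial centers reproducible.

```python
import math
import random

Point = tuple[float, ...]


def assign(points: list[Point], centers: list[Point]) -> list[int]:
    """(Re)assign each point to the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points]


def kmeans(points: list[Point], k: int, rng: random.Random,
           max_iter: int = 100) -> tuple[list[Point], list[int]]:
    """Plain k-means; initial centers are chosen among the data objects."""
    centers = rng.sample(points, k)
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Mean of the members; an empty cluster keeps its old center.
            new_centers.append(tuple(sum(c) / len(members) for c in zip(*members))
                               if members else centers[j])
        if new_centers == centers:  # no movement between groups
            break
        centers = new_centers
    return centers, assign(points, centers)


def incremental_kmeans(points: list[Point], max_dev: float,
                       seed: int = 0) -> tuple[int, list[Point], list[int]]:
    """Increase K until every point lies within max_dev of its own center."""
    if not points:
        raise ValueError("no points to cluster")
    rng = random.Random(seed)
    for k in range(1, len(points) + 1):
        centers, labels = kmeans(points, k, rng)
        worst = max(math.dist(p, centers[lab]) for p, lab in zip(points, labels))
        if worst <= max_dev:  # deviation in groups OK
            return k, centers, labels
    raise ValueError("max_dev must be non-negative")


pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
k, centers, labels = incremental_kmeans(pts, max_dev=1.0)
print(k, labels)
```

Each new value of K reruns k-means from scratch, so the total work grows with the final K; this is one plausible contributor to the running-time concern noted in the Conclusion.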
Distance Measure
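$$d_{im} = w\,(\mathit{lat}_i - \mathit{lat}_m)^2 + w\,(\mathit{long}_i - \mathit{long}_m)^2 + (\mathit{dist}_i - \mathit{dist}_m)^2 + (\mathit{pr}_i - \mathit{pr}_m)^2$$
where $i$ is a route, $m$ a cluster center, $w$ the weight (peso) on the coordinate terms, and $\mathit{pr}$ the heading.
Minimize deviations
Improve cluster quality
$$|\mathit{lat}_i - \mathit{lat}_m| \le \mathit{paramcoord} \quad \text{and} \quad |\mathit{long}_i - \mathit{long}_m| \le \mathit{paramcoord}$$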
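Assuming the measure is a sum of four squared differences in which the weight (peso) multiplies the latitude and longitude terms, it can be computed directly. In this Python sketch the tuple layout `(lat, long, dist, heading)` and the function name are assumptions; headings are subtracted as plain numbers, as in the formula, so no 360° wrap-around is applied.

```python
Route = tuple[float, float, float, float]  # (lat, long, dist, heading)


def weighted_sq_distance(route: Route, center: Route, weight: float) -> float:
    """weight*dlat^2 + weight*dlong^2 + ddist^2 + dheading^2 (weight = peso)."""
    lat_i, long_i, dist_i, pr_i = route
    lat_m, long_m, dist_m, pr_m = center
    return (weight * (lat_i - lat_m) ** 2
            + weight * (long_i - long_m) ** 2
            + (dist_i - dist_m) ** 2
            + (pr_i - pr_m) ** 2)


# Worked example, illustrative values: 10*0.5^2 + 10*0^2 + 2^2 + 2^2 = 10.5
route = (-15.0, -47.0, 100.0, 90.0)
center = (-15.5, -47.0, 98.0, 88.0)
print(weighted_sq_distance(route, center, weight=10.0))  # 10.5
```

Because assigning a route only requires its nearest center, ranking centers by this squared sum gives the same result as ranking them by its square root.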
GIS Integration
- TerraLib
- TerraView
- k-means
Data preparation
- 8000 records
- Looking for information: what to look for?
Search space restrictions
Numeric tests
- Up to 500 records
GisTAD tests
- 319 records
- 73 groups
- Approx. time = 40 s
[Figures: TerraView]
Applications
Air Operations
Improper use of airspace
Conclusion
For the proposed problem, the k-means algorithm proved applicable and returned a good set of clusters.
However, when the number of records to be clustered is large, applying the algorithm can become very time-consuming.
Future Work
Other clustering algorithms should be implemented to determine which one is the most efficient for the problem under analysis, whatever the number of records to be clustered.
The algorithms to be tested are:
Kohonen neural network;
Leader algorithm.