Future trends in data mining
Data Mining
Abraham Otero
Future trends
Good reviews:
Baker, R. and Yacef, K. The state of educational data
mining in 2009: A review and future visions. Journal of
Educational Data Mining (2009) 1:3-17
Hans-Peter Kriegel, Karsten M. Borgwardt, Peer Kröger,
Alexey Pryakhin, Matthias Schubert, Arthur Zimek.
Future trends in data mining. Data Min Knowl Disc
(2007) 15:87–97.
Jeffrey Hsu. Data mining trends and developments: The
key data mining technologies and applications for the
21st Century. Fairleigh Dickinson University, 2002. isedj.org.
Distributed/collaborative data mining:
Sometimes the data reside in different physical locations, whether for
intellectual property reasons, for organizational reasons, or because
they are too large to be stored in a single physical
location.
In that case it can be useful to analyze the data locally and to
generate partial models.
The partial models then have to be combined to form the
global model.
Finally, the global model has to be validated on the
different databases.
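
The three steps above (local partial models, combination into a global model, validation on each database) can be sketched as follows. This is only an illustrative sketch, not a method taken from the references: it assumes a nearest-centroid classifier whose partial model is a set of per-class sums and counts, and the site names, data points and function names are all hypothetical.

```python
from collections import defaultdict

def fit_local(rows):
    """Partial model for one site: per-class feature sums and counts."""
    stats = defaultdict(lambda: [0.0, 0.0, 0])  # label -> [sum_x, sum_y, n]
    for (x, y), label in rows:
        s = stats[label]
        s[0] += x
        s[1] += y
        s[2] += 1
    return dict(stats)

def combine(partials):
    """Global model: merge the per-site statistics into class centroids."""
    total = defaultdict(lambda: [0.0, 0.0, 0])
    for stats in partials:
        for label, (sx, sy, n) in stats.items():
            t = total[label]
            t[0] += sx
            t[1] += sy
            t[2] += n
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in total.items()}

def predict(centroids, point):
    """Assign a point to the class with the closest centroid."""
    x, y = point
    return min(
        centroids,
        key=lambda c: (centroids[c][0] - x) ** 2 + (centroids[c][1] - y) ** 2,
    )

def accuracy(centroids, rows):
    """Fraction of rows that the model labels correctly."""
    hits = sum(predict(centroids, p) == label for p, label in rows)
    return hits / len(rows)

# Hypothetical data held at two separate locations; raw rows never leave a site.
sites = {
    "site_a": [((1.0, 1.0), "low"), ((1.5, 0.5), "low"), ((8.0, 9.0), "high")],
    "site_b": [((0.5, 1.5), "low"), ((9.0, 8.0), "high"), ((8.5, 8.5), "high")],
}

partials = [fit_local(rows) for rows in sites.values()]   # step 1: local analysis
global_model = combine(partials)                          # step 2: global model
per_site_accuracy = {                                     # step 3: validate everywhere
    name: accuracy(global_model, rows) for name, rows in sites.items()
}
```

Only the small per-class summaries travel between sites here, which is one possible way to respect the intellectual property, organizational, or size constraints mentioned above.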
Distributed/collaborative data mining:
Liu K, Kargupta H, Bhaduri K, Ryan J. Distributed data
mining bibliography, January 2006.
http://www.csee.umbc.edu/~hillol/DDMBIB/
Kargupta, H. et al, “Collective Data Mining,” in
Advances in Distributed Data Mining, Kargupta and
Chan, editors, MIT Press, 2000.
Kargupta, H. and A. Joshi, “Data Mining To Go:
Ubiquitous KDD for Mobile and Distributed
Environments,” Presentation, KDD-2001, San
Francisco, August 2001.
Data Mining on social networks:
There are currently hundreds of social networks, some
with several hundred million users.
They have a great amount of profile information on their
users.
This information can be especially valuable when
information from various social networks (identities of
the same person) is integrated.
Ethical problems (big brother).
Although users have made this information public voluntarily ...
Data Mining on social networks:
D Jensen, J Neville. Data Mining in Social Networks.
Dynamic Social Network Modeling and Analysis.
National Academies Press, 2003. ISBN 0309089522,
9780309089524.
P Domingos, M Richardson. Mining the network value of
customers. Proceedings of the seventh ACM
Knowledge discovery and data mining conference, 57–66, 2001.
Geographic and spatial data mining:
Geographical databases are becoming increasingly
common and more detailed.
They can be used for the extraction of implicit
knowledge, spatial relationships and other patterns that
are not explicit in them.
One of the main challenges of this field will be the
design and architecture of the data warehouses to store
the information (given the very particular nature of the
data), as well as the integration of heterogeneous data
sources.
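
As a small illustration of extracting a spatial relationship that is not stored explicitly, the sketch below derives a "near" relation from raw coordinates using the haversine great-circle distance. The place names, the coordinates, the 1 km threshold and the function names are hypothetical, chosen only for the example.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def near_pairs(places, max_km):
    """All pairs of place names whose distance is at most max_km."""
    return sorted(
        (p, q)
        for (p, pc), (q, qc) in combinations(sorted(places.items()), 2)
        if haversine_km(pc, qc) <= max_km
    )

# Hypothetical records: the database stores only coordinates, not "near" facts.
places = {
    "hospital": (40.4168, -3.7038),
    "pharmacy": (40.4180, -3.7050),
    "airport": (40.4983, -3.5676),
}

nearby = near_pairs(places, max_km=1.0)
```

The pairwise loop is quadratic in the number of places; a real geographic database would rely on a spatial index, which is part of the warehouse design challenge mentioned above.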
Geographic and spatial data mining:
H. Miller and J. Han (eds.), Geographic Data Mining and Knowledge
Discovery, Taylor and Francis, 2001.
N. Stefanovic, J. Han, and K. Koperski, "Object-Based Selective
Materialization for Efficient Implementation of Spatial Data Cubes,"
IEEE Transactions on Knowledge and Data Engineering, 12(6),
2000.
X. Zhou, D. Truffet, and J. Han, "Efficient Polygon Amalgamation
Methods for Spatial OLAP and Spatial Data Mining", 6th
International Symposium on Spatial Databases, SSD'99, Hong
Kong.
Y. Bedard, T. Merrett, and J. Han, "Fundamentals of Geospatial Data
Warehousing for Geographic Knowledge Discovery", in H. Miller and
J. Han (eds.), Geographic Data Mining and Knowledge
Discovery, Taylor and Francis, 2001.
Time-series data mining:
Data mining tools have virtually no support for the
analysis of information that evolves over time or for
the discovery of temporal relations.
Nor has significant progress been made in research.
Temporal information could be of great importance for
many kinds of patterns (cause-and-effect relationships,
periodic behaviors...).
This field can borrow from, or build on, structural data
mining, given the structural nature of the temporal
relationships among a set of events.
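
To make "periodic behaviors" concrete, here is a minimal sketch that estimates the dominant period of a series as the lag with the highest autocorrelation. It is one simple technique among many and is not drawn from the references; the readings and function names are hypothetical, and the series is assumed to be non-constant (otherwise the variance is zero).

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of the series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series)  # assumed non-zero
    cov = sum((series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag))
    return cov / var

def dominant_period(series, max_lag):
    """Lag in 1..max_lag with the highest autocorrelation."""
    return max(range(1, max_lag + 1), key=lambda lag: autocorrelation(series, lag))

# Hypothetical sensor readings that repeat every 4 time steps.
readings = [1, 3, 7, 3] * 6
period = dominant_period(readings, max_lag=8)
```

Cause-and-effect relationships and relations between events need richer representations than a single lag, which is where the link to structural data mining comes in.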
Time-series data mining:
E. D. Kim, J. M. W. Lam, and J. Han, "AIM: Approximate Intelligent
Matching for Time Series Data", Proceedings 2000 Int.
Conference on Data Warehousing and Knowledge Discovery
(DaWaK'00), Greenwich, U.K., Sept. 2000.
Han, J., G. Dong and Y. Yin, "Efficient Mining of Partial Periodic
Patterns in Time Series Database", Proceedings International
Conference on Data Engineering ICDE'99, Sydney, Australia,
March 1999.
Han, J., J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, M.-C. Hsu,
"FreeSpan: Frequent Pattern-Projected Sequential Pattern Mining",
Proceedings KDD'00, Boston, MA, August 2000.
Data Mining guided by constraints:
Many data mining techniques could benefit from
some form of guidance or supervision by the user.
Incorporating knowledge into these techniques can
improve the efficiency of the algorithms and help to
discover more interesting knowledge.
It will be necessary to develop a standard mechanism
for representing the constraints.
It will be necessary to develop intuitive user interfaces
for defining these constraints.
A related issue is the incorporation of common sense
into the databases and the data mining techniques.
Ex. "All patients who have had a child in the hospital are
women" (support = 100%, confidence = 100%): a rule that is correct but tells the user nothing new.
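
The sketch below shows how the support and confidence of such a rule are computed, and how a user-supplied constraint can keep a common-sense rule out of the results before any counting is done. It is a deliberately simple illustration, not one of the constraint-pushing algorithms from the literature: rules have a single item on each side, and the records, the `COMMON_SENSE` set and the function names are hypothetical.

```python
from itertools import permutations

def rule_metrics(records, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent."""
    both = sum(antecedent <= r and consequent <= r for r in records)
    ante = sum(antecedent <= r for r in records)
    support = both / len(records)
    confidence = both / ante if ante else 0.0
    return support, confidence

def mine_rules(records, min_support, min_confidence, allowed=lambda a, c: True):
    """Single-item rules a -> c that pass the thresholds and the user constraint."""
    items = sorted(set().union(*records))
    rules = []
    for a, c in permutations(items, 2):
        if not allowed(a, c):  # constraint is checked before any counting
            continue
        support, confidence = rule_metrics(records, {a}, {c})
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, c, support, confidence))
    return rules

# Hypothetical maternity-ward records, each a set of items.
maternity_ward = [
    {"gave_birth", "female", "caesarean"},
    {"gave_birth", "female"},
    {"gave_birth", "female", "caesarean"},
    {"gave_birth", "female"},
]

# Hypothetical background knowledge declared by the user.
COMMON_SENSE = {("gave_birth", "female")}

def not_common_sense(a, c):
    """User constraint: skip rules that only restate known facts."""
    return (a, c) not in COMMON_SENSE

support, confidence = rule_metrics(maternity_ward, {"gave_birth"}, {"female"})
unconstrained = mine_rules(maternity_ward, 0.5, 1.0)
constrained = mine_rules(maternity_ward, 0.5, 1.0, allowed=not_common_sense)
```

In this toy data set every record is a birth, so the common-sense rule reaches 100% support and 100% confidence, and only the constraint prevents it from being reported.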
Data Mining guided by constraints:
Han, J., L. V. S. Lakshmanan, and R. T. Ng, "Constraint-Based,
Multidimensional Data Mining", COMPUTER (special issue on Data
Mining), 32(8): 46-50, 1999.
Wang, K., Y. He and J. Han, "Mining Frequent Itemsets Using
Support Constraints", Proceedings 2000 Int. Conference on Very
Large Data Bases (VLDB'00), Cairo, Egypt, Sept. 2000, pp. 43-52.
Pei, J., J. Han, and L. V. S. Lakshmanan, "Mining Frequent
Itemsets with Convertible Constraints", Proceedings 2001 Int.
Conference on Data Engineering (ICDE'01), Heidelberg, Germany,
April 2001.
Pei, J. and J. Han, "Can We Push More Constraints into Frequent
Pattern Mining?", Proceedings KDD'00, Boston, MA, August 2000.
Mining complex objects:
In most cases, data mining is applied to relational
databases where information is represented by
attributes that take a limited set of possible types
(integers, dates, reals...). The data are vectors.
Domain-specific knowledge is often so complex
that it cannot be expressed in a
completely satisfactory manner by this simple
representation.
It is increasingly necessary to apply data mining
techniques to more complex data.
One of the most promising trends is "object-oriented mining".
Mining complex objects:
Liu K, Kargupta H, Bhaduri K, Ryan J. Distributed data mining
bibliography, January 2006. http://www.csee.umbc.edu/~hillol/DDMBIB/.
Kanellopoulos Y, Dimopulos T, Tjortjis C, Makris C (2006) Mining
source code elements for comprehending object-oriented systems
and evaluating their maintainability. SIGKDD Explorations 8(1):33–40.
Kailing K, Kriegel H-P, Pryakhin A, Schubert M (2004) Clustering
multi-represented objects with noise. In: Proceedings of the 8th
pacific-asia conference on knowledge discovery and data mining
(PAKDD), Sydney, Australia, pp 394–403.
Washio T, Motoda H (2003) State of the art of graph-based data
mining. SIGKDD Explorations Newslett 5(1):59–68.
Others:
Perform data processing in a faster, more transparent
and more structured way.
Currently, up to 90% of the time spent on the process of
knowledge discovery can be consumed in this phase.
Increase the usability of data mining systems,
allowing them to be used by users with less knowledge
of computing / statistics / machine learning.
Identification / discovery of patterns that evolve over
time and characterization of their evolution.