Survey
Future trends in data mining
Data Mining
Abraham Otero

Future trends

Good reviews:
Baker, R. and Yacef, K. The state of educational data mining in 2009: A review and future visions. Journal of Educational Data Mining (2009) 1:3-17.
Hans-Peter Kriegel, Karsten M. Borgwardt, Peer Kröger, Alexey Pryakhin, Matthias Schubert, Arthur Zimek. Future trends in data mining. Data Min Knowl Disc (2007) 15:87–97.
Jeffrey Hsu. Data mining trends and developments: The key data mining technologies and applications for the 21st Century. Fairleigh Dickinson University, 2002. isedj.org.

Distributed/collaborative data mining:
Sometimes the data are stored in different physical locations, for intellectual property reasons, for organizational reasons, or because they are too large to fit in a single physical location.
It can be useful to analyze the data locally and generate partial models.
The partial models then have to be combined to form the global model.
Finally, the global model has to be validated against the different databases.

Distributed/collaborative data mining:
Liu K, Kargupta H, Bhaduri K, Ryan J. Distributed data mining bibliography, January 2006. http://www.csee.umbc.edu/ hillol/DDMBIB/
Kargupta, H. et al., “Collective Data Mining,” in Advances in Distributed Data Mining, Kargupta and Chan, editors, MIT Press, 2000.
Kargupta, H. and A. Joshi, “Data Mining To Go: Ubiquitous KDD for Mobile and Distributed Environments,” Presentation, KDD-2001, San Francisco, August 2001.

Data Mining on social networks:
There are currently hundreds of social networks, some with several hundred million users.
They hold a great amount of profile information on their users.
This information can be especially valuable when information from several social networks (identities of the same person) is integrated.
Ethical problems (big brother).
Although users have made this information public voluntarily ...

Data Mining on social networks:
D Jensen, J Neville. Data Mining in Social Networks. Dynamic Social Network Modeling and Analysis. National Academies Press, 2003. ISBN 0309089522, 9780309089524.
P Domingos, M Richardson. Mining the network value of customers. Proceedings of the seventh ACM Knowledge discovery and data mining conference, pp. 57–66, 2001.

Geographic and spatial data mining:
Geographical databases are becoming increasingly common and more detailed.
They can be used to extract implicit knowledge, spatial relationships and other patterns that are not explicit in them.
One of the main challenges of this field will be the design and architecture of the data warehouses that store the information (given the very particular nature of the data), as well as the integration of heterogeneous data sources.

Geographic and spatial data mining:
H. Miller and J. Han (eds.), Geographic Data Mining and Knowledge Discovery, Taylor and Francis, 2001.
Stefanovic, J. Han, and K. Koperski, "Object-Based Selective Materialization for Efficient Implementation of Spatial Data Cubes," IEEE Transactions on Knowledge and Data Engineering, 12(6), 2000.
Zhou, D. Truffet, and J. Han, "Efficient Polygon Amalgamation Methods for Spatial OLAP and Spatial Data Mining", 6th International Symposium on Spatial Databases, SSD'99, Hong Kong.
Bedard, T. Merrett, and J. Han, "Fundamentals of Geospatial Data Warehousing for Geographic Knowledge Discovery", in H. Miller and J. Han (eds.), Geographic Data Mining and Knowledge Discovery, Taylor and Francis, 2001.

Time-series data mining:
Current data mining tools offer virtually no support for the analysis of information that evolves over time or for the discovery of temporal relations.
Nor has significant progress been made in research.
Temporal information could be of great importance for many kinds of patterns (cause-and-effect relationships, periodic behaviors...).
This field can borrow from, or build on, structural data mining, given the structural nature of the temporal relationships among a set of events.

Time-series data mining:
Kim, J. M. W. Lam, and J. Han, "AIM: Approximate Intelligent Matching for Time Series Data", Proceedings 2000 Int. Conference on Data Warehouse and Knowledge Discovery (DaWaK'00), Greenwich, U.K., Sept. 2000.
Han, J., G. Dong and Y. Yin, "Efficient Mining of Partial Periodic Patterns in Time Series Database", Proceedings International Conference on Data Engineering ICDE'99, Sydney, Australia, March 1999.
Han, J., J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, M.-C. Hsu, "FreeSpan: Frequent Pattern-Projected Sequential Pattern Mining", Proceedings KDD'00, Boston, MA, August 2000.

Data Mining guided by constraints:
Many data mining techniques could benefit from some form of guidance or supervision by the user.
Incorporating knowledge into these techniques can improve the efficiency of the algorithms and help to discover more interesting knowledge.
It will be necessary to develop a standard mechanism for representing the constraints.
It will be necessary to develop intuitive user interfaces for defining these constraints.
A related issue is the incorporation of common sense into the databases and the data mining techniques.
Ex. "All patients who have had a child in the hospital are women" (support = 100%, confidence = 100%).

Data Mining guided by constraints:
Han, J., L. V. S. Lakshmanan, and R. T. Ng, "Constraint-Based, Multidimensional Data Mining", COMPUTER (special issue on Data Mining), 32(8): 46-50, 1999.
Wang, K., Y. He and J. Han, "Mining Frequent Itemsets Using Support Constraints", Proceedings 2000 Int. Conference on Very Large Data Bases (VLDB'00), Cairo, Egypt, Sept. 2000, pp. 43-52.
Pei, J., J. Han, and L. V. S. Lakshmanan, "Mining Frequent Itemsets with Convertible Constraints", Proceedings 2001 Int. Conference on Data Engineering (ICDE'01), Heidelberg, Germany, April 2001.
J. Pei and J. Han, "Can We Push More Constraints into Frequent Pattern Mining?", Proceedings KDD'00, Boston, MA, August 2000.

Mining complex objects:
In most cases, data mining is applied to relational databases where information is represented by attributes that take a limited set of possible types (integers, dates, reals...). The data are vectors.
Domain-specific knowledge is often too complex to be expressed in a completely satisfactory manner by this simple representation.
It is increasingly necessary to apply data mining techniques to more complex data.
One of the most promising trends is "object-oriented mining".

Mining complex objects:
Liu K, Kargupta H, Bhaduri K, Ryan J. Distributed data mining bibliography, January 2006. http://www.csee.umbc.edu/ hillol/DDMBIB/.
Kanellopoulos Y, Dimopulos T, Tjortjis C, Makris C (2006) Mining source code elements for comprehending object-oriented systems and evaluating their maintainability. SIGKDD Explorations 8(1):33–40.
Kailing K, Kriegel H-P, Pryakhin A, Schubert M (2004) Clustering multi-represented objects with noise. In: Proceedings of the 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Sydney, Australia, pp 394–403.
Washio T, Motoda H (2003) State of the art of graph-based data mining. SIGKDD Explorations Newslett 5(1):59–68.

Others:
Perform data processing in a more rapid, transparent and structured way.
Currently, up to 90% of the time spent on the knowledge discovery process can be consumed in this phase.
Increasing the usability of data mining systems, so that they can be used by people with less knowledge of computers / statistics / machine learning.
Identification / discovery of patterns that evolve over time, and characterization of that evolution.
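
Appendix (illustrative sketch): the distributed/collaborative workflow described earlier (analyze locally, combine the partial models, validate the global model on each database) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not a method taken from the cited references: the "partial model" is reduced to three counts for one association rule, and the site names, the transactions, the rule bread -> milk and all function names are hypothetical.

```python
from collections import Counter

# Hypothetical toy data: each site keeps its own transactions and
# only ever ships counts (the partial model), never the raw records.
SITES = {
    "site_a": [{"bread", "milk"}, {"bread", "butter"},
               {"bread", "milk", "butter"}, {"milk"}],
    "site_b": [{"bread", "milk"}, {"milk"}, {"bread", "milk"}],
}


def local_model(transactions, antecedent, consequent):
    """Step 1: build a partial model locally (counts for one rule)."""
    counts = Counter()
    for items in transactions:
        counts["n"] += 1
        if antecedent <= items:          # antecedent is a subset of the transaction
            counts["antecedent"] += 1
            if consequent <= items:
                counts["both"] += 1
    return counts


def combine(partial_models):
    """Step 2: merge the partial models into a global model by summing counts."""
    total = Counter()
    for partial in partial_models:
        total.update(partial)            # Counter.update adds counts
    return total


def rule_quality(counts):
    """Support and confidence of the rule from a set of counts."""
    support = counts["both"] / counts["n"] if counts["n"] else 0.0
    confidence = (counts["both"] / counts["antecedent"]
                  if counts["antecedent"] else 0.0)
    return support, confidence


def validate(sites, antecedent, consequent):
    """Step 3: check how well the rule holds on each local database."""
    return {name: rule_quality(local_model(tx, antecedent, consequent))
            for name, tx in sites.items()}


if __name__ == "__main__":
    rule = ({"bread"}, {"milk"})
    partials = [local_model(tx, *rule) for tx in SITES.values()]
    print("global (support, confidence):", rule_quality(combine(partials)))
    print("per-site check:", validate(SITES, *rule))
```

Summing counts works here because support and confidence are ratios of additive quantities; richer models (trees, clusters) would need a more elaborate combination step. The per-site check also shows why validation matters: a rule that looks reasonable globally may hold much more weakly at one site than at another.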