Knowledge Discovery from DataBases (KDD)
A.K.A. Data Mining, and by other names as well
Carlo Zaniolo
UCLA CS Dept

What is Data Mining?
  Data mining: extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) patterns or knowledge from huge amounts of data.
  Alternative names: knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, ...

Why Data Mining?
  Explosive growth of available data: the Big-Data Revolution
    Business: Web, e-commerce, transactions, stocks, …
    Science: remote sensing, bioinformatics, scientific simulation, …
    Society and everyone: news, digital cameras, ...
  We are drowning in data, but starving for knowledge!
  Knowledge is the key to improving your business and operations.
  Data mining tools and techniques automate knowledge discovery from large data sets.

DM Applications
  E.g., marketing products to customers:
  1. Find clusters of customers who share the same characteristics: interests, income level, spending habits, etc.
  2. Determine customer purchasing patterns over time.
  3. Cross-market analysis: find associations/correlations between product sales (and predict on that basis).
  4. Profiling: what types of customers buy what products.

DM Applications: Fraud Detection and Security
  Approaches: clustering and outlier detection, looking for unusual patterns.
  Applications: health care, retail, credit card services, telecommunications.
    Auto insurance: ring of collisions
    Money laundering: suspicious monetary transactions
    Medical insurance:
      Professional patients, ring of doctors, and ring of references
      Unnecessary or correlated screening tests
    Telecommunications: phone-call fraud
      Phone call model: destination of the call, duration, time of day or week.
      Analyze patterns that deviate from an expected norm.
    Anti-terrorism

New Applications
  Software Bug Mining
  Graph Mining: e.g., finding social networks
  Web Mining: personalization and recommendations
  Mining and Scientific Applications: Biology
  Spatio-Temporal and GIS: find geographical clusters; mine for trajectories and travel plans.
  Multi-Relational Data Mining: mining for knowledge and relationships from multiple tables, as in Inductive Logic Programming.

New Research Topics
  Theoretical foundations
  Statistical Data Mining
  Visual Data Mining
  Privacy-Preserving Data Mining

A Historical Perspective
  1. Machine Learning (AI)
  2. Decision Support Environments: Scalability, Integration, Warehousing, OLAP (DB)
  3. Statistical foundations and synergism with other disciplines, e.g., visualization.
  4. Mining streams of sensor and web data

Work plan
  Introduction
  Core Techniques: 1. Classification, 2. Association, and 3. Clustering
  Process and Systems
  New Applications and Research Directions

Knowledge Discovery (KDD) Process
  Data mining: the core of the knowledge discovery process
  [Figure: KDD process diagram. Labels, from data sources up to knowledge: Data Sources (transactional & operational data); Data Integration; Data Cleaning; Data Warehouse; Data Selection & Preprocessing; Task-Specific Data; Data Mining; Patterns & Rules; Auditing; Useful New Knowledge]
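
The stages named on the KDD process slide can be sketched as a small pipeline. This is only an illustrative sketch, not the method used in the course: the records, the function names (integrate, clean, select, mine), and the choice of pair counting as the mining step are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactional and operational sources (made-up records).
store_sales = [
    {"customer": "c1", "items": ["bread", "milk"]},
    {"customer": "c2", "items": ["bread", "milk", "eggs"]},
    {"customer": "c3", "items": None},  # dirty record with no items
]
web_sales = [
    {"customer": "c4", "items": ["milk", "eggs"]},
    {"customer": "c5", "items": ["bread", "milk"]},
]


def integrate(*sources):
    """Data integration: merge all sources into one list of records."""
    return [record for source in sources for record in source]


def clean(records):
    """Data cleaning: drop records with missing or empty item lists."""
    return [r for r in records if r.get("items")]


def select(records):
    """Selection and preprocessing: keep only the item sets the task needs."""
    return [frozenset(r["items"]) for r in records]


def mine(baskets, min_support):
    """Data mining: count item pairs, keep those in >= min_support baskets."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


def run_pipeline(min_support=2):
    """Run the stages in order: sources -> warehouse -> task data -> patterns."""
    warehouse = clean(integrate(store_sales, web_sales))
    task_data = select(warehouse)
    return mine(task_data, min_support)


patterns = run_pipeline()
print(patterns)
```

Each function stands in for one stage of the diagram, so a stage can be replaced (for example, a different mining step) without touching the others. The auditing step, in which an analyst judges whether the patterns are actually useful, is left out because it is not something this sketch can automate.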