Data Mining
Kelby Lee

Overview
  Transaction Database
  What is Data Mining
  Data Mining Primitives
  Data Mining Objectives
  Predictive Modeling
  Knowledge Discovery
  Other Objectives to Data Mining
  What Data Mining is Not
  Other Factors in Data Mining Categorization
  Conclusion

Transaction Database
  Relation Consisting of Transactions
  TID (Transaction Identifier)
  Regularities between Transaction Behavior

Transaction Database
  Table 1.1 Transaction Database

  TID  Customer  Item          Date         Price   Quantity
  ----------------------------------------------------------
  100  C1        chocolate     01/11/2001     1.59  2
  100  C1        ice cream     01/11/2001     1.89  1
  200  C2        chocolate     01/12/2001     1.59  3
  200  C2        candy bar     01/12/2001     1.19  2
  200  C2        jackets       01/12/2001   120.39  2
  300  C3        jackets       01/14/2001   168.88  1
  300  C3        color shirts  01/14/2001    27.95  2
  400  C4        jackets       01/15/2001   149.49  1

Association Rules
  A customer who buys chocolate will likely buy candy bar
  One type of Data Mining task

Discovered Rules
  Table 1.2 Discovered Rules

  Rule  Bought this...  ...also bought that
  -----------------------------------------
  1     chocolate       ice cream
  2     candy bar       chocolate
  3     ski pants       colored shirt
  4     beer            diaper

What is Data Mining
  Retrieve individual elements
    Given a name of a product, find price and producer
  Analysis
    Average monthly sales amount and derivation

Advances Allow For
  Large amounts of Data to be Handled
  Aspect of Analysis
  "Data Rich" but "Knowledge Poor"

Discover Patterns
  Improve Business Performance
    Exploit favorable patterns
    Avoid problematic patterns
  Increase Understanding
  Predict Outcome

Answer the Key Business Questions
  Who will buy? What will they buy? How much?
    Classification and Prediction
  What are the different types of Customers?
    Segmentation of Customers

Answer the Key Business Questions
  What relationship exists between customers or Website visitors and the products?
    Association
  What are the groupings hidden in the data?
    Clustering Analysis

Data Mining Definition
  Non-Trivial Extraction of implicit, previously unknown, interesting, and potentially useful information from data

Different Types of Data Mining
  Business Data Mining
  Scientific Data Mining
  Internet Data Mining

Data Mining Applications
  Medical
  Control Theory
  Engineering
  Public Administration
  Marketing and Finance
  Data Mining on the Web
  Scientific Database
  Fraud Detection

Data Mining Primitives
  Fundamental Elements Needed to Define a Data Mining Task
  Eight Elements (P, D, K, B, T, M, I, U)
  8-Tuple

Elements
  P - Problem Specification
  D - Task Relevant Data
  K - Kind of Knowledge to be Mined
  B - Background Knowledge
  T - Specific algorithms or techniques
  M - Models developed or knowledge patterns extracted
  I - Interestingness
  U - User

Diagram

Relationship between Elements
  User Defines Problem (P) and specifies Interestingness (I)
  Data Miner with K and T as core elements utilizing D and B and incorporates I
  Data Miner produces M

Data Mining Objectives
  Discovery
    Finding human interpretable patterns describing the data
  Prediction
    Using some variables or fields in database to predict unknown or future values of other variables of interest

Data Mining Objectives
  Knowledge Discovery
    Stage somewhat prior to prediction where information is insufficient
    Closer to decision support

Predictive Modeling
  Predict Values Based on Similar Groups of Data
  Submit records with some unknown fields and system will predict value

Predictive Modeling
  Pattern Recognition
    Association of an observation to past experience or knowledge
    Interchangeable with classification

Predictive Modeling
  Classification
    Process of assigning finite set of labels to an observation
  Estimation
    Assign infinite number of numeric labels to an observation

Knowledge Discovery
  Find Patterns in Database
    If someone buys one thing, what else will they buy
  Interesting + Certain = Knowledge
  Output called Discovered Knowledge
  KDD - Knowledge Discovery in Databases

Data Mining
  Is about why, about hidden regularities, important aspect related to perception, learning and evolving
  Decision support process in which we search patterns of information in data
  Once found, display in suitable format

Four Points of KDD
  Discovered Knowledge Represented in High-Level Language
  Accurately Portray contents of Database
  Interesting to user
  Process is Efficient

Important Issues
  Human Centered
    Under control of human user to meet human needs
  Incorporate Interestingness
  Provide Various Types
  Provide Visualization

Other Objectives
  Forensic analysis
    Applying extracted patterns to find anomalous or unusual data elements
    Largely involved in business applications
    Find out what the norm is and find those that deviate from the norm

What Data Mining is Not
  Analysis vs Monitoring
    Analysis - previously collected information
    Monitoring - Collect data as it comes in and compare to set of conditions
  Unexpected Discovery
    Must have general goal in mind

Other Factors in Categorization
  Data Retention
    Data is retained for future pattern matching
  Pattern Distillation
    Analyse data, extract pattern, leave data behind

Conclusion
  Transaction Database
  What is Data Mining
  Data Mining Primitives
  Data Mining Objectives
  Predictive Modeling
  Knowledge Discovery
  Other Objectives to Data Mining
  What Data Mining is Not
  Other Factors in Data Mining Categorization
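
Worked example: association rules on Table 1.1

The slides describe association rules ("a customer who buys chocolate will likely buy candy bar") without saying how "likely" is measured. A minimal sketch follows, assuming the commonly used support and confidence measures, which the slides do not name. The function names (baskets, rule_stats, discover) and the thresholds are hypothetical and chosen only for illustration. Only the TID and Item columns of Table 1.1 are used, and the brute-force search over item pairs is meant for a table this small, not as a practical mining algorithm.

```python
from collections import defaultdict
from itertools import permutations

# (TID, item) pairs copied from Table 1.1; other columns are omitted.
ROWS = [
    (100, "chocolate"), (100, "ice cream"),
    (200, "chocolate"), (200, "candy bar"), (200, "jackets"),
    (300, "jackets"), (300, "color shirts"),
    (400, "jackets"),
]


def baskets(rows):
    """Group items by transaction identifier (TID)."""
    grouped = defaultdict(set)
    for tid, item in rows:
        grouped[tid].add(item)
    return dict(grouped)


def rule_stats(transactions, lhs, rhs):
    """Return (support, confidence) for the single-item rule lhs -> rhs.

    support    = share of all transactions containing both items
    confidence = share of transactions containing lhs that also contain rhs
    """
    total = len(transactions)
    with_lhs = [items for items in transactions.values() if lhs in items]
    with_both = [items for items in with_lhs if rhs in items]
    support = len(with_both) / total
    confidence = len(with_both) / len(with_lhs) if with_lhs else 0.0
    return support, confidence


def discover(transactions, min_support, min_confidence):
    """Brute-force search over ordered item pairs (assumed thresholds)."""
    items = sorted(set().union(*transactions.values()))
    found = []
    for lhs, rhs in permutations(items, 2):
        support, confidence = rule_stats(transactions, lhs, rhs)
        if support >= min_support and confidence >= min_confidence:
            found.append((lhs, rhs, support, confidence))
    return found


TRANSACTIONS = baskets(ROWS)

if __name__ == "__main__":
    print(rule_stats(TRANSACTIONS, "chocolate", "candy bar"))
    for lhs, rhs, s, c in discover(TRANSACTIONS, 0.25, 1.0):
        print(f"{lhs} -> {rhs}  support={s:.2f}  confidence={c:.2f}")
```

On these four transactions, "chocolate -> candy bar" has support 0.25 and confidence 0.5, while the reverse rule "candy bar -> chocolate" has confidence 1.0. The direction of a rule matters, and a threshold on confidence is one simple way to express the "Interesting + Certain" requirement from the Knowledge Discovery slide.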