MIS 451 Building Business Intelligence Systems
Introduction to Data Mining

Why data mining?
OLAP can only provide shallow data analysis: it answers "what" questions.
Ex: sales distribution by product

Why data mining?
Shallow data analysis is not sufficient to support business decisions, which depend on "how" questions.
Ex: how to boost sales of other products
Ex: when people buy product 6, what other products are they likely to buy? (cross-selling)

Why data mining?
OLAP can only do shallow data analysis because OLAP is based on SQL:

    SELECT PRODUCTS.PNAME, SUM(SALESFACTS.SALES_AMT)
    FROM DBSR.PRODUCTS PRODUCTS, DBSR.SALESFACTS SALESFACTS
    WHERE ( ( PRODUCTS.PRODUCT_KEY = SALESFACTS.PRODUCT_KEY ) )
    GROUP BY PRODUCTS.PNAME;

By its nature, SQL is not suited to implementing complicated algorithms.
Complicated algorithms need to be developed to support deep data analysis: this is data mining.

Why data mining?
OLAP results generated from data sets with a large number of attributes are difficult to interpret.
Ex: cluster the customers of my company (target marketing)
Pick two attributes related to a customer: income level and sales amount

Why data mining?
Ex: cluster the customers of my company (target marketing)
Pick three attributes related to a customer: income level, education level, and sales amount

What is data mining?
Data mining is a process for extracting hidden and interesting patterns from data.
Data mining is a step in the process of Knowledge Discovery in Databases (KDD).

Steps of the KDD Process
Data -> Step 1: Selection -> Target Data -> Step 2: Cleaning -> Preprocessed Data -> Step 3: Transformation -> Transformed Data -> Step 4: Data Mining -> Patterns -> Step 5: Interpretation & Evaluation -> Knowledge

Steps of the KDD Process
Step 1: Select the columns (attributes) and rows (records) of interest to be mined.
Step 2: Clean errors from the selected data.
Step 3: Transform the data into a form suitable for high-performance data mining.
Step 4: Apply data mining to the transformed data.
Step 5: Filter out non-interesting patterns from the data mining results.

Data mining: on what kind of data?
Transactional database
Data warehouse
Flat file
Web data
    Web content
    Web structure
    Web log

Major data mining tasks
Association rule mining: cross-selling
Clustering: target marketing
Classification: potential customer identification, fraud detection

Reading: data mining book, chapter 1
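
Example: association rule mining for cross-selling
The cross-selling question from the earlier slide (when people buy product 6, what other products are they likely to buy?) can be illustrated with a minimal Python sketch of association rule mining over pairs of products. The transactions, the thresholds, and the helper name pair_rules are hypothetical and chosen only for illustration. Support and confidence are the usual measures for such rules; the slides do not define them. This is a sketch of the idea, not the algorithm a production tool would use.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set holds the product IDs in one basket.
transactions = [
    {6, 2, 9},
    {6, 2},
    {6, 9},
    {2, 4},
    {6, 2, 4},
]


def pair_rules(transactions, min_support=0.4, min_confidence=0.6):
    """Return rules (lhs, rhs, support, confidence): 'buyers of lhs also buy rhs'."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        item_counts.update(basket)
        # Sort so each unordered pair is always counted under the same key.
        pair_counts.update(combinations(sorted(basket), 2))

    rules = []
    for (a, b), count in pair_counts.items():
        # Support: fraction of all baskets that contain both products.
        support = count / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            # Confidence: fraction of baskets with lhs that also contain rhs.
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    # Strongest rules first; ties broken by product IDs for a stable order.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


rules = pair_rules(transactions)
for lhs, rhs, support, confidence in rules:
    print(f"product {lhs} -> product {rhs}: "
          f"support={support:.2f}, confidence={confidence:.2f}")
```

With this toy data, the rule "product 6 -> product 2" passes both thresholds, while "product 6 -> product 9" is dropped because only half of the baskets containing product 6 also contain product 9.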