MIS 451
Building Business Intelligence Systems
Introduction to Data Mining
Why data mining?

OLAP can only provide shallow data analysis: it answers "what" questions
Ex: sales distribution by product
Why data mining?

Shallow data analysis is not sufficient to support
business decisions, which depend on "how" questions

Ex: how to boost sales of other products
Ex: when people buy product 6, what other products are
they likely to buy? (cross selling)
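
The cross-selling question above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the transactions and the helper name `co_purchased` are hypothetical and do not come from the course data. For every basket that contains product 6, it counts which other products appear and reports the share of those baskets that also hold each one.

```python
from collections import Counter

# Hypothetical transactions: each set holds the product IDs bought together.
transactions = [
    {6, 2, 9},
    {6, 2},
    {6, 9, 4},
    {1, 2},
    {6, 2, 4},
]

def co_purchased(transactions, product):
    """Return, for each other product, the share of baskets containing
    `product` that also contain that other product."""
    counts = Counter()
    baskets_with_product = 0
    for basket in transactions:
        if product in basket:
            baskets_with_product += 1
            counts.update(basket - {product})
    # Empty result when `product` was never bought (no division happens).
    return {other: n / baskets_with_product for other, n in counts.items()}

confidence = co_purchased(transactions, 6)
for other, share in sorted(confidence.items(), key=lambda kv: -kv[1]):
    print(f"product 6 -> product {other}: {share:.2f}")
```

With this toy data, product 2 appears in three of the four baskets that contain product 6, so it would be the first cross-selling candidate.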
Why data mining?

OLAP can only do shallow data analysis

OLAP is based on SQL
SELECT PRODUCTS.PNAME, SUM(SALESFACTS.SALES_AMT)
FROM DBSR.PRODUCTS PRODUCTS, DBSR.SALESFACTS SALESFACTS
WHERE ( ( PRODUCTS.PRODUCT_KEY = SALESFACTS.PRODUCT_KEY ) )
GROUP BY PRODUCTS.PNAME;


SQL is a declarative query language, so complicated algorithms are
hard to implement in SQL alone.
Complicated algorithms need to be developed to support
deep data analysis: data mining
Why data mining?

OLAP results generated from data sets with a large number of attributes
are difficult to interpret

Ex: cluster customers of my company (target marketing)

Pick two attributes related to a customer: income level and sales amount
Why data mining?


Ex: cluster customers of my company (target marketing)
Pick three attributes related to a customer: income level, education level
and sales amount
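
As a rough sketch of what clustering customers on these three attributes might look like, the following uses plain k-means. The slides do not name a clustering algorithm, so k-means is an assumption, as are the customer values, the 1 to 5 coding of income and education level, and the choice to seed the centroids with the first k points.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means on a list of equal-length numeric tuples.
    Returns (centroids, labels)."""
    # Assumption: seed with the first k points; real tools use better seeding.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels

# Hypothetical customers: (income level, education level, sales amount in thousands).
customers = [
    (1, 1, 2.0), (5, 4, 9.0), (2, 1, 1.5),
    (4, 5, 8.0), (1, 2, 2.5), (5, 5, 9.5),
]
centroids, labels = kmeans(customers, k=2)
print(labels)
print(centroids)
```

Each label identifies a customer segment that a target-marketing campaign could address separately. In practice the attributes would usually be rescaled first so that no single one dominates the distance.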
What is data mining?

Data mining is a process to extract hidden and
interesting patterns from data.

Data mining is one step in the process of Knowledge
Discovery in Databases (KDD).
Steps of the KDD Process
Data
  -> Step 1: Selection -> Target Data
  -> Step 2: Cleaning -> Preprocessed Data
  -> Step 3: Transformation -> Transformed Data
  -> Step 4: Data Mining -> Patterns
  -> Step 5: Interpretation & Evaluation -> Knowledge
Steps of the KDD Process





Step 1: select the columns (attributes) and rows (records)
of interest to be mined
Step 2: clean errors from the selected data
Step 3: transform the data into a form suitable for
high-performance data mining
Step 4: mine the transformed data for patterns
Step 5: filter out uninteresting patterns from the data
mining results
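
The five steps above can be sketched end to end on a toy data set. Everything here is an assumption made for illustration: the records, the "east" region filter, the cleaning rules, the choice of counting product pairs as the mining step, and the 0.5 support threshold.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw records: (customer, product, quantity, region).
raw = [
    ("c1", "bread", 1, "east"), ("c1", "milk", 2, "east"),
    ("c2", "bread", 1, "east"), ("c2", "milk", 1, "east"),
    ("c2", "eggs", -3, "east"),            # negative quantity: an error
    ("c3", "bread", 2, "east"), ("c3", "eggs", 1, "east"),
    ("c4", None, 1, "east"),               # missing product: an error
    ("c4", "milk", 1, "east"), ("c4", "bread", 1, "east"),
    ("c5", "bread", 1, "west"),
]

# Step 1: selection. Keep only east-region rows and the columns we need.
selected = [(c, p, q) for c, p, q, region in raw if region == "east"]

# Step 2: cleaning. Drop rows with a missing product or a non-positive quantity.
cleaned = [(c, p, q) for c, p, q in selected if p is not None and q > 0]

# Step 3: transformation. Turn rows into one basket (set of products) per customer.
baskets = {}
for customer, product, _ in cleaned:
    baskets.setdefault(customer, set()).add(product)

# Step 4: data mining. Count how often each pair of products shares a basket.
pair_counts = Counter()
for basket in baskets.values():
    pair_counts.update(combinations(sorted(basket), 2))

# Step 5: interpretation and evaluation. Keep pairs found in at least half the baskets.
min_support = 0.5
patterns = {
    pair: count / len(baskets)
    for pair, count in pair_counts.items()
    if count / len(baskets) >= min_support
}
print(patterns)
```

Only the (bread, milk) pair survives the final filter in this toy run, which is the kind of pruning Step 5 is meant to perform.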
Data mining: on what kind of data?

Transactional Database
Data warehouse
Flat file
Web data
    Web content
    Web structure
    Web log
Major data mining tasks

Association rule mining – cross selling

Clustering – target marketing

Classification – potential customer identification,
fraud detection
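
To make the classification task concrete, here is a minimal sketch of potential customer identification using k-nearest neighbours. The slides do not say which classifier is meant, so k-nearest neighbours is a swapped-in choice, and the training data, feature meanings and the name `knn_predict` are all hypothetical.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Predict the label of `query` by majority vote among the k
    training points closest to it (Euclidean distance)."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical history: (income level, past purchases) -> became a customer?
train = [
    ((1, 0), "no"), ((2, 1), "no"), ((1, 2), "no"),
    ((4, 5), "yes"), ((5, 4), "yes"), ((5, 6), "yes"),
]

print(knn_predict(train, (4, 4)))
print(knn_predict(train, (2, 0)))
```

The same pattern would apply to fraud detection by swapping the labels for "fraud" and "legitimate" and the features for transaction attributes.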

Reading: data mining book, chapter 1