Data Mining for Business Intelligence
Learning Objectives
Define data mining as an enabling technology for business intelligence
Understand the objectives and benefits of business analytics and data mining
Recognize the wide range of applications of data mining
Learn the standardized data mining processes
  CRISP-DM, SEMMA, KDD, …
Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall
Learning Objectives (cont.)
Understand the steps involved in data preprocessing for data mining
Learn different methods and algorithms of data mining
Build awareness of the existing data mining software tools
  Commercial versus free/open source
Understand the pitfalls and myths of data mining
Why Data Mining?
More intense competition at the global scale
Recognition of the value in data sources
Availability of quality data on customers, vendors, transactions, Web, etc.
Consolidation and integration of data repositories into data warehouses
The exponential increase in data processing and storage capabilities, and decrease in cost
Definition of Data Mining
The nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data stored in structured databases.
- Fayyad et al. (1996)
Keywords in this definition: process, nontrivial, valid, novel, potentially useful, understandable.
Data mining: a misnomer?
Other names: knowledge extraction, pattern analysis, knowledge discovery, pattern searching, …
Data Mining at the Intersection of Many Disciplines
[Figure: DATA MINING at the center of overlapping disciplines: Statistics, Artificial Intelligence, Pattern Recognition, Mathematical Modeling, Machine Learning, Databases, Management Science & Information Systems]
Data Mining Characteristics/Objectives
Source of data for DM is often a consolidated data warehouse
DM environment is usually a client-server or a Web-based information systems architecture
Data is the most critical ingredient for DM, which may include soft/unstructured data
Data mining tools’ capabilities and ease of use are essential (Web, parallel processing, etc.)
Data in Data Mining
Data: a collection of facts usually obtained as the result of experiences, observations, or experiments
Data may consist of numbers, words, images, …
Data: lowest level of abstraction (from which information and knowledge are derived)
[Figure: taxonomy of data types]
Data
  Categorical
    Nominal
    Ordinal
  Numerical
    Interval
    Ratio
- DM with different data types?
- Other data types?
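The four scales in the taxonomy above differ in which summaries are meaningful. The following is a minimal sketch of that idea, not part of the original slides: the field names, sample values, and the `summarize` helper are all hypothetical, and the choice of mode, median, and mean per scale is the common textbook convention.

```python
from statistics import mean, median_low, mode

# Hypothetical ordering for an ordinal field (illustrative, not from the slides).
SATISFACTION_ORDER = ["low", "medium", "high"]


def summarize(values, scale, order=None):
    """Return a central-tendency summary that is meaningful for the given scale."""
    if scale == "nominal":
        # Categories have no order, so only the most frequent value makes sense.
        return mode(values)
    if scale == "ordinal":
        # Categories are ordered, so a median over their ranks is meaningful.
        ranks = [order.index(v) for v in values]
        return order[median_low(ranks)]
    if scale in ("interval", "ratio"):
        # Differences are meaningful on numeric scales, so the mean can be used.
        return mean(values)
    raise ValueError(f"unknown scale: {scale}")


# Hypothetical sample data.
regions = ["east", "west", "east", "north", "east"]           # nominal
satisfaction = ["low", "high", "medium", "high", "medium"]    # ordinal
spend = [120.0, 80.0, 100.0, 60.0, 140.0]                     # ratio

print(summarize(regions, "nominal"))
print(summarize(satisfaction, "ordinal", SATISFACTION_ORDER))
print(summarize(spend, "ratio"))
```

Many mining algorithms accept only numeric input, so categorical fields usually need an explicit encoding step before modeling; the sketch only illustrates why the scale has to be known first.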
What Does DM Do?
DM extracts patterns from data
  Pattern? A mathematical (numeric and/or symbolic) relationship among data items
Types of patterns
  Association
  Prediction
  Cluster (segmentation)
  Sequential (or time series) relationships
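As an illustration of the first pattern type, an association between items is commonly quantified with support and confidence. The sketch below assumes a tiny hypothetical basket dataset; the item names and the helper functions `support` and `confidence` are illustrative and do not come from the slides.

```python
# Hypothetical market-basket transactions (illustrative data only).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Among transactions containing the antecedent, the fraction that also contain the consequent."""
    antecedent = set(antecedent)
    both = antecedent | set(consequent)
    n_antecedent = sum(antecedent <= t for t in transactions)
    n_both = sum(both <= t for t in transactions)
    return n_both / n_antecedent if n_antecedent else 0.0


# Rule "bread -> milk": 3 of 5 baskets hold both, 3 of the 4 bread baskets hold milk.
print(support({"bread", "milk"}, transactions))       # 0.6
print(confidence({"bread"}, {"milk"}, transactions))  # 0.75
```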
A Taxonomy for Data Mining Tasks
Data Mining Task | Learning Method | Popular Algorithms
Prediction | Supervised | Classification and Regression Trees, ANN, SVM, Genetic Algorithms
  Classification | Supervised | Decision trees, ANN/MLP, SVM, Rough sets, Genetic Algorithms
  Regression | Supervised | Linear/Nonlinear Regression, Regression trees, ANN/MLP, SVM
Association | Unsupervised | Apriori, OneR, ZeroR, Eclat
  Link analysis | Unsupervised | Expectation Maximization, Apriori Algorithm, Graph-based Matching
  Sequence analysis | Unsupervised | Apriori Algorithm, FP-Growth technique
Clustering | Unsupervised | K-means, ANN/SOM
  Outlier analysis | Unsupervised | K-means, Expectation Maximization (EM)
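To make the unsupervised side of the taxonomy concrete, here is a minimal sketch of K-means on one-dimensional data. It is a simplified illustration, not a production implementation: the function name `kmeans_1d`, the sample values, and the fixed starting centroids are assumptions chosen so the run is deterministic.

```python
def assign(values, centroids):
    """Group each value with its nearest centroid (ties go to the lower index)."""
    clusters = [[] for _ in centroids]
    for v in values:
        nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
        clusters[nearest].append(v)
    return clusters


def kmeans_1d(values, initial_centroids, max_iter=100):
    """Alternate assignment and centroid update until the centroids stop moving."""
    centroids = list(initial_centroids)
    for _ in range(max_iter):
        clusters = assign(values, centroids)
        # An empty cluster keeps its previous centroid.
        updated = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(values, centroids)


# Hypothetical data with two obvious groups.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = kmeans_1d(values, initial_centroids=[1.0, 12.0])
print(centroids)  # [2.0, 11.0]
print(clusters)   # [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]
```

No labels are supplied at any point, which is what places the method in the unsupervised rows of the table; real implementations typically add random restarts and multi-dimensional distance.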
Data Mining Applications
Customer Relationship Management
  Maximize return on marketing campaigns
  Improve customer retention
  Maximize customer value
  Identify and treat most valued customers
Banking and Other Financial
  Automate the loan application process
  Detect fraudulent transactions
  Optimize cash reserves with forecasting
Data Mining Applications (cont.)
Retailing and Logistics
  Optimize inventory levels at different locations
  Improve the store layout and sales promotions
  Optimize logistics by predicting seasonal effects
Manufacturing and Maintenance
  Predict/prevent machinery failures
  Discover novel patterns to improve product quality
Data Mining Applications (cont.)
Brokerage and Securities Trading
  Predict changes in certain bond prices
  Forecast the direction of stock fluctuations
  Assess the effect of events on market movements
  Identify and prevent fraudulent activities in trading
Insurance
  Forecast claim costs for better business planning
  Optimize marketing to specific customers
  Identify and prevent fraudulent claim activities
Data Mining Applications (cont.)
Highly popular application areas for data mining:
  Computer hardware and software
  Science and engineering
  Government and defense
  Homeland security and law enforcement
  Travel industry
  Healthcare
  Medicine
  Entertainment industry
  Sports
  Etc.
Data Mining Process: CRISP-DM
[Figure: the CRISP-DM cycle arranged around Data Sources: 1 Business Understanding, 2 Data Understanding, 3 Data Preparation, 4 Model Building, 5 Testing and Evaluation, 6 Deployment]
Data Mining Process: CRISP-DM (cont.)
Step 1: Business Understanding
Step 2: Data Understanding
Step 3: Data Preparation (!)
Step 4: Model Building
Step 5: Testing and Evaluation
Step 6: Deployment
[Callout: Accounts for ~85% of total project time]
The process is highly repetitive and experimental
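The repetitive nature of the six steps can be sketched as a loop that only reaches Deployment once Testing and Evaluation succeeds. This is a deliberately simplified illustration: the `run_crisp_dm` function and its `evaluate` callback are hypothetical, and restarting from step 1 after a failed evaluation is an assumption (in practice a project may loop back to any earlier step).

```python
PHASES = [
    "Business Understanding",
    "Data Understanding",
    "Data Preparation",
    "Model Building",
    "Testing and Evaluation",
    "Deployment",
]


def run_crisp_dm(evaluate, max_cycles=5):
    """Walk steps 1-5 repeatedly; append Deployment once evaluate(cycle) is true.

    evaluate is a hypothetical callback standing in for the real
    Testing and Evaluation outcome of a given cycle.
    """
    log = []
    for cycle in range(1, max_cycles + 1):
        for phase in PHASES[:5]:
            log.append((cycle, phase))
        if evaluate(cycle):
            log.append((cycle, "Deployment"))
            break
    return log


# Assume, for illustration, that the model only passes evaluation on the third cycle.
log = run_crisp_dm(lambda cycle: cycle >= 3)
print(log[-1])   # (3, 'Deployment')
print(len(log))  # 16
```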