Speakers:
Prof Y V Hui, CityU
Dr H P Lo, CityU
Dr Sammy Yuen, CityU
Dr K W Cheng, SAS Institute
Mr Steven Parker, Standard Chartered
Knowledge Discovery Centre: CityU-SAS Partnership
The Art and Science of Data Mining
Y V Hui
City University of Hong Kong
The Driving Forces
• Specialization and focus in business
- To satisfy the needs of customers
- To improve and develop specific
business strategies and processes
- Personalization through mass
customization
The Driving Forces
• Challenges
- local and global competition
- distributed business operations
- product innovation
• Technology development
• Benefit, cost and risk on a product or
customer basis
Data Mining
• Also known as knowledge discovery in
databases. Data mining digs out valuable
information from large and messy data.
(Computer scientist’s definition)
• Data mining is a knowledge discovery
process. It’s the integration of business
knowledge, people, information, statistics
and computing technology.
Data Mining is Hot
• Ten Hottest Jobs, Time, 22 May 2000
• 10 emerging areas of technology, MIT’s
Technology Review magazine, Jan/Feb 2001
Data Mining Philosophy
• A powerful enabler of competitive
advantage.
• Data mining is driven from business
knowledge.
• Data mining is about enabling people to
discover actionable information about
their business.
• Returning a profit isn’t about algorithms
Scope of Data Mining
Management’s Decision World
• Business outlook
• Industry conditions
• Product offering
• Customer analysis
• Strategic options
• Competitive actions
• etc
Interface
Data Miner’s Analytical World
• Problem development
• Project design and management
• Data collection and preparation
• Model building
• Validation
• Reporting and evaluations
Project Management
• Cross-functional team
• System architecture
Successful applications
• Business transaction
- risks and opportunities
• Customer relationship management
- personalization, target marketing
• Electronic commerce & web
- web mining
Successful applications
• Science & engineering
• Health care
• Multi-media
• Others
Data Mining Process
Understanding of business
Problem identification
Understanding Your Business
• Do we have a problem?
- What is the current situation? Are there any
undesirable situations that need attention?
- Are there any conditions, processes, etc,
that could be improved?
- Are any problems foreseeable that could
affect the business?
- Are there any potential opportunities that
the company may capitalize on?
A problem is a learning opportunity
Understanding Your Problem
• Operational or analytical
• Convention rule or knowledge discovery
• Product based or customer based
• Market research or data mining
• Ownership of the information
• Privacy
• Added value
Data Mining Process
Collecting relevant information
Understanding of business
Problem identification
Collecting Relevant Information
• Data Search
• Data Collection
• Data Preparation
• Data Mining Database
Data Search
• Exploring the problem space.
Don’t let the data drive the problem.
• Measurement
• Exploring the data sources
Data Collection
• Data retrieval
• Data audit
• Data set assembly and data warehouse
• Survey
Data Preparation
• Data representation
• Data exploration
• Data normalization
• Data transformation
• Imputation of missing data
• Data tuning
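Two of the preparation steps listed above, imputation of missing data and data normalization, can be sketched in a few lines. This is only an illustration: the slides do not name specific methods, so mean imputation and min-max scaling are assumptions, and the function names and sample income values are hypothetical.

```python
from statistics import mean

def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical monthly incomes with one missing record.
incomes = [10000, None, 30000, 20000]
filled = impute_mean(incomes)       # missing value replaced by the mean
scaled = min_max_normalize(filled)  # values rescaled to [0, 1]
```

Mean imputation is the simplest choice and can distort the spread of a variable, so in practice the method should be chosen with the business problem in mind.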
Data Mining Database
• Variable selection
• Record selection
• Data set partition
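Data set partition can be illustrated as a random split of the records into training, validation and test sets. The sketch below assumes a 60/20/20 split and a fixed random seed; both are illustrative choices rather than figures from the slides, and the `partition` function is hypothetical.

```python
import random

def partition(records, train_frac=0.6, valid_frac=0.2, seed=0):
    """Shuffle records and split them into training, validation and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = round(len(shuffled) * train_frac)
    n_valid = round(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]    # whatever remains
    return train, valid, test

# Hypothetical data: 100 record identifiers.
records = list(range(100))
train, valid, test = partition(records)
```

The model is fitted on the training set, tuned on the validation set, and judged once on the test set.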
Data Mining Process
Learning
Collecting relevant information
Model building
Understanding of business
Problem identification
Model Building
• Model based vs non-model based
(y1, y2, …, yp) = f(x1, …, xq)
Inputs: x1, …, xq
Outputs: y1, …, yp
Model Building
• Parametric vs nonparametric
Model Building
• Estimation vs trial and error
• Directed vs undirected
• Multidimensional analysis
• Large data set vs small data set
Data Mining Algorithms
• Online Analytical Processing
- SQL
- Query Tools
• Discovery Driven Methods
- Description: Visualization, Clustering,
Association, Sequential Analysis
- Prediction: Classification, Regressions,
Decision Trees, Neural Networks
Online Analytical Processing
• Query and reporting
Example of SQL query:
How many credit-card customers who made
purchases of over $1,000 on sporting goods
in December have at least $20,000 of
available credit?
• Manual and validation driven
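The question above can be written as an actual SQL query. The sketch below runs it against an in-memory SQLite database from Python. The table names, column names and sample rows are all hypothetical, and "purchases of over $1,000" is assumed to mean the customer's total December spending on sporting goods.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and sample data, invented for illustration.
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, available_credit REAL);
    CREATE TABLE purchases (customer_id INTEGER, category TEXT,
                            amount REAL, purchase_date TEXT);
    INSERT INTO customers VALUES (1, 25000), (2, 15000), (3, 40000);
    INSERT INTO purchases VALUES
        (1, 'sporting goods', 1200, '2000-12-05'),
        (2, 'sporting goods', 1500, '2000-12-10'),
        (3, 'sporting goods',  800, '2000-12-12'),
        (3, 'groceries',      2000, '2000-12-15');
""")

# Count customers with at least $20,000 of available credit whose
# December sporting-goods purchases total more than $1,000.
query = """
    SELECT COUNT(*) FROM customers AS c
    WHERE c.available_credit >= 20000
      AND (SELECT COALESCE(SUM(p.amount), 0) FROM purchases AS p
           WHERE p.customer_id = c.customer_id
             AND p.category = 'sporting goods'
             AND strftime('%m', p.purchase_date) = '12') > 1000
"""
count = conn.execute(query).fetchone()[0]
```

The analyst has to know in advance which question to ask; the query only confirms or rejects it.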
Estimation and Prediction
• Statistical models
• Neural network
Example:
Housing price valuation model
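As a minimal sketch of a statistical model for the housing price example, the code below fits a straight line by ordinary least squares. It assumes a single predictor (floor area), and the areas and prices are invented for illustration; a real valuation model would use many more inputs.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: floor area (square feet) and price (HK$ million).
areas = [400, 600, 800, 1000]
prices = [2.0, 3.0, 4.0, 5.0]

intercept, slope = fit_line(areas, prices)
predicted = intercept + slope * 700  # estimated price of a 700 sq ft flat
```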
Classification Algorithms
• Statistical techniques
• Neural networks
• Genetic algorithms
• Nearest neighbor method
• Rule induction and decision tree
Example: Customer segmentation and
buying behavior description
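Of the techniques listed, the nearest neighbor method is the easiest to show in code. The sketch below assigns a customer to the segment held by the majority of its k closest neighbors. It assumes Euclidean distance and k = 3; the features (age, purchases per year), segment labels and records are hypothetical.

```python
from collections import Counter
from math import dist

def knn_classify(train, query, k=3):
    """Return the majority label among the k training points nearest to query.

    train is a list of (features, label) pairs.
    """
    neighbors = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical customers: (age, purchases per year) and a known segment.
customers = [
    ((25, 2), "occasional"),
    ((30, 3), "occasional"),
    ((28, 1), "occasional"),
    ((45, 12), "frequent"),
    ((50, 15), "frequent"),
    ((48, 10), "frequent"),
]

segment = knn_classify(customers, (47, 11))
```

Because distances depend on the units of each feature, the inputs would normally be normalized first.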
Association Rules
• Apriori algorithm
Example:
Market basket analysis, cross selling
analysis
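The idea behind market basket analysis can be sketched with the two measures that association rules rely on, support and confidence. The code below is a two-level illustration of the Apriori pruning idea (a pair can be frequent only if both of its items are frequent), not a full implementation of the algorithm. The baskets, the 0.5 support threshold and the function names are hypothetical.

```python
from itertools import combinations

def frequent_pairs(baskets, min_support):
    """Return the support of every item and of every frequent pair of items."""
    n = len(baskets)
    # Level 1: support of each single item.
    items = {item for basket in baskets for item in basket}
    support1 = {item: sum(item in b for b in baskets) / n for item in items}
    frequent_items = sorted(i for i, s in support1.items() if s >= min_support)
    # Level 2: only pairs built from frequent items are candidates.
    support2 = {}
    for pair in combinations(frequent_items, 2):
        s = sum(set(pair) <= b for b in baskets) / n
        if s >= min_support:
            support2[pair] = s
    return support1, support2

def confidence(support1, support2, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent for a frequent pair."""
    pair = tuple(sorted((antecedent, consequent)))
    return support2[pair] / support1[antecedent]

# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]

support1, support2 = frequent_pairs(baskets, min_support=0.5)
conf = confidence(support1, support2, "butter", "bread")
```

A rule such as butter -> bread with high confidence is a candidate for cross selling, subject to a check against business knowledge.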
Sequential Analysis
• Count-all algorithm
• Count-some algorithm
Example:
Attached mailing, add-on sales
Algorithms Comparison
• No single data mining algorithm
outperforms all the others on every problem.
Try different algorithms and draw conclusions
from the results. Use your business knowledge.
• Neural networks do no better than statistical
models when the underlying structure is
known. However, neural networks detect
hidden interactions and nonlinearity.
Use the prior information if available.
Algorithms Comparison
• Data mining algorithms cannot handle
dependent records.
Use the prior information. Statistical
models help.
• Data tuning and dimension reduction
enhance data mining before and after
the analysis.
Statistical techniques help.
Data Mining Process
Learning
Collecting relevant data
Model building
Understanding of business
Problem identification
Business strategy and evaluation
Action
Trends that Affect Data Mining
• Data trends
- data explosion
- data types
Trends that Affect Data Mining
• Hardware trends
- memory
- processing speed
- storage
Trends that Affect Data Mining
• Network trends
- network connectivity
- distributed databases
• Wireless communication
Trends that Affect Data Mining
• Scientific computing trends
- theory, experiment and simulation
Trends that Affect Data Mining
• Business trends
- total quality management,
- customer relationship management,
- business process reengineering,
- enterprise resources planning,
- supply chain management,
- business intelligence and knowledge
management,
- e-business and m-business
Trends that Affect Data Mining
• Privacy and Security
Pot of Gold
• The benefits of knowing one’s business
and customers become so critical that
technologies are coming together to
support data mining.
• Data mining is not cybernetic magic
that will turn your data into gold. It’s
the process and result of knowledge
production, knowledge discovery and
knowledge management.