Download Data Mining - ICAR

Why Intelligent Data
Analysis?
Joost N. Kok
Leiden Institute of Advanced
Computer Science
Universiteit Leiden
Overview
• Data Analysis
• Data Mining
• Applications
• Outlook
Data Analysis
Data Mining
• “Data Mining is one of the five
key note technologies that will
have a major impact across a
wide range of industries within
the next three to five years”
(Gartner)
• “Data Mining is one of the top
ten new technologies in which
companies will invest during
the next five years” (Gartner)
• “Data Mining is an overhyped
concept” (OTR)
Data Analysis
• Data analysis = Processing data
• Exploratory vs. Confirmatory
– are there interesting structures?
– can we predict the value?
• Descriptive vs. Inferential
– statement about data set
– draw more general conclusions
• Data analysis = process of
computing various summaries
and derived values from the
given collection of data
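As a small illustration of "computing summaries and derived values", the sketch below uses Python's standard `statistics` module on a made-up sample. The function name `summarize` and the data are assumptions for illustration only, not part of the original slides.

```python
import statistics

def summarize(values):
    """Return a few descriptive summaries of a numeric sample."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }

# Made-up measurements, purely illustrative.
sample = [2.0, 4.0, 4.0, 5.0, 10.0]
summary = summarize(sample)
print(summary)
```

These are descriptive statements about the data set itself; drawing more general (inferential) conclusions would need assumptions about how the sample was obtained.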
Tools
• Cookbook fallacy:
Data analysis = picking and
applying the right tool.
– Tools are not independent.
– Matching is an iterative process
(which needs intelligence).
Stat vs. ML
• Statistics
– Mathematics
• Machine Learning
– Experimental Computer Science
• ``Statistics is difficult’’
• ``Algorithms are not exact’’
Models
• Models vs. Algorithms
• Empirical vs. Mechanistic
Models
• Understanding vs. Prediction
• Models vs. Patterns
• Overfitting
• Constraints
Algorithms
• Enabling data analysis
• Too many: often no
foundations, no applications
• In practice only a restricted set
of algorithms is used
The nature of Data
• Different kinds of data
– Numerical Data
– Text
– Images
– Sound
• Raw data has
– missing values
– distortions
– misrecording
– inadequate sampling
– etc.
The nature of data
• Data sets can be large
– horizontal
– vertical
• Curse of dimensionality
• Experiments
• Sampling
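One common way to make the curse of dimensionality concrete is to look at how distances between random points behave as the number of dimensions grows. The sketch below is an illustration under assumed settings (uniform random points in the unit cube, 200 points, a fixed seed); the function name `nearest_to_farthest_ratio` is hypothetical and not from the slides.

```python
import math
import random

def nearest_to_farthest_ratio(dim, n_points=200, seed=0):
    """Ratio of the nearest to the farthest distance from a random query
    point to n_points uniform random points in the unit cube [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance
    return min(dists) / max(dists)

for dim in (2, 10, 100):
    print(dim, round(nearest_to_farthest_ratio(dim), 3))
```

As the dimension grows the ratio tends to move toward 1: the nearest and farthest points become almost equally far away, which weakens distance-based methods on "horizontally" large data sets.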
The nature of data
• Too little
– Example: storm situations
• Too much
– Example: image segmentation
• Static vs. dynamic
• Off-line vs. On-line
• Infoglut
• What is collected?
Overview
• Statistical methods and concepts
• Bayesian methods
• Time series
• Rule induction
• Neural networks
• Fuzzy logic
• Stochastic search methods
• Applications
Overview
• Why Intelligent Data Analysis
• Fundamental Concepts of Statistics
• Intelligent Data Analysis: Issues and Challenges
• Artificial Neural Networks
• Fuzzy Logic
• Industrial Applications of Neuro-Fuzzy Networks
• Statistical Methods for Data Analysis
• Time Series Analysis
Overview
• Chaos and Reality
• Bayesian Networks
• ANN Visualization Tools
• Rule Induction
• Evolutionary Systems
• Data Analysis in Real-World Applications
Enrichment
• Data Fusion
– combine data sets
• Example:
– customer database
– survey information
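The customer-plus-survey example can be sketched as a simple join on a shared key. Everything below is an assumption made for illustration: the field names (`customer_id`, `region`, `owns_car`), the records, and the helper `enrich` are hypothetical and not taken from the slides.

```python
# Hypothetical customer database and survey data, linked by customer_id.
customers = [
    {"customer_id": 1, "region": "North"},
    {"customer_id": 2, "region": "South"},
    {"customer_id": 3, "region": "North"},
]
survey = [
    {"customer_id": 1, "owns_car": True},
    {"customer_id": 3, "owns_car": False},
]

def enrich(customers, survey, key="customer_id"):
    """Left-join survey answers onto customer records by a shared key."""
    by_key = {row[key]: row for row in survey}
    enriched = []
    for customer in customers:
        extra = by_key.get(customer[key], {})
        # Copy the customer record and add every survey field except the key.
        merged = {**customer, **{k: v for k, v in extra.items() if k != key}}
        enriched.append(merged)
    return enriched

enriched = enrich(customers, survey)
for row in enriched:
    print(row)
```

Customers without a survey answer keep their original fields only, so the missing information stays visible instead of being silently filled in.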
Data Mining
• Database technology
• Data visualization
• Data warehouse vs. operational database
– time-dependent
– non-volatile
– subject-oriented
– integrated
• Target: decision making
Data Mining
• Selection
• Cleaning
• Enrichment
• Coding
• Data Mining
• Reporting
Cleaning
• Remove duplicates
• Check domain consistency
• Remove data
• Project data
• Combine data in one table
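A minimal sketch of three of these cleaning steps (removing duplicates, checking domain consistency, and projecting onto the fields of interest) might look as follows. The record layout, the valid-region set, and the age range of 0 to 120 are assumptions chosen for illustration; the slides do not specify them.

```python
def clean(records, valid_regions, keep_fields):
    """Drop exact duplicates and domain-inconsistent records, then project."""
    seen = set()
    cleaned = []
    for rec in records:
        fingerprint = tuple(sorted(rec.items()))
        if fingerprint in seen:  # remove duplicates
            continue
        seen.add(fingerprint)
        if rec.get("region") not in valid_regions:  # domain check: region
            continue
        age = rec.get("age")
        if not isinstance(age, (int, float)) or not 0 <= age <= 120:
            continue  # domain check: implausible or missing age
        cleaned.append({field: rec[field] for field in keep_fields})  # project
    return cleaned

# Made-up records, purely illustrative.
records = [
    {"id": 1, "region": "North", "age": 34, "note": "a"},
    {"id": 1, "region": "North", "age": 34, "note": "a"},     # duplicate
    {"id": 2, "region": "Atlantis", "age": 41, "note": "b"},  # unknown region
    {"id": 3, "region": "South", "age": 230, "note": "c"},    # impossible age
    {"id": 4, "region": "South", "age": 58, "note": "d"},
]
cleaned = clean(records, {"North", "South"}, ["id", "region", "age"])
print(cleaned)
```

Whether an inconsistent record should be dropped, corrected, or flagged depends on the application; dropping is used here only to keep the sketch short.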
Coding
• Address - Region
• Date of birth - Age
• Scaling of numerical data
• Date - Number of months
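The coding steps listed here can be sketched with small helper functions. All names below, the postcode-prefix lookup table, and the choice of min-max scaling are assumptions made for illustration; the slides name the transformations but not how they were carried out.

```python
from datetime import date

# Hypothetical lookup from a postcode prefix to a coarse region.
REGION_BY_POSTCODE_PREFIX = {"23": "West", "10": "North"}

def region_of(postcode):
    """Address -> region: map a postcode to a region via its first two characters."""
    return REGION_BY_POSTCODE_PREFIX.get(postcode[:2], "unknown")

def age_in_years(born, today):
    """Date of birth -> age in whole years on the given day."""
    years = today.year - born.year
    if (today.month, today.day) < (born.month, born.day):
        years -= 1  # birthday has not yet occurred this year
    return years

def months_between(start, end):
    """Date -> number of calendar months elapsed between two dates."""
    return (end.year - start.year) * 12 + (end.month - start.month)

def min_max_scale(values):
    """Scale numerical data linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]

print(region_of("2333CA"))
print(age_in_years(date(1970, 6, 15), date(2000, 6, 14)))
print(months_between(date(1999, 11, 1), date(2000, 2, 1)))
print(min_max_scale([10, 20, 30]))
```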
Data Mining
• SQL queries
• Clustering
• Pattern Recognition
[Figure: diagram with the labels ML, ES, KDD, DB, Visual, and Statistics]
Nearest Neighbor
• Search k nearest points
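A brute-force sketch of searching the k nearest points is shown below. The slide only mentions the search itself; using the neighbours for a majority-vote classification is a common follow-up step added here as an assumption, and the training points, labels, and function names are made up for illustration.

```python
import math
from collections import Counter

def k_nearest(points, query, k):
    """Return the k (coordinates, label) pairs closest to the query point."""
    return sorted(points, key=lambda p: math.dist(p[0], query))[:k]

def classify(points, query, k=3):
    """Predict a label by majority vote among the k nearest neighbours."""
    neighbours = k_nearest(points, query, k)
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Made-up labelled points, purely illustrative.
training = [
    ((1.0, 1.0), "a"),
    ((1.2, 0.8), "a"),
    ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"),
    ((5.2, 4.9), "b"),
]
print(classify(training, (1.1, 1.0), k=3))
```

Sorting all points costs O(n log n) per query, which is fine for a small example; large data sets usually call for an index structure instead.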
Oil Search
• Shell research
• South-East Asia
coring
measurements
kinds of stone
Applications
Outlook
Outlook
• Positive
– Moore’s Law
– New kinds of computers
– Data collection
– More data is more easily reachable
• Negative
– Collective memory gets lost
– Infoglut
• Data battle
Outlook
• Merge of Machine Learning and
Statistics
• Algorithms
– Adaptive parameters
– Black Box data mining
• From suites to tailored tools
Intelligent Data Analysis
• Intelligent Data Analysis
– User Interaction
– also uses tools from Machine
Learning
NetTalk
[Figure: diagram with the labels Sound generator, Speech-synthesis expert system, INTELLI, and NetTalk Neural Network]