Applications of Artificial Intelligence
Data Mining
→ The Problem
Data Mining
Knowledge Discovery
Data Mining Methods
Knowledge Discovery Methods
Slide 1
©J.K. Debenham, 2003
The Problem
• How do we identify useful patterns
in large amounts of noisy,
incomplete raw data?
Slide 2
Remember
•
No inductive technique can
produce a definitive “answer”.
• No inductive technique can make
a firm decision.
Slide 3
Data Mining
The Problem
→ Data Mining
Knowledge Discovery
Data Mining Methods
Knowledge Discovery Methods
Slide 4
Growth in databases has far outpaced
our ability to interpret this data
Need for a new generation of tools
and techniques for automated and
intelligent database analysis
These tools and techniques are the
subject of the rapidly emerging field
of knowledge discovery in
databases (KDD)
Slide 5
Advances in data collection:
from remote sensors
widespread use of bar codes
computerisation of
transactions
have generated a flood of data.
Slide 6
Advances in data storage
technology:
faster, higher capacity, and
cheaper storage devices
better database management
systems, and data warehousing
technology
have allowed us to transform this
data deluge into “mountains” of
stored data.
Slide 7
Eg
Wal-Mart handles over 20 million
transactions a day
Mobil Oil Corporation is developing a
data warehouse capable of storing
over 100 terabytes of data related to
oil exploration
NASA Earth Observing System (EOS)
projected to generate on the order of 50
gigabytes of data per hour when
operational in the late 1990s
Slide 8
History
First International Conference on
Knowledge Discovery and Data
Mining (1995)
Slide 9
Finding useful patterns (or nuggets of
knowledge) in raw data:
knowledge discovery in databases
data mining
knowledge extraction
information discovery
information harvesting
data archaeology
data pattern processing
Slide 10
Knowledge Discovery in Databases
(KDD)
coined in 1989
refers to the broad process of
finding knowledge in data
emphasises the “high-level”
application of particular data
mining methods
used by artificial intelligence and
machine learning researchers
Slide 11
The term
Data Mining
has been commonly used by
statisticians, data analysts and the
MIS community
“fishing” or “dredging”, and
sometimes “mining”, can be a
dangerous activity: without proper
interpretation, invalid patterns
can be discovered
Slide 12
KDD
overall process of discovering useful
knowledge from data
= data mining + additional steps
to ensure that useful information
(knowledge) is derived from the data
typically interactive and iterative,
involving the repeated application of
data mining and the interpretation of
the patterns generated
Slide 13
KDD related to
machine learning,
pattern recognition,
databases,
statistics,
artificial intelligence,
knowledge acquisition for expert
systems, and
data visualisation
Slide 14
Data warehousing
MIS trend for collecting and cleaning
transactional data and making them
available on-line
OLAP (on-line analytical processing)
provides multi-dimensional data
analysis, which is superior to SQL for
computing summaries and breakdowns
along many dimensions
KDD and OLAP: new generation of
intelligent information extraction and
management tools
Slide 15
A simple data set with 2 classes
Slide 16
horizontal axis represents the income
of the person
vertical axis represents the total
personal debt of the person
x’s represent persons who have
defaulted on their loans,
o’s represent persons whose loans
are in good status.
Slide 17
Data Mining
The Problem
Data Mining
→ Knowledge Discovery
Data Mining Methods
Knowledge Discovery Methods
Slide 18
Knowledge discovery in
databases
the non-trivial process of
identifying valid, novel,
potentially useful, and ultimately
understandable patterns in data.
Slide 19
• Data: a set of facts F (eg, cases in a
database). Eg: the 23 cases.
• Pattern: an expression E in a
language L describing facts in a
subset F_E of F. E is called a pattern
if it is simpler than the enumeration
of all facts in F_E. Eg the pattern: “If
income < $t, then person has
defaulted on the loan”.
Slide 20
Single threshold on income variable
to try to classify the loan data set
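The single-threshold pattern can be sketched in a few lines of Python. This is only an illustration: the eight records below are invented stand-ins for the 23 cases, and the names `LOANS`, `predicts_default`, `accuracy` and `best_threshold` are hypothetical.

```python
# Illustrative records only: (income, debt, defaulted). Not the slide's data.
LOANS = [
    (15, 20, True),
    (22, 35, True),
    (28, 10, True),
    (31, 40, False),
    (38, 25, True),
    (45, 15, False),
    (52, 30, False),
    (60, 45, False),
]


def predicts_default(income, t):
    """The pattern: if income < t, then the person has defaulted."""
    return income < t


def accuracy(records, t):
    """Fraction of records that the threshold rule classifies correctly."""
    hits = sum(predicts_default(income, t) == defaulted
               for income, _debt, defaulted in records)
    return hits / len(records)


def best_threshold(records):
    """Try a threshold just above each observed income; keep the most accurate."""
    candidates = [income + 1 for income, _debt, _defaulted in records]
    return max(candidates, key=lambda t: accuracy(records, t))


if __name__ == "__main__":
    t = best_threshold(LOANS)
    print(t, accuracy(LOANS, t))
```

On this toy data no income threshold classifies every record correctly, which is the limitation the figure is meant to show.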
Slide 21
• Process: KDD is a multi-step process:
data preparation, search for patterns,
knowledge evaluation, and refinement
involving iteration after modification.
• Non-trivial: the process must have some
degree of search autonomy. Eg: computing the
mean income of persons does not
qualify as discovery.
• Validity: Discovered patterns should be
valid on new data with some degree of
certainty.
Slide 22
• A measure of certainty is a function
C mapping expressions in L to an
ordered measurement space M_C. An
expression E ∈ L about a subset
F_E ⊆ F can be assigned a certainty
measure c = C(E, F).
–Eg, if the boundary for the pattern
is moved to the right its certainty
measure would drop since more
good loans would be admitted
into the shaded region (no loan).
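One possible certainty measure for the threshold rule is the fraction of persons covered by the rule (income < t) who actually defaulted. The sketch below assumes that definition; the slides do not fix a particular C. The records and the name `certainty` are made up for illustration.

```python
# Illustrative records only: (income, defaulted).
LOANS = [(15, True), (22, True), (28, True), (31, False),
         (38, True), (45, False), (52, False), (60, False)]


def certainty(records, t):
    """Assumed C(E, F): share of persons with income < t who defaulted."""
    covered = [defaulted for income, defaulted in records if income < t]
    if not covered:
        return 0.0
    return sum(covered) / len(covered)


if __name__ == "__main__":
    # Moving the boundary to the right admits more good loans.
    for t in (30, 50, 70):
        print(t, certainty(LOANS, t))
```

With these records the measure falls as t grows, matching the example of moving the boundary to the right.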
Slide 23
• Novel: Novelty can be measured with
respect to changes in data (by
comparing current values to previous or
expected values) or knowledge (how a
new finding is related to old ones).
• Potentially Useful: The patterns should
potentially lead to some useful actions.
Eg: the expected increase in profits to
the bank (in dollars) associated with the
decision rule.
Slide 24
• Utility function: U maps expressions in
L to a partially or totally ordered
measure space M_U: hence, u = U(E, F).
• Ultimately Understandable: A goal of
KDD is to make patterns
understandable to facilitate a better
understanding of the underlying data.
• Simplicity measure: a function S
mapping expressions E ∈ L to a
partially or totally ordered measure
space M_S: hence, s = S(E, F).
Slide 25
Interestingness: overall measure of
pattern value, combining validity,
novelty, usefulness, and simplicity
Some KDD systems have an explicit
interestingness function
i = I(E, F, C, N, U, S) which maps
expressions in L to a measure space
M_I.
Other systems define interestingness
indirectly via an ordering of the
discovered patterns.
Slide 26
Knowledge: A pattern E in L is called
knowledge if for some user-specified
threshold i ∈ M_I,
I(E, F, C, N, U, S) > i.
by no means absolute.
purely user-oriented, and
determined by whatever functions
and thresholds the user chooses
I is “interestingness”, E is
“pattern”, F is “data”, C is
“validity”, N is “novelty”, U is
“utility”, S is “simplicity”
Slide 27
Data Mining is a step in the KDD
process consisting of particular
data mining algorithms that, under
some acceptable computational
efficiency limitations, produce a
particular enumeration of patterns
E_j over F.
Slide 28
KDD Process is the process of
using data mining methods
(algorithms) to extract (identify)
knowledge according to the
specifications of measures and
thresholds, using the database F
along with any required preprocessing, sub-sampling, and
transformations of F.
Slide 29
• The data mining component of the KDD
process is mainly concerned with means
by which patterns are extracted and
enumerated from the data.
• Knowledge discovery
–the evaluation and possibly
interpretation of the patterns to make
the decision of what constitutes
knowledge and what does not.
–the choice of encoding schemes, preprocessing, sampling, and projections
of the data prior to data mining.
Slide 30
Overview of KDD process.
Slide 31
Steps in KDD process:
1 Developing an understanding
of the application domain, the
relevant prior knowledge, and
the goals of the end-user.
2 Creating a target data set:
selecting a data set, or
focusing on a subset of
variables or data samples, on
which discovery is to be
performed.
Slide 32
3 Data cleaning and pre-processing:
removal of noise and outliers,
collecting the necessary information
to account for noise, strategies for
missing data fields, accounting for
time and other changes.
4 Data reduction and projection:
representing the data depending on
the goal of the task. Dimensionality
reduction to reduce the effective
number of variables under
consideration.
Slide 33
5 Choose the data mining task: is the goal
of the KDD process classification,
regression, clustering, etc?
6 Choose the data mining algorithm(s):
select method(s) to be used to search
for patterns. Decide which models
and parameters are appropriate.
Match data mining method with the
overall criteria of the KDD process
–(eg. user may be more interested in
understanding the model than its
predictive capabilities).
Slide 34
7 Data mining: search for patterns of
interest: classification rules or trees,
regression, clustering, etc.
8 Interpret mined patterns.
9 Consolidate discovered knowledge:
incorporate knowledge into the
performance system, or simply
document it. Includes checking for
and resolving potential conflicts with
previously believed (or extracted)
knowledge.
Slide 35
Data Mining
The Problem
Data Mining
Knowledge Discovery
→ Data Mining Methods
Knowledge Discovery Methods
Slide 36
Data mining component of the KDD
process
–iterative application of particular
data mining methods
Terms
–model f(x) = αx² + βx is a model.
–pattern is an instantiation of a
model, eg., f(x) = 3x² + x is a pattern
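To make the model/pattern distinction concrete, the sketch below fits the model f(x) = αx² + βx to observed points by ordinary least squares, yielding a pattern (specific values of α and β). Least squares is an assumed choice of fitting criterion, and `fit_quadratic` is a hypothetical name. The sample points were generated from 3x² + x.

```python
def fit_quadratic(xs, ys):
    """Least-squares estimates of alpha, beta in f(x) = alpha*x^2 + beta*x.

    Solves the 2x2 normal equations
        alpha*S4 + beta*S3 = T2
        alpha*S3 + beta*S2 = T1
    by Cramer's rule.
    """
    s4 = sum(x ** 4 for x in xs)
    s3 = sum(x ** 3 for x in xs)
    s2 = sum(x ** 2 for x in xs)
    t2 = sum(x * x * y for x, y in zip(xs, ys))
    t1 = sum(x * y for x, y in zip(xs, ys))
    det = s4 * s2 - s3 * s3
    alpha = (t2 * s2 - s3 * t1) / det
    beta = (s4 * t1 - s3 * t2) / det
    return alpha, beta


XS = [1, 2, 3, 4]
YS = [3 * x * x + x for x in XS]  # data generated from the pattern 3x^2 + x

if __name__ == "__main__":
    print(fit_quadratic(XS, YS))
```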
Slide 37
Data mining
–fitting models to,
–or determining patterns from,
observed data.
Fitted models are inferred knowledge:
–whether or not the models reflect
useful or interesting knowledge is
part of the overall, interactive KDD
process where subjective human
judgment is usually required.
Slide 38
Two primary formalisms for model fitting
–statistical approach allows for nondeterministic effects (for example,
f(x) = αx + e, where e could be a
Gaussian random variable)
–logical approach is purely
deterministic [f(x) = αx] and does not
admit uncertainty
Focus on the statistical/probabilistic
approach
–the most widely-used in practice
Slide 39
Data mining methods are based on
–machine learning,
–pattern recognition,
–statistics: classification, clustering,
graphical models, etc.
bewildering array of different algorithms
Consist of three primary components:
–model representation
–model evaluation, and
–search
Slide 40
Two “high-level” goals of data mining:
–Prediction: involves using some
variables or fields in the database to
predict unknown or future values of
other variables of interest.
–Description: focuses on finding
human-interpretable patterns
describing the data.
in KDD, description tends to be more
important than prediction
Slide 41
Primary Data Mining tasks:
• Classification
• Regression
• Clustering
• Summarisation
• Dependency Modelling
• Change and Deviation Detection
Slide 42
Classification
–learning a function that classifies a
data item into one of several
predefined classes.
Eg: classifying trends in financial
markets and automated identification
of objects of interest in large image
databases
Eg: partition loan data into two class
regions (it is not possible to separate
the classes perfectly using a linear
decision boundary.)
Slide 43
Linear classification boundary
Slide 44
Regression
–learning a function which maps a
data item to a prediction variable
Eg: estimating the probability that a
patient will die given test results,
predicting consumer demand for a new
product as a function of advertising
expenditure, “total debt” is fitted as a
linear function of “income”:
–the fit is poor since there is only a
weak correlation between the two
variables.
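A minimal sketch of the debt-versus-income regression, using ordinary least squares and the Pearson correlation coefficient. The income and debt values are invented for illustration and `fit_line` is a hypothetical helper.

```python
from math import sqrt

# Illustrative values only, not the slide's loan data.
INCOME = [15, 22, 28, 31, 38, 45, 52, 60]
DEBT = [20, 35, 10, 40, 25, 15, 30, 45]


def fit_line(xs, ys):
    """Return (slope, intercept, r) for the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)  # Pearson correlation coefficient
    return slope, intercept, r


if __name__ == "__main__":
    print(fit_line(INCOME, DEBT))
```

A correlation coefficient well below 1, as with these made-up values, indicates that the fitted line explains little of the variation in debt.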
Slide 45
Regression for the loan data set
Slide 46
Clustering
–identify finite set of categories or
clusters to describe the data
Categories may be mutually exclusive
and exhaustive, or otherwise
Eg: discover sub-populations for
consumers in marketing databases,
partitioning the loan data set into 3 clusters
–the clusters overlap, so data points can belong
to more than one cluster.
Slide 47
– Labels replaced by +’s because class
membership is no longer assumed known
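As a rough illustration of clustering unlabelled points, here is a small k-means sketch. k-means is a swapped-in, simpler technique: it assigns each point to exactly one cluster, whereas the clusters on the slide overlap. The points, starting centroids and function names are all assumptions.

```python
def squared_distance(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points, centroids, iterations=10):
    """Plain k-means on 2-D points, starting from the given centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda k: squared_distance(p, centroids[k]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            if c else old
            for c, old in zip(clusters, centroids)
        ]
    return centroids, clusters


# Illustrative (income, debt) points with class labels deliberately omitted.
POINTS = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9), (1, 9), (2, 9), (1, 8)]

if __name__ == "__main__":
    print(kmeans(POINTS, [(0, 0), (10, 10), (0, 10)]))
```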
Slide 48
Probability density estimation
• closely related to clustering
• consists of techniques for
estimating the joint multivariate probability density
function of all of the variables/
fields in the database
Slide 49
Summarisation
–methods for finding a compact
description for a subset of data
Eg: tabulating the mean and standard
deviations for all fields, the derivation
of summary rules, multivariate
visualisation techniques, and the
discovery of functional relationships
between variables.
often applied to interactive exploratory
data analysis and automated report
generation
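The simplest summarisation named above, tabulating the mean and standard deviation of every field, might look like this. The rows and the function name `summarise` are hypothetical.

```python
from statistics import mean, stdev

# Illustrative rows only.
ROWS = [
    {"income": 10, "debt": 5},
    {"income": 20, "debt": 7},
    {"income": 30, "debt": 9},
]


def summarise(rows):
    """Map each field name to (mean, sample standard deviation)."""
    summary = {}
    for field in rows[0]:
        values = [row[field] for row in rows]
        summary[field] = (mean(values), stdev(values))
    return summary


if __name__ == "__main__":
    for field, (m, s) in summarise(ROWS).items():
        print(field, m, s)
```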
Slide 50
Dependency Modelling
–finding model which describes
dependencies between variables
Two levels of models:
–the structural level
»specifies (often in graphical form)
which variables are locally
dependent on each other
–the quantitative level
»specifies the strengths of the
dependencies
Slide 51
Dependency Modelling contd
Eg: Probabilistic dependency networks
–use conditional independence to
specify the structure of the model
and probabilities to specify the
strengths of the dependencies.
–increasingly finding applications
»Eg: the development of
probabilistic medical expert
systems from databases,
information retrieval, and
modelling of the human genome.
Slide 52
Change and Deviation
Detection
–focuses on discovering the
most significant changes in
the data from previously
measured or normative
values.
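One simple way to flag deviations from previously measured values is to report new values that lie more than k standard deviations from the baseline mean. The rule, the default k = 2 and the sample numbers are assumptions chosen for illustration.

```python
from statistics import mean, stdev

BASELINE = [100, 102, 98, 101, 99]   # previously measured values (made up)
CURRENT = [100, 103, 110, 97]        # new values to check (made up)


def deviations(baseline, current, k=2.0):
    """Return the current values more than k baseline std devs from the baseline mean."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    return [x for x in current if abs(x - mu) > k * sigma]


if __name__ == "__main__":
    print(deviations(BASELINE, CURRENT))
```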
Slide 53
Components of Data Mining
Algorithms
Three primary components in
any data mining algorithm:
–model representation,
–model evaluation,
–search.
Slide 54
Model Representation
–the language L for describing
patterns
If representation is too limited, then no
amount of training will produce an
accurate model for the data
– Eg: a decision tree representation, using
univariate (single-field) node-splits, partitions
the input space with hyper-planes which are
parallel to the attribute axes. Such a decision-tree
method cannot discover from data the
formula x = y no matter how much training data
it is given.
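The representational limit can be seen on a toy grid. The sketch below labels points by whether x > y (an assumed stand-in for the diagonal relationship) and searches every single axis-parallel split; none of them reproduces the labels, while the diagonal rule does. All names are hypothetical.

```python
# Grid points off the diagonal, labelled by which side of x = y they fall on.
POINTS = [(x, y) for x in range(10) for y in range(10) if x != y]
LABELS = [x > y for x, y in POINTS]


def accuracy(predict):
    """Fraction of grid points whose label the rule reproduces."""
    hits = sum(predict(x, y) == label
               for (x, y), label in zip(POINTS, LABELS))
    return hits / len(POINTS)


def best_axis_parallel_split():
    """Best accuracy of any single univariate threshold test on x or on y."""
    best = 0.0
    for t in range(10):
        rules = (
            lambda x, y: x > t,
            lambda x, y: x <= t,
            lambda x, y: y > t,
            lambda x, y: y <= t,
        )
        for rule in rules:
            best = max(best, accuracy(rule))
    return best


if __name__ == "__main__":
    print(best_axis_parallel_split(), accuracy(lambda x, y: x > y))
```

A deeper tree can approximate the diagonal with a staircase of splits, but it never represents the relationship itself.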
Slide 55
Model Representation contd
So it is important to understand the
representational assumptions
inherent in the method.
Greater representational power
for models increases the danger of
over-fitting the training data resulting
in reduced prediction accuracy on
unseen data. In addition the search
becomes much more complex and
interpretation of the model is typically
more difficult.
Slide 56
Model Evaluation
estimates how well a particular pattern
meets the criteria of the KDD process
Evaluation of predictive accuracy
(validity) is based on cross validation
Evaluation of descriptive quality
involves predictive accuracy, novelty,
utility, and understandability
Both logical and statistical criteria can be used for
model evaluation. Eg: the maximum likelihood
principle chooses the parameters for the model
which yield the best fit to the training data.
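A minimal sketch of k-fold cross validation, applied to a threshold rule of the form "income < t implies default". The records, the fold scheme (every k-th record) and the helper names are assumptions for illustration.

```python
# Illustrative records only: (income, defaulted).
RECORDS = [(10, True), (20, True), (30, True), (40, True),
           (50, False), (60, False), (70, False), (80, False)]


def fit_threshold(training):
    """Pick the threshold that classifies the most training records correctly."""
    def hits(t):
        return sum((income < t) == defaulted for income, defaulted in training)
    return max((income + 1 for income, _ in training), key=hits)


def predict(t, income):
    return income < t


def k_fold_accuracy(records, k, fit, predict):
    """Average held-out accuracy over k folds."""
    folds = [records[i::k] for i in range(k)]
    scores = []
    for i, held_out in enumerate(folds):
        training = [r for j, fold in enumerate(folds) if j != i for r in fold]
        model = fit(training)
        correct = sum(predict(model, x) == label for x, label in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / k


if __name__ == "__main__":
    print(k_fold_accuracy(RECORDS, 4, fit_threshold, predict))
```

Even though a threshold fitted to all eight records separates them perfectly, the held-out estimate is lower, which is why validity is judged on data not used for fitting.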
Slide 57
Search Method
Two components
–Parameter Search
–Model Search
• Parameter search: for parameters which
optimise the model evaluation criteria
given data and a fixed representation.
–Eg: For general models: greedy
iterative methods are commonly used,
eg., the gradient descent method of
backpropagation for neural networks.
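Parameter search by gradient descent can be shown on the one-parameter model f(x) = αx with a mean squared error criterion. This is plain gradient descent, not backpropagation; the learning rate, step count and data are assumed values.

```python
def fit_alpha(xs, ys, rate=0.01, steps=500):
    """Gradient descent on the mean squared error of f(x) = alpha * x."""
    alpha = 0.0
    n = len(xs)
    for _ in range(steps):
        # d/d(alpha) of mean((alpha*x - y)^2) = mean(2 * (alpha*x - y) * x)
        gradient = sum(2 * (alpha * x - y) * x for x, y in zip(xs, ys)) / n
        alpha -= rate * gradient
    return alpha


XS = [1, 2, 3, 4]
YS = [2, 4, 6, 8]  # generated from f(x) = 2x

if __name__ == "__main__":
    print(fit_alpha(XS, YS))
```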
Slide 58
Search Method contd
•Model Search
• a loop over the parameter search
method
• For each specific model
representation, the parameter search
method is instantiated to evaluate the
quality of that particular model
• use heuristic search techniques due
to size of the space.
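Model search as a loop over parameter search can be sketched with two candidate representations: a threshold on income and a threshold on debt. With only two candidates the outer loop here is exhaustive; a realistic model space would need the heuristic search mentioned above. The records and function names are hypothetical.

```python
# Illustrative records only: (income, debt, defaulted).
RECORDS = [
    (15, 20, True), (22, 35, True), (28, 10, True), (31, 40, False),
    (38, 25, True), (45, 15, False), (52, 30, False), (60, 45, False),
]


def parameter_search(records, field):
    """Best threshold t for the rule 'record[field] < t implies default'."""
    def score(t):
        return sum((r[field] < t) == r[2] for r in records) / len(records)
    candidates = [r[field] + 1 for r in records]
    best_t = max(candidates, key=score)
    return best_t, score(best_t)


def model_search(records, fields=(0, 1)):
    """Outer loop: run the parameter search for each representation, keep the best."""
    results = {field: parameter_search(records, field) for field in fields}
    best_field = max(results, key=lambda f: results[f][1])
    return best_field, results[best_field]


if __name__ == "__main__":
    print(model_search(RECORDS))
```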
Slide 59
Data mining methods
–Decision Trees and Rules
–Non-linear Regression and
Classification Methods
–Example-based Methods
–Probabilistic Graphical
Dependency Models
–Relational Learning Models
Slide 60
Decision Trees and Rules
that use univariate splits have a simple
representational form, making the
inferred model relatively easy to
comprehend by the user
restriction to a particular tree or rule
representation can significantly
restrict the functional form (and thus
the approximation power) of the model
Slide 61
Threshold “split” severely limits the
type of classification boundaries
Slide 62
Solution
enlarge the model space to allow
more general expressions (such as
multi-variate hyper-planes at
arbitrary angles)
this makes the model more powerful
for prediction
but more difficult to comprehend.
Slide 63
Non-linear Regression and
Classification Methods
family of techniques for prediction
which fit linear and non-linear
combinations of basis functions
(sigmoids, splines, polynomials) to
combinations of the input variables
Eg: feedforward neural networks,
adaptive spline methods, projection
pursuit regression, etc.
Slide 64
Non-linear decision boundary which a
neural network might find for the loan
data
Slide 65
But while the classification boundaries
of the previous example may be more
accurate than the simple linear
threshold boundary, the linear
boundary has the advantage that the
model can be expressed as a simple
rule of the form “if income is greater
than threshold t then loan will have
good status” to some degree of
certainty.
Slide 66
Example-based Methods
The representation is simple: use
representative examples from the
databases to approximate a model, ie.,
predictions on new examples are
derived from the properties of
“similar” examples in the model whose
prediction is known.
Techniques include nearest-neighbour
classification and regression
algorithms and case-based reasoning
systems.
Slide 67
Nearest neighbour classifier:
the class at any new point in the 2-dimensional
space is the same as the
class of the closest point in the original
training data set
method requires a well-defined distance
metric for evaluating the distance
between data points
Related techniques: kernel density
estimation, and mixture modelling.
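A nearest neighbour classifier needs little more than the training points and a distance metric. The sketch assumes Euclidean distance over (income, debt) and uses four invented training points labelled "x" (defaulted) and "o" (good status).

```python
from math import dist

# Illustrative training data: ((income, debt), label).
TRAINING = [
    ((15, 20), "x"),
    ((22, 35), "x"),
    ((45, 15), "o"),
    ((60, 45), "o"),
]


def nearest_neighbour(point, training=TRAINING):
    """Return the label of the training point closest to `point` (Euclidean distance)."""
    _closest, label = min(training, key=lambda item: dist(point, item[0]))
    return label


if __name__ == "__main__":
    print(nearest_neighbour((20, 25)), nearest_neighbour((50, 20)))
```

Swapping in a different metric changes the decision regions, which is why the method depends on a well-defined distance.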
Slide 68
Nearest neighbour classifier for the loan
data set
Slide 69
Probabilistic Graphical Dependency
Models
Graphical models specify the
probabilistic dependencies using a
graph structure.
In its simplest form, the model specifies
which variables are directly dependent
on each other.
–recent work on methods whereby
both the structure and parameters of
graphical models can be learned
from databases directly
Slide 70
Relational Learning Models
Relational learning (= inductive logic
programming) uses the more flexible
pattern language of first-order logic.
can easily find formulas such as X=Y.
Most research on relational learning is
logical in nature.
The extra representational power of
relational models comes at the price of
significant computational demands in
terms of search.
Slide 71
Two important points:
• Automated Search: focus mainly on
automated methods for extracting
patterns and/or models from data.
• Beware the Hype: automated methods
in data mining are still in a fairly early
stage of development. There are no
established criteria for deciding which
methods to use in which
circumstances and many of the
approaches are based on crude
heuristic approximations.
Slide 72
Data Mining
The Problem
Data Mining
Knowledge Discovery
Data Mining Methods
→ Knowledge Discovery Methods
Slide 73
Data analyst
– in response to a goal, queries a
database to extract data
– “analyses” the data using data
analysis and/or visualisation tools
– hence “insight” about the data
– use presentation tools to
disseminate this insight to a
broader audience
Slide 74
Analyst’s task:
Slide 75
For example:
“What are the factors that lead to a
successful Father’s Day promotion?”
extract data such as sales volume of
products sold during a specific
Father’s Day period; characteristics of
these products; and characteristics of
the promotion itself, such as price
discount, amount of in-store support,
advertising support etc.
Slide 76
Example contd
analyse
–define a measure that could be used
to quantify achievement of the goal
eg “percentage increase in sales.”
–segment the products based on this
percentage sales increase measure
–investigate the characteristics of
products with relatively higher sales
increases
–contrast their attributes with those
with relatively lower sales increases.
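The analysis steps above can be sketched as follows. The product records, the 20% cutoff and the helper names are hypothetical, and price discount is the only promotion attribute compared.

```python
# Hypothetical records: (product, baseline_sales, promotion_sales, discount_percent).
PRODUCTS = [
    ("tie", 100, 150, 20),
    ("mug", 200, 210, 5),
    ("drill", 50, 90, 25),
    ("socks", 80, 84, 0),
]


def pct_increase(baseline, promotion):
    """The goal measure: percentage increase in sales."""
    return 100.0 * (promotion - baseline) / baseline


def segment(products, cutoff):
    """Split products into higher and lower sales-increase segments."""
    high = [p for p in products if pct_increase(p[1], p[2]) >= cutoff]
    low = [p for p in products if pct_increase(p[1], p[2]) < cutoff]
    return high, low


def mean_discount(products):
    """Average price discount within a segment."""
    return sum(p[3] for p in products) / len(products)


if __name__ == "__main__":
    high, low = segment(PRODUCTS, cutoff=20.0)
    print(mean_discount(high), mean_discount(low))
```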
Slide 77
Example contd:
Visualise the data in each segment
–products were evenly distributed
across the segments
–characteristics of products within a
segment varied significantly
hence probably not something intrinsic
about the type of product that made it a
successful seller
Slide 78
Example contd:
conjecture concerning some extrinsic
properties of the products
analysis of the correlation of some
specific attributes with the percentage
sales increase could lead to
discovery that price discount, in-store
promotional support, and advertising
support were all higher in the segments
with higher percentage sales increases
Slide 79
Example
Analyst was involved in three main
tasks:
–(1) model selection and evolution,
–(2) data analysis, and
–(3) output generation.
Slide 80
Complex Aspects of the KDD Process
–Task Discovery
–Data Discovery
–Data Cleaning
–Background Knowledge
Slide 81
Task Discovery
the client will state the problem or goal
as if it were clear and focused, but
further investigation is always
warranted
the requirements for the task must be
engineered by spending time with the
customer and various parts of the
customer’s organisation
Slide 82
Data Discovery
spend time sifting through the raw data,
just getting a feel for what the data
looks like and what ground it covers
and what it does not cover
the main goal of KDD cannot be pinned
down completely without a detailed
understanding of the structure,
coverage, and quality of the data
Slide 83
Data Cleaning
client’s data virtually always has problems
it may have been collected in an ad hoc
manner, have unfilled fields in records,
have mistakes in data entry
so KDD process cannot succeed without a
serious effort to clean the data
without the previous data discovery phase,
the analyst will have no idea if the data
quality can support the task at all
Slide 84
Background Knowledge
background knowledge resident only in
the mind of the expert
some analysis techniques take advantage
of formally-represented knowledge in the
course of fitting data to a model
–ReMind uses a qualitative model in
generating a decision tree
–time-series forecasting techniques
can take advantage of explicit
representation of seasonality
Slide 85
Complete KDD process
Slide 86