Fuzzy-rough data mining
Richard Jensen
Advanced Reasoning Group
University of Aberystwyth
[email protected]
http://users.aber.ac.uk/rkj
Outline
• Knowledge discovery process
• Fuzzy-rough methods
– Feature selection and extensions
– Instance selection
– Classification/prediction
– Semi-supervised learning
Knowledge discovery
• The process
• The problem of too much data
– Requires storage
– Intractable for data mining algorithms
– Noisy or irrelevant data is misleading/confounding
Feature Selection
Feature selection
• Why dimensionality reduction/feature selection?
[Diagram: high-dimensional data is intractable for the processing system; dimensionality reduction yields low-dimensional data that can be processed]
• Growth of information - need to manage this effectively
• Curse of dimensionality - a problem for machine learning and data
mining
• Data visualisation - graphing data
Why do it?
• Case 1: We’re interested in features
– We want to know which are relevant
– If we fit a model, it should be interpretable
• Case 2: We’re interested in prediction
– Features are not interesting in themselves
– We just want to build a good classifier (or other kind of
predictor)
Feature selection process
• Feature selection (FS) preserves data semantics by
selecting rather than transforming
[Diagram: feature set → subset generation → subset evaluation (subset suitability) → continue, or stop once the stopping criterion is met → validation]
• Subset generation: forwards, backwards, random…
• Evaluation function: determines ‘goodness’ of subsets
• Stopping criterion: decide when to stop subset search
Fuzzy-rough feature selection
Fuzzy-rough set theory
• Problems:
– Rough set methods (usually) require data
discretization beforehand
– Extensions, e.g. tolerance rough sets, require
thresholds
– Also no flexibility in approximations
• E.g. objects either belong fully to the lower (or upper)
approximation, or not at all
Fuzzy-rough sets
[Diagram: a rough set is generalised to a fuzzy-rough set; the fuzzy approximations are defined using an implicator and a t-norm]
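For reference, one widely used formulation of the fuzzy lower and upper approximations of a fuzzy set A under a fuzzy relation R, with implicator I and t-norm T (the implicator/t-norm definitions; included here as background, not taken verbatim from the slides), is:

$$\mu_{R\downarrow A}(x) = \inf_{y \in U} \mathcal{I}\big(R(x,y), A(y)\big), \qquad \mu_{R\uparrow A}(x) = \sup_{y \in U} \mathcal{T}\big(R(x,y), A(y)\big)$$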
Fuzzy-rough feature selection
• Based on fuzzy similarity, e.g.

$$R_a(x, y) = 1 - \frac{|a(x) - a(y)|}{|a_{\max} - a_{\min}|}, \qquad R_P(x, y) = \mathcal{T}_{a \in P}\,\{R_a(x, y)\}$$
• Lower/upper approximations
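A minimal sketch of these building blocks in Python (illustrative only: the function names, the minimum t-norm and the Łukasiewicz implicator are assumptions, not taken from the original slides; inputs are NumPy arrays):

import numpy as np

def fuzzy_similarity(values):
    # Per-attribute similarity: R_a(x, y) = 1 - |a(x) - a(y)| / |a_max - a_min|
    rng = values.max() - values.min()
    if rng == 0:
        return np.ones((len(values), len(values)))
    return 1.0 - np.abs(values[:, None] - values[None, :]) / rng

def subset_similarity(data, subset):
    # Combine the per-attribute relations with a t-norm (here: minimum)
    return np.minimum.reduce([fuzzy_similarity(data[:, a]) for a in subset])

def lower_approximation(R, concept):
    # Fuzzy lower approximation with the Lukasiewicz implicator I(a, b) = min(1, 1 - a + b):
    # membership of each object x is inf_y I(R(x, y), concept(y))
    return np.min(np.minimum(1.0, 1.0 - R + concept[None, :]), axis=1)

def upper_approximation(R, concept):
    # Fuzzy upper approximation with the minimum t-norm: sup_y min(R(x, y), concept(y))
    return np.max(np.minimum(R, concept[None, :]), axis=1)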
FRFS: evaluation function
• Fuzzy positive region #1
• Fuzzy positive region #2 (weak)
• Dependency function
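In one common formulation (e.g. Jensen and Shen's FRFS), the fuzzy positive region and the dependency degree of the decision Q on an attribute subset P are:

$$\mu_{POS_P(Q)}(x) = \sup_{X \in \mathbb{U}/Q} \mu_{R_P \downarrow X}(x), \qquad \gamma_P(Q) = \frac{\sum_{x \in \mathbb{U}} \mu_{POS_P(Q)}(x)}{|\mathbb{U}|}$$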
FRFS: finding reducts
• Fuzzy-rough QuickReduct
– Evaluation: use the dependency function (or
other fuzzy-rough measure)
– Generation: greedy hill-climbing
– Stopping criterion: when maximal evaluation
function is reached (or to degree α)
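A sketch of the greedy search in Python, assuming a user-supplied dependency(subset) function (for instance built from the positive region above); this is illustrative rather than the exact published pseudocode:

def quickreduct(dependency, n_features, alpha=1.0):
    # Greedy hill-climbing: repeatedly add the attribute that most increases
    # the fuzzy-rough dependency, stopping when no attribute improves it
    # (or when the measure reaches the target degree alpha).
    reduct, best = set(), 0.0
    while best < alpha:
        chosen, chosen_score = None, best
        for a in range(n_features):
            if a in reduct:
                continue
            score = dependency(reduct | {a})
            if score > chosen_score:
                chosen, chosen_score = a, score
        if chosen is None:      # no improvement possible
            break
        reduct.add(chosen)
        best = chosen_score
    return reduct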
FRFS
• Other search methods
– GAs, PSO, EDAs, Harmony Search, etc.
– Backward elimination, plus-L minus-R, floating search, SAT, etc.
• Other subset evaluations
– Fuzzy boundary region
– Fuzzy entropy
– Fuzzy discernibility function
Ant-based FS
Boundary region
[Diagram: the boundary region of a set X lies between its upper and lower approximations, which are built from equivalence classes [x]_B]
FRFS: boundary region
• Fuzzy lower and upper approximation define
fuzzy boundary region
• For each concept, minimise the boundary
region
– (also applicable to crisp RSFS)
• Results seem to show this is a more informed
heuristic (but more computationally complex)
Finding smallest reducts
• Usually too expensive to search exhaustively for
reducts with minimal cardinality
• Reducts found via discernibility matrices through,
e.g.:
– Converting from CNF to DNF (expensive)
– Hill-climbing search using clauses (non-optimal)
– Other search methods - GAs etc (non-optimal)
• SAT approach
– Solve directly in SAT formulation
– DPLL approach ensures optimal reducts
Fuzzy discernibility matrices
• Extension of crisp approach
– Previously, attributes had {0,1} membership to clauses
– Now have membership in [0,1]
• Fuzzy DMs can be used to find fuzzy-rough reducts
Formulation
• Fuzzy satisfiability
• In crisp SAT, a clause is fully satisfied if at least one
variable in the clause has been set to true
• For the fuzzy case, clauses may be satisfied to a
certain degree depending on which variables have
been assigned the value true
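As a small hypothetical illustration, under a max-based reading of clause satisfaction: given a fuzzy clause {a/0.3, b/0.8, c/0.5}, setting only a to true satisfies the clause to degree 0.3, whereas any assignment that sets b to true satisfies it to degree 0.8; clauses with a membership of 1 behave as in the crisp case, where one true variable fully satisfies the clause.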
Example
DPLL algorithm
Experimentation: results
FRFS: issues
• Problem – noise tolerance!
Vaguely quantified rough sets
Pawlak rough set:
– y belongs to the lower approximation of A iff all elements of Ry belong to A
– y belongs to the upper approximation of A iff at least one element of Ry belongs to A
VQRS:
– y belongs to the lower approximation of A iff most elements of Ry belong to A
– y belongs to the upper approximation of A iff at least some elements of Ry belong to A
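One way this is formalised in the VQRS literature is with fuzzy quantifiers Q_u ("most") and Q_l ("some") applied to the proportion of Ry that lies in A:

$$\mu_{R\downarrow_{Q_u} A}(y) = Q_u\!\left(\frac{|Ry \cap A|}{|Ry|}\right), \qquad \mu_{R\uparrow_{Q_l} A}(y) = Q_l\!\left(\frac{|Ry \cap A|}{|Ry|}\right)$$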
VQRS-based feature selection
• Use the quantified lower approximation,
positive region and dependency degree
– Evaluation: the quantified dependency (can be
crisp or fuzzy)
– Generation: greedy hill-climbing
– Stopping criterion: when the quantified positive
region is maximal (or to degree α)
• Should be more noise-tolerant, but is nonmonotonic
Progress
– Qualitative data → rough set theory
– Quantitative data → fuzzy-rough set theory
– Noisy data → VQRS, fuzzy VPRS
– Noisy data, monotonic → OWA-FRFS
More issues...
• Problem #1: how to choose fuzzy similarity?
• Problem #2: how to handle missing values?
Interval-valued FRFS
• Answer #1: Model uncertainty in fuzzy
similarity by interval-valued similarity
IV fuzzy rough set
IV fuzzy similarity
Interval-valued FRFS
• When comparing two object values for a given
attribute – what to do if at least one is
missing?
• Answer #2: Model missing values via the unit
interval
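One natural reading of "model missing values via the unit interval" (stated here as an illustration, not a quotation of the original definition) is that the similarity becomes maximally uncertain when a value is absent:

$$R_a(x, y) = [0, 1] \quad \text{if } a(x) \text{ or } a(y) \text{ is missing}$$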
Other measures
• Boundary region
• Discernibility function
Initial experimentation
[Diagram: the original dataset is split into cross-validation folds; after data corruption, the folds are reduced by type-1 FRFS and by the IV-FRFS methods, and each set of reduced folds is evaluated with JRip]
Initial experimentation
Initial results: lower approx
Instance Selection
Instance selection: basic ideas
– Not needed: remove objects so that the underlying approximations are unchanged
Instance selection: basic ideas
– Noisy objects: remove objects whose positive region membership is < 1
FRIS-I
FRIS-II
FRIS-III
Fuzzy rough instance selection
• Time complexity is a problem for FRIS-II and
FRIS-III
• Less complex: Fuzzy rough prototype selection
– More on this later...
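A rough sketch of the idea in Python (assumed details: the similarity matrix R is precomputed as earlier and the Łukasiewicz implicator is used; this mirrors the "remove objects whose positive region membership is < 1" idea rather than any specific FRIS variant):

import numpy as np

def positive_region_membership(R, labels):
    # For each object, its membership in the fuzzy positive region:
    # the lower approximation of its own decision class.
    pos = np.empty(len(labels))
    for i in range(len(labels)):
        same_class = (labels == labels[i]).astype(float)
        pos[i] = np.min(np.minimum(1.0, 1.0 - R[i] + same_class))
    return pos

def instance_selection(R, labels, threshold=1.0):
    # Keep only objects whose positive region membership reaches the threshold;
    # objects below it are treated as noisy and removed.
    keep = positive_region_membership(R, labels) >= threshold
    return np.where(keep)[0]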
Fuzzy-rough classification and prediction
FRNN/VQNN
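A hedged sketch of the nearest-neighbour idea in Python (one common formulation: predict the class whose averaged lower and upper approximation membership for the test object is highest; the parameter names and choice of connectives are illustrative, and inputs are NumPy arrays):

import numpy as np

def frnn_classify(sim_to_train, train_labels, classes, k=10):
    # sim_to_train: fuzzy similarity of the test object to each training object
    nn = np.argsort(sim_to_train)[::-1][:k]          # k most similar neighbours
    best_class, best_score = None, -1.0
    for c in classes:
        member = (train_labels[nn] == c).astype(float)
        sim = sim_to_train[nn]
        lower = np.min(np.minimum(1.0, 1.0 - sim + member))   # implicator-based lower
        upper = np.max(np.minimum(sim, member))               # t-norm-based upper
        score = (lower + upper) / 2.0
        if score > best_score:
            best_class, best_score = c, score
    return best_class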
Further developments
• FRNN and VQNN have limitations (for
classification problems)
– FRNN only uses one neighbour
– VQNN equivalent to FNN if the same similarity
relation is used
• POSNN uses the positive region to also consider
the quality of neighbours
– E.g. instances in overlapping class regions are less
interesting
– More on this later...
Discovering rules via RST
• Equivalence classes
– Form the antecedent part of a rule
– The lower approximation tells us if this is
predictive of a given concept (certain rules)
• Typically done in one of two ways:
– Overlaying reducts
– Building rules by considering individual
equivalence classes (e.g. LEM2)
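For instance, an equivalence class such as {objects with a = 1 and b = 2} (hypothetical attribute names) yields the antecedent "a = 1 and b = 2"; if that class lies wholly within the lower approximation of concept X, the certain rule "if a = 1 and b = 2 then X" is induced.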
QuickRules framework
• The fuzzy tolerance classes used during this
process can be used to create fuzzy rules
[Diagram: the feature selection process as before, but with rule induction performed alongside subset evaluation: feature set → subset generation → subset evaluation and rule induction → stopping criterion → validation]
• When a reduct is found, the resulting rules cover all instances
Harmony search approach
• R. Diao and Q. Shen. A harmony search based
approach to hybrid fuzzy-rough rule induction,
Proceedings of the 21st International Conference on
Fuzzy Systems, 2012.
Harmony search approach
Musicians play notes; each harmony (one note per musician) is scored and stored in the harmony memory.
Fitness: minimise (a - 2)^2 + (b - 3)^4 + (c - 1)^2 + 3
Harmony memory (musicians a, b, c and the score of each harmony):
a  b  c | score
2  3  1 |   3
2  3  2 |   4
4  4  2 |   9
3  4  5 |  21
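For example, the harmony (a, b, c) = (2, 3, 2) scores (2 - 2)^2 + (3 - 3)^4 + (2 - 1)^2 + 3 = 4, matching its harmony memory entry.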
Key notion mapping
Harmony Search | Numerical Optimisation | Hybrid Rule Induction
Musician       | Variable               | Fuzzy rule r_x
Note           | Value                  | Feature subset
Harmony        | Solution               | Rule set
Fitness        | Evaluation             | Combined evaluation
Comparison vs QuickRules
HarmonyRules: 56.33 ± 10.00    QuickRules: 63.1 ± 11.89
[Figure: rule cardinality distribution for the 'web' dataset (2556 features)]
Fuzzy-rough semi-supervised
learning
Semi-supervised learning (SSL)
• Lies somewhere between supervised and
unsupervised learning
• Why use it?
– Data is expensive to label/classify
– Labels can also be difficult to obtain
– Large amounts of unlabelled data available
• When is SSL useful?
– Small number of labelled objects but large number of
unlabelled objects
Semi-supervised learning
• A number of methods for SSL – self-learning, generative models, etc.
– Labelled data objects – usually small in number
– Unlabelled data objects – usually large in number
– A set of features describes the objects
– The class label tells us only which class the labelled objects belong to
• SSL therefore attempts to learn labels (or structure)
for data which has no labels
– Labelled data provides ‘clues’ for the unlabelled data
Co-training
[Diagram: the labelled dataset is split into two feature subsets; learner 1 and learner 2 are each trained on one subset and make predictions for the unlabelled data]
Self-learning
[Diagram: a learner is trained on the labelled data objects; its predictions on the unlabelled data are added to the labelled dataset]
Fuzzy-rough self learning (FRSL)
• Basic idea is to propagate labels using the upper and
lower approximations
– Label only those objects which belong to the lower
approximation of a class to a high degree
– Can use upper approximation to decide on ties
• Attempts to minimise mis-labelling and subsequent
reinforcement
• Paper: N. Mac Parthalain and R. Jensen. Fuzzy-Rough Set based Semi-Supervised Learning. Proceedings of the 20th International Conference on Fuzzy Systems (FUZZ-IEEE'11), pp. 2465-2471, 2011.
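A rough Python sketch of the self-labelling loop (compute_lower is an assumed helper that returns, for each still-unlabelled object, its fuzzy lower approximation membership to every class given the currently labelled data; the published FRSL algorithm contains further detail, such as upper-approximation tie-breaking):

import numpy as np

def frsl(compute_lower, labels, unlabelled):
    # labels: array of class indices; entries for unlabelled objects are ignored
    # unlabelled: indices of objects that have not yet been labelled
    labels = labels.copy()
    remaining = set(unlabelled)
    changed = True
    while changed and remaining:
        changed = False
        batch = sorted(remaining)
        lower = compute_lower(labels, batch)      # shape: (len(batch), n_classes)
        for row, obj in zip(lower, batch):
            if row.max() >= 1.0:                  # fully in some class's lower approximation
                labels[obj] = int(row.argmax())
                remaining.discard(obj)
                changed = True
    return labels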
FRSL
[Diagram: the fuzzy-rough learner is trained on the labelled dataset and makes predictions for the unlabelled data; objects whose lower approximation membership = 1 are added to the labelled dataset, the others remain unlabelled]
Experimentation (Problem 1)
[Figure: results for SS-FCM, FNN and FRSL on problem 1]
Experimentation (Problem 2)
[Figure: two clusters with labelled points (labelled 1, labelled 2); results for SS-FCM, FNN and FRSL on problem 2]
Conclusion
• Looked at fuzzy-rough methods for data mining
– Feature selection, finding optimal reducts
– Handling missing values and other problems
– Classification/prediction
– Instance selection
– Semi-supervised learning
• Future work
– Imputation, better rule induction and instance
selection methods, more semi-supervised methods,
optimizations, instance/feature weighting
FR methods in Weka
• Weka implementations of all fuzzy-rough methods
can be downloaded from:
http://users.aber.ac.uk/rkj/book/wekafull.jar
• KEEL version available soon (hopefully!)