Soft Computing, Machine Intelligence and Data Mining
Sankar K. Pal
Machine Intelligence Unit
Indian Statistical Institute, Calcutta
http://www.isical.ac.in/~sankar
MIU Activities
(Formed in March 1993)
• Pattern Recognition and Image Processing
• Data Mining
  – Data Condensation, Feature Selection
  – Support Vector Machine
  – Case Generation
• Soft Computing
  – Fuzzy Logic, Neural Networks, Genetic Algorithms, Rough Sets
  – Hybridization
  – Case Based Reasoning
• Fractals/Wavelets
  – Image Compression
  – Digital Watermarking
  – Wavelet + ANN
• Bioinformatics
• Color Image Processing

Externally Funded Projects
• INTEL
• CSIR
• Silicogene
• Center for Excellence in Soft Computing Research
Foreign Collaborations
(Japan, France, Poland, Hong Kong, Australia)
Editorial Activities
• Journals, Special Issues
• Books
Achievements/Recognitions
Faculty: 10
Research Scholars/Associates: 8
Contents
• What is Soft Computing?
  – Computational Theory of Perception
  – Pattern Recognition and Machine Intelligence
  – Relevance of Soft Computing Tools
  – Different Integrations
• Emergence of Data Mining
  – Need
  – KDD Process
  – Relevance of Soft Computing Tools
  – Rule Generation/Evaluation
• Modular Evolutionary Rough Fuzzy MLP
  – Modular Network
  – Rough Sets, Granules & Rule Generation
  – Variable Mutation Operations
  – Knowledge Flow
  – Example and Merits
• Rough-fuzzy Case Generation
  – Granular Computing
  – Fuzzy Granulation
  – Mapping Dependency Rules to Cases
  – Case Retrieval
  – Examples and Merits
• Conclusions
SOFT COMPUTING
(L. A. Zadeh)
Aim:
• To exploit the tolerance for imprecision, uncertainty, approximate reasoning and partial truth to achieve tractability, robustness, low solution cost, and a close resemblance with human-like decision making.
• To find an approximate solution to an imprecisely/precisely formulated problem.

Parking a Car
Generally, a car can be parked rather easily because the final position of the car is not specified exactly. If it were specified to within, say, a fraction of a millimeter and a few seconds of arc, it would take hours or days of maneuvering and precise measurements of distance and angular position to solve the problem.
• High precision carries a high cost.
• The challenge is to exploit the tolerance for imprecision by devising methods of computation which lead to an acceptable solution at low cost. This, in essence, is the guiding principle of soft computing.
• Soft Computing is a collection of methodologies (working synergistically, not competitively) which, in one form or another, reflect its guiding principle: exploit the tolerance for imprecision, uncertainty, approximate reasoning and partial truth to achieve tractability, robustness, and a close resemblance with human-like decision making.
• Foundation for the conception and design of high-MIQ (Machine IQ) systems.
• Provides flexible information-processing capability for the representation and evaluation of various real-life ambiguous and uncertain situations.

Real World Computing
• It may be argued that it is soft computing, rather than hard computing, that should be viewed as the foundation for Artificial Intelligence.
• At this juncture, the principal constituents of soft computing are Fuzzy Logic (FL), Neurocomputing (NC), Genetic Algorithms (GA) and Rough Sets (RS).
• Within soft computing, FL, NC, GA and RS are complementary rather than competitive.

Role of
FL: the algorithms for dealing with imprecision and uncertainty
NC: the machinery for learning and curve fitting
GA: the algorithms for search and optimization
RS: handling uncertainty arising from the granularity in the domain of discourse
Referring back to the example "Parking a Car":
Do we use any measurement and computation while performing such tasks?
In Soft Computing, we use the Computational Theory of Perceptions (CTP).
AI Magazine, 22(1), 73-84, 2001
Computational Theory of Perceptions (CTP)
• Provides the capability to compute and reason with perception-based information.
  Examples: parking a car, driving in a city, cooking a meal, summarizing a story.
• Humans have a remarkable capability to perform a wide variety of physical and mental tasks without any measurements and computations.
• They use perceptions of time, direction, speed, shape, possibility, likelihood, truth, and other attributes of physical and mental objects.
• Reflecting the finite ability of the sensory organs (and finally the brain) to resolve detail, perceptions are inherently imprecise.
• Perceptions are fuzzy (F)-granular (both fuzzy and granular):
  – Boundaries of perceived classes are unsharp.
  – Values of attributes are granulated (a clump of indistinguishable points/objects).
Example:
Granules in age: very young, young, not so old, …
Granules in direction: slightly left, sharp right, …
[Diagram: Machine Intelligence at the core, grouping several technology families around Pattern Recognition and Learning]
• Hybrid Systems: neuro-fuzzy, genetic-neural, fuzzy-genetic, fuzzy-neuro-genetic
• Knowledge-based Systems: fuzzy logic, probabilistic reasoning, approximate reasoning, case based reasoning, rough sets
• Data Driven Systems: neural network systems, evolutionary computing
• Non-linear Dynamics: chaos theory, rescaled range analysis (wavelet), fractal analysis
Machine Intelligence: a core concept for grouping various advanced technologies with Pattern Recognition and Learning.
Relevance of FL, ANN, GAs Individually to PR Problems is Established
In the late eighties scientists thought: Why NOT integrations?
• Fuzzy Logic + ANN
• ANN + GA
• Fuzzy Logic + ANN + GA
• Fuzzy Logic + ANN + GA + Rough Set
Neuro-fuzzy hybridization is the most visible integration realized so far.
Why Fusion?
• Fuzzy set theoretic models try to mimic human reasoning and the capability of handling uncertainty (the software, SW).
• Neural network models attempt to emulate the architecture and information representation scheme of the human brain (the hardware, HW).
NEURO-FUZZY Computing
(for More Intelligent Systems)
• Fuzzy system, with an ANN used for learning and adaptation ⇒ NFS (neuro-fuzzy system)
• ANN, with fuzzy sets used to augment its application domain ⇒ FNN (fuzzy-neural network)
MERITS
• Generic
• Application specific
Rough Fuzzy Hybridization: A New Trend in Decision Making, S. K. Pal and A. Skowron (eds.), Springer-Verlag, Singapore, 1999
IEEE Trans. Neural Networks, 9, 1203-1216, 1998
Incorporate Domain Knowledge using Rough Sets
[Figure: each input feature Fj is fuzzified into low (FjL), medium (FjM) and high (FjH) components feeding a knowledge-based network; GA tuning operates on a chromosome encoding of the links, e.g. XX|000|XX and 00|XXX|00]
Integration of ANN, FL, GAs and Rough Sets
Before we describe the
• Modular Evolutionary Rough-fuzzy MLP
• Rough-fuzzy Case Generation System
we explain Data Mining and the significance of Pattern Recognition, Image Processing and Machine Intelligence.
Why Data Mining?
• The digital revolution has made digitized information easy to capture and fairly inexpensive to store.
• With the development of computer hardware and software and the rapid computerization of business, huge amounts of data have been collected and stored in centralized or distributed databases.
• Data is heterogeneous (a mixture of text, symbolic, numeric, texture and image data), huge (both in dimension and size) and scattered.
• Such data is being stored at a phenomenal rate.
• As a result, traditional ad hoc mixtures of statistical techniques and data management tools are no longer adequate for analyzing this vast collection of data.
Pattern Recognition and Machine Learning principles applied to a very large (both in size and dimension) heterogeneous database ⇒ Data Mining
Data Mining + Knowledge Interpretation ⇒ Knowledge Discovery
Knowledge Discovery: the process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.
Pattern Recognition, World Scientific, 2001
Data Mining (DM) and Knowledge Discovery in Databases (KDD)
[Flowchart: Huge Raw Data → preprocessing (data cleaning, dimensionality reduction, data condensation) → Preprocessed Data → machine learning (classification, clustering, rule generation, data description) → Mathematical Model of Data (patterns) → knowledge interpretation (knowledge extraction, knowledge evaluation, data wrapping) → Useful Knowledge. The Data Mining stage covers preprocessing through model building; the full chain constitutes KDD.]
Why Growth of Interest?
• Falling cost of large storage devices and increasing ease of collecting data over networks.
• Availability of robust/efficient machine learning algorithms to process data.
• Falling cost of computational power ⇒ enabling the use of computationally intensive methods for data analysis.
Example: Medical Data
• Numeric and textual information may be interspersed.
• Different symbols can be used with the same meaning.
• Redundancy often exists.
• Erroneous/misspelled medical terms are common.
• Data is often sparsely distributed.
⇒ A robust preprocessing system is required to extract any kind of knowledge from even medium-sized medical data sets.
⇒ The data must not only be cleaned of errors and redundancy, but organized in a fashion that makes sense for the problem.
So, We NEED
• Efficient
• Robust
• Flexible
Machine Learning Algorithms
⇒ NEED for the Soft Computing Paradigm
Without "Soft Computing", Machine Intelligence research remains incomplete.
Modular Neural Networks
Task: Split a learning task into several subtasks, train a subnetwork for each subtask, and integrate the subnetworks to generate the final solution.
Strategy: 'Divide and Conquer'
The approach involves:
• Effective decomposition of the problem s.t. the subproblems can be solved with compact networks.
• Effective combination and training of the subnetworks s.t. there is a gain in terms of total training time, network size and accuracy of solution.
Advantages
• Accelerated training.
• The final solution network has more structured components.
• Representation of individual clusters (irrespective of size/importance) is better preserved in the final solution network.
• The catastrophic interference problem of neural network learning (in the case of overlapped regions) is reduced.
[Figure: a 3-class problem is split into three 2-class subproblems; Class 1, Class 2 and Class 3 subnetworks are trained separately and then integrated. In the final training phase, links with trained values are preserved while inter-module links are grown, yielding the final network.]
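To make this divide-and-conquer flow concrete, here is a minimal NumPy sketch (illustrative only, not the implementation from the paper; network sizes, learning rate and epoch count are arbitrary choices) that trains one small one-vs-rest subnetwork per class and then concatenates the modules: intra-module weights are preserved, while the zero inter-module blocks are the links to be grown in the final training phase.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_subnetwork(X, y_bin, n_hidden=4, epochs=500, lr=0.5):
    """Train a tiny one-hidden-layer MLP on one 2-class (one-vs-rest) subtask."""
    W1 = rng.normal(0, 0.5, (X.shape[1], n_hidden))
    W2 = rng.normal(0, 0.5, (n_hidden, 1))
    for _ in range(epochs):
        h = np.tanh(X @ W1)                       # hidden layer
        out = 1 / (1 + np.exp(-(h @ W2)))         # sigmoid output
        err = out - y_bin[:, None]                # grad. of cross-entropy wrt pre-activation
        W2 -= lr * h.T @ err / len(X)
        W1 -= lr * X.T @ ((err @ W2.T) * (1 - h ** 2)) / len(X)
    return W1, W2

def integrate_modules(modules):
    """Concatenate subnetworks: intra-module links keep their trained values;
    the zero inter-module blocks are the links to be grown in final training."""
    W1_full = np.hstack([W1 for W1, _ in modules])        # all hidden units see all inputs
    W2_full = np.zeros((W1_full.shape[1], len(modules)))  # block-diagonal output weights
    ofs = 0
    for k, (_, W2) in enumerate(modules):
        W2_full[ofs:ofs + W2.shape[0], k] = W2[:, 0]      # preserve intra-module links
        ofs += W2.shape[0]
    return W1_full, W2_full

# 3-class toy problem -> three 2-class subtasks, then integration
X = rng.normal(size=(90, 3))
y = np.repeat([0, 1, 2], 30)
modules = [train_subnetwork(X, (y == c).astype(float)) for c in range(3)]
W1, W2 = integrate_modules(modules)
print(W1.shape, W2.shape)   # (3, 12) (12, 3)
```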
Modular Rough Fuzzy MLP
A modular network designed using four different soft computing tools.
• Basic network model: Fuzzy MLP.
• Rough set theory is used to generate crude decision rules representing each of the classes from the discernibility matrix. (There may be multiple rules for each class ⇒ multiple subnetworks for each class.)
• The knowledge-based subnetworks are concatenated to form a population of initial solution networks.
• The final solution network is evolved using a GA with a variable mutation operator: the bits corresponding to intra-module links (already evolved) have a low mutation probability, while inter-module links have a high mutation probability.
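The variable mutation operator itself reduces to a bit-flip mutation with link-dependent rates. A minimal sketch (the rates p_intra and p_inter are assumed values for illustration, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(42)

def variable_mutation(chromosome, intra_mask, p_intra=0.01, p_inter=0.2):
    """Bit-flip mutation with link-dependent rates: intra-module link bits
    (already refined during partial training) mutate rarely, inter-module
    link bits mutate often so the GA explores new connections.
    `intra_mask` is True where a bit encodes an intra-module link."""
    p = np.where(intra_mask, p_intra, p_inter)
    flips = rng.random(chromosome.shape) < p
    return np.where(flips, 1 - chromosome, chromosome)

# usage: 8 link bits, the first 5 encoding intra-module links
chrom = np.array([1, 0, 1, 1, 0, 0, 0, 0])
intra = np.array([True] * 5 + [False] * 3)
print(variable_mutation(chrom, intra))
```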
Rough Sets
[Figure: a set X in the universe U, shown in the feature space W_B, with its lower approximation B̲X (granules entirely inside X), its upper approximation B̄X (granules intersecting X), a point x and its granule [x]_B]
[x]_B = the set of all points belonging to the same granule as the point x in the feature space W_B, i.e., the set of all points which are indiscernible from x in terms of the feature subset B.
Approximations of a set X ⊆ U w.r.t. a feature subset B:
B-lower: B̲X = {x ∈ U : [x]_B ⊆ X} (granules definitely belonging to X)
B-upper: B̄X = {x ∈ U : [x]_B ∩ X ≠ ∅} (granules definitely and possibly belonging to X)
If B̲X = B̄X, X is B-exact (B-definable); otherwise it is roughly definable.
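These two definitions translate directly into a few lines of code. A minimal sketch (function and variable names are illustrative); the granules are computed as equivalence classes of the indiscernibility relation on B:

```python
from collections import defaultdict

def approximations(objects, B, X):
    """B-lower and B-upper approximations of a set X of object ids.
    `objects` maps id -> attribute dict; `B` is the feature subset.
    A granule [x]_B collects the objects with identical values on B."""
    granules = defaultdict(set)
    for obj, attrs in objects.items():
        granules[tuple(attrs[a] for a in B)].add(obj)
    lower, upper = set(), set()
    for g in granules.values():
        if g <= X:        # granule entirely inside X -> definitely in X
            lower |= g
        if g & X:         # granule intersects X -> possibly in X
            upper |= g
    return lower, upper

# toy universe: x3 and x5 fall in the same granule w.r.t. B = {F1, F2}
U = {"x1": {"F1": 1, "F2": 0}, "x2": {"F1": 0, "F2": 0},
     "x3": {"F1": 1, "F2": 1}, "x4": {"F1": 0, "F2": 1},
     "x5": {"F1": 1, "F2": 1}}
lo, up = approximations(U, ["F1", "F2"], X={"x1", "x2", "x3"})
print(lo, up)   # {'x1','x2'}  {'x1','x2','x3','x5'} -> X is roughly definable
```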
Rough Sets
• Uncertainty handling (using lower & upper approximations)
• Granular computing (using information granules)
Granular computing: computation is performed using information granules and not the data points (objects)
⇒ Information compression
⇒ Computational gain
Information Granules and Rough Set Theoretic Rules
[Figure: feature space with F1 and F2 each granulated into low, medium and high; a class region is covered by the granule M1 ∧ M2]
Rule: class ← M1 ∧ M2
• A rule provides a crude description of the class using granules.
Rough Set Rule Generation
Decision Table:

Object   F1   F2   F3   F4   F5   Decision
x1        1    0    1    0    1   Class 1
x2        0    0    0    0    1   Class 1
x3        1    1    1    1    1   Class 1
x4        0    1    0    1    0   Class 2
x5        1    1    1    0    0   Class 2

Discernibility matrix (c) for Class 1:
c_ij = {a : a(x_i) ≠ a(x_j)}, 1 ≤ i, j ≤ p

Objects   x1    x2          x3
x1        ∅     F1, F3      F2, F4
x2              ∅           F1, F2, F3, F4
x3                          ∅

Discernibility function:
f_{x_k} = ∧_j { ∨(c_kj) : 1 ≤ j ≤ p, j ≠ k, c_kj ≠ ∅ }

The discernibility function for the object x1 belonging to Class 1
= (discernibility of x1 w.r.t. x2) ∧ (discernibility of x1 w.r.t. x3)
= (F1 ∨ F3) ∧ (F2 ∨ F4)
Similarly, the discernibility function for object x2 = F1 ∨ F2 ∨ F3 ∨ F4.
Dependency rules (AND-OR form):
Class 1 ← (F1 ∧ F2) ∨ (F1 ∧ F4) ∨ (F3 ∧ F2) ∨ (F3 ∧ F4)
Class 1 ← F1 ∨ F2 ∨ F3 ∨ F4
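The discernibility computation above can be sketched as follows (illustrative Python over the Class 1 objects of the decision table; each clause of the CNF is the OR over one matrix entry, and the rule for the class is the AND of the clauses):

```python
from itertools import combinations

# Class 1 objects of the decision table above (features F1..F4)
class1 = {
    "x1": {"F1": 1, "F2": 0, "F3": 1, "F4": 0},
    "x2": {"F1": 0, "F2": 0, "F3": 0, "F4": 0},
    "x3": {"F1": 1, "F2": 1, "F3": 1, "F4": 1},
}

def discernibility_matrix(objects, features):
    """c_ij = set of attributes on which objects x_i and x_j differ."""
    return {(i, j): {a for a in features if objects[i][a] != objects[j][a]}
            for i, j in combinations(sorted(objects), 2)}

def discernibility_function(matrix, k):
    """f_xk in CNF: one clause (an OR over features) per non-empty entry
    c_kj involving x_k; the dependency rule is the AND of these clauses."""
    return [clause for (i, j), clause in matrix.items()
            if k in (i, j) and clause]

m = discernibility_matrix(class1, ["F1", "F2", "F3", "F4"])
print(m[("x1", "x2")])                    # {'F1', 'F3'} (set order may vary)
print(discernibility_function(m, "x1"))   # [{'F1','F3'}, {'F2','F4'}] -> (F1 v F3) ^ (F2 v F4)
```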
Knowledge Flow in Modular Rough Fuzzy MLP
IEEE Trans. Knowledge Data Engg., 15(1), 14-25, 2003
Rough set rules in the granulated feature space (F1, F2):
  c1 ← (L1 ∧ M2) ∨ (M1 ∧ H2)   (R1)
  c2 ← M2 ∧ H1   (R2)
  c2 ← L2 ∧ L1   (R3)
Network mapping: each rule becomes a subnetwork (R1 → Subnet 1, R2 → Subnet 2, R3 → Subnet 3).
Partial training of each subnetwork (SN1, SN2, SN3) with an ordinary GA yields partially refined subnetworks.
The refined subnetworks are concatenated (intra-module links: low mutation probability; inter-module links: high mutation probability), and the population of concatenated networks is evolved with the GA having the variable mutation operator.
[Figure: the resulting final solution network separating classes C1 and C2 in the (F1, F2) feature space]
Speech Data: 3 features, 6 classes
[Bar charts comparing 1. MLP, 2. Fuzzy MLP (FMLP), 3. Modular Fuzzy MLP (MFMLP), 4. Rough Fuzzy MLP (RFMLP) and 5. Modular Rough Fuzzy MLP (MRFMLP) on: classification accuracy (train 20%, test 80%); network size (number of links); and training time in hours (on a DEC Alpha Workstation @ 400 MHz)]
Network Structure:
IEEE Trans. Knowledge Data Engg., 15(1), 14-25, 2003
• Modular Rough Fuzzy MLP: structured (fewer links)
• Fuzzy MLP: unstructured (more links)
[Figure: histogram of weight values, and the connectivity of the network obtained using the Modular Rough Fuzzy MLP]
Rule Evaluation
• Accuracy
• Fidelity (number of times the network and rule-base outputs agree)
• Confusion (should be restricted within a minimum number of classes)
• Coverage (a rule base with a smaller uncovered region, i.e., the % of the test set for which no rules fire, is better)
• Rule base size (the smaller the number of rules, the more compact the rule base)
• Certainty (confidence of rules)
IEEE Trans. Knowledge Data Engg., 15(1), 14-25, 2003
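For concreteness, a small sketch of how three of these measures might be computed (the definitions here are paraphrases of the list above, not formulas from the paper):

```python
import numpy as np

def rule_metrics(y_true, net_pred, rule_pred, fired):
    """Evaluate a rule base extracted from a network.
    net_pred / rule_pred: class labels output by the network and the rules;
    fired: boolean mask, True where at least one rule fired for the sample."""
    y_true, net_pred, rule_pred = map(np.asarray, (y_true, net_pred, rule_pred))
    fired = np.asarray(fired)
    return {
        "accuracy": np.mean(rule_pred[fired] == y_true[fired]),   # rules vs. truth
        "fidelity": np.mean(rule_pred[fired] == net_pred[fired]), # rules vs. network
        "uncovered": 1.0 - fired.mean(),   # fraction of samples with no rule fired
    }

print(rule_metrics(
    y_true=[0, 1, 1, 2, 0], net_pred=[0, 1, 2, 2, 0],
    rule_pred=[0, 1, 2, 2, 0], fired=[True, True, True, False, True]))
```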
Comparison of Rules obtained for Speech data
[Bar charts comparing the Proposed method with Subset, MofN, X2R and C4.5 on: accuracy, user's accuracy and kappa; number of rules; uncovered region (% of samples); CPU time (sec); and confusion]
Case Based Reasoning (CBR)
• Cases: typical situations already experienced by the system; a conceptualized piece of knowledge representing an experience that teaches a lesson for achieving the goals of the system.
• CBR involves:
  – adapting old solutions to meet new demands
  – using old cases to explain new situations or to justify new solutions
  – reasoning from precedents to interpret new situations.
• A CBR system learns and becomes more efficient as a byproduct of its reasoning activity.
• Examples: medical diagnosis and law interpretation, where the knowledge available is incomplete and/or the evidence is sparse.
Case Selection ⇒ cases belong to the set of examples encountered.
Case Generation ⇒ constructed 'cases' need not be any of the examples.
Rough Sets
• Uncertainty handling (using lower & upper approximations)
• Granular computing (using information granules)
IEEE Trans. Knowledge Data Engg., to appear
Granular Computing and Case Generation
• Information granules: a group of similar objects clubbed together by an indiscernibility relation.
• Granular computing: computation is performed using information granules and not the data points (objects)
  – Information compression
  – Computational gain
• Cases: informative patterns (prototypes) characterizing the problems.
• In the rough set theoretic framework: Cases ≡ Information Granules
• In the rough-fuzzy framework: Cases ≡ Fuzzy Information Granules
Characteristics and Merits
• Cases are cluster granules, not sample points.
• Each case involves only a reduced number of relevant features, and this number varies from case to case.
⇒ Less storage requirement
⇒ Fast retrieval
⇒ Suitable for mining data with large dimension and size
How to Achieve?
• Fuzzy sets help in the linguistic representation of patterns, providing a fuzzy granulation of the feature space.
• Rough sets help in generating dependency rules to model 'informative/representative regions' in the granulated feature space.
• The fuzzy membership functions corresponding to the 'representative regions' are stored as cases.
Fuzzy (F)-Granulation:
[Figure: π-membership functions along feature j for the linguistic sets low, medium and high, with centers cL, cM, cH and scaling parameters lL, lM, lH; membership value runs from 0 to 1, with value 0.5 at the crossover points]
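The π-function in the figure has the standard form used in fuzzy granulation: membership 1 at the center c, 0.5 at distance l/2, and 0 beyond distance l. A minimal sketch (the exact parameterization in the paper may differ slightly):

```python
import numpy as np

def pi_membership(x, c, lam):
    """Standard pi-function with center c and scaling lam:
    1 at x = c, 0.5 at |x - c| = lam/2, 0 beyond |x - c| = lam."""
    d = np.abs(np.asarray(x, dtype=float) - c) / lam
    return np.where(d <= 0.5, 1 - 2 * d ** 2,
                    np.where(d <= 1.0, 2 * (1 - d) ** 2, 0.0))

# membership of feature value 0.3 in 'low' with c = 0.1, lam = 0.5
print(pi_membership(0.3, c=0.1, lam=0.5))   # 0.68
```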
Example
IEEE Trans. Knowledge Data Engg., to appear
[Figure: feature space (F1, F2) with two class regions; CASE 1 covers the Class 1 region L1 ∧ H2 (around F1 ≈ 0.1, F2 ≈ 0.9), CASE 2 covers the Class 2 region H1 ∧ L2 (around F1 ≈ 0.7, F2 ≈ 0.2)]
Parameters of the fuzzy linguistic sets low, medium, high:
Feature 1: cL = 0.1, lL = 0.5; cM = 0.5, lM = 0.7; cH = 0.7, lH = 0.4
Feature 2: cL = 0.2, lL = 0.5; cM = 0.4, lM = 0.7; cH = 0.9, lH = 0.5
Dependency rules and cases obtained:
Class 1 ← L1 ∧ H2
Class 2 ← H1 ∧ L2
Case 1:
  Feature No: 1, fuzzset (L): c = 0.1, l = 0.5
  Feature No: 2, fuzzset (H): c = 0.9, l = 0.5
  Class = 1
Case 2:
  Feature No: 1, fuzzset (H): c = 0.7, l = 0.4
  Feature No: 2, fuzzset (L): c = 0.2, l = 0.5
  Class = 2
Case Retrieval
• The similarity sim(x, c) between a pattern x and a case c is defined as:

  sim(x, c) = (1/n) Σ_{j=1}^{n} ( μ_fuzzset^j(x) )²

  where n is the number of features present in case c, and μ_fuzzset^j(x) is the degree of belongingness of pattern x to the fuzzy linguistic set fuzzset for feature j.
• For classifying an unknown pattern, the case closest to the pattern in terms of sim(x, c) is retrieved and its class is assigned to the pattern.
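Putting the pieces together, a minimal end-to-end sketch of retrieval over the two cases generated in the example above (reusing the π-function; the case data structure is an illustrative choice, not the paper's):

```python
def pi_membership(x, c, l):
    """Pi-function, same form as in the sketch above (scalar version)."""
    d = abs(x - c) / l
    return 1 - 2 * d ** 2 if d <= 0.5 else (2 * (1 - d) ** 2 if d <= 1 else 0.0)

def similarity(x, case):
    """sim(x, c) = (1/n) * sum_j (mu_fuzzset^j(x))^2 over the n features
    stored in the case, each with its fuzzy set's center c and scaling l."""
    feats = case["feats"]
    return sum(pi_membership(x[j], c, l) ** 2 for j, c, l in feats) / len(feats)

def retrieve(x, case_base):
    """Assign the class of the case most similar to pattern x."""
    return max(case_base, key=lambda cs: similarity(x, cs))["cls"]

# the two cases from the example above (feature indices 0 and 1)
case_base = [
    {"feats": [(0, 0.1, 0.5), (1, 0.9, 0.5)], "cls": 1},   # Case 1: L1, H2
    {"feats": [(0, 0.7, 0.4), (1, 0.2, 0.5)], "cls": 2},   # Case 2: H1, L2
]
print(retrieve([0.15, 0.80], case_base))    # -> 1 (pattern lies in L1 ^ H2)
```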
Evaluation in terms of:
a) 1-NN classification accuracy using the cases (training set: 10%, used for case generation; test set: 90%).
b) Number of cases stored in the case base.
c) Average number of features required to store a case (navg).
d) CPU time required for case generation (tgen).
e) Average CPU time required to retrieve a case (tret).
(On a Sun UltraSparc Workstation @ 350 MHz.)
Iris Flowers: 4 features, 3 classes, 150 samples
[Bar charts comparing Rough-fuzzy, IB3, IB4 and Random case selection on: 1-NN classification accuracy, average number of features per case, case generation time tgen (sec) and case retrieval time tret (sec). Number of cases = 3 for all methods.]
Forest Cover Types: 10 features, 7 classes, 586,012 samples
[Bar charts comparing Rough-fuzzy, IB3, IB4 and Random case selection on: 1-NN classification accuracy, average number of features per case, tgen (sec) and tret (sec). Number of cases = 545 for all methods.]
Hand Written Numerals: 649 features, 10 classes, 2000 samples
[Bar charts comparing Rough-fuzzy, IB3, IB4 and Random case selection on: 1-NN classification accuracy, average number of features per case, tgen (sec) and tret (sec). Number of cases = 50 for all methods.]
For the same number of cases:
• Accuracy: the proposed method is much superior to random selection and IB4, and close to IB3.
• Average number of features stored: the proposed method stores far fewer features than the original data dimension.
• Case generation time: the proposed method requires much less than IB3 and IB4.
• Case retrieval time: several orders of magnitude less for the proposed method compared to IB3 and random selection; also less than IB4.
Conclusions
• The relation between Soft Computing, Machine Intelligence and Pattern Recognition is explained.
• The emergence of Data Mining and Knowledge Discovery from the PR point of view is explained.
• The significance of hybridization in the Soft Computing paradigm is illustrated.
• The modular concept enhances performance, accelerates training and makes the network structured, with fewer links.
• The rules generated are superior to those of other related methods in terms of accuracy, coverage, fidelity, confusion, size and certainty.
• Rough sets are used for generating information granules.
• Fuzzy sets provide efficient granulation of the feature space (F-granulation).
• Reduced and variable feature subset representation of cases is a unique feature of the scheme.
• The rough-fuzzy case generation method is suitable for CBR systems involving datasets large both in dimension and size.
• Unsupervised case generation: Rough-SOM (Applied Intelligence, to appear).
• Application to multi-spectral image segmentation (IEEE Trans. Geoscience and Remote Sensing, 40(11), 2495-2501, 2002).
• Significance in the Computational Theory of Perceptions (CTP).
Thank You!!