Fuzzy metaqueries for guiding the Discovery Process in KDD
Jesús Cerquides
Ramon López de Mántaras
Artificial Intelligence Research Institute, IIIA
Spanish Council for Scientific Research, CSIC
08193, Bellaterra, Barcelona, Spain
[email protected]
Artificial Intelligence Research Institute, IIIA
Spanish Council for Scientific Research, CSIC
08193, Bellaterra, Barcelona, Spain
[email protected]
Abstract
This paper introduces the concept of fuzzy metaqueries
and describes a framework for knowledge discovery that
has fuzzy metaqueries as its base. Fuzzy metaqueries are
second-order-like fuzzy rules, very useful for the integration
of inductive learning, deductive verification, human intuition and uncertainty handling.
1 Introduction
Knowledge Discovery in Databases (KDD) is the
process of extracting and refining useful knowledge from
large databases. Integrating inductive learning,
deductive verification and human intuition has become
a key subject for this field, because none of them
can, for now, do the work alone. Inductive learning
focuses on data and tries to generate hypotheses from it.
Deductive verification evaluates the evidential support for
previously given hypotheses. Human intuition is
necessary for guiding the discovery so that it gathers the
information we want, and in an acceptable time.
In most realistic settings, the information on which
we have to work is imprecise, incomplete or unreliable.
Fuzzy logic, as a tool for approximate reasoning, has proved
very useful in working with this kind of information.
In this paper we suggest an approach to the integration
of the three essential discovery processes mentioned above
with fuzzy logic, so that the discovery system can be
improved by taking uncertainty and imprecision into account when discovering. We first review the concept of
metaquery. Then we briefly analyze the kind of
rules we are looking for. From these two points
we obtain the concept of fuzzy metaquery. Finally we show
a framework for the use of fuzzy metaqueries as a basis for
a Knowledge Discovery system.
0-7803-3796-4/97/$10.00 © 1997 IEEE
2 Metaqueries
Metaqueries have been proposed in [1],[8] as a method
for integrating induction, deduction and human guidance.
A metaquery is a second order expression that describes the
type of pattern to be discovered. Suppose P, Q and R are
predicate variables, and X, Y, Z variables for objects; then
the metaquery
P(X,Y) and Q(Y,Z) ⇒ R(X,Z)
tells the discovery system that we are trying to find
transitivity relations such as
p(X,Y) and q(Y,Z) ⇒ r(X,Z)
where p, q, and r are specific predicates. The ⇒ does
not mean implication; it stands for plausible deduction, and
may be false for a subset of the cases.
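As an illustration only, the following Python sketch shows one way to hold the transitivity metaquery as data and to obtain concrete rules by binding its predicate variables. It is not taken from Knowledge Miner: the class Literal, the functions instantiate, show and instantiations, and the predicates parent, sibling and ancestor are hypothetical names chosen for this example, and letting two variables share one predicate is an assumption.

```python
from itertools import product
from typing import NamedTuple


class Literal(NamedTuple):
    """One literal of a rule: a predicate symbol applied to object variables."""
    predicate: str
    args: tuple


# The transitivity metaquery; P, Q and R are predicate *variables*.
BODY = (Literal("P", ("X", "Y")), Literal("Q", ("Y", "Z")))
HEAD = Literal("R", ("X", "Z"))


def instantiate(literal, binding):
    """Replace the predicate variable of a literal by a concrete predicate."""
    return Literal(binding[literal.predicate], literal.args)


def show(literal):
    """Render a literal as text, e.g. parent(X,Y)."""
    return f"{literal.predicate}({','.join(literal.args)})"


def instantiations(body, head, predicates):
    """Yield every rule obtained by binding the predicate variables.

    Assumption: two variables may be bound to the same predicate, which
    is what a genuine transitivity rule such as p, p => p requires.
    """
    variables = list(dict.fromkeys(
        [lit.predicate for lit in body] + [head.predicate]))
    for chosen in product(predicates, repeat=len(variables)):
        binding = dict(zip(variables, chosen))
        lhs = " and ".join(show(instantiate(lit, binding)) for lit in body)
        yield f"{lhs} => {show(instantiate(head, binding))}"


# Hypothetical binary predicates of some database schema.
rules = list(instantiations(BODY, HEAD, ["parent", "sibling", "ancestor"]))
print(len(rules))
print(rules[0])
```

Each generated rule is only a candidate: whether it holds, and for how many cases, has to be checked against the data.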
A metaquery can be decomposed into two fundamental
parts: its left-hand side (everything before the ⇒ sign) and
its right-hand side (everything after it). The left-hand side
specifies the part of the database on which we want to focus
the discovery. The right-hand side performs an action (usually inductive) on the data gathered by the left-hand side.
Some of the actions that the right-hand side can perform
on the data that satisfy the left-hand side are:
• Evaluate the strength of a predicate (the percentage of
the data tuples fulfilling the left-hand side that also
fulfill the right-hand side).
• Generate a set of class descriptions for correctly classifying the data. We have to select the variable that will
act as the class in the classification. The remaining variables
will appear in the class description.
• Generate a set of cluster descriptions for the data.
Authorized licensed use limited to: Universitat de Barcelona. Downloaded on October 7, 2009 at 14:54 from IEEE Xplore. Restrictions apply.
FUZZ-IEEE’97
• Plot some characteristics of the data.
• Choose some of the variables as independent variables
and others as dependent variables and return the approximated function.
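The strength action in the list of right-hand-side actions is defined only in words. One possible way to write it as a formula is sketched here, assuming that T denotes the set of candidate tuples and that LHS(t) and RHS(t) state that tuple t fulfills the corresponding side of the instantiated metaquery. This notation is introduced only for illustration and does not appear in the original text.

```latex
\mathrm{strength}(\mathit{LHS} \Rightarrow \mathit{RHS})
  = 100 \cdot
    \frac{\bigl|\{\, t \in T : \mathit{LHS}(t) \wedge \mathit{RHS}(t) \,\}\bigr|}
         {\bigl|\{\, t \in T : \mathit{LHS}(t) \,\}\bigr|}
```

For instance, if 40 tuples fulfill the left-hand side and 30 of them also fulfill the right-hand side, the strength is 75%.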
Metaqueries, as declarative expressions, can serve as an interface between human discoverers and the discovery system.
The discoverer can then focus the discovery process on
those areas of the database that he feels are more important to
the discovery task at hand.
3 Fuzzy Rules
In this section we define a little more
accurately the search space in which we will perform
the discovery. Suppose we are working with a chemical
company database, and that hidden in our database is the
following information:
If the compound has an elevated proportion of a high
price element then the compound has a high price.
It would be hard for a knowledge discovery method that is not designed to work with imprecision
and uncertainty to find this rule. This kind of statement is a very common
expression of knowledge, because sometimes it is neither
correct nor desirable to be more accurate. Such statements are also
more easily understandable to humans, and make more
sense than a statement like:
If the compound has a proportion > 0.85 of an element
of price > 95$ then the compound has a price > 300$ is
true for 95% of the cases.
We will try to develop a theoretical framework in which
the search for this kind of knowledge is easy, by fuzzifying
the idea of metaquery.
The first rule can be expressed more formally as:
If Proportion(X,Y) is high1 and Price(Y) is high2 then
Price(X) is high3
where high1, high2 and high3 are different concepts of
high (a high proportion is not the same as a high price).
Every conjunct in the previous rule is an unconditional
and unqualified fuzzy proposition. The fact that the system
is able to work with imprecision does not mean that
we must treat all the information at hand as if it were
imprecise, because some of it requires an exact treatment.
The following rule is an example of a combination of
fuzzy and crisp knowledge:
Young people who work in construction have a low salary
This can be translated for computation into:
If Works(X,Construction) and Age(X) is young then
Salary(X) is low
Our system must also find rules that, like the previous
one, mix crisp restrictions with fuzzy ones. The translation
step from natural language to a computationally analyzable
rule is easy for a non-expert to make after seeing a few examples,
and the rules in this language are easy to
read.
4 Fuzzy Metaqueries
Metaqueries have their root in generalizing the idea of
crisp rule, using variables for both predicates and objects.
In the previous section we introduced fuzzy propositions
in the rules. Now we will generalize them to get a fuzzy
metaquery.
We have seen that fuzzy rules can have both fuzzy and
crisp conjuncts. For crisp conjuncts, the generalization
is the one that comes from second order logic. A crisp
conjunct is generalized for metaquerying as:
P(X1,...,XN)
where P is a predicate variable and each Xi is an object
variable. Applying a similar idea to the fuzzy conjuncts,
we find that a fuzzy conjunct can be generalized as:
F(X1,...,XN) is C
where F is a function variable, each Xi is an object
variable and C is a concept variable.
A fuzzy metaquery has the following structure:
C1 and ... and CN ⇒ Action(Parameters)
where each Ci is either a crisp or a fuzzy conjunct, Action is
the inductive action to be performed on the data gathered
by the left-hand side and Parameters are the parameters
(different for each inductive action) that it requires.
Metaqueries have been implemented in Knowledge
Miner. To process a metaquery, the system acts as follows:
• Instantiate each metaquery predicate variable to a concrete predicate with the specified arity. This also instantiates the type of each object variable to a specific attribute or domain.
• Collect the data that satisfy the left-hand side of
the metaquery, by performing a query to a deductive
database system. This step returns a table where each
record is an instantiation of the object variables that
fulfills the premises.
• Perform the inductive action that appears in the right-hand side on the table resulting from the previous
step.
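The three steps can be sketched as a loop. The Python code below is a minimal illustration under stated assumptions, not the Knowledge Miner implementation: an in-memory dictionary stands in for the deductive database, the metaquery is fixed to P(X,Y) and Q(Y,Z) ⇒ R(X,Z), the inductive action is the strength evaluation expressed as a fraction, and every name (DATABASE, collect, strength, run_metaquery) as well as the toy data is hypothetical.

```python
from itertools import product

# Hypothetical toy database: predicate name -> set of tuples (all binary here).
DATABASE = {
    "parent": {("ann", "bob"), ("bob", "cid"), ("cid", "dan")},
    "grandparent": {("ann", "cid"), ("bob", "dan")},
}


def collect(p, q):
    """Second step: rows (x, y, z) that satisfy p(x, y) and q(y, z)."""
    return [(x, y, z)
            for (x, y) in sorted(DATABASE[p])
            for (y2, z) in sorted(DATABASE[q])
            if y == y2]


def strength(table, r):
    """Third step (one possible action): fraction of rows where r(x, z) holds."""
    if not table:
        return None  # the left-hand side matched nothing
    hits = sum((x, z) in DATABASE[r] for (x, _, z) in table)
    return hits / len(table)


def run_metaquery(arity=2):
    """Process P(X,Y) and Q(Y,Z) => R(X,Z) for every instantiation."""
    # Only predicates with the arity required by the metaquery qualify.
    candidates = [name for name, rows in DATABASE.items()
                  if all(len(row) == arity for row in rows)]
    results = {}
    # First step: bind P, Q and R to concrete predicates. The loop stops
    # when no instantiation is left.
    for p, q, r in product(candidates, repeat=3):
        table = collect(p, q)
        value = strength(table, r)
        if value is not None:
            results[(p, q, r)] = value
    return results


results = run_metaquery()
best = max(results, key=results.get)
print(best, results[best])
```

On this toy data the only instantiation with full strength binds P and Q to parent and R to grandparent. A real system would replace collect by a query to the deductive database.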
The three steps are repeated until the first step finds no more
instantiations.
We can apply the same approach to fuzzy metaqueries,
with some modifications. The first thing we must
notice is that fuzzy metaqueries can be reduced to what
we call execution form. As we try to match the query
against a deductive database, it is useful to transform
the functions that appear in the fuzzy conjuncts into predicates. An easily automated way to do this is to transform
the standard fuzzy conjunct previously described into:
PF(X1,...,XN,Y) and Y is C
Once every fuzzy conjunct is expressed in this form, we
can reorder the conjuncts, separating the crisp predicates
from the fuzzy restrictions. This brings us to the execution
form, which has the structure:
C1 and ... and CN and F1 and ... and FM ⇒ Action(Parameters)
where each Ci is a crisp conjunct and each Fi is a fuzzy
restriction. We will call the conjunction C1 and ... and CN
the crisp projection of the fuzzy metaquery and F1 and ... and FM
the fuzzy projection. To process a fuzzy metaquery, the
system acts as follows:
• Transform the fuzzy metaquery to execution form.
• Process the crisp projection, returning the table.
• Instantiate the fuzzy projection, considering the restrictions that the previous step has imposed on the
possible concepts.
• Evaluate the instantiated fuzzy projection for each tuple in the table. This step returns a fuzzy subset of the
previous table.
• Perform the inductive action on the fuzzy table calculated in the last step.
The fact that the data gathered by the left-hand side
of a fuzzy metaquery is a fuzzy relation increases the
information that the actions performed in the
right-hand side receive from it. Hence, it increases the
quality and range of the actions that can be performed with
this data.
A subset of the inductive actions that can be performed on
the fuzzy relation gathered by the left-hand side are:
• Evaluate the strength of a predicate or fuzzy proposition. We can evaluate the strength of a fuzzy rule by
calculating: [formula not recoverable from the source]
• Generate a set of class descriptions for classifying, or a
set of cluster descriptions from the data, taking into account the membership degree of each tuple when constructing the descriptions.
• Plot characteristics of the data. Some new interesting
plots can now be made, such as α-cuts, 3-D plots of the
membership degree against two different factors,
...
5 A framework for KDD centered on the
fuzzy metaquery concept
In this section we analyze a possible architecture for a
fuzzy metaquery based framework for KDD. Figure 1
shows a functional decomposition of the system and the
relationship between its parts.
The functionality of each part is the following:
• The intelligent database interface allows its users
(the fuzzy concept learner and the metaquery execution module) to perform queries independently of the
DBMS and with the power of deductive database systems.
• The fuzzy concept learner implements some of the
known techniques for this task, such as Lagrange interpolation, least-squares curve fitting or neural network construction. Further information on this topic can be
found in Section 10.7 of [2].
• The fuzzy concept editor allows the user to view concepts learnt by the fuzzy concept learner, modify them,
delete those that he feels are not significant and define
new ones that he thinks are important.
• The metaquery execution module does most of the
work. It executes the loop described previously,
instantiating and executing each metaquery. It is the
core of the system.
• The background knowledge base includes knowledge
from the domain. This knowledge can be expressed in
the form of concepts, rules or cases of a case-based system, and is a mix of knowledge introduced by the user
and knowledge discovered by the system. It allows the easy reuse
of discovered knowledge.
• The metaquery suggester heuristically suggests to the
user the metaquery that the system considers should be executed next, helping him in his discovery guidance task.
• The rule editor allows the user to review rules, deciding which of them must be included in the background
knowledge, and which of them must be rejected, or kept
as suspicious.
Figure 1. Architecture of the KDD fuzzy metaquery based framework
6 Conclusions and future work
We have introduced the concept of fuzzy metaqueries,
and have described a framework for knowledge discovery
that has fuzzy metaqueries as its core. We have shown that
fuzzy metaqueries are designed for reusable and intuitive
knowledge extraction. We are implementing a prototype of
the framework, based on the Knowledge Miner system by
W.M. Shen et al. We are confident that the inductive actions
that can be performed with a fuzzy relation can be much
richer than those that can be performed with a crisp one. There
is a lot of work to be done in the area. Also, the inclusion
of other features of fuzzy logic, such as fuzzy quantifiers,
hedges and qualified propositions, in fuzzy metaqueries has
not been studied, and their introduction can surely improve
the quality of the knowledge discovered. Strategies for
metaquery suggestion are studied in [5]. A similar study
must be carried out for fuzzy metaqueries.
7 Acknowledgements
Jesús Cerquides' research is supported by a doctoral
scholarship of the CIRIT (Generalitat de Catalunya).
References
[1] B. Kero, L. Russell, S. Tsur, and W.M. Shen. An
Overview of Database Mining Techniques. In DOOD95
Workshop on the Integration of Knowledge Discovery
with Deductive and Object Oriented Databases, 1995.
[2] G.J. Klir and B. Yuan. Fuzzy Sets and Fuzzy Logic.
Theory and Applications. Prentice Hall, 1995.
[3] B. Leng and W.M. Shen. A Metapattern-Based Automated Discovery Loop. In DOOD95 Workshop on
the Integration of Knowledge Discovery with Deductive
and Object Oriented Databases, 1995.
[4] C.J. Matheus, P.K. Chan, and G. Piatetsky-Shapiro.
Systems for Knowledge Discovery in Databases. IEEE
Transactions on Knowledge and Data Engineering,
5(6), 1993.
[5] W.M. Shen and B. Leng. A Metapattern-Based Automated Discovery Loop for Integrated Data Mining.
IEEE Transactions on Knowledge and Data Engineering, to appear, 1996.
[6] W.M. Shen and B. Leng. Metapattern Generation for
Integrated Data Mining. In The 2nd International Conference on KDD, 1996.
[7] W.M. Shen, B. Leng, and A. Chatterjee. Applying the
Metapattern Mechanism to Time Sequence Analysis.
Technical report, USC-ISI-95-117, 1995.
[8] W.M. Shen, K. Ong, B. Mitbander, and C. Zaniolo.
Metaqueries for Data Mining. In Fayyad,
Piatetsky-Shapiro, Smyth and Uthurusamy, editors, Advances
in Knowledge Discovery and Data Mining. MIT Press,
1996.
[9] L.A. Zadeh. The concept of a linguistic variable and its
application to approximate reasoning. Information Sciences, 8 and 9, 1976.