Fuzzy metaqueries for guiding the Discovery Process in KDD

Jesús Cerquides
Artificial Intelligence Research Institute, IIIA
Spanish Council for Scientific Research, CSIC
08193, Bellaterra, Barcelona, Spain
[email protected]

Ramon López de Mántaras
Artificial Intelligence Research Institute, IIIA
Spanish Council for Scientific Research, CSIC
08193, Bellaterra, Barcelona, Spain
[email protected]

FUZZ-IEEE'97. 0-7803-3796-4/97/$10.00 © 1997 IEEE

Abstract

This paper introduces the concept of fuzzy metaqueries and describes a framework for knowledge discovery that has fuzzy metaqueries as its base. Fuzzy metaqueries are second-order expressions resembling fuzzy rules, very useful for the integration of inductive learning, deductive verification, human intuition and uncertainty handling.

1 Introduction

Knowledge Discovery in Databases (KDD) is the process of extracting and refining useful knowledge from large databases. Integration between inductive learning, deductive verification and human intuition has become a key subject for this field, because none of them can, for now, do the work alone. Inductive learning focuses on data and tries to generate hypotheses from it. Deductive verification evaluates the evidential support for some previously given hypotheses. Human intuition is necessary for guiding the discovery so that it gathers the information we want, and in an acceptable time.

In most realistic settings, the information on which we have to work is imprecise, incomplete or unreliable. Fuzzy logic, as a tool for approximate reasoning, has proven very useful in working with this kind of information.

In this paper we suggest an approach to the integration of the three essential discovery processes mentioned above with fuzzy logic, so that the discovery system can be improved by taking uncertainty and imprecision into account when discovering. We first review the concept of metaquery. Then we briefly analyze the kind of rules we are looking for. From these two points we obtain the concept of fuzzy metaquery. Finally we show a framework for the use of fuzzy metaqueries as a basis for a Knowledge Discovery system.

2 Metaqueries

Metaqueries have been proposed in [1],[8] as a method for integrating induction, deduction and human guidance. A metaquery is a second-order expression that describes the type of pattern to be discovered. Suppose P, Q and R are predicate variables, and X, Y, Z variables for objects; then the metaquery

    P(X,Y) and Q(Y,Z) ⇒ R(X,Z)

tells the discovery system that we are trying to find transitivity relations such as

    p(X,Y) and q(Y,Z) ⇒ r(X,Z)

where p, q, and r are specific predicates. The ⇒ does not mean implication; it stands for plausible deduction, and may be false for a subset of the cases.

A metaquery can be decomposed into two fundamental parts: its left-hand side (everything before the ⇒ sign) and its right-hand side (everything after it). The left-hand side specifies the part of the database on which we want to focus the discovery. The right-hand side performs an action (usually inductive) with the data gathered by the left-hand side. Some of the actions that the right-hand side can perform on the data that satisfies the left-hand side are:

• Evaluate the strength of a predicate (the percentage of the data tuples fulfilling the left-hand side that also fulfill the right-hand side).
• Generate a set of class descriptions for correctly classifying the data. We have to select the variable that will act as class in the classification. The rest of the variables will appear in the class description.
• Generate a set of cluster descriptions for the data.
• Plot some characteristics of the data.
• Choose some of the variables as independent variables and others as dependent variables and return the approximated function.

Metaqueries, as declarative expressions, can serve as an interface between human discoverers and the discovery system. The discoverer can then focus the discovery process on those areas of the database that they feel are most important to the discovery task at hand.

3 Fuzzy Rules

In this section we try to define more accurately the search space in which we will perform the discovery. Suppose we are working with a chemical company database, and that hidden in our database is the following information:

    If the compound has an elevated proportion of a high price element then the compound has a high price.

It would be hard for a knowledge discovery method that is not designed to work with imprecision and uncertainty to find this rule. This kind of statement is a very common expression of knowledge, because sometimes it is neither correct nor desirable to be more accurate. Such statements are also more easily understandable to humans, and make more sense than a statement like:

    "If the compound has a proportion > 0.85 of an element of price > 95$ then the compound has a price > 300$" is true for 95% of the cases.

We will try to develop a theoretical framework where the search for this kind of knowledge is easy, by fuzzifying the idea of metaquery.

The first rule can be expressed more formally as:

    If Proportion(X,Y) is high1 and Price(Y) is high2 then Price(X) is high3

where high1, high2 and high3 are different concepts of high (a high proportion is not the same as a high price).

Every conjunct in the previous rule is an unconditional and unqualified fuzzy proposition. The fact that the system is able to work with imprecision does not mean that we must treat all the information at hand as if it were imprecise, because some of it requires an exact treatment. The following rule is an example of a combination of fuzzy and crisp knowledge:

    Young people who work in construction have a low salary.

This can be translated for computation into:

    If Works(X,Construction) and Age(X) is young then Salary(X) is low

Our system must also find rules that, like the previous one, mix crisp restrictions with fuzzy ones. The translation step from natural language to a computationally analyzable rule is easy for a non-expert to make after seeing a few examples, and the rules in this language are easy to read.

4 Fuzzy Metaqueries

Metaqueries have their root in generalizing the idea of a crisp rule, using variables for both predicates and objects. In the previous section we introduced fuzzy propositions in the rules. Now we will generalize them to get a fuzzy metaquery.

We have seen that fuzzy rules can have both fuzzy and crisp conjuncts. For crisp conjuncts, the generalization is the one that comes from second-order logic. A crisp conjunct is generalized for metaquerying as:

    P(X1,...,XN)

where P is a predicate variable and each Xi is an object variable. Applying a similar idea to the fuzzy conjuncts, we find that a fuzzy conjunct can be generalized as:

    F(X1,...,XN) is C

where F is a function variable, each Xi is an object variable and C is a concept variable.

A fuzzy metaquery has the following structure:

    C1 and ... and CN ⇒ Action(Parameters)

where each Ci is either a crisp or a fuzzy conjunct, Action is the inductive action to be performed on the data gathered by the left-hand side and Parameters are the parameters (different for each inductive action) that it requires.

Metaqueries have been implemented in Knowledge Miner. For processing a metaquery the system acts as follows:

• Instantiate each metaquery predicate variable to a concrete predicate with the specified arity.
This also instantiates the type of each object variable to a determinate attribute or domain.
• Collect the data that satisfies the left-hand side of the metaquery, by performing a query to a deductive database system. This step returns a table where each record is an instantiation of the object variables that fulfills the premises.
• Perform the inductive action that appears in the right-hand side with the table resulting from the previous step.

This is done until no more instantiations are found in the first step.

We can apply the same approach to fuzzy metaqueries, introducing some modifications. The first thing we must notice is that fuzzy metaqueries can be reduced to what we call execution form. As we try to match the query against a deductive database, it is useful to transform the functions that appear in the fuzzy conjuncts into predicates. An easily automated way to do this is to transform the standard fuzzy conjunct previously described into:

    PF(X1,...,XN,Y) and Y is C

Once every fuzzy conjunct is expressed in this form, we can reorder the conjuncts, separating the crisp predicates from the fuzzy restrictions. This brings us to the execution form, which follows the structure:

    C1 and ... and CN and F1 and ... and FM ⇒ Action(Parameters)

where each Ci is a crisp conjunct and each Fi is a fuzzy restriction. We will call the conjunction C1 and ... and CN the crisp projection of the fuzzy metaquery and F1 and ... and FM the fuzzy projection.

For processing a fuzzy metaquery the system acts as follows:

• Transform the fuzzy metaquery to execution form.
• Process the crisp projection, returning the table.
• Instantiate the fuzzy projection, considering the restrictions that the previous step has imposed on the possible concepts.
• Evaluate the instantiated fuzzy projection for each tuple in the table. This step returns a fuzzy subset of the previous table.
• Perform the inductive action over the fuzzy table calculated in the last step.

The fact that the data gathered by the left-hand side of a fuzzy metaquery is a fuzzy relation increases the information that the actions in the right-hand side receive from it. Hence, it increases the quality and range of the actions that can be performed with this data. A subset of the inductive actions that can be performed with the fuzzy relation gathered by the left-hand side are:

• Evaluate the strength of a predicate or fuzzy proposition. We can evaluate the strength of a fuzzy rule by calculating: [formula missing in this copy]
• Generate a set of class descriptions for classifying, or a set of cluster descriptions from the data, taking into account the membership degree of each tuple when constructing the descriptions.
• Plot characteristics of the data. Some new interesting plots can now be made, like α-cuts, 3-D plots of the membership degree against two different factors, ...

5 A framework for KDD centered on the fuzzy metaquery concept

In this section we analyze a possible architecture for a fuzzy metaquery based framework for KDD. In Figure 1 we can see a functional decomposition of the system and the relationship between its parts.

The functionality of each part is the following:

• The intelligent database interface allows its users (the fuzzy concept learner and the metaquery execution module) to perform queries independently of the DBMS and with the power of deductive database systems.
• The fuzzy concept learner implements some of the known techniques for this task, such as Lagrange interpolation, least-square curve fitting or neural network construction. Further information on this topic can be found in Section 10.7 of [2].
• The fuzzy concept editor allows the user to view concepts learnt by the fuzzy concept learner, modify them,
delete those that they feel are not significant and define new ones that they think are important.
• The metaquery execution module does most of the work. It executes the loop described previously, instantiating and executing each metaquery. It is the core of the system.
• The background knowledge base includes knowledge from the domain. This knowledge can be expressed in the form of concepts, rules or cases of a case-based system, and is a mix of knowledge introduced by the user and knowledge discovered by the system. It allows the easy reuse of discovered knowledge.
• The metaquery suggester heuristically suggests to the user the metaquery that the system considers should be executed, helping the user in the discovery guidance task.
• The rule editor allows the user to review rules, deciding which of them must be included in the background knowledge, and which of them must be rejected or kept as suspicious.

[Figure 1. Architecture of the KDD fuzzy metaquery based framework]

6 Conclusions and future work

We have introduced the concept of fuzzy metaqueries, and have described a framework for knowledge discovery that has fuzzy metaqueries as its core. We have shown that fuzzy metaqueries are designed for reusable and intuitive knowledge extraction. We are implementing a prototype of the framework, based on the Knowledge Miner system by W.M. Shen et al.

We are sure that the inductive actions that can be performed with a fuzzy relation can be much richer than those that can be performed with a crisp one. There is a lot of work to be done in this area. Also, the inclusion in fuzzy metaqueries of other characteristics of fuzzy logic, such as fuzzy quantifiers, hedges and qualified propositions, has not been studied, and their introduction can surely improve the quality of the knowledge discovered. Strategies for metaquery suggestion are studied in [5]. A similar study must be carried out for fuzzy metaqueries.

7 Acknowledgements

Jesús Cerquides' research is supported by a doctoral scholarship of the CIRIT (Generalitat de Catalunya).

References

[1] B. Kero, L. Russell, S. Tsur, and W.M. Shen. An Overview of Database Mining Techniques. In DOOD95 Workshop on the Integration of Knowledge Discovery with Deductive and Object Oriented Databases, 1995.
[2] G.J. Klir and B. Yuan. Fuzzy Sets and Fuzzy Logic. Theory and Applications. Prentice Hall, 1995.
[3] B. Leng and W.M. Shen. A Metapattern-Based Automated Discovery Loop. In DOOD95 Workshop on the Integration of Knowledge Discovery with Deductive and Object Oriented Databases, 1995.
[4] C.J. Matheus, P.K. Chan, and G. Piatetsky-Shapiro. Systems for Knowledge Discovery in Databases. IEEE Transactions on Knowledge and Data Engineering, 5(6), 1993.
[5] W.M. Shen and B. Leng. A Metapattern-Based Automated Discovery Loop for Integrated Data Mining. IEEE Transactions on Knowledge and Data Engineering, to appear, 1996.
[6] W.M. Shen and B. Leng. Metapattern Generation for Integrated Data Mining. In The 2nd International Conference on KDD, 1996.
[7] W.M. Shen, B. Leng, and A. Chatterjee. Applying the Metapattern Mechanism to Time Sequence Analysis. Technical report, USC-ISI-95-117, 1995.
[8] W.M. Shen, K. Ong, B. Mitbander, and C. Zaniolo. Metaqueries for Data Mining. In Fayyad, Piatetsky-Shapiro, Smyth, and Uthurusamy, editors, Advances in Knowledge Discovery and Data Mining. MIT Press, 1996.
[9] L.A. Zadeh. The concept of a linguistic variable and its application to approximate reasoning. Information Sciences, 8 and 9, 1976.
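
As an illustration of the fuzzy metaquery processing steps of Section 4, the following is a minimal Python sketch applied to the instantiated rule "If Works(X,Construction) and Age(X) is young then Salary(X) is low" from Section 3. It is not the Knowledge Miner implementation. The table, the ramp-shaped membership functions, the use of min as the conjunction, and the strength measure (sum of min(premise, conclusion) divided by the sum of premise degrees) are all assumptions made for the example, and every function name is hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Membership = Callable[[float], float]


def descending_ramp(full: float, zero: float) -> Membership:
    """Membership 1 up to `full`, falling linearly to 0 at `zero` (assumed shape)."""
    def mu(value: float) -> float:
        if value <= full:
            return 1.0
        if value >= zero:
            return 0.0
        return (zero - value) / (zero - full)
    return mu


# Hypothetical concepts; in the framework they would come from the
# fuzzy concept learner or the fuzzy concept editor.
young = descending_ramp(full=25, zero=40)
low = descending_ramp(full=1000, zero=2000)

# Toy table invented for illustration only.
people: List[Row] = [
    {"name": "a", "works": "Construction", "age": 22, "salary": 900},
    {"name": "b", "works": "Construction", "age": 31, "salary": 1500},
    {"name": "c", "works": "Construction", "age": 55, "salary": 2500},
    {"name": "d", "works": "Banking", "age": 24, "salary": 3000},
]


def crisp_projection(rows: List[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    """Keep the tuples that satisfy the crisp conjuncts."""
    return [row for row in rows if predicate(row)]


def fuzzy_projection(
    rows: List[Row], restrictions: List[Tuple[str, Membership]]
) -> List[Tuple[Row, float]]:
    """Attach to each tuple the min of its fuzzy restriction degrees."""
    return [
        (row, min(mu(row[attr]) for attr, mu in restrictions))
        for row in rows
    ]


def strength(
    fuzzy_table: List[Tuple[Row, float]], conclusion: Tuple[str, Membership]
) -> float:
    """Assumed measure: sum(min(premise, conclusion)) / sum(premise)."""
    attr, mu = conclusion
    total = sum(degree for _, degree in fuzzy_table)
    if total == 0:
        return 0.0
    joint = sum(min(degree, mu(row[attr])) for row, degree in fuzzy_table)
    return joint / total


# Crisp projection, then fuzzy projection, then the inductive action.
crisp = crisp_projection(people, lambda r: r["works"] == "Construction")
table = fuzzy_projection(crisp, [("age", young)])
rule_strength = strength(table, ("salary", low))
print(f"strength = {rule_strength:.4f}")
```

Keeping the crisp filter separate from the fuzzy evaluation mirrors the execution form: the crisp projection could be delegated to a database query, while the membership degrees are computed only for the tuples it returns.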