Ana Estela Antunes da Silva, Universidade Metodista de Piracicaba (UNIMEP), Piracicaba, São Paulo
Lidia Martins da Silva, Centro Universitário Cândido Rondon (UNIRONDON), Cuiabá, Mato Grosso

SUMMARY
This presentation discusses: Introduction; Data Mining Domain Knowledge; The Classification Task Domain; Knowledge Base of the Classification Task; Execution and Test of the Knowledge Base; Conclusion.

INTRODUCTION
Expert systems are computer programs that execute rules over a knowledge base, making it possible to solve specific problems. Data mining consists of a set of tasks that, through specific algorithms, explore a large data set and derive knowledge from it in the form of assumptions and rules.

DATA MINING DOMAIN KNOWLEDGE
The Knowledge Discovery in Databases (KDD) process comprises the following phases: data cleaning, data integration, data selection, data transformation, data mining, pattern evaluation, and knowledge presentation.

This work focuses on the data mining phase. Data mining is the application of a set of techniques that explore data in order to discover new patterns and relations in it.

The main tasks used to perform data mining are association, classification, and clustering. Association is mostly applied to problems that can be modeled as transactions. Classification assigns every tuple of a table to a class. Clustering separates data into groups without the help of a label.

THE CLASSIFICATION TASK DOMAIN
To construct a knowledge base that guides the choice of the classification task in data mining, the problem domain was studied in terms of the main characteristics that could lead to choosing or rejecting the classification task, depending on the type of problem presented by the user.
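To make the contrast between two of the tasks named under DATA MINING DOMAIN KNOWLEDGE more concrete, the following minimal Python sketch may help. It is only an illustration: the data, the field meanings, the class labels, and the choice of a nearest-neighbor classifier are all assumptions made for this example and do not come from the presentation. Classification works on tuples that carry a class label, while association works on transactions and counts which items occur together.

```python
from collections import Counter
from itertools import combinations

# Classification: each tuple has a class label (hypothetical data).
# A new tuple receives the label of its nearest labeled neighbor (1-NN),
# which is just one of many possible classification algorithms.
labeled = [
    ((25, 1200.0), "low_risk"),
    ((47, 300.0), "high_risk"),
    ((52, 250.0), "high_risk"),
    ((31, 1500.0), "low_risk"),
]

def classify(new_tuple, training):
    """Return the label of the training tuple closest to new_tuple."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = min(training, key=lambda row: squared_distance(row[0], new_tuple))
    return nearest[1]

# Association: data is modeled as transactions (hypothetical baskets).
# Counting how often item pairs co-occur is the first step toward rules.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
pair_support = Counter(
    pair for t in transactions for pair in combinations(sorted(t), 2)
)

print(classify((50, 280.0), labeled))    # high_risk
print(pair_support[("bread", "milk")])   # 2
```

The difference in input shape (labeled tuples versus transactions) is the kind of characteristic the knowledge base described next uses to decide whether classification is adequate.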
KNOWLEDGE BASE OF THE CLASSIFICATION TASK
A knowledge base is a set of representations of actions and events in the world. Each representation is called a sentence. The knowledge base is considered the main part of a knowledge-based system and holds knowledge expressed through one or more knowledge representation techniques. In this work, production rules are used to represent the knowledge base.

EXECUTION AND TEST OF THE KNOWLEDGE BASE
The knowledge base was created specifically for the classification task, with the following levels of adequacy for applying it: low, medium, high, very_high and not_possible. To obtain values for the attributes in the knowledge base, specific questions are asked of users.

Figure 1 presents the production rules that represent the knowledge base of the classification task. Each predicate represents a part of the problem domain.

R01. If various_attributes=yes Then classification=medium
R02. If various_attributes=no Then classification=low
R03. If all_numerical=yes and classification=medium Then classification=low
R04. If all_numerical=no and various_attributes=yes Then classification=medium
R05. If large_data_volume=yes and classification=medium Then classification=high
R06. If large_data_volume=no Then classification=low
R07. If categorical_attribute=yes and classification=medium Then classification=high
R08. If categorical_attribute=no and (classification=medium or classification=high) Then task_not_identified=yes
R09. If transaction=yes Then classification=low
R10. If transaction=no and classification=medium Then classification=high
R11. If single_target=yes and classification=medium Then classification=high
R12. If single_target=no and classification=low Then classification=low
R13. If identify=yes and classification=medium and single_target=yes Then classification=high
R14. If identify=no and single_target=no Then task_not_identified=yes
R15. If training_data=yes and classification=medium Then classification=high
R16. If training_data=no Then classification=low
R17. If training_data=yes and single_target=yes and identify=yes and classification=high Then classification=very_high
R18. If different_sets=yes and classification=medium Then classification=high
R19. If different_sets=no Then classification=low
R20. If different_sets=no and training_data=no and classification=high Then task_not_identified=yes
R21. If applied_results=yes and classification=medium Then classification=high
R22. If applied_results=no Then classification=low
R23. If decision_results=yes and applied_attribute=yes and classification=high Then classification=very_high
R24. If decision_results=no and classification=low Then classification=low
R25. If task_not_identified=yes Then classification=not_possible
Figure 1. Knowledge base representing the degree of adequacy of the classification task.

The answers to the questions in Figure 2 provide the values of eleven of the thirteen predicates in the knowledge base.

1. Do you wish to identify several attributes for the mining process?
2. Are the chosen attributes all numeric?
3. Is there a large volume of data in your database?
4. Is the classification attribute a category?
5. Are the chosen attributes part of a transactional database?
6. For your mining process, do you need to choose only one attribute to characterize the whole process?
7. Can you identify such an attribute among the chosen attributes, or create it?
8. Is there a data set to be used for a training process?
9. Is there a different data set to be used for a testing process?
10. Do you wish to use mining results in order to apply them to other available data?
11. Do you wish to use mining results in order to make an immediate decision about your organization?
Figure 2. Questions asked to users.

• After answering the questions, the knowledge base can be executed.
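The rules of Figure 1 can be encoded as data and run by a small forward-chaining loop. The Python sketch below is one possible reading, not the authors' implementation. The presentation does not say how competing conclusions (for example, classification=low and classification=high) are reconciled, so this sketch assumes that derived values accumulate in working memory, that the highest derived level is reported, and that not_possible overrides everything. The compact rule notation, the function names, and the dictionary of answers are all hypothetical. The rules are transcribed as written in Figure 1, including `applied_attribute` in R23, which no question sets. The answers are those of the execution example in Figure 3, with attribute names spelled as in the rules.

```python
LEVELS = ["low", "medium", "high", "very_high"]  # weakest to strongest

# Hypothetical notation: "&" means "and", "|" separates alternative values.
RULES_TEXT = """
R01: various_attributes=yes -> classification=medium
R02: various_attributes=no -> classification=low
R03: all_numerical=yes & classification=medium -> classification=low
R04: all_numerical=no & various_attributes=yes -> classification=medium
R05: large_data_volume=yes & classification=medium -> classification=high
R06: large_data_volume=no -> classification=low
R07: categorical_attribute=yes & classification=medium -> classification=high
R08: categorical_attribute=no & classification=medium|high -> task_not_identified=yes
R09: transaction=yes -> classification=low
R10: transaction=no & classification=medium -> classification=high
R11: single_target=yes & classification=medium -> classification=high
R12: single_target=no & classification=low -> classification=low
R13: identify=yes & classification=medium & single_target=yes -> classification=high
R14: identify=no & single_target=no -> task_not_identified=yes
R15: training_data=yes & classification=medium -> classification=high
R16: training_data=no -> classification=low
R17: training_data=yes & single_target=yes & identify=yes & classification=high -> classification=very_high
R18: different_sets=yes & classification=medium -> classification=high
R19: different_sets=no -> classification=low
R20: different_sets=no & training_data=no & classification=high -> task_not_identified=yes
R21: applied_results=yes & classification=medium -> classification=high
R22: applied_results=no -> classification=low
R23: decision_results=yes & applied_attribute=yes & classification=high -> classification=very_high
R24: decision_results=no & classification=low -> classification=low
R25: task_not_identified=yes -> classification=not_possible
"""

def parse_rules(text):
    """Turn the notation above into (id, conditions, conclusion) triples."""
    rules = []
    for line in text.strip().splitlines():
        rule_id, body = line.split(": ", 1)
        lhs, rhs = body.split(" -> ")
        conditions = []
        for cond in lhs.split(" & "):
            attr, values = cond.split("=")
            conditions.append((attr, set(values.split("|"))))
        attr, value = rhs.split("=")
        rules.append((rule_id, conditions, (attr, value)))
    return rules

def forward_chain(answers, rules):
    """Fire each rule at most once until no rule adds anything new."""
    memory = {attr: {value} for attr, value in answers.items()}
    fired = []
    changed = True
    while changed:
        changed = False
        for rule_id, conditions, (attr, value) in rules:
            if rule_id in fired:
                continue
            if all(memory.get(a, set()) & allowed for a, allowed in conditions):
                memory.setdefault(attr, set()).add(value)  # assumption: accumulate
                fired.append(rule_id)
                changed = True
    return memory, fired

def adequacy(memory):
    """Assumption: not_possible wins, otherwise report the highest level."""
    derived = memory.get("classification", set())
    if "not_possible" in derived:
        return "not_possible"
    ranked = [level for level in LEVELS if level in derived]
    return ranked[-1] if ranked else None

RULES = parse_rules(RULES_TEXT)
FIGURE_3_ANSWERS = {
    "various_attributes": "yes", "all_numerical": "no",
    "large_data_volume": "no", "categorical_attribute": "yes",
    "transaction": "no", "single_target": "yes", "identify": "yes",
    "training_data": "yes", "different_sets": "yes",
    "applied_results": "yes", "decision_results": "yes",
}
memory, fired = forward_chain(FIGURE_3_ANSWERS, RULES)
print(adequacy(memory))  # very_high under the assumptions stated above
```

Because the conflict-resolution strategy is assumed, the rules that fire in this sketch need not coincide with the list of successful rules reported by the authors' tool. Changing `categorical_attribute` to `"no"` in the answers makes R08 and then R25 fire, which yields `not_possible`.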
One example of execution is presented in Figure 3.

Attribute values:
1 - various_attributes: yes
2 - all_numerical: no
3 - large_data_volume: no
4 - categorical_attribute: yes
5 - transaction: no
6 - single_target: yes
7 - identify: yes
8 - training_data: yes
9 - different_sets: yes
10 - applied_results: yes
11 - decision_results: yes
Successful rules: R01, R04, R05, R07 and R17
Figure 3. Example of forward chaining reasoning.

• In the example in Figure 3, the result of executing the knowledge base is a very_high level of adequacy for applying the classification task to the problem domain described by the user's answers to the questions in Figure 2.

CONCLUSION
This work presents the modeling and testing of a knowledge base that guides users in choosing the classification task. A set of questions leads them to find out whether classification is suitable for their problem domain.

Among the contributions of this work, the following stand out:
• creation of questions that guide users in choosing the classification task when applying mining techniques to their problems;
• acquisition of knowledge about the classification task;
• knowledge modeling to assess the adequacy of the classification task in data mining for general problem domains;
• initial tests of the knowledge base.

As future work, the authors propose:
• performing more tests to validate the knowledge base;
• creating membership (pertinence) functions for the predicates low, medium, high and very high, turning them into fuzzy sets;
• expanding the knowledge domain to include the association task;
• inserting the knowledge base into the KIRA tool. KIRA is an instructional tool for the data mining process that covers the phases: problem description, data cleaning, data selection, application of mining tasks, and data analysis.