Modeling and Testing a Knowledge Base for Instructing

Ana Estela Antunes da Silva
Lidia Martins da Silva
Universidade Metodista de Piracicaba (UNIMEP), Piracicaba, São Paulo
Centro Universitário Cândido Rondon (UNIRONDON), Cuiabá, Mato Grosso
SUMMARY
• This presentation will discuss:
  ◦ Introduction;
  ◦ Data Mining Domain Knowledge;
  ◦ The Classification Task Domain;
  ◦ Knowledge Base of the Classification Task;
  ◦ Execution and Test of the Knowledge Base;
  ◦ Conclusion.
INTRODUCTION
• Expert systems are computer programs that execute rules over a knowledge base, making it possible to solve specific problems.
• Data mining consists of a set of tasks that, through specific algorithms, explore a large data set and derive from it knowledge in the form of hypotheses and rules.
DATA MINING DOMAIN KNOWLEDGE
• The Knowledge Discovery in Databases (KDD) process comprises the following phases:
  ◦ data cleaning,
  ◦ data integration,
  ◦ data selection,
  ◦ data transformation,
  ◦ data mining,
  ◦ pattern evaluation, and
  ◦ knowledge presentation.
• This work focuses on the data mining phase.
• Data mining applies a set of techniques that explore data in order to discover new patterns and relations in it.
• The main tasks used to perform data mining are association, classification and clustering.
• Association is most often applied to problems that can be modeled as transactions.
• Classification is the task that separates all tuples of a table into classes.
• Clustering separates data into groups without the help of a label.
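The contrast between classification (labels are available) and clustering (no labels) can be illustrated with a minimal Python sketch. The 1-nearest-neighbour classifier, the gap-based grouping and the toy data are illustrative assumptions; none of them is taken from this work.

```python
def classify_1nn(train, point):
    """Classification: return the label of the labeled tuple closest to `point`."""
    nearest = min(train, key=lambda row: abs(row[0] - point))
    return nearest[1]


def cluster_by_gap(values, gap):
    """Clustering: group 1-D values with no labels at all.

    A new group starts whenever the jump between consecutive
    sorted values exceeds `gap` (a hypothetical threshold).
    """
    groups = []
    for value in sorted(values):
        if groups and value - groups[-1][-1] <= gap:
            groups[-1].append(value)
        else:
            groups.append([value])
    return groups


# Hypothetical toy data: (attribute value, class label).
labeled = [(1.0, "small"), (1.5, "small"), (8.0, "large"), (9.0, "large")]

print(classify_1nn(labeled, 2.0))                           # small
print(cluster_by_gap([1.0, 1.5, 8.0, 9.0, 2.0], gap=2.0))   # [[1.0, 1.5, 2.0], [8.0, 9.0]]
```

The classifier needs the label column to answer; the clustering function never sees one and only returns unnamed groups.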
THE CLASSIFICATION TASK DOMAIN
• To construct a knowledge base that guides the choice of the classification task in data mining, the problem domain was studied in terms of the main characteristics that could lead to choosing or rejecting the classification task, depending on the type of problem presented by the user.
KNOWLEDGE BASE OF THE CLASSIFICATION TASK
• A knowledge base is a set of representations of actions and events in the world. Each representation is called a sentence.
• It is considered the main part of a knowledge-based system and holds knowledge expressed with one or more knowledge representation techniques. In this work, production rules are used to represent the knowledge base.
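As a rough illustration of what a production rule looks like as a data structure, the sketch below encodes one rule as a set of conditions plus a conclusion. The `Rule` class and its method names are hypothetical; the rule content is R01 from Figure 1.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    conditions: dict   # predicate -> required value; all pairs must hold
    conclusion: tuple  # (predicate, value) asserted when the rule fires

    def matches(self, facts):
        """True when every condition is satisfied by the current facts."""
        return all(facts.get(p) == v for p, v in self.conditions.items())

    def apply(self, facts):
        """Return a copy of the facts with the conclusion asserted."""
        predicate, value = self.conclusion
        updated = dict(facts)
        updated[predicate] = value
        return updated


# R01. If various_attributes=yes Then classification=medium
rule = Rule("R01", {"various_attributes": "yes"}, ("classification", "medium"))

facts = {"various_attributes": "yes"}
if rule.matches(facts):
    facts = rule.apply(facts)

print(facts)  # {'various_attributes': 'yes', 'classification': 'medium'}
```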
EXECUTION AND TEST OF THE KNOWLEDGE BASE
• The knowledge base was created specifically for the classification task, with the following levels of adequacy for its application: low, medium, high, very_high and not_possible.
• To obtain values for the attributes in the knowledge base, specific questions are asked of users.
Figure 1 presents the production rules that represent the knowledge base of the
classification task. Each predicate represents a part of the domain problem.
R01. If various_attributes=yes Then classification=medium
R02. If various_attributes=no Then classification=low
R03. If all_numerical=yes and classification=medium Then classification=low
R04. If all_numerical=no and various_attributes=yes Then classification=medium
R05. If large_data_volume=yes and classification=medium Then classification=high
R06. If large_data_volume=no Then classification=low
R07. If categorical_attribute=yes and classification=medium Then classification=high
R08. If categorical_attribute=no and (classification=medium or classification=high) Then task_not_identified=yes
R09. If transaction=yes Then classification=low
R10. If transaction=no and classification=medium Then classification=high
R11. If single_target=yes and classification=medium Then classification=high
R12. If single_target=no and classification=low Then classification=low
R13. If identify=yes and classification=medium and single_target=yes Then classification=high
R14. If identify=no and single_target=no Then task_not_identified=yes
R15. If training_data=yes and classification=medium Then classification=high
R16. If training_data=no Then classification=low
R17. If training_data=yes and single_target=yes and identify=yes and classification=high Then classification=very_high
R18. If different_sets=yes and classification=medium Then classification=high
R19. If different_sets=no Then classification=low
R20. If different_sets=no and training_data=no and classification=high Then task_not_identified=yes
R21. If applied_results=yes and classification=medium Then classification=high
R22. If applied_results=no Then classification=low
R23. If decision_results=yes and applied_attribute=yes and classification=high Then classification=very_high
R24. If decision_results=no and classification=low Then classification=low
R25. If task_not_identified=yes Then classification=not_possible
Figure 1. Knowledge base representing the degree of adequacy of the classification task.
• The answers to the questions in Figure 2 provide the values of eleven of the thirteen predicates in the knowledge base.
Do you wish to identify several attributes for the mining process?
Are the chosen attributes all numeric?
Is there a large volume of data in your database?
Is the classification attribute a category?
Are the chosen attributes part of a transactional database?
For your mining process, do you need to choose only one attribute to characterize the whole process?
Can you identify such an attribute among the chosen attributes, or create it?
Is there a data set to be used for a training process?
Is there a different data set to be used for a testing process?
Do you wish to use the mining results in order to apply them to other available data?
Do you wish to use the mining results in order to make an immediate decision about your organization?
Figure 2. Questions asked to users
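One way to turn the users' replies into predicate values is sketched below in Python. The pairing of each question with a predicate name is an assumption based on the order in which questions and attributes are listed in this presentation. The function name `answers_to_facts` is hypothetical, and the question texts are abbreviated.

```python
# (predicate, abbreviated question) pairs; the pairing by position is an assumption.
QUESTIONS = [
    ("various_attributes", "Identify several attributes for the mining process?"),
    ("all_numerical", "Are the chosen attributes all numeric?"),
    ("large_data_volume", "Is there a large volume of data?"),
    ("categorical_attribute", "Is the classification attribute a category?"),
    ("transaction", "Are the attributes part of a transactional database?"),
    ("single_target", "Only one attribute to characterize the whole process?"),
    ("identify", "Can you identify or create such an attribute?"),
    ("training_data", "Is there a data set for training?"),
    ("different_sets", "Is there a different data set for testing?"),
    ("applied_results", "Apply mining results to other available data?"),
    ("decision_results", "Use mining results for an immediate decision?"),
]


def answers_to_facts(replies):
    """Pair the i-th yes/no reply with the i-th predicate."""
    if len(replies) != len(QUESTIONS):
        raise ValueError(f"expected {len(QUESTIONS)} replies, got {len(replies)}")
    facts = {}
    for (predicate, _question), reply in zip(QUESTIONS, replies):
        normalized = reply.strip().lower()
        if normalized not in ("yes", "no"):
            raise ValueError(f"{predicate}: reply must be 'yes' or 'no'")
        facts[predicate] = normalized
    return facts


replies = ["Yes", "no", "no", "yes", "no", "yes", "yes", "yes", "yes", "yes", "yes"]
facts = answers_to_facts(replies)
print(facts["various_attributes"], facts["all_numerical"])  # yes no
```

The two remaining predicates, classification and task_not_identified, are not asked of the user; they are derived by the rules.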
• After the questions are answered, the knowledge base can be executed. One example of execution is presented in Figure 3.
Attribute Values:
1 - various_attributes: yes;
2 - all_numerical: no;
3 - large_data_volume: no;
4 - categorical_attribute: yes;
5 - transaction: no;
6 - single_target: yes;
7 - identify: yes;
8 - training_data: yes;
9 - different_sets: yes;
10 - applied_results: yes;
11 - decision_results: yes.
Successful Rules: R01, R04, R05, R07 and R17
Figure 3. Example of forward chaining reasoning.
• In the example in Figure 3, executing the knowledge base yields a very_high level of adequacy for applying the classification task to the problem domain described by the user's answers to the questions in Figure 2.
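Forward chaining over production rules can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the engine used in this work: only a subset of the rules in Figure 1 is encoded, rules are tried in listed order, each rule fires at most once, and a conclusion overwrites any earlier value of the same predicate. The slides do not specify the actual firing or conflict-resolution strategy.

```python
# Subset of Figure 1: (name, conditions that must all hold, conclusion).
RULES = [
    ("R01", {"various_attributes": "yes"}, ("classification", "medium")),
    ("R02", {"various_attributes": "no"}, ("classification", "low")),
    ("R04", {"all_numerical": "no", "various_attributes": "yes"},
     ("classification", "medium")),
    ("R07", {"categorical_attribute": "yes", "classification": "medium"},
     ("classification", "high")),
    ("R17", {"training_data": "yes", "single_target": "yes",
             "identify": "yes", "classification": "high"},
     ("classification", "very_high")),
    ("R25", {"task_not_identified": "yes"}, ("classification", "not_possible")),
]


def forward_chain(answers, rules=RULES):
    """Fire rules until a full pass over the rule list changes nothing."""
    facts = dict(answers)
    fired = []
    changed = True
    while changed:
        changed = False
        for name, conditions, (predicate, value) in rules:
            if name in fired:
                continue  # assumption: each rule fires at most once
            if all(facts.get(p) == v for p, v in conditions.items()):
                facts[predicate] = value  # assumption: later values overwrite
                fired.append(name)
                changed = True
    return facts, fired


# Attribute values as listed in Figure 3.
answers = {
    "various_attributes": "yes", "all_numerical": "no",
    "large_data_volume": "no", "categorical_attribute": "yes",
    "transaction": "no", "single_target": "yes", "identify": "yes",
    "training_data": "yes", "different_sets": "yes",
    "applied_results": "yes", "decision_results": "yes",
}

facts, fired = forward_chain(answers)
print(fired)                    # ['R01', 'R04', 'R07', 'R17']
print(facts["classification"])  # very_high
```

With the full rule set, or with a different firing strategy, the list of fired rules would differ, so the printed trace reflects only this sketch.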
CONCLUSION
• This work presents the modeling and testing of a knowledge base that guides users in choosing the classification task, using questions that help them find out whether classification is suitable for their problem domain.
• The main contributions of this work are:
  ◦ the creation of questions that guide users in choosing the classification task when applying mining techniques to their problems;
  ◦ the acquisition of knowledge about the classification task;
  ◦ knowledge modeling to determine how adequate the classification task is for data mining in general problem domains;
  ◦ initial tests of the knowledge base.
• As future work, the authors propose:
  ◦ performing more tests to validate the knowledge base;
  ◦ creating membership functions for the predicates low, medium, high and very high, turning them into fuzzy sets;
  ◦ expanding the knowledge domain by including the association task;
  ◦ inserting the knowledge base into the KIRA tool, an instructional tool for the data mining process that covers the phases of problem description, data cleaning, data selection, application of mining tasks and data analysis.
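The fuzzy-set idea listed among the future work could, for instance, rely on triangular membership functions. The Python sketch below is only an illustration: the 0 to 10 adequacy score, the triangular shape and every breakpoint are hypothetical values, since this work does not define any of them.

```python
def triangular(x, left, peak, right):
    """Degree of membership (0.0 to 1.0) in a triangular fuzzy set."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical (left, peak, right) breakpoints on a 0-10 adequacy score.
LEVELS = {
    "low": (-2.0, 0.0, 5.0),
    "medium": (2.0, 5.0, 8.0),
    "high": (5.0, 8.0, 10.0),
    "very_high": (8.0, 10.0, 12.0),
}


def memberships(score):
    """Membership degree of `score` in each adequacy level."""
    return {level: triangular(score, *points) for level, points in LEVELS.items()}


degrees = memberships(6.0)
print({level: round(value, 2) for level, value in degrees.items()})
# {'low': 0.0, 'medium': 0.67, 'high': 0.33, 'very_high': 0.0}
```

Under these assumed breakpoints, a score of 6 belongs partly to medium and partly to high, instead of being forced into a single crisp level as in the current rules.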