Proc. Int. Conf. on Computational Intelligence and Information Technology, CIIT
Competence and Performance-Improving Approach for Maintaining Case-Based Reasoning Systems
Abir Smiti and Zied Elouedi
LARODEC, Université de Tunis, Institut Supérieur de Gestion de Tunis,
41 Avenue de la liberté, cité Bouchoucha, 2000 Le Bardo, Tunisia,
[email protected], [email protected]
Abstract. The competence and the performance of a Case-Based Reasoning (CBR) system depend on the quality of the case base (CB) and on the speed of the retrieval process, which can become costly in time, especially when the number of cases grows large. To guarantee the system's quality, maintaining the contents of a CB becomes unavoidable. In this paper, we present a novel approach for automatically maintaining a CB in order to improve both the competence and the performance of the CBR system. Our policy is mainly based on clustering and competence computation. We support our approach with an empirical evaluation.
Keywords: Case base maintenance, Clustering, DBSCAN, Gaussian Means,
Competence, Coverage, Density
1 Introduction
Case-Based Reasoning [1, 2] is a variety of reasoning by analogy. To solve a new problem, a CBR system recalls past cases corresponding to similar situations already encountered. It then compares them with the current situation to build a new solution which, in turn, is added to the case base (CB). A CBR system is built to work for long periods of time, adding cases to the CB through the retain process. As a result, the CB can grow so fast that it negatively affects the quality of the CBR results. To keep the system effective, maintaining CBR systems (case base maintenance, CBM) becomes necessary. Recently, the CBM issue has drawn increasing attention, centered on two major criteria for evaluating a CB. The first is the CB's performance [4, 9], the answer time needed to compute a solution for target cases. The second is the CB's competence [6], the range of target problems that can be successfully solved. In order to build a high-quality CB, we need a CBM strategy that produces a CB of small size, eliminates undesirable cases such as noisy ones, and is able to increase the classification accuracy while improving the competence. This paper presents a novel approach for automatically maintaining CBs while improving both the competence and the performance of the CBR system. We name it Clustering, Competence Model using Coverage for Deletion method (CMCD). The rest of the paper is organized as follows: Section 2 reviews the competence model. Section 3 describes our new approach CMCD. Section 4 analyzes experimental results.
2 Related Works on CBM: Competence Model
The motivation of this paper is inspired by the competence model of McKenna and Smyth (M&S) [3, 5, 8, 12]. It assumes that competence depends on a number of factors, including the size and the density of cases. Cases are partitioned into clusters called competence groups. The density of an individual case c (Dens) is defined as the average similarity between c and the other cases of its competence group G (Equation 1). The density of a group of cases as a whole is then the average local density over all cases in the group (Equation 2). The coverage of each competence group (Cov) is measured from the group size and density (Equation 3). In the final step, the overall competence of the case base is simply the sum of the coverages of its groups.

$$Dens(c, G) = \frac{1}{|G| - 1} \sum_{c' \in G \setminus \{c\}} Sim(c, c')$$   (1)

$$Dens(G) = \frac{1}{|G|} \sum_{c \in G} Dens(c, G)$$   (2)

$$Cov(G) = 1 + |G| \cdot (1 - Dens(G))$$   (3)

where |G| is the number of cases in the group G.
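To make the model concrete, the following Python sketch computes these three quantities for a single competence group. The helper names and the similarity function are ours, chosen only for illustration; any similarity measure with values in [0, 1] could be plugged in:

```python
import numpy as np

def case_density(i, group, sim):
    # Eq. 1: average similarity between group[i] and the other cases of its group.
    return sum(sim(group[i], c) for j, c in enumerate(group) if j != i) / (len(group) - 1)

def group_density(group, sim):
    # Eq. 2: average of the individual case densities over the whole group.
    return sum(case_density(i, group, sim) for i in range(len(group))) / len(group)

def group_coverage(group, sim):
    # Eq. 3: coverage grows with group size and shrinks as the group gets denser.
    return 1 + len(group) * (1 - group_density(group, sim))

# Illustrative similarity on numeric vectors, mapped into (0, 1].
sim = lambda x, y: 1.0 / (1.0 + np.linalg.norm(np.asarray(x) - np.asarray(y)))
group = [[0.0, 0.0], [0.1, 0.0], [2.0, 2.0]]
print(group_coverage(group, sim))  # the overall CB competence sums this over all groups
```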
3 Clustering, Competence Model using Coverage for Deletion Method (CMCD)
In order to preserve the quality of the CB, we should keep the cases whose deletion would directly reduce the competence of the system. Suppose that we have a group G of similar cases {x_1, x_2, ..., x_{n_G}}. Each case in this group has a different coverage value Cov_i, and one case in this set has the maximum coverage value Cov_max. An interesting question is: "what is the best total value the group can have?" The answer is reached when every case of G has a coverage value equal to Cov_max. Hence, the best coverage value of G is defined as follows:

$$Cov_{Best}(G) = |G| \cdot Cov_{max}$$   (4)

Hence, rearranging Equation 3 for the density and substituting this best coverage, we can define the best value of the group density as:

$$Dens_{Best}(G) = 1 - \frac{Cov_{Best}(G) - 1}{|G|}$$   (5)
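Spelling out the step from Equation 3 to Equation 5, with Equation 3 as reconstructed above: solving for the density gives

$$Cov(G) = 1 + |G|\,(1 - Dens(G)) \;\Longrightarrow\; Dens(G) = 1 - \frac{Cov(G) - 1}{|G|},$$

and replacing $Cov(G)$ by $Cov_{Best}(G)$ yields Equation 5.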
Hence, in order to obtain the "best" subset of cases from the group G, its density Dens(G) should be equal to the Dens_Best(G) value. As a consequence, we can keep only the cases whose summed densities satisfy this value:

$$\sum_{x_i \in S} Dens(x_i, G) \simeq Dens_{Best}(G), \quad S \subseteq G$$   (6)

To apply the idea described above, we first need to create multiple groups from the CB, located in different regions of the problem space. Each group contains cases that are closely related to each other; in that way, we can define the coverage of each group. This can be done by a clustering technique. For each cluster, the cases whose summed densities equal the best density of the group, and which therefore have the best coverage, are kept, and the remaining cases are removed. Therefore, we obtain a new, smaller CB with high competence. The basic process of our proposed CMCD maintenance method consists of the following steps:
1. Clustering cases: the CB is decomposed into groups of closely related cases.
2. For each cluster: we calculate the coverage of each case, select the cases which have the best coverage according to their density, and delete the other cases.
Step 1: Clustering cases using DBSCAN-GM. Among the many clustering approaches, we should ideally use a method that discovers structure in such data sets and satisfies the following main properties: it is automatic, and it is capable of handling noisy cases so that they can be deleted. To meet these conditions, we use a new clustering method called DBSCAN-GM [17]. It combines Gaussian-Means [19, 20] with the density-based clustering method DBSCAN [18], and benefits from the advantages of both algorithms to cover the conditions cited above. In the first stage, DBSCAN-GM runs Gaussian-Means to automatically generate a set of clusters with their centers, in order to estimate the parameters of DBSCAN. In this manner, the parameter identification problem of DBSCAN is solved. In the second stage, it runs DBSCAN with the determined parameters to handle noise and to discover clusters of arbitrary size and shape. In this fashion, the noisy-data shortcoming of Gaussian-Means is overcome.
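The two-stage idea can be sketched in Python with scikit-learn as follows. Since the exact parameter-estimation rule of DBSCAN-GM [17] is not reproduced here, the sketch substitutes plain k-means for Gaussian-Means (which, unlike k-means, chooses k itself) and uses an assumed heuristic for DBSCAN's eps:

```python
import numpy as np
from sklearn.cluster import KMeans, DBSCAN

def dbscan_gm_sketch(X, k=2, min_pts=4):
    """Two-stage clustering in the spirit of DBSCAN-GM.

    Stage 1 uses k-means as a stand-in for Gaussian-Means; stage 2 runs
    DBSCAN with parameters estimated from the stage-1 clusters. The
    estimation rule below (median distance of points to their nearest
    center) is our assumption, not the rule from [17].
    """
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # Distance of each point to its assigned k-means center.
    d = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)
    eps = np.median(d) + 1e-12            # assumed heuristic for DBSCAN's eps
    labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(X)
    return labels                          # label -1 marks noisy cases to delete

X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
print(set(dbscan_gm_sketch(X)))
```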
Step 2: Competence model and coverage computation. Once the original case memory has been partitioned by DBSCAN-GM, our CMCD deletes the noisy cases. Based on the partitioned CB and the deletion of noisy cases, smaller case bases of high quality are built from the clustering result. Each cluster is considered a small independent CB, and for each cluster the competence model is applied to remove the cases with low coverage. In order to find the "best" subset of cases, the one which realizes the highest competence value, we compute the coverage of each case in a cluster using the similarity measure. A case is considered significant in the CB if it solves many similar cases, i.e., if its similarity value is greater than a threshold τ.
The coverage of a case can then be written as the number of cases of its group that it covers in this sense:

$$Cov(c) = |\{\, c' \in G : Sim(c, c') \geq \tau \,\}|$$   (7)
For each cluster G_i, we compute the maximum coverage of the group (Equation 4) and the maximum density (Equation 5). We sort the cases in descending order of their individual density values, select only the cases whose summed densities come nearest to the maximum density of the group, and remove the other cases. In this way, we retain only the cases with the best coverage values.
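Our reading of this sort-and-select procedure is sketched below; the greedy accumulation and the helper names are ours, not code from the paper:

```python
def case_density(i, group, sim):
    # Eq. 1: average similarity of group[i] to the rest of its group.
    return sum(sim(group[i], c) for j, c in enumerate(group) if j != i) / (len(group) - 1)

def select_best_subset(group, sim, cov_max):
    """Greedy reading of Eqs. 4-6: keep the densest cases until their
    summed densities reach Dens_Best(G). At least one case is kept."""
    n = len(group)
    cov_best = n * cov_max                  # Eq. 4
    dens_best = 1 - (cov_best - 1) / n      # Eq. 5
    ranked = sorted(range(n), key=lambda i: case_density(i, group, sim), reverse=True)
    kept, total = [], 0.0
    for i in ranked:
        kept.append(i)
        total += case_density(i, group, sim)
        if total >= dens_best:              # Eq. 6: summed densities reach the target
            break
    return [group[i] for i in kept]
```

Applied to every cluster returned by DBSCAN-GM, the kept subsets concatenated together form the maintained CB.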
4 Results and Analysis
In order to evaluate the performance of CMCD, we test it on ten diverse data sets of different sizes. We use public data sets obtained from the U.C.I. Machine Learning Repository [7], among them Iris, Ionosphere, Breast-W, Blood-T, Indian and Vehicle. We consider the following principal criteria: "Size (S%)", the average storage percentage; "PCC (PCC%)", the mean percentage of correct classification over stratified tenfold cross-validation runs with 1-Nearest-Neighbor (1NN); and "Time", the retrieval time in seconds required by 1NN.
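For reference, the PCC criterion can be reproduced with scikit-learn as below; the maintain hook is a hypothetical stand-in for CMCD or any reduction scheme, and Iris is used only as an example:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

def pcc_1nn(X, y, maintain=lambda X, y: (X, y)):
    """Mean % of correct 1NN classification over stratified tenfold CV.
    `maintain` reduces the training case base (identity by default)."""
    scores = []
    for tr, te in StratifiedKFold(n_splits=10, shuffle=True, random_state=0).split(X, y):
        Xm, ym = maintain(X[tr], y[tr])     # apply the CBM method to the training fold
        clf = KNeighborsClassifier(n_neighbors=1).fit(Xm, ym)
        scores.append(clf.score(X[te], y[te]))
    return 100 * np.mean(scores)

X, y = load_iris(return_X_y=True)
print(f"PCC = {pcc_1nn(X, y):.2f}%")
```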
The merits of our CMCD method can be appreciated when it is compared with other well-known reduction techniques. Thus, we run the WCOID [11], COID [10], CNN [14], RNN [16], ENN [15] and IBL schemes [13] on the same data sets. From Tables 1, 2 and 3 we observe that the experimental results obtained with our CMCD method are remarkably better than those provided by the other policies:
Table 1. Comparing storage size (S%) of CMCD with well-known reduction schemes

Dataset    CBR   CMCD   WCOID  COID   CNN    RNN    ENN    IB2    IB3
IR-150     100   52.00  57.33  27.63  93.33  95.33  24.00  24.00  47.30
IO-351     100   29.34  13.39  15.30  83.91  86.89  25.07  25.00  13.96
BW-698     100   25.50   9.46  16.30  16.87  81.69  35.48  30.46   6.59
BT-748     100   12.03  20.45  37.30  38.72  32.10  26.09  26.00  19.00
IN-768     100    3.12  15.34  43.66  42.08  24.39  22.12  21.00  16.00
V-846      100   17.77  20.80  43.62  43.63  76.48  46.57  50.73  51.89
MM-961     100   17.48  51.00  64.21  54.26  82.52  53.48  53.93  53.18
C-1023     100   11.57  32.45  62.30  72.89  58.46  84.85  87.85  39.29
Y-1483     100   51.30   8.07  12.34  14.02  69.59  69.79  69.98   8.22
AB-4176    100   25.86  30.67  38.50  58.55  87.50  51.92  51.92  35.58
Table 2. Comparing classification accuracy (PCC%) of CMCD with well-known reduction schemes

Dataset    CBR    CMCD   WCOID  COID   CNN    RNN    ENN    IB2    IB3
IR-150     97.33  98.98  79.56  89.31  73.00  94.23  91.60  91.67  91.67
IO-351     93.16  96.56  79.12  97.13  70.83  70.83  98.36  95.45  94.89
BW-698     97.99  96.94  97.97  96.22  68.18  67.05  94.66  69.69  70.56
BT-748     78.81  98.57  94.71  96.54  67.94  66.65  71.63  74.69  74.21
IN-768     83.92  97.77  92.06  97.56  67.40  69.19  99.14  87.39  88.26
V-846      82.03  98.90  99.99  88.10  57.58  57.45  82.45  74.44  73.73
MM-961     84.92  98.15  95.45  86.78  70.82  78.60  77.04  66.28  66.42
C-1023     61.24  95.45  87.24  98.61  82.92  89.43  70.40  59.16  59.50
Y-1483     86.16  98.01  89.98  96.44  83.56  83.92  88.08  73.82  73.38
AB-4176    97.93  99.97  89.12  96.19  68.75  62.50  96.70  91.20  91.67
Table 3. Comparing retrieval time (seconds) of CMCD with well-known reduction schemes

Dataset    CBR     CMCD    WCOID   COID    CNN    RNN     ENN    IB2     IB3
IR-150     0.0163  0.0140  0.0244  0.0252  0.011  0.0101  0.013  0.0024  0.0026
IO-351     0.0724  0.0352  0.5500  0.5800  0.006  0.8100  0.616  0.0122  0.1020
BW-698     0.1043  0.0341  0.1160  0.1340  0.430  0.3500  0.734  0.2440  0.2270
BT-748     0.0902  0.0078  0.0212  0.0760  0.098  0.1830  0.194  0.2035  0.1970
IN-768     0.1581  0.0044  0.0102  0.0263  0.050  0.0670  0.026  0.0102  0.0092
V-846      0.2521  0.0044  0.0755  0.0771  0.064  0.0604  0.134  0.0595  0.0581
MM-961     0.3157  0.0607  0.0664  0.0532  0.208  0.1990  0.815  0.3390  0.0327
C-1023     0.5324  0.0051  0.0671  0.0536  1.802  1.8420  0.931  0.2079  0.1746
Y-1483     0.3682  0.3449  0.5440  0.7710  0.640  0.6040  0.134  0.5950  0.5810
AB-4176    0.4023  0.0706  0.1890  0.1570  0.111  0.1010  0.137  0.2400  0.2600
5 Conclusion and future work
In this paper, we have proposed a case base maintenance method named CMCD (Clustering, Competence Model using Coverage for Deletion method), which maintains case bases while improving both the performance and the competence of the CBR system. In future work, our deletion policy can be improved by introducing weighting methods in order to check the reliability of our reduction policy.
References
1. Aamodt, A., Plaza, E.: Case-Based Reasoning: Foundational Issues, Methodological Variations, and System Approaches. In: Artificial Intelligence Communications, vol. 7, pp. 39-52 (1994)
2. Leake, D.B., Wilson, D.C.: Maintaining Case-Based Reasoners: Dimensions and Directions. In: Computational Intelligence, vol. 17, pp. 196-213 (2001)
3. Smyth, B., Keane, M.T.: Remembering To Forget: A Competence-Preserving Case Deletion Policy for Case-Based Reasoning. In: The 14th International Joint Conference on Artificial Intelligence, pp. 377-382 (1995)
4. Smyth, B., McKenna, E.: Building Compact Competent Case-Bases. In: Case-Based Reasoning Research and Development: Proceedings of the Third International Conference on Case-Based Reasoning, pp. 329-342 (1999)
5. Smyth, B., McKenna, E.: Competence guided incremental footprint-based retrieval. In: Journal of Knowledge-Based Systems, vol. 14, pp. 155-161 (2002)
6. Surma, J., Tyburcy, J.: A Study on Competence-Preserving Case Replacing Strategies in Case-Based Reasoning. In: Advances in Case-Based Reasoning, Proceedings of the 4th European Workshop on Case-Based Reasoning, EWCBR'98, Dublin, Ireland. Springer-Verlag, vol. 1488, pp. 233-238 (1998)
7. Asuncion, A., Newman, D.J.: UCI Machine Learning Repository. http://www.ics.uci.edu/mlearn. University of California, Irvine, School of Information and Computer Sciences (2007)
8. Smiti, A., Elouedi, Z.: Overview of Maintenance for Case based Reasoning Systems. In: International Journal of Computer Applications, Foundation of Computer Science, New York, USA, vol. 32, pp. 49-56 (2011)
9. Yang, Q., Wu, J.: Keep it simple: A case-base maintenance policy based on clustering and information theory. In: Canadian Conference on AI, pp. 102-114 (2000)
10. Smiti, A., Elouedi, Z.: COID: Maintaining case method based on Clustering, Outliers and Internal Detection. In: Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing, SNPD'10, Springer Berlin / Heidelberg, vol. 295, pp. 39-52 (2010)
11. Smiti, A., Elouedi, Z.: WCOID: Maintaining case-based reasoning systems using Weighting, Clustering, Outliers and Internal cases Detection. In: The Eleventh International Conference on Intelligent Systems Design and Applications, pp. 356-361 (2011)
12. McKenna, E., Smyth, B.: A Competence Model for Case-Based Reasoning. In: 9th Irish Conference on Artificial Intelligence and Cognitive Science, pp. 208-220 (1998)
13. Aha, D.W., Kibler, D., Albert, M.K.: Instance-based learning algorithms. In: Machine Learning, Springer Netherlands, vol. 6, pp. 37-66 (1991)
14. Chou, C.H., Kuo, B.H., Chang, F.: The Generalized Condensed Nearest Neighbor Rule as A Data Reduction Method. In: International Conference on Pattern Recognition, IEEE Computer Society, vol. 2, pp. 556-559 (2006)
15. Wilson, D.L.: Asymptotic Properties of Nearest Neighbor Rules Using Edited Data. In: IEEE Transactions on Systems, Man and Cybernetics, vol. 2, pp. 408-421 (1972)
16. Li, J., Manry, M.T., Yu, C., Wilson, D.R.: Prototype Classifier Design with Pruning. In: International Journal on Artificial Intelligence Tools, pp. 261-280 (2005)
17. Smiti, A., Elouedi, Z.: DBSCAN-GM: An improved clustering method based on Gaussian Means and DBSCAN techniques. In: International Conference on Intelligent Engineering Systems (INES), IEEE Computer Society, pp. 573-578 (2012)
18. Ester, M., Kriegel, H.P., Sander, J., Xu, X.: A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In: Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, pp. 226-231 (1996)
19. Hamerly, G., Elkan, C.: Learning the k in k-means. In: Advances in Neural Information Processing Systems, MIT Press, vol. 17 (2003)
20. MacQueen, J.B.: Some Methods for Classification and Analysis of Multivariate Observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, vol. 1, pp. 281-297 (1967)