International Journal of Fuzzy Systems, Vol. 14, No. 3, September 2012
Improving Classifications of Medical Data Based on Fuzzy ART2 Decision
Trees
Yo-Ping Huang, Shin-Liang Lai, Frode Eika Sandnes, and Shen-Ing Liu
Corresponding Author: Yo-Ping Huang is with the Department of Electrical Engineering, National Taipei University of Technology, Taipei, Taiwan 10608. E-mail: [email protected]
Shin-Liang Lai is with the Department of Computer Science and Engineering, Tatung University, Taipei, Taiwan 10451. E-mail: [email protected]
Frode Eika Sandnes is with the Faculty of Technology, Art and Design, Oslo and Akershus University College of Applied Sciences, 0130 Oslo, Norway. E-mail: [email protected]
Shen-Ing Liu is with the Department of Psychiatry, Mackay Memorial Hospital, Taipei, Taiwan 10449. E-mail: [email protected]
Manuscript received 26 March, 2012; revised 9 June, 2012; accepted 13 June, 2012.
Abstract
Analyzing existing medical databases provides valuable references for classifying the symptoms of other patients. This study presents a strategy for discovering fuzzy decision trees from medical databases, in particular the Haberman's Survival database and the Blood Transfusion Service Center database. The Haberman's Survival database helps doctors treat and diagnose a group of patients who show similar past medical symptoms, and the Blood Transfusion Service Center database advises individuals about when to donate blood. The proposed data mining procedure involves neural network based clustering using Adaptive Resonance Theory 2 (ART2) and the extraction of fuzzy decision trees for each homogeneous cluster of data records using fuzzy set theory. A further objective of this paper is to examine the effect of the number of membership functions on building decision trees. Experiments confirm that the number of erroneously clustered patterns is significantly reduced compared to methods that do not preprocess the data with ART2.
Keywords: Data classification, fuzzy ART2 algorithm,
fuzzy decision tree, medical data.
1. Introduction
Data mining, or knowledge discovery from data
(KDD), is the process of extracting desirable knowledge,
or interesting patterns, from existing databases for specific purposes. Many types of knowledge and technology
have been proposed, and the extraction of decision trees
from transaction data is one of the most commonly studied forms of data mining.
Decision tree based classification is a supervised learning method that constructs a decision tree from a set of samples [1, 2]. A decision tree is a tree structure where each leaf node is assigned a class label. The root node in the tree contains all training samples that are to be divided into classes. All nodes except the leaves are called decision nodes, since they specify a decision to be made at that node based on a single feature. Each decision node has a number of child nodes equal to the number of values that the given feature assumes. All decision tree algorithms are based on Hunt's concept learning algorithm [3, 4]. The concept learning algorithm mimics how humans learn simple concepts, that is, how the key features that distinguish two categories, represented by positive and negative (training) examples, are found. Hunt's algorithm is based on a divide-and-conquer strategy in which the task is to divide the set S, consisting of n samples belonging to c classes, into disjoint subsets such that each subset contains samples from one class only. Decision trees have many advantages, such as rapid training (tree construction) and high classification accuracy.
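To make the divide-and-conquer idea concrete, the following minimal Python sketch (our own illustration, not the exact algorithm of [3, 4]) recursively partitions a set of labeled samples on one feature at a time until each subset contains samples of a single class; the feature choice here is naive, whereas ID3-style algorithms select it by information gain.

```python
from collections import Counter

def build_tree(samples, features):
    """Hunt-style recursive partitioning.

    samples:  list of (feature_dict, class_label) pairs
    features: list of feature names still available for splitting
    """
    labels = [label for _, label in samples]
    # Stop when the subset is pure or no features remain; use the majority class.
    if len(set(labels)) == 1 or not features:
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    feature = features[0]  # naive choice; ID3/C4.5 would pick by information gain
    children = {}
    for value in {s[feature] for s, _ in samples}:
        subset = [(s, c) for s, c in samples if s[feature] == value]
        children[value] = build_tree(subset, features[1:])
    return {"split_on": feature, "children": children}

data = [({"glands": "swollen", "fever": "yes"}, "strep throat"),
        ({"glands": "normal",  "fever": "yes"}, "cold"),
        ({"glands": "normal",  "fever": "no"},  "allergy")]
print(build_tree(data, ["glands", "fever"]))
```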
Artificial neural networks (ANNs) are commonly used for solving pattern classification problems and for finding decision trees [5]. In the medical domain, neural networks and other strategies have been used for diagnostic decision support, such as cancer diagnosis [6] and fall detection [31]. Other fault detection models based on the abductive network model, and models combining fuzzy logic and neural networks, have been proposed [7-11], as well as a hybrid model combining Adaptive Resonance Theory (ART) and fuzzy c-means clustering for medical classification and diagnosis with missing features [12].
This study proposes a method using fuzzy ART2 to
improve data classification accuracy when discovering
fuzzy decision trees. The input patterns are first fuzzified.
These fuzzified patterns are clustered into groups where
patterns have similar properties by the ART2 neural
network. Finally, these groups are organized into a decision tree. The erroneous classification rate is greatly reduced and the efficiency of the algorithm in finding decision tree rules is increased since data patterns are clustered in advance. ART2 is computationally effective and
allows the user to easily control the degree of pattern
similarity within each cluster [30].
The proposed strategy has been applied to, but is not limited to, two different medical databases. The first database includes information about patients who have undergone surgery for breast cancer. Patients with similar symptoms are used to discover group decision tree rules that assist doctors in treatment and diagnosis. The second database contains data about blood donors. The
purpose is to find decision tree rules that can be used for
providing advice about when to donate blood.
2. Related Work
A. Fuzzy Neural Networks
ANN learning algorithms are either supervised or unsupervised. Supervised learning involves training instances, or examples, with labeled outputs. Unsupervised
learning involves an unknown goal with no
pre-determined categorizations. ANN technology helps
summarize, organize, and classify data. Requiring few assumptions, ANNs can also identify patterns among input data with high predictive accuracy [13].
Both ANNs and fuzzy models have been successfully applied in many areas [9-11, 14-16], and methods for combining the two have become an active area of research. A fuzzy neural network (FNN) system uses the ANN learning algorithm to produce parameters and then adapts these parameters for optimization.
B. Fuzzy ID3 Decision Trees
A decision tree is a simple structure, built through supervised learning, where nonterminal nodes represent
tests on one or more attributes and terminal nodes reflect
decision outcomes. Decision trees have been successfully applied to classification problems and several decision tree construction algorithms have been proposed
[17-19, 34-36]. Decision tree induction methods such as ID3, C4.5 and CART [20-22] generate a tree structure by recursively partitioning the attribute space until the whole decision space is completely partitioned into a set of non-overlapping subspaces. In addition, through pruning, decision trees can be further simplified without sacrificing classification accuracy [34].
Fig. 1 shows an example of a decision tree obtained by applying the C4.5 algorithm [2] to the Roiger and Geatz data [23]. The decision tree generalizes the data and can be mapped to a set of production rules by writing one rule for each path of the tree:
- If a patient has swollen glands, the diagnosis is strep throat.
- If a patient does not have swollen glands and does have a fever, the diagnosis is a cold.
- If a patient does not have swollen glands and does not have a fever, the diagnosis is an allergy.
The decision tree describes how to accurately diagnose a patient by only considering whether the patient
has swollen glands and a fever. The attributes sore throat,
congestion, and headache do not play a role in determining this diagnosis.
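The path-to-rule mapping can be spelled out with a short Python sketch; the nested-dictionary tree encoding below is our own illustrative format, not taken from the paper, but the three rules it prints are exactly the ones listed above.

```python
def tree_to_rules(node, conditions=()):
    """Recursively collect one production rule per root-to-leaf path."""
    if "leaf" in node:                      # terminal node: emit a rule
        cond = " and ".join(conditions) or "always"
        return [f"IF {cond} THEN diagnosis = {node['leaf']}"]
    rules = []
    for value, child in node["children"].items():
        rules += tree_to_rules(child, conditions + (f"{node['split_on']} = {value}",))
    return rules

tree = {"split_on": "swollen glands",
        "children": {"yes": {"leaf": "strep throat"},
                     "no": {"split_on": "fever",
                            "children": {"yes": {"leaf": "cold"},
                                         "no": {"leaf": "allergy"}}}}}
for rule in tree_to_rules(tree):
    print(rule)
```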
ID3 is the most widely cited decision tree construction
algorithm and several variations and enhancements have
been proposed [24-26]. There is also a related branch of approaches that are not based on ID3 [27-29].
Figure 1. Decision tree (based on data in [23]).
The strategy proposed herein is coined fuzzy ART2. It is based on a fuzzy dataset where each item is associated with a membership grade. Fuzzy decision trees are generated using fuzzy sets defined by the user. A fuzzy
decision tree consists of nodes for testing attributes,
edges for branching and leaves with class names and
probabilities.
3. Methodology
The procedures used for the implementation of the proposed system are shown in Fig. 2. There are five principal modules in the system: data collection, data transformation, data clustering, decision tree construction and decision tree simplification.
Data collection module: This module is used to collect the data analyzed in the study. It is effectively a database.
Data transformation module: This module preprocesses the raw data for the subsequent analysis.
Data clustering module: This module clusters the data into smaller groups according to the fuzzy ART2 model, which helps reduce the complexity of constructing the decision trees.
Decision tree construction module: Data in the same cluster are used to create decision trees that can be further transformed into semantic fuzzy rules.
Decision tree simplification module: The purpose of this module is to further prune the trees.
Figure 2. Block diagram of the proposed method.
A. Preprocessing
The ART2 input patterns are fuzzified and the clustered results are used to find the association rules among
data. Assume that each input pattern contains n attributes
(X1, X2, …, Xn) and a membership function μA of set A
maps each attribute value to a real number in the interval
[0,1], μA(x): X →[0, 1]. For convenience, the fuzzy
membership function is denoted μf, where the subscript f
indicates the corresponding fuzzy subset. The set of
fuzzy membership functions for all attributes is denoted
by S. The membership function labels for each attribute are normally determined by the distribution of the attribute values.
The ART2 input patterns are represented by new patterns in the form of membership degrees (μ11, μ12, …, μ1S1, …, μn1, μn2, …, μnSn), where n is the number of attributes in the dataset and (μi1, μi2, …, μiSi) are the membership degrees of the ith attribute in the fuzzy subset Si. To better facilitate the discovery of association rules, linguistic terms are used instead of membership functions.
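As a minimal sketch of this fuzzification step, assume triangular membership functions; the three linguistic terms and their break points below are illustrative only (the paper determines its membership functions empirically, see Fig. 4). Each crisp attribute value is mapped to its membership degrees, and the degrees of all attributes are concatenated to form the ART2 input pattern.

```python
def triangular(x, left, center, right):
    """Triangular membership function with the given support and peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= center:
        return (x - left) / (center - left)
    return (right - x) / (right - center)

# Illustrative membership functions for one attribute, e.g. patient age
age_terms = {
    "young":       (0, 30, 52),
    "middle_aged": (30, 52, 70),
    "senior":      (52, 70, 100),
}

def fuzzify(value, terms):
    """Map one crisp attribute value to its vector of membership degrees."""
    return [triangular(value, *params) for params in terms.values()]

# Degrees of every attribute are concatenated to form the ART2 input pattern.
print(fuzzify(45, age_terms))   # e.g. [0.32, 0.68, 0.0] with these illustrative terms
```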
B. Fuzzy ART2 Neural Network
The following paragraphs describe the proposed fuzzy
neural network.
1) Network Structure: After the fuzzy inputs have
been extracted, the fuzzy ART2 clusters the patterns.
The fuzzy ART2 can be described as follows:
Input layer: The input layer consists of short term
memories (STM).
Weight layer: The weight layer consists of bottom-up
bij and top-down tij weights, called long term memories
(LTM).
Output layer: The output layer is used to express the
clustering results for the given data.
2) Learning Procedures:
The fuzzy ART2 learning procedure consists of the
following four steps:
Step 1: The fuzzy vector is input into the input layer.
Step 2: The distances between the bottom-up weights and the fuzzy inputs are calculated and the shortest distance is identified.
Step 3: If the shortest distance fails a vigilance test (a
threshold defined to test the similarity between a new pattern and a cluster centroid), a
new node is created with its weights equal to
the fuzzy inputs. If a cluster wins the vigilance
test, the centroid of the cluster is adjusted to
adopt the new input.
Step 4: The process is repeated until all the data are
clustered.
If all winners fail the vigilance test a new cluster is
created and the corresponding weights are added. If the
state is “resonance”, the current fuzzy input is assigned
to this cluster by modifying the corresponding weights.
The fuzzy ART2 is performed before discovering the
decision tree rules to divide the volume of data from a
potentially large database into smaller and more manageable subsets. Moreover, since patterns in a cluster
have similar characteristics, the time taken to construct
the decision tree for the subset will be shorter than if the
process is applied to the full dataset.
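The clustering cycle described above can be pictured with the simplified Python sketch below. It is not the full ART2 architecture (there are no STM/LTM dynamics, and plain Euclidean distance with a distance threshold stands in for the vigilance test); it only illustrates the win/vigilance-test/centroid-update loop.

```python
import math

def art2_like_clustering(patterns, vigilance):
    """Cluster fuzzified patterns; 'vigilance' here is a maximum allowed distance."""
    centroids, counts, assignment = [], [], []
    for p in patterns:
        winner, dist = None, None
        if centroids:
            # Step 2: find the closest existing cluster (the "winner").
            dists = [math.dist(p, c) for c in centroids]
            winner = min(range(len(centroids)), key=lambda i: dists[i])
            dist = dists[winner]
        if winner is None or dist > vigilance:
            # Step 3a: every cluster fails the vigilance test -> create a new node.
            centroids.append(list(p))
            counts.append(1)
            assignment.append(len(centroids) - 1)
        else:
            # Step 3b: "resonance" -> move the winning centroid toward the input.
            counts[winner] += 1
            centroids[winner] = [c + (x - c) / counts[winner]
                                 for c, x in zip(centroids[winner], p)]
            assignment.append(winner)
    return centroids, assignment

cents, labels = art2_like_clustering([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]], vigilance=0.5)
print(cents, labels)   # the first two patterns resonate; the third starts a new cluster
```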
C. Finding Decision Tree Rules
After all patterns of the original data are grouped into
clusters, the data classification algorithms are used to
find the decision tree rules for each cluster.
The fuzzy decision tree [32] is an extension of the traditional decision tree and an effective method for extracting knowledge given uncertain classification problems. Fuzzy set theory is applied to the dataset. Tree
growing and pruning are combined to determine the tree
structure.
ID3 selects the test attribute based on the information
gain which is computed according to the probability of
ordinary data. The algorithm proposed herein is similar
to ID3, but its gain is computed according to the membership degrees.
Assume that we have a dataset $D$, where each datum has $l$ numerical attributes $A_1, A_2, \ldots, A_l$, one class set $C = \{C_1, C_2, \ldots, C_n\}$, and fuzzy subsets $D_{F_{ij}}$ for attribute $A_i$ (assuming attribute $A_i$ has $m$ different values). Let $D_{C_k}$ be the fuzzy subset of $D$ whose class is $C_k$ and $|D|$ the sum of the membership values in the dataset $D$. Then, the fuzzy decision tree is generated as follows:
(a) Generate the root node $T$.
(b) If a node $t$ with a fuzzy dataset $D$ satisfies any of the following conditions, then it is a leaf node and is assigned the majority class in $D$:
(1) The proportion of the dataset belonging to a class $C_k$ is greater than or equal to a threshold $\theta_r$, that is,
$$\frac{|D_{C_k}|}{|D|} \ge \theta_r. \qquad (1)$$
(2) The sum of the membership values in the fuzzy dataset is less than or equal to a threshold $\theta_n$, that is,
$$|D_{C_k}| \le \theta_n. \qquad (2)$$
(3) There are no attributes left for further classification.
(c) If the node does not satisfy the above conditions, it is not a leaf and a test node is generated as follows:
(1) For each $A_i$ ($i = 1, 2, \ldots, l$), calculate the information gain $G(A_i, D)$, defined below, and select the test attribute $A_{\max}$ that maximizes it.
(2) Divide $D$ into fuzzy subsets $D_1, D_2, \ldots, D_m$ according to $A_{\max}$ and generate new nodes $t_1, t_2, \ldots, t_m$ for the fuzzy subsets $D_1, D_2, \ldots, D_m$.
(3) Replace $D$ by $D_j$ ($j = 1, 2, \ldots, m$) and repeat recursively from (b).
The information gain $G(A_i, D)$ for attribute $A_i$ with a fuzzy dataset $D$ is defined by $G(A_i, D) = I(D) - E(A_i, D)$, where
$$I(D) = -\sum_{k=1}^{n} p_k \log_2 p_k, \qquad (3)$$
$$E(A_i, D) = \sum_{j=1}^{m} p_{ij}\, I(D_{F_{ij}}), \qquad (4)$$
$$p_k = \frac{|D_{C_k}|}{|D|}, \qquad (5)$$
$$p_{ij} = \frac{|D_{F_{ij}}|}{\sum_{h=1}^{m} |D_{F_{ih}}|}, \qquad (6)$$
and $|D_{F_{ij}}|$ is the sum of membership values in the fuzzy subset $D_{F_{ij}}$ for the $j$-th value of attribute $A_i$.
Eq. (1) indicates that if all data belong to a single class, then there is no need for a further split at the decision node. Eq. (2) is used to check whether a class $C_k$ contains only a few data. In our experiments, we set $\theta_r = 1$ and $\theta_n = 0$.
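Eqs. (3)-(6) translate almost directly into code. The Python sketch below is our own illustration of the computation (not the authors' implementation); fed with the dataset-level class sums and the per-term class sums of Table 2, it reproduces the gain of 0.45 bits obtained for the attribute "clothes" in the worked example that follows.

```python
import math

def entropy(class_sums):
    """Eq. (3): I(D) computed from the per-class sums of membership values."""
    total = sum(class_sums)
    return -sum((s / total) * math.log2(s / total) for s in class_sums if s > 0)

def fuzzy_gain(dataset_class_sums, term_class_sums):
    """G(A, D) = I(D) - E(A, D), following Eqs. (3)-(6).

    term_class_sums[j][k] is the sum of membership values of the data whose
    j-th linguistic term of the attribute is active and whose class is k.
    """
    attr_total = sum(sum(term) for term in term_class_sums)
    expected = sum(sum(term) / attr_total * entropy(term)    # Eqs. (4) and (6)
                   for term in term_class_sums)
    return entropy(dataset_class_sums) - expected

# Worked "clothes" example: dataset-level class sums (G, B) and the Table 2 rows.
clothes = [[1.7, 3.3],   # "much"
           [4.3, 0.0],   # "normal"
           [0.7, 3.7]]   # "enough"
print(round(fuzzy_gain([9, 11], clothes), 2))   # -> 0.45
```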
The following sleeping-quality measurement example illustrates one cycle of the algorithm using the data in Table 1, where MV denotes the membership value of each datum. There are two distinct classes with values G (Good) and B (Bad). A (root) node $T$ is created for the samples in $D$. To find the splitting criterion for these samples, the information gains of the four test attributes are computed from the fuzzy values and compared.
First, the expected information needed to classify a sample in $D$ is obtained from Eq. (3):
$$I(D) = -\frac{9}{20}\log_2\frac{9}{20} - \frac{11}{20}\log_2\frac{11}{20} = 0.99 \text{ bits}.$$
Next, the expected information requirement of each attribute is found. For each linguistic term of each attribute, the total fuzzy values belonging to class G or B are obtained by applying Eq. (4). Table 2 shows these totals for the attribute "clothes". Note that the annotation "much" (2-5) in the left-hand column of Table 2 means that two data belong to class G while five data belong to class B when the clothes attribute has the value "much". The expected information needed to classify a sample in $D$ if the examples are partitioned according to "clothes" is
$$E(\text{clothes}, D) = \frac{5}{13.7} I(D_{F_{11}}) + \frac{4.3}{13.7} I(D_{F_{12}}) + \frac{4.4}{13.7} I(D_{F_{13}}) = 0.54 \text{ bits}.$$
Therefore, the information gained by branching on the attribute "clothes" is
$$G(\text{clothes}, D) = I(D) - E(\text{clothes}, D) = 0.45 \text{ bits}.$$
Table 1. The example dataset (fuzzified sleeping-quality samples: each row gives an attribute value and its membership value MV for the attributes clothes (much, normal, enough), temperature (high, moderate, medium), humidity (prodigious, comfortable) and noise (no, noisy, medium), together with the quality class G or B).
Table 2. Distributions of the samples in the original "clothes".
For clothes = "much" (2-5): p1 = 1.7, n1 = 3.3, I(D_F11) = 0.92
For clothes = "normal" (6-0): p2 = 4.3, n2 = 0, I(D_F12) = 0
For clothes = "enough" (1-6): p3 = 0.7, n3 = 3.7, I(D_F13) = 0.63
Similarly, the information gained by branching on the three remaining attributes can be calculated in the same way as for the attribute "clothes". The resulting information gains for all four attributes are listed in decreasing order in Table 3. The attribute "clothes" has the highest information gain and therefore becomes the splitting attribute at the root node T of the fuzzy decision tree. The root node "clothes" has three linguistic terms, "much", "normal" and "enough", so three branches are grown from this node. Proceeding in the same way for the other attributes, the final fuzzy decision tree in Fig. 3 is obtained.
Table 3. Information gained by branching on the four attributes.
clothes: 0.45
humidity: 0.08
noise: 0.06
temperature: 0.02
Figure 3. Final decision tree using fuzzy ID3 algorithm.

4. Experimental Results and Analyses
Benchmark data were used to illustrate the effectiveness of the proposed model. The datasets are available from the UCI machine learning repository [33].
The first experiment involves the Haberman's Survival problem. The Haberman's Survival dataset consists of 306 samples, each comprising four attributes abbreviated Attr1, Attr2, Attr3 and Attr4. Table 4 lists the linguistic terms used for these four attributes. The survival status attribute value is given explicitly in the medical data rather than being self-defined; its value of one or two indicates whether or not the patient survived five or more years.
The second experiment addresses the Blood Transfusion problem. There are 748 samples in this dataset, each containing five attributes. The classifying attribute is a binary variable representing whether the donor should donate blood at a certain time. The linguistic terms for the attributes are listed in Table 5.

Table 4. Abbreviations of the patient parameters.
Age of patient: 1 Ly (Little_Young), 2 Yo (Young), 3 Ma (Middle_Aged), 4 Se (Senior), 5 Ls (Little_Senior)
Patient's year of operation: 1 Ol (Old), 2 Av (Average), 3 Re (Recent)
Number of positive auxiliary nodes detected: 1 Vs (Very_Small), 2 Sm (Small), 3 Me (Medium), 4 La (Large), 5 Vl (Very_Large)
Survival status: 1 Hi (High), 2 Lo (Low)

Table 5. Abbreviations of the donor parameters.
Recency - months since last donation: 1 Vs (Very_Small), 2 Sm (Small), 3 Me (Medium), 4 Bi (Big), 5 Vb (Very_Big)
Frequency - total number of donations: 1 Fe (Few), 2 Me (Medium), 3 Ma (Many)
Monetary - total blood donated in c.c.: 1 Tl (Too_Little), 2 Li (Little), 3 Av (Average), 4 Mu (Much), 5 Tm (Too_Much)
Time - months since first donation: 1 Vr (Very_Recent), 2 Re (Recent), 3 Mo (Moderate), 4 Lg (Long), 5 Vl (Very_Long)
Whether the donor donated blood in March 2007 (binary): 1 Hi (High), 2 Lo (Low)
A. Haberman's Survival Problem
The vigilance value in the ART2 model dominates the clustering results, which in turn affect the number of patterns assigned to each cluster. Depending on the database size and the properties of the data, it is hard to decide on a suitable number of clusters before clustering. In the first experiment, moderate numbers of clusters, such as 3 to 7, are tested for the medical data. Small clusters are checked to see whether they contain outliers. For example, we found that one pattern (83, 58, 2, 2) is numerically distant from the remaining data. Therefore, only 305 patterns are used in the simulations to avoid the overhead of processing outliers.
Experiment 1 started by fuzzifying the input patterns using 3, 3, 3 and 2 membership functions for the four attributes (case HS-1, with the vigilance value set to 3.6). The membership functions of the four attributes are plotted in Fig. 4.
Figure 4. Linguistic terms for the four attributes for case HS-1.
The overlapping range and shape of the membership functions may affect the decision rules. Although these parameters can be further optimized, this is not the focus of this study. Here, all membership functions are determined empirically.
After applying the fuzzy ART2, the data are grouped into 3 clusters containing 131, 115 and 59 patterns, respectively. The fuzzy ID3 algorithm is then applied to find the decision tree rules for each cluster. Fig. 5 shows the results of applying fuzzy ID3 to each of the three clusters in case HS-1. The decision tree rules are found by traversing the tree from the root to each leaf. Each path of branches from the root to a leaf is converted into a rule whose conditional part represents the attributes of the branches passed from the root to the leaf with the highest classification truth level. The end nodes contain the classified attributes (Lo or Hi). For example, the right leaf in Fig. 5(a) yields the result YoAvSmHi, meaning "if the age of the patient is young (Yo) and the patient's year of operation is average (Av) and the number of positive auxiliary nodes detected is small (Sm), then the patient survived 5 years or longer (Hi)".
Figure 5. The decision trees for case HS-1: (a) Cluster 1, (b) Cluster 2 and (c) Cluster 3.
The resulting fuzzy decision trees are compared with the results obtained by using only the fuzzy ID3 on the original dataset (without using fuzzy ART2) to mine all 305 patterns. The whole decision tree is shown in Fig. 6.
Figure 6. The decision tree for the original dataset in case HS-1.
To compare the whole tree with the smaller trees of the clusters, we use all patterns in each cluster to check for erroneously clustered patterns. For example, picking pattern 53 (MaAvMeHi) in cluster 2 to verify its correctness in the whole tree and in the tree of cluster 2, we found that:
- The entire tree contains the leaf MeMaAvLo (indicated with an arrow in Fig. 6). It means the patient did not survive five or more years. The classification result (Lo) contradicts the evidence given in pattern 53.
- The tree for cluster 2 contains the leaf MaAvHi (indicated with an arrow in Fig. 5(b)). The classification result (Hi) is consistent with the evidence.
Another objective of this experiment was to examine the effect of the number of membership functions on building decision trees. The numbers of membership functions for the patient age attribute and the number of positive auxiliary nodes attribute are changed to 5, as shown in Fig. 7 (case HS-2, with the vigilance value set to 3.4).
Figure 7. Linguistic terms of attribute 1 and attribute 3 for case HS-2.
In this case, the dataset is clustered into 5 clusters (more than in case HS-1) containing 78, 66, 57, 85 and 19 patterns. In both cases the full advantage of fuzzy ART2 is taken in finding decision tree rules: the dataset is split into smaller parts to reduce the computational effort. Table 6 lists the results of the first experiment, including the data ratio and the reduction percentage of erroneous classifications in the clusters for the two cases. Table 6 also shows that the full tree always contains more erroneously clustered patterns than the small trees. For example, cluster 1 has five erroneously clustered patterns in the full tree while the small tree has only one in case HS-1. The error classification rates are reduced by 47.1% and 58.1% in cases HS-1 and HS-2, respectively.
Table 6. Results of experiment 1.
Case HS-1:
  Cluster 1: 131 patterns (data ratio 43.0%); erroneous classifications: full tree 5, from cluster 1, reduction 80%
  Cluster 2: 115 patterns (37.7%); full tree 19, from cluster 16, reduction 15.8%
  Cluster 3: 59 patterns (19.3%); full tree 10, from cluster 1, reduction 90%
Case HS-2:
  Cluster 1: 78 patterns (25.6%); full tree 5, from cluster 2, reduction 60%
  Cluster 2: 66 patterns (21.6%); full tree 27, from cluster 18, reduction 33.3%
  Cluster 3: 57 patterns (18.7%); full tree 22, from cluster 7, reduction 68.2%
  Cluster 4: 85 patterns (27.9%); full tree 7, from cluster 4, reduction 42.9%
  Cluster 5: 19 patterns (6.2%); full tree 13, from cluster 0, reduction 100%
The trees in case HS-2 are more complicated than in case HS-1 because increasing the number of linguistic terms for an attribute also increases the number of branches for that attribute. The results in Table 6 show that there are more erroneously clustered patterns in the full tree than in the small trees. Cluster 5 has no erroneously clustered pattern (the best case).
The decision tree and the corresponding classification rules are further simplified [34]. The trimmed trees from Fig. 5 are shown in Fig. 8. It is easier to search for decision rules in these reduced trees. For example, Fig. 8(a) has only three leaves in the reduced tree and the rules formed are SeHi, MaHi and YoHi. Thus, if any pattern belongs to cluster 1, it will be classified to attribute Hi.
Figure 8. The simplified trees from clusters 1, 2 and 3 in case HS-1.
B. Blood Transfusion Service Center Database
The proposed method was also applied to the Blood Transfusion Service Center database. First, seven outliers were discarded and 741 patterns were used. Two cases were considered, with the vigilance values set to 3.9. The inputs of the first case, BT-1, were fuzzified using 3, 3, 3, 3 and 2 membership functions for the five attributes. The simplified trees from clusters 1, 2, 3 and 4 and the full tree in case BT-1 are shown in Fig. 9. For the second case, BT-2, the recency (months since last donation), monetary (total blood donated in c.c.) and time (months since first donation) attributes were each fuzzified using five membership functions.
Figure 9. The simplified trees from clusters 1, 2, 3, 4 and the full tree in case BT-1.
Table 7 presents the grouping and the distributions of patterns achieved with the fuzzy ART2 for the two cases. The datasets of the two cases are grouped into 4 and 5 clusters, respectively. The decision tree rules help make decisions about whether someone should donate blood during a certain period.
To demonstrate the efficiency of the fuzzy ART2 algorithm, the erroneously clustered patterns are also extracted and the results are listed in Table 7. The results show that when processing the whole tree, more erroneously clustered patterns appear than for the small trees. The error classification rates are reduced by 34.5% and 92.9% in cases BT-1 and BT-2, respectively. Finally, the resulting trees are simplified as described previously.
Although there are only 4 and 5 attributes in the Haberman's Survival database and the Blood Transfusion Service Center database, respectively, the proposed method is not restricted to such cases and can easily be applied to construct decision trees from databases with more attributes.
Table 7. Results for experiment 2.
Case BT-1:
  Cluster 1: 104 patterns (data ratio 14.0%); erroneous classifications: full tree 28, from cluster 19, reduction 32.1%
  Cluster 2: 252 patterns (34.0%); full tree 0, from cluster 0, reduction 0%
  Cluster 3: 228 patterns (30.8%); full tree 1, from cluster 0, reduction 100%
  Cluster 4: 157 patterns (21.2%); full tree 0, from cluster 0, reduction 0%
Case BT-2:
  Cluster 1: 91 patterns (12.3%); full tree 22, from cluster 0, reduction 100%
  Cluster 2: 259 patterns (35.0%); full tree 28, from cluster 0, reduction 100%
  Cluster 3: 297 patterns (40.1%); full tree 1, from cluster 0, reduction 100%
  Cluster 4: 44 patterns (5.9%); full tree 5, from cluster 4, reduction 20%
  Cluster 5: 50 patterns (6.7%); full tree 0, from cluster 0, reduction 0%
5. Conclusions and Future Work
A novel approach is proposed for finding decision tree rules by combining fuzzy ART2 and fuzzy ID3. First, a fuzzy model and an ART2 neural network are used to cluster the fuzzified dataset into groups with similar properties. Next, decision tree rules are mined using fuzzy ID3.
Two medical datasets have been explored to validate
the effectiveness of the proposed approach, namely the
Haberman’s Survival database for grouping patients who
have undergone surgery for breast cancer into groups
with similar properties, and the Blood Transfusion Service Center database for clustering blood donors into
groups. The experiments confirm that the erroneous classification rates are significantly reduced by processing simplified decision trees. The experimental results also show that the decision tree rules discovered from an individual cluster are more efficient than those based on all of the data: the number of erroneously clustered patterns in the decision tree of an individual cluster is smaller than for the entire tree. Applications of the proposed method are not limited to medical databases; wherever the accuracy and efficiency of data classification are of concern, the presented work is applicable to multi-attribute databases.
Although the ID3 algorithm is very popular in the field of machine learning, it has several drawbacks. For example, it tends to select attributes with more attribute values as the expanding attribute in the process of building the decision tree, but the selected attribute is often not the one that contributes most to the classification. Future work should thus focus on improving the ID3 algorithm and the classification accuracy.
Acknowledgments
This work was supported in part by the National Science Council, Taiwan, under Grant NSC100-2221-E-027-110, and in part by the joint project between National Taipei University of Technology and Mackay Memorial Hospital under Grants NTUT-MMH-99-03 and NTUT-MMH-100-09.
References
[1]
J. R. Quinlan, “Induction of decision trees,”
Machine Learning, vol. 1, pp. 81-106, 1986.
[2] J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1993.
[3] G.-G. Li, Y. Li, F.-C. Li, and C.-X. Jin. “Decision
tree inductive learning algorithm based on removing noise gradually,” in Proc. of Int. Conf. on Machine Learning and Cybernetics, Hong Kong, China, vol. 5, pp. 2723-2728, Aug. 2007.
[4] X.-D. Song and X.-L. Cheng “Decision tree algorithm based on sampling,” in Proc. of IFIP Int.
Conf. on Network and Parallel Computing Workshops, Dalian, China, pp. 689-694, Sep. 2007.
[5] R. Li and Z. Wang, “Mining classification rules
using rough sets and neural networks,” European
Journal of Operational Research, vol. 157, no. 2,
pp. 439-448, Sep. 2004.
[6] M. Ringnér and C. Peterson, “Microarray-based
cancer diagnosis with artificial neural networks,”
BioTechniques, vol. 34, pp. 30-34, Mar. 2003.
[7] D. S. S. Lee, B. J. Lithgow, and R. E. Morrison,
“New fault diagnosis of circuit breakers,” IEEE
Trans. on Power Delivery, vol. 18, no. 2, pp.
454-459, Apr. 2003.
[8] J. M. Benitez, J. L. Castro, and I. Requena, “Are
artificial neural networks black boxes,” IEEE Trans.
on Neural Networks, vol. 8, no. 5, pp. 1156-1164,
Sep. 1997.
[9] Y.-P. Huang, T.-W. Chang, L.-J. Kao, and F. E.
Sandnes, “Using fuzzy SOM strategy for satellite
image retrieval and information mining,” Journal
of Systemics, Cybernetics and Informatics, vol. 6,
no. 1, pp. 1-6, Nov. 2008.
[10] S. A. Mingoti and J. O. Lima, “Comparing SOM neural network with fuzzy c-means, k-means and traditional hierarchical clustering algorithms,” European Journal of Operational Research, vol. 174, no. 3, pp. 1742-1759, Nov. 2006.
[11] R. J. Kuo and Y. T. Su, “Integration of ART2 neural network and fuzzy sets theory for market segmentation,” Int. Journal of Operations Research, vol. 1, no. 1, pp. 67-68, Feb. 2004.
[12] E. Kolman and M. Margaliot, “Are artificial neural networks white boxes,” IEEE Trans. on Neural Networks, vol. 16, no. 4, pp. 844-852, July 2005.
[13] B. S. Ahn, S. S. Cho, and C. Y. Kim, “The integrated methodology of rough set theory and artificial neural network for business failure prediction,” Expert Systems with Applications, pp. 65-74, Feb. 2000.
[14] J. Zhao and Z. Zhang, “Fuzzy rough neural network and its application to feature selection,” Int. Journal of Fuzzy Systems, vol. 13, no. 4, pp. 270-275, Dec. 2011.
[15] T.-P. Hong, C.-T. Chiu, S.-L. Wang, and K.-Y. Lin, “Mining a complete set of fuzzy multiple-level rules,” Int. Journal of Fuzzy Systems, vol. 13, no. 2, pp. 111-119, June 2011.
[16] I.-J. Ding, “Fuzzy rule-based system for decision making support of hybrid SVM-GMM acoustic event detection,” Int. Journal of Fuzzy Systems, vol. 14, no. 1, pp. 118-130, March 2012.
[17] W. Du and Z. Zhan, “Building decision tree classifier on private data,” in Proc. of Workshop on Privacy, Security, and Data Mining at the IEEE Int. Conf. on Data Mining, Maebashi, Japan, Dec. 2002.
[18] A. Abdelhalim and I. Traore, “A new method for learning decision trees from rules,” in Proc. of Int. Conf. on Machine Learning and Applications, Miami Beach, FL, USA, pp. 693-698, Dec. 2009.
[19] Q.-W. Meng, H. Qiang, L. Ning, X.-R. Du, and L.-N. Su, “Crisp decision tree induction based on fuzzy decision tree algorithm,” in Proc. of 1st Int. Conf. on Information Science and Engineering, Nanjing, China, pp. 4811-4814, Dec. 2009.
[20] S. Ruggieri, “Efficient C4.5 [classification algorithm],” IEEE Trans. on Knowledge and Data Engineering, vol. 14, pp. 438-444, Mar./Apr. 2002.
[21] Y.-P. Huang and Vu Thi Thanh Hoa, “General criteria on building decision trees for data classification,” in Proc. of the 2nd Int. Conf. on Interaction Sciences: Information Technology, Culture and Human, Seoul, Korea, vol. 403, pp. 649-654, Nov. 2009.
[22] G. Vagliasindi, P. Arena, and A. Murari, “CART data analysis to attain interpretability in a fuzzy logic classifier,” in Proc. of the Int. Joint Conf. on Neural Networks, Atlanta, Georgia, USA, pp. 1925-1932, June 2009.
[23] R. J. Roiger and M. W. Geatz, Data Mining: A Tutorial-Based Primer, Addison-Wesley Publishing
Company, Inc., Boston, MA, USA, 2003.
[24] M. Huang, W. Niu, and X. Liang, “An improved
decision tree classification algorithm based on ID3
and the application in score analysis,” in Proc. of
Conf. on Control and Decision, Guilin, China, pp.
1876-1879, June 2009.
[25] S. Wu, L.-Y. Wu, Y. Long, and X.-D. Gao, “Improved classification algorithm by Minsup and
Minconf based on ID3,” in Proc. of the Int. Conf.
on Management Science and Engineering, Lille,
France, pp. 135-139, Oct. 2006.
[26] J. Chen, D.-L. Luo, and F.-X. Mu, “An improved ID3 decision tree algorithm,” in Proc. of 4th Int. Conf. on Computer Science and Education, Nanning, China, pp. 127-130, July 2009.
[27] Y.-P. Huang, L.-J. Kao, and F. E. Sandnes, “Efficient mining of salinity and temperature association
rules from ARGO data,” Expert Systems with Applications, vol. 35, no. 1-2, pp. 59-68, Aug. 2008.
[28] L. V. S. Lakshmanan, C. K. -S. Leung, and R. T.
Ng, “The segment support map: scalable mining of
frequent itemsets,” ACM SIGKDD Explorations
Newsletter, vol. 2, no. 2, pp. 21-27, Dec. 2000.
[29] J. S. Park, M.-S. Chen, and P. S. Yu, “Using a
hash-based method with transaction trimming for
mining association rules,” IEEE Trans. on Knowledge and Data Engineering, vol. 9, no. 5, pp.
813-825, Oct. 1997.
[30] J. A. Freeman and D. M. Skapura, Neural Networks: Algorithms, Applications, and Programming Techniques, Addison-Wesley Publishing
Company, Inc., Boston, MA, USA, 1992.
[31] C. N. Doukas and I. Maglogiannis, “Emergency
fall incidents detection in assisted living environments utilizing motion, sound, and visual perceptual components,” IEEE Trans. on Information
Technology in Biomedicine, vol. 15, no. 2, pp.
277-289, Mar. 2011.
[32] M. Umano, H. Okamoto, I. Hatono, H. Tamura, F. Kawachi, S. Umedzu, and J. Kinoshita, “Fuzzy decision tree by fuzzy ID3 algorithm and its application to diagnosis systems,” in Proc. of the 3rd IEEE Conf. on Fuzzy Systems, Orlando, FL, USA, pp. 2113-2118, June 1994.
[33] UCI Machine Learning Repository, http://archive.ics.uci.edu/ml/
[34] J. R. Quinlan, “Simplifying decision trees,” Int.
Journal of Man Machine Studies, vol. 27, pp.
221-234, 1987.
[35] R. Jin and G. Agrawal, “Efficient decision tree
construction on streaming data,” in Proc. of the
Ninth ACM SIGKDD Int. Conf. on Knowledge
Discovery and Data Mining, Washington, D.C,
USA, pp. 571-576, Aug. 2003.
[36] Z.-H. Huang, T. D. Gedeon, and M. Nikravesh, “Pattern trees induction: A new machine learning method,” IEEE Trans. on Fuzzy Systems, vol. 16, pp. 958-970, Aug. 2008.
Yo-Ping Huang received his Ph.D. in
electrical engineering from Texas Tech
University, Lubbock, TX, USA. He is
currently a Professor in the Department
of Electrical Engineering at National
Taipei University of Technology (NTUT),
Taiwan. He also serves as CEO of Joint
Commission of Technological and Vocational College Admission Committee in
Taiwan. He was the secretary general at NTUT, Chairman of
IEEE CIS Taipei Chapter, and Vice Chairman of IEEE SMC
Taipei Chapter. He was professor and Dean of College of
Electrical Engineering and Computer Science, Tatung University, Taipei, before joining NTUT. His research interests include medical knowledge mining, intelligent control systems,
and handheld device application systems design. Prof. Huang is a
senior member of the IEEE and a fellow of the IET.
Shin-Liang Lai received BS and MS
degrees in mathematics from the National Taipei University of Education,
Taipei, Taiwan, in 1998 and 2002, respectively. He is currently a researcher
and Ph.D. candidate at the department of
computer science and engineering,
Tatung University, Taipei, Taiwan. His
research interests include content-based
music information retrieval, data mining, and fuzzy inference
system.
Frode Eika Sandnes received a B.Sc. in
computer science from the University of
Newcastle upon Tyne, U.K., and a Ph.D.
in computer science from the University
of Reading, U.K. Sandnes is a Professor
in the Institute of Information
Technology, Faculty of Technology, Art
and Design at Oslo and Akershus
University College of Applied Sciences,
Norway. He currently serves as pro-rector of research and internationalisation at Oslo and Akershus University College of
Applied Sciences. His research interests include
error-correction, intelligent systems, human computer
interaction and universal design.
Shen-Ing Liu received her M.D. degree
from the China Medical University,
Taiwan, and Ph.D. degree in psychiatric
epidemiology from the Institute of
Psychiatry, King’s College London,
University of London, U.K. She is currently an Associate Professor in the Department of Medicine at Mackay Medical College, Taiwan, and also serves as an attending psychiatrist in Mackay Memorial Hospital, Taipei, Taiwan. Her
research interests include epidemiology, clinical psychiatry
and interventional study.