Available online at www.sciencedirect.com
Expert Systems with Applications 35 (2008) 1080–1083
www.elsevier.com/locate/eswa
Extended Naive Bayes classifier for mixed data
Chung-Chian Hsu a, Yan-Ping Huang a,b,*, Keng-Wei Chang a

a Department of Information Management, National Yunlin University of Science and Technology, 123, Section 3, University Road, Douliu, Yunlin 640, Taiwan, ROC
b Department of Information Management, Chin Min Institute of Technology, 110, Hsueh-Fu Road, Tou-Fen, Miao-Li 351, Taiwan, ROC

* Corresponding author. Address: Department of Information Management, National Yunlin University of Science and Technology, 123, Sec. 3, University Road, Douliu, Yunlin 640, Taiwan, ROC. Tel.: +886 37627153; fax: +886 97605684.
E-mail addresses: [email protected] (C.-C. Hsu), sunny@chinmin.edu.tw (Y.-P. Huang), [email protected] (K.-W. Chang).
Abstract

The Naive Bayes induction algorithm is very popular in the classification field. The traditional method for dealing with numeric data is to discretize numeric attributes into symbols, but the choice of discretization criterion has a significant effect on performance. Moreover, several recent studies have employed the normal distribution to handle numeric data, but using only one value to estimate the population easily leads to incorrect estimation. As a result, research on the classification of mixed data using Naive Bayes classifiers has not been very successful. In this paper, we propose a classification method, Extended Naive Bayes (ENB), which is capable of handling mixed data. The experimental results demonstrate the efficiency of our algorithm in comparison with other classification algorithms such as CART, DT, and MLPs.
© 2007 Elsevier Ltd. All rights reserved.

Keywords: Naive Bayes classifier; Classification; Mixed data
1. Introduction

Naive Bayes classifiers are very robust to irrelevant attributes and combine evidence from many attributes to make the final prediction. They are generally easy to understand, and their induction is extremely fast, requiring only a single pass through the data. However, the algorithm is limited to categorical or discrete data; in other words, it is inapplicable to mixed data, which include both categorical and numeric attributes.

The traditional method for dealing with numeric data is to discretize numeric attributes into symbols. However, the choice of discretization criterion has a significant effect on performance. Moreover, several recent studies have employed the normal distribution to handle numeric data, but using only one value to estimate the population easily leads to incorrect estimation. Hence, research on the classification of mixed data using Naive Bayes classifiers has not been very successful.

In this paper, we propose a classification method, ENB, which is capable of handling mixed data. For categorical data, we use the original Naive Bayes approach to calculate the probabilities of categorical values. For continuous data, we adopt statistical theory, taking into account not only the average but also the variance of the numeric values. For an unknown input pattern, the product of the probabilities and the P-values is calculated, and the class that yields the maximum product is designated as the target class to which the input pattern belongs.
2. Naive Bayesian classifier
Bayesian networks have been successfully applied to a great number of classification problems, and there has been a surge of interest in learning Bayesian networks from data. The goal is to induce a network that best captures the dependencies among the variables for the given data. A Naive Bayesian classifier assumes conditional independence among all attributes given the class variable. It learns from the training data the conditional probability of each attribute given its class label (Duda & Hart, 1973; Langley, Iba, & Thompson, 1992).
A simple Bayesian network classifier, which in practice often performs surprisingly well, is the Naive Bayesian classifier. This classifier basically learns the class-conditional probabilities $P(X_i = x_i \mid C = c_l)$ of each attribute $X_i$ given the class label $c_l$. A new test case $X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$ is then classified by using Bayes' rule to compute the posterior probability of each class $c_l$ given the vector of observed attribute values, according to the following equation:

$$P(C = c_l \mid X_1 = x_1, \ldots, X_n = x_n) = \frac{P(C = c_l)\, P(X_1 = x_1, \ldots, X_n = x_n \mid C = c_l)}{P(X_1 = x_1, \ldots, X_n = x_n)} \quad (1)$$
The simplifying assumption behind the Naive Bayesian classifier is that the attributes are conditionally independent given the class label, according to the following equation:

$$P(X_1 = x_1, \ldots, X_n = x_n \mid C = c_l) = \prod_{i=1}^{n} P(X_i = x_i \mid C = c_l) \quad (2)$$
This assumption simplifies the estimation of the class-conditional probabilities from the training data. Notice that one does not estimate the denominator in Eq. (1), since it is independent of the class; instead, one normalizes the numerator over all classes so that the posteriors sum to 1.
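As a minimal illustration of Eqs. (1) and (2) for purely categorical data, the following sketch scores each class by its prior times the product of attribute conditionals, then normalizes the numerator over all classes. The function and variable names are hypothetical, not from the paper:

```python
from collections import Counter

def train_nb(records, labels):
    """Estimate the prior P(C) and the class-conditional P(X_i = x | C)
    from purely categorical training data."""
    class_counts = Counter(labels)
    cond_counts = Counter()  # (attribute index, value, class) -> count
    for rec, c in zip(records, labels):
        for i, v in enumerate(rec):
            cond_counts[(i, v, c)] += 1
    priors = {c: n / len(labels) for c, n in class_counts.items()}
    def cond_prob(i, v, c):
        return cond_counts[(i, v, c)] / class_counts[c]
    return priors, cond_prob

def posterior(x, priors, cond_prob):
    """Eqs. (1)-(2): score each class by prior * product of the attribute
    conditionals, then normalize over all classes (the denominator of
    Eq. (1) is class-independent, so it never has to be estimated)."""
    scores = dict(priors)
    for c in scores:
        for i, v in enumerate(x):
            scores[c] *= cond_prob(i, v, c)
    total = sum(scores.values()) or 1.0
    return {c: s / total for c, s in scores.items()}
```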
Bayesian network models are widely used for discriminative prediction tasks such as classification. In recent
years, it has been recognized, both theoretically and experimentally, that in many situations it is better to use a
matching ‘discriminative’ or ‘supervised’ learning algorithm such as conditional likelihood maximization (Friedman, Geiger, & Goldszmidt, 1997; Greiner, Grove, &
Schuurmans, 1997; Kontkanen, Myllymaki, & Tirri,
2001; Ng & Jordan, 2001).
Naive Bayesian classifiers have proven successful in many domains, despite the simplicity of the model and the restrictiveness of the independence assumptions it makes. However, the Naive Bayesian algorithm handles only categorical data: it cannot reasonably express the probability relationship between two numeric values or preserve the structure of numeric values. The Extended Naive Bayesian algorithm is used in data mining as a simple and effective classification algorithm that properly handles mixed data.
3. Extended Naive Bayesian classification algorithm

The ENB algorithm handles mixed data. For a categorical attribute, the conditional probability that an instance belongs to a certain class c, given that the instance has attribute value A = a, $P(C = c \mid A = a)$, is given by the following equation:
$$P(C = c \mid A = a) = \frac{P(C = c \cap A = a)}{P(A = a)} = \frac{n_{ac}}{n_a} \quad (3)$$
where $n_{ac}$ is the number of instances in the training set that have class value c and attribute value a, while $n_a$ is the number of instances that simply have attribute value a. Due to the horizontal partitioning of the data, each party has partial information about every attribute. Each party can locally compute its local count of instances; the global count is the sum of the local counts. For a numeric attribute, the necessary parameters are the mean $\mu$ and variance $\sigma^2$ for each class. Again, the necessary information is split between the parties. To compute the mean, each party sums the attribute values of the instances having the same class value; these local sums are added together and divided by the total number of instances having that class to obtain the mean for that class value.
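A small sketch of Eq. (3) and the per-class numeric statistics under the paper's horizontally partitioned setting. The data layout and all names here are hypothetical illustrations, assuming each party holds triples of one categorical value, one numeric value, and a class label:

```python
from collections import Counter, defaultdict

def local_stats(partition):
    """One party's local statistics over its horizontal slice of the data.
    partition: list of (a_value, x_numeric, c_label) triples."""
    n_ac = Counter()           # (a, c) -> local count
    n_a = Counter()            # a -> local count
    sums = defaultdict(float)  # c -> local sum of the numeric attribute
    sqs = defaultdict(float)   # c -> local sum of squares
    n_c = Counter()            # c -> local instance count
    for a, x, c in partition:
        n_ac[(a, c)] += 1
        n_a[a] += 1
        sums[c] += x
        sqs[c] += x * x
        n_c[c] += 1
    return n_ac, n_a, sums, sqs, n_c

def global_stats(parties):
    """Aggregate across parties: global counts are sums of local counts
    (Eq. (3) then uses n_ac / n_a), and the per-class mean and variance
    come from the pooled sums and sums of squares."""
    N_ac, N_a = Counter(), Counter()
    S, Q, N = defaultdict(float), defaultdict(float), Counter()
    for n_ac, n_a, sums, sqs, n_c in map(local_stats, parties):
        N_ac.update(n_ac)
        N_a.update(n_a)
        for c in n_c:
            S[c] += sums[c]; Q[c] += sqs[c]; N[c] += n_c[c]
    mu = {c: S[c] / N[c] for c in N}
    var = {c: Q[c] / N[c] - mu[c] ** 2 for c in N}  # population variance
    p_c_given_a = {(a, c): N_ac[(a, c)] / N_a[a] for (a, c) in N_ac}
    return p_c_given_a, mu, var
```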
The ENB algorithm usually has the following steps:

Step 1. The training dataset involves $x_i$ parties, $C_k$ class values, and $w$ attribute values. For numeric attributes, calculate the mean and the variance. For categorical attributes, count the number of times each value occurs.

Step 2. Calculate the probability of an instance having each class and attribute value. For numeric attributes, the probability of attribute value $x_i$ in class $C_k$ is determined according to the following equation:
$$p(x_i \mid C_k) = p(C_k) \prod_{i=1}^{att_i} p(w_{i,t} \mid C_k) \prod_{j=i+1}^{att_j} p\!\left(z \ge \frac{\lvert X_j - X'_j\rvert - (\mu_j - \mu'_j)}{\sqrt{\dfrac{\hat{\sigma}_j^2}{n_j} + \dfrac{\hat{\sigma}'^2_j}{n'_j}}}\right) \quad (4)$$
For categorical attributes, the probability of attribute value $x_i$ taking domain value $w_{i,t}$ in class $C_k$ determines the class assignment: if $P(C_i \mid x) > P(C_j \mid x)$, then x is in class $C_i$; otherwise x is in class $C_j$. The Bayesian approach to classifying a new instance is to assign the most probable target value, $P(C_i \mid x)$, given the attribute values $\{w_1, w_2, \ldots, w_n\}$ that describe the instance, according to the following equation:
$$P(C_i \mid x) = \frac{P(C_i \cap x)}{P(x)} = \frac{P(x \mid C_i)\, P(C_i)}{P(x)} \quad (5)$$
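The numeric factor in Eq. (4) is an upper-tail probability of a standard normal z statistic built from the per-class mean and variance estimates. A minimal sketch of that one factor, with hypothetical parameter names and assuming the reconstruction of Eq. (4) above:

```python
import math

def z_tail(z):
    """P(Z >= z) for a standard normal, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def numeric_factor(x, x_ref, mu, mu_ref, var, var_ref, n, n_ref):
    """One factor of the second product in Eq. (4): compare the observed
    difference |x - x_ref| against the difference of the class means,
    scaled by the standard error pooled from both variance estimates."""
    se = math.sqrt(var / n + var_ref / n_ref)
    z = (abs(x - x_ref) - (mu - mu_ref)) / se
    return z_tail(z)
```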
Table 1
ARR scores of the classification algorithms on the five datasets. Each cell shows the mean score with its (minimum, maximum); the last column is the ARR.

Algorithm | Credit approval | German credit | Hepatitis | Horse | Breast cancer | ARR
ENB | 6.9245 (6.2083, 8.1053) | 3.0322 (2.6022, 3.8551) | 11.9333 (7.0000, 27.0000) | 5.2185 (3.9286, 6.6667) | 25.0150 (22.1000, 32.0000) | 0.7667
MLPs | 5.8885 (5.0702, 6.8636) | 2.6528 (2.2843, 2.8953) | 5.5871 (4.0909, 10.2000) | 3.9193 (2.6316, 5.2727) | 29.2250 (20.0000, 45.2000) | 0.4000
CART | 5.4120 (4.1642, 7.2381) | 2.5758 (2.2212, 2.9412) | 6.9730 (3.3077, 10.2000) | 26.4563 (3.3125, 68.0000) | 14.0205 (7.8846, 20.0000) | 0.3867
DT | 6.2979 (4.7667, 7.8718) | 2.5758 (2.2212, 2.9412) | 8.1958 (4.0909, 17.6667) | 21.1898 (5.2727, 33.5000) | 14.8546 (8.2400, 22.1000) | 0.3567
The ENB classifier makes the simplifying assumption that the attribute values are conditionally independent given the target value, according to the following equation:

$$P(x \mid C_i) = \prod_{k=1}^{n} P(x_k \mid C_i) \quad (6)$$
The categorical values are normalized before the probability of each class is calculated. The normalized estimate is determined according to the following equation:

$$P(w_{i,t} \mid C_k) = \frac{1 + \sum_{x_i \in C_k} N(w_{i,t}, x_i)\, p(C_k \mid x_i)}{\lvert V\rvert + \sum_{x_i \in C_k} \sum_{t=1}^{\lvert V\rvert} N(w_{i,t}, x_i)\, p(C_k \mid x_i)} \quad (7)$$

where $\lvert V\rvert$ is the total number of domain values of the attribute $x_i$.
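Eq. (7) has the shape of a Laplace-style smoothed estimate. A minimal sketch under the simplifying assumptions that $N(w, x)$ is an indicator of instance x taking value w and that $p(C_k \mid x)$ is 1 for the instance's own class; all names are hypothetical:

```python
from collections import Counter

def smoothed_cond_prob(values, labels, domain):
    """Laplace-style smoothing in the spirit of Eq. (7): add 1 to each
    (value, class) count and |V| to each per-class total, so unseen
    domain values still receive a small nonzero probability.
    values: the attribute value of each training instance;
    labels: the class label of each training instance;
    domain: the |V| possible values of the attribute."""
    count = Counter(zip(values, labels))  # (w, c) -> count
    per_class = Counter(labels)           # c -> count
    V = len(domain)
    return {(w, c): (1 + count[(w, c)]) / (V + per_class[c])
            for w in domain for c in per_class}
```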
Step 3. All parties calculate the probability of each class, according to the following equation:

$$p_i = \prod_{j=1}^{m} p[j] \quad (8)$$

Step 4. Select the maximum of the per-class probabilities, according to the following equation:

$$\max_{i=1}^{\text{classCount}} \; p[i] \quad (9)$$
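Putting Steps 3 and 4 together, a compact sketch of the final decision rule, assuming the per-class factor lists have already been computed (names hypothetical; the log-space sum is an implementation choice to avoid underflow, not something the paper specifies):

```python
import math

def enb_decide(class_factors):
    """Steps 3-4: multiply the per-attribute factors for each class
    (Eq. (8)) and return the class with the maximal product (Eq. (9)).
    class_factors: dict mapping class label -> list of probabilities p[j]."""
    log_scores = {
        c: sum(math.log(p) for p in factors)
        for c, factors in class_factors.items()
    }
    return max(log_scores, key=log_scores.get)

# Usage: enb_decide({"yes": [0.5, 0.9, 0.2], "no": [0.5, 0.4, 0.3]}) -> "yes"
```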
4. Experiments and results

The algorithm was developed using Java 2.0, an Access database, and SPSS Clementine 7.2. The experiments present the results of the ENB algorithm on mixed data and compare it with other classification algorithms, such as the Decision Tree (DT) (Quinlan, 1993), Classification and Regression Trees (CART) (Breiman, Friedman, Olshen, & Stone, 1984), and multilayer perceptrons (MLPs) (Simon, 1999).

The experiments use five real mixed datasets from the UCI repository (Merz & Murphy, 1996): Australian Credit Approval, German Credit Data, Hepatitis, Horse Colic, and Breast Cancer. Australian Credit Approval has 690 records with 14 attributes, including eight categorical and six numerical attributes. German Credit Data has 1000 records with 20 attributes, including thirteen categorical and seven numerical attributes. Hepatitis has 155 records with 19 attributes, including thirteen categorical and six numerical attributes. Horse Colic has 368 records with 27 attributes, including twenty categorical and seven numerical attributes. Breast Cancer has 699 records with 9 numerical attributes.
This study uses the average reciprocal rank (ARR) (Quinlan, 1986; Voorhees & Tice, 2000) as the evaluation metric. ARR is defined in Eq. (10). Assume that the system retrieves only n relevant items and that they are ranked as $r_1, r_2, \ldots, r_n$:

$$\mathrm{ARR} = \frac{1}{N} \sum_{i=1}^{n} \frac{1}{r_i} \quad (10)$$
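A worked check of Eq. (10) against Table 1, taking N as the number of datasets: by the mean scores in the table, ENB ranks first on Credit approval, German credit, and Hepatitis, second on Breast cancer, and third on Horse, which reproduces its ARR of 0.7667:

```python
def arr(ranks, N):
    """Average reciprocal rank, Eq. (10): sum of 1/r_i over the items,
    divided by the normalizing count N (here, the number of datasets)."""
    return sum(1.0 / r for r in ranks) / N

# ENB's per-dataset ranks derived from the mean scores in Table 1:
print(arr([1, 1, 1, 2, 3], N=5))  # -> 0.7666..., the 0.7667 in Table 1
```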
The ARR scores for classification, computed from the accuracy rates, are shown in Table 1, together with the mean, minimum, and maximum values for each dataset. The higher the ARR value, the better the classification result. ENB has the highest ARR score, and the results clearly show that the ENB algorithm has good classification quality.
5. Conclusions and future work
This paper demonstrates that the technology for building classification algorithms from examples is fairly robust, and it proposes an efficient ENB algorithm for classification. The algorithm achieves an ARR value of nearly 76% and also endeavors to improve classification accuracy. Further study can apply this algorithm to other time-series databases, such as financial databases.
References

Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and regression trees. Wadsworth & Brooks.
Duda, R. O., & Hart, P. E. (1973). Pattern classification and scene analysis. John Wiley & Sons.
Friedman, N., Geiger, D., & Goldszmidt, M. (1997). Bayesian network classifiers. Machine Learning, 29(2), 131–163.
Greiner, R., Grove, A., & Schuurmans, D. (1997). Learning Bayesian nets that perform well. In Proceedings of the thirteenth annual conference on uncertainty in artificial intelligence (pp. 198–207). San Francisco.
Kontkanen, P., Myllymaki, P., & Tirri, H. (2001). Classifier learning with supervised marginal likelihood. In Proceedings of the seventeenth conference on uncertainty in artificial intelligence (pp. 277–284). San Francisco.
Langley, P., Iba, W., & Thompson, K. (1992). An analysis of Bayesian classifiers. In Proceedings of the international conference on artificial intelligence.
Merz, C. J., & Murphy, P. (1996). UCI repository of machine learning databases. http://www.ics.uci.edu/~mlearn/MLRepository.html.
Ng, A., & Jordan, M. (2001). On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes. Advances in Neural Information Processing Systems, 14, 605–610.
Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1, 81–106.
Quinlan, J. R. (1993). C4.5: Programs for machine learning. San Mateo, CA: Morgan Kaufmann.
Simon, H. (1999). Neural networks: A comprehensive foundation. Prentice Hall.
Voorhees, E. M., & Tice, D. M. (2000). The TREC-8 question answering track report. In Proceedings of the eighth Text REtrieval Conference (TREC-8).