Efficient and Scalable Multi-Class
Classification using naive Bayes Tree
Yan Zhu
Agenda
 Overview
 Objective
 The Proposed Algorithm
 Experiments And Results
 Conclusion
Overview
 The goal of multi-class classification is to predict the class labels
of new instances whose attribute values are known, but class
values are unknown.
 Decision tree (DT) is one of the most popular multi-class classification
tools, commonly used in many real-world classification problems such as
weather prediction, astronomical classification, and intrusion detection.
Overview
 DT provides a rapid and useful solution for classifying instances in
large datasets with a large number of variables.
 Decision tree classification has various advantages: (a) simple to
understand, (b) easy to implement, (c) requiring little prior
knowledge, (d) able to handle both numerical and categorical
data, (e) robust, and (f) able to deal with large and noisy datasets.
Objective
 Two common issues in decision tree learning (both stages are sketched
below)
 the growth of the tree to enable it to accurately categorize the
training dataset
 the pruning stage, whereby superfluous nodes and branches are
removed in order to improve classification accuracy.
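A minimal sketch of these two stages, assuming Python with scikit-learn on a
placeholder dataset (neither tool nor dataset is specified in the slides):
grow a full tree, then prune it with cost-complexity pruning and keep the
penalty value that generalizes best.

```python
# Hypothetical illustration of the two stages (scikit-learn and the iris
# dataset are assumptions; the paper does not specify an implementation).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Stage 1: grow the tree until it accurately categorizes the training data.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Stage 2: prune superfluous nodes by increasing the cost-complexity
# penalty (ccp_alpha) and keeping the value that generalizes best.
path = full_tree.cost_complexity_pruning_path(X_train, y_train)
best_alpha, best_acc = 0.0, 0.0
for alpha in path.ccp_alphas:
    pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha)
    acc = pruned.fit(X_train, y_train).score(X_test, y_test)
    if acc >= best_acc:
        best_alpha, best_acc = alpha, acc

print(f"nodes before pruning: {full_tree.tree_.node_count}")
print(f"best ccp_alpha={best_alpha:.4f}, held-out accuracy={best_acc:.3f}")
```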
Overview
 The naive Bayes (NB) classifier is also widely used for
classification problems in data mining and machine learning fields
because of its simplicity and impressive classification accuracy.
 It has several advantages: (a) it is easy to use, (b) only one scan of
the training data is required, (c) it handles missing attribute values,
and (d) it handles continuous data.
Objective
 In this paper, we propose an adaptive naive Bayes tree (NBTree)
algorithm for scaling up the classification accuracy for multi-class
classification tasks.
 NBTree is a hybrid classifier that combines decision tree and naive
Bayes classifiers.
 In an NBTree, internal nodes split the data as in a regular decision
tree, but the leaves are replaced by naive Bayes classifiers.
The Proposed Algorithm
 The naive Bayes tree classifier
 The naive Bayes tree (NBTree) classifier is a hybrid learning
approach of decision tree (DT) and naive Bayesian (NB) classifiers.
 In an NBTree, internal nodes split the data as in regular decision
trees, but the leaves are replaced by NB classifiers (see the sketch
below).
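A minimal sketch of this hybrid structure, assuming Python with scikit-learn
and Gaussian naive Bayes at the leaves (the paper's own tree induction and
probability estimates differ): a shallow decision tree partitions the data,
and each leaf trains its own NB model on the instances routed to it.

```python
# Minimal NBTree-style sketch (assumed implementation, not the paper's code):
# a shallow decision tree partitions the training data, and each leaf gets
# its own naive Bayes classifier trained on the instances routed to it.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Grow a small tree; its leaves define the partition of the instance space.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
leaf_ids = tree.apply(X)  # leaf index for every training instance

# Train one naive Bayes classifier per leaf (fall back to the tree's own
# prediction if a leaf contains a single class).
leaf_models = {}
for leaf in np.unique(leaf_ids):
    mask = leaf_ids == leaf
    if len(np.unique(y[mask])) > 1:
        leaf_models[leaf] = GaussianNB().fit(X[mask], y[mask])

def nbtree_predict(X_new):
    """Route each instance to its leaf, then classify with that leaf's NB model."""
    leaves = tree.apply(X_new)
    preds = tree.predict(X_new)  # default: plain DT prediction
    for leaf, model in leaf_models.items():
        mask = leaves == leaf
        if mask.any():
            preds[mask] = model.predict(X_new[mask])
    return preds

print((nbtree_predict(X) == y).mean())  # training accuracy of the hybrid
```

GaussianNB here is only a stand-in; the paper estimates class-conditional
probabilities from frequency counts at each leaf, as the following slides
describe.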
The Proposed Algorithm
 In a given training dataset, each instance, x_i, contains values
{x_i1, x_i2, ..., x_ih}. There is a set of attributes used to describe
the training data, D = {A_1, A_2, ..., A_n}. Each attribute contains
attribute values A_i = {A_i1, A_i2, ..., A_ik}. A set of classes
C = {C_1, C_2, ..., C_n} is also used to label the training instances,
where each class C_i = {C_i1, C_i2, ..., C_ik} also has some values.
The Proposed Algorithm
 The aim of DT learning is to construct a tree model from the training
dataset, D. Correspondingly, by Bayes' theorem, whether attribute
A_i ∈ D is discrete or continuous, we have the relation shown below,
 where P(C_i | A_ij) denotes the posterior probability of class C_i
given attribute value A_ij.
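The slide's equation is not captured in the transcript; assuming it is the
standard form of Bayes' theorem written in this notation, it reads:

```latex
% Standard Bayes' theorem in the slide's notation (an assumption; the
% original slide's equation is not reproduced in the transcript).
\[
  P(C_i \mid A_{ij}) \;=\; \frac{P(A_{ij} \mid C_i)\,P(C_i)}{P(A_{ij})}
\]
```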
The Proposed Algorithm
 The algorithm calculates the class conditional probabilities of the
attributes in each leaf node of the tree, T.
 For each attribute, A_i, the number of occurrences of each attribute
value, A_ij, can be counted to determine P(A_ij).
 Similarly, the probability P(A_ij | C_i) can be estimated by counting
how often each A_ij occurs among the instances of class C_i in a leaf
node, t, of the DT.
The Proposed Algorithm
 To calculate P(C_i | x_i), we need the prior P(C_i) for each class C_i
and the likelihood P(x_i | C_i) of the instance x_i. The posterior
probability, P(C_i | x_i), is then found for each C_i, and the class,
C_i, with the highest posterior probability is used to label the
instance, x_i (see the sketch below).
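A small illustrative sketch of this estimation, assuming Python, categorical
attributes, a toy leaf, and add-one (Laplace) smoothing (all assumptions;
this is not the paper's code): value frequencies within each class give
P(A_ij | C_i), class frequencies give P(C_i), and the class with the highest
posterior labels the instance.

```python
# Hedged sketch of naive Bayes estimation at one leaf (assumed code, not the
# paper's implementation): probabilities come from frequency counts of the
# instances that reached this leaf, and the predicted class is the argmax
# of the posterior P(C_i | x) proportional to P(C_i) * prod_j P(A_ij | C_i).
from collections import Counter, defaultdict

# Toy leaf data: each instance is a tuple of categorical attribute values.
leaf_instances = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"), ("rain", "hot")]
leaf_labels = ["no", "yes", "yes", "no"]

# P(C_i): relative frequency of each class at the leaf.
class_counts = Counter(leaf_labels)
total = sum(class_counts.values())
prior = {c: n / total for c, n in class_counts.items()}

# P(A_ij | C_i): how often each attribute value occurs within each class.
value_counts = defaultdict(Counter)  # (class, attribute index) -> value counts
for x, c in zip(leaf_instances, leaf_labels):
    for j, v in enumerate(x):
        value_counts[(c, j)][v] += 1

def likelihood(v, c, j):
    counts = value_counts[(c, j)]
    return (counts[v] + 1) / (sum(counts.values()) + len(counts) + 1)  # add-one smoothing

def classify(x):
    """Label x with the class of highest posterior probability."""
    scores = {}
    for c in prior:
        p = prior[c]
        for j, v in enumerate(x):
            p *= likelihood(v, c, j)
        scores[c] = p
    return max(scores, key=scores.get)

print(classify(("sunny", "hot")))  # expected: "no" on this toy leaf
```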
Experiments
 Data Sets
 10 real benchmark datasets from the UCI machine learning repository
Experiments
 Experimental setup
 10-fold cross validation
 Measurement
 Accuracy
 Precision
 Sensitivity-specificity analysis (a sketch of this setup follows)
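A hedged sketch of this evaluation protocol, assuming Python with
scikit-learn, a placeholder UCI-style dataset, and a plain decision tree
standing in for the NBTree classifier (none of these tools are stated in
the slides):

```python
# Hypothetical evaluation setup (scikit-learn assumed; the paper only states
# 10-fold cross-validation with accuracy, precision, and sensitivity-
# specificity analysis on UCI datasets, not the tooling).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # stand-in for a UCI benchmark dataset
clf = DecisionTreeClassifier(random_state=0)  # stand-in for the NBTree classifier

accs, precs, sens = [], [], []
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for train_idx, test_idx in cv.split(X, y):
    clf.fit(X[train_idx], y[train_idx])
    pred = clf.predict(X[test_idx])
    accs.append(accuracy_score(y[test_idx], pred))
    precs.append(precision_score(y[test_idx], pred, average="macro"))
    sens.append(recall_score(y[test_idx], pred, average="macro"))  # sensitivity = recall

print(f"accuracy                   {np.mean(accs):.3f}")
print(f"precision (macro)          {np.mean(precs):.3f}")
print(f"sensitivity (macro recall) {np.mean(sens):.3f}")
```

Specificity has no built-in multi-class scorer in scikit-learn; it can be
derived per class from the confusion matrix if needed.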
Results
Conclusion
 This paper proposed an adaptive NBTree algorithm to improve the
classification accuracy of multi-class classification problems.
 It used DT induction to select a subset of attributes from the training
dataset on which the naive assumption of class conditional independence
is applied.
 The performance of the proposed algorithm was tested against traditional
DT and NB classifiers, and the experimental results showed that the
proposed NBTree algorithm produces impressive results on challenging
real-life multi-class classification problems.
Reference
Farid, D. M., Rahman, M. M., & Al-Mamuny, M. A. (2014, May). Efficient and
scalable multi-class classification using naïve Bayes tree. In 2014
International Conference on Informatics, Electronics & Vision (ICIEV)
(pp. 1-4). IEEE.