JOURNAL OF ADVANCES IN INFORMATION TECHNOLOGY, VOL. 4, NO. 3, AUGUST 2013
Mining Complex Data Streams: Discretization,
Attribute Selection and Classification
Dewan Md. Farid and Chowdhury Mofizur Rahman
Department of Computer Science and Engineering,
United International University, Dhaka-1209, Bangladesh
Email: [email protected]
Abstract—Due to the large volume of data sets as well as the complex and dynamic properties of data instances, several data mining algorithms have been applied to mining complex data streams over the last decades. Nowadays, knowledge extraction from data streams is becoming more complex, because the structure of a data instance does not match simple attribute values when we consider tabulated data, text, web data, images, or videos. In this paper, we address some difficulties of mining complex data streams, such as dealing with continuous attributes, input attribute selection, and classifier construction. The proposed discretization algorithm finds the possible cut points in continuous attributes that can separate the class distributions, using the information gain heuristic and the naïve Bayesian classifier. We evaluate the proposed algorithms on several benchmark data sets from the UCI machine learning repository. The experimental results demonstrate that the proposed method improves the quality of discretization of continuous attributes and scales up the classification accuracy for different types of classification problems.
Index Terms—cut point, contradictory example, data stream,
data mining, interval border, redundant attribute
I. INTRODUCTION
Mining complex data streams is the process of extracting knowledge from the large volumes of data that come from real-life applications. In data stream classification, we need to consider many issues, such as discretization of continuous attributes, selection of important input attributes from the training dataset, removal of contradictory examples from the training dataset, and classifier construction. Contradictory examples appear more than once in the training data set with different class labels. They confuse the learning algorithms, so these examples should be avoided or labeled correctly before learning.
A. Motivation
To develop highly adaptive systems that allow self-learning from streaming data (collected in real time from sensory data, video, speech, advanced industrial applications, intrusion detection, etc.) is becoming a
challenging research issue for intelligent computational
researchers. Data mining is the process of analyzing data from large data stores to discover patterns and summarize them into useful information. Data mining uses sophisticated mathematical algorithms to segment the data and evaluate the probability of future events. Data mining is also known as Knowledge Discovery in Data (KDD). In the classification problems of supervised learning, most learning algorithms deal only with discretized or symbolic attribute values, so it is necessary to transform continuous attributes into discretized attributes [1]-[4]. The discretized intervals can be treated in a similar way to nominal values during learning and classification. The classification accuracy of a data mining algorithm depends on the quality of discretization of its continuous attributes.
This paper significantly extends our previous work on
supervised discretization of continuous attributes in data
mining [29] in several ways. First, in our previous work, we did not consider the attribute selection process. Second, in this paper we propose an algorithm for handling contradictory training instances. Third, more discussion has been added for improved readability, clarity, and analytical enrichment. The discretization algorithm finds the possible cut points in continuous attributes that can separate the class distributions, and then selects the best cut point as an interval border using the information gain heuristic and the naïve Bayesian classifier. We have successfully tested the proposed algorithms on a number of benchmark problems from the UCI machine learning repository. The proposed discretization algorithm can also identify redundant attributes in the training dataset, and it improves the data stream classification accuracy.
B. Dealing with Continuous Attributes
The objective of dealing with a continuous attribute (i.e., one taking real or integer values) is to discretize it into a number of intervals. Ideally, each interval of the discretization should contain only data instances that belong to the same class. It is very important to find the right places to set up the interval borders [5]. The simplest technique is to place an interval border of a continuous attribute between each adjacent pair of attribute values that are not classified into the same class. Suppose the pair of adjacent values of attribute Ai are Ai1 and Ai2; then A = (Ai1 + Ai2)/2 can be taken as an interval border.
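As a concrete illustration of this simplest technique, the following Python sketch (our own illustrative helper, not part of the paper) collects the midpoints of all adjacent value pairs whose examples carry different class labels; each midpoint is a candidate interval border.

    def candidate_cut_points(values, labels):
        """Midpoints between adjacent attribute values whose examples
        belong to different classes; each one is a candidate border."""
        pairs = sorted(zip(values, labels))          # sort by attribute value
        cuts = []
        for (v1, c1), (v2, c2) in zip(pairs, pairs[1:]):
            if c1 != c2 and v1 != v2:                # class changes between distinct values
                cuts.append((v1 + v2) / 2.0)
        return cuts

    # Example: values of attribute Ai with their class labels
    print(candidate_cut_points([1.0, 1.5, 2.0, 3.0], ["a", "a", "b", "b"]))  # -> [1.75]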
Existing discretization methods partition the attribute range into two or several intervals using a single cut point or a set of cut points, and they are categorized into top-down and bottom-up approaches. Top-down discretization starts with a single interval that encompasses the entire value range, and then repeatedly splits it into sub-intervals until some stopping criterion is satisfied. It gives a list of k-1 boundary points. For each boundary point, a binary partition is built by splitting the population at that point. The point that optimizes a given criterion is then retained as the cut point. The process stops if no improvement can be made. Bottom-up discretization starts with each value in a separate interval, and then repeatedly merges adjacent intervals until a stopping criterion is satisfied.
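The top-down scheme just described can be written as a recursive procedure. The version below is a minimal sketch, assuming class entropy as the splitting criterion and a fixed minimum gain as the stopping rule; both are illustrative choices rather than a prescription from the paper.

    import math
    from collections import Counter

    def entropy(labels):
        n = len(labels)
        return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())

    def top_down_cuts(pairs, min_gain=1e-3):
        """pairs: list of (value, label) sorted by value.  Recursively pick the
        boundary point with the best information gain and split there, stopping
        when no split improves the criterion by at least min_gain."""
        labels = [c for _, c in pairs]
        base, n = entropy(labels), len(pairs)
        best_gain, best_i = 0.0, None
        for i in range(1, n):
            if pairs[i - 1][0] == pairs[i][0]:
                continue                              # cannot cut between equal values
            left, right = labels[:i], labels[i:]
            gain = base - (len(left) / n) * entropy(left) \
                        - (len(right) / n) * entropy(right)
            if gain > best_gain:
                best_gain, best_i = gain, i
        if best_i is None or best_gain < min_gain:
            return []                                 # stopping criterion satisfied
        cut = (pairs[best_i - 1][0] + pairs[best_i][0]) / 2.0
        return top_down_cuts(pairs[:best_i], min_gain) + [cut] + \
               top_down_cuts(pairs[best_i:], min_gain)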
The information gain heuristic and the naïve Bayesian classifier are well-known discretization methods that enhance the performance of a classifier [6]-[9]. The information gain technique finds the most informative border at which to split the values of the continuous attribute. The maximum gain value is always obtained at a cut point (the midpoint) between two attribute values of different classes. In the C4.5 algorithm, each possible cut point is not the midpoint between the two nearest values, but rather the greatest value in the entire dataset that does not exceed the midpoint. The naïve Bayesian classifier is also used for the discretization of continuous attributes, by constructing a probability curve for each class in the dataset. When the curves for every class have been constructed, interval borders are placed on each of those points where the leading curves on the two sides are different. A few other methods, such as equal distance division, grouping, k-nearest neighbors, and fuzzy borders, are also applied for dealing with continuous attributes.
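For instance, under the C4.5 convention mentioned above, if the sorted values are 1.0, 1.5, 2.0, 3.0 and the midpoint candidate between the second and third values is 1.75, the reported cut point is 1.5, the greatest observed value not exceeding the midpoint. A one-line sketch of that adjustment (illustrative, not C4.5's source code):

    def c45_cut_point(sorted_values, midpoint):
        """Report the greatest observed value that does not exceed the midpoint."""
        return max(v for v in sorted_values if v <= midpoint)

    print(c45_cut_point([1.0, 1.5, 2.0, 3.0], 1.75))  # -> 1.5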
C. Input Attribute Selection
Selecting effective and useful input attributes from the training dataset is a form of search. Input attribute selection involves the selection of a subset of d attributes from a total of D original attributes of the training dataset, based on a given optimization principle that improves the performance of the classifier. Ideally, attribute selection methods would search through the subsets of attributes and try to find the best one among the 2^D competing candidate subsets according to some evaluation function. In complex classification domains, input attribute selection is very important, because irrelevant and redundant attributes may lead to a complex classification model as well as reduce the classification accuracy [10]-[16]. Sometimes the input attributes of the training dataset may contain false correlations, which hamper the classification process. Some attributes in the training dataset may be redundant, because the information they add is contained in other attributes. Extra attributes can also increase the computational time and can have an impact on the classification accuracy.
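Because evaluating all 2^D subsets is rarely feasible, practical attribute selection usually searches greedily. The sketch below is a generic wrapper of our own, not the paper's procedure; it grows a subset one attribute at a time, using any caller-supplied evaluation function such as the cross-validated accuracy of a classifier.

    def forward_selection(attributes, evaluate, d):
        """Greedy forward search over attribute subsets: at each step add the
        attribute that most improves evaluate(subset); stop at d attributes
        or when no attribute improves the score."""
        selected, best_score = [], float("-inf")
        while len(selected) < d:
            step_best, step_attr = best_score, None
            for a in attributes:
                if a in selected:
                    continue
                score = evaluate(selected + [a])      # e.g., cross-validated accuracy
                if score > step_best:
                    step_best, step_attr = score, a
            if step_attr is None:                     # no improvement: stop searching
                break
            selected.append(step_attr)
            best_score = step_best
        return selected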
D. Classification
Classification maps data into predefined groups or classes. It is often referred to as supervised learning in data mining, because the classes are determined before examining the data instances. Classification creates a function from the training dataset. The training dataset consists of a number of attributes and the desired output. The output of the function can be a continuous value, or it can predict a class label of the input object. The task of classification is to predict the value of the function for any valid input object after having seen only a small number of training examples [30]-[31].
E. Structure of the Paper
The remainder of the paper is organized as follows.
Section II describes the related works, ID3 classifier, and
naïve Bayesian classifier. Section III presents the
proposed algorithms. Section IV describes the
experimental results based on a number of widely used
benchmark datasets from the UCI machine learning
repository [17]. Finally, Section V presents our conclusions and future work.
II. STATE OF THE ART
The decision tree (DT) and naïve Bayesian (NB) classifiers are powerful and popular tools for classification problems in supervised learning, and they are widely used in many applications in the fields of data mining, information retrieval, image processing, bioinformatics, stock prediction, and weather forecasting. A DT is easy to implement and requires little prior knowledge. In a DT, the successive division of the set of training examples proceeds until every subset consists of examples of a single class. A DT can be constructed from a training dataset with many attributes. In an NB classifier, the prior and conditional probabilities are calculated from a given training dataset, and these probabilities are then used to classify known or unknown examples.
A. Related Works
In 1993, Fayyad and Irani [18] used an information
criterion called minimum description length principle
cut (MDLPC) based on information gain for
discretization of continuous attributes. Also in 1993,
Van de Merckt [19] proposed a criterion that took into
account the homogeneity of the classes and also the
point density. In 1995, Dougherty et al. [20] proposed
supervised and unsupervised methods for discretization
of continuous attributes by using entropy based and
statistics based methods to find the relationship between
the intervals and the classes. Supervised methods are only applicable when the dataset is divided into classes; they refer to the class information when selecting discretization cut points, whereas unsupervised methods do not use the class information. In 1995,
Pfahringer [21] used entropy to select a large number of
candidate splitting points and employed a best first
search with a minimum description length heuristic to
determine a good discretization. In 1998, Perner and Trautzsch [22] applied decision tree learning to the discretization of continuous attributes, in which the discretization intervals of an attribute were calculated according to the class values. In 2000, Bay [23] observed that univariate methods discretize an attribute without reference to attributes other than the class, and introduced a multivariate method that considers relationships among attributes during discretization. In 2000, Hsu et al. [24] used a mapping function to discretize continuous attributes as needed at classification time. In 2001, Ishibuchi et al. [25] applied fuzzy discretization, which created a fuzzy mapping function using an initial primary method and then used other primary methods to adjust the initial cut points. In 2005, Boulle [26] used
bottom-up discretization to optimize naïve Bayes’
classifier. It used a standard discretization model
distributed according to a three-stage prior, which is
Bayes optimal for a given set of examples to be
discretized.
B. Information Gain Heuristic
The ID3 algorithm applies the information gain heuristic for attribute selection from the training dataset [27]. The attribute with the highest information gain value is chosen as the splitting attribute for a node N in the decision tree. This attribute minimizes the information needed to classify the examples in the resulting partitions and reflects the least randomness or “impurity” in these partitions. The expected information needed to classify an example in training dataset D is given by
Info(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)    (1)
where p_i is the probability that an arbitrary example in training dataset D belongs to class C_i and is estimated by |C_{i,D}|/|D|. A log function to base 2 is used because the information is encoded in bits. Info(D) is just the average amount of information needed to identify the class label of an example in dataset D. Partitioning (e.g., where a partition may contain a collection of examples from different classes rather than from a single class) requires the following amount of information to produce an exact classification of the examples:
Info_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times Info(D_j)    (2)

The term |D_j|/|D| acts as the weight of the jth partition. Info_A(D) is the expected information required to classify an example from training dataset D based on the partitioning by A. The information gain is defined as the difference between the original information requirement and the new requirement, that is,

Gain(A) = Info(D) - Info_A(D)    (3)

The information gain heuristic adopted in the ID3 classifier can be used to find the most informative border at which to split the value domain of a continuous attribute, once the continuous attribute values are sorted in ascending order. The maximum information gain is always obtained at a cut point (the midpoint) between the values taken by two examples of different classes. Each attribute value of the form

A = (A_i + A_{i+1}) / 2    (4)

where i = 1, ..., n-1, is a possible cut point if A_i and A_{i+1} are taken by different class values in the training dataset. The information gain heuristic checks each of the possible cut points and finds the best splitting cut point. It is a top-down discretization approach, and it produces a very large number of interval borders if the attribute is not very informative.
C. Naïve Bayesian Classifier
The naïve Bayesian (NB) classifier is a simple probabilistic classifier [28]. The NB classifier is based on probability models that incorporate strong independence assumptions, which often have no bearing in reality and hence are (deliberately) naïve. A more descriptive term for the underlying probability model would be "independent feature model". Furthermore, the probability model can be derived using Bayes' theorem. The NB classifier estimates the class-conditional probability by assuming that the attributes are conditionally independent, given the class label c. The conditional independence assumption can be formally stated as follows:

P(A | C = c) = \prod_{i=1}^{n} P(A_i | C = c)    (5)

where each attribute set A = {A_1, A_2, ..., A_n} consists of n attribute values. With the conditional independence assumption, instead of computing the class-conditional probability for every combination of A, we only estimate the conditional probability of each A_i given C. The latter approach is more practical because it does not require a very large training set to obtain a good estimate of the probability. To classify a test example, the naïve Bayesian classifier computes the posterior probability for each class C:

P(C | A) = P(C) \prod_{i=1}^{n} P(A_i | C) / P(A)    (6)
Since P(A) is fixed for every A, it is sufficient to choose the class that maximizes the numerator term P(C) \prod_{i=1}^{n} P(A_i | C).
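A minimal sketch of this classification rule for categorical attributes is given below; the estimation of the conditional probabilities is a plain frequency count without smoothing, which is an illustrative simplification rather than the paper's exact procedure.

    from collections import Counter, defaultdict

    def train_nb(examples, labels):
        """Estimate the prior P(C) and the conditionals P(Ai = v | C)
        from a categorical training set."""
        class_counts = Counter(labels)
        value_counts = defaultdict(Counter)           # value_counts[(i, c)][v]
        for x, c in zip(examples, labels):
            for i, v in enumerate(x):
                value_counts[(i, c)][v] += 1
        n = len(labels)
        prior = {c: k / n for c, k in class_counts.items()}
        cond = {key: {v: k / class_counts[key[1]] for v, k in cnt.items()}
                for key, cnt in value_counts.items()}
        return prior, cond

    def classify_nb(x, prior, cond):
        """Choose the class maximizing P(C) * prod_i P(Ai | C)."""
        best_c, best_p = None, -1.0
        for c, p in prior.items():
            for i, v in enumerate(x):
                p *= cond.get((i, c), {}).get(v, 0.0)
            if p > best_p:
                best_c, best_p = c, p
        return best_c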
The naïve Bayesian classifier has several advantages. It is easy to use and, unlike other classification approaches, requires only one scan of the training data. The naïve Bayesian classifier can easily handle missing attribute values by simply omitting the corresponding probabilities when calculating the likelihood of membership in each class. According to the Bayesian formula,
P(C_j | A) = P(A | C_j) P(C_j) / \sum_{k=1}^{m} P(A | C_k) P(C_k)    (7)
where P(C_j | A) is the probability of an example belonging to class C_j. If the example takes continuous attribute values, then P(A | C_j) is the probability of the example taking attribute value A when it is classified in class C_j. Given P(C_j) and P(A | C_j), a probability curve can be constructed for each class C_j:

f_j(A) = P(A | C_j) P(C_j)    (8)

When the curves for every class have been constructed, the interval borders are placed on each of those points where the leading curves on the two sides are different.
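The probability-curve idea can be sketched as follows. The snippet assumes a Gaussian estimate of P(A|Cj) on a single continuous attribute and a fixed evaluation grid; both choices are illustrative and not prescribed by the paper. It scans the attribute range and records a border wherever the leading curve f_j(A) = P(A|Cj)P(Cj) switches to a different class.

    import math
    from collections import defaultdict

    def nb_interval_borders(values, labels, grid_steps=200):
        """Place borders where the leading curve f_j(A) = P(A|Cj) * P(Cj) changes,
        using a Gaussian estimate of P(A|Cj) for each class."""
        by_class = defaultdict(list)
        for v, c in zip(values, labels):
            by_class[c].append(v)
        n = len(values)
        params = {}
        for c, vs in by_class.items():
            mean = sum(vs) / len(vs)
            std = max(1e-6, (sum((v - mean) ** 2 for v in vs) / len(vs)) ** 0.5)
            params[c] = (mean, std, len(vs) / n)      # mean, std, prior P(Cj)

        def f(c, a):                                  # f_j(A) = P(A|Cj) * P(Cj)
            mean, std, prior = params[c]
            density = math.exp(-((a - mean) ** 2) / (2 * std ** 2)) / (std * math.sqrt(2 * math.pi))
            return prior * density

        lo, hi = min(values), max(values)
        borders, prev = [], None
        for i in range(grid_steps + 1):
            a = lo + (hi - lo) * i / grid_steps
            leader = max(params, key=lambda c: f(c, a))
            if prev is not None and leader != prev:   # leading curve changed: border
                borders.append(a)
            prev = leader
        return borders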
III. PROPOSED METHODS
In this section, we introduce algorithms for the discretization of continuous attributes and for handling contradictory examples in the training data set. The proposed discretization algorithm improves the quality of discretization of continuous attributes, which scales up the classification rates in classification problems.
A. Discretization Algorithm
Dealing with a continuous domain means discretizing the numerical domain into a certain number of intervals. The discretized intervals can be treated in a similar way to nominal values during induction and deduction. This section presents an algorithm that produces remarkably accurate rules for difficult problem domains, including problems that contain continuous attributes.
Given a training data set, N denotes the number of examples in the training set, C = {C1, C2, ..., Cm} is the set of classes into which the examples are classified, A = {A1, A2, ..., An} is the set of attributes used to describe each example, and each attribute Ai takes the attribute values {Ai1, Ai2, ..., Aih}. For each continuous attribute Ai in the training dataset, sort the attribute values {Ai1, Ai2, ..., Aih} in ascending order. Then find the minimum (Min) and maximum (Max) attribute value of Ai for each class value Cj in the training dataset, and select the cut points in the attribute values based on the Min and Max values of each class Cj. Next, find the information gain value and the probability P(Cj|A) at each cut point, and select the cut point with maximum values of both information gain and probability; this cut point is selected as an interval border. If no cut point has the maximum information gain and probability values at the same time, then select the next cut point where the information gain value and the probability value are maximum at the same point. The main procedure of the proposed algorithm is described as follows:
Algorithm I: Discretization of continuous attributes.
Input:
  • N, number of training examples.
  • Ai, continuous attributes.
  • Cj, class values in training dataset.
Output: Interval borders in Ai.
Procedure:
1. for each continuous attribute Ai in the training dataset do
2.   Sort the values of continuous attribute Ai in ascending order.
3.   for each class Cj in the training dataset do
4.     Find the minimum (Min) attribute value of Ai for Cj.
5.     Find the maximum (Max) attribute value of Ai for Cj.
6.   endfor
7.   Find the cut points in the continuous attribute values based on the Min and Max values of each class Cj.
8.   Find the information gain at each cut point, and select the cut point with the maximum information gain value.
9.   Find the probability P(Cj|A) at each cut point, and select the cut point with the maximum probability value.
10.  if the cut point with the maximum information gain value and the cut point with the maximum probability value are the same, then take it as an interval border; else consider the next cut point where the information gain value and the probability value are maximum at the same point.
11. endfor
If the values of a continuous attribute are the same for all the classes in the training dataset, so that they cannot separate the class distributions, then that attribute is redundant. We can therefore remove the redundant attributes from the training dataset.
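One possible reading of Algorithm I for a single continuous attribute is sketched below. The scoring functions info_gain(cut) and posterior(cut) are assumed to be supplied by the caller (for example, Gain(A) from Eq. (3) and P(Cj|A) from Eq. (7)); the exact scoring and tie-breaking used in the paper are not reproduced here.

    def discretize_attribute(values, labels, info_gain, posterior):
        """Sketch of Algorithm I for one continuous attribute: build candidate
        cut points from the class-wise Min/Max values, then prefer a cut point
        that maximises both the information gain and the probability score."""
        by_class = {}
        for v, c in sorted(zip(values, labels)):      # step 2: sort by value
            lo, hi = by_class.get(c, (v, v))
            by_class[c] = (min(lo, v), max(hi, v))    # steps 3-6: Min/Max per class
        cuts = sorted({v for lo_hi in by_class.values() for v in lo_hi})  # step 7
        best_by_gain = max(cuts, key=info_gain)       # step 8
        best_by_prob = max(cuts, key=posterior)       # step 9
        if best_by_gain == best_by_prob:              # step 10: both criteria agree
            return best_by_gain
        # otherwise take the cut point where both scores are jointly highest
        return max(cuts, key=lambda cut: (info_gain(cut), posterior(cut)))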
B. Algorithm for Handling Contradictory Examples
A contradictory example is an example that appears more than once in the training dataset with different class labels. Contradictory examples confuse the learning algorithms, so these examples should be avoided or labeled correctly before learning. Redundant examples are multiple copies of the same example in the training dataset. They do not create a problem if they do not form contradictions, but redundancy can change the decision trees produced by the ID3 algorithm. For mining complex data, it is better to remove redundancy from the training data by keeping only one unique copy of each example in the training dataset. Doing so not only saves storage space but also significantly speeds up the learning process. The main procedure for handling contradictory examples is described as follows:
Algorithm II: Handling Contradictory Examples.
Input:
  • N, number of training examples.
  • Cj, class values in training dataset.
Output: Correctly labeled contradictory examples.
Procedure:
1. Search for multiple copies of the same example in the training examples N; if found, keep only one unique example in N.
2. Search for contradictory examples in the training data, whose attribute values are the same but whose class values are different.
3. If a contradictory example is found in the training data, then find the similarity of the attribute values of the contradictory example with each other example in the training data.
4. Find the majority class value among the most similar examples in the training data.
5. Replace the class value of the contradictory example by the majority class value of the examples in the training dataset having similar attributes.
6. Repeat steps 2 to 5 until all the contradictory examples in the training data are correctly labeled.
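A compact sketch of Algorithm II is shown below; the similarity measure (count of matching attribute values) and the handling of ties are our own illustrative choices.

    from collections import Counter

    def handle_contradictions(examples):
        """examples: list of (attributes, label) pairs, attributes being a tuple.
        Step 1 keeps one copy of each identical pair; steps 2-5 relabel every
        contradictory example with the majority label of its most similar peers."""
        unique = list(dict.fromkeys(examples))        # step 1: drop duplicates
        labels_by_attrs = {}
        for attrs, label in unique:
            labels_by_attrs.setdefault(attrs, set()).add(label)

        def similarity(a, b):                         # matching attribute values
            return sum(x == y for x, y in zip(a, b))

        resolved = []
        for attrs, label in unique:
            if len(labels_by_attrs[attrs]) > 1:       # step 2: contradictory example
                others = [(similarity(attrs, a), l) for a, l in unique if a != attrs]
                if others:                            # steps 3-5: majority of most similar
                    top = max(s for s, _ in others)
                    label = Counter(l for s, l in others if s == top).most_common(1)[0][0]
            resolved.append((attrs, label))
        return resolved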
IV. EXPERIMENTAL ANALYSIS
In this section, we discuss some of the benchmark
problems from UCI machine learning repository that
have been used for experimental analysis.
A. Benchmark Problems From UCI Repository
A data set is a set of data items, and it is a very basic concept of data mining and machine learning. A data set is a collection of examples; each example consists of a number of attributes and is classified into a particular class. The following describes the six benchmark datasets that were used for the experimental analysis.
1) Soybean (Small) Database: A small subset of the
original soybean database, which contains 47
examples, 35 attributes, and 4 class values.
There is no missing attribute value in this dataset.
2) Breast Cancer Dataset: This dataset was obtained
from the University Medical Centre, Institute of
Oncology, Ljubljana, Yugoslavia that includes 201
instances of one class and 85 instances of another
class. The instances are described by 9 attributes.
3) Iris Plants Database: This is one of the best known
databases in the pattern recognition literature. The
data set contains 3 classes of 50 instances each,
where each class refers to a type of iris plant. One
class is linearly separable from the other 2 classes.
There are 4 attributes in this dataset.
4) Heart Disease Databases: This database contains 76
attributes, but all published experiments refer to
using a subset of 14 of them. The "goal" field refers
to the presence of heart disease in the patient. It is
integer valued from 0 (no presence) to 4.
5) Australian Credit Approval: This data set concerns credit card applications and contains 690 instances with 14 attributes. All attribute names and values have been changed to meaningless symbols to protect the confidentiality of the data.
6) Ecoli Data Set: This data set contains 336 instances,
8 attributes, and 6 classes for classification.
B. Experimental Result
All the experiments have been done using the above six datasets from the UCI machine learning repository [17], which are summarized in Table I. In all, the six datasets have 71 continuous attributes.
TABLE I
DATASETS AND THEIR SUMMARY

Dataset      Continuous Attributes   Size   Class Values
Soybean              35                47        4
Breast                9               286        2
Iris                  4               150        3
Heart                10               270        2
Australian            6               690        2
Ecoli                 7               336        6

Total continuous attributes = 71
We have discretized each continuous attribute of the datasets shown above using the proposed Algorithm I, and then removed the redundant attributes from the training datasets. After that, using the proposed Algorithm II, we removed the redundant examples from the training datasets and correctly labeled the class values of the contradictory examples. Table II shows the comparison of the classification rates for each dataset after discretization of continuous attributes using the proposed discretization approach, discretization by the ID3 classifier, and discretization by the naïve Bayesian (NB) classifier. The results show that the proposed discretization method improves the classification rates compared with the ID3 classifier and the NB classifier.
TABLE II
CLASSIFICATION RATE (%) FOR EACH DATASET

Dataset      Proposed Approach   ID3   NB
Soybean            100            93   90
Breast              95            79   75
Iris                89            85   83
Heart               77            63   59
Australian          83            80   77
Ecoli               93            87   91
Fig. 1 shows the comparison of the accuracy of the information gain heuristic, the naïve Bayesian classifier, and the proposed algorithm for each data set.
Figure 1. Comparison of the accuracy of methods for each data set.
V. CONCLUSION
In this paper, we have introduced new algorithms for dealing with continuous attributes and contradictory examples in the training dataset, which improve the classification accuracy for different types of classification problems. They also remove redundant attributes and examples from the training dataset. The proposed discretization algorithm uses the information gain heuristic and the naïve Bayesian classifier to improve the quality of discretization. It finds the possible cut points in a continuous attribute that can separate the class distributions, and takes as an interval border the best cut point with the maximum information gain value and the maximum probability value. The proposed algorithm for handling contradictory examples in the training dataset replaces the class value of a contradictory example by the majority class value of the examples in the training dataset having similar attributes. The proposed algorithms have been tested on a number of benchmark problems from the UCI machine learning repository, and the experimental results show that they improve the quality of discretization and the classification rate. Future work will focus on applying the proposed methods to real-life problem domains, for the classification task of supervised learning and the clustering task of unsupervised learning.
ACKNOWLEDGMENT
Support for this research was received from the Department of Computer Science and Engineering, United International University, Dhaka, Bangladesh.
REFERENCES
[1] H. Liu, F. Hussain, C. L. Tan, and M. Dash,
“Discretization: An enabling technique,” Data Mining and
Knowledge Discovery, Vol. 6, No. 4, 2002, pp. 393-423.
[2] A. Kusiak, “Feature transformation methods in data
mining,” IEEE Trans. On Electronics Packaging
Manufacturing, Vol. 24, No. 3, 2001, pp. 214-221.
[3] D. A. Zighed, S. Rabaseda, and R. Rakotomalala,
“Discretization methods in supervised learning,”
Encyclopedia of Computer Science and Technology, Vol.
40, 1998, pp. 35-50.
[4] J. Y. Ching, A. K. C. Wong, and K. C. C. Chan, “Class-dependent discretization for inductive learning from continuous and mixed-mode data,” IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 17, No. 7, 1995, pp. 641-651.
[5] M. R. Chmielewski and J. W. Grzymala-Busse, “Global discretization of continuous attributes as pre-processing for machine learning,” In 3rd International Workshop on Rough Sets and Soft Computing, 1994, pp. 294-301.
[6] U. M. Fayyad and K. B. Irani, “On the handling of continuous-valued attributes in decision tree generation,” Machine Learning, Vol. 8, 1992, pp. 87-102.
[7] J. R. Quinlan, “Improved use of continuous attributes in
C4.5,” Journal of Artificial Intelligence Research, Vol. 4,
1996, pp. 77-90.
[8] I. Kononenko, “naïve Bayesian classifier and continuous
attributes,” Informatics 16, 1992, pp. 53-54.
[9] M. J. Pazzani, “An iterative improvement approach for the
discretization of numeric attributes in Bayesian classifiers,”
In Proc. of the 1st International Conference on Knowledge
Discovery and Data Mining, 1995, pp. 228-233.
[10] A. Tsymbal, S. Puuronen, and D. W. Patterson, “Ensemble
feature selection with the simple Bayesian classification,”
Information Fusion, vol. 4, no. 2, pp. 87–100, 2003.
[11] L. S. Oliveira, R. Sabourin, R. F. Bortolozzi, and C.
Y. Suen, “Feature selection using multi-objective genetic
algorithms for handwritten digit recognition,” In Proc.
of the 16th International Conference on Pattern
Recognition (ICPR 2002), Quebec: IEEE Computer
Society, pp. 568–571, 2002.
[12] C. G. Salcedo, S. Chen, D. Whitley, and S. Smith, “Fast
and accurate feature selection using hybrid genetic
strategies,” In Proc. of the Genetic and Evolutionary
Computation Conference, pp. 1–8, 1999.
[13] S. Chebrolu, A. Abraham, and J. P. Thomas, “Feature
deduction and ensemble design of intrusion detection
systems,” Computer & Security, vol. 24, no. 4, pp. 295–
307, 2004.
[14] A. H. Sung and S. Mukkamala, “Identifying important features for intrusion detection using support vector machines and neural networks,” In Proc. of the International Symposium on Applications and the Internet (SAINT 2003), pp. 209–217, 2003.
[15] P. V. W. Radtke, R. Sabourin, and T. Wong, “Intelligent
feature extraction for ensemble of classifiers,” In Proc.
of the 8th International Conference on Document
Analysis and Recognition (ICDAR 2005), Seoul: IEEE
Computer Society, pp. 866–870, 2005.
[16] W. K. Lee and S. J. Stolfo, “A framework for
constructing features and models for intrusion detection
systems,” ACM Transactions on Information and System
Security, vol. 3, no. 4, pp. 227–261, 2000.
[17] A. Frank and A. Asuncion, UCI Machine Learning Repository [http://archive.ics.uci.edu/ml], Irvine, CA: University of California, School of Information and Computer Science, 2010.
[18] U. M. Fayyad and K. B. Irani, “Multi-interval discretization of continuous-valued attributes for classification learning,” In Proc. of the 13th International Joint Conf. on Artificial Intelligence, Morgan Kaufmann, San Mateo, CA, 1993, pp. 1022-1027.
[19] T. Van de Merckt, “Decision trees in numerical attribute
spaces,” In Proc. of the 13th International Joint Conf. on
Artificial Intelligence, Morgan Kaufmann, San Mateo, CA,
1993, pp. 1013-1021.
[20] J. Dougherty, R. Kohavi, and M. Sahami, “Supervised
and unsupervised discretization of continuous features,”
In Proc. of the 12th International Conference on Machine
Learning, 1995, pp. 194-202.
[21] Bernhard Pfahringer, “Compression-based discretization
of continuous attributes,” ICML 1995, 1995, pp. 456-463.
[22] P. Perner and S. Trautzsch, “Multi-interval discretization methods for decision tree learning,” In Proc. of Advances in Pattern Recognition, Joint IAPR International Workshops SSPR/SPR '98, 1998, pp. 475-482.
[23] S. D. Bay, “Multivariate discretization of continuous variables for set mining,” In Proc. of the 6th ACM SIGKDD International Conf. on Knowledge Discovery and Data Mining, 2000, pp. 315-319.
[24] C. N. Hsu, H. J. Huang, and T. T. Wong, “Why discretization works for naïve Bayesian classifiers,” In Proc. of the 17th International Conference on Machine Learning, 2000, pp. 309-406.
[25] H. Ishibuchi, T. Yamamoto, and T. Nakashima, “Fuzzy data mining: Effect of fuzzy discretization,” In the 2001 IEEE International Conference on Data Mining, 2001.
[26] M. Boulle, “A Bayes optimal approach for partitioning the values of categorical attributes,” Journal of Machine Learning Research, Vol. 6, 2005, pp. 1431-1452.
[27] J. R. Quinlan, “Induction of decision trees,” Machine Learning, Vol. 1, 1986, pp. 81-106.
[28] P. Langley, “Induction of recursive Bayesian classifiers,” In Proc. of the European Conference on Machine Learning, 1993, pp. 153-164.
[29] D. M. Farid, “Improve the quality of supervised discretization of continuous valued attributes in data mining,” In Proc. of the 14th International Conference on Computer and Information Technology (ICCIT 2011), 22-24 December 2011, Dhaka, Bangladesh, pp. 61-64.
[30] A. Biswas, D. M. Farid, and C. M. Rahman, “A new decision tree learning approach for novel class detection in concept drifting data stream classification,” Journal of Computer Science and Engineering (JCSE-UK), Vol. 14, Issue 1, July 2012, pp. 1-8.
[31] D. M. Farid and C. M. Rahman, “Novel class detection in concept-drifting data stream mining employing decision tree,” In Proc. of the 7th International Conference on Electrical and Computer Engineering (ICECE 2012), 20-22 December 2012, Dhaka, Bangladesh, pp. 630-633.
Dr. Dewan Md. Farid is a post-doctoral fellow in the Department of Computer Science and Digital Technology, Northumbria University, Newcastle upon Tyne, UK. He was an Assistant Professor in the Department of Computer Science and Engineering, United International University, Bangladesh. He received a B.Sc. in Computer Science and Engineering from Asian University of Bangladesh in 2003, an M.Sc. in Computer Science and Engineering from United International University, Bangladesh in 2004, and a Ph.D. in Computer Science and Engineering from Jahangirnagar University, Bangladesh in 2012. Dr. Farid has published 1 book chapter, 10 international journal papers, and 12 international conference papers in the fields of data mining, machine learning, and intrusion detection. He has participated in and presented his papers at international conferences in Malaysia, Portugal, Italy, and France. He is a member of the IEEE and the IEEE Computer Society. He worked as a visiting researcher at the ERIC Laboratory, University Lumière Lyon 2, France, from 01-09-2009 to 30-06-2010. He received Senior Fellowships I and II awarded by the National Science & Information and Communication Technology (NSICT), Ministry of Science & Information and Communication Technology, Government of Bangladesh, in 2008 and 2011 respectively.
Professor Dr. Chowdhury Mofizur Rahman received his B.Sc. (EEE) and M.Sc. (CSE) from Bangladesh University of Engineering and Technology (BUET) in 1989 and 1992 respectively. He earned his Ph.D. from the Tokyo Institute of Technology in 1996 under the auspices of a Japanese Government scholarship. Prof. Chowdhury is presently working as the Pro Vice Chancellor and acting treasurer of United International University (UIU), Dhaka, Bangladesh. He is also one of the founder trustees of UIU. Before joining UIU he worked as the head of the Computer Science & Engineering department of Bangladesh University of Engineering & Technology, which is the number one technical public university in Bangladesh. His research areas cover data mining, machine learning, AI, and pattern recognition. He is active in research and has published around 100 technical papers in international journals and conferences. He was the Editor of the IEB journal and worked as the moderator of NCC accredited centers in Bangladesh. He worked as the organizing chair and program committee member of a number of international conferences held in Bangladesh and abroad. At present he is acting as the coordinator from Bangladesh for the EU-sponsored cLINK project. Prof. Chowdhury has been working as an external expert member for the Computer Science departments of a number of renowned public and private universities in Bangladesh. He is actively contributing towards the national goal of converting the country towards Digital Bangladesh.