International Journal of Research in Computer and Communication Technology, Vol 3, Issue 4, April 2014. ISSN (Online) 2278-5841, ISSN (Print) 2320-5156
Classification Methods in Data Mining: A Detailed Survey
Mariammal.D¹, Jayanthi.S² and Dr.P.S.K.Patra³

¹ Department of CSE, Agni College of Technology, Thalambur, Chennai, Tamil Nadu, India
[email protected]

² Asst. Prof., Department of CSE, Agni College of Technology, Thalambur, Chennai, Tamil Nadu, India
[email protected]

³ HOD, Department of CSE, Agni College of Technology, Thalambur, Chennai, Tamil Nadu, India
[email protected]
Abstract

Classification is a data mining (machine learning) technique and a model-finding process used for assigning data into different classes according to specific constraints. In other words, classification is used to predict group membership for data instances. There are several major kinds of classification algorithms, including genetic algorithms, C4.5, Naïve Bayes, SVM, KNN, decision trees, neural networks and CART. The goal of this survey is to provide a detailed review of two classification techniques and their applications in various emerging fields. This paper also presents a comparison of these widely used techniques.
Index Terms— Classification, Neural Network, Decision Tree
1. Introduction

Data mining is the process of extracting interesting, nontrivial, implicit and previously unknown patterns from huge volumes of information repositories such as relational databases, data warehouses and transactional databases. Data mining is also known as a central part of Knowledge Discovery in Databases (KDD), and it is very useful for collecting and managing huge volumes of data. Beyond collecting and managing data, data mining (DM) also includes analysis and prediction. Classification techniques in data mining are capable of processing large amounts of data. A classifier can predict categorical class labels and classifies data based on a training set and class labels, and hence can be used for classifying newly available data. Classification can thus be regarded as an indispensable part of data mining and is gaining popularity. In the present paper a detailed study of two classification techniques and their applications is made. Section II describes decision trees, Section III deals with neural networks, Section IV compares the methods, and the final section concludes the paper.
II. DECISION TREE

A decision tree is an analytical model that can be used to represent both classifiers and regression models. Viewed more generally, a decision tree is a hierarchical model of decisions and their consequences, and a decision maker employs decision trees to identify the strategy most likely to reach a goal. When a decision tree is used for classification it is referred to as a classification tree; when it is used for regression it is referred to as a regression tree. In this paper we concentrate mainly on classification. Classification trees are used to classify an object or an instance into a predefined set of classes based on its attributes. The classification tree is useful as an investigative technique; however, it does not attempt to replace existing traditional statistical methods, and there are many other techniques that can be used to classify or predict the membership of instances in a predefined set of classes. A decision tree can also be used, for example, to analyze the repayment behavior of customers who received a credit.

Fig 2: Decision tree for providing a loan

The decision tree is one of the most popular techniques in data mining. Many researchers consider decision trees popular because of their simplicity and transparency: decision trees are easy to understand and do not require any domain knowledge, and they are generally represented graphically as hierarchical structures, making them easy to interpret.

2.1 Characteristics of Classification Trees:

A decision tree is a classifier expressed as a recursive partition of the instance space. The decision tree is a directed tree with a node called the “root” which has no incoming edges; all other nodes have exactly one incoming edge. A node with outgoing edges is called an “internal” or “test” node, and all other nodes are referred to as “leaves” (also known as decision nodes). Each internal node in a decision tree splits the instance space into two or more sub-spaces according to a certain discrete function of the input attribute values. Each leaf is assigned to one class representing the most appropriate target value; alternatively, the leaf may hold a probability vector (affinity vector) indicating the probability of the target attribute having each value. Internal nodes are conventionally drawn as circles, whereas leaves are drawn as triangles. Two or more branches may grow from each internal node (i.e., a node that is not a leaf). Each internal node corresponds to a certain attribute, and its branches correspond to specific ranges of values; these ranges must give a partition of the set of values of the given attribute. Instances are classified by navigating from the root of the tree down to a leaf according to the outcome of the tests along the path, as illustrated in the sketch below. Decision trees accommodate both nominal and numeric attributes.
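As a minimal illustration of this root-to-leaf navigation, the following Python sketch stores a hypothetical loan-decision tree (in the spirit of Fig 2) as nested dicts; the attribute names, branch values, and class labels are invented for illustration and are not taken from the paper.

# Hypothetical tree: each internal node maps one attribute name to a dict of
# {branch value: subtree}; each leaf is a bare class label.
tree = {"income": {"high": "approve",
                   "low": {"has_collateral": {"yes": "approve",
                                              "no": "reject"}}}}

def classify(tree, instance):
    """Navigate from the root down to a leaf using the instance's attribute values."""
    node = tree
    while isinstance(node, dict):
        attribute = next(iter(node))                 # attribute tested at this node
        node = node[attribute][instance[attribute]]  # follow the matching branch
    return node                                      # a leaf: the predicted class

print(classify(tree, {"income": "low", "has_collateral": "yes"}))  # -> approve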
2.2 Constructing decision trees

Most algorithms that have been developed for learning decision trees are variations on a core algorithm that employs a top-down, greedy search through the space of possible decision trees. Decision tree programs construct a decision tree T from a set of training cases; the ID3 algorithm below is a representative example.
function ID3
Input: (R: a set of non-target attributes,
        C: the target attribute,
        S: a training set) returns a decision tree;
begin
  If S is empty, return a single node with value Failure;
  If all records in S have the same value for the target
    attribute, return a single leaf node with that value;
  If R is empty, then return a single node with the most
    frequent of the values of the target attribute found in
    records of S [in that case there may be errors, i.e.
    examples that will be improperly classified];
  Let A be the attribute with largest Gain(A, S) among
    attributes in R;
  Let {aj | j = 1, 2, .., m} be the values of attribute A;
  Let {Sj | j = 1, 2, .., m} be the subsets of S consisting
    respectively of records with value aj for A;
  Return a tree with root labeled A and arcs labeled
    a1, a2, .., am going respectively to the trees
    ID3(R-{A}, C, S1), ID3(R-{A}, C, S2), .., ID3(R-{A}, C, Sm)
    [ID3 is applied recursively to each subset Sj until one of
    the stopping conditions above is met];
end
Figure 2: ID3 Decision Tree Algorithm
ID3 searches through the attributes of the training instances and extracts the attribute that best separates the given examples. If the attribute perfectly classifies the training set then ID3 stops; otherwise it recursively operates on the m partitioned subsets (where m is the number of possible values of the attribute) to find their best attribute. The algorithm uses a greedy search: it picks the best attribute and never looks back to reconsider earlier choices, so ID3 may misclassify data. The central focus of the decision tree growing algorithm is selecting which attribute to test at each node in the tree, namely the attribute with the largest information gain Gain(A, S) = H(S) - Σj (|Sj|/|S|) H(Sj), where H(S) is the entropy of the class labels in S.
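The procedure above can be made concrete with a short Python sketch. It uses the entropy-based information gain just described and returns trees in the same nested-dict form as the navigation sketch in Section 2.1; the function names and the toy data at the end are illustrative assumptions, not from the paper.

from collections import Counter
import math

def entropy(labels):
    """Shannon entropy H(S) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def gain(records, labels, attr):
    """Information gain Gain(attr, S) = H(S) - sum_j |Sj|/|S| * H(Sj)."""
    remainder = 0.0
    for value in set(r[attr] for r in records):
        subset = [l for r, l in zip(records, labels) if r[attr] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder

def id3(records, labels, attributes):
    """records: list of dicts mapping attribute -> value; labels: target values."""
    if not records:
        return "Failure"                  # S is empty
    if len(set(labels)) == 1:
        return labels[0]                  # all records share one target value
    if not attributes:                    # R is empty: majority vote (may err)
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: gain(records, labels, a))
    tree = {best: {}}
    remaining = [a for a in attributes if a != best]
    for value in set(r[best] for r in records):
        idx = [i for i, r in enumerate(records) if r[best] == value]
        tree[best][value] = id3([records[i] for i in idx],
                                [labels[i] for i in idx], remaining)
    return tree

# Toy usage with invented weather-style data:
data = [{"outlook": "sunny", "windy": "no"},
        {"outlook": "sunny", "windy": "yes"},
        {"outlook": "rainy", "windy": "no"}]
labels = ["play", "stay", "play"]
print(id3(data, labels, ["outlook", "windy"]))  # -> {'windy': {'no': 'play', 'yes': 'stay'}}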
2.3 Advantages of the Decision Tree Approach:

The advantages of decision tree classifiers over traditional statistical classifiers include their simplicity, their ability to handle missing and noisy data, and their non-parametric nature. Decision trees are not constrained by any lack of knowledge of the class distributions, can be trained quickly, and require little computational time.
2.4 Applications of Decision Trees:

This section shows some recent successes in applying decision tree learning to solve real-world problems.

1. Predicting Library Book Use:
Decision trees have been developed that predict the future use of books in a library. Forecasting book usage helps librarians to select low-usage titles and move them to relatively distant and less expensive off-site locations that use efficient compact storage techniques. For this, it is important to adopt a book-choice strategy that minimizes the expected frequency of requests for removed titles. For any choice policy, this frequency depends on the percentage of titles that have to be removed for off-site storage; the higher the percentage, the higher this frequency is expected to be.
2. Exploring the Relationship Between the Research Octane Number and Molecular Substructures:
Figuring out what molecular information one needs to predict the research octane number (RON) is a nontrivial problem of particular interest to chemists. In the work of Blurock, substructure presence/absence information is used for RON prediction, not only because this is believed to give good prediction results, but also because asking directly about the presence or absence of substructures in molecules is easily interpretable by chemists, so valuable intuitive information can be gained by studying the substructure-RON relationship. In addition to demonstrating the predictive power of the learned decision trees, analyzing these trees was useful in providing insight into the significance of different substructures for RON prediction. These findings are viewed as a contribution to a better understanding of the underlying principles that determine the RON of molecules.
3. Characterization of Leiomyomatous Tumors:
The goal was to generate hypotheses about tumor diagnosis/prognosis problems when confronted with a large number of features. For a given tumor, it is desired to know to which group the tumor belongs and why. Traditionally, tumor characterization is made on the basis of features that are difficult for a pathologist to evaluate; the job is thus carried out subjectively, and the quality of the results is determined by the pathologist's experience with the group of tumors concerned. To achieve a higher level of objectivity, many more quantitative measurements (related to DNA content, morphonuclear characteristics, and immunohistochemical specificities) need to be considered. Furthermore, useful information can result from interactions between several of these features that cannot be detected using traditional univariate statistical analysis. In the work of Decaestecker et al., decision tree learning was applied to the difficult problem of leiomyomatous (smooth muscle) tumor diagnosis. The authors note that the decision tree approach is well suited to this task because it leads to explicit logical rules that can be interpreted by human experts, which matches the exploratory nature of their job.
4. Star/Cosmic-Ray Classification in Hubble Space Telescope Images:
Salzberg et al. applied decision tree learning to the task of distinguishing between stars and cosmic rays in images collected by the Hubble Space Telescope. In addition to high accuracy, a classifier for this task must be fast, due to the large number of classifications required and the need for online classification. In their experiments, a set of 2211 preclassified images was used as a training sample for decision tree construction, and a separate set of 2282 preclassified images was used to measure the generalization performance of the learned decision tree. Each image was described using 20 numerical features and labeled as either a star or a cosmic ray. The reported experiments show that quite compact decision trees (no more than 9 nodes) achieve generalization accuracy of over 95%. Moreover, the experiments suggest that this accuracy gets even higher when methods for eliminating background noise are employed.
III. NEURAL NETWORK

An Artificial Neural Network (ANN) is an information processing paradigm inspired by the way biological nervous systems, such as the brain, process information. The key element of the ANN paradigm is the novel structure of the information processing system: it is composed of a large number of highly interconnected processing elements (neurons) working in unison to solve specific problems [1]. A neural network is configured for a specific application, such as pattern recognition or data classification, through a learning process. The learning process in biological systems involves adjustments to the synaptic connections that exist between the neurons. A neural network is a powerful data modeling tool that is able to capture and represent complex input/output relationships. The enthusiasm for the development of neural network technology stemmed from the desire to develop an artificial system that could perform "intelligent" tasks similar to those performed by the human brain. ANNs resemble the human brain in the following two ways: they acquire knowledge through a learning process, and the knowledge is stored within inter-neuron connection strengths known as synaptic weights [1, 2]. The true power and advantage of neural networks lies in their ability to represent both linear and non-linear relationships and in their ability to learn these relationships directly from the data being modeled.
3.1 Characteristics of neural networks:

There are four main characteristics of ANN technology: the network structure, the parallel processing ability, the distributed memory, and the fault tolerance ability.

1. Network structure: An ANN may have either a recurrent or a non-recurrent structure. A recurrent network [4, 5] is a feedback network in which the network calculates its outputs based on the inputs and feeds them back to modify the inputs. For a stable recurrent network, this process normally produces smaller and smaller output changes until the outputs become constant.

2. Parallel processing ability: Each neuron in the ANN is a processing element similar to a Boolean logic unit in a conventional computer chip, except that a neuron's function is programmable. The computations required to simulate ANNs are mainly matrix operations, and the parallel structure of the interconnections between neurons facilitates such calculations.

3. Distributed memory: The network does not store information in a central memory; information is stored as patterns throughout the network structure. The state of the neurons represents a short-term memory, as it may change with the next input vector. The values in the weight matrix form a long-term memory and are changeable only on a longer time scale.

4. Fault tolerance ability: The network's parallel processing ability and distributed memory make it relatively fault tolerant. In a neural computer, the failure of one or more parts may degrade the accuracy but does not break the system; a system failure occurs only when all parts fail at the same time. This provides a measure of damage control.

3.2 Neural network algorithm:

Fig 3: Diagram of a 4-layer neural network with two hidden layers

Conventional linear models are simply inadequate when it comes to modeling data that contains non-linear characteristics. The most common neural network model is known as a supervised network because it requires a desired output in order to learn. The objective of this network type is to create a model that maps the input to the output using historical data, so that the model can then be used to produce the output when the desired output is unknown. A graphical representation of a Multi-Layer Perceptron (MLP) [1, 3] is shown in Fig 3.
Step 1: Input a training vector.
Step 2: Hidden nodes calculate their outputs.
Step 3: Output nodes calculate their outputs on the basis of Step 2.
Step 4: Calculate the differences between the results of the previous step and the targets.
Step 5: Apply the first half of the training rule using the results of Step 4.
Step 6: For each hidden node n, calculate d(n).
Step 7: Apply the second half of the training rule using the results of Step 6.

Steps 1 through 3 are often called the forward process, and Steps 4 through 7 are often called the backward process; hence the name back-propagation.
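A minimal NumPy sketch of one such forward/backward pass for a single-hidden-layer network is given below, assuming a sigmoid activation and sum-squared error; the layer sizes, learning rate, and toy data are illustrative assumptions, not taken from the paper. Note that the hidden deltas d(n) of Step 6 are computed before the Step 5 weight update is applied, so that they use the pre-update weights.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative layer sizes: 3 inputs, 4 hidden nodes, 2 outputs.
W1 = rng.normal(scale=0.5, size=(3, 4))   # input -> hidden weights
W2 = rng.normal(scale=0.5, size=(4, 2))   # hidden -> output weights
lr = 0.1                                  # assumed learning rate

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(x, target):
    """One forward/backward pass for a single training vector."""
    global W1, W2
    # Forward process (Steps 1-3).
    h = sigmoid(x @ W1)                     # Step 2: hidden node outputs
    y = sigmoid(h @ W2)                     # Step 3: output node outputs
    # Backward process (Steps 4-7).
    err = y - target                        # Step 4: difference from targets
    d_out = err * y * (1.0 - y)             # output-layer deltas
    d_hid = (d_out @ W2.T) * h * (1.0 - h)  # Step 6: d(n) for each hidden node
    W2 -= lr * np.outer(h, d_out)           # Step 5: update hidden->output weights
    W1 -= lr * np.outer(x, d_hid)           # Step 7: update input->hidden weights
    return 0.5 * np.sum(err ** 2)           # sum-squared error for monitoring

# Toy usage: learn to map one input vector to a target.
x = np.array([0.5, -0.2, 0.8])
t = np.array([1.0, 0.0])
for _ in range(100):
    loss = train_step(x, t)
print(loss)   # the error shrinks as the weights are adjusted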
3.3 Advantages of neural networks:
• Neural network classifiers, without any a priori assumptions about data distributions, are able to learn discontinuous patterns in the distribution of classes.
• Neural networks can readily accommodate auxiliary data such as textural information, slope, and elevation.
• Neural networks are quite flexible and can be adapted to improve performance on particular problems.
3.4 Applications of neural networks:

Application areas include system identification and control (vehicle control, process control, natural resources management), quantum chemistry, game playing and decision making (backgammon, chess, poker), pattern recognition in systems such as radar, face identification and object recognition, sequence recognition in gesture, speech and handwritten text recognition, medical diagnosis, financial applications such as automated trading systems, data mining, visualization and e-mail spam filtering. ANNs have also been used to diagnose several cancers. An artificial neural network based hybrid lung cancer detection system (HLND) improves the accuracy of diagnosis and the speed of lung cancer radiology. ANNs have also been used to diagnose prostate cancer: the diagnostic process builds models from a large group of patients and compares them against the information of a given patient, and these models do not depend on assumptions about correlations between different variables. Colorectal cancer has also been predicted using neural networks.
IV. Comparison of Classification Methods:

Table 1 gives a comparison of these classification techniques on various parameters.

Table 1: Comparison of classification methods

Classification     Generative or      Loss function       Parameter estimation
method             discriminative                         algorithm
-----------------  -----------------  ------------------  --------------------
Decision Tree      Discriminative     Zero-one loss       C4.5
Bayesian Network   Generative         log P(X, Y)         Variable Elimination
Neural Network     Discriminative     Sum-squared error   Forward Propagation
V. Conclusion

Classification methods in data mining are typically strong in modeling interactions. This paper covers two classification techniques widely used in data mining; each technique has its own pros and cons, as given in this paper. Decision trees and neural networks (NN) generally have different operational profiles: when one is very accurate the other often is not, and vice versa. By contrast, decision trees and rule classifiers have a similar operational profile. The goal of classification result integration algorithms is to generate more certain, precise and accurate system results.
References

[1] José C. Principe, Neil R. Euliano, Curt W. Lefebvre, "Neural and Adaptive Systems: Fundamentals Through Simulations", ISBN 0-471-35167-9.
[2] NeuroIntelligence, Alyuda Research, http://www.alyuda.com/neural-network-software.htm
[3] NeuroDimension Inc. web site, Neural Network Software, http://www.nd.com/
[4] Hopfield, J.J., "Neural Networks and Physical Systems with Emergent Collective Computational Abilities", Proceedings of the National Academy of Sciences, Vol. 79, 1982, pp. 2554-2558.
[5] Hopfield, J.J., "Neurons with Graded Response Have Collective Computational Properties Like Those of Two-State Neurons", Proceedings of the National Academy of Sciences, Vol. 81, 1984, pp. 3088-3092.
[6] Dong Xiao Ni, "Application of Neural Networks to Character Recognition", Proceedings of Students/Faculty Research Day, CSIS, Pace University, May 4, 2007.
[7] Eldon Y. Li, "Artificial neural networks and their business applications", Information & Management 27 (1994) 303-313.
[8] Thair Nu Phyu, "Survey of Classification Techniques in Data Mining", Proceedings of the International MultiConference of Engineers and Computer Scientists 2009, Vol. I, IMECS 2009, March 18-20, 2009, Hong Kong.
[9] Ms. Aparna Raj, Mrs. Bincy, Mrs. T. Mathu, "Survey on Common Data Mining Classification Techniques", International Journal of Wisdom Based Computing, Vol. 2(1), April 2012.