International Journal of Software Engineering and Its Applications
Vol. 4, No. 2, April 2010
Generating Better Radial Basis Function Network for Large Data Set
of Census
Hyontai Sug
Division of Computer & Information Engineering,
Dongseo University, Busan, 617-716, Korea
[email protected]
Abstract
Radial basis function networks are known to perform well compared to other artificial
neural networks such as multilayer perceptrons. Because the target data sets in data mining
are very large and artificial neural networks, including radial basis function networks,
require intensive computation, sampling is needed. Moreover, because the sample size must be
kept relatively small to limit the computational load of training radial basis function
networks, simple random sampling of a small sample might not produce well-balanced samples.
This paper suggests a better sampling technique, based on the branching information of a
decision tree, for training radial basis function networks when the target data set is very
large, as census data is. Experiments with the census-income data set in the UCI machine
learning repository show promising results.
Keywords: Data mining, census, radial basis function networks.
1. Introduction
Artificial neural networks have been developed actively as a subfield of machine learning
for decades, and many good results have been reported [1]. Many kinds of artificial neural
networks have been reported to be very successful [2, 3, 4, 5, 6]. Among them, radial basis
function networks (RBFNs) are reported to perform better than other neural networks in some
domains [7, 8]. Traditionally, machine learning researchers have not dealt with very large
data sets, and the same is true for artificial neural networks. On the other hand, the main
concern of data mining researchers is to find hidden, important knowledge in very large data
sets, so most data mining systems based on artificial neural networks rely on some form of
sampling [9].
In the data mining field, neural networks are used mostly for prediction tasks, so neural
networks with the smallest error rates for a given data set have been a major concern for
their success. But even though neural networks are one of the most successful data
mining and machine learning methodologies, there is room for improvement in performance,
because they are built with greedy algorithms and largely by expert knowledge. Some known
weak points of RBFNs are poor performance in the presence of irrelevant features and
erroneous data. On the other hand, real-world data often have these characteristics, so we
might have some difficulty in applying RBFNs to data mining.
Moreover, because most target databases for data mining are very large, we might
need a random sampling step over the target databases to train the neural networks. But
random sampling might not generate perfect samples that are well balanced with respect
to the whole population, so knowledge models found from random samples are exposed to
possible sampling errors. An alternative strategy may be to use the original database, but
this may not be a good idea: it may be computationally very expensive, and the generated
neural networks may not be good enough to show any real improvement in error rates. In this
paper we look for a better way of sampling from large data sets such as census data when the
target artificial neural network is an RBFN. This is an extended version of the paper
presented at ICHIT’09 [10].
In section 2 we review the work related to our research, and in section 3 we present our
method. Experiments run to see the effect of the method are reported in section 4.
Finally, section 5 provides some conclusions.
2. Related work
Artificial neural networks have been very successful in the field of machine learning
since the pioneering book ‘Parallel Distributed Processing’ [11]. There are two kinds of
neural networks, distinguished by how the networks are interconnected: feed-forward neural
networks and recurrent neural networks [12]. RBFNs are one of the most popular feed-forward
networks. Even though RBFNs have three layers, including the input layer, like multilayer
perceptrons, they differ from them because in RBFNs the hidden units perform some
computation. Many researchers have reported successful applications of RBFNs [13, 14, 15].
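To make this concrete, the following is a minimal Python sketch of what an RBFN with
Gaussian hidden units computes; the centres, widths, and output weights are purely
illustrative values, not parameters of any network discussed in this paper.
----------------------------------------------------------------------------------------------------
import numpy as np

def rbfn_output(x, centres, sigmas, weights, bias=0.0):
    # Each hidden unit j responds with exp(-||x - c_j||^2 / (2 * sigma_j^2)).
    dists = np.linalg.norm(centres - x, axis=1)
    hidden = np.exp(-(dists ** 2) / (2.0 * sigmas ** 2))
    # The output layer is a linear combination of the hidden responses.
    return hidden @ weights + bias

# Illustrative parameters only: two hidden units over a two-dimensional input space.
centres = np.array([[0.0, 0.0], [1.0, 1.0]])
sigmas = np.array([0.5, 0.5])
weights = np.array([1.0, -1.0])
print(rbfn_output(np.array([0.2, 0.1]), centres, sigmas, weights))
----------------------------------------------------------------------------------------------------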
On the other hand, decision tree algorithms are also among the representative data
mining methods. There have been many efforts to build better decision trees with respect to
error rates, and as a way to achieve this goal many splitting criteria have been invented.
For example, C4.5 [16], one of the representative decision tree algorithms, uses an
entropy-based measure, while CART [17] uses a purity-based measure. C4.5 generates decision
trees in a quick and dirty manner, while CART spends more time to generate more optimized
decision trees. There have also been scalability-related efforts to generate decision trees
for large databases, with the intention of applying them to data mining. For example,
SLIQ [18], SPRINT [19], and PUBLIC [20] are scalable decision tree algorithms. SLIQ saves
computing time, especially when a data set contains many continuous attributes, by using a
pre-sorting technique in the tree-growth phase. SPRINT is an improved version of SLIQ that
solves the scalability problem by building trees in parallel. PUBLIC tries to save some
computing time by integrating the pruning step with the generation of branches.
There is also research on sample size [21, 22] and the properties of samples [23], as well
as on sampling methods [24]. In [21] the effect of sample size on parameter estimates is
discussed for a family of classifier functions. In [22] small-sized samples are preferred
for feature selection and error estimation for several classifiers. In [23] the authors
showed that class imbalance in training data affects neural network development, especially
in the medical domain. In [24] several re-sampling techniques, such as cross-validation and
leave-one-out, are tested to see their effect on the performance of neural networks, and it
was found that the re-sampling techniques yield very different accuracies depending on the
feature space and the sample size.
3. Suggested method
Because our target data set is very large and we do not want to spend too much time on
sampling, the method first builds a decision tree using a fast decision tree generation
algorithm such as C4.5. Then we do random sampling based on the structure of the generated
decision tree. Random sampling is done for each branch of the decision tree, and the size of
each sample depends on the number of training objects in that branch. In other words, to
find reasonable sample sizes we use the number of training objects in the terminal nodes of
the generated decision tree.
The number of training examples in a terminal node of the decision tree can be small, so
in order to get data sets big enough for random sampling, we integrate sibling terminal
nodes until the number of training objects reaches some predefined limit. The predefined
limit should be large enough to allow random sampling, but small enough to divide the
training objects into groups of proper size. The following are the steps of the method:
----------------------------------------------------------------------------------------------------
1. Generate a decision tree with a fast algorithm for the whole data set;
2. j := 0;
3. Do
      Do
         Integrate sibling terminal nodes of the decision tree in bottom-up and left-to-right manner;
      While the number of training objects in the integrated node < predefined limit;
      j++;
      Let the training objects be Dj;
   Until there is no node to visit;
   the_number_of_sampling_groups := j;
4. For i := 1 to the_number_of_sampling_groups Do
      Do random sampling of size k for each Di, where k = (target sample size) × |Di| / Σ|Di|;
      Let the sample set be Si;
   End do
5. Integrate all the random samples Si (i = 1 ~ the_number_of_sampling_groups);
----------------------------------------------------------------------------------------------------
The integrated final samples will be used to train RBFNs.
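As an illustration only, the following Python sketch outlines how the steps above might be
realized with generic tools; it is not the implementation used in this paper. A scikit-learn
(CART-style) decision tree stands in for C4.5, the parameters min_samples_leaf, target_size,
and limit are assumptions, and merging sibling terminal nodes bottom-up is simplified to
merging leaves in left-to-right order until the predefined limit is reached.
----------------------------------------------------------------------------------------------------
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def tree_guided_sample(X, y, target_size, limit=30000, random_state=0):
    rng = np.random.default_rng(random_state)

    # Step 1: generate a decision tree for the whole data set
    # (a CART-style tree is used here in place of C4.5).
    tree = DecisionTreeClassifier(min_samples_leaf=50, random_state=random_state)
    tree.fit(X, y)

    # Map every training object to the terminal node (leaf) it falls into.
    leaf_ids = tree.apply(X)

    # Steps 2-3: merge terminal nodes, in left-to-right order, until each
    # group D_j holds at least `limit` training objects.
    groups, current = [], []
    for leaf in np.unique(leaf_ids):
        current.append(np.where(leaf_ids == leaf)[0])
        if sum(len(part) for part in current) >= limit:
            groups.append(np.concatenate(current))
            current = []
    if current:                      # leftover objects form the last group
        groups.append(np.concatenate(current))

    # Steps 4-5: random sampling of size k_i = target_size * |D_i| / sum|D_i|
    # within each group, then integration of all group samples.
    total = sum(len(g) for g in groups)
    picked = []
    for g in groups:
        k = int(round(target_size * len(g) / total))
        picked.append(rng.choice(g, size=min(k, len(g)), replace=False))
    idx = np.concatenate(picked)
    return X[idx], y[idx]
----------------------------------------------------------------------------------------------------
With the settings of section 4, such a sketch would be called, for example, as
tree_guided_sample(X_train, y_train, target_size=5000, limit=30000).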
4. Experiments
The 'census-income' data set in the UCI machine learning repository [25] is a census data
set. It has 199,523 objects for training and 99,762 objects for testing. There are 41
attributes, 8 of which are continuous-valued. The class probabilities for the class values
'-50000' and '50000+' are 93.8% and 6.2%, respectively. C4.5 [16] was used to generate a
decision tree for the whole training data. Because the data set has continuous attributes,
entropy-based discretization [26] was performed before building the tree so that the
decision tree could be generated rapidly. The generated tree has 1,821 nodes, and the total
number of leaves is 1,661. After the decision tree had been generated, our sampling method
was applied. The predefined limit on the number of training examples for node integration
was set to 30,000 for the experiment. Table 1 shows the groups produced by the suggested
method. In the table, X = '(x1, x2]' means x1 < X ≤ x2.
Table 1. Groups of objects

Group no. | Property of objects | Number of objects
1 | capital_gains = '(-∞, 57]' & dividends_from_stocks = '(-∞, 0.5]' & weeks_worked_in_year = '(-∞, 0.5]' | 89,427
2 | capital_gains = '(-∞, 57]' & dividends_from_stocks = '(-∞, 0.5]' & weeks_worked_in_year = '(0.5, 51.5]' | 28,617
3 | capital_gains = '(-∞, 57]' & dividends_from_stocks = '(-∞, 0.5]' & weeks_worked_in_year = '(50.5, ∞]' & capital_loses = '(-∞, 1,881.5]' & sex = Female | 25,510
4 | capital_gains = '(-∞, 57]' & dividends_from_stocks = '(-∞, 0.5]' & weeks_worked_in_year = '(50.5, ∞]' & capital_loses = '(-∞, 1,881.5]' & sex = Male | 28,410
5 | capital_gains = '(-∞, 57]' & dividends_from_stocks = '(-∞, 0.5]' & weeks_worked_in_year = '(50.5, ∞]' & capital_loses = '(1,881.5, ∞]' | 1,132
6 | capital_gains = '(-∞, 57]' & dividends_from_stocks = '(0.5, ∞]' | 19,048
7 | capital_gains = '(57, ∞]' | 7,379
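For illustration, the seven groups together contain the 199,523 training objects, so for a
target sample size of 5,000 the proportional allocation of step 4 draws about
5,000 × 89,427 / 199,523 ≈ 2,241 objects from group 1, but only about
5,000 × 1,132 / 199,523 ≈ 28 objects from group 5.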
Tables 2 to 5 summarize the accuracy of RBFNs with the conventional and the suggested
sampling method for several sample sizes. Sampling was performed four times for each sample
size of 1,000, 2,000, 5,000, and 10,000. Testing was done on the test data. The radial basis
function used is the Gaussian.
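A rough sketch of this evaluation protocol is given below, under assumptions not stated in
the paper: the paper does not name the software used, so k-means centres, Gaussian-kernel
features, and a logistic-regression output layer stand in for the RBFN training procedure,
and draw_sample is a hypothetical callback representing either conventional simple random
sampling or the suggested method.
----------------------------------------------------------------------------------------------------
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import rbf_kernel

def rbfn_accuracy(X_tr, y_tr, X_te, y_te, n_centres=50, gamma=0.1):
    # Hidden layer: Gaussian responses around k-means centres.
    centres = KMeans(n_clusters=n_centres, n_init=10, random_state=0).fit(X_tr).cluster_centers_
    phi_tr = rbf_kernel(X_tr, centres, gamma=gamma)
    phi_te = rbf_kernel(X_te, centres, gamma=gamma)
    # Output layer: a linear classifier on the Gaussian features.
    clf = LogisticRegression(max_iter=1000).fit(phi_tr, y_tr)
    return clf.score(phi_te, y_te)           # accuracy on the fixed test data

def run_protocol(X_train, y_train, X_test, y_test, draw_sample):
    rng = np.random.default_rng(0)
    for size in (1000, 2000, 5000, 10000):   # the four sample sizes of section 4
        accuracies = []
        for _ in range(4):                   # sampling is repeated four times
            X_s, y_s = draw_sample(X_train, y_train, size, rng)
            accuracies.append(rbfn_accuracy(X_s, y_s, X_test, y_test))
        print(size, np.mean(accuracies))
----------------------------------------------------------------------------------------------------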
Table 2. Accuracy of RBFNs for conventional and suggested method with sample size 1,000

Sample size: 1,000 | Accuracy in conventional method (%) | Accuracy in suggested method (%)
Sample 1 | 93.5857 | 93.5958
Sample 2 | 94.1511 | 93.4865
Sample 3 | 93.1006 | 93.9035
Sample 4 | 93.7802 | 93.5396
Average | 93.6544 | 93.6321
Table 3. Accuracy of RBFNs for conventional and suggested method with sample size 2,000

Sample size: 2,000 | Accuracy in conventional method (%) | Accuracy in suggested method (%)
Sample 1 | 93.5476 | 93.4614
Sample 2 | 93.6439 | 93.5948
Sample 3 | 94.6503 | 93.7992
Sample 4 | 93.3412 | 93.6970
Average | 93.79573 | 93.6336
Table 4. Accuracy of RBFNs for conventional and suggested method with sample size 5,000

Sample size: 5,000 | Accuracy in conventional method (%) | Accuracy in suggested method (%)
Sample 1 | 93.6870 | 94.0208
Sample 2 | 94.1491 | 93.7762
Sample 3 | 93.6168 | 94.1621
Sample 4 | 93.8534 | 93.5166
Average | 93.82658 | 93.86893
Table 5. Accuracy of RBFNs for conventional and suggested method with sample size 10,000

Sample size: 10,000 | Accuracy in conventional method (%) | Accuracy in suggested method (%)
Sample 1 | 93.5707 | 93.9666
Sample 2 | 93.7311 | 94.0739
Sample 3 | 93.7992 | 93.9015
Sample 4 | 93.9596 | 93.9356
Average | 93.76515 | 93.9694
If we compare the accuracies of RBFNs for each sample size in the tables, our sampling
method performs worse than the conventional sampling method for the smaller sample sizes of
1,000 and 2,000, but the trend is reversed for the larger sample sizes of 5,000 and 10,000.
In other words, the accuracy obtained with the conventional sampling method shows a roughly
logarithmic increase, whereas the suggested method generates RBFNs whose accuracy increases
more monotonically. All in all, while the conventional sampling method cannot improve the
performance of RBFNs much as the sample size grows, our sampling method improves performance
considerably as the sample size grows.
5. Conclusions
Many kinds of artificial neural networks have been reported to be very successful. Among
them, radial basis function networks (RBFNs) are reported to perform better than other
neural networks in some domains. Traditionally, machine learning researchers have not dealt
with very large data sets, and the same is true for artificial neural networks. On the other
hand, the main concern of data mining researchers is to find hidden, important knowledge in
very large data sets, so most data mining systems based on artificial neural networks rely
on some form of sampling.
Because most target databases for data mining are very large, we need a random sampling
step over the target databases to train the neural networks. But conventional random
sampling might not generate perfect samples that are good for RBFNs, because known weak
points of RBFNs are poor performance in the presence of irrelevant features and erroneous
data, and real-world data often have these characteristics. So we might need a better way of
sampling if RBFNs are to be applied to data mining. This paper showed a better way of
sampling from large data sets such as census data when the target artificial neural network
is an RBFN.
In order to train RBFNs on a target data set of very large size, this paper suggested a
sampling method that considers the upper structure of a decision tree. By dividing the
target data set into several groups of different sizes, based on where each object falls in
the decision tree, and by randomly picking a different number of objects from each group in
proportion to the group's size, the method exploits the structure of decision trees. For the
experiment, the census-income data set in the UCI machine learning repository was selected.
The experiments showed that when the sample size is relatively small the conventional
sampling method is better, but when the sample size is relatively large the suggested
sampling method gives better results.
References
[1] C.M. Bishop, Neural Networks for Pattern Recognition, Oxford University press, 1995.
[2] D.E. Rumelhart, G.E. Hinton, R.J. Williams, Learning Internal Representation by Error Propagation, In
Parallel Distributed Processing, vol. 1, Rumelhart, D.E., McClelland, J.L. Eds., The MIT Press, 1986.
[3] J.J. Hopfield, Neural networks and physical systems with emergent collective computational abilities,
Proceedings of the National Academy of Sciences, vol. 79, pp. 2554-2558, 1982.
[4] G.A. Carpenter, S. Grossberg, ART-3: Hierarchical Searching Using Chemical Transmitters in Self-Organizing Pattern Recognition Architectures, Neural Networks, vol. 3, pp. 129-152, 1990.
[5] G.E. Hinton, T.J. Sejnowski, D.H. Ackley, Boltzmann machines: Constraint satisfaction networks that learn,
Carnegie-Mellon University, Technical Report CMU-CS-84-119, 1984.
[6] K. Fukushima, S. Miyake, T. Ito, Neocognitron: A neural network model for a mechanism of visual
pattern recognition, IEEE Transactions on Systems, Man and Cybernetics, vol. 3, no. 5, pp. 826-834,
1983.
[7] L. Nikolaos, Radial basis Function Networks to Hybrid Neuro-Genetic RBFNs in Financial Evaluation
of Corporations, International Journal of Computers, vol. 2, issue 2, pp. 176-183, 2008.
[8] A. Hofmann, B. Sick, Evolutionary Optimization of Radial Basis Function Networks for Intrusion
Detection, Proceedings of the International Joint Conference on Neural Networks, vol. 1, pp. 415-420, 2003.
[9] P. Tan, M. Steinbach, V. Kumar, Introduction to Data Mining. Addison Wesley, 2006.
[10] H. Sug, Sampling Scheme for Better RBF network, Proceeding of International Conference on
Convergence & Hybrid Information Technology, pp. 413-416, 2009.
[11] D.E. Rumelhart, J.L. McClelland, eds., Parallel Distributed Processing, The MIT Press, vol. 1, 1986.
[12] P. Tan, M. Steinbach, V. Kumar, Introduction to Data Mining, Addison Wesley, 2006.
[13] G. Baylor, E.I. Konukseven, A.B. Koku, Control of a Differentially Driven Mobile Robot Using
Radial Basis Function Based Neural Networks, WSEAS Transactions on Systems and Control, vol. 3,
issue 12, pp. 1002-1013, 2008.
[14] A. Esposito, M. Marinaro, D. Oricchio, S. Scarpetta, Approximation of Continuous and
Discontinuous Mappings by a Growing Neural RBF-based Algorithm, Neural Networks, Vol. 13, No.
6, pp. 651-656, 2000.
[15] O. Buchtala, M. Klimek, B. Sick, Evolutionary Optimization of Radial Basis Function Classifiers for
Data Mining Applications, IEEE Transactions on Systems, Man, and Cybernetics—Part B:
Cybernetics, Vol. 35, No. 5, pp. 928-947, 2005.
[16] J.R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, Inc., 1993.
[17] L. Breiman, J. Friedman, R. Olshen, C. Stone, Classification and Regression Trees, Wadsworth
International Group, 1984.
[18] M. Mehta, R. Agrawal, J. Rissanen, SLIQ: A Fast Scalable Classifier for Data Mining, Proceedings
of EDBT'96, Avignon, France, 1996.
[19] J. Shafer, R. Agrawal, M. Mehta, SPRINT: A Scalable Parallel Classifier for Data Mining,
Proceedings of Int. Conf. Very Large Data Bases, Bombay, India, pp. 544-555, 1996.
[20] R. Rastogi, K. Shim, PUBLIC: A Decision Tree Classifier that Integrates Building and Pruning,
Data Mining and Knowledge Discovery, vol. 4, no. 4, Kluwer International, pp. 315-344, 2002.
[21] K. Fukunaga, R.R. Hayes, Effects of Sample Size in Classifier Design, IEEE Transactions on Pattern
Analysis and Machine Intelligence, vol. 11, issue 8, pp. 873-885, 1989.
[22] S.J. Raudys, A.K. Jain, Small Sample Size Effects in Statistical Pattern recognition:
Recommendations for Practitioners, IEEE Transactions on Pattern Analysis and Machine Intelligence,
vol. 13, no. 3, pp. 252-264, 1991.
[23] M.A. Mazuro, P.A. Habas, J.M. Zurada, J.Y. Lo, J.A. Baker, G.D. Tourassi, Training neural network
classifiers for medical decision making: The effects of imbalanced datasets on classification
performance, Neural Networks, vol. 21, issues 2-3, pp. 427-436, 2008.
[24] S. Berkman, H. Chan, L. Hadjiiski, Classifier performance estimation under the constraint of a finite
sample size: Resampling scheme applied to neural network classifiers, Neural Networks, vol. 21,
issues 2-3, pp. 476–483, 2008.
[25] D. Newman, UCI KDD Archive [http://kdd.ics.uci.edu], Irvine, CA: University of California,
Department of Information and Computer Science, 2005.
[26] H. Liu, F. Hussain, C.L. Tan, M. Dash, Discretization: An Enabling Technique, Data Mining and
Knowledge Discovery, vol. 6, no. 4, pp. 393-423, 2002.
Author
Hyontai Sug received his BS from Pusan National University, Busan,
Korea, majoring in Computer Science and Statistics, in 1983, his MS from
Hankuk University of Foreign Studies, Seoul, Korea, majoring in
Computer Science, in 1986, and his Ph.D. from the University of Florida,
majoring in Computer Engineering, in 1998. He was a researcher at the
Agency for Defense Development, Korea, from 1986 to 1992, and a
full-time lecturer at Pusan University of Foreign Studies, Busan, Korea,
from 1999 to 2001. He has been an associate professor at Dongseo
University, Busan, Korea, since 2001. His research interests include data mining, knowledge
engineering, and databases.