Performance Analysis of Various Activation Functions in
Generalized MLP Architectures of Neural Networks
Bekir Karlik
[email protected]
Engineering Faculty/Computer Engineering/
Mevlana University, Konya, 42003,
Turkey
A Vehbi Olgac
[email protected]
Engineering Faculty/Computer Engineering/
Fatih University, Istanbul, 34500,
Turkey
Abstract
The activation function is used to transform the activation level of a unit (neuron) into an output
signal. There are a number of common activation functions in use with artificial neural networks
(ANN). In research and engineering, the most common choice of activation (transfer) function for the
multi-layered perceptron (MLP) is the sigmoid. Among the reasons for this popularity are
its boundedness in the unit interval, the fast computability of the function and its derivative, and a
number of amenable mathematical properties in the realm of approximation theory. However,
considering the huge variety of problem domains in which the MLP is applied, it is intriguing to suspect that
specific problems call for a single specific activation function or a specific set of them. The aim of this study is to
analyze the performance of generalized MLP architectures trained with the back-propagation algorithm
using various activation functions for the neurons of the hidden and output layers. For the
experimental comparisons, the Bi-polar sigmoid, Uni-polar sigmoid, Tanh, Conic Section, and Radial
Basis Function (RBF) were used.
Keywords: Activation Functions, Multi Layered Perceptron, Neural Networks, Performance Analysis
1. INTRODUCTION
One of the most attractive properties of ANNs is the possibility to adapt their behavior to the
changing characteristics of the modeled system. Over the last few decades, many researchers have
investigated a variety of methods to improve ANN performance by optimizing training methods,
learning parameters, or network structure, but comparatively little work has been done on the choice of
activation functions. The radial basis function (RBF) neural network is one of the most popular neural network
architectures [1]. The standard sigmoid reaches an approximation power comparable to or better
than classes of more established functions investigated in approximation theory (e.g., splines
and polynomials) [2]. Jordan presented the logistic function which is a natural representation of
the posterior probability in a binary classification problem [3]. Liu and Yao improved the structure
of Generalized Neural Networks (GNN) with two different activation function types which are
sigmoid and Gaussian basis functions [4]. Sopena et al. presented a number of experiments (with
widely–used benchmark problems) showing that multilayer feed–forward networks with a sine
activation function learn two orders of magnitude faster while generalization capacity increases
(compared to ANNs with logistic activation function) [5]. Dorffner developed the Conic Section
Function Neural Network (CSFNN) which is a unified framework for MLP and RBF networks to
make simultaneous use of the advantages of both networks [6]. Bodyanskiy et al. presented a novel
double-wavelet neuron architecture, obtained by modifying the standard wavelet neuron, together
with its learning algorithms; the proposed architecture improves the
approximation properties of the wavelet neuron [7]. All of these well-known activation functions are
used in the nodes of each layer of an MLP to solve different non-linear problems. However, there are no
comparative studies of their performance. Therefore, in this study we
compare the performance of five well-known activation functions: the Bi-polar sigmoid, Uni-polar sigmoid,
Tanh, Conic Section, and Radial Basis Function (RBF).
2. ACTIVATION FUNCTION TYPES
Each unit in a neural network transforms its net input by means of a scalar-to-scalar function called
“the activation function” (also known as the threshold function or transfer function), producing a result
value called the “unit's activation”. An activation function limits the amplitude of the output of
a neuron; functions that confine the output to a limited range are usually called squashing functions [8-9],
since they squash the permissible amplitude range of the output signal to some finite value. Some of the
most commonly used activation functions for solving non-linear problems are the
Uni-polar sigmoid, Bi-polar sigmoid, Tanh, Conic Section, and Radial Basis Function (RBF). We
did not consider activation functions such as the identity function or the (binary) step
function, as they are not suitable for solving non-linear problems.
2.1 Uni-Polar Sigmoid Function
The Uni-polar sigmoid activation function is given as follows:

g(x) = \frac{1}{1 + e^{-x}}    (1)
This function is especially advantageous for use in neural networks trained by back-propagation
algorithms, because it is easy to differentiate and its simple derivative reduces the computational
effort of training. The term sigmoid means ‘S-shaped’, and the logistic form of the sigmoid maps
the interval (-∞, ∞) onto (0, 1), as seen in FIGURE 1.
FIGURE 1: Uni-Polar Sigmoid Function
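As an illustration only (not part of the original study), the following minimal NumPy sketch implements Eq. (1) and its derivative; the fact that the derivative can be written as g(x)(1 - g(x)) is what keeps back-propagation with this function computationally cheap.

```python
import numpy as np

def unipolar_sigmoid(x):
    """Uni-polar (logistic) sigmoid of Eq. (1): maps (-inf, inf) onto (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def unipolar_sigmoid_derivative(x):
    """Derivative written in terms of the function value itself,
    g'(x) = g(x) * (1 - g(x)), which keeps back-propagation cheap."""
    g = unipolar_sigmoid(x)
    return g * (1.0 - g)

x = np.linspace(-6.0, 6.0, 13)
print(unipolar_sigmoid(x))             # values strictly between 0 and 1
print(unipolar_sigmoid_derivative(x))  # maximum 0.25 at x = 0
```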
2.2 Bipolar Sigmoid Function
The Bi-polar sigmoid activation function is given by

g(x) = \frac{1 - e^{-x}}{1 + e^{-x}}    (2)
This function is similar to the sigmoid function. The activation function shown in Fig. 2 is well
suited to applications that produce output values in the range [-1, 1].
FIGURE 2: Bi-Polar Sigmoid Function
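A corresponding, purely illustrative sketch of Eq. (2) (not taken from the paper) shows that the function passes through the origin and saturates near -1 and 1:

```python
import numpy as np

def bipolar_sigmoid(x):
    """Bi-polar sigmoid of Eq. (2): maps (-inf, inf) onto (-1, 1)."""
    return (1.0 - np.exp(-x)) / (1.0 + np.exp(-x))

x = np.array([-6.0, 0.0, 6.0])
print(bipolar_sigmoid(x))  # approximately [-0.995, 0.0, 0.995]
```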
2.3 Hyperbolic Tangent Function
This function is defined as the ratio between the hyperbolic sine and hyperbolic cosine functions, or
equivalently as the ratio of the half-difference and half-sum of two exponential functions at the
points x and -x, as follows:

\tanh(x) = \frac{\sinh(x)}{\cosh(x)} = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}    (3)
The hyperbolic tangent function is similar to the sigmoid function; its output ranges between -1 and 1,
as seen in FIGURE 3, which plots the function for real values of its argument x.
FIGURE 3: Hyperbolic Tangent Function
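The close relationship between Figures 2 and 3 can be checked numerically: the bi-polar sigmoid of Eq. (2) is exactly tanh(x/2), so Eq. (3) is the bi-polar sigmoid evaluated at 2x. This is a small verification sketch, not code from the paper:

```python
import numpy as np

def bipolar_sigmoid(x):
    return (1.0 - np.exp(-x)) / (1.0 + np.exp(-x))

x = np.linspace(-6.0, 6.0, 25)
# Eq. (3) is the bi-polar sigmoid of Eq. (2) evaluated at 2x: tanh(x) = g(2x).
print(np.allclose(np.tanh(x), bipolar_sigmoid(2.0 * x)))  # True
```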
2.4 Radial Basis Function
The radial basis function (RBF) is based on the Gaussian curve. It takes a parameter that determines
the center (mean) value of the function (see Fig. 4). A radial basis
function (RBF) is a real-valued function whose value depends only on the distance from the
origin, so that
g(x) = g(\|x\|)    (4)
or alternatively on the distance from some other point c, called a center, so that
g(x, c) = g(\|x - c\|)    (5)
Sums of radial basis functions are typically used to approximate given functions. This
approximation process can also be interpreted as a simple type of neural network.
RBFs are typically used to build up function approximations of the form

y(x) = \sum_{i=1}^{N} w_i \, g(\|x - c_i\|)    (6)
where the approximating function y(x) is represented as a sum of N radial basis functions, each
associated with a different center ci, and weighted by an appropriate coefficient wi. The weights wi
can be estimated using the matrix methods of linear least squares, because the approximating
function is linear in the weights. Fig. 4 shows two unnormalized Gaussian radial basis
functions in one input dimension. The basis function centers are located at c1=0.75 and c2=3.25
[10].
FIGURE 4: Two unnormalized Gaussian radial basis functions in one input dimension
Such a sum can also be interpreted as a rather simple single-layer type of artificial neural network called
a radial basis function network, with the radial basis functions taking on the role of the activation
functions of the network. It can be shown that any continuous function on a compact interval can
in principle be interpolated with arbitrary accuracy by a sum of this form, if a sufficiently large
number N of radial basis functions is used.
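To make Eqs. (4)-(6) concrete, here is a hedged one-dimensional sketch (the target function, centers, and width are arbitrary illustrative choices, not taken from the paper) in which the weights w_i of Eq. (6) are fitted by linear least squares, exactly because y(x) is linear in the weights:

```python
import numpy as np

def gaussian_rbf(r, width=1.0):
    """Unnormalized Gaussian basis g(r), a common choice for Eqs. (4)-(6)."""
    return np.exp(-(r ** 2) / (2.0 * width ** 2))

# Hypothetical 1-D example (target, centers, and width are arbitrary choices).
x = np.linspace(0.0, 4.0, 50)
target = np.sin(x)                     # stand-in target function
centers = np.linspace(0.0, 4.0, 10)    # the centers c_i of Eq. (6)

# Design matrix Phi[j, i] = g(|x_j - c_i|); since y(x) is linear in the
# weights w_i, they can be fitted by linear least squares.
Phi = gaussian_rbf(np.abs(x[:, None] - centers[None, :]))
w, *_ = np.linalg.lstsq(Phi, target, rcond=None)

print(np.max(np.abs(Phi @ w - target)))  # small residual on the training grid
```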
2.5 Conic Section Function
The conic section function (CSF) is, as its name implies, based on a section of a cone. The CSF takes a
parameter that determines the opening angle of the function, as seen in Fig. 5.
FIGURE 5: Conic Section Parabola (ω = 90)
The equation of CSF can be defined as follows:
f(x) = \sum_{i=1}^{N+1} (a_i - c_i)\, w_i - \cos(2\omega)\, \|a - c\|    (7)

where a_i is the i-th input, c_i is the center, w_i is the weight as in the Multi Layered Perceptron (MLP),
and 2ω is an opening angle that can take any value in the range [-π/2, π/2] and determines the
different forms of the decision borders.
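Since Eq. (7) had to be reconstructed from a degraded source, the following sketch should be read as an assumption about the usual CSFNN propagation rule rather than a faithful reproduction of the paper: an MLP-like dot-product term is blended with an RBF-like Euclidean-distance term through cos(2ω).

```python
import numpy as np

def conic_section_unit(a, c, w, omega):
    """Hedged sketch of one CSF unit, following Eq. (7) as reconstructed above:
    an MLP-like dot-product term minus cos(2*omega) times the RBF-like
    Euclidean distance. The exact form in the original source may differ."""
    linear_part = np.dot(a - c, w)         # MLP-style propagation
    distance_part = np.linalg.norm(a - c)  # RBF-style propagation
    return linear_part - np.cos(2.0 * omega) * distance_part

a = np.array([0.2, 0.7, 0.1])    # hypothetical input vector
c = np.zeros(3)                   # center
w = np.array([0.5, -0.3, 0.8])    # MLP-style weights
# When cos(2*omega) vanishes (2*omega = pi/2) the unit reduces to the pure
# MLP dot product; other angles mix in the distance term.
print(np.isclose(conic_section_unit(a, c, w, np.pi / 4.0),
                 np.dot(a - c, w)))  # True
```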
The hidden units of a neural network need activation functions to introduce non-linearity into the
network; it is this non-linearity that makes multi-layer networks so effective. The sigmoid functions are the
most widely used [10-11]. The activation function of the output units should be chosen to suit the
distribution of the target values: for binary outputs, the hyperbolic tangent and sigmoid functions are
effective choices, and if the target values are positive but have no known upper bound, an exponential
output activation function can be used. This work has considered the main variations on the parallel
distributed processing idea of neural networks; the structure of each network has very similar parts
which perform the processing.
3. COMPARISON WITH DIFFERENT ACTIVATION FUNCTIONS
In this study, the performances of the different activation functions were compared on the same data
for different numbers of iterations. For all the activation functions, two hidden-layer sizes were used:
first 10 nodes and then 40 nodes (with the same number of iterations for each). The graphs for the
different parameters are presented in Figures 6 through 10, and interpretations of the results follow them.
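The data set and training details are not reproduced here, so the following is only a hypothetical sketch of how such a comparison could be organized: a one-hidden-layer MLP trained by plain back-propagation (the generalized delta rule) with a selectable activation function, hidden-layer size, and iteration count, evaluated on XOR as a stand-in non-linear problem.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bipolar(x):
    return (1.0 - np.exp(-x)) / (1.0 + np.exp(-x))

# (activation, derivative) pairs for three of the functions compared above.
ACTIVATIONS = {
    "uni-polar": (sigmoid, lambda z: sigmoid(z) * (1.0 - sigmoid(z))),
    "bi-polar":  (bipolar, lambda z: 0.5 * (1.0 - bipolar(z) ** 2)),
    "tanh":      (np.tanh, lambda z: 1.0 - np.tanh(z) ** 2),
}

def train_mlp(X, y, activation, hidden=10, iterations=100, lr=0.5, seed=0):
    """One-hidden-layer MLP trained by plain back-propagation on an MSE loss,
    using the same activation in the hidden and output layers."""
    g, g_prime = ACTIVATIONS[activation]
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], hidden))
    W2 = rng.normal(scale=0.5, size=(hidden, 1))
    for _ in range(iterations):
        z1 = X @ W1
        h = g(z1)                 # hidden-layer activations
        z2 = h @ W2
        out = g(z2)               # output-layer activations
        err = out - y
        d2 = err * g_prime(z2)    # back-propagated output deltas
        d1 = (d2 @ W2.T) * g_prime(z1)
        W2 -= lr * h.T @ d2 / len(X)
        W1 -= lr * X.T @ d1 / len(X)
    return float(np.mean(err ** 2))

# XOR as a stand-in non-linear problem; the paper's data set is not given here.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)
for name in ACTIVATIONS:
    print(name, train_mlp(X, y, name, hidden=10, iterations=100))
```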
FIGURE 6: 100 Iterations - 10 Hidden Neurons - Bi-Polar Sigmoid
FIGURE 7: 100 Iterations - 10 Hidden Neurons - Uni-Polar Sigmoid
FIGURE 8: 100 Iterations - 10 Hidden Neurons - Tangent Hyperbolic
We used the same number of hidden neurons (10) and iterations (100) to compare the differences
between the activation functions shown above.
FIGURE 9: 100 Iterations - 10 Hidden Neurons - Conic Section
FIGURE 10: 100 Iterations - 10 Hidden Neurons – RBF
According to Figures 6 through 15, the Conic Section function was the best activation function for
training; moreover, it requires fewer iterations than the other activation functions to solve the
non-linear problems. However, with regard to testing, the accuracy of the Tanh activation function was
much better than that of CSF and the other activation functions. This shows that the total Mean Square
Error (MSE) over the iterations cannot by itself determine the network accuracy; the real accuracy can
only be obtained by testing.
FIGURE 11: 500 Iterations - 40 Hidden Neurons - Bi-Polar Sigmoid
FIGURE 12: 500 Iterations - 40 Hidden Neurons - Uni-Polar Sigmoid
Figures 11 through 15 show the error graphs for each activation function when the number of nodes in
the hidden layer is increased from 10 to 40 and the number of iterations from 100 to 500.
FIGURE 13: 500 Iterations - 40 Hidden Neurons - Tangent Hyperbolic
FIGURE 14: 500 Iterations - 40 Hidden Neurons - Conic Section
FIGURE 15: 500 Iterations - 40 Hidden Neurons – RBF
According to the first five graphs, generated with the same number of iterations and the same number
of neurons in the hidden layer, the hyperbolic tangent activation function was the most successful one
for all the test parameters, while the Conic Section Function and Radial Basis Function were not able
to handle the data type and the network parameters successfully. The second group of result graphs is
similar to the first group, but the accuracy was observed to increase for the sigmoid and hyperbolic
tangent functions; the remaining functions used in this study were not successful or accurate enough in
this group. Table 1 shows the total accuracy and error values for the testing phase.
                   100 Iterations, 10 Hidden Neurons    500 Iterations, 40 Hidden Neurons
                   Error        Accuracy (%)            Error        Accuracy (%)
Bi-Polar S.        0.056        93                      0.034        89
Uni-Polar S.       0.026        92                      0.006        88
Tanh               0.025        95                      0.002        99
Conic              0.001        34                      0.045        23
RBF                0.003        30                      0.001        19

TABLE 1: Results of the testing phase
4. CONCLUSIONS
In this study, we have used five conventional differentiable and monotonic activation functions for the
evaluation of an MLP architecture trained with Generalized Delta rule learning. These well-known and
effective activation functions are the Bi-polar sigmoid, Uni-polar sigmoid, Tanh, Conic Section, and
Radial Basis Function (RBF). Comparing their performances, the simulation results show that the Tanh
(hyperbolic tangent) function achieves better recognition accuracy than the other functions. In other
words, the neural network computed the best results when the “Tanh-Tanh” combination of activation
functions was used for the neurons (nodes) of both the hidden and output layers.
We presented experimental results for five different activation functions in the MLP neural network
architecture trained with Generalized Delta Rule learning to solve non-linear problems. These results
demonstrate that it is possible to improve ANN performance through the use of a more effective
activation function. According to our experimental study, the Tanh activation function is a good choice
for obtaining high accuracy in the vast majority of MLP applications. Furthermore, a validation test alone
cannot confirm the quality of the network, even if it yields a very low MSE; the real accuracy can only be
judged after thorough testing of the neural network. In future work, non-monotonic activation functions
will be investigated with respect to the learning speed and performance of the neural network.
REFERENCES
1. T. Poggio, F. Girosi, “A theory of networks for approximation and learning”. A.I. Memo No.
1140, Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 1989
2. B. DasGupta, G. Schnitger, “The Power of Approximating: A Comparison of Activation
Functions”. In Giles, C. L., Hanson, S. J., and Cowan, J. D., editors, Advances in Neural
Information Processing Systems, 5, pp. 615-622, San Mateo, CA. Morgan Kaufmann
Publishers, 1993
3. M.I. Jordan, “Why the logistic function? A tutorial discussion on probabilities and neural
networks”. Computational Cognitive Science Technical Report 9503, Massachusetts Institute
of Technology, 1995
4. Y. Liu, X. Yao, “Evolutionary Design of Artificial Neural Networks with Different Nodes”. In
Proceedings of the Third IEEE International Conference on Evolutionary Computation, pp.
570-675, 1996
5. J. M. Sopena, E. Romero, R. Alquézar, “Neural networks with periodic and monotonic
activation functions: a comparative study in classification problems”. In Proceedings of the
9th International Conference on Artificial Neural Networks, pp. 323-328, 1999
6. G. Dorffner, “Unified frameworks for MLP and RBFNs: Introducing Conic Section Function
Networks”. Cybernetics and Systems, 25: 511-554, 1994
7. Y. Bodyanskiy, N. Lamonova, O. Vynokurova, “Double-Wavelet Neuron Based on Analytical
Activation Functions”. International Journal Information Theories & Applications, 14: 281-288,
2007
8. G. Cybenko, “Approximation by superposition of a sigmoidal function”. Mathematics of
Control, Signals, and Systems, 2(4):303–314, 1989
9. R. P. Lippmann, “An introduction to computing with neural nets”. IEEE Acoustics, Speech and
Signal Processing, 4(2):4–22, 1987
10. M.D. Buhmann, Radial Basis Functions: Theory and Implementations. Cambridge
University Press, 2003. ISBN 978-0-521-63338-3
11. B. Widrow, M.A. Lehr, “30 years of adaptive neural networks: perceptron, Madaline, and
backpropagation”. Proc. IEEE, 78: 1415–1442, 1990
12. B. Ciocoiu, “Hybrid Feedforward Neural Networks for Solving Classification Problems”.
Neural Processing Letters, 16(1):81-91, 2002