Artificial Neural Networks – Lecture 5
Prof. Kang Li
Email: [email protected]

Last lecture
• Sequential BP
• Some issues in MLP training
• Advanced MLP learning algorithms

This lecture
• Radial Basis Function (RBF) networks

Radial Basis Function networks
[Figure: RBF network structure – input layer x_1 … x_n, hidden layer of basis functions h_1 … h_L, output layer y_1 … y_m, with hidden-layer weights w^h and output-layer weights w^y.]

Radial basis function: a function which depends only on the radial distance from a centre, φ(||x − c_i||). With such basis functions the XOR problem becomes quadratically separable.

Some nonlinear basis functions
(a) Multiquadric:          φ(u) = (u^2 + σ^2)^{1/2}
(b) Inverse multiquadric:  φ(u) = (u^2 + σ^2)^{-1/2}
(c) Gaussian:              φ(u) = exp(-u^2 / (2σ^2))
(d) Thin plate spline:     φ(u) = u^2 log(u)

Global support basis functions: the thin plate spline and multiquadric functions generalise globally, or have global support (φ(u) → ∞ as u → ∞): adjusting the parameters of an individual neuron in the network can affect the network output at all points of the input space. The MLP also falls into this category.

Local support basis functions: the Gaussian and inverse multiquadric (IMQ) functions are significantly greater than zero only on a finite interval around their centres (φ(u) → 0 as u → ∞). Networks with these basis functions are said to have local support, because adjusting the parameters associated with a given neuron affects only a small portion of the input space. This property is particularly useful when training a neural network on-line, as it means new information can be learned without degrading information previously learned at other points of the input space. Global support networks do not have this property.

[Figure: examples of global support functions and local support functions.]

The locality of a basis function is determined by the parameter σ, usually referred to as the width of the basis function.
[Figure: Gaussian function for σ = 0.1, 0.5, 0.7, 1.0, 1.3, 1.6.]

• The original idea is to use a weighted sum of the outputs of the basis functions for various problems such as classification, density estimation, etc.
• The approach is motivated from many directions (regularisation, Bayesian inference, classification, kernel density estimation, noisy interpolation, etc.), all of which suggest that the basis functions should be set to represent the data.
• Centres can be thought of as prototypes of the input data.
[Figure: MLP vs RBF.]

Mathematical equations for RBF
The complete network equation for an RBF network with Gaussian basis functions is

    y = Σ_{i=1}^{L} h_i exp( -||u - c_i||^2 / (2 w_i^2) )

where c_i, w_i and h_i are the centre, width and height of the i-th basis function respectively, and L is the number of hidden-layer neurons. In local support networks the approximation is generated by the neurons forming overlapping 'bumps' which combine to give the overall mapping: the more neurons used, the greater the overlap and the smoother the approximation obtained. (A short code sketch of this forward pass is given at the end of this section.)

Approximation capabilities – curse of dimensionality for RBFNN
Theorem: An RBF network consisting of a single hidden layer of radial basis functions and a linear output neuron can approximate arbitrarily well any bounded continuous function (Hartman et al., 1990).

Another important theoretical result, by Barron (1993), concerns the bounds on the approximation error of RBFs and MLPs. Barron shows that the approximation bound for a single-hidden-layer MLP with sigmoidal nonlinearities is of the order

    E = O(1 / N_h)

whereas for an RBFNN it is of the order

    E = O(1 / N_h^{2/d})

where N_h is the number of hidden nodes and d is the input dimension.
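To make the network equation above concrete, here is a minimal sketch (not from the lecture notes) of the forward pass y = Σ_i h_i exp(-||u - c_i||^2 / (2 w_i^2)) of a single-output Gaussian RBF network. The function name `rbf_forward` and the toy centres, widths, heights and input are illustrative assumptions.

```python
import numpy as np

def rbf_forward(u, centres, widths, heights):
    """Single-output Gaussian RBF network: y = sum_i h_i * exp(-||u - c_i||^2 / (2 w_i^2)).

    u       : (d,)   input vector
    centres : (L, d) centre c_i of each hidden neuron
    widths  : (L,)   width w_i of each hidden neuron
    heights : (L,)   output weight (height) h_i of each hidden neuron
    """
    sq_dist = np.sum((centres - u) ** 2, axis=1)       # ||u - c_i||^2 for every neuron
    phi = np.exp(-sq_dist / (2.0 * widths ** 2))       # hidden-layer activations ("bumps")
    return phi @ heights                               # linear output layer

# Toy usage: three hidden neurons in a 2-dimensional input space.
centres = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 0.5]])
widths = np.array([0.5, 1.0, 0.7])
heights = np.array([1.0, -0.5, 0.3])
print(rbf_forward(np.array([0.2, 0.1]), centres, widths, heights))
```

Because the Gaussian basis functions have local support, only the neurons whose centres lie close to the input u contribute noticeably to the sum.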
RBF training – a two-step approach
• The first step of the two-step approach is to select the centres and widths of the RBF neurons, without having to train the weights of the output layer.
• Traditionally this has involved placing the centres on a uniform grid, with the widths chosen as a function of the inter-neuron spacing to give good interpolation.
• Although this strategy guarantees good approximation capabilities, the number of neurons needed increases exponentially with the dimension of the mapping. This 'curse of dimensionality' is a major restriction on the use of local support networks.
• Uniform placement of the centres is also very inefficient in problems where only a small portion of the input space is active.
Solution: place a small number of neurons in a manner which reflects the distribution of the training data.

RBF training (cont.) – step one
The simplest approach is to employ a subset of the training data, selected at random, as the centres. Alternatively, a placement that is optimal with respect to the distribution of the data can be obtained by minimising the total Euclidean distance E_k between the training patterns and the closest centres, that is

    E_k = Σ_{k=1}^{N} min_i ||u_k - c(i)||

This can be done using an unsupervised training procedure known as k-means clustering.

K-means clustering algorithm – divides the data points into K subgroups based on similarity
1. Given N samples, choose initial centres c_0(i), i = 1, …, K, an initial learning rate η_0, and set the iteration counter j = 0.
2. For i = 1 to N, do
       m = arg min_k ||u_i - c_j(k)||
       c_{j+1}(m) = c_j(m) + η_j (u_i - c_j(m))
3. Reduce the learning rate so that η_{j+1} < η_j.
4. j++, go to step 2 until the centres converge.

RBF training (cont.) – step 2: linear training
The important consequence of being able to select suitable centres and widths for RBF networks is that:
• the remaining weights appear linearly in the network equation;
• they can be computed efficiently with standard linear least squares, or SVD;
• consequently, RBF networks can be trained much more rapidly than MLPs.

Cost function:

    E = (1 / (2Nm)) Σ_{i=1}^{N} (d(i) - y(i))^T (d(i) - y(i)) = (1 / (2Nm)) (D - Y)^T (D - Y)

RBF training (cont.) – definitions
Define

    φ_i = exp( -||u - c_i||^2 / (2 w_i^2) )
    y = Σ_{i=1}^{L} h_i φ_i = φ^T h,  φ = [φ_1 … φ_L]^T,  h = [h_1 … h_L]^T
    Y = Φ h,  Φ = [φ(1) … φ(N)]^T

Then

    E = (1 / (2Nm)) (D - Φh)^T (D - Φh)

and the least-squares weight vector is

    ĥ = arg min_h (1 / (2mN)) ||Φh - D||^2 = (Φ^T Φ)^{-1} Φ^T D

(Φ^T Φ)^{-1} Φ^T is the pseudo-inverse of Φ and is generally computed using the Singular Value Decomposition (SVD).

• It is also possible to treat an RBFN like an MLP, so gradient-descent algorithms are applicable to RBFNs.
• Other training algorithms are available as well, such as the OLS (orthogonal least squares) algorithm, which can both select the radial basis functions (data points) and compute the weights of the output layer.

(A code sketch of the complete two-step procedure is given after the list below.)

Problems with RBF
1. Due to the local nature of the basis functions, an RBF network has difficulty ignoring 'noisy' input dimensions, unlike an MLP.
2. The choice of basis function parameters may not be optimal for the output task, so the optimal RBF network is not achievable when the two-step training algorithm is used.
3. Because of the dependence on distance, if the variation in one input variable is small relative to the others it will contribute very little to the outcome, since (l + ε)^2 ≈ l^2. Therefore, pre-process each component of the data vectors to zero mean and unit variance via the simple transformation x̃ = (x - x̄)/σ.
4. Curse of dimensionality.
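The following is a minimal sketch of the two-step training procedure described above: step one places the centres with the sequential k-means update c(m) ← c(m) + η (u_i − c(m)) and step two solves ĥ = (Φ^T Φ)^{-1} Φ^T D with `numpy.linalg.lstsq`, which uses an SVD internally. Inputs are first standardised to zero mean and unit variance, as recommended in point 3 above. The function names, the common-width heuristic (maximum centre distance / sqrt(2K)) and the toy sine data are illustrative assumptions, not part of the lecture.

```python
import numpy as np

def kmeans_centres(U, K, lr0=0.5, n_epochs=20, seed=0):
    """Step 1: place K centres by sequential k-means, c(m) += eta * (u_i - c(m))."""
    rng = np.random.default_rng(seed)
    centres = U[rng.choice(len(U), K, replace=False)].copy()   # initialise from the data
    for j in range(n_epochs):
        eta = lr0 / (1.0 + j)                                   # decreasing learning rate
        for u in U:
            m = np.argmin(np.sum((centres - u) ** 2, axis=1))   # nearest centre
            centres[m] += eta * (u - centres[m])                # move it towards the sample
    return centres

def design_matrix(U, centres, width):
    """Phi[n, i] = exp(-||u_n - c_i||^2 / (2 w^2)) for a common width w."""
    sq = np.sum((U[:, None, :] - centres[None, :, :]) ** 2, axis=2)
    return np.exp(-sq / (2.0 * width ** 2))

def train_rbf(U, D, K=10):
    """Two-step RBF training: k-means for the centres, linear least squares for the heights."""
    mean, std = U.mean(axis=0), U.std(axis=0)
    U = (U - mean) / std                                        # zero mean, unit variance
    centres = kmeans_centres(U, K)
    d_max = np.max(np.linalg.norm(centres[:, None] - centres[None, :], axis=2))
    width = d_max / np.sqrt(2.0 * K)                            # heuristic common width
    Phi = design_matrix(U, centres, width)
    heights, *_ = np.linalg.lstsq(Phi, D, rcond=None)           # h = pinv(Phi) @ D via SVD
    return centres, width, heights, (mean, std)

def predict(U_new, centres, width, heights, norm):
    mean, std = norm
    U_new = (np.atleast_2d(U_new) - mean) / std
    return design_matrix(U_new, centres, width) @ heights

# Toy usage: fit y = sin(x) on [0, 2*pi] from noisy samples.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 2.0 * np.pi, size=(200, 1))
Y = np.sin(X[:, 0]) + 0.05 * rng.standard_normal(200)
centres, width, heights, norm = train_rbf(X, Y, K=12)
print(predict(np.array([[np.pi / 2]]), centres, width, heights, norm))  # roughly sin(pi/2) = 1
```

Since the output weights enter the network equation linearly, the second step is a convex least-squares problem with a closed-form solution, which is why RBF networks of this kind can be trained much faster than MLPs.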
Comparison of MLP and RBFN

RBF:
• hidden unit outputs are functions of the distance from a prototype vector (centre)
• localised hidden units mean that only a few contribute to the output => no interference between units => faster convergence
• one hidden layer
• hybrid learning, with supervised learning of only one set of weights
• localised approximations to nonlinear mappings

MLP:
• hidden unit outputs are monotonic functions of a weighted linear sum of the inputs
• distributed representation, as many hidden units contribute to the network output => interference between units => non-linear training => slow convergence
• can have more than one hidden layer
• global supervised learning of all weights
• global approximations to nonlinear mappings