Neural Network-Based Clustering
A. Selçuk MERCANLI
Supervisor: Assist. Prof. Dr. Turgay İBRİKÇİ

Why NN?
• Neural networks have solved a wide range of problems and have good learning capabilities. Their strengths include adaptation, ease of implementation, parallelization, speed, and flexibility.
• Neural network-based clustering is closely related to the concept of competitive learning.

Notation and Similarity
• w: weights, initially random
• K: number of clusters
• s(x, wj) = Σi=1..d wji xi

Updating Weights
• wj(t+1) = wj(t) + η(x(t) − wj(t))
• η: learning rate. If η = 0 there is no learning; if η = 1 learning is fast.
• To avoid unlimited growth of the weights, the weight vector must be normalized if the input pattern is normalized.

WTA - WTM
• The competitive learning paradigm allows learning only for the particular winning neuron that best matches the given input pattern. It is therefore known as winner-take-all (WTA).
• Learning can also occur in a cooperative way: not just the winning neuron adjusts its prototype, but all other cluster prototypes may be adapted, based on how close they are to the input pattern. This scheme is called soft competitive learning or winner-take-most (WTM).
• Hard competition: only one neuron is activated.
• Soft competition: neurons neighboring the true winner are also activated.

HARD COMPETITIVE LEARNING CLUSTERING
• Online K-means Algorithm
• Leader-Follower Clustering Algorithm
• Adaptive Resonance Theory
• Fuzzy ART

Online K-means Algorithm
1. Initialize K cluster prototype vectors m1, …, mK ∈ ℝd randomly;
2. Present a normalized input pattern x ∈ ℝd;
3. Choose the winner J that has the smallest Euclidean distance to x, J = argmin_j ||x − mj||;
4. Update the winning prototype vector towards x, mJ(new) = mJ(old) + η(x − mJ(old)), where η is the learning rate;
5. Repeat steps 2–4 until the maximum number of steps is reached.
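The online K-means steps above can be sketched in a few lines. This is a minimal illustration, not the deck's exact procedure: the constant learning rate and the random presentation order are illustrative assumptions, and step 2's input normalization is skipped for clarity.

```python
import numpy as np

def online_kmeans(data, k, eta=0.1, max_steps=1000, seed=0):
    """Online (sequential) K-means: each presented pattern updates only the winner."""
    rng = np.random.default_rng(seed)
    # Step 1: initialize K prototype vectors (here: random data points)
    prototypes = data[rng.choice(len(data), size=k, replace=False)].astype(float)
    for _ in range(max_steps):
        x = data[rng.integers(len(data))]                       # Step 2: present a pattern
        j = np.argmin(np.linalg.norm(prototypes - x, axis=1))   # Step 3: winner J
        prototypes[j] += eta * (x - prototypes[j])              # Step 4: move winner toward x
    return prototypes
```

With a decaying η the prototypes settle rather than oscillate; note that K must still be fixed in advance, which is exactly the weakness the leader-follower algorithm addresses.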
K-means Algorithm
iterate {
    Compute the distance from all points to all k centers
    Assign each point to the nearest k-center
    Compute the average of all points assigned to each k-center
    Replace the k-centers with the new averages
}
From Christophe Bisciglia, Aaron Kimball, and Sierra Michels-Slettvet, Distributed Computing Seminar, Summer 2007, p. 12.

Disadvantages of K-means
• The K-means algorithm requires the number of clusters to be determined in advance; it must be estimated through cluster analysis. An inappropriate choice of the number of clusters may distort the real clustering structure, which is why the leader-follower algorithm is needed.
• The learning rate η becomes very small in the last stages, with the disadvantage that new patterns are then learned poorly. A typical decay schedule is η(t) = η0 (η1/η0)^(t/t1), where η0 and η1 are the initial and final values of the learning rate, respectively, and t1 is the maximum number of iterations allowed.

Leader-Follower Clustering Algorithm
1. Initialize the first cluster prototype vector m1 with the first input pattern;
2. Present a normalized input pattern x;
3. Choose the winner J that is closest to x based on the Euclidean distance, J = argmin_j ||x − mj||;
4. If ||x − mJ|| < θ, update the winning prototype vector, mJ(new) = mJ(old) + η(x − mJ(old)), where η is the learning rate; otherwise, create a new cluster with its prototype vector equal to x;
5. Repeat steps 2–4 until the maximum number of steps is reached.

Leader-Follower
• Find the closest cluster center.
• If the distance is above the threshold, create a new cluster; otherwise, add the instance to that cluster and update its center.
From Johan Everts, Clustering Algorithms, Kunstmatige Intelligentie, p. 31.

Performance Analysis
• K-means: depends a lot on a priori knowledge (K); very stable.
• Leader-Follower: depends a lot on a priori knowledge (the threshold); faster but unstable.
From Johan Everts, Clustering Algorithms, Kunstmatige Intelligentie, p. 39.

Adaptive Resonance Theory
• An important problem with competitive learning-based clustering is stability. The stability of an incremental clustering algorithm can be stated in terms of two conditions:
(1) No prototype vector can cycle, i.e., take on a value that it had at a previous time (provided it has changed in the meantime).
(2) Only a finite number of clusters are formed, even with infinite presentation of the data.
• The first condition concerns the stability of the individual cluster prototype vectors; the second concerns the stability of the set of cluster vectors as a whole.
• The K-means and leader-follower algorithms do not produce stable clusters: their plasticity may cause previously learned rules to be lost.
• Adaptive resonance theory (ART) was developed by Carpenter and Grossberg (1987a, 1988).
• ART is not, as is popularly imagined, a neural network architecture. It is a learning theory hypothesizing that resonance in neural circuits can trigger fast learning.

Stability-Plasticity Dilemma
• Stability: system behaviour does not change after irrelevant events.
• Plasticity: the system adapts its behaviour according to significant events.
• Dilemma: how to achieve stability without rigidity, and plasticity without chaos?
• Ongoing learning capability
• Preservation of learned knowledge
From Arash Ashari and Ali Mohammadi, ART presentation.

ART-1
• The basic ART1 architecture consists of two layers of nodes (neurons): the feature representation field F1 and the category representation field F2.
• The neurons in layer F1 are activated by the input pattern, while the prototypes of the formed clusters are stored in layer F2.

ART-1 Architecture

ART-1
• The two layers are connected via adaptive weights: a bottom-up weight matrix and a top-down weight matrix.
• F2 performs a winner-take-all competition between a certain number of committed neurons and one uncommitted neuron. The winning neuron feeds back its template weights to layer F1; this is known as top-down feedback expectancy. The template is compared with the input pattern.

ART-1
• If the match meets the vigilance criterion, weight adaptation occurs, in which both the bottom-up and top-down weights are updated simultaneously. This procedure is called resonance, which suggests the name ART. If the vigilance criterion is not met, a reset signal is sent back to layer F2 to shut off the current winning neuron.
• The new expectation is then projected into layer F1, and this process repeats until the vigilance criterion is met. If an uncommitted neuron is selected for coding, a new uncommitted neuron is created to represent a potential new cluster. The vigilance parameter ρ plays a role similar to that of the threshold parameter θ in the leader-follower algorithm.

ART-1 Flowchart

Fuzzy ART
• Fuzzy ART (FA) keeps an architecture and operations similar to ART1 while replacing the binary operators with fuzzy set operators, so that it works for all real-valued data sets. FA differs from ART1 mainly in five phases: preprocessing, initialization, category choice, category match, and learning.
• Preprocessing.
Each component of a d-dimensional input pattern x = (x1, …, xd) must be in the interval [0, 1].

Fuzzy ART
• Initialization. The real-valued adaptive weights W = {wij}, representing the connection from the ith neuron in layer F2 to the jth neuron in layer F1, subsume both the bottom-up and top-down weights of ART1. Initially, the weights of an uncommitted node are set to one. Larger values may also be used; however, this biases the system toward selecting committed nodes.

Fuzzy ART
• Category choice. After an input pattern is presented, the nodes in layer F2 compete by calculating the category choice function, defined as
Tj = |x ∧ wj| / (α + |wj|),
where ∧ is the fuzzy AND operator, defined by (x ∧ y)i = min(xi, yi), and α > 0 is the choice parameter.

Fuzzy ART
• Category match. The category match function of the winning neuron J is then tested against the vigilance criterion: if |x ∧ wJ| / |x| ≥ ρ, resonance occurs. Otherwise, the current winning neuron is disabled, and a new neuron in layer F2 is selected and examined against the vigilance criterion. This search continues until the criterion is satisfied.

Fuzzy ART
• Learning. The weight vector of the winning neuron that passes the vigilance test is updated using the learning rule
wJ(new) = β(x ∧ wJ(old)) + (1 − β)wJ(old),
where β ∈ [0, 1] is the learning rate parameter.

SOFT COMPETITIVE LEARNING CLUSTERING

Leaky Learning
• One of the major problems with hard competitive learning is the underutilized, or dead, neuron problem: a neuron whose weight vector is initialized farther from all input patterns than the other weight vectors never wins the competition and therefore never gets trained. One solution is to let both winning and losing neurons move towards the presented input pattern, but with different learning rates:
wJ(new) = wJ(old) + ηw(x − wJ(old)) for the winning neuron,
wj(new) = wj(old) + ηl(x − wj(old)) for the losing neurons,
where ηw and ηl are the learning rates for the winning and losing neurons, respectively, and ηw >> ηl.
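A minimal sketch of the Fuzzy ART phases described above (category choice, vigilance test, learning). It assumes inputs are already preprocessed into [0, 1] and omits complement coding; the parameters rho, alpha, and beta mirror ρ, α, and β from the slides.

```python
import numpy as np

def fuzzy_art_step(x, W, rho=0.75, alpha=0.001, beta=1.0):
    """One Fuzzy ART presentation: choose a category, test vigilance, learn or recruit.

    x : input pattern with components in [0, 1]
    W : list of committed category weight vectors (mutated in place)
    Returns the index of the category that coded x.
    """
    # Category choice: Tj = |x ^ wj| / (alpha + |wj|), fuzzy AND = componentwise min
    order = sorted(range(len(W)),
                   key=lambda j: -np.minimum(x, W[j]).sum() / (alpha + W[j].sum()))
    for j in order:
        match = np.minimum(x, W[j]).sum() / x.sum()     # category match function
        if match >= rho:                                # vigilance criterion met: resonance
            # Learning: wJ(new) = beta*(x ^ wJ(old)) + (1 - beta)*wJ(old)
            W[j] = beta * np.minimum(x, W[j]) + (1 - beta) * W[j]
            return j
    # No committed node passed the vigilance test: recruit a new category for x
    W.append(x.copy())
    return len(W) - 1
```

Presenting patterns one at a time, a vigilance ρ near 1 yields many tight categories, while a small ρ yields a few coarse ones, much as a small θ in the leader-follower algorithm yields few clusters.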
Conscience Mechanism
• The conscience mechanism modifies the distance definition described above: DeSieno (1988) adds a bias term bj to the squared Euclidean distance, so that frequently winning neurons are handicapped and underutilized neurons get a chance to win.
• x: data set; wj (j = 1, 2, …, K): neuron weights; bj: bias term.

Rival Penalized Competitive Learning
• In RPCL, the winning neuron is moved towards the input pattern while its closest rival is pushed away from it.
• x: data set; wj (j = 1, 2, …, K): neuron weights, each with a bias term.

Learning Vector Quantization
• Learning vector quantization (LVQ) (Kohonen, 1990) is a supervised pattern classification method, built on essentially the same competitive layer as Kohonen's SOM. The LVQ algorithm finds the output unit closest to the input vector. If x and the winner's weight vector belong to the same class, the weights are moved toward the input vector; if they belong to different classes, the weights are moved away from it. (Fundamentals of Neural Networks, L. Fausett)

Flowchart of LVQ
• x: input pattern; J(w, x): cost function; w: weights.

LVQ
• J is the winning neuron, and the cost function is defined on the locally weighted error between x and w.

LVQ
• A prespecified threshold is used as the stopping criterion.

LVQ Application
• Ten data points clustered into two clusters, shown in red and cyan.

SOM
• The self-organizing map is a competitive network: output neurons compete among themselves to be activated, or fired. The neighbourhood is defined on a linear, rectangular, or hexagonal lattice, and its size usually decreases over time.
Neural Networks: A Comprehensive Foundation, Simon Haykin, Prentice Hall, p. 467.

SOM Neighbourhood
Application of Neural Networks and Other Learning Technologies in Process Engineering, I. M. Mujtaba and M. A. Hussain, Imperial College Press, 2001, p. 53.

SOM
• BMU: best matching unit. Update the weights of the winner and its neighbours, then decrease the learning rate and the neighbourhood size.

Flowchart of SOFM

Basic Steps of SOFM
1. Determine the topology of the SOFM and initialize the weight vectors wj(0) for j = 1, …, K randomly;
2. Present an input pattern x to the network, and choose the winning node J that has the minimum Euclidean distance to x, i.e., J = argmin_j ||x − wj||;
3.
Calculate the current learning rate and the size of the neighbourhood;
4. Update the weight vectors of all neurons in the neighbourhood of J using wj(t+1) = wj(t) + η(t) hJj(t) (x − wj(t));
5. Repeat steps 2 to 4 until the change in neuron positions is below a prespecified small positive number.

SOM Application
• Learning a character.

SOM Application
• Learning a circle with SOM.

SOM Application
SOM examples from Bernd Fritzke, Ruhr University, draft of 5 April 1997, p. 32.

Neural Gas
• NG adaptively determines the neighbourhood for updating by ranking the prototype vectors within the input space by their distance to the input, rather than using a neighbourhood function on an output lattice.

Neural Gas
• hλ(kj(x, W)) is a bell-shaped curve, hλ(kj(x, W)) = exp(−kj(x, W)/λ), where kj(x, W) is the neighbourhood rank of wj.
• Prototype vectors are updated as wj(t+1) = wj(t) + η(t) hλ(kj(x, W)) (x − wj(t)).
• The learning rate η and the characteristic decay constant λ are annealed as η(t) = η0 (ηf/η0)^(t/T) and λ(t) = λ0 (λf/λ0)^(t/T).
• η0 and ηf: initial and final values of the learning rate.
• λ0 and λf: initial and final decay constants.
• T: maximum number of iterations.

NG Algorithm
1. Initialize a set of prototype vectors W = {w1, w2, …, wK} randomly;
2. Present an input pattern x to the network, and sort the index list from the prototype vector with the smallest Euclidean distance to x to the one with the greatest;
3. Calculate the current learning rate and hλ(kj(x, W)) (the bell-shaped curve), and adjust the prototype vectors using the learning rule above;
4. Repeat steps 2 and 3 until the maximum number of iterations is reached.

NG Application
• The NG run keeps adapting its centers and stops when it reaches the maximum number of iterations.

NG Application
NG examples from Bernd Fritzke, Ruhr University, draft of 5 April 1997, p. 22.

Growing Neural Gas
• A member of the SOM family. The neural gas is a simple algorithm for finding optimal data representations based on feature vectors.
The algorithm was coined "neural gas" because of the dynamics of the feature vectors during adaptation, which distribute themselves like a gas within the data space.

Growing Neural Gas
• When prototype learning occurs, not only is the prototype vector of the winning neuron J1 updated towards x, but the prototypes within its topological neighborhood NJ1 are also adapted.
• Unlike NG, GCS, or SOFM, GNG is a self-organizing network that can dynamically increase (and also decrease) the number of neurons in the network. A new neuron is inserted into the network every λ iterations near the neuron with the maximum accumulated error. A neuron-removal rule can also be used to eliminate the neurons with the lowest utility for error reduction.

GNG
GNG examples from Bernd Fritzke, Ruhr University, draft of 5 April 1997, p. 29.

Some Applications

Magnetic Resonance Imaging Segmentation
• MRI provides a visualization of the internal tissues and organs of a living organism, which is valuable in disease diagnosis (such as cancer and heart and vascular disease), treatment, and surgical planning.
• MRI segmentation can be formulated as a clustering problem in which a set of feature vectors, obtained by transforming image measurements and positions, is grouped into a relatively small number of clusters.

Magnetic Resonance Imaging Segmentation
• After the patient was given gadolinium, the tumor on the T1-weighted image (Fig. 5.17(d)) becomes very bright and is isolated from the surrounding tissue.
From N. Karayiannis and P. Pai, "Segmentation of magnetic resonance images using fuzzy algorithms for learning vector quantization," IEEE Transactions on Medical Imaging, vol. 18, pp. 172–180, 1999. Copyright © 1999 IEEE.
Condition Monitoring of 3G Cellular Networks
• 3G mobile networks combine new technologies such as WCDMA and UMTS and provide users with a wide range of multimedia services and applications at higher data rates (Laiho et al., 2005). At the same time, emerging new requirements make it more important to monitor the states and conditions of 3G cellular networks.
• To detect abnormal behaviors in 3G cellular systems, four competitive learning neural networks (LVQ, FSCL, SOFM, and NG; see another application of SOFM in WCDMA network analysis in Laiho et al. (2005)) were applied to generate abstractions, or clustering prototypes, of the input vectors under normal conditions, which are then used for network behavior prediction.
• The clustering prototypes provide a good summary of the normal behavior of the cellular networks, which can then be used to detect abnormalities.

Summary
• Neural network-based clustering is tightly related to the concept of competitive learning. Prototype vectors, associated with a set of neurons in the network and representing clusters in the feature or output space, compete with each other upon the presentation of an input pattern.
• The active neuron, or winner, reinforces itself (hard competitive learning) or its neighborhood within a certain region (soft competitive learning). Most often, the neighborhood decreases monotonically with time.
• One important problem that learning algorithms must deal with is the stability-plasticity dilemma: a system should be able to learn new and important patterns while maintaining stable cluster structures in response to irrelevant inputs.
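The soft competitive loop summarized above (a winner plus a shrinking neighborhood) can be sketched as a minimal one-dimensional SOM. The linear decay schedules for the learning rate and neighbourhood width are illustrative assumptions, not the deck's exact parameters.

```python
import numpy as np

def train_som_1d(data, k=10, steps=2000, eta0=0.5, sigma0=3.0, seed=0):
    """Minimal 1-D SOM: the BMU and its lattice neighbours move toward each input,
    while the learning rate and neighbourhood width decay over time."""
    rng = np.random.default_rng(seed)
    w = data[rng.choice(len(data), size=k, replace=False)].astype(float)
    for t in range(steps):
        x = data[rng.integers(len(data))]
        j_win = np.argmin(np.linalg.norm(w - x, axis=1))   # best matching unit (BMU)
        eta = eta0 * (1 - t / steps)                       # decaying learning rate
        sigma = max(sigma0 * (1 - t / steps), 0.5)         # shrinking neighbourhood
        d = np.arange(k) - j_win                           # lattice distance to the BMU
        h = np.exp(-(d ** 2) / (2 * sigma ** 2))           # Gaussian neighbourhood function
        w += eta * h[:, None] * (x - w)                    # soft competitive update
    return w
```

Setting sigma0 near zero reduces this to hard competitive learning (only the winner moves), which is exactly the WTA/WTM distinction drawn earlier in the deck.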