CSCE833 Machine Learning
Lecture 10: Artificial Neural Networks
Dr. Jianjun Hu, mleg.cse.sc.edu/edu/csce833
University of South Carolina, Department of Computer Science and Engineering

Outline
* Midterm moved to March 15th
* Neural network learning
* Self-organizing maps: origins, algorithm, example

(The neural-network slides are adapted from Lecture Notes for E. Alpaydın, 2004, Introduction to Machine Learning, © The MIT Press, V1.1.)

Neuron
* A biological neuron does not divide and has only one axon.

Neural Networks
* Networks of processing units (neurons) with connections (synapses) between them
* Large number of neurons: on the order of 10^10
* Large connectivity: on the order of 10^5 connections per neuron
* Parallel processing; distributed computation/memory
* Robust to noise and failures

Understanding the Brain
* Levels of analysis (Marr, 1982): 1. computational theory; 2. representation and algorithm; 3. hardware implementation
* Reverse engineering: from hardware to theory
* Parallel processing: SIMD vs MIMD; a neural net is SIMD with modifiable local memory
* Learning: weights updated by training/experience

Perceptron (Rosenblatt, 1962)
$$y = \sum_{j=1}^{d} w_j x_j + w_0 = \mathbf{w}^T\mathbf{x},\qquad \mathbf{w}=[w_0,w_1,\ldots,w_d]^T,\quad \mathbf{x}=[1,x_1,\ldots,x_d]^T$$

What a Perceptron Does
* Regression: y = wx + w_0, a linear fit (the extra input x_0 = +1 carries the bias w_0)
* Classification: y = 1(wx + w_0 > 0), a threshold on the linear discriminant
* A sigmoid output gives a smooth, probabilistic response instead of a hard threshold:
$$y = \mathrm{sigmoid}(o) = \frac{1}{1+\exp(-\mathbf{w}^T\mathbf{x})}$$

K Outputs
* Regression:
$$y_i = \sum_{j=1}^{d} w_{ij} x_j + w_{i0} = \mathbf{w}_i^T\mathbf{x},\qquad \mathbf{y} = \mathbf{W}\mathbf{x}$$
* Classification: softmax over the linear outputs $o_i = \mathbf{w}_i^T\mathbf{x}$:
$$y_i = \frac{\exp o_i}{\sum_k \exp o_k},\qquad \text{choose } C_i \text{ if } y_i = \max_k y_k$$

Training
* Online (instances seen one by one) vs batch (whole sample) learning:
  - no need to store the whole sample
  - the problem may change in time
  - wear and degradation in system components
* Stochastic gradient descent: update after a single pattern
* Generic update rule (LMS rule):
$$\Delta w_{ij}^t = \eta\,(r_i^t - y_i^t)\,x_j^t$$
  Update = LearningFactor × (DesiredOutput − ActualOutput) × Input

Training a Perceptron: Regression
* Regression (linear output):
$$E^t(\mathbf{w}\mid \mathbf{x}^t, r^t) = \frac{1}{2}\,(r^t - y^t)^2 = \frac{1}{2}\left(r^t - \mathbf{w}^T\mathbf{x}^t\right)^2,\qquad \Delta w_j^t = \eta\,(r^t - y^t)\,x_j^t$$

Classification
* Single sigmoid output with cross-entropy error:
$$y^t = \mathrm{sigmoid}(\mathbf{w}^T\mathbf{x}^t),\qquad E^t = -r^t \log y^t - (1-r^t)\log(1-y^t),\qquad \Delta w_j^t = \eta\,(r^t - y^t)\,x_j^t$$
* K > 2 softmax outputs:
$$y_i^t = \frac{\exp(\mathbf{w}_i^T\mathbf{x}^t)}{\sum_k \exp(\mathbf{w}_k^T\mathbf{x}^t)},\qquad E^t = -\sum_i r_i^t \log y_i^t,\qquad \Delta w_{ij}^t = \eta\,(r_i^t - y_i^t)\,x_j^t$$

Learning Boolean AND
* AND is linearly separable, so a single perceptron can learn it; for example, y = 1(x_1 + x_2 − 1.5 > 0) realizes AND. Note that all three training rules above share the same (desired − actual) × input form; a runnable sketch follows.
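As a minimal sketch of these online updates (Python/NumPy is an assumed implementation language; the learning rate, epoch count, and the Boolean AND demo are illustrative choices, not from the slides):

```python
import numpy as np

def sigmoid(o):
    return 1.0 / (1.0 + np.exp(-o))

def train_perceptron(X, r, eta=0.1, epochs=100, classification=False):
    """Online training with the LMS-style rule:
    Delta w_j = eta * (r_t - y_t) * x_j_t."""
    X = np.hstack([np.ones((X.shape[0], 1)), X])  # x_0 = +1 carries the bias w_0
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_t, r_t in zip(X, r):                # one instance at a time (online)
            y_t = sigmoid(w @ x_t) if classification else w @ x_t
            w += eta * (r_t - y_t) * x_t          # same update for both cases
    return w

# Learning Boolean AND with a sigmoid output:
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
r = np.array([0., 0., 0., 1.])
w = train_perceptron(X, r, eta=0.5, epochs=1000, classification=True)
X1 = np.hstack([np.ones((4, 1)), X])
print(np.round(sigmoid(X1 @ w), 2))              # approaches [0, 0, 0, 1]
```

Because the update uses only one instance at a time, the same loop works whether the data arrive online or are swept repeatedly over a stored sample.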
XOR
* No w_0, w_1, w_2 satisfy all four constraints, so XOR is not linearly separable (Minsky and Papert, 1969):
$$w_0 \le 0,\qquad w_2 + w_0 > 0,\qquad w_1 + w_0 > 0,\qquad w_1 + w_2 + w_0 \le 0$$

Multilayer Perceptrons (Rumelhart et al., 1986)
$$y_i = \mathbf{v}_i^T\mathbf{z} = \sum_{h=1}^{H} v_{ih} z_h + v_{i0},\qquad z_h = \mathrm{sigmoid}(\mathbf{w}_h^T\mathbf{x}) = \frac{1}{1+\exp\left(-\left(\sum_{j=1}^{d} w_{hj} x_j + w_{h0}\right)\right)}$$
* x_1 XOR x_2 = (x_1 AND ~x_2) OR (~x_1 AND x_2): two hidden units compute the two conjunctions, and the output unit their disjunction.

Backpropagation
* The forward equations are as above; the error is propagated backward with the chain rule (a runnable sketch appears at the end of this part):
$$\frac{\partial E}{\partial w_{hj}} = \frac{\partial E}{\partial y_i}\,\frac{\partial y_i}{\partial z_h}\,\frac{\partial z_h}{\partial w_{hj}}$$

Regression
$$E(\mathbf{W},\mathbf{v}\mid\mathcal{X}) = \frac{1}{2}\sum_t (r^t - y^t)^2,\qquad y^t = \sum_{h=1}^{H} v_h z_h^t + v_0$$
* Forward: $z_h = \mathrm{sigmoid}(\mathbf{w}_h^T\mathbf{x})$; backward:
$$\Delta v_h = \eta\sum_t (r^t - y^t)\,z_h^t,\qquad \Delta w_{hj} = \eta\sum_t (r^t - y^t)\,v_h\,z_h^t(1-z_h^t)\,x_j^t$$

Regression with Multiple Outputs
$$E(\mathbf{W},\mathbf{V}\mid\mathcal{X}) = \frac{1}{2}\sum_t\sum_i (r_i^t - y_i^t)^2,\qquad y_i^t = \sum_{h=1}^{H} v_{ih} z_h^t + v_{i0}$$
$$\Delta v_{ih} = \eta\sum_t (r_i^t - y_i^t)\,z_h^t,\qquad \Delta w_{hj} = \eta\sum_t\left[\sum_i (r_i^t - y_i^t)\,v_{ih}\right] z_h^t(1-z_h^t)\,x_j^t$$

[Figures: an MLP fit to sample data evolving over training epochs; the pieces of the fit contributed by each hidden unit: w_h x + w_{h0}, z_h, and v_h z_h.]

Two-Class Discrimination
* One sigmoid output y^t estimates P(C_1 | x^t), and P(C_2 | x^t) ≡ 1 − y^t:
$$y^t = \mathrm{sigmoid}\left(\sum_{h=1}^{H} v_h z_h^t + v_0\right),\qquad E(\mathbf{W},\mathbf{v}\mid\mathcal{X}) = -\sum_t \left[r^t\log y^t + (1-r^t)\log(1-y^t)\right]$$
$$\Delta v_h = \eta\sum_t (r^t - y^t)\,z_h^t,\qquad \Delta w_{hj} = \eta\sum_t (r^t - y^t)\,v_h\,z_h^t(1-z_h^t)\,x_j^t$$

K > 2 Classes
$$o_i^t = \sum_{h=1}^{H} v_{ih} z_h^t + v_{i0},\qquad y_i^t = \frac{\exp o_i^t}{\sum_k \exp o_k^t} \approx P(C_i\mid\mathbf{x}^t),\qquad E(\mathbf{W},\mathbf{v}\mid\mathcal{X}) = -\sum_t\sum_i r_i^t \log y_i^t$$
$$\Delta v_{ih} = \eta\sum_t (r_i^t - y_i^t)\,z_h^t,\qquad \Delta w_{hj} = \eta\sum_t\left[\sum_i (r_i^t - y_i^t)\,v_{ih}\right] z_h^t(1-z_h^t)\,x_j^t$$

Multiple Hidden Layers
* An MLP with one hidden layer is a universal approximator (Hornik et al., 1989), but using multiple layers may lead to simpler networks:
$$z_{1h} = \mathrm{sigmoid}(\mathbf{w}_{1h}^T\mathbf{x}) = \mathrm{sigmoid}\left(\sum_{j=1}^{d} w_{1hj} x_j + w_{1h0}\right),\quad h = 1,\ldots,H_1$$
$$z_{2l} = \mathrm{sigmoid}(\mathbf{w}_{2l}^T\mathbf{z}_1) = \mathrm{sigmoid}\left(\sum_{h=1}^{H_1} w_{2lh} z_{1h} + w_{2l0}\right),\quad l = 1,\ldots,H_2$$
$$y = \mathbf{v}^T\mathbf{z}_2 = \sum_{l=1}^{H_2} v_l z_{2l} + v_0$$

Improving Convergence
* Momentum (keep a fraction of the previous step):
$$\Delta w_i^t = -\eta\,\frac{\partial E^t}{\partial w_i} + \alpha\,\Delta w_i^{t-1}$$
* Adaptive learning rate (grow additively while the error falls, shrink multiplicatively otherwise):
$$\Delta\eta = \begin{cases} +a & \text{if } E^{t+\tau} < E^t \\ -b\,\eta & \text{otherwise} \end{cases}$$

Overfitting/Overtraining
* Number of weights: H(d + 1) + (H + 1)K, so model complexity grows with the number of hidden units H.

Tuning the Network Size
* Destructive: weight decay (a small sketch follows at the end of this part):
$$\Delta w_i = -\eta\,\frac{\partial E}{\partial w_i} - \lambda w_i,\qquad E' = E + \frac{\lambda}{2}\sum_i w_i^2$$
* Constructive: growing networks (Ash, 1989; Fahlman and Lebiere, 1989)

Dimensionality Reduction
* The hidden-unit activations of a trained MLP form a lower-dimensional representation of the inputs. [Figures omitted.]

Learning Time
* Sequential learning. Applications:
  - Sequence recognition: speech recognition
  - Sequence reproduction: time-series prediction
  - Sequence association
* Network architectures:
  - Time-delay networks (Waibel et al., 1989)
  - Recurrent networks (Rumelhart et al., 1986)

Time-Delay Neural Networks
* A window of past input values is fed together with the current input to a feedforward network. [Figure omitted.]

Recurrent Networks
* Units have feedback connections (self-connections or connections from later layers), giving the network internal state. [Figure omitted.]

Unfolding in Time
* A recurrent network can be unfolded into an equivalent feedforward network with one copy of the units per time step, so backpropagation (through time) applies. [Figure omitted.]
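To make the backpropagation equations concrete, here is a minimal sketch that trains a two-hidden-unit MLP on XOR, the problem a single perceptron cannot solve, including the momentum term from the "Improving Convergence" slide. The layer sizes, learning rate η = 0.5, momentum α = 0.9, epoch count, and random seed are illustrative assumptions; with only two hidden units the net can occasionally stall in a local minimum, in which case a different seed (or H = 3) usually helps.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(1)
X = np.hstack([np.ones((4, 1)),                       # x_0 = +1 for the bias
               np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])])
r = np.array([0., 1., 1., 0.])                        # XOR targets
H = 2                                                 # two hidden units suffice for XOR
W = rng.uniform(-0.5, 0.5, (H, 3))                    # first-layer weights w_hj
v = rng.uniform(-0.5, 0.5, H + 1)                     # second-layer weights v_h
eta, alpha = 0.5, 0.9                                 # learning rate, momentum factor
dW_prev, dv_prev = np.zeros_like(W), np.zeros_like(v)

for _ in range(5000):
    for x_t, r_t in zip(X, r):
        z = np.hstack([1.0, sigmoid(W @ x_t)])        # forward pass: z_0 = +1, z_h
        y = sigmoid(v @ z)                            # sigmoid output (two classes)
        dv = eta * (r_t - y) * z                      # Delta v_h = eta (r - y) z_h
        dW = eta * (r_t - y) * np.outer(v[1:] * z[1:] * (1 - z[1:]), x_t)
        v += dv + alpha * dv_prev                     # momentum: add alpha * previous step
        W += dW + alpha * dW_prev
        dv_prev, dW_prev = dv, dW

for x_t in X:                                         # outputs approach 0, 1, 1, 0
    print(x_t[1:], sigmoid(v @ np.hstack([1.0, sigmoid(W @ x_t)])))
```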
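The weight-decay update from the "Tuning the Network Size" slide can also be sketched in isolation; the penalty weight λ here is an illustrative choice:

```python
import numpy as np

def weight_decay_update(w, grad_E, eta=0.1, lam=1e-3):
    """Delta w_i = -eta * dE/dw_i - lam * w_i: a gradient step plus a pull toward 0."""
    return w - eta * grad_E - lam * w

w = np.array([2.0, -0.5, 1e-4])
print(weight_decay_update(w, grad_E=np.zeros(3)))  # weights shrink even at zero gradient
```

Weights the error does not need are steadily pushed toward zero and can then be pruned, effectively shrinking the network.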
Self-Organizing Maps: Origins
* Ideas first introduced by C. von der Malsburg (1973), developed and refined by T. Kohonen (1982)
* A neural network algorithm using unsupervised competitive learning
* Primarily used for the organization and visualization of complex data
* Biological basis: 'brain maps'

SOM: Architecture
* A lattice (2-d array) of neurons ('nodes') accepts and responds to a set of input signals x_1, ..., x_n; each neuron j is connected to all inputs through weighted synapses w_j1, ..., w_jn
* Responses are compared; the 'winning' neuron is selected from the lattice
* The selected neuron is activated together with its 'neighbourhood' neurons
* An adaptive process changes the weights to more closely resemble the inputs

SOM: Result Example
* Classifying world poverty (Helsinki University of Technology): a 'poverty map' based on 39 indicators from World Bank statistics (1992)

SOM: Algorithm Overview
1. Randomly initialise all weights
2. Select an input vector x = [x_1, x_2, x_3, ..., x_n]
3. Compare x with the weights w_j of each neuron j to determine the winner
4. Update the winner so that it becomes more like x, together with the winner's neighbours
5. Adjust the parameters: learning rate and 'neighbourhood function'
6. Repeat from (2) until the map has converged (i.e. shows no noticeable changes in the weights) or a pre-defined number of training cycles has passed
A sketch implementing these six steps appears after the weight-update equation below.

Initialisation
* (i) Randomly initialise the weight vectors w_j for all nodes j

Input Vector
* (ii) Choose an input vector x from the training set. A text example: a document can be represented as a vector of word frequencies. For the paragraph "Self-organizing maps (SOMs) are a data visualization technique invented by Professor Teuvo Kohonen which reduces the dimensions of data through the use of self-organizing neural networks. The problem that data visualization attempts to solve is that humans simply cannot visualize high-dimensional data, so techniques are created to help us understand this high-dimensional data." the vector holds one frequency per word, e.g.:

  word:  self-organizing  maps  data  visualization  technique  Professor  invented  Teuvo  Kohonen  ...  Zebra
  freq:  2                1     4     2              2          1          1         1      1        ...  0

Finding a Winner
* (iii) Find the best-matching neuron w(x), usually the neuron whose weight vector has the smallest Euclidean distance from the input vector x
* The winning node is the one that is in some sense 'closest' to the input vector
* Euclidean distance is the straight-line distance between the data points if they were plotted on a (multi-dimensional) graph; for vectors a = (a_1, a_2, ..., a_n) and b = (b_1, b_2, ..., b_n):
$$d(\mathbf{a},\mathbf{b}) = \sqrt{\sum_i (a_i - b_i)^2}$$

Weight Update
* (iv) SOM weight-update equation:
$$\mathbf{w}_j(t+1) = \mathbf{w}_j(t) + \eta(t)\,h_{w(x)}(j,t)\,[\mathbf{x} - \mathbf{w}_j(t)]$$
* "The weights of every node are updated at each cycle by adding current learning rate × degree of neighbourhood with respect to winner × difference between current weights and input vector to the current weights."
[Figures: an example learning rate η(t) decaying with the number of training cycles; an example neighbourhood function h_{w(x)}(j, t), where the x-axis shows distance from the winning node and the y-axis the degree of neighbourhood (max. 1).]
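A minimal sketch of steps (1)-(6) on a rectangular lattice, assuming a Gaussian neighbourhood function and exponentially decaying learning rate and radius (common choices, but not mandated by the slides; all constants are illustrative):

```python
import numpy as np

def train_som(X, rows=10, cols=10, cycles=2000, eta0=0.5, sigma0=3.0):
    """Train a rows x cols SOM on data X of shape (n_samples, n_features)."""
    rng = np.random.default_rng(0)
    W = rng.uniform(X.min(), X.max(), (rows * cols, X.shape[1]))  # (1) random init
    grid = np.array([(i, j) for i in range(rows) for j in range(cols)], float)
    for t in range(cycles):
        x = X[rng.integers(len(X))]                    # (2) pick an input vector
        winner = np.argmin(((W - x) ** 2).sum(axis=1)) # (3) smallest Euclidean distance
        eta = eta0 * np.exp(-t / cycles)               # (5) decaying learning rate...
        sigma = sigma0 * np.exp(-t / cycles)           # ...and neighbourhood radius
        d2 = ((grid - grid[winner]) ** 2).sum(axis=1)  # lattice distance to the winner
        h = np.exp(-d2 / (2 * sigma ** 2))             # Gaussian neighbourhood h(j, t)
        W += eta * h[:, None] * (x - W)                # (4) w_j += eta * h * (x - w_j)
    return W.reshape(rows, cols, -1)                   # (6) stop after `cycles` passes
```

Early cycles, with a wide neighbourhood, order the map globally; later cycles, with a narrow one, fine-tune individual nodes.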
Example: Self-Organizing Maps
* The animals are to be ordered by a neural network, each described by its attributes (size, living space), e.g. Mouse = (0/0).
* Encoding: size: small = 0, medium = 1, big = 2; living space: Land = 0, Water = 1, Air = 2.

  Animal  Size    Living space  Vector
  Mouse   small   Land          (0/0)
  Lion    medium  Land          (1/0)
  Horse   big     Land          (2/0)
  Shark   big     Water         (2/1)
  Dove    small   Air           (0/2)

* The training step is repeated many times; in the best case the animals end up close together on the map, ordered by their most similar attributes.
[Figure: the trained map; node weight vectors such as Mouse (0.75/0), Lion (1/0.75), Horse (1.5/0), Shark (1.625/1), and Dove (0.1875/1.25), with the land animals grouped together.]

Example: Animal Names and Their Attributes
* A larger example: 16 animals described by 13 binary attributes (1 = attribute present). Columns are grouped as "is" (Small, Medium, Big), "has" (2 legs, 4 legs, Hair, Hooves, Mane, Feathers), and "likes to" (Hunt, Run, Fly, Swim).

  Animal  Small Medium Big  2legs 4legs Hair Hooves Mane Feathers  Hunt Run Fly Swim
  Dove      1     0     0    1     0     0     0     0      1       0    0   1   0
  Hen       1     0     0    1     0     0     0     0      1       0    0   0   0
  Duck      1     0     0    1     0     0     0     0      1       0    0   0   1
  Goose     1     0     0    1     0     0     0     0      1       0    0   1   1
  Owl       1     0     0    1     0     0     0     0      1       1    0   1   0
  Hawk      1     0     0    1     0     0     0     0      1       1    0   1   0
  Eagle     0     1     0    1     0     0     0     0      1       1    0   1   0
  Fox       0     1     0    0     1     1     0     0      0       1    0   0   0
  Dog       0     1     0    0     1     1     0     0      0       0    1   0   0
  Wolf      0     1     0    0     1     1     0     1      0       1    1   0   0
  Cat       1     0     0    0     1     1     0     0      0       1    0   0   0
  Tiger     0     0     1    0     1     1     0     0      0       1    1   0   0
  Lion      0     0     1    0     1     1     0     1      0       1    1   0   0
  Horse     0     0     1    0     1     1     1     1      0       0    1   0   0
  Zebra     0     0     1    0     1     1     1     1      0       0    1   0   0
  Cow       0     0     1    0     1     1     1     0      0       0    0   0   0

* On the trained map a grouping according to similarity emerges: peaceful birds in one region, hunters in another [Teuvo Kohonen (2001), Self-Organizing Maps, Springer]. The sketch below reproduces the experiment.
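Reusing the train_som sketch from the algorithm section (its map size and training constants remain illustrative assumptions), the 16 × 13 table can be fed in directly; printing each animal's best-matching node lets you check whether the birds and the hunters land in separate regions of the map:

```python
import numpy as np  # assumes train_som from the sketch above is in scope

names = ["Dove", "Hen", "Duck", "Goose", "Owl", "Hawk", "Eagle", "Fox",
         "Dog", "Wolf", "Cat", "Tiger", "Lion", "Horse", "Zebra", "Cow"]
X = np.array([                        # rows in the same order as the table above
    [1,0,0, 1,0,0,0,0,1, 0,0,1,0],   # Dove
    [1,0,0, 1,0,0,0,0,1, 0,0,0,0],   # Hen
    [1,0,0, 1,0,0,0,0,1, 0,0,0,1],   # Duck
    [1,0,0, 1,0,0,0,0,1, 0,0,1,1],   # Goose
    [1,0,0, 1,0,0,0,0,1, 1,0,1,0],   # Owl
    [1,0,0, 1,0,0,0,0,1, 1,0,1,0],   # Hawk
    [0,1,0, 1,0,0,0,0,1, 1,0,1,0],   # Eagle
    [0,1,0, 0,1,1,0,0,0, 1,0,0,0],   # Fox
    [0,1,0, 0,1,1,0,0,0, 0,1,0,0],   # Dog
    [0,1,0, 0,1,1,0,1,0, 1,1,0,0],   # Wolf
    [1,0,0, 0,1,1,0,0,0, 1,0,0,0],   # Cat
    [0,0,1, 0,1,1,0,0,0, 1,1,0,0],   # Tiger
    [0,0,1, 0,1,1,0,1,0, 1,1,0,0],   # Lion
    [0,0,1, 0,1,1,1,1,0, 0,1,0,0],   # Horse
    [0,0,1, 0,1,1,1,1,0, 0,1,0,0],   # Zebra
    [0,0,1, 0,1,1,1,0,0, 0,0,0,0],   # Cow
], dtype=float)

rows = cols = 8                                       # illustrative map size
W = train_som(X, rows=rows, cols=cols).reshape(-1, X.shape[1])
for name, x in zip(names, X):                         # best-matching node per animal
    j = int(np.argmin(((W - x) ** 2).sum(axis=1)))
    print(f"{name:6s} -> node {divmod(j, cols)}")
```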
Conclusion
* Advantages:
  - The SOM is an algorithm that projects high-dimensional data onto a two-dimensional map.
  - The projection preserves the topology of the data, so similar data items are mapped to nearby locations on the map.
  - SOMs have many practical applications in pattern recognition, speech analysis, industrial and medical diagnostics, and data mining.
* Disadvantages:
  - A large quantity of good-quality, representative training data is required.
  - There is no generally accepted measure of the 'quality' of a SOM, e.g. average quantization error (how well the data are classified).

Discussion Topics
* What is the main purpose of the SOM?
* Do you know any example systems that use the SOM algorithm?

References
[Witten and Frank (1999)] Witten, I. H. and Frank, E. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, San Francisco, CA, USA, 1999.
[Kohonen (1982)] Kohonen, T. Self-organized formation of topologically correct feature maps. Biological Cybernetics, 43:59-69, 1982.
[Kohonen (1995)] Kohonen, T. Self-Organizing Maps. Springer, Berlin, Germany, 1995.
[Vesanto (1999)] Vesanto, J. SOM-based data visualization methods. Intelligent Data Analysis, 3:111-126, 1999.
[Kohonen et al. (1996)] Kohonen, T., Hynninen, J., Kangas, J., and Laaksonen, J. SOM_PAK: The Self-Organizing Map program package. Report A31, Helsinki University of Technology, Laboratory of Computer and Information Science, January 1996.
[Vesanto et al. (1999)] Vesanto, J., Himberg, J., Alhoniemi, E., and Parhankangas, J. Self-organizing map in Matlab: the SOM Toolbox. In Proceedings of the Matlab DSP Conference 1999, Espoo, Finland, pp. 35-40, 1999.
[Wong and Bergeron (1997)] Wong, P. C. and Bergeron, R. D. 30 years of multidimensional multivariate visualization. In Nielson, G. M., Hagen, H., and Müller, H., editors, Scientific Visualization: Overviews, Methodologies and Techniques, pages 3-33, Los Alamitos, CA, 1997. IEEE Computer Society Press.
[Honkela (1997)] Honkela, T. Self-Organizing Maps in Natural Language Processing. PhD thesis, Helsinki University of Technology, Espoo, Finland, 1997.
[SVG wiki] http://en.wikipedia.org/wiki/Scalable_Vector_Graphics
[Jost Schatzmann (2003)] Schatzmann, J. Using Self-Organizing Maps to Visualize Clusters and Trends in Multidimensional Datasets. Final year individual project report, Imperial College London, 19 June 2003.