Deep Learning Overview
Jaya Thomas, Computer Science Department, SUNY Korea
Sources:
https://deeplearningworkshopnips2010.files.wordpress.com/2010/09/nips10workshop-tutorial-final.pdf
http://ufldl.stanford.edu/tutorial/supervised/ConvolutionalNeuralNetwork/

Deep Learning = Learning Representations/Features
Deep Learning = Learning Hierarchical Representations
Trainable Feature Hierarchy
Three Types of Training Protocols

Deep Learning: Why Training Is Hard
- Two hypotheses explain poor training; depending on the situation, one or the other tends to prevail.
- If the first hypothesis (underfitting) holds: use better optimization. This is an active area of research.
- If the second hypothesis (overfitting) holds: use better regularization, e.g. unsupervised learning or stochastic "dropout" training.
- Solution: initialize the hidden layers using unsupervised learning. Force the network to represent the latent structure of the input distribution, and encourage the hidden layers to encode that structure.

Unsupervised Pre-training
- We use a greedy, layer-wise procedure: train one layer at a time, from first to last, with an unsupervised criterion, keeping the parameters of the previous hidden layers fixed.
- The previously trained layers are viewed as feature extraction.
- Procedure: the first layer finds hidden-unit features that are more common in the training inputs than in random inputs; the second layer finds combinations of hidden-unit features that are more common than random combinations of hidden-unit features; the third layer finds combinations of those, and so on.
- Pre-training initializes the parameters in a region from which the nearby local optima overfit the data less.

Fine-tuning
- Once all the layers are pre-trained, add an output layer and train the whole network with supervised learning (back-propagation).
- Supervised learning is performed as in a regular feed-forward network: forward propagation, back-propagation, and update.
- This last phase is called fine-tuning: all parameters are "tuned" for the supervised task at hand, and the representation is adjusted to be more discriminative.

Deep Learning: What Kind of Unsupervised Learning Algorithm?
- Stacked restricted Boltzmann machines
- Stacked autoencoders
- Stacked denoising autoencoders
- Stacked semi-supervised embeddings
- Stacked kernel PCA
- Stacked independent subspace analysis

Convolutional Neural Networks: Advantages
- The architecture of a CNN is designed to take advantage of the 2D structure of an input image.
- This is achieved with local connections and tied weights, followed by some form of pooling, which results in translation-invariant features.
- CNNs are easier to train and have many fewer parameters than fully connected networks with the same number of hidden units.

Architecture
- A CNN consists of a number of convolutional and subsampling layers, optionally followed by fully connected layers.
- The input to a convolutional layer is an m x m x r image, where m x m is the height and width of the image and r is the number of channels (e.g. an RGB image has r = 3).
- A convolutional layer has k filters (or kernels) of size n x n x q, where n is smaller than the dimension of the image and q can either equal the number of channels r or be smaller, and may vary for each kernel.
- Fig 1: First layer of a convolutional neural network with pooling. Units of the same color have tied weights and units of different color represent different filter maps.

A convolutional neural network consists of several layers, which can be of three types: convolutional, max-pooling, and fully connected.
Source: http://andrew.gibiansky.com/blog/machine-learning/convolutional-neural-networks/
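To make the layer arithmetic above concrete, here is a minimal sketch (not from the slides) that tracks feature-map sizes through a convolutional, max-pooling, and fully connected stage. It assumes "valid" convolutions, non-overlapping 2 x 2 pooling, and the 96 x 96 RGB input with 400 filters of size 8 x 8 used as a running example later in this overview; the helper names conv_output_shape and pool_output_shape are hypothetical.

```python
# Sketch: output shapes of the three CNN layer types, under the assumptions above.

def conv_output_shape(m, n, k):
    """m x m input, k filters of spatial size n x n ("valid" convolution)
    -> (m - n + 1) x (m - n + 1) x k feature maps."""
    return (m - n + 1, m - n + 1, k)

def pool_output_shape(h, w, c, p):
    """Non-overlapping p x p max pooling over an h x w x c volume."""
    return (h // p, w // p, c)

m, r = 96, 3          # input image: 96 x 96 pixels, 3 channels (RGB)
n, k = 8, 400         # 400 filters of spatial size 8 x 8

h, w, c = conv_output_shape(m, n, k)      # -> (89, 89, 400)
print("after convolution:", (h, w, c))

h, w, c = pool_output_shape(h, w, c, 2)   # -> (44, 44, 400)
print("after 2x2 max pooling:", (h, w, c))

# The fully connected stage then operates on the flattened vector.
print("inputs to the fully connected layer:", h * w * c)
```

With these assumed sizes, pooling cuts the number of inputs reaching the fully connected layer from 89 x 89 x 400 to 44 x 44 x 400, roughly a factor of four.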
Convolutional
- Convolutional layers consist of a rectangular grid of neurons, and require that the previous layer also be a rectangular grid of neurons.
- Each neuron takes inputs from a rectangular section of the previous layer; the weights for this rectangular section are the same for every neuron in the convolutional layer. Thus, the convolutional layer is just an image convolution of the previous layer, where the weights specify the convolution filter.
- In addition, there may be several grids in each convolutional layer; each grid takes inputs from all the grids in the previous layer, using potentially different filters.

Feature Extraction Using CNNs: Locally Connected Networks
- The solution to the cost of connecting every hidden unit to every input pixel is to restrict the connections between the hidden units and the input units, allowing each hidden unit to connect to only a small subset of the input units. Each hidden unit connects to only a small contiguous region of pixels in the input.
- The idea of locally connected networks also draws inspiration from how the early visual system is wired up in biology: neurons in the visual cortex have localized receptive fields (i.e., they respond only to stimuli in a certain location).

Pooling: Using the Features Obtained After Convolution for Classification
- In theory, one could use all the extracted features with a classifier such as a softmax classifier, but this can be computationally challenging.
- Example: consider images of size 96 x 96 pixels, and suppose we have learned 400 features over 8 x 8 inputs. Each convolution results in an output of size (96 - 8 + 1) x (96 - 8 + 1) = 89 x 89 = 7921, and since we have 400 features, this results in a vector of 7921 x 400 = 3,168,400 features per example. Learning a classifier with inputs of 3+ million features can be unwieldy, and can also be prone to overfitting.

Max-Pooling
- After each convolutional layer, there may be a pooling layer. The pooling layer takes small rectangular blocks from the convolutional layer and subsamples each block to produce a single output from it.
- There are several ways to pool, such as taking the average or the maximum, or a learned linear combination of the neurons in the block. Our pooling layers will always be max-pooling layers; that is, they take the maximum of the block they are pooling.

Fully Connected
- Finally, after several convolutional and max-pooling layers, the high-level reasoning in the neural network is done via fully connected layers. A fully connected layer takes all neurons in the previous layer (be it fully connected, pooling, or convolutional) and connects each of them to every one of its own neurons.
- Fully connected layers are no longer spatially located (you can visualize them as one-dimensional), so there can be no convolutional layers after a fully connected layer.

Forward Propagation
1. Compute the activations of all layers whose inputs are already known.
2. Compute the inputs of the next layer from these activations.
3. Repeat steps 1 and 2 until the output layer is reached and the values of y_L are known.
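The three steps above can be written out for a plain feed-forward network. The sketch below is illustrative only: the layer sizes, random weights, zero biases, and the choice of a sigmoid nonlinearity are assumptions, not taken from the slides.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
layer_sizes = [4, 5, 3, 2]                      # input, two hidden layers, output
weights = [rng.standard_normal((n_out, n_in))   # W maps layer l-1 to layer l
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

y = rng.standard_normal(layer_sizes[0])         # known inputs to the first layer
for W, b in zip(weights, biases):
    x = W @ y + b                               # step 2: inputs to the next layer
    y = sigmoid(x)                              # step 1: activations from those inputs
print("output activations y_L:", y)             # step 3: stop at the output layer
```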
Forward Propagation in a Convolutional Neural Network
- Suppose we have an N x N square neuron layer which is followed by our convolutional layer. If we use an m x m filter ω, the convolutional layer output will be of size (N - m + 1) x (N - m + 1).
- To compute the pre-nonlinearity input to a unit x^ℓ_{ij} in our layer, we sum up the contributions (weighted by the filter components) from the previous layer's cells:
  x^{\ell}_{ij} = \sum_{a=0}^{m-1} \sum_{b=0}^{m-1} \omega_{ab} \, y^{\ell-1}_{(i+a)(j+b)}
- Then the convolutional layer applies its nonlinearity:
  y^{\ell}_{ij} = \sigma(x^{\ell}_{ij})

Back Propagation
Back Propagation in a Convolutional Network
- The upsample operation has to propagate the error through the pooling layer by calculating the error with respect to each unit incoming to the pooling layer.
- For example, with mean pooling, upsample simply distributes the error for a single pooling unit uniformly among the units which feed into it in the previous layer.
- In max pooling, the unit which was chosen as the max receives all the error, since very small changes in the input would perturb the result only through that unit.

Gradient Descent
- Once the propagated errors have been turned into gradients, each weight is updated by a step in the direction of the negative gradient, w ← w − η ∂E/∂w, where η is the learning rate.

Thank You
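As a supplementary sketch of the upsample step described above: the block size of 2 x 2, the example arrays, and the helper names upsample_mean and upsample_max are assumptions for illustration only, not code from the slides.

```python
import numpy as np

def upsample_mean(delta, p=2):
    """Mean pooling: distribute each pooled unit's error uniformly over its p x p block."""
    return np.kron(delta, np.ones((p, p))) / (p * p)

def upsample_max(delta, inputs, p=2):
    """Max pooling: the unit that was the max in each p x p block receives all the error."""
    out = np.zeros_like(inputs)
    H, W = delta.shape
    for i in range(H):
        for j in range(W):
            block = inputs[i*p:(i+1)*p, j*p:(j+1)*p]
            a, b = np.unravel_index(np.argmax(block), block.shape)
            out[i*p + a, j*p + b] = delta[i, j]
    return out

inputs = np.arange(16.0).reshape(4, 4)      # pre-pooling activations
delta = np.array([[0.1, 0.2],               # error arriving at the 2x2 pooled layer
                  [0.3, 0.4]])
print(upsample_mean(delta))
print(upsample_max(delta, inputs))
```

Note how max pooling yields a sparse error map: only one unit per block receives any error, matching the argument above that small input changes affect the output only through the max unit.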