
Visual Categorisation of Images
Galen Deering and Michael Lew
Leiden Institute of Advanced Computer Science

Overview
Computers are not as efficient at recognising objects in images as humans are, despite decades of research. Improvements are nevertheless being made by leaps and bounds. We take a look at current techniques of image categorisation and at potential further developments in this highly complex field.

ImageNet
ImageNet refers to a large, manually annotated database of images, used for the ImageNet Large Scale Visual Recognition Challenge (ILSVRC).

Fig.1; Example manual annotation of an image. O. Russakovsky, J. Deng et al.

ILSVRC, founded in 2010, is an annual competition focused on the categorisation of images using AI, commonly including participants from over fifty institutions. Every year, groups of researchers attempt to develop techniques that perform image categorisation more efficiently or more accurately than previous techniques. In this way, ILSVRC has encouraged some of the greatest breakthroughs in the field since its founding.

Neural Networks
Neural networks are the go-to basis for common solutions to the image categorisation problem. They are large networks of interconnected neurons (Fig.2), each of which receives inputs, computes a weighted dot product of them, and sends the result to its output axon.

Fig.2; Anatomy of a single neuron. http://cs231n.github.io/convolutional-networks/

Neurons are typically grouped into layers inside a single network (Fig.3). Each layer takes its inputs from the previous layer of neurons (with the exception of the input layer) and sends its output to the next layer (with the exception of the output layer). The input of the input layer is the input of the neural network as a whole, and the output of the output layer is the output of the network.

Fig.3; A neural network. http://cs231n.github.io/convolutional-networks/
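The description of neurons and layers under Neural Networks can be made concrete with a minimal sketch. It assumes a sigmoid activation and a bias term, neither of which the poster specifies, and the names `neuron`, `layer`, `network`, and `toy_layers` as well as all weights are hypothetical values chosen for illustration.

```python
import math

def neuron(inputs, weights, bias=0.0):
    """One neuron: weighted dot product of its inputs, plus a bias.

    The sigmoid squashing step is an assumption; the poster only
    mentions the weighted dot product.
    """
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weight_rows, biases):
    """A layer: every neuron receives the same inputs from the previous layer."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]

def network(inputs, layers):
    """Feed the output of each layer into the next one."""
    activation = inputs
    for weight_rows, biases in layers:
        activation = layer(activation, weight_rows, biases)
    return activation

# Hypothetical toy network: 3 inputs -> 2 hidden neurons -> 1 output neuron.
toy_layers = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),
    ([[1.0, -1.0]], [0.0]),
]
output = network([1.0, 0.5, -1.0], toy_layers)
print(output)
```

In a trained network the weights and biases would be learned from data rather than written by hand.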
Convolutional Neural Networks
In the context of this research we speak of convolutional neural networks, a type of neural network designed with the assumption that the inputs are images. In practice this means that the neurons in each layer are arranged in three dimensions, denoted as depth, height, and width (Fig.4). Using this, one can create a neural network whose input layer holds an image: the layer's height and width match the dimensions of the image, and its depth would be, for example, equal to 3 (for the values of the red, green, and blue channels).

Fig.4; An abstract visualisation of a convolutional neural network (neurons pictured). http://cs231n.github.io/convolutional-networks/

Fig.6; Different convolutional neural network layers perform different operations on the input from the image, and the FC (fully connected) layer computes the class scores of the potential categories of the image. In this example, the final results indicate a far greater likelihood that the input image on the left contains a car than a horse. http://cs231n.github.io/convolutional-networks/

AlexNet
AlexNet, a previous winner of ILSVRC, is the large, deep convolutional neural network trained by Alex Krizhevsky et al. to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2012 contest into 1000 different classes. AlexNet was constructed similarly to LeNet, but was expanded in every dimension and used several stacked convolutional layers, as opposed to a single convolutional layer immediately followed by a POOL layer, which was common at the time. AlexNet has led to many significant improvements in the field and as such is an interesting target for comparison with other networks.

Comparisons
Other networks have been developed with the same goal in mind, using subtly different techniques:
• LeNet
• ZF Net (2013 ILSVRC winner)
• SqueezeNet
• GoogLeNet (2014 ILSVRC winner)
• VGGNet
• ResNet (2015 ILSVRC winner)

Whilst many of these networks are reworks of or expansions on one another, a significant portion also introduce new techniques to the scene of their respective years. These different branches each have subtly different flavours, and it will be interesting to see what the major differences are in their performance in each ILSVRC subcategory. The subcategories in which performance may differ include:
• Image classification (Fig.5, Fig.6)
• Single-object localisation
• Object detection (Fig.1)

Fig.5; Example images. O. Russakovsky, J. Deng et al.

Meta-network
Working on the assumption that each set of techniques develops its own strengths and weaknesses, one might imagine that the results of these different algorithms could be combined to produce even more accurate results. To achieve this we seek to create a "meta-network": a convolutional neural network that takes the output of several other AI techniques as its input and performs its categorisation based on these values. Using this, we can potentially eliminate the weaknesses of certain techniques by allowing the meta-network to focus on the strengths of other techniques for any given image. The efficiency of this technique is still in question, however.
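The meta-network idea described above, a network whose input is the output of several other classifiers, can be illustrated with a deliberately simplified sketch. It uses a single fully connected layer followed by a softmax rather than the convolutional meta-network the poster proposes, and every name and number in it (`meta_input`, `meta_network`, `base_outputs`, the hand-set weights) is a hypothetical stand-in; in practice the weights would be learned from training data.

```python
import math

def softmax(scores):
    """Turn raw scores into values that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def meta_input(base_outputs):
    """Concatenate the class-score vectors of all base classifiers."""
    return [score for output in base_outputs for score in output]

def meta_network(base_outputs, weights, biases):
    """One fully connected layer over the concatenated base outputs."""
    features = meta_input(base_outputs)
    logits = [
        sum(f * w for f, w in zip(features, row)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)

# Hypothetical class scores from two base networks for 3 classes.
base_outputs = [
    [0.7, 0.2, 0.1],  # first base classifier
    [0.4, 0.5, 0.1],  # second base classifier
]

# Hand-set weights that simply add up both classifiers' scores per class.
# A trained meta-network would instead learn which classifier to trust.
weights = [
    [1.0, 0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0, 1.0],
]
biases = [0.0, 0.0, 0.0]

combined = meta_network(base_outputs, weights, biases)
predicted_class = combined.index(max(combined))
print(combined, predicted_class)
```

With learned rather than hand-set weights, a layer of this kind could weight one base classifier more heavily for the classes on which it tends to be reliable, which is the behaviour the meta-network is meant to exploit.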
