VISUAL CATEGORISATION OF IMAGES
GALEN DEERING AND MICHAEL LEW
LEIDEN INSTITUTE OF ADVANCED COMPUTER SCIENCE
OVERVIEW
Computers are still not as efficient at recognising objects in images as humans are,
despite decades of research, but improvements are being made by leaps and bounds.
We take a look at current techniques of image categorisation and at potential
further developments in this highly complex field.
IMAGENET
ImageNet refers to a large, manually annotated database of images, used for the
ImageNet Large Scale Visual Recognition Challenge (ILSVRC).
NEURAL NETWORKS
Neural networks are the go-to basis for common solutions to the image
categorisation problem. They are sizeable networks of interconnected neurons
(Fig.2), each of which receives inputs, computes a dot product of those inputs
with its weights, and sends the result along its output axon.
Fig.2; Anatomy of a single neuron
http://cs231n.github.io/convolutional-networks/
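The computation performed by a single neuron can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the bias term and the sigmoid activation are assumptions added here (the text above only mentions the weighted dot product), and the input and weight values are made up.

```python
import numpy as np

def neuron_output(inputs, weights, bias=0.0):
    """Dot product of inputs and weights, plus a bias, passed through a sigmoid.

    The bias and the sigmoid are illustrative assumptions.
    """
    z = np.dot(weights, inputs) + bias
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 2.0, 3.0])    # hypothetical inputs arriving at the neuron
w = np.array([0.5, -1.0, 0.25])  # hypothetical weights, one per input
print(neuron_output(x, w))       # a value strictly between 0 and 1
```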
Neurons are typically grouped into layers inside a single network (Fig.3). Each
layer takes its inputs from the previous layer of neurons (with the exception of
the input layer) and sends its output to the next layer (with the exception of
the output layer). The input of the input layer is equivalent to the input of the
neural network as a whole, and the output of the output layer is, intuitively,
equivalent to the output of the network.
Fig.3; A neural network
http://cs231n.github.io/convolutional-networks/
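The layer-by-layer flow described above can be sketched as a loop in which each layer consumes the previous layer's output. The layer sizes, the random weights, and the tanh activation are all hypothetical choices made for this illustration.

```python
import numpy as np

def forward(x, layers):
    """Propagate x through a list of (weights, bias) layers, in order."""
    activation = x
    for weights, bias in layers:
        # Every neuron in this layer takes the whole previous layer as input.
        activation = np.tanh(weights @ activation + bias)
    return activation

rng = np.random.default_rng(0)
layer_sizes = [4, 5, 3]  # input layer, one hidden layer, output layer (assumed sizes)
layers = [
    (rng.standard_normal((n_out, n_in)), np.zeros(n_out))
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]

network_input = rng.standard_normal(layer_sizes[0])
network_output = forward(network_input, layers)
print(network_output.shape)  # one value per output neuron
```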
Fig.1; Example manual annotation of an image
O. Russakovsky, J. Deng et al
ILSVRC, founded in 2010, is an annual competition focused on the categorisation
of images using AI, commonly including participants from over fifty institutions.
Every year, groups of researchers attempt to develop techniques that perform
image categorisation more efficiently or more accurately than previous techniques.
In this way ILSVRC has encouraged many of the greatest breakthroughs in the field
since its founding year.
ALEXNET
AlexNet, a previous winner of the ImageNet challenge, is the large, deep
convolutional neural network trained by Alex Krizhevsky et al. to classify the
1.2 million high-resolution images of the ImageNet LSVRC-2012 contest into 1000
different classes. AlexNet was constructed similarly to LeNet, but was expanded
in every dimension and used several stacked convolutional layers, as opposed to
a single convolutional layer immediately followed by a POOL layer, which was
common at the time. AlexNet has led to many significant improvements in the field
and as such is an interesting target for comparison with other networks.
COMPARISONS
Other networks developed with the same goal in mind, using subtly different
techniques:
• LeNet
• ZF Net (2013 ILSVRC winner)
• SqueezeNet
• GoogLeNet (2014 ILSVRC winner)
• VGGNet
• ResNet (2015 ILSVRC winner)
Whilst many of these networks are reworks of or expansions on one another, a
significant portion also introduced techniques that were new in their respective
years. These many different branches each have subtly different flavours, and it
will be interesting to see what the major differences are in their performance
in each ILSVRC subcategory.
The subcategories in which performance may differ include:
• Image classification (Fig. 5, Fig. 6)
• Single-object localisation
• Object detection (Fig. 1)
META-NETWORK
Working on the assumption that each set of techniques develops its own strengths
and weaknesses, one might imagine that the results of these different algorithms
could be combined to produce even more accurate results.
To achieve this we seek to create a "meta-network": a convolutional neural
network that takes the output of several other AI techniques as its input, and
performs its categorisation based on these values. Using this, we can potentially
eliminate the weaknesses of certain techniques by allowing the meta-network to
focus on the strengths of other techniques for any given image.
The efficiency of this technique is still an open question, however.
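The meta-network idea (feeding the outputs of several classifiers into one further network) can be illustrated with a deliberately simplified sketch. Note the swap: instead of the convolutional meta-network the poster proposes, this example uses a single linear layer followed by a softmax, and the base-network scores are invented numbers. The weights start out as a plain average of the base networks; in a real meta-network they would be learned from data.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def meta_forward(base_scores, weights, bias):
    """Combine per-model class scores into a single prediction.

    base_scores: shape (n_models, n_classes)
    weights:     shape (n_classes, n_models * n_classes)
    """
    features = base_scores.reshape(-1)  # concatenate the outputs of all models
    return softmax(weights @ features + bias)

n_models, n_classes = 3, 4
# Hypothetical class scores from three base networks for one image.
base_scores = np.array([
    [0.70, 0.10, 0.10, 0.10],
    [0.40, 0.40, 0.10, 0.10],
    [0.25, 0.25, 0.25, 0.25],
])

# Untrained starting point: each output class averages the base models'
# scores for that same class. Training would adjust these weights.
weights = np.tile(np.eye(n_classes), n_models) / n_models
bias = np.zeros(n_classes)

probs = meta_forward(base_scores, weights, bias)
print(probs.argmax())  # index of the class the combined model prefers
```

Because the combiner sees every model's full score vector, training could in principle teach it to trust different base networks for different classes, which is the strength the text attributes to a meta-network.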
Fig.5; Example images
O. Russakovsky, J. Deng et al
In the context of this research we speak of Convolutional Neural Networks, a
type of neural network designed with the assumption that the inputs are images.
In practice this means that in each layer the neurons are arranged in three
dimensions, denoted depth, height, and width (Fig.4). Using this, one can create
a neural network whose input layer holds an image, such that the layer’s height
and width match the dimensions of the image and the depth equals, for example, 3
(for the values of the red, green, and blue channels).
Fig.4; An abstract visualisation of a Convolutional Neural Network (neurons pictured)
http://cs231n.github.io/convolutional-networks/
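As a small illustration of the input layer described above, an RGB image can be held as a three-dimensional array whose height and width match the image and whose depth is 3. The 32 by 32 image size is an assumption chosen for the example.

```python
import numpy as np

height, width, depth = 32, 32, 3  # assumed image size; depth 3 = R, G, B
image = np.zeros((height, width, depth), dtype=np.uint8)
image[..., 0] = 255               # fill the red channel: a pure red image

pixel = image[10, 20]             # the three channel values of one pixel
print(image.shape, image.size)    # the input layer holds height * width * depth values
```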
Fig.6; Different Convolutional Neural Network layers perform different operations on the input from the image, and the FC (Fully Connected) layer computes the class scores of the potential
categories of the image. In this example, the final results indicate a far greater likelihood that the input image on the left contains a car than a horse.
http://cs231n.github.io/convolutional-networks/
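The pipeline in Fig.6 can be approximated by a toy forward pass: a convolutional layer, a ReLU, a pooling layer, and a fully connected (FC) layer that produces class scores. This is a sketch under stated assumptions: the image is random noise, the filters and FC weights are random and untrained, and the layer sizes and the two class names are chosen only for illustration, so the resulting probabilities carry no meaning.

```python
import numpy as np

def conv2d(volume, filters):
    """Stride-1 convolution without padding.

    volume: (H, W, D); filters: (K, F, F, D); returns (H-F+1, W-F+1, K).
    """
    h, w, _ = volume.shape
    k, f, _, _ = filters.shape
    out = np.zeros((h - f + 1, w - f + 1, k))
    for i in range(h - f + 1):
        for j in range(w - f + 1):
            patch = volume[i:i + f, j:j + f, :]
            # Dot product of every filter with the local patch.
            out[i, j, :] = np.tensordot(filters, patch, axes=3)
    return out

def relu(x):
    return np.maximum(x, 0)

def max_pool(volume, size=2):
    """Non-overlapping max pooling over height and width."""
    h, w, d = volume.shape
    h2, w2 = h // size, w // size
    trimmed = volume[:h2 * size, :w2 * size, :]
    return trimmed.reshape(h2, size, w2, size, d).max(axis=(1, 3))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
image = rng.random((8, 8, 3))                      # stand-in for an RGB image
filters = rng.standard_normal((4, 3, 3, 3)) * 0.1  # four untrained 3x3 filters
class_names = ["car", "horse"]                     # hypothetical classes

features = max_pool(relu(conv2d(image, filters)))  # CONV -> RELU -> POOL
fc_weights = rng.standard_normal((len(class_names), features.size)) * 0.1
scores = fc_weights @ features.reshape(-1)         # FC layer: one score per class
probs = softmax(scores)

for name, p in zip(class_names, probs):
    print(f"{name}: {p:.3f}")
```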