Neural networks
Eric Postma
IKAT
Universiteit Maastricht
Overview
Introduction: The biology of neural networks
• the biological computer
• brain-inspired models
• basic notions
Interactive neural-network demonstrations
• Perceptron
• Multilayer perceptron
• Kohonen’s self-organising feature map
• Examples of applications
A typical AI agent
Two types of learning
• Supervised learning
– curve fitting, surface fitting, ...
• Unsupervised learning
– clustering, visualisation, ...
An input-output function
Fitting a surface to four points
(Artificial) neural networks
The digital computer
versus
the neural computer
The Von Neumann architecture
The biological architecture
Digital versus biological computers
5 distinguishing properties
• speed
• robustness
• flexibility
• adaptivity
• context-sensitivity
Speed: The “hundred time steps” argument
The critical resource that is most obvious is time. Neurons whose basic computational speed is a few milliseconds must be made to account for complex behaviors which are carried out in a few hundred milliseconds (Posner, 1978). This means that entire complex behaviors are carried out in less than a hundred time steps.
Feldman and Ballard (1982)
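The arithmetic behind the argument, using illustrative round numbers that are assumed here rather than taken from Feldman and Ballard: about 5 ms per neural step and about 500 ms for a complex behaviour.

```latex
\frac{\text{duration of a complex behaviour}}{\text{duration of one neural step}}
\approx \frac{500\ \text{ms}}{5\ \text{ms}} = 100\ \text{time steps}
```

The usual conclusion drawn from this bound is that the brain cannot be running a long sequential program and must compute massively in parallel.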
Graceful Degradation
(Figure: performance plotted against damage)
Flexibility: the Necker cube
vision = constraint satisfaction
Adaptivity
processing implies learning
in biological computers
versus
processing does not imply learning
in digital computers
Context-sensitivity: patterns
emergent properties
Robustness and context-sensitivity
coping with noise
The neural computer
• Is it possible to develop a model after the natural example?
• Brain-inspired models:
– models based on a restricted set of structural and functional properties of the (human) brain
The Neural Computer (structure)
Neurons,
the building blocks of the brain
Neural activity
(Figure: neural activity, with labels “in” and “out”)
Synapses,
the basis of learning and memory
Learning: Hebb’s rule
(Figure: neuron 1 connected to neuron 2 through a synapse)
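Hebb’s rule is commonly formalised as: the weight of the synapse from neuron 1 to neuron 2 grows in proportion to the product of their activities, Δw = η · a1 · a2. The Python sketch below assumes exactly this plain form, with an illustrative learning rate `eta` and no weight decay; `hebb_update` is a hypothetical helper name.

```python
def hebb_update(w, a_pre, a_post, eta=0.1):
    """Return the new synaptic weight after one Hebbian step.

    w      -- current weight of the synapse from neuron 1 to neuron 2
    a_pre  -- activity of neuron 1 (presynaptic)
    a_post -- activity of neuron 2 (postsynaptic)
    eta    -- learning rate (illustrative value)
    """
    return w + eta * a_pre * a_post


w = 0.0
# Both neurons repeatedly active together: the synapse strengthens.
for _ in range(5):
    w = hebb_update(w, a_pre=1.0, a_post=1.0)
print(round(w, 2))  # 0.5

# Only neuron 1 active: the weight does not change.
print(hebb_update(0.5, a_pre=1.0, a_post=0.0))  # 0.5
```

Because plain Hebbian learning only ever adds positive products for co-active neurons, weights can grow without bound; practical variants add normalisation or decay.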
Connectivity
An example:
The visual system is a feedforward hierarchy of neural modules
Every module is (to a certain extent) responsible for a certain function
(Artificial)
Neural Networks
• Neurons
– activity
– nonlinear input-output function
• Connections
– weight
• Learning
– supervised
– unsupervised
Artificial Neurons
• input (vectors)
• summation (excitation)
• output (activation)
(Figure: inputs i1, i2, i3 are summed into the excitation e; the output activation is a = f(e))
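A minimal Python sketch of one artificial neuron, assuming that each input is multiplied by a connection weight before summation (weights appear under Artificial Connections) and that f is the sigmoid with a = 1. The function names and the weight values are illustrative.

```python
import math


def logistic(e, a=1.0):
    """Sigmoid input-output function f(e) = 1 / (1 + exp(-e / a))."""
    return 1.0 / (1.0 + math.exp(-e / a))


def neuron(inputs, weights):
    """Return (excitation e, activation) of one artificial neuron."""
    # Summation: the excitation is the weighted sum of the inputs.
    e = sum(w * i for w, i in zip(weights, inputs))
    # Output: the activation is a nonlinear function of the excitation.
    return e, logistic(e)


e, act = neuron([1.0, 0.0, 1.0], [0.5, -1.0, 0.5])
print(e)              # 1.0
print(round(act, 3))  # 0.731
```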
Input-output function
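• nonlinear function:
$f(x) = \dfrac{1}{1 + e^{-x/a}}$
where the parameter a sets the steepness of the curve (it is not the activation a)
(Figure: f(e) plotted against e; as a → 0 the curve approaches a step function)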
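The effect of the steepness parameter a in f(x) = 1 / (1 + e^(-x/a)) can be checked numerically. In this Python sketch (the function name `f` and the sample points are illustrative) a smaller a gives a steeper curve; for very small a the outputs are close to 0 for negative x and close to 1 for positive x, so the sigmoid approaches a threshold function.

```python
import math


def f(x, a=1.0):
    """Sigmoid f(x) = 1 / (1 + exp(-x / a)); a controls the steepness."""
    z = -x / a
    # Guard against overflow in math.exp for large positive z.
    if z > 700:
        return 0.0
    return 1.0 / (1.0 + math.exp(z))


for a in (1.0, 0.1, 0.01):
    print(a, [round(f(x, a), 3) for x in (-1.0, -0.1, 0.0, 0.1, 1.0)])
```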
Artificial Connections
(Synapses)
• wAB
– The weight of the connection from neuron A to neuron B
(Figure: A → B, with weight wAB on the connection)
The Perceptron
Learning in the Perceptron
• Delta learning rule
– weights are adapted in proportion to the difference between the desired output t and the actual output o, given input x
• Global error E
– is a function of the differences between the desired and actual outputs
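A sketch of the delta rule for a single perceptron with a threshold output, in the common form Δw = η (t − o) x, with the bias treated as a weight on a constant input of 1. The task (logical AND, which is linearly separable), the learning rate and the number of epochs are assumptions chosen for illustration. With a sigmoid output and gradient descent on E, the update gains an extra factor f’(e).

```python
def step(e):
    """Threshold output: 1 if the excitation is positive, else 0."""
    return 1 if e > 0 else 0


def train_perceptron(patterns, eta=0.5, epochs=20):
    """Train weights w (last entry is the bias) with the delta rule."""
    n = len(patterns[0][0])
    w = [0.0] * (n + 1)
    for _ in range(epochs):
        for x, t in patterns:
            xb = list(x) + [1.0]  # constant input for the bias weight
            o = step(sum(wi * xi for wi, xi in zip(w, xb)))
            # Delta rule: change each weight by eta * (t - o) * input.
            w = [wi + eta * (t - o) * xi for wi, xi in zip(w, xb)]
    return w


def predict(w, x):
    return step(sum(wi * xi for wi, xi in zip(w, list(x) + [1.0])))


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = train_perceptron(AND)
print(w)                               # [1.0, 0.5, -1.0]
print([predict(w, x) for x, _ in AND])  # [0, 0, 0, 1]
```

The learned weights define a straight decision boundary, 1.0·x1 + 0.5·x2 − 1.0 = 0; no such line separates XOR, which is why a single perceptron cannot learn it.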
Gradient Descent
Linear decision boundaries
The history of the Perceptron
• Rosenblatt (1959)
• Minsky & Papert (1969)
• Rumelhart & McClelland (1986)
The multilayer perceptron
(Figure: input layer → hidden layer → output layer)
Training the MLP
• supervised learning
– each training pattern: input + desired output
– in each epoch: present all patterns
– at each presentation: adapt weights
– after many epochs: convergence to a local minimum
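The training procedure above can be sketched in plain Python as online back-propagation. Assumptions (not from the slides): sigmoid units with a = 1, a bias weight per unit, 4 hidden units, learning rate 0.5, 5000 epochs, and XOR as the task. The names are hypothetical, and, as the last bullet warns, a run may end in a local minimum rather than a solution.

```python
import math
import random


def sigmoid(x):
    """Logistic input-output function with a = 1."""
    return 1.0 / (1.0 + math.exp(-x))


def train_mlp(patterns, n_hidden=4, eta=0.5, epochs=5000, seed=0):
    """Online back-propagation for an input-hidden-output sigmoid MLP.

    Returns a prediction function and the LMS error after every epoch.
    """
    rng = random.Random(seed)
    n_in = len(patterns[0][0])
    # Small random initial weights; the last weight in each list is a bias.
    w_ih = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_ho = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]

    def forward(x):
        xb = list(x) + [1.0]
        hb = [sigmoid(sum(w * xi for w, xi in zip(row, xb))) for row in w_ih] + [1.0]
        o = sigmoid(sum(w * hj for w, hj in zip(w_ho, hb)))
        return xb, hb, o

    def lms_error():
        return sum(0.5 * (t - forward(x)[2]) ** 2 for x, t in patterns)

    history = [lms_error()]
    for _ in range(epochs):            # each epoch: present all patterns
        for x, t in patterns:          # each presentation: adapt weights
            xb, hb, o = forward(x)     # forward propagation
            # Backward propagation, using f'(e) = f(e) [1 - f(e)].
            d_o = (t - o) * o * (1 - o)
            d_h = [hb[j] * (1 - hb[j]) * d_o * w_ho[j] for j in range(n_hidden)]
            w_ho = [w + eta * d_o * hj for w, hj in zip(w_ho, hb)]
            for j in range(n_hidden):
                w_ih[j] = [w + eta * d_h[j] * xi for w, xi in zip(w_ih[j], xb)]
        history.append(lms_error())
    return (lambda x: forward(x)[2]), history


XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
predict, history = train_mlp(XOR)
print(f"error before: {history[0]:.3f}, after: {history[-1]:.3f}")
print([round(predict(x), 2) for x, _ in XOR])
```

The hidden deltas are computed before the hidden-output weights are updated, so both layers are adapted from the same forward pass.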
Phoneme recognition with an MLP
(Figure: input: frequencies; output: pronunciation)
Non-linear decision boundaries
Compression with an MLP
the autoencoder
(Figure: the hidden layer holds a compressed hidden representation of the input)
Learning in the MLP
Preventing Overfitting
GENERALISATION = performance on the test set
• Early stopping
• Training, test, and validation sets
• k-fold cross-validation
– leave-one-out procedure
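Two of these techniques can be sketched in a few lines of Python. The k-fold split below assumes the patterns are taken in their given order (in practice they are usually shuffled first); leave-one-out is the special case k = n. The early-stopping helper assumes a list of validation errors per epoch and a "patience" of a few epochs without improvement. Both function names are hypothetical.

```python
def k_fold_splits(n, k):
    """Yield (train_indices, test_indices) for k-fold cross-validation.

    The n patterns are split, in order, into k folds of (nearly) equal size;
    each fold serves once as the test set while the rest are used for training.
    """
    fold_sizes = [n // k + (1 if f < n % k else 0) for f in range(k)]
    indices = list(range(n))
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def early_stopping_epoch(val_errors, patience=3):
    """Epoch with the lowest validation error, stopping the search once the
    error has not improved for `patience` consecutive epochs."""
    best_epoch, best = 0, val_errors[0]
    for epoch, err in enumerate(val_errors[1:], start=1):
        if err < best:
            best_epoch, best = epoch, err
        elif epoch - best_epoch >= patience:
            break
    return best_epoch


for train, test in k_fold_splits(10, 3):
    print("test:", test, "train size:", len(train))

# Leave-one-out is the special case k = n.
print([test for _, test in k_fold_splits(4, 4)])  # [[0], [1], [2], [3]]

val = [0.9, 0.7, 0.5, 0.45, 0.47, 0.5, 0.55, 0.3]
print(early_stopping_epoch(val))  # 3
```

In the early-stopping example, training halts at epoch 6 and the weights from epoch 3 are kept; the later dip to 0.3 is never seen, which is the price of a small patience.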
Image Recognition with the MLP
Hidden Representations
Other Applications
• Practical
– OCR
– financial time series
– fraud detection
– process control
– marketing
– speech recognition
• Theoretical
– cognitive modeling
– biological modeling
Some mathematics…
Perceptron
Derivation of the delta learning rule
Target output t, actual output o; for the perceptron, h = i
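A standard version of this derivation, written in the notation of these slides (target output t, actual output o = f(e), inputs h to the output neuron with h = i for the perceptron, learning rate η, weight w_{jk} from unit j to unit k), is gradient descent on the LMS error. The index conventions are assumptions:

```latex
\begin{aligned}
e_k &= \sum_j w_{jk}\, h_j, \qquad o_k = f(e_k) \\
E &= \tfrac{1}{2} \sum_k (t_k - o_k)^2 \\
\frac{\partial E}{\partial w_{jk}} &= -(t_k - o_k)\, f'(e_k)\, h_j \\
\Delta w_{jk} &= -\eta\, \frac{\partial E}{\partial w_{jk}} = \eta\, (t_k - o_k)\, f'(e_k)\, h_j, \qquad h_j = i_j \text{ for the perceptron}
\end{aligned}
```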
MLP
Sigmoid function
• May also be the tanh function
– (range (-1, +1) instead of (0, 1))
• Derivative (for a = 1): f’(x) = f(x) [1 - f(x)]
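The identity f’(x) = f(x) [1 - f(x)] can be checked against a numerical derivative. This Python sketch assumes the steepness a = 1 (for general a the derivative is f(x) [1 - f(x)] / a) and also checks that tanh, with range (-1, +1), is a rescaled logistic function: tanh(x) = 2 f(2x) - 1. The helper names are illustrative.

```python
import math


def f(x):
    """Logistic sigmoid with steepness a = 1."""
    return 1.0 / (1.0 + math.exp(-x))


def f_prime(x):
    """Derivative via the identity f'(x) = f(x) [1 - f(x)]."""
    return f(x) * (1.0 - f(x))


def numerical_derivative(g, x, h=1e-6):
    """Central finite difference, used only to check the identity."""
    return (g(x + h) - g(x - h)) / (2 * h)


for x in (-2.0, 0.0, 1.5):
    print(x, round(f_prime(x), 6), round(numerical_derivative(f, x), 6))

# tanh is a rescaled logistic: tanh(x) = 2 f(2x) - 1.
print(abs(math.tanh(0.7) - (2 * f(1.4) - 1)) < 1e-12)  # True
```

The identity is what makes back-propagation cheap: the derivative is obtained from the activation already computed in the forward pass.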
Derivation of the generalized delta rule
Error function (LMS)
Adaptation hidden-output weights
Adaptation input-hidden weights
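Extending the gradient-descent argument of the delta rule to one hidden layer gives the generalized delta rule. In this sketch the notation is assumed: inputs i_m, hidden activations h_j = f(e_j) with e_j = Σ_m w_{mj} i_m, outputs o_k = f(e_k) with e_k = Σ_j w_{jk} h_j, and learning rate η.

```latex
\begin{aligned}
E &= \tfrac{1}{2} \sum_k (t_k - o_k)^2 \\
\delta_k &= (t_k - o_k)\, f'(e_k), & \Delta w_{jk} &= \eta\, \delta_k\, h_j && \text{(hidden-output weights)} \\
\delta_j &= f'(e_j) \sum_k \delta_k\, w_{jk}, & \Delta w_{mj} &= \eta\, \delta_j\, i_m && \text{(input-hidden weights)}
\end{aligned}
```

The hidden deltas δ_j are obtained by propagating the output deltas δ_k backwards through the weights w_{jk}, which is where the name back-propagation comes from.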
Forward and Backward Propagation
Decision boundaries of Perceptrons
Straight lines (surfaces): linearly separable
Decision boundaries of MLPs
Convex areas (open or closed)
Decision boundaries of MLPs
Combinations of convex areas
Learning and representing similarity
Alternative conception of neurons
• Neurons do not take the weighted sum of their inputs (as in the perceptron), but measure the similarity of the weight vector to the input vector
• The activation of the neuron is a measure of similarity: the more similar the weight vector is to the input, the higher the activation
• Neurons represent “prototypes”
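A minimal sketch of such a similarity neuron in Python. The choice of similarity measure is an assumption: here the activation is a Gaussian of the Euclidean distance between weight vector and input, with an illustrative width `sigma`; other decreasing functions of the distance would serve the same purpose.

```python
import math


def similarity_activation(weights, inputs, sigma=1.0):
    """Activation that decreases with the distance between weight and input.

    Assumption: a Gaussian of the Euclidean distance.
    """
    d2 = sum((w - x) ** 2 for w, x in zip(weights, inputs))
    return math.exp(-d2 / (2 * sigma ** 2))


prototype = [1.0, 0.0]  # the neuron's weight vector acts as a prototype
print(similarity_activation(prototype, [1.0, 0.0]))            # 1.0
print(round(similarity_activation(prototype, [0.0, 1.0]), 3))  # 0.368
```

The activation is maximal (1.0) when the input coincides with the prototype and falls off smoothly as the input moves away from it.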
Coarse Coding
2nd order isomorphism
Prototypes for preprocessing
Kohonen’s SOFM
(Self-Organising Feature Map)
• Unsupervised learning
• Competitive learning
(Figure: an n-dimensional input connected to a layer of output neurons, one of which is the winner)
Competitive learning
• Determine the winner (the neuron whose weight vector has the smallest distance to the input vector)
• Move the weight vector w of the winning neuron towards the input i
(Figure: weight vector w and input i, before learning and after learning)
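One competitive-learning step can be sketched in Python as follows, assuming squared Euclidean distance for finding the winner and an update w ← w + η (i − w) with an illustrative learning rate; `winner` and `competitive_step` are hypothetical names.

```python
def winner(weights, x):
    """Index of the neuron whose weight vector is closest to input x."""
    def dist2(w):
        return sum((wi - xi) ** 2 for wi, xi in zip(w, x))
    return min(range(len(weights)), key=lambda n: dist2(weights[n]))


def competitive_step(weights, x, eta=0.5):
    """Move only the winner's weight vector a fraction eta towards x."""
    k = winner(weights, x)
    weights[k] = [wi + eta * (xi - wi) for wi, xi in zip(weights[k], x)]
    return k


weights = [[0.0, 0.0], [1.0, 1.0]]
k = competitive_step(weights, [0.8, 0.6])
print(k, [[round(v, 3) for v in w] for w in weights])  # 1 [[0.0, 0.0], [0.9, 0.8]]
```

Only the winner moves; the losing neuron's weights are untouched, which is the difference from Kohonen’s map, where neighbours of the winner also move.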
Kohonen’s idea
• Impose a topological order onto the competitive neurons (e.g., a rectangular map)
• Let neighbours of the winner share the “prize” (the “postcode lottery” principle)
• After learning, neurons with similar weights tend to cluster on the map
Topological order: neighbourhoods
• Square
– winner (red)
– nearest neighbours
• Hexagonal
– winner (red)
– nearest neighbours
A simple example
• A topological map of 2 x 3 neurons and two inputs
(Figure: visualisation of the 2D input and the weights)
Weights before training
Input patterns
(note the 2D distribution)
Weights after training
Another example
• Input: uniformly randomly distributed points
• Output: map of 20² neurons
• Training
– start with a large learning rate and neighbourhood size; both are gradually decreased to facilitate convergence
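A compact Python sketch of the whole procedure: competitive learning on a rectangular map, neighbours of the winner sharing the prize, and a learning rate and neighbourhood size that both decrease during training. The Gaussian neighbourhood function, the linear decay schedules, the 5 x 5 map, and all parameter values are assumptions for illustration; `train_som` is a hypothetical name.

```python
import math
import random


def train_som(data, rows, cols, epochs=20, eta0=0.5, radius0=None, seed=0):
    """Train a rectangular self-organising feature map (a sketch).

    Returns a dict mapping each grid position (r, c) to its weight vector.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    if radius0 is None:
        radius0 = max(rows, cols) / 2
    w = {(r, c): [rng.random() for _ in range(dim)]
         for r in range(rows) for c in range(cols)}
    total, t = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):        # shuffled order
            frac = t / total
            eta = eta0 * (1 - frac)                   # learning rate decreases
            radius = max(radius0 * (1 - frac), 0.5)   # neighbourhood shrinks
            # Competitive step: the winner has the weight vector closest to x.
            win = min(w, key=lambda p: math.dist(w[p], x))
            # Neighbours share the "prize" according to their grid distance.
            for p in list(w):
                g2 = (p[0] - win[0]) ** 2 + (p[1] - win[1]) ** 2
                h = math.exp(-g2 / (2 * radius ** 2))
                w[p] = [wi + eta * h * (xi - wi) for wi, xi in zip(w[p], x)]
            t += 1
    return w


rng = random.Random(1)
points = [[rng.random(), rng.random()] for _ in range(200)]  # uniform in the unit square
som = train_som(points, rows=5, cols=5)
for r in range(5):
    print([tuple(round(v, 2) for v in som[(r, c)]) for c in range(5)])
```

Each update moves a weight vector part of the way towards the input (eta × h never exceeds 0.5), so weights stay inside the unit square; after training, neurons that are neighbours on the grid tend to have nearby weight vectors.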
Dimension reduction
Adaptive resolution
Application of SOFM
Examples (input)
SOFM after training (output)
Visual features (biologically plausible)
Relation with statistical methods 1
• Principal Components Analysis (PCA)
(Figure: projections of the data onto the principal components pca1 and pca2)
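For comparison with the SOFM, a minimal sketch of PCA for 2D data in Python: it computes the covariance matrix, takes its eigenvectors as the principal axes pca1 and pca2 (using the closed-form angle for a symmetric 2 x 2 matrix), and projects the centred data onto them. The function names and the data are illustrative; a general implementation would use a linear-algebra library.

```python
import math


def pca_2d(points):
    """Return the mean and the principal axes (pca1, pca2) of 2D data.

    pca1 is the unit vector of largest variance, pca2 is orthogonal to it.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # For a symmetric 2x2 covariance matrix the leading eigenvector makes
    # the angle theta with the x-axis, where tan(2 theta) = 2 sxy / (sxx - syy).
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    pca1 = (math.cos(theta), math.sin(theta))
    pca2 = (-math.sin(theta), math.cos(theta))
    return (mx, my), pca1, pca2


def project(points, mean, axis):
    """Coordinates of the centred points along one principal axis."""
    return [(x - mean[0]) * axis[0] + (y - mean[1]) * axis[1] for x, y in points]


data = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.1), (3.0, 2.9)]
mean, pca1, pca2 = pca_2d(data)
print([round(v, 2) for v in pca1])
print([round(p, 2) for p in project(data, mean, pca1)])
```

Unlike the SOFM, PCA gives a linear projection: every point is mapped onto straight axes, whereas a self-organising map can follow curved structure in the data.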
Relation with statistical methods 2
• Multi-Dimensional Scaling (MDS)
• Sammon Mapping
Distances in high-dimensional space
Image Mining
the right feature
Fractal dimension in art
Jackson Pollock (Jack the Dripper)
Fractal dimension
Taylor, Micolich, and Jonas (1999). Fractal analysis of Pollock’s drip paintings. Nature, 399, 422 (3 June).
(Figure: fractal dimension plotted against creation date, with the range for natural images indicated)
Our Van Gogh research
Two painters
• Vincent Van Gogh paints Van Gogh
• Claude-Emile Schuffenecker paints Van Gogh
Sunflowers
• Is it made by
– Van Gogh?
– Schuffenecker?
Approach
• Select appropriate features (skipped here, but very important!)
• Apply neural networks
(Figure: a network that classifies textures as van Gogh or Schuffenecker)
Training Data
Van Gogh (5000 textures)
Schuffenecker (5000 textures)
Results
• Generalisation performance
• 96% correct classification on untrained data
Results, cont.
• Trained art-expert network applied to the Yasuda sunflowers
• 89% of the textures are classified as a genuine Van Gogh
A major caveat…
• Not only the painters are different…
• …but also the material, and maybe many other things…