Cognitive Computing 2012
The computer and the mind
4. CONNECTIONISM
Professor Mark Bishop
The representational theory of mind

Cognitive states are relations to mental
representations which have content.

A cognitive state is a state (of mind) denoting
knowledge; understanding; beliefs etc.

Cognitive processes are mental operations on
these representations.
Computational theories of mind

Cognitive states are computational relations to
mental representations which have content.

Cognitive processes (changes in cognitive
states) are computational operations on the
mental representations.

Strong computational theories of mind claim that
the mental representations are themselves
fundamentally computational in character.


Hence the mind - thoughts, beliefs,
intelligence, problem solving etc. - is ‘merely’
a computational machine.
Computational theories of mind typically come in
two flavours:


The connectionist computational theory of mind (CCTM);

The digital computational theory of mind (DCTM).

(Figure: a mental representation with content, e.g. “Grass is green”, operated on by computations, e.g. +, -, x, / etc.)
Basic connectionist computational
theory of mind (CCTM)

The basic connectionist theory of mind is
neutral on exactly what constitutes
[connectionist] ‘mental representations’

i.e. the connectionist ‘mental representations’
might not be realised ‘computationally’.
Cognitive states are computational
relations to mental representations which
have content.

Under the CCTM the computational
architecture and (mental) representations
are connectionist.

Hence for CCTM cognitive processes
(changes in cognitive states) are
computational operations on these
connectionist mental representations.
(Figure: a mental representation with content, e.g. “Happiness”, operated on by computations, e.g. +, -, x, / etc.)
A ‘non-computational’ connectionist
theory of mind

Conceptually it is also possible to formulate a connectionist non-computational theory of
mind where:

Cognitive states are relations to mental representations which have content.

But the mental representations might not be ‘computational’ in character; perhaps they are
instantiated on a non-computational connectionist architecture
AND / OR


the relation between cognitive state and mental representation is non-computational; or the
relationship between one cognitive state and the next is non-computational.
The term ‘non-computational’ here typically refers to a mode of [information] processing that,
in principle, cannot be carried out by a Turing Machine.
The connectionist computational
theory of mind

A form of ‘Strong AI’ which holds that a suitably programmed
computer ‘really is a mind’ (it has thoughts, beliefs, intelligence etc.):

Cognitive states are computational relations to fundamentally
computational mental representations which have content defined by
their core computational structure.

Cognitive processes (changes in cognitive states) are computational
operations on these computational mental representations.

The computational architecture and representations are
computationally connectionist.
Artificial neural networks


What is Neural Computing /
Connectionism?

It defines a mode of computing that seeks
to include the style of computing used
within the brain.

It is a style of computing based on learning
from experience as opposed to classical,
tightly specified, algorithmic methods.
A Definition:

“Neural computing is the study of networks
of adaptable nodes which, through a
process of learning from task examples,
store experiential knowledge and make it
available for use.”
The link between connectionism and
associationism


By considering that:

the input nodes of an artificial neural network represent data from sensory transducers (the 'sensations');

the internal (hidden) network nodes encode ideas;

the inter-node weights indicate the strength of association between ideas;

the output nodes define behaviour;
… then we see a correspondence between connectionism and associationism.
The neuron: the adaptive node of the brain

Within the brain neurons are often
organized into complex regular structures.
Input to neurons occurs at points called
synapses located on the cell’s dendritic
tree.

Synapses are either excitatory, where
activity aids the overall firing of the neuron,
or inhibitory where activity inhibits the
firing of the neuron.

The neuron effectively takes all firing
signals into account by summing the
synaptic effects and firing if this is greater
than a firing threshold, T.

The cell’s output is transmitted along a
fibre called the axon. A neuron is said to
fire when the axon transmits a burst of
pulses at around 100Hz.
The McCulloch/Pitts cell

In the MCP model adaptability comes from representing each
synaptic junction by a variable weight Wi, indicating the degree to
which the neuron should react to this particular input.

By convention positive weights represent excitatory synapses
and negative weights inhibitory synapses.
The neuron firing threshold is represented by a variable T

In modern MCP cells T is usually clamped to zero and a
threshold implemented using a variable bias, b.

A bias is simply a weight connected to an input clamped
to [+1].

In the MCP model the firing of the neuron is represented by the
number 1, and no firing by 0.

Equivalent to a proposition TRUE or FALSE
“Thus in Psychology, .. , the fundamental relations are those of
two valued logic”, MCP (1943).



Activity at the ith input to the neuron is represented by the symbol
Xi and the effect of the ith synapse by a weight Wi.

Net input at a synapse on the MCP cell is: Xi x Wi

The MCP cell will fire if: (Σ (Xi x Wi) + b) ≥ 0
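A minimal sketch of an MCP cell in Python; the weight, bias and input values below are illustrative choices (an AND-like cell), not taken from the slides:

def mcp_cell(x, w, b):
    """McCulloch-Pitts cell: fire (output 1) if the sum of the weighted
    inputs plus the bias is greater than or equal to zero, else output 0."""
    net = sum(xi * wi for xi, wi in zip(x, w)) + b
    return 1 if net >= 0 else 0

# Illustrative two-input cell: both excitatory synapses must be active to fire.
w = [1.0, 1.0]    # positive weights = excitatory synapses
b = -1.5          # bias: a weight attached to an input clamped to +1
print(mcp_cell([1, 1], w, b))   # 1 (fires)
print(mcp_cell([1, 0], w, b))   # 0 (does not fire)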
So, what type of tasks can neural
networks do?

From McCulloch & Pitts (1943), a
network of MCP cells can
“compute only such numbers as
a Turing Machine; second that
each of the latter numbers can be
computed by such a net”.

A neural network classifier maps
an arbitrary input vector to an
(arbitrary) output class.
Vector association

An associative neural network is one
that maps (associates) a given input
vector to a particular output vector.

Associative Networks in 'prediction'.

e.g. given the input vector [age and alcohol
consumed], map to the output vector
[the subject's response time].
What is a learning rule?

To enable a neural network to either associate or classify
correctly we need to correctly specify all its weights and
thresholds.

In a typical network there may be many thousands of
weight and threshold values.

A neural network learning rule is a procedure for
automatically calculating these values.

Typically there are far too many to calculate by hand.
Hebbian learning

“When an axon of cell A is near enough to excite
cell B and repeatedly or persistently takes part in
firing it, some growth process or metabolic changes
take place in one or both cells such that A’s
efficiency as one of the cells firing B, is increased,”

... from, Hebb, D., (1949), The Organisation of
Behaviour.

i.e. when two neurons are simultaneously excited
then the strength of the connection between them
should be increased.

"The change in weight connecting input Ii and output
Oj is proportional (via a learning rate tau, ) to the
product of their simultaneous activations."

ΔWij = τ Ii Oj
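A minimal sketch of this update in Python; the learning rate value and the tiny example at the end are illustrative assumptions:

def hebb_update(W, I, O, tau=0.1):
    """Hebbian learning: increase W[i][j] in proportion to the product of
    the simultaneous activations of input I[i] and output O[j]."""
    for i, Ii in enumerate(I):
        for j, Oj in enumerate(O):
            W[i][j] += tau * Ii * Oj
    return W

# Two inputs, one output, weights initially zero; only the co-active pair grows.
W = [[0.0], [0.0]]
print(hebb_update(W, I=[1, 0], O=[1]))   # [[0.1], [0.0]]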
Training sets

The function that the neural
network is to learn is defined
by its ‘training set’.

For example, to learn the
logical OR function the
training set would consist of
four input-output vector
pairs defined as follows.
The OR Function

Pat   I/P1   I/P2   O/P
1.     0      0      0
2.     0      1      1
3.     1      0      1
4.     1      1      1
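The same training set written as Python data (a sketch; each entry pairs an input vector with its target output vector):

# The four input-output vector pairs defining the logical OR function.
or_training_set = [
    ([0, 0], [0]),
    ([0, 1], [1]),
    ([1, 0], [1]),
    ([1, 1], [1]),
]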
Rosenblatt’s perceptron

When Rosenblatt first
published information on the
‘Perceptron Convergence
Procedure’ in 1959, it was
seen as a great advance on
the work of Hebb.

The full (‘classical’)
perceptron model can be
divided into three layers (see
opposite):
Perceptron structure

The First Layer (Sensory or S-Units)

The first layer, the retina, comprises a regular array of S-Units.

The Second Layer (Association or A-Units)

The input to each A-Unit is the weighted sum of the output of a randomly selected set of S-Units. These weights do not change.

Thus A-Units respond only to particular patterns, extracting specific localized features from the
input.
The Third Layer (Response or R-Units)


Each R-Unit has a set of variable weighted connections to a set of A-Units. An R-Unit outputs
+1 if the sum of its weighted input is greater than a threshold T, -1 otherwise.
In some perceptron models, an active R-Unit will inhibit all A-Units not in its input set.
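A rough structural sketch of the three layers in Python; the retina size, the number of A-Units and their fan-in are illustrative assumptions, as is drawing the fixed S-to-A weights at random from {-1, +1}:

import random

RETINA_SIZE = 16     # number of S-Units (illustrative)
N_A_UNITS = 8        # number of A-Units (illustrative)
FAN_IN = 4           # S-Units feeding each A-Unit (illustrative)

# Fixed, randomly selected connections and fixed weights from S-Units to A-Units.
a_inputs  = [random.sample(range(RETINA_SIZE), FAN_IN) for _ in range(N_A_UNITS)]
a_weights = [[random.choice([-1, 1]) for _ in range(FAN_IN)] for _ in range(N_A_UNITS)]

def a_layer(s):
    """Each A-Unit fires (1) if its fixed weighted sum of selected S-Unit outputs is positive."""
    return [1 if sum(w * s[i] for w, i in zip(ws, idx)) > 0 else 0
            for ws, idx in zip(a_weights, a_inputs)]

def r_unit(a, variable_weights, T):
    """R-Unit: +1 if the variably weighted sum of A-Unit outputs exceeds threshold T, else -1."""
    return 1 if sum(w * ai for w, ai in zip(variable_weights, a)) > T else -1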
The ‘perceptron convergence
procedure’

If the perceptron response is correct, then no change is made in the weights to R-Units.

If the response of an R-Unit is incorrect then it is necessary to:

Decrement all active weights if the R-Unit fires when it is not meant to and increase the
threshold.

Or conversely increment active weights and decrement the threshold, if the R-Unit is not
firing when it should.
The Perceptron Convergence Theorem (Rosenblatt)


... states that the above procedure is guaranteed to find a set of weights to perform a
specified mapping on a single layer network, if such a set of weights exists!
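A sketch of the procedure above in Python for a single R-Unit over binary input vectors; the unit-sized weight and threshold adjustments and the epoch count are assumptions of this sketch:

def perceptron_convergence(training_set, n_inputs, epochs=20):
    """Train one R-Unit; each target in the training set is +1 or -1."""
    w = [0.0] * n_inputs
    T = 0.0
    for _ in range(epochs):
        for x, target in training_set:
            out = 1 if sum(wi * xi for wi, xi in zip(w, x)) > T else -1
            if out == target:
                continue                                   # correct response: no change
            if out == 1:                                   # fired when it should not have:
                w = [wi - xi for wi, xi in zip(w, x)]      #   decrement the active weights
                T += 1.0                                   #   and increase the threshold
            else:                                          # failed to fire when it should have:
                w = [wi + xi for wi, xi in zip(w, x)]      #   increment the active weights
                T -= 1.0                                   #   and decrement the threshold
    return w, T

For a linearly separable mapping such as OR (with targets recoded as +1/-1), this settles on a separating set of weights, as the theorem guarantees.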
The ‘order’ of a perceptron



The order of a perceptron is defined as the largest
number of inputs to any of its A-Units.
Perceptrons will only be useful if this 'order' remains
constant as the size of the retina is increased.

Consider a simple problem - the perceptron should fire
if there are one or more groups of [2*2] black pixels on
the input retina.

(Figure opposite: a [4x4] blob detecting perceptron)

This problem requires that the perceptron has as many A-Units as there are pixels on the retina, less duplications
due to edge effects. Each A-Unit covers a [2*2] square
and computes the AND of its inputs.

If all the weights to the R-Unit are unity and the
threshold is just lower than unity, then the perceptron
will fire if there is a black square anywhere on the
retina.
The order of the problem is thus four, O(4). This
order remains constant irrespective of the size of the
retina.
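A sketch of this blob-detecting perceptron in Python, assuming the retina is given as a list of rows of 0/1 pixels:

def blob_perceptron(retina):
    """Fires (returns 1) if any [2*2] group of black (1) pixels appears on the retina.
    Each A-Unit ANDs one [2*2] square; the R-Unit has unity weights and a
    threshold just lower than unity, so a single active A-Unit suffices."""
    rows, cols = len(retina), len(retina[0])
    a_outputs = [retina[r][c] & retina[r][c+1] & retina[r+1][c] & retina[r+1][c+1]
                 for r in range(rows - 1) for c in range(cols - 1)]   # one A-Unit per 2*2 window
    return 1 if sum(a_outputs) > 0.99 else 0                          # threshold just below unity

image = [[0, 0, 0, 0],
         [0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]
print(blob_perceptron(image))   # 1: a 2*2 block of black pixels is present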
The delta rule: a modern formulation
of the perceptron learning procedure

The modern formulation of the single layer perceptron
learning rule for changing weights in a single layer network of
MCP cells, following the presentation of input/output training
pair, P, is:

Δp Wij = η (Tpj - Opj) Ipi = η δpj Ipi

η is called the learning rate, (eta).

(Tpj - Opj) is the error (or delta) term, δpj, for the jth neuron.
Ipi is the ith element of the input vector, Ip.
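A sketch of the delta rule in Python, trained on the logical OR set from the earlier slide; the learning rate value and epoch count are illustrative assumptions, and the bias is treated as a weight on an input clamped to +1, as before:

def step(net):
    return 1 if net >= 0 else 0

def delta_rule_epoch(w, training_set, eta=0.25):
    """One pass over the training set, applying w[i] += eta * (T - O) * I[i]."""
    for x, target in training_set:
        x = x + [1]                                        # clamped +1 input for the bias weight
        out = step(sum(wi * xi for wi, xi in zip(w, x)))
        delta = target - out                               # the error (delta) term
        w = [wi + eta * delta * xi for wi, xi in zip(w, x)]
    return w

or_set = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w = [0.0, 0.0, 0.0]                                        # two input weights plus the bias weight
for _ in range(10):                                        # a few passes suffice for OR
    w = delta_rule_epoch(w, or_set)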
Two input MCP cell

The output function can be represented in
two dimensions



Using the x-axis for one input
The y-axis for the other.
Examining the MCP equation for two inputs:

X1 W1 + X2 W2 > T

The MCP output function can be represented
by a line dividing the two dimensional input
space into two areas.

The above equation can be re-written as an
equation representing the line dividing the
input space into two classes:


X1 W1 + X2 W2 = T OR
X2 = T / W2 - X1 W1 / W2
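For example, with assumed values W1 = W2 = 1 and T = 1.5 (an AND-like cell, chosen purely for illustration), the dividing line is X2 = 1.5 - X1; of the four binary input points only (1, 1) lies above this line, so it is the only input for which the cell fires.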
Linearly separable problems

The two input MCP cell can
correctly classify any function that
can be separated by a straight
dividing line in input space.

This class of problems is
defined as ‘Linearly Separable’
problems.


e.g. the OR/AND functions.
The MCP threshold parameter
simply translates (shifts) the line
dividing the two classes parallel
to itself.
Linearly inseparable problems

There are many problems that cannot
be linearly divided in input space

Minsky and Papert defined these as
‘Hard Problems’.

The most famous example of this
class of problem is the ‘XOR’
problem.

The two input XOR problem is not
linearly separable in two dimensions

See figure opposite.
To make a problem linearly separable

To solve the two input XOR problem it needs to be
made linearly separable in input space.

Hence an extra input (dimension) is required.

Consider an XOR function defined by three inputs
(a,b,c), where (c = a AND b)

Thus embedding the 2 input XOR in a 3 dimensional
input space.

In general a two class, k-input problem can be
embedded in a higher n-dimensional hypercube (n > k).

A two class problem is linearly separable in n
dimensions if there exists a hyper-plane to separate the
classes.

cf. The ‘Support Vector Machine’

Here we map from an input space (where data are not
linearly separable) to a sufficiently large feature space,
where classes are linearly separable.
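A sketch in Python of the embedding just described: adding the third input c = a AND b makes XOR linearly separable, so a single MCP cell can now compute it (the particular weights and bias are one illustrative separating solution):

def mcp_cell(x, w, b):
    return 1 if sum(xi * wi for xi, wi in zip(x, w)) + b >= 0 else 0

def xor_via_embedding(a, b):
    """Embed the 2-input XOR in a 3-dimensional input space via c = a AND b."""
    c = a & b
    # One separating hyper-plane in (a, b, c) space: fire when a + b - 2c >= 1.
    return mcp_cell([a, b, c], w=[1, 1, -2], b=-1)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_via_embedding(a, b))   # outputs 0, 1, 1, 0 respectively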
Hard problems

In their book ‘Perceptrons’,
Minsky & Papert showed that
there were several simple image
processing tasks that could not be
performed by Single Layer
Perceptrons (SLPs) of fixed
order.

All these problems are easy to
compute using ‘conventional’
algorithmic methods.
Connectedness

A Diameter Limited Perceptron is one where the
inputs to an A-Unit must fall within a receptive field
of size D.

Clearly only (b) and (c) are connected, hence the
perceptron should fire only on (b) and (c).

The A-Units can be divided into three groups. Those
on the left, the middle and the right of the image.



11/05/2017
Clearly for images (a) & (c) it is only the left group that
can tell the difference, hence there must be higher
weights activated by the left A-Units in image (c) than
image (a).
Clearly for images (a) & (b) it is only the right group that
can tell the difference, hence there must be higher
weights activated by the right A-Units on (b) than on (a).
However the above two requirements give (d) higher
activation than (b) and (c), which implies that if a
threshold is found that can classify (b) & (c) as
connected, then it will incorrectly classify (d)!
Multi-layer Perceptrons

Solutions to Minsky & Papert’s Hard problems arose with the development of learning rules for multi-layer
perceptrons.

The most famous of these is called ‘Back [error] Propagation’ and was initially developed by the control
engineer Paul Werbos and published in the Appendix to his PhD thesis in 1974, but was ignored for many
years.


Back propagation was independently rediscovered by Le Cun and published (in French) in 1985.


Paul J. Werbos. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. PhD thesis,
Harvard University, 1974
Y. LeCun: Une procédure d'apprentissage pour réseau a seuil asymmetrique (a Learning Scheme for Asymmetric Threshold
Networks), Proceedings of Cognitiva 85, 599-604, Paris, France, 1985.
However the rule gained international renown with the publication of Rumelhart & McClelland’s ‘Parallel
Distributed Processing’ texts in the mid-1980s, and they are the authors most strongly associated with it.

Rumelhart, D.E., J.L. McClelland and the PDP Research Group (1986). Parallel Distributed Processing: Explorations in the
Microstructure of Cognition. Volume 1: Foundations, Cambridge, MA: MIT Press.
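A minimal back-propagation sketch in Python (a generic modern formulation, not the specific derivations in the works cited above): a small sigmoid network with one hidden layer learning XOR. The layer sizes, learning rate, epoch count and random initialisation are illustrative assumptions, and an unlucky initialisation can occasionally settle in a local minimum.

import math, random

random.seed(1)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, W2):
    """Forward pass; biases are handled as weights on inputs clamped to +1."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x + [1]))) for row in W1]
    o = sigmoid(sum(w * hi for w, hi in zip(W2, h + [1])))
    return h, o

# 2 inputs -> 2 hidden units -> 1 output unit, weights initialised at random.
W1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
W2 = [random.uniform(-1, 1) for _ in range(3)]
xor_data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
eta = 0.5

for _ in range(20000):
    for x, t in xor_data:
        h, o = forward(x, W1, W2)
        d_o = (t - o) * o * (1 - o)                                  # output delta (error) term
        d_h = [d_o * W2[j] * h[j] * (1 - h[j]) for j in range(2)]    # back-propagated hidden deltas
        W2 = [w + eta * d_o * hi for w, hi in zip(W2, h + [1])]
        W1 = [[w + eta * d_h[j] * xi for w, xi in zip(W1[j], x + [1])] for j in range(2)]

print([round(forward(x, W1, W2)[1]) for x, _ in xor_data])   # typically prints [0, 1, 1, 0]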