Neural Network
Neural networks offer modelling procedures for
nonlinear data, which enables you to discover more
complex relationships in your data. You can thus
develop more accurate and effective predictive
models.
The introductory section of these slides is loosely
based on two notes by A.T.S. Carneiro Note 1 and
Note 2.
Multilayer Perceptron Neural
Network Algorithm (MLP)
The scientific literature presents several
regression algorithms, several of which are
implemented in the SPSS Modeller. The multilayer
perceptron neural network algorithm (MLP) was
selected for these examples.
The MLP neural network algorithm is based on the
functional principles of biological neural
structures, as indicated in the figure (below).
Neural computing researchers attempt to organise
mathematical models similarly to the structure and
organisation of neurons in biological brains. The aim
is to achieve similar processing abilities, including
capacities inherent to biological brains such as
learning from examples, trial and error, and
knowledge generalisation.
Biological Neural Structure
Based on this analogy to biological neurons, the
MLP neural network algorithm implements a neural
network that is composed of layers of artificial
neurons that are stimulated by input signals, which
are transmitted through the network via synapses
connecting neurons in different layers. The figure
presents an artificial perceptron neuron model
with n inputs {x1, x2, …, xn}, in which each input xi
has an associated synapse wi, and an output y.
Artificial Neuron Model
There is also an additional neuron parameter,
named w0 and known as the bias, which can be
interpreted as a synapse associated with an input
x0 = -1. The output of the neuron y is based on the
product between the input vector x = (x0, x1, x2, …, xn)
and the vector w = (w0, w1, w2, …, wn) composed of
synapses, including the bias (w0):
$$x \cdot w = \sum_{i=0}^{n} x_i w_i$$
The neuron output is then obtained through the
activation function of the neuron, y = φ(x.w). A
hyperbolic tangent function (a sigmoid-type
function) is usually adopted, defined for a generic
value a by
$$\varphi(a) = \frac{1 - e^{-a}}{1 + e^{-a}}$$
which equals tanh(a/2). However, it is convenient to
use other activation functions in certain scenarios.
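The single-neuron computation described above can be sketched in a few lines of Python. This is only an illustration: it assumes the activation is the tanh-type sigmoid φ(a) = (1 − e^(−a)) / (1 + e^(−a)), and the names `activation` and `neuron_output`, as well as the weights and inputs, are hypothetical choices rather than anything taken from the slides.

```python
import math

def activation(a):
    """Sigmoid-type activation (1 - e^-a) / (1 + e^-a), equal to tanh(a / 2)."""
    return (1.0 - math.exp(-a)) / (1.0 + math.exp(-a))

def neuron_output(inputs, weights):
    """Compute y = activation(x . w) with the bias input x0 = -1 prepended.

    `inputs` holds (x1, ..., xn); `weights` holds (w0, w1, ..., wn).
    """
    x = [-1.0] + list(inputs)          # x0 = -1 carries the bias w0
    if len(x) != len(weights):
        raise ValueError("expected one weight per input plus the bias")
    a = sum(xi * wi for xi, wi in zip(x, weights))
    return activation(a)

# Hypothetical weights and inputs, chosen only for illustration.
y = neuron_output([0.5, -1.0], [0.1, 0.8, 0.3])
```

Here the weighted sum is -0.1 + 0.4 - 0.3, which is zero up to rounding, so the neuron output is essentially zero as well.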
The artificial neuron model is feed forward, that
is, connections are directed from inputs (x0, x1, …,
xn) to output y of the neuron. The figure presents
the layout of perceptron neurons in an MLP neural
network, in which there are two neuron layers, one
hidden and one output. Regarding the neural
network that is presented in the figure, each
neuron of the hidden layer is connected to each
neuron in the output layer. Therefore, inputs of
the output layer neurons correspond to the
outputs of hidden layer neurons.
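A feed-forward pass through such a two-layer network can be sketched as follows. The sketch assumes a tanh activation in the hidden layer and a linear output neuron (the regression setting); the function names and all weight values are hypothetical and serve only to show how the hidden-layer outputs become the inputs of the output layer.

```python
import math

def layer_output(inputs, weight_rows, activation):
    """Outputs of one layer; each row is (w0, w1, ..., wn) for one neuron."""
    x = [-1.0] + list(inputs)               # bias input x0 = -1
    return [activation(sum(xi * wi for xi, wi in zip(x, row)))
            for row in weight_rows]

def mlp_forward(inputs, hidden_weights, output_weights):
    """Feed-forward pass: inputs -> hidden layer (tanh) -> output layer (linear)."""
    hidden = layer_output(inputs, hidden_weights, math.tanh)
    # The outputs of the hidden layer are the inputs of the output layer.
    return layer_output(hidden, output_weights, lambda a: a)

# Hypothetical weights: 2 inputs, 2 hidden neurons, 1 output neuron.
hidden_weights = [[0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]]
output_weights = [[0.5, 1.0, 1.0]]
prediction = mlp_forward([0.0, 0.0], hidden_weights, output_weights)
```

With zero inputs both hidden neurons output tanh(0) = 0, so the prediction reduces to the output bias term, -0.5.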
The analyst who uses the artificial neural network
algorithm must choose how many neurons to use in
the hidden layer, considering the set of input
data. With too few neurons in the hidden layer, the
neural network is not able to generalise each
class's data. With too many, it is prone to
overfitting, in which the neural network learns
only the training data and does not generalise what
it has learned to the data classes.
Example Of A Neural Network
With A Hidden Layer
The neural network training process is conducted
with a back propagation algorithm, whose purpose
is to adjust the values associated with the
synapses so that the neural network maps an input
space to an output space. The input vectors x are
samples of the input space, and each input vector
is associated with an output z, which can be
represented by a vector z = (z1, z2, …, zn) and is
based on either a scalar value or a symbolic value.
For symbolic values, one neuron in the output layer
corresponds to each of the possible symbols that
can be associated with the input vector.
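For symbolic outputs, the one-neuron-per-symbol arrangement can be illustrated with a small encoding sketch. The 0/1 target coding and the "largest output wins" decoding rule are common conventions assumed here, not details given in the slides, and the helper names and class labels are hypothetical.

```python
def one_hot(symbol, symbols):
    """Return a target vector with one output neuron per possible symbol."""
    if symbol not in symbols:
        raise ValueError(f"unknown symbol: {symbol!r}")
    return [1.0 if s == symbol else 0.0 for s in symbols]

def decode(outputs, symbols):
    """Pick the symbol whose output neuron has the largest activation."""
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    return symbols[best]

symbols = ["benign", "malignant"]      # illustrative class labels
target = one_hot("malignant", symbols)
label = decode([0.2, 0.7], symbols)
```

In this example `target` is the vector the network would be trained towards, and `label` is the symbol read back from a pair of output activations.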
During the neural network training process, a set
of input data for which the associated output is
known is first selected, and random values are
assigned to each synapse in the neural network.
The data is presented to the neural network and
the output it supplies is compared to the actual
output, generating an error value. The error value
is then used to adjust the synapses in reverse
order, from the output back to the inputs (back
propagation).
The process of adjusting synapse values is
repeated until a stopping criterion is met, for
example, a fixed number of repetitions or a
minimum error. With each repetition, the outputs
provided by the neural network should get closer to
the actual output, because the synapse correction
equations are designed to minimise the error
between the output provided by the neural network
and the actual output.
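The training loop described above can be sketched as a minimal back propagation routine. This is an illustration under stated assumptions, not the SPSS implementation: it assumes one tanh hidden layer, a single linear output neuron, a squared-error criterion, and per-sample gradient descent updates. The learning rate, epoch limit, error threshold, and toy data are all hypothetical.

```python
import math
import random

def forward(x, W_h, w_o):
    """Return hidden activations and the network output for input x."""
    xb = [-1.0] + list(x)                       # bias input x0 = -1
    h = [math.tanh(sum(a * w for a, w in zip(xb, row))) for row in W_h]
    hb = [-1.0] + h                             # bias input for the output neuron
    return h, sum(a * w for a, w in zip(hb, w_o))

def mean_squared_error(data, W_h, w_o):
    return sum((forward(x, W_h, w_o)[1] - z) ** 2 for x, z in data) / len(data)

def train(data, n_hidden, rate=0.05, max_epochs=500, min_error=1e-4, seed=0):
    """Back propagation with two stopping criteria: epoch limit or minimum error."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    # Random initial synapse values, as described in the text.
    W_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
           for _ in range(n_hidden)]
    w_o = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    history = [mean_squared_error(data, W_h, w_o)]
    for _ in range(max_epochs):
        for x, z in data:
            h, y = forward(x, W_h, w_o)
            err = y - z                          # error at the linear output neuron
            xb, hb = [-1.0] + list(x), [-1.0] + h
            # Error propagated back to each hidden neuron (tanh' = 1 - h^2).
            delta_h = [err * w_o[j + 1] * (1.0 - h[j] ** 2)
                       for j in range(n_hidden)]
            # Adjust the output synapses, then the hidden synapses.
            w_o = [w - rate * err * a for w, a in zip(w_o, hb)]
            W_h = [[w - rate * delta_h[j] * a for w, a in zip(W_h[j], xb)]
                   for j in range(n_hidden)]
        history.append(mean_squared_error(data, W_h, w_o))
        if history[-1] < min_error:              # minimum-error stopping criterion
            break
    return W_h, w_o, history

# Toy data (illustrative only): z = 0.5*x1 - 0.3*x2 on a small grid.
grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
data = [((a, b), 0.5 * a - 0.3 * b) for a in grid for b in grid]
W_h, w_o, history = train(data, n_hidden=2)
```

The `history` list records the mean squared error before training and after each pass over the data, so comparing its first and last entries shows how far the outputs have moved towards the actual values.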
For conventional regression, the methodology
employed consists of using neurons with a linear
activation function and assigning one output neuron
to each component of the output vector.
Where the output is a single scalar value, the
neural network is designed with a single neuron in
the output layer.
Two examples are considered.
Data Set A
This data set was downloaded from archive or
more directly example.
It describes variables affecting concrete
compressive strength. Concrete is the most
important material in civil engineering. The
concrete compressive strength is a highly
nonlinear function of age and ingredients. These
ingredients include cement, blast furnace slag, fly
ash, water, super-plasticizer, coarse aggregate,
and fine aggregate.
Number of instances (observations): 1030
Number of Attributes: 9
Attribute breakdown: 8 quantitative input variables, and
1 quantitative output variable
Missing Attribute Values: None
The file that is used is organised in nine columns, in which
each line represents data collected from a concrete
mixture analysed in a lab. The first seven columns give
the quantity of each ingredient in the mixture, in kg per
m3 of concrete; the following column gives the age of the
concrete, in days; and the last column gives the
compressive strength of the concrete, measured in MPa
(megapascals, a unit of pressure).
Variable Information
Name                           Data Type     Measurement         Description
Cement                         quantitative  kg in a m3 mixture  Input Variable
Blast Furnace Slag             quantitative  kg in a m3 mixture  Input Variable
Fly Ash                        quantitative  kg in a m3 mixture  Input Variable
Water                          quantitative  kg in a m3 mixture  Input Variable
Superplasticizer               quantitative  kg in a m3 mixture  Input Variable
Coarse Aggregate               quantitative  kg in a m3 mixture  Input Variable
Fine Aggregate                 quantitative  kg in a m3 mixture  Input Variable
Age                            quantitative  Day (1~365)         Input Variable
Concrete compressive strength  quantitative  MPa                 Output Variable
All of these attributes are numerical variables measured
in the units listed; thus, the neural network is designed
to solve a regression problem in which the input space
comprises the first eight columns of the file (cement
to age) and the output space corresponds to the ninth
column of the file (concrete compressive strength).
I-Cheng Yeh, "Modeling of strength of high performance
concrete using artificial neural networks," Cement and
Concrete Research, Vol. 28, No. 12, pp. 1797-1808 (1998).
What Does It Look Like?
[Figure: scatterplot matrix of Cement, Blast Furnace Slag,
Fly Ash, Water, Superplasticizer, Coarse Aggregate, Fine
Aggregate, Age, and Concrete compressive strength.]
SPSS Commands
Note that the
covariates have
been standardised,
so that none are
numerically
dominant.
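The effect of standardising covariates can be illustrated with a short sketch. It rescales each column to zero mean and unit standard deviation; the use of the population standard deviation (`pstdev`) is an assumption, since the slides do not state which variant SPSS applies, and the sample values below are hypothetical rather than taken from the data set.

```python
from statistics import mean, pstdev

def standardise(column):
    """Rescale a covariate to zero mean and unit standard deviation."""
    m, s = mean(column), pstdev(column)
    if s == 0:
        raise ValueError("a constant column cannot be standardised")
    return [(v - m) / s for v in column]

# Hypothetical values on very different scales (not taken from the data set).
cement = [150.0, 300.0, 450.0]
age = [3.0, 28.0, 90.0]
cement_std = standardise(cement)
age_std = standardise(age)
```

After rescaling, both columns have the same spread, so neither dominates the weighted sums numerically simply because of its units.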
As a first trial, a small hidden layer is adopted;
this may be increased in subsequent applications
of the procedure.
The outputs are
saved so that
further analysis
may be
undertaken.
Neural Network With Optimal
Architecture
Results Of The Fitting Of
The Data
The figure presents a scatter plot with actual
concrete strength values on the horizontal axis and
the values estimated by the neural network with two
hidden neurons on the vertical axis.
The identity line (diagonal) represents the ideal
result, in which each concrete sample would have
exactly the strength estimated by the neural
network.
In this case, the
high concentration
of points next to
the identity line is
visually evident,
which is confirmed
by correlation and
error, reported in
the final table,
evidencing the
efficiency of the
proposed
methodology.
Residuals Of The Results Of
The Fitting Of The Data
Input Variables And Their
Respective Relevance
As previously
mentioned, each
neural network
input element has
an associated
synapse, which is
represented by a
numerical value
that controls
input; the higher
the synapse value,
the more relevant
is the input for
the result that is
generated by the
neural network.
The figure
presents a
relevance chart
for each input
field to obtain
fitting results,
information that
becomes available
after the creation
of the node
corresponding to
the model
generated.
Additional Analysis
Two metrics are applied to the results: the linear
correlation coefficient and the mean absolute error, each
computed between two generic data sets. The correlation
coefficient measures the strength of the linear
relationship between the data sets, and can also be
interpreted as a coefficient of similarity between them;
the mean absolute error measures the discrepancies
between the two data sets.
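Both metrics can be computed directly from paired values. The sketch below uses the standard textbook definitions of the Pearson correlation coefficient and the mean absolute error, which are assumed to be the ones intended here; the function names and the sample values are hypothetical.

```python
import math

def pearson_correlation(a, b):
    """Linear correlation coefficient between two equally long data sets."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two data sets of equal length (at least 2)")
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cross = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    return cross / math.sqrt(ss_a * ss_b)

def mean_absolute_error(a, b):
    """Average absolute difference between paired values."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty data sets of equal length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

# Hypothetical actual and estimated values, for illustration only.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
r = pearson_correlation(actual, predicted)
mae = mean_absolute_error(actual, predicted)
```

For these values the absolute differences are 2, 2, 3 and 1, giving a mean absolute error of 2.0, while the correlation is close to but below 1.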
Syntax
The commands are extensive; if interested, refer to the
printer-friendly version of these notes.
Summary
Any solution will depend on the random number seed
employed, so values will only be broadly similar to those
presented here.
Size of Hidden Layer  Correlation  Mean Absolute Deviation  Sum of Squares Error
2                     .91          5.3                      60.2
4                     .93          4.7                      46.4
6                     .94          4.4                      42.7
By analysing the results obtained by the neural
network with two, four, and six hidden layer neurons, it is
concluded that the best neural network configuration is
probably four hidden neurons. Compared with two neurons,
it presents both a higher correlation and a lower error
between actual data and estimated values, whereas moving
to six neurons improves the fit only slightly (correlation
.94 against .93). The goal is to be parsimonious,
combining a good fit with as few neurons as possible.
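The diminishing returns behind this choice can be checked with simple arithmetic on the summary table. The sketch below only recomputes differences between consecutive rows of the values reported above; the tuple layout and the `gains` helper are hypothetical, and no selection rule from SPSS is implied.

```python
# (hidden layer size, correlation, mean absolute deviation, sum of squares error)
summary = [
    (2, 0.91, 5.3, 60.2),
    (4, 0.93, 4.7, 46.4),
    (6, 0.94, 4.4, 42.7),
]

def gains(rows):
    """Change in each metric between consecutive hidden layer sizes."""
    out = []
    for prev, cur in zip(rows, rows[1:]):
        out.append({
            "from": prev[0],
            "to": cur[0],
            "correlation_gain": round(cur[1] - prev[1], 2),
            "mad_drop": round(prev[2] - cur[2], 1),
            "sse_drop": round(prev[3] - cur[3], 1),
        })
    return out

steps = gains(summary)
```

Going from two to four neurons raises the correlation by 0.02 and lowers the sum of squares error by 13.8, while going from four to six adds only 0.01 and 3.7 respectively, which is the pattern the parsimony argument relies on.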
Data Set B
The second example is possibly closer to those you might
encounter. The SPSS commands are identical and the
secondary analysis simply reduces to a comparative
table.
This data set was downloaded from archive or more
directly example.
This breast cancer database was obtained from the
University of Wisconsin Hospitals, Madison from Dr.
William H. Wolberg.
O. L. Mangasarian and W. H. Wolberg: "Cancer diagnosis
via linear programming", SIAM News, Volume 23, Number
5, September 1990, pp 1 & 18.
Samples arrive periodically as Dr. Wolberg reports his
clinical cases. The database therefore reflects this
chronological grouping of the data. This grouping
information appears immediately below, having been
removed from the data itself:
Number of Instances: 699 (as of 15 July 1992)
Number of Attributes: 10 plus the class attribute
There are 16 instances in Groups 1 to 6 that contain a
single missing (i.e., unavailable) attribute value.
Class distribution Benign: 458 (65.5%) Malignant: 241
(34.5%).
Attribute Information
#   Attribute                    Domain
1   Sample code number           id number
2   Clump Thickness              1 - 10
3   Uniformity of Cell Size      1 - 10
4   Uniformity of Cell Shape     1 - 10
5   Marginal Adhesion            1 - 10
6   Single Epithelial Cell Size  1 - 10
7   Bare Nuclei                  1 - 10
8   Bland Chromatin              1 - 10
9   Normal Nucleoli              1 - 10
10  Mitoses                      1 - 10
11  Class                        2 for benign, 4 for malignant
What Does It Look Like?
SPSS Commands For
Additional Analysis
Drag “Class_...” to
columns and
“Predict…” to rows.
Syntax
The commands are extensive; if interested, refer to the
printer-friendly version of these notes.
Neural Network With Optimal
Architecture
Predicted Pseudo-probability
The columns have
been split for
clarity. Effectively
the plot should
display only two
columns (2 and 4).
Input Variables And Their
Respective Relevance
As previously
mentioned, each
neural network input
element has an
associated synapse,
which is represented
by a numerical value
that controls input;
the higher the
synapse value, the
more relevant is the
input for the result
that is generated by
the neural network.
The figure presents
a relevance chart
for each input field
to obtain fitting
results, information
that becomes
available after the
creation of the node
corresponding to
the model
generated.
Summary
Count of Predicted Value for Class (rows) by Class (columns)

              Range 1-2 selects 1   Range 1-4 selects 3   Range 1-6 selects 3
              Class 2   Class 4     Class 2   Class 4     Class 2   Class 4
Predicted 2   431       6           435       3           430       5
Predicted 4   13        233         9         236         14        234
The random nature of the procedure employed explains the difference
in the final two columns.
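The counts in the summary can be turned into error rates with a few lines of arithmetic. The sketch below assumes rows are predicted classes and columns are recorded classes (2 for benign, 4 for malignant), as in the cross-tabulation instructions; the `rates` helper and the dictionary keys are hypothetical names.

```python
def rates(pred2_class2, pred2_class4, pred4_class2, pred4_class4):
    """Summarise one 2x2 table of predicted class by recorded class."""
    total = pred2_class2 + pred2_class4 + pred4_class2 + pred4_class4
    return {
        "total": total,
        "accuracy": (pred2_class2 + pred4_class4) / total,
        # Benign cases (class 2) that were predicted malignant (4).
        "benign_called_malignant": pred4_class2,
        # Malignant cases (class 4) that were predicted benign (2).
        "malignant_called_benign": pred2_class4,
        "sensitivity": pred4_class4 / (pred2_class4 + pred4_class4),
        "specificity": pred2_class2 / (pred2_class2 + pred4_class2),
    }

# Counts copied from the summary table, one entry per hidden layer range.
tables = {
    "range 1-2": (431, 6, 13, 233),
    "range 1-4": (435, 3, 9, 236),
    "range 1-6": (430, 5, 14, 234),
}
results = {name: rates(*counts) for name, counts in tables.items()}
```

Each table covers 683 cases, and separating the two kinds of misclassification makes it easier to weigh them differently, since they have different consequences for patients.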
A parsimonious approach suggests that a single
hidden neuron should suffice.
The decision of which algorithm is most appropriate, in
this case, falls to a medical expert, since only such a
qualified professional is able to assess which error would
do greater harm to patients. A benign tumour erroneously
identified as malignant might cause psychological harm,
and most treatments for malignant tumours have severe
side effects. A malignant tumour erroneously identified
as benign might delay treatment, causing the patient
to lose valuable time in his/her recovery process.
The application aims to assist the medical diagnosis
of cancer by identifying whether a patient's tumour is
malignant or benign. Given its high average success
rate, it could serve as a basis for a complete diagnosis
support system, which could be implemented in hospitals,
medical clinics, or other health care institutions, thus
reducing the probability of incorrect diagnosis.