Advanced information retrieval
Chapter 02: Modeling Neural Network Model
Neural Network Model

A neural network is an oversimplified representation of the neuron interconnections in the human brain:
 nodes are processing units
 edges are synaptic connections
 the strength of a propagating signal is modelled by a weight assigned to each edge
 the state of a node is defined by its activation level
 depending on its activation level, a node might issue an output signal
Neural Networks
• Neural Networks
– Complex learning systems recognized in animal brains
– Single neuron has simple structure
– Interconnected sets of neurons perform complex learning tasks
– Human brain has ~10^15 synaptic connections
– Artificial Neural Networks attempt to replicate non-linear learning found in nature
[Figure: biological neuron, showing dendrites, cell body, and axon]
Neural Networks (cont'd)
– Dendrites gather inputs from other neurons and combine information
– Then generate non-linear response when threshold reached
– Signal sent to other neurons via axon
[Figure: artificial neuron, with inputs x1, x2, ..., xn combined to produce output y]
– Artificial neuron model is similar
– Data inputs (xi) are collected from upstream neurons and passed to the combination function (Σ)
Neural Networks (cont'd)
– Activation function reads combined input and produces non-linear response (y)
– Response channeled downstream to other neurons
• What problems are Neural Networks applicable to?
– Quite robust with respect to noisy data
– Can learn and work around erroneous data
– Results opaque to human interpretation
– Often require long training times
Input and Output Encoding
– Neural Networks require attribute values encoded to [0, 1]
• Numeric
– Apply Min-max Normalization to continuous variables
X* = (X - min(X)) / range(X) = (X - min(X)) / (max(X) - min(X))
– Works well when Min and Max known
– Also assumes new data values occur within Min-Max range
– Values outside range may be rejected or mapped to Min or Max
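As a concrete illustration, here is a minimal Python sketch of this normalization; the function name and the clip-to-boundary policy are our own choices (rejection is the alternative the slide mentions), and the attribute range in the example is made up:

```python
def min_max_normalize(x, x_min, x_max):
    """Scale x into [0, 1] using the min and max seen in training.

    Out-of-range values are mapped to the nearest boundary, one of the
    two policies mentioned above (the other is rejecting the value).
    """
    x = max(x_min, min(x, x_max))          # clip to [x_min, x_max]
    return (x - x_min) / (x_max - x_min)   # (X - min(X)) / range(X)

# Example: a value of 75,000 with training range [20,000, 100,000]:
print(min_max_normalize(75_000, 20_000, 100_000))  # 0.6875
```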
Input and Output Encoding (cont'd)
• Output
– Neural Networks always return continuous values in [0, 1]
– Many classification problems have two outcomes
– Solution uses threshold established a priori in single output node to separate classes
– For example, target variable is "leave" or "stay"
– Threshold rule is "leave if output >= 0.67"
– Single output node value = 0.72 classifies record as "leave"
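The thresholding step, as a one-line sketch (the labels and the 0.67 cut-off follow the slide's example):

```python
def classify(output, threshold=0.67):
    """Map the network's continuous output in [0, 1] to a class label."""
    return "leave" if output >= threshold else "stay"

print(classify(0.72))  # "leave", as in the slide
```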
Simple Example of a Neural Network
[Figure: fully connected feedforward network. Input layer: Node 1, Node 2, Node 3; hidden layer: Node A, Node B; output layer: Node Z. Connection weights W1A, W2A, W3A, W1B, W2B, W3B into the hidden layer; WAZ, WBZ into the output layer; bias weights W0A, W0B, W0Z.]
– Neural Network consists of layered, feedforward, completely connected network of nodes
– Feedforward restricts network flow to single direction
– Flow does not loop or cycle
– Network composed of two or more layers
Simple Example of a Neural Network (cont'd)
– Most networks have Input, Hidden, Output layers
– Network may contain more than one hidden layer
– Network is completely connected
– Each node in given layer connected to every node in next layer
– Every connection has weight (Wij) associated with it
– Weight values randomly assigned between 0 and 1 by algorithm (see the sketch after this list)
– Number of input nodes depends on number of predictors
– Number of hidden and output nodes configurable
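A minimal sketch of this setup in Python, assuming the 3-2-1 topology of the figure; the nested-dict weight layout is our own choice:

```python
import random

random.seed(0)  # reproducible illustration

inputs, hidden, output = ["1", "2", "3"], ["A", "B"], ["Z"]

# One weight per connection, plus a bias weight ("0") per non-input
# node, each randomly assigned in [0, 1] as the slide describes.
weights = {j: {i: random.random() for i in ["0"] + inputs} for j in hidden}
weights.update({j: {i: random.random() for i in ["0"] + hidden} for j in output})

print(weights["A"]["1"])  # W1A: weight from input Node 1 to hidden Node A
```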
Simple Example of a Neural Network (cont'd)
[Figure repeated from the previous slide.]
– Combination function produces linear combination of node inputs and connection weights as a single scalar value (a Python sketch follows the list below):

netj = Σi Wij xij = W0j x0j + W1j x1j + ... + WIj xIj
– For node j, xij is the ith input
– Wij is weight associated with ith input to node j
– I + 1 inputs to node j
– x1, x2, ..., xI are inputs from upstream nodes
– x0 is constant input, with value = 1.0
– Each node j thus has extra input W0j x0j = W0j
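A minimal Python sketch of this combination function, reusing the nested-dict weight layout from the earlier sketch; the values plugged in below are taken from the worked example on the next slide:

```python
def combination(weights_j, x):
    """net_j = sum over i of W_ij * x_ij.

    weights_j maps input names to node j's weights; x maps the same
    names to input values and must contain x["0"] = 1.0, so the bias
    term W0j * x0j = W0j is included automatically.
    """
    return sum(w_ij * x[i] for i, w_ij in weights_j.items())

# Weights into Node A and inputs from the worked example:
w_A = {"0": 0.5, "1": 0.6, "2": 0.8, "3": 0.6}
x = {"0": 1.0, "1": 0.4, "2": 0.2, "3": 0.7}
print(round(combination(w_A, x), 4))  # netA = 0.5 + 0.24 + 0.16 + 0.42 = 1.32
```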
Simple Example of a Neural Network (cont'd)
– Example input values and connection weights:

x0 = 1.0    W0A = 0.5    W0B = 0.7    W0Z = 0.5
x1 = 0.4    W1A = 0.6    W1B = 0.9    WAZ = 0.9
x2 = 0.2    W2A = 0.8    W2B = 0.8    WBZ = 0.9
x3 = 0.7    W3A = 0.6    W3B = 0.4
– The scalar value computed for hidden layer Node A equals

netA = Σi WiA xiA = W0A(1.0) + W1A x1A + W2A x2A + W3A x3A
     = 0.5 + 0.6(0.4) + 0.8(0.2) + 0.6(0.7) = 1.32
– For Node A, netA = 1.32 is input to activation function
– Neurons “fire” in biological organisms
– Signals sent between neurons when combination of inputs crosses threshold
Simple Example of a Neural Network (cont'd)
– Firing response not necessarily linearly related to increase in input stimulation
– Neural Networks model this behavior using non-linear activation function
– Sigmoid function most commonly used:

y = 1 / (1 + e^(-x))

– In Node A, sigmoid function takes netA = 1.32 as input and produces output

y = 1 / (1 + e^(-1.32)) = 0.7892
Simple Example of a Neural Network (cont'd)
– Node A outputs 0.7892 along connection to Node Z, where it becomes component of netZ
– Before netZ is computed, contribution from Node B required:

netB = Σi WiB xiB = W0B(1.0) + W1B x1B + W2B x2B + W3B x3B
     = 0.7 + 0.9(0.4) + 0.8(0.2) + 0.4(0.7) = 1.5

and,

f(netB) = 1 / (1 + e^(-1.5)) = 0.8176

– Node Z combines outputs from Node A and Node B through netZ
Simple Example of a Neural Network (cont'd)
– Inputs to Node Z not data attribute values
– Rather, they are outputs from the sigmoid function in upstream nodes:

netZ = Σi WiZ xiZ = W0Z(1.0) + WAZ xAZ + WBZ xBZ
     = 0.5 + 0.9(0.7892) + 0.9(0.8176) = 1.9461

finally,

f(netZ) = 1 / (1 + e^(-1.9461)) = 0.8750

– Value 0.8750 is output from Neural Network on first pass
– Represents predicted value for target variable, given first observation
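The whole worked example fits in a few lines of Python; this sketch simply replays the slide's arithmetic (function and variable names are ours):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Inputs and weights from the example; the 0.5, 0.7, 0.5 constants are
# the bias weights W0A, W0B, W0Z applied to the constant input x0 = 1.0.
x1, x2, x3 = 0.4, 0.2, 0.7

net_A = 0.5 + 0.6 * x1 + 0.8 * x2 + 0.6 * x3   # = 1.32
net_B = 0.7 + 0.9 * x1 + 0.8 * x2 + 0.4 * x3   # = 1.5
out_A, out_B = sigmoid(net_A), sigmoid(net_B)  # 0.7892 and 0.8176

net_Z = 0.5 + 0.9 * out_A + 0.9 * out_B        # = 1.9461
print(round(sigmoid(net_Z), 4))                # 0.875, the predicted value
```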
Sigmoid Activation Function
– Sigmoid function combines nearly linear, curvilinear, and nearly constant behavior, depending on input value
– Function nearly linear for domain values -1 < x < 1
– Becomes curvilinear as values move away from center
– At extreme values, f(x) is nearly constant
– Moderate increments in x produce variable increase in f(x), depending on location of x
– Sometimes called "Squashing Function"
– Takes real-valued input and returns values in [0, 1]
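A quick numerical check of this squashing behavior (a sketch; the sample points are arbitrary):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Near-linear around 0, nearly constant at the extremes:
for x in (-10, -1, 0, 1, 10):
    print(f"f({x:3d}) = {sigmoid(x):.4f}")
# f(-10) = 0.0000, f( -1) = 0.2689, f(  0) = 0.5000,
# f(  1) = 0.7311, f( 10) = 1.0000
```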
Back-Propagation
– Neural Networks are a supervised learning method
– Require a target variable
– Each observation passed through network results in output value
– Output value compared to actual value of target variable
– (Actual - Output) = Error
– Prediction error analogous to residuals in regression models
– Most networks use Sum of Squared Errors (SSE) to measure how well predictions fit target values:

SSE = Σrecords Σoutput nodes (actual - output)²
Back-Propagation (cont'd)
– Squared prediction errors summed over all output nodes and all records in data set
– Model weights constructed that minimize SSE
– The weight values that minimize SSE are unknown
– Weights estimated, given the data set
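A minimal sketch of the SSE computation over a small, made-up batch (the numbers are illustrative only):

```python
def sse(records):
    """Sum of squared errors over all records and all output nodes.

    records: one (actual, output) pair per output node per record,
    matching SSE = sum of (actual - output)^2 over both.
    """
    return sum((actual - output) ** 2 for actual, output in records)

# Illustrative targets vs. network outputs for three records:
print(sse([(1.0, 0.8750), (0.0, 0.2100), (1.0, 0.6600)]))  # ~0.1753
```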
Neural Network for IR
 From the work by Wilkinson & Hingston, SIGIR'91

[Figure: three-layer network. Query term nodes (ka, kb, kc) connect to the corresponding document term nodes (k1, ..., ka, kb, kc, ..., kt), which connect to document nodes (d1, ..., dj, dj+1, ..., dN).]
Neural Network for IR
 Three-layer network
 Signals propagate across the network
 First level of propagation:
 Query terms issue the first signals
 These signals propagate across the network to reach the document nodes
 Second level of propagation:
 Document nodes might themselves generate new signals which affect the document term nodes
 Document term nodes might respond with new signals of their own
Quantifying Signal Propagation
 Normalize signal strength (MAX = 1)
 Query terms emit initial signal equal to 1
 Weight associated with an edge from a query term node ki to a document term node ki:

Wiq = wiq / sqrt( Σi wiq² )

 Weight associated with an edge from a document term node ki to a document node dj:

Wij = wij / sqrt( Σi wij² )
Quantifying Signal Propagation
 After the first level of signal propagation, the activation level of a document node dj is given by:

Σi Wiq Wij = ( Σi wiq wij ) / ( sqrt( Σi wiq² ) * sqrt( Σi wij² ) )

 which is exactly the ranking of the Vector model
 New signals might be exchanged among document term nodes and document nodes in a process analogous to a feedback cycle
 A minimum threshold should be enforced to avoid spurious signal generation
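A small Python sketch of this first-level activation, assuming wiq and wij are term weights (e.g., tf-idf) for the query and for document dj; the vectors below are illustrative:

```python
import math

def activation(w_q, w_d):
    """First-level activation of a document node: sum_i Wiq * Wij.

    With both weight vectors divided by their Euclidean norms, this is
    the cosine of the angle between query and document vectors, i.e.
    exactly the Vector model ranking.
    """
    norm_q = math.sqrt(sum(w * w for w in w_q))
    norm_d = math.sqrt(sum(w * w for w in w_d))
    return sum(wq * wd for wq, wd in zip(w_q, w_d)) / (norm_q * norm_d)

# Illustrative weights over a four-term vocabulary:
print(activation([1.0, 0.0, 0.5, 0.0], [0.8, 0.2, 0.4, 0.0]))  # ~0.9759
```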
Conclusions
 Model provides an interesting formulation of the IR problem
 Model has not been tested extensively
 It is not clear what improvements the model might provide