Download ppt - Department of Computer Science and Engineering

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
CS623: Introduction to
Computing with Neural Nets
(lecture-17)
Pushpak Bhattacharyya
Computer Science and Engineering
Department
IIT Bombay
Boltzmann Machine
• Hopfield net
• Probabilistic neurons
• Energy expression = -Σi Σj>i wij xi xj
where xi = activation of ith neuron
• Used for optimization
• Central concern is to ensure global
minimum
• Based on simulated annealing
Comparative Remarks
Feed forward
n/w with BP
Mapping
device:
(i/p pattern -->
o/p pattern), i.e.
Classification
Minimizes total
sum square
error
Hopfield net
Associative
Memory
+
Optimization
device
Energy
Boltzmann
m/c
Constraint
satisfaction.
(Mapping +
Optimization
device)
Entropy
(Kullback–
Leibler
divergence)
Comparative Remarks (contd.)
Feed forward
n/w with BP
Deterministic
neurons
Learning to
associate i/p
with o/p i.e.
equivalent to a
function
Hopfield net
Boltzmann m/c
Deterministic
neurons
Probabilistic
neurons
Pattern
Probability
Distribution
Comparative Remarks (contd.)
Feed forward
n/w with BP
Can get stuck in
local minimum
(Greedy
approach)
Credit/blame
assignment
(consistent with
Hebbian rule)
Hopfield net
Boltzmann m/c
Local minimum Can come out
possible
of local
minimum
Activation
product
(consistent with
Hebbian rule)
Probability and
activation
product
(consistent with
Hebbian rule)
Theory of Boltzmann m/c
• For the m/c the computation means the
following:
At any time instant, make the state of the
kth neuron (sk) equal to 1, with probability:
1 / (1 + exp(-ΔEk / T))
ΔEk = change in energy of the m/c when the
kth neuron changes state
T = temperature which is a parameter of the
m/c
Theory of Boltzmann m/c (contd.)
1
P(sk = 1)
P(sk = 1) =
1 / (1 + exp(-ΔEk / T))
Increasing T
0
ΔEk = α
ΔEk
Theory of Boltzmann m/c (contd.)
ΔEk = Ekfinal – Ekinitial
= (sinitialk - sfinalk) * Σj≠kwkjsj
We observe:
1. The higher the temperature, lower is P(Sk=1)
2. at T = infinity, P(Sk=1) = P(Sk=0) = 0.5, equal
chance of being in state 0 or 1. Completely random
behavior
3. If T  0, then P(Sk=1)  1
4. The derivative is proportional P(Sk=1)*(1 - P(Sk=1))
Consequence of the form of
P(Sk=1)
P(Sα) α exp( -E(Sα)) / T
1 -1
1 -1
Probability Distribution called as
Boltzmann Distribution
N - bits
P(Sα) is the probability of the state Sα
Local “sigmoid” probabilistic behavior leads to global
Boltzmann Distribution behaviour of the n/w
P(Sα) α exp( -E(Sα)) / T
P
T
E
Ratio of state probabilities
• Normalizing,
P(Sα) = (exp(-E(Sα)) / T) / (∑β є all statesexp(-E(Sβ)/T)
P(Sα) / P(Sβ) = exp -(E(Sα) - E(Sβ) ) / T
Learning a probability distribution
• Digression: Estimation of a probability
distribution Q by another distribution P
• D = deviation = ∑sample space Qln Q/P
• D >= 0, which is a required property (just
like sum square error >= 0)
To prove D >= 0
Lemma: ln (1/x) >= (1 - x)
Let, x = 1 / (1 - y)
ln(1 - y) = -[y + y2/2 + y3/3 + y4/4 + …]
Proof (contd.)
(1 – x) = 1 – 1/(1 - y)
= -y(1 - y)-1
= -y(1 + y + y2 + y3 + …)
= -[y + y2 + y3 + y4 + …]
But, y + y2/2 + y3/3 + …<= y + y2 + y3 + …
Lemma proved.
Proof (contd.)
D = ∑ Q ln Q / P
>= ∑over the sample space Q(1 – P/Q)
= ∑ (Q - P)
= ∑ Q - ∑P
=1–1=0
Related documents