Bayesian Sampling and Ensemble Learning in Generative Topographic Mapping
Akio Utsugi
National Institute of Bioscience and Human-Technology
Neural Processing Letters, vol. 12, no. 3, pp. 277-290
Summarized by Jong-Youn Lim
Introduction
SOM
A minimal model for the formation of topology-preserving maps
An information processing tool to extract a hidden smooth manifold from data
Drawbacks: no explicit statistical model for the data generation
Alternatives
Elastic net
Generative topographic mapping: based on the mixture of spherical Gaussian generators with a constraint on the centroids
Hyperparameter search in GTM can be done with a Gibbs sampler, which works well on small data but is time-consuming on large data
A deterministic algorithm that produces the estimates quickly is needed: ensemble learning, which minimizes the variational free energy of the model, an upper bound on the negative log evidence
Generative topographic mapping
Two versions: an original regression version and a Gaussian process version
It consists of a spherical Gaussian mixture density and a Gaussian process prior
A spherical Gaussian mixture density
$f(X \mid W, \beta) = \int f(X \mid Y, W, \beta)\, f(Y)\, dY$
$f(X \mid Y, W, \beta) = \left(\frac{\beta}{2\pi}\right)^{nm/2} \prod_{i=1}^{n} \prod_{k=1}^{r} \exp\!\left(-\frac{\beta}{2}\,\lVert x_i - w_k \rVert^2\right)^{y_{ik}}$ (1)
$f(Y) = \prod_{i=1}^{n} \prod_{k=1}^{r} r^{-y_{ik}} = r^{-n}$ (2)
$f(X \mid W, \beta) = r^{-n} \left(\frac{\beta}{2\pi}\right)^{nm/2} \prod_{i=1}^{n} \sum_{k=1}^{r} \exp\!\left(-\frac{\beta}{2}\,\lVert x_i - w_k \rVert^2\right)$ (3)
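A minimal Python sketch of the mixture density of Eq. (3), evaluated in log form; the array shapes and names (X as an n-by-m data matrix, W as an r-by-m matrix of centroids) are illustrative assumptions, not the paper's notation:

```python
import numpy as np

def gtm_log_likelihood(X, W, beta):
    """Log of the spherical Gaussian mixture density f(X | W, beta), Eq. (3)."""
    n, m = X.shape
    r = W.shape[0]
    d2 = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=-1)   # ||x_i - w_k||^2, shape (n, r)
    a = -0.5 * beta * d2
    a_max = a.max(axis=1, keepdims=True)
    log_inner = a_max[:, 0] + np.log(np.exp(a - a_max).sum(axis=1))  # stable log-sum over k
    return -n * np.log(r) + 0.5 * n * m * np.log(beta / (2.0 * np.pi)) + log_inner.sum()
```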
W has a Gaussian prior, where $w_{(j)}$ collects the $j$-th coordinates of all $r$ centroids and $M$ is the prior precision matrix
$f(W \mid h) = (2\pi)^{-rm/2}\, |M|^{m/2} \exp\!\left(-\frac{1}{2} \sum_{j=1}^{m} w_{(j)}'\, M\, w_{(j)}\right)$ (4)
Bayesian inference of W
$f(W \mid X, h) \propto f(X, W \mid h) = f(X \mid W, \beta)\, f(W \mid h)$ (5)
Inference of h is based on its evidence f(X | h); the maximizer of the evidence is called the generalized maximum likelihood (GML) estimate of h
The approximations used in the earlier hyperparameter search algorithm are valid only on abundant data
Hyperparameter search is therefore improved using a Gibbs sampler
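A short sketch of the unnormalized log posterior of W from Eq. (5), combining the likelihood of Eq. (3) with the prior of Eq. (4); it reuses gtm_log_likelihood from the sketch above, and treating M as an r-by-r precision matrix is an assumption of this sketch:

```python
import numpy as np

def log_posterior_W(X, W, M, beta):
    """Unnormalized log posterior of W, Eq. (5): log f(X|W,beta) + log f(W|h)."""
    r, m = W.shape
    log_prior = (0.5 * m * np.linalg.slogdet(M)[1]     # (m/2) log|M|
                 - 0.5 * r * m * np.log(2.0 * np.pi)
                 - 0.5 * np.trace(W.T @ M @ W))        # sum_j w_(j)' M w_(j)
    return gtm_log_likelihood(X, W, beta) + log_prior
```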
Gibbs sampler in GTM
Any moment of the posteriors can be obtained precisely as an average over a long sample series
The Gibbs sampler is an MCMC method that does not need the design of a trial distribution
Conditional posteriors on Y and W
Conditional posterior on Y ($p_{ik}$ is the posterior selection probability of the inner units)
$f(Y \mid X, W, \beta) = \prod_{i=1}^{n} \prod_{k=1}^{r} p_{ik}^{\,y_{ik}}$ (6)
$p_{ik} = \dfrac{\exp\!\left(-\frac{\beta}{2}\,\lVert x_i - w_k \rVert^2\right)}{\sum_{k'=1}^{r} \exp\!\left(-\frac{\beta}{2}\,\lVert x_i - w_{k'} \rVert^2\right)}$ (7)
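A sketch of the Gibbs step for Y implied by Eqs. (6)-(7): compute the selection probabilities $p_{ik}$ and draw each row of Y from a categorical distribution (array shapes and names are illustrative):

```python
import numpy as np

def sample_Y(X, W, beta, rng):
    """Draw one-hot unit assignments Y from the conditional posterior, Eqs. (6)-(7)."""
    d2 = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=-1)   # (n, r)
    logits = -0.5 * beta * d2
    logits -= logits.max(axis=1, keepdims=True)                # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)                          # p_ik, Eq. (7)
    n, r = p.shape
    Y = np.zeros((n, r))
    for i in range(n):
        Y[i, rng.choice(r, p=p[i])] = 1.0                      # categorical draw per data point
    return Y
```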
The conditional posterior on W is obtained by normalizing f(X, Y, W | h) (the product of (1), (2) and (4))
$f(W \mid X, Y, h) = \prod_{j=1}^{m} N(w_{(j)} \mid \mu_{(j)}, \Sigma)$ (8)
$\Sigma = (\beta N + M)^{-1}$ (9)
$\mu_{(j)} = \beta\, \Sigma\, s_{(j)}$ (10)
$N = \mathrm{diag}(n_1, \ldots, n_r) = \sum_{i=1}^{n} \mathrm{diag}(y_{i1}, \ldots, y_{ir})$ (11)
$s_{(j)} = (s_{1j}, \ldots, s_{rj})' = \sum_{i=1}^{n} x_{ij}\, (y_{i1}, \ldots, y_{ir})'$ (12)
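A sketch of the corresponding Gibbs step for W: each column $w_{(j)}$ is drawn from the Gaussian of Eqs. (8)-(12); representing W as an r-by-m array is an assumption of this sketch:

```python
import numpy as np

def sample_W(X, Y, M, beta, rng):
    """Draw W column-by-column from the conditional posterior, Eqs. (8)-(12)."""
    n, m = X.shape
    r = Y.shape[1]
    N = np.diag(Y.sum(axis=0))                  # Eq. (11)
    Sigma = np.linalg.inv(beta * N + M)         # Eq. (9)
    S = Y.T @ X                                 # column j is s_(j), Eq. (12)
    L = np.linalg.cholesky(Sigma)
    W = np.empty((r, m))
    for j in range(m):
        mu_j = beta * Sigma @ S[:, j]           # Eq. (10)
        W[:, j] = mu_j + L @ rng.standard_normal(r)   # draw from N(mu_(j), Sigma)
    return W
```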
Conditional posteriors on hyperparameters
$M = \alpha D'D + \varepsilon E'E$
$f(W \mid \alpha, \varepsilon) = (2\pi)^{-rm/2}\, \alpha^{lm/2}\, \varepsilon^{(r-l)m/2}\, |D'D|^{m/2} \exp\!\left(-\frac{1}{2} \sum_{j=1}^{m} \left(\alpha\,\lVert D w_{(j)} \rVert^2 + \varepsilon\,\lVert E w_{(j)} \rVert^2\right)\right)$ (16)
Each hyperparameter is given a Gamma hyperprior, e.g. $f(\alpha \mid d_\alpha, s_\alpha) = G(\alpha \mid d_\alpha, s_\alpha)$ (17), where
$G(x \mid d, s) = \dfrac{s^{d}\, x^{d-1}}{\Gamma(d)} \exp(-sx)$
The conditional posteriors on the hyperparameters are obtained by normalizing the joint density; each is again a Gamma density, $f(\alpha \mid X, Y, W, H) = G(\alpha \mid \tilde{d}_\alpha, \tilde{s}_\alpha)$, and similarly for $\varepsilon$ and $\beta$
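The slides do not spell out the updated Gamma parameters, but a sketch under the assumption of the standard conjugate Gamma updates implied by Eqs. (1), (16) and (17) looks as follows (d0 and s0 hold the hyperprior shapes and rates; all names are illustrative):

```python
import numpy as np

def sample_hyperparameters(X, Y, W, D, E, d0, s0, rng):
    """Draw (alpha, eps, beta) from conjugate Gamma conditionals (an assumed form)."""
    n, m = X.shape
    r = W.shape[0]
    l = np.linalg.matrix_rank(D.T @ D)
    # alpha | W: shape d + l*m/2, rate s + (1/2) sum_j ||D w_(j)||^2
    alpha = rng.gamma(d0['alpha'] + 0.5 * l * m,
                      1.0 / (s0['alpha'] + 0.5 * np.sum((D @ W) ** 2)))
    # eps | W: shape d + (r - l)*m/2, rate s + (1/2) sum_j ||E w_(j)||^2
    eps = rng.gamma(d0['eps'] + 0.5 * (r - l) * m,
                    1.0 / (s0['eps'] + 0.5 * np.sum((E @ W) ** 2)))
    # beta | X, Y, W: shape d + n*m/2, rate s + (1/2) sum_ik y_ik ||x_i - w_k||^2
    d2 = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=-1)
    beta = rng.gamma(d0['beta'] + 0.5 * n * m,
                     1.0 / (s0['beta'] + 0.5 * np.sum(Y * d2)))
    return alpha, eps, beta
```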
Ensemble learning in GTM
Ensemble learning is a deterministic algorithm that obtains the estimates of parameters and hyperparameters concurrently
The approximating ensemble density Q and its variational free energy on a model H:
$F(Q \mid H) = \int Q(Y, W, \alpha, \beta)\, \log \dfrac{Q(Y, W, \alpha, \beta)}{f(X, Y, W, \alpha, \beta \mid H)}\, dY\, dW\, d\alpha\, d\beta$ (27)
If we restrict Q to a factorial form, we obtain a straightforward algorithm for the minimization of F
The optimization procedure (a sketch of the loop follows the list)
1. Initial densities are set for the partial ensembles $Q(Y), Q(W), Q(\alpha), Q(\beta)$
2. A new density Q(Y) is obtained from the other densities by
$Q(Y) \propto \exp \int Q(W)\, Q(\alpha)\, Q(\beta)\, \log f(X, Y, W, \alpha, \beta \mid H)\, dW\, d\alpha\, d\beta$ (28)
3. Each of the other partial ensembles is also updated using the same formula, with Y and the target variable exchanged
4. These updates of the partial ensembles are repeated until a convergence condition is satisfied
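A minimal skeleton of that loop, with the factor-update functions left abstract; the names update_rules and free_energy are placeholders of this sketch, not the paper's:

```python
def ensemble_learning(X, init_Q, update_rules, max_iter=200, tol=1e-6):
    """Cycle through the factorial ensembles, updating each from the others (Eq. (28))."""
    Q = dict(init_Q)                        # {'Y': ..., 'W': ..., 'alpha': ..., 'beta': ...}
    F_old = float('inf')
    for _ in range(max_iter):
        for name in ('Y', 'W', 'alpha', 'beta'):
            others = {k: v for k, v in Q.items() if k != name}
            Q[name] = update_rules[name](X, others)   # expected log joint, exponentiated and normalized
        F_new = update_rules['free_energy'](X, Q)     # free energy F of Eq. (27)
        if abs(F_old - F_new) < tol:                  # convergence condition
            break
        F_old = F_new
    return Q
```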
Simulations
The algorithms are compared in simulations: ensemble learning (the deterministic algorithm) and the Gibbs sampler
Artificial data $x_i = (x_{i1}, x_{i2})'$, i = 1, ..., n, are generated from two independent standard Gaussian random series $\{e_{i1}\}$, $\{e_{i2}\}$ by
$x_{i1} = 4(i-1)/n - 2 + \sigma e_{i1}$ (42)
$x_{i2} = \sin[2\pi(i-1)/n] + \sigma e_{i2}$ (43)
Three noise levels: $\sigma = 0.3, 0.4, 0.5$
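A small Python sketch of the data generator of Eqs. (42)-(43); the sample size in the usage line is illustrative, not taken from the paper:

```python
import numpy as np

def make_artificial_data(n, sigma, rng):
    """Generate x_i = (x_i1, x_i2)' as in Eqs. (42)-(43) at noise level sigma."""
    i = np.arange(1, n + 1)
    e1 = rng.standard_normal(n)                          # standard Gaussian series {e_i1}
    e2 = rng.standard_normal(n)                          # standard Gaussian series {e_i2}
    x1 = 4.0 * (i - 1) / n - 2.0 + sigma * e1            # Eq. (42)
    x2 = np.sin(2.0 * np.pi * (i - 1) / n) + sigma * e2  # Eq. (43)
    return np.column_stack([x1, x2])

# Example: X = make_artificial_data(n=400, sigma=0.3, rng=np.random.default_rng(0))
```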
Conclusion
A simulation experiment showed the superiority of the Gibbs sampler on small data and the validity of the deterministic algorithms on large data