Information Geometry
and Neural Networks
Shun-ichi Amari
RIKEN Brain Science Institute
Orthogonal decomposition of rates
and (higher-order) correlations
Synchronous firing and higher correlations
Algebraic singularities caused by multiple stimuli
Dynamics of learning in multilayer perceptrons
Information Geometry
(links to: Systems Theory, Statistics, Combinatorics, Information Theory, Neural Networks, Information Sciences, Physics, Mathematics, AI)
Riemannian Manifold
Dual Affine Connections
Manifold of Probability Distributions
Information Geometry?

$S = \{p(x; \mu, \sigma)\}$
$p(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{(x - \mu)^2}{2\sigma^2}\right\}$
$S = \{p(x; \theta)\}$, $\theta = (\mu, \sigma)$
Riemannian metric
Dual affine connections

Manifold of Probability Distributions
$x \in \{1, 2, 3\}$, $S = \{p(x)\}$
$p = (p_1, p_2, p_3)$, $p_1 + p_2 + p_3 = 1$ (the probability simplex with vertices $p_1, p_2, p_3$)
submodel $M = \{p(x; \xi)\}$
Two Structures
Riemannian metric and affine connection
$ds^2 = \sum g_{ij}\, d\theta^i\, d\theta^j$

KL divergence:
$D[p : q] = E_p\left[\log \frac{p(x)}{q(x)}\right]$
$ds^2 = 2\, D[p(x, \theta) : p(x, \theta + d\theta)]$

Fisher information:
$g_{ij} = E\left[\frac{\partial \log p}{\partial \theta^i}\, \frac{\partial \log p}{\partial \theta^j}\right]$
Riemannian Structure
$ds^2 = \sum g_{ij}(\theta)\, d\theta^i\, d\theta^j = d\theta^T G(\theta)\, d\theta$
$G(\theta) = (g_{ij})$; Euclidean case: $G = E$ (identity)
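For the Gaussian family above, the Fisher metric can be checked numerically; a minimal Monte Carlo sketch (NumPy assumed), using the closed forms $g_{\mu\mu} = 1/\sigma^2$, $g_{\sigma\sigma} = 2/\sigma^2$:

```python
import numpy as np

# Monte Carlo estimate of the Fisher metric g_ij = E[d_i log p * d_j log p]
# for the Gaussian family p(x; mu, sigma); closed form: diag(1/s^2, 2/s^2).
rng = np.random.default_rng(0)
mu, sigma = 1.0, 2.0
x = rng.normal(mu, sigma, size=1_000_000)

score_mu = (x - mu) / sigma**2                       # d/dmu log p
score_sigma = (x - mu)**2 / sigma**3 - 1.0 / sigma   # d/dsigma log p
scores = np.stack([score_mu, score_sigma])
G = scores @ scores.T / x.size
print(G)   # ~ [[0.25, 0], [0, 0.5]] for sigma = 2
```

The off-diagonal terms vanish: $\mu$ and $\sigma$ are orthogonal coordinates for this family.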
Affine Connection
covariant derivative: $\nabla_X Y$
geodesic: $\nabla_X X = 0$, $X = X(t)$
$\frac{d^2\theta^s}{dt^2} + \Gamma^s_{ij}\, \frac{d\theta^i}{dt}\, \frac{d\theta^j}{dt} = 0$ (Christoffel symbols $\Gamma^s_{ij}$ determined by $g$)
minimal distance: straight line (geodesic)
Independent Distributions
$x_1, x_2 \in \{0, 1\}$
$S = \{p(x_1, x_2)\}$, $M = \{q(x_1)\, q(x_2)\}$
Neural Firing
$x_1, x_2, \ldots, x_n$
$p(\mathbf{x}) = p(x_1, x_2, \ldots, x_n)$
$\eta_i = E[x_i]$ ----firing rate
$v_{ij} = \mathrm{Cov}[x_i, x_j]$ ----covariance
higher-order correlations
orthogonal decomposition
Information Geometry of Higher-Order Correlations
----orthogonal decomposition
Riemannian metric
dual affine connections
Pythagoras theorem
Dual geodesics
$S = \{p(\mathbf{x}, \theta)\}$
Correlations of Neural Firing
two neurons $x_1, x_2$: joint distribution $p(x_1, x_2)$, i.e. $\{p_{00}, p_{10}, p_{01}, p_{11}\}$
firing rates: $\eta_1 = \Pr\{x_1 = 1\}$, $\eta_2 = \Pr\{x_2 = 1\}$
correlation: $\theta = \log \frac{p_{11}\, p_{00}}{p_{10}\, p_{01}}$
$\{(\eta_1, \eta_2), \theta\}$: orthogonal coordinates for $\{p_{00}, p_{01}, p_{10}, p_{11}\}$

Example spike trains:
x1: 0011000101101
x2: 0100100110100
x3: 0101101001010
firing rates: $\eta_1, \eta_2$; $\eta_{12}$: correlation --- covariance?
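In these coordinates the interaction term is just the log-odds ratio; a quick numeric sketch (the joint distribution is illustrative):

```python
import numpy as np

# Orthogonal coordinates for two binary neurons: firing rates
# (eta1, eta2) and the log-odds interaction theta.
p00, p10, p01, p11 = 0.4, 0.2, 0.1, 0.3   # illustrative joint

eta1 = p10 + p11                 # Pr{x1 = 1}
eta2 = p01 + p11                 # Pr{x2 = 1}
theta = np.log(p11 * p00 / (p10 * p01))
print(eta1, eta2, theta)         # 0.5 0.4 log(6) ~ 1.79
```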
Independent Distributions
$x_1, x_2 \in \{0, 1\}$
$S = \{p(x_1, x_2)\}$, $M = \{q(x_1)\, q(x_2)\}$

Pythagoras Theorem
$D[p : r] = D[p : q] + D[q : r]$
$p, q$: same marginals $\eta_1, \eta_2$; $r, q$: same correlations; $r$: independent
$D[p : r] = \sum_x p(x) \log \frac{p(x)}{r(x)}$
estimation of correlation, testing: invariant under firing rates
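The decomposition can be verified directly for two binary neurons; a minimal sketch (NumPy assumed, distributions illustrative), where $q$ is the product of $p$'s marginals:

```python
import numpy as np

# Pythagorean relation D[p:r] = D[p:q] + D[q:r]: p is an arbitrary
# correlated joint, q the product of p's marginals (same marginals as p,
# zero correlation), r any independent distribution.
def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

p = np.array([[0.3, 0.2],
              [0.1, 0.4]])                    # p[x1, x2]
q = np.outer(p.sum(axis=1), p.sum(axis=0))    # projection onto M
r = np.outer([0.5, 0.5], [0.6, 0.4])          # some independent r

print(kl(p, r), kl(p, q) + kl(q, r))          # the two sides agree
```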
No pairwise correlations, triplewise correlation
x1: 01100101……. 1100
x2: 01011001……. 1010
x3: 00111100……. 1001
$p(x_1, x_2, x_3) \ne p(x_1)\, p(x_2)\, p(x_3)$
$p(x_i, x_j) = p(x_i)\, p(x_j)$
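The classic construction behind such data is parity: with $x_1, x_2$ independent fair bits and $x_3 = x_1 \oplus x_2$, every pair is independent while the triple is not. A minimal sketch:

```python
import itertools

# Parity triple: pairwise independent, triplewise correlated.
p = {}
for x1, x2 in itertools.product([0, 1], repeat=2):
    p[(x1, x2, x1 ^ x2)] = 0.25      # all other triples have probability 0

def marginal(idx):
    m = {}
    for x, pr in p.items():
        key = tuple(x[i] for i in idx)
        m[key] = m.get(key, 0.0) + pr
    return m

pair = marginal([0, 2])              # joint of (x1, x3)
print(pair[(1, 1)])                  # 0.25 = P(x1=1) * P(x3=1): independent
print(p.get((1, 1, 1), 0.0))         # 0.0, but independence would give 1/8
```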
Pythagoras Decomposition of KL Divergence
$p(\mathbf{x}) \to p_{\text{pairwise}}(\mathbf{x})$ (only pairwise correlations) $\to p_{\text{ind}}(\mathbf{x})$ (independent)
Higher-Order Correlations
$\mathbf{x} = (x_1, x_2, \ldots, x_n)$
$p(\mathbf{x}) = \exp\left\{\sum \theta_i x_i + \sum \theta_{ij} x_i x_j + \sum \theta_{ijk} x_i x_j x_k + \cdots\right\}$
hierarchy of models $M_0, M_1, \ldots$
$\eta_i = E[x_i]$, $\eta_{ij} = E[x_i x_j]$
dual coordinate systems: $(\theta_i, \theta_{ij}, \theta_{ijk}, \ldots)$ and $(\eta_i, \eta_{ij}, \eta_{ijk}, \ldots)$
Synfiring and Higher-Order Correlations
Amari, Nakahara, Wu, Sakai

Population and Synfire
neurons $x_1, x_2, \ldots, x_n$
$x_i = 1(u_i > 0)$, $u_i$: Gaussian

$u_i = \sum_j w_{ij} s_j + h$
$u_i = \sqrt{1 - c}\,\varepsilon_i + \sqrt{c}\,\varepsilon + h$, $\varepsilon_i, \varepsilon \sim N(0, 1)$
$E[u_i u_j] = c$, $E[u_i^2] = 1$ (after centering)
$p_i = \mathrm{Prob}\{i \text{ neurons fire at the same time}\} = \binom{n}{i}\, F^i\, (1 - F)^{n - i}$
$F = \Pr\{x_i = 1 \mid \varepsilon\} = \Pr\{u_i > 0 \mid \varepsilon\} = \Pr\left\{\varepsilon_i > -\frac{\sqrt{c}\,\varepsilon + h}{\sqrt{1 - c}}\right\}$
pi  Prob  i neurons fire at the same time
i
r
n
Pr  Pr{nr neurons fire}
q (r ,  )  e
z   

nH  r 
2
2n
e
 nz   
d
 r log F  1  r  log 1  F 
1
F  F a  h  
2

a  h
0
e
t2

2
dt

2  1

1
2
q(r ,  )  c exp[
{F ( ) 
h} ]
2(1   )
2  1
$p(\mathbf{x}, \varepsilon) = \exp\left\{\sum \theta_i x_i + \sum \theta_{ij} x_i x_j + \sum \theta_{ijk} x_i x_j x_k + \cdots\right\}$
$\theta_{i_1 i_2 \ldots i_k} = O(1 / n^{k - 1})$

Synfiring
$p(\mathbf{x}) = p(x_1, \ldots, x_n)$, $r = \frac{1}{n} \sum x_i$, distribution $q(r)$

Bifurcation (shape of $P_r$ as a function of $r$)
$x_i$: independent --- single delta peak
pairwise correlated
higher-order correlation!
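The effect of the common input on the population rate $r$ can be seen by simulating the model above; a sketch ($n$, $h$, $c$ illustrative):

```python
import numpy as np

# Common-input model: u_i = sqrt(1-c)*eps_i + sqrt(c)*eps + h,
# x_i = 1(u_i > 0).  The population rate r = (1/n) sum x_i has a sharp
# (binomial) peak when neurons are independent (c = 0) and spreads
# widely under common-input correlation (c > 0).
rng = np.random.default_rng(1)
n, trials, h = 100, 20_000, -0.5

def pop_rate(c):
    eps_i = rng.normal(size=(trials, n))
    eps = rng.normal(size=(trials, 1))
    u = np.sqrt(1 - c) * eps_i + np.sqrt(c) * eps + h
    return (u > 0).mean(axis=1)              # r, one value per trial

r_ind, r_corr = pop_rate(0.0), pop_rate(0.3)
print(r_ind.std(), r_corr.std())             # spread grows with c
```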
Field Theory of Population Coding
Shun-ichi Amari
RIKEN Brain Science Institute
[email protected]
Collaborators: Si Wu
Hiro Nakahara
Population Coding and Neural Field
stimulus $x^*$, field coordinate $z$
$r(z) = f(z - x^*) + \varepsilon(z)$
tuning function: $f(z) = \exp\left(-\frac{z^2}{2 a^2}\right)$
Population Encoding
tuning curve $f(z - x)$; observed activity $r(z) = f(z - x^*) + \varepsilon(z)$
decoding: $r(z) \to \hat{x}$
Noise
$E[\varepsilon(z)] = 0$
$E[\varepsilon(z)\, \varepsilon(z')] = \sigma^2\, h(z - z')$
$h(z, z') = n^{-1}(1 - c)\, \delta(z - z') + n^{-1} c \exp\left\{-\frac{(z - z')^2}{2 b^2}\right\}$
($b$: correlation range, $c$: correlation strength)
Probability Model
$r(z) = f(z - x) + \varepsilon(z)$
$Q[r(z) \mid x] = c \exp\left\{-\frac{n}{2\sigma^2} \iint \left(r(z) - f(z - x)\right) h^{-1}(z, z') \left(r(z') - f(z' - x)\right) dz\, dz'\right\}$
$\int h(z, z'')\, h^{-1}(z'', z')\, dz'' = \delta(z - z')$
Fisher information
$I(x^*) = E\left[\left(\frac{d \log Q[r \mid x^*]}{dx}\right)^2\right]$

Cramér-Rao:
$E\left[(\hat{x} - x^*)^2\right] \ge \frac{1}{I(x^*)}$
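The bound can be checked for the simplest case of independent noise; a Monte Carlo sketch (NumPy assumed; grid-search ML decoder and all parameter values are illustrative):

```python
import numpy as np

# Cramer-Rao check for r_i = f(z_i - x) + sigma*eps_i with independent
# Gaussian noise: I(x) = sum_i f'(z_i - x)^2 / sigma^2, and the ML
# (least-squares) decoder's mean squared error approaches 1/I(x).
rng = np.random.default_rng(2)
a, sigma, x_true = 1.0, 0.1, 0.0
z = np.linspace(-5, 5, 101)
f = lambda s: np.exp(-s**2 / (2 * a**2))
df = lambda s: -(s / a**2) * f(s)

I = np.sum(df(z - x_true)**2) / sigma**2

grid = np.linspace(-0.5, 0.5, 1001)
templates = f(z[None, :] - grid[:, None])    # candidate tuning profiles
errs = []
for _ in range(500):
    r = f(z - x_true) + sigma * rng.normal(size=z.size)
    sse = ((r[None, :] - templates)**2).sum(axis=1)
    errs.append(grid[np.argmin(sse)] - x_true)

mse = np.mean(np.square(errs))
print(mse, 1.0 / I)                          # comparable magnitudes
```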
Fourier Analysis
$F(\omega) = \frac{1}{\sqrt{2\pi}} \int f(z)\, e^{i \omega z}\, dz$
$H(\omega) = \frac{1}{\sqrt{2\pi}} \int h(z)\, e^{i \omega z}\, dz$
$I = \frac{n}{2\pi \sigma^2} \int \frac{\omega^2\, |F(\omega)|^2}{H(\omega)}\, d\omega$
Fisher Information (Gaussian tuning width $a$, correlation range $b$, strength $c$)
1) No correlation ($c = 0$): $I \propto \frac{n}{\sigma^2 a^3}$
2) Uniform correlations ($b \to \infty$): $I \propto \frac{n}{\sigma^2 a^3 (1 - c)}$
3) Limited-range correlations ($b \lesssim a$): $I \propto \frac{n}{\sigma^2 a^3 (1 + c')}$
4) Wide-range correlations ($b \gg a$): $I$ saturates to a constant as $n \to \infty$
5) Special case $c = 1$, $b = \sqrt{2}\, a$: $I \propto \frac{1}{n}$
Dynamics of Neural Fields
$\tau \frac{\partial u(z, t)}{\partial t} = -u(z, t) + \int w(z - z')\, \varphi(u(z', t))\, dz' + c\, r(z)$
Shaping / Detecting / Decoding
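A discretized sketch of this field equation (Euler integration; the kernel, nonlinearity, and stimulus are illustrative assumptions):

```python
import numpy as np

# Minimal discretization of
#   tau du/dt = -u + int w(z - z') phi(u(z')) dz' + input(z)
# with a Mexican-hat kernel w and a step nonlinearity phi.
z = np.linspace(-10, 10, 201)
dz = z[1] - z[0]
w = 2.0 * np.exp(-z**2 / 2) - 0.8 * np.exp(-z**2 / 8)   # Mexican hat
stim = np.exp(-(z - 1.0)**2)                            # input c*r(z)
phi = lambda u: (u > 0).astype(float)

u = np.zeros_like(z)
dt, tau = 0.05, 1.0
for _ in range(400):
    conv = np.convolve(phi(u), w, mode="same") * dz
    u += dt / tau * (-u + conv + stim)

print(z[np.argmax(u)])        # the activity peak sits near the stimulus at z = 1
```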
How the Brain Solves Singularity in Population Coding
S. Amari and H. Nakahara
RIKEN Brain Science Institute

two stimuli $x_1, x_2$ on the field $Z$
Neural Activity
$r(z) = (1 - v)\, f(z - x_1) + v\, f(z - x_2) + \varepsilon(z)$
$Q[r(z); v, x_1, x_2] = c \exp\left\{-\frac{1}{2\sigma^2} \left(r - f,\; h^{-1}(r - f)\right)\right\}$
$I_{ij} = E\left[\frac{\partial \log Q}{\partial \xi_i}\, \frac{\partial \log Q}{\partial \xi_j}\right]$
$I = (I_{ij})$: Fisher information matrix
Parameter Space
$u = x_2 - x_1$: difference
$w = (1 - v)\, x_1 + v\, x_2$: center of gravity
$\xi = (w, u, v)$
Fisher information degenerates as $u \to 0$
Cramér-Rao paradigm: error $\ge I^{-1}$
f  z;   1   2 H 2  z  1   3 H 3  z  1   z  1 
 v 1  v  2 v 1  v  2v  1 3 
   w,
u ,
u 
2
6


 g1

I  

g2



g3 

J
: Jacobian singular

I  J I J
T
w~O 1
 1 
u ~ O  2 
u 
 1 
v ~ O  3 
u 
 1 
xi ~O  2 
u 
synfiring resolves singularity
phase 1: $f_1(z) = (\lambda + v)\, f(z - x_1) + (\mu - v)\, f(z - x_2)$
phase 2: $f_2(z) = (\lambda - v)\, f(z - x_1) + (\mu + v)\, f(z - x_2)$
(alternating $\lambda \leftrightarrow 1 - \lambda$, $v \leftrightarrow 1 - v$)
$I$: regular as $u \to 0$

two stimuli $x_1, x_2$ on the field $Z$
synfiring mechanism: $z_1, z_2$, common multiplicative noise
S. Amari and H. Nagaoka, Methods of Information Geometry, AMS & Oxford University Press, 2000.
Mathematical Neurons
$y = \varphi\left(\sum w_i x_i + h\right) = \varphi(\mathbf{w} \cdot \mathbf{x} + h)$
output $y$, input $\mathbf{x}$, activation function $\varphi(u)$
Multilayer Perceptrons
$y = \sum v_i\, \varphi(\mathbf{w}_i \cdot \mathbf{x}) + n$
$\mathbf{x} = (x_1, x_2, \ldots, x_n)$
$p(y \mid \mathbf{x}; \theta) = c \exp\left\{-\frac{1}{2}\left(y - f(\mathbf{x}, \theta)\right)^2\right\}$
$f(\mathbf{x}, \theta) = \sum v_i\, \varphi(\mathbf{w}_i \cdot \mathbf{x})$
$\theta = (\mathbf{w}_1, \ldots, \mathbf{w}_m; v_1, \ldots, v_m)$

Multilayer Perceptron: neuromanifold
the space of functions $y = f(\mathbf{x}, \theta) = \sum v_i\, \varphi(\mathbf{w}_i \cdot \mathbf{x})$, parametrized by $\theta = (v_1, \ldots, v_m; \mathbf{w}_1, \ldots, \mathbf{w}_m)$
Neuromanifold
• Metrical structure
• Topological structure

Riemannian manifold
$g_{ij}(\theta) = E\left[\frac{\partial \log p(y \mid x; \theta)}{\partial \theta_i}\, \frac{\partial \log p(y \mid x; \theta)}{\partial \theta_j}\right]$
$ds^2 = \sum g_{ij}\, d\theta_i\, d\theta_j = d\theta^T G\, d\theta$
Geometry of singular model
$y = v\, \varphi(\mathbf{w} \cdot \mathbf{x}) + n$
singular when $v = 0$ or $|\mathbf{w}| = 0$

Gaussian mixture
$p(x; v, w_1, w_2) = (1 - v)\, \psi(x - w_1) + v\, \psi(x - w_2)$
$\psi(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right)$
singular: $v(1 - v) = 0$ or $w_1 = w_2$
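The degeneracy is visible numerically: the Fisher matrix of this mixture loses rank as $w_1 \to w_2$. A Monte Carlo sketch (NumPy assumed, parameter values illustrative):

```python
import numpy as np

# Fisher matrix G = E[s s^T] of the Gaussian mixture
# p(x; v, w1, w2) = (1-v) psi(x-w1) + v psi(x-w2); its determinant
# collapses toward 0 as w1 -> w2 (the singular region).
rng = np.random.default_rng(4)
psi = lambda s: np.exp(-s**2 / 2) / np.sqrt(2 * np.pi)

def fisher_det(v, w1, w2, n=200_000):
    pick = rng.random(n) < v
    x = np.where(pick, w2, w1) + rng.normal(size=n)   # sample the mixture
    p = (1 - v) * psi(x - w1) + v * psi(x - w2)
    s = np.stack([(psi(x - w2) - psi(x - w1)) / p,       # d/dv log p
                  (1 - v) * (x - w1) * psi(x - w1) / p,  # d/dw1 log p
                  v * (x - w2) * psi(x - w2) / p])       # d/dw2 log p
    return float(np.linalg.det(s @ s.T / n))

d_far = fisher_det(0.3, 0.0, 2.0)     # separated components
d_near = fisher_det(0.3, 0.0, 0.01)   # nearly coincident components
print(d_far, d_near)                  # d_near is orders of magnitude smaller
```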
Topological Singularities
$S$, $M$, singularities
Singularity of MLP --- example
Backpropagation --- gradient learning
examples: $(y_1, \mathbf{x}_1), \ldots, (y_t, \mathbf{x}_t)$: training set
$E(y, \mathbf{x}; \theta) = \frac{1}{2}\left(y - f(\mathbf{x}, \theta)\right)^2 = -\log p(y, \mathbf{x}; \theta)$
$\Delta\theta_t = -\varepsilon_t\, \frac{\partial E}{\partial \theta}$
$f(\mathbf{x}, \theta) = \sum v_i\, \varphi(\mathbf{w}_i \cdot \mathbf{x})$
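A minimal NumPy sketch of this gradient rule for a small MLP with tanh hidden units (architecture, teacher, data, and learning rate are illustrative):

```python
import numpy as np

# Plain gradient ("backpropagation") learning of
# f(x, theta) = sum_i v_i * tanh(w_i . x) on a toy training set.
rng = np.random.default_rng(5)
n_in, m = 2, 3                                  # inputs, hidden units
Wt, vt = rng.normal(size=(m, n_in)), rng.normal(size=m)   # teacher

X = rng.normal(size=(200, n_in))
y = np.tanh(X @ Wt.T) @ vt                      # training targets

W, v = 0.1 * rng.normal(size=(m, n_in)), 0.1 * rng.normal(size=m)
mse0 = np.mean((np.tanh(X @ W.T) @ v - y) ** 2)
eps = 0.05
for _ in range(3000):
    H = np.tanh(X @ W.T)                        # hidden activations
    err = H @ v - y                             # f(x, theta) - y
    grad_v = H.T @ err / len(X)
    grad_W = ((err[:, None] * v) * (1 - H**2)).T @ X / len(X)
    v -= eps * grad_v
    W -= eps * grad_W

mse = np.mean((np.tanh(X @ W.T) @ v - y) ** 2)
print(mse0, mse)                                # training error shrinks
```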
Information Geometry of MLP
Natural Gradient Learning: S. Amari; H.Y. Park
$\Delta\theta_t = -\varepsilon_t\, G^{-1}(\theta)\, \frac{\partial E}{\partial \theta}$
adaptive estimate of $G^{-1}$:
$\hat{G}^{-1}_{t+1} = (1 + \varepsilon_t)\, \hat{G}^{-1}_t - \varepsilon_t\, \hat{G}^{-1}_t\, \nabla f\, \nabla f^T\, \hat{G}^{-1}_t$
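The update rule above can be sketched for a one-hidden-unit toy model (teacher, initialization, and step sizes are illustrative assumptions):

```python
import numpy as np

# Natural-gradient learning with the adaptive inverse-Fisher estimate
#   Ginv <- (1 + e) Ginv - e Ginv grad_f grad_f^T Ginv
# for the toy model y = v * tanh(w * x) + noise, teacher w* = v* = 1.
rng = np.random.default_rng(3)
w, v = 0.5, 0.5                                  # student initialization
Ginv = np.eye(2)
eta, e = 0.02, 0.01

for _ in range(10_000):
    x = rng.normal()
    y = np.tanh(x) + 0.01 * rng.normal()         # teacher output
    h = np.tanh(w * x)
    grad_f = np.array([v * (1 - h**2) * x, h])   # df/dw, df/dv
    Ginv = (1 + e) * Ginv - e * Ginv @ np.outer(grad_f, grad_f) @ Ginv
    step = eta * (v * h - y) * (Ginv @ grad_f)   # natural-gradient step
    w, v = w - step[0], v - step[1]

print(w, v)   # student approaches the teacher
```

Pre-multiplying by $\hat{G}^{-1}$ rescales the step by the local metric, which is what lets natural-gradient learning traverse the plateaus discussed below much faster than plain gradient descent.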
y  v1 ( w1 x)  v2 ( w2 x)  n
z
w1  w2  w
v1  v2  v
w1
w2
w1
u  w2  w1
z  v2  v1
v1
x
v2
w2
y
2 hidden-units
y  v1  w1  x   v2  w 2  x   n :
w1  w2  w
v1  v2  v
u  w2  w1
v2  v1
z
2v
Dynamics of Learning
averaged learning equation: $\frac{d\theta}{dt} = -\varepsilon\, \langle \nabla l \rangle$; natural-gradient version: $\frac{d\theta}{dt} = -\varepsilon\, G^{-1} \langle \nabla l \rangle$
reduced dynamics: $\frac{du}{dt} = f(u, z)$, $\frac{dz}{dt} = k(u, z)$
trajectories: $\frac{du}{dz} = \frac{f(u, z)}{k(u, z)}$
$u^2 = \frac{1}{2} z^2 - \log z + c$
The teacher is on singularity
$\frac{du}{dt} = -\frac{1}{4} A(z)\, u^3, \qquad \frac{dz}{dt} = -\frac{1}{4} A(z)\, z\, u^4$
$u^2 = \frac{1}{2} z^2 - \log z + c$