Information Geometry and Neural Networks
Shun-ichi Amari, RIKEN Brain Science Institute

Topics:
* Orthogonal decomposition of rates and (higher-order) correlations
* Synchronous firing and higher correlations
* Algebraic singularities caused by multiple stimuli
* Dynamics of learning in multilayer perceptrons

Information Geometry and Related Fields
Information geometry connects systems theory, statistics, combinatorics, information theory, neural networks, and the information sciences with physics, mathematics, and AI.

What Is Information Geometry?
A manifold of probability distributions equipped with a Riemannian metric and a pair of dual affine connections. Example: the Gaussian family
  S = { p(x; mu, sigma) },   p(x; mu, sigma) = (1 / (sqrt(2 pi) sigma)) exp( -(x - mu)^2 / (2 sigma^2) ),
is a two-dimensional manifold with coordinates theta = (mu, sigma).

Manifold of Probability Distributions
For x in {1, 2, 3}, the distributions p = (p1, p2, p3) with p1 + p2 + p3 = 1 form a 2-simplex M.

Two Structures: Riemannian Metric and Affine Connection
* Riemannian metric: ds^2 = sum_ij g_ij dtheta^i dtheta^j
* KL divergence: D[p : q] = E_p[ log( p(x) / q(x) ) ]
* The metric is the quadratic part of the divergence: (1/2) ds^2 = D[ p(x, theta) : p(x, theta + dtheta) ]
* Fisher information: g_ij = E[ (d log p / dtheta^i)(d log p / dtheta^j) ]

Riemannian Structure
ds^2 = g_ij(theta) dtheta^i dtheta^j = dtheta^T G(theta) dtheta, with G(theta) = (g_ij). The Euclidean case is G = E (identity matrix).

Affine Connection
A covariant derivative nabla_X Y defines geodesics: curves theta = theta(t) with nabla along the curve vanishing. A geodesic minimizes the distance s = integral sqrt( g_ij thetadot^i thetadot^j ) dt; it generalizes the straight line.

Independent Distributions
For binary variables x1, x2 in {0, 1}, S = { p(x1, x2) } is the full manifold and M = { q(x1) q(x2) } is the submanifold of independent distributions.

Neural Firing
For n binary neurons x = (x1, ..., xn) with joint distribution p(x):
* eta_i = E[x_i]: firing rates
* v_ij = Cov[x_i, x_j]: covariances
* higher-order correlations
These admit an orthogonal decomposition.

Information Geometry of Higher-Order Correlations
The orthogonal decomposition rests on the Riemannian metric, the dual affine connections, the Pythagorean theorem, and dual geodesics in S = { p(x, theta) }.

Correlations of Neural Firing
For two neurons, p(x1, x2) is given by (p00, p10, p01, p11). Firing rates are eta_1 = E[x1], eta_2 = E[x2], and the correlation is measured by
  theta = log( p11 p00 / (p10 p01) ).
The mixed coordinates (eta_1, eta_2; theta) are orthogonal, unlike the raw probabilities {p00, p01, p10, p11}. From spike trains of x1, x2, x3 one estimates the firing rates eta_1, eta_2 and the pairwise quantity eta_12 = E[x1 x2]; should correlation be measured by the covariance, or by theta?
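The mixed coordinates above can be computed directly from the joint table. A minimal sketch in Python (the function name `orthogonal_coords` is ours, not from the slides); it maps (p00, p01, p10, p11) to (eta1, eta2, theta) and checks that an independent distribution has theta = 0 whatever its rates:

```python
import math

def orthogonal_coords(p00, p01, p10, p11):
    """Mixed coordinates (eta1, eta2, theta) of two binary neurons.
    p_ab = Prob{x1 = a, x2 = b}."""
    eta1 = p10 + p11          # firing rate of neuron 1
    eta2 = p01 + p11          # firing rate of neuron 2
    theta = math.log(p11 * p00 / (p10 * p01))  # log-odds-ratio correlation
    return eta1, eta2, theta

# An independent distribution has theta = 0 regardless of the rates:
q1, q2 = 0.3, 0.6
p = {(a, b): (q1 if a else 1 - q1) * (q2 if b else 1 - q2)
     for a in (0, 1) for b in (0, 1)}
eta1, eta2, theta = orthogonal_coords(p[0, 0], p[0, 1], p[1, 0], p[1, 1])
print(eta1, eta2, theta)   # approximately 0.3, 0.6, 0.0
```

Changing q1 or q2 moves (eta1, eta2) but leaves theta pinned at zero, which is the orthogonality the slides emphasize.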
Pythagorean Theorem
Let p have correlations, let r be independent, and let q share its marginals with p and its correlation (theta = 0) with r. Then the KL divergence
  D[p : r] = sum_x p(x) log( p(x) / r(x) )
decomposes as
  D[p : r] = D[p : q] + D[q : r].
Consequently estimation and testing of the correlation are invariant under changes of the firing rates.

No Pairwise Correlations, but Triplewise Correlation
Three neurons can fire in patterns (spike trains built from the words 1100, 1010, 1001) such that
  p(x1, x2, x3) != p(x1) p(x2) p(x3)  but  p(xi, xj) = p(xi) p(xj) for every pair,
so the dependence exists only at third order.

Pythagorean Decomposition of the KL Divergence
p(x) projects first onto the best pairwise-correlation approximation p_pair(x), then onto the independent approximation p_ind(x):
  D[p : p_ind] = D[p : p_pair] + D[p_pair : p_ind].

Higher-Order Correlations
For x = (x1, ..., xn) the log-linear (exponential-family) expansion is
  p(x, theta) = exp( sum_i theta_i x_i + sum_{i<j} theta_ij x_i x_j + sum theta_ijk x_i x_j x_k + ... - psi ),
with dual expectation coordinates eta_i = E[x_i], eta_ij = E[x_i x_j], .... The theta-coordinates (theta_i, theta_ij, theta_ijk, ...) and the eta-coordinates (eta_i, eta_ij, eta_ijk, ...) are mutually dual.

Synfiring and Higher-Order Correlations (Amari, Nakahara, Wu, Sakai)

Population and Synfire
n neurons x1, ..., xn with x_i = 1 when u_i > 0, driven by a common Gaussian input:
  u_i = sqrt(1 - lambda) eps_i + sqrt(lambda) eps_0 - h,
with eps_0, eps_1, ..., eps_n i.i.d. N(0, 1), so that the u_i are correlated through the shared term eps_0. Given the common input eps_0 = z, each neuron fires independently with probability
  F(z) = Phi( (sqrt(lambda) z - h) / sqrt(1 - lambda) ),   F = Pr{x_i = 1} = Pr{u_i > 0},
where Phi is the standard normal cdf. The probability p_i that exactly i neurons fire together is the mixed binomial
  p_i = E_z[ C(n, i) F(z)^i (1 - F(z))^(n-i) ].
For large n the population rate r = (1/n) sum_i x_i concentrates on F(z), so its density is obtained by the change of variables r = F(z):
  q(r, lambda) = c(r) exp( -( sqrt(1 - lambda) Phi^{-1}(r) + h )^2 / (2 lambda) ).
In the expansion p(x, theta) = exp{ sum theta_i x_i + sum theta_ij x_i x_j + ... - psi } this model gives interaction terms of every order, scaling as theta_{i1...ik} = O(1 / n^(k-1)).

Bifurcation
* Independent neurons: q(r) is a single delta-like peak.
* Pairwise correlation only widens the peak.
* A genuinely synfiring population, with q(r) concentrating probability at high rates, requires higher-order correlations.
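The Pythagorean decomposition can be checked numerically for two binary neurons. In this sketch (helper names `kl` and `independent` are ours), r is an arbitrary independent distribution and q is the product of p's marginals, i.e. the distribution that shares p's firing rates and r's zero correlation:

```python
import math
from itertools import product

def kl(p, q):
    """KL divergence sum_x p(x) log(p(x)/q(x)) over the four binary pairs."""
    return sum(p[x] * math.log(p[x] / q[x]) for x in p)

def independent(eta1, eta2):
    """Independent joint distribution of two binary neurons with given rates."""
    return {(a, b): (eta1 if a else 1 - eta1) * (eta2 if b else 1 - eta2)
            for a, b in product((0, 1), repeat=2)}

# A correlated joint distribution p over (x1, x2):
p = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
eta1 = p[1, 0] + p[1, 1]     # marginal rate of x1
eta2 = p[0, 1] + p[1, 1]     # marginal rate of x2

r = independent(0.3, 0.6)    # any independent distribution
q = independent(eta1, eta2)  # same marginals as p, zero correlation like r

lhs = kl(p, r)
rhs = kl(p, q) + kl(q, r)
print(abs(lhs - rhs) < 1e-12)  # the Pythagorean identity holds exactly
```

The identity holds for any choice of r's rates, which is exactly the invariance of the correlation analysis under changes of firing rate.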
Field Theory of Population Coding
Shun-ichi Amari, RIKEN Brain Science Institute, [email protected]
Collaborators: Si Wu, Hiro Nakahara

Population Coding and Neural Field
A neuron at position z has tuning curve f(z - x) around the stimulus x and response
  r(z) = f(z - x) + eps(z),
with Gaussian tuning f(z) proportional to exp( -z^2 / (2 a^2) ). Decoding recovers an estimate x_hat from the population activity r(z).

Noise Model
eps(z) has mean 0 and covariance
  E[ eps(z) eps(z') ] = sigma^2 h(z, z'),
where h(z, z') contains a delta(z - z') part plus a correlated part of range b, of the form exp( -(z - z')^2 / (2 b^2) ).

Probability Model
  Q[ r(z) | x ] = c exp( -(1 / (2 sigma^2)) int int ( r(z) - f(z - x) ) h^{-1}(z, z') ( r(z') - f(z' - x) ) dz dz' ),
where h^{-1} is the operator inverse: int h^{-1}(z, z') h(z', z'') dz' = delta(z - z'').

Fisher Information and Cramer-Rao Bound
  I(x*) = E[ ( d log Q[r | x*] / dx )^2 ],    E[ (x_hat - x*)^2 ] >= 1 / I(x*).

Fourier Analysis
With F(omega) = (1 / (2 pi)) int f(z) e^{-i omega z} dz and H(omega) = (1 / (2 pi)) int h(z) e^{-i omega z} dz,
  I = (n / sigma^2) int omega^2 F(omega)^2 / H(omega) d omega.

Fisher Information in Five Cases
1) No correlation: I grows in proportion to n, scaling as n / a^3 for Gaussian tuning of width a.
2) Uniform correlations: I again grows in proportion to n / a^3; a uniform correlation does not limit the growth.
3) Limited-range correlations (range b): I still grows linearly in n, with the prefactor reduced by the correlation strength, a factor 1 / (1 + c').
4) Wide-range correlations (b large): I saturates at a constant A as n grows.
5) Special case b = sqrt(2) a: the Fourier integral admits a closed form.

Dynamics of Neural Fields
  d u(z, t) / dt = -u(z, t) + int w(z - z') phi[ u(z', t) ] dz' + c r(z).
The field dynamics serve shaping, detecting, and decoding of the population activity.

How the Brain Solves Singularity in Population Coding
S. Amari and H.
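In the simplest case, independent noise, the Fisher information is a plain sum over neurons, I(x) = sum_i f'(z_i - x)^2 / sigma^2, and grows linearly with n. A small numerical sketch (function name and parameter values are illustrative, not from the slides):

```python
import numpy as np

def fisher_info(n, a=1.0, sigma=0.2, L=20.0, x=0.0):
    """Fisher information of a population code with Gaussian tuning
    f(z) = exp(-z^2 / (2 a^2)) and independent Gaussian noise:
    I(x) = sum_i f'(z_i - x)^2 / sigma^2."""
    z = np.linspace(-L / 2, L / 2, n)   # preferred stimuli tile [-L/2, L/2]
    d = z - x
    fprime = -(d / a**2) * np.exp(-d**2 / (2 * a**2))
    return float(np.sum(fprime**2) / sigma**2)

# With independent noise, doubling the population doubles I:
I1, I2 = fisher_info(200), fisher_info(400)
print(I2 / I1)   # close to 2
```

By the Cramer-Rao bound, the squared decoding error of any unbiased estimator is then at least 1 / I(x), so it shrinks as 1/n; the correlated cases above change exactly this scaling.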
Nakahara, RIKEN Brain Science Institute

Two Stimuli
Two stimuli x1, x2 are presented simultaneously. Neural activity:
  r(z) = (1 - v) f(z - x1) + v f(z - x2) + eps(z),
  Q[ r(z); v, x1, x2 ] = c exp( -(1 / (2 sigma^2)) < r - f, h^{-1} (r - f) > ),
with Fisher information matrix I_ij = E[ (d log Q / dtheta^i)(d log Q / dtheta^j) ].

Parameter Space
Re-parametrize by
  u = x2 - x1 (difference),   w = (1 - v) x1 + v x2 (center of gravity),
giving coordinates (w, u, v). The Fisher information degenerates as u -> 0, so the Cramer-Rao paradigm, error ~ I^{-1}, breaks down.

Singular Expansion
Expanding in u,
  f(z; theta) ~ f(z - w) + xi_2 H_2(z - w) + xi_3 H_3(z - w),
  xi_2 = (1/2) v (1 - v) u^2,   xi_3 = (1/6) v (1 - v) (1 - 2v) u^3,
where H_2, H_3 are the second- and third-derivative (Hermite-type) functions of the tuning curve. The Fisher matrix factors as I = J I_bar J^T through the Jacobian J of the coordinate change (w, u, v) -> (w, xi_2, xi_3), and J is singular at u = 0: the error in w stays O(1), while the errors in u and v blow up as inverse powers of u as u -> 0.

Synfiring Resolves the Singularity
If the two populations fire in two distinct phases, phase 1 dominated by f(z - x1) and phase 2 by f(z - x2), the model observed across both phases has a Fisher information matrix that remains regular as u -> 0. A synfiring mechanism with common multiplicative noise shared within each pool (z1, z2) implements this.

Reference: S. Amari and H. Nagaoka, Methods of Information Geometry, AMS & Oxford University Press, 2000.

Mathematical Neurons
  y = phi( sum_i w_i x_i - h ) = phi( w . x ),
with sigmoidal activation phi(u).

Multilayer Perceptrons
  y = sum_i v_i phi( w_i . x ) + n,   x = (x1, x2, ..., xn),
  p(y | x; theta) = c exp( -(1/2) ( y - f(x, theta) )^2 ),   f(x, theta) = sum_i v_i phi( w_i . x ),
  theta = (w_1, ..., w_m; v_1, ..., v_m).

Neuromanifold
The space of functions { y = f(x, theta) } realized by the perceptron, with
* metrical structure (Riemannian),
* topological structure (singularities).
Riemannian metric:
  g_ij(theta) = E[ (d log p(y|x; theta) / dtheta^i)(d log p(y|x; theta) / dtheta^j) ],
  ds^2 = sum g_ij dtheta^i dtheta^j = dtheta^T G dtheta.

Geometry of a Singular Model
  y = v phi( w . x ) + n is singular where v |w| = 0.
Gaussian mixture analogue:
  p(x; v, w1, w2) = (1 - v) psi(x - w1) + v psi(x - w2),   psi(x) = (1 / sqrt(2 pi)) exp( -x^2 / 2 ),
singular where w1 = w2 or v (1 - v) = 0.

Topological Singularities
The singular set S sits inside the neuromanifold M.

Backpropagation (Gradient Learning)
Training examples (y_1, x_1), ..., (y_t, x_t);
  E(y, x; theta) = (1/2) ( y - f(x, theta) )^2 = -log p(y, x; theta) + const,
  Delta theta_t = -eta dE / dtheta,   f(x, theta) = sum_i v_i phi( w_i . x ).

Information Geometry of MLP: Natural Gradient Learning (S. Amari; H. Y.
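Backpropagation as described above is just stochastic gradient descent on E = (1/2)(y - f(x, theta))^2. A self-contained sketch for a two-hidden-unit network (teacher weights, seed, and hyperparameters are illustrative, and tanh stands in for the sigmoid phi):

```python
import numpy as np

rng = np.random.default_rng(0)
phi = np.tanh                      # sigmoidal activation

def f(x, w, v):
    """Perceptron output f(x, theta) = sum_i v_i phi(w_i . x)."""
    return v @ phi(w @ x)

def mse(w, v, n=1000):
    """Mean squared error of the student against the teacher."""
    xs = rng.normal(size=(n, 2))
    return float(np.mean([(f(x, w, v) - f(x, w_true, v_true))**2 for x in xs]))

# Teacher network (2 hidden units) generating the training data
w_true = np.array([[1.0, -0.5], [-0.7, 0.8]])
v_true = np.array([1.0, -1.0])

# Student, trained by stochastic gradient descent on E = (1/2)(y - f)^2
w = rng.normal(size=(2, 2))
v = rng.normal(size=2)
mse_before = mse(w, v)
eta = 0.05
for _ in range(20000):
    x = rng.normal(size=2)
    err = f(x, w, v) - f(x, w_true, v_true)          # f - y
    h = phi(w @ x)
    v -= eta * err * h                               # dE/dv_i = err * phi(w_i . x)
    w -= eta * (err * v * (1 - h**2))[:, None] * x   # chain rule through phi
mse_after = mse(w, v)
print(mse_before, mse_after)   # the error drops substantially
```

Near the singular set w1 = w2 this plain gradient learning slows to the plateaus described below, which is what motivates the natural gradient.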
Park)

Natural gradient update:
  theta_{t+1} = theta_t - eta_t G^{-1} (dE / dtheta),
with an adaptive online estimate of the inverse Fisher matrix,
  G_hat_{t+1}^{-1} = (1 + eps_t) G_hat_t^{-1} - eps_t ( G_hat_t^{-1} grad f ) ( G_hat_t^{-1} grad f )^T.

Two Hidden Units
  y = v1 phi( w1 . x ) + v2 phi( w2 . x ) + n.
New coordinates adapted to the singular region w1 = w2:
  w = (w1 + w2) / 2,   u = w2 - w1,   v = v1 + v2,   z = (v2 - v1) / (2 v).

Dynamics of Learning
Gradient learning dtheta/dt = -eta dl/dtheta reduces near the singularity to the two-dimensional system
  du/dt = f(u, z),   dz/dt = k(u, z),
while natural gradient learning follows dtheta/dt = -eta G^{-1} dl/dtheta.

The Teacher Is on the Singularity
The averaged dynamics become
  du/dt = -(1/4) A(z) u^3,
  dz/dt = -(1/4) A(z) z u^4,
hence dz/du = z u, and trajectories obey the invariant
  (1/2) u^2 - log z = c.
Because du/dt is cubic in u, u decays only algebraically: this is the plateau that slows ordinary backpropagation when the teacher sits on the singularity, and that natural gradient learning escapes.
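To see why a cubic right-hand side produces plateaus, freeze z so that (1/4) A(z) becomes a constant c (an illustrative simplification of the averaged equations, not the full system) and compare du/dt = -c u with du/dt = -c u^3: the first decays exponentially, the second only as t^(-1/2).

```python
import math

def integrate(rate, u0=1.0, dt=1e-3, T=50.0):
    """Euler-integrate du/dt = rate(u) from u(0) = u0 up to time T."""
    u, t = u0, 0.0
    while t < T:
        u += dt * rate(u)
        t += dt
    return u

c = 1.0  # stands in for the frozen coefficient (1/4) A(z)
u_reg = integrate(lambda u: -c * u)        # regular teacher: exponential decay
u_sing = integrate(lambda u: -c * u**3)    # singular teacher: algebraic decay
print(u_reg, u_sing)
# closed forms: u_reg ~ exp(-c T), u_sing = u0 / sqrt(1 + 2 c u0**2 T) ~ 0.0995
```

After the same learning time, the regular case has essentially converged while the singular case is still an order of magnitude from the teacher, which is the plateau behavior the natural gradient is designed to avoid.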