Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
WAGOS Conformal Changes of Divergence and Information Geometry Shun-ichi Amari RIKEN Brain Science Institute Information Geometry Systems Theory Statistics Combinatorics Information Theory Neural Networks Information Sciences Physics Math. AI Riemannian Manifold Dual Affine Connections Vision, Shape optimization Manifold of Probability Distributions Information Geometry ? S p x; , 2 x 1 p x; , exp 2 2 2 Riemannian metric Dual affine connections p x S p x; θ θ ( , ) Manifold of Probability Distributions x 1, 2,3 {p( x)} p p1 , p2 , p3 p3 p1 p2 p3 1 S p x; Sn {p | ( p0 , p1 ,..., pn ); p i 1; pi 0} Sn 1 {p | ( p0 , p1 ,..., pn ); pi 0} Sn Sn 1 p1 p2 Riemannian Structure ds gij ( )d d 2 i d G ( )d T G ( ) ( gij ) Euclidean G E Fisher information j Affine Connection covariant derivative XY , c X Y geodesic X X 0, s i j g ( ) d d ij minimal distance straight line X=X(t) Duality X , Y gij X iY j X , Y X , Y X Y , Z X Y , Z Y , * X Z * Y X Y X Riemannian geometry: , Dual Affine Connections (, ) * e-geodesic log r x, t t log p x 1 t log q x c t m-geodesic r x, t tp x 1 t q x q x p x Divergence DZ : Y M D z : y 0 D z : y 0, iff z y D z : z dz gij dzi dz j positive-definite Y Z Metric and Connections Induced by Divergence (Eguchi) Riemannian metric affine connections gij z i j D z : y y z ijk z i j 'k D z : y y z ijk z i' 'j k D z : y y z i , zi yi ' i Duality: X Y , Z X Y , Z Y , * X Z k gij kij ijk ijk M , g , T Tijk kji Two Types of Divergence Invariant divergence (Chentsov, Csiszar) f-divergence: Fisher- structure q(x) D[p : q] = p(x)f{ }dx p(x) Flat divergence (Bregman) KL-divergence belongs to both classes Invariant divergence (manifold of probability distributions; S { p( x, )} ) Chentsov Amari -Nagaoka y k x : one-to-one sufficient statistics D p X x : q X x D pY y : qY y Csiszar f-divergence q D f p : q pi f i pi f u : convex, Ali-Silvey Morimoto , f 1 0, Dcf p : q cDf p : q f (u ) f u f u c u 1 u f 1 f ' 1 0 ; f '' 1 1 1 Invariant geometrical structure alpha-geometry S p x, (derived from invariant divergence) gij E i l j l Fisher information Tijk E i l j l k l l log p x, ; α -connection ijk i, j; k Tijk Levi-civita: : dually coupled X Y , Z X Y , Z Y , X Z i i Sn :Dually Flat Structure affine coordinates dual affine coordinates potential , , D 1 : 2 1 2 1 2 Dually flat manifold: Manifold with Convex Function , , L , coordinates S 1 n : convex function negative entropy Euclidean 2 p p x log p x dx, energy 1 i 2 2 mathematical programming, control systems, physics, engineering, economics Riemannian metric and flatness {S , ( ), } Bregman divergence D , grad 1 gij d i d j 2 gij i j , i i D , d Flatness (affine) : geodesic (notLevi-Civita) Legendre Transformation i i one-to-one i 0 i , i i i i D , ( ) max { ii ( )} Two flat coordinate systems : geodesic (e-geodesic) : dual geodesic (m-geodesic) “dually orthogonal” i , j i j i i , i i X Y , Z X Y , Z Y , * X Z , Geometry Riemannian metric G Gij D p : p dp 1 T p Gp, 2 G p p G p p G 1 Straightness (affine connection) p t a bt : -geodesic p t a b t : -geodesic Pythagorean Theorem (dually flat manifold) D P : Q D Q : R D P : R Euclidean space: self-dual 1 2 i 2 Projection Theorem min D P : Q QM Q = m-geodesic projection of P to M min D Q : P QM Q’ = e-geodesic projection of P to M divergence S {p} : space of probability distributions invariance dually flat space invariant divergence F-divergence Fisher inf metric Alpha connection KL-divergence Flat divergence convex functions Bregman p(x) D[p : q] = p(x) log{ }dx q(x) Space of positive measures : vectors, matrices, arrays S p, pi 0 : ( pi 1 not n holds) f-divergence Bregman divergence α-divergence divergence 1 1 D [ p : q] { pi qi pi 2 2 1 2 qi 1 2 KL-divergence pi D[ p : q ] { pi log pi qi } qi } α-representation 12 ri pi log pi , 1 , 1 r U ri typical case: u-representation, 12 U z z e z U ri pi , 1 , 1 (Amari-Nagaoka, Zhang) Divergence over α-representation D p : q U ri V ri ri ri 1 2 i ri : p -geodesic 1 2 ri : pi 2 1 -geodesic ri log pi ri pi 1 β-divergence 1 1 U z 1 z , 1 U0 z exp z 0 (Eguchi) -structure dually flat: -divergence D[ p : q] D0 p : q KL p : q 1 ( 1) ( 1) ( 1) { p ( x ) q ( x ) ( 1)q( x ) ( p( x ) q( x )}dx -div -div KL-div ( , ) divergence D , [ p : q] { pi : divergence 1: -divergence qi p i qi } Tsallis q-Entropy-- α structure 2q 1 1 Shannon entropy 1 H E[log ] p( x ) log p( x )dx p( x ) Generalized log 1 u1 q 1 , ln q u 1 q log u, exp q u ln q HT 1 q q 1 q 1 1 1 q u 1 (1 q)u} 1 1 E ln q p x 1 q , exp u p x dx 1 q 2 hq p HT p x dx 1 h 1 : 1 q q q HR 1 log hq 1 q hq p : convex: (p) f {hq ( p)} 1 1 Hq 1 Eq ln q p x 1 q hq p r x D p x : r x E p ln q p x 1 q 1 q 1 p x r x dx 1 q : -divergence ; -structure q -exponential family cf Pistone κ exponential p x, exp q x x i xi : convex p( x ) q xdx hp p( x ) q pˆ ( x ) : escort distribution hp i Eq xi 1 1 1 Eq ln q p x 1 q hq p ( ) and ( ) q-Geometry derived from : dually flat Sn p p x n i 0 i n i 0 pi 1 n log q p( x ) log q pi i ( x ) i 0 pi i x (log q pi log q p0 ) i ( x ) log q p0 1 pi1 q p01 q 1 q 1 piq i hq n i 1 Dually flat structure of q-escort pˆ Desc p : r pˆ log dx rˆ hq r p x q pˆ x log dx log r x hq p gijesc Eq i log pˆ j log pˆ geodesic: exponential family log p x, t t log p1 x 1 t log p2 x c t dual geodesic: q-family pˆ x, t tpˆ1 x 1 t pˆ 2 t q-escort probability distribution 1 q pˆ ( x) p ( x) hq ( p) Escort geometry gˆ ij ( ) Eq [ i log pˆ ( x, ) j log pˆ ( x, )] q -escort geometry 1 q pˆ x p x hq Dually flat structure of q-escort pˆ Desc p : r pˆ log dx rˆ hq r p x q pˆ x log dx log r x hq p gijesc Eq i log pˆ j log pˆ geodesic: exponential family log p x, t t log p1 x 1 t log p2 x c t dual geodesic: q-family pˆ x, t tpˆ1 x 1 t pˆ 2 t q Pythagorean theorem Dq p : q Dq q : r D p : r q - geodesic : p x, t 1 q tp1 x 1 q 1 t p2 x 1 q c t dual-q-geodesic: p x, t tp1 x 1 t p2 x c t q q q Projection theorem arg min Dq p : r p rM Max-entropy theorem constraints: Eq ak x ck max hq p q -Cramer Rao theorem : p x, q-unbiased estimator ˆ Eq ˆ Eq i ˆi j ˆj gij 1 q -maximum likelihood estimator ˆq arg max pˆ x1 , arg max , xN ; 1 hq N p xi ; q Bayes MAP with hq N /q q -super-robust estimator (Eguchi) p x, max pˆ x, max hq 1 bias-corrected q-estimating function sq x, pˆ x, i log p cq 1 1 cq 1 log hq 1 q 1 N s x , 0 i 0 q i 1 max pˆ xi , hq 1 Conformal change of divergence D p : q p D p : q gij p gij Tijk (Tijk sk gij s j gik si g jk ) si i log conformal transformation q -Fisher information q F gij ( p) gij ( p) hq ( p) q q divergence 1 q 1 q Dq [ p( x ) : r ( x )] (1 p( x ) r ( x ) dx ) (1 q)hq ( p ) Total Bregman divergence (Vemuri) TBD( p : q) ( p) (q) (q) ( p q) 1 | (q) | 2 Total Bregman Divergence and its Applications to Shape Retrieval •Baba C. Vemuri, Meizhu Liu, Shun-ichi Amari, Frank Nielsen IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010 Total Bregman Divergence TD x : y D x : y 1 f 2 •rotational invariance •conformal geometry TBD examples Clustering : t-center y E x1 , , xm T-center of E x arg min TD x, xi i E x t-center x w f x f x w i i i wi 1 1 f xi 2 t-center is robust E x1 , , xn ; y 1 x x z x ; y, n influence function z x ; y z c as y : robust z x , y G G f y f x 1 w y 1 w xi f xi n z x , y G 1 Robust: z is bounded f y f y w y 1 f y w y 1 2 z x , y y y 1 y 2 Euclidean case f 1 2 x 2 How good is Total Bregman Divergence •vision •signal processing •geometry (conformal) TBD application-shape retrieval • Using MPEG7 database; • 70 classes, with 20 shapes each class (Meizhu Liu) First clustering then retrieval Advantages • Accurate; • Easy to access (shape representation); • Space and time efficient (only need to store the closed form t-centers, clustering can be done offline, hierarchical tree storage). Shape retrieval framework • • • • Shape--> Extract boundary points & align them--> Represent using mixture of Gaussians--> Clustering & use k-tree to store the clustering results; • Query on the tree. MPEG7 database • Great intraclass variability, and small interclass dissimilarity. Shape representation Experimental results Other TBD applications Diffusion tensor imaging (DTI) analysis [Vemuri] • Interpolation • Segmentation Baba C. Vemuri, Meizhu Liu, Shun-ichi Amari and Frank Nielsen, Total Bregman Divergence and its Applications to DTI Analysis, IEEE TMI, to appear