Download Conformal Changes of Divergence and Information Geometry Shun-ichi Amari RIKEN Brain Science Institute

Document related concepts
no text concepts found
Transcript
WAGOS
Conformal Changes of Divergence
and Information Geometry
Shun-ichi Amari
RIKEN Brain Science Institute
Information Geometry
Systems Theory
Statistics
Combinatorics
Information Theory
Neural Networks
Information Sciences
Physics
Math.
AI
Riemannian Manifold
Dual Affine Connections
Vision,
Shape
optimization
Manifold of Probability Distributions
Information Geometry ?
S   p  x;  ,  
2


x



1
 

p  x;  ,   
exp 

2
2
2




Riemannian metric
Dual affine connections
 p  x 

S   p  x; θ
θ  ( , )

Manifold of Probability Distributions
x  1, 2,3
{p( x)}
p   p1 , p2 , p3 
p3
p1  p2  p3  1
S   p  x; 
Sn  {p | ( p0 , p1 ,..., pn );
p
i
 1; pi  0}
Sn 1  {p | ( p0 , p1 ,..., pn ); pi  0}
Sn  Sn 1
p1
p2
Riemannian Structure
ds   gij ( )d d
2
i
 d G ( )d
T
G ( )  ( gij )
Euclidean G  E
Fisher information
j
Affine Connection
covariant derivative
XY ,
c X  Y
geodesic  X X  0,
s
i
j
g
(

)
d

d

 ij
minimal distance
straight line
X=X(t)
Duality
 X , Y   gij X iY j
X , Y  X ,  Y
X  Y , Z   X Y , Z    Y , * X Z 

*
Y
X
Y
X
Riemannian geometry:




,



Dual Affine Connections
(,  )
*
e-geodesic
log r  x, t   t log p  x   1  t  log q  x   c t 
m-geodesic
r  x, t   tp  x   1  t  q  x 
q  x
p  x
Divergence
DZ : Y 
M
D  z : y  0
D  z : y  0,
iff z  y
D  z : z  dz    gij dzi dz j
positive-definite
Y
Z
Metric and Connections Induced by Divergence
(Eguchi)
Riemannian
metric
affine connections
gij  z    i  j D  z : y  y  z
ijk  z    i  j  'k D  z : y  y  z
ijk  z    i'  'j  k D  z : y  y  z

i 
,
zi

 
yi
'
i
Duality:
X  Y , Z   X Y , Z    Y , * X Z 
 k gij   kij  
ijk  

ijk
M , g , T 
 Tijk

kji
Two Types of Divergence
Invariant divergence (Chentsov, Csiszar)
f-divergence: Fisher- structure
q(x)
D[p : q] =  p(x)f{
}dx
p(x)
Flat divergence (Bregman)
KL-divergence belongs to both classes
Invariant divergence
(manifold of probability
distributions; S  { p( x,  )} )
Chentsov
Amari -Nagaoka
y  k  x  : one-to-one
sufficient statistics
D  p X  x  : q X  x    D  pY  y  : qY  y  
Csiszar f-divergence
q
D f  p : q    pi f  i
 pi
f  u  : convex,
Ali-Silvey
Morimoto

,

f 1  0,
Dcf  p : q  cDf  p : q
f (u )
f  u   f  u   c  u  1
u
f 1  f ' 1  0 ; f '' 1  1
1
Invariant geometrical structure
alpha-geometry
S   p  x,  
(derived from invariant divergence)
gij    E  i l  j l 
Fisher information
Tijk    E  i l  j l  k l 
l  log p  x,   ;
α -connection
ijk  i, j; k  Tijk
   
Levi-civita:
: dually coupled
X Y , Z   X Y , Z  Y ,   X Z
i 

 i
Sn
:Dually Flat Structure
affine coordinates 
dual affine coordinates 
potential    ,
  
     ,
    
D 1 :  2    1     2   1  2
Dually flat manifold:
Manifold with Convex Function
   , , L , 
coordinates
S
1
n
   : convex function
negative entropy
Euclidean
2
  p    p  x  log p  x  dx,
energy
1
i 2
      
2
mathematical programming, control systems, physics, engineering, economics
Riemannian metric and flatness
{S ,  ( ),  }
Bregman divergence
D  ,             grad   
1
gij   d i d j

2

gij   i  j   ,  i  i

D  ,  d  
Flatness (affine)

: geodesic (notLevi-Civita)
Legendre Transformation
i  i  
 
one-to-one
i
       
0
i
      ,
i
i

 
i
i
D  ,             
 ( )  max { ii  ( )}
Two flat coordinate systems

: geodesic (e-geodesic)

: dual geodesic (m-geodesic)
“dually orthogonal”
i ,  j  i j
i 


i
,


 i
i
X  Y , Z   X Y , Z    Y , * X Z 
 , 
Geometry
Riemannian metric G   Gij 
D  p : p  dp  
1 T
p Gp,
2
G  p     p 
G   p      p   G 1
Straightness (affine connection)
p  t   a  bt
: -geodesic
p   t   a   b t
:  -geodesic
Pythagorean Theorem
(dually flat manifold)
D  P : Q  D Q : R  D  P : R
Euclidean space: self-dual
   
1
2

 i 
2
 
Projection Theorem
min D  P : Q 
QM
Q = m-geodesic projection of P to M
min D Q : P 
QM
Q’ = e-geodesic projection of P to M
divergence
S  {p} : space of probability distributions
invariance
dually flat space
invariant divergence
F-divergence
Fisher inf metric
Alpha connection
KL-divergence
Flat divergence
convex functions
Bregman
p(x)
D[p : q] =  p(x) log{
}dx
q(x)
Space of positive measures :
vectors, matrices, arrays
S   p,
pi  0 : ( pi  1 not n holds)
f-divergence
Bregman divergence
α-divergence
 divergence
1
1 
D [ p : q]  {
pi 
qi  pi
2
2
1
2
qi
1
2
KL-divergence
pi
D[ p : q ]   { pi log  pi  qi }
qi

}
α-representation
 12
ri   pi
log pi
,  1
,  1
  r   U  ri 
typical case: u-representation,
 12
U  z   z
 e z
U  ri   pi
,  1
,  1
(Amari-Nagaoka, Zhang)
Divergence over α-representation
D  p : q    U  ri    V  ri    ri ri
1
2
i
ri : p
 -geodesic
1
2
ri :
pi 2
1
 -geodesic

ri  log pi
ri  pi
 1
β-divergence
 1
1
U  z  
1   z   ,
 1
U0  z   exp  z 
 0
(Eguchi)
 -structure
dually flat:  -divergence
D[ p : q] 
D0  p : q  KL  p : q 
1
 (   1)
(  1)
(  1)
{
p
(
x
)

q
(
x
)

 (   1)q( x )  ( p( x )  q( x )}dx
 -div
 -div
KL-div
( ,  )  divergence
D , [ p : q]  {


pi
 

   :   divergence
  1:
 -divergence

 
qi    p i qi  }
Tsallis q-Entropy--
α  structure
  2q  1
1
Shannon entropy
1
H  E[log
]    p( x ) log p( x )dx
p( x )
Generalized log
 1
u1 q  1 ,

ln q  u   1  q

log u,


exp q  u    ln q 
HT
1

q
q 1
q 1
1
1 q
 u   1  (1  q)u}

1 
1
 E  ln q


p
x



 1 q

,
exp u

p  x  dx  1
q
2
hq  p  
HT
 p  x  dx
1

h  1 :

1 q
q
q
HR
1

log hq
1 q
 hq  p : convex:  (p)  f {hq ( p)}

1  1
Hq 
 1  Eq ln q p  x  

1  q  hq  p  

r  x 
D  p  x  : r  x     E p  ln q

p  x 

1
q
1 q

1   p  x  r  x  dx
1 q

:  -divergence ;  -structure

q -exponential
family
cf Pistone κ exponential
p  x,    exp q   x    
  x  i xi
   : convex
      
p( x ) q
xdx
hp
p( x ) q
pˆ ( x ) 
: escort distribution
hp
i  Eq  xi 

1  1
   
 1  Eq ln q p  x  

1  q  hq  p  
 ( ) and  ( )
q-Geometry derived from
: dually flat

Sn   p

p  x 
n

i 0
i 
n

i 0

pi  1

n
log q p( x )   log q pi i ( x )
i 0
pi i  x 
  (log q pi  log q p0 ) i ( x )  log q p0
1
pi1 q  p01 q 

1 q
1
piq
i 
hq
n
i 1
Dually flat structure of q-escort
pˆ
Desc  p : r    pˆ log dx
rˆ
hq  r 
p  x
 q  pˆ  x  log
dx  log
r  x
hq  p 
gijesc  Eq  i log pˆ  j log pˆ 
geodesic: exponential family
log p  x, t   t log p1  x   1  t  log p2  x   c t 
dual geodesic: q-family
pˆ  x, t   tpˆ1  x   1  t  pˆ 2 t 
q-escort probability distribution
1
q
pˆ ( x) 
p ( x)
hq ( p)
Escort geometry
gˆ ij ( )  Eq [ i log pˆ ( x,  ) j log pˆ ( x,  )]
q -escort geometry
1
q
pˆ  x   p  x 
hq
Dually flat structure of q-escort
pˆ
Desc  p : r    pˆ log dx
rˆ
hq  r 
p  x
 q  pˆ  x  log
dx  log
r  x
hq  p 
gijesc  Eq  i log pˆ  j log pˆ 
geodesic: exponential family
log p  x, t   t log p1  x   1  t  log p2  x   c t 
dual geodesic: q-family
pˆ  x, t   tpˆ1  x   1  t  pˆ 2 t 
q  Pythagorean theorem
Dq  p : q  Dq q : r   D  p : r 
q - geodesic :
p  x, t 
1 q
 tp1  x 
1 q
 1  t  p2  x 
1 q
 c t 
dual-q-geodesic:
p  x, t   tp1  x   1  t  p2  x   c  t 
q
q
q
Projection theorem
arg min Dq  p : r   p
rM
Max-entropy theorem
constraints: Eq  ak  x    ck
max  hq  p 
q -Cramer Rao theorem
: p  x,  
q-unbiased estimator ˆ
Eq ˆ   



Eq  i  ˆi  j  ˆj    gij 


 1
q -maximum likelihood estimator
ˆq  arg max pˆ  x1 ,
 arg max
, xN ;  
1
hq  
N
 p  xi ;  
q
 Bayes MAP with     hq  
 N /q
q -super-robust estimator (Eguchi)
p  x,  
max pˆ  x,    max
hq 1
bias-corrected q-estimating function
sq  x,    pˆ  x,    i log p  cq 1  
1
cq 1 
 log hq 1  
q 1
N
 s  x ,   0
i 0
q
i
1
 max
pˆ  xi ,  

hq 1
Conformal change of divergence
D  p : q     p  D  p : q
gij    p  gij
Tijk   (Tijk  sk gij  s j gik  si g jk )
si   i log 
conformal transformation
q -Fisher information
q
F
gij ( p) 
gij ( p)
hq ( p)
q
q  divergence
1
q
1 q
Dq [ p( x ) : r ( x )] 
(1   p( x ) r ( x ) dx )
(1  q)hq ( p )
Total Bregman divergence (Vemuri)
TBD( p : q) 
 ( p)   (q)   (q)  ( p  q)
1 |  (q) |
2
Total Bregman Divergence and
its Applications to Shape
Retrieval
•Baba C. Vemuri, Meizhu Liu, Shun-ichi Amari, Frank Nielsen
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010
Total Bregman Divergence
TD  x : y  
D  x : y
1  f
2
•rotational invariance
•conformal geometry
TBD examples
Clustering : t-center
y
E  x1 ,
, xm 
T-center of E
x   arg min  TD  x, xi 
i
E
x
t-center x

w f  x 

f  x  
w

i
i
i
wi 
1
1  f  xi 
2
t-center is robust
E    x1 ,
, xn ; y
1
x  x   z  x ; y,  
n



influence function z  x ; y 

z  c as y   : robust
z  x , y   G
G


f
y


f
x




1
w y
1
w  xi f  xi 

n
z  x  , y   G 1
Robust: z is bounded
f  y 
f  y 

w y
1  f  y 
w y  1
2

z  x , y   y
y
1 y
2
Euclidean case f 
1 2
x
2
How good is Total Bregman Divergence
•vision
•signal processing
•geometry (conformal)
TBD application-shape retrieval
• Using MPEG7 database;
• 70 classes, with 20 shapes each class
(Meizhu Liu)
First clustering then retrieval
Advantages
• Accurate;
• Easy to access (shape representation);
• Space and time efficient (only need to store the
closed form t-centers, clustering can be done offline,
hierarchical tree storage).
Shape retrieval framework
•
•
•
•
Shape-->
Extract boundary points & align them-->
Represent using mixture of Gaussians-->
Clustering & use k-tree to store the clustering
results;
• Query on the tree.
MPEG7 database
• Great intraclass variability, and small
interclass dissimilarity.
Shape representation
Experimental results
Other TBD applications
Diffusion tensor imaging (DTI) analysis [Vemuri]
• Interpolation
• Segmentation
Baba C. Vemuri, Meizhu Liu, Shun-ichi Amari and Frank Nielsen, Total Bregman Divergence and
its Applications to DTI Analysis, IEEE TMI, to appear
Related documents