WAGOS
Conformal Changes of Divergence
and Information Geometry
Shun-ichi Amari
RIKEN Brain Science Institute
Information Geometry
[Diagram: Information Geometry (Riemannian manifold, dual affine connections, manifold of probability distributions); related fields: Systems Theory, Statistics, Combinatorics, Information Theory, Neural Networks, Information Sciences, Physics, Math., AI, Vision, Shape, Optimization]
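Information Geometry ?
$S = \{p(x; \mu, \sigma)\}$
$p(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{(x - \mu)^2}{2\sigma^2}\right\}$
Riemannian metric
Dual affine connections
$S = \{p(x; \theta)\}$, $\theta = (\mu, \sigma)$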
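Manifold of Probability Distributions
$x = 1, 2, 3$; $S = \{p(x)\}$
$p = (p_1, p_2, p_3)$, $p_1 + p_2 + p_3 = 1$
[Figure: the simplex with axes $p_1$, $p_2$, $p_3$]
$S_n = \{p \mid (p_0, p_1, \ldots, p_n);\ \sum_i p_i = 1;\ p_i > 0\}$
$\tilde{S}_{n+1} = \{p \mid (p_0, p_1, \ldots, p_n);\ p_i > 0\}$
$S_n \subset \tilde{S}_{n+1}$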
Riemannian Structure
$ds^2 = \sum g_{ij}(\theta)\, d\theta^i d\theta^j = d\theta^T G(\theta)\, d\theta$
$G(\theta) = (g_{ij})$
Euclidean: $G = E$
Fisher information
Affine Connection
covariant derivative: $\nabla_X Y$, $\Pi_c X = Y$
geodesic: $\nabla_X X = 0$, $X = X(t)$
$s = \int \sqrt{g_{ij}(\theta)\, d\theta^i d\theta^j}$
minimal distance
straight line
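Duality
$\langle X, Y \rangle = g_{ij} X^i Y^j$
$\langle X, Y \rangle = \langle \Pi X, \Pi^* Y \rangle$
$X\langle Y, Z \rangle = \langle \nabla_X Y, Z \rangle + \langle Y, \nabla^*_X Z \rangle$
Riemannian geometry: $\nabla = \nabla^*$
Dual Affine Connections $(\nabla, \nabla^*)$
e-geodesic: $\log r(x, t) = t \log p(x) + (1 - t) \log q(x) + c(t)$
m-geodesic: $r(x, t) = t\, p(x) + (1 - t)\, q(x)$
[Figure: e-geodesic and m-geodesic joining $p(x)$ and $q(x)$]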
Divergence $D[z : y]$ on $M$
$D[z : y] \ge 0$
$D[z : y] = 0$ iff $z = y$
$D[z : z + dz] = \frac{1}{2} \sum g_{ij}\, dz_i\, dz_j$, $(g_{ij})$ positive-definite
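Metric and Connections Induced by Divergence (Eguchi)
Riemannian metric:
$g_{ij}(z) = \partial_i \partial_j D[z : y]\,|_{y=z}$
affine connections:
$\Gamma_{ijk}(z) = -\partial_i \partial_j \partial'_k D[z : y]\,|_{y=z}$
$\Gamma^*_{ijk}(z) = -\partial'_i \partial'_j \partial_k D[z : y]\,|_{y=z}$
$\partial_i = \frac{\partial}{\partial z_i}$, $\partial'_i = \frac{\partial}{\partial y_i}$
Duality:
$X\langle Y, Z \rangle = \langle \nabla_X Y, Z \rangle + \langle Y, \nabla^*_X Z \rangle$
$\partial_k g_{ij} = \Gamma_{kij} + \Gamma^*_{kji}$
$\{M, g, T\}$, $T_{ijk} = \Gamma^*_{ijk} - \Gamma_{ijk}$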
Two Types of Divergence
Invariant divergence (Chentsov, Csiszár)
f-divergence: Fisher-α structure
$D[p : q] = \int p(x)\, f\left\{\frac{q(x)}{p(x)}\right\} dx$
Flat divergence (Bregman)
KL-divergence belongs to both classes
Invariant divergence (manifold of probability distributions; $S = \{p(x, \theta)\}$)
Chentsov; Amari-Nagaoka
$y = k(x)$: one-to-one, sufficient statistics
$D[p_X(x) : q_X(x)] = D[p_Y(y) : q_Y(y)]$
Csiszár f-divergence (Ali-Silvey, Morimoto)
$D_f[p : q] = \sum_i p_i\, f\left(\frac{q_i}{p_i}\right)$
$f(u)$: convex, $f(1) = 0$
$D_{cf}[p : q] = c\, D_f[p : q]$
$\tilde{f}(u) = f(u) - c(u - 1)$
$f(1) = f'(1) = 0$; $f''(1) = 1$
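The f-divergence properties listed on this slide can be checked numerically. The following is a minimal sketch for discrete distributions, assuming NumPy; the function names (`f_divergence`, `f_kl`) and the example vectors are illustrative choices, not taken from the slides. It uses the convex function f(u) = u - 1 - log u, which satisfies f(1) = f'(1) = 0 and f''(1) = 1 and reproduces the KL-divergence.

```python
import numpy as np

def f_divergence(p, q, f):
    """D_f[p:q] = sum_i p_i * f(q_i / p_i), for strictly positive p and q."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * f(q / p)))

def f_kl(u):
    """f(u) = u - 1 - log(u): convex, f(1) = f'(1) = 0, f''(1) = 1."""
    return u - 1.0 - np.log(u)

# Illustrative distributions (assumed for this example only).
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.5, 0.3])

d_f = f_divergence(p, q, f_kl)
kl = float(np.sum(p * np.log(p / q)))      # KL[p:q] computed directly

# A one-to-one relabelling y = k(x), here a permutation, leaves D_f unchanged.
perm = np.array([2, 0, 1])
d_f_relabelled = f_divergence(p[perm], q[perm], f_kl)

# Scaling f by a constant c scales the divergence by c.
d_3f = f_divergence(p, q, lambda u: 3.0 * f_kl(u))

print(d_f, kl, d_f_relabelled, d_3f)
```

For normalized p and q the term c(u - 1) contributes nothing to the sum, which is why f can be shifted to the standard form without changing the divergence.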
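Invariant geometrical structure: alpha-geometry (derived from invariant divergence)
$S = \{p(x, \theta)\}$
$g_{ij}(\theta) = E[\partial_i l\, \partial_j l]$: Fisher information
$T_{ijk}(\theta) = E[\partial_i l\, \partial_j l\, \partial_k l]$
$l = \log p(x, \theta)$; $\partial_i = \frac{\partial}{\partial \theta^i}$
α-connection: $\Gamma^{(\alpha)}_{ijk} = \{i, j; k\} - \frac{\alpha}{2} T_{ijk}$
$\{i, j; k\}$: Levi-Civita
$\nabla^{(\alpha)}$, $\nabla^{(-\alpha)}$: dually coupled
$X\langle Y, Z \rangle = \langle \nabla^{(\alpha)}_X Y, Z \rangle + \langle Y, \nabla^{(-\alpha)}_X Z \rangle$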
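$S_n$: Dually Flat Structure
$\theta$: affine coordinates
$\eta$: dual affine coordinates
potentials: $\psi(\theta)$, $\varphi(\eta)$
$D[\theta_1 : \theta_2] = \psi(\theta_1) + \varphi(\eta_2) - \theta_1 \cdot \eta_2$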
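Dually flat manifold: Manifold with Convex Function
$S$: coordinates $\theta = (\theta^1, \theta^2, \ldots, \theta^n)$
$\psi(\theta)$: convex function
negative entropy: $\varphi(p) = \int p(x) \log p(x)\, dx$
Euclidean energy: $\psi(\theta) = \frac{1}{2} \sum_i (\theta^i)^2$
mathematical programming, control systems, physics, engineering, economics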
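Riemannian metric and flatness
$\{S, \psi(\theta), \theta\}$
Bregman divergence:
$D[\theta : \theta'] = \psi(\theta) - \psi(\theta') - \mathrm{grad}\,\psi(\theta') \cdot (\theta - \theta')$
$D[\theta : \theta + d\theta] = \frac{1}{2} \sum g_{ij}\, d\theta^i d\theta^j$
$g_{ij} = \partial_i \partial_j \psi(\theta)$, $\partial_i = \frac{\partial}{\partial \theta^i}$
Flatness (affine)
$\theta$: geodesic (not Levi-Civita)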
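As an illustration of the Bregman divergence generated by a convex function, the sketch below evaluates it for two convex functions: the Euclidean energy and the negative entropy on positive vectors. It assumes NumPy; the helper names (`bregman`, `energy`, `neg_entropy`) and the sample vectors are hypothetical, introduced only for this example.

```python
import numpy as np

def bregman(psi, grad_psi, theta, theta_prime):
    """D[theta : theta'] = psi(theta) - psi(theta') - grad psi(theta') . (theta - theta')."""
    theta = np.asarray(theta, dtype=float)
    theta_prime = np.asarray(theta_prime, dtype=float)
    return float(psi(theta) - psi(theta_prime)
                 - grad_psi(theta_prime) @ (theta - theta_prime))

# Convex function 1: Euclidean energy psi(theta) = 0.5 * ||theta||^2.
def energy(t):
    return 0.5 * float(t @ t)

def grad_energy(t):
    return t

# Convex function 2: negative entropy sum_i p_i log p_i on positive vectors.
def neg_entropy(p):
    return float(np.sum(p * np.log(p)))

def grad_neg_entropy(p):
    return np.log(p) + 1.0

# Illustrative points (assumed for this example only).
a = np.array([0.5, 0.3, 0.2])
b = np.array([0.2, 0.5, 0.3])

d_euclid = bregman(energy, grad_energy, a, b)       # half squared distance
d_kl = bregman(neg_entropy, grad_neg_entropy, a, b) # KL for positive measures

print(d_euclid, d_kl)
```

With the energy function the divergence reduces to half the squared Euclidean distance; with the negative entropy it gives the KL-divergence in its form for positive measures, sum_i {a_i log(a_i / b_i) - a_i + b_i}.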
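Legendre Transformation
$\eta_i = \partial_i \psi(\theta)$: one-to-one, $\theta \leftrightarrow \eta$
$\varphi(\eta) + \psi(\theta) - \sum_i \theta^i \eta_i = 0$
$\theta^i = \partial^i \varphi(\eta)$, $\partial^i = \frac{\partial}{\partial \eta_i}$
$D[\theta : \theta'] = \psi(\theta) + \varphi(\eta') - \sum_i \theta^i \eta'_i$
$\varphi(\eta) = \max_\theta \{\sum_i \theta^i \eta_i - \psi(\theta)\}$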
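The Legendre relations can be made concrete on the discrete distributions $S_n$. The sketch below is one possible worked instance, assuming NumPy and assuming the standard exponential-family coordinates of a categorical distribution (theta^i = log(p_i / p_0), psi = log-partition function, phi = negative entropy); these choices and all function names are illustrative and not taken from the slides.

```python
import numpy as np

def psi(theta):
    """Log-partition function psi(theta) = log(1 + sum_i exp(theta^i))."""
    return float(np.log1p(np.sum(np.exp(theta))))

def eta_of_theta(theta):
    """eta_i = d psi / d theta^i = p_i for i = 1..n."""
    e = np.exp(theta)
    return e / (1.0 + np.sum(e))

def phi(eta):
    """Dual potential: negative entropy sum_{i=0..n} p_i log p_i, with p_0 = 1 - sum(eta)."""
    p0 = 1.0 - np.sum(eta)
    return float(np.sum(eta * np.log(eta)) + p0 * np.log(p0))

def theta_of_eta(eta):
    """theta^i = d phi / d eta_i = log(p_i / p_0)."""
    p0 = 1.0 - np.sum(eta)
    return np.log(eta / p0)

def full(eta):
    """Full probability vector (p_0, p_1, ..., p_n)."""
    return np.concatenate(([1.0 - np.sum(eta)], eta))

theta = np.array([0.3, -1.2])
eta = eta_of_theta(theta)

# Legendre identity: psi(theta) + phi(eta) - theta . eta = 0.
legendre_gap = psi(theta) + phi(eta) - float(theta @ eta)

# Canonical divergence D[theta : theta'] = psi(theta) + phi(eta') - theta . eta'.
theta2 = np.array([-0.5, 0.8])
eta2 = eta_of_theta(theta2)
d_canonical = psi(theta) + phi(eta2) - float(theta @ eta2)

p1, p2 = full(eta), full(eta2)
kl_p2_p1 = float(np.sum(p2 * np.log(p2 / p1)))

print(legendre_gap, d_canonical, kl_p2_p1)
```

In this parameterization the canonical divergence D[theta : theta'] coincides with KL[p' : p], the KL-divergence with the arguments in reversed order.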
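Two flat coordinate systems $(\theta, \eta)$
$\theta$: geodesic (e-geodesic)
$\eta$: dual geodesic (m-geodesic)
“dually orthogonal”: $\langle \partial_i, \partial^j \rangle = \delta_i^j$
$\partial_i = \frac{\partial}{\partial \theta^i}$, $\partial^i = \frac{\partial}{\partial \eta_i}$
$X\langle Y, Z \rangle = \langle \nabla_X Y, Z \rangle + \langle Y, \nabla^*_X Z \rangle$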
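Geometry
Riemannian metric $G = (G_{ij})$
$D[p : p + dp] = \frac{1}{2}\, dp^T G\, dp$
$G(p) = \nabla\nabla\psi(p)$
$G^*(p^*) = \nabla\nabla\varphi(p^*) = G^{-1}$
Straightness (affine connection)
$p(t) = a + bt$: $\nabla$-geodesic
$p^*(t) = a + bt$: $\nabla^*$-geodesic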
Pythagorean Theorem (dually flat manifold)
$D[P : Q] + D[Q : R] = D[P : R]$
Euclidean space: self-dual, $\theta = \eta$
$\psi(\theta) = \frac{1}{2} \sum_i (\theta^i)^2$
Projection Theorem
$\min_{Q \in M} D[P : Q]$: $Q$ = m-geodesic projection of $P$ to $M$
$\min_{Q \in M} D[Q : P]$: $Q'$ = e-geodesic projection of $P$ to $M$
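The Pythagorean relation and the projection theorem can be illustrated with the KL-divergence. The sketch below assumes NumPy and picks, purely as an example not found in the slides, the submanifold M of independent (product) distributions on a 2x2 table: the m-projection of a joint distribution P onto M is the product of its marginals, and D[P : R] = D[P : Q] + D[Q : R] then holds for every R in M.

```python
import numpy as np

def kl(p, q):
    """KL[p:q] = sum p log(p/q) over all cells of two strictly positive tables."""
    p = np.asarray(p, dtype=float).ravel()
    q = np.asarray(q, dtype=float).ravel()
    return float(np.sum(p * np.log(p / q)))

# Illustrative joint distribution P on a 2x2 table (assumed values).
P = np.array([[0.30, 0.10],
              [0.15, 0.45]])

# Q: product of the marginals of P, the m-projection of P onto the
# independence model M.
Q = np.outer(P.sum(axis=1), P.sum(axis=0))

# R: an arbitrary member of M.
R = np.outer([0.6, 0.4], [0.2, 0.8])

lhs = kl(P, Q) + kl(Q, R)
rhs = kl(P, R)
print(lhs, rhs)
```

Because D[Q : R] is non-negative, the same identity shows that Q minimizes D[P : R] over R in M, which is the projection statement.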
divergence
$S = \{p\}$: space of probability distributions
invariance: invariant divergence, f-divergence, Fisher information metric, alpha connection
dually flat space: flat divergence, convex functions, Bregman
KL-divergence:
$D[p : q] = \int p(x) \log\left\{\frac{p(x)}{q(x)}\right\} dx$
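Space of positive measures: vectors, matrices, arrays
$\tilde{S} = \{p\}$, $p_i > 0$ ($\sum_i p_i = 1$ does not necessarily hold)
f-divergence and Bregman divergence: α-divergence
α-divergence:
$D_\alpha[p : q] = \sum_i \left\{ \frac{1-\alpha}{2}\, p_i + \frac{1+\alpha}{2}\, q_i - p_i^{\frac{1-\alpha}{2}} q_i^{\frac{1+\alpha}{2}} \right\}$
KL-divergence:
$D[p : q] = \sum_i \left\{ p_i \log \frac{p_i}{q_i} - p_i + q_i \right\}$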
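A short numerical sketch of the α-divergence on positive measures follows, assuming NumPy. The function names and sample vectors are illustrative. The `normalized` flag is an addition made for this example: it multiplies the slide's expression by 4/(1 - α²), the normalization commonly used in the literature, under which the limits α → -1 and α → 1 give the KL-divergence in its two argument orders.

```python
import numpy as np

def alpha_divergence(p, q, alpha, normalized=False):
    """sum_i {(1-a)/2 p_i + (1+a)/2 q_i - p_i^((1-a)/2) q_i^((1+a)/2)}, |alpha| != 1.

    With normalized=True the result is multiplied by 4 / (1 - alpha^2).
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    a = (1.0 - alpha) / 2.0
    b = (1.0 + alpha) / 2.0
    d = float(np.sum(a * p + b * q - p**a * q**b))
    if normalized:
        d *= 4.0 / (1.0 - alpha**2)
    return d

def kl_positive(p, q):
    """KL-divergence for positive measures: sum_i {p_i log(p_i/q_i) - p_i + q_i}."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q) - p + q))

# Positive vectors that are not normalized (assumed example values).
p = np.array([0.5, 1.2, 0.3])
q = np.array([0.4, 0.9, 1.0])

d_zero = alpha_divergence(p, q, 0.0, normalized=True)        # alpha = 0
d_near_m1 = alpha_divergence(p, q, -1.0 + 1e-6, normalized=True)
d_near_p1 = alpha_divergence(p, q, 1.0 - 1e-6, normalized=True)

print(d_zero, d_near_m1, kl_positive(p, q), d_near_p1, kl_positive(q, p))
```

At α = 0 the normalized divergence equals twice the squared distance between the square-root vectors, which is symmetric in p and q.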
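α-representation
$r_i^{(\alpha)} = p_i^{\frac{1-\alpha}{2}}$ ($\alpha \ne 1$); $r_i^{(\alpha)} = \log p_i$ ($\alpha = 1$)
$\psi(r) = \sum_i U(r_i)$
typical case: u-representation,
$U_\alpha(z) = z^{\frac{2}{1-\alpha}}$ ($\alpha \ne 1$); $U_\alpha(z) = e^z$ ($\alpha = 1$)
$U(r_i) = p_i$
(Amari-Nagaoka, Zhang)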
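Divergence over α-representation
$D_\alpha[p : q] = \sum_i \{ U(r_i) + V(r_i^*) - r_i r_i^* \}$, where $r_i = r_i^{(\alpha)}(p)$ and $r_i^* = r_i^{(-\alpha)}(q)$
$r_i^{(\alpha)} = p_i^{\frac{1-\alpha}{2}}$: $\alpha$-geodesic
$r_i^{(-\alpha)} = p_i^{\frac{1+\alpha}{2}}$: $(-\alpha)$-geodesic
$\alpha = 1$: $r_i = \log p_i$
$\alpha = -1$: $r_i = p_i$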
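β-divergence (Eguchi)
$U_\beta(z) = \frac{1}{\beta+1} (1 + \beta z)^{\frac{\beta+1}{\beta}}$, $\beta > 0$
$U_0(z) = \exp z$, $\beta = 0$
β-structure
dually flat: β-divergence
$D_\beta[p : q] = \frac{1}{\beta(\beta+1)} \int \left\{ p(x)^{\beta+1} - q(x)^{\beta+1} - (\beta+1)\, q(x)^\beta\, (p(x) - q(x)) \right\} dx$
$D_0[p : q] = KL[p : q]$
[Diagram: α-div, β-div, KL-div]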
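The β-divergence and its β → 0 limit can be checked numerically. This is a minimal sketch for discrete positive vectors, assuming NumPy and replacing the integral by a sum; the function name and sample values are illustrative only.

```python
import numpy as np

def beta_divergence(p, q, beta):
    """(1/(b(b+1))) * sum {p^(b+1) - q^(b+1) - (b+1) q^b (p - q)}; KL form at beta = 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if beta == 0.0:
        # Limit beta -> 0: KL-divergence for positive measures.
        return float(np.sum(p * np.log(p / q) - p + q))
    total = np.sum(p**(beta + 1.0) - q**(beta + 1.0)
                   - (beta + 1.0) * q**beta * (p - q))
    return float(total / (beta * (beta + 1.0)))

# Assumed example values.
p = np.array([0.5, 1.2, 0.3])
q = np.array([0.4, 0.9, 1.0])

d_beta1 = beta_divergence(p, q, 1.0)     # half squared Euclidean distance
d_small = beta_divergence(p, q, 1e-6)    # close to the KL limit
d_kl = beta_divergence(p, q, 0.0)

print(d_beta1, d_small, d_kl)
```

At β = 1 the expression collapses to half the squared Euclidean distance, so β interpolates between the KL-divergence and a least-squares criterion.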
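$(\alpha, \beta)$-divergence
$D_{\alpha,\beta}[p : q] = \sum_i \left\{ \frac{\alpha}{\alpha+\beta}\, p_i^{\alpha+\beta} + \frac{\beta}{\alpha+\beta}\, q_i^{\alpha+\beta} - p_i^\alpha q_i^\beta \right\}$
$\alpha + \beta = 1$: α-divergence
$\alpha = 1$: β-divergence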
Tsallis q-Entropy: α-structure
$\alpha = 2q - 1$
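Shannon entropy
$H = E\left[\log \frac{1}{p(x)}\right] = -\int p(x) \log p(x)\, dx$
Generalized log
$\ln_q u = \frac{1}{1-q}\left(u^{1-q} - 1\right)$, $q \ne 1$; $\ln_q u = \log u$, $q = 1$
$\exp_q u = \ln_q^{-1} u = \{1 + (1-q)u\}^{\frac{1}{1-q}}$
$H_T = E\left[\ln_q \frac{1}{p(x)}\right] = \frac{1}{1-q}\left(\int p(x)^q\, dx - 1\right)$
$h_q(p) = \int p(x)^q\, dx$, $h_1 = 1$
$H_T = \frac{1}{1-q}\,(h_q - 1)$
$H_R = \frac{1}{1-q} \log h_q$
$h_q(p)$: convex; $\varphi(p) = f\{h_q(p)\}$
$H_q = \frac{1}{1-q}\left(1 - \frac{1}{h_q(p)}\right) = -E_q[\ln_q p(x)]$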
$D_q[p(x) : r(x)] = -E_p\left[\ln_q \frac{r(x)}{p(x)}\right] = \frac{1}{1-q}\left(1 - \int p(x)^q\, r(x)^{1-q}\, dx\right)$
: α-divergence; α-structure
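The q-logarithm, the Tsallis and Rényi entropies, and the q-divergence can be sketched for discrete distributions as follows. The code assumes NumPy and replaces integrals by sums; the function names and the example distributions are hypothetical choices for illustration.

```python
import numpy as np

def ln_q(u, q):
    """Generalized logarithm: (u^(1-q) - 1) / (1 - q), or log(u) when q = 1."""
    u = np.asarray(u, dtype=float)
    if q == 1.0:
        return np.log(u)
    return (u**(1.0 - q) - 1.0) / (1.0 - q)

def exp_q(u, q):
    """Inverse of ln_q: {1 + (1-q) u}^(1/(1-q)), or exp(u) when q = 1."""
    u = np.asarray(u, dtype=float)
    if q == 1.0:
        return np.exp(u)
    return (1.0 + (1.0 - q) * u) ** (1.0 / (1.0 - q))

def h_q(p, q):
    """h_q(p) = sum_x p(x)^q."""
    return float(np.sum(np.asarray(p, dtype=float) ** q))

def tsallis_entropy(p, q):
    """H_T = (h_q(p) - 1) / (1 - q), for q != 1."""
    return (h_q(p, q) - 1.0) / (1.0 - q)

def renyi_entropy(p, q):
    """H_R = log(h_q(p)) / (1 - q), for q != 1."""
    return float(np.log(h_q(p, q)) / (1.0 - q))

def q_divergence(p, r, q):
    """D_q[p:r] = (1 - sum_x p^q r^(1-q)) / (1 - q), for q != 1."""
    p = np.asarray(p, dtype=float)
    r = np.asarray(r, dtype=float)
    return float((1.0 - np.sum(p**q * r**(1.0 - q))) / (1.0 - q))

# Assumed example distributions.
p = np.array([0.5, 0.3, 0.2])
r = np.array([0.2, 0.5, 0.3])
q = 0.7

shannon = float(-np.sum(p * np.log(p)))
print(tsallis_entropy(p, q), renyi_entropy(p, q), shannon, q_divergence(p, r, q))
```

Both generalized entropies approach the Shannon entropy as q → 1, and the q-divergence equals the negative p-expectation of ln_q(r/p).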
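q-exponential family (cf. Pistone κ-exponential)
$p(x, \theta) = \exp_q\{\theta \cdot x - \psi_q(\theta)\}$, $\theta \cdot x = \sum_i \theta^i x_i$
$\psi_q(\theta)$: convex
$\eta = \frac{1}{h_q(p)} \int x\, p(x)^q\, dx$
$\hat{p}(x) = \frac{p(x)^q}{h_q(p)}$: escort distribution
$\eta_i = E_q[x_i]$
$\varphi(\eta) = \frac{1}{1-q}\left(\frac{1}{h_q(p)} - 1\right) = E_q[\ln_q p(x)]$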
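q-Geometry derived from $\psi(\theta)$ and $\varphi(\eta)$: dually flat
$S_n = \{p\}$: $p(x) = \sum_{i=0}^{n} p_i\, \delta_i(x)$, $\sum_{i=0}^{n} p_i = 1$
$\log_q p(x) = \sum_{i=0}^{n} (\log_q p_i)\, \delta_i(x) = \sum_{i=1}^{n} (\log_q p_i - \log_q p_0)\, \delta_i(x) + \log_q p_0$
$\theta^i = \frac{1}{1-q}\left(p_i^{1-q} - p_0^{1-q}\right)$, $\eta_i = \frac{p_i^q}{h_q}$, $i = 1, \ldots, n$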
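Dually flat structure of q-escort
$D_{esc}[p : r] = \int \hat{p} \log \frac{\hat{p}}{\hat{r}}\, dx = q \int \hat{p}(x) \log \frac{p(x)}{r(x)}\, dx + \log \frac{h_q(r)}{h_q(p)}$
$g^{esc}_{ij} = E_q[\partial_i \log \hat{p}\, \partial_j \log \hat{p}]$
geodesic: exponential family
$\log p(x, t) = t \log p_1(x) + (1 - t) \log p_2(x) + c(t)$
dual geodesic: q-family
$\hat{p}(x, t) = t\, \hat{p}_1(x) + (1 - t)\, \hat{p}_2(x)$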
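The escort distribution and the two expressions for the escort divergence can be compared numerically. The following sketch assumes NumPy and discrete distributions (sums in place of integrals); the names `escort`, `h_q`, `escort_divergence` and the example values are illustrative.

```python
import numpy as np

def h_q(p, q):
    """h_q(p) = sum_x p(x)^q."""
    return float(np.sum(np.asarray(p, dtype=float) ** q))

def escort(p, q):
    """q-escort distribution: p(x)^q / h_q(p)."""
    w = np.asarray(p, dtype=float) ** q
    return w / np.sum(w)

def escort_divergence(p, r, q):
    """KL-divergence between the escort distributions of p and r."""
    ph, rh = escort(p, q), escort(r, q)
    return float(np.sum(ph * np.log(ph / rh)))

# Assumed example distributions and q.
p = np.array([0.5, 0.3, 0.2])
r = np.array([0.2, 0.5, 0.3])
q = 0.7

d_esc = escort_divergence(p, r, q)

# Decomposed form: q * E_q[log(p/r)] + log(h_q(r) / h_q(p)),
# where E_q is the expectation under the escort distribution of p.
p_hat = escort(p, q)
d_decomposed = (q * float(np.sum(p_hat * np.log(p / r)))
                + float(np.log(h_q(r, q) / h_q(p, q))))

print(d_esc, d_decomposed)
```

The two values agree because log(p̂/r̂) = q log(p/r) - log h_q(p) + log h_q(r), and the escort weights sum to one.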
q-escort probability distribution
$\hat{p}(x) = \frac{1}{h_q(p)}\, p(x)^q$
Escort geometry
$\hat{g}_{ij}(\theta) = E_q[\partial_i \log \hat{p}(x, \theta)\, \partial_j \log \hat{p}(x, \theta)]$
q-escort geometry: $\hat{p}(x) = \frac{1}{h_q}\, p(x)^q$
q-Pythagorean theorem
$D_q[p : q] + D_q[q : r] = D_q[p : r]$
q-geodesic:
$p(x, t)^{1-q} = t\, p_1(x)^{1-q} + (1 - t)\, p_2(x)^{1-q} - c(t)$
dual q-geodesic:
$p(x, t)^q = \{ t\, p_1(x)^q + (1 - t)\, p_2(x)^q \}\, c(t)$
Projection theorem
$\hat{r} = \arg\min_{r \in M} D_q[p : r]$: projection of $p$ to $M$
Max-entropy theorem
constraints: $E_q[a_k(x)] = c_k$
$\max h_q(p)$
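q-Cramér-Rao theorem
$S$: $p(x, \theta)$
q-unbiased estimator $\hat{\theta}$: $E_q[\hat{\theta}] = \theta$
$E_q[(\hat{\theta}_i - \theta_i)(\hat{\theta}_j - \theta_j)] \ge (g_{ij})^{-1}$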
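q-maximum likelihood estimator
$\hat{\theta}_q = \arg\max_\theta \hat{p}(x_1, \ldots, x_N; \theta) = \arg\max_\theta \frac{1}{h_q(\theta)^N} \prod_{i=1}^{N} p(x_i; \theta)^q$
Bayes MAP with prior $h_q(\theta)^{-N/q}$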
q -super-robust estimator (Eguchi)
p x,
max pˆ x, max
hq 1
bias-corrected q-estimating function
sq x, pˆ x, i log p cq 1
1
cq 1
log hq 1
q 1
N
s x , 0
i 0
q
i
1
max
pˆ xi ,
hq 1
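Conformal change of divergence
$\tilde{D}[p : q] = \sigma(p)\, D[p : q]$
$\tilde{g}_{ij} = \sigma(p)\, g_{ij}$
$\tilde{T}_{ijk} = \sigma\, (T_{ijk} + s_k g_{ij} + s_j g_{ik} + s_i g_{jk})$
$s_i = \partial_i \log \sigma$
conformal transformation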
q-Fisher information
$g^{(q)}_{ij}(p) = \frac{q}{h_q(p)}\, g^F_{ij}(p)$
q-divergence
$D_q[p(x) : r(x)] = \frac{1}{(1-q)\, h_q(p)} \left(1 - \int p(x)^q\, r(x)^{1-q}\, dx\right)$
Total Bregman divergence (Vemuri)
$TBD(p : q) = \frac{\varphi(p) - \varphi(q) - \nabla\varphi(q) \cdot (p - q)}{\sqrt{1 + |\nabla\varphi(q)|^2}}$
Total Bregman Divergence and
its Applications to Shape
Retrieval
•Baba C. Vemuri, Meizhu Liu, Shun-ichi Amari, Frank Nielsen
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010
Total Bregman Divergence
$TD[x : y] = \frac{D[x : y]}{\sqrt{1 + \|\nabla f(y)\|^2}}$
•rotational invariance
•conformal geometry
TBD examples
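Clustering: t-center
$E = \{x_1, \ldots, x_m\}$
T-center of $E$: $\bar{x} = \arg\min_x \sum_i TD[x, x_i]$
t-center $\bar{x}$:
$\nabla f(\bar{x}) = \frac{\sum_i w_i\, \nabla f(x_i)}{\sum_i w_i}$
$w_i = \frac{1}{\sqrt{1 + \|\nabla f(x_i)\|^2}}$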
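The closed-form t-center can be sketched for the simplest convex function. The code below assumes NumPy and takes f(x) = ½‖x‖², for which ∇f(x) = x, so the t-center is a weighted mean with weights 1/√(1 + ‖x_i‖²); this choice of f, the function names, and the sample points are assumptions made for illustration, not the setup used in the cited experiments. It also compares how far one distant outlier moves the t-center and the ordinary mean.

```python
import numpy as np

def f(x):
    """Convex generator f(x) = 0.5 * ||x||^2 (assumed for this example)."""
    return 0.5 * float(x @ x)

def grad_f(x):
    return x

def total_bregman(f, grad_f, x, y):
    """TD[x:y] = {f(x) - f(y) - grad f(y).(x - y)} / sqrt(1 + ||grad f(y)||^2)."""
    g = grad_f(y)
    return float((f(x) - f(y) - g @ (x - y)) / np.sqrt(1.0 + g @ g))

def t_center_sq_norm(points):
    """t-center for f = 0.5*||x||^2: weighted mean with w_i = 1/sqrt(1 + ||x_i||^2)."""
    pts = np.asarray(points, dtype=float)
    w = 1.0 / np.sqrt(1.0 + np.sum(pts**2, axis=1))
    return (w[:, None] * pts).sum(axis=0) / w.sum()

def objective(x, points):
    """Sum of total Bregman divergences from x to each point."""
    return sum(total_bregman(f, grad_f, x, xi) for xi in points)

points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
center = t_center_sq_norm(points)

# Add a single far-away outlier and compare the shift of t-center and mean.
with_outlier = np.vstack([points, [[1e6, 0.0]]])
center_out = t_center_sq_norm(with_outlier)
shift_t_center = float(np.linalg.norm(center_out - center))
shift_mean = float(np.linalg.norm(with_outlier.mean(axis=0) - points.mean(axis=0)))

print(center, shift_t_center, shift_mean)
```

The outlier's weight shrinks like 1/‖y‖, so its contribution w(y)·y stays bounded and the t-center moves only slightly, while the arithmetic mean is dragged far away.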
t-center is robust
$E = \{x_1, \ldots, x_n; y\}$
$\bar{x}^* = \bar{x} + \frac{1}{n}\, z(\bar{x}; y)$
influence function $z(\bar{x}; y)$
$\|z\| < c$ as $\|y\| \to \infty$: robust
$z(\bar{x}, y) = G^{-1}\, w(y)\, (\nabla f(y) - \nabla f(\bar{x}))$
$G = \frac{1}{n} \sum_i w(x_i)\, \nabla\nabla f(\bar{x})$
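Robust: $z$ is bounded
$w(y)\, \|\nabla f(y)\| = \frac{\|\nabla f(y)\|}{\sqrt{1 + \|\nabla f(y)\|^2}} < 1$
Euclidean case: $f = \frac{1}{2} \|x\|^2$
$z(\bar{x}, y) = G^{-1}\, \frac{y - \bar{x}}{\sqrt{1 + \|y\|^2}}$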
How good is Total Bregman Divergence
•vision
•signal processing
•geometry (conformal)
TBD application-shape retrieval
• Using MPEG7 database;
• 70 classes, with 20 shapes each class
(Meizhu Liu)
First clustering then retrieval
Advantages
• Accurate;
• Easy to access (shape representation);
• Space and time efficient (only need to store the
closed form t-centers, clustering can be done offline,
hierarchical tree storage).
Shape retrieval framework
• Shape -->
• Extract boundary points & align them -->
• Represent using mixture of Gaussians -->
• Clustering & use k-tree to store the clustering results;
• Query on the tree.
MPEG7 database
• Great intraclass variability, and small
interclass dissimilarity.
Shape representation
Experimental results
Other TBD applications
Diffusion tensor imaging (DTI) analysis [Vemuri]
• Interpolation
• Segmentation
Baba C. Vemuri, Meizhu Liu, Shun-ichi Amari and Frank Nielsen, Total Bregman Divergence and
its Applications to DTI Analysis, IEEE TMI, to appear