Gradient flows, optimal transport,
and evolution PDE’s
2 - A quick introduction to Optimal Transport
Giuseppe Savaré
http://www.imati.cnr.it/~savare
Dipartimento di Matematica, Università di Pavia
GNFM Summer School
Ravello, September 13–18, 2010
Outline
1 A short historical tour
2 The “discrete” case, duality and linear programming
3 The measure-theoretic setting
4 Euclidean spaces: geometry and transport maps
Gaspard Monge (1746-1818)
1781: “La théorie des déblais et des remblais”
Problem: how to transport soil from the ground to a given configuration in the “most efficient” way, i.e. so as to minimize the total cost.
Monge assumed that the transport cost of one unit of mass along a certain distance was given by the product of the mass by the distance: the transport cost is proportional to the distance |T(x) − x|.
[Figure 3.1: Monge’s problem of déblais and remblais; a map T sends a point x of the déblais to a point y = T(x) of the remblais.]
Leonid Kantorovich (1912-1986)
1939: Mathematical Methods of Organizing and Planning of Production (unpublished until 1960).
1942: On the translocation of masses.
1948: On a problem of Monge.
1975: Nobel prize, jointly with Tjalling Koopmans, “for their contributions to the theory of optimum allocation of resources”.
Autobiography: http://nobelprize.org/nobel_prizes/economics/laureates/1975/kantorovich-autobio.html
Parallel contributions:
1941: Frank Hitchcock, The distribution of a product from several sources to numerous localities (Jour. Math. Phys.).
1947: Tjalling Koopmans, Optimum utilization of the transportation system.
1947: George Dantzig, simplex method.
Towards the recent theory...
• Statistical and probabilistic aspects (early 20th century: Gini, Dall’Aglio, Hoeffding, Fréchet, ...).
• Rachev-Rüschendorf, Mass Transportation Problems (1998).
• Particle systems, Boltzmann equation: Dobrushin, Tanaka (around the 1970s).
• Yann Brenier (1989): fluid mechanics, transport map, polar decomposition; dynamical interpretation of optimal transport.
• John Mather: Lagrangian dynamical systems.
• Mike Cullen: meteorological models, semigeostrophic equations.
• Regularity, geometric and functional inequalities, Riemannian geometry, urban planning, evolution equations, etc.: L. Caffarelli, C. Evans, W. Gangbo, R. McCann, F. Otto, L. Ambrosio, G. Buttazzo, C. Villani, J. Lott, N. Trudinger, G. Loeper, T. Sturm, J. Carrillo, G. Toscani, A. Pratelli, ...
• C. Villani: Optimal Transport: Old and New, Springer (2009), 978 p.
Discrete formulation
• Initial configuration of resources in $X = \{x_1, \dots, x_h\}$: at every point $x_i \in X$ the quantity $m_i = m(x_i)$ is available.
• Final configuration $Y = \{y_1, \dots, y_n\}$: at every point $y_j$ the quantity $n_j = n(y_j)$ is expected.
• The unit cost $c_{ij} = c(x_i, y_j)$ of transporting a single unit from the position $x_i$ to the destination $y_j$.
[Figure: four sources $x_1, \dots, x_4$ and three destinations $y_1, y_2, y_3$.]
Admissible transference plan: choose the quantities $T_{ij} = T(x_i, y_j)$ moved from $x_i$ to $y_j$ so that
$$T(x_i, y_j) \ge 0, \qquad \sum_{y \in Y} T(x_i, y) = m(x_i), \qquad \sum_{x \in X} T(x, y_j) = n(y_j).$$
The cost of the transference plan T is
$$C(T) := \sum_{x \in X,\, y \in Y} c(x, y)\, T(x, y).$$
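The admissibility conditions and the cost C(T) translate directly into code. The following is a minimal sketch, assuming the data are stored as plain Python lists (`m[i]` for $m(x_i)$, `n[j]` for $n(y_j)$, `c[i][j]` for $c(x_i, y_j)$, `T[i][j]` for $T(x_i, y_j)$); the function names and the numbers are hypothetical, chosen only for illustration.

```python
from typing import List

Matrix = List[List[float]]


def is_admissible(T: Matrix, m: List[float], n: List[float], tol: float = 1e-9) -> bool:
    """Check T >= 0, row sums equal to m (sources) and column sums equal to n (destinations)."""
    if any(t < -tol for row in T for t in row):
        return False
    rows_ok = all(abs(sum(T[i]) - m[i]) <= tol for i in range(len(m)))
    cols_ok = all(abs(sum(T[i][j] for i in range(len(m))) - n[j]) <= tol
                  for j in range(len(n)))
    return rows_ok and cols_ok


def cost(T: Matrix, c: Matrix) -> float:
    """C(T) = sum over all couples (i, j) of c_ij * T_ij."""
    return sum(c[i][j] * T[i][j] for i in range(len(T)) for j in range(len(T[0])))


# Hypothetical data: 4 sources, 3 destinations, total mass 4 on both sides.
m = [1.0, 1.0, 1.0, 1.0]
n = [2.0, 1.0, 1.0]
c = [[1.0, 2.0, 3.0],
     [1.0, 1.0, 2.0],
     [3.0, 2.0, 1.0],
     [2.0, 1.0, 1.0]]
T = [[1.0, 0.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]
print(is_admissible(T, m, n), cost(T, c))  # True 4.0
```

The tolerance `tol` is a design choice for floating-point masses; with integer data it can be set to zero.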
Optimal transport
Problem
Find the best transference plan T, i.e. the one which minimizes the cost C(T) among all the admissible plans.
The linear programming structure: given positive coefficients $m_i$, $n_j$ and $c_{i,j}$, find the quantities $T_{i,j}$ minimizing the linear functional
$$C(T) = \sum_{i,j} c_{i,j}\, T_{i,j}$$
under the linear/convex constraints
$$T_{i,j} \ge 0, \qquad \sum_j T_{i,j} = m_i, \qquad \sum_i T_{i,j} = n_j.$$
In vector notation:
$$\min \big\{ \vec C \cdot \vec T \;:\; A_0 \vec T \ge 0,\ A_1 \vec T = \vec b \big\}.$$
In the discrete case the existence of an optimal plan is easy (when $\sum_i m_i = \sum_j n_j$, the admissible plans form a nonempty compact polytope on which C is linear); more important are three fundamental properties:
• Cyclical monotonicity of the optimal transference plan.
• Dual characterization, Kantorovich potentials (prices in economic terms), linear programming.
• Integrality of the transference plan, transport maps.
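To make the vector notation concrete, the sketch below builds $A_1$ and $\vec b$ from the masses. It assumes a row-major flattening, where entry `i * k + j` of $\vec T$ holds $T_{i,j}$ (any fixed ordering would do), and it calls the number of destinations `k` to avoid a clash with the masses $n_j$; $A_0$ is then simply the identity. The function names and data are hypothetical.

```python
from typing import List, Tuple


def constraint_matrix(m: List[float], n: List[float]) -> Tuple[List[List[int]], List[float]]:
    """Return (A1, b) such that A1 @ vec(T) = b encodes the marginal constraints.

    vec(T) is the row-major flattening: entry i * k + j holds T[i][j].
    """
    h, k = len(m), len(n)
    A1: List[List[int]] = []
    for i in range(h):  # source constraint: sum_j T[i][j] = m[i]
        A1.append([1 if p // k == i else 0 for p in range(h * k)])
    for j in range(k):  # destination constraint: sum_i T[i][j] = n[j]
        A1.append([1 if p % k == j else 0 for p in range(h * k)])
    return A1, list(m) + list(n)


def matvec(A: List[List[int]], x: List[float]) -> List[float]:
    """Plain matrix-vector product, to avoid external dependencies."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


# Hypothetical data: 2 sources, 3 destinations.
A1, b = constraint_matrix([1.0, 2.0], [1.5, 0.5, 1.0])
T = [[0.5, 0.5, 0.0],
     [1.0, 0.0, 1.0]]
vecT = [t for row in T for t in row]
print(matvec(A1, vecT) == b)  # True
```

Each column of $A_1$ contains exactly two ones, one in a source row and one in a destination row: $A_1$ is the incidence matrix of a bipartite graph, hence totally unimodular, which is the structural reason behind the integrality property of the discrete problem.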
Cyclical monotonicity
Consider an arbitrary collection of couples (x, y) joined by a transport ray, i.e. with T(x, y) > 0: in the picture, $(x_2, y_1)$, $(x_3, y_2)$, $(x_4, y_3)$.
[Figure: the transport rays $(x_2, y_1)$, $(x_3, y_2)$, $(x_4, y_3)$ and the cyclic permutation σ of their targets.]
If one applies a (cyclical) permutation σ of the targets, $y_1 \to y_2 \to y_3 \to y_1$, the associated (unit) costs of an optimal plan satisfy
$$c(x_2, y_1) + c(x_3, y_2) + c(x_4, y_3) \le c(x_2, y_2) + c(x_3, y_3) + c(x_4, y_1).$$
Theorem (Rachev-Rüschendorf)
If T is optimal, the cost of any configuration rearranged by a cyclical permutation cannot decrease.
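On small examples, cyclical monotonicity of a set of transport rays can be tested by brute force. This is only an illustrative sketch under stated assumptions: the function name is hypothetical, the cost is passed as a Python callable, and the running time is factorial in the number of rays. It tests every permutation of the targets, which is equivalent to testing every cyclical rearrangement, because a permutation splits into disjoint cycles.

```python
from itertools import permutations
from typing import Callable, List, Tuple

Point = float


def is_cyclically_monotone(support: List[Tuple[Point, Point]],
                           c: Callable[[Point, Point], float],
                           tol: float = 1e-12) -> bool:
    """Brute-force test: no permutation of the targets lowers the total cost.

    Every permutation splits into disjoint cycles, so testing all permutations
    of the support is equivalent to testing all cyclical rearrangements.
    """
    xs = [x for x, _ in support]
    ys = [y for _, y in support]
    base = sum(c(x, y) for x, y in support)
    for sigma in permutations(range(len(support))):
        rearranged = sum(c(xs[i], ys[sigma[i]]) for i in range(len(support)))
        if rearranged < base - tol:
            return False
    return True


def quadratic(x: Point, y: Point) -> float:
    return (x - y) ** 2


# Points on the real line: monotone pairing vs. crossed pairing (hypothetical data).
monotone = [(0.0, 1.0), (1.0, 2.0), (2.0, 4.0)]
crossed = [(0.0, 4.0), (1.0, 2.0), (2.0, 1.0)]
print(is_cyclically_monotone(monotone, quadratic),
      is_cyclically_monotone(crossed, quadratic))  # True False
```

On the real line with the quadratic cost, the monotone (sorted) pairing passes and the crossed one fails, as the rearrangement inequality predicts.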
Cyclical monotonicity is also sufficient
Theorem
If T is a cyclically monotone admissible plan then it is optimal.
The dual problem: optimal prices
Linear programming: the dual problem gives a crucial insight into the structure of the optimal transference plan.
Economic interpretation: a transport company offers to take care of the transportation job: they will pay the price u(x) to buy a unit placed at the point x and they will sell it at y for the price v(y).
To be competitive, the prices should be more convenient than the transportation cost c(x, y):
$$v(y) - u(x) \le c(x, y) \qquad \text{for all } x \in X,\ y \in Y. \tag{*}$$
The total profit for the company is
$$P(u, v) := \sum_{y \in Y} n(y)\, v(y) - \sum_{x \in X} m(x)\, u(x),$$
and their problem is to find the prices which maximize the profit,
$$\max P(u, v),$$
among all the competitive prices (u, v) satisfying (*).
Clearly C(T) ≥ P(u, v) for every admissible transference plan T and every couple of competitive prices (u, v): by the marginal constraints, $T \ge 0$ and (*),
$$P(u, v) = \sum_{x, y} T(x, y)\, \big( v(y) - u(x) \big) \le \sum_{x, y} c(x, y)\, T(x, y) = C(T).$$
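The competitiveness condition (*), the profit P(u, v) and the inequality C(T) ≥ P(u, v) can be checked numerically on a toy problem. A minimal sketch, with hypothetical function names and data (two unit masses, two unit demands, and the only two admissible permutation plans):

```python
from typing import List

Matrix = List[List[float]]


def is_competitive(u: List[float], v: List[float], c: Matrix) -> bool:
    """Constraint (*): v(y_j) - u(x_i) <= c(x_i, y_j) for every couple (i, j)."""
    return all(v[j] - u[i] <= c[i][j] for i in range(len(u)) for j in range(len(v)))


def profit(u: List[float], v: List[float], m: List[float], n: List[float]) -> float:
    """P(u, v) = sum_j n_j v_j - sum_i m_i u_i."""
    return sum(nj * vj for nj, vj in zip(n, v)) - sum(mi * ui for mi, ui in zip(m, u))


def cost(T: Matrix, c: Matrix) -> float:
    """C(T) = sum_ij c_ij T_ij."""
    return sum(cij * tij for crow, trow in zip(c, T) for cij, tij in zip(crow, trow))


# Hypothetical data.
m, n = [1.0, 1.0], [1.0, 1.0]
c = [[1.0, 3.0], [2.0, 1.0]]
plans = {"identity": [[1.0, 0.0], [0.0, 1.0]], "swap": [[0.0, 1.0], [1.0, 0.0]]}
u, v = [0.0, 0.5], [1.0, 1.0]

assert is_competitive(u, v, c)
for name, T in plans.items():
    # Weak duality: every admissible plan costs at least P(u, v).
    print(name, cost(T, c), ">=", profit(u, v, m, n))
# identity 2.0 >= 1.5
# swap 5.0 >= 1.5
```

Raising the prices to u = (0, 0), v = (1, 1) keeps them competitive and gives P = 2 = C(identity), so by the inequality above both the identity plan and these prices are optimal.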
Duality theorem
Theorem (Min-max and “complementary slackness”)
An admissible transference plan T is optimal if and only if there exist competitive prices (u, v) such that
$$C(T) = P(u, v).$$
In particular
$$\min_T C(T) = \max_{(u, v)} P(u, v).$$
Moreover, the “slackness”
$$S(x, y) := c(x, y) + u(x) - v(y) \ge 0$$
satisfies the “complementary slackness principle”
$$T(x, y)\, S(x, y) = 0, \quad \text{i.e.} \quad T(x, y) > 0 \ \Rightarrow\ S(x, y) = 0.$$
“If x and y are connected through an optimal transport ray, then the difference of their respective prices u(x) and v(y) is maximal: v(y) − u(x) = c(x, y).”
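The theorem gives a practical optimality certificate: since $C(T) - P(u, v) = \sum_{x,y} T(x, y)\, S(x, y)$ for any admissible T, competitive prices together with complementary slackness force $C(T) = P(u, v)$. The sketch below checks such a certificate, writing the slackness as $S = c + u - v$ so that $S \ge 0$ is exactly the competitiveness constraint (*) $v(y) - u(x) \le c(x, y)$. Function names and data are hypothetical; the prices were found by hand from the equalities $v(y) - u(x) = c(x, y)$ on the support of T.

```python
from typing import List

Matrix = List[List[float]]


def slackness(c: Matrix, u: List[float], v: List[float]) -> Matrix:
    """S(x_i, y_j) = c_ij + u_i - v_j, nonnegative iff the prices satisfy (*)."""
    return [[c[i][j] + u[i] - v[j] for j in range(len(v))] for i in range(len(u))]


def certifies_optimality(T: Matrix, c: Matrix, u: List[float], v: List[float],
                         tol: float = 1e-9) -> bool:
    """Competitive prices plus complementary slackness T * S = 0 certify optimality."""
    S = slackness(c, u, v)
    competitive = all(s >= -tol for row in S for s in row)
    complementary = all(abs(T[i][j] * S[i][j]) <= tol
                        for i in range(len(u)) for j in range(len(v)))
    return competitive and complementary


# Hypothetical 2 x 2 example: m = (2, 1), n = (1, 2).
m, n = [2.0, 1.0], [1.0, 2.0]
c = [[1.0, 2.0], [3.0, 1.0]]
T = [[1.0, 1.0], [0.0, 1.0]]   # admissible plan
u, v = [0.0, 1.0], [1.0, 2.0]  # competitive prices

C = sum(c[i][j] * T[i][j] for i in range(2) for j in range(2))
P = sum(nj * vj for nj, vj in zip(n, v)) - sum(mi * ui for mi, ui in zip(m, u))
print(certifies_optimality(T, c, u, v), C, P)  # True 4.0 4.0
```

The only positive slackness is $S(x_2, y_1) = 3$, and the plan sends no mass along that couple, so $C(T) = P(u, v) = 4$.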
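Duality via Von Neumann min-max
$$\min_T \Big\{ \sum_{i,j} c_{i,j} T_{i,j} \;:\; T_{i,j} \ge 0,\ \sum_j T_{i,j} = m_i,\ \sum_i T_{i,j} = n_j \Big\}.$$
Introduce Lagrange multipliers $S_{i,j} \ge 0$, $u_i$, $v_j$ for the constraints (T now ranges over all real matrices):
$$
\begin{aligned}
\min_T \sum_{i,j} c_{i,j} T_{i,j}
&= \min_T \max_{S \ge 0,\, u,\, v} \Big[ \sum_{i,j} c_{i,j} T_{i,j} - \sum_{i,j} S_{i,j} T_{i,j} + \sum_i u_i \Big( \sum_j T_{i,j} - m_i \Big) - \sum_j v_j \Big( \sum_i T_{i,j} - n_j \Big) \Big] \\
&= \min_T \max_{S \ge 0,\, u,\, v} \Big[ \sum_{i,j} T_{i,j} \big( c_{i,j} - S_{i,j} + u_i - v_j \big) + \sum_j v_j n_j - \sum_i u_i m_i \Big] \\
&= \max_{S \ge 0,\, u,\, v} \min_T \Big[ \sum_{i,j} T_{i,j} \big( c_{i,j} - S_{i,j} + u_i - v_j \big) + \sum_j v_j n_j - \sum_i u_i m_i \Big] \\
&= \max_{S \ge 0,\, u,\, v} \Big\{ \sum_j v_j n_j - \sum_i u_i m_i \;:\; c_{i,j} - S_{i,j} + u_i - v_j = 0 \Big\} \\
&= \max_{u,\, v} \Big\{ \sum_j v_j n_j - \sum_i u_i m_i \;:\; c_{i,j} + u_i - v_j \ge 0 \Big\}.
\end{aligned}
$$
The inner maximum in the first line equals $\sum_{i,j} c_{i,j} T_{i,j}$ if T is admissible and $+\infty$ otherwise; after exchanging min and max (the min-max step), the inner minimum over T is $-\infty$ unless every coefficient $c_{i,j} - S_{i,j} + u_i - v_j$ vanishes. The last problem is the maximization of the profit P(u, v) over competitive prices.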
Integrality
Theorem
If the initial and final configurations $m(x), n(y) \in \mathbb{N}$ are integers, then there exists an integer optimal transference plan T, i.e. $T(x, y) \in \mathbb{N}$.
In other words, there is no need to split unit quantities in order to realize the optimal transport.
Corollary
If $m(x) \equiv 1$ and n(y) are integers, then there exists an optimal transference plan T associated to a transport map $t : X \to Y$, so that
$$T(x, y) > 0 \iff y = t(x).$$
If moreover $n(y) \equiv 1$, then the map t is one-to-one.
Roughly speaking: from every point x ∈ X starts a unique transport ray, and the mass is not split in various directions.
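When $m \equiv n \equiv 1$ and X, Y have the same number of points, the corollary says an optimal plan can be taken to be a one-to-one map t, so on very small examples one can simply enumerate all bijections. The sketch below does exactly that; the function name and data are hypothetical, and the factorial running time makes it a toy (a practical code would use an assignment algorithm such as the Hungarian method instead).

```python
from itertools import permutations
from typing import List, Tuple


def best_bijection(c: List[List[float]]) -> Tuple[Tuple[int, ...], float]:
    """Enumerate every bijection t: {0..h-1} -> {0..h-1} and return the cheapest.

    With unit masses, the plan T(x_i, y_j) = 1 if j == t[i] else 0 is admissible,
    and by the integrality corollary the minimum over these plans is the
    optimal transport cost.
    """
    h = len(c)
    best_t: Tuple[int, ...] = tuple(range(h))
    best_cost = float("inf")
    for t in permutations(range(h)):
        total = sum(c[i][t[i]] for i in range(h))
        if total < best_cost:
            best_t, best_cost = t, total
    return best_t, best_cost


# Hypothetical costs: x_i = i and y_j = j + 0.5 on a line, quadratic cost.
c = [[(i - (j + 0.5)) ** 2 for j in range(3)] for i in range(3)]
print(best_bijection(c))  # ((0, 1, 2), 0.75)
```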
Outline
1 A short historical tour
2 The “discrete” case, duality and linear programming
3 The measure-theoretic setting
4 Euclidean spaces: geometry and transport maps
Measure data
- Spaces: the discrete spaces $X, Y$ are replaced by topological spaces ($\mathbb{R}$, $\mathbb{R}^N$, locally compact spaces, Polish (i.e. complete and separable metric) spaces, Radon spaces, ...); here $\mathbb{R}^N$.
- The cost: a (lower semi)continuous function $c : X \times Y \to \mathbb{R} \cup \{+\infty\}$.
- The initial and final configurations $m(x), n(y)$ are replaced by a couple of Borel measures $\mu, \nu$ on $X$ and $Y$; the mass is normalized to 1. Given $A \subset X$ and $B \subset Y$, $\mu(A)$ denotes the quantity of resources available in $A$ and $\nu(B)$ the resources expected in $B$.
- The transport plan $T$ is replaced by a measure $\gamma$ on $X \times Y$: $\gamma(A \times B)$ is the mass coming from $A$ and transported into $B$.
  Admissibility: the marginals of $\gamma$ are fixed ($\gamma$ is a coupling between $\mu$ and $\nu$):
  $$\gamma(A \times Y) = \mu(A), \qquad \gamma(X \times B) = \nu(B).$$
  $\Gamma(\mu, \nu)$: collection of all the admissible transference plans/couplings.
- The cost of a transference plan: the discrete sum $\sum_{x,y} c(x, y) T(x, y)$ becomes
  $$C(\gamma) := \int_{X \times Y} c(x, y)\, d\gamma(x, y).$$
[Figure: a coupling $\gamma$ on $\mathbb{R}^m \times \mathbb{R}^m$ with marginals $\mu$ and $\nu$; the diagonal $|x - y| = 0$ is marked.]
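For finitely supported measures a coupling is just a nonnegative table of masses. The sketch below (hypothetical measures on the real line, cost $c(x,y) = |x-y|^2$, illustrative helper names) checks the marginal constraints that define $\Gamma(\mu,\nu)$ and evaluates $C(\gamma)$ for two admissible plans: the product coupling $\mu \otimes \nu$ and a plan sending each point to a single target.

```python
from fractions import Fraction as F

def marginals(gamma):
    """First and second marginals of a coupling given as {(x, y): mass}."""
    mu, nu = {}, {}
    for (x, y), w in gamma.items():
        mu[x] = mu.get(x, 0) + w
        nu[y] = nu.get(y, 0) + w
    return mu, nu

def _positive(d):
    """Drop zero masses so that supports can be compared directly."""
    return {k: w for k, w in d.items() if w != 0}

def is_coupling(gamma, mu, nu):
    """gamma >= 0 and its marginals are mu and nu, i.e. gamma is in Gamma(mu, nu)."""
    gm, gn = marginals(gamma)
    return (all(w >= 0 for w in gamma.values())
            and _positive(gm) == _positive(mu)
            and _positive(gn) == _positive(nu))

def cost(gamma, c):
    """C(gamma) = sum of c(x, y) * gamma({(x, y)}) over the finite support."""
    return sum(c(x, y) * w for (x, y), w in gamma.items())

def c(x, y):
    return (x - y) ** 2

mu = {0: F(1, 2), 1: F(1, 2)}
nu = {0: F(1, 2), 2: F(1, 2)}

product = {(x, y): mu[x] * nu[y] for x in mu for y in nu}   # mu (x) nu, always admissible
monotone = {(0, 0): F(1, 2), (1, 2): F(1, 2)}               # each x sent to one y

for gamma in (product, monotone):
    print(is_coupling(gamma, mu, nu), cost(gamma, c))
```

Both plans are admissible; the second one is cheaper ($\frac12$ against $\frac32$).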
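Transport and probability
Discrete setting: points $\{x_1, \dots, x_N\}$ with masses $\{m_1, \dots, m_N\}$, $\mu = \sum_i m_i \delta_{x_i}$; a transport map $t$ with $y_i = t(x_i)$ gives
$$t_\# \mu = \nu = \sum_i m_i \delta_{y_i}.$$
In terms of measures:
$$\nu(B) = \sum_{i:\, y_i \in B} m_i = \sum_{i:\, t(x_i) \in B} m_i = \sum_{i:\, x_i \in t^{-1}(B)} m_i = \mu(t^{-1}(B)).$$
In general, for every Borel map $t : X \to Y$ and every Borel measure $\mu \in \mathcal{P}(X)$ we define
$$\nu = t_\# \mu \iff \nu(B) = \mu(t^{-1}(B)) \quad \text{for every Borel set } B \subset Y.$$
Change of variable formula:
$$\int_X \varphi(t(x))\, d\mu(x) = \int_Y \varphi(y)\, d\nu(y).$$
In probability: $\mathbb{P}$ is a probability measure on the probability space $\Omega$, $X : \Omega \to \mathcal{X}$ is a random variable, and
$$X_\# \mathbb{P} \in \mathcal{P}(\mathcal{X}), \qquad X_\# \mathbb{P}(A) = \mathbb{P}[X \in A],$$
is the law of $X$.
Expectation:
$$\mathbb{E}[\varphi(X)] = \int_\Omega \varphi(X(\omega))\, d\mathbb{P}(\omega) = \int_{\mathcal{X}} \varphi(x)\, d(X_\# \mathbb{P})(x).$$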
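The discrete computation translates directly into code: a finitely supported measure is stored as a dictionary of masses, and $t_\#\mu$ is obtained by accumulating each mass $m_i$ at $t(x_i)$. The measure, the map $t(x) = x^2$, the test function $\varphi(y) = 3y + 1$ and the helper names are hypothetical choices for this sketch; the last line checks the change of variable formula.

```python
from fractions import Fraction as F

def pushforward(mu, t):
    """t_# mu for mu = sum_i m_i delta_{x_i}: each mass m_i is moved to t(x_i)."""
    nu = {}
    for x, w in mu.items():
        y = t(x)
        nu[y] = nu.get(y, 0) + w
    return nu

def integrate(phi, mu):
    """Integral of phi against a finitely supported measure {point: mass}."""
    return sum(phi(x) * w for x, w in mu.items())

def t(x):
    return x * x          # not injective: the masses at -1 and 1 are merged

def phi(y):
    return 3 * y + 1

mu = {-1: F(1, 4), 0: F(1, 4), 1: F(1, 2)}
nu = pushforward(mu, t)

# nu(B) = mu(t^{-1}(B)), e.g. for B = {1}
print(nu[1], sum(w for x, w in mu.items() if t(x) == 1))
# change of variable formula: int phi(t(x)) dmu(x) = int phi(y) dnu(y)
print(integrate(lambda x: phi(t(x)), mu), integrate(phi, nu))
```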
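The general problem
Problem
Given two Borel probability measures $\mu \in \mathcal{P}(X)$ and $\nu \in \mathcal{P}(Y)$, find an admissible transference plan $\gamma \in \Gamma(\mu, \nu)$ minimizing the total cost
$$\min_{\gamma \in \Gamma(\mu,\nu)} C(\gamma).$$
Kantorovich potentials: functions $u : X \to \mathbb{R}$, $v : Y \to \mathbb{R}$ such that
$$v(y) - u(x) \le c(x, y) \quad \text{for all } x \in X,\ y \in Y. \qquad (\Pi(c))$$
Their value is
$$P(u, v) := \sum_y v(y) n(y) - \sum_x u(x) m(x) \ \ \text{(discrete case)}, \qquad P(u, v) := \int_Y v(y)\, d\nu(y) - \int_X u(x)\, d\mu(x).$$
Problem (Dual formulation)
Find a couple of Kantorovich potentials $(u, v) \in \Pi(c)$ maximizing
$$\max_{\Pi(c)} P(u, v).$$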
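A standard way to produce admissible pairs in $\Pi(c)$, sketched here for finitely supported $\mu, \nu$ (the measures, the starting guess for $u$ and the helper names are hypothetical): given any $u$, the largest $v$ with $v(y) - u(x) \le c(x,y)$ is $v(y) = \min_x \big(c(x,y) + u(x)\big)$; given that $v$, the smallest compatible $u$ is $u(x) = \max_y \big(v(y) - c(x,y)\big)$. Neither step decreases $P(u,v)$, since $P$ is increasing in $v$ and decreasing in $u$.

```python
from fractions import Fraction as F

def c_transform_v(u, mu, nu, c):
    """Largest v with v(y) - u(x) <= c(x, y) on the supports."""
    return {y: min(c(x, y) + u[x] for x in mu) for y in nu}

def c_transform_u(v, mu, nu, c):
    """Smallest u with v(y) - u(x) <= c(x, y) on the supports."""
    return {x: max(v[y] - c(x, y) for y in nu) for x in mu}

def dual_objective(u, v, mu, nu):
    """P(u, v) = int v dnu - int u dmu."""
    return sum(v[y] * w for y, w in nu.items()) - sum(u[x] * w for x, w in mu.items())

def in_Pi(u, v, mu, nu, c):
    """Check the constraint Pi(c) on the supports."""
    return all(v[y] - u[x] <= c(x, y) for x in mu for y in nu)

def c(x, y):
    return (x - y) ** 2

mu = {0: F(1, 2), 1: F(1, 2)}
nu = {0: F(1, 2), 2: F(1, 2)}

u = {0: F(0), 1: F(0)}              # arbitrary starting guess
v = c_transform_v(u, mu, nu, c)
u2 = c_transform_u(v, mu, nu, c)
print(v, u2, dual_objective(u2, v, mu, nu))
```

For these two measures the coupling sending $0$ to $0$ and $1$ to $2$ costs $\frac12$ as well, so by weak duality both the coupling and the pair $(u_2, v)$ are optimal.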
A fundamental theorem
Assume that the cost is continuous and feasible, for instance
$$C(\mu \otimes \nu) = \iint_{X \times Y} c(x, y)\, d(\mu \otimes \nu)(x, y) < +\infty \qquad \text{(sufficient feasibility condition)}.$$
Theorem
Existence: there exist an optimal transference plan $\gamma_{\rm opt} \in \Gamma(\mu, \nu)$ and a couple of optimal Kantorovich potentials $(u_{\rm opt}, v_{\rm opt}) \in \Pi(c)$.
Duality:
$$C(\gamma_{\rm opt}) = \min_{\Gamma(\mu,\nu)} C(\gamma) = \max_{\Pi(c)} P(u, v) = P(u_{\rm opt}, v_{\rm opt}).$$
Slackness: for every $(x, y) \in \operatorname{supp}(\gamma_{\rm opt})$ (i.e. $x$ and $y$ are connected by a transport ray)
$$c(x, y) = v_{\rm opt}(y) - u_{\rm opt}(x).$$
Cyclical monotonicity: for every $(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)$ in the support of $\gamma_{\rm opt}$ and every permutation $\sigma : \{1, 2, \dots, N\} \to \{1, 2, \dots, N\}$
$$c(x_1, y_1) + \dots + c(x_N, y_N) \le c(x_1, y_{\sigma(1)}) + \dots + c(x_N, y_{\sigma(N)}).$$
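Cyclical monotonicity can be tested directly on a finite set of support points by trying every permutation, as in the brute-force sketch below (exponential in $N$; the points, the quadratic cost on the real line and the helper name are hypothetical). For this cost, "crossing" pairs are detected as non-optimal.

```python
from itertools import permutations

def is_cyclically_monotone(pairs, c):
    """Check sum_k c(x_k, y_k) <= sum_k c(x_k, y_sigma(k)) for every permutation sigma."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    base = sum(c(x, y) for x, y in pairs)
    return all(base <= sum(c(x, ys[s]) for x, s in zip(xs, sigma))
               for sigma in permutations(range(len(pairs))))

def c(x, y):
    return (x - y) ** 2

print(is_cyclically_monotone([(0, 0), (1, 2), (3, 5)], c))  # True: pairs ordered monotonically
print(is_cyclically_monotone([(0, 2), (1, 0)], c))           # False: swapping targets costs 1 < 5
```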
Outline
1 A short historical tour
2 The “discrete” case, duality and linear programming
3 The measure-theoretic setting
4 Euclidean spaces: geometry and transport maps
Some important questions
- Uniqueness of the optimal transference plan.
- Integrality.
- Links with the geometry: the cost function $c(x, y)$ depends on the distance between $x$ and $y$ ($|x - y|$ when $X = Y = \mathbb{R}^d$).
- Regularity of Kantorovich potentials.
- Existence of a transport map.
- Further information when the measures $\mu = f \mathcal{L}^d \ll \mathcal{L}^d$ and $\nu = g \mathcal{L}^d \ll \mathcal{L}^d$ are absolutely continuous with respect to the Lebesgue measure:
  $$\mu(A) = \int_A f(x)\, dx, \qquad \nu(B) = \int_B g(y)\, dy.$$
All these questions are closely linked!
From now on we will consider the Euclidean case $X = Y = \mathbb{R}^d$.
Integrality and transport maps
At the continuous level, the integrality condition can be informally stated by asking that (almost) every point $x$ is the starting point of at most one transport ray.
We say that $y$ is connected to $x$ by a transport ray if $(x, y) \in \operatorname{supp} \gamma$; the integrality condition then reads
$$(x, y_1), (x, y_2) \in \operatorname{supp} \gamma \ \Rightarrow\ y_1 = y_2 =: t(x),$$
a property which should hold for $\mu$-almost every $x$.
The map $t : X \to Y$ is called the transport map induced by the plan $\gamma$. It satisfies
$$\text{if } A = t^{-1}(B) \text{ then } \mu(A) = \nu(B) = \gamma(A \times B).$$
Recalling the change-of-variable formula, if $\mu = f\,dx$, $\nu = g\,dy$, and $t$ is differentiable and injective, then for $B = t(A)$
$$\mu(A) = \int_A f(x)\, dx = \nu(B) = \int_B g(y)\, dy = \int_A g(t(x))\, |\det Dt(x)|\, dx,$$
so that
$$f(x) = g(t(x))\, |\det Dt(x)|.$$
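A quick numerical check of $f(x) = g(t(x))\,|\det Dt(x)|$ in a case where everything is explicit (an illustrative choice, not taken from the slides): $\mu$ is the standard Gaussian on $\mathbb{R}^2$, and $t(x) = Ax$ is linear with a hypothetical invertible matrix $A$. Then $Dt \equiv A$, and $\nu = t_\#\mu$ is the centered Gaussian with covariance $\Sigma = AA^{\mathsf T}$, whose density $g$ is written below independently of $f$.

```python
import math

A = [[2.0, 1.0], [0.0, 1.0]]          # hypothetical invertible matrix, det A = 2

def matvec(M, x):
    return [M[0][0] * x[0] + M[0][1] * x[1], M[1][0] * x[0] + M[1][1] * x[1]]

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def f(x):
    """Standard Gaussian density on R^2."""
    return math.exp(-(x[0] ** 2 + x[1] ** 2) / 2) / (2 * math.pi)

# Covariance Sigma = A A^T of nu = t_# mu, and its inverse (2x2 formulas).
S = [[A[0][0] ** 2 + A[0][1] ** 2, A[0][0] * A[1][0] + A[0][1] * A[1][1]],
     [A[0][0] * A[1][0] + A[0][1] * A[1][1], A[1][0] ** 2 + A[1][1] ** 2]]
dS = det2(S)
S_inv = [[S[1][1] / dS, -S[0][1] / dS], [-S[1][0] / dS, S[0][0] / dS]]

def g(y):
    """Density of the centered Gaussian with covariance Sigma."""
    q = sum(y[i] * S_inv[i][j] * y[j] for i in range(2) for j in range(2))
    return math.exp(-q / 2) / (2 * math.pi * math.sqrt(dS))

for x in ([0.0, 0.0], [1.0, -0.5], [-2.0, 3.0]):
    lhs, rhs = f(x), g(matvec(A, x)) * abs(det2(A))   # f(x) vs g(t(x)) |det Dt(x)|
    print(x, lhs, rhs)
```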
Existence and uniqueness of the optimal transport map: $c(x, y) = \frac12 |x - y|^2$
Theorem (Brenier (1989))
Let $\mu = f\,dx$, $\nu = g\,dy$, $c(x, y) := \frac12 |x - y|^2$.
- There exists a unique optimal transference plan $\gamma$, and it is associated to a transport map $t$.
- The Kantorovich potentials are perturbations of convex functions; more precisely
  $$\varphi(x) := \tfrac12 |x|^2 + u(x) \quad \text{and} \quad \psi(y) := \tfrac12 |y|^2 - v(y)$$
  are convex, and $\psi$ is the Legendre transform of $\varphi$:
  $$\psi(y) = \varphi^*(y) = \sup_x \big( \langle y, x \rangle - \varphi(x) \big).$$
- $t(x) = \nabla \varphi(x) = x + \nabla u(x)$ is the gradient of a convex function; it is essentially injective and a.e. differentiable, and $Dt = D^2 \varphi$ is positive definite.
- $\varphi$ solves the Monge-Ampère equation
  $$\det D^2 \varphi(x) = \frac{f(x)}{g(\nabla \varphi(x))}.$$
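In one dimension with Gaussian data everything in the theorem is explicit. The sketch below assumes hypothetical parameters $\mu = N(0,1)$ and $\nu = N(M, S^2)$ with $S > 0$, and uses $\varphi(x) = \tfrac12 S x^2 + Mx$, so that $t = \varphi' = Sx + M$ is increasing and pushes $\mu$ onto $\nu$. It checks the Monge-Ampère equation $\varphi'' = f / g(\varphi')$ and compares a brute-force Legendre transform over a grid with the closed form $\varphi^*(y) = (y - M)^2 / (2S)$.

```python
import math

M, S = 1.5, 2.0   # hypothetical mean and standard deviation of the target nu

def f(x):   # density of mu = N(0, 1)
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def g(y):   # density of nu = N(M, S^2)
    return math.exp(-(y - M) ** 2 / (2 * S * S)) / (S * math.sqrt(2 * math.pi))

def phi(x):         # phi(x) = x^2/2 + u(x) with u(x) = (S - 1) x^2 / 2 + M x
    return 0.5 * S * x * x + M * x

def t(x):           # Brenier map t = phi'
    return S * x + M

PHI_SECOND = S      # phi'' is constant for this quadratic potential

def legendre(fun, y, grid):
    """Brute-force Legendre transform sup_x (x*y - fun(x)) over a finite grid."""
    return max(x * y - fun(x) for x in grid)

grid = [-10 + k * 1e-3 for k in range(20001)]

for x in (-1.0, 0.0, 0.7):
    # Monge-Ampere in 1D: phi''(x) = f(x) / g(phi'(x))
    print(PHI_SECOND, f(x) / g(t(x)))

for y in (-2.0, 1.5, 3.0):
    print(legendre(phi, y, grid), (y - M) ** 2 / (2 * S))
```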
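Brenier theorem
$\mu = f\,dx$, $\nu = g\,dx$ are absolutely continuous in $\mathbb{R}^d$.
The optimal coupling $\gamma \in \Gamma_o(\mu, \nu)$ is concentrated on the graph of a cyclically monotone map $t$:
$$\gamma = (i \times t)_\# \mu, \qquad W_2^2(\mu, \nu) = \int_{\mathbb{R}^d} |x - t(x)|^2\, d\mu(x),$$
where $i$ is the identity map.
[Figure: the optimal coupling $\gamma$ in $\mathbb{R}^d \times \mathbb{R}^d$, with marginals $\mu$ and $\nu$, concentrated on the graph of $t$.]
$t$ can be recovered from the optimal Kantorovich potentials $u, v$ satisfying
$$v(y) - u(x) \le \tfrac12 |x - y|^2, \qquad \tfrac12 W_2^2(\mu, \nu) = \int v(y)\, d\nu(y) - \int u(x)\, d\mu(x),$$
by
$$t(x) = x + \nabla u(x) = \nabla\Big( \tfrac12 |x|^2 + u(x) \Big), \qquad \text{where } \tfrac12 |x|^2 + u(x) \text{ is convex.}$$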
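For two Gaussians on the real line the optimal map is affine and increasing (the gradient of a convex quadratic), $t(x) = m_2 + \frac{s_2}{s_1}(x - m_1)$, and $W_2^2(\mu,\nu) = \int |x - t(x)|^2\,d\mu$ has the closed form $(m_1 - m_2)^2 + (s_1 - s_2)^2$. The sketch below, with hypothetical parameters and a plain midpoint-rule quadrature on a truncated interval, evaluates the integral numerically and compares it with the closed form.

```python
import math

m1, s1 = 0.0, 1.0     # mu = N(m1, s1^2)   (hypothetical)
m2, s2 = 2.0, 0.5     # nu = N(m2, s2^2)   (hypothetical)

def density(x, m, s):
    return math.exp(-(x - m) ** 2 / (2 * s * s)) / (s * math.sqrt(2 * math.pi))

def t(x):
    """Increasing affine map pushing N(m1, s1^2) onto N(m2, s2^2)."""
    return m2 + (s2 / s1) * (x - m1)

def w2_squared_via_map(n=100_000, width=12.0):
    """Midpoint rule for int |x - t(x)|^2 dmu(x) on [m1 - width*s1, m1 + width*s1]."""
    a, b = m1 - width * s1, m1 + width * s1
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * h
        total += (x - t(x)) ** 2 * density(x, m1, s1)
    return total * h

print(w2_squared_via_map(), (m1 - m2) ** 2 + (s1 - s2) ** 2)
```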
Extensions and applications
- Strictly convex costs $c(x, y) = h(|x - y|)$: Gangbo-McCann, ... (’96-)
- Monge problem $c(x, y) = |x - y|$: Sudakov (’79), Ambrosio (2000), ..., Bianchini, Champion-De Pascale, ...
- Regularity: Caffarelli, ... (’92-), Wang, Trudinger, Loeper, Villani, McCann, ...
- Isoperimetric and functional inequalities: Gromov, Villani, Otto, McCann, Maggi, Figalli, Pratelli, ...
- Hilbert and Wiener spaces: Feyel-Ustunel, Ambrosio-Gigli-S. (’04-), ...
- Riemannian manifolds, Ricci flow: McCann, Sturm, Villani, Lott, Topping, Carfora, ... (’98-)
- ...
A distance between probability measures
The quadratic cost $c(x, y) = |x - y|^2$ induces a distance between probability measures with finite quadratic moment ($\mathcal{P}_2(\mathbb{R}^d)$): the so-called Kantorovich-Rubinstein-Wasserstein distance
$$W_2(\mu, \nu) := C(\mu, \nu)^{1/2} = \Big( \min_{\gamma \in \Gamma(\mu,\nu)} \iint |x - y|^2\, d\gamma(x, y) \Big)^{1/2}.$$
This distance has a simple interpretation in the case of discrete measures: if
$$\mu = \frac1N \sum_{k=1}^N \delta_{x_k} \quad \text{and} \quad \nu = \frac1N \sum_{k=1}^N \delta_{y_k}, \quad \text{then} \quad W_2^2(\mu, \nu) = \min_\sigma \frac1N \sum_{k=1}^N |x_k - y_{\sigma(k)}|^2,$$
where $\sigma$ ranges over the permutations of $\{1, 2, \dots, N\}$.
$(\mathcal{P}_2(\mathbb{R}^d), W_2)$ is a complete and separable metric space, and the distance $W_2$ is associated to the weak convergence of measures:
$$W_2(\mu_n, \mu) \to 0 \iff \int \zeta(x)\, d\mu_n(x) \to \int \zeta(x)\, d\mu(x) \quad \text{for every } \zeta \in C^0(\mathbb{R}^d) \text{ with } |\zeta(x)| \le A|x|^2 + B.$$
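The permutation formula can be evaluated by brute force for small $N$. On the real line the optimal permutation is the monotone one (pair the sorted $x_k$ with the sorted $y_k$), which gives an $O(N \log N)$ shortcut; the sketch below (hypothetical integer sample points, illustrative helper names) compares the two computations exactly with rational arithmetic.

```python
from fractions import Fraction
from itertools import permutations

def w2_squared_bruteforce(xs, ys):
    """min over permutations sigma of (1/N) sum_k |x_k - y_sigma(k)|^2 (real line)."""
    N = len(xs)
    best = min(sum((x - ys[s]) ** 2 for x, s in zip(xs, sigma))
               for sigma in permutations(range(N)))
    return Fraction(best, N)

def w2_squared_sorted(xs, ys):
    """Same quantity in 1D via the monotone (sorted) matching."""
    return Fraction(sum((x - y) ** 2 for x, y in zip(sorted(xs), sorted(ys))), len(xs))

xs = [0, 1, 3]
ys = [2, 0, 5]
print(w2_squared_bruteforce(xs, ys), w2_squared_sorted(xs, ys))
```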
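Weak convergence, lower semicontinuity, and compactness
Definition (Weak convergence)
A sequence $\mu_n \in \mathcal{P}(\mathbb{R}^d)$ converges weakly to $\mu \in \mathcal{P}(\mathbb{R}^d)$ (written $\mu_n \rightharpoonup \mu$) if
$$\lim_{n \to +\infty} \int_{\mathbb{R}^d} \varphi(x)\, d\mu_n(x) = \int_{\mathbb{R}^d} \varphi(x)\, d\mu(x) \qquad \forall \varphi \in C^0_b(\mathbb{R}^d).$$
- Test functions $\varphi$ can equivalently be chosen in $C^0_c(\mathbb{R}^d)$ or in $C^\infty_c(\mathbb{R}^d)$, as for distributional convergence.
- If $X_n \to X$ pointwise, then $(X_n)_\# \mathbb{P} \rightharpoonup X_\# \mathbb{P}$.
- If $\zeta : \mathbb{R}^d \to [0, +\infty]$ is just lower semicontinuous (no boundedness is required) and $\mu_n \rightharpoonup \mu$, then
  $$\liminf_{n \to +\infty} \int_{\mathbb{R}^d} \zeta(x)\, d\mu_n(x) \ge \int_{\mathbb{R}^d} \zeta(x)\, d\mu(x).$$
- Prokhorov theorem: a set $\Gamma \subset \mathcal{P}(\mathbb{R}^d)$ is weakly relatively compact iff it is tight, i.e. for every $\varepsilon > 0$ there exists a compact set $K \Subset \mathbb{R}^d$ such that
  $$\mu(\mathbb{R}^d \setminus K) \le \varepsilon \qquad \forall \mu \in \Gamma.$$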
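Weak convergence alone does not control second moments, which is why the characterization of $W_2$-convergence uses test functions with quadratic growth. A standard example, chosen here for illustration: $\mu_n = (1 - \frac1n)\delta_0 + \frac1n \delta_n \rightharpoonup \delta_0$, while $\int |x|^2\,d\mu_n = n$; since the only coupling with $\delta_0$ is the product one, $W_2^2(\mu_n, \delta_0) = n \to \infty$. The sketch (hypothetical helper names, bounded test function $\varphi(x) = 1/(1+x^2)$) evaluates both quantities.

```python
def mu_n(n):
    """mu_n = (1 - 1/n) delta_0 + (1/n) delta_n, as a list of (point, mass)."""
    return [(0.0, 1 - 1 / n), (float(n), 1 / n)]

def integrate(phi, measure):
    return sum(phi(x) * w for x, w in measure)

def phi(x):                    # a bounded continuous test function
    return 1 / (1 + x * x)

def second_moment(measure):    # equals W_2^2(measure, delta_0)
    return integrate(lambda x: x * x, measure)

for n in (10, 1000, 100000):
    m = mu_n(n)
    # first column -> 0 (weak convergence), second column = n (no W_2 convergence)
    print(n, integrate(phi, m) - phi(0.0), second_moment(m))
```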
Optimal couplings and triangle inequality
By lower semicontinuity and tightness, the minimum problem
$$
W_2^2(\mu_1,\mu_2):=\min\Big\{\int_{\mathbb{R}^m\times\mathbb{R}^m}|x_1-x_2|^2\,d\mu(x_1,x_2):\ \mu\in\Gamma(\mu_1,\mu_2)\Big\}
$$
is attained: $\Gamma_o(\mu_1,\mu_2)$ denotes the collection (a closed, convex set) of all the optimal couplings in $\mathcal{P}_2(\mathbb{R}^m\times\mathbb{R}^m)$. In general, more than one optimal coupling may exist.
Connecting a sequence of measures (disintegration and Kolmogorov theorem):
if $\mu_{1,2}\in\Gamma_o(\mu_1,\mu_2)$, $\mu_{2,3}\in\Gamma_o(\mu_2,\mu_3)$, $\cdots$, $\mu_{j,j+1}\in\Gamma_o(\mu_j,\mu_{j+1})$, then there exist a probability measure $\mathbb{P}$ and random variables $X_1,X_2,X_3,\cdots,X_j,X_{j+1},\cdots$ such that $\mu_{1,2}=(X_1,X_2)_\#\mathbb{P}$, $\cdots$, $\mu_{j,j+1}=(X_j,X_{j+1})_\#\mathbb{P}$.
In particular
$$
W_2^2(\mu_j,\mu_{j+1})=\mathbb{E}\big[|X_j-X_{j+1}|^2\big].
$$
$(X_h,X_k)_\#\mathbb{P}\in\Gamma(\mu_h,\mu_k)$, but in general it is not optimal if $h,k$ are not consecutive.
Application: $W_2$ is a distance (triangle inequality):
$$
W_2(\mu_1,\mu_3)\le W_2(\mu_1,\mu_2)+W_2(\mu_2,\mu_3).
$$
Since $(X_1,X_3)_\#\mathbb{P}\in\Gamma(\mu_1,\mu_3)$, and by Minkowski's inequality in $L^2(\mathbb{P})$,
$$
\begin{aligned}
W_2(\mu_1,\mu_3)&\le\Big(\mathbb{E}\big[|X_1-X_3|^2\big]\Big)^{1/2}=\Big(\mathbb{E}\big[|(X_1-X_2)+(X_2-X_3)|^2\big]\Big)^{1/2}\\
&\le\Big(\mathbb{E}\big[|X_1-X_2|^2\big]\Big)^{1/2}+\Big(\mathbb{E}\big[|X_2-X_3|^2\big]\Big)^{1/2}=W_2(\mu_1,\mu_2)+W_2(\mu_2,\mu_3).
\end{aligned}
$$
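For discrete measures the minimum problem above is a finite linear program, and the gluing argument can be replayed by hand. A minimal sketch, under the assumption that all measures are uniform on the same number $n$ of points of $\mathbb{R}^m$: then, by the Birkhoff-von Neumann theorem, an optimal coupling can be taken to be induced by a permutation, so brute force over the $n!$ permutations is exact (and only practical for small $n$). Composing the optimal permutations $\sigma_{12}$ and $\sigma_{23}$ realizes the glued variables on $\Omega=\{0,\dots,n-1\}$ with the uniform $\mathbb{P}$: $X_1(i)=x^1_i$, $X_2(i)=x^2_{\sigma_{12}(i)}$, $X_3(i)=x^3_{\sigma_{23}(\sigma_{12}(i))}$. Function names and the random data are illustrative.

```python
import itertools
import math
import random

def sq_dist(x, y):
    """Squared Euclidean distance |x - y|^2 between two points of R^m (tuples)."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def w2_uniform(xs, ys):
    """W2 between (1/n) sum_i delta_{xs[i]} and (1/n) sum_j delta_{ys[j]}.

    Brute force over permutations: exact for uniform weights and equal sizes
    (Birkhoff-von Neumann), but O(n! * n), so only for small n.
    Returns (W2, sigma), where the optimal coupling sends xs[i] to ys[sigma[i]].
    """
    n = len(xs)
    best_cost, best_sigma = math.inf, None
    for sigma in itertools.permutations(range(n)):
        cost = sum(sq_dist(xs[i], ys[sigma[i]]) for i in range(n)) / n
        if cost < best_cost:
            best_cost, best_sigma = cost, sigma
    return math.sqrt(best_cost), best_sigma

def glued_distance(x1, x3, s12, s23):
    """(E|X1 - X3|^2)^(1/2) for the glued variables on Omega = {0, ..., n-1}:
    X1(i) = x1[i] and X3(i) = x3[s23[s12[i]]] (X2 is not needed here)."""
    n = len(x1)
    return math.sqrt(sum(sq_dist(x1[i], x3[s23[s12[i]]]) for i in range(n)) / n)

random.seed(0)
n, m = 5, 2
x1, x2, x3 = [[tuple(random.gauss(0.0, 1.0) for _ in range(m)) for _ in range(n)]
              for _ in range(3)]

w12, s12 = w2_uniform(x1, x2)
w23, s23 = w2_uniform(x2, x3)
w13, _ = w2_uniform(x1, x3)
g13 = glued_distance(x1, x3, s12, s23)

# (X1, X3)_# P couples mu1 and mu3, so w13 <= g13; Minkowski gives g13 <= w12 + w23.
print(f"{w13:.4f} <= {g13:.4f} <= {w12 + w23:.4f}")
```

The glued coupling $(X_1,X_3)_\#\mathbb{P}$ is usually not optimal, which is why the middle value can be strictly larger than $W_2(\mu_1,\mu_3)$.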
“Soft” properties
- Convergence with respect to $W_2$ $\iff$ weak convergence + convergence of the quadratic moments.
- Completeness (if one considers all the probability measures in $\mathcal{P}_2(\mathbb{R}^m)$).
- Lower semicontinuity of $W_2$ with respect to weak/distributional convergence.
- Convexity (but linear segments are not geodesics!).
- Existence of (constant-speed, minimizing) geodesics connecting arbitrary measures $\mu_0,\mu_1$: they are curves $\mu:t\in[0,1]\mapsto\mu_t$ such that
$$
W_2(\mu_0,\mu_1)=\mathrm{L}_0^1[\mu],\qquad W_2(\mu_s,\mu_t)=|t-s|\,W_2(\mu_0,\mu_1)\quad\forall\,s,t\in[0,1],
$$
where $\mathrm{L}_0^1[\mu]$ denotes the length of the curve $\mu$ on $[0,1]$, measured with $W_2$.
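In one dimension such geodesics can be written down explicitly. A minimal sketch, assuming uniform empirical measures with the same number of atoms on $\mathbb{R}$ (where the monotone, sorted matching is an optimal coupling): the displacement interpolation $\mu_t=((1-t)\,\mathrm{id}+t\,T)_\#\mu_0$, with $T$ the monotone map, satisfies $W_2(\mu_s,\mu_t)=|t-s|\,W_2(\mu_0,\mu_1)$, whereas the linear segment $(1-t)\mu_0+t\mu_1$ does not: for $\mu_0=\delta_0$, $\mu_1=\delta_1$ and $t=\tfrac12$ one gets $W_2(\mu_0,\tfrac12\mu_0+\tfrac12\mu_1)=\sqrt{1/2}>\tfrac12$. Dirac masses are encoded as repeated atoms; helper names are hypothetical.

```python
import math
import random

def w2_1d(xs, ys):
    """W2 between uniform empirical measures on R with len(xs) == len(ys) atoms.
    In one dimension the monotone (sorted) matching is an optimal coupling."""
    xs, ys = sorted(xs), sorted(ys)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

def displacement(xs, ys, t):
    """Atoms of mu_t = ((1 - t) id + t T)_# mu_0, with T the monotone map x_(i) -> y_(i)."""
    return [(1 - t) * x + t * y for x, y in zip(sorted(xs), sorted(ys))]

random.seed(1)
x0 = [random.gauss(0.0, 1.0) for _ in range(50)]
x1 = [random.uniform(2.0, 5.0) for _ in range(50)]
w01 = w2_1d(x0, x1)

# Constant-speed geodesic: W2(mu_s, mu_t) = |t - s| W2(mu_0, mu_1) on a grid of times.
grid = [k / 4 for k in range(5)]
max_err = max(abs(w2_1d(displacement(x0, x1, s), displacement(x0, x1, t)) - abs(t - s) * w01)
              for s in grid for t in grid)

# Linear segment between delta_0 and delta_1 at t = 1/2 (two atoms each, weights 1/2).
dirac0, dirac1, mixture = [0.0, 0.0], [1.0, 1.0], [0.0, 1.0]
print(max_err, w2_1d(dirac0, mixture), 0.5 * w2_1d(dirac0, dirac1))
```

The displacement midpoint between $\delta_0$ and $\delta_1$ is $\delta_{1/2}$, at distance exactly $\tfrac12$ from both endpoints, while the mixture $\tfrac12\delta_0+\tfrac12\delta_1$ is at distance $\sqrt{1/2}$ from each, so it lies on no geodesic.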