proof of arithmetic-geometric means inequality using Lagrange multipliers∗
stevecheng†
2013-03-21 19:31:17
As an interesting example of the Lagrange multiplier method, we employ it
to prove the arithmetic-geometric means inequality:
\[
  \sqrt[n]{x_1 \cdots x_n} \;\le\; \frac{x_1 + \cdots + x_n}{n},
  \qquad x_i \ge 0,
\]
with equality if and only if all the x_i are equal.
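Before the proof, a quick numerical sanity check (an illustration added here, not part of the original argument): comparing the two means on random nonnegative samples, the geometric mean never exceeds the arithmetic mean, and equality occurs when all entries coincide.

```python
import math
import random

def geometric_mean(xs):
    # n-th root of the product x_1 * ... * x_n
    return math.prod(xs) ** (1.0 / len(xs))

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

random.seed(0)
for _ in range(1000):
    xs = [random.uniform(0.0, 10.0) for _ in range(5)]
    assert geometric_mean(xs) <= arithmetic_mean(xs) + 1e-12

# Equality holds exactly when all x_i coincide.
equal = [3.0] * 5
assert math.isclose(geometric_mean(equal), arithmetic_mean(equal))
```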
To begin with, define f : R^n_{≥0} → R_{≥0} by f(x) = (x_1 ⋯ x_n)^{1/n}.
Fix a > 0 (the arithmetic mean), and consider the set
\[
  M = \Bigl\{ x \in \mathbb{R}^n : x_i \ge 0 \text{ for all } i,\ \sum_i x_i = na \Bigr\}.
\]
This is clearly a closed and bounded set in R^n, and hence is compact. Therefore f on M has both a maximum and minimum.
M is almost a manifold (a differentiable surface), except that it has a boundary consisting of points with x_i = 0 for some i. Clearly f attains the minimum zero on this boundary, and on the “interior” of M, that is
\[
  M^0 = \Bigl\{ x \in \mathbb{R}^n : x_i > 0 \text{ for all } i,\ \sum_i x_i = na \Bigr\},
\]
f is positive. Now M^0 is a manifold, and the maximum of f on M is evidently a local maximum on M^0. Hence the Lagrange multiplier method may be applied.
We have the constraint^1 0 = g(x) = \sum_i x_i − na, under which we are to maximize f.
We compute:
\[
  \frac{f(x)}{n x_i} = \frac{\partial f}{\partial x_i} = \lambda \frac{\partial g}{\partial x_i} = \lambda.
\]
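The closed form ∂f/∂x_i = f(x)/(n x_i) used above can be spot-checked with a finite-difference sketch (illustrative Python, not part of the original proof; the test point (1, 2, 3, 4) is an arbitrary choice):

```python
def f(xs):
    # f(x) = (x_1 * ... * x_n) ** (1/n), the geometric mean
    p = 1.0
    for x in xs:
        p *= x
    return p ** (1.0 / len(xs))

def partial(fn, xs, i, h=1e-6):
    # central finite difference in coordinate i
    up = list(xs); up[i] += h
    dn = list(xs); dn[i] -= h
    return (fn(up) - fn(dn)) / (2.0 * h)

xs = [1.0, 2.0, 3.0, 4.0]   # arbitrary interior point
n = len(xs)
for i in range(n):
    claimed = f(xs) / (n * xs[i])   # the closed form derived in the text
    assert abs(partial(f, xs, i) - claimed) < 1e-6
```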
∗ ⟨ProofOfArithmeticgeometricMeansInequalityUsingLagrangeMultipliers⟩ created: ⟨2013-03-21⟩ by: ⟨stevecheng⟩ version: ⟨37278⟩ Privacy setting: ⟨1⟩ ⟨Example⟩ ⟨49-00⟩ ⟨26B12⟩
† This text is available under the Creative Commons Attribution/Share-Alike License 3.0. You can reuse this document or portions thereof only if you do so under terms that are compatible with the CC-BY-SA license.
^1 Although g does not constrain that x_i > 0, this does not really matter: for those who are worried about such things, a rigorous way to escape this annoyance is simply to define f(x) = 0 whenever x_i ≤ 0 for some i. Clearly the new f cannot have a maximum at such points, so we may simply ignore the points with x_i ≤ 0 when solving the system. And for the same reason, the fact that f fails to be differentiable at the boundary points is immaterial.
If we take the product over i = 1, …, n of the extreme left- and right-hand sides of this equation, we obtain
\[
  \frac{f(x)^n}{n^n\, x_1 \cdots x_n} = \frac{1}{n^n} = \lambda^n,
\]
from which it follows that λ = 1/n (λ = −1/n is obviously inadmissible). Then substituting back, we get f(x) = x_i, which implies that the x_i are all equal to a. The value of f at this point is a.
Since we have obtained a unique solution to the Lagrange multiplier system,
this unique solution must be the unique maximum point of f on M . So f (x) ≤ a
amongst all numbers xi whose arithmetic mean is a, with equality occurring if
and only if all the xi are equal to a.
The case a = 0, which was not included in the above analysis, is trivial.
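The conclusion can also be spot-checked directly by sampling random points with Σ_i x_i = na and verifying f(x) ≤ a, with equality at the symmetric point. This is an illustrative Python sketch, not part of the original text; the values n = 4 and a = 2 are arbitrary.

```python
import random

def f(xs):
    # the geometric mean (x_1 * ... * x_n) ** (1/n)
    p = 1.0
    for x in xs:
        p *= x
    return p ** (1.0 / len(xs))

random.seed(1)
n, a = 4, 2.0   # arbitrary choices for illustration
for _ in range(1000):
    raw = [random.random() for _ in range(n)]
    s = sum(raw)
    xs = [n * a * r / s for r in raw]   # rescale so that sum(xs) == n*a
    assert f(xs) <= a + 1e-9            # f never exceeds the arithmetic mean a
assert abs(f([a] * n) - a) < 1e-12      # equality at the symmetric point
```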
Proof of concavity
The question of whether the point obtained from solving the Lagrange system is
actually a maximum for f was taken care of by our preliminary compactness
argument. Another popular way to determine whether a given stationary point
is a local maximum or minimum is by studying the Hessian of the function at
that point. For the sake of illustrating the relevant techniques, we will prove
again that xi = a is a local maximum for f , this time by showing that the
Hessian is negative definite.
Actually, it turns out to be not much harder to show that f is weakly concave
everywhere on Rn≥0 , not just on M . A plausibility argument for this fact is that
the graph of t ↦ t^{1/n} is concave, and since f also involves an nth root, it should
be concave too. We will prove concavity of f by showing that the Hessian of f
is negative semi-definite.
Since M is a convex^2 set, the restriction of f to M will also be weakly concave; then we can conclude that the critical point x_i = a on M must be a global maximum of f on M.
We begin by calculating second-order derivatives:
\[
  \frac{\partial^2 f}{\partial x_j \partial x_i}
  = \frac{\partial}{\partial x_j} \frac{f(x)}{n x_i}
  = \frac{1}{n x_i} \frac{\partial f}{\partial x_j} + f(x)\, \frac{\partial}{\partial x_j} \frac{1}{n x_i}
  = \frac{f(x)}{n^2 x_i x_j} - \delta_{ij}\, \frac{f(x)}{n x_i^2}.
\]
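The second-derivative formula just derived can likewise be verified against a finite-difference approximation (an illustrative check added here; the test point (1, 2, 3) is arbitrary):

```python
def f(xs):
    # the geometric mean (x_1 * ... * x_n) ** (1/n)
    p = 1.0
    for x in xs:
        p *= x
    return p ** (1.0 / len(xs))

def d2(fn, xs, i, j, h=1e-4):
    # mixed central second difference; also valid when i == j
    def shifted(di, dj):
        ys = list(xs)
        ys[i] += di
        ys[j] += dj
        return fn(ys)
    return (shifted(h, h) - shifted(h, -h)
            - shifted(-h, h) + shifted(-h, -h)) / (4.0 * h * h)

xs = [1.0, 2.0, 3.0]   # arbitrary interior test point
n = len(xs)
fx = f(xs)
for i in range(n):
    for j in range(n):
        # closed form: f/(n^2 x_i x_j) - delta_ij * f/(n x_i^2)
        closed = fx / (n * n * xs[i] * xs[j])
        if i == j:
            closed -= fx / (n * xs[i] ** 2)
        assert abs(d2(f, xs, i, j) - closed) < 1e-5
```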
^2 If M were not convex, the arguments that follow would not work. In general, a prescription for determining whether a critical point is a local minimum or maximum can be found in the tests for local extrema in the Lagrange multiplier method.
(The symbol δ_ij is the Kronecker delta.) Thus the Hessian matrix is:
\[
  D^2 f(x) = \frac{f(x)}{n^2} \left[ \frac{1}{x_i x_j} \right]_{ij}
  - \frac{f(x)}{n}
  \begin{pmatrix}
    \frac{1}{x_1^2} & & \\
    & \ddots & \\
    & & \frac{1}{x_n^2}
  \end{pmatrix}.
\]
Now if we had actual numbers to substitute for the x_i, we could go to a computer and ask whether the matrix is negative definite. But we want to prove global concavity, so we have to employ some clever algebra and work directly from the definition.
Let v = (v_1, …, v_n) be a test vector. Negative semi-definiteness means v D²f(x) vᵀ ≤ 0 for all v. So let us write out:
\[
  v\, D^2 f(x)\, v^T
  = \frac{f(x)}{n^2} \Biggl( \sum_{i,j} \frac{v_i v_j}{x_i x_j} - n \sum_i \frac{v_i^2}{x_i^2} \Biggr)
  = \frac{f(x)}{n^2} \Biggl( \Bigl( \sum_i \frac{v_i}{x_i} \Bigr)^{\!2} - n \sum_i \frac{v_i^2}{x_i^2} \Biggr).
\]
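As a concrete check of negative semi-definiteness, one can build the Hessian from the closed form above and evaluate the quadratic form on random test vectors (illustrative Python, with arbitrary sample points; not part of the original proof):

```python
import random

def f(xs):
    # the geometric mean (x_1 * ... * x_n) ** (1/n)
    p = 1.0
    for x in xs:
        p *= x
    return p ** (1.0 / len(xs))

def hessian(xs):
    # closed-form Hessian from the text:
    # D2f = f/n^2 * [1/(x_i x_j)]_ij - f/n * diag(1/x_i^2)
    n = len(xs)
    fx = f(xs)
    H = [[fx / (n * n * xs[i] * xs[j]) for j in range(n)] for i in range(n)]
    for i in range(n):
        H[i][i] -= fx / (n * xs[i] ** 2)
    return H

random.seed(2)
xs = [random.uniform(0.5, 5.0) for _ in range(4)]
H = hessian(xs)
for _ in range(1000):
    v = [random.uniform(-1.0, 1.0) for _ in range(4)]
    q = sum(v[i] * H[i][j] * v[j] for i in range(4) for j in range(4))
    assert q <= 1e-12   # the quadratic form is never positive
```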
We would be done if we can prove
\[
  \Bigl( \sum_i w_i \Bigr)^{\!2} \le n \sum_i w_i^2,
  \qquad w_i = \frac{v_i}{x_i}.
  \tag{1}
\]
But look, this is just the Cauchy-Schwarz inequality, concerning the dot product of the vectors (w_1, …, w_n) and (1, …, 1)! So D²f(x) is negative semi-definite everywhere, i.e. f is weakly concave.
Moreover, the case of equality in the Cauchy-Schwarz inequality (1) only occurs when (w_1, …, w_n) is parallel to (1, …, 1), i.e. v_i = µx_i for some scalar µ. Notice that (1, …, 1) is the normal vector to the tangent space of M, so (1, …, 1) · v = µ Σ_i x_i ≠ 0 unless µ = 0. This means equality never holds in (1) if v ≠ 0 is restricted to the tangent space of M. In turn, this means that f is strictly concave on the convex set M, and so x_i = a is a strict global maximum of f on M.
Proof of inequality by symmetry argument
Observe that the maximum point xi = a is pleasantly symmetric. We can
show that the solution has to be symmetric, by exploiting the symmetry of the
function f and the set M .
We have already calculated that f is strictly concave on M; this was independent of the Lagrange multiplier method. Since f is strictly concave on a
closed convex set, it has a unique maximum there; call it x.
Pick any indices i and j, ranging from 1 to n, and form the linear transformation T, which switches the ith and jth coordinates of its argument.
By definition, f (x) ≥ f (ξ) for all ξ ∈ M . But f ◦ T = f , so this means
f (T x) ≥ f (T ξ) for all ξ ∈ M . But M is mapped to itself by the transformation
T , so this is the same as saying that f (T x) ≥ f (ξ) for all ξ ∈ M . But the global
maximum is unique, so T x = x, that is, xi = xj .
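The two facts the argument relies on, that f ∘ T = f and that T maps M into itself, are easy to check concretely (an illustrative sketch, not part of the original text):

```python
def f(xs):
    # the geometric mean (x_1 * ... * x_n) ** (1/n)
    p = 1.0
    for x in xs:
        p *= x
    return p ** (1.0 / len(xs))

def swap(xs, i, j):
    # the transposition T that switches the i-th and j-th coordinates
    ys = list(xs)
    ys[i], ys[j] = ys[j], ys[i]
    return ys

xs = [1.0, 2.0, 3.0, 4.0]
ys = swap(xs, 0, 2)
assert abs(f(ys) - f(xs)) < 1e-12   # f is symmetric: f(Tx) = f(x)
assert sum(ys) == sum(xs)           # T preserves the constraint sum(x) = n*a
```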
Note that the argument in this last proof is extremely general — it takes the
form: if x is the unique maximum of f : M → R, and T is any symmetry of M
such that f ◦ T = f , then T x = x. This kind of symmetry argument was used to
great effect by Jakob Steiner in his famous attempted proof of the isoperimetric
inequality: among all closed curves of a given arc length, the circle maximizes
the area enclosed. However, Steiner did not realize that the assumption that
there is a unique maximizing curve has to be proved. Actually, that is the
hardest part of the proof — if the assumption is known, then one may easily
calculate using the Lagrange multiplier method that the maximizing curve must
be a circle!