Free Probability Theory and Random Matrices
Roland Speicher
Queen’s University
Kingston, Canada
We are interested in the limiting eigenvalue distribution of N × N random matrices for N → ∞. Usually, large-N distributions are close to the N → ∞ limit, and asymptotic results give good predictions for finite N.
We can consider the convergence for N → ∞ of
• the eigenvalue distribution of one "typical" realization of the N × N random matrix
• the averaged eigenvalue distribution over many realizations of the N × N random matrices
Consider a (selfadjoint!) Gaussian N × N random matrix. We have almost sure convergence (convergence of a "typical" realization) of its eigenvalue distribution towards Wigner's semicircle.

[Figure: eigenvalue histograms of single realizations for N = 300, N = 1000 and N = 3000, compared with the semicircle density.]
Convergence of the averaged eigenvalue distribution is usually much faster; there is very good agreement with the asymptotic limit already for moderate N.

[Figure: averaged eigenvalue histograms for N = 5, N = 20 and N = 50 (5000 trials), compared with the semicircle density.]
Consider a Wishart random matrix A = XX*, where X is an N × M random matrix with independent Gaussian entries. Its eigenvalue distribution converges (averaged and almost surely) towards the Marchenko–Pastur distribution.
Example: M = 2N, 2000 trials.

[Figure: averaged eigenvalue histograms for N = 10 and N = 50, compared with the Marchenko–Pastur density.]
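Histograms like the ones above can be reproduced with a few lines of simulation. The following is a minimal sketch added for illustration (not the code behind the figures); the chosen normalizations (semicircle on [−2, 2], Wishart with asymptotic mean eigenvalue 1) and the trial counts are assumptions and may differ from those used in the plots.

```python
import numpy as np

def gaussian_selfadjoint(N, rng):
    # Selfadjoint Gaussian (GUE-type) matrix, normalized so that the
    # eigenvalue distribution converges to the semicircle on [-2, 2].
    X = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
    return (X + X.conj().T) / (2 * np.sqrt(N))

def wishart(N, M, rng):
    # Wishart matrix A = X X* with independent Gaussian entries of
    # variance 1/M, so that the limit is Marchenko-Pastur with ratio M/N.
    X = (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2 * M)
    return X @ X.conj().T

rng = np.random.default_rng(0)

# One "typical" realization for large N (almost sure convergence) ...
eig_single = np.linalg.eigvalsh(gaussian_selfadjoint(3000, rng))

# ... versus the averaged distribution over many trials for moderate N.
eig_avg = np.concatenate([np.linalg.eigvalsh(gaussian_selfadjoint(50, rng))
                          for _ in range(2000)])

# Wishart example with M = 2N.
eig_mp = np.concatenate([np.linalg.eigvalsh(wishart(50, 100, rng))
                         for _ in range(2000)])

# Normalized histograms; plot with matplotlib against the limiting densities.
dens_semi, edges_semi = np.histogram(eig_avg, bins=40, density=True)
dens_mp, edges_mp = np.histogram(eig_mp, bins=40, density=True)
```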
We want to consider more complicated situations, built out of simple cases (like Gaussian or Wishart) by doing operations like
• taking the sum of two matrices
• taking the product of two matrices
• taking corners of matrices
Note: If different N × N random matrices A and B are involved, then the eigenvalue distribution of non-trivial functions f(A, B) (like A + B or AB) will of course depend on the relation between the eigenspaces of A and of B.

However: It turns out that there is a deterministic and treatable result if
• the eigenspaces are in "generic" position, and
• N → ∞.

This is the realm of free probability theory.
Consider N × N random matrices A and C such that
• A has an asymptotic eigenvalue distribution for N → ∞ and C has an asymptotic eigenvalue distribution for N → ∞
• A and C are independent (i.e., the entries of A are independent from the entries of C)
Then the eigenspaces of A and of C might still be in a special relation (e.g., both A and C could be diagonal). However, consider now

A   and   B := UCU*,

where U is a Haar unitary N × N random matrix.

Then the eigenspaces of A and of B are in "generic" position, and the asymptotic eigenvalue distribution of A + B depends only on the asymptotic eigenvalue distribution of A and the asymptotic eigenvalue distribution of B (which is the same as the one of C).
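As a concrete illustration of "generic position" (a minimal sketch added for illustration, with sizes and ensembles chosen arbitrarily): a Haar unitary can be generated from the QR decomposition of a complex Gaussian matrix, with a phase correction from the diagonal of R. Conjugating one of two aligned diagonal matrices by such a U puts their eigenspaces into generic position.

```python
import numpy as np

def haar_unitary(N, rng):
    # Haar-distributed unitary: QR of a complex Ginibre matrix, with the
    # diagonal phases of R absorbed into Q (standard construction).
    Z = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    d = np.diagonal(R)
    return Q * (d / np.abs(d))

N = 1000
rng = np.random.default_rng(1)

# Two identical diagonal matrices: eigenvalues +-1, eigenspaces fully aligned.
A = np.diag(np.where(np.arange(N) < N // 2, -1.0, 1.0))
C = A.copy()

U = haar_unitary(N, rng)
B = U @ C @ U.conj().T          # same eigenvalues as C, but generic eigenspaces

# Aligned eigenspaces: A + C = 2A has only the eigenvalues -2 and +2.
# Generic position: A + B spreads out over [-2, 2] (the arcsine law, i.e.,
# the free convolution of the two symmetric +-1 distributions).
eig_aligned = np.linalg.eigvalsh(A + C)
eig_generic = np.linalg.eigvalsh(A + B)
```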
We can expect that the asymptotic eigenvalue distribution of f(A, B) depends only on the asymptotic eigenvalue distribution of A and the asymptotic eigenvalue distribution of B if
• A and B are independent
• one of them is unitarily invariant (i.e., the joint distribution of the entries does not change under unitary conjugation)

Note: Gaussian and Wishart random matrices are unitarily invariant.
Thus: the asymptotic eigenvalue distribution of
• the sum of random matrices in generic position, A + UCU*
• the product of random matrices in generic position, A · UCU*
• corners of unitarily invariant matrices UCU*
should only depend on the asymptotic eigenvalue distribution of A and of C.
Example: sum of independent Gaussian and Wishart (M = 2N) random matrices, averaged over 10000 trials.

[Figure: averaged eigenvalue histograms of the sum for N = 5 and N = 50.]
Example: product of two independent Wishart (M = 5N) random matrices, averaged over 10000 trials.

[Figure: averaged eigenvalue histograms of the product for N = 5 and N = 50.]
Example: upper left corner of size N/2 × N/2 of a randomly
rotated N × N projection matrix,
with half of the eigenvalues 0 and half of the eigenvalues 1,
averaged over 10000 trials
[Figure: averaged eigenvalue histograms of the corner for N = 8 and N = 32.]
Problems:
• Do we have a conceptual way of understanding these asymptotic eigenvalue distributions?
• Is there an algorithm for actually calculating these asymptotic eigenvalue distributions?
How do we analyze the eigenvalue distributions?

eigenvalue distribution of a matrix A ≙ knowledge of the traces of powers tr(A^k), since

tr(A^k) = (1/N) (λ_1^k + · · · + λ_N^k);

averaged eigenvalue distribution of a random matrix A ≙ knowledge of the expectations of traces of powers, E[tr(A^k)].
Stieltjes inversion formula. If one knows the asymptotic moments

α_k := lim_{N→∞} E[tr(A^k)]

of a random matrix A, then one can get its asymptotic eigenvalue distribution µ as follows. Form the Cauchy (or Stieltjes) transform

G(z) := Σ_{k=0}^∞ α_k / z^{k+1}.

Then

dµ(t) = −(1/π) lim_{ε→0} ℑ G(t + iε) dt.
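As a concrete check of the inversion formula (a standard example, added here for illustration): for the semicircle the Cauchy transform can be summed in closed form, and evaluating it slightly above the real axis recovers the density. The branch is chosen so that G(z) ~ 1/z at infinity.

```python
import numpy as np

def G_semicircle(z):
    # Cauchy transform of the standard semicircle on [-2, 2].
    # Writing sqrt(z^2 - 4) as sqrt(z - 2) * sqrt(z + 2) selects the
    # branch with G(z) ~ 1/z throughout the upper half plane.
    return (z - np.sqrt(z - 2) * np.sqrt(z + 2)) / 2

# Stieltjes inversion: density(t) = -(1/pi) * Im G(t + i*eps) for small eps.
t = np.linspace(-3, 3, 601)
eps = 1e-6
density = -np.imag(G_semicircle(t + 1j * eps)) / np.pi

# Compare with the exact semicircle density sqrt(4 - t^2) / (2*pi) on [-2, 2].
exact = np.where(np.abs(t) <= 2, np.sqrt(np.maximum(4 - t * t, 0.0)) / (2 * np.pi), 0.0)
assert np.max(np.abs(density - exact)) < 1e-3
```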
Consider random matrices A and B in generic position. We want to understand A + B, i.e., for all k ∈ N,

E[tr((A + B)^k)].

But

E[tr((A + B)^6)] = E[tr(A^6)] + · · · + E[tr(ABAABA)] + · · · + E[tr(B^6)],

thus we need to understand mixed moments in A and B.
Use the following notation:

ϕ(A) := lim_{N→∞} E[tr(A)].

Question: If A and B are in generic position, can we understand

ϕ(A^{n_1} B^{m_1} A^{n_2} B^{m_2} · · ·)

in terms of

(ϕ(A^k))_{k∈N}   and   (ϕ(B^k))_{k∈N}?
Example: Consider two Gaussian random matrices A, B which are independent (and thus in generic position). Then the asymptotic mixed moments in A and B,

ϕ(A^{n_1} B^{m_1} A^{n_2} B^{m_2} · · ·),

are given by the number of non-crossing/planar pairings of the pattern

A · A · · · A  ·  B · B · · · B  ·  A · A · · · A  ·  B · B · · · B  · · ·
(n_1 times)     (m_1 times)      (n_2 times)      (m_2 times)

which do not pair A with B.
Example: ϕ(AABBABBA) = 2, since there are exactly two such non-crossing pairings of the word

A A B B A B B A

[diagrams of the two pairings omitted].

Note: each of the pairings connects at least one of the groups A^{n_1}, B^{m_1}, A^{n_2}, . . . only among itself, and thus

ϕ[(A^2 − ϕ(A^2)·1) · (B^2 − ϕ(B^2)·1) · (A − ϕ(A)·1) · (B^2 − ϕ(B^2)·1) · (A − ϕ(A)·1)] = 0.
In general we have

ϕ[(A^{n_1} − ϕ(A^{n_1})·1) · (B^{m_1} − ϕ(B^{m_1})·1) · (A^{n_2} − ϕ(A^{n_2})·1) · · ·]
   = # non-crossing pairings which do not pair A with B, and for which each group is connected with some other group
   = 0.
The actual equation for the calculation of the mixed moments

ϕ(A^{n_1} B^{m_1} A^{n_2} B^{m_2} · · ·)

is different for different random matrix ensembles. However, the relation between the mixed moments,

ϕ[(A^{n_1} − ϕ(A^{n_1})·1) · (B^{m_1} − ϕ(B^{m_1})·1) · · ·] = 0,

remains the same for matrix ensembles in generic position and constitutes the definition of freeness.
Definition [Voiculescu 1985]: A and B are free (with respect to ϕ) if we have for all n_1, m_1, n_2, · · · ≥ 1 that

ϕ[(A^{n_1} − ϕ(A^{n_1})·1) · (B^{m_1} − ϕ(B^{m_1})·1) · (A^{n_2} − ϕ(A^{n_2})·1) · · ·] = 0

and

ϕ[(B^{n_1} − ϕ(B^{n_1})·1) · (A^{m_1} − ϕ(A^{m_1})·1) · (B^{n_2} − ϕ(B^{n_2})·1) · · ·] = 0,

i.e., ϕ(alternating product in centered words in A and in B) = 0.
Note: freeness is a rule for calculating mixed moments in A and B from the moments of A and the moments of B.

Example:

ϕ[(A^n − ϕ(A^n)·1) · (B^m − ϕ(B^m)·1)] = 0,

thus

ϕ(A^n B^m) − ϕ(A^n · 1)ϕ(B^m) − ϕ(A^n)ϕ(1 · B^m) + ϕ(A^n)ϕ(B^m)ϕ(1 · 1) = 0,

and hence

ϕ(A^n B^m) = ϕ(A^n) · ϕ(B^m).
Freeness is a rule for calculating mixed moments, analogous to the concept of independence for random variables. Thus freeness is also called free independence.

Note: free independence is a different rule from classical independence; free independence occurs typically for non-commuting random variables.

Example:

ϕ[(A − ϕ(A)·1) · (B − ϕ(B)·1) · (A − ϕ(A)·1) · (B − ϕ(B)·1)] = 0,

which results in

ϕ(ABAB) = ϕ(AA) · ϕ(B) · ϕ(B) + ϕ(A) · ϕ(A) · ϕ(BB) − ϕ(A) · ϕ(B) · ϕ(A) · ϕ(B).
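This prediction can be tested numerically against independent random matrices in generic position. The sketch below is an illustration added here (ensembles, sizes and trial numbers chosen arbitrarily): it compares the averaged mixed trace E[tr(ABAB)] for two independent Wishart matrices with the combination of individual moments given by the formula above.

```python
import numpy as np

def wishart(N, M, rng):
    # Wishart matrix X X* with Gaussian entries of variance 1/M.
    X = (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2 * M)
    return X @ X.conj().T

def tr(A):
    # normalized trace (1/N) Tr
    return np.trace(A).real / A.shape[0]

N, trials = 200, 100
rng = np.random.default_rng(2)

acc = np.zeros(5)
for _ in range(trials):
    A = wishart(N, 2 * N, rng)   # two independent, unitarily invariant
    B = wishart(N, 5 * N, rng)   # ensembles => asymptotically free
    acc += [tr(A @ B @ A @ B), tr(A), tr(A @ A), tr(B), tr(B @ B)]
abab, a1, a2, b1, b2 = acc / trials

# Freeness: phi(ABAB) = phi(A^2) phi(B)^2 + phi(A)^2 phi(B^2) - phi(A)^2 phi(B)^2
prediction = a2 * b1**2 + a1**2 * b2 - a1**2 * b1**2
print(abab, prediction)   # close for large N (finite-N corrections remain)
```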
Consider A, B free.
Then, by freeness, the moments of A+B are uniquely determined
by the moments of A and the moments of B.
Notation: We say the distribution of A + B is the free convolution of the distribution of A and the distribution of B,

µ_{A+B} = µ_A ⊞ µ_B.
In principle, freeness determines the moments of A + B, but the concrete nature of this rule is not immediately clear.

Examples: We have

ϕ((A + B)^1) = ϕ(A) + ϕ(B)

ϕ((A + B)^2) = ϕ(A^2) + 2ϕ(A)ϕ(B) + ϕ(B^2)

ϕ((A + B)^3) = ϕ(A^3) + 3ϕ(A^2)ϕ(B) + 3ϕ(A)ϕ(B^2) + ϕ(B^3)

ϕ((A + B)^4) = ϕ(A^4) + 4ϕ(A^3)ϕ(B) + 4ϕ(A^2)ϕ(B^2)
               + 2 (ϕ(A^2)ϕ(B)ϕ(B) + ϕ(A)ϕ(A)ϕ(B^2) − ϕ(A)ϕ(B)ϕ(A)ϕ(B))
               + 4ϕ(A)ϕ(B^3) + ϕ(B^4)
To treat these formulas in general, linearize the free convolution by going over from moments (ϕ(A^m))_{m∈N} to free cumulants (κ_m)_{m∈N}. Those are defined by relations like:

ϕ(A^1) = κ_1
ϕ(A^2) = κ_2 + κ_1^2
ϕ(A^3) = κ_3 + 3κ_1κ_2 + κ_1^3
ϕ(A^4) = κ_4 + 4κ_1κ_3 + 2κ_2^2 + 6κ_1^2κ_2 + κ_1^4
...
There is a combinatorial structure behind these formulas; the sums are running over non-crossing partitions [diagrams of the partitions omitted]:

ϕ(A^1) = κ_1
ϕ(A^2) = κ_2 + κ_1κ_1
ϕ(A^3) = κ_3 + κ_1κ_2 + κ_2κ_1 + κ_2κ_1 + κ_1κ_1κ_1
ϕ(A^4) = κ_4 + 4κ_1κ_3 + 2κ_2^2 + 6κ_1^2κ_2 + κ_1^4
This combinatorial relation between moments (ϕ(A^m))_{m∈N} and cumulants (κ_m)_{m∈N} can be translated into generating power series. Put

G(z) = 1/z + Σ_{m=1}^∞ ϕ(A^m) / z^{m+1}     (Cauchy transform)

and

R(z) = Σ_{m=1}^∞ κ_m z^{m−1}     (R-transform).

Then we have the relation

1/G(z) + R(G(z)) = z.
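For example (a standard calculation, added here for concreteness): for the semicircle all free cumulants vanish except κ_2 = 1, so R(z) = z, and the relation above becomes 1/G(z) + G(z) = z, i.e., G(z)^2 − zG(z) + 1 = 0. The solution with G(z) ~ 1/z at infinity is G(z) = (z − √(z^2 − 4))/2, and Stieltjes inversion recovers the semicircle density √(4 − t^2)/(2π) on [−2, 2].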
Theorem [Voiculescu 1986, Speicher 1994]: Let A and B be free. Then one has

R_{A+B}(z) = R_A(z) + R_B(z),

or equivalently

κ_m^{A+B} = κ_m^A + κ_m^B     for all m.
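For example (again a standard fact, stated here only as an illustration): a semicircular element of variance σ^2 has κ_2 = σ^2 as its only non-vanishing free cumulant, hence R(z) = σ^2 z, so the free convolution of two free semicircular elements is again semicircular, with variance σ_1^2 + σ_2^2.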
This, together with the relation between Cauchy transform and R-transform and with the Stieltjes inversion formula, gives an effective algorithm for calculating free convolutions, i.e., sums of random matrices in generic position:

A  →  G_A  →  R_A
B  →  G_B  →  R_B
R_A + R_B = R_{A+B}  →  G_{A+B}  →  distribution of A + B
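The same algorithm can also be run at the level of (finitely many) moments and free cumulants instead of analytic transforms. The sketch below is an illustration added here, not code from the talk: it implements the moment–cumulant recursion behind the non-crossing-partition formulas above, converts moments to free cumulants, adds the cumulants (κ_m^{A+B} = κ_m^A + κ_m^B), and converts back to moments of A + B. The two input distributions are arbitrary choices.

```python
import numpy as np

def moments_from_cumulants(kappa):
    # Free moment-cumulant relation (sum over non-crossing partitions),
    # in recursive form:
    #   m_n = sum_{s=1}^{n} kappa_s * [coeff. of x^(n-s) in (sum_j m_j x^j)^s].
    # kappa[0] is unused; returns the moment sequence m with m[0] = 1.
    n_max = len(kappa) - 1
    m = np.zeros(n_max + 1)
    m[0] = 1.0
    for n in range(1, n_max + 1):
        total = 0.0
        power = np.array([1.0])                 # coefficients of (m(x))^s, s = 0
        for s in range(1, n + 1):
            power = np.convolve(power, m[:n])   # raise the moment series to power s
            total += kappa[s] * power[n - s]
        m[n] = total
    return m

def cumulants_from_moments(m):
    # Invert the recursion: kappa_n enters m_n with coefficient 1 (the
    # partition consisting of one single block), so it can be solved for.
    n_max = len(m) - 1
    kappa = np.zeros(n_max + 1)
    for n in range(1, n_max + 1):
        kappa[n] = m[n] - moments_from_cumulants(kappa[:n + 1])[n]
    return kappa

# Example: semicircle of variance 1 (Catalan moments) plus a free projection
# of trace 1/2 (all moments equal 1/2).
m_A = np.array([1.0, 0.0, 1.0, 0.0, 2.0, 0.0, 5.0])
m_B = np.array([1.0, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5])

kappa_sum = cumulants_from_moments(m_A) + cumulants_from_moments(m_B)
m_sum = moments_from_cumulants(kappa_sum)
print(m_sum)    # first moments of mu_A ⊞ mu_B
```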
Example: Wigner + Wishart (M = 2N), trials = 4000.

[Figure: averaged eigenvalue histogram of the sum for N = 100.]
One has a similar analytic description for the product.

Theorem [Voiculescu 1987, Haagerup 1997, Nica + Speicher 1997]: Put

M_A(z) := Σ_{m=0}^∞ ϕ(A^m) z^m

and define

S_A(z) := ((1 + z)/z) · M_A^{<−1>}(z)     (S-transform of A).

Then: If A and B are free, we have

S_{AB}(z) = S_A(z) · S_B(z).
Example: Wishart × Wishart (M = 5N), trials = 1000.

[Figure: averaged eigenvalue histogram of the product for N = 100.]
Upper left corner of size N/2 × N/2 of a projection matrix, with N/2 eigenvalues 0 and N/2 eigenvalues 1; trials = 5000.

[Figure: averaged eigenvalue histogram of the corner for N = 64.]
• Free Calculator by Raj Rao and Alan Edelman
• A. Nica and R. Speicher: Lectures on the Combinatorics of Free Probability. To appear soon in the London Mathematical Society Lecture Note Series, vol. 335, Cambridge University Press
Outlook on other talks around free probability

• Anshelevich: "free" orthogonal and Meixner polynomials
• Burda: free random Lévy matrices
• Chatterjee: concentration of measures and free probability
• Demni: free stochastic processes
• Kargin: large deviations in free probability
• Mingo + Speicher: fluctuations of random matrices
• Rashidi Far: operator-valued free probability theory and block matrices