Free Probability Theory and Random Matrices

Roland Speicher
Queen's University, Kingston, Canada

We are interested in the limiting eigenvalue distribution of N × N random matrices for N → ∞. Usually, large N distributions are close to the N → ∞ limit, and asymptotic results give good predictions for finite N.

We can consider the convergence for N → ∞ of
• the eigenvalue distribution of one "typical" realization of the N × N random matrix
• the averaged eigenvalue distribution over many realizations of the N × N random matrices

Consider (selfadjoint!) Gaussian N × N random matrix. We have almost sure convergence (convergence of "typical" realization) of its eigenvalue distribution towards Wigner's semicircle.

[Figure: eigenvalue histograms for N = 300, N = 1000, and N = 3000; horizontal axis from −3 to 3, vertical axis: probability.]

Convergence of the averaged eigenvalue distribution happens usually much faster, very good agreement with asymptotic limit for moderate N.

[Figure: averaged eigenvalue histograms for N = 5, N = 20, and N = 50; trials = 5000; horizontal axis from −3 to 3, vertical axis: probability.]

Consider Wishart random matrix $A = XX^*$, where X is N × M random matrix with independent Gaussian entries. Its eigenvalue distribution converges (averaged and almost surely) towards Marchenko-Pastur distribution.
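The two ensembles above can be simulated directly. The following is a minimal NumPy sketch, not taken from the talk. The normalizations (Gaussian entries of variance 1/N, and A = XX*/M in the Wishart case) and all function names are assumptions, chosen so that the spectra land on the usual scale of the semicircle and Marchenko-Pastur laws.

```python
import numpy as np


def gaussian_selfadjoint(n, rng):
    """Selfadjoint Gaussian n x n matrix, scaled so that E[tr(A^2)] = 1 (assumed normalization)."""
    g = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    return (g + g.conj().T) / np.sqrt(2 * n)


def wishart(n, m, rng):
    """Wishart matrix A = X X^* / m, with X an n x m complex Gaussian matrix (assumed normalization)."""
    x = (rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m))) / np.sqrt(2)
    return x @ x.conj().T / m


def empirical_moments(matrix, k_max):
    """Normalized trace moments tr(A^k) = (1/N) * sum_i lambda_i^k for k = 1..k_max."""
    eigs = np.linalg.eigvalsh(matrix)
    return [float(np.mean(eigs ** k)) for k in range(1, k_max + 1)]


rng = np.random.default_rng(0)
# One "typical" realization; for large n the moments should be near 0, 1, 0, 2 (semicircle).
print(empirical_moments(gaussian_selfadjoint(300, rng), 4))
# Wishart with M = 2N; the first two moments should be near 1 and 1.5 under this scaling.
print(empirical_moments(wishart(300, 600, rng), 2))
```

Averaging `empirical_moments` over many independent draws gives the averaged eigenvalue distribution instead of a single realization.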
Example: M = 2N, 2000 trials

[Figure: eigenvalue histograms for N = 10 and N = 50; horizontal axis from −0.5 to 4, vertical axis: probability.]

We want to consider more complicated situations, built out of simple cases (like Gaussian or Wishart) by doing operations like
• taking the sum of two matrices
• taking the product of two matrices
• taking corners of matrices

Note: If different N × N random matrices A and B are involved, then the eigenvalue distribution of non-trivial functions f(A, B) (like A + B or AB) will of course depend on the relation between the eigenspaces of A and of B.

However: It turns out there is a deterministic and treatable result if
• the eigenspaces are in "generic" position and
• if N → ∞
This is the realm of free probability theory.

Consider N × N random matrices A and C such that
• A has an asymptotic eigenvalue distribution for N → ∞ and C has an asymptotic eigenvalue distribution for N → ∞
• A and C are independent (i.e., entries of A are independent from entries of C)

Then eigenspaces of A and of C might still be in special relation (e.g., both A and C could be diagonal).

However, consider now A and $B := UCU^*$, where U is Haar unitary N × N random matrix. Then, eigenspaces of A and of B are in "generic" position and the asymptotic eigenvalue distribution of A + B depends only on the asymptotic eigenvalue distribution of A and the asymptotic eigenvalue distribution of B (which is the same as the one of C).
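To make "generic position" concrete, the following NumPy sketch (an illustration, not part of the talk) draws a Haar unitary U via the QR decomposition of a complex Gaussian matrix with the usual phase correction, and compares A + C with A + UCU* for two diagonal matrices with eigenvalues ±1. The helper names and the choice of test matrices are assumptions. For this particular choice the aligned sum equals 2A, so its fourth moment is 16, while free probability predicts a limit of 6 for the randomly rotated sum.

```python
import numpy as np


def haar_unitary(n, rng):
    """Sample an n x n Haar unitary via QR of a complex Gaussian matrix, with phase correction."""
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diagonal(r)
    return q * (d / np.abs(d))  # rescale column j by the phase of r[j, j]


def fourth_moment(matrix):
    """Normalized trace of the fourth power, computed from the eigenvalues."""
    eigs = np.linalg.eigvalsh(matrix)
    return float(np.mean(eigs ** 4))


n = 200
rng = np.random.default_rng(0)
a = np.diag(np.repeat([1.0, -1.0], n // 2)).astype(complex)  # half eigenvalues +1, half -1
c = a.copy()                                                  # same spectrum, same eigenspaces
u = haar_unitary(n, rng)
b = u @ c @ u.conj().T                                        # randomly rotated copy of c

aligned = fourth_moment(a + c)  # eigenspaces in special relation: A + C = 2A
generic = fourth_moment(a + b)  # eigenspaces in generic position
print(aligned, generic)
```

Both sums are built from matrices with identical spectra; only the relative position of the eigenspaces differs, and that alone changes the spectrum of the sum.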
We can expect that the asymptotic eigenvalue distribution of f(A, B) depends only on the asymptotic eigenvalue distribution of A and the asymptotic eigenvalue distribution of B if
• A and B are independent
• one of them is unitarily invariant (i.e., the joint distribution of the entries does not change under unitary conjugation)

Note: Gaussian and Wishart random matrices are unitarily invariant.

Thus: the asymptotic eigenvalue distribution of
• the sum of random matrices in generic position, $A + UCU^*$
• the product of random matrices in generic position, $A\,UCU^*$
• corners of unitarily invariant matrices $UCU^*$
should only depend on the asymptotic eigenvalue distribution of A and of C.

Example: sum of independent Gaussian and Wishart (M = 2N) random matrices, averaged over 10000 trials

[Figure: eigenvalue histograms for N = 5 and N = 50; horizontal axis from −3 to 5.]

Example: product of two independent Wishart (M = 5N) random matrices, averaged over 10000 trials

[Figure: eigenvalue histograms for N = 5 and N = 50; horizontal axis from 0 to 3.5.]

Example: upper left corner of size N/2 × N/2 of a randomly rotated N × N projection matrix, with half of the eigenvalues 0 and half of the eigenvalues 1, averaged over 10000 trials

[Figure: eigenvalue histograms for N = 8 and N = 32; horizontal axis from 0 to 1.]

Problems:
• Do we have a conceptual way of understanding these asymptotic eigenvalue distributions?
• Is there an algorithm for actually calculating these asymptotic eigenvalue distributions?

How do we analyze the eigenvalue distributions?

eigenvalue distribution of matrix A ≙ knowledge of traces of powers, $\operatorname{tr}(A^k)$:

$$\operatorname{tr}(A^k) = \frac{1}{N}\left(\lambda_1^k + \cdots + \lambda_N^k\right)$$

averaged eigenvalue distribution of random matrix A ≙ knowledge of expectations of traces of powers, $E[\operatorname{tr}(A^k)]$

Stieltjes inversion formula.
If one knows the asymptotic moments

$$\alpha_k := \lim_{N\to\infty} E[\operatorname{tr}(A^k)]$$

of a random matrix A, then one can get its asymptotic eigenvalue distribution µ as follows: Form the Cauchy (or Stieltjes) transform

$$G(z) := \sum_{k=0}^{\infty} \frac{\alpha_k}{z^{k+1}}$$

Then:

$$d\mu(t) = -\frac{1}{\pi} \lim_{\varepsilon\to 0} \Im G(t + i\varepsilon)\,dt$$

Consider random matrices A and B in generic position. We want to understand A + B, i.e., for all $k \in \mathbb{N}$

$$E\left[\operatorname{tr}\left((A+B)^k\right)\right].$$

But

$$E[\operatorname{tr}((A+B)^6)] = E[\operatorname{tr}(A^6)] + \cdots + E[\operatorname{tr}(ABAABA)] + \cdots + E[\operatorname{tr}(B^6)],$$

thus we need to understand mixed moments in A and B.

Use following notation:

$$\varphi(A) := \lim_{N\to\infty} E[\operatorname{tr}(A)].$$

Question: If A and B are in generic position, can we understand

$$\varphi\left(A^{n_1} B^{m_1} A^{n_2} B^{m_2} \cdots\right)$$

in terms of $\left(\varphi(A^k)\right)_{k\in\mathbb{N}}$ and $\left(\varphi(B^k)\right)_{k\in\mathbb{N}}$?

Example: Consider two Gaussian random matrices A, B which are independent (and thus in generic position). Then the asymptotic mixed moments in A and B, $\varphi\left(A^{n_1} B^{m_1} A^{n_2} B^{m_2} \cdots\right)$, are given by

# non-crossing/planar pairings of the pattern

$$\underbrace{A \cdot A \cdots A}_{n_1\text{-times}} \cdot \underbrace{B \cdot B \cdots B}_{m_1\text{-times}} \cdot \underbrace{A \cdot A \cdots A}_{n_2\text{-times}} \cdot \underbrace{B \cdot B \cdots B}_{m_2\text{-times}} \cdots,$$

which do not pair A with B.

Example: $\varphi(AABBABBA) = 2$, since there are two such non-crossing pairings.

[Diagrams: the two non-crossing pairings of A A B B A B B A. Numbering the letters 1 to 8, they are {(1,2), (3,4), (5,8), (6,7)} and {(1,8), (2,5), (3,4), (6,7)}.]

Note: each of the pairings connects at least one of the groups $A^{n_1}, B^{m_1}, A^{n_2}, \ldots$ only among itself! And thus:

$$\varphi\Big(\big(A^2 - \varphi(A^2)1\big)\big(B^2 - \varphi(B^2)1\big)\big(A - \varphi(A)1\big)\big(B^2 - \varphi(B^2)1\big)\big(A - \varphi(A)1\big)\Big) = 0$$

In general we have

$$\varphi\Big(\big(A^{n_1} - \varphi(A^{n_1})\cdot 1\big)\cdot\big(B^{m_1} - \varphi(B^{m_1})\cdot 1\big)\cdot\big(A^{n_2} - \varphi(A^{n_2})\cdot 1\big)\cdots\Big)$$

= # non-crossing pairings which do not pair A with B, and for which each group is connected with some other group

= 0

Actual equation for the calculation of the mixed moments $\varphi\left(A^{n_1} B^{m_1} A^{n_2} B^{m_2} \cdots\right)$ is different for different random matrix ensembles.

However, the relation between the mixed moments,

$$\varphi\Big(\big(A^{n_1} - \varphi(A^{n_1})\cdot 1\big)\cdot\big(B^{m_1} - \varphi(B^{m_1})\cdot 1\big)\cdots\Big) = 0,$$

remains the same for matrix ensembles in generic position and constitutes the definition of freeness.
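The counting rule for mixed moments of independent Gaussian matrices can be checked by brute force. The sketch below is an illustration, not code from the talk; the function name `count_noncrossing_pairings` and the encoding of a monomial as a string of letters are assumptions. It counts the non-crossing pairings of a word in which only equal letters are paired, by pairing the first letter with each admissible partner and recursing on the two enclosed segments.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_noncrossing_pairings(word: str) -> int:
    """Count non-crossing pairings of the letters of `word` that only pair equal letters."""
    if not word:
        return 1  # the empty word has exactly one (empty) pairing
    total = 0
    # The first letter must be paired with a letter at an odd index j, so that the
    # segment strictly between them has even length and can be paired within itself.
    for j in range(1, len(word), 2):
        if word[j] == word[0]:
            inside = count_noncrossing_pairings(word[1:j])
            outside = count_noncrossing_pairings(word[j + 1:])
            total += inside * outside
    return total


print(count_noncrossing_pairings("AABBABBA"))  # 2, the example above
print(count_noncrossing_pairings("ABAB"))      # 0, the only A-A, B-B pairing crosses
print(count_noncrossing_pairings("AAAA"))      # 2, the fourth moment of the semicircle
```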
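Definition [Voiculescu 1985]: A and B are free (with respect to $\varphi$) if we have for all $n_1, m_1, n_2, \cdots \geq 1$ that

$$\varphi\Big(\big(A^{n_1} - \varphi(A^{n_1})\cdot 1\big)\cdot\big(B^{m_1} - \varphi(B^{m_1})\cdot 1\big)\cdot\big(A^{n_2} - \varphi(A^{n_2})\cdot 1\big)\cdots\Big) = 0$$

$$\varphi\Big(\big(B^{n_1} - \varphi(B^{n_1})\cdot 1\big)\cdot\big(A^{m_1} - \varphi(A^{m_1})\cdot 1\big)\cdot\big(B^{n_2} - \varphi(B^{n_2})\cdot 1\big)\cdots\Big) = 0$$

In short: $\varphi$(alternating product in centered words in A and in B) = 0.

Note: freeness is a rule for calculating mixed moments in A and B from the moments of A and the moments of B.

Example:

$$\varphi\Big(\big(A^n - \varphi(A^n)1\big)\big(B^m - \varphi(B^m)1\big)\Big) = 0,$$

thus

$$\varphi(A^n B^m) - \varphi(A^n \cdot 1)\varphi(B^m) - \varphi(A^n)\varphi(1 \cdot B^m) + \varphi(A^n)\varphi(B^m)\varphi(1 \cdot 1) = 0,$$

and hence

$$\varphi(A^n B^m) = \varphi(A^n) \cdot \varphi(B^m).$$

Freeness is a rule for calculating mixed moments, analogous to the concept of independence for random variables. Thus freeness is also called free independence.

Note: free independence is a different rule from classical independence; free independence occurs typically for non-commuting random variables.

Example:

$$\varphi\Big(\big(A - \varphi(A)1\big)\cdot\big(B - \varphi(B)1\big)\cdot\big(A - \varphi(A)1\big)\cdot\big(B - \varphi(B)1\big)\Big) = 0,$$

which results in

$$\varphi(ABAB) = \varphi(AA)\cdot\varphi(B)\cdot\varphi(B) + \varphi(A)\cdot\varphi(A)\cdot\varphi(BB) - \varphi(A)\cdot\varphi(B)\cdot\varphi(A)\cdot\varphi(B)$$

Consider A, B free. Then, by freeness, the moments of A + B are uniquely determined by the moments of A and the moments of B.

Notation: We say the distribution of A + B is the free convolution of the distribution of A and the distribution of B,

$$\mu_{A+B} = \mu_A \boxplus \mu_B.$$

In principle, freeness determines this, but the concrete nature of this rule is not clear.

Examples: We have

$$\begin{aligned}
\varphi\big((A+B)^1\big) &= \varphi(A) + \varphi(B) \\
\varphi\big((A+B)^2\big) &= \varphi(A^2) + 2\varphi(A)\varphi(B) + \varphi(B^2) \\
\varphi\big((A+B)^3\big) &= \varphi(A^3) + 3\varphi(A^2)\varphi(B) + 3\varphi(A)\varphi(B^2) + \varphi(B^3) \\
\varphi\big((A+B)^4\big) &= \varphi(A^4) + 4\varphi(A^3)\varphi(B) + 4\varphi(A^2)\varphi(B^2) \\
&\quad + 2\big(\varphi(A^2)\varphi(B)\varphi(B) + \varphi(A)\varphi(A)\varphi(B^2) - \varphi(A)\varphi(B)\varphi(A)\varphi(B)\big) \\
&\quad + 4\varphi(A)\varphi(B^3) + \varphi(B^4)
\end{aligned}$$

To treat these formulas in general, linearize the free convolution by going over from moments $\left(\varphi(A^m)\right)_{m\in\mathbb{N}}$ to free cumulants $(\kappa_m)_{m\in\mathbb{N}}$. Those are defined by relations like:

$$\begin{aligned}
\varphi(A^1) &= \kappa_1 \\
\varphi(A^2) &= \kappa_2 + \kappa_1^2 \\
\varphi(A^3) &= \kappa_3 + 3\kappa_1\kappa_2 + \kappa_1^3 \\
\varphi(A^4) &= \kappa_4 + 4\kappa_1\kappa_3 + 2\kappa_2^2 + 6\kappa_1^2\kappa_2 + \kappa_1^4 \\
&\;\;\vdots
\end{aligned}$$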
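The moment-cumulant relations listed above can be generated mechanically. The following sketch uses the standard recursion $m_n = \sum_{s=1}^{n} \kappa_s \sum_{i_1+\cdots+i_s = n-s} m_{i_1}\cdots m_{i_s}$ with $m_0 = 1$, where the inner sum is the coefficient of $z^{n-s}$ in the s-th power of the moment series. This recursion is not written out in the slides, and the function names are assumptions.

```python
def poly_mul(a, b, deg):
    """Multiply two coefficient lists and truncate the product at degree `deg`."""
    out = [0] * (deg + 1)
    for i, ai in enumerate(a[:deg + 1]):
        if ai == 0:
            continue
        for j, bj in enumerate(b[:deg + 1 - i]):
            out[i + j] += ai * bj
    return out


def moments_from_free_cumulants(kappa, n_max):
    """Return [m_0, ..., m_n_max] from free cumulants, where kappa[s - 1] is kappa_s.

    Cumulants beyond len(kappa) are treated as zero.
    """
    m = [1] + [0] * n_max
    for n in range(1, n_max + 1):
        power = [1] + [0] * n_max  # coefficients of M(z)^0, with M(z) = sum_i m_i z^i
        total = 0
        for s in range(1, n + 1):
            # M(z)^s; only coefficients up to degree n - 1 are read, and those
            # depend only on the already computed m_0, ..., m_{n-1}.
            power = poly_mul(power, m, n_max)
            k_s = kappa[s - 1] if s <= len(kappa) else 0
            total += k_s * power[n - s]
        m[n] = total
    return m


# Semicircle: kappa_2 = 1 and all other free cumulants vanish.
print(moments_from_free_cumulants([0, 1], 6))  # [1, 0, 1, 0, 2, 0, 5]
```

With $\kappa = (2, 3, 5, 7)$ the function returns 153 for the fourth moment, which agrees with $\kappa_4 + 4\kappa_1\kappa_3 + 2\kappa_2^2 + 6\kappa_1^2\kappa_2 + \kappa_1^4 = 7 + 40 + 18 + 72 + 16$.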
There is a combinatorial structure behind these formulas, the sums are running over non-crossing partitions:

$\varphi(A^1)$ = [1 diagram] = $\kappa_1$

$\varphi(A^2)$ = [2 diagrams] = $\kappa_2 + \kappa_1\kappa_1$

$\varphi(A^3)$ = [5 diagrams] = $\kappa_3 + \kappa_1\kappa_2 + \kappa_2\kappa_1 + \kappa_2\kappa_1 + \kappa_1\kappa_1\kappa_1$

$\varphi(A^4)$ = [14 diagrams] = $\kappa_4 + 4\kappa_1\kappa_3 + 2\kappa_2^2 + 6\kappa_1^2\kappa_2 + \kappa_1^4$

This combinatorial relation between moments $\left(\varphi(A^m)\right)_{m\in\mathbb{N}}$ and cumulants $(\kappa_m)_{m\in\mathbb{N}}$ can be translated into generating power series. Put

$$G(z) = \frac{1}{z} + \sum_{m=1}^{\infty} \frac{\varphi(A^m)}{z^{m+1}} \qquad \text{(Cauchy transform)}$$

and

$$R(z) = \sum_{m=1}^{\infty} \kappa_m z^{m-1} \qquad \text{(R-transform)}$$

Then we have the relation

$$\frac{1}{G(z)} + R(G(z)) = z.$$

Theorem [Voiculescu 1986, Speicher 1994]: Let A and B be free. Then one has

$$R_{A+B}(z) = R_A(z) + R_B(z),$$

or equivalently

$$\kappa_m^{A+B} = \kappa_m^A + \kappa_m^B \qquad \forall\, m.$$

This, together with the relation between Cauchy transform and R-transform and with the Stieltjes inversion formula, gives an effective algorithm for calculating free convolutions, i.e., sums of random matrices in generic position:

$$\left.\begin{array}{l} A \to G_A \to R_A \\ B \to G_B \to R_B \end{array}\right\} \;\to\; R_A + R_B = R_{A+B} \;\to\; G_{A+B} \;\to\; A + B$$

Example: Wigner + Wishart (M = 2N), trials = 4000

[Figure: eigenvalue histogram for N = 100; horizontal axis from −3 to 5.]

One has similar analytic description for product.

Theorem [Voiculescu 1987, Haagerup 1997, Nica + Speicher 1997]: Put

$$M_A(z) := \sum_{m=1}^{\infty} \varphi(A^m) z^m$$

and define

$$S_A(z) := \frac{1+z}{z}\, M_A^{\langle -1 \rangle}(z) \qquad \text{(S-transform of } A\text{)},$$

where $M_A^{\langle -1 \rangle}$ denotes the inverse of $M_A$ under composition. Then: If A and B are free, we have

$$S_{AB}(z) = S_A(z) \cdot S_B(z).$$

Example: Wishart × Wishart (M = 5N), trials = 1000

[Figure: eigenvalue histogram for N = 100; horizontal axis from 0 to 3.5.]

Example: upper left corner of size N/2 × N/2 of a projection matrix, with N/2 eigenvalues 0 and N/2 eigenvalues 1; trials = 5000

[Figure: eigenvalue histogram for N = 64; horizontal axis from 0 to 1.]

• Free Calculator by Raj Rao and Alan Edelman
• A. Nica and R. Speicher: Lectures on the Combinatorics of Free Probability. To appear soon in the London Mathematical Society Lecture Note Series, vol. 335, Cambridge University Press

Outlook on other talks around free probability
• Anshelevich: "free" orthogonal and Meixner polynomials
• Burda: free random Levy matrices
• Chatterjee: concentration of measures and free probability
• Demni: free stochastic processes
• Kargin: large deviations in free probability
• Mingo + Speicher: fluctuations of random matrices
• Rashidi Far: operator-valued free probability theory and block matrices
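Worked example (not part of the talk): the free-convolution algorithm can be carried out by hand for two free elements A and B that both have the standard semicircle distribution. The only input assumed here is that this distribution has free cumulants $\kappa_2 = 1$ and $\kappa_m = 0$ otherwise, i.e. $R_A(z) = R_B(z) = z$. The steps R-transform, Cauchy transform, Stieltjes inversion then read:

```latex
\begin{align*}
R_{A+B}(z) &= R_A(z) + R_B(z) = 2z, \\
\frac{1}{G(z)} + 2\,G(z) &= z
  \quad\Longrightarrow\quad
  G(z) = \frac{z - \sqrt{z^2 - 8}}{4}
  \quad (\text{branch with } G(z) \sim 1/z \text{ as } z \to \infty), \\
d\mu_{A+B}(t) &= -\frac{1}{\pi} \lim_{\varepsilon \to 0} \Im G(t + i\varepsilon)\,dt
  = \frac{1}{4\pi}\sqrt{8 - t^2}\,dt, \qquad |t| \le 2\sqrt{2}.
\end{align*}
```

So the free convolution of two standard semicircles is again a semicircle, now with variance 2 and support $[-2\sqrt{2}, 2\sqrt{2}]$.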