The eigenvalue spacing of iid random matrices
Stephen Ge, UCLA

Introduction

Gaps (or spacings) between eigenvalues of random matrices have attracted interest from various perspectives, e.g.:

- Montgomery's conjecture relating the normalized gaps between zeros of the zeta function to the eigenvalue gaps of a random matrix ensemble
- Simplicity of the spectrum of an Erdős–Rényi graph G(n, p) in relation to the graph isomorphism problem

We will discuss the smallest gap between eigenvalues of iid random matrices.

What is an iid matrix?

Let ξ be a random variable with mean 0 and variance 1. Let A_n be an n × n random matrix with each entry an independent and identically distributed (iid) copy of ξ. Examples include:

- ξ ∼ N_R(0, 1) or ξ ∼ N_C(0, 1)
- ξ a Bernoulli (±1-valued) random variable with p = 1/2

Theorem (Girko, Bai, Götze–Tikhomirov, Pan–Zhou, Tao–Vu '08). Let {λ_k}_{k=1}^n be the eigenvalues of A_n and define
$$\mu_n := \frac{1}{n}\sum_{k=1}^{n} \delta_{\lambda_k/\sqrt{n}}$$
to be the empirical spectral distribution of A_n. Then μ_n converges to the uniform distribution over the (complex) unit disk as n tends to infinity.

Circular law for iid matrices

[Figure: plots of the eigenvalues of 256 × 256 matrices generated using ξ ∼ N_C(0, 1) (left) and ξ ∼ Bernoulli(1/2) (right).]
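The plots above are easy to reproduce numerically. Below is a minimal sketch (assuming NumPy and Matplotlib are available; the seed and figure size are arbitrary choices) that samples a 256 × 256 matrix for each atom distribution, rescales the eigenvalues by 1/√n as in the empirical spectral distribution, and plots them against the unit circle.

```python
# Sketch: empirical illustration of the circular law for iid matrices.
# n = 256 matches the figure described above; all parameters are illustrative.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n = 256

# Complex Gaussian atom N_C(0, 1): independent real/imaginary parts, total variance 1.
A_gauss = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)

# Bernoulli atom: +1 or -1 with probability 1/2 each (mean 0, variance 1).
A_bern = rng.choice([-1.0, 1.0], size=(n, n))

fig, axes = plt.subplots(1, 2, figsize=(10, 5))
for ax, A, title in zip(axes, [A_gauss, A_bern], ["complex Gaussian", "Bernoulli"]):
    eigs = np.linalg.eigvals(A) / np.sqrt(n)   # rescale by 1/sqrt(n)
    ax.scatter(eigs.real, eigs.imag, s=4)
    theta = np.linspace(0, 2 * np.pi, 400)
    ax.plot(np.cos(theta), np.sin(theta))      # unit circle for reference
    ax.set_aspect("equal")
    ax.set_title(title)
plt.show()
```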
Simplicity of the spectrum

What can be said about P(A has simple spectrum)?

The symmetric case:

Theorem (Tao–Vu '14). Let M_n = (ξ_ij) be a real symmetric random matrix where the ξ_ij are jointly independent for i < j and ξ_ji = ξ_ij. Under suitable distributional assumptions on the ξ_ij, the spectrum of M_n is simple with probability at least 1 − n^{-B} for any fixed B > 0.

Now let A be iid as before.

- When ξ is Gaussian (or has any absolutely continuous distribution), P(A has simple spectrum) = 1.
- When ξ is Bernoulli, a repeated eigenvalue occurs with nonzero probability. For example, three columns all being multiples of each other implies that 0 is a repeated eigenvalue.

In the iid case, we will obtain simplicity of the spectrum as a corollary of a polynomial tail bound for the spacing between eigenvalues.

Main Theorem

Theorem (G. '16). Let A be an n × n matrix with each entry an iid copy of ξ. Let λ_1, ..., λ_n be the eigenvalues of A and let
$$\Delta := \min_{i \neq j} |\lambda_i - \lambda_j|$$
be the minimum gap between any pair of eigenvalues of A. Then for large enough C > 0,
$$P(\Delta < n^{-C}) = o(1).$$

Asymptotic notation (big O, little o) will always be with respect to the size of the matrix, n → ∞. Qualitatively, the theorem implies that iid random matrices have simple spectrum asymptotically almost surely. The o(1) error can be made more precise under stronger moment assumptions, e.g. subgaussian.

Assumptions for the atom random variable

We will suppose the following hold for the (real-valued) atom random variable ξ:

- (mean zero) Eξ = 0
- (bounded fourth moment) E|ξ|^4 < ∞
- (non-degeneracy) There exist K, p > 0 such that
$$\sup_{u \in \mathbb{R}} P(|\xi - u| < 1) \le 1 - p \quad \text{and} \quad P(1 \le |\xi - \xi'| \le K) \ge p/2,$$
where ξ' is an iid copy of ξ.

The standard Gaussian and Bernoulli random variables both satisfy these assumptions.

First reduction

We first argue that it suffices to establish
$$P\big(\lambda_i, \lambda_j \in B(z, n^{-C}) \text{ and } \|A\|_{\mathrm{op}} = O(\sqrt{n})\big) = o(n^{-2C})$$
for any z ∈ C with z = O(1). [start figure on board]

Proof of reduction

By the fourth moment assumption, the event that the operator norm ‖A‖_op exceeds O(√n) has probability o(1), so we may work outside this event. Let B be a ball of radius O(1) that contains all the eigenvalues of (1/√n)A. We proceed via a covering argument. Cover B with balls of radius r = C_1 n^{-C} centered at points {z_α}, chosen so that if λ_i, λ_j are two points in B with |λ_i − λ_j| < n^{-C}, then there exists z_α with λ_i, λ_j ∈ B(z_α, r). The number of distinct balls needed is O(r^{-2}) = O(n^{2C}). We now use a union bound:
$$P(\Delta < n^{-C}) \le o(1) + \sum_{\alpha} P\big(\lambda_i, \lambda_j \in B(z_\alpha, r) \text{ and } \|A\|_{\mathrm{op}} = O(\sqrt{n})\big).$$
If each summand is o(n^{-2C}), then, since there are only O(n^{2C}) summands, this gives the desired o(1) bound for P(∆ < n^{-C}).

Dealing with eigenvectors of non-Hermitian matrices

Fix z = O(1) and write
$$N := \frac{1}{\sqrt{n}} A - z I_n.$$
We want to bound the probability that there exist eigenvalues λ_i, λ_j that are O(n^{-C})-close to z. A single eigenvalue within O(n^{-C}) of z already implies s_n(N) = O(n^{-C}), but this is not enough for the o(n^{-2C}) bound we seek, even in the Gaussian case. We would like a bound on the second smallest singular value coming from the presence of a second eigenvalue. However, the eigenvectors associated to λ_i, λ_j may not be orthogonal, so it need not be the case that s_{n−1}(N) = O(n^{-C}).

Orthogonal vectors lemma

Lemma. Suppose λ_i, λ_j ∈ B(z, n^{-C}) and ‖A‖_op = O(√n). Then there exist orthogonal unit vectors v, w ∈ C^n and a = O(1) such that
$$Nv = (\lambda_i - z)v \quad \text{and} \quad Nw = (\lambda_j - z)w + av.$$
In particular, ‖Nv‖, ‖Nw − av‖ = O(n^{-C}).

Proof. Suppose λ_i ≠ λ_j and let v_i, v_j be corresponding unit eigenvectors. Take v = v_i and take w to be a unit vector orthogonal to v such that v, w span the same plane as v_i, v_j. The claims for Nv follow. Writing w in terms of v_i, v_j and expanding Nw gives Nw = (λ_j − z)w + av for some a. We finish by estimating
$$|a| = \|av\| \le \|Nw\| + \|(\lambda_j - z)w\| = O(1).$$

Almost orthogonal eigenvectors

We now seek to bound the probability of the event that there exist orthogonal v, w and a = O(1) such that Nv = (λ_i − z)v and Nw = (λ_j − z)w + av, where N := (1/√n)A − zI_n. The parameter a measures how close to orthogonal the associated eigenvectors are. In the ideal "almost orthogonal" situation a = O(n^{-C}), we have ‖Nv‖ = O(n^{-C}) and ‖Nw‖ ≤ ‖Nw − av‖ + ‖av‖ = O(n^{-C}). Since v, w are orthogonal unit vectors, we thus have s_n(N), s_{n−1}(N) = O(n^{-C}) in this case. [start running example]

The general case: |a| = Ω(n^{-C})

Lemma. Let N be an n × n matrix. Suppose there exist orthogonal unit vectors v, w ∈ C^n and a = Ω(n^{-C}) such that Nv = (λ_i − z)v and Nw = (λ_j − z)w + av, where |λ_i − z|, |λ_j − z| = O(n^{-C}). Then
$$s_n(N) = O\!\left(\frac{n^{-2C}}{|a|}\right).$$

Proof (via Rudelson).
$$s_n(N) \le s_2\big(N|_{\mathrm{span}(v,w)}\big) \le \mathrm{dist}\big(Nv, \mathrm{span}(Nw)\big) \le \frac{|\lambda_i - z|\,|\lambda_j - z|}{|a|}.$$

Wrap up

In summary:

- If a = O(n^{-C}), then s_n(N) = O(n^{-C}) and s_{n−1}(N) = O(n^{-C}).
- If a = Ω(n^{-C}), then s_n(N) = O(n^{-2C}/|a|) and s_{n−1}(N) = O(|a|) (the latter because ‖Nv‖ and ‖Nw‖ are both O(|a|) in this regime).

The required o(n^{-2C}) probability bound can now be obtained from the following smallest singular value result:
$$P\big(s_n(A - z\sqrt{n}) \le t,\ s_{n-1}(A - z\sqrt{n}) \le s \text{ and } \|A\|_{\mathrm{op}} = O(\sqrt{n})\big) = O\big(t^2 s^2 n^{O(1)}\big) + \exp(-cn).$$
This result assumes that the atom variables are complex-valued with independent real and imaginary parts. More concretely, take ξ = ξ_1 + iξ_2 with ξ_1, ξ_2 independent and satisfying all of the assumptions above, and let A be an n × n matrix with each entry an iid copy of ξ. The assumption that the entries have independent real and imaginary parts is essential for the quadratic exponent.
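Before turning to the real case, the singular value dichotomy in the wrap-up above can be sanity-checked numerically. The sketch below is illustrative only (it assumes NumPy and uses a Bernoulli iid matrix with arbitrary size and seed): it locates the closest pair of eigenvalues of (1/√n)A, centers z between them, compares the gap with the two smallest singular values of N = (1/√n)A − zI_n, and recovers the parameter a from the lemma by orthogonalizing the two eigenvectors.

```python
# Sketch: relate a small eigenvalue gap near z to the smallest singular values
# of N = A/sqrt(n) - z*I, and recover the "orthogonality" parameter a.
import numpy as np

rng = np.random.default_rng(1)
n = 200
A = rng.choice([-1.0, 1.0], size=(n, n))          # Bernoulli iid matrix

vals, vecs = np.linalg.eig(A / np.sqrt(n))

# Closest pair of eigenvalues of A/sqrt(n); the identity masks the zero diagonal.
diff = np.abs(vals[:, None] - vals[None, :]) + np.eye(n)
i, j = np.unravel_index(np.argmin(diff), diff.shape)
z = (vals[i] + vals[j]) / 2
gap = np.abs(vals[i] - vals[j])

# Orthogonalize as in the lemma: v = v_i, w a unit vector orthogonal to v in span(v_i, v_j).
v = vecs[:, i] / np.linalg.norm(vecs[:, i])
w = vecs[:, j] - np.vdot(v, vecs[:, j]) * v
w = w / np.linalg.norm(w)

N = A / np.sqrt(n) - z * np.eye(n)
s = np.linalg.svd(N, compute_uv=False)            # singular values in decreasing order
a = np.vdot(v, N @ w)                             # coefficient in Nw = (lambda_j - z) w + a v

print("gap        :", gap)
print("s_n(N)     :", s[-1])
print("s_{n-1}(N) :", s[-2])
print("|a|        :", abs(a))
```

In a typical run, s_n(N) is at most half the gap (since Nv = (λ_i − z)v), while s_{n−1}(N) and |a| reflect how close to orthogonal the two eigenvectors happen to be.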
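The role of the complex entries can also be seen in a small Monte Carlo experiment. The following sketch (Gaussian atoms only, with illustrative sample sizes and thresholds chosen here, not taken from the talk) estimates P(s_n ≤ t) for real versus complex Gaussian iid matrices: the real case decays roughly like t, while the complex case decays roughly like t², which is the quadratic exponent referred to above. The precise real-Gaussian bound is Edelman's theorem, stated below.

```python
# Sketch: Monte Carlo comparison of the small-ball probability P(s_n <= t)
# for real vs complex Gaussian iid matrices (illustrative parameters only).
import numpy as np

rng = np.random.default_rng(2)
n, trials = 100, 2000

def smallest_singular_values(complex_entries):
    out = np.empty(trials)
    for k in range(trials):
        if complex_entries:
            A = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
        else:
            A = rng.standard_normal((n, n))
        out[k] = np.linalg.svd(A, compute_uv=False)[-1]   # s_n(A)
    return out

s_real = smallest_singular_values(False)
s_cplx = smallest_singular_values(True)

for t in [0.005, 0.01, 0.02, 0.04]:
    # Expect the real tail to scale roughly like t and the complex tail roughly like t^2.
    print(f"t = {t:>6}:  P_real = {np.mean(s_real <= t):.4f}   P_complex = {np.mean(s_cplx <= t):.4f}")
```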
Interlude: complications in the real case

In the following discussion we only work with the least singular value s_n. The quadratic exponent is no longer present if the atom random variables are assumed to be real valued.

Theorem (Edelman). Let ξ be the real Gaussian with mean 0 and variance 1, and let N be an n × n random matrix with each entry an iid copy of ξ. Then
$$P(s_n(N) \le t) \le n^{1/2} t$$
for any t > 0.

Least singular value results

Theorem (Sankar–Spielman–Teng, Tao–Vu, Rudelson–Vershynin). Let ξ be a normalized random variable and let N be an n × n random matrix with each entry an iid copy of ξ. Let M be a deterministic shift matrix. Then
$$P\big(s_n(N + M) \le t\big) = O(n^{1/2} t) + \text{error}.$$

These results, with a bound of t instead of t², are not sufficient for the full covering approach. However, they still suffice if we are not trying to cover the entire unit disk. For example, using these least singular value results and the same argument as before, we can establish eigenvalue spacing along the real line. [start figure on board]

Invertibility away from the real axis

Off of the real line, the behavior reverts to the complex Gaussian case, with a correction given by the imaginary part of the shift z.

Theorem (G. '16). Let ξ be a real-valued random variable satisfying the assumptions at the start, and let A be an n × n random matrix with each entry an iid copy of ξ. Let z be a complex number with z = O(1) and |ℑ(z)| ≥ δ > 0. Then
$$P\big(s_n(A - z\sqrt{n}) \le t \text{ and } \|A\|_{\mathrm{op}} = O(\sqrt{n})\big) = O\!\left(\frac{t^2}{\delta}\, n^{O(1)}\right) + \exp(-cn).$$
[finish proof on board]

A look inside the least singular value theorem

Bounding the event s_n(M) ≤ t is typically reduced to bounding the probability that the distance dist(X_n, H_n) is less than t n^{O(1)}, where X_n is the nth column of M and H_n is the span of the first n − 1 columns. We can bound this probability further by considering a unit vector X* that is orthogonal to the first n − 1 columns:
$$P\big(\mathrm{dist}(X_n, H_n) < t n^{O(1)}\big) \le P\big(|\langle X^*, X_n\rangle| < t n^{O(1)}\big).$$
When X_n consists of complex Gaussian entries, this concentration probability has no dependence on X*, and we can obtain a bound of O(t² n^{O(1)}).

Structure of the normal vector

In the real Gaussian case, the concentration probability depends heavily on the structure of the vector X*. For instance, if the coordinates of X* all lie on a line in C, then the best bound we can obtain is O(t n^{O(1)}). On the other hand, a sufficiently "two-dimensional" X* will cause ⟨X*, X_n⟩ to spread out in two dimensions, and we may obtain the improved bound O(t² n^{O(1)}). The structure of X* comes from the fact that it is orthogonal to the first n − 1 columns of A − z√n, where A has all real entries. The only complex part of the equation is the imaginary part of z, which we are assuming is bounded below by δ. This forces X* not to lie entirely on a single line.

Thank you!