The eigenvalue spacing of iid random matrices
Stephen Ge
UCLA
Introduction

Gaps (or spacings) between eigenvalues of random matrices have attracted interest from various perspectives, e.g.:

- Montgomery's conjecture relating the normalized gaps between zeros of the zeta function to the eigenvalue gaps of a random matrix ensemble
- Simplicity of the spectrum of an Erdős-Rényi graph G(n, p), in relation to the graph isomorphism problem

We will discuss the smallest gap between eigenvalues of iid random matrices.
What is an iid matrix?

Let ξ be a random variable with mean 0 and variance 1. Let A_n be an n × n random matrix with each entry an independent and identically distributed (iid) copy of ξ. Examples include:

- ξ ∼ N_R(0, 1) or ξ ∼ N_C(0, 1)
- ξ is a Bernoulli random variable with p = 1/2

Theorem (Girko, Bai, Götze-Tikhomirov, Pan-Zhou, Tao-Vu '08)
Let {λ_k}_{k=1}^n be the eigenvalues of A_n and define

    µ_n := (1/n) ∑_{k=1}^n δ_{λ_k/√n}

to be the empirical spectral distribution of A_n. Then µ_n converges to the uniform distribution on the (complex) unit disk as n tends to infinity.
Circular law for iid matrices

[Figure: plots of the eigenvalues of 256 × 256 matrices generated using ξ ∼ N_C(0, 1) (left) and ξ ∼ Bernoulli(1/2) (right).]
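A plot like this can be reproduced with a short simulation. The sketch below is an illustration added for this write-up, not part of the talk; it uses numpy and matplotlib, and interprets the Bernoulli entries as ±1 values with probability 1/2 each (the normalization consistent with mean 0 and variance 1).

import numpy as np
import matplotlib.pyplot as plt

n = 256
rng = np.random.default_rng(0)

# Complex Gaussian ensemble: real and imaginary parts each N(0, 1/2), so E|xi|^2 = 1.
A_gauss = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)

# "Bernoulli(1/2)" ensemble, interpreted here as +-1 entries with probability 1/2 each.
A_bern = rng.choice([-1.0, 1.0], size=(n, n))

fig, axes = plt.subplots(1, 2, figsize=(10, 5))
for ax, A, title in zip(axes, [A_gauss, A_bern], ["complex Gaussian", "Bernoulli"]):
    evals = np.linalg.eigvals(A) / np.sqrt(n)   # rescale by 1/sqrt(n), as in the circular law
    ax.scatter(evals.real, evals.imag, s=4)
    ax.add_patch(plt.Circle((0, 0), 1.0, fill=False))  # unit circle for reference
    ax.set_aspect("equal")
    ax.set_title(title)
plt.show()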
Simplicity of the spectrum

What can be said about P(A has simple spectrum)? The symmetric case:

Theorem (Tao-Vu '14)
Let M_n = (ξ_ij) be a real symmetric random matrix where the ξ_ij are jointly independent for i < j and ξ_ji = ξ_ij. Under suitable distributional assumptions on the ξ_ij, the spectrum of M_n is simple with probability at least 1 − n^{-B} for any fixed B > 0.

Now let A be iid as before.

- When ξ is Gaussian (or has any absolutely continuous distribution), P(A has simple spectrum) = 1.
- When ξ is Bernoulli, a repeated eigenvalue occurs with nonzero probability. For example, if three columns are all multiples of each other, then 0 is a repeated eigenvalue.

In the iid case, we will obtain simplicity of the spectrum as a corollary of a polynomial tail bound for the spacing between eigenvalues.
Main Theorem

Theorem (G. '16)
Let A be an n × n matrix with each entry an iid copy of ξ. Let λ_1, . . . , λ_n be the eigenvalues of A and let

    ∆ := min_{i≠j} |λ_i − λ_j|

be the minimum gap between any pair of eigenvalues of A. Then, for large enough C > 0:

    P(∆ < n^{-C}) = o(1)

Asymptotic notation (big O, little o) will always be with respect to the size of the matrix, n → ∞.

Qualitatively, the theorem implies that iid random matrices have simple spectrum asymptotically almost surely. The o(1) error can be made more precise under stronger moment assumptions, e.g. subgaussian entries.
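As a numerical companion to the theorem (an illustration added here, not part of the talk), the following sketch computes ∆ for a sample Bernoulli matrix.

import numpy as np

def min_eigenvalue_gap(A):
    """Return Delta = min_{i != j} |lambda_i - lambda_j| for a square matrix A."""
    evals = np.linalg.eigvals(A)
    # Pairwise distances between eigenvalues; exclude the zero diagonal.
    diffs = np.abs(evals[:, None] - evals[None, :])
    np.fill_diagonal(diffs, np.inf)
    return diffs.min()

rng = np.random.default_rng(1)
n = 256
A = rng.choice([-1.0, 1.0], size=(n, n))  # Bernoulli (+-1) iid matrix
print(min_eigenvalue_gap(A))              # Delta as in the theorem (eigenvalues of A itself)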
Assumptions for atom random variables

We will suppose the following hold for the (real-valued) atom random variable ξ:

- (mean zero) Eξ = 0
- (bounded fourth moment) E|ξ|^4 < ∞
- (non-degeneracy) There exist K, p > 0 such that

    sup_{u∈R} P(|ξ − u| < 1) ≤ 1 − p

  and

    P(1 ≤ |ξ − ξ'| ≤ K) ≥ p/2,

  where ξ' is an iid copy of ξ.

The standard Gaussian and Bernoulli random variables both satisfy the above assumptions.
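For concreteness, here is a short check, added for this write-up, that the ±1 Bernoulli variable satisfies the non-degeneracy condition; the particular choices K = 2 and p = 1/2 are only illustrative.

\begin{align*}
\sup_{u \in \mathbb{R}} \mathbb{P}(|\xi - u| < 1) &= \tfrac{1}{2} \le 1 - p
  && \text{(the interval $(u-1, u+1)$ contains at most one of $\pm 1$)},\\
\mathbb{P}(1 \le |\xi - \xi'| \le K) &= \mathbb{P}(\xi \ne \xi') = \tfrac{1}{2} \ge \tfrac{p}{2}
  && \text{(since $|\xi - \xi'| \in \{0, 2\}$ and $K = 2$)}.
\end{align*}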
First reduction

We first argue that it suffices to establish

    P(λ_i, λ_j ∈ B(z, n^{-C}) and ‖A‖_op = O(√n)) = o(n^{-2C})

for any z ∈ C with z = O(1).

[start figure on board]

Proof of reduction

From the fourth moment assumption, we may work outside an o(1) event where the operator norm ‖A‖_op is larger than O(√n). Let B be a ball of radius O(1) that contains all the eigenvalues of (1/√n) A. We will proceed via a covering argument.

Cover B with balls of radius r = C_1 n^{-C} centered at {z_α} such that if λ_i, λ_j are two points in B with |λ_i − λ_j| < n^{-C}, then there exists z_α such that λ_i, λ_j ∈ B(z_α, r). The number of distinct balls needed is O(r^{-2}) = O(n^{2C}). We now use a union bound:

    P(∆ < n^{-C}) ≤ o(1) + ∑_α P(λ_i, λ_j ∈ B(z_α, r) and ‖A‖_op = O(√n))

Since there are O(n^{2C}) terms, assuming each summand is o(n^{-2C}) gives the o(1) bound for P(∆ < n^{-C}).
Dealing with eigenvectors of non-Hermitian matrices

Fix z = O(1) and denote

    N := (1/√n) A − z I_n

We want to bound the probability that there exist eigenvalues λ_i, λ_j that are O(n^{-C})-close to z. A single eigenvalue within O(n^{-C}) of z already implies s_n(N) = O(n^{-C}).

This is not enough for the o(n^{-2C}) bound we seek, even in the Gaussian case. We would like a bound on the second smallest singular value coming from the presence of a second eigenvalue. However, the eigenvectors associated to λ_i, λ_j may not be orthogonal, so it need not be the case that s_{n-1}(N) = O(n^{-C}).
Orthogonal vectors lemma

Lemma
Suppose λ_i, λ_j ∈ B(z, n^{-C}) and ‖A‖_op = O(√n). Then there exist orthogonal unit vectors v, w ∈ C^n and a = O(1) such that Nv = (λ_i − z)v and Nw = (λ_j − z)w + av. In particular we have ‖Nv‖, ‖Nw − av‖ = O(n^{-C}).

Proof.
Suppose λ_i ≠ λ_j and let v_i, v_j be corresponding unit eigenvectors. Take v = v_i and w to be a unit vector orthogonal to v such that v, w span the same plane as v_i, v_j. The claims for Nv follow.

Writing w in terms of v_i, v_j and expanding Nw gives

    Nw = (λ_j − z)w + av

for some a. We finish by estimating

    |a| = ‖av‖ ≤ ‖Nw‖ + ‖(λ_j − z)w‖ = O(1)
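To spell out the "expanding Nw" step (a sketch added here; the coefficients α, β are introduced only for this computation): since v_i, v_j are linearly independent, we may write v_j = αv + βw with β ≠ 0, and then

\begin{align*}
Nw &= \tfrac{1}{\beta}\bigl(Nv_j - \alpha\, Nv\bigr)
    = \tfrac{1}{\beta}\bigl((\lambda_j - z)v_j - \alpha(\lambda_i - z)v\bigr) \\
   &= \tfrac{1}{\beta}\bigl((\lambda_j - z)(\alpha v + \beta w) - \alpha(\lambda_i - z)v\bigr)
    = (\lambda_j - z)w + \tfrac{\alpha}{\beta}(\lambda_j - \lambda_i)\, v,
\end{align*}

so a = (α/β)(λ_j − λ_i).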
Almost orthogonal eigenvectors

We now seek to bound the event that there exist orthogonal unit vectors v, w and a = O(1) such that

    Nv = (λ_i − z)v and Nw = (λ_j − z)w + av,

where

    N := (1/√n) A − z I_n

The parameter a measures how close to orthogonal the associated eigenvectors are. In the ideal "almost orthogonal" situation a = O(n^{-C}), we have ‖Nv‖ = O(n^{-C}) and ‖Nw‖ ≤ ‖Nw − av‖ + ‖av‖ = O(n^{-C}). Since v, w are orthogonal unit vectors, we thus have s_n(N), s_{n-1}(N) = O(n^{-C}) in this case.

[start running example]
The general case: |a| = Ω(n^{-C})

Lemma
Let N be an n × n matrix. Suppose there exist orthogonal unit vectors v, w ∈ C^n and a = Ω(n^{-C}) such that

    Nv = (λ_i − z)v and Nw = (λ_j − z)w + av,

where |λ_i − z|, |λ_j − z| = O(n^{-C}). Then:

    s_n(N) = O(n^{-2C}/|a|)

Proof.
(via Rudelson)

    s_n(N) ≤ s_2(N|_{span(v,w)}) ≤ dist(Nv, span(Nw)) ≤ |λ_i − z||λ_j − z|/|a|
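One way to verify the last inequality in the chain (a computation added here for completeness): writing u for the unit vector in the direction of Nw, so u = ((λ_j − z)w + av)/√(|λ_j − z|² + |a|²), orthonormality of v, w gives

\begin{align*}
\operatorname{dist}(Nv,\operatorname{span}(Nw))
  &= |\lambda_i - z|\,\operatorname{dist}(v,\operatorname{span}(u))
   = |\lambda_i - z|\sqrt{1 - |\langle v, u\rangle|^{2}} \\
  &= \frac{|\lambda_i - z|\,|\lambda_j - z|}{\sqrt{|\lambda_j - z|^{2} + |a|^{2}}}
   \le \frac{|\lambda_i - z|\,|\lambda_j - z|}{|a|},
\end{align*}

and the hypotheses |λ_i − z|, |λ_j − z| = O(n^{-C}) then yield s_n(N) = O(n^{-2C}/|a|).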
Wrap up

In summary:

- If a = O(n^{-C}), then s_n(N) = O(n^{-C}) and s_{n-1}(N) = O(n^{-C}).
- If a = Ω(n^{-C}), then s_n(N) = O(n^{-2C}/|a|) and s_{n-1}(N) = O(|a|).

The required o(n^{-2C}) probability bound can now be obtained from the following result on the smallest singular values:

    P(s_n(A − z√n) ≤ t, s_{n-1}(A − z√n) ≤ s, and ‖A‖_op = O(√n)) = O(t² s² n^{O(1)}) + exp(−cn)

The above result assumes that the atom variables are complex valued with independent real and imaginary parts. More concretely, take ξ = ξ_1 + iξ_2 with ξ_1, ξ_2 independent and satisfying all of the assumptions we had before, and let A be an n × n matrix with each entry an iid copy of ξ.

The assumption that the entries have independent real and imaginary parts is essential for the quadratic exponent.
Interlude

Complications in the real case

In the following discussion we only work with the least singular value s_n. The quadratic exponent is no longer present if the atom random variables are assumed to be real valued.

Theorem (Edelman)
Let ξ be the real Gaussian with mean 0 and variance 1 and let N be an n × n random matrix with each entry an iid copy of ξ. Then:

    P(s_n(N) ≤ t) ≤ n^{1/2} t

for any t > 0.
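To illustrate the contrast with the complex case, the sketch below (an illustration added here, not from the talk) estimates P(s_n(N) ≤ t) by Monte Carlo for real and complex Gaussian matrices; empirically the hit rate scales roughly like √n·t in the real case and quadratically smaller, roughly like n·t², in the complex case.

import numpy as np

def smallest_singular_value(A):
    # Smallest singular value s_n(A); svd returns singular values in descending order.
    return np.linalg.svd(A, compute_uv=False)[-1]

rng = np.random.default_rng(2)
n, trials = 100, 3000
t = 0.2 / np.sqrt(n)  # threshold at the natural 1/sqrt(n) scale

hits_real, hits_complex = 0, 0
for _ in range(trials):
    N_real = rng.standard_normal((n, n))
    N_complex = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    hits_real += smallest_singular_value(N_real) <= t
    hits_complex += smallest_singular_value(N_complex) <= t

print("real:    P(s_n <= t) ~", hits_real / trials)     # on the order of sqrt(n) * t
print("complex: P(s_n <= t) ~", hits_complex / trials)  # on the order of (sqrt(n) * t)^2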
Least singular value results

Theorem (Sankar-Spielman-Teng, Tao-Vu, Rudelson-Vershynin)
Let ξ be a normalized random variable and let N be an n × n random matrix with each entry an iid copy of ξ. Let M be a deterministic shift matrix. Then:

    P(s_n(N + M) ≤ t) = O(n^{1/2} t) + error

These results, with a bound of t instead of t², are not sufficient for the full covering approach. However, they still work if we are not trying to cover the entire unit disk. For example, using these least singular value results and the same argument as before, we can establish eigenvalue spacing along the real line.
[start figure on board]
Invertibility away from the real axis

Off the real line, the behavior reverts to the complex Gaussian case, with a correction given by the imaginary part of the shift z.

Theorem (G. '16)
Let ξ be a real valued random variable satisfying the assumptions at the start and let A be an n × n random matrix with each entry an iid copy of ξ. Let z be a complex number with z = O(1) and |Im(z)| ≥ δ > 0. Then:

    P(s_n(A − z√n) ≤ t and ‖A‖_op = O(√n)) = O((t²/δ) n^{O(1)}) + exp(−cn)
[finish proof on board]
A look inside the least singular value theorem

Bounding the event s_n(M) ≤ t is typically reduced to bounding the distance

    dist(X_n, H_n) < t n^{O(1)},

where X_n is the nth column of M and H_n is the span of the first n − 1 columns. We can further bound the probability by considering a unit vector X* that is orthogonal to the first n − 1 columns:

    P(dist(X_n, H_n) < t n^{O(1)}) ≤ P(|⟨X*, X_n⟩| < t n^{O(1)})

When X_n consists of complex Gaussian entries, the above concentration probability has no dependence on X*, and we can obtain a bound of O(t² n^{O(1)}).
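To see where the quadratic exponent comes from, note that for any unit vector X* the inner product Z = ⟨X*, X_n⟩ of X* with a standard complex Gaussian column is again a standard complex Gaussian (normalized so that E|Z|² = 1), and a direct computation (added here for illustration) gives

\begin{align*}
\mathbb{P}(|Z| < s)
  = \int_{0}^{s} \frac{1}{\pi} e^{-r^{2}} \, 2\pi r \, dr
  = 1 - e^{-s^{2}}
  \le s^{2},
\end{align*}

which with s = t n^{O(1)} gives the O(t² n^{O(1)}) bound.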
Structure of normal vector

In the real Gaussian case, the concentration probability depends heavily on the structure of the vector X*. For instance, if the coordinates of X* all lie on a line in C, then the best bound we can obtain is O(t n^{O(1)}). On the other hand, a sufficiently "two-dimensional" X* will cause ⟨X*, X_n⟩ to spread out in two dimensions, and we may obtain an improved bound of O(t² n^{O(1)}).

The structure of X* will come from the fact that it is orthogonal to the first n − 1 columns of A − z√n, where A has all real entries. The only complex part of the equation is the imaginary part of z, which we are assuming is bounded below (in absolute value) by δ. This forces X* not to lie entirely on a single line.
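The following sketch (illustrative only; the particular vectors and parameters are ad hoc choices, not from the talk) compares the small-ball probability P(|⟨X*, X_n⟩| < t) for a real Bernoulli column X_n when the coordinates of X* lie on a single line in C versus a genuinely "two-dimensional" X*.

import numpy as np

rng = np.random.default_rng(3)
n, trials, t = 400, 20000, 0.05

# Two candidate unit normal vectors: one with all coordinates on a line in C,
# one whose real and imaginary parts point in genuinely different directions.
x_line = np.ones(n) / np.sqrt(n)                      # all coordinates real (on a line)
x_2d = (np.ones(n) + 1j * np.where(np.arange(n) % 2 == 0, 1.0, -1.0)) / np.sqrt(2 * n)

X = rng.choice([-1.0, 1.0], size=(trials, n))          # real Bernoulli columns X_n
for name, x in [("line", x_line), ("two-dimensional", x_2d)]:
    inner = X @ np.conj(x)                             # <X*, X_n> for each trial
    print(name, ":", np.mean(np.abs(inner) < t))       # ~ t in 1D, ~ t^2 in 2D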
Thank you!