arXiv (math.SP), Jan.

RANDOM COVARIANCE MATRICES: UNIVERSALITY OF LOCAL STATISTICS OF EIGENVALUES

TERENCE TAO AND VAN VU

Abstract. We study the eigenvalues of the covariance matrix $\frac{1}{n} M^* M$ of a large rectangular matrix $M = M_{n,p} = (\zeta_{ij})_{1 \le i \le p;\, 1 \le j \le n}$ whose entries are iid random variables of mean zero, variance one, and having finite $C_0$-th moment for some sufficiently large constant $C_0$.

The main result of this paper is a Four Moment Theorem for iid covariance matrices, analogous to the Four Moment Theorem for Wigner matrices established by the authors in [...] (see also [...]). Indeed, our arguments here draw heavily from those in [...]. As in that paper, we can use this theorem together with existing results to establish universality of local statistics of eigenvalues under mild conditions.

As a byproduct of our arguments, we also extend our previous results on random Hermitian matrices to the case in which the entries have finite $C_0$-th moment rather than exponential decay.

1. Introduction

1.1. The model. The main purpose of this paper is to study the asymptotic local eigenvalue statistics of covariance matrices of large random matrices. Let us first fix the matrix ensembles that we will be studying.

Definition (Random covariance matrices). Let $n$ be a large integer parameter going off to infinity, and let $p = p(n)$ be another integer parameter such that $p \le n$ and $\lim_{n \to \infty} p/n = y$ for some $0 < y \le 1$. We let $M = M_{n,p} = (\zeta_{ij})_{1 \le i \le p,\, 1 \le j \le n}$ be a random $p \times n$ matrix, whose distribution is of course allowed to depend on $n$. We say that the matrix ensemble $M$ obeys condition C1 with some exponent $C_0$ if the random variables $\zeta_{ij}$ are jointly independent, have mean zero and variance $1$, and obey the moment condition $\sup_{i,j} \mathbf{E}|\zeta_{ij}|^{C_0} \le C$ for some constant $C$ independent of $n, p$. We say that the matrix $M$ is iid if the $\zeta_{ij}$ are identically and independently distributed with law independent of $n$. Given such a matrix, we form the $n \times n$ covariance matrix $W = W_{n,p} := \frac{1}{n} M^* M$.
This matrix has rank $p$, and so the first $n - p$ eigenvalues are trivial; we order the (necessarily positive) remaining eigenvalues of these matrices (counting multiplicity) as
$$0 \le \lambda_1(W) \le \dots \le \lambda_p(W).$$
We often abbreviate $\lambda_i(W)$ as $\lambda_i$.

(T. Tao is supported by a grant from the MacArthur Foundation, by NSF grant DMS, and by the NSF Waterman award. V. Vu is supported by research grants DMS and AFOSAR-FA.)

Remark. In this paper we will focus primarily on the case $y = 1$, but several of our results extend to other values of $y$ as well. The case $p > n$ can of course be deduced from the $p < n$ case, after some minor notational changes, by transposing the matrix $M$, which does not affect the nontrivial eigenvalues of the covariance matrix. One can also easily normalise the variance of the entries to be some quantity other than $1$ if one wishes.

Observe that the quantities $\sigma_i := (n\lambda_i)^{1/2}$ can be interpreted as the nontrivial singular values of the original matrix $M$, and $\lambda_1, \dots, \lambda_p$ can also be interpreted as the eigenvalues of the $p \times p$ matrix $\frac{1}{n} M M^*$. It will be convenient to exploit all three of these spectral interpretations of $\lambda_1, \dots, \lambda_p$ in this paper.

Condition C1 is analogous to Condition C0 for Wigner-type matrices in [...], but with the exponential decay hypothesis relaxed to polynomial decay only.

The well-known Marchenko–Pastur law governs the bulk distribution of the eigenvalues $\lambda_1, \dots, \lambda_p$ of $W$:

Theorem (Marchenko–Pastur law). Assume Condition C1 with $C_0 \ge 4$, and suppose that $p/n \to y$ for some $0 < y \le 1$. Then for any $x > 0$, the random variables
$$\frac{1}{p}\left|\{1 \le i \le p : \lambda_i(W) \le x\}\right|$$
converge in probability to $\int_0^x \rho_{MP,y}(t)\, dt$, where
$$\rho_{MP,y}(x) := \frac{1}{2\pi x y}\sqrt{(b - x)(x - a)}\; 1_{[a,b]}(x), \qquad a := (1 - \sqrt{y})^2, \quad b := (1 + \sqrt{y})^2.$$
When furthermore $M$ is iid, one can also obtain the case $C_0 = 2$.

Proof. For $C_0 \ge 4$, see [...]; for $C_0 > 2$, see [...]; for the $C_0 = 2$ iid case, see [...]. Further results are known on the rate of convergence (see [...]).

In this paper we are concerned instead with the local eigenvalue statistics. A model case is the complex Wishart ensemble, in which the $\zeta_{ij}$ are iid complex gaussians with mean zero and variance $1$.
In this case, the distribution of the eigenvalues $\lambda_1, \dots, \lambda_n$ of $W$ can be explicitly computed (as a special case of the Laguerre unitary ensemble). For instance, when $p = n$, the joint distribution is given by the density function
$$\rho_n(\lambda_1, \dots, \lambda_n) = c(n) \prod_{1 \le i < j \le n} |\lambda_i - \lambda_j|^2 \exp\Big(-n \sum_{i=1}^n \lambda_i\Big)$$
for some explicit normalization constant $c(n)$. Very similarly to the GUE case, one can use this explicit formula to directly compute several local statistics, including the distribution of the largest and smallest eigenvalues [...], the correlation functions [...], etc. Also as in the GUE case, it is widely conjectured that these statistics hold for a much larger class of random matrices. For some earlier results in this direction, we refer to [...] and the references therein.

The goal of this paper is to establish a Four Moment Theorem for random covariance matrices, as an analogue of a recent result in [...]. This theorem asserts that all local statistics of the eigenvalues of $W_n$ are determined by the first four moments of the entries.

1.2. The Four Moment Theorem. We first need some definitions.

Definition (Frequent events). Let $E$ be an event depending on $n$. $E$ holds asymptotically almost surely if $\mathbf{P}(E) = 1 - o(1)$. $E$ holds with high probability if $\mathbf{P}(E) \ge 1 - O(n^{-c})$ for some constant $c > 0$ (independent of $n$). $E$ holds with overwhelming probability if $\mathbf{P}(E) \ge 1 - O_C(n^{-C})$ for every constant $C > 0$ (or equivalently, $\mathbf{P}(E) \ge 1 - \exp(-\omega(\log n))$). $E$ holds almost surely if $\mathbf{P}(E) = 1$.

Definition (Matching). We say that two complex random variables $\zeta, \zeta'$ match to order $k$ for some integer $k \ge 1$ if one has
$$\mathbf{E}\, \mathrm{Re}(\zeta)^m\, \mathrm{Im}(\zeta)^l = \mathbf{E}\, \mathrm{Re}(\zeta')^m\, \mathrm{Im}(\zeta')^l$$
for all $m, l \ge 0$ with $m + l \le k$.

Our main result is the following.

Theorem (Four Moment Theorem). For sufficiently small $c_0 > 0$ and sufficiently large $C_0 > 0$ (a specific numerical value of $C_0$ would suffice), the following holds for every $0 < \varepsilon < 1$ and $k \ge 1$. Let $M = (\zeta_{ij})_{1 \le i \le p,\, 1 \le j \le n}$ and $M' = (\zeta'_{ij})_{1 \le i \le p,\, 1 \le j \le n}$ be matrix ensembles obeying condition C1 with the indicated constant $C_0$, and assume that for each $i, j$, the variables $\zeta_{ij}$ and $\zeta'_{ij}$ match to order $4$. Let $W, W'$ be the associated covariance matrices. Assume also that $p/n \to y$ for some $0 < y \le 1$.
Let $G : \mathbb{R}^k \to \mathbb{R}$ be a smooth function obeying the derivative bounds
$$|\nabla^j G(x)| \le n^{c_0}$$
for all $0 \le j \le 5$ and $x \in \mathbb{R}^k$. Then for any $\varepsilon p \le i_1 < i_2 < \dots < i_k \le (1 - \varepsilon)p$, and for $n$ sufficiently large depending on $\varepsilon, k, c_0$, we have
$$\Big|\mathbf{E}\, G\big(n\lambda_{i_1}(W), \dots, n\lambda_{i_k}(W)\big) - \mathbf{E}\, G\big(n\lambda_{i_1}(W'), \dots, n\lambda_{i_k}(W')\big)\Big| \le n^{-c_0}.$$
If $\zeta_{ij}$ and $\zeta'_{ij}$ only match to order $3$ rather than $4$, the conclusion still holds provided that one strengthens the derivative bounds to $|\nabla^j G(x)| \le n^{-jc_1}$ for all $0 \le j \le 5$ and $x \in \mathbb{R}^k$ and any $c_1 > 0$, provided that $c_0$ is sufficiently small depending on $c_1$.

This is an analogue of the Four Moment Theorem of [...] for covariance matrices, the main difference being that the exponential decay condition assumed there is dropped, and replaced instead by the high moment condition in C1. The main results of [...] can all be strengthened similarly, using the same argument. The value of $C_0$ is ad hoc, and we make no attempt to optimize this constant.

Remark. The reason that we restrict the eigenvalues to the bulk of the spectrum ($\varepsilon p \le i \le (1 - \varepsilon)p$) is to guarantee that the density function $\rho_{MP}$ is bounded away from zero. In view of the results in [...], we expect that the result extends to the edge of the spectrum as well. In particular, in view of the results in [...], it is likely that the hard edge asymptotics of Forrester [...] can be extended to a wider class of ensembles. We will pursue this issue elsewhere.

1.3. Applications. One can apply the Four Moment Theorem in a similar way as its Wigner counterpart in [...] in order to obtain universality results for large classes of random matrices, usually with fewer than four moment assumptions. Let us demonstrate this through an example concerning the universality of the sine kernel. Using the explicit formula for the Wishart density above, Nagao and Wadati [...] established the following result for the complex Wishart ensemble.

Theorem (Sine kernel for Wishart ensemble). Let $k \ge 1$ be an integer, let $f : \mathbb{R}^k \to \mathbb{C}$ be a continuous function with compact support and symmetric with respect to permutations, and let $0 < u < 4$; we assume all these quantities are independent of $n$. Assume that $p = n + O(1)$ (thus $y = 1$), and that $W$ is given by the complex Wishart ensemble.
Let $\lambda_1, \dots, \lambda_p$ be the nontrivial eigenvalues of $W$. Then the quantity
$$\mathbf{E} \sum_{1 \le i_1, \dots, i_k \le p} f\big(p\rho_{MP,1}(u)(\lambda_{i_1} - u), \dots, p\rho_{MP,1}(u)(\lambda_{i_k} - u)\big)$$
converges as $n \to \infty$ to
$$\int_{\mathbb{R}^k} f(t_1, \dots, t_k) \det\big(K(t_i, t_j)\big)_{1 \le i,j \le k}\, dt_1 \cdots dt_k,$$
where $K(x, y) := \frac{\sin(\pi(x - y))}{\pi(x - y)}$ is the sine kernel.

Remark. The results in [...] allowed $f$ to be bounded measurable rather than continuous, but when we consider discrete ensembles later, it will be important to keep $f$ continuous.

Returning to the bulk, the following extension was established by Ben Arous and Péché [...], as a variant of Johansson's result [...] for random Hermitian matrices. We say that a complex random variable $\xi$ of mean zero and variance one is Gauss divisible if $\xi$ has the same distribution as $(1 - t)^{1/2}\xi' + t^{1/2}\xi''$ for some $0 < t < 1$ and some independent random variables $\xi', \xi''$ of mean zero and variance $1$, with $\xi''$ distributed according to the complex gaussian.

(See Section 3.1 for the asymptotic notation we will be using.)

Theorem (Sine kernel for Gaussian divisible ensemble). The sine kernel theorem for the Wishart ensemble above (which is for $p = n + O(1)$) can be extended to the case when $p - n$ is bounded by a suitable fractional power of $n$ (so $y$ is still $1$), and when $M$ is an iid matrix obeying condition C1 (with a suitable exponent $C_0$) and with Gauss divisible entries $\zeta_{ij}$.

Using the Four Moment Theorem and the Gauss divisible sine kernel theorem in exactly the same way that we used the Four Moment Theorem of [...] and Johansson's theorem [...] to establish the corresponding result in [...], we obtain the following.

Theorem (Sine kernel for more general ensembles). The sine kernel theorem for the Wishart ensemble can be extended to the case when $p - n$ is bounded by the same fractional power of $n$ as above (so $y$ is still $1$), and when $M$ is an iid matrix obeying condition C1 with $C_0$ sufficiently large (a specific numerical value would suffice), and with the $\zeta_{ij}$ supported on at least three points.

Remark.
It was shown in [...] that if the real and imaginary parts of a complex random variable $\xi$ were independent with mean zero and variance one, and both were supported on at least three points, then $\xi$ matched to order $4$ with a Gauss divisible random variable with finite $C_0$ moment. (Indeed, if one inspects the convexity argument used to solve the moment problem in [...], the Gauss divisible random variable could be taken to be the sum of a gaussian variable and a discrete variable, and in particular is thus exponentially decaying.)

The arguments in this paper will be a non-symmetric version of those in [...]. Thus, for instance, everywhere eigenvectors are used in [...], left and right singular vectors will be used instead; and while in [...] one often removes the last row and column of a Hermitian $n \times n$ matrix to make an $(n-1) \times (n-1)$ submatrix, we will instead remove just the last row of a $p \times n$ matrix to form a $(p-1) \times n$ matrix.

One way to connect the singular value problem for iid matrices to eigenvalue problems for Wigner matrices is to form the augmented matrix
$$\tilde M := \begin{pmatrix} 0 & M \\ M^* & 0 \end{pmatrix},$$
which is a $(p+n) \times (p+n)$ Hermitian matrix with eigenvalues $\pm\sigma_1(M), \dots, \pm\sigma_p(M)$, together with $n - p$ eigenvalues at zero. Thus one can view the singular values of an iid matrix as being essentially given by the eigenvalues of a slightly larger Hermitian matrix which is of Wigner type, except that the entries have been zeroed out on two diagonal blocks. We will take advantage of this augmented perspective in some parts of the paper (particularly when we wish to import results from [...] as black boxes), but in other parts it will in fact be more convenient to work with $M$ directly. In particular, the fact that many of the entries of $\tilde M$ are zero (and in particular, have zero mean and variance) seems to make it difficult to apply parts of the arguments in [...] (particularly those that are probabilistic in nature, rather than deterministic) directly to this matrix.
Nevertheless, one can view this connection as a heuristic explanation as to why so much of the machinery in the Hermitian eigenvalue problem can be transferred to the non-Hermitian singular value problem.

1.4. Extensions. In a very recent work, Erdős, Schlein, Yau and Yin [...] extended the sine kernel results above to a large class of matrices, assuming that the distribution of the entries $\zeta_{ij}$ is sufficiently smooth. While their results do not apply to entries with discrete distributions, they allow one to extend the Gauss divisible sine kernel theorem to the case when $t$ is a negative power of $n$. Given this, one can use the argument in [...] to remove the requirement that the real and imaginary parts of $\zeta_{ij}$ be supported on at least three points.

We also have the following analogue of [...].

Theorem (Universality of averaged correlation function). Fix $\varepsilon > 0$ and $u$ such that $0 < u - \varepsilon < u + \varepsilon < 4$. Let $k \ge 1$, let $f : \mathbb{R}^k \to \mathbb{R}$ be a continuous, compactly supported function, and let $W = W_{n,n}$ be a random covariance matrix. Then the quantity
$$\frac{1}{2\varepsilon}\int_{u - \varepsilon}^{u + \varepsilon} \int_{\mathbb{R}^k} f(\alpha_1, \dots, \alpha_k)\, \frac{1}{\rho_{MP}(u')^k}\, p_n^{(k)}\Big(u' + \frac{\alpha_1}{n\rho_{MP}(u')}, \dots, u' + \frac{\alpha_k}{n\rho_{MP}(u')}\Big)\, d\alpha_1 \cdots d\alpha_k\, du'$$
converges as $n \to \infty$ to
$$\int_{\mathbb{R}^k} f(\alpha_1, \dots, \alpha_k) \det\big(K(\alpha_i, \alpha_j)\big)_{i,j=1}^k\, d\alpha_1 \cdots d\alpha_k,$$
where $K(x, y)$ is the Dyson sine kernel $K(x, y) := \frac{\sin(\pi(x - y))}{\pi(x - y)}$.

The details are more or less the same as in [...] and are omitted.

1.5. Acknowledgments. We thank Horng-Tzer Yau for references.

2. The gap property and the exponential decay removing trick

The following property plays an important role in [...].

Definition (Gap property). Let $M$ be a matrix ensemble obeying condition C1. We say that $M$ obeys the gap property if for every $\varepsilon, c > 0$ (independent of $n$), and for every $\varepsilon p \le i \le (1 - \varepsilon)p$, one has $\lambda_{i+1}(W) - \lambda_i(W) \ge n^{-1-c}$ with high probability. (The implied constants in this statement can depend of course on $\varepsilon$ and $c$.)

As an analogue of [...], we prove the following theorem, using the same method with some modifications.

Theorem (Gap theorem). Let $M = (\zeta_{ij})_{1 \le i \le p,\, 1 \le j \le n}$ obey condition C1 for some $C_0$, and suppose that the coefficients $\zeta_{ij}$ are exponentially decaying in the sense that $\mathbf{P}(|\zeta_{ij}| \ge t^C) \le \exp(-t)$ for all $t \ge C'$, for all $i, j$ and some constants $C, C' > 0$.
Then $M$ obeys the gap property.

Even more recently, a similar result was also established by Péché [...].

Next, we have the following analogue of [...].

Theorem (Four Moment Theorem with Gap assumption). For sufficiently small $c_0 > 0$ and sufficiently large $C_0 > 0$ (a specific numerical value of $C_0$ would suffice), the following holds for every $0 < \varepsilon < 1$ and $k \ge 1$. Let $M = (\zeta_{ij})_{1 \le i \le p,\, 1 \le j \le n}$ and $M' = (\zeta'_{ij})_{1 \le i \le p,\, 1 \le j \le n}$ be matrix ensembles obeying condition C1 with the indicated constant $C_0$, and assume that for each $i, j$, the variables $\zeta_{ij}$ and $\zeta'_{ij}$ match to order $4$. Let $W, W'$ be the associated covariance matrices. Assume also that $M$ and $M'$ obey the gap property, and that $p/n \to y$ for some $0 < y \le 1$. Let $G : \mathbb{R}^k \to \mathbb{R}$ be a smooth function obeying the derivative bounds $|\nabla^j G(x)| \le n^{c_0}$ for all $0 \le j \le 5$ and $x \in \mathbb{R}^k$. Then for any $\varepsilon p \le i_1 < i_2 < \dots < i_k \le (1 - \varepsilon)p$, and for $n$ sufficiently large depending on $\varepsilon, k, c_0$, we have
$$\Big|\mathbf{E}\, G\big(n\lambda_{i_1}(W), \dots, n\lambda_{i_k}(W)\big) - \mathbf{E}\, G\big(n\lambda_{i_1}(W'), \dots, n\lambda_{i_k}(W')\big)\Big| \le n^{-c_0}.$$
If $\zeta_{ij}$ and $\zeta'_{ij}$ only match to order $3$ rather than $4$, the conclusion still holds provided that one strengthens the derivative bounds to $|\nabla^j G(x)| \le n^{-jc_1}$ for all $0 \le j \le 5$ and $x \in \mathbb{R}^k$ and any $c_1 > 0$, provided that $c_0$ is sufficiently small depending on $c_1$.

This theorem is weaker than the Four Moment Theorem, as we assume the gap property. The difference compared to [...] is that in the latter we assume exponential decay rather than the gap property. However, this difference is only a formality, since in the proof of [...], the only place where we used exponential decay is to prove the gap property (via [...]).

The new step that enables us to remove the gap property assumption altogether is the following theorem, which asserts that the gap property is already guaranteed by condition C1, given $C_0 \ge 3$ (bounded third moment).

Theorem (Gap theorem). Assume that $M = (\zeta_{ij})_{1 \le i \le p,\, 1 \le j \le n}$ satisfies condition C1 with $C_0 \ge 3$. Then $M$ obeys the gap property.

The Four Moment Theorem follows directly from the two preceding theorems (the Four Moment Theorem with Gap assumption, and the Gap theorem under condition C1). The core of the proof of the Four Moment Theorem with Gap assumption is a truncated version of the Four Moment Theorem (stated in Section 6), which allows us to insert information such as the gap property into the test function $G$. We will also use this truncated theorem, combined with the Gap theorem for exponentially decaying entries and an auxiliary lemma, to prove the Gap theorem under condition C1.
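The matching hypothesis in the two Four Moment Theorems above is a finite list of moment identities, so for finitely supported entries it can be checked exactly. The following sketch only illustrates the definition of matching; the helper names `mixed_moment` and `match_to_order`, the floating-point tolerance, and the two example distributions are assumptions made for this example, not anything taken from the paper. A Rademacher variable ($\pm 1$ with probability $1/2$ each) and the three-point variable taking $\pm\sqrt{3}$ with probability $1/6$ each and $0$ with probability $2/3$ both have mean $0$, variance $1$ and third moment $0$, but their fourth moments are $1$ and $3$ respectively, so they should match to order $3$ but not to order $4$.

```python
import math

# A finitely supported complex random variable, modelled (for illustration only)
# as a list of (value, probability) pairs.
def mixed_moment(dist, m, l):
    """E[Re(xi)^m * Im(xi)^l] for a finitely supported complex random variable xi."""
    return sum(prob * (z.real ** m) * (z.imag ** l) for z, prob in dist)

def match_to_order(dist1, dist2, k, tol=1e-12):
    """True if the mixed moments agree (up to tol) for all m, l >= 0 with m + l <= k."""
    return all(
        abs(mixed_moment(dist1, m, l) - mixed_moment(dist2, m, l)) <= tol
        for m in range(k + 1)
        for l in range(k + 1 - m)
    )

# Rademacher variable: +1 or -1 with probability 1/2 each.
rademacher = [(1 + 0j, 0.5), (-1 + 0j, 0.5)]
# Three-point variable: +sqrt(3), -sqrt(3) with probability 1/6 each, 0 with probability 2/3.
r3 = math.sqrt(3.0)
three_point = [(r3 + 0j, 1 / 6), (-r3 + 0j, 1 / 6), (0j, 2 / 3)]

print(match_to_order(rademacher, three_point, 3))  # all moments of order <= 3 agree
print(match_to_order(rademacher, three_point, 4))  # fourth moments differ (1 versus 3)
```

So, for this particular pair of entry distributions, only the order-3 form of the conclusion (with the strengthened derivative bounds on $G$) would be available.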
The rest of the paper is organized as follows. The next three sections are devoted to technical lemmas. The proofs of the Four Moment Theorem with Gap assumption and of the Gap theorem under condition C1 are presented in Section 6, assuming two further theorems; the proofs of these two theorems are presented in Sections 7 and 8, respectively.

3. The main technical lemmas

Important note. The arguments in this paper are very similar to, and draw heavily from, the previous paper [...] of the authors. We therefore recommend that the reader be familiar with that paper first, before reading the current one.

In the proof of the Four Moment Theorem (as well as the Gap Theorem) for $n \times n$ Wigner matrices in [...], a crucial ingredient was a variant of the Delocalization Theorem of Erdős, Schlein, and Yau [...]. This result asserts (assuming a uniformly exponentially decaying distribution for the coefficients) that with overwhelming probability, all the unit eigenvectors of the Wigner matrix have coefficients $O(n^{-1/2 + o(1)})$; thus the "energy" of each eigenvector is spread out more or less uniformly amongst the $n$ coefficients. When one just assumes a uniformly bounded $C_0$-th moment rather than uniform exponential decay, the bound becomes $O(n^{-1/2 + O(1/C_0)})$ instead (where the implied constant in the exponent is uniform in $C_0$, of course).

Similarly, to prove the Four Moment and Gap Theorems in this paper, we will need a Delocalization theorem for the singular vectors of the matrix $M$. We define a unit right singular vector $u_i$ (resp. left singular vector $v_i$) with singular value $\sigma_i(M) = \sqrt{n}\,\lambda_i(W)^{1/2}$ to be a unit eigenvector of $W = \frac{1}{n}M^*M$ (resp. of $\frac{1}{n}MM^*$) with eigenvalue $\lambda_i$. Observe from the singular value decomposition that one can find orthonormal bases $u_1, \dots, u_p \in \mathbb{C}^n$ and $v_1, \dots, v_p \in \mathbb{C}^p$ for the corange $\ker(M)^\perp$ of $M$ and for $\mathbb{C}^p$ respectively, such that $Mu_i = \sigma_i v_i$ and $M^* v_i = \sigma_i u_i$. In the generic case when the singular values are simple (i.e. $0 < \sigma_1 < \dots < \sigma_p$), the unit singular vectors $u_i, v_i$ are determined up to multiplication by a complex phase $e^{\sqrt{-1}\theta}$.
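The singular value relations above, and the augmented matrix $\tilde M = \begin{pmatrix} 0 & M \\ M^* & 0 \end{pmatrix}$ from the introduction, can be checked on a small example. The sketch below is a hedged illustration that assumes a real matrix (so $M^* = M^T$) with gaussian entries and full rank $p \le n$. It diagonalizes $MM^T$ with cyclic Jacobi rotations, chosen only to stay within the Python standard library, and all function names are hypothetical. It takes the left singular vectors $v_i$ to be eigenvectors of $MM^T$, sets $u_i := M^T v_i / \sigma_i$, and checks that the stacked vector $(v_i, u_i)$ is an eigenvector of $\tilde M$ with eigenvalue $\sigma_i$.

```python
import math
import random

def jacobi_eigh(a, sweeps=50):
    """Cyclic Jacobi eigen-decomposition of a real symmetric matrix.
    Returns (eigenvalues, eigenvectors) sorted by increasing eigenvalue;
    eigenvectors[i] is a unit eigenvector for eigenvalues[i]."""
    n = len(a)
    a = [list(row) for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < 1e-24:
            break
        for p in range(n):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # rotation angle chosen so that the (p, q) entry becomes zero
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns p, q of A and V:  A <- A R,  V <- V R
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
                for k in range(n):  # rows p, q of A:  A <- R^T A
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
    order = sorted(range(n), key=lambda i: a[i][i])
    return [a[i][i] for i in order], [[v[k][i] for k in range(n)] for i in order]

def svd_via_mmt(m):
    """For a real full-rank p x n matrix M (p <= n): singular values, left and right singular vectors."""
    p, n = len(m), len(m[0])
    mmt = [[sum(m[i][k] * m[j][k] for k in range(n)) for j in range(p)] for i in range(p)]
    lams, vs = jacobi_eigh(mmt)                 # v_i: unit eigenvectors of M M^T
    sigmas = [math.sqrt(lam) for lam in lams]   # assumes every eigenvalue is positive
    us = [[sum(m[i][k] * v[i] for i in range(p)) / s for k in range(n)]  # u_i = M^T v_i / sigma_i
          for s, v in zip(sigmas, vs)]
    return sigmas, vs, us

def augmented(m):
    """The (p+n) x (p+n) symmetric matrix [[0, M], [M^T, 0]]."""
    p, n = len(m), len(m[0])
    return ([[0.0] * p + list(m[i]) for i in range(p)] +
            [[m[i][j] for i in range(p)] + [0.0] * n for j in range(n)])

rng = random.Random(1)
p, n = 4, 9
m = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
sigmas, vs, us = svd_via_mmt(m)
big = augmented(m)
residuals = []
for s, v, u in zip(sigmas, vs, us):
    w = v + u                                   # the stacked vector (v_i, u_i) in R^{p+n}
    bw = [sum(row[j] * w[j] for j in range(p + n)) for row in big]
    residuals.append(max(abs(bw[j] - s * w[j]) for j in range(p + n)))
print(max(residuals) < 1e-9)
```

Replacing $(v_i, u_i)$ by $(v_i, -u_i)$ gives the eigenvalue $-\sigma_i$ in the same way, which accounts for the $\pm\sigma_i$ part of the spectrum of $\tilde M$.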
We will establish the following Erdős–Schlein–Yau type delocalization theorem (analogous to [...]), which is an essential ingredient in the proofs of the Four Moment and Gap theorems, and is also of some independent interest.

Theorem (Delocalization theorem). Suppose that $p/n \to y$ for some $0 < y \le 1$, and let $M$ obey condition C1 for some $C_0$. Suppose further that $|\zeta_{ij}| \le K$ almost surely for some $K > 0$ (which can depend on $n$) and all $i, j$, and that the probability distribution of $M$ is continuous. Let $\varepsilon > 0$ be independent of $n$. Then with overwhelming probability, all the unit left and right singular vectors of $M$ with eigenvalue $\lambda_i$ in the interval $[a + \varepsilon, b - \varepsilon]$ (with $a, b$ defined in the Marchenko–Pastur law) have all coefficients uniformly of size $O(K n^{-1/2} \log^{O(1)} n)$.

(Strictly speaking, $u_1, \dots, u_p$ will span a slightly larger space than the corange if some of the singular values $\sigma_1, \dots, \sigma_p$ vanish. However, we shall primarily restrict attention to the generic case in which this vanishing does not occur.)

The factors of $K$ and $\log n$ can probably be improved slightly, but anything which is polynomial in $K$ and $\log n$ will suffice for our purposes. Observe that if $M$ obeys condition C1, then by Markov's inequality each event $|\zeta_{ij}| \le K$ with $K := n^{10/C_0}$ (say) occurs with probability $1 - O(n^{-10})$. Thus in practice, we will be able to apply the above theorem with $K = n^{10/C_0}$ without difficulty. The continuity hypothesis is a technical one, imposed so that the singular values are almost surely simple, but in practice we will be able to eliminate this hypothesis by a limiting argument (as none of the bounds will depend on any quantitative measure of this continuity).

As with other proofs of delocalization theorems in the literature, the delocalization theorem is in turn deduced from the following eigenvalue concentration bound (analogous to [...]).

Theorem (Eigenvalue concentration theorem). Let the hypotheses be as in the delocalization theorem, and let $\delta > 0$ be independent of $n$.
Then for any interval $I \subset [a + \varepsilon, b - \varepsilon]$ of length $|I| \ge K^2 \log^{O(1)} n / n$, one has with overwhelming probability (uniformly in $I$) that
$$\Big|N_I - p \int_I \rho_{MP,y}(x)\, dx\Big| \le \delta p |I|,$$
where $N_I := |\{1 \le i \le p : \lambda_i(W) \in I\}|$ is the number of eigenvalues in $I$.

We remark that a very similar result (with slightly different hypotheses on the parameters and on the underlying random variable distributions) was recently established in [...]. We isolate one particular consequence of the eigenvalue concentration theorem (also established in [...]):

Corollary (Concentration of the bulk). Let the hypotheses be as in the delocalization theorem. Then there exists $\varepsilon' > 0$ independent of $n$ such that with overwhelming probability, one has $a + \varepsilon' \le \lambda_i(W) \le b - \varepsilon'$ for all $\varepsilon p \le i \le (1 - \varepsilon)p$.

Proof. From the eigenvalue concentration theorem, we see with overwhelming probability that the number of eigenvalues in $[a + \varepsilon', b - \varepsilon']$ is at least $(1 - \varepsilon)p$, if $\varepsilon'$ is sufficiently small depending on $\varepsilon$. The claim follows.

3.1. Notation. Throughout this paper, $n$ will be an asymptotic parameter going to infinity. Some quantities (e.g. $\varepsilon$, $y$ and $C_0$) will remain independent of $n$, while other quantities (e.g. $p$, or the matrix $M$) will depend on $n$. All statements here are understood to hold only in the asymptotic regime when $n$ is sufficiently large depending on all quantities that are independent of $n$. We write $X = O(Y)$, $Y = \Omega(X)$, $X \ll Y$, or $Y \gg X$ if one has $|X| \le CY$ for all sufficiently large $n$ and some $C$ independent of $n$. (Note however that $C$ is allowed to depend on other quantities independent of $n$, such as $\varepsilon$ and $y$, unless otherwise stated.) We write $X = o(Y)$ if $|X| \le c(n)Y$ where $c(n) \to 0$ as $n \to \infty$. We write $X = \Theta(Y)$ or $X \asymp Y$ if $X \ll Y \ll X$; thus for instance if $p/n \to y$ for some $0 < y \le 1$, then $p = \Theta(n)$. We write $\sqrt{-1}$ for the complex imaginary unit, in order to free up the letter $i$ to denote an integer (usually between $1$ and $n$). We write $|X|$ for the length of a vector $X$, $\|A\| = \|A\|_{op}$ for the operator norm of a matrix $A$, and $\|A\|_F := \mathrm{tr}(AA^*)^{1/2}$ for the Frobenius (or Hilbert–Schmidt) norm.

4. Basic tools

4.1. Tools from linear algebra. In this section we recall some basic identities and inequalities from linear algebra which will be used in this paper.
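The two inequalities recalled next (Cauchy interlacing and Weyl) are easy to test numerically. In the following sketch the test matrices are real symmetric with gaussian entries, eigenvalues are computed with a plain cyclic Jacobi iteration, and a tolerance of $10^{-9}$ absorbs rounding error; these choices and the function names are assumptions made for illustration. It deletes the last row and column of a $6 \times 6$ matrix to test interlacing, and perturbs the matrix by a small symmetric matrix $E$ to test the Weyl inequality, using the fact that $\|E\|_{op}$ is the largest absolute eigenvalue of a symmetric $E$.

```python
import math
import random

def sym_eigvals(a, sweeps=50):
    """Eigenvalues (in increasing order) of a real symmetric matrix via cyclic Jacobi rotations."""
    n = len(a)
    a = [list(row) for row in a]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < 1e-24:
            break
        for p in range(n):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A R (columns p, q)
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                for k in range(n):  # A <- R^T A (rows p, q)
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
    return sorted(a[i][i] for i in range(n))

def random_symmetric(n, rng):
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            a[i][j] = a[j][i] = rng.gauss(0.0, 1.0)
    return a

rng = random.Random(2)
n = 6
A = random_symmetric(n, rng)
lam_A = sym_eigvals(A)
lam_minor = sym_eigvals([row[:-1] for row in A[:-1]])  # delete the last row and column
# Cauchy interlacing: lambda_i(A_n) <= lambda_i(A_{n-1}) <= lambda_{i+1}(A_n)
interlaces = all(lam_A[i] - 1e-9 <= lam_minor[i] <= lam_A[i + 1] + 1e-9 for i in range(n - 1))
# Weyl: |lambda_i(A + E) - lambda_i(A)| <= ||E||_op
E = [[0.1 * x for x in row] for row in random_symmetric(n, rng)]
lam_AE = sym_eigvals([[A[i][j] + E[i][j] for j in range(n)] for i in range(n)])
op_norm_E = max(abs(x) for x in sym_eigvals(E))
weyl_holds = all(abs(x - y) <= op_norm_E + 1e-9 for x, y in zip(lam_A, lam_AE))
print(interlaces, weyl_holds)
```

Since both statements are theorems, both flags should be `True`; a `False` would point to a numerical problem rather than a counterexample.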
We begin with the Cauchy interlacing law and the Weyl inequalities.

Lemma (Cauchy interlacing law). Let $1 \le p \le n$.
(i) If $A_n$ is an $n \times n$ Hermitian matrix, and $A_{n-1}$ is an $(n-1) \times (n-1)$ minor, then $\lambda_i(A_n) \le \lambda_i(A_{n-1}) \le \lambda_{i+1}(A_n)$ for all $1 \le i < n$.
(ii) If $M_{n,p}$ is a $p \times n$ matrix, and $M_{n,p-1}$ is a $(p-1) \times n$ minor, then $\sigma_i(M_{n,p}) \le \sigma_i(M_{n,p-1}) \le \sigma_{i+1}(M_{n,p})$ for all $1 \le i < p$.
(iii) If $p < n$, if $M_{n,p}$ is a $p \times n$ matrix, and $M_{n-1,p}$ is a $p \times (n-1)$ minor, then $\sigma_{i-1}(M_{n,p}) \le \sigma_i(M_{n-1,p}) \le \sigma_i(M_{n,p})$ for all $1 \le i \le p$, with the understanding that $\sigma_0(M_{n,p}) = 0$. (For $p = n$, one can of course use the transpose of (ii) instead.)

Proof. Claim (i) follows from the minimax formula
$$\lambda_i(A_n) = \inf_{V : \dim V = i}\ \sup_{v \in V : |v| = 1} v^* A_n v,$$
where $V$ ranges over $i$-dimensional subspaces of $\mathbb{C}^n$. Similarly, (ii) and (iii) follow from the minimax formula
$$\sigma_i(M_{n,p}) = \inf_{V : \dim V = i + n - p}\ \sup_{v \in V : |v| = 1} |M_{n,p} v|.$$

Lemma (Weyl inequality). Let $1 \le p \le n$. If $A, B$ are $n \times n$ Hermitian matrices, then $|\lambda_i(A) - \lambda_i(B)| \le \|A - B\|_{op}$ for all $1 \le i \le n$. If $M, N$ are $p \times n$ matrices, then $|\sigma_i(M) - \sigma_i(N)| \le \|M - N\|_{op}$ for all $1 \le i \le p$.

Proof. This follows from the same minimax formulae used to establish the Cauchy interlacing law.

Remark. One can also deduce the singular value versions of these two lemmas from their Hermitian counterparts by using the augmented matrices $\tilde M$. We omit the details.

We have the following elementary formula for a component of an eigenvector of a Hermitian matrix, in terms of the eigenvalues and eigenvectors of a minor.

Lemma (Formula for coordinate of an eigenvector). Let
$$A_n = \begin{pmatrix} A_{n-1} & X \\ X^* & a \end{pmatrix}$$
be an $n \times n$ Hermitian matrix for some $a \in \mathbb{R}$ and $X \in \mathbb{C}^{n-1}$, and let $\binom{v}{x}$ be a unit eigenvector of $A_n$ with eigenvalue $\lambda_i(A_n)$, where $x \in \mathbb{C}$ and $v \in \mathbb{C}^{n-1}$. Suppose that none of the eigenvalues of $A_{n-1}$ are equal to $\lambda_i(A_n)$. Then
$$|x|^2 = \frac{1}{1 + \sum_{j=1}^{n-1} (\lambda_j(A_{n-1}) - \lambda_i(A_n))^{-2}\, |u_j(A_{n-1})^* X|^2},$$
where $u_1(A_{n-1}), \dots, u_{n-1}(A_{n-1}) \in \mathbb{C}^{n-1}$ is an orthonormal eigenbasis corresponding to the eigenvalues $\lambda_1(A_{n-1}), \dots, \lambda_{n-1}(A_{n-1})$ of $A_{n-1}$.

Proof. See e.g. [...].

This implies an analogous formula for singular vectors.

Corollary (Formula for coordinate of a singular vector).
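Let $p, n \ge 1$, and let $M_{p,n} = \begin{pmatrix} M_{p,n-1} & X \end{pmatrix}$ be a $p \times n$ matrix for some $X \in \mathbb{C}^p$, and let $\binom{u}{x}$ be a right unit singular vector of $M_{p,n}$ with singular value $\sigma_i(M_{p,n})$, where $x \in \mathbb{C}$ and $u \in \mathbb{C}^{n-1}$. Suppose that none of the singular values of $M_{p,n-1}$ are equal to $\sigma_i(M_{p,n})$. Then
$$|x|^2 = \frac{1}{1 + \sum_{j=1}^{\min(p, n-1)} \frac{\sigma_j(M_{p,n-1})^2}{(\sigma_j(M_{p,n-1})^2 - \sigma_i(M_{p,n})^2)^2}\, |v_j(M_{p,n-1})^* X|^2},$$
where $v_1(M_{p,n-1}), \dots, v_{\min(p,n-1)}(M_{p,n-1}) \in \mathbb{C}^p$ is an orthonormal system of left singular vectors corresponding to the nontrivial singular values of $M_{p,n-1}$.

In a similar vein, if $M_{p,n} = \begin{pmatrix} M_{p-1,n} \\ Y^* \end{pmatrix}$ for some $Y \in \mathbb{C}^n$, and $\binom{v}{y}$ is a left unit singular vector of $M_{p,n}$ with singular value $\sigma_i(M_{p,n})$, where $y \in \mathbb{C}$ and $v \in \mathbb{C}^{p-1}$, and none of the singular values of $M_{p-1,n}$ are equal to $\sigma_i(M_{p,n})$, then
$$|y|^2 = \frac{1}{1 + \sum_{j=1}^{\min(p-1, n)} \frac{\sigma_j(M_{p-1,n})^2}{(\sigma_j(M_{p-1,n})^2 - \sigma_i(M_{p,n})^2)^2}\, |u_j(M_{p-1,n})^* Y|^2},$$
where $u_1(M_{p-1,n}), \dots, u_{\min(p-1,n)}(M_{p-1,n}) \in \mathbb{C}^n$ is an orthonormal system of right singular vectors corresponding to the nontrivial singular values of $M_{p-1,n}$.

Proof. We just prove the first claim, as the second is proven analogously (or by taking adjoints). Observe that $\binom{u}{x}$ is a unit eigenvector of the matrix
$$M_{p,n}^* M_{p,n} = \begin{pmatrix} M_{p,n-1}^* M_{p,n-1} & M_{p,n-1}^* X \\ X^* M_{p,n-1} & |X|^2 \end{pmatrix}$$
with eigenvalue $\sigma_i(M_{p,n})^2$. Applying the formula for the coordinate of an eigenvector, we obtain
$$|x|^2 = \frac{1}{1 + \sum_{j=1}^{n-1} \big(\lambda_j(M_{p,n-1}^* M_{p,n-1}) - \sigma_i(M_{p,n})^2\big)^{-2}\, |u_j(M_{p,n-1}^* M_{p,n-1})^* M_{p,n-1}^* X|^2}.$$
But $u_j(M_{p,n-1}^* M_{p,n-1})^* M_{p,n-1}^* = \sigma_j(M_{p,n-1})\, v_j(M_{p,n-1})^*$ for the $\min(p, n-1)$ nontrivial singular values (possibly after relabeling the $j$), and vanishes for the trivial ones, and $\lambda_j(M_{p,n-1}^* M_{p,n-1}) = \sigma_j(M_{p,n-1})^2$, so the claim follows.

The Stieltjes transform $s_n(z)$ of a Hermitian matrix $W$ is defined for complex $z$ by the formula
$$s_n(z) := \frac{1}{n} \sum_{i=1}^n \frac{1}{\lambda_i(W) - z}.$$
It has the following alternate representation (see e.g. [...]).

Lemma. Let $W = (\zeta_{ij})_{1 \le i, j \le n}$ be a Hermitian matrix, and let $z$ be a complex number not in the spectrum of $W$. Then we have
$$s_n(z) = \frac{1}{n} \sum_{k=1}^n \frac{1}{\zeta_{kk} - z - a_k^* (W_k - zI)^{-1} a_k},$$
where $W_k$ is the $(n-1) \times (n-1)$ matrix with the $k$-th row and column removed, and $a_k \in \mathbb{C}^{n-1}$ is the $k$-th column of $W$ with the $k$-th entry removed.

Proof. By Schur's complement, $\frac{1}{\zeta_{kk} - z - a_k^* (W_k - zI)^{-1} a_k}$ is the $k$-th diagonal entry of $(W - zI)^{-1}$. Taking traces, one obtains the claim.

4.2. Tools from probability theory.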
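We will rely frequently on the following concentration of measure result for projections of random vectors.

Lemma (Distance between a random vector and a subspace). Let $X = (\xi_1, \dots, \xi_n) \in \mathbb{C}^n$ be a random vector whose entries are independent with mean zero, variance $1$, and are bounded in magnitude by $K$ almost surely for some $K$, where $K \ge 10(\mathbf{E}|\xi|^4 + 1)$. Let $H$ be a subspace of dimension $d$ and $\pi_H$ the orthogonal projection onto $H$. Then
$$\mathbf{P}\big(\big|\|\pi_H(X)\| - \sqrt{d}\big| \ge t\big) \le 10 \exp\Big(-\frac{t^2}{10K^2}\Big).$$
In particular, one has
$$\|\pi_H(X)\| = \sqrt{d} + O(K \log n)$$
with overwhelming probability.

Proof. See [...]; the proof is a short application of Talagrand's inequality [...].

5. Delocalization

The purpose of this section is to establish the delocalization theorem and the eigenvalue concentration theorem. The material here is closely analogous to [...], as well as to that of the original results in [...], and can be read independently of the other sections of the paper. The recent paper [...] also contains arguments and results closely related to those in this section.

5.1. Deduction of the delocalization theorem from the eigenvalue concentration theorem. We begin by showing how the delocalization theorem follows from the eigenvalue concentration theorem. We shall just establish the claim for the right singular vectors $u_i$, as the claim for the left singular vectors is similar. We fix $\varepsilon$ and allow all implied constants to depend on $\varepsilon$ and $y$. We can also assume that $K \log^{O(1)} n = o(n^{1/2})$, as the claim is trivial otherwise.

As $M$ is continuous, we see that the nontrivial singular values are almost surely simple and positive, so that the singular vectors $u_i$ are well defined up to unit phases. Fix $1 \le i \le p$; by the union bound and symmetry, it suffices to show that the event that $\lambda_i$ falls outside $[a + \varepsilon, b - \varepsilon]$, or that the $n$-th coordinate $x$ of $u_i$ is $O(K n^{-1/2} \log^{O(1)} n)$, holds with uniformly overwhelming probability.

Applying the formula for the coordinate of a singular vector, it suffices to show that with uniformly overwhelming probability, either $\lambda_i \notin [a + \varepsilon, b - \varepsilon]$, or
$$\sum_{j=1}^{\min(p, n-1)} \frac{\sigma_j(M_{p,n-1})^2}{(\sigma_j(M_{p,n-1})^2 - \sigma_i(M_{p,n})^2)^2}\, |v_j(M_{p,n-1})^* X|^2 \gg \frac{n}{K^2 \log^{O(1)} n}, \qquad (\dagger)$$
where $M = \begin{pmatrix} M_{p,n-1} & X \end{pmatrix}$. (In the case $p = n$, one would have to replace $M_{p,n-1}$ by its transpose to return to the regime $p \le n$.) But if $\lambda_i \in [a + \varepsilon, b - \varepsilon]$, then by the eigenvalue concentration theorem, one can find with uniformly overwhelming probability a set $J \subset \{1, \dots, \min(p, n-1)\}$ with $|J| \gg K^2 \log^{O(1)} n$ such that $\lambda_j(M_{p,n-1}) = \lambda_i(M_{p,n}) + O(K^2 \log^{O(1)} n / n)$ for all $j \in J$; since $\sigma_i = \sqrt{n}\,\lambda_i^{1/2}$, we conclude that $\sigma_j(M_{p,n-1})^2 = \sigma_i(M_{p,n})^2 + O(K^2 \log^{O(1)} n)$. In particular, $\sigma_j(M_{p,n-1}) = \Theta(\sqrt{n})$. By Pythagoras' theorem, the left-hand side of $(\dagger)$ is then bounded from below by
$$\gg \frac{n\, \|\pi_H(X)\|^2}{(K^2 \log^{O(1)} n)^2},$$
where $H \subset \mathbb{C}^p$ is the span of the $v_j(M_{p,n-1})$ for $j \in J$. But from the distance lemma and the fact that $X$ is independent of $M_{p,n-1}$, one has $\|\pi_H(X)\|^2 \gg K^2 \log^{O(1)} n$ with uniformly overwhelming probability, and the claim follows.

It thus remains to establish the eigenvalue concentration theorem.

5.2. A crude upper bound. Let the hypotheses be as in the delocalization theorem. We first establish a crude upper bound, which illustrates the techniques used to prove the eigenvalue concentration theorem, and also plays an important direct role in that proof.

Proposition (Eigenvalue upper bound). Let the hypotheses be as in the delocalization theorem. Then for any interval $I \subset [a + \varepsilon, b - \varepsilon]$ of length $|I| \ge K^2 \log^{O(1)} n / n$, one has with overwhelming probability (uniformly in $I$) that $N_I \ll n|I|$, where $|I|$ denotes the length of $I$, and $N_I$ was defined in the eigenvalue concentration theorem.

To prove this proposition, we suppose for contradiction that $N_I \ge Cn|I|$ for some large constant $C$ to be chosen later. We will show that for $C$ large enough, this leads to a contradiction with overwhelming probability.

We follow the standard approach (see e.g. [...]) of controlling the eigenvalue counting function $N_I$ via the Stieltjes transform
$$s(z) := \frac{1}{p} \sum_{j=1}^p \frac{1}{\lambda_j(W) - z}.$$
Fix $I$. If $x$ is the midpoint of $I$, $\eta := |I|/2$, and $z := x + \sqrt{-1}\eta$, we see that $\mathrm{Im}\, s(z) \gg \frac{N_I}{p\eta}$ (since each eigenvalue in $I$ contributes at least $\frac{1}{2p\eta}$); recalling that $p = \Theta(n)$, the assumption $N_I \ge Cn|I|$ gives $\mathrm{Im}\, s(z) \gg C$. Applying the Schur complement representation of the Stieltjes transform, with $W$ replaced by the $p \times p$ matrix $\tilde W := \frac{1}{n} MM^*$ (which only has the nontrivial eigenvalues), we see that
$$s(z) = \frac{1}{p} \sum_{k=1}^p \frac{1}{\xi_{kk} - z - a_k^* (W_k - zI)^{-1} a_k},$$
where $\xi_{kk}$ is the $kk$ entry of $\tilde W$, $W_k$ is the $(p-1) \times (p-1)$ matrix with the $k$-th row and column of $\tilde W$ removed, and $a_k \in \mathbb{C}^{p-1}$ is the $k$-th column of $\tilde W$ with the $k$-th entry removed. Using the crude bound $|\mathrm{Im}\, \frac{1}{z}| \le \frac{1}{|\mathrm{Im}\, z|}$ and $\mathrm{Im}\, s(z) \gg C$, one concludes
$$\frac{1}{p} \sum_{k=1}^p \frac{1}{\eta + \mathrm{Im}\, a_k^* (W_k - zI)^{-1} a_k} \gg C.$$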
By the pigeonhole principle, there exists $1 \le k \le p$ for which the corresponding summand is $\gg C$, and in particular
$$\mathrm{Im}\, a_k^* (W_k - zI)^{-1} a_k \ll \frac{1}{C}.$$
The fact that $k$ varies will cost us a factor of $p$ in our probability estimates, but this will not be of concern since all of our claims will hold with overwhelming probability.

Fix $k$. Note that $a_k = \frac{1}{n} M_k X_k$ and $W_k = \frac{1}{n} M_k M_k^*$, where $X_k \in \mathbb{C}^n$ is the adjoint of the $k$-th row of $M$, and $M_k$ is the $(p-1) \times n$ matrix formed by removing that row. Thus, if we let $v_1(M_k), \dots, v_{p-1}(M_k) \in \mathbb{C}^{p-1}$ and $u_1(M_k), \dots, u_{p-1}(M_k) \in \mathbb{C}^n$ be coupled orthonormal systems of left and right singular vectors of $M_k$, and let $\lambda_j(W_k) = \frac{1}{n}\sigma_j(M_k)^2$ for $1 \le j \le p-1$ be the associated eigenvalues, one has
$$a_k^* (W_k - zI)^{-1} a_k = \sum_{j=1}^{p-1} \frac{|a_k^* v_j(M_k)|^2}{\lambda_j(W_k) - z}$$
and thus
$$\mathrm{Im}\, a_k^* (W_k - zI)^{-1} a_k = \sum_{j=1}^{p-1} \frac{\eta\, |a_k^* v_j(M_k)|^2}{|\lambda_j(W_k) - x|^2 + \eta^2}.$$
We conclude that
$$\sum_{j=1}^{p-1} \frac{\eta\, |a_k^* v_j(M_k)|^2}{|\lambda_j(W_k) - x|^2 + \eta^2} \ll \frac{1}{C}.$$
The expression $a_k^* v_j(M_k)$ can be rewritten much more favorably using $a_k = \frac{1}{n} M_k X_k$ as
$$a_k^* v_j(M_k) = \frac{\sigma_j(M_k)}{n}\, X_k^* u_j(M_k).$$
The advantage of this latter formulation is that the random variables $X_k$ and $u_j(M_k)$ are independent for fixed $k$.

Next, note that from the assumption $N_I \ge Cn|I|$ and the Cauchy interlacing law, one can find an interval $J \subset \{1, \dots, p-1\}$ of length $|J| \gg Cn\eta$ such that $\lambda_j(W_k) \in I$ for all $j \in J$. We conclude that
$$\sum_{j \in J} \frac{\sigma_j(M_k)^2}{n^2}\, \frac{|X_k^* u_j(M_k)|^2}{\eta} \ll \frac{1}{C}.$$
Since $\lambda_j(W_k) \in I$, one has $\sigma_j(M_k) = \Theta(\sqrt{n})$, and thus
$$\sum_{j \in J} |X_k^* u_j(M_k)|^2 \ll \frac{n\eta}{C}.$$
The left-hand side can be rewritten using Pythagoras' theorem as $\|\pi_H(X_k)\|^2$, where $H$ is the span of the right singular vectors $u_j(M_k)$ for $j \in J$. But from the distance lemma and the lower bound on $|J|$, we see that this quantity is $\gg Cn\eta$ with overwhelming probability, giving the desired contradiction with overwhelming probability (even after taking the union bound in $k$). This concludes the proof of the eigenvalue upper bound.

5.3. Reduction to a Stieltjes transform bound. We now begin the proof of the eigenvalue concentration theorem in earnest. We continue to allow all implied constants to depend on $\varepsilon$ and $y$. By a limiting argument (using the Weyl inequality), it suffices to establish the claim under the assumption that the distribution of $M$ is continuous; our arguments will not use any quantitative estimates on this continuity.
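Both the crude upper bound above and the argument that follows rest on the Schur complement representation of the Stieltjes transform recalled among the linear algebra tools, applied to the $p \times p$ matrix $\frac{1}{n}MM^*$. The sketch below checks that identity numerically for a small matrix with $\pm 1$ entries. The matrix sizes, the seed, the point $z = 1 + 0.5\sqrt{-1}$ and the helper names (including a hand-written Gauss–Jordan inverse, used only to avoid third-party libraries) are illustrative assumptions. It also confirms that $\mathrm{Im}\, s(z) > 0$ when $\mathrm{Im}\, z > 0$, which is the positivity used in the crude upper bound.

```python
import random

def inverse(a):
    """Inverse of a square complex matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [[complex(x) for x in row] + [1.0 + 0j if i == j else 0j for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        pivot = aug[col][col]
        aug[col] = [x / pivot for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def shifted(w, z, keep):
    """The matrix W - zI restricted to the rows and columns listed in keep."""
    return [[w[i][j] - (z if i == j else 0) for j in keep] for i in keep]

def stieltjes_trace(w, z):
    """s(z) = (1/p) tr (W - zI)^{-1}, i.e. (1/p) sum_j 1/(lambda_j(W) - z)."""
    p = len(w)
    inv = inverse(shifted(w, z, range(p)))
    return sum(inv[i][i] for i in range(p)) / p

def stieltjes_schur(w, z):
    """The same quantity via (1/p) sum_k 1/(w_kk - z - a_k^* (W_k - zI)^{-1} a_k)."""
    p = len(w)
    total = 0j
    for k in range(p):
        rest = [i for i in range(p) if i != k]
        a_k = [w[i][k] for i in rest]        # k-th column with the k-th entry removed
        res = inverse(shifted(w, z, rest))    # (W_k - zI)^{-1}
        quad = sum(a_k[i].conjugate() * res[i][j] * a_k[j]
                   for i in range(p - 1) for j in range(p - 1))
        total += 1 / (w[k][k] - z - quad)
    return total / p

rng = random.Random(3)
p, n = 5, 12
m = [[rng.choice([-1.0, 1.0]) for _ in range(n)] for _ in range(p)]
w = [[sum(m[i][l] * m[j][l] for l in range(n)) / n for j in range(p)] for i in range(p)]  # (1/n) M M^*
z = 1.0 + 0.5j
s_trace, s_schur = stieltjes_trace(w, z), stieltjes_schur(w, z)
print(abs(s_trace - s_schur) < 1e-10, s_trace.imag > 0)
```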
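The strategy is to compare $s$ with the Marchenko–Pastur Stieltjes transform
$$s_{MP,y}(z) := \int_{\mathbb{R}} \frac{\rho_{MP,y}(x)}{x - z}\, dx.$$
A routine application of the definition of $\rho_{MP,y}$ and the Cauchy integral formula yields the explicit formula
$$s_{MP,y}(z) = \frac{-(y + z - 1) + \sqrt{(y + z - 1)^2 - 4yz}}{2yz},$$
where we use the branch of $\sqrt{(y + z - 1)^2 - 4yz}$ with cut at $[a, b]$ that is asymptotic to $y + z - 1$ as $z \to \infty$. To put it another way, for $z$ in the upper half-plane, $s_{MP,y}(z)$ is the unique solution to the equation
$$s_{MP,y}(z) = \frac{1}{1 - y - z - yz\, s_{MP,y}(z)}$$
with $\mathrm{Im}\, s_{MP,y}(z) > 0$. Details of these computations can also be found in [...].

We have the following standard relation between convergence of the Stieltjes transform and convergence of the counting function.

Lemma (Control of Stieltjes transform implies control on ESD). Let $1/10 \ge \eta \ge 1/n$, and $L, \varepsilon, \delta > 0$. Suppose that one has the bound
$$|s_{MP,y}(z) - s(z)| \le \delta$$
with (uniformly) overwhelming probability for each $z$ with $|\mathrm{Re}(z)| \le L$ and $\mathrm{Im}(z) \ge \eta$. Then for any interval $I$ in $[a + \varepsilon, b - \varepsilon]$ with $|I| \ge \max(2\eta, \frac{\eta}{\delta} \log \frac{1}{\delta})$, one has
$$\Big|N_I - p \int_I \rho_{MP,y}(x)\, dx\Big| \ll_\varepsilon \delta p |I|$$
with overwhelming probability.

Proof. This follows from [...]; strictly speaking, that lemma was phrased for the semicircular distribution rather than the Marchenko–Pastur distribution, but an inspection of the proof shows that it can be modified without difficulty. See also [...] for closely related lemmas.

In view of this lemma, we see that to show the eigenvalue concentration theorem, it suffices to show that for each complex number $z$ with $a + \varepsilon/2 \le \mathrm{Re}(z) \le b - \varepsilon/2$ and $\mathrm{Im}(z) \ge \eta := \frac{K^2 \log^{O(1)} n}{n}$, one has
$$s(z) = s_{MP,y}(z) + o(1)$$
with (uniformly) overwhelming probability.

For this, we return to the Schur complement formula for $s(z)$ used in the crude upper bound; inserting the identities $a_k = \frac{1}{n}M_k X_k$ and $a_k^* v_j(M_k) = \frac{\sigma_j(M_k)}{n} X_k^* u_j(M_k)$ obtained there, one has
$$s(z) = \frac{1}{p} \sum_{k=1}^p \frac{1}{\xi_{kk} - z - Y_k},$$
where
$$Y_k := \sum_{j=1}^{p-1} \frac{\sigma_j(M_k)^2}{n^2}\, \frac{|X_k^* u_j(M_k)|^2}{\lambda_j(W_k) - z}.$$
Suppose we condition $M_k$ (and thus $W_k$) to be fixed. The entries of $X_k$ remain independent with mean zero and variance $1$, and thus (since the $u_j$ are unit vectors)
$$\mathbf{E}(Y_k \mid M_k) = \sum_{j=1}^{p-1} \frac{\sigma_j(M_k)^2}{n^2}\, \frac{1}{\lambda_j(W_k) - z} = \frac{p-1}{n}\big(1 + z\, s_k(z)\big),$$
where
$$s_k(z) := \frac{1}{p-1} \sum_{j=1}^{p-1} \frac{1}{\lambda_j(W_k) - z}$$
is the Stieltjes transform of $W_k$. From the Cauchy interlacing law we see that the difference
$$s(z) - \frac{p-1}{p} s_k(z) = \frac{1}{p}\Big(\sum_{j=1}^{p} \frac{1}{\lambda_j(W) - z} - \sum_{j=1}^{p-1} \frac{1}{\lambda_j(W_k) - z}\Big)$$
is bounded in magnitude by $O(\frac{1}{p})$ times the total variation of the function $\lambda \mapsto \frac{1}{\lambda - z}$ on $[0, +\infty)$, which is $O(\frac{1}{\eta})$. Thus
$$\frac{p-1}{p} s_k(z) = s(z) + O\Big(\frac{1}{p\eta}\Big),$$
and thus
$$\mathbf{E}(Y_k \mid M_k) = \frac{p-1}{n} + \frac{p}{n} z\, s(z) + O\Big(\frac{1}{n\eta}\Big) = y + o(1) + (y + o(1))\, z\, s(z),$$
since $p/n = y + o(1)$ and $1/\eta = o(n)$. We will shortly show a similar bound for $Y_k$ itself.

Lemma (Concentration of $Y_k$). For each $1 \le k \le p$, one has
$$Y_k = y + o(1) + (y + o(1))\, z\, s(z)$$
with overwhelming probability (uniformly in $k$ and $I$).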
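The concentration inputs in this argument (the bound for $Y_k$ just stated, and the bound $\xi_{kk} = \frac{1}{n}|X_k|^2 = 1 + o(1)$ used next) are closely related to the distance lemma for random vectors and subspaces. A small Monte Carlo sketch of that lemma follows. It assumes $\pm 1$ entries and a subspace $H$ spanned by Gram–Schmidt-orthonormalized gaussian vectors; the dimensions, the number of trials, the seed and the function names are arbitrary choices for illustration. Since $\mathbf{E}\|\pi_H(X)\|^2 = \mathrm{tr}\, \pi_H = d$ for independent entries of mean zero and variance one, the empirical mean of $\|\pi_H(X)\|^2$ should be close to $d$, while $\|\pi_H(X)\|$ should stay within $O(1)$ of $\sqrt{d}$.

```python
import math
import random

def orthonormal_basis(vectors):
    """Gram-Schmidt orthonormalisation (assumes the input vectors are linearly independent)."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = sum(x * y for x, y in zip(w, b))
            w = [x - c * y for x, y in zip(w, b)]
        norm = math.sqrt(sum(x * x for x in w))
        basis.append([x / norm for x in w])
    return basis

def projection_norm(x, basis):
    """|pi_H(X)| for H spanned by an orthonormal basis: sqrt(sum_h <X, h>^2)."""
    return math.sqrt(sum(sum(a * b for a, b in zip(x, h)) ** 2 for h in basis))

rng = random.Random(4)
n, d, trials = 200, 40, 200
H = orthonormal_basis([[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(d)])
norms = []
for _ in range(trials):
    x = [rng.choice([-1.0, 1.0]) for _ in range(n)]  # independent entries, mean 0, variance 1
    norms.append(projection_norm(x, H))
mean_sq = sum(r * r for r in norms) / trials           # should be close to d
max_dev = max(abs(r - math.sqrt(d)) for r in norms)    # deviation of |pi_H(X)| from sqrt(d)
print(round(mean_sq, 2), round(max_dev, 2))
```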
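From the Cauchy interlacing law (Lemma …) we see that the difference
$$s(z) - \frac{p-1}{p} s_k(z) = \frac{1}{p}\Big(\sum_{j=1}^{p} \frac{1}{\lambda_j(W) - z} - \sum_{j=1}^{p-1} \frac{1}{\lambda_j(W_k) - z}\Big)$$
is bounded in magnitude by $O(1/p)$ times the total variation of the function $\lambda \mapsto \frac{1}{\lambda - z}$ on $[0, +\infty)$, which is $O(1/\eta)$. Thus
$$\frac{p-1}{p} s_k(z) = s(z) + O\Big(\frac{1}{p\eta}\Big)$$
and thus
$$\mathbf{E}(Y_k \mid M_k) = \frac{p-1}{n} + \frac{p}{n} z s(z) + O\Big(\frac{1}{n\eta}\Big) = y + o(1) + (y + o(1)) z s(z),$$
since $p/n = y + o(1)$ and $1/\eta = o(n)$. We will shortly show a similar bound for $Y_k$ itself.

Lemma (Concentration of $Y_k$). For each $1 \le k \le p$, one has $Y_k = y + o(1) + (y + o(1)) z s(z)$ with overwhelming probability, uniformly in $k$ and $I$.

Meanwhile, we have $\xi_{kk} = \frac{1}{n}|X_k|^2$. Hence, by Lemma …, $\xi_{kk} = 1 + o(1)$ with overwhelming probability, again uniformly in $k$ and $I$. Inserting these bounds into (…), one obtains
$$s(z) = \frac{1}{p}\sum_{k=1}^{p} \frac{1}{1 - z - (y + o(1)) - (y + o(1)) z s(z) + o(1)}$$
with overwhelming probability. Thus $s(z)$ almost solves (…) in some sense. From the quadratic formula, the two solutions of (…) are $s_{MP,y}(z)$ and $\frac{1-y-z}{yz} - s_{MP,y}(z)$. One concludes that with overwhelming probability, one of the following three alternatives holds:
(a) $s(z) = s_{MP,y}(z) + o(1)$;
(b) $s(z) = \frac{1-y-z}{yz} + o(1)$;
(c) $s(z) = \frac{1-y-z}{yz} - s_{MP,y}(z) + o(1)$.
Here we use the convention that $\frac{1-y-z}{yz} = \infty$ when $y = 0$. We use an $n^{-C}$-net of possible $z$'s (for a large constant $C$), the union bound, and the fact that $s(z)$ has a Lipschitz constant of at most $O(n^{O(1)})$ in the region of interest. This lets us assume that the above trichotomy holds for all $z$ with $a + \varepsilon/2 \le \operatorname{Re}(z) \le b - \varepsilon/2$ and $\eta \le \operatorname{Im}(z) \le n$ (say). When $\operatorname{Im}(z) = n$, $s(z)$ and $s_{MP,y}(z)$ are both $o(1)$, and so (a) holds in this case. By continuity, either (a) holds for all $z$ in the domain of interest, or there exists a $z$ at which (a) holds together with one of (b) or (c).

From (…) one has
$$s_{MP,y}(z) - \Big(\frac{1-y-z}{yz} - s_{MP,y}(z)\Big) = -\frac{\sqrt{(1-y-z)^2 - 4yz}}{yz}.$$
Hence the separation between $s_{MP,y}(z)$ and $\frac{1-y-z}{yz} - s_{MP,y}(z)$ is bounded from below, so (a) and (c) cannot both hold for $n$ large enough. Similarly, from (…) we see that
$$s_{MP,y}(z) - \frac{1-y-z}{yz} = -\frac{1-y-z+\sqrt{(1-y-z)^2 - 4yz}}{2yz}.$$
Since $(1-y-z)^2 - 4yz$ has zeroes only when $z \in \{a, b\}$, and $z$ is bounded away from these singularities, (a) and (b) also cannot both hold for $n$ large enough. Thus the continuity argument shows that (a) holds with uniformly overwhelming probability for all $z$ in the region of interest, for $n$ large enough. This gives (…) and thus Theorem ….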
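The trichotomy rests on two facts about the quadratic $yz\,s^2 + (z + y - 1)s + 1 = 0$, which is equivalent to $s = 1/(1 - y - z - yzs)$. Its two roots sum to $\frac{1-y-z}{yz}$, and their distance $|\sqrt{(1-y-z)^2-4yz}|/|yz|$ stays bounded below when $z$ is bounded away from $a$ and $b$. The Python sketch below checks both facts numerically. The function names and the grid used for the lower bound are illustrative assumptions.

```python
import cmath
import math


def quadratic_roots(z, y):
    """Both solutions of s = 1 / (1 - y - z - y z s), i.e. y z s^2 + (z + y - 1) s + 1 = 0."""
    disc = cmath.sqrt((1 - y - z) ** 2 - 4 * y * z)
    return (1 - y - z + disc) / (2 * y * z), (1 - y - z - disc) / (2 * y * z)


def root_separation(z, y):
    """|s_1 - s_2|, which equals |sqrt((1 - y - z)^2 - 4 y z)| / |y z|."""
    s1, s2 = quadratic_roots(z, y)
    return abs(s1 - s2)


def min_separation_on_grid(y, eps, eta, n_re=50):
    """Smallest root separation over z = x + i*eta with a + eps/2 <= x <= b - eps/2."""
    a, b = (1 - math.sqrt(y)) ** 2, (1 + math.sqrt(y)) ** 2
    lo, hi = a + eps / 2, b - eps / 2
    xs = [lo + (hi - lo) * k / (n_re - 1) for k in range(n_re)]
    return min(root_separation(complex(x, eta), y) for x in xs)
```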
Proof of Theorem … and Theorem …

We first prove Theorem …. The arguments follow those in […]. We begin by observing from Markov's inequality and the union bound that one has $|\zeta_{ij}|, |\zeta'_{ij}| \le n^{10/C_0}$ (say) for all $i, j$ with probability $1 - O(n^{-8})$. Thus, by truncation (adjusting the moments appropriately and using Lemma … to absorb the error), one may assume without loss of generality that $|\zeta_{ij}|, |\zeta'_{ij}| \le n^{10/C_0}$ almost surely for all $i, j$. Next, by a further approximation argument we may assume that the distributions of $M$ and $M'$ are continuous. This is a purely qualitative assumption, made to ensure that the singular values are almost surely simple. Our bounds will not depend on any quantitative measure of the continuity, so the general case then follows by a limiting argument using Lemma ….

The key technical step is the following theorem, whose proof is delayed to the next section.

Theorem (Truncated Four Moment Theorem). For sufficiently small $c_0 > 0$ and sufficiently large $C_0 > 0$, the following holds for every $0 < \varepsilon < 1/2$ and $k \ge 1$. Let $M = (\zeta_{ij})_{1 \le i \le p, 1 \le j \le n}$ and $M' = (\zeta'_{ij})_{1 \le i \le p, 1 \le j \le n}$ be matrix ensembles obeying condition C1 for some $C_0$, as well as (…). Assume that $p/n \to y$ for some $0 < y \le 1$, and that $\zeta_{ij}$ and $\zeta'_{ij}$ match to order $4$. Let $G: \mathbb{R}^k \times \mathbb{R}^k \to \mathbb{R}$ be a smooth function obeying the derivative bounds
$$|\nabla^j G(x_1, \ldots, x_k, q_1, \ldots, q_k)| \le n^{c_0}$$
for all $0 \le j \le 5$, all $x_1, \ldots, x_k \in \mathbb{R}$ and all $q_1, \ldots, q_k \in \mathbb{R}_+$, where the gradient $\nabla$ is taken in all $2k$ variables. Assume also that $G$ is supported on the region $q_1, \ldots, q_k \le n^{c_0}$. Then for any $\varepsilon p \le i_1 < \ldots < i_k \le (1-\varepsilon)p$, and for $n$ sufficiently large depending on $\varepsilon, k, c_0$, we have
$$\big|\mathbf{E} G(\sqrt{n}\sigma_{i_1}(M), \ldots, \sqrt{n}\sigma_{i_k}(M), Q_{i_1}(M), \ldots, Q_{i_k}(M)) - \mathbf{E} G(\sqrt{n}\sigma_{i_1}(M'), \ldots, \sqrt{n}\sigma_{i_k}(M'), Q_{i_1}(M'), \ldots, Q_{i_k}(M'))\big| \le n^{-c_0}.$$
Suppose instead that $\zeta_{ij}$ and $\zeta'_{ij}$ only match to order $3$. Then the conclusion still holds if one strengthens (…) to $|\nabla^j G| \le n^{-jc_1}$ for some $c_1 > 0$, provided that $c_0$ is sufficiently small depending on $c_1$.

Given a $p \times n$ matrix $M$, we form the augmented matrix $\tilde M$ defined in (…). Its eigenvalues are $\pm\sigma_1(M), \ldots, \pm\sigma_p(M)$, together with the eigenvalue $0$ with multiplicity $n - p$ (if $p < n$).
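For each $1 \le i \le p$, we introduce, in analogy with the arguments in […], the quantities
$$Q_i(M) := \sum_{\lambda \ne \sigma_i(M)} \frac{1}{|\sqrt{n}\lambda - \sqrt{n}\sigma_i(M)|^2} = \frac{1}{n}\Big(\sum_{1 \le j \le p:\, j \ne i} \frac{1}{|\sigma_j(M) - \sigma_i(M)|^2} + \frac{n-p}{\sigma_i(M)^2} + \sum_{j=1}^{p} \frac{1}{|\sigma_j(M) + \sigma_i(M)|^2}\Big),$$
where $\lambda$ ranges over the eigenvalues of $\tilde M$, counted with multiplicity. The factor of $\sqrt{n}$ in $Q_i(M)$ is there to align the notation with that of […], in which one dilated the matrix by $\sqrt{n}$. We set $Q_i(M) = \infty$ if the singular value $\sigma_i$ is repeated. This event occurs with probability zero, since we are assuming $M$ to be continuously distributed.

The gap property on $M$ ensures an upper bound on $Q_i(M)$.

Lemma. If $M$ satisfies the gap property, then for any $c_0 > 0$ (independent of $n$) and any $\varepsilon p \le i \le (1-\varepsilon)p$, one has $Q_i(M) \le n^{c_0}$ with high probability.

Proof. Observe the upper bound
$$Q_i(M) \le \frac{2}{n}\sum_{1 \le j \le p:\, j \ne i} \frac{1}{|\sigma_j(M) - \sigma_i(M)|^2} + \frac{n - p + 1}{n\,\sigma_i(M)^2}.$$
From Corollary … we see that with overwhelming probability, $\sigma_i(M)/\sqrt{n}$ is bounded away from zero, and so $\frac{n - p + 1}{n\sigma_i(M)^2} = O(1/n)$. To bound the other term in (…), one repeats the proof of […, Lemma …].

By applying a truncation argument exactly as in […, Section …], one can now remove the hypothesis in Theorem … that $G$ is supported in the region $q_1, \ldots, q_k \le n^{c_0}$. In particular, one can now handle the case when $G$ is independent of $q_1, \ldots, q_k$. Theorem … then follows by making the change of variables $\sigma_i = \sqrt{n\lambda_i}$ and using the chain rule and Corollary ….

Next, we prove Theorem …, assuming both Theorem … and Theorem …. The main observation here is the following lemma.

Lemma (Matching lemma). Let $\zeta$ be a complex random variable with mean zero, unit variance, and third moment bounded by some constant $a$. Then there exists a complex random variable $\tilde\zeta$ that matches $\zeta$ to third order and whose support lies in the ball of radius $O_a(1)$ centered at the origin. In particular, $\tilde\zeta$ obeys the exponential decay hypothesis uniformly in $n$ for fixed $a$.

Proof. In order for $\tilde\zeta$ to match $\zeta$ to third order, it suffices that $\tilde\zeta$ have mean zero and variance $1$, and that $\mathbf{E}\tilde\zeta^3 = \mathbf{E}\zeta^3$ and $\mathbf{E}\tilde\zeta^2\overline{\tilde\zeta} = \mathbf{E}\zeta^2\bar\zeta$. Accordingly, let $\Omega \subset \mathbb{C}^2$ be the set of pairs $(\mathbf{E}\zeta^3, \mathbf{E}\zeta^2\bar\zeta)$, where $\zeta$ ranges over complex random variables with mean zero, variance one, and compact support. Clearly $\Omega$ is convex.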
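To make the definition of $Q_i(M)$ and the bound in the proof concrete, here is a small Python sketch. It computes $Q_i(M)$ from a list of singular values $\sigma_1 < \ldots < \sigma_p$ by summing over the spectrum $\pm\sigma_j$, $0$ of the augmented matrix, and it also computes the right-hand side of the upper bound. It assumes distinct, strictly positive singular values. The function names are ours, not the paper's.

```python
def q_value(sigmas, i, n):
    """Q_i(M) from singular values sigma_1 < ... < sigma_p (i is 0-based here).

    The augmented matrix has eigenvalues +sigma_j, -sigma_j (1 <= j <= p) and
    0 with multiplicity n - p; Q_i sums 1/|sqrt(n) lam - sqrt(n) sigma_i|^2 over
    all of them except the eigenvalue +sigma_i itself.
    """
    p = len(sigmas)
    s_i = sigmas[i]
    total = 0.0
    for j, s_j in enumerate(sigmas):
        if j != i:
            total += 1.0 / (s_j - s_i) ** 2  # eigenvalue +sigma_j
        total += 1.0 / (s_j + s_i) ** 2      # eigenvalue -sigma_j (includes j = i)
    total += (n - p) / s_i ** 2              # the n - p zero eigenvalues
    return total / n


def q_upper_bound(sigmas, i, n):
    """(2/n) * sum_{j != i} 1/|sigma_j - sigma_i|^2 + (n - p + 1) / (n sigma_i^2)."""
    p = len(sigmas)
    s_i = sigmas[i]
    near = sum(1.0 / (s_j - s_i) ** 2 for j, s_j in enumerate(sigmas) if j != i)
    return 2.0 * near / n + (n - p + 1) / (n * s_i ** 2)
```

The bound holds because $\sigma_j + \sigma_i \ge |\sigma_j - \sigma_i|$ for nonnegative singular values, and the $j = i$ term $1/(2\sigma_i)^2$ is at most $1/\sigma_i^2$.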
It is also invariant under the symmetry $(z, w) \mapsto (e^{3i\theta}z, e^{i\theta}w)$ for any phase $\theta$. Thus, if $(z, w) \in \Omega$, then $(-z, e^{i\pi/3}w) \in \Omega$, and hence by convexity $(0, \frac{1+e^{i\pi/3}}{2}w) \in \Omega$. By convexity and rotation invariance again, $(0, w') \in \Omega$ whenever $|w'| \le \frac{\sqrt{3}}{2}|w|$. Since $(z, w)$ and $(0, -\frac{\sqrt{3}}{2}w)$ both lie in $\Omega$, convexity shows that $(cz, 0)$ also lies in $\Omega$ for some absolute constant $c > 0$. Again by convexity and rotation invariance, $(z', 0) \in \Omega$ whenever $|z'| \le c|z|$. One last application of convexity then gives $(z'/2, w'/2) \in \Omega$ whenever $|z'| \le c|z|$ and $|w'| \le \frac{\sqrt{3}}{2}|w|$.

It is easy to construct complex random variables with mean zero, variance one, compact support, and arbitrarily large third moment. Since the third moment is comparable to $|z| + |w|$, we conclude that $\Omega$ contains all of $\mathbb{C}^2$. That is, every complex random variable with mean zero, unit variance and finite third moment can be matched to third order by a variable of compact support. An inspection of the argument shows that if the third moment is bounded by $a$, then the support can also be bounded by $O_a(1)$.

Now consider a random matrix $M$ as in Theorem … with atom variables $\zeta_{ij}$. By the above lemma, for each $i, j$ we can find $\zeta'_{ij}$ which satisfies the exponential decay hypothesis and matches $\zeta_{ij}$ to third order. Let $\phi(q)$ be a smooth cutoff to the region $q \le n^{c_0}$ for some $c_0 > 0$ independent of $n$, and let $\varepsilon p \le i \le (1-\varepsilon)p$. By Theorem …, $M'$ satisfies the gap property. By Lemma …, $\mathbf{E}\phi(Q_i(M')) = 1 - O(n^{-c_1})$ for some $c_1 > 0$ independent of $n$. By Theorem …, one then has $\mathbf{E}\phi(Q_i(M)) = 1 - O(n^{-c_2})$ for some $c_2 > 0$ independent of $n$. We conclude that $M$ also obeys the gap property.

The next two sections are devoted to the proofs of Theorem … and Theorem …, respectively.

Remark. The above trick for removing the exponential decay hypothesis in Theorem … also removes the same hypothesis in […, Theorem …]. The point is that the exponential decay hypothesis is not used anywhere in the analogue of Theorem … in that paper (implicit in […, Section …]). That argument only requires a uniformly bounded $C_0$-th moment for $C_0$ large enough, as is the case here.
Because of this, one can replace all the exponential decay hypotheses in the results of […] by a hypothesis of bounded $C_0$-th moment; we omit the details.

The proof of Theorem …

It remains to prove Theorem …. By telescoping series, it suffices to establish a bound
$$\big|\mathbf{E} G(\sqrt{n}\sigma_{i_1}(M), \ldots, \sqrt{n}\sigma_{i_k}(M), Q_{i_1}(M), \ldots, Q_{i_k}(M)) - \mathbf{E} G(\sqrt{n}\sigma_{i_1}(M'), \ldots, \sqrt{n}\sigma_{i_k}(M'), Q_{i_1}(M'), \ldots, Q_{i_k}(M'))\big| \le n^{-2-c_0}$$
under the assumption that the coefficients $\zeta_{ij}, \zeta'_{ij}$ of $M$ and $M'$ are identical except in one entry, say the $qr$ entry for some $1 \le q \le p$ and $1 \le r \le n$. The claim then follows by interchanging each of the $pn = O(n^2)$ entries of $M$ into $M'$ separately.

Write $M(z)$ for the matrix $M$ (or $M'$) with the $qr$ entry replaced by $z$. We apply the following proposition, which follows from a lengthy argument in […].

Proposition (Replacement given a good configuration). Let the notation and assumptions be as in Theorem …. There exists a positive constant $C_1$ (independent of $k$) such that the following holds. Let $\varepsilon_1 > 0$. We condition on (i.e. freeze) all the entries of $M(z)$, except for the $qr$ entry, which is $z$. We assume that for every $1 \le j \le k$ and every $|z| \le n^{1/2+\varepsilon_1}$ whose real and imaginary parts are multiples of $n^{-C_1}$, the following three properties hold.

(Singular value separation) For any $1 \le i \le n$ with $|i - i_j| \ge n^{\varepsilon_1}$, we have $|\sqrt{n}\sigma_i(M(z)) - \sqrt{n}\sigma_{i_j}(M(z))| \ge n^{-\varepsilon_1}|i - i_j|$. Also, we assume $\sqrt{n}\sigma_{i_j}(M(z)) \ge n^{1-\varepsilon_1}$.

(Delocalization at $i_j$) If $u_{i_j}(M(z)) \in \mathbb{C}^n$ and $v_{i_j}(M(z)) \in \mathbb{C}^p$ are unit right and left singular vectors of $M(z)$, then $|e_q^* v_{i_j}(M(z))|, |e_r^* u_{i_j}(M(z))| \le n^{-1/2+\varepsilon_1}$.

(Projection bounds) For every $\alpha \ge 0$, $\|P_{i_j,\alpha}(M(z))e_q\|, \|P'_{i_j,\alpha}(M(z))e_r\| \le 2^{\alpha/2} n^{-1/2+\varepsilon_1}$. Here $P_{i_j,\alpha}$ (resp. $P'_{i_j,\alpha}$) is the orthogonal projection to the span of the left singular vectors $v_i(M(z))$ (resp. right singular vectors $u_i(M(z))$) corresponding to singular values $\sigma_i(M(z))$ with $2^\alpha \le |i - i_j| < 2^{\alpha+1}$.

We say that $M, e_q, e_r$ are a good configuration for $i_1, \ldots, i_k$ if the above properties hold. Assuming this good configuration, we have (…) if $\zeta_{ij}$ and $\zeta'_{ij}$ match to order $4$, or if they match to order $3$ and (…) holds.

Proof. This follows by applying […, Proposition …] to the $(p+n) \times (p+n)$ Hermitian matrix $A(z) := \sqrt{n}\tilde M(z)$, where $\tilde M(z)$ is the augmented matrix of $M(z)$ defined in (…).
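The telescoping step swaps one entry at a time. The Python sketch below builds the hybrid matrices $M = H_0, H_1, \ldots, H_{pn} = M'$, where $H_t$ agrees with $M'$ in the first $t$ entries (in row-major order) and with $M$ elsewhere. It then checks that consecutive hybrids differ in at most one entry. For any statistic $F$, the total change $F(M') - F(M)$ therefore splits into $pn$ single-entry differences. Real entries, the swap order and the function names are illustrative assumptions.

```python
def hybrid_sequence(m, m_prime):
    """Hybrids H_0 = M, ..., H_{pn} = M', swapping entries in row-major order."""
    p, n = len(m), len(m[0])
    current = [row[:] for row in m]
    hybrids = [[row[:] for row in current]]
    for q in range(p):
        for r in range(n):
            current[q][r] = m_prime[q][r]  # replace the (q, r) entry only
            hybrids.append([row[:] for row in current])
    return hybrids


def entries_changed(a, b):
    """Number of positions at which two matrices of the same shape differ."""
    return sum(x != y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def telescoping_terms(hybrids, statistic):
    """Differences statistic(H_{t+1}) - statistic(H_t); they sum to the total change."""
    values = [statistic(h) for h in hybrids]
    return [later - earlier for earlier, later in zip(values, values[1:])]
```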
Note that the eigenvalues of $A(z)$ are $\pm\sqrt{n}\sigma_1(M(z)), \ldots, \pm\sqrt{n}\sigma_p(M(z))$ and $0$. The corresponding eigenvectors are given, up to unit phases, by $\frac{1}{\sqrt{2}}(v_j(M(z)), \pm u_j(M(z)))$. Note also that the analogue of (…) in […, Proposition …] is trivially true if $2^\alpha$ is comparable to $n$, so one can restrict attention to the regime $2^\alpha = o(n)$.

In view of the above proposition, to conclude the proof of Theorem … (and thus Theorem …) it suffices to show the following. For any $\varepsilon_1 > 0$, $M, e_q, e_r$ are a good configuration for $i_1, \ldots, i_k$ with overwhelming probability, if $C_1$ is sufficiently large depending on $\varepsilon_1$ (cf. […, Proposition …]).

Our main tools for this are Theorem … and Theorem …. Actually, we need a slight variant.

Proposition. The conclusions of Theorem … and Theorem … continue to hold if one replaces the $qr$ entry of $M$ by a deterministic number $z = O(n^{1/2 + O(1/C_0)})$.

This is proven exactly as in […, Corollary …] and is omitted.

We return to the task of establishing a good configuration with overwhelming probability. By the union bound we may fix $1 \le j \le k$. We may also fix $|z| \le n^{1/2+\varepsilon_1}$ whose real and imaginary parts are multiples of $n^{-C_1}$. By the union bound again and Proposition …, the eigenvalue separation condition (…) holds with overwhelming probability for every $1 \le i \le n$ with $|i - i_j| \ge n^{\varepsilon_1}$, if $C_0$ is sufficiently large; so does (…). A similar argument using Pythagoras' theorem and Corollary … gives (…) with overwhelming probability, noting as before that we may restrict attention to the regime $2^\alpha = o(n)$. Corollary … also gives (…) with overwhelming probability. This gives the claim, and Theorem … follows.

Proof of Theorem …

We now prove Theorem …, closely following the analogous arguments in […]. Using the exponential decay condition, we may truncate the $\zeta_{ij}$ (renormalising moments using Lemma …) to assume that $|\zeta_{ij}| \le \log^{O(1)} n$ almost surely. By a limiting argument we may assume that $M$ has a continuous distribution, so that the singular values are almost surely simple. We write $\sigma_i$ instead of $\sigma_i(M)$, and write $N := p + n$.
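As in […], the strategy is to propagate a narrow gap for $M = M_{p,n}$ backwards in the $p$ variable, until one can use Theorem … to show that the gap occurs with small probability. More precisely, for any $1 \le i - l < i \le p' \le p$, we let $M_{p',n}$ be the $p' \times n$ matrix formed using the first $p'$ rows of $M_{p,n}$. Following […], we define the regularized gap
$$g_{i,l,p'} := \inf_{1 \le i_- \le i - l < i \le i_+ \le p'} \frac{\sqrt{N}\sigma_{i_+}(M_{p',n}) - \sqrt{N}\sigma_{i_-}(M_{p',n})}{\min(i_+ - i_-, \log^{C_1} N)^{\log^{0.9} N}},$$
where $C_1 > 0$ is a large constant to be chosen later. It will suffice to show that $g_{i,1,p} \ge n^{-c_0}$ with high probability. The main tool for this is the following lemma.

Lemma (Backwards propagation of gap). Suppose that $p/2 \le p' < p$ and that $l \le p/10$ is such that $g_{i,l,p'+1} \le \delta$ for some $0 < \delta \le 1$ (which can depend on $n$). Suppose also that $g_{i,l+1,p'} \ge 2^m g_{i,l,p'+1}$ for some $m \ge 0$ with $2^m \le \delta^{-1/2}$. Let $X_{p'+1}$ be the $(p'+1)$-th row of $M_{p'+1,n}$. Let $u_1(M_{p',n}), \ldots, u_{p'}(M_{p',n})$ be an orthonormal system of right singular vectors of $M_{p',n}$ associated to $\sigma_1(M_{p',n}), \ldots, \sigma_{p'}(M_{p',n})$. Then one of the following statements holds.

(i) (Macroscopic spectral concentration) There exist $1 \le i_- < i_+ \le p'+1$ with $|i_+ - i_-| \ge \log^{C_1/2} n$ such that $|\sqrt{n}\sigma_{i_+}(M_{p'+1,n}) - \sqrt{n}\sigma_{i_-}(M_{p'+1,n})| \le \delta^{1/4}\exp(\log^{0.95} n)\,|i_+ - i_-|$.

(ii) (Small inner products) There exist $\varepsilon p/2 \le i_- \le i - l < i \le i_+ \le (1 - \varepsilon/2)p$ with $|i_+ - i_-| \le \log^{C_1/2} n$ such that $\sum_{i_- \le j < i_+} |X_{p'+1}^* u_j(M_{p',n})|^2 \le \frac{|i_+ - i_-|}{2^{m/2}\log^{0.01} n}$.

(iii) (Large singular value) For some $1 \le i \le p'+1$ one has $\sigma_i(M_{p'+1,n}) \ge \frac{\sqrt{n}\exp(-\log^{0.96} n)}{\delta^{1/2}}$.

(iv) (Large inner product in bulk) There exists $\varepsilon p/10 \le i \le (1 - \varepsilon/10)p$ such that $|X_{p'+1}^* u_i(M_{p',n})|^2 \ge \frac{\exp(-\log^{0.96} n)}{\delta^{1/2}}$.

(v) (Large row) We have $\|X_{p'+1}\|^2 \ge \frac{n\exp(-\log^{0.96} n)}{\delta^{1/2}}$.

(vi) (Large inner product near $i$) There exists $\varepsilon p/10 \le i' \le (1 - \varepsilon/10)p$ with $|i' - i| \le \log^{C_1} n$ such that $|X_{p'+1}^* u_{i'}(M_{p',n})|^2 \ge 2^{m/2} n \log^{0.8} n$.

Proof. This follows by applying […, Lemma …] to the $(p'+1+n) \times (p'+1+n)$ Hermitian matrix $A_{p'+1,n} := \sqrt{n}\tilde M_{p'+1,n}$. Removing the bottom row and rightmost column of $A_{p'+1,n}$ (this column consists of $\sqrt{n}X_{p'+1}$ together with $p'+1$ zeroes) yields the $(p'+n) \times (p'+n)$ Hermitian matrix $A_{p',n} := \sqrt{n}\tilde M_{p',n}$ [strictly speaking, some harmless adjustments by constant factors need to be made to this lemma, ultimately because $n$, $p$ and $n + p$ are only comparable up to constants rather than equal; these adjustments make only a negligible change to the proof of that lemma]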
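For concreteness, the regularized gap can be evaluated directly from a sorted list of singular values. The Python sketch below is a brute-force transcription of the definition, quadratic in $p'$. The function name is an illustrative assumption. The caller supplies $N$ and $C_1$, and the exponent $\log^{0.9} N$ follows the definition as written above.

```python
import math


def regularized_gap(sigmas, i, l, N, C1):
    """g_{i,l,p'} for singular values sigma_1 <= ... <= sigma_{p'} of M_{p',n}.

    sigmas is a 0-based list; i and l use the 1-based convention of the text.
    The infimum runs over 1 <= i_minus <= i - l < i <= i_plus <= p'; it is
    +infinity when that index range is empty.
    """
    p_prime = len(sigmas)
    cap = math.log(N) ** C1          # log^{C1} N
    exponent = math.log(N) ** 0.9    # log^{0.9} N
    best = math.inf
    for i_minus in range(1, i - l + 1):
        for i_plus in range(i, p_prime + 1):
            spread = math.sqrt(N) * (sigmas[i_plus - 1] - sigmas[i_minus - 1])
            weight = min(i_plus - i_minus, cap) ** exponent
            best = min(best, spread / weight)
    return best
```

With equally spaced singular values, the normalized spread grows linearly in $i_+ - i_-$ while the weight grows like a large power of it. In that case the infimum is attained at the widest admissible pair.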
which has eigenvalues $\pm\sqrt{n}\sigma_1(M_{p',n}), \ldots, \pm\sqrt{n}\sigma_{p'}(M_{p',n})$ and $0$, and an orthonormal eigenbasis that includes the vectors $\frac{1}{\sqrt{2}}(v_j(M_{p',n}), \pm u_j(M_{p',n}))$ for $1 \le j \le p'$. The large coefficient event in […, Lemma …(iii)] cannot occur here, as $A_{p'+1,n}$ has zero diagonal.

By repeating the arguments in […, Section …] almost verbatim, it then suffices to show the following.

Proposition (Bad events are rare). Suppose that $p/2 \le p' < p$ and $l \le p/10$, and set $\delta := n^{-\kappa}$ for some sufficiently small fixed $\kappa > 0$. Then:
(a) The events (i), (iii), (iv), (v) in Lemma … all fail with high probability.
(b) There is a constant $C'$ such that, with overwhelming probability, all the coefficients of the right singular vectors $u_j(M_{p',n})$ for $\varepsilon p/2 \le j \le (1 - \varepsilon/2)p$ have magnitude at most $n^{-1/2}\log^{C'} n$. Conditioning $M_{p',n}$ to be a matrix with this property, the events (ii) and (vi) occur with conditional probability at most $2^{-\kappa m} + n^{-\kappa}$.
(c) Furthermore, there is a constant $C_2$ (depending on $C', \kappa, C_1$) with the following property. If $l \ge C_2$ and $M_{p',n}$ is conditioned as in (b), then (ii) and (vi) in fact occur with conditional probability at most $2^{-\kappa m}\log^{-2C_1} n \cdot n^{-\kappa}$.

But Proposition … can be proven by repeating the proof of […, Proposition …] with only cosmetic changes. The only significant difference is that Theorem … and Theorem … are applied instead of […, Theorem …] and […, Proposition …] respectively.

References

G. Anderson, A. Guionnet and O. Zeitouni, An introduction to random matrices, book to be published by Cambridge Univ. Press.
Z. D. Bai and J. Silverstein, Spectral analysis of large dimensional random matrices, Mathematics Monograph Series, Science Press, Beijing.
G. Ben Arous and S. Péché, Universality of local eigenvalue statistics for some sample covariance matrices, Comm. Pure Appl. Math.
K. Costello and V. Vu, Concentration of random determinants and permanent estimators, to appear in SIAM J. Discrete Math.
P. Deift, Orthogonal polynomials and random matrices: a Riemann–Hilbert approach, Courant Lecture Notes in Mathematics,
New York University, Courant Institute of Mathematical Sciences, New York; American Mathematical Society, Providence, RI.
P. Deift, Universality for mathematical and physical systems, International Congress of Mathematicians Vol. I, Eur. Math. Soc., Zürich.
A. Edelman, Eigenvalues and condition numbers of random matrices, SIAM J. Matrix Anal. Appl.
A. Edelman, The distribution and moments of the smallest eigenvalue of a random matrix of Wishart type, Linear Algebra Appl.
L. Erdős, B. Schlein and H.-T. Yau, Semicircle law on short scales and delocalization of eigenvectors for Wigner random matrices, to appear in Annals of Probability.
L. Erdős, B. Schlein and H.-T. Yau, Local semicircle law and complete delocalization for Wigner random matrices, to appear in Comm. Math. Phys.
L. Erdős, B. Schlein and H.-T. Yau, Wegner estimate and level repulsion for Wigner random matrices, submitted.
L. Erdős, J. Ramirez, B. Schlein and H.-T. Yau, Universality of sine-kernel for Wigner matrices with a small Gaussian perturbation, arXiv preprint.
L. Erdős, J. Ramirez, B. Schlein and H.-T. Yau, Bulk universality for Wigner matrices, arXiv preprint.
L. Erdős, B. Schlein, H.-T. Yau and J. Yin, The local relaxation flow approach to universality of the local statistics for random matrices, arXiv preprint.
L. Erdős, J. Ramirez, B. Schlein, T. Tao, V. Vu and H.-T. Yau, Bulk universality for Wigner hermitian matrices with subexponential decay, to appear in Math. Res. Lett.
O. N. Feldheim and S. Sodin, A universality result for the smallest eigenvalues of certain sample covariance matrices, preprint.
P. J. Forrester, The spectrum edge of random matrix ensembles, Nuclear Phys. B.
P. J. Forrester, Exact results and universal asymptotics in the Laguerre random matrix ensemble, J. Math. Phys.
P. J. Forrester and N. S. Witte, The distribution of the first eigenvalue spacing at the hard edge of the Laguerre unitary ensemble, Kyushu J. Math.
J.
Ginibre, Statistical ensembles of complex, quaternion, and real matrices, Journal of Mathematical Physics.
F. Götze and A. Tikhomirov, Rate of convergence to the semicircular law, SFB, Universität Bielefeld, Preprint.
F. Götze and A. Tikhomirov, Rate of convergence in probability to the Marchenko–Pastur law, Bernoulli.
A. Guionnet and O. Zeitouni, Concentration of the spectral measure for large matrices, Electron. Comm. Probab. (electronic).
J. Gustavsson, Gaussian fluctuations of eigenvalues in the GUE, Ann. Inst. H. Poincaré Probab. Statist.
K. Johansson, Universality of the local spacing distribution in certain ensembles of Hermitian Wigner matrices, Comm. Math. Phys.
I. Johnstone, On the distribution of largest principal component, Ann. Statist.
N. Katz and P. Sarnak, Random matrices, Frobenius eigenvalues, and monodromy, American Mathematical Society Colloquium Publications, American Mathematical Society, Providence, RI.
M. Ledoux, The concentration of measure phenomenon, Mathematical Surveys and Monographs, AMS.
J. W. Lindeberg, Eine neue Herleitung des Exponentialgesetzes in der Wahrscheinlichkeitsrechnung, Math. Z.
V. A. Marchenko and L. A. Pastur, The distribution of eigenvalues in certain sets of random matrices, Mat. Sb.
M. L. Mehta, Random Matrices and the Statistical Theory of Energy Levels, Academic Press, New York, NY.
M. L. Mehta and M. Gaudin, On the density of eigenvalues of a random matrix, Nuclear Phys.
T. Nagao and K. Slevin, Laguerre ensembles of random matrices: nonuniversal correlation functions, J. Math. Phys.
T. Nagao and M. Wadati, Correlation functions of random matrix ensembles related to classical orthogonal polynomials, J. Phys. Soc. Japan.
V. Paulauskas and A. Rackauskas, Approximation theory in the central limit theorem, Kluwer Academic Publishers.
L. A. Pastur, Spectra of random self-adjoint operators, Russian Math. Surveys.
S.
Péché, Universality in the bulk of the spectrum for complex sample covariance matrices, preprint.
B. Rider and J. Ramirez, Diffusion at the random matrix hard edge, to appear in Comm. Math. Phys.
A. Soshnikov, Universality at the edge of the spectrum in Wigner random matrices, Comm. Math. Phys.
A. Soshnikov, Gaussian limit for determinantal random point fields, Ann. Probab.
A. Soshnikov, A note on universality of the distribution of the largest eigenvalues in certain sample covariance matrices, J. Statist. Phys.
M. Talagrand, A new look at independence, Ann. Probab.
T. Tao and V. Vu, On random matrices: singularity and determinant, Random Structures Algorithms.
T. Tao and V. Vu (with an appendix by M. Krishnapur), Random matrices: universality of ESDs and the circular law, to appear in Annals of Probability.
T. Tao and V. Vu, Random matrices: the distribution of the smallest singular values (universality at the hard edge), to appear in GAFA.
T. Tao and V. Vu, Random matrices: the local statistics of eigenvalues, to appear in Acta Math.
T. Tao and V. Vu, Random matrices: universality of local eigenvalue statistics up to the edge, to appear in Comm. Math. Phys.
J. von Neumann and H. Goldstine, Numerical inverting of matrices of high order, Bull. Amer. Math. Soc.
E. P. Wigner, On the distribution of the roots of certain symmetric matrices, The Annals of Mathematics.
K. W. Wachter, Strong limits of random matrix spectra for sample covariance matrices of independent elements, Ann. Probab.
Y. Q. Yin, Limiting spectral distribution for a class of random matrices, J. Multivariate Anal.

Department of Mathematics, UCLA, Los Angeles CA
E-mail address: tao@math.ucla.edu

Department of Mathematics, Rutgers, Piscataway, NJ
E-mail address: vanvu@math.rutgers.edu

We often abbreviate $\lambda_i(W)$ as $\lambda_i$.

Remark.
In this paper we will focus primarily on the case $0 < y \le 1$, but several of our results extend to other values of $y$ as well. The case $p > n$ can of course be deduced from the $p < n$ case, after some minor notational changes, by transposing the matrix $M$; this does not affect the nontrivial eigenvalues of the covariance matrix. One can also easily normalise the variance of the entries to be some quantity other than $1$ if one wishes. Observe that the quantities $\sigma_i := \sqrt{n\lambda_i}$ can be interpreted as the nontrivial singular values of the original matrix $M$. Also, $\lambda_1, \ldots, \lambda_p$ can be interpreted as the eigenvalues of the $p \times p$ matrix $\frac{1}{n}MM^*$. It will be convenient to exploit all three of these spectral interpretations of $\lambda_1, \ldots, \lambda_p$ in this paper. Condition C1 is analogous to Condition C0 for Wigner-type matrices in […], but with the exponential decay hypothesis relaxed to polynomial decay only.

The well-known Marchenko–Pastur law governs the bulk distribution of the eigenvalues $\lambda_1, \ldots, \lambda_p$ of $W$.

Theorem (Marchenko–Pastur law). Assume Condition C1 with $C_0 \ge 4$, and suppose that $p/n \to y$ for some $0 < y \le 1$. Then for any $x > 0$, the random variables
$$\frac{1}{p}\,\big|\{1 \le i \le p : \lambda_i(W) \le x\}\big|$$
converge in probability to $\int_0^x \rho_{MP,y}(x)\,dx$, where
$$\rho_{MP,y}(x) := \frac{1}{2\pi x y}\sqrt{(b - x)(x - a)}\,\mathbf{1}_{[a,b]}(x)$$
and
$$a := (1 - \sqrt{y})^2, \qquad b := (1 + \sqrt{y})^2.$$
When furthermore $M$ is iid, one can also obtain the case $C_0 = 2$.

Proof. For $C_0 \ge 4$, see […]; for $C_0 > 2$, see […]; for the $C_0 = 2$ iid case, see […]. Further results are known on the rate of convergence (see […]).

In this paper we are concerned instead with the local eigenvalue statistics. A model case is the complex Wishart ensemble, in which the $\zeta_{ij}$ are iid complex gaussians with mean zero and variance $1$. In this case, the distribution of the eigenvalues $\lambda_1, \ldots, \lambda_n$ of $W$ can be explicitly computed, as a special case of the Laguerre unitary ensemble. For instance, when $p = n$, the joint distribution is given by the density function
$$\rho_n(\lambda_1, \ldots, \lambda_n) = c(n) \prod_{1 \le i < j \le n} |\lambda_i - \lambda_j|^2 \exp\Big(-\sum_{i=1}^{n} \lambda_i/2\Big)$$
for some explicit normalization constant $c(n)$.
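The limiting fraction in the Marchenko–Pastur law can be tabulated numerically. The following Python sketch evaluates $\rho_{MP,y}$ and the limiting distribution function $x \mapsto \int_0^x \rho_{MP,y}$, and checks that the total mass on $[a,b]$ is $1$. The function names are illustrative. The midpoint-rule quadrature is our choice and is not taken from the paper.

```python
import math


def mp_density(x, y):
    """rho_{MP,y}(x) = sqrt((b - x)(x - a)) / (2 pi x y) on [a, b], zero elsewhere."""
    a, b = (1 - math.sqrt(y)) ** 2, (1 + math.sqrt(y)) ** 2
    if x <= a or x >= b:
        return 0.0
    return math.sqrt((b - x) * (x - a)) / (2 * math.pi * x * y)


def mp_cdf(x, y, steps=20000):
    """Limit of (1/p) #{i : lambda_i(W) <= x}, by the midpoint rule on [a, min(x, b)]."""
    a, b = (1 - math.sqrt(y)) ** 2, (1 + math.sqrt(y)) ** 2
    upper = min(x, b)
    if upper <= a:
        return 0.0
    h = (upper - a) / steps
    return sum(mp_density(a + (k + 0.5) * h, y) for k in range(steps)) * h
```

For $y = 1/4$ the support is $[1/4, 9/4]$. Here `mp_cdf(x, 0.25)` vanishes for $x \le 1/4$, increases through the bulk, and returns approximately $1$ once $x \ge 9/4$.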
Very similarly to the GUE case, one can use this explicit formula to directly compute several local statistics. These include the distribution of the largest and smallest eigenvalues […] and the correlation functions […]. Also as in the GUE case, it is widely conjectured that these statistics hold for a much larger class of random matrices. For some earlier results in this direction, we refer to […] and the references therein.

The Four Moment Theorem. The goal of this paper is to establish a Four Moment Theorem for random covariance matrices, as an analogue of a recent result in […]. We first need some definitions.

Definition (Frequent events). Let $E$ be an event depending on $n$.
(i) $E$ holds asymptotically almost surely if $\mathbf{P}(E) = 1 - o(1)$.
(ii) $E$ holds with high probability if $\mathbf{P}(E) \ge 1 - O(n^{-c})$ for some constant $c > 0$ (independent of $n$).
(iii) $E$ holds with overwhelming probability if $\mathbf{P}(E) \ge 1 - O_C(n^{-C})$ for every constant $C > 0$, or equivalently, if $\mathbf{P}(E) \ge 1 - \exp(-\omega(\log n))$.

Definition (Matching). We say that two complex random variables $\zeta, \zeta'$ match to order $k$, for some integer $k \ge 1$, if
$$\mathbf{E}\operatorname{Re}(\zeta)^m \operatorname{Im}(\zeta)^l = \mathbf{E}\operatorname{Re}(\zeta')^m \operatorname{Im}(\zeta')^l$$
for all $m, l \ge 0$ with $m + l \le k$.

Theorem (Four Moment Theorem). For sufficiently small $c_0 > 0$ and sufficiently large $C_0 > 0$ (a specific numerical value of $C_0$ would suffice), the following holds for every $0 < \varepsilon < 1/2$ and $k \ge 1$. Let $M = (\zeta_{ij})_{1 \le i \le p, 1 \le j \le n}$ and $M' = (\zeta'_{ij})_{1 \le i \le p, 1 \le j \le n}$ be matrix ensembles obeying condition C1 with the indicated constant $C_0$. Assume that for each $i, j$, $\zeta_{ij}$ and $\zeta'_{ij}$ match to order $4$. Let $W, W'$ be the associated covariance matrices, and assume also that $p/n \to y$ for some $0 < y \le 1$. Let $G: \mathbb{R}^k \to \mathbb{R}$ be a smooth function obeying the derivative bounds $|\nabla^j G(x)| \le n^{c_0}$ for all $0 \le j \le 5$ and $x \in \mathbb{R}^k$. Then for any $\varepsilon p \le i_1 < i_2 < \ldots < i_k \le (1-\varepsilon)p$, and for $n$ sufficiently large depending on $\varepsilon, k, c_0$, we have
$$\big|\mathbf{E}G(n\lambda_{i_1}(W), \ldots, n\lambda_{i_k}(W)) - \mathbf{E}G(n\lambda_{i_1}(W'), \ldots, n\lambda_{i_k}(W'))\big| \le n^{-c_0}.$$
Suppose instead that $\zeta_{ij}$ and $\zeta'_{ij}$ only match to order $3$. Then the conclusion still holds provided that one strengthens the derivative bounds to $|\nabla^j G(x)| \le n^{-jc_1}$ for all $0 \le j \le 5$ and $x \in \mathbb{R}^k$, for any $c_1 > 0$.

This theorem asserts that all local statistics of the eigenvalues of $W_n$ are determined by the first four moments of the entries.
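The matching condition involves only finitely many mixed moments, so for discrete distributions it can be checked exactly. The Python sketch below tests whether two complex random variables match to order $k$. Distributions are given as lists of (value, probability) pairs, and the names are illustrative. It compares two examples:
- the uniform distribution on $(\pm 1 \pm i)/\sqrt{2}$;
- a variable whose real and imaginary parts are independent, each equal to $0$ with probability $1/2$ and $\pm 1$ with probability $1/4$ each.

Both have mean zero and $\mathbf{E}|\zeta|^2 = 1$. They match to order $3$ but not to order $4$, since $\mathbf{E}\operatorname{Re}(\zeta)^4$ is $1/4$ for the first and $1/2$ for the second.

```python
from itertools import product
import math


def mixed_moment(dist, m, l):
    """E Re(zeta)^m Im(zeta)^l for a finite distribution [(value, prob), ...]."""
    return sum(prob * value.real ** m * value.imag ** l for value, prob in dist)


def match_to_order(dist1, dist2, k, tol=1e-12):
    """True if E Re^m Im^l agree (up to tol) for all m, l >= 0 with m + l <= k."""
    return all(
        abs(mixed_moment(dist1, m, l) - mixed_moment(dist2, m, l)) <= tol
        for m, l in product(range(k + 1), repeat=2)
        if m + l <= k
    )


# Uniform on (+-1 +- i)/sqrt(2).
bernoulli = [(complex(s, t) / math.sqrt(2), 0.25) for s in (-1, 1) for t in (-1, 1)]

# Independent real and imaginary parts, each 0 w.p. 1/2 and +-1 w.p. 1/4.
three_point = [(0.0, 0.5), (1.0, 0.25), (-1.0, 0.25)]
product_dist = [(complex(r, s), pr * ps) for r, pr in three_point for s, ps in three_point]
```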
Our main result is the Four Moment Theorem stated above (in the order-3 case, provided that $c_0$ is sufficiently small depending on $c_1$). We also say that an event $E$ holds almost surely if $\mathbf{P}(E) = 1$. In the above theorem we assume that all these quantities ($\varepsilon, k, c_0, c_1, C_0$) are independent of $n$. See Section … for the asymptotic notation we will be using.

Remark. This is an analogue of […, Theorem …]. The main results of […] can all be strengthened similarly. The main difference is that the exponential decay condition from […] is dropped, being replaced instead by the high moment condition in C1. The value of $C_0$ is ad hoc, and we make no attempt to optimize this constant.

Remark. We restrict the eigenvalues to the bulk of the spectrum ($\varepsilon p \le i \le (1-\varepsilon)p$) to guarantee that the density function $\rho_{MP}$ is bounded away from zero. In view of the results in […], we expect that the result extends to the edge of the spectrum as well.

Applications. Combined with existing results, the Four Moment Theorem yields universality results for large classes of random matrices, usually under assumptions involving fewer than four moments. Let us demonstrate this through an example concerning the universality of the sine kernel. Using the explicit formula (…), Nagao and Wadati […] established the following result for the complex Wishart ensemble.

Theorem (Sine kernel for Wishart ensemble). Let $k \ge 1$ be an integer, let $f: \mathbb{R}^k \to \mathbb{C}$ be a continuous function with compact support that is symmetric with respect to permutations, and let $0 < u < 4$. Assume that $p = n + O(1)$ (thus $y = 1$), and that $W$ is given by the complex Wishart ensemble. Let $\lambda_1, \ldots, \lambda_p$ be the nontrivial eigenvalues of $W$. Then the quantity
$$\mathbf{E}\sum_{i_1, \ldots, i_k} f\big(p\rho_{MP}(u)(\lambda_{i_1} - u), \ldots, p\rho_{MP}(u)(\lambda_{i_k} - u)\big),$$
where the sum runs over distinct $1 \le i_1, \ldots, i_k \le p$, converges as $n \to \infty$ to
$$\int_{\mathbb{R}^k} f(t_1, \ldots, t_k)\,\det\big(K(t_i, t_j)\big)_{1 \le i,j \le k}\,dt_1 \cdots dt_k,$$
where $K(x, y) := \frac{\sin \pi(x - y)}{\pi(x - y)}$ is the sine kernel.

We say that a complex random variable $\zeta$ of mean zero and variance one is Gauss divisible if $\zeta$ has the same distribution as $(1-t)^{1/2}\zeta' + t^{1/2}\zeta''$. Here $0 < t < 1$, and $\zeta', \zeta''$ are independent random variables of mean zero and variance $1$, with $\zeta''$ distributed according to the complex gaussian.
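The limit in this theorem is an integral against the determinant $\det(K(t_i, t_j))_{1 \le i,j \le k}$. A small, self-contained Python sketch evaluates this $k$-point sine-kernel density, computing the determinant by Gaussian elimination. The names are illustrative. For $k = 2$ the density reduces to $1 - \big(\frac{\sin \pi t}{\pi t}\big)^2$ with $t = t_1 - t_2$, which vanishes as $t \to 0$ (level repulsion).

```python
import math


def sine_kernel(x, y):
    """K(x, y) = sin(pi (x - y)) / (pi (x - y)), with K(x, x) = 1."""
    d = x - y
    if d == 0:
        return 1.0
    return math.sin(math.pi * d) / (math.pi * d)


def det(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in a]
    size = len(a)
    result = 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-300:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
    return result


def sine_correlation(points):
    """det(K(t_i, t_j)), the limiting k-point correlation density at the given points."""
    return det([[sine_kernel(s, t) for t in points] for s in points])
```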
In view of the results in […], it is likely that the hard edge asymptotics of Forrester […] can be extended to a wider class of ensembles. We will pursue this issue elsewhere.

Remark. The results in […] allowed $f$ to be bounded measurable rather than continuous. However, when we consider discrete ensembles later, it will be important to keep $f$ continuous.

One can apply the Four Moment Theorem in a similar way to its counterpart […, Theorem …]. As a variant of Johansson's result […] for random hermitian matrices, Ben Arous and Péché […] extended the sine kernel theorem for the Wishart ensemble to Gauss divisible ensembles (see the theorem on the Gaussian divisible ensemble below). We use that theorem and the Four Moment Theorem in exactly the same way we used […, Theorem …] and Johansson's theorem to establish […]. This gives the following.

Theorem (Sine kernel for more general ensembles). The sine kernel theorem for the Wishart ensemble can be extended to the case when $p = n + O(n^{\gamma})$ for a certain exponent $0 < \gamma < 1$ (so $y$ is still $1$), and when $M$ is an iid matrix obeying condition C1 with $C_0$ sufficiently large, with the real and imaginary parts of $\zeta_{ij}$ supported on at least three points.

The arguments in this paper will be a non-symmetric version of those in […]: wherever eigenvectors are used in […], left and right singular vectors will be used instead. One way to connect the singular value problem for iid matrices to eigenvalue problems for Wigner matrices is to form the augmented matrix
$$\tilde M := \begin{pmatrix} 0 & M \\ M^* & 0 \end{pmatrix}.$$
This is a $(p+n) \times (p+n)$ Hermitian matrix with eigenvalues $\pm\sigma_1(M), \ldots, \pm\sigma_p(M)$, together with $n - p$ eigenvalues at zero. We will take advantage of this augmented perspective in some parts of the paper, particularly when we wish to import results from […] as black boxes. However, many of the entries in $\tilde M$ are zero; in particular, they are deterministic rather than random with mean zero and unit variance. This seems to make it difficult to apply parts of the arguments, particularly those that are probabilistic in nature, directly to this matrix. Nevertheless, this connection gives a heuristic explanation of why so much of the machinery for the Hermitian eigenvalue problem transfers to the non-Hermitian singular value problem.
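The correspondence between singular values of $M$ and eigenvalues of the augmented matrix $\tilde M$ is easy to check numerically. The Python sketch below uses real entries for simplicity and illustrative names. It obtains the top singular triple by power iteration on $M^T M$, which is our choice of method and assumes the top singular value is simple. It then verifies three facts: $(v, u)$ is an eigenvector of $\tilde M$ with eigenvalue $\sigma$, $(v, -u)$ is one with eigenvalue $-\sigma$, and $(0, w)$ with $Mw = 0$ gives one of the $n - p$ zero eigenvalues.

```python
import math


def augmented(m):
    """The (p+n) x (p+n) symmetric matrix [[0, M], [M^T, 0]] for a real p x n matrix M."""
    p, n = len(m), len(m[0])
    a = [[0.0] * (p + n) for _ in range(p + n)]
    for i in range(p):
        for j in range(n):
            a[i][p + j] = m[i][j]
            a[p + j][i] = m[i][j]
    return a


def matvec(a, x):
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def top_singular_triple(m, iterations=200):
    """(sigma, u, v) with M u = sigma v and M^T v = sigma u, via power iteration on M^T M."""
    p, n = len(m), len(m[0])
    u = [1.0] * n
    for _ in range(iterations):
        mu = [sum(m[i][j] * u[j] for j in range(n)) for i in range(p)]
        w = [sum(m[i][j] * mu[i] for i in range(p)) for j in range(n)]
        norm = math.sqrt(sum(t * t for t in w))
        u = [t / norm for t in w]
    mu = [sum(m[i][j] * u[j] for j in range(n)) for i in range(p)]
    sigma = math.sqrt(sum(t * t for t in mu))
    return sigma, u, [t / sigma for t in mu]


def residual(a, x, lam):
    """Euclidean norm of A x - lam x."""
    return math.sqrt(sum((ax - lam * xi) ** 2 for ax, xi in zip(matvec(a, x), x)))
```

For instance, $M = \begin{pmatrix} 2 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}$ has singular values $\sqrt{5}$ and $1$. Its augmented $5 \times 5$ matrix then has eigenvalues $\pm\sqrt{5}$, $\pm 1$ and a single $0$, the latter with eigenvector $(0, 0, 1, 0, -2)$.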
Thus one can view the singular values of an iid matrix as being essentially given by the eigenvalues of a slightly larger Hermitian matrix. This matrix is of Wigner type, except that the entries have been zeroed out on two diagonal blocks. In other parts of the paper it will in fact be more convenient to work with $M$ directly. Also, in […] one often removes the last row and column of a Hermitian $n \times n$ matrix to make an $(n-1) \times (n-1)$ submatrix. Here we will instead remove just the last row of a $p \times n$ matrix to form a $(p-1) \times n$ matrix.

Theorem (Sine kernel for Gaussian divisible ensemble). The sine kernel theorem for the Wishart ensemble, which is for $p = n + O(1)$, can be extended to the case when $p = n + O(n^{\gamma})$ for a certain exponent $0 < \gamma < 1$ (so $y$ is still $1$). It also extends to the case when $M$ is an iid matrix obeying condition C1 (for a suitable $C_0$) with the $\zeta_{ij}$ Gauss divisible.

Remark. It was shown in […, Corollary …] that if the real and imaginary parts of a complex random variable $\zeta$ are independent with mean zero and variance one, and both are supported on at least three points, then $\zeta$ matches to order $4$ with a Gauss divisible random variable with finite $C_0$-th moment. Indeed, an inspection of the convexity argument used to solve the moment problem in […] shows that the Gauss divisible random variable could be taken to be the sum of a gaussian variable and a discrete variable. In particular, it is exponentially decaying.

Acknowledgments. We thank Horng-Tzer Yau for references.
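Gauss divisibility is a distributional property, but sampling from such a variable is straightforward. The Python sketch below draws samples of $(1-t)^{1/2}\zeta' + t^{1/2}\zeta''$. It uses illustrative names and two further assumptions. The complex gaussian $\zeta''$ has independent real and imaginary parts of variance $1/2$, so that $\mathbf{E}|\zeta''|^2 = 1$. The base variable $\zeta'$ is taken, purely for illustration, to be uniform on $(\pm 1 \pm i)/\sqrt{2}$. Since $\zeta'$ and $\zeta''$ are independent with mean zero and unit variance, the result again has mean zero and $\mathbf{E}|\zeta|^2 = (1-t) + t = 1$. The empirical check below only confirms this up to sampling error.

```python
import math
import random


def complex_gaussian(rng):
    """Complex gaussian with mean zero and E|g|^2 = 1 (real, imaginary parts N(0, 1/2))."""
    s = math.sqrt(0.5)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def complex_bernoulli(rng):
    """Uniform on (+-1 +- i)/sqrt(2): mean zero and E|x|^2 = 1."""
    return complex(rng.choice((-1, 1)), rng.choice((-1, 1))) / math.sqrt(2)


def gauss_divisible(rng, t, base=complex_bernoulli):
    """One draw of (1 - t)^{1/2} zeta' + t^{1/2} zeta'' with zeta'' complex gaussian."""
    return math.sqrt(1 - t) * base(rng) + math.sqrt(t) * complex_gaussian(rng)


def empirical_mean_and_second_moment(samples):
    """Sample mean and sample average of |x|^2."""
    mean = sum(samples) / len(samples)
    second = sum(abs(x) ** 2 for x in samples) / len(samples)
    return mean, second
```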
Extensions. In a very recent work, Erdős, Schlein, Yau and Yin […] extended the theorem on the Gaussian divisible ensemble to a large class of matrices, assuming that the distribution of the entries $\zeta_{ij}$ is sufficiently smooth. Their results do not apply to entries with discrete distributions. However, they allow one to extend that theorem to the case when $t$ is a negative power of $n$. Given this, one can use the argument in […] to remove the requirement that the real and imaginary parts of $\zeta_{ij}$ be supported on at least three points. Even more recently, Péché […] established a similar result using the same method with some modifications.

We also have the following analogue of […, Theorem …].

Theorem (Universality of averaged correlation function). Fix $\varepsilon > 0$ and $u$ such that $0 < u - \varepsilon < u + \varepsilon < 4$. Let $k \ge 1$, and let $f: \mathbb{R}^k \to \mathbb{R}$ be a continuous, compactly supported function. Let $M$ obey condition C1 for some $C_0$, and let $W = W_{n,n}$ be the associated random covariance matrix. Then the quantity
$$\frac{1}{2\varepsilon}\int_{u-\varepsilon}^{u+\varepsilon} \int_{\mathbb{R}^k} f(\alpha_1, \ldots, \alpha_k)\,\frac{1}{\rho_{MP}(u')^k}\,p_n^{(k)}\Big(u' + \frac{\alpha_1}{n\rho_{MP}(u')}, \ldots, u' + \frac{\alpha_k}{n\rho_{MP}(u')}\Big)\,d\alpha_1 \cdots d\alpha_k\,du'$$
converges as $n \to \infty$ to
$$\int_{\mathbb{R}^k} f(\alpha_1, \ldots, \alpha_k)\,\det\big(K(\alpha_i, \alpha_j)\big)_{1 \le i,j \le k}\,d\alpha_1 \cdots d\alpha_k.$$
Here $p_n^{(k)}$ is the $k$-point correlation function of the eigenvalues, and $K(x, y) := \frac{\sin \pi(x - y)}{\pi(x - y)}$ is the Dyson sine kernel.

The details are more or less the same as in […] and omitted.

The gap property and the exponential decay removing trick. The following property plays an important role in […].

Definition (Gap property). Let $M$ be a matrix ensemble obeying condition C1. We say that $M$ obeys the gap property if the following holds for every $\varepsilon, c > 0$ (independent of $n$) and every $\varepsilon p \le i \le (1-\varepsilon)p$: one has $|\lambda_{i+1}(W) - \lambda_i(W)| \ge n^{-1-c}$ with high probability. The implied constants in this statement can of course depend on $\varepsilon$ and $c$.

As an analogue of […, Theorem …], we prove the following theorem.

Theorem (Gap theorem, exponentially decaying case). Let $M = (\zeta_{ij})$ obey condition C1 for some $C_0$. Suppose that the coefficients $\zeta_{ij}$ are exponentially decaying, in the sense that $\mathbf{P}(|\zeta_{ij}| \ge t^C) \le \exp(-t)$ for all $t \ge C'$, all $i, j$ and some constants $C, C' > 0$. Then $M$ obeys the gap property.

The new step that enables us to remove the gap property assumption altogether is the following theorem.

Theorem (Gap theorem). Assume that $M = (\zeta_{ij})$ satisfies condition C1 with $C_0$ sufficiently large. Then $M$ obeys the gap property.
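The gap property is a probabilistic statement about the bulk gaps $\lambda_{i+1}(W) - \lambda_i(W)$. As an illustration only, the following Python sketch checks the deterministic inequality $\lambda_{i+1} - \lambda_i \ge n^{-1-c}$ for all bulk indices $\varepsilon p \le i \le (1-\varepsilon)p$ on a single sorted list of eigenvalues. The helper name and the rounding of the bulk range to integers are our assumptions.

```python
import math


def bulk_gaps_ok(eigs, n, eps, c):
    """Check lambda_{i+1} - lambda_i >= n^{-1-c} for all eps*p <= i <= (1-eps)*p.

    eigs holds lambda_1 <= ... <= lambda_p as a 0-based list; indices follow the
    1-based convention of the text, and i + 1 is kept within 1..p.
    """
    p = len(eigs)
    threshold = n ** (-1 - c)
    lo = max(1, math.ceil(eps * p))
    hi = min(p - 1, math.floor((1 - eps) * p))
    return all(eigs[i] - eigs[i - 1] >= threshold for i in range(lo, hi + 1))
```

On a realization, a single near-collision in the bulk makes the check fail. The gap property asks that this happen only with probability $O(n^{-c'})$.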
j that ij and ij match to order . . Then for any i lt i lt ik n. W be the associated covariance matrices.jn and M ij ip. the only place we used exponential decay is to prove the gap property via Theorem . nik W nc . This theorem is weaker than Theorem . . and that p/n y for some lt y . which allows us to insert information such as the gap property into the test function G. the conclusion still holds provided that one strengthens to for all j and x Rk and any c gt . . The core of the proof of Theorem is Theorem . c we have EGni W . Let G Rk R be a smooth function obeying the derivative bounds j Gx nc for all j and x Rk . Theorem . . we have the following analogue of . given C bounded third moment. since in the proof of . as we assume the gap property. Let M ij ip. which asserts that the gap property is already guaranteed by condition C. . and for n suciently large depending on . combining with Theorem and Lemma . provided that c is suciently small depending on c . k. . Theorem follows directly from Theorems and . If ij and ij only match to order rather than . . The proofs of these two theorems are presented in Sections and . Theorem Four Moment Theorem with Gap assumption. respectively. The rest of the paper is organized as follows. The dierence comparing to . . We will also use this theorem. The next three sections are devoted to technical lemmas. Theorem . Assume also that M and M obeys the gap property. For suciently small c gt and suciently large C gt C would suce the following holds for every lt lt and k . . We recommend therefore that the reader be familiar with that paper rst. . . . In the generic case when the singular values are simple i. . and Yau . The main technical lemmas Important note. p . j. to prove the Four Moment and Gap Theorems in this paper. p vanish. . Suppose that p/n y for some lt y . . and that the probability distribution of M is continuous. the previous paper of the authors. 
u will span a slightly larger space than the corange if some of the p singular values . Strictly speaking.e. . and draw heavily from. . However. b with a. left singular vector vi with singular value i M ni W / to be an eigenvector of W n M M resp. We will establish the following ErdsSchleinYau type delocalization theorem o analogous to . . before reading the current one. Then with overwhelming probability. This result asserts assuming o uniformly exponentially decaying distribution for the coecients that with overwhelming probability. . all the unit eigenvectors of the Wigner matrix have coecients On/o thus the energy of the eigenvector is spread out more or less uniformly amongst the n coecients. the unit singular vectors ui . u . . which is an essential ingredient to Theorems . Observe from the singular value decomposition that one can nd orthonormal bases u . Similarly. . b dened in have all coecients uniformly of size OKn/ log n. We dene a right singular vector ui resp. vi are determined up to multiplication by a complex phase ei . In the proof of the Four Moment Theorem as well as the Gap Theorem for n n Wigner matrices in . Let gt be independent of n. . The arguments in this paper are very similar to. a crucial ingredient was a variant of the Delocalization Theorem of Erds. and let M obey condition C for some C . W n M M with eigenvalue i . we shall primarily restrict attention to the generic case in which this vanishing does not occur. such that M ui i vi and M vi i ui . Suppose further that that ij K almost surely for some K gt which can depend on n and all i. vp Cp for the corange kerM of M and of Cp respectively. and is also of some independent interest Theorem Delocalization theorem. . When one just assumes uniformly bounded C moment rather than uniform exponential decay. the bound becomes On/O/C instead where the implied constant in the exponent is uniform in C . . TERENCE TAO AND VAN VU . . . . Proposition . . Schlein. 
all the unit left and right singular vectors of M with eigenvalue i in the interval a. . we will need a Delocalization theorem for the singular vectors of the matrix M . up Cn and v . lt lt . of course. b is at least p.. The claim follows. unless otherwise stated. All statements here are understood to hold only in the asymptotic regime when n is suciently large depending on all quantities that are independent of n. Throughout this paper. is the number of eigenvalues in I. We write X OY .g. We remark that a very similar result with slightly dierent hypotheses on the parameters and on the underlying random variable distributions was recently established in . y and C will remain other independent of n. or the matrix M will depend on n. Some quantities e. and let gt be independent of n. We write X Y . Proposition Theorem Eigenvalue concentration theorem..y x dx p. imposed so that the singular values are almost surely simple. . As with other proofs of delocalization theorems in the literature. Let the hypotheses be as in Theorem . while other quantities e. but anything which is polynomial in K and log n will suce for our purposes.UNIVERSALITY FOR COVARIANCE MATRICES The factors K log n can probably be improved slightly. or Y X if one has X CY for all suciently large n and some C independent of n. then each event ij K with K n/C say occurs with probability On . Note however that C is allowed to depend on other quantities independent of n. but in practice we will be able to eliminate this hypothesis by a limiting argument as none of the bounds will depend on any quantitative measure of this continuity. such as and y. Proof. we will be able to apply the above theorem with K n/C without diculty. Let the hypotheses be as in Theorem . We isolate one particular consequence of Theorem also established in Corollary Concentration of the bulk. one has with overwhelming probability uniformly in I that NI p where NI i p i W I I MP. 
Then there exists gt independent of n such that with overwhelming probability. n will be an asymptotic parameter going to innity. b of length I K log n/n. Then for any interval I a.g. Notation. From Theorem . Thus in practice. Observe that if M obeys condition C. one has a i W b for all p i p. p. we see with overwhelming probability that the number of eigenvalues in a . if is suciently small depending on . Theorem is in turn deduced from the following eigenvalue concentration bound analogous to . We write X oY if X cnY where cn as n . Y X. . X Y . Corollary . The continuity hypothesis is a technical one. and A F trAA / for the Frobenius or HilbertSchmidt norm. Similarly. Let p n. then i Mn. thus for instance if p/n y for some lt y then p n. We write X for the length of a vector X. TERENCE TAO AND VAN VU or X Y if X Y X. and Mn. Lemma Weyl inequality. This follows from the same minimax formulae used to establish Lemma . ii and iii follow from the minimax formula i Mn. one can of course use the transpose of ii instead. If M.p is a p n matrix. For p n. Claim i follows from the minimax formula i An V dimV i vV v inf sup v An v where V ranges over idimensional subspaces in Cn .p for all i lt p.p i Mn. Tools from linear algebra. iii If p lt n. Proof. in order to free up the letter i to denote an integer usually between and n.p v . A A op for the operator norm of a matrix A. then i An i An i An for all i lt n. then i A i B A B op for all i n. then i Mn. Proof. then i M i N M N op for all i p. N are p n matrices.. We write for the complex imaginary unit.p for all i p.p is a p n minor. If A. i If An is an n n Hermitian matrix. In this section we recall some basic identities and inequalities from linear algebra which will be used in this paper. Basic tools . and An is an n n minor. with the understanding that Mn. B are n n Hermitian matrices.p i Mn. We begin with the Cauchy interlacing law and the Weyl inequalities Lemma Cauchy interlacing law.p is an pn minor.p . .p i Mn. . 
Let p n. ii If Mn.p i Mn.p is a pn matrix. if Mn. and Mn.p V dimV inp vV v inf sup Mn. where v Mp. .n i Mp. .n Y . un An Cn is an orthonormal eigenbasis corresponding to the eigenvalues An . and let X u be a right unit singular vector of x Mp. .n .n Mp. Suppose that none of the eigenvalues of An are equal to i An .n j j Mp. We omit the details.UNIVERSALITY FOR COVARIANCE MATRICES Remark . and v y is a left unit singular vector of Mp. .n with singular value i Mp. In a similar vein. Lemma . and let eigenvector of An with eigenvalue i An . We have the following elementary formula for a component of an eigenvector of a Hermitian matrix. .n X . where x C and u Cn . Then x n j j An i An uj An X . This implies an analogous formula for singular vectors Corollary Formula for coordinate of a singular vector. .n are equal to i Mp. One can also deduce the singular value versions of Lemmas .n are equal to i Mp. . then y j Mp. See e.n . n . Suppose that none of the singular values of Mp.n . . vminp. if Mp. and none of the singular values of Mp.n minp.n vj Mp. .n j j Mp. where x C and v Cn .n .n . where y C and v Cp .n Y for some Y Cn .n with singular value i Mp. Let p.n i Mp.n Cp is an orthonormal system of left singular vectors corresponding to the nontrivial singular values of Mp. in terms of the eigenvalues and eigenvectors of a minor Lemma Formula for coordinate of an eigenvector. . . n An of An . from their Hermitian counterparts by using the augmented matrices .n uj Mp.n . . Then x j Mp.n Mp. .n minp. . Proof.g. where u An .n be a p n matrix for some X Cp . and let Mp.n Mp. Let An An X X a v x be a unit be a n n Hermitian matrix for some a R and X Cn . The Stieltjes transform sz of a Hermitian matrix W is dened for complex z by the formula n .n vj Mp.n Mp. We just prove the rst claim. one obtains the claim. .n Mp. sz n i i W z It has the following alternate representation see e. we obtain x n j j Mp. Chapter Lemma . variance . 
n Cn be a random vector whose entries are independent with mean zero. Proof. Observe that is a unit eigenvector of the matrix x Mp. i Mp.n j Mp. Let X .n X Mp.. one has H X d OK log n . so the claim follows.n Mp. .n uj Mp.n Mp. Tools from probability theory. TERENCE TAO AND VAN VU where u Mp.n Mp. .n Mp. Proof. and vanishes for trivial ones. . . . Then t P H X d t exp . .n Cn is an orthonormal system of right singular vectors corresponding to the nontrivial singular values of Mp. Taking traces. kk za Wk zI ak k is the k th diagonal entry of W zI . uminp.n Mp. and j Mp.jn be a Hermitian matrix.n . Let W ij i.n X .n Mp. where K E . . Applying Lemma . as the second is proven analogously or by u taking adjoints. and let z be a complex number not in the spectrum of W . and are bounded in magnitude by K almost surely for some K.n for the minp.n Mp.n Mp. By Schurs complement.n .n . K In particular. Then we have sn z n n k kk z a Wk zI ak k where Wk is the n n matrix with the k th row and column removed. . .n . n nontrivial singular values possibly after relabeling the j. and ak Cn is the k th column of W with the k th entry removed.n Mp. Let H be a subspace of dimension d and H the orthogonal projection onto H.g.n j Mp. We will rely frequently on the following concentration of measure result for projections of random vectors Lemma Distance between a random vector and a subspace.n But uj Mp.n X X with eigenvalue i Mp. But if i a .n n.n j n j Mp.n .n OK log n/n for all j J. b or that the nth coordinate x of ui is OKn/ log n holds with uniformly overwhelming probability. minp. We shall just establish the claim for the right singular vectors ui . We can also assume that K log n on as the claim is trivial otherwise. either i a . we conclude that j Mp. Delocalization The purpose of this section is to establish Theorem and Theorem . j Mp. .n . .n OK log n.n by its transpose to return to the regime p n. 
the proof is a short application of Talagrands inequality .UNIVERSALITY FOR COVARIANCE MATRICES with overwhelming probability. .n K where M Mp. since i n i . . one would have to replace M p. the lefthand side of is then bounded from below by n H X log K n where H Cp is the span of the vj Mp.n i Mp. By Pythagoras theorem. We x and allow all implied constants to depend on and y. The recent paper also contains arguments and results closely related to those in this section. . Applying Corollary .. .n X . so that the singular vectors ui are well dened up to unit phases. one can nd with uniformly overwhelming probability a set J . or minp. it suces to show that with uniformly overwhelming probability. b . n with J K log n such that j Mp. b . Fix i p. one has H X K log n In the case p n. vj Mp. as the claim for the left singular vectors is similar.. Proof. Deduction of Theorem from Theorem . . As M is continuous. Sections .. . Lemma . The material here is closely analogous to . as well as that of the original results in . .n X log n j Mp.n for j J.n i Mp. and can be read independently of the other sections of the paper. But from Lemma and the fact that X is independent of Mp. In particular.n i Mp. it suces by the union bound and symmetry to show that the event that i falls outside a . we see that the nontrivial singular values are almost surely simple and positive. See . then by Theorem . We begin by showing how Theorem follows from Theorem . one concludes k C. which illustrates the techniques used to prove Theorem . and the claim follows. we see that sz p p nMM which k . . of controlling the eigenvalue counting function NI via the Stieltjes transform sz p p j . b of length I K log n/n. Let the hypotheses be as in Theorem . Ima Wk zI ak k . and ak Cp is the k th column of W with the k th entry removed. j W z NI p Fix I. To prove this proposition. A crude upper bound. kk z a Wk zI ak k where kk is the kk entry of W .. 
we suppose for contradiction that NI CnI for some large constant C to be chosen later. We follow the standard approach see e. this leads to a contradiction with overwhelming probability. It thus remains to establish Theorem . and also plays an important direct role in that proof Proposition Eigenvalue upper bound. We rst establish a crude upper bound. We will show that for C large enough. and z x Imsz recall that p n so from one has Imsz C. then for any interval I a . I/. Let the hypotheses be as in Theorem . TERENCE TAO AND VAN VU with uniformly overwhelming probability. and NI was dened in . . one has with overwhelming probability uniformly in I that NI nI where I denotes the length of I. Using the crude bound Im z p p Imz and . with W replaced by the p p matrix W only has the nontrivial eigenvalues. If x is the midpoint of I. we see that Applying Lemma .g. Wk is the p p matrix with the k th row and column of W removed. Imak Wk zI ak Mk Mk n where Xk Cn is the adjoint of the k th row of M . Fix k. We conclude that j Mk Xk uj Mk . j Wk x p j a vj Mk k . . Xk uj Mk C jJ . one has p and thus a Wk k zI ak j p a vj Mk k . p of length J Cn such that j Wk I. Thus. but this will not be of concern since all of our claims will hold with overwhelming probability. . a vj Mk k Next. and Mk is the p n matrix formed by removing that row. . . if we let v Mk . note that from and the Cauchy interlacing law Lemma one can nd an interval J . n The advantage of this latter formulation is that the random variables Xk and uj Mk are independent for xed k. up Mk Cn be coupled orthonormal systems of left and right singular vectors of Mk . and let j Wk n j Mk for j p be the associated eigenvectors. and thus n . vp Mk Cp and u Mk . . . . j Wk z Ima Wk k We conclude that zI ak j a vj Mk k . there exists k p such that C. . Note that and ak Wk M k Xk n By the pigeonhole principle. . . j Wk x C The expression a vj Mk can be rewritten much more favorably using as k j Mk Xk uj Mk . 
n C jJ Since j Wk I.UNIVERSALITY FOR COVARIANCE MATRICES The fact that k varies will cost us a factor of p in our probability estimates. . . one has j Mk n. xz where we use the branch of y z yz with cut at a. where H is the span of the eigenvectors uj Mk for j J.. We now begin the proof of Theorem in earnest. The strategy is to compare s with the MarchenkoPastur Stieltjes transform sMP. for closely related lemmas.y x dx nI . Then for any interval I in a .y y z yzsMP. Lemma . log .y z sz with uniformly overwhelming probability for each z with Rez L and Imz . one has NI n with overwhelming probability. that lemma was phrased for the semicircular distribution rather than the MarchenkoPastur distribution. To put it another way. But from Lemma and we see that this quantity is n with overwhelming probability.y z R MP. b that is asymptotic to y z as z . I MP. but an inspection of the proof shows the proof can be modied without diculty. sMP. strictly speaking. Let / /n. our arguments will not use any quantitative estimates on this continuity. Proof. We have the following standard relation between convergence of Stieltjes transform and convergence of the counting function Lemma Control of Stieltjes transform implies control on ESD. .y z gt . We continue to allow all implied constants to depend on and y. This follows from . It suces by a limiting argument using Lemma to establish the claim under the assumption that the distribution of M is continuous. Reduction to a Stieltjes transform bound. Corollary .y z with ImsMP. Suppose that one has the bound sMP. TERENCE TAO AND VAN VU The lefthand side can be rewritten using Pythagoras theorem as H Xk . See also and . b with I max. This concludes the proof of Proposition . for z in the upper halfplane. giving the desired contradiction with overwhelming property even after taking the union bound in k.y z is the unique solution to the equation sMP. 
and L.y z yz y z yz yz dx.y x A routine application of and the Cauchy integral formula yields the explicit formula sMP. gt . Details of these computations can also be found in . . one has Yk y o y ozsz with overwhelming probability uniformly in k and I. We will shortly show a similar bound for Yk itself Lemma Concentration of Yk . For this. and thus since the uj are unit vectors p EYk Mk where sk z is the Stieltjes transform of Wk . . inserting the identities . n j Wk z Suppose we condition Mk and thus Wk to be xed.y z o with uniformly overwhelming probability. it suces to show that for each complex number z with a / Rez b / and Imz K log n . For each k p. . we see that to show Theorem . Thus z p p p sk z sz O p p and thus EYk Mk p p zsz O n n n y o y ozsz since p/n y o and / on. n one has sz sMP. we return to the formula .UNIVERSALITY FOR COVARIANCE MATRICES In view of this lemma. one has where Yk j sz p p k kk z Yk p j Mk Xk uj Mk . j j Mk n j Wk z p zsk z n p p j j Wk z From the Cauchy interlacing law Lemma we see that the dierence p sk z sz p p j j W z j j Wk z is bounded in magnitude by O p times the total variation of the function on . the entries of Xk remain independent with mean zero and variance . which is O . y z. cannot both hold for n large enough. we thus claim that either holds for all z in the domain of interest. Thus the continuity argument shows that holds with uniformly overwhelming probability for all z in the region of interest for n large enough. the two solutions of are sMP. b. From one has sMP. one obtains kk sz p p k z y o y ozsz with overwhelming probability. yz since y z yz has zeroes only when z a. By using a n net say yz of possible zs and using the union bound and the fact that sz has a Lipschitz constant of at most On say in the region of interest we may assume that the above trichotomy holds for all z with a/ Rez b/ and Imz n say. one has either or or sz yz sMP. which implies that . which gives and thus Theorem . 
or there exists a z such that as well as one of or both hold. from we see that yz sMP. .y z from yz is bounded from yz below. cannot both hold for n large enough. thus sz almost solves in some sense.y z yz y z yz .y z and yz yz sMP. sMP.y z yz yz which implies that the separation between sMP. TERENCE TAO AND VAN VU Meanwhile.y z yz sMP. and z is bounded away from these singularities. and so holds in this case.y z o with the convention that yz when y .y z are both o. From the quadratic formula. One concludes that with overwhelming probability. By continuity. Similarly. kk o with overwhelming probability again uniformly in k and I. we see also that . When Imz n . then sz.y z o yz sz yz o yz sz sMP. Inserting these bounds into . we have Xk n and hence by Lemma . k.jn be matrix ensembles obeying condition C for some C . ij match to order . q . .UNIVERSALITY FOR COVARIANCE MATRICES . . ij n/C almost surely for all i. . . . . if c is suciently small depending on c . by truncation and adjusting the moments appropriately. our bounds will not depend on any quantitative measure on the continuity. Thus. M is continuous. . Next. . xk . . . Then for any i lt i lt ik n. and that ij and ij match to order . . xk R. q . . . . . . . We begin by observing from Markovs inequality and the union bound that one has ij . . Theorem Truncated Four Moment Theorem. . and such that G is supported on the region q . by a further approximation argument we may assume that the distribution of M. For each i p. using Lemma to absorb the error. . and so the general case then follows by a limiting argument using Lemma . . whose proof is delayed to the next section. Let M ij ip. Qi M . . whose eigenvalues are M . Qik M EG ni M . we introduce in analogy with the . ij n/C say for all i. nik M . . . . and for n suciently large depending on . . . Let G Rk Rk R be a smooth function obeying the derivative bounds j Gx . Proof of Theorem and Theorem We rst prove Theorem . 
The key technical step is the following theorem. . The arguments follow those in . . For suciently small c gt and suciently large C gt the following holds for every lt lt and k . . If ij . . . Assume that p/n y for some lt y . Qik M nc . qk nc for all j and x . . . to ensure that the singular values are almost surely simple. one may assume without loss of generality that ij . Qi M . . This is a purely qualitative assumption. j. j with probability On . . and the gradient is in all k variables. c we have EG ni M . . xk . . as well as . nik M . then the conclusion still holds as long as one strengthens to j Gx . . . . . . .jn and M ij ip. . . Given a p n matrix M we form the augmented matrix M dened in . . q . p M . qk nc . qk R. qk njc for some c gt . together with the eigenvalue with multiplicity n p if p lt n. . Clearly is convex. We set Qi M if the singular value i is repeated. The main observation here is the following lemma. . . j M i M i M j j M i M p The factor of n in Qi M is present to align the notation here with that in . qk . and compact support. in which one dilated the matrix by n. variance one. Observe the upper bound Qi M n jpji np . we prove Theorem . To bound the other term in . Accordingly. In order for to match to third order. but this event occurs with probability zero since we are assuming M to be continuously distributed. Let be a complex random variable with mean zero. and third moment bounded by some constant a. w ei z. . It is also invariant under the symmetry z. ei w . Section . Then there exists a complex random variable with support bounded by the ball of radius Oa centered at the origin and in particular. one can now remove the hypothesis in Theorem that G is supported in the region q . i M /n is bounded np away from zero. and that E E and E E . In particular. . and any p i p. E where ranges over complex random variables with mean zero. it suces that have mean zero. . Lemma . . Lemma Matching lemma. 
TERENCE TAO AND VAN VU arguments in the quantities Qi M n i M i M n jpji np . obeying the exponential decay hypothesis uniformly in for xed a which matches to third order. The gap property on M ensures an upper bound on Qi M Lemma . By applyign a truncation argument exactly as in . variance . . and so ni M O/n. one can now handle the case when G is independent of q . j M i M ni M From Corollary we see that with overwhelming probability. If M satises the gap property.. then for any c gt independent of n. unit variance. one has Qi M nc with high probability. Proof. Next. let C be the set of pairs E . and Theorem follows after making the change of variables n and using the chain rule and Corollary . qk nc . one repeats the proof of . . Proof. assuming both Theorems and . The proof of Theorem It remains to prove Theorem . . . and so again by convexity and rotation invariance z . By Theorem .UNIVERSALITY FOR COVARIANCE MATRICES for phase . by convexity cz. An inspection of the argument shows that if the third moment is bounded by a then the support can also be bounded by Oa . w / whenever z cz and w w. . Qi M . . ei/ w . . . and arbitrarily large third moment.e. nik M . EQi M Onc for some c gt independent of n. for each i. it suces to establish a bound EG ni M . we can nd ij which satises the exponential decay hypothesis and match ij to third order. By telescoping series. The next two sections are devoted to the proofs of Theorem and Theorem . and hence by convexity and rotation invariance . w . Since z. . as is the case here. Qi M . whenever z cz. . Now consider a random matrix M as in Theorem with atom variables ij . variance one.. Remark . we omit the details. It is easy to construct complex random variables with mean zero. By the above lemma. so by Theorem one has EQi M Onc for some c gt independent of n. Qik M nc . Since the third moment is comparable to z w. i. 
every complex random variable with nite third moment with mean zero and unit variance can be matched to third order by a variable of compact support. ei/ w . only a uniformly bounded C moment for C large enough is required. and let p i p. nik M . . . . Because of this. respectively. . w and . The above trick to remove the exponential decay hypothesis for Theorem also works to remove the same hypothesis in . lies in it also for some absolute constant c gt . compact support. . one can replace all the exponential decay hypotheses in the results of . Let q be a smooth cuto to the region q nc for some c gt independent of n. w whenever w w. Theorem . Section . by a hypothesis of bounded C moment. we thus conclude that contains all of C . and hence by convexity any . Qik M EG ni M . . . w both lie in . Thus. By Lemma . The point is that in the analogue of Theorem in that paper implicit in . if z. . then z. One last application of convexity then gives z /. M satises the gap property. the exponential decay hypothesis is not used anywhere in the argument. We conclude that M also obeys the gap property. . j. Let the notation and assumptions be as in Theorem . There exists a positive constant C independent of k such that the following holds. whenever Pij . . Proposition is trivially true if is comparable to n. TERENCE TAO AND VAN VU under the assumption that the coecients ij . resp. where Mz is the augmented matrix of M z. . say the qr entry for some q p and r n. except for the qr entry. Pij . if C is suciently large depending on cf. We condition i.e. . and that the eigenvalues are given up to unit phases by . . uj M z Note also that the analogue of in . eq . In view of the above proposition. e uij M z n/ . er are a good conguration for i . vij M z Cp are unit right and left singular vectors of M z. Let gt . we see that to conclude the proof of Theorem and thus Theorem it suces to show that for any gt . freeze all the entries of M z to be constant. . M zer / n/ . Proof. 
dened in . This follows by applying . . Proposition to the p n p n Hermitian matrix Az nMz. . ij of M and M are identical except in one entry. . We apply the following proposition. we have ni M z nij M z n i ij . . Write M z for the matrix M or M with the qr entry replaced by z. then e vij M z. so one can restrict attention to the regime on. Proposition . . Pij . we assume Delocalization at ij If uij M z Cn . r q For every Pij . . which follows from a lengthy argument in Proposition Replacement given a good conguration. . np M z vj M z and . which is z. that M . er are a good conguration for i . . nij Az n n. then we have if ij and ij match to order . ik with overwhelming probability. left singular vectors vi M z corresponding to singular values i Az with i ij lt . we have Singular value separation For any i n with i ij n . eq . ik if the above proper ties hold. or if they match to order and holds. Note that the eigenvalues of Az are n M z. . Also. Assuming this good conguration. We say that M . M zeq . We assume that for every j k and every z n/ whose real and imaginary parts are multiples of nC . is the orthogonal projection to the span of right singular vectors ui M z resp. since the claim then follows by interchanging each of the pn On entries of M into M separately. Corollary and is omitted. This gives the claim. Suppose that p / p lt p and l p/ is such that gi .n N i Mp. It will suce to show that The main tool for this is Lemma Backwards propagation of gap. as does . Using the exponential decay condition. and write N p n. and also x the z n/ whose real and imaginary parts are multiples of nC .. Corollary also gives with overwhelming probability. By a limiting argument we may assume that M has a continuous distribution.p .n backwards in the p variable. we need a slight variant Proposition . we let Mp.n . 
A similar argument, using Pythagoras' theorem and Corollary, gives the analogous bound with overwhelming probability (noting as before that we may restrict attention to the relevant regime).

Proposition. The conclusions of Theorem and Theorem continue to hold if one replaces the $(q,r)$ entry of $M$ by a deterministic number $z = O(n^{1/2+O(1/C)})$.

Proof. This is proven exactly as in [ ].

We return to the task of establishing a good configuration with overwhelming probability. Our main tools for this are Theorem and Theorem. By the union bound we may fix $j$ and $k$. By the union bound again and Proposition, the eigenvalue separation condition holds with overwhelming probability for every index $i$ in the required range, if $C$ is sufficiently large.

Proof of Theorem. We now prove Theorem, closely following the analogous arguments in [ ]. As in [ ], we may truncate the $\zeta_{ij}$ and renormalise moments, using Lemma to assume that $|\zeta_{ij}| \le \log^{O(1)} n$ almost surely; we may also arrange that the singular values are almost surely simple.

The strategy is to propagate a narrow gap for $M = M_p$ backwards, one row at a time, until one can use Theorem to show that the gap occurs with small probability. More precisely, for $p' \le p$ let $M_{p',n}$ be the $p' \times n$ matrix formed using the first $p'$ rows of $M_p$. Following [ ], we define the regularized gap $g_{i,l,p'}$ to be the infimum, over all $1 \le i_- \le i - l < i \le i_+ \le p'$, of the spread $\sigma_{i_+}(M_{p',n}) - \sigma_{i_-}(M_{p',n})$ divided by a power of $\min(i_+ - i_-, \log^{C_1} n)$, where $C_1 > 0$ is a large constant to be chosen later.

Lemma. Let $0 < \delta \le 1$ (which can depend on $n$) and $m \ge 0$ be such that the regularized gaps of $M_{p',n}$ and $M_{p'-1,n}$ satisfy the hypotheses of the corresponding lemma in [ ]. Let $X_{p'}$ be the $p'$-th row of $M_{p',n}$, and let $u_1(M_{p'-1,n}), \dots, u_{p'-1}(M_{p'-1,n})$ be an orthonormal system of right singular vectors of $M_{p'-1,n}$ associated to $\sigma_1(M_{p'-1,n}), \dots, \sigma_{p'-1}(M_{p'-1,n})$. Then one of the following statements holds, with quantitative thresholds as in [ ]:
(i) Macroscopic spectral concentration. There exist $1 \le i_- < i_+ \le p'$ with $i_+ - i_- \ge \log^{C_1/2} n$ such that $\sigma_{i_+}(M_{p',n}) - \sigma_{i_-}(M_{p',n})$ is unusually small compared with $i_+ - i_-$.
(ii) Small inner products. There exist $i_- \le i - l < i \le i_+$ in the bulk of the spectrum, with $i_+ - i_- \le \log^{C_1/2} n$, such that $\sum_{i_- \le j < i_+} |X_{p'} \cdot u_j(M_{p'-1,n})|^2$ is unusually small.
(iii) Large singular value. For some $1 \le i \le p'$, the singular value $\sigma_i(M_{p',n})$ is unusually large.
(iv) Large inner product in bulk. There exists $i$ in the bulk such that $|X_{p'} \cdot u_i(M_{p'-1,n})|^2$ is unusually large.
(v) Large row. The norm of $X_{p'}$ is unusually large.
(vi) Large inner product near $i$. There exists $i_-$ in the bulk with $|i_- - i| \le \log^{C_1} n$ such that $|X_{p'} \cdot u_{i_-}(M_{p'-1,n})|^2$ is unusually large.

Proof. This follows by applying the corresponding lemma in [ ] to the $(p'+n) \times (p'+n)$ Hermitian matrix
$$A_{p',n} := \begin{pmatrix} 0 & M_{p',n} \\ M_{p',n}^* & 0 \end{pmatrix},$$
which has eigenvalues $\pm\sigma_1(M_{p',n}), \dots, \pm\sigma_{p'}(M_{p',n})$ together with $n - p'$ zeroes, and an orthonormal eigenbasis that includes the vectors $\frac{1}{\sqrt{2}}(v_j(M_{p',n}), \pm u_j(M_{p',n}))$ for $1 \le j \le p'$, where $v_j(M_{p',n})$ is the left singular vector paired with $u_j(M_{p',n})$. The row of $A_{p',n}$ containing $X_{p'}$ consists of $p'$ zeroes followed by $X_{p'}$; after permuting this row to the bottom (and the corresponding column to the far right), removing the bottom row and rightmost column yields the $(p'+n-1) \times (p'+n-1)$ Hermitian matrix $A_{p'-1,n}$. Event (iii) of that lemma (the large coefficient event) cannot occur here, as $A_{p',n}$ has zero diagonal. Strictly speaking, there are some harmless adjustments by constant factors that need to be made to this lemma, ultimately coming from the fact that $n$ and $p$ are only comparable up to constants rather than equal, but these adjustments make only a negligible change to the proof of that lemma.

By repeating the arguments in [ ] with only cosmetic changes, it then suffices to show the following.

Proposition (Bad events are rare). Suppose that $p/2 \le p' < p$ and that $l$ is at most a small constant multiple of $p$, and set $\delta := n^{-\kappa}$ for some sufficiently small fixed $\kappa > 0$. Then:
(a) The events (i), (iii), (iv) and (v) in Lemma all fail with high probability.
(b) There is a constant $C'$ such that all the coefficients of the right singular vectors $u_j(M_{p'-1,n})$ for $j$ in the bulk are of magnitude at most $n^{-1/2} \log^{C'} n$ with overwhelming probability. Conditioning $M_{p'-1,n}$ to be a matrix with this property, the events (ii) and (vi) occur with a conditional probability of at most $2^{-\kappa m} + n^{-\kappa}$.
(c) Furthermore, there is a constant $C_2$ depending on $C'$, $\kappa$ and $C_1$ such that if $l \ge C_2$ and $M_{p'-1,n}$ is conditioned as in (b), then (ii) and (vi) in fact occur with a conditional probability of at most $2^{-\kappa m} \log^{-2C_1} n + n^{-\kappa}$.

This proposition can be proven by repeating the proof of the corresponding proposition in [ ] almost verbatim, the only significant difference being that Theorem and Theorem are applied instead of the corresponding results in [ ], respectively. This completes the proof of Theorem.
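The proof above begins by truncating the $\zeta_{ij}$ and renormalising their moments. The following Python snippet is only a minimal sketch of the mechanics of one common variant, for a hypothetical entry law with finitely many atoms: condition on $|\zeta| \le K$ with an assumed cutoff $K = \log^2 n$ (one instance of $\log^{O(1)} n$), then shift and rescale so that the mean is zero and the variance is one. Why such a truncation is permissible is the content of the Lemma invoked in the proof and is not reproduced here; the function name `truncate_and_renormalise` and the example law are illustrative assumptions.

```python
import numpy as np


def truncate_and_renormalise(values, probs, K):
    """Hypothetical helper: condition a discrete law on |zeta| <= K, then shift and
    rescale it so the result has mean zero and variance one.
    Returns (new_values, new_probs)."""
    values = np.asarray(values, dtype=float)
    probs = np.asarray(probs, dtype=float)
    keep = np.abs(values) <= K
    v = values[keep]
    q = probs[keep] / probs[keep].sum()  # conditional probabilities given |zeta| <= K
    mean = np.sum(q * v)
    var = np.sum(q * (v - mean) ** 2)
    return (v - mean) / np.sqrt(var), q


n = 1000
K = np.log(n) ** 2              # assumed cutoff, about 47.7 for n = 1000
values = [-2.0, 1.0, 50.0]      # hypothetical law with one rare large atom
probs = [0.35, 0.649, 0.001]

new_values, new_probs = truncate_and_renormalise(values, probs, K)
print(new_values, new_probs)
# First and second moments of the renormalised law: approximately 0 and 1.
print(np.sum(new_probs * new_values), np.sum(new_probs * new_values ** 2))
```

Here the atom at 50 lies above the cutoff and is discarded, and the recentering step matters because the conditioned law no longer has mean zero.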
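Events (ii), (iv) and (vi) of the Lemma concern the inner products $X_{p'} \cdot u_j(M_{p'-1,n})$. Since the row $X_{p'}$ is independent of $M_{p'-1,n}$ and has independent entries of mean zero and variance one, one has $\mathbb{E}|X_{p'} \cdot u|^2 = |u|^2 = 1$ for any fixed unit vector $u$, so a window of $k$ consecutive indices contributes $k$ on average; this is a natural baseline against which "small" and "large" can be read. The Monte Carlo sketch below checks this baseline with Rademacher ($\pm 1$) entries; the sizes, the seed and the window position are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, trials = 60, 100, 2000

# Gaussian stand-in for M_{p-1,n}, the matrix formed from the first p - 1 rows.
M_prev = rng.standard_normal((p - 1, n))
# The rows of Vh are orthonormal (conjugated) right singular vectors of M_prev.
_, _, Vh = np.linalg.svd(M_prev, full_matrices=False)
window = Vh[20:25]  # hypothetical window of k = 5 consecutive indices

# Fresh rows X_p, independent of M_prev, with Rademacher entries:
# mean zero and variance one, but not Gaussian.
X = rng.choice([-1.0, 1.0], size=(trials, n))

# For each trial, the sum over the window of |X_p . u_j|^2.
sums = np.sum(np.abs(X @ window.T) ** 2, axis=1)
print(sums.mean())  # close to k = 5, since E[X X^T] is the identity
```

Only the first two moments of the entries enter this expectation, which is why the Gaussian and Rademacher cases agree here; the finer statements in the Proposition require the additional input described there.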
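The proof of the Lemma rests on the Hermitian dilation $A_{p',n}$ of $M_{p',n}$. As a numerical sanity check (a sketch, not part of the argument), the following Python/NumPy snippet builds the dilation of a small Gaussian matrix standing in for $M_{p',n}$ and confirms the three facts used above: the spectrum consists of $\pm\sigma_j$ together with $n - p'$ zeroes, the vectors $(v_j, \pm u_j)/\sqrt{2}$ are eigenvectors, and deleting the row and column that contain the last row of $M$ produces the dilation of the matrix with that row removed. The helper name `dilation` and the sizes are illustrative choices.

```python
import numpy as np


def dilation(M):
    """Return the (p+n) x (p+n) Hermitian matrix [[0, M], [M^*, 0]]."""
    p, n = M.shape
    top = np.hstack([np.zeros((p, p), dtype=M.dtype), M])
    bottom = np.hstack([M.conj().T, np.zeros((n, n), dtype=M.dtype)])
    return np.vstack([top, bottom])


rng = np.random.default_rng(0)
p, n = 4, 7
M = rng.standard_normal((p, n))  # Gaussian stand-in for M_{p',n}
A = dilation(M)

# Spectrum: +sigma_j and -sigma_j for j = 1..p, plus n - p zeroes.
sigma = np.linalg.svd(M, compute_uv=False)
eig = np.linalg.eigvalsh(A)  # ascending order
expected = np.sort(np.concatenate([sigma, -sigma, np.zeros(n - p)]))
print(np.allclose(eig, expected))  # True

# Eigenvectors (v_j, +-u_j)/sqrt(2), with v_j left and u_j right singular vectors.
V_left, s, Vh = np.linalg.svd(M, full_matrices=False)
j = 0
v, u = V_left[:, j], Vh[j].conj()
w_plus = np.concatenate([v, u]) / np.sqrt(2)
w_minus = np.concatenate([v, -u]) / np.sqrt(2)
print(np.allclose(A @ w_plus, s[j] * w_plus))     # True
print(np.allclose(A @ w_minus, -s[j] * w_minus))  # True

# Deleting the row and column of A that hold the last row X_p of M
# (index p - 1) leaves the dilation of M with that row removed.
A_removed = np.delete(np.delete(A, p - 1, axis=0), p - 1, axis=1)
print(np.allclose(A_removed, dilation(M[:-1])))  # True
```

Since `dilation` uses the conjugate transpose and `eigvalsh` accepts Hermitian input, the same check applies unchanged to complex entries.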
References
G. Anderson, A. Guionnet and O. Zeitouni, An introduction to random matrices, book to be published by Cambridge Univ. Press.
Z. D. Bai and J. Silverstein, Spectral analysis of large dimensional random matrices, Mathematics Monograph Series, Science Press, Beijing.
G. Ben Arous and S. Péché, Universality of local eigenvalue statistics for some sample covariance matrices, Comm. Pure Appl. Math.
K. Costello and V. Vu, Concentration of random determinants and permanent estimators, to appear in SIAM J. Discrete Math.
P. Deift, Orthogonal polynomials and random matrices: a Riemann–Hilbert approach, Courant Lecture Notes in Mathematics, New York University, Courant Institute of Mathematical Sciences, New York; American Mathematical Society, Providence, RI.
P. Deift, Universality for mathematical and physical systems, International Congress of Mathematicians Vol. I, Eur. Math. Soc., Zürich.
A. Edelman, Eigenvalues and condition numbers of random matrices, SIAM J. Matrix Anal. Appl.
A. Edelman, The distribution and moments of the smallest eigenvalue of a random matrix of Wishart type, Linear Algebra Appl.
L. Erdős, B. Schlein and H.-T. Yau, Semicircle law on short scales and delocalization of eigenvectors for Wigner random matrices, to appear in Annals of Probability.
L. Erdős, B. Schlein and H.-T. Yau, Local semicircle law and complete delocalization for Wigner random matrices, to appear in Comm. Math. Phys.
L. Erdős, B. Schlein and H.-T. Yau, Wegner estimate and level repulsion for Wigner random matrices, submitted.
L. Erdős, J. Ramirez, B. Schlein and H.-T. Yau, Universality of sine-kernel for Wigner matrices with a small Gaussian perturbation, arXiv preprint.
L. Erdős, J. Ramirez, B. Schlein and H.-T. Yau, Bulk universality for Wigner matrices, arXiv preprint.
L. Erdős, J. Ramirez, B. Schlein, T. Tao, V. Vu and H.-T. Yau, Bulk universality for Wigner hermitian matrices with subexponential decay, to appear in Math. Res. Lett.
L. Erdős, B. Schlein, H.-T. Yau and J. Yin, The local relaxation flow approach to universality of the local statistics for random matrices, arXiv preprint.
O. Feldheim and S. Sodin, A universality result for the smallest eigenvalues of certain sample covariance matrices, preprint.
P. Forrester, The spectrum edge of random matrix ensembles, Nuclear Phys. B.
P. Forrester, Exact results and universal asymptotics in the Laguerre random matrix ensemble, J. Math. Phys.
P. Forrester and N. Witte, The distribution of the first eigenvalue spacing at the hard edge of the Laguerre unitary ensemble, Kyushu J. Math.
J. Ginibre, Statistical Ensembles of Complex, Quaternion, and Real Matrices, Journal of Mathematical Physics.
F. Götze and A. Tikhomirov, Rate of convergence in probability to the Marchenko–Pastur law, Bernoulli.
F. Götze and A. Tikhomirov, Rate of convergence to the semicircular law, Preprint, SFB Universität Bielefeld.
A. Guionnet and O. Zeitouni, Concentration of the spectral measure for large matrices, Electron. Comm. Probab. (electronic).
J. Gustavsson, Gaussian fluctuations of eigenvalues in the GUE, Ann. Inst. H. Poincaré Probab. Statist.
K. Johansson, Universality of the local spacing distribution in certain ensembles of Hermitian Wigner matrices, Comm. Math. Phys.
I. Johnstone, On the distribution of largest principal component, Ann. Statist.
N. Katz and P. Sarnak, Random matrices, Frobenius eigenvalues, and monodromy, American Mathematical Society Colloquium Publications, American Mathematical Society, Providence, RI.
M. Ledoux, The concentration of measure phenomenon, Mathematical Surveys and Monographs, AMS.
J. W. Lindeberg, Eine neue Herleitung des Exponentialgesetzes in der Wahrscheinlichkeitsrechnung, Math. Z.
V. Marchenko and L. Pastur, The distribution of eigenvalues in certain sets of random matrices, Mat. Sb.
M. L. Mehta, Random Matrices and the Statistical Theory of Energy Levels, Academic Press, New York, NY.
M. L. Mehta and M. Gaudin, On the density of eigenvalues of a random matrix, Nuclear Phys.
T. Nagao and K. Slevin, Laguerre ensembles of random matrices: nonuniversal correlation functions, J. Math. Phys.
T. Nagao and M. Wadati, Correlation functions of random matrix ensembles related to classical orthogonal polynomials, J. Phys. Soc. Japan.
L. Pastur, Spectra of random self-adjoint operators, Russian Math. Surveys.
V. Paulauskas and A. Rackauskas, Approximation theory in the central limit theorem, Kluwer Academic Publishers.
S. Péché, Universality in the bulk of the spectrum for complex sample covariance matrices, preprint.
J. Ramirez and B. Rider, Diffusion at the random matrix hard edge, Comm. Math. Phys.
A. Soshnikov, Universality at the edge of the spectrum in Wigner random matrices, Comm. Math. Phys.
A. Soshnikov, Gaussian limit for determinantal random point fields, Ann. Probab.
A. Soshnikov, A note on universality of the distribution of the largest eigenvalues in certain sample covariance matrices, J. Statist. Phys.
M. Talagrand, A new look at independence, Ann. Probab.
T. Tao and V. Vu, On random ±1 matrices: singularity and determinant, Random Structures Algorithms.
T. Tao and V. Vu, Random matrices: the distribution of the smallest singular values (Universality at the Hard Edge), to appear in GAFA.
T. Tao and V. Vu, Random matrices: universality of ESDs and the circular law, with an appendix by M. Krishnapur, Ann. Probab.
T. Tao and V. Vu, Random matrices: the local statistics of eigenvalues, to appear in Acta Math.
T. Tao and V. Vu, Random matrices: universality of local eigenvalue statistics up to the edge, to appear in Comm. Math. Phys.
J. von Neumann and H. Goldstine, Numerical inverting matrices of high order, Bull. Amer. Math. Soc.
K. Wachter, Strong limits of random matrix spectra for sample covariance matrices of independent elements, Ann. Probab.
E. Wigner, On the distribution of the roots of certain symmetric matrices, The Annals of Mathematics.
Y. Q. Yin, Limiting spectral distribution for a class of random matrices, J. Multivariate Anal.

Department of Mathematics, UCLA, Los Angeles CA
Email address: tao@math.ucla.edu

Department of Mathematics, Rutgers, Piscataway, NJ
Email address: vanvu@math.rutgers.edu