Notes from Compressed Sensing and Contemporary Sampling

Cambridge Part III Mathematical Tripos 2012-2013

Lecturer: Anders Hansen

Vivak Patel

May 1, 2013

Contents

Part I: Compressed Sensing
1 Sparse Linear Algebra
  1.1 Basic Definitions and Notation
  1.2 Spark
  1.3 Null Space Property
  1.4 Restricted Isometry Property
    1.4.1 RIP and Stability
    1.4.2 RIP and Measurement Bounds
    1.4.3 RIP and NSP
  1.5 Noise-Free Signal Recovery
  1.6 Noisy Signal Recovery
  1.7 Coherence
2 Signal Recovery in Random Sampling
  2.1 Bernstein Inequalities
  2.2 RIPless Theory of Compressed Sensing
  2.3 Random Sampling in Bounded Orthonormal Systems
  2.4 Sampling for Recovery
Part II: Generalised Sampling

Part I: Compressed Sensing

1 Sparse Linear Algebra

1.1 Basic Definitions and Notation

1. $\Sigma_k = \{x \in \mathbb{R}^n : \|x\|_0 \le k\}$ is the set of vectors in $\mathbb{R}^n$ with $k$ or fewer non-zero components.

2. $k$-term approximation

(a) Motivation: sparsity is key to determining $x$ uniquely from an underdetermined system. The signal $x$ may not itself be sparse, but we can approximate it by a sparse vector.

(b) The $k$-term approximation error of a vector $x$ is
$$\sigma_k(x)_p = \inf_{\tilde{x} \in \Sigma_k} \|x - \tilde{x}\|_p$$

3. The null space of $A \in \mathbb{R}^{m \times n}$ is $N(A) = \{x \in \mathbb{R}^n : Ax = 0\}$.

1.2 Spark

1. Definition and Properties of the Spark

(a) Definition

Definition 1.1. Let $A \in \mathbb{R}^{m \times n}$. Then $\operatorname{spark}(A) = \inf_{z \ne 0}\{\|z\|_0 : Az = 0\}$.
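Definition 1.1 can be checked directly on small examples. The sketch below is our own illustration (not part of the notes; the name `spark` is ours): it computes $\operatorname{spark}(A)$ by brute force over column subsets, which is exponential in $n$ and therefore only usable for tiny matrices.

```python
import itertools
import numpy as np

def spark(A, tol=1e-10):
    """Smallest number of linearly dependent columns of A (brute force).

    Checks column subsets of increasing size; a subset is linearly
    dependent exactly when the corresponding submatrix is rank-deficient.
    """
    n = A.shape[1]
    for k in range(1, n + 1):
        for cols in itertools.combinations(range(n), k):
            if np.linalg.matrix_rank(A[:, cols], tol=tol) < k:
                return k
    return np.inf  # all columns independent (impossible when m < n)

# A 2x3 matrix: every single column is non-zero, but columns 0 and 2
# are parallel, so the smallest dependent subset has size 2.
A = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 0.0]])
print(spark(A))  # 2
```

Note that for an underdetermined matrix ($m < n$) the loop always terminates with a value in $[2, m+1]$, matching property (b) below, provided no column is zero.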
Equivalently, $\operatorname{spark}(A)$ is the minimum number of linearly dependent columns of $A$.

(b) If $m < n$, as for an underdetermined system, then $\operatorname{spark}(A) \in [2, m+1]$.

2. Recovering at most one $x$ from a signal $y$

Theorem 1.1. Let $y \in \mathbb{R}^m$. There exists at most one $x \in \Sigma_k$ such that $y = Ax$ if and only if $\operatorname{spark}(A) > 2k$.

Proof. The proof relies on the following idea: if $x, x' \in \Sigma_k$ and $Ax' = y = Ax$, then $x - x' \in N(A)$ and $\|x - x'\|_0 \le 2k$. If $\operatorname{spark}(A) > 2k$, then no non-zero vector that is $2k$-sparse (or sparser) can lie in the null space, hence $x' = x$.

(a) Let $y \in \mathbb{R}^m$ and suppose $x \in \Sigma_k$ uniquely satisfies $y = Ax$. Suppose for contradiction that $\operatorname{spark}(A) \le 2k$. Then $\Sigma_{2k} \cap N(A) \ne \emptyset$, so let $h \in \Sigma_{2k} \cap N(A)$ be non-zero. Then $\exists \tilde{x} \in \Sigma_k$ such that $h = x - \tilde{x}$ (this is like completing the basis). But $Ah = 0 \implies Ax = A\tilde{x}$, contradicting uniqueness.

(b) Suppose $\operatorname{spark}(A) > 2k$, and suppose $\exists x, x' \in \Sigma_k$ with $x \ne x'$ and $y = Ax = Ax'$. Then $x - x' \in N(A) \cap \Sigma_{2k}$. However, $\forall h \in N(A) \setminus \{0\}$, $\|h\|_0 \ge \operatorname{spark}(A) > 2k \ge \|x - x'\|_0$, a contradiction.

1.3 Null Space Property

1. Definition of the NSP

Definition 1.2. A matrix $A$ satisfies the NSP of order $k$ if $\exists c > 0$ such that for every index set $\Lambda$ with $|\Lambda| \le k$ and all $h \in N(A)$,
$$\|h_\Lambda\|_2 \le \frac{c\,\|h_{\Lambda^c}\|_1}{\sqrt{k}}$$

2. Motivation: when a signal is exactly sparse, the previous theorem suffices; when it is only approximately sparse, we may not have unique solutions. The null space may contain vectors that are very sparse or very compressible (well approximated by sparse vectors). The NSP ensures that $N(A)$ is well behaved: if $h \in N(A)$ and $\|h_{\Lambda^c}\|_1 = 0$, then $\|h_\Lambda\|_2 = 0$, so $h = 0$.

3. Instance-optimality: given a sensing matrix $A$ and recovery algorithm $\Delta$, the following guarantees the optimal performance of $\Delta$ in terms of $\sigma_k(x)_1$:
$$\|\Delta(Ax) - x\|_2 \le \frac{c}{\sqrt{k}}\,\sigma_k(x)_1$$

4. Instance-optimality implies the NSP

Theorem 1.2. Let $A : \mathbb{R}^n \to \mathbb{R}^m$ be a sensing matrix and $\Delta : \mathbb{R}^m \to \mathbb{R}^n$ a recovery algorithm.
If $(A, \Delta)$ is instance-optimal, then $A$ satisfies the NSP of order $2k$.

Proof. Let $h \in N(A)$ and decompose it as $x - x'$ with $x' \in \Sigma_k$. The result follows:

(a) Let $h \in N(A)$ and let $\Lambda$ be the index set of the $2k$ largest components of $h$. Write $\Lambda = \Lambda_0 \cup \Lambda_1$ where $|\Lambda_0| = |\Lambda_1| = k$. Set $x = h_{\Lambda_1} + h_{\Lambda^c}$ and $x' = -h_{\Lambda_0}$. Then $x - x' = h_{\Lambda_1} + h_{\Lambda^c} + h_{\Lambda_0} = h$.

(b) Since $\|x'\|_0 \le k$, we have $\sigma_k(x')_1 = 0$. By instance-optimality, $\|\Delta(Ax') - x'\|_2 \le 0 \implies \Delta(Ax') = x'$. Moreover, $0 = Ah = A(x - x') \implies Ax = Ax'$.

(c) Therefore:
$$\|h_\Lambda\|_2 \le \|h\|_2 = \|x - x'\|_2 = \|x - \Delta(Ax')\|_2 = \|x - \Delta(Ax)\|_2 \le \frac{c}{\sqrt{k}}\,\sigma_k(x)_1$$

(d) Note that $\sigma_k(x)_1 = \inf_{\tilde{x} \in \Sigma_k}\|x - \tilde{x}\|_1$. By the definition of $x$, taking $\tilde{x} = h_{\Lambda_1}$ gives $\sigma_k(x)_1 \le \|h_{\Lambda^c}\|_1$. Therefore:
$$\|h_\Lambda\|_2 \le \frac{c}{\sqrt{k}}\,\sigma_k(x)_1 \le \frac{c'}{\sqrt{2k}}\,\|h_{\Lambda^c}\|_1$$

5. Note that if an algorithm satisfies the optimality condition, then $A$ satisfies the NSP. So for an approximately sparse signal (i.e. a compressible signal), $Ax = y$ determines a unique $x$.

1.4 Restricted Isometry Property

1. Motivation: when a signal is noisy or corrupted, the NSP is not a strong enough condition to guarantee a unique $x$. The Restricted Isometry Property, on the other hand, is.

2. Definition of the RIP

Definition 1.3. A matrix $A$ satisfies the RIP of order $k$ if $\exists \delta_k \in (0,1)$ such that $\forall x \in \Sigma_k$:
$$(1 - \delta_k)\|x\|_2^2 \le \|Ax\|_2^2 \le (1 + \delta_k)\|x\|_2^2$$

1.4.1 RIP and Stability

1. Definition of stability

Definition 1.4. Let $A \in \mathbb{R}^{m \times n}$ and $\Delta : \mathbb{R}^m \to \mathbb{R}^n$. $(A, \Delta)$ is $c$-stable if for any $x \in \Sigma_k$ and error $e \in \mathbb{R}^m$,
$$\|\Delta(Ax + e) - x\|_2 \le c\,\|e\|_2$$

2. $c$-stability implies part of the RIP

Theorem 1.3. Suppose $(A, \Delta)$ is $c$-stable. Then $\frac{1}{c}\|x\|_2 \le \|Ax\|_2$ for all $x \in \Sigma_{2k}$.

Proof. Note that any element of $\Sigma_{2k}$ is the difference of two vectors in $\Sigma_k$. Let $x, z \in \Sigma_k$; we use these to cleverly define errors:

(a) Let $x, z \in \Sigma_k$. Set $e_x = \frac{A(z - x)}{2}$ and $e_z = \frac{A(x - z)}{2}$. Then $Ax + e_x = Az + e_z = \frac{A(x + z)}{2}$. Let the recovered value be $\Delta(Ax + e_x) = \Delta(Az + e_z) = y \in \mathbb{R}^n$.
(b) Then
$$\|x - z\|_2 \le \|x - y\|_2 + \|y - z\|_2 \le c\|e_x\|_2 + c\|e_z\|_2 = c\|A(x - z)\|_2$$

1.4.2 RIP and Measurement Bounds

1. Motivation: we can recover exactly $k$-sparse vectors if $\operatorname{spark}(A) > 2k$, where $\operatorname{spark}(A) \le m + 1$. Given noise in the signal, we will typically need more measurements.

2. The RIP implies a bound on the number of measurements $m$ needed:

Theorem 1.4. Let $A \in \mathbb{R}^{m \times n}$ satisfy the RIP of order $2k$ with $\delta_{2k} \in (0, 1/2]$. Then
$$m \ge c\,k\log\left(\frac{n}{k}\right), \qquad c = \frac{1}{2\log(\sqrt{24} + 1)} \approx 0.28$$

3. An important lemma used to prove Theorem 1.4

Lemma 1.1. Suppose $k$ and $n$ satisfy $k < n/2$. Then $\exists X \subset \Sigma_k$ such that $\forall x, z \in X$ with $x \ne z$:
$$\|x\|_2 \le \sqrt{k}, \qquad \|x - z\|_2 \ge \sqrt{\frac{k}{2}}, \qquad \log|X| \ge \frac{k}{2}\log\left(\frac{n}{k}\right)$$

Proof. We first create a set satisfying the first requirement, then pick a subset satisfying the second. This limits the size of the subset, giving the third requirement.

(a) Let $U = \{x \in \{0, -1, 1\}^n \subset \mathbb{R}^n : \|x\|_0 = k\}$. Then $|U| = \binom{n}{k}2^k$ and $\|x\|_2 = \sqrt{k}$, which satisfies the first requirement.

(b) Fix $x \in U$. Since $\|x - z\|_0 \le \|x - z\|_2^2$ (the non-zero entries of $x - z$ are integers of modulus at least 1), if $\|x - z\|_2^2 \le k/2$ then $\|x - z\|_0 \le k/2$, so:
$$\left|\{z \in U : \|x - z\|_2^2 \le k/2\}\right| \le \left|\{z \in U : \|x - z\|_0 \le k/2\}\right| \le \binom{n}{k/2}3^{k/2}$$

(c) Choose the points of $X$ greedily: each chosen point excludes at most $\binom{n}{k/2}3^{k/2}$ candidates from $U$, so we may pick at least
$$|X| \ge \frac{\binom{n}{k}2^k}{\binom{n}{k/2}3^{k/2}}$$
points with pairwise squared distance at least $k/2$, and a direct computation then gives $\log|X| \ge \frac{k}{2}\log(n/k)$.

Proof of Theorem 1.4. We want a (loose) lower bound on $m$ for sparse signals in $\Sigma_{2k}$. Again, any element of $\Sigma_{2k}$ is the difference of two vectors in $\Sigma_k$. We use the special subset $X \subset \Sigma_k$ of the lemma, whose properties give the lower bound.

(a) Let $x, z \in X$. By the RIP, the assumption $\delta_{2k} \le 1/2$, and Lemma 1.1:
$$\|A(x - z)\|_2 \ge \sqrt{1 - \delta_{2k}}\,\|x - z\|_2 \ge \sqrt{\frac{k}{4}}$$

(b) Let $x \in X$.
By the RIP and the lemma:
$$\|Ax\|_2 \le \sqrt{1 + \delta_{2k}}\,\|x\|_2 \le \sqrt{\frac{3k}{2}}$$

(c) Thus all the points $Ax$ for $x \in X$ are at pairwise distance at least $\sqrt{k/4}$, and all lie within distance $\sqrt{3k/2}$ of the origin. The balls of radius $\sqrt{k/16}$ centred at the points $Az$, $z \in X$, are therefore disjoint and contained in the ball of radius $\sqrt{3k/2} + \sqrt{k/16}$ about the origin. Comparing volumes:
$$\operatorname{vol}\left[B_0\left(\sqrt{3k/2} + \sqrt{k/16}\right)\right] \ge |X|\operatorname{vol}\left[B_{Az}\left(\sqrt{k/16}\right)\right]$$
$$\left(\sqrt{3k/2} + \sqrt{k/16}\right)^m \ge |X|\left(\sqrt{k/16}\right)^m$$
$$\left(\sqrt{24} + 1\right)^m \ge |X|$$
$$m \ge \frac{\log|X|}{\log(\sqrt{24} + 1)} \ge \frac{1}{2\log(\sqrt{24} + 1)}\,k\log\left(\frac{n}{k}\right)$$

1.4.3 RIP and NSP

1. Satisfying the RIP implies satisfying the NSP

Theorem 1.5. Suppose $A$ satisfies the RIP of order $2k$ with $\delta_{2k} < \sqrt{2} - 1$. Then $A$ satisfies the NSP of order $2k$ with $c = \frac{\sqrt{2}\,\delta_{2k}}{1 - (1 + \sqrt{2})\delta_{2k}}$.

Proof. We use the lemmas below.

(a) By the RIP and Lemma 1.6: let $h \in N(A)$, let $\Lambda$ be the index set of the $2k$ largest components of $h$, let $\Lambda_0$ be that of the $k$ largest and $\Lambda_1$ that of the next $k$ largest, so that $\Lambda = \Lambda_0 \cup \Lambda_1$.

i. Noting that $Ah = 0$ and that the conditions of Lemma 1.6 hold:
$$\|h_\Lambda\|_2 \le \alpha\frac{\|h_{\Lambda_0^c}\|_1}{\sqrt{k}} = \frac{\alpha}{\sqrt{k}}\left(\|h_{\Lambda^c}\|_1 + \|h_{\Lambda_1}\|_1\right)$$

ii. By Lemma 1.2:
$$\|h_\Lambda\|_2 \le \frac{\alpha}{\sqrt{k}}\|h_{\Lambda^c}\|_1 + \alpha\|h_{\Lambda_1}\|_2 \le \frac{\alpha}{\sqrt{k}}\|h_{\Lambda^c}\|_1 + \alpha\|h_\Lambda\|_2$$

iii. Rearranging, and noting that $1 - \alpha > 0$:
$$\|h_\Lambda\|_2 \le \frac{\sqrt{2}\,\alpha}{1 - \alpha}\cdot\frac{\|h_{\Lambda^c}\|_1}{\sqrt{2k}}$$

2. Important Lemmas

(a) Relation between the usual norms

Lemma 1.2. Suppose $u \in \Sigma_k$. Then $\|u\|_1 \le \sqrt{k}\|u\|_2 \le k\|u\|_\infty$.

Proof. By Cauchy-Schwarz, noting that $\|u\|_1 = \langle u, \operatorname{sgn}(u)\rangle$ and $u_i^2 \le \|u\|_\infty^2$:
$$\|u\|_1 = \langle u, \operatorname{sgn}(u)\rangle \le \|\operatorname{sgn}(u)\|_2\|u\|_2 \le \sqrt{k}\|u\|_2$$
$$\|u\|_2^2 = \sum_{i=1}^n u_i^2 \le \sum_{i=1}^k\|u\|_\infty^2 = k\|u\|_\infty^2$$

(b) Orthogonal vectors

Lemma 1.3. Suppose $u \perp v$. Then $\|u\|_2 + \|v\|_2 \le \sqrt{2}\|u + v\|_2$.

Proof. Apply the inequality of Lemma 1.2 to the vector $w = \begin{pmatrix}\|u\|_2 \\ \|v\|_2\end{pmatrix} \in \mathbb{R}^2$: we have $\|w\|_1 \le \sqrt{2}\|w\|_2 = \sqrt{2}\|u + v\|_2$, the last equality by orthogonality.

(c) RIP and bounding the inner product

Lemma 1.4. Suppose $A$ satisfies the RIP of order $2k$. Then for any pair of vectors $u, v \in \Sigma_k$ with disjoint support,
$$|\langle Au, Av\rangle| \le \delta_{2k}\|u\|_2\|v\|_2$$

Proof. Let $u', v' \in \Sigma_k$ have disjoint support, and set $u = u'/\|u'\|_2$ and $v = v'/\|v'\|_2$, so that $\|u\|_2 = \|v\|_2 = 1$. The result follows from the parallelogram law and the RIP:

i.
By the parallelogram law:
$$|\langle Au, Av\rangle| = \frac{1}{4}\left[\|Au + Av\|_2^2 - \|Au - Av\|_2^2\right] \le \frac{1}{4}\left[(1 + \delta_{2k})\|u + v\|_2^2 - (1 - \delta_{2k})\|u - v\|_2^2\right]$$

ii. Since $u \perp v$, $\|u + v\|_2^2 = \|u - v\|_2^2$. Therefore:
$$|\langle Au, Av\rangle| \le \frac{\delta_{2k}}{2}\left[\|u\|_2^2 + \|v\|_2^2\right] = \delta_{2k}$$

(d) $k$-sparse partitioning of vectors

Lemma 1.5. Let $\Lambda_0 \subset \{1, \ldots, n\}$ with $|\Lambda_0| \le k$. For $h \in \mathbb{R}^n$, let $\Lambda_1$ be the index set of the $k$ largest components of $h_{\Lambda_0^c}$, $\Lambda_2$ that of the next $k$ largest, and so on. Then:
$$\sum_{j \ge 2}\|h_{\Lambda_j}\|_2 \le \frac{\|h_{\Lambda_0^c}\|_1}{\sqrt{k}}$$

Proof. Notice that for $j \ge 2$ the largest component of $h_{\Lambda_j}$ is smaller than the smallest component of $h_{\Lambda_{j-1}}$, and hence than the average modulus of the components of $h_{\Lambda_{j-1}}$. Therefore:

i. $\|h_{\Lambda_j}\|_\infty \le \frac{1}{k}\|h_{\Lambda_{j-1}}\|_1$ for $j \ge 2$.

ii. So for $j \ge 2$, by Lemma 1.2:
$$\sum_{j \ge 2}\|h_{\Lambda_j}\|_2 \le \sqrt{k}\sum_{j \ge 2}\|h_{\Lambda_j}\|_\infty \le \frac{1}{\sqrt{k}}\sum_{j \ge 2}\|h_{\Lambda_{j-1}}\|_1 = \frac{\|h_{\Lambda_0^c}\|_1}{\sqrt{k}}$$

(e) RIP implies a weaker form of the NSP

Lemma 1.6. Let $\Lambda_0 \subset \{1, \ldots, n\}$ with $|\Lambda_0| \le k$. For fixed $h$, let $\Lambda_1$ be the index set of the $k$ largest components of $h_{\Lambda_0^c}$, and let $\Lambda = \Lambda_0 \cup \Lambda_1$. Suppose $A$ satisfies the RIP of order $2k$. Then:
$$\|h_\Lambda\|_2 \le \alpha\frac{\|h_{\Lambda_0^c}\|_1}{\sqrt{k}} + \beta\frac{|\langle Ah_\Lambda, Ah\rangle|}{\|h_\Lambda\|_2}$$
where $\alpha = \frac{\sqrt{2}\,\delta_{2k}}{1 - \delta_{2k}}$ and $\beta = \frac{1}{1 - \delta_{2k}}$.

Proof. Note that $h_\Lambda \in \Sigma_{2k}$ and $h_\Lambda = h - \sum_{j \ge 2}h_{\Lambda_j}$. By the RIP and Lemmas 1.4, 1.3 and 1.5:
$$(1 - \delta_{2k})\|h_\Lambda\|_2^2 \le \|Ah_\Lambda\|_2^2 = \langle Ah_\Lambda, Ah\rangle - \sum_{j \ge 2}\langle Ah_\Lambda, Ah_{\Lambda_j}\rangle$$
$$\le |\langle Ah_\Lambda, Ah\rangle| + \sum_{j \ge 2}\left(|\langle Ah_{\Lambda_0}, Ah_{\Lambda_j}\rangle| + |\langle Ah_{\Lambda_1}, Ah_{\Lambda_j}\rangle|\right)$$
$$\le |\langle Ah_\Lambda, Ah\rangle| + \left(\|h_{\Lambda_0}\|_2 + \|h_{\Lambda_1}\|_2\right)\delta_{2k}\sum_{j \ge 2}\|h_{\Lambda_j}\|_2$$
$$\le |\langle Ah_\Lambda, Ah\rangle| + \sqrt{2}\,\delta_{2k}\|h_\Lambda\|_2\sum_{j \ge 2}\|h_{\Lambda_j}\|_2$$
$$\le |\langle Ah_\Lambda, Ah\rangle| + \sqrt{2}\,\delta_{2k}\|h_\Lambda\|_2\frac{\|h_{\Lambda_0^c}\|_1}{\sqrt{k}}$$
Rearranging, we have:
$$\|h_\Lambda\|_2 \le \frac{\sqrt{2}\,\delta_{2k}}{1 - \delta_{2k}}\cdot\frac{\|h_{\Lambda_0^c}\|_1}{\sqrt{k}} + \frac{1}{1 - \delta_{2k}}\cdot\frac{|\langle Ah_\Lambda, Ah\rangle|}{\|h_\Lambda\|_2}$$

1.5 Noise-Free Signal Recovery

1. Spark, NSP and RIP are properties of a matrix that determine the number of samples needed to reasonably recover a sparse signal with noise.

2. Motivation: to actually compute the signal, we use $\ell_1$ minimisation, which can be computed efficiently. We need an upper bound on the error between the recovered signal $\hat{x}$ and the true signal $x$, so we study $\|x - \hat{x}\|_2$, first in the noise-free setting.

3.
The $\ell_1$ minimisation algorithm for a sensing matrix satisfying the RIP is instance-optimal.

Theorem 1.6. Suppose $A$ satisfies the RIP of order $2k$ with $\delta_{2k} < \sqrt{2} - 1$. Suppose we obtain measurements of the form $y = Ax$, and let $\hat{x} \in \arg\min\{\|z\|_1 : Az = y\}$. Then:
$$\|\hat{x} - x\|_2 \le \frac{C_0}{\sqrt{k}}\,\sigma_k(x)_1$$

Proof. Since $x$ is feasible, $\|\hat{x}\|_1 \le \|x\|_1$. Applying Lemma 1.7 with $h = \hat{x} - x$:
$$\|\hat{x} - x\|_2 \le \frac{C_0}{\sqrt{k}}\,\sigma_k(x)_1 + C_1\frac{|\langle Ah_\Lambda, Ah\rangle|}{\|h_\Lambda\|_2}$$
Noting that $Ax = A\hat{x} = y$, we have $A(\hat{x} - x) = 0$, and the result follows.

4. The following lemma is similar to Lemma 1.6 and is essential to proving the above result.

Lemma 1.7. Suppose $A$ satisfies the RIP of order $2k$. Let $\Lambda_0$ be the index set of the $k$ largest components of $x$, and $\Lambda_1$ the index set of the $k$ largest components of $h_{\Lambda_0^c}$, where $h = \hat{x} - x$ for $x, \hat{x} \in \mathbb{R}^n$. Let $\Lambda = \Lambda_0 \cup \Lambda_1$. If $\|\hat{x}\|_1 \le \|x\|_1$, then:
$$\|h\|_2 \le \frac{C_0}{\sqrt{k}}\,\sigma_k(x)_1 + C_1\frac{|\langle Ah_\Lambda, Ah\rangle|}{\|h_\Lambda\|_2}$$

Proof. By the triangle inequality, $\|h\|_2 \le \|h_{\Lambda^c}\|_2 + \|h_\Lambda\|_2$. We bound each term individually.

(a) By Lemma 1.5, we have
$$\|h_{\Lambda^c}\|_2 \le \sum_{j \ge 2}\|h_{\Lambda_j}\|_2 \le \frac{\|h_{\Lambda_0^c}\|_1}{\sqrt{k}}$$
We use $\sigma_k(x)_1 = \|x - x_{\Lambda_0}\|_1 = \|x_{\Lambda_0^c}\|_1$ to upper bound $\|h_{\Lambda_0^c}\|_1$:
$$\|x\|_1 \ge \|\hat{x}\|_1 = \|x + h\|_1 \ge \|x_{\Lambda_0}\|_1 - \|h_{\Lambda_0}\|_1 + \|h_{\Lambda_0^c}\|_1 - \|x_{\Lambda_0^c}\|_1$$
Rearranging, and using Lemma 1.2:
$$\|h_{\Lambda_0^c}\|_1 \le \|x\|_1 - \|x_{\Lambda_0}\|_1 + \|h_{\Lambda_0}\|_1 + \|x_{\Lambda_0^c}\|_1 \le 2\sigma_k(x)_1 + \|h_{\Lambda_0}\|_1 \le 2\sigma_k(x)_1 + \sqrt{k}\|h_{\Lambda_0}\|_2$$
Therefore:
$$\|h_{\Lambda^c}\|_2 \le \|h_\Lambda\|_2 + \frac{2}{\sqrt{k}}\,\sigma_k(x)_1$$

(b) By Lemma 1.6 and the previous part of this proof:
$$\|h_\Lambda\|_2 \le \frac{\alpha}{\sqrt{k}}\|h_{\Lambda_0^c}\|_1 + \beta\frac{|\langle Ah_\Lambda, Ah\rangle|}{\|h_\Lambda\|_2} \le \frac{2\alpha}{\sqrt{k}}\,\sigma_k(x)_1 + \alpha\|h_\Lambda\|_2 + \beta\frac{|\langle Ah_\Lambda, Ah\rangle|}{\|h_\Lambda\|_2}$$
$$\|h_\Lambda\|_2 \le \frac{2\alpha}{\sqrt{k}(1 - \alpha)}\,\sigma_k(x)_1 + \frac{\beta}{1 - \alpha}\cdot\frac{|\langle Ah_\Lambda, Ah\rangle|}{\|h_\Lambda\|_2}$$

(c) Combining the two results:
$$\|h\|_2 \le 2\|h_\Lambda\|_2 + \frac{2}{\sqrt{k}}\,\sigma_k(x)_1 \le 2\left(\frac{1 + \alpha}{1 - \alpha}\right)\frac{\sigma_k(x)_1}{\sqrt{k}} + \frac{2\beta}{1 - \alpha}\cdot\frac{|\langle Ah_\Lambda, Ah\rangle|}{\|h_\Lambda\|_2}$$

5. Remarks

(a) If $x$ is exactly $k$-sparse, then $\sigma_k(x)_1 = 0$, so $\|\hat{x} - x\|_2 = 0$ and we recover $x$ exactly.

(b) Since we are not dealing with noise, we can simplify Lemma 1.7 and the theorem by assuming an NSP of order $2k$.
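The $\ell_1$ minimisation above can be carried out with any linear-programming solver. As a concrete sketch (our own illustration, not from the notes, assuming SciPy is available; the name `basis_pursuit` is ours), the problem $\min\|z\|_1$ subject to $Az = y$ becomes a linear program via the standard split $z = u - v$ with $u, v \ge 0$:

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, y):
    """Solve min ||z||_1 subject to Az = y as a linear program.

    Split z = u - v with u, v >= 0; at the optimum ||z||_1 = sum(u) + sum(v),
    and the equality constraint becomes A(u - v) = y.
    """
    m, n = A.shape
    c = np.ones(2 * n)                      # objective: sum(u) + sum(v)
    A_eq = np.hstack([A, -A])               # A u - A v = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * n))
    u, v = res.x[:n], res.x[n:]
    return u - v

rng = np.random.default_rng(0)
m, n = 20, 32
A = rng.standard_normal((m, n)) / np.sqrt(m)   # Gaussian sensing matrix
x = np.zeros(n)
x[[3, 17]] = [2.0, -1.5]                       # an exactly 2-sparse signal
x_hat = basis_pursuit(A, A @ x)
print(np.linalg.norm(x_hat - x))               # typically near 0: exact recovery
```

With $m$ comfortably above $2k\log(n/k)$, a Gaussian matrix satisfies the relevant RIP with high probability, and the recovery here is exact up to solver precision, consistent with Remark 5(a).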
1.6 Noisy Signal Recovery

1. Motivation: more often than not, our sample is $y = Ax + e$, where $e$ is some corruption. In this case we estimate $x$ by a $z$ satisfying $\|Az - y\|_2 \le \epsilon$, where $\|e\|_2 \le \epsilon$. If $\|e\|_2$ is bounded, we can bound the error between the $\ell_1$ estimator and the signal.

2. Error bound between the $\ell_1$ estimator and the signal

Theorem 1.7. Suppose $A \in \mathbb{R}^{m \times n}$ satisfies the RIP of order $2k$ with $\delta_{2k} < \sqrt{2} - 1$. Let $y = Ax + e$ where $\|e\|_2 \le \epsilon$, and let $\hat{x} \in \arg\min\{\|z\|_1 : \|Az - y\|_2 \le \epsilon\}$. Then:
$$\|\hat{x} - x\|_2 \le \frac{C_0}{\sqrt{k}}\,\sigma_k(x)_1 + C_2\,\epsilon$$

Proof. Let $h = \hat{x} - x$. Let $\Lambda_0$ be the index set of the $k$ largest components of $x$, $\Lambda_1$ the index set of the $k$ largest components of $h_{\Lambda_0^c}$, and $\Lambda = \Lambda_0 \cup \Lambda_1$.

(a) Since $x$ is feasible, $\|\hat{x}\|_1 \le \|x\|_1$. Applying Lemma 1.7:
$$\|h\|_2 \le \frac{C_0}{\sqrt{k}}\,\sigma_k(x)_1 + C_1\frac{|\langle Ah_\Lambda, Ah\rangle|}{\|h_\Lambda\|_2}$$

(b) By Cauchy-Schwarz, $|\langle Ah_\Lambda, Ah\rangle| \le \|Ah_\Lambda\|_2\|Ah\|_2$, and by the RIP, $\|Ah_\Lambda\|_2 \le \sqrt{1 + \delta_{2k}}\,\|h_\Lambda\|_2$.

(c) By the triangle inequality, the definition of the estimator, and our assumption on the error:
$$\|Ah\|_2 \le \|A\hat{x} - y\|_2 + \|Ax - y\|_2 \le \epsilon + \|e\|_2 \le 2\epsilon$$

(d) Therefore:
$$\|\hat{x} - x\|_2 \le \frac{C_0}{\sqrt{k}}\,\sigma_k(x)_1 + 2C_1\sqrt{1 + \delta_{2k}}\,\epsilon$$

1.7 Coherence

1. Motivation: spark, NSP and RIP all provide guarantees for recovering $k$-sparse signals, but verifying these properties is difficult. Coherence is easily computed and can be related to spark, NSP and RIP under certain conditions.

2. Note: we will be using two definitions of coherence. One is presented below; the other is the maximal modulus of the entries of the matrix.

3. Definition:

Definition 1.5. The coherence of $A$ with columns $a_i$ is:
$$\mu(A) = \max_{1 \le i < j \le n}\frac{|\langle a_i, a_j\rangle|}{\|a_i\|_2\|a_j\|_2}$$

4. Important properties (without proof):

Theorem 1.8. Let $\mu(A)$ be the coherence of a matrix $A$ as defined above.

(a) For any matrix $A$, $\operatorname{spark}(A) \ge 1 + \frac{1}{\mu(A)}$.

(b) If $k < \frac{1}{2}\left(1 + \frac{1}{\mu(A)}\right)$, then for every $y \in \mathbb{R}^m$ there exists at most one $x \in \Sigma_k$ such that $y = Ax$.

(c) If $A$ has unit-norm columns and coherence $\mu(A)$, then $A$ satisfies the RIP of order $k$ with $\delta_k = (k - 1)\mu(A)$ for all $k < \frac{1}{\mu(A)}$.
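Unlike spark, NSP or RIP, the coherence costs only one Gram matrix to evaluate. A small numerical sketch (our own illustration, not from the notes; the name `coherence` is ours):

```python
import numpy as np

def coherence(A):
    """mu(A) = max_{i<j} |<a_i, a_j>| / (||a_i||_2 ||a_j||_2) over columns a_i."""
    norms = np.linalg.norm(A, axis=0)
    C = np.abs(A.T @ A) / np.outer(norms, norms)  # normalised Gram matrix
    np.fill_diagonal(C, 0.0)                      # ignore the i = j terms
    return C.max()

rng = np.random.default_rng(1)
A = rng.standard_normal((16, 64))
mu = coherence(A)
print(mu, 1 + 1 / mu)   # Theorem 1.8(a): spark(A) >= 1 + 1/mu(A)
```

For an orthogonal matrix the coherence is 0, and for a matrix with two parallel columns it is 1; random Gaussian columns land strictly in between.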
2 Signal Recovery in Random Sampling

2.1 Bernstein Inequalities

1. Bernstein Inequality

Theorem 2.1. Suppose $Y_1, \ldots, Y_n$ are independent zero-mean random variables with $|Y_j| \le R$ a.s. If $\mathbb{E}[|Y_j|^2] \le \sigma_j^2$, then for all $t > 0$:
$$\mathbb{P}\left[\left|\sum Y_j\right| \ge t\right] \le 2\exp\left(\frac{-t^2/2}{\sum\sigma_j^2 + Rt/3}\right)$$

2. Vector Bernstein Inequality

Theorem 2.2. Let $Y_1, \ldots, Y_n \in \mathbb{C}^d$ be independent zero-mean random vectors such that $\|Y_j\|_2 \le R$ a.s., and let $\sigma^2 \ge \sum\mathbb{E}\left[\|Y_j\|_2^2\right]$. Then for $t > 0$:
$$\mathbb{P}\left[\left\|\sum Y_j\right\|_2 \ge \sigma + t\right] \le \exp\left(\frac{-t^2/2}{\sigma^2 + (6\sigma + t)R/3}\right)$$

3. Matrix Bernstein Inequality (square, self-adjoint)

Theorem 2.3. Let $Y_1, \ldots, Y_n \in \mathbb{C}^{d \times d}$ be independent, self-adjoint (equal to their conjugate transpose), zero-mean random matrices. Suppose the largest eigenvalue of each $Y_j$ satisfies $\lambda_{\max}(Y_j) \le R$ a.s., and let $\sigma^2 = \left\|\sum\mathbb{V}[Y_j]\right\|_{2 \to 2}$. Then:
$$\mathbb{P}\left[\lambda_{\max}\left(\sum Y_j\right) \ge t\right] \le d\exp\left(\frac{-t^2/2}{\sigma^2 + Rt/3}\right)$$

4. Rectangular Matrix Bernstein Inequality

Theorem 2.4. Let $Y_1, \ldots, Y_n \in \mathbb{C}^{d_1 \times d_2}$ be independent zero-mean random matrices with $\|Y_l\|_{2 \to 2} \le R$ for all $l$. Let $\sigma^2 = \max\left\{\left\|\sum\mathbb{E}[Y_l Y_l^*]\right\|_{2 \to 2}, \left\|\sum\mathbb{E}[Y_l^* Y_l]\right\|_{2 \to 2}\right\}$. Then for $t > 0$:
$$\mathbb{P}\left[\left\|\sum Y_l\right\|_{2 \to 2} \ge t\right] \le 2(d_1 + d_2)\exp\left(\frac{-t^2/2}{\sigma^2 + Rt/3}\right)$$

2.2 RIPless Theory of Compressed Sensing

1. Motivation: in some situations, the RIP theory of compressive sensing requires an impossibly large number of measurements. The RIPless theory holds in a probabilistic setting, allowing us to circumvent these issues. Even in the RIPless setting, we can still construct sensing matrices $A$, and must determine when unique solutions can be recovered.

2. Notation

(a) For $a \in \mathbb{C}$, let $\operatorname{sgn}(a) = a/|a|$ if $a \ne 0$ and $\operatorname{sgn}(0) = 0$.

(b) For $x \in \mathbb{C}^n$, let $\operatorname{sgn}(x) = \left(\operatorname{sgn}(x_1), \ldots, \operatorname{sgn}(x_n)\right)^T$.

(c) For $S \subset \{1, \ldots, n\}$ and $A \in \mathbb{C}^{m \times n}$, $A_S = AP_S$, the restriction of $A$ to the columns indexed by $S$.

3. Inexact dual certificate: properties of a sensing matrix $A$ that ensure a unique $\ell_1$ minimiser.

Theorem 2.5. Let $A \in \mathbb{C}^{m \times n}$ with columns $a_l$ for $l \in \{1, \ldots, n\}$, and let $x \in \mathbb{C}^n$ with $\operatorname{supp}(x) = S$ (i.e. the non-zero components of $x$ are indexed by $S$).
Let $\alpha, \beta, \gamma, \theta > 0$ be such that:

(a) Firstly:
$$\left\|(A_S^* A_S)^{-1}\right\|_{2 \to 2} \le \alpha, \qquad \max_{l \in S^c}\|A_S^* a_l\|_2 \le \beta$$
(where the inverse is taken on $\operatorname{Range}(P_S)$);

(b) Secondly, suppose $\exists u \in \mathbb{C}^n$ with $u = A^* h$ for some $h \in \mathbb{C}^m$ such that:
$$\|u_S - \operatorname{sgn}(x_S)\|_2 \le \gamma, \qquad \|u_{S^c}\|_\infty \le \theta$$

If $\theta + \alpha\beta\gamma < 1$, then $x$ is the unique minimiser of $\min\{\|z\|_1 : Az = Ax\}$.

Proof. Let $\hat{x} = \arg\min\{\|z\|_1 : Az = Ax\}$ and set $v = \hat{x} - x$. We want to show that $v = 0$.

(a) We do this by showing $\|\hat{x}\|_1 \ge \|x\|_1 + \text{const}\times\|v_{S^c}\|_1$ with a positive constant, which implies $\|v_{S^c}\|_1 = 0$ since, by construction, $\|\hat{x}\|_1 \le \|x\|_1$.

(b) Writing out $\|\hat{x}\|_1$ (the last inequality can be checked componentwise from the definition of $\operatorname{sgn}$ on generic complex entries):
$$\|\hat{x}\|_1 = \|x + v\|_1 = \|x + v_S\|_1 + \|v_{S^c}\|_1 = \langle\operatorname{sgn}(x + v_S), x + v_S\rangle + \|v_{S^c}\|_1 \ge \|x\|_1 - \operatorname{Re}\left(\langle\operatorname{sgn}(x), v_S\rangle\right) + \|v_{S^c}\|_1$$

(c) Now we upper bound $\operatorname{Re}(\langle\operatorname{sgn}(x), v_S\rangle)$ using $u_S$. Note that, since $Av = 0$:
$$\langle u_S, v_S\rangle = \langle u, v\rangle - \langle u_S, v_{S^c}\rangle - \langle u_{S^c}, v_S\rangle - \langle u_{S^c}, v_{S^c}\rangle = \langle u, v\rangle - \langle u_{S^c}, v_{S^c}\rangle$$
$$= \langle A^* h, v\rangle - \langle u_{S^c}, v_{S^c}\rangle = \langle h, Av\rangle - \langle u_{S^c}, v_{S^c}\rangle = -\langle u_{S^c}, v_{S^c}\rangle$$
Therefore, by Cauchy-Schwarz:
$$|\langle\operatorname{sgn}(x), v_S\rangle| = |\langle\operatorname{sgn}(x) - u_S, v_S\rangle + \langle u_S, v_S\rangle| \le |\langle\operatorname{sgn}(x) - u_S, v_S\rangle| + |\langle u_{S^c}, v_{S^c}\rangle|$$
$$\le \|\operatorname{sgn}(x) - u_S\|_2\|v_S\|_2 + \|u_{S^c}\|_\infty\|v_{S^c}\|_1 \le \gamma\|v_S\|_2 + \theta\|v_{S^c}\|_1$$
So we have:
$$\|\hat{x}\|_1 \ge \|x\|_1 - \gamma\|v_S\|_2 + (1 - \theta)\|v_{S^c}\|_1$$

(d) Now we bound $\|v_S\|_2$ in terms of $\|v_{S^c}\|_1$. Noting that $Av = 0 \implies A_S v_S + A_{S^c}v_{S^c} = 0$:
$$\|v_S\|_2 = \left\|(A_S^* A_S)^{-1}\left[A_S^*(A_{S^c}v_{S^c})\right]\right\|_2 \le \alpha\|A_S^* A_{S^c}v_{S^c}\|_2$$
Moreover:
$$\|A_S^* A_{S^c}v_{S^c}\|_2 = \left\|\sum_{l \in S^c}A_S^* a_l v_l\right\|_2 \le \sum_{l \in S^c}|v_l|\,\|A_S^* a_l\|_2 \le \beta\sum_{l \in S^c}|v_l| = \beta\|v_{S^c}\|_1$$

(e) Therefore $\|\hat{x}\|_1 \ge \|x\|_1 + (1 - \theta - \alpha\beta\gamma)\|v_{S^c}\|_1$. By assumption, $1 > \theta + \alpha\beta\gamma$, so $v_{S^c} = 0$. This implies $v_S = 0$, since $A_S v_S = -A_{S^c}v_{S^c} = 0$ and $A_S$ is injective (the inverse of $A_S^* A_S$ exists). So $v = 0$, and we recover $x$ exactly.

2.3 Random Sampling in Bounded Orthonormal Systems

1. Motivation: a random sensing matrix confers unique advantages for reconstructing signals.
However, structured random matrices (those generated by a random choice of parameters) have many computational advantages over completely unstructured random matrices.

2. Definition of a Bounded Orthonormal System (BOS)

Definition 2.1. Suppose $D \subseteq \mathbb{R}^d$ carries a probability measure $\nu$. Let $\Phi = \{\phi_1, \ldots, \phi_n\}$ be an orthonormal system of complex-valued functions on $D$ with respect to $\nu$, i.e. $\int_D \phi_i(t)\overline{\phi_j(t)}\,d\nu(t) = \delta_{ij}$. Then $\Phi$ is a Bounded Orthonormal System if $\exists K > 0$ such that $\|\phi_i\|_\infty \le K$ for $i = 1, \ldots, n$.

3. Goal: consider a function $f(t) = \sum_{k=1}^n x_k\phi_k(t)$. Suppose $t_1, \ldots, t_m \in D$ are randomly sampled points with measurements $y_i = f(t_i)$. Then we have measurements $y = (y_1, \ldots, y_m)^T$, sensing matrix $A_{ij} = \phi_j(t_i)$, and unknown parameters $x = (x_1, \ldots, x_n)^T$. By determining $x$ we can reconstruct the function $f$.

2.4 Sampling for Recovery

1. Motivation: ultimately, we want a lower bound on the number of sampled points $m$ that guarantees recovery of $x$ with some probability.

2. Applying the Bernstein Inequality

Lemma 2.1. Let $A \in \mathbb{C}^{m \times n}$ be a random sampling matrix with respect to a BOS, and let $\tilde{A} = \frac{1}{\sqrt{m}}A$. Let $v \in \mathbb{C}^n$ with $\operatorname{supp}(v) = S$ and $|S| = s$. Then for $t > 0$:
$$\mathbb{P}\left[\left\|\tilde{A}_{S^c}^*\tilde{A}v\right\|_\infty > t\|v\|_2\right] \le 4n\exp\left(\frac{-mt^2/4}{K^2\left(1 + \sqrt{s/18}\,t\right)}\right)$$

Proof. We apply the Bernstein Inequality, so we need to find a suitable random variable and show it has zero mean, is a.s. bounded, and has bounded variance.

(a) Note that $\left\|\tilde{A}_{S^c}^*\tilde{A}v\right\|_\infty = \max_{k \in S^c}\left|\langle e_k, \tilde{A}^*\tilde{A}v\rangle\right|$. Let $X_l = (\phi_j(t_l))_{j=1}^n$; the $X_l$ are the columns of $A^*$.
Without loss of generality let $\|v\|_2 = 1$, and for fixed $k \in S^c$ let $Y_l = \langle e_k, X_l X_l^* v\rangle$.

(b) We compute the expectation of $Y_l$, noting that
$$\mathbb{E}\left[(X_l X_l^*)_{ij}\right] = \mathbb{E}\left[\phi_i(t_l)\overline{\phi_j(t_l)}\right] = \delta_{ij}$$
This implies, since $k \in S^c$ and $\operatorname{supp}(v) = S$:
$$\mathbb{E}[Y_l] = \langle e_k, \mathbb{E}[X_l X_l^*]v\rangle = \langle e_k, Iv\rangle = v_k = 0$$

(c) We now bound $Y_l$:
$$|Y_l| = |\langle e_k, X_l\rangle|\,|\langle X_l|_S, v\rangle| = |\phi_k(t_l)|\,|\langle X_l|_S, v\rangle| \le K\,\|X_l|_S\|_2\|v\|_2 = K\sqrt{\sum_{j \in S}|\phi_j(t_l)|^2} \le K\sqrt{K^2 s} = K^2\sqrt{s}$$

(d) We now bound the variance:
$$\mathbb{E}\left[|Y_l|^2\right] = \mathbb{E}\left[|\phi_k(t_l)|^2\,v^* X_l X_l^* v\right] \le K^2\,v^*\mathbb{E}[X_l X_l^*]v = K^2\,v^* v = K^2$$

(e) Before applying the Bernstein Inequality, note that for any $z \in \mathbb{C}$, $|z|^2 = \operatorname{Re}(z)^2 + \operatorname{Im}(z)^2 \le 2\max\{\operatorname{Re}(z)^2, \operatorname{Im}(z)^2\}$, so $|z| > t$ implies $|\operatorname{Re}(z)| > t/\sqrt{2}$ or $|\operatorname{Im}(z)| > t/\sqrt{2}$.

(f) In light of the previous fact, applying Bernstein's Inequality to the real and imaginary parts:
$$\mathbb{P}\left[\left|\langle e_k, \tilde{A}^*\tilde{A}v\rangle\right| > t\right] \le \mathbb{P}\left[\left|\frac{1}{m}\sum_{l=1}^m\operatorname{Re}(Y_l)\right| > \frac{t}{\sqrt{2}}\right] + \mathbb{P}\left[\left|\frac{1}{m}\sum_{l=1}^m\operatorname{Im}(Y_l)\right| > \frac{t}{\sqrt{2}}\right]$$
$$\le 4\exp\left(\frac{-m^2t^2/4}{mK^2 + \sqrt{s}K^2 mt/\sqrt{18}}\right) = 4\exp\left(\frac{-mt^2/4}{K^2\left(1 + \sqrt{s/18}\,t\right)}\right)$$

(g) Noting that $\mathbb{P}[\max_{k \in S^c}Z_k > t] \le \sum_{k=1}^n\mathbb{P}[Z_k > t]$ for random variables $Z_k$, the result follows.

3. Applying the Matrix Bernstein Inequality

Lemma 2.2. Let $A \in \mathbb{C}^{m \times n}$ be a random sampling matrix corresponding to a BOS with $K \ge 1$. Let $S \subseteq \{1, \ldots, n\}$ with $|S| = s$. Then for $\delta \in (0,1)$, $\tilde{A} = \frac{1}{\sqrt{m}}A$, and $I = P_S$:
$$\mathbb{P}\left[\left\|\tilde{A}_S^*\tilde{A}_S - I\right\|_{2 \to 2} > \delta\right] \le 2s\exp\left(\frac{-3m\delta^2}{8K^2 s}\right)$$

Proof. We want to use the Matrix Bernstein Inequality, for which we need $Y_l$ independent, self-adjoint, zero-mean, bounded, and with bounded summed covariance in the spectral norm.

(a) Let $X_l$ be the columns of $A_S^*$, which are independent since the $t_l$ are independent. Let $Y_l = X_l X_l^* - I = Y_l^*$.
So the $Y_l$ are independent, self-adjoint, and:
$$\tilde{A}_S^*\tilde{A}_S - I = \frac{1}{m}\sum_{l=1}^m Y_l$$

(b) $\mathbb{E}[Y_l] = \mathbb{E}[X_l X_l^* - I] = 0$.

(c) Now we bound $Y_l$ (using $K \ge 1$ in the last step):
$$\|Y_l\|_{2 \to 2} = \|X_l X_l^* - I\|_{2 \to 2} = \max_{\|v\|_2 = 1}\left||\langle X_l, v\rangle|^2 - \|v\|_2^2\right| \le \max\left\{\|X_l\|_2^2 - 1,\ 1\right\} \le \|X_l\|_2^2 = \sum_{k \in S}|\phi_k(t_l)|^2 \le K^2 s$$

(d) The covariance of $Y_l$ (in the usual sense) satisfies:
$$\mathbb{V}[Y_l] = \mathbb{E}[Y_l Y_l^*] = \mathbb{E}\left[(X_l X_l^* - I)^2\right] = \mathbb{E}\left[X_l(X_l^* X_l)X_l^* - 2X_l X_l^* + I\right] = \mathbb{E}\left[\|X_l\|_2^2 X_l X_l^*\right] - I \preceq K^2 sI$$
Then:
$$\left\|\sum_{l=1}^m\mathbb{V}[Y_l]\right\|_{2 \to 2} \le \left\|mK^2 sI\right\|_{2 \to 2} = mK^2 s$$

(e) Applying the Matrix Bernstein Inequality, and using $\delta < 1$ in the last step:
$$\mathbb{P}\left[\left\|\tilde{A}_S^*\tilde{A}_S - I\right\|_{2 \to 2} > \delta\right] = \mathbb{P}\left[\left\|\sum_{l=1}^m Y_l\right\|_{2 \to 2} > m\delta\right] \le 2s\exp\left(\frac{-\delta^2 m/2}{K^2 s(1 + \delta/3)}\right) = 2s\exp\left(\frac{-3\delta^2 m}{K^2 s(6 + 2\delta)}\right) \le 2s\exp\left(\frac{-3\delta^2 m}{8K^2 s}\right)$$

4. Application of the Vector Bernstein Inequality

Lemma 2.3. Let $S \subseteq \{1, \ldots, n\}$ with $|S| = s$, and let $v \in \mathbb{C}^s$ with $\|v\|_2 = 1$. Then for $t > 0$:
$$\mathbb{P}\left[\left\|\left(\tilde{A}_S^*\tilde{A}_S - I\right)v\right\|_2 \ge \sqrt{\frac{K^2 s}{m}} + t\right] \le \exp\left(\frac{-mt^2}{2K^2 s\left(1 + 2\sqrt{K^2 s/m} + t/3\right)}\right)$$

Proof. Here we want to apply the Vector Bernstein Inequality, which requires a random vector with zero mean, bounded norm, and bounded summed second moment $\sigma^2$.

(a) Let $X_l$ be the columns of $A_S^*$ and let $Y_l = (X_l X_l^* - I)v$. Then:
$$\left(\tilde{A}_S^*\tilde{A}_S - I\right)v = \frac{1}{m}\sum_{l=1}^m Y_l$$

(b) Zero mean: $\mathbb{E}[Y_l] = \mathbb{E}[X_l X_l^* - I]v = (I - I)v = 0$.

(c) Bounded norm: $\|Y_l\|_2 = \|(X_l X_l^* - I)v\|_2 \le K^2 s$.

(d) Bounded $\sigma^2$:
$$\sum_{l=1}^m\mathbb{E}\left[\|Y_l\|_2^2\right] = m\,v^*\mathbb{E}\left[\|X_l\|_2^2 X_l X_l^* - 2X_l X_l^* + I\right]v \le m\,v^*(K^2 s - 1)v \le mK^2 s$$

(e) Therefore, by the Vector Bernstein Inequality:
$$\mathbb{P}\left[\left\|\left(\tilde{A}_S^*\tilde{A}_S - I\right)v\right\|_2 \ge \sqrt{\frac{K^2 s}{m}} + t\right] = \mathbb{P}\left[\left\|\sum Y_l\right\|_2 \ge \sqrt{mK^2 s} + mt\right]$$
$$\le \exp\left(\frac{-m^2t^2/2}{mK^2 s + \left(6\sqrt{mK^2 s} + mt\right)K^2 s/3}\right) = \exp\left(\frac{-mt^2}{2K^2 s\left(1 + 2\sqrt{K^2 s/m} + t/3\right)}\right)$$

5. Application of the Rectangular Matrix Bernstein Inequality

Lemma 2.4. For $0 < t \le 2\sqrt{s}$ and $\tilde{a}_j$ the $j$-th column of $\tilde{A} = \frac{1}{\sqrt{m}}A$:
$$\mathbb{P}\left[\max_{j \in S^c}\left\|\tilde{A}_S^*\tilde{a}_j\right\|_2 \ge t\right] \le 2(s + 1)n\exp\left(\frac{-3mt^2}{10K^2 s}\right)$$

Proof. We want to apply the Rectangular Matrix Bernstein Inequality for $s \times 1$ matrices.
We need to find $Y_l$ with zero mean, bounded norm, and bounded $\sigma^2$.

(a) Let $X_l$ be the columns of $A_S^*$ and, for fixed $j \in S^c$, let $Y_l = X_l\overline{\phi_j(t_l)}$.

(b) Zero mean, since $k \in S$ and $j \in S^c$:
$$\mathbb{E}[Y_l]_k = \mathbb{E}\left[\phi_k(t_l)\overline{\phi_j(t_l)}\right] = 0$$

(c) Bounded norm:
$$\|Y_l\|_2 = \left\|X_l\overline{\phi_j(t_l)}\right\|_2 \le |\phi_j(t_l)|\,\|X_l\|_2 \le K\cdot K\sqrt{s}$$

(d) Bounded $\sigma^2 = \max\left\{\left\|\sum_{l=1}^m\mathbb{E}[Y_l Y_l^*]\right\|_{2 \to 2}, \left\|\sum_{l=1}^m\mathbb{E}[Y_l^* Y_l]\right\|\right\}$. For the first term:
$$\mathbb{E}[Y_l Y_l^*] = \mathbb{E}\left[|\phi_j(t_l)|^2 X_l X_l^*\right] \preceq K^2 I$$
For the second term, since the system is orthonormal:
$$\mathbb{E}[Y_l^* Y_l] = \mathbb{E}\left[|\phi_j(t_l)|^2\|X_l\|_2^2\right] \le K^2\sum_{k \in S}\mathbb{E}\left[\phi_k(t_l)\overline{\phi_k(t_l)}\right] = K^2\sum_{k \in S}1 = K^2 s$$
Therefore $\sigma^2 \le mK^2 s$.

(e) We apply the Rectangular Matrix Bernstein Inequality for $t \le 2\sqrt{s}$:
$$\mathbb{P}\left[\left\|\tilde{A}_S^*\tilde{a}_j\right\|_2 \ge t\right] = \mathbb{P}\left[\left\|\frac{1}{m}\sum_l Y_l\right\|_2 \ge t\right] \le 2(s + 1)\exp\left(\frac{-m^2t^2/2}{\sigma^2 + K^2\sqrt{s}\,mt/3}\right) \le 2(s + 1)\exp\left(\frac{-3mt^2}{6K^2 s + 4K^2 s}\right)$$
Then the result follows from:
$$\mathbb{P}\left[\max_{j \in S^c}\left\|\tilde{A}_S^*\tilde{a}_j\right\|_2 \ge t\right] \le \sum_{j=1}^n\mathbb{P}\left[\left\|\tilde{A}_S^*\tilde{a}_j\right\|_2 \ge t\right]$$

6. Probability of satisfying the inexact dual certificate for a BOS:

Theorem 2.6. Let $x \in \mathbb{C}^N$ be an $s$-sparse vector, and let $A \in \mathbb{C}^{m \times N}$ be a random matrix corresponding to a BOS with $K \ge 1$. If
$$m \ge CK^2 s\left[2\log(4N)\log(12\epsilon^{-1}) + \log(s)\log\left(12\epsilon^{-1}\log(s)\right)\right]$$
for some constant $C > 0$, then $x$ is the unique minimiser of $\min\{\|z\|_1 : Az = Ax\}$ with probability at least $1 - \epsilon$.

Proof. We want to satisfy the conditions of the inexact dual certificate (Theorem 2.5), which allow us to recover a sparse solution. These conditions are, for some $\alpha, \beta, \gamma, \theta$:

(1) For $a_l$ the $l$-th column of $A$:
$$\left\|(A_S^* A_S)^{-1}\right\|_{2 \to 2} \le \alpha, \qquad \max_{l \in S^c}\|A_S^* a_l\|_2 \le \beta$$

(2) There exists $u \in \mathbb{C}^N$ with $u = A^* h$ for some $h \in \mathbb{C}^m$ such that:
$$\|u_S - \operatorname{sgn}(x_S)\|_2 \le \gamma, \qquad \|u_{S^c}\|_\infty \le \theta$$

The proof depends on the golfing framework, which we set up first. We then show that property (2) holds with some probability, from which we can show that property (1) holds as well.

(a) Golfing scheme construction. Let $m = \sum_{k=1}^L m_k$, and let $A^{(1)} \in \mathbb{C}^{m_1 \times N}, \ldots, A^{(L)} \in \mathbb{C}^{m_L \times N}$ be the blocks of $A$, so that, with $\tilde{A} = \frac{1}{\sqrt{m}}A$:
$$A = \begin{pmatrix} A^{(1)} \\ \vdots \\ A^{(L)} \end{pmatrix} \in \mathbb{C}^{m \times N}$$
We select $u$ in a recursive way, hoping it will satisfy (2).
Letting $u^{(0)} = 0 \in \mathbb{C}^N$, define the sequence:
$$u^{(n)} = \frac{1}{m_n}\left(A^{(n)}\right)^* A_S^{(n)}\left(\operatorname{sgn}(x_S) - u_S^{(n-1)}\right) + u^{(n-1)} \in \mathbb{C}^N$$

i. By construction, $u^{(L)} = A^* h$ for some $h \in \mathbb{C}^m$.

ii. Defining $w^{(n)} = \operatorname{sgn}(x_S) - u_S^{(n)}$:
$$w^{(n)} = \left(I - \frac{1}{m_n}\left(A_S^{(n)}\right)^* A_S^{(n)}\right)w^{(n-1)}, \qquad w^{(n)} = \prod_{j=1}^n\left(I - \frac{1}{m_j}\left(A_S^{(j)}\right)^* A_S^{(j)}\right)\operatorname{sgn}(x_S)$$
$$u^{(n)} = \sum_{i=1}^n\frac{1}{m_i}\left(A^{(i)}\right)^* A_S^{(i)} w^{(i-1)}$$

(b) Demonstrating property (2).

i. We want to apply Lemma 2.3 for some choice of $r_n > 0$:
$$\left\|u_S^{(n)} - \operatorname{sgn}(x_S)\right\|_2 = \left\|w^{(n)}\right\|_2 \le \left\|\frac{1}{m_n}\left(A_S^{(n)}\right)^* A_S^{(n)} - I\right\|_{2 \to 2}\left\|w^{(n-1)}\right\|_2 \le \left[\sqrt{K^2 s/m_n} + r_n\right]\left\|w^{(n-1)}\right\|_2$$
$$\left\|u_S^{(L)} - \operatorname{sgn}(x_S)\right\|_2 \le \|\operatorname{sgn}(x_S)\|_2\prod_{n=1}^L\left[\sqrt{K^2 s/m_n} + r_n\right] \le \sqrt{s}\prod_{n=1}^L\left[\sqrt{K^2 s/m_n} + r_n\right]$$
By Lemma 2.3, each inequality fails with probability at most:
$$p_1(n) \le \exp\left(\frac{-m_n r_n^2}{2K^2 s\left(1 + 2\sqrt{K^2 s/m_n} + r_n/3\right)}\right)$$
so the overall probability of failure is at most $\sum_{n=1}^L p_1(n)$.

ii. To get the second part of property (2), we use Lemma 2.1. For some choice of $t_n$:
$$\left\|u_{S^c}^{(L)}\right\|_\infty \le \sum_{n=1}^L\frac{1}{m_n}\left\|\left(A_{S^c}^{(n)}\right)^* A_S^{(n)} w^{(n-1)}\right\|_\infty \le \sum_{n=1}^L t_n\left\|w^{(n-1)}\right\|_2 \le \sum_{n=1}^L t_n\sqrt{s}\prod_{n'=1}^{n-1}\left[\sqrt{K^2 s/m_{n'}} + r_{n'}\right]$$
By Lemma 2.1, we fail to bound each term with probability at most:
$$p_2(n) \le 4N\exp\left(\frac{-m_n t_n^2}{4K^2\left(1 + \sqrt{s/18}\,t_n\right)}\right)$$
We fail to bound the entire sum with probability at most $\sum_{n=1}^L p_2(n)$.

iii. We now choose $m_n, r_n, t_n, L, C$:

A. $m_1 = m_2 \ge CK^2 s\log(4N)\log(2\epsilon^{-1})$ and $m_n \ge CK^2 s\log(2L\epsilon^{-1})$ for $n \ge 3$.

B. $r_1 = r_2 = \frac{1}{2e\sqrt{\log 4N}}$ and $r_n = \frac{1}{2e}$ for $n \ge 3$.

C. $t_1 = t_2 = \frac{1}{e\sqrt{s}}$ and $t_n = \frac{\sqrt{\log 4N}}{e\sqrt{s}}$ for $n \ge 3$.

D. $L = \lceil\log(s)/2\rceil + 2$.

E. $C = 8e^2\left[1 + 2e^{-1}\left(\frac{1}{\sqrt{18}} + \frac{1}{6}\right)\right]$.

iv. These choices imply the following results:

A. The choice of $m_n$ and $r_n$ gives:
$$\sqrt{K^2 s/m_1} + r_1 \le \frac{1}{e\sqrt{\log 4N}}, \qquad \sqrt{K^2 s/m_n} + r_n \le \frac{1}{e}$$

B. $\|u_S - \operatorname{sgn}(x_S)\|_2 \le e^{-2}$, with failure probability $\sum p_1(n) \le \epsilon/2$.

C. $\|u_{S^c}\|_\infty \le (e - 1)^{-1}$, with failure probability $\sum p_2(n) \le \epsilon/2$.

D. So property (2) holds with probability at least $1 - \epsilon$.

(c) Demonstrating property (1):

i. By Lemma 2.2 with $\delta = 1/2$, $\left\|\tilde{A}_S^*\tilde{A}_S - I\right\|_{2 \to 2} \le \frac{1}{2}$ with failure probability at most $2s\exp\left(\frac{-3m}{32K^2 s}\right)$. Therefore $\left\|(A_S^* A_S)^{-1}\right\|_{2 \to 2} < \frac{2}{m}$ holds with the same probability, which we want to be bounded by $\epsilon$. This requires that:
This requires that: m > 32/3K 2 s log(2s−1 ) which can be satisfied by the value of m selected to demonstrate 2. ii. Note that α = 2/m, γ = e−2 , θ = (e−1)−1 . To satisfy θ+αβγ < 1, we require β/m < 1.554 so we choose β/m < 3/2. By Lemma 2.4 maxc Ã∗S ãl < β/m l∈S 2 with a probability of failure of at most −27m 2 2N exp 40K 2 s We can make this probability less than with the selection of m above. 23 (d) Overall, α, β, γ, θ behave with a probability of 1 − 4 + 1 − + 1 − for the appropriate selection of m. Replacing 6 by gives us the desired result. Part II Generalised Sampling 24