Notes from Compressed Sensing and Contemporary Sampling
Cambridge Part III Mathematical Tripos 2012-2013
Lecturer: Anders Hansen
Vivak Patel
May 1, 2013
Contents

Part I: Compressed Sensing
1 Sparse Linear Algebra
  1.1 Basic Definitions and Notation
  1.2 Spark
  1.3 Null Space Property
  1.4 Restricted Isometry Property
    1.4.1 RIP and Stability
    1.4.2 RIP and Measurement Bounds
    1.4.3 RIP and NSP
  1.5 Noise-Free Signal Recovery
  1.6 Noisy Signal Recovery
  1.7 Coherence
2 Signal Recovery in Random Sampling
  2.1 Bernstein Inequalities
  2.2 RIPless Theory of Compressed Sensing
  2.3 Random Sampling in Bounded Orthonormal Systems
  2.4 Sampling for Recovery
Part II: Generalised Sampling
Part I: Compressed Sensing

1 Sparse Linear Algebra

1.1 Basic Definitions and Notation
1. $\Sigma_k = \{x \in \mathbb{R}^n : \|x\|_0 \le k\}$ is the set of vectors in $\mathbb{R}^n$ with $k$ or fewer non-zero components.

2. $k$-term Approximation

(a) Motivation: sparsity is key to determining $x$ uniquely from an underdetermined system. However, $x$ may not be exactly sparse, but we can approximate it as being sparse.

(b) The $k$-term approximation error of a vector $x$ is
\[ \sigma_k(x)_p = \inf_{\tilde{x} \in \Sigma_k} \|x - \tilde{x}\|_p \]
(A small numerical sketch of this quantity appears after this list.)

3. The null space of $A \in \mathbb{R}^{m \times n}$ is $\mathcal{N}(A) = \{x \in \mathbb{R}^n : Ax = 0\}$.
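A minimal numerical sketch of the $k$-term approximation error, assuming the standard fact (not stated above) that for any $\ell_p$ norm with $p \ge 1$ the infimum is attained by keeping the $k$ largest-magnitude entries of $x$; the function name sigma_k is our own.

```python
import numpy as np

def sigma_k(x, k, p=1):
    """k-term approximation error: l_p distance from x to the nearest k-sparse vector.

    The minimiser keeps the k largest-magnitude entries of x and zeroes the rest,
    so the error is the l_p norm of the remaining (smallest) entries.
    """
    if k == 0:
        return np.linalg.norm(x, ord=p)
    idx = np.argsort(np.abs(x))          # indices sorted by increasing magnitude
    return np.linalg.norm(x[idx[:-k]], ord=p)

x = np.array([5.0, -3.0, 0.1, 0.05, -0.02])
print(sigma_k(x, 2, p=1))                # 0.17: x is well approximated by a 2-sparse vector
```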
1.2 Spark
1. Definitions and Properties of Spark

(a) Definition

Definition 1.1. Let $A \in \mathbb{R}^{m\times n}$. Then $\operatorname{spark}(A) = \inf_{z \ne 0}\{\|z\|_0 : Az = 0\}$. Equivalently, $\operatorname{spark}(A)$ is the smallest number of linearly dependent columns of $A$. (A brute-force computation illustrating this definition appears at the end of this subsection.)

(b) If we assume that $m < n$, as for an underdetermined system, then $\operatorname{spark}(A) \in [2, m+1]$.
2. Recovering at most a unique $x$ given a signal $y$

Theorem 1.1. Let $y \in \mathbb{R}^m$. There exists at most one $x \in \Sigma_k$ such that $y = Ax$ if and only if $\operatorname{spark}(A) > 2k$.

Proof. The proof relies on the following idea: suppose $x, x' \in \Sigma_k$ and $Ax' = y = Ax$; then $x - x' \in \mathcal{N}(A)$ and $\|x - x'\|_0 \le 2k$. If we require $\operatorname{spark}(A) > 2k$, then no $2k$-sparse (or sparser) non-zero vector can lie in the null space, hence $x' = x$.

(a) Let $y \in \mathbb{R}^m$. Suppose $x \in \Sigma_k$ uniquely satisfies $y = Ax$, and suppose $\operatorname{spark}(A) \le 2k$. Then $\Sigma_{2k} \cap \mathcal{N}(A)$ contains a non-zero vector $h$. Then there exists $\tilde{x} \in \Sigma_k$ such that $h = x - \tilde{x}$ (this is like completing the basis), and $Ah = 0 \implies Ax = A\tilde{x}$, a contradiction.

(b) Suppose $\operatorname{spark}(A) > 2k$ and suppose there exist distinct $x, x' \in \Sigma_k$ such that $y = Ax = Ax'$. Then $x - x' \in \mathcal{N}(A) \cap \Sigma_{2k}$. However, for every non-zero $h \in \mathcal{N}(A)$, $\|h\|_0 \ge \operatorname{spark}(A) > 2k \ge \|x - x'\|_0$, a contradiction.
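A minimal brute-force sketch of the spark, only meant to illustrate Definition 1.1: it checks column subsets of increasing size for linear dependence via the rank, which is exponential in $n$; the helper name spark is our own.

```python
import numpy as np
from itertools import combinations

def spark(A):
    """Smallest number of linearly dependent columns of A (brute force)."""
    m, n = A.shape
    for size in range(1, n + 1):
        for cols in combinations(range(n), size):
            # A set of columns is linearly dependent iff its rank is below its cardinality.
            if np.linalg.matrix_rank(A[:, cols]) < size:
                return size
    return float("inf")  # all columns independent (only possible when n <= m)

A = np.array([[1.0, 0.0, 1.0, 1.0],
              [0.0, 1.0, 1.0, -1.0]])
print(spark(A))  # 3 = m + 1: no column is zero and no two columns are parallel
```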
1.3 Null Space Property
1. Definition of NSP

Definition 1.2. A matrix $A$ satisfies the NSP of order $k$ if there exists a constant $c > 0$ such that for every index set $\Lambda$ with $|\Lambda| \le k$ and every $h \in \mathcal{N}(A)$,
\[ \|h_\Lambda\|_2 \le \frac{c\,\|h_{\Lambda^c}\|_1}{\sqrt{k}} \]

2. Motivation: when a signal is exactly sparse, the previous theorem is sufficient; however, when it is only approximately sparse, we may not have unique solutions. The null space may contain vectors that are very sparse or very compressible (well approximated by sparse vectors). To make sure that $\mathcal{N}(A)$ is well behaved, the null space property guarantees that for $h \in \mathcal{N}(A)$, if $\|h_{\Lambda^c}\|_1 = 0$ then $\|h_\Lambda\|_2 = 0$, so that $h = 0$.
3. Instance-Optimality: given a sensing matrix $A$ and recovery algorithm $\Delta$, the following bound guarantees the performance of $\Delta$ in terms of $\sigma_k(x)_1$:
\[ \|\Delta(Ax) - x\|_2 \le \frac{c}{\sqrt{k}}\,\sigma_k(x)_1 \]
4. Instance-optimality implies NSP

Theorem 1.2. Let $A : \mathbb{R}^n \to \mathbb{R}^m$ be a sensing matrix and $\Delta : \mathbb{R}^m \to \mathbb{R}^n$ a recovery algorithm. If $(A, \Delta)$ is instance-optimal, then $A$ satisfies the NSP of order $2k$.

Proof. Let $h \in \mathcal{N}(A)$ and decompose it as $x - x'$ with $x' \in \Sigma_k$. The result follows:

(a) Let $h \in \mathcal{N}(A)$ and let $\Lambda$ be the index set of the $2k$ largest components of $h$. Write $\Lambda = \Lambda_0 \cup \Lambda_1$ where $|\Lambda_0| = |\Lambda_1| = k$. Set $x = h_{\Lambda_1} + h_{\Lambda^c}$ and $x' = -h_{\Lambda_0}$. Then
\[ x - x' = h_{\Lambda_1} + h_{\Lambda^c} + h_{\Lambda_0} = h \]

(b) Since $\|x'\|_0 \le k$, we have $\sigma_k(x')_1 = 0$. By instance-optimality,
\[ \|\Delta(Ax') - x'\|_2 \le 0 \implies \Delta(Ax') = x' \]
Moreover,
\[ 0 = Ah = A(x - x') \implies Ax = Ax' \]

(c) Therefore:
\[ \|h_\Lambda\|_2 \le \|h\|_2 = \|x - x'\|_2 = \|x - \Delta(Ax')\|_2 = \|x - \Delta(Ax)\|_2 \le \frac{c}{\sqrt{k}}\,\sigma_k(x)_1 \]

(d) Note that $\sigma_k(x)_1 = \inf_{\tilde{x}\in\Sigma_k}\|x - \tilde{x}\|_1$. By the definition of $x$, taking $\tilde{x} = h_{\Lambda_1}$ gives $\sigma_k(x)_1 \le \|h_{\Lambda^c}\|_1$. Therefore:
\[ \|h_\Lambda\|_2 \le \frac{c}{\sqrt{k}}\,\sigma_k(x)_1 \le \frac{c'}{\sqrt{2k}}\,\|h_{\Lambda^c}\|_1, \qquad c' = \sqrt{2}\,c \]
which is the NSP of order $2k$.
5. Note that if an algorithm satisfies the instance-optimality condition, then the sensing matrix satisfies the NSP. So for an approximately sparse signal (i.e. a compressible signal), we have a unique $x$ given by $Ax = y$.
1.4 Restricted Isometry Property

1. Motivation: when a signal has noise or is corrupted, the NSP is not a strong enough condition to guarantee a unique $x$. The Restricted Isometry Property, on the other hand, is.

2. Definition of RIP

Definition 1.3. A matrix $A$ satisfies the RIP of order $k$ if there exists $\delta_k \in (0,1)$ such that for all $x \in \Sigma_k$:
\[ (1 - \delta_k)\,\|x\|_2^2 \le \|Ax\|_2^2 \le (1 + \delta_k)\,\|x\|_2^2 \]
(A small Monte Carlo illustration for a random Gaussian matrix follows this list.)
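A minimal sketch, assuming the well-known fact (not proved at this point in the notes) that i.i.d. Gaussian matrices with entries of variance $1/m$ satisfy the RIP with high probability when $m$ is of order $k\log(n/k)$. The code only estimates the empirical distortion over random $k$-sparse unit vectors, which lower-bounds the true $\delta_k$; exhaustive verification would be combinatorial. All parameter values are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 256, 80, 5

# Gaussian sensing matrix, normalised so that E[||Ax||_2^2] = ||x||_2^2.
A = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))

def random_k_sparse_unit(n, k, rng):
    """A random unit vector supported on k random coordinates."""
    x = np.zeros(n)
    support = rng.choice(n, size=k, replace=False)
    x[support] = rng.normal(size=k)
    return x / np.linalg.norm(x)

# Empirical estimate (a lower bound) of the RIP constant delta_k.
delta_hat = max(abs(np.linalg.norm(A @ random_k_sparse_unit(n, k, rng)) ** 2 - 1.0)
                for _ in range(2000))
print(f"empirical distortion over random 5-sparse unit vectors: {delta_hat:.3f}")
```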
1.4.1 RIP and Stability

1. Definition of Stability

Definition 1.4. Let $A \in \mathbb{R}^{m\times n}$ and $\Delta : \mathbb{R}^m \to \mathbb{R}^n$. The pair $(A, \Delta)$ is $c$-stable if for any $x \in \Sigma_k$ and error $e \in \mathbb{R}^m$,
\[ \|\Delta(Ax + e) - x\|_2 \le c\,\|e\|_2 \]

2. $c$-stability implies part of the RIP

Theorem 1.3. Suppose $(A, \Delta)$ is $c$-stable. Then $\frac{1}{c}\|x\|_2 \le \|Ax\|_2$ for all $x \in \Sigma_{2k}$.

Proof. We note that any vector in $\Sigma_{2k}$ can be written as the difference of two vectors in $\Sigma_k$. Let $x, z \in \Sigma_k$. We use these to cleverly define errors:

(a) Let $x, z \in \Sigma_k$. Set $e_x = \frac{A(z - x)}{2}$ and $e_z = \frac{A(x - z)}{2}$. Then $Ax + e_x = Az + e_z = \frac{A(x + z)}{2}$. Let the recovered value be $\Delta(Ax + e_x) = \Delta(Az + e_z) = y \in \mathbb{R}^n$.

(b) Then
\[ \|x - z\|_2 \le \|x - y\|_2 + \|y - z\|_2 \le c\,\|e_x\|_2 + c\,\|e_z\|_2 = c\,\|A(x - z)\|_2 \]
Since every vector of $\Sigma_{2k}$ is of the form $x - z$ with $x, z \in \Sigma_k$, the result follows.
1.4.2 RIP and Measurement Bounds
1. Motivation: we can recover exactly $k$-sparse vectors provided $\operatorname{spark}(A) > 2k$, where $\operatorname{spark}(A) \le m + 1$. Given noise in the signal, we will typically need more measurements.
2. RIP implies a lower bound on the number of measurements $m$:

Theorem 1.4. Let $A \in \mathbb{R}^{m\times n}$ satisfy the RIP of order $2k$ with $\delta_{2k} \in (0, 1/2]$. Then
\[ m \ge c\,k\log\left(\frac{n}{k}\right), \qquad c = \frac{1}{2\log(\sqrt{24}+1)} \approx 0.28 \]
(A quick numerical illustration of this bound follows the proof.)
3. Important lemma used to prove Theorem 1.4

Lemma 1.1. Suppose $k$ and $n$ satisfy $k < n/2$. Then there exists $X \subset \Sigma_k$ such that for all $x, z \in X$ with $x \ne z$:
\[ \|x\|_2 \le \sqrt{k}, \qquad \|x - z\|_2 \ge \sqrt{\frac{k}{2}}, \qquad \log|X| \ge \frac{k}{2}\log\frac{n}{k} \]

Proof. We first create a set satisfying the first requirement. Then we pick a subset satisfying the second requirement. However, this limits the size of the subset, resulting in the third requirement.

(a) Let $U = \{x \in \{0, -1, 1\}^n \subset \mathbb{R}^n : \|x\|_0 = k\}$. Then $|U| = \binom{n}{k}2^k$ and $\|x\|_2 = \sqrt{k}$ for every $x \in U$. This satisfies the first requirement.

(b) Fix $x \in U$. Since $\|x - z\|_0 \le \|x - z\|_2^2$ for $x, z \in U$, whenever $\|x - z\|_2^2 \le k/2$ we also have $\|x - z\|_0 \le k/2$, so:
\[ |\{z \in U : \|x - z\|_2^2 \le k/2\}| \le |\{z \in U : \|x - z\|_0 \le k/2\}| \le \binom{n}{k/2}3^{k/2} \]

(c) Remove this set from $U$; at least $\binom{n}{k}2^k - \binom{n}{k/2}3^{k/2}$ points remain. Suppose we have chosen $j$ points $x_1, \dots, x_j$ with $\|x_r - x_q\|_2^2 \ge k/2$ for $r \ne q$, $1 \le r, q \le j$. Then at least $\binom{n}{k}2^k - j\binom{n}{k/2}3^{k/2}$ points remain, so we can keep choosing as long as this quantity is positive. Taking $X$ of the maximum possible size therefore gives
\[ |X| \ge \frac{\binom{n}{k}2^k}{\binom{n}{k/2}3^{k/2}} \]
and a short computation (using $\binom{n}{k}/\binom{n}{k/2} \ge (n/k - 1/2)^{k/2}$ and $\frac{4}{3}(n/k - 1/2) \ge n/k$ for $n \ge 2k$) yields the third requirement, $\log|X| \ge \frac{k}{2}\log\frac{n}{k}$.
Proof of Theorem 1.4. We want to establish a (loose) lower bound on $m$ for sparse signals in $\Sigma_{2k}$. Again, any element of this space is the difference of two vectors from $\Sigma_k$. We use the special subset $X \subset \Sigma_k$ of the lemma, whose properties give the lower bound.

(a) Let $x, z \in X$ with $x \ne z$. By the RIP, the assumption on $\delta_{2k}$, and Lemma 1.1:
\[ \|A(x - z)\|_2 \ge \sqrt{1 - \delta_{2k}}\,\|x - z\|_2 \ge \sqrt{\frac{k}{4}} \]

(b) Let $x \in X$. By the RIP and the lemma:
\[ \|Ax\|_2 \le \sqrt{1 + \delta_{2k}}\,\|x\|_2 \le \sqrt{\frac{3k}{2}} \]

(c) Thus the points $Ax$, $x \in X$, are pairwise at distance at least $\sqrt{k/4}$ and all lie within distance $\sqrt{3k/2}$ of the origin. The balls of radius $\sqrt{k/16}$ centred at the points $Ax$ are therefore disjoint and contained in the ball about the origin of radius $\sqrt{3k/2} + \sqrt{k/16}$. Comparing volumes in $\mathbb{R}^m$, for any $z \in X$:
\[ \operatorname{vol} B_0\!\left(\sqrt{3k/2} + \sqrt{k/16}\right) \ge |X|\,\operatorname{vol} B_{Az}\!\left(\sqrt{k/16}\right) \]
\[ \left(\frac{\sqrt{3k/2} + \sqrt{k/16}}{\sqrt{k/16}}\right)^m \ge |X| \iff \left(\sqrt{24} + 1\right)^m \ge |X| \]
\[ m \ge \frac{\log|X|}{\log(\sqrt{24}+1)} \ge \frac{1}{2\log(\sqrt{24}+1)}\,k\log\frac{n}{k} \]
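To get a feel for the constant, a quick sketch of the bound for the illustrative values $n = 1000$, $k = 10$ (values chosen here, not taken from the notes):

```python
import numpy as np

n, k = 1000, 10
c = 1.0 / (2.0 * np.log(np.sqrt(24) + 1))     # ~0.28
print(c, c * k * np.log(n / k))               # ~0.28, ~12.9 measurements at minimum
```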
1.4.3 RIP and NSP
1. Satisfying the RIP implies satisfying the NSP

Theorem 1.5. Suppose $A$ satisfies the RIP of order $2k$ with $\delta_{2k} < \sqrt{2} - 1$. Then $A$ satisfies the NSP of order $2k$ with constant
\[ c = \frac{2\,\delta_{2k}}{1 - (1 + \sqrt{2})\,\delta_{2k}} \]

Proof. We use the lemmas below to prove this result.

(a) Let $h \in \mathcal{N}(A)$. Let $\Lambda$ be the index set of the $2k$ largest components of $h$, let $\Lambda_0$ be the index set of the $k$ largest components and $\Lambda_1$ that of the next $k$ largest, so that $\Lambda = \Lambda_0 \cup \Lambda_1$.

i. Noting that $Ah = 0$, the conditions of Lemma 1.6 are satisfied, so (with $\alpha$ as in that lemma)
\[ \|h_\Lambda\|_2 \le \alpha\,\frac{\|h_{\Lambda_0^c}\|_1}{\sqrt{k}} = \frac{\alpha}{\sqrt{k}}\left(\|h_{\Lambda^c}\|_1 + \|h_{\Lambda_1}\|_1\right) \]

ii. By Lemma 1.2,
\[ \|h_\Lambda\|_2 \le \frac{\alpha}{\sqrt{k}}\,\|h_{\Lambda^c}\|_1 + \alpha\,\|h_{\Lambda_1}\|_2 \le \frac{\alpha}{\sqrt{k}}\,\|h_{\Lambda^c}\|_1 + \alpha\,\|h_\Lambda\|_2 \]

iii. Rearranging and noting that $1 - \alpha > 0$:
\[ \|h_\Lambda\|_2 \le \frac{\sqrt{2}\,\alpha}{1 - \alpha}\,\frac{\|h_{\Lambda^c}\|_1}{\sqrt{2k}} \]
which, with $\alpha = \frac{\sqrt{2}\,\delta_{2k}}{1 - \delta_{2k}}$, is the NSP of order $2k$ with the constant stated above.
2. Important Lemmas

(a) Relation between norms of sparse vectors

Lemma 1.2. Suppose $u \in \Sigma_k$. Then $\|u\|_1 \le \sqrt{k}\,\|u\|_2 \le k\,\|u\|_\infty$.

Proof. By Cauchy-Schwarz, noting that $\|u\|_1 = \langle u, \operatorname{sgn}(u)\rangle$ and $u_i^2 \le \|u\|_\infty^2$:
\[ \|u\|_1 = \langle u, \operatorname{sgn}(u)\rangle \le \|\operatorname{sgn}(u)\|_2\,\|u\|_2 \le \sqrt{k}\,\|u\|_2 \]
\[ \|u\|_2^2 = \sum_{i=1}^n u_i^2 \le \sum_{i=1}^k \|u\|_\infty^2 = k\,\|u\|_\infty^2 \]
(b) Perpendicular vectors

Lemma 1.3. Suppose $u \perp v$. Then $\|u\|_2 + \|v\|_2 \le \sqrt{2}\,\|u + v\|_2$.

Proof. Apply the inequality of Lemma 1.2 (with $k = 2$) to the vector $w = \begin{pmatrix}\|u\|_2 \\ \|v\|_2\end{pmatrix}$: we have $\|w\|_1 \le \sqrt{2}\,\|w\|_2 = \sqrt{2}\,\|u + v\|_2$, the last equality holding because $u \perp v$.
(c) RIP and bounding the inner product

Lemma 1.4. Suppose $A$ satisfies the RIP of order $2k$. Then for any pair of vectors $u, v \in \Sigma_k$ with disjoint support, $|\langle Au, Av\rangle| \le \delta_{2k}\,\|u\|_2\,\|v\|_2$.

Proof. Let $u', v' \in \Sigma_k$ have disjoint support. Let $u = u'/\|u'\|_2$ and $v = v'/\|v'\|_2$ so that $\|u\|_2 = \|v\|_2 = 1$. The result follows from the parallelogram law and the RIP:

i. By the parallelogram law:
\[ |\langle Au, Av\rangle| = \frac{1}{4}\left|\|Au + Av\|_2^2 - \|Au - Av\|_2^2\right| \le \frac{1}{4}\left[(1 + \delta_{2k})\,\|u + v\|_2^2 - (1 - \delta_{2k})\,\|u - v\|_2^2\right] \]

ii. Since $u \perp v$ (disjoint supports), $\|u + v\|_2^2 = \|u - v\|_2^2 = \|u\|_2^2 + \|v\|_2^2$. Therefore:
\[ |\langle Au, Av\rangle| \le \frac{\delta_{2k}}{2}\left[\|u\|_2^2 + \|v\|_2^2\right] = \delta_{2k} \]
(d) $k$-sparse partitioning of vectors

Lemma 1.5. Let $\Lambda_0 \subset \{1, \dots, n\}$ with $|\Lambda_0| \le k$. For $h \in \mathbb{R}^n$, let $\Lambda_1$ be the index set of the $k$ largest components of $h_{\Lambda_0^c}$, $\Lambda_2$ the index set of the next $k$ largest, and so on. Then:
\[ \sum_{j \ge 2}\|h_{\Lambda_j}\|_2 \le \frac{\|h_{\Lambda_0^c}\|_1}{\sqrt{k}} \]

Proof. Notice that for $j \ge 2$ the largest component (in modulus) of $h_{\Lambda_j}$ is smaller than the smallest component of $h_{\Lambda_{j-1}}$, and hence than the average modulus of the components of $h_{\Lambda_{j-1}}$. Therefore:

i. $\|h_{\Lambda_j}\|_\infty \le \frac{1}{k}\,\|h_{\Lambda_{j-1}}\|_1$ for $j \ge 2$.

ii. So for $j \ge 2$, by Lemma 1.2:
\[ \sum_{j\ge2}\|h_{\Lambda_j}\|_2 \le \sum_{j\ge2}\sqrt{k}\,\|h_{\Lambda_j}\|_\infty \le \sum_{j\ge2}\frac{1}{\sqrt{k}}\,\|h_{\Lambda_{j-1}}\|_1 = \frac{1}{\sqrt{k}}\,\|h_{\Lambda_0^c}\|_1 \]
k
(e) RIP implies less strict NSP
Lemma 1.6. Let Λ0 ⊂ {1, . . . , n} such that |Λ0 | ≤ k. Then for fixed
h let Λ1 be the next k largest index set in hΛc0 . Let Λ = Λ0 ∪ Λ1 .
Suppose A satisfies the RIP of order 2k. then:
hΛc |hAhΛ , Ahi|
0 1
+β
khΛ k2 ≤ α √
khΛ k2
k
P
Proof. Note that hΛ ∈ Σ2k , and hΛ = h − j≥2 hΛj , by the RIP, and
9
Lemmas 1.4, 1.3, and 1.5:
2
2
(1 − δ2k ) khΛ k2 ≤ kAhΛ k2
= hAhΛ , Ahi −
X
hAhΛ , AhΛj i
j≥2
X
≤ |hAhΛ , Ahi| + hAhΛ , AhΛj i
j≥2
X
= |hAhΛ , Ahi| + hAhΛ1 , AhΛj i + hAhΛ0 , AhΛj i
j≥2
X
(khΛ1 k2 + khΛ0 k2 ) δ2k hΛj 2 ≤ |hAhΛ , Ahi| + j≥2
X
√
hΛj ≤ |hAhΛ , Ahi| + 2 khΛ k δ2k
2
2
j≥2
≤ |hAhΛ , Ahi| +
√
hΛc 0 1
2 khΛ k2 δ2k √
k
Rearranging, we have:
√
2δ2k hΛc0 1
1
|hAhΛ , Ahi|
√
khΛ k2 ≤
+
1 − δ2k
1 − δ2k
khΛ k2
k
1.5 Noise-Free Signal Recovery
1. Spark, NSP and RIP are properties of a matrix that determine the number of samples needed to reasonably recover a sparse signal, possibly with noise.

2. Motivation: to actually compute the signal we use $\ell_1$ minimisation, which can be computed efficiently. We need an upper bound on the error between the recovered signal $\hat{x}$ and the true signal $x$, so we study $\|x - \hat{x}\|_2$ in the noise-free setting. (A small numerical sketch of the minimisation appears at the end of this subsection.)
3. The $\ell_1$ minimisation algorithm for a sensing matrix satisfying the RIP is instance-optimal

Theorem 1.6. Suppose $A$ satisfies the RIP of order $2k$ with $\delta_{2k} < \sqrt{2} - 1$. Suppose we obtain measurements of the form $y = Ax$, and let $\hat{x} \in \arg\min\{\|z\|_1 : Az = y\}$. Then
\[ \|\hat{x} - x\|_2 \le \frac{C_0}{\sqrt{k}}\,\sigma_k(x)_1 \]

Proof. Since $x$ is feasible and $\hat{x}$ is a minimiser, $\|\hat{x}\|_1 \le \|x\|_1$. Applying Lemma 1.7 with $h = \hat{x} - x$:
\[ \|\hat{x} - x\|_2 \le \frac{C_0}{\sqrt{k}}\,\sigma_k(x)_1 + C_1\,\frac{|\langle Ah_\Lambda, Ah\rangle|}{\|h_\Lambda\|_2} \]
Noting that $Ax = A\hat{x} = y$, we have $Ah = 0$, so the second term vanishes and the result follows.
4. The following lemma is similar to Lemma 1.6 and is essential to proving the above result.

Lemma 1.7. Suppose $A$ satisfies the RIP of order $2k$. Let $\Lambda_0$ be the index set of the $k$ largest components of $x$, and $\Lambda_1$ the index set of the $k$ largest components of $h_{\Lambda_0^c}$, where $h = \hat{x} - x$ for $x, \hat{x} \in \mathbb{R}^n$. Let $\Lambda = \Lambda_0 \cup \Lambda_1$. If $\|\hat{x}\|_1 \le \|x\|_1$ then:
\[ \|h\|_2 \le \frac{C_0}{\sqrt{k}}\,\sigma_k(x)_1 + C_1\,\frac{|\langle Ah_\Lambda, Ah\rangle|}{\|h_\Lambda\|_2} \]

Proof. Using the triangle inequality, $\|h\|_2 \le \|h_{\Lambda^c}\|_2 + \|h_\Lambda\|_2$. We bound each term individually:

(a) By Lemma 1.5, we have
\[ \|h_{\Lambda^c}\|_2 \le \sum_{j\ge2}\|h_{\Lambda_j}\|_2 \le \frac{1}{\sqrt{k}}\,\|h_{\Lambda_0^c}\|_1 \]
We will use $\sigma_k(x)_1 = \|x - x_{\Lambda_0}\|_1 = \|x_{\Lambda_0^c}\|_1$ to upper bound $\|h_{\Lambda_0^c}\|_1$:
\[ \|x\|_1 \ge \|\hat{x}\|_1 = \|x + h\|_1 \ge \|x_{\Lambda_0}\|_1 - \|h_{\Lambda_0}\|_1 + \|h_{\Lambda_0^c}\|_1 - \|x_{\Lambda_0^c}\|_1 \]
Rearranging, and using Lemma 1.2:
\[ \|h_{\Lambda_0^c}\|_1 \le \|x\|_1 - \|x_{\Lambda_0}\|_1 + \|h_{\Lambda_0}\|_1 + \|x_{\Lambda_0^c}\|_1 \le \|x - x_{\Lambda_0}\|_1 + \|h_{\Lambda_0}\|_1 + \|x_{\Lambda_0^c}\|_1 \le 2\,\sigma_k(x)_1 + \|h_{\Lambda_0}\|_1 \le 2\,\sigma_k(x)_1 + \sqrt{k}\,\|h_{\Lambda_0}\|_2 \]
Therefore:
\[ \|h_{\Lambda^c}\|_2 \le \|h_\Lambda\|_2 + \frac{2}{\sqrt{k}}\,\sigma_k(x)_1 \]

(b) By Lemma 1.6 and the previous part of this proof:
\[ \|h_\Lambda\|_2 \le \frac{\alpha}{\sqrt{k}}\,\|h_{\Lambda_0^c}\|_1 + \beta\,\frac{|\langle Ah_\Lambda, Ah\rangle|}{\|h_\Lambda\|_2} \le \frac{2\alpha}{\sqrt{k}}\,\sigma_k(x)_1 + \alpha\,\|h_\Lambda\|_2 + \beta\,\frac{|\langle Ah_\Lambda, Ah\rangle|}{\|h_\Lambda\|_2} \]
so that
\[ \|h_\Lambda\|_2 \le \frac{2\alpha}{\sqrt{k}\,(1 - \alpha)}\,\sigma_k(x)_1 + \frac{\beta}{1 - \alpha}\,\frac{|\langle Ah_\Lambda, Ah\rangle|}{\|h_\Lambda\|_2} \]

(c) Combining the two results:
\[ \|h\|_2 \le 2\,\|h_\Lambda\|_2 + \frac{2}{\sqrt{k}}\,\sigma_k(x)_1 \le \left(\frac{4\alpha}{1-\alpha} + 2\right)\frac{\sigma_k(x)_1}{\sqrt{k}} + \frac{2\beta}{1-\alpha}\,\frac{|\langle Ah_\Lambda, Ah\rangle|}{\|h_\Lambda\|_2} = 2\,\frac{1+\alpha}{1-\alpha}\,\frac{\sigma_k(x)_1}{\sqrt{k}} + \frac{2\beta}{1-\alpha}\,\frac{|\langle Ah_\Lambda, Ah\rangle|}{\|h_\Lambda\|_2} \]
so the lemma holds with $C_0 = 2\,\frac{1+\alpha}{1-\alpha}$ and $C_1 = \frac{2\beta}{1-\alpha}$.
5. Remarks

(a) If $x$ is exactly $k$-sparse then $\sigma_k(x)_1 = 0$, so $\|\hat{x} - x\|_2 = 0$ and we recover $x$ exactly.

(b) Since we are not dealing with noise, we could simplify Lemma 1.7 and the theorem by assuming only an NSP of order $2k$.
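A minimal numerical sketch of the noise-free $\ell_1$ recovery of Theorem 1.6 (basis pursuit), assuming SciPy is available; it uses the standard reformulation $z = u - v$ with $u, v \ge 0$, which turns the problem into a linear program. This is only meant to illustrate the statement, not to be an efficient solver; all parameter values are our own choices.

```python
import numpy as np
from scipy.optimize import linprog

def l1_recover(A, y):
    """Solve min ||z||_1 subject to Az = y via the LP split z = u - v with u, v >= 0."""
    m, n = A.shape
    c = np.ones(2 * n)                 # objective: sum(u) + sum(v) = ||z||_1
    A_eq = np.hstack([A, -A])          # equality constraint A(u - v) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None))
    return res.x[:n] - res.x[n:]

rng = np.random.default_rng(1)
n, m, k = 128, 40, 4
A = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))   # Gaussian sensing matrix
x = np.zeros(n)
x[rng.choice(n, k, replace=False)] = rng.normal(size=k)
x_hat = l1_recover(A, A @ x)
print(np.linalg.norm(x_hat - x))       # tiny: an exactly k-sparse signal is recovered
```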
1.6 Noisy Signal Recovery
1. Motivation: more often than not our sample is $y = Ax + e$, where $e$ is some sort of corruption. In this case we estimate $x$ by minimising $\|z\|_1$ over all $z$ satisfying $\|Az - y\|_2 \le \epsilon$, where $\epsilon \ge \|e\|_2$. If $\|e\|_2$ is bounded, we can bound the error between the $\ell_1$ estimator and the signal.
2. Error bound between the $\ell_1$ estimator and the signal

Theorem 1.7. Suppose $A \in \mathbb{R}^{m\times n}$ satisfies the RIP of order $2k$ with $\delta_{2k} < \sqrt{2} - 1$. Let $y = Ax + e$ where $\|e\|_2 \le \epsilon$, and let $\hat{x} \in \arg\min\{\|z\|_1 : \|Az - y\|_2 \le \epsilon\}$. Then:
\[ \|\hat{x} - x\|_2 \le \frac{C_0}{\sqrt{k}}\,\sigma_k(x)_1 + C_2\,\epsilon \]

Proof. Let $h = \hat{x} - x$. Let $\Lambda_0$ be the index set of the $k$ largest components of $x$, and $\Lambda_1$ the index set of the $k$ largest components of $h_{\Lambda_0^c}$. Let $\Lambda = \Lambda_0 \cup \Lambda_1$.

(a) Since $x$ is feasible, $\|\hat{x}\|_1 \le \|x\|_1$. Applying Lemma 1.7:
\[ \|h\|_2 \le \frac{C_0}{\sqrt{k}}\,\sigma_k(x)_1 + C_1\,\frac{|\langle Ah_\Lambda, Ah\rangle|}{\|h_\Lambda\|_2} \]

(b) By Cauchy-Schwarz, $|\langle Ah_\Lambda, Ah\rangle| \le \|Ah_\Lambda\|_2\,\|Ah\|_2$, and by the RIP,
\[ \frac{\|Ah_\Lambda\|_2}{\|h_\Lambda\|_2} \le \sqrt{1 + \delta_{2k}} \]

(c) By the triangle inequality, the definition of the estimator, and our assumption on the error:
\[ \|Ah\|_2 \le \|Ax - y\|_2 + \|A\hat{x} - y\|_2 \le \|e\|_2 + \epsilon \le 2\epsilon \]

(d) Therefore,
\[ \|\hat{x} - x\|_2 \le \frac{C_0}{\sqrt{k}}\,\sigma_k(x)_1 + 2C_1\sqrt{1 + \delta_{2k}}\,\epsilon \]
1.7 Coherence

1. Motivation: Spark, NSP, and RIP all provide guarantees for recovering $k$-sparse signals, but verifying these properties is difficult. Coherence is easily computed and can be related to spark, NSP and RIP under certain conditions. (A short computational sketch follows Theorem 1.8.)

2. Note: we will be using two definitions of coherence. One is presented below; the other is the maximum modulus of the entries of the matrix.

3. Definition:

Definition 1.5. The coherence of $A$ with columns $a_i$ is
\[ \mu(A) = \max_{1 \le i < j \le n}\frac{|\langle a_i, a_j\rangle|}{\|a_i\|_2\,\|a_j\|_2} \]

4. Important properties (without proof):

Theorem 1.8. Let $\mu(A)$ be the coherence of a matrix $A$ as defined above.

(a) For any matrix $A$, $\operatorname{spark}(A) \ge 1 + \frac{1}{\mu(A)}$.

(b) If $k < \frac{1}{2}\left(1 + \frac{1}{\mu(A)}\right)$, then for every $y \in \mathbb{R}^m$ there exists at most one $x \in \Sigma_k$ such that $y = Ax$.

(c) If $A$ has unit-norm columns and coherence $\mu(A)$, then $A$ satisfies the RIP of order $k$ with $\delta_k = (k - 1)\,\mu(A)$ for all $k < \frac{1}{\mu(A)}$.
2 Signal Recovery in Random Sampling

2.1 Bernstein Inequalities
1. Bernstein Inequality

Theorem 2.1. Suppose $Y_1, \dots, Y_n$ are independent zero-mean random variables with $|Y_j| \le R$ a.s. If $\mathbb{E}[|Y_j|^2] \le \sigma_j^2$ then for all $t > 0$:
\[ \mathbb{P}\left[\Bigl|\sum_j Y_j\Bigr| \ge t\right] \le 2\exp\left(\frac{-t^2/2}{\sum_j\sigma_j^2 + \frac{1}{3}Rt}\right) \]
(A quick Monte Carlo check of this bound appears at the end of this subsection.)

2. Vector Bernstein Inequality
Theorem 2.2. Let $Y_1, \dots, Y_n \in \mathbb{C}^d$ be independent zero-mean random vectors such that $\|Y_j\|_2 \le R$ a.s. Let $\sigma^2 \ge \sum_j\mathbb{E}[\|Y_j\|_2^2]$. Then for $t > 0$:
\[ \mathbb{P}\left[\Bigl\|\sum_j Y_j\Bigr\|_2 \ge \sigma + t\right] \le \exp\left(\frac{-t^2/2}{\sigma^2 + (6\sigma + t)R/3}\right) \]
3. Square Matrix Bernstein Inequality

Theorem 2.3. Let $Y_1, \dots, Y_n \in \mathbb{C}^{d\times d}$ be independent self-adjoint (equal to their conjugate transpose), zero-mean random matrices. Suppose the largest eigenvalue of each $Y_j$ satisfies $\lambda_{\max}(Y_j) \le R$ a.s., and let $\sigma^2 = \bigl\|\sum_j\mathbb{V}[Y_j]\bigr\|_{2\to2}$. Then
\[ \mathbb{P}\left[\lambda_{\max}\Bigl(\sum_j Y_j\Bigr) \ge t\right] \le d\exp\left(\frac{-t^2/2}{\sigma^2 + Rt/3}\right) \]

4. Rectangular Matrix Bernstein Inequality

Theorem 2.4. Let $Y_1, \dots, Y_n \in \mathbb{C}^{d_1\times d_2}$ be independent zero-mean random matrices such that $\|Y_l\|_{2\to2} \le R$ for all $l$. Let $\sigma^2 = \max\left\{\bigl\|\sum_l\mathbb{E}[Y_lY_l^*]\bigr\|_{2\to2},\ \bigl\|\sum_l\mathbb{E}[Y_l^*Y_l]\bigr\|_{2\to2}\right\}$. Then for $t > 0$:
\[ \mathbb{P}\left[\Bigl\|\sum_l Y_l\Bigr\|_{2\to2} \ge t\right] \le 2(d_1 + d_2)\exp\left(\frac{-t^2/2}{\sigma^2 + Rt/3}\right) \]
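A quick Monte Carlo sanity check of Theorem 2.1, with illustrative parameters chosen here (not from the notes): the empirical tail probability of a sum of bounded zero-mean variables should sit below the Bernstein bound.

```python
import numpy as np

rng = np.random.default_rng(3)
n, R, t, trials = 200, 1.0, 10.0, 20000

# Y_j uniform on [-1, 1]: zero mean, |Y_j| <= R = 1, variance sigma_j^2 = 1/3.
sums = rng.uniform(-1.0, 1.0, size=(trials, n)).sum(axis=1)
empirical = np.mean(np.abs(sums) >= t)
bound = 2.0 * np.exp(-(t ** 2 / 2) / (n / 3 + R * t / 3))
print(f"empirical tail {empirical:.4f} <= Bernstein bound {bound:.4f}")
```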
2.2 RIPless Theory of Compressed Sensing

1. Motivation: in some situations, the RIP theory of compressed sensing requires an impractically large number of measurements. The RIPless theory holds in a probabilistic setting, allowing us to circumvent these issues. Even in the RIPless setting we can still construct sensing matrices $A$, and we must determine when unique solutions can be recovered.
2. Notation

(a) For $a \in \mathbb{C}$, let $\operatorname{sgn}(a) = a/|a|$ if $a \ne 0$ and $\operatorname{sgn}(a) = 0$ if $a = 0$.

(b) For $x \in \mathbb{C}^n$, let $\operatorname{sgn}(x) = (\operatorname{sgn}(x_1), \dots, \operatorname{sgn}(x_n))^T$.

(c) For $S \subset \{1, \dots, n\}$ and $A \in \mathbb{C}^{m\times n}$, $A_S = AP_S$, the restriction of $A$ to the columns indexed by $S$.
3. Inexact Dual Certificate: properties of a sensing matrix $A$ which ensure a unique $\ell_1$ minimiser.

Theorem 2.5. Let $A \in \mathbb{C}^{m\times n}$ with columns $a_l$, $l \in \{1, \dots, n\}$. Let $x \in \mathbb{C}^n$ with $\operatorname{supp}(x) = S$ (i.e. the non-zero components of $x$ are indexed by $S$). Let $\alpha, \beta, \gamma, \theta > 0$ be such that:

(a) Firstly:
\[ \left\|\left(A_S^*A_S\right)^{-1}\right\|_{2\to2} \le \alpha \quad \text{(as an operator on the range of } P_S\text{)}, \qquad \max_{l\in S^c}\|A_S^*a_l\|_2 \le \beta \]

(b) Secondly, suppose there exists $u \in \mathbb{C}^n$ of the form $u = A^*h$ for some $h \in \mathbb{C}^m$ such that
\[ \|u_S - \operatorname{sgn}(x_S)\|_2 \le \gamma, \qquad \|u_{S^c}\|_\infty \le \theta \]

If $\theta + \alpha\beta\gamma < 1$ then $x$ is the unique minimiser of $\min\{\|z\|_1 : Az = Ax\}$.

Proof. Let $\hat{x} = \arg\min\{\|z\|_1 : Az = Ax\}$ and let $v = \hat{x} - x$. We want to show that $v = 0$.

(a) We do this by showing $\|\hat{x}\|_1 \ge \|x\|_1 + \mathrm{const}\times\|v_{S^c}\|_1$ with a positive constant, which implies $\|v_{S^c}\|_1 = 0$ since, by construction, $\|\hat{x}\|_1 \le \|x\|_1$.

(b) Using the definition of $\operatorname{sgn}$ applied to generic complex entries, we obtain the last inequality below:
\[ \|\hat{x}\|_1 = \|x + v\|_1 = \|x + v_S\|_1 + \|v_{S^c}\|_1 = \langle\operatorname{sgn}(x + v_S), x + v_S\rangle + \|v_{S^c}\|_1 \ge \|x\|_1 - \operatorname{Re}\,\langle\operatorname{sgn}(x), v_S\rangle + \|v_{S^c}\|_1 \]

(c) Now we want to upper bound $\operatorname{Re}\,\langle\operatorname{sgn}(x), v_S\rangle$, using $u_S$. Note that, since $Av = 0$:
\[ \langle u_S, v_S\rangle = \langle u, v\rangle - \langle u_{S^c}, v_{S^c}\rangle = \langle A^*h, v\rangle - \langle u_{S^c}, v_{S^c}\rangle = \langle h, Av\rangle - \langle u_{S^c}, v_{S^c}\rangle = -\langle u_{S^c}, v_{S^c}\rangle \]
Therefore, by Cauchy-Schwarz and Hölder:
\[ |\langle\operatorname{sgn}(x), v_S\rangle| = |\langle\operatorname{sgn}(x) - u_S, v_S\rangle + \langle u_S, v_S\rangle| \le \|\operatorname{sgn}(x) - u_S\|_2\,\|v_S\|_2 + \|u_{S^c}\|_\infty\,\|v_{S^c}\|_1 \le \gamma\,\|v_S\|_2 + \theta\,\|v_{S^c}\|_1 \]
So we have:
\[ \|\hat{x}\|_1 \ge \|x\|_1 - \gamma\,\|v_S\|_2 + (1 - \theta)\,\|v_{S^c}\|_1 \]

(d) Now we bound $\|v_S\|_2$ in terms of $\|v_{S^c}\|_1$. Noting that $Av = 0 \implies A_Sv_S + A_{S^c}v_{S^c} = 0$:
\[ \|v_S\|_2 = \left\|(A_S^*A_S)^{-1}\left[A_S^*(A_{S^c}v_{S^c})\right]\right\|_2 \le \alpha\,\|A_S^*A_{S^c}v_{S^c}\|_2 \]
Moreover,
\[ \|A_S^*A_{S^c}v_{S^c}\|_2 = \Bigl\|\sum_{l\in S^c}A_S^*a_l\,v_l\Bigr\|_2 \le \sum_{l\in S^c}|v_l|\,\|A_S^*a_l\|_2 \le \beta\sum_{l\in S^c}|v_l| = \beta\,\|v_{S^c}\|_1 \]

(e) Therefore, $\|\hat{x}\|_1 \ge \|x\|_1 + (1 - \theta - \alpha\beta\gamma)\,\|v_{S^c}\|_1$. By assumption, $1 > \theta + \alpha\beta\gamma$, so $v_{S^c} = 0$, which implies $v_S = 0$ since $A_Sv_S = -A_{S^c}v_{S^c} = 0$ and $A_S^*A_S$ is invertible. So $v = 0$ and we recover $x$ exactly.
2.3 Random Sampling in Bounded Orthonormal Systems

1. Motivation: a random sensing matrix confers unique advantages for reconstructing signals. However, structured random matrices, those generated by a random choice of parameters, have many computational advantages over completely unstructured random matrices.

2. Definition of Bounded Orthonormal System (BOS)

Definition 2.1. Suppose $D \subseteq \mathbb{R}^d$ is endowed with a probability measure $\nu$. Let $\Phi = \{\phi_1, \dots, \phi_n\}$ be an orthonormal system of complex-valued functions on $D$ with respect to $\nu$, i.e. $\int_D\phi_i(t)\overline{\phi_j(t)}\,d\nu(t) = \delta_{ij}$. $\Phi$ is a Bounded Orthonormal System if there exists $K > 0$ such that $\|\phi_i\|_\infty \le K$ for $i = 1, \dots, n$.

3. Goal: consider a function $f(t) = \sum_{k=1}^n x_k\phi_k(t)$. Suppose $t_1, \dots, t_m \in D$ are randomly sampled points with measurements $y_i = f(t_i)$. Then we have measurements $y = (y_1, \dots, y_m)^T$, sensing matrix $A_{ij} = \phi_j(t_i)$, and unknown coefficients $x = (x_1, \dots, x_n)^T$. By determining $x$ we can reconstruct the function $f$. (A sketch of this setup for the trigonometric system appears below.)
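A minimal sketch of the sensing matrix for one classical BOS, the trigonometric system $\phi_k(t) = e^{2\pi i kt}$ on $D = [0, 1]$ with $\nu$ the uniform measure, for which $K = 1$; the example coefficients and sample sizes are our own choices.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 64, 24

# Trigonometric BOS on [0, 1]: phi_k(t) = exp(2*pi*i*k*t), |phi_k(t)| = 1, so K = 1.
t = rng.uniform(0.0, 1.0, size=m)                     # random sampling points t_1, ..., t_m
A = np.exp(2j * np.pi * np.outer(t, np.arange(n)))    # A_ij = phi_j(t_i)

# A 3-sparse coefficient vector x and the corresponding measurements y_i = f(t_i).
x = np.zeros(n, dtype=complex)
x[[3, 17, 40]] = [1.0, -2.0, 0.5]
y = A @ x
f = lambda s: sum(x[k] * np.exp(2j * np.pi * k * s) for k in [3, 17, 40])
print(A.shape, np.allclose(y, f(t)))                  # (24, 64) True
```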
2.4 Sampling for Recovery
1. Motivation: Ultimately, we want a lower bound guarantee on the number
of sampled points m that allows us to recover x with some probability
2. Applying the Bernstein Inequality

Lemma 2.1. Let $A \in \mathbb{C}^{m\times n}$ be a random sampling matrix with respect to a BOS and let $\tilde{A} = \frac{1}{\sqrt{m}}A$. Let $v \in \mathbb{C}^n$ with $\operatorname{supp}(v) = S$ and $|S| = s$. Then for $t > 0$:
\[ \mathbb{P}\left[\bigl\|\tilde{A}_{S^c}^*\tilde{A}v\bigr\|_\infty > t\,\|v\|_2\right] \le 4n\exp\left(\frac{-m\,t^2}{4K^2\left(1 + \sqrt{s/18}\,t\right)}\right) \]

Proof. We will apply the Bernstein inequality (Theorem 2.1), so we need to find suitable random variables and show that they have zero mean, are bounded, and have bounded variance.

(a) Note that $\|\tilde{A}_{S^c}^*\tilde{A}v\|_\infty = \max_{k\in S^c}|\langle e_k, \tilde{A}^*\tilde{A}v\rangle|$. Let $X_l = (\phi_j(t_l))_{j=1}^n$, so that the $X_l$ correspond to the columns of $A^*$. Without loss of generality let $\|v\|_2 = 1$, fix $k \in S^c$, and set $Y_l = \langle e_k, X_lX_l^*v\rangle$, so that $\langle e_k, \tilde{A}^*\tilde{A}v\rangle = \frac{1}{m}\sum_{l=1}^mY_l$.

(b) We compute the expectation of $Y_l$. Noting that
\[ \mathbb{E}\left[(X_lX_l^*)_{ij}\right] = \mathbb{E}\left[\phi_i(t_l)\overline{\phi_j(t_l)}\right] = \delta_{ij} \]
and since $k \in S^c$ while $\operatorname{supp}(v) = S$:
\[ \mathbb{E}[Y_l] = \langle e_k, \mathbb{E}[X_lX_l^*]v\rangle = \langle e_k, Iv\rangle = v_k = 0 \]

(c) We now bound $Y_l$:
\[ |Y_l| = |\langle e_k, X_l\rangle|\,|\langle X_l|_S, v\rangle| = |\phi_k(t_l)|\,|\langle X_l|_S, v\rangle| \le K\,\|X_l|_S\|_2\,\|v\|_2 = K\sqrt{\sum_{j\in S}|\phi_j(t_l)|^2} \le K\sqrt{K^2s} = K^2\sqrt{s} \]

(d) We now bound the variance:
\[ \mathbb{E}\left[|Y_l|^2\right] = \mathbb{E}\left[\langle e_k, X_l\rangle\langle X_l, v\rangle\langle v, X_l\rangle\langle X_l, e_k\rangle\right] = \mathbb{E}\left[|\phi_k(t_l)|^2\,v^*X_lX_l^*v\right] \le K^2\,v^*\,\mathbb{E}[X_lX_l^*]\,v = K^2\,v^*v = K^2 \]

(e) Before applying the Bernstein inequality, note that for any $z \in \mathbb{C}$, $|z|^2 = \operatorname{Re}(z)^2 + \operatorname{Im}(z)^2 \le 2\max\{\operatorname{Re}(z)^2, \operatorname{Im}(z)^2\}$, so $|z| > t$ implies $|\operatorname{Re}(z)| > t/\sqrt{2}$ or $|\operatorname{Im}(z)| > t/\sqrt{2}$.

(f) In light of the previous fact, applying Bernstein's inequality to the real and imaginary parts separately:
\[ \mathbb{P}\left[|\langle e_k, \tilde{A}^*\tilde{A}v\rangle| > t\right] \le \mathbb{P}\left[\Bigl|\sum_{l=1}^m\operatorname{Re}(Y_l)\Bigr| > \frac{mt}{\sqrt{2}}\right] + \mathbb{P}\left[\Bigl|\sum_{l=1}^m\operatorname{Im}(Y_l)\Bigr| > \frac{mt}{\sqrt{2}}\right] \le 4\exp\left(\frac{-m^2t^2/4}{mK^2 + \sqrt{s}\,K^2mt/\sqrt{18}}\right) = 4\exp\left(\frac{-mt^2}{4K^2\left(1 + \sqrt{s/18}\,t\right)}\right) \]

(g) The result follows from the union bound $\mathbb{P}[\max_{k\in S^c}Z_k > t] \le \sum_k\mathbb{P}[Z_k > t]$ over the at most $n$ indices $k \in S^c$.

3. Applying the Matrix Bernstein Inequality
Lemma 2.2. Let $A \in \mathbb{C}^{m\times n}$ be a random matrix corresponding to a BOS with $K \ge 1$. Let $S \subseteq \{1, \dots, n\}$ with $|S| = s$. Then for $\delta \in (0, 1)$, with $\tilde{A} = \frac{1}{\sqrt{m}}A$ and $I$ the identity on the coordinates indexed by $S$:
\[ \mathbb{P}\left[\bigl\|\tilde{A}_S^*\tilde{A}_S - I\bigr\|_{2\to2} > \delta\right] \le 2s\exp\left(\frac{-3m\delta^2}{8K^2s}\right) \]

Proof. We want to use the matrix Bernstein inequality, for which we need $Y_l$ independent, self-adjoint, zero-mean, bounded, with the sum of their variances bounded in the spectral norm.

(a) Let $X_l$ be the columns of $A_S^*$, which are independent since the $t_l$ are independent. Let $Y_l = X_lX_l^* - I = Y_l^*$. So the $Y_l$ are independent, self-adjoint, and
\[ \tilde{A}_S^*\tilde{A}_S - I = \frac{1}{m}\sum_{l=1}^m Y_l \]

(b) $\mathbb{E}[Y_l] = \mathbb{E}[X_lX_l^*] - I = 0$.

(c) Now we bound $Y_l$:
\[ \|Y_l\|_{2\to2} = \|X_lX_l^* - I\|_{2\to2} = \max_{\|v\|_2=1}\left|v^*(X_lX_l^*)v - \|v\|_2^2\right| = \max_{\|v\|_2=1}\left||\langle X_l, v\rangle|^2 - 1\right| \le \max\left\{\|X_l\|_2^2 - 1,\ 1\right\} \le \sum_{k\in S}|\phi_k(t_l)|^2 \le K^2s \]

(d) The variance of $Y_l$ satisfies (in the usual operator ordering):
\[ \mathbb{V}[Y_l] = \mathbb{E}[Y_lY_l^*] = \mathbb{E}\left[(X_lX_l^* - I)^2\right] = \mathbb{E}\left[X_l(X_l^*X_l)X_l^* - 2X_lX_l^* + I\right] = \mathbb{E}\left[\|X_l\|_2^2\,X_lX_l^*\right] - I \le K^2sI - I \le K^2sI \]
Then:
\[ \Bigl\|\sum_{l=1}^m\mathbb{V}[Y_l]\Bigr\|_{2\to2} \le \|mK^2sI\|_{2\to2} = mK^2s \]

(e) Applying the matrix Bernstein inequality (Theorem 2.3):
\[ \mathbb{P}\left[\bigl\|\tilde{A}_S^*\tilde{A}_S - I\bigr\|_{2\to2} > \delta\right] = \mathbb{P}\left[\Bigl\|\sum_{l=1}^mY_l\Bigr\|_{2\to2} > m\delta\right] \le 2s\exp\left(\frac{-\delta^2m/2}{K^2s(1 + \delta/3)}\right) = 2s\exp\left(\frac{-3\delta^2m}{K^2s(6 + 2\delta)}\right) \le 2s\exp\left(\frac{-3m\delta^2}{8K^2s}\right) \]
4. Application of Vector Bernstein Inequality

Lemma 2.3. Let $S \subseteq \{1, \dots, n\}$ with $|S| = s$, and let $v \in \mathbb{C}^s$ with $\|v\|_2 = 1$. Then for $t > 0$:
\[ \mathbb{P}\left[\bigl\|\bigl(\tilde{A}_S^*\tilde{A}_S - I\bigr)v\bigr\|_2 \ge \sqrt{\frac{K^2s}{m}} + t\right] \le \exp\left(\frac{-mt^2}{2K^2s\left(1 + 2\sqrt{K^2s/m} + t/3\right)}\right) \]

Proof. Here we want to apply the vector Bernstein inequality, which requires random vectors with zero mean, bounded norm, and bounded $\sigma^2 \ge \sum_l\mathbb{E}[\|Y_l\|_2^2]$.

(a) Let $X_l$ be the columns of $A_S^*$ and let $Y_l = (X_lX_l^* - I)v$. Then:
\[ \bigl(\tilde{A}_S^*\tilde{A}_S - I\bigr)v = \frac{1}{m}\sum_{l=1}^mY_l \]

(b) Zero mean:
\[ \mathbb{E}[Y_l] = \left(\mathbb{E}[X_lX_l^*] - I\right)v = (I - I)v = 0 \]

(c) Bounded in norm (using the bound from step (c) of Lemma 2.2):
\[ \|Y_l\|_2 = \|(X_lX_l^* - I)v\|_2 \le \|X_lX_l^* - I\|_{2\to2} \le K^2s \]

(d) Bounded $\sigma^2$:
\[ \sum_{l=1}^m\mathbb{E}\left[\|Y_l\|_2^2\right] = m\,v^*\,\mathbb{E}\left[\|X_l\|_2^2\,X_lX_l^* - 2X_lX_l^* + I\right]v \le m\,v^*(K^2s - 1)v \le mK^2s =: \sigma^2 \]

(e) Therefore, by the vector Bernstein inequality (Theorem 2.2), with $\sigma = \sqrt{mK^2s}$ and threshold $\sigma + mt$:
\[ \mathbb{P}\left[\bigl\|\bigl(\tilde{A}_S^*\tilde{A}_S - I\bigr)v\bigr\|_2 \ge \sqrt{\frac{K^2s}{m}} + t\right] = \mathbb{P}\left[\Bigl\|\sum_{l=1}^mY_l\Bigr\|_2 \ge \sqrt{mK^2s} + mt\right] \le \exp\left(\frac{-m^2t^2/2}{mK^2s + \bigl(6\sqrt{mK^2s} + mt\bigr)K^2s/3}\right) = \exp\left(\frac{-mt^2}{2K^2s\left(1 + 2\sqrt{K^2s/m} + t/3\right)}\right) \]
5. Application of Rectangular Matrix Bernstein Inequality

Lemma 2.4. For $0 < t \le 2\sqrt{s}$, and $\tilde{a}_j$ the $j$th column of $\tilde{A} = \frac{1}{\sqrt{m}}A$:
\[ \mathbb{P}\left[\max_{j\in S^c}\bigl\|\tilde{A}_S^*\tilde{a}_j\bigr\|_2 \ge t\right] \le 2(s + 1)\,n\exp\left(\frac{-3mt^2}{10K^2s}\right) \]

Proof. We want to apply the rectangular matrix Bernstein inequality to $s\times1$ matrices. We need $Y_l$ with zero mean, bounded norm and bounded $\sigma^2$.

(a) Let $X_l$ be the columns of $A_S^*$. Fix $j \in S^c$ and let $Y_l = X_l\,\overline{\phi_j(t_l)}$, so that $\bigl\|\tilde{A}_S^*\tilde{a}_j\bigr\|_2 = \bigl\|\frac{1}{m}\sum_{l=1}^mY_l\bigr\|_2$.

(b) Zero mean, since $k \in S$ and $j \in S^c$ are distinct:
\[ \mathbb{E}\left[(Y_l)_k\right] = \mathbb{E}\left[\phi_k(t_l)\overline{\phi_j(t_l)}\right] = 0 \]

(c) Bounded in norm:
\[ \|Y_l\|_2 = \bigl\|X_l\,\overline{\phi_j(t_l)}\bigr\|_2 \le |\phi_j(t_l)|\,\|X_l\|_2 \le K\cdot K\sqrt{s} = K^2\sqrt{s} \]

(d) Bounded $\sigma^2 = \max\left\{\bigl\|\sum_l\mathbb{E}[Y_lY_l^*]\bigr\|_{2\to2},\ \bigl\|\sum_l\mathbb{E}[Y_l^*Y_l]\bigr\|\right\}$. For the first term:
\[ \mathbb{E}[Y_lY_l^*] = \mathbb{E}\left[|\phi_j(t_l)|^2\,X_lX_l^*\right] \le K^2\,\mathbb{E}[X_lX_l^*] = K^2I \]
For the second term, since the system is orthonormal (so $\mathbb{E}[|\phi_k(t_l)|^2] = 1$):
\[ \mathbb{E}[Y_l^*Y_l] = \mathbb{E}\left[|\phi_j(t_l)|^2\,X_l^*X_l\right] \le K^2\sum_{k\in S}\mathbb{E}\left[|\phi_k(t_l)|^2\right] = K^2s \]
Therefore $\sigma^2 \le mK^2s$.

(e) We apply the rectangular matrix Bernstein inequality (Theorem 2.4) for $t \le 2\sqrt{s}$:
\[ \mathbb{P}\left[\bigl\|\tilde{A}_S^*\tilde{a}_j\bigr\|_2 \ge t\right] = \mathbb{P}\left[\Bigl\|\sum_{l=1}^mY_l\Bigr\|_2 \ge mt\right] \le 2(s + 1)\exp\left(\frac{-m^2t^2/2}{\sigma^2 + K^2\sqrt{s}\,mt/3}\right) \le 2(s + 1)\exp\left(\frac{-3mt^2}{6K^2s + 4K^2s}\right) = 2(s + 1)\exp\left(\frac{-3mt^2}{10K^2s}\right) \]
using $2K^2\sqrt{s}\,t \le 4K^2s$. The result then follows from the union bound:
\[ \mathbb{P}\left[\max_{j\in S^c}\bigl\|\tilde{A}_S^*\tilde{a}_j\bigr\|_2 \ge t\right] \le \sum_{j=1}^n\mathbb{P}\left[\bigl\|\tilde{A}_S^*\tilde{a}_j\bigr\|_2 \ge t\right] \]
6. Probability of Satisfying the Inexact Dual Certificate for a BOS:

Theorem 2.6. Let $x \in \mathbb{C}^N$ be an $s$-sparse vector and let $A \in \mathbb{C}^{m\times N}$ be a random matrix corresponding to a BOS with $K \ge 1$. If
\[ m \ge CK^2s\left[2\log(4N)\log(12\epsilon^{-1}) + \log(s)\log\left(12\epsilon^{-1}\log(s)\right)\right] \]
for a suitable constant $C > 0$, then $x$ is the unique minimiser of $\min\{\|z\|_1 : Az = Ax\}$ with probability at least $1 - \epsilon$.

Proof. We want to satisfy the conditions of the inexact dual certificate (Theorem 2.5), which allow us to recover the sparse solution. These conditions are that for some $\alpha, \beta, \gamma, \theta$:

1. For $a_l$ the $l$th column of $A$:
\[ \bigl\|(A_S^*A_S)^{-1}\bigr\|_{2\to2} \le \alpha, \qquad \max_{l\in S^c}\|A_S^*a_l\|_2 \le \beta \]

2. There exists $u \in \mathbb{C}^N$ of the form $u = A^*h$ for some $h \in \mathbb{C}^m$ such that:
\[ \|u_S - \operatorname{sgn}(x_S)\|_2 \le \gamma, \qquad \|u_{S^c}\|_\infty \le \theta \]

The proof depends on the golfing scheme, which we set up first. Then we show that property 2 holds with some probability, from which we can show property 1 holds as well.

(a) Golfing scheme construction. Let $m = \sum_{k=1}^Lm_k$ and partition the rows of $A$ into blocks $A^{(1)} \in \mathbb{C}^{m_1\times N}, \dots, A^{(L)} \in \mathbb{C}^{m_L\times N}$, so that
\[ A = \begin{pmatrix}A^{(1)} \\ \vdots \\ A^{(L)}\end{pmatrix} \in \mathbb{C}^{m\times N}, \qquad \tilde{A} = \frac{1}{\sqrt{m}}A \]
We select $u$ recursively, hoping it will satisfy property 2. Letting $u^{(0)} = 0 \in \mathbb{C}^N$, define the sequence
\[ u^{(n)} = \frac{1}{m_n}\bigl(A^{(n)}\bigr)^*A_S^{(n)}\left(\operatorname{sgn}(x_S) - u_S^{(n-1)}\right) + u^{(n-1)} \in \mathbb{C}^N \]

i. By construction, $u^{(L)} = A^*h$ for some $h \in \mathbb{C}^m$.

ii. Defining $w^{(n)} = \operatorname{sgn}(x_S) - u_S^{(n)}$:
\[ w^{(n)} = \left(I - \frac{1}{m_n}\bigl(A_S^{(n)}\bigr)^*A_S^{(n)}\right)w^{(n-1)} = \prod_{j=1}^n\left(I - \frac{1}{m_j}\bigl(A_S^{(j)}\bigr)^*A_S^{(j)}\right)\operatorname{sgn}(x_S), \qquad u^{(n)} = \sum_{i=1}^n\frac{1}{m_i}\bigl(A^{(i)}\bigr)^*A_S^{(i)}w^{(i-1)} \]

(b) Demonstrating property 2.

i. We apply Lemma 2.3 to each block for some choice of $r_n > 0$:
\[ \bigl\|u_S^{(n)} - \operatorname{sgn}(x_S)\bigr\|_2 = \bigl\|w^{(n)}\bigr\|_2 = \left\|\left(\frac{1}{m_n}\bigl(A_S^{(n)}\bigr)^*A_S^{(n)} - I\right)w^{(n-1)}\right\|_2 \le \left[\sqrt{K^2s/m_n} + r_n\right]\bigl\|w^{(n-1)}\bigr\|_2 \]
so that, iterating,
\[ \bigl\|u_S^{(L)} - \operatorname{sgn}(x_S)\bigr\|_2 \le \|\operatorname{sgn}(x_S)\|_2\prod_{n=1}^L\left[\sqrt{K^2s/m_n} + r_n\right] \le \sqrt{s}\prod_{n=1}^L\left[\sqrt{K^2s/m_n} + r_n\right] \]
By Lemma 2.3, the bound for block $n$ fails with probability at most
\[ p_1(n) \le \exp\left(\frac{-m_nr_n^2}{2K^2s\left(1 + 2\sqrt{K^2s/m_n} + r_n/3\right)}\right) \]
so the overall probability of failure is at most $\sum_{n=1}^Lp_1(n)$.

ii. For the second part of property 2 we use Lemma 2.1. For some choice of $t_n > 0$:
\[ \bigl\|u_{S^c}^{(L)}\bigr\|_\infty \le \sum_{n=1}^L\left\|\frac{1}{m_n}\bigl(A_{S^c}^{(n)}\bigr)^*A_S^{(n)}w^{(n-1)}\right\|_\infty \le \sum_{n=1}^Lt_n\bigl\|w^{(n-1)}\bigr\|_2 \le \sum_{n=1}^Lt_n\sqrt{s}\prod_{n'=1}^{n-1}\left[\sqrt{K^2s/m_{n'}} + r_{n'}\right] \]
By Lemma 2.1, we fail to bound each term with probability at most
\[ p_2(n) \le 4N\exp\left(\frac{-m_nt_n^2}{4K^2\left(1 + \sqrt{s/18}\,t_n\right)}\right) \]
and we fail to bound the entire sum with probability at most $\sum_{n=1}^Lp_2(n)$.

iii. We now choose $m_n$, $r_n$, $t_n$, $L$ and $C$:

A. $m_1 = m_2 \ge CK^2s\log(4N)\log(2\epsilon^{-1})$ and $m_n \ge CK^2s\log(2L\epsilon^{-1})$ for $n \ge 3$

B. $r_1 = r_2 = \frac{1}{2e\sqrt{\log 4N}}$ and $r_n = \frac{1}{2e}$ for $n \ge 3$

C. $t_1 = t_2 = \frac{1}{e\sqrt{s}}$ and $t_n = \frac{\log 4N}{e\sqrt{s}}$ for $n \ge 3$

D. $L = \lceil\log(s)/2\rceil + 2$

E. $C = 8e^2\left[1 + 2e^{-1}\left(\frac{1}{\sqrt{18}} + \frac{1}{6}\right)\right]$

iv. These choices imply the following results:

A. The choice of $m_n$ and $r_n$ implies
\[ \sqrt{K^2s/m_1} + r_1 \le \frac{1}{e\sqrt{\log 4N}}, \qquad \sqrt{K^2s/m_n} + r_n \le \frac{1}{e} \]

B. $\|u_S - \operatorname{sgn}(x_S)\|_2 \le e^{-2}$, with probability of failure $\sum_np_1(n) \le 2\epsilon$

C. $\|u_{S^c}\|_\infty \le (e - 1)^{-1}$, with probability of failure $\sum_np_2(n) \le 2\epsilon$

D. So property 2 holds with probability at least $1 - 4\epsilon$.

(c) Demonstrating property 1:

i. By Lemma 2.2 (with $\delta = 1/2$), $\bigl\|\tilde{A}_S^*\tilde{A}_S - I\bigr\|_{2\to2} \le \frac{1}{2}$ with probability of failure at most $2s\exp\left(\frac{-3m}{32K^2s}\right)$. Therefore $\bigl\|(A_S^*A_S)^{-1}\bigr\|_{2\to2} < \frac{2}{m}$ holds with the same probability, which we want to be bounded by $\epsilon$. This requires
\[ m > \frac{32}{3}K^2s\log(2s\epsilon^{-1}) \]
which is satisfied by the value of $m$ selected to demonstrate property 2.

ii. Note that $\alpha = 2/m$, $\gamma = e^{-2}$, $\theta = (e - 1)^{-1}$. To satisfy $\theta + \alpha\beta\gamma < 1$ we require $\beta/m < 1.54\ldots$, so we choose $\beta/m = 3/2$. By Lemma 2.4 (with $t = 3/2 \le 2\sqrt{s}$),
\[ \max_{l\in S^c}\bigl\|\tilde{A}_S^*\tilde{a}_l\bigr\|_2 < \frac{3}{2} = \frac{\beta}{m} \]
with probability of failure at most
\[ 2(s + 1)N\exp\left(\frac{-27m}{40K^2s}\right) \]
We can make this probability less than $\epsilon$ with the selection of $m$ above.

(d) Overall, the conditions on $\alpha, \beta, \gamma, \theta$ hold simultaneously with probability at least $1 - 4\epsilon - \epsilon - \epsilon = 1 - 6\epsilon$ for the appropriate selection of $m$. Replacing $6\epsilon$ by $\epsilon$ gives the desired result.
Part II: Generalised Sampling