Compressive (or Compressed) Sensing
“Detección Comprimida”
Javier Peña
Carnegie Mellon University
UN Encuentro de Matemáticas
Universidad Nacional, July 2012
Materials available at
http://andrew.cmu.edu/user/jfp/UNencuentro
1 / 32
Compressive Sensing, Lecture 1
Consider the following two pictures
One of these is a raw 2.7 MB JPEG file. The other is a compressed 300 KB JPEG version.
Can you tell them apart?
2 / 32
Wavelets and images
Compressibility corresponds to sparsity in a suitable basis.
[Figure: a 1 megapixel image and its wavelet coefficients sorted by magnitude, with a zoomed-in view of the sorted coefficients on a log10 scale.]
3 / 32
Compression
• Take the original 1 megapixel image
• Compute all 1 million wavelet coefficients
• Keep only the 25K largest and set the others to zero
• Invert the wavelet transform (a code sketch follows below)
[Figure: original 1 megapixel image alongside its 25K-term wavelet approximation.]
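The keep-the-largest-coefficients procedure above can be sketched in a few lines of Python. This is only an illustration, not code from the lecture: it assumes the PyWavelets package (pywt), a grayscale image stored in a NumPy array img, and arbitrary choices of wavelet ("db4") and coefficient budget.

```python
import numpy as np
import pywt

def wavelet_compress(img, n_keep=25_000, wavelet="db4"):
    # Full 2-D wavelet decomposition of the image.
    coeffs = pywt.wavedec2(img, wavelet)
    arr, slices = pywt.coeffs_to_array(coeffs)

    # Keep only the n_keep largest-magnitude coefficients, zero the rest.
    flat = np.abs(arr).ravel()
    threshold = np.partition(flat, flat.size - n_keep)[flat.size - n_keep]
    arr_sparse = np.where(np.abs(arr) >= threshold, arr, 0.0)

    # Invert the wavelet transform to obtain the n_keep-term approximation.
    coeffs_sparse = pywt.array_to_coeffs(arr_sparse, slices, output_format="wavedec2")
    return pywt.waverec2(coeffs_sparse, wavelet)
```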
4 / 32
Going against a long established tradition?
Conventional signal acquisition paradigm
• Acquire/sample (A-to-D converter, digital camera): N samples
• Compress (signal dependent, nonlinear; e.g., keep the largest coefficients of a sparse wavelet transform): M coefficients, with N >> M
• Transmit/store the M coefficients
• Receive and decompress back to N samples
Fundamental question
If the signal is compressible, can it be acquired efficiently? Can we directly acquire only the useful part of the signal?
5 / 32
Compressive sensing
• Term coined by Donoho
• Body of theory and algorithms for sparse signal acquisition
and recovery
• Seminal papers:
• Candès, Romberg and Tao (2006)
• Candès and Tao (2006)
• Donoho (2006)
• Hot area of research spanning information theory, signal
processing, statistics, mathematics, etc.
• Applications where measurements are
• slow or costly (MRI)
• missing or wasteful
• beyond other resource limits, such as memory
6 / 32
Plan
• Introduction to compressive sensing, underdetermined systems of
equations, ℓ1 minimization
• Main theoretical and computational techniques
• Matrix completion, underdetermined linear matrix equations,
nuclear norm minimization.
7 / 32
Underdetermined systems of equations
Problem
Recover a signal x̄ ∈ R^n from m ≪ n linear measurements
b_k = ⟨a_k, x̄⟩, k = 1, …, m, i.e., b = Ax̄
• In general this is impossible.
• Suppose we know that x̄ is sparse. Does that help?
Example
Suppose only one component of x̄ is different from zero.
Can we get by with fewer than n measurements?
8 / 32
Possible approach to recover sparse x̄
Take m ≪ n measurements b = Ax̄ and then solve
min ‖x‖_0 subject to Ax = b
Here ‖·‖_0 stands for the ℓ0 quasi-norm: ‖x‖_0 = |{i : x_i ≠ 0}|.
Relevant questions
• Does this work (provided we take enough measurements)?
• Suppose x̄ is k-sparse, i.e., ‖x̄‖_0 = k. How many
measurements suffice?
• How hard is it to solve the above ℓ0-minimization problem?
9 / 32
ℓ0 versus ℓ1 optimization
min ‖x‖_0 subject to Ax = b   (computationally hard)
min ‖x‖_1 subject to Ax = b   (computationally tractable: a linear program)
• The ℓ1 norm is the convex envelope of the ℓ0 quasi-norm.
• The ℓ1 minimization problem is a convex relaxation of the ℓ0
minimization problem.
• In many cases the above ℓ1 minimization problem yields the
same solution as the ℓ0 problem (see the sketch below).
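To make the "(linear program)" remark concrete, here is a minimal sketch, not from the slides, of how min ‖x‖_1 subject to Ax = b can be rewritten as a linear program by adding auxiliary variables t ≥ |x| and minimizing their sum. It uses scipy.optimize.linprog; the function name l1_min and the "highs" solver choice are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

def l1_min(A, b):
    """Solve min ||x||_1 subject to Ax = b via its LP reformulation."""
    m, n = A.shape
    # Decision variables z = [x, t], where t_i plays the role of |x_i|.
    c = np.concatenate([np.zeros(n), np.ones(n)])      # objective: sum(t)
    I = np.eye(n)
    A_ub = np.block([[I, -I], [-I, -I]])                # x - t <= 0 and -x - t <= 0
    b_ub = np.zeros(2 * n)
    A_eq = np.hstack([A, np.zeros((m, n))])             # A x = b (t does not appear)
    bounds = [(None, None)] * n + [(0, None)] * n       # x free, t >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b,
                  bounds=bounds, method="highs")
    return res.x[:n]
```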
10 / 32
A bit of history of ℓ1 optimization
ℓ1 minimization often finds the “right” answer
• Seismology
• Lasso regression
• Bandlimited deconvolution
• Total variation (TV) denoising
• Basis pursuit
11 / 32
Acquiring a sparse signal
Suppose x̄ ∈ R^n is s-sparse.
• Take m random and nonadaptive measurements
b_k = ⟨a_k, x̄⟩, k = 1, …, m, i.e., b = Ax̄
• Try to reconstruct x̄ via ℓ1 minimization.
First fundamental result
If m ≳ s · log n and the a_k are suitably chosen, then the recovery is
exact.
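As a small numerical illustration of this result (not part of the lecture), the following sketch draws an s-sparse vector and m ≈ 4·s·log n Gaussian measurements, then recovers it by ℓ1 minimization. It assumes the cvxpy package; the dimensions, the random seed, and the constant 4 are arbitrary choices.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, s = 400, 10
m = int(4 * s * np.log(n))                  # m on the order of s * log n

# s-sparse ground truth.
x_bar = np.zeros(n)
support = rng.choice(n, size=s, replace=False)
x_bar[support] = rng.standard_normal(s)

# Random nonadaptive Gaussian measurements.
A = rng.standard_normal((m, n))
b = A @ x_bar

# l1 recovery: minimize ||x||_1 subject to Ax = b.
x = cp.Variable(n)
cp.Problem(cp.Minimize(cp.norm1(x)), [A @ x == b]).solve()
print("recovery error:", np.linalg.norm(x.value - x_bar))   # should be ~0
```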
12 / 32
Nonadaptive sensing of compressible signals
Classical approach
• Measure the full signal x̄ (all coefficients)
• Store the s largest coefficients
• Distortion: ‖x̄ − x̄_s‖_2
Compressive sensing
• Take m random measurements
• Reconstruct x̂ via ℓ1 minimization
Second fundamental result
If m ≳ s · log n then
‖x̂ − x̄‖_2 ≲ ‖x̄ − x̄_s‖_2
13 / 32
Optimality of compressive sensing
It is not possible to do better
• with fewer measurements
• with other reconstruction algorithms
Key features of compressive sensing
• Obtain compressible signals from few sensors
• Sensing is nonadaptive: no knowledge about the signal
• Simple acquisition followed by an ℓ1 decoder
14 / 32
Next: formal statements
• Probabilistic approach
• Isotropy and incoherence
• Incoherent sampling theorems
• Deterministic approach
• Restricted isometry property
• Signal recovery theorem
• Robustness to noise
• Optimality
15 / 32
Probabilistic approach: random sensing
Acquire x̄ ∈ R^n by measuring
b_k = ⟨a_k, x̄⟩, k = 1, …, m, i.e., b = Ax̄
where the a_k are i.i.d. samples from some distribution F on C^n.
Two key properties
• Isotropy: E(aa*) = I
• Coherence measure µ(F): the smallest number such that
max_{i=1,…,n} |⟨a, e_i⟩|² ≤ µ(F) with high probability
We want low coherence µ(F).
16 / 32
Coherence
Observe
• E(aa*) = I implies µ(F) ≥ 1.
• We would like µ(F) to be as close as possible to 1 (incoherent
sensing).
Examples of isotropic incoherent sensing
• Gaussian sensing
• Binary sensing
• Partial Fourier transform
Notation: for a ∈ C^n and i = 1, …, n, write a[i] := ⟨a, e_i⟩.
17 / 32
Isotropic incoherent sensing
Gaussian sensing
a ∼ N(0, I), that is, a[1], …, a[n] are i.i.d. N(0, 1).
In this case µ(F) is of order log n.
Binary sensing
a[1], …, a[n] are i.i.d. with distribution P(a[i] = ±1) = 1/2.
In this case µ(F) = 1.
Partial discrete Fourier transform
• Select k uniformly at random in {0, 1, …, n − 1}
• Set a[t] := e^{i2πkt/n}, t = 0, 1, …, n − 1.
In this case µ(F) = 1.
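A quick numerical illustration of these coherence values (a sketch, not from the slides): draw one sensing vector from each distribution and report max_i |a[i]|², which behaves like log n for Gaussian sensing and equals 1 for binary and partial-Fourier sensing.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Gaussian sensing: entries i.i.d. N(0, 1); max_i |a[i]|^2 grows like log n.
a_gauss = rng.standard_normal(n)
print("Gaussian:", np.max(a_gauss ** 2), " (log n =", np.log(n), ")")

# Binary sensing: entries +/-1 with probability 1/2; every |a[i]|^2 equals 1.
a_bin = rng.choice([-1.0, 1.0], size=n)
print("Binary:  ", np.max(a_bin ** 2))

# Partial DFT row: a[t] = exp(i*2*pi*k*t/n) for a random frequency k; |a[t]|^2 = 1.
k = rng.integers(n)
a_dft = np.exp(2j * np.pi * k * np.arange(n) / n)
print("Fourier: ", np.max(np.abs(a_dft) ** 2))
```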
18 / 32
An example of coherent sensing
Sample random components of x
• Select j uniformly at random in {1, . . . , n}
• Set a = √n · e_j.
In this case E(aa*) = I and max_{i=1,…,n} |a[i]|² = n.
For this type of sensing, how many samples do we need to recover
a 1-sparse vector with high probability?
19 / 32
Incoherent sampling theorems
Compressive sampling approach
measure
b_k = ⟨a_k, x̄⟩, k = 1, …, m, i.e., b = Ax̄
perform ℓ1 recovery
x̂ := argmin_x {‖x‖_1 : Ax = b}
Theorem (Candès & Plan)
Assume x̄ is s-sparse. Then recovery is exact with probability at
least 1 − 5/n − e^{−β} provided that
m ≥ C_0 · (1 + β) · µ(F) · s · log n
for some constant C_0.
20 / 32
Related predecessors:
Theorem (Candès & Tao)
Assume x̄ is s-sparse and sample m Fourier coefficients selected at
random. Then recovery is exact with probability at least
1 − O(n^{−β}) provided that
m ≥ C_β · s · log n
for some constant C_β that depends only on the desired accuracy β.
This result is optimal: any reliable recovery method would require
at least s · log n samples.
Theorem (Candès & Tao)
Assume x̄ ∈ Rn is s-sparse, n is prime, and we sample m Fourier
coefficients. Then x̄ can be reconstructed from the m samples if
m ≥ 2s.
21 / 32
Sampling of non-sparse signals
Given x ∈ R^n and s < n define
x_s := argmin_{‖z‖_0 ≤ s} ‖z − x‖_2
Theorem (Candès & Plan)
Let x ∈ R^n, β > 0 and s̄ be such that m ≥ C_β · s̄ · log n. Then
with probability at least 1 − 6/n − 6e^{−β} the ℓ1 solution x̂ satisfies
‖x̂ − x‖_1 ≤ min_{1≤s≤s̄} C · (1 + α) · ‖x − x_s‖_1
for some constant C and
α = √( (1 + β) · s · µ(F) · log n · log m · log² s / m ).
22 / 32
Deterministic approach: restricted isometry
Restricted isometry property (RIP)
Given A ∈ R^{m×n} and k ∈ {1, …, m}, the k-isometry constant δ_k
is the smallest δ ≥ 0 such that
(1 − δ) · ‖x‖_2² ≤ ‖Ax‖_2² ≤ (1 + δ) · ‖x‖_2²
for all k-sparse x ∈ R^n.
If δ_k < 1, we say that A satisfies the RIP with constant δ_k.
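Computing δ_k exactly means checking every size-k support, which quickly becomes intractable. The following rough Monte Carlo sketch (an illustration, not from the lecture; the trial count is arbitrary) samples random supports and returns a lower estimate of δ_k from the singular values of the corresponding column submatrices.

```python
import numpy as np

def estimate_delta_k(A, k, trials=2000, seed=0):
    """Lower estimate of the k-isometry constant of A over randomly sampled supports."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    delta = 0.0
    for _ in range(trials):
        S = rng.choice(n, size=k, replace=False)
        svals = np.linalg.svd(A[:, S], compute_uv=False)
        # On support S, ||Ax||_2^2 / ||x||_2^2 ranges over [sigma_min^2, sigma_max^2],
        # so delta_k must be at least the larger deviation of these from 1.
        delta = max(delta, svals.max() ** 2 - 1, 1 - svals.min() ** 2)
    return delta
```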
23 / 32
Signal recovery with a RIP matrix
Compressive sampling approach
measure
b_k = ⟨a_k, x̄⟩, k = 1, …, m, i.e., b = Ax̄
perform ℓ1 recovery
x̂ := argmin_x {‖x‖_1 : Ax = b}
Theorem (Candès, Romberg, Tao)
Assume δ_{2s} ≤ √2 − 1. Then the solution x̂ satisfies
‖x̂ − x̄‖_1 ≤ C · ‖x̄ − x̄_s‖_1
and
‖x̂ − x̄‖_2 ≤ C · ‖x̄ − x̄_s‖_1 / √s
for some constant C.
24 / 32
Matrices that satisfy RIP
With high probability an m × n matrix A satisfies the RIP in the
following cases:
• m ≳ s · log(n/s) and A is Gaussian
• m ≳ s · log(n/s) and A is binary
• m ≳ s · log⁴ n and A is a partial DFT
• m ≳ µ(F) · s · log⁴ n and the rows of A are i.i.d. from F.
25 / 32
Sampling with noise
• Measurements in real life are generally noisy
• More appropriate model
b = Ax̄ + z, with noise term z ∼ N(0, σ²·I)
• Assume all columns of A have Euclidean norm equal to one.
• Modify ℓ1 minimization to account for noise
Lasso
min_x (1/2) · ‖b − Ax‖_2² + λ · σ · ‖x‖_1
Dantzig selector
min ‖x‖_1 subject to ‖A*(b − Ax)‖_∞ ≤ λ · σ
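The Lasso above can be solved with simple first-order methods. Below is a minimal proximal-gradient (ISTA) sketch, one standard approach rather than anything prescribed in the lecture; the step size comes from the spectral norm of A and the iteration count is an arbitrary choice.

```python
import numpy as np

def lasso_ista(A, b, lam_sigma, n_iter=500):
    """Minimize (1/2)||b - Ax||_2^2 + lam_sigma * ||x||_1 by proximal gradient (ISTA)."""
    L = np.linalg.norm(A, 2) ** 2           # Lipschitz constant of the smooth part
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)            # gradient of (1/2)||b - Ax||^2
        z = x - grad / L
        # Soft-thresholding: proximal operator of (lam_sigma / L) * ||.||_1.
        x = np.sign(z) * np.maximum(np.abs(z) - lam_sigma / L, 0.0)
    return x
```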
26 / 32
Noise aware recovery (random sensing)
Theorem (Candès & Plan)
Let x̄ ∈ R^n, β > 0 and s̄ be such that m ≥ C_β · s̄ · log n. Then
with probability at least 1 − 6/n − 6e^{−β} the solution x̂ to the
Lasso or the Dantzig selector with λ = 10·√(log n) satisfies
‖x̂ − x̄‖_1 ≤ min_{1≤s≤s̄} C · (1 + α²) · ( ‖x̄ − x̄_s‖_1 + s · σ · √(log n / m) )
for some constant C.
27 / 32
Optimality of Compressive Sensing
Back to Gaussian sensing
m Gaussian measurements and ℓ1 decoding:
‖x̂ − x‖_2 ≲ ‖x − x_s‖_1 / √s,  with s ≈ m / log(n/m)
Question
Can we do better with other measurements or other algorithms?
28 / 32
Signals with power law
[Figure: sorted wavelet coefficients of an image on a log–log scale, exhibiting power law decay.]
Power law decay: |x|_(1) ≥ |x|_(2) ≥ ··· ≥ |x|_(n), with |x|_(k) ≤ C / k^p
Model
The ℓp ball B_p := {x : ‖x‖_p ≤ 1}.
We discuss the case p = 1, but the same discussion applies to 0 ≤ p ≤ 1.
29 / 32
Recovery of the ℓ1 ball B
Gaussian sensing
• Suppose the unknown vector is in B
• Take m Gaussian measurements; then
‖x̂ − x‖_2 ≲ √( (log(n/m) + 1) / m ).
Ideal sensing
Best we can hope for from m linear measurements:
E_m(B) = inf_{F,D} sup_{x∈B} ‖x − D(F(x))‖,
the infimum taken over all m linear measurement maps F and decoders D.
30 / 32
Gelfand widths
Theorem (Donoho)
d_m(B) ≤ E_m(B) ≤ C · d_m(B),
where d_m(B) is the m-width of B:
d_m(B) := inf { sup_{x∈B} ‖P_V x‖_2 : codim(V) < m }
Theorem (Kashin, Garnaev–Gluskin)
For the ℓ1 ball,
C_1 · √( (log(n/m) + 1) / m ) ≤ d_m(B) ≤ C_2 · √( (log(n/m) + 1) / m ).
Compressive sensing achieves the limits of performance.
31 / 32
References for today’s material
• Slides for this minicourse:
http://andrew.cmu.edu/user/jfp/UNencuentro
• E. Candès & Y. Plan, “A probabilistic and RIPless theory of
compressed sensing,” IEEE Trans. Inf. Theory, vol 57, no. 11, pp.
7235–7254, Nov. 2011.
• E. Candès, J. Romberg, & T. Tao, “Robust uncertainty principles:
Exact signal reconstruction from highly incomplete frequency
information,” IEEE Trans. Inf. Theory, vol 52, no. 2, pp. 489–509,
Feb. 2006.
• E. Candès & T. Tao, “Decoding by linear programming,” IEEE
Trans. Inf. Theory, vol 51, no. 12, pp. 4203–4215, Dec. 2005.
• D. Donoho, “Compressed sensing,” IEEE Trans. Inf. Theory, vol 52,
no. 4, pp. 1289–1306, April 2006.
• Papers on compressive sensing: http://dsp.rice.edu/cs
32 / 32