Compressive (or Compressed) Sensing
"Detección Comprimida" (Spanish for "Compressed Sensing")
Javier Peña, Carnegie Mellon University
UN Encuentro de Matemáticas, Universidad Nacional, July 2012
Materials available at http://andrew.cmu.edu/user/jfp/UNencuentro

Compressive Sensing, Lecture 1

Consider the following two pictures. One of them is a raw 2.7MB jpg file. The other is a compressed 300KB jpg version. Can you tell them apart?

Wavelets and images
Compressibility corresponds to sparsity in a suitable basis.
[Figure: a 1-megapixel image and its wavelet coefficients sorted by magnitude; on a log10 scale the sorted coefficients decay rapidly.]

Compression
• Take the original 1-megapixel image
• Compute all 1 million wavelet coefficients
• Keep only the 25K largest and set the others to zero
• Invert the wavelet transform
[Figure: the original 1-megapixel image next to its 25K-term wavelet approximation; the two are visually indistinguishable.]

Going against a long-established tradition?
Conventional signal acquisition paradigm:
• Acquire/sample N values (A-to-D converter, digital camera)
• Compress to M ≪ N values via a sparse wavelet transform (signal dependent, nonlinear)
• Transmit/store the M values
• Receive and decompress back to N values

Fundamental question
Can we directly acquire just the useful part of the signal? If the signal is compressible, can it be acquired efficiently?

Compressive sensing
• Term coined by Donoho
• Body of theory and algorithms for sparse signal acquisition and recovery
• Seminal papers:
  • Candès, Romberg and Tao (2006)
  • Candès and Tao (2006)
  • Donoho (2006)
• Hot area of research spanning information theory, signal processing, statistics, mathematics, etc.
• Applications where measurements are
  • slow or costly (MRI)
  • missing or wasteful
  • beyond other capabilities such as memory

Plan
• Introduction to compressive sensing, underdetermined systems of equations, ℓ1 minimization
• Main theoretical and computational techniques
• Matrix completion, underdetermined linear matrix equations, nuclear norm minimization

Underdetermined systems of equations
Problem: Recover a signal x̄ ∈ R^n from m ≪ n linear measurements
  b_k = ⟨a_k, x̄⟩, k = 1, …, m, that is, b = Ax̄.
• In general this is impossible.
• Suppose we know that x̄ is sparse. Does that help?
Example: Suppose only one component of x̄ is different from zero. Can we get by with fewer than n measurements?

Possible approach to recover sparse x̄
Take m ≪ n measurements b = Ax̄ and then solve
  min ‖x‖_0 subject to Ax = b.
Here ‖·‖_0 stands for the ℓ0 quasi-norm: ‖x‖_0 = |{i : x_i ≠ 0}|.
Relevant questions:
• Does this work (provided we take enough measurements)?
• Suppose x̄ is s-sparse, i.e., ‖x̄‖_0 = s. How many measurements suffice?
• How hard is it to solve the above ℓ0-minimization problem?

ℓ0 versus ℓ1 optimization
  min ‖x‖_0 subject to Ax = b   (computationally hard)
  min ‖x‖_1 subject to Ax = b   (computationally tractable: a linear program)
• The ℓ1 norm is the convex envelope of the ℓ0 quasi-norm.
• The ℓ1 minimization problem is a convex relaxation of the ℓ0 minimization problem.
• In many cases the ℓ1 minimization problem yields the same solution as the ℓ0 problem. (A sketch of the ℓ1 problem as a linear program follows below.)
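The ℓ1 problem can be written as a linear program by splitting each absolute value into an auxiliary variable. Below is a minimal sketch (ours, not from the slides), assuming NumPy and SciPy are available; the function name l1_min is our own choice.

import numpy as np
from scipy.optimize import linprog

def l1_min(A, b):
    # Solve min ||x||_1 subject to Ax = b as a linear program:
    # min sum(t) over z = [x; t] with -t <= x <= t and Ax = b.
    m, n = A.shape
    c = np.concatenate([np.zeros(n), np.ones(n)])   # objective: sum of t
    A_eq = np.hstack([A, np.zeros((m, n))])         # equality Ax = b; t unconstrained here
    I = np.eye(n)
    A_ub = np.vstack([np.hstack([I, -I]),           # x - t <= 0
                      np.hstack([-I, -I])])         # -x - t <= 0
    b_ub = np.zeros(2 * n)
    bounds = [(None, None)] * n + [(0, None)] * n   # x free, t >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b,
                  bounds=bounds, method="highs")
    return res.x[:n]

The reformulation is exact: at an optimum each t_i equals |x_i|, so the LP objective equals ‖x‖_1.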
A bit of history of ℓ1 optimization
ℓ1 minimization often finds the "right" answer:
• Seismology
• Lasso regression
• Bandlimited deconvolution
• Total variation (TV) denoising
• Basis pursuit

Acquiring a sparse signal
Suppose x̄ ∈ R^n is s-sparse.
• Take m random and nonadaptive measurements b_k = ⟨a_k, x̄⟩, k = 1, …, m, that is, b = Ax̄.
• Try to reconstruct x̄ via ℓ1 minimization.
First fundamental result: If m ≳ s·log n and the a_k are suitably chosen, then the recovery is exact.

Nonadaptive sensing of compressible signals
Classical approach:
• Measure the full signal x̄ (all coefficients)
• Store the s largest coefficients
• Distortion: ‖x̄ − x̄_s‖_2
Compressive sensing:
• Take m random measurements
• Reconstruct x̂ via ℓ1 minimization
Second fundamental result: If m ≳ s·log n then ‖x̂ − x̄‖_2 ≲ ‖x̄ − x̄_s‖_2.

Optimality of compressive sensing
It is not possible to do better
• with fewer measurements
• with other reconstruction algorithms
Key features of compressive sensing:
• Obtain compressible signals from few sensors
• Sensing is nonadaptive: no knowledge about the signal
• Simple acquisition followed by an ℓ1 decoder

Next: formal statements
• Probabilistic approach
  • Isotropy and incoherence
  • Incoherent sampling theorems
• Deterministic approach
  • Restricted isometry property
  • Signal recovery theorem
• Robustness to noise
• Optimality

Probabilistic approach: random sensing
Acquire x̄ ∈ R^n by measuring b_k = ⟨a_k, x̄⟩, k = 1, …, m, that is, b = Ax̄, where the a_k are iid F for some distribution F on C^n.
Two key properties:
• Isotropy: E(aa*) = I
• Coherence measure µ(F): the smallest number such that
  max_{i=1,…,n} |⟨a, e_i⟩|^2 ≤ µ(F)
  with high probability.
We want low coherence µ(F).

Coherence
Observe:
• E(aa*) = I implies µ(F) ≥ 1.
• We would like µ(F) to be as close as possible to 1 (incoherent sensing).
Examples of isotropic incoherent sensing:
• Gaussian sensing
• Binary sensing
• Partial Fourier transform
Notation: for a ∈ C^n and i = 1, …, n, write a[i] := ⟨a, e_i⟩.

Isotropic incoherent sensing
Gaussian sensing: a ∼ N(0, I), that is, a[1], …, a[n] are iid N(0, 1). In this case µ(F) = log n.
Binary sensing: a[1], …, a[n] are iid with distribution P(a[i] = ±1) = 1/2. In this case µ(F) = 1.
Partial discrete Fourier transform:
• Select k uniformly at random in {0, 1, …, n − 1}
• Set a[t] := e^{i2πkt/n}, t = 0, 1, …, n − 1.
In this case µ(F) = 1.

An example of coherent sensing
Sample random components of x:
• Select j uniformly at random in {1, …, n}
• Set a = √n·e_j.
In this case E(aa*) = I and max_{i=1,…,n} |a[i]|^2 = n.
For this type of sensing, how many samples do we need to recover a 1-sparse vector with high probability?

Incoherent sampling theorems
Compressive sampling approach:
• Measure b_k = ⟨a_k, x̄⟩, k = 1, …, m, that is, b = Ax̄
• Perform ℓ1 recovery: x̂ := argmin_x {‖x‖_1 : Ax = b}
Theorem (Candès & Plan). Assume x̄ is s-sparse. Then recovery is exact with probability at least 1 − 5/n − e^{−β} provided that
  m ≥ C_0·(1 + β)·µ(F)·s·log n
for some constant C_0.

Related predecessors
Theorem (Candès & Tao). Assume x̄ is s-sparse and sample m Fourier coefficients selected at random. Then recovery is exact with probability at least 1 − O(n^{−β}) provided that m ≥ C_β·s·log n for some constant C_β that depends only on the desired accuracy β. This result is optimal: any reliable recovery method would require at least s·log n samples.
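A quick numerical illustration of the sampling theorems above (ours, not from the slides): draw isotropic Gaussian measurements of an s-sparse vector with m on the order of s·log n, and check that ℓ1 recovery is exact. It reuses the l1_min solver sketched earlier; the constant 5 in the choice of m and the problem sizes are arbitrary illustrative choices.

import numpy as np
# assumes l1_min from the earlier linear-programming sketch is in scope

rng = np.random.default_rng(0)
n, s = 200, 4
m = int(5 * s * np.log(n))            # m on the order of s*log n

x_bar = np.zeros(n)                   # s-sparse signal with random support
support = rng.choice(n, size=s, replace=False)
x_bar[support] = rng.standard_normal(s)

A = rng.standard_normal((m, n))       # Gaussian sensing: rows iid N(0, I)
b = A @ x_bar                         # m nonadaptive linear measurements

x_hat = l1_min(A, b)                  # l1 recovery
print(np.max(np.abs(x_hat - x_bar)))  # should be ~1e-8: exact up to solver tolerance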
A related exact-recovery result:
Theorem (Candès & Tao). Assume x̄ ∈ R^n is s-sparse, n is prime, and we sample m Fourier coefficients. Then x̄ can be reconstructed from the m samples if m ≥ 2s.

Sampling of non-sparse signals
Given x ∈ R^n and s < n, define x_s := argmin{‖z − x‖_2 : ‖z‖_0 ≤ s}.
Theorem (Candès & Plan). Let x ∈ R^n, β > 0 and s̄ be such that m ≥ C_β·s̄·log n. Then with probability at least 1 − 6/n − 6e^{−β} the ℓ1 solution x̂ satisfies
  ‖x̂ − x‖_1 ≤ min_{1≤s≤s̄} C·(1 + α)·‖x − x_s‖_1
for some constant C, where
  α = √( (1 + β)·s·µ(F)·log n·log m·log^2 s / m ).

Deterministic approach: restricted isometry
Restricted isometry property (RIP). Given A ∈ R^{m×n} and k ∈ {1, …, m}, the k-isometry constant δ_k is the smallest δ ≥ 0 such that
  (1 − δ)·‖x‖_2^2 ≤ ‖Ax‖_2^2 ≤ (1 + δ)·‖x‖_2^2
for all k-sparse x ∈ R^n. If δ_k < 1, we say that A satisfies the RIP with constant δ_k.

Signal recovery with a RIP matrix
Compressive sampling approach:
• Measure b_k = ⟨a_k, x̄⟩, k = 1, …, m, that is, b = Ax̄
• Perform ℓ1 recovery: x̂ := argmin_x {‖x‖_1 : Ax = b}
Theorem (Candès, Romberg, Tao). Assume δ_2s ≤ √2 − 1. Then the solution x̂ satisfies
  ‖x̂ − x̄‖_1 ≤ C·‖x̄ − x̄_s‖_1 and ‖x̂ − x̄‖_2 ≤ C·‖x̄ − x̄_s‖_1/√s
for some constant C.

Matrices that satisfy RIP
With high probability an m × n matrix A satisfies the RIP in the following cases:
• m ≳ s·log(n/s) and A is Gaussian
• m ≳ s·log(n/s) and A is binary
• m ≳ s·log^4 n and A is a partial DFT
• m ≳ µ(F)·s·log^4 n and the rows of A are iid F.

Sampling with noise
• Measurements in real life are generally noisy.
• A more appropriate model: b = Ax̄ + z, with noise term z ∼ N(0, σ^2·I).
• Assume all columns of A have Euclidean norm equal to one.
• Modify the ℓ1 minimization to account for noise (a numerical sketch appears before the references):
Lasso: min (1/2)·‖b − Ax‖_2^2 + λ·σ·‖x‖_1
Dantzig selector: min ‖x‖_1 subject to ‖A*(b − Ax)‖_∞ ≤ λ·σ

Noise-aware recovery (random sensing)
Theorem (Candès & Plan). Let x̄ ∈ R^n, β > 0 and s̄ be such that m ≥ C_β·s̄·log n. Then with probability at least 1 − 6/n − 6e^{−β} the solution x̂ to the Lasso or the Dantzig selector with λ = 10·√(log n) satisfies
  ‖x̂ − x̄‖_1 ≤ min_{1≤s≤s̄} C·(1 + α^2)·( ‖x̄ − x̄_s‖_1 + s·σ·√(log n / m) )
for some constant C.

Optimality of compressive sensing
Back to Gaussian sensing: m Gaussian measurements and ℓ1 decoding give
  ‖x̂ − x‖_2 ≲ ‖x − x_s‖_1/√s, with s ≈ m/log(n/m).
Question: Can we do better with other measurements or other algorithms?

Signals with power law
[Figure: sorted wavelet coefficients of an image on a log-log scale, exhibiting power-law decay.]
Power law decay: |x|_(1) ≥ |x|_(2) ≥ ··· ≥ |x|_(n), with |x|_(k) ≤ C·k^{−1/p}.
Model: the ℓp ball B_p := {x : ‖x‖_p ≤ 1}. We discuss the case p = 1, but the same discussion applies to 0 ≤ p ≤ 1.

Recovery of the ℓ1 ball B via Gaussian sensing
• Suppose the unknown vector is in B.
• Take m Gaussian measurements; then
  ‖x̂ − x‖_2 ≲ √( (log(n/m) + 1)/m ).
Ideal sensing: the best we can hope for from m linear measurements is
  E_m(B) = inf_{F,D} sup_{x∈B} ‖x − D(F(x))‖,
where F ranges over linear measurement maps R^n → R^m and D over decoders.

Gelfand widths
Theorem (Donoho). d_m(B) ≤ E_m(B) ≤ C·d_m(B), where d_m(B) is the m-width of B:
  d_m(B) := inf_V { sup_{x∈B} ‖P_V x‖_2 : codim(V) < m }.
Theorem (Kashin, Garnaev–Gluskin). For the ℓ1 ball,
  C_1·√( (log(n/m) + 1)/m ) ≤ d_m(B) ≤ C_2·√( (log(n/m) + 1)/m ).
Compressive sensing achieves the limits of performance.
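Before the references, here is the promised numerical sketch of noise-aware recovery (ours, not from the slides). It minimizes the Lasso objective above by iterative soft-thresholding (ISTA), a simple proximal-gradient method; the step size, iteration count, and the practical choice λ = √(2·log n) (the theorem above uses λ = 10·√(log n)) are illustrative assumptions.

import numpy as np

def lasso_ista(A, b, penalty, n_iter=3000):
    # Minimize (1/2)*||b - Ax||_2^2 + penalty*||x||_1 by ISTA.
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)           # gradient of the smooth term
        z = x - grad / L                   # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - penalty / L, 0.0)  # soft-threshold
    return x

rng = np.random.default_rng(1)
n, s, m, sigma = 200, 4, 120, 0.05
A = rng.standard_normal((m, n))
A /= np.linalg.norm(A, axis=0)             # unit-norm columns, as assumed above
x_bar = np.zeros(n)
x_bar[rng.choice(n, size=s, replace=False)] = rng.standard_normal(s)
b = A @ x_bar + sigma * rng.standard_normal(m)   # noisy measurements
lam = np.sqrt(2 * np.log(n))
x_hat = lasso_ista(A, b, penalty=lam * sigma)
print(np.linalg.norm(x_hat - x_bar))       # should be small: recovery is stable to noise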
References for today's material
• Slides for this minicourse: http://andrew.cmu.edu/user/jfp/UNencuentro
• E. Candès & Y. Plan, "A probabilistic and RIPless theory of compressed sensing," IEEE Trans. Inf. Theory, vol. 57, no. 11, pp. 7235–7254, Nov. 2011.
• E. Candès, J. Romberg, & T. Tao, "Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information," IEEE Trans. Inf. Theory, vol. 52, no. 2, pp. 489–509, Feb. 2006.
• E. Candès & T. Tao, "Decoding by linear programming," IEEE Trans. Inf. Theory, vol. 51, no. 12, pp. 4203–4215, Dec. 2005.
• D. Donoho, "Compressed sensing," IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289–1306, April 2006.
• Papers on compressive sensing: http://dsp.rice.edu/cs