Recent Developments in the Sparse Fourier Transform
Piotr Indyk, MIT

Basics
• Overview of Sparse Fourier Transforms
  – Faster Fourier Transform for signals with sparse spectra
• Elena will help me with that (thanks, Elena!)
• Based on lectures 1…6 from my course “Algorithms and Signal Processing”, available at https://stellar.mit.edu/S/course/6/fa14/6.893/materials.html
• Will try to compress 6 × 80 minutes into 4 hours
  – Will try to minimize the distortion
• Rough plan:
  – Overview
  – Algorithms for signals with one non-zero/large Fourier coefficient
  – Average case algorithms
  – Worst case algorithms

Sparsity
• Signal $a = a_0 \dots a_{n-1}$
• Sparse approximation: approximate $a$ by $a'$ parametrized by $k$ coefficients, $k \ll n$
• Applications:
  – Compression
  – Denoising
  – Machine learning
  – …

Sparse approximations – transform coding
• Transform coding:
  – Compute $Ta$, where $T$ is an $n \times n$ orthonormal matrix
  – Compute $H_k(Ta)$ by keeping only the $k$ largest* coefficients of $Ta$
  – $T^{-1}(H_k(Ta))$ is a $k$-sparse approximation to $a$ w.r.t. $T$
*In magnitude

Transform coding: examples
[Figure: panels “Fourier Transform + thresholding + Inverse FT” and “Wavelet Transform + thresholding + Inverse WT”; sparsity $k = 0.1n$]

(Discrete) Fourier Transform
• Input (time domain): $a = a_0 \dots a_{n-1}$
  – Today: $n = 2^l$
• Output (frequency domain): $\hat a = \hat a_0 \dots \hat a_{n-1}$, where (DFT)
  $\hat a_u = \frac{1}{n} \sum_j a_j e^{-2\pi i uj/n}$
• Notation: $\omega = \omega_n = e^{2\pi i/n}$ is a primitive $n$-th root of unity
• Then $\hat a_u = \frac{1}{n} \sum_j a_j \omega^{-uj}$
• Matrix notation: $\hat a = F a$, where $F_{uj} = \frac{1}{n} \omega^{-uj}$ and $(F^{-1})_{ju} = \omega^{uj}$

Computing $\hat a = F a$
• Matrix-vector multiplication: $O(n^2)$ time
• Fast Fourier Transform: computes $\hat a$ from $a$ (and vice versa) in $O(n \log n)$ time
• One can then sort the coefficients and select the top $k$ of them: $O(n \log n)$ time
• $O(n \log n)$ time overall,
  just to compute $k$ coefficients
• Sparse Fourier Transform:
  – Directly computes the $k$ largest coefficients of $\hat a$ (approximately; formal definition in a moment)
  – Running time: as low as $O(k \log n)$ (more details in a moment)
  – Sub-linear time: cannot even read the whole input

Sparse Fourier Transform ($k = 1$)
• Warmup: $\hat a$ is exactly 1-sparse, i.e., $\hat a_{u'} = 0$ for all $u' \neq u$, for some $u$
• I.e., the signal is “pure”: only one frequency is present
• Need to find $u$ and $\hat a_u$

Two-point sampling
• We have $a_j = \hat a_u \omega^{uj}$
• Sample $a_0, a_1$
• Observe:
  – $a_0 = \hat a_u$
  – $a_1 = \hat a_u \omega^u$
  – $a_1 / a_0 = \omega^u = e^{2\pi i u/n}$
  – Can read $u$ from the angle $2\pi u/n$ of $\omega^u$
• Constant time algorithm!
• Unfortunately, it relies heavily on the signal being pure

What if $\hat a$ is not quite 1-sparse?
• Ideally, we would like to find the 1-sparse $\hat a^*$ that contains the largest entry of $\hat a$
• We will need to allow approximation: compute a 1-sparse $\hat a'$ such that
  $\|\hat a - \hat a'\|_2 \le C \|\hat a - \hat a^*\|_2$,
  where $C$ is the approximation factor
  – L2/L2 guarantee
• The definition naturally extends to general sparsity $k$

Interlude: History of Sparse Fourier Transform
• Hadamard Transform: [Goldreich-Levin’89, Kushilevitz-Mansour’91]
• Fourier Transform:
  – Mansour’92: $\mathrm{poly}(k, \log n)$
    • Assuming the coefficients are integers from $-\mathrm{poly}(n) \dots \mathrm{poly}(n)$
  – Gilbert-Guha-Indyk-Muthukrishnan-Strauss’02: $k^2 \, \mathrm{poly}(\log n)$
  – Gilbert-Muthukrishnan-Strauss’04: $k \, \mathrm{poly}(\log n)$
  – Hassanieh-Indyk-Katabi-Price’12: $k \log n \log(n/k)$ (or $k \log n$ for signals that are exactly $k$-sparse)
  – Akavia-Goldwasser-Safra’03, Iwen’08, Akavia’10, …
  – Many papers since 2012 (see later lectures)

(Discrete) Fourier Transform (recap)
• $\hat a_u = \frac{1}{n} \sum_j a_j \omega^{-uj}$, where $n = 2^l$ and $\omega = \omega_n = e^{2\pi i/n} = \cos(2\pi/n) + i \sin(2\pi/n)$ is a primitive $n$-th root of unity
• Matrix notation: $\hat a = F a$, with $F_{uj} = \frac{1}{n} \omega^{-uj}$ and $(F^{-1})_{ju} = \omega^{uj}$, i.e., $a_j = \sum_u \hat a_u \omega^{uj}$

Back to $k = 1$
• Compute a 1-sparse $\hat a'$ such that $\|\hat a - \hat a'\|_2 \le C \|\hat a - \hat a^*\|_2$
• Note that the guarantee is meaningful only for signals where
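  $C \|\hat a - \hat a^*\|_2 < \|\hat a\|_2$
  – Otherwise, we can just report $\hat a' = 0$
• So we will assume that $\sum_{u' \neq u} |\hat a_{u'}|^2 < \varepsilon |\hat a_u|^2$ holds for some $u$, for a small $\varepsilon = \varepsilon(C)$
• We will describe the algorithm for the exactly 1-sparse case first, and deal with the epsilons later
• We will find $u$ bit by bit (method introduced in GGIMS’02 and GMS’04)

Bit 0: compute $u \bmod 2$
• We assumed $a_j = \hat a_u \omega^{uj}$
• Suppose $u = 2v + b$; we want $b$
• Sample:
  – $a_{0+r} = \hat a_u \omega^{ur}$
  – $a_{n/2+r} = \hat a_u \omega^{u n/2} \omega^{ur} = \hat a_u \omega^{2v \cdot n/2 + b \cdot n/2} \omega^{ur} = \hat a_u (-1)^b \omega^{ur}$
• Test: $b = 0$ iff $|a_0 - a_{n/2}| < |a_0 + a_{n/2}|$
• Actual test: $b = 0$ iff $|a_r - a_{n/2+r}| < |a_r + a_{n/2+r}|$, where $r$ is chosen uniformly at random from $0 \dots n-1$ (indices are taken modulo $n$)

Bit 1
• We can pretend that $b = u \bmod 2 = 0$. Why?
  – Claim: if $a'_j = a_j \omega^{-bj}$, then $\hat a'_w = \hat a_{w+b}$ (“time shift” theorem; follows directly from the definition of the FT)
  – If $b = 1$, we replace $a$ by $a'$: the large coefficient moves from frequency $u$ to $u - b$, which is even and has the same higher-order bits as $u$
• Since $u \bmod 2 = 0$, we have $u = 2v$, and therefore
  $a_j = \hat a_u \omega^{uj} = \hat a_{2v} \omega^{2vj} = \hat a'_v \, \omega_{n/2}^{vj}$
  for the frequency vector $\hat a'_v = \hat a_{2v}$, $v = 0 \dots n/2-1$ (recall $\omega_n^2 = \omega_{n/2}$)
• So $a_0 \dots a_{n/2-1}$ are the time samples of $\hat a'_0 \dots \hat a'_{n/2-1}$, and $\hat a'_v$ is the large coefficient in $\hat a'$
• Bit 1 of $u$ is bit 0 of $v$, so we can compute it by sampling $a_0 \dots a_{n/2-1}$ as before
• I.e., $v = 0 \bmod 2$ iff $|a_r - a_{n/4+r}| < |a_r + a_{n/4+r}|$

Bit reading recap (in each test, the bit is 0 iff the inequality holds)
• Bit $b_0$: test if $|a_r - a_{n/2+r}| < |a_r + a_{n/2+r}|$
• Bit $b_1$: test if $|a_r \omega^{-r b_0} - a_{n/4+r} \omega^{-(n/4+r) b_0}| < |a_r \omega^{-r b_0} + a_{n/4+r} \omega^{-(n/4+r) b_0}|$
• Bit $b_2$: test if $|a_r \omega^{-r(b_0+2b_1)} - a_{n/8+r} \omega^{-(n/8+r)(b_0+2b_1)}| < |a_r \omega^{-r(b_0+2b_1)} + a_{n/8+r} \omega^{-(n/8+r)(b_0+2b_1)}|$
• …
• Altogether, we use $O(\log n)$ samples to identify $u$
• $O(\log n)$ time algorithm in the exactly 1-sparse case

Dealing with noise
• We now have $a_j = \hat a_u \omega^{uj} + \sum_{u' \neq u} \hat a_{u'} \omega^{u'j} = \hat a_u \omega^{uj} + \mu_j$
• Observe that $\sum_j |\mu_j|^2 = n \sum_{u' \neq u} |\hat a_{u'}|^2$ (Parseval, with our normalization of the DFT)
• Therefore, if we choose $r$ uniformly at random from $0 \dots n-1$, we have $E_r[|\mu_r|^2] = \sum_{u' \neq u} |\hat a_{u'}|^2$
• Since we assumed $\sum_{u' \neq u} |\hat a_{u'}|^2 < \varepsilon |\hat a_u|^2$, it follows that $E_r[|\mu_r|^2] < \varepsilon |\hat a_u|^2$

Dealing with noise II
• Claim: for all values of $\mu_r, \mu_{n/2+r}$ satisfying $2(|\mu_r| + |\mu_{n/2+r}|) < |\hat a_u|$, the outcome of the comparison $|a_r - a_{n/2+r}| < |a_r + a_{n/2+r}|$ is the same as in the noiseless case.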
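• From the Markov inequality we have $\Pr_r[|\mu_r| > |\hat a_u|/4] = \Pr_r[|\mu_r|^2 > |\hat a_u|^2/16] \le 16\varepsilon$
• The same probability bound holds for $\mu_{n/2+r}$ (since $n/2 + r$ is also uniform over $0 \dots n-1$)
• Corollary (union bound): if $32\varepsilon < 1/3$, then a bit test is correct with probability $> 2/3$
(Recall: $a_j = \hat a_u \omega^{uj} + \mu_j$ and $E_r[|\mu_r|^2] < \varepsilon |\hat a_u|^2$.)

Finding the heavy hitter: algorithm/analysis
• For each bit $b_i$, make $O(\log \log n)$ independent tests, and use a majority vote
• From the Chernoff bound, the probability that $b_i$ is estimated incorrectly is $< 1/(4 \log n)$
• By the union bound over the $\log n$ bits, the probability that the coordinate $u$ is estimated incorrectly is at most $\log n / (4 \log n) = 1/4$
• Total complexity: $O(\log n \cdot \log \log n)$

Estimating the value of the heavy hitter
• Recall that $a_j = \hat a_u \omega^{uj} + \sum_{u' \neq u} \hat a_{u'} \omega^{u'j} = \hat a_u \omega^{uj} + \mu_j$, where $E_r[|\mu_r|^2] = \sum_{u' \neq u} |\hat a_{u'}|^2 = \|\hat a - \hat a^*\|_2^2$
• By the Markov inequality, $\Pr_r[|\mu_r| > 2\|\hat a - \hat a^*\|_2] = \Pr_r[|\mu_r|^2 > 4\|\hat a - \hat a^*\|_2^2] \le 1/4$
• Otherwise (i.e., with probability at least 3/4), the estimate $\hat a'_u = a_r \omega^{-ur}$ satisfies $|\hat a'_u - \hat a_u| = |\mu_r| \le 2\|\hat a - \hat a^*\|_2$
• If we set $\hat a'_{u'} = 0$ for all $u' \neq u$, then
  $\|\hat a - \hat a'\|_2 \le |\hat a'_u - \hat a_u| + \|\hat a - \hat a^*\|_2 \le 3\|\hat a - \hat a^*\|_2$,
  so $\hat a'$ satisfies the L2/L2 guarantee with $C = 3$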
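The bit-by-bit procedure, the randomized noise-robust bit test, and the single-sample value estimate can be put together in a short sketch. This is a minimal illustration under stated assumptions, not the lecture's reference implementation: the signal is accessed only through a callable `sample(j)` returning $a_j$ (so the number of samples read stays $O(\log n \cdot \text{votes})$), $n$ is a power of two, and the helper names `make_signal` and `recover_one_sparse` and the fixed `votes` parameter (standing in for the $O(\log \log n)$ tests per bit) are hypothetical.

```python
import cmath
import random


def make_signal(coeffs, n):
    """Hypothetical helper: return t -> a_t = sum_u coeffs[u] * omega^(u*t),
    with omega = e^(2*pi*i/n), i.e. the inverse DFT of a sparse spectrum."""
    def sample(t):
        return sum(c * cmath.exp(2j * cmath.pi * u * t / n) for u, c in coeffs.items())
    return sample


def recover_one_sparse(sample, n, votes=9, rng=None):
    """Sketch: find (u, a_hat_u) for a signal dominated by one frequency u.

    sample(t) returns a_t; n must be a power of two. Each bit of u is decided
    by a majority vote over `votes` randomized two-point tests.
    """
    rng = rng or random.Random()

    def omega(e):
        return cmath.exp(2j * cmath.pi * e / n)

    u, bit, step = 0, 0, n // 2           # u holds the low-order bits found so far
    while step >= 1:                      # step = n/2, n/4, ..., 1 -> log2(n) bits
        ones = 0
        for _ in range(votes):
            r = rng.randrange(n)
            s = (r + step) % n            # indices taken modulo n
            # Time shift: multiplying a_t by omega^(-u*t) moves the large coefficient
            # from frequency u_true to u_true - u, a multiple of 2^bit.
            x = sample(r) * omega(-u * r)
            y = sample(s) * omega(-u * s)
            # Noiseless case: y = (-1)^b * x, where b is bit `bit` of u_true.
            if abs(x - y) >= abs(x + y):
                ones += 1
        if 2 * ones > votes:              # majority vote says the bit is 1
            u |= 1 << bit
        bit += 1
        step //= 2
    # Value estimate from one random sample: a_hat'_u = a_r * omega^(-u*r).
    r = rng.randrange(n)
    return u, sample(r) * omega(-u * r)
```

For example, `make_signal({300: 1.0, 17: 0.05, 900: 0.05j}, 1024)` is dominated by frequency 300. Here every $|\mu_j| \le 0.1$, so $2(|\mu_r| + |\mu_{s}|) \le 0.4 < |\hat a_{300}|$ for every choice of $r$, and by the claim in "Dealing with noise II" each bit test is correct regardless of the random choices: `recover_one_sparse` returns `u == 300` and a value within 0.1 of 1. With heavier noise, the tests only succeed with probability $> 2/3$ each, which is what the majority vote is for.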
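For comparison with the sub-linear algorithm, the transform-coding baseline (compute $\hat a = F a$, keep the $k$ largest coefficients, invert) can be sketched in a few lines of pure Python. This is a minimal sketch, assuming $T$ is the DFT with the $1/n$ normalization used in these notes; the function names `dft`, `idft`, `top_k` and `transform_code` are hypothetical. It uses the direct $O(n^2)$ matrix-vector product rather than an FFT. With this normalization $F$ is not itself orthonormal ($\sqrt{n}\,F$ is unitary), but scaling does not change which $k$ coefficients are largest.

```python
import cmath


def dft(a):
    """a_hat_u = (1/n) * sum_t a_t * omega^(-u*t), omega = e^(2*pi*i/n); O(n^2) time."""
    n = len(a)
    return [sum(a[t] * cmath.exp(-2j * cmath.pi * u * t / n) for t in range(n)) / n
            for u in range(n)]


def idft(a_hat):
    """a_t = sum_u a_hat_u * omega^(u*t), the inverse of dft above."""
    n = len(a_hat)
    return [sum(a_hat[u] * cmath.exp(2j * cmath.pi * u * t / n) for u in range(n))
            for t in range(n)]


def top_k(a_hat, k):
    """H_k: keep the k coefficients of largest magnitude and zero out the rest."""
    order = sorted(range(len(a_hat)), key=lambda u: abs(a_hat[u]), reverse=True)
    keep = set(order[:k])
    return [c if u in keep else 0j for u, c in enumerate(a_hat)]


def transform_code(a, k):
    """k-sparse approximation of a w.r.t. the DFT: T^{-1}(H_k(T a))."""
    return idft(top_k(dft(a), k))
```

Applied to a two-tone signal with amplitudes 1 (frequency 3) and 0.5 (frequency 10), `transform_code(a, 1)` returns just the stronger tone. Replacing `dft`/`idft` with an FFT gives the $O(n \log n)$ running time discussed earlier; the sorting step in `top_k` also costs $O(n \log n)$.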