Recent Developments in the
Sparse Fourier Transform
Piotr Indyk
MIT
Basics
• Overview of Sparse Fourier Transforms
– Faster Fourier Transform for signals with sparse spectra
• Elena will help me with that (thanks, Elena!)
• Based on lectures 1…6 from my course “Algorithms and
Signal Processing”, available at
https://stellar.mit.edu/S/course/6/fa14/6.893/materials.html
• Will try to compress 6 x 80 mins into 4 hours
– Will try to minimize the distortion 
• Rough plan:
– Overview
– Algorithms for signals with one non-zero/large Fourier coefficient
– Average case algorithms
– Worst case algorithms
Sparsity
• Signal a = a0…an-1
• Sparse approximation: approximate a by a’ parametrized by k coefficients, k << n
• Applications:
– Compression
– Denoising
– Machine learning
– …
Sparse approximations – transform coding
• Transform coding:
– Compute Ta, where T is an n×n orthonormal matrix
– Compute Hk(Ta) by keeping only the largest* k coefficients of Ta
– T^(-1)(Hk(Ta)) is a k-sparse approximation to a w.r.t. T
*In magnitude
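As an illustration, the three steps above can be sketched in a few lines of numpy. This is a minimal sketch with function names of my own choosing; the unitary DFT matrix stands in for a generic orthonormal T:

```python
import numpy as np

def hard_threshold(x, k):
    """H_k: keep the k largest-magnitude entries of x, zero out the rest."""
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]
    out[idx] = x[idx]
    return out

def transform_code(a, T, k):
    """k-sparse approximation of a w.r.t. orthonormal T: T^(-1)(H_k(T a))."""
    return T.conj().T @ hard_threshold(T @ a, k)  # T^(-1) = T* for unitary T

# Unitary DFT matrix as the example transform
n = 16
u, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
T = np.exp(-2j * np.pi * u * j / n) / np.sqrt(n)

a = np.cos(2 * np.pi * 3 * np.arange(n) / n)      # spectrum has exactly 2 nonzeros
print(np.allclose(transform_code(a, T, k=2), a))  # True: 2-sparse signal, exact recovery
```

Because T is orthonormal, the reconstruction error for a signal that is only approximately sparse is exactly the energy of the discarded coefficients.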
Transform coding: examples
[Figure: image compression examples: Fourier transform + thresholding + inverse FT, and wavelet transform + thresholding + inverse WT; sparsity k = 0.1n]
(Discrete) Fourier Transform
• Input (time domain): a = a0…an-1
– Today: n = 2^l
• Output (frequency domain): â = â0…ân-1, where
âu = (1/n) Σj aj e^(-2πi uj/n)   (DFT)
• Notation: ω = ωn = e^(2πi/n) is the n-th root of unity
• Then
âu = (1/n) Σj aj ω^(-uj)
• Matrix notation: â = F a
– Fuj = (1/n) ω^(-uj)
– (F^(-1))ju = ω^(uj)
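To keep the normalization straight, here is a direct O(n²) evaluation of the definition, checked against numpy's FFT, which uses the same sign convention but omits the 1/n factor (a sketch; `dft` is my own name for it):

```python
import numpy as np

def dft(a):
    """â_u = (1/n) Σ_j a_j ω^(-uj), with ω = e^(2πi/n)."""
    n = len(a)
    j = np.arange(n)
    omega = np.exp(2j * np.pi / n)
    return np.array([(a * omega ** (-u * j)).sum() / n for u in range(n)])

rng = np.random.default_rng(0)
a = rng.standard_normal(8)
print(np.allclose(dft(a), np.fft.fft(a) / len(a)))  # True: same transform up to 1/n
```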
Computing â = F a
• Matrix-vector multiplication – O(n2) time
• Fast Fourier Transform: computes â from a (and
vice versa) in O(n log n) time
• One can then sort the coefficients and select the top k of them – O(n log n) time
• O(n log n) time overall … just to compute k coefficients
• Sparse Fourier Transform:
– Directly computes the k largest coefficients of â
(approximately - formal definition in a moment)
– Running time: as low as O(k log n)
(more details in a moment)
– Sub-linear time – cannot read the whole input
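The FFT-then-select baseline looks like this (a sketch with names of my own; `np.argpartition` avoids a full sort, though the FFT already costs O(n log n)):

```python
import numpy as np

def top_k_coeffs(a, k):
    """Largest-k (in magnitude) Fourier coefficients via a full FFT."""
    ahat = np.fft.fft(a) / len(a)                 # 1/n normalization, as in the definition
    idx = np.argpartition(np.abs(ahat), -k)[-k:]  # top-k indices without a full sort
    return {int(u): ahat[u] for u in idx}

n = 32
t = np.arange(n)
a = 5 * np.exp(2j * np.pi * 7 * t / n) + 0.01 * np.exp(2j * np.pi * 3 * t / n)
coeffs = top_k_coeffs(a, k=1)
print(sorted(coeffs))  # [7] -- the dominant frequency
```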
Sparse Fourier Transform (k=1)
• Warmup: â is exactly 1-sparse, i.e., âu’ =0
for all u’≠u, for some u
• I.e., the signal is “pure” – only one
frequency is present
• Need to find u and âu
Two-point sampling
• We have
aj = âu ω^(uj)
• Sample a0, a1
• Observe:
– a0 = âu
– a1 = âu ω^u, so a1/a0 = ω^u = e^(2πi u/n)
– Can read u from the angle of ω^u
• Constant-time algorithm!
• Unfortunately, it relies heavily on the signal being pure
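A sketch of the two-sample recovery (function name is mine); rounding the angle to the nearest multiple of 2π/n, then reducing mod n, also handles frequencies above n/2, whose angle comes out negative:

```python
import numpy as np

def recover_pure_frequency(a0, a1, n):
    """For a_j = â_u ω^(uj): a1/a0 = ω^u, so u is angle(a1/a0) in units of 2π/n."""
    angle = np.angle(a1 / a0)                  # in (-pi, pi]
    return round(angle / (2 * np.pi / n)) % n  # % n maps negative angles back to u

n, coeff = 64, 3.0 - 2.0j
t = np.arange(n)
for u in (11, 50):                                      # one below n/2, one above
    a = coeff * np.exp(2j * np.pi * u * t / n)          # exactly 1-sparse spectrum
    print(recover_pure_frequency(a[0], a[1], n), end=" ")  # 11 50
# a[0] itself is the coefficient â_u
```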
What if â is not quite 1-sparse?
• Ideally, would like to find the 1-sparse â* that contains the largest entry of â
• We will need to allow approximation: compute a 1-sparse â’ such that
||â − â’||2 ≤ C ||â − â*||2
where C is the approximation factor
– L2/L2 guarantee
• The definition naturally extends to general sparsity k
Interlude: History of the Sparse Fourier Transform
• Hadamard Transform: [Goldreich-Levin’89, Kushilevitz-Mansour’91]
• Fourier Transform:
– Mansour’92: poly(k, log n)
• Assuming coefficients are integers from –poly(n)…poly(n)
– Gilbert-Guha-Indyk-Muthukrishnan-Strauss’02: k2 poly(log n)
– Gilbert-Muthukrishnan-Strauss’04: k poly(log n)
– Hassanieh-Indyk-Katabi-Price’12: k log n log(n/k)
(or k log n for signals that are exactly k sparse)
– Akavia-Goldwasser-Safra’03, Iwen’08, Akavia’10,…
– Many papers since 2012 (see later lectures)
(Discrete) Fourier Transform
• Input (time domain): a = a0…an-1
– Today: n = 2^l
• Output (frequency domain): â = â0…ân-1, where
âu = (1/n) Σj aj e^(-2πi uj/n)   (DFT)
• Notation: ω = ωn = e^(2πi/n) = cos(2π/n) + i sin(2π/n) is the n-th root of unity
• Then
âu = (1/n) Σj aj ω^(-uj)
• Matrix notation: â = F a
– Fuj = (1/n) ω^(-uj)
– (F^(-1))ju = ω^(uj)
Back to k=1
• Compute a 1-sparse â’ such that
||â − â’||2 ≤ C ||â − â*||2
• Note that the guarantee is meaningful only for signals where C ||â − â*||2 < ||â||2
– Otherwise, can just report â’ = 0
• So we will assume Σu’≠u âu’² < ε âu² holds for some u, for small ε = ε(C)
• Will describe the algorithm for the exactly 1-sparse case first, and deal with the epsilons later
• We will find u bit by bit (method introduced in GGIMS’02 and GMS’04)
Bit 0: compute u mod 2
• We assumed aj = âu ω^(uj)
• Suppose u = 2v + b; we want b
• Sample:
– ar = âu ω^(ur)
– an/2+r = âu ω^(u·n/2) ω^(ur) = âu ω^(2v·n/2 + b·n/2) ω^(ur) = âu (−1)^b ω^(ur),
since ω^(vn) = 1 and ω^(n/2) = −1
• Test: b = 0 iff |a0 − an/2| < |a0 + an/2|
• Actual test: b = 0 iff |ar − an/2+r| < |ar + an/2+r|,
where r is chosen uniformly at random from 0…n-1
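The parity test can be written directly (a minimal sketch, names mine); for an exactly 1-sparse signal the outcome is the same for every shift r:

```python
import numpy as np

def parity_test(a, n, r):
    """b = u mod 2: b = 0 iff |a_r - a_{n/2+r}| < |a_r + a_{n/2+r}|."""
    x, y = a[r], a[(n // 2 + r) % n]
    return 0 if abs(x - y) < abs(x + y) else 1

n = 16
t = np.arange(n)
for u in (6, 9):                                # one even, one odd frequency
    a = (1.5 + 0.5j) * np.exp(2j * np.pi * u * t / n)
    print(parity_test(a, n, r=3), end=" ")      # 0 1  (= u mod 2)
```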
Bit 1
• Can pretend that b = u mod 2 = 0. Why?
– Claim: if a’j = aj ω^(-bj), then â’w = âw+b
(“Time shift” theorem; follows directly from the definition of the FT)
– So if b = 1, we replace a by a’, which moves the large coefficient from u to the even frequency u − 1
• Since u mod 2 = 0, we have u = 2v, and therefore
aj = âu ω^(uj) = â2v (ωn)^(2vj) = â’v (ωn/2)^(vj)
for the frequency vector â’v = â2v, v = 0…n/2-1
• So, a0…an/2-1 are time samples of â’0…â’n/2-1, and â’v is the large coefficient in â’
• Bit 1 of u is bit 0 of v, so we can compute it by sampling a0…an/2-1 as before
• I.e., v = 0 mod 2 iff |ar − an/4+r| < |ar + an/4+r|
Bit reading recap
• Bit b0: test if
|ar − an/2+r| < |ar + an/2+r|
• Bit b1: test if
|ar ω^(-r b0) − an/4+r ω^(-(n/4+r) b0)| < |ar ω^(-r b0) + an/4+r ω^(-(n/4+r) b0)|
• Bit b2: test if
|ar ω^(-r (b0+2b1)) − an/8+r ω^(-(n/8+r)(b0+2b1))| < |ar ω^(-r (b0+2b1)) + an/8+r ω^(-(n/8+r)(b0+2b1))|
• …
• Altogether, we use O(log n) samples to identify u
• O(log n) time algorithm in the exactly 1-sparse
case
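Putting the bit tests together for the exactly 1-sparse case gives an O(log n)-sample routine. A sketch with names of my own; r is fixed to 0 since the signal is pure. At step i the samples are demodulated by the bits m found so far, which leaves the large coefficient at u − m, a multiple of 2^i, so comparing two samples an offset n/2^(i+1) apart reveals bit i:

```python
import numpy as np

def find_frequency(a, n):
    """Recover u from a_j = â_u ω^(uj) bit by bit, using O(log n) samples."""
    omega = np.exp(2j * np.pi / n)
    u, r = 0, 0
    for i in range(n.bit_length() - 1):       # log2(n) bits
        s = n >> (i + 1)                      # offset n / 2^(i+1)
        x = a[r] * omega ** (-u * r)          # demodulate by bits found so far
        y = a[(s + r) % n] * omega ** (-u * (s + r))
        u += (0 if abs(x - y) < abs(x + y) else 1) << i
    return u

n = 64
t = np.arange(n)
for true_u in (0, 1, 13, 42, 63):
    a = (2.0 + 1.0j) * np.exp(2j * np.pi * true_u * t / n)
    print(find_frequency(a, n), end=" ")      # 0 1 13 42 63
```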
Dealing with noise
• We now have
aj = âu ω^(uj) + Σu’≠u âu’ ω^(u’j) = âu ω^(uj) + μj
• Observe that Σj |μj|² = n Σu’≠u âu’²
• Therefore, if we choose r at random from 0…n-1, we have
Er[|μr|²] = Σu’≠u âu’²
• Since we assumed Σu’≠u âu’² < ε âu², it follows that
E[|μr|²] < ε âu²
Dealing with noise II
• Claim: for all values of μr, μn/2+r satisfying
2 (|μr| + |μn/2+r|) < |âu|,
the outcome of the comparison
|ar − an/2+r| < |ar + an/2+r|
is the same as in the noiseless case.
• From Markov’s inequality we have
Prr[ |μr| > |âu|/4 ] = Prr[ |μr|² > |âu|²/16 ] ≤ 16ε
• The same probability bound holds for μn/2+r
• Corollary: if 32ε < 1/3, then a bit test is correct with probability > 2/3
Finding the heavy hitter: algorithm/analysis
• For each bit bi, make O(log log n)
independent tests, and use a majority vote
• From the Chernoff bound, the probability that bi is estimated incorrectly is < 1/(4 log n)
• The probability that the coordinate is
estimated incorrectly is at most
log n/(4 log n) =1/4
• Total complexity: O(log n * log log n)
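The whole noisy-case identification, combining random shifts r with a majority vote per bit, can be sketched as follows (function name and the noise level in the demo are my own choices):

```python
import numpy as np

def find_frequency_noisy(a, n, reps, rng):
    """Bit-by-bit recovery of the heavy frequency; majority vote over reps random shifts."""
    omega = np.exp(2j * np.pi / n)
    u = 0
    for i in range(n.bit_length() - 1):
        s = n >> (i + 1)
        votes = 0
        for _ in range(reps):                   # O(log log n) repetitions per bit
            r = int(rng.integers(n))
            x = a[r] * omega ** (-u * r)        # demodulate by bits found so far
            y = a[(s + r) % n] * omega ** (-u * (s + r))
            votes += 0 if abs(x - y) < abs(x + y) else 1
        u += (votes * 2 > reps) << i            # majority vote on bit i
    return u

rng = np.random.default_rng(7)
n, true_u = 256, 179
t = np.arange(n)
tail = 0.02 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
a = np.exp(2j * np.pi * true_u * t / n) + tail   # one heavy coefficient + small noise
print(find_frequency_noisy(a, n, reps=9, rng=rng))  # 179
```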
Estimating the value of the heavy hitter
• Recall that
aj = âu ω^(uj) + Σu’≠u âu’ ω^(u’j) = âu ω^(uj) + μj
where E[|μr|²] = Σu’≠u âu’² = ||â − â*||2²
• By Markov’s inequality
Prr[ |μr| > 2 ||â − â*||2 ] ≤ 1/4
• If that event does not occur, the estimate â’u = ar ω^(-ur) = âu + μr ω^(-ur) satisfies
|â’u − âu| ≤ 2 ||â − â*||2
• If we set â’u’ = 0 for all u’ ≠ u, then
||â − â’||2 ≤ |â’u − âu| + ||â − â*||2 ≤ 3 ||â − â*||2
so â’ satisfies the L2/L2 guarantee