Notes on Testing Linearity of Functions
We start with some definitions and notation. Let $f : \{0,1\}^m \to \{0,1\}$. (All that we shall show extends to functions $f : G \to H$, where $G$ and $H$ are groups.)
Definition 1 We say that $f$ is a linear function if there exist coefficients $b_1, \ldots, b_m \in \{0,1\}$ such that for $x = x_1, \ldots, x_m \in \{0,1\}^m$, $f(x) = \sum_{i=1}^{m} b_i \cdot x_i$ (where the sum is taken modulo 2). In other words, there exists a subset $S \subseteq \{1, \ldots, m\}$ such that $f(x) = \sum_{i \in S} x_i$.
Given query access to a function $f : \{0,1\}^m \to \{0,1\}$ and a parameter $\epsilon$, we would like to test whether $f$ is a linear function or whether it is $\epsilon$-far from being linear. By the latter we mean that for every linear function $g$, ${\rm dist}(f, g) > \epsilon$, where ${\rm dist}(f, g) = \Pr_{x \in \{0,1\}^m}[f(x) \neq g(x)]$ and the probability is taken over the uniform choice of $x$.
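As a rough illustration of these definitions, the following Python sketch computes ${\rm dist}(f, g)$ and the distance of $f$ to the closest linear function by brute force. It assumes that points of $\{0,1\}^m$ are encoded as the integers $0, \ldots, 2^m - 1$ (so that addition modulo 2 is bitwise XOR) and that a subset $S$ is encoded as a bitmask; all helper names are illustrative and not part of the notes. The enumeration is exponential in $m$, so it is meant only for tiny examples.

```python
def parity(v):
    """Sum modulo 2 of the bits of the integer v."""
    return bin(v).count("1") % 2

def linear_function(s):
    """The linear function x -> sum_{i in S} x_i (mod 2), with S given as bitmask s."""
    return lambda x: parity(x & s)

def dist(f, g, m):
    """Fraction of points of {0,1}^m (encoded as 0..2^m-1) on which f and g differ."""
    n = 1 << m
    return sum(f(x) != g(x) for x in range(n)) / n

def distance_to_linearity(f, m):
    """Minimum of dist(f, g) over all 2^m linear functions g (brute force)."""
    return min(dist(f, linear_function(s), m) for s in range(1 << m))

# Example: the linear function x_1 + x_2 on m = 3 bits, with one value flipped.
m = 3
base = linear_function(0b011)
corrupted = lambda x: base(x) ^ (x == 0b101)
```

Here `distance_to_linearity(corrupted, m)` is 1/8: the corrupted function differs from `base` in one of eight points, and any other linear function differs from `base` on half of the points.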
An alternative interpretation of this task is from the viewpoint of coding theory. Namely, if we think of each function $f$ as a string of length $n = 2^m$, where each position in the string corresponds to a vector $x \in \{0,1\}^m$, then the strings that correspond to linear functions constitute the (shortened) first-order Reed-Muller code (also referred to as the Hadamard code). So now our task can be described as (local) testing of the Hadamard code. Namely, given query access to a string (word) we would like to determine whether it belongs to the code or whether it is more than $\epsilon$-far from every codeword. As we shall see, from our analysis we also get a randomized self-correction procedure.
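To make the coding view concrete, here is a small sketch that writes out the codeword associated with a coefficient vector $b$, namely the truth table of $x \mapsto \sum_i b_i x_i$. The integer encoding of vectors and the function names are assumptions made for this illustration.

```python
def hadamard_encode(b, m):
    """Codeword of length 2^m for the m-bit message b: position x holds <b, x> mod 2."""
    return [bin(b & x).count("1") % 2 for x in range(1 << m)]

def relative_distance(u, v):
    """Fraction of positions in which the equal-length words u and v differ."""
    return sum(a != c for a, c in zip(u, v)) / len(u)

codeword = hadamard_encode(0b101, 3)   # truth table of x_1 + x_3 on 3 bits
```

Any two distinct codewords produced this way differ in exactly half of their positions, which is why a word can be close to at most one codeword when the distance is below 1/4.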
We now return to the “linear functions view”. The following claim is easy to verify (and is
given as a homework exercise).
Claim 1 A function $f : \{0,1\}^m \to \{0,1\}$ is linear if and only if for every $x, y \in \{0,1\}^m$, $f(x) + f(y) = f(x + y)$.
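Claim 1 can be sanity-checked by exhaustive search for very small $m$. The sketch below compares the additive condition with Definition 1 for every truth table, using the observation that the only candidate coefficients are $b_i = f(e_i)$, where $e_i$ is the $i$-th unit vector. Vectors are assumed to be encoded as integers with XOR as addition, and the function names are hypothetical.

```python
def parity(v):
    return bin(v).count("1") % 2

def is_additive(f, m):
    """Check f(x) + f(y) = f(x + y) for all x, y (f is a truth table, a list of 0/1)."""
    n = 1 << m
    return all((f[x] ^ f[y]) == f[x ^ y] for x in range(n) for y in range(n))

def matches_definition_1(f, m):
    """Check f(x) = sum_i b_i x_i mod 2, where b_i is read off as f(e_i)."""
    b = sum(f[1 << i] << i for i in range(m))
    return all(f[x] == parity(x & b) for x in range(1 << m))

def claim_1_holds(m):
    """Compare the two conditions on all 2^(2^m) functions {0,1}^m -> {0,1}."""
    n = 1 << m
    for table in range(1 << n):
        f = [(table >> x) & 1 for x in range(n)]
        if is_additive(f, m) != matches_definition_1(f, m):
            return False
    return True
```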
The testing algorithm is very simple:
1. Repeat the following $\Theta(1/\epsilon)$ times:
(a) Uniformly and independently select $x, y \in \{0,1\}^m$.
(b) If $f(x) + f(y) \neq f(x + y)$ then reject (and exit).
2. If no iteration caused rejection then accept.
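The steps above can be sketched in Python as follows. The sketch assumes that $f$ is a callable on integers encoding $\{0,1\}^m$ with $m \geq 1$, and it fixes the hidden constant in $\Theta(1/\epsilon)$ through a parameter `constant` whose default value of 12 is an arbitrary choice for this illustration.

```python
import math
import random

def linearity_test(f, m, eps, constant=12, rng=random):
    """Return True (accept) or False (reject) after ceil(constant / eps) iterations.

    f maps integers in [0, 2^m) to {0, 1}; addition in {0,1}^m is bitwise XOR.
    """
    iterations = math.ceil(constant / eps)
    for _ in range(iterations):
        x = rng.getrandbits(m)          # uniform point of {0,1}^m
        y = rng.getrandbits(m)          # independent uniform point
        if (f(x) ^ f(y)) != f(x ^ y):   # f(x) + f(y) != f(x + y)
            return False                # evidence of non-linearity: reject
    return True

# A linear function is always accepted; the AND of two bits is rejected
# except with negligible probability.
xor_bits = lambda x: bin(x & 0b101).count("1") % 2
and_bits = lambda x: (x & 1) & ((x >> 1) & 1)
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible.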
Let $\epsilon(f)$ denote the distance of $f$ to being linear. Namely, if we let $L$ denote the set of all linear functions then
\[ \epsilon(f) \stackrel{\rm def}{=} \min_{g \in L}\{{\rm dist}(f, g)\} \tag{1} \]
We note that it can be shown that for every $f$, $\epsilon(f) \leq 1/2$. By Claim 1, if $\epsilon(f) = 0$, that is, $f$ is linear, then the test accepts with probability 1. We would like to prove that for every given $\epsilon > 0$, if $\epsilon(f) > \epsilon$ then the probability that the test rejects is at least a large constant (e.g., 2/3). To this end define
\[ \eta(f) \stackrel{\rm def}{=} \Pr_{x,y}[f(x) + f(y) \neq f(x + y)] \tag{2} \]
In other words, $\eta(f)$ is the probability that a single iteration of the algorithm “finds evidence” that $f$ is not a linear function. We shall show that $\eta(f) \geq \epsilon(f)/c$ for some constant $c \geq 1$ (it can be shown for $c = 1$, but this requires discrete Fourier analysis, whereas the proof we give builds on first principles). It directly follows that if the number of iterations is at least $2c/\epsilon$ and $\epsilon(f) > \epsilon$, then the probability that the test rejects is at least
\[ 1 - (1 - \eta(f))^{2c/\epsilon} > 1 - e^{-2c\eta(f)/\epsilon} \geq 1 - e^{-2} > 2/3 \tag{3} \]
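For small $m$ the quantity $\eta(f)$ can be computed exactly by enumerating all pairs, and the bound in Equation (3) can be evaluated numerically. The sketch below assumes the integer/XOR encoding of $\{0,1\}^m$; the function names and the sample values $c = 6$, $\epsilon = 0.1$ are chosen only for illustration.

```python
import math

def eta(f, m):
    """Exact probability over uniform x, y that f(x) + f(y) != f(x + y)."""
    n = 1 << m
    bad = sum((f(x) ^ f(y)) != f(x ^ y) for x in range(n) for y in range(n))
    return bad / (n * n)

def rejection_probability(eta_value, iterations):
    """Probability that at least one of the independent iterations rejects."""
    return 1 - (1 - eta_value) ** iterations

and_bits = lambda x: (x & 1) & ((x >> 1) & 1)   # AND of two bits, not linear
eta_and = eta(and_bits, 2)                      # 6 bad pairs out of 16

# Worst case allowed by eta(f) >= eps(f)/c > eps/c, with c = 6 and eps = 0.1.
c, eps = 6, 0.1
worst_case = rejection_probability(eps / c, math.ceil(2 * c / eps))
```

With these sample values `worst_case` is roughly 0.87, comfortably above the 2/3 target.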
Somewhat unintuitively, showing that $\eta(f) \geq \epsilon(f)/c$ is easier if $\epsilon(f)$ is not too large. Specifically, it is not hard to prove the following claim (given as a homework exercise):
Claim 2 For every function $f$, $\eta(f) \geq 3\epsilon(f) \cdot (1 - 2\epsilon(f))$. In particular, if $\epsilon(f) \leq \frac{1}{4}$ then $\eta(f) \geq \frac{3}{2}\epsilon(f)$ (and more generally, if $\epsilon(f) \leq \frac{1}{2} - \gamma$ for $\gamma > 0$, then $\eta(f) \geq 6\gamma \cdot \epsilon(f)$).
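Claim 2 can be checked exhaustively for tiny $m$. The following sketch enumerates every truth table on $m \leq 3$ bits and compares $\eta(f)$ with $3\epsilon(f)(1 - 2\epsilon(f))$ using exact rational arithmetic. The encoding of a function as an integer truth table and the helper names are assumptions of this illustration, and the check is of course not a proof.

```python
from fractions import Fraction

def parity(v):
    return bin(v).count("1") % 2

def eps_and_eta(table, m):
    """Exact (eps(f), eta(f)) for the function whose value at x is bit x of `table`."""
    n = 1 << m
    f = [(table >> x) & 1 for x in range(n)]
    eps = min(Fraction(sum(f[x] != parity(x & s) for x in range(n)), n)
              for s in range(n))
    eta = Fraction(sum((f[x] ^ f[y]) != f[x ^ y]
                       for x in range(n) for y in range(n)), n * n)
    return eps, eta

def claim_2_holds(m):
    """True if eta >= 3 * eps * (1 - 2 * eps) for every function on m bits."""
    n = 1 << m
    return all(eta >= 3 * eps * (1 - 2 * eps)
               for eps, eta in (eps_and_eta(t, m) for t in range(1 << n)))
```

For the AND of two bits (truth table `0b1000` with $m = 2$) the claim holds with equality: $\epsilon(f) = 1/4$ and $\eta(f) = 3/8$.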
It remains to prove that even when $\epsilon(f)$ is not bounded away (from below) from 1/2, we still have $\eta(f) \geq \epsilon(f)/c$ for a constant $c$. To this end we define the following majority function: For each $x \in \{0,1\}^m$, $g(x) = 0$ if $\Pr_y[f(x + y) - f(y) = 0] \geq 1/2$, and $g(x) = 1$ otherwise. Let
\[ V_y(x) \stackrel{\rm def}{=} f(x + y) - f(y) = f(y) + f(x + y) \tag{4} \]
be the vote that $y$ casts on the value of $x$. Then we define $g$ by the majority vote taken over all $y$. Note that if $f$ is linear then $V_y(x) = f(x)$ for every $y$.
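The majority function can be computed directly from its definition when $m$ is small. In this sketch $f$ is a callable on integers encoding $\{0,1\}^m$, ties are broken in favour of 0 as in the definition, and the names are illustrative.

```python
def majority_vote(f, m):
    """Truth table of g: g(x) = 0 iff at least half of the votes V_y(x) are 0."""
    n = 1 << m
    g = []
    for x in range(n):
        zero_votes = sum((f(x ^ y) ^ f(y)) == 0 for y in range(n))
        g.append(0 if 2 * zero_votes >= n else 1)
    return g

# Example: x_1 + x_2 on 3 bits with a single corrupted value.
base = lambda x: bin(x & 0b011).count("1") % 2
corrupted = lambda x: base(x) ^ (x == 0b101)
g = majority_vote(corrupted, 3)
```

In this example at most two of the eight votes on any point are affected by the corrupted value, so `g` coincides with the truth table of `base`.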
We shall prove two lemmas:
Lemma 3 ${\rm dist}(f, g) \leq 2\eta(f)$.
Lemma 4 If $\eta(f) < \frac{1}{6}$ then $g$ is a linear function.
By combining Lemmas 3 and 4 we get that $\eta(f) \geq \frac{1}{6}\epsilon(f)$. To see why this is true, observe first that if $\eta(f) \geq \frac{1}{6}$, then the inequality clearly holds because $\epsilon(f) \leq 1$. (In fact, since it can be shown that $\epsilon(f) \leq 1/2$ for every $f$, we actually have that $\eta(f) \geq \frac{1}{3}\epsilon(f)$ in this case.) Otherwise ($\eta(f) < \frac{1}{6}$), since $g$ is linear and ${\rm dist}(f, g) \leq 2\eta(f)$, we have that $\epsilon(f) \leq {\rm dist}(f, g) \leq 2\eta(f)$, so that $\eta(f) \geq \epsilon(f)/2$, and we are done.
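Before turning to the proofs, the two lemmas can be checked by brute force for very small $m$. The sketch below enumerates all truth tables, computes $\eta(f)$, the majority function $g$ and ${\rm dist}(f, g)$ exactly, and verifies that ${\rm dist}(f, g) \leq 2\eta(f)$ and that $g$ is linear whenever $\eta(f)$ is below the $1/6$ threshold of Lemma 4. The truth-table encoding and function names are assumptions of this illustration.

```python
from fractions import Fraction

def analyse(table, m):
    """Return (eta(f), dist(f, g), whether g is linear) for truth table `table`."""
    n = 1 << m
    f = [(table >> x) & 1 for x in range(n)]
    eta = Fraction(sum((f[x] ^ f[y]) != f[x ^ y]
                       for x in range(n) for y in range(n)), n * n)
    # Majority vote over V_y(x) = f(x + y) - f(y), ties broken towards 0.
    g = [0 if 2 * sum((f[x ^ y] ^ f[y]) == 0 for y in range(n)) >= n else 1
         for x in range(n)]
    dist_fg = Fraction(sum(f[x] != g[x] for x in range(n)), n)
    g_linear = all((g[x] ^ g[y]) == g[x ^ y] for x in range(n) for y in range(n))
    return eta, dist_fg, g_linear

def lemmas_hold(m):
    """Check Lemma 3 and Lemma 4 on every function {0,1}^m -> {0,1}."""
    n = 1 << m
    for table in range(1 << n):
        eta, dist_fg, g_linear = analyse(table, m)
        if dist_fg > 2 * eta:
            return False
        if eta < Fraction(1, 6) and not g_linear:
            return False
    return True
```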
Proof of Lemma 3: Let $U$ consist of all points $x$ such that
\[ \Pr_y[V_y(x) \neq f(x)] \geq 1/2 , \tag{5} \]
that is,
\[ \Pr_y[f(x + y) - f(y) \neq f(x)] \geq 1/2 . \tag{6} \]
On one hand, $\eta(f) \geq |U| \cdot 2^{-m} \cdot \frac{1}{2}$ (because if we select a point $x \in U$ then with probability at least 1/2 over the choice of $y$, $V_y(x) \neq f(x)$, which is exactly the event $f(x) + f(y) \neq f(x + y)$). In other words,
\[ |U| \cdot 2^{-m} \leq 2\eta(f) \tag{7} \]
On the other hand, for every $x \notin U$, $\Pr_y[V_y(x) \neq f(x)] < 1/2$, so that $\Pr_y[V_y(x) = f(x)] > 1/2$, and we have by the definition of $g$ as the majority function that $f(x) = g(x)$. Therefore
\[ {\rm dist}(f, g) \leq |U| \cdot 2^{-m} . \tag{8} \]
The lemma follows by combining Equations (7) and (8).
Proof of Lemma 4: In order to prove this lemma, we first prove the next claim.
Claim 5 For every $a \in \{0,1\}^m$ it holds that $\Pr_y[g(a) = V_y(a)] \geq 1 - 2\eta(f)$.
Note that by definition of $g$ as the “majority-vote function”, $\Pr_y[g(a) = V_y(a)] \geq \frac{1}{2}$. The claim says that the majority is actually “stronger” (for small $\eta(f)$).
Proof: Let $p = \Pr_y[g(a) = V_y(a)]$. If we now consider the probability that two points $y$ and $z$, selected uniformly and independently, have the “same vote”, then
\[ \Pr_{y,z}[V_y(a) = V_z(a)] = p^2 + (1 - p)^2 \tag{9} \]
(either they both agree with the majority vote, or they both disagree with it). We shall show that $\Pr_{y,z}[V_y(a) = V_z(a)] \geq 1 - 2\eta(f)$, that is, $1 + 2p^2 - 2p \geq 1 - 2\eta(f)$, or equivalently $p(1 - p) \leq \eta(f)$. Since $p \geq 1/2$, we have $(1 - p)/2 \leq p(1 - p) \leq \eta(f)$, so that $1 - p \leq 2\eta(f)$, that is, $p \geq 1 - 2\eta(f)$. In what follows we shall use the fact that the range of $f$ is $\{0,1\}$.
\[
\begin{aligned}
\Pr_{y,z}[V_y(a) = V_z(a)]
&= \Pr_{y,z}[V_y(a) + V_z(a) = 0] \\
&= \Pr_{y,z}[f(y) + f(a + y) + f(z) + f(a + z) = 0] \\
&= \Pr_{y,z}[f(y) + f(a + z) + f(y + a + z) + f(z) + f(a + y) + f(z + a + y) = 0] \\
&\geq \Pr_{y,z}[f(y) + f(a + z) + f(y + a + z) = 0 \;\wedge\; f(z) + f(a + y) + f(z + a + y) = 0] \\
&= 1 - \Pr_{y,z}[f(y) + f(a + z) + f(y + a + z) = 1 \;\vee\; f(z) + f(a + y) + f(z + a + y) = 1] \\
&\geq 1 - \left(\Pr_{y,z}[f(y) + f(a + z) + f(y + a + z) = 1] + \Pr_{y,z}[f(z) + f(a + y) + f(z + a + y) = 1]\right) \\
&= 1 - 2\eta(f) .
\end{aligned}
\]
The last equality holds because $y$ and $a + z$ (and similarly $z$ and $a + y$) are uniformly and independently distributed, so each of the two probabilities equals $\eta(f)$.
(Claim 5)
We need to show that for any two given points $a, b \in \{0,1\}^m$, $g(a) + g(b) = g(a + b)$. We shall prove this by the probabilistic method. Specifically, we shall show that there exists a point $y$ for which the following three equalities hold simultaneously:
1. $g(a) = f(a + y) - f(y)$ $(= V_y(a))$.
2. $g(b) = f(b + (a + y)) - f(a + y)$ $(= V_{a+y}(b))$.
3. $g(a + b) = f(a + b + y) - f(y)$ $(= V_y(a + b))$.
But in such a case,
\[ g(a) + g(b) = f(b + a + y) - f(y) = g(a + b) , \tag{10} \]
and we are done. To see why there exists such a point $y$, consider selecting $y$ uniformly at random. For each equality, by Claim 5, the probability that the equality does not hold is at most $2\eta(f)$ (for the second equality, note that $a + y$ is uniformly distributed when $y$ is). By the union bound, the probability (over a uniform selection of $y$) that any one of the three does not hold is at most $6\eta(f)$. Since $\eta(f) < 1/6$, this is strictly less than 1, and so the probability that all three equalities hold simultaneously is greater than 0. Hence there exists at least one such point $y$.
Self-Correction
One of the nice features of the analysis is that it implies that $f$ can be self-corrected (assuming it is sufficiently close to being linear). That is, for any $x$ of our choice, if we want to know the value of the closest linear function on $x$ (or, in the coding theory view, the correct bit in the position corresponding to $x$ in the closest codeword), then we simply select, uniformly at random, $y^1, \ldots, y^t$ and take the “majority vote” of $V_{y^1}(x), \ldots, V_{y^t}(x)$ (where the choice of $t$ determines the probability that the majority is correct).
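A minimal sketch of this self-correction procedure is given below. It assumes the integer/XOR encoding of $\{0,1\}^m$ with $m \geq 1$; the function name, the example function and the choice $t = 25$ are illustrative only.

```python
import random

def self_correct(f, m, x, t, rng=random):
    """Majority of the votes V_y(x) = f(x + y) - f(y) over t random points y."""
    ones = 0
    for _ in range(t):
        y = rng.getrandbits(m)        # uniform point of {0,1}^m
        ones += f(x ^ y) ^ f(y)       # the vote that y casts on the value at x
    return 1 if 2 * ones > t else 0

# Example: a linear function on 8 bits with one corrupted position.
m = 8
base = lambda x: bin(x & 0b10110001).count("1") % 2
bad_point = 0b00001111
noisy = lambda x: base(x) ^ (x == bad_point)

corrected = self_correct(noisy, m, bad_point, 25, random.Random(0))
```

Each vote touches the corrupted position with probability at most 2/256 here, so the majority of 25 votes recovers `base(bad_point)` except with negligible probability, even though `noisy(bad_point)` itself is wrong.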
Linearity over Other Groups
The argument we gave can be extended to linearity of $f : G \to H$ for any two Abelian (commutative) groups $G$ and $H$, but some modifications are needed since we used the particular properties of $GF(2)$ in some parts of our analysis. First, in case $|H| > 2$, $g$ needs to be defined as the plurality function rather than the majority function. The proof of Lemma 3 remains as is, but the proof of Lemma 4 needs to be modified. In particular we need to modify the proof of Claim 5 (the rest of the proof did not use any particular properties of $H = GF(2)$).
Recall that the claim says that for every $a \in G$ we have that $\Pr_{y \in G}[g(a) = V_y(a)] \geq 1 - 2\eta(f)$ (where $V_y(a) = f(a + y) - f(y)$ as before, and $\eta(f)$ is also defined as before). We shall say that two points $y, z \in G$ are compatible w.r.t. $a$ if two events hold:
1. $f(y) + f(z - y) = f(z)$;
2. $f(a + y) + f(z - y) = f(a + z)$.
First note that the probability that the two events hold when selecting $y$ and $z$ uniformly at random is at least $1 - 2\eta(f)$: in each event the two arguments on the left-hand side are uniformly and independently distributed, so each event fails with probability $\eta(f)$, and the bound follows by the union bound. Next note that if the two events indeed hold, then by subtracting the first equality from the second we get that
\[ f(a + y) - f(y) = f(a + z) - f(z) \tag{11} \]
that is, $V_y(a) = V_z(a)$ (which is why they are said to be compatible w.r.t. $a$). Therefore, with probability at least $1 - 2\eta(f)$ over the choice of a random pair $y, z$, we have that $V_y(a) = V_z(a)$.
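The bound on vote agreement can be observed on a small example. The sketch below takes $G = Z_n$ and $H = Z_k$ (cyclic groups chosen purely for illustration), represents $f$ as a list of length $n$, and computes both $\eta(f)$ and $\Pr_{y,z}[V_y(a) = V_z(a)]$ exactly. All names are hypothetical.

```python
from fractions import Fraction

def eta(f, n, k):
    """Pr over uniform x, y in Z_n that f(x) + f(y) != f(x + y) in Z_k."""
    bad = sum((f[x] + f[y]) % k != f[(x + y) % n]
              for x in range(n) for y in range(n))
    return Fraction(bad, n * n)

def vote(f, n, k, a, y):
    """V_y(a) = f(a + y) - f(y), computed in Z_k."""
    return (f[(a + y) % n] - f[y]) % k

def vote_agreement(f, n, k, a):
    """Pr over uniform y, z in Z_n that V_y(a) = V_z(a)."""
    votes = [vote(f, n, k, a, y) for y in range(n)]
    same = sum(votes[y] == votes[z] for y in range(n) for z in range(n))
    return Fraction(same, n * n)

# Example: x -> x mod 4 is a homomorphism from Z_12 to Z_4; corrupt one value.
n, k = 12, 4
f = [x % k for x in range(n)]
f[5] = 0
```

For this `f`, `vote_agreement(f, n, k, a)` is at least `1 - 2 * eta(f, n, k)` for every `a`, as the compatibility argument guarantees.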
Next consider an auxiliary graph on $|G|$ vertices, where there is a vertex for each $y \in G$, and we put an edge between $y$ and $z$ if $V_y(a) = V_z(a)$. We claim that this auxiliary graph must have a connected component of size at least $(1 - 2\eta(f))|G|$. If we show this then we are done, since by definition of the graph, all $y$'s in the connected component have the same vote on $a$.
To see why it is true that there must be such a large connected component, assume first that there are just two connected components. One is of size $\alpha|G|$ and the other of size $(1 - \alpha)|G|$, where $\alpha \geq 1 - \alpha$ so that $\alpha \geq 1/2$. The point is that for every pair $y$ and $z$ that belong to different connected components, $V_y(a) \neq V_z(a)$. There are $2\alpha(1 - \alpha) \cdot |G|^2$ such (ordered) pairs, but we know that there are at most $2\eta \cdot |G|^2$ pairs with different votes, and so $2\alpha(1 - \alpha) \leq 2\eta$. But since $\alpha \geq 1/2$, we have that $2\alpha(1 - \alpha) \geq 1 - \alpha$ and so $1 - \alpha \leq 2\eta$, implying that $\alpha \geq 1 - 2\eta$.
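The argument easily generalizes to the case where there are $k > 2$ connected components. Let the sizes of the components be $\alpha_1|G|, \ldots, \alpha_k|G|$, where $\alpha_1$ is the largest. Then we have that
\[ \sum_{i=1}^{k} \alpha_i(1 - \alpha_i) \leq 2\eta . \]
On the other hand, since $\alpha_i \leq \alpha_1$ (and hence $1 - \alpha_i \geq 1 - \alpha_1$) for every $i$, and $\sum_{i=1}^{k} \alpha_i = 1$,
\[ \sum_{i=1}^{k} \alpha_i(1 - \alpha_i) = \alpha_1(1 - \alpha_1) + \sum_{i=2}^{k} (1 - \alpha_i)\alpha_i \geq \alpha_1(1 - \alpha_1) + (1 - \alpha_1)\sum_{i=2}^{k} \alpha_i = \alpha_1(1 - \alpha_1) + (1 - \alpha_1)^2 = 1 - \alpha_1 . \]
Thus we get that $\alpha_1 \geq 1 - 2\eta$.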
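The conclusion about the largest component can be illustrated numerically: the components of the auxiliary graph are exactly the classes of points $y$ that cast the same vote, so the largest one determines the plurality value. The sketch below again uses cyclic groups $Z_n$ and $Z_k$ as an assumed example, with $f$ given as a list and hypothetical function names.

```python
from collections import Counter
from fractions import Fraction

def eta(f, n, k):
    """Pr over uniform x, y in Z_n that f(x) + f(y) != f(x + y) in Z_k."""
    bad = sum((f[x] + f[y]) % k != f[(x + y) % n]
              for x in range(n) for y in range(n))
    return Fraction(bad, n * n)

def vote_classes(f, n, k, a):
    """Count how many y in Z_n cast each vote V_y(a) = f(a + y) - f(y)."""
    return Counter((f[(a + y) % n] - f[y]) % k for y in range(n))

def largest_component_fraction(f, n, k, a):
    """alpha_1: relative size of the largest class of agreeing votes."""
    return Fraction(max(vote_classes(f, n, k, a).values()), n)

def plurality(f, n, k, a):
    """The value g(a) chosen by the plurality vote."""
    return vote_classes(f, n, k, a).most_common(1)[0][0]

# Example: the homomorphism x -> x mod 4 from Z_12 to Z_4 with one corrupted value.
n, k = 12, 4
f = [x % k for x in range(n)]
f[5] = 0
```

In this example at least 10 of the 12 votes on every point agree, so the plurality vote recovers the original homomorphism.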