Notes on Testing Linearity of Functions

We start with some definitions and notations. Let $f : \{0,1\}^m \to \{0,1\}$. (All that we shall show extends to functions $f : G \to H$, where $G$ and $H$ are groups.)

Definition 1. We say that $f$ is a linear function if there exist coefficients $b_1, \ldots, b_m \in \{0,1\}$ such that for every $x = x_1, \ldots, x_m \in \{0,1\}^m$, $f(x) = \sum_{i=1}^{m} b_i \cdot x_i$ (where the sum is taken modulo 2). In other words, there exists a subset $S \subseteq \{1, \ldots, m\}$ such that $f(x) = \sum_{i \in S} x_i$.

Given query access to a function $f : \{0,1\}^m \to \{0,1\}$ and a parameter $\epsilon$, we would like to test whether $f$ is a linear function or whether it is $\epsilon$-far from being linear. By the latter we mean that for every linear function $g$, $\mathrm{dist}(f,g) > \epsilon$, where $\mathrm{dist}(f,g) = \Pr_{x \in \{0,1\}^m}[f(x) \neq g(x)]$ and the probability is taken over the uniform choice of $x$.

An alternative interpretation of this task is from the viewpoint of coding theory. Namely, if we think of each function $f$ as a string of length $n = 2^m$, where each position in the string corresponds to a vector $x \in \{0,1\}^m$, then the strings we get from the linear functions constitute the (shortened) first-order Reed-Muller code (also referred to as the Hadamard code). So now our task can be described as (local) testing of the Hadamard code. Namely, given query access to a string (word), we would like to determine whether it belongs to the code or whether it is more than $\epsilon$-far from any codeword. As we shall see, from our analysis we also get a random correction procedure.

We now return to the "linear functions view". The following claim is easy to verify (and is given as a homework exercise).

Claim 1. A function $f : \{0,1\}^m \to \{0,1\}$ is linear if and only if for every $x, y \in \{0,1\}^m$, $f(x) + f(y) = f(x+y)$.

The testing algorithm is very simple:

1. Repeat the following $\Theta(1/\epsilon)$ times:
   (a) Uniformly and independently select $x, y \in \{0,1\}^m$.
   (b) If $f(x) + f(y) \neq f(x+y)$ then reject (and exit).
2. If no iteration caused rejection then accept.
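
The test above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: points of $\{0,1\}^m$ are encoded as $m$-bit integers so that addition is bitwise XOR, `f` is any callable returning 0 or 1, and the iteration count $\lceil k/\epsilon \rceil$ with default $k = 12$ is one assumed instantiation of $\Theta(1/\epsilon)$. The names `linearity_test`, `parity_on` and `constant` are illustrative only.

```python
import math
import random


def linearity_test(f, m, eps, constant=12, rng=None):
    """Return True (accept) or False (reject) for f: {0,1}^m -> {0,1}.

    Points are m-bit integers (m >= 1); x ^ y plays the role of x + y.
    `constant` is an assumed choice for the hidden constant in Theta(1/eps).
    """
    rng = rng or random.Random()
    for _ in range(math.ceil(constant / eps)):
        # Step 1(a): select x and y uniformly and independently.
        x = rng.getrandbits(m)
        y = rng.getrandbits(m)
        # Step 1(b): reject as soon as a linearity constraint is violated.
        if (f(x) ^ f(y)) != f(x ^ y):
            return False
    # Step 2: no iteration found a violation.
    return True


def parity_on(mask):
    """Hypothetical helper: the linear function x -> sum of x_i over i in S
    (mod 2), where S is given by the set bits of `mask`."""
    return lambda x: bin(x & mask).count("1") % 2
```

For instance, `linearity_test(parity_on(0b1011), 4, 0.1)` always accepts, since a linear function never violates the constraint, while the constant-one function is rejected in the first iteration because $1 + 1 \neq 1$ for every pair $x, y$.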
Let $\epsilon(f)$ denote the distance of $f$ to being linear. Namely, if we let $\mathcal{L}$ denote the set of all linear functions then

$$\epsilon(f) \stackrel{\rm def}{=} \min_{g \in \mathcal{L}} \{\mathrm{dist}(f,g)\} . \tag{1}$$

We note that it can be shown that for every $f$, $\epsilon(f) \le \frac{1}{2}$. By Claim 1, if $\epsilon(f) = 0$, that is, $f$ is linear, then the test accepts with probability 1. We would like to prove that for every given $\epsilon > 0$, if $\epsilon(f) > \epsilon$ then the probability that the test rejects is at least a large constant (e.g., 2/3). To this end define

$$\eta(f) \stackrel{\rm def}{=} \Pr_{x,y}[f(x) + f(y) \neq f(x+y)] . \tag{2}$$

In other words, $\eta(f)$ is the probability that a single iteration of the algorithm "finds evidence" that $f$ is not a linear function. We shall show that $\eta(f) \ge \epsilon(f)/c$ for some constant $c \ge 1$ (it can be shown for $c = 1$, but this requires discrete Fourier analysis, whereas the proof we show builds on first principles). It directly follows that if the number of iterations is at least $2c/\epsilon$ and $\epsilon(f) > \epsilon$, then the probability that the test rejects is at least

$$1 - (1 - \eta(f))^{2c/\epsilon} > 1 - e^{-2c\,\eta(f)/\epsilon} \ge 1 - e^{-2} > 2/3 . \tag{3}$$

Somewhat unintuitively, showing that $\eta(f) \ge \epsilon(f)/c$ is easier if $\epsilon(f)$ is not too large. Specifically, it is not hard to prove the following claim (given as a homework exercise):

Claim 2. For every function $f$, $\eta(f) \ge 3\epsilon(f) \cdot (1 - 2\epsilon(f))$. In particular, if $\epsilon(f) \le \frac{1}{4}$ then $\eta(f) \ge \frac{3}{2}\epsilon(f)$ (and more generally, if $\epsilon(f) = \frac{1}{2} - \gamma$ for $\gamma > 0$, then $\eta(f) \ge 6\gamma \cdot \epsilon(f)$).

It remains to prove that even when $\epsilon(f)$ is not bounded away from 1/2 (from below), still $\eta(f) \ge \epsilon(f)/c$ for a constant $c$. To this end we define the following majority function: for each $x \in \{0,1\}^m$, $g(x) = 0$ if $\Pr_y[f(x+y) - f(y) = 0] \ge 1/2$, and $g(x) = 1$ otherwise. Let

$$V_y(x) \stackrel{\rm def}{=} f(x+y) - f(y) = f(y) + f(x+y) \tag{4}$$

be the vote that $y$ casts on the value of $x$. Then $g$ is defined by the majority vote taken over all $y$. Note that if $f$ is linear then $V_y(x) = f(x)$ for every $y$. We shall prove two lemmas:

Lemma 3. $\mathrm{dist}(f,g) \le 2\eta(f)$.

Lemma 4. If $\eta(f) \le \frac{1}{6}$ then $g$ is a linear function.
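
For small $m$, the quantities defined above can be computed exhaustively, which is a convenient way to sanity-check Claim 2 and Lemma 3 on examples. The sketch below assumes that $f$ is given as a truth table (a Python list of length $2^m$ indexed by the integer encoding of $x$, with XOR as addition). The function names are illustrative and are not part of the notes.

```python
def parity(x):
    """Sum of the bits of x, modulo 2."""
    return bin(x).count("1") % 2


def dist(f, g):
    """Fraction of points on which the truth tables f and g differ."""
    return sum(a != b for a, b in zip(f, g)) / len(f)


def dist_to_linear(f, m):
    """epsilon(f): minimum distance from f to a linear function.

    The linear functions are x -> parity(x & s), one for each subset s.
    """
    n = 1 << m
    return min(dist(f, [parity(x & s) for x in range(n)]) for s in range(n))


def eta(f, m):
    """eta(f): fraction of pairs (x, y) with f(x) + f(y) != f(x + y)."""
    n = 1 << m
    bad = sum(1 for x in range(n) for y in range(n)
              if (f[x] ^ f[y]) != f[x ^ y])
    return bad / (n * n)


def majority_function(f, m):
    """g(x) = 0 if at least half of the votes V_y(x) are 0, and 1 otherwise."""
    n = 1 << m
    g = []
    for x in range(n):
        zero_votes = sum(1 for y in range(n) if (f[x ^ y] ^ f[y]) == 0)
        g.append(0 if 2 * zero_votes >= n else 1)
    return g
```

For the AND of two bits, `f = [0, 0, 0, 1]` with `m = 2`, this gives $\epsilon(f) = 1/4$ and $\eta(f) = 3/8$, which meets the bound of Claim 2 with equality. The majority function $g$ is identically zero in this case, so $\mathrm{dist}(f,g) = 1/4 \le 2\eta(f)$, in line with Lemma 3.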
By combining Lemmas 3 and 4 we get that $\eta(f) \ge \frac{1}{6}\epsilon(f)$. To see why this is true, observe first that if $\eta(f) > \frac{1}{6}$, then the inequality clearly holds because $\epsilon(f) \le 1$. (In fact, since it can be shown that $\epsilon(f) \le 1/2$ for every $f$, we actually have that $\eta(f) \ge \frac{1}{3}\epsilon(f)$ in this case.) Otherwise ($\eta(f) \le \frac{1}{6}$), since $g$ is linear and $\mathrm{dist}(f,g) \le 2\eta(f)$, we have that $\epsilon(f) \le \mathrm{dist}(f,g) \le 2\eta(f)$, so that $\eta(f) \ge \epsilon(f)/2$, and we are done.

Proof of Lemma 3: Let $U$ consist of all points $x$ such that

$$\Pr_y[V_y(x) \neq f(x)] \ge 1/2 , \tag{5}$$

that is,

$$\Pr_y[f(x+y) - f(y) \neq f(x)] \ge 1/2 . \tag{6}$$

On one hand, $\eta(f) \ge |U| \cdot 2^{-m} \cdot \frac{1}{2}$ (because if we select a point $x \in U$ then with probability at least 1/2 over the choice of $y$, $V_y(x) \neq f(x)$). In other words,

$$|U| \cdot 2^{-m} \le 2\eta(f) . \tag{7}$$

On the other hand, for every $x \notin U$, $\Pr_y[V_y(x) \neq f(x)] < 1/2$, so that $\Pr_y[V_y(x) = f(x)] > 1/2$, and we have by the definition of $g$ as the majority function that $f(x) = g(x)$. Therefore

$$\mathrm{dist}(f,g) \le |U| \cdot 2^{-m} . \tag{8}$$

The lemma follows by combining the last two equations. (Lemma 3)

Proof of Lemma 4: In order to prove this lemma, we first prove the next claim.

Claim 5. For every $a \in \{0,1\}^m$ it holds that $\Pr_y[g(a) = V_y(a)] \ge 1 - 2\eta(f)$.

Note that by the definition of $g$ as the "majority-vote function", $\Pr_y[g(a) = V_y(a)] \ge \frac{1}{2}$. The claim says that the majority is actually "stronger" (for small $\eta(f)$).

Proof: Let $p = \Pr_y[g(a) = V_y(a)]$. If we now consider the probability that two points $y$ and $z$ have the "same vote", then

$$\Pr_{y,z}[V_y(a) = V_z(a)] = p^2 + (1-p)^2 \tag{9}$$

(either they both agree with the majority vote, or they both disagree with it). We shall show that $\Pr_{y,z}[V_y(a) = V_z(a)] \ge 1 - 2\eta(f)$, that is, $1 + 2p^2 - 2p \ge 1 - 2\eta(f)$, or equivalently $p(1-p) \le \eta(f)$. Since $p \ge 1/2$, we have $\frac{1}{2}(1-p) \le p(1-p) \le \eta(f)$, so that $1 - p \le 2\eta(f)$, that is, $p \ge 1 - 2\eta(f)$. It remains to lower bound the probability of an equal vote. In what follows we shall use the fact that the range of $f$ is $\{0,1\}$.
$$\begin{aligned}
\Pr_{y,z}[V_y(a) = V_z(a)]
&= \Pr_{y,z}[V_y(a) + V_z(a) = 0] \\
&= \Pr_{y,z}[f(y) + f(a+y) + f(z) + f(a+z) = 0] \\
&= \Pr_{y,z}[f(y) + f(a+z) + f(y+a+z) + f(z) + f(a+y) + f(z+a+y) = 0] \\
&\ge \Pr_{y,z}[f(y) + f(a+z) + f(y+a+z) = 0 \;\wedge\; f(z) + f(a+y) + f(z+a+y) = 0] \\
&= 1 - \Pr_{y,z}[f(y) + f(a+z) + f(y+a+z) = 1 \;\vee\; f(z) + f(a+y) + f(z+a+y) = 1] \\
&\ge 1 - \left(\Pr_{y,z}[f(y) + f(a+z) + f(y+a+z) = 1] + \Pr_{y,z}[f(z) + f(a+y) + f(z+a+y) = 1]\right) \\
&= 1 - 2\eta(f) .
\end{aligned}$$

The third equality holds because $f(y+a+z) + f(z+a+y) = 0$ modulo 2, and the last equality holds because each of the pairs $(y, a+z)$ and $(z, a+y)$ is uniformly distributed over pairs of points, so each of the two probabilities equals $\eta(f)$. (Claim 5)

We need to show that for any two given points $a, b \in \{0,1\}^m$, $g(a) + g(b) = g(a+b)$. We shall prove this by the probabilistic method. Specifically, we shall show that there exists a point $y$ for which the following three equalities hold simultaneously:

1. $g(a) = f(a+y) - f(y)$ $(= V_y(a))$.
2. $g(b) = f(b + (a+y)) - f(a+y)$ $(= V_{a+y}(b))$.
3. $g(a+b) = f(a+b+y) - f(y)$ $(= V_y(a+b))$.

But in such a case,

$$g(a) + g(b) = f(b+a+y) - f(y) = g(a+b) , \tag{10}$$

and we are done. To see why there exists such a point $y$, consider selecting $y$ uniformly at random. For each equality, by Claim 5, the probability that the equality does not hold is at most $2\eta(f)$ (for the second equality, note that $a+y$ is uniformly distributed). By the union bound, the probability (over a uniform selection of $y$) that any one of the three does not hold is at most $6\eta(f)$. If $\eta(f) < 1/6$, this is strictly less than 1. (For $\eta(f) = 1/6$ the same holds, since with $p$ as in the proof of Claim 5 we in fact get $1 - p \le \eta(f)/p < 2\eta(f)$ whenever $p > 1/2$, and $p = 1/2$ would force $\eta(f) \ge p(1-p) = 1/4$.) Hence the probability that all three equalities hold simultaneously for a random $y$ is greater than 0, and so there exists at least one such point $y$.

Self-Correction

One of the nice features of the analysis is that it implies that $f$ can be self-corrected (assuming it is sufficiently close to being linear). That is, for any $x$ of our choice, if we want to know the value of the closest linear function on $x$, or, in the coding theory view, we want to know the correct bit in the position corresponding to $x$ in the closest codeword, then we simply select, uniformly at random, $y^1, \ldots, y^t$ and take the "majority vote" of $V_{y^1}(x), \ldots, V_{y^t}(x)$ (where the choice of $t$ determines the probability that the majority is correct).

Linearity over Other Groups

The argument we gave can be extended to linearity of $f : G \to H$ for any two Abelian (commutative) groups $G$ and $H$, but some modifications are needed, since we used the particular properties of $GF(2)$ in some parts of our analysis. First, $g$ needs to be defined as the plurality function in case $|H| > 2$, rather than the majority function. The proof of Lemma 3 remains as is, but the proof of Lemma 4 needs to be modified. In particular, we need to modify the proof of Claim 5 (the rest of the proof did not use any particular properties of $H = GF(2)$). Recall that the claim says that for every $a \in G$ we have that $\Pr_{y \in G}[g(a) = V_y(a)] \ge 1 - 2\eta(f)$ (where $V_y(a) = f(a+y) - f(y)$ as before, and $\eta(f)$ is also defined as before).

We shall say that two points $y, z \in G$ are compatible w.r.t. $a$ if two events hold:

1. $f(y) + f(z-y) = f(z)$;
2. $f(a+y) + f(z-y) = f(a+z)$.

First note that the probability that the two events hold when selecting $y$ and $z$ uniformly at random is at least $1 - 2\eta(f)$: each of the pairs $(y, z-y)$ and $(a+y, z-y)$ is uniformly distributed, so each event fails with probability $\eta(f)$, and we apply the union bound. Next note that if the two events indeed hold, then by subtracting the first equality from the second we get that

$$f(a+y) - f(y) = f(a+z) - f(z) , \tag{11}$$

that is, $V_y(a) = V_z(a)$ (which is why they are said to be compatible w.r.t. $a$). Therefore, with probability at least $1 - 2\eta(f)$ over the choice of a random pair $y, z$, we have that $V_y(a) = V_z(a)$.

Next consider an auxiliary graph with $|G|$ vertices, where there is a vertex for each $y \in G$, and we put an edge between $y$ and $z$ if $V_y(a) = V_z(a)$. We claim that the largest connected component of this auxiliary graph has size at least $(1 - 2\eta(f))|G|$. If we show this then we are done, since by the definition of the graph, all $y$'s in a connected component have the same vote on $a$, and the vote of the largest component is the plurality vote $g(a)$. To see why there must be such a large connected component, assume first that there are just two connected components.
One is of size $\alpha|G|$ and the other of size $(1-\alpha)|G|$, where $\alpha \ge 1 - \alpha$, so that $\alpha \ge 1/2$. The point is that for every pair $y$ and $z$ that belong to different connected components, $V_y(a) \neq V_z(a)$. But we know that there are at most $2\eta(f) \cdot |G|^2$ such (ordered) pairs, and so $2\alpha(1-\alpha) \le 2\eta(f)$. But since $\alpha \ge 1/2$, we have that $2\alpha(1-\alpha) \ge 1 - \alpha$ and so $1 - \alpha \le 2\eta(f)$, implying that $\alpha \ge 1 - 2\eta(f)$.

The argument easily generalizes to the case where there are $k > 2$ connected components. Let the sizes of the components be $\alpha_1|G|, \ldots, \alpha_k|G|$, where $\alpha_1$ is the largest (so that in particular $\alpha_1 \ge 1/k$). Counting ordered pairs in different components as before, we have that

$$\sum_{i=1}^{k} \alpha_i(1 - \alpha_i) \le 2\eta(f) .$$

On the other hand, since $\alpha_i \le \alpha_1$ for every $i$ and $\sum_{i=1}^{k} \alpha_i = 1$,

$$\sum_{i=1}^{k} \alpha_i(1 - \alpha_i) \ge \sum_{i=1}^{k} \alpha_i(1 - \alpha_1) = 1 - \alpha_1 .$$

Thus we get that $\alpha_1 \ge 1 - 2\eta(f)$.
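
The self-correction procedure and the plurality vote used for general groups can be illustrated together. The sketch below makes several assumptions that are not in the notes: it takes $G = H = \mathbb{Z}_n$ (the integers modulo $n$), represents $f$ as a Python callable, and uses hypothetical names (`vote`, `plurality_exact`, `self_correct`). It is meant as an illustration of the voting idea rather than a definitive implementation.

```python
import random
from collections import Counter


def vote(f, n, x, y):
    """V_y(x) = f(x + y) - f(y), computed in Z_n."""
    return (f((x + y) % n) - f(y)) % n


def plurality_exact(f, n, x):
    """g(x): the most common vote V_y(x) over all y in Z_n.

    Ties, if any, are broken arbitrarily by Counter.most_common.
    """
    counts = Counter(vote(f, n, x, y) for y in range(n))
    return counts.most_common(1)[0][0]


def self_correct(f, n, x, t, rng=None):
    """Estimate g(x) from the votes of t uniformly random points y."""
    rng = rng or random.Random()
    counts = Counter(vote(f, n, x, rng.randrange(n)) for _ in range(t))
    return counts.most_common(1)[0][0]
```

As an example, if $f$ agrees with $x \mapsto 3x \bmod 101$ except on a handful of corrupted points, then a vote $V_y(x)$ can be wrong only when $y$ or $x+y$ is corrupted, so `plurality_exact` recovers $3x \bmod 101$ at every $x$, including the corrupted ones, and `self_correct` does so with high probability once $t$ is moderately large.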