Property Testing: A Learning Theory Perspective
Dana Ron, Tel Aviv University

Property Testing (Informal Definition)
For a fixed property P and any object O, determine whether O has property P, or whether O is far from having property P (i.e., far from any object having P). The task should be performed by inspecting the object (in as few places as possible).

Examples
• The object can be a function and the property can be linearity.
• The object can be a string and the property can be membership in a fixed regular language L.
• The object can be a graph and the property can be 3-colorability.

Two Views of Property Testing
• A relaxation of exactly deciding whether the object has the property or does not have the property.
• A relaxation of learning the object (with membership queries and under the uniform distribution).
In either case we want the testing algorithm to be significantly more efficient than the decision/learning algorithm.
Q: Which view is "more right"? A: It depends, mainly on the type of objects and properties studied: combinatorial objects and properties vs. function classes that are of interest to the learning community.

A Little Background
• Initially defined by Rubinfeld and Sudan in the context of program testing (of algebraic functions).
• With Goldreich and Goldwasser, initiated the study of testing properties of combinatorial objects, and in particular graphs.
• A growing body of work deals with properties of functions, graphs, strings, sets of points, ... Many algorithms have complexity that is sub-linear in (or even independent of) the size of the object.

Formal Definition ("standard" model)
A property testing algorithm for property P is given a distance parameter ε and query access(*) to a function f.
If f has property P then the algorithm should accept w.h.p.
If f is ε-far from any function having property P then the algorithm should reject w.h.p.
Distance is measured with respect to the uniform distribution.(**)
(*) May consider testing with random examples only.
(**) May consider other distributions (possibly unknown).

Property Testing and Learning: Basic Comments/Observations
Comment (motivation): Testing can be used as a preliminary step to learning, i.e., for efficiently selecting a good hypothesis class.
Observation: Testing is no harder than (proper) learning: if we have a learning algorithm for a function class F that outputs a hypothesis in F, then we can use it to test the property of belonging to F. That is, run the learning algorithm with accuracy parameter set to ε/2, and check on an independent sample that the hypothesis it outputs is at most 3ε/4-far from f. We want the testing algorithm to be more efficient than the learning algorithm.

Classes/Properties for which testing is more efficient than learning
• Linear functions
• Low-degree polynomials
• Singletons, monomials
• (Small) DNF and general Boolean formulas and circuits
• Monotone functions
• Juntas
• Halfspaces
• Decision lists, decision trees, branching programs
• Clustering
• Properties of distributions
Two recurring approaches: the "self-correcting" approach and the "enforce & test" approach.

Linearity Testing [Blum, Luby, Rubinfeld]
Def 1: Let F be a finite field. A function f : Fⁿ → F is called linear (multi-linear) if there exist constants a1,…,an ∈ F such that for every x = x1,…,xn ∈ Fⁿ it holds that f(x) = Σi ai·xi.
Def 2: A function f is said to be ε-far from linear if for every linear function g, dist(f,g) > ε, where dist(f,g) = Pr[f(x) ≠ g(x)] (x selected uniformly in Fⁿ).
Fact: A function f : Fⁿ → F is linear iff for every x,y ∈ Fⁿ it holds that f(x) + f(y) = f(x+y).

Linearity Testing Cont'
Linearity Test:
1) Uniformly and independently select Θ(1/ε) pairs of elements x,y ∈ Fⁿ.
2) For every pair x,y selected, verify that f(x) + f(y) = f(x+y).
3) If linearity is violated for any of the selected pairs (i.e., f(x) + f(y) ≠ f(x+y)), then REJECT; otherwise ACCEPT.
Query complexity: Θ(1/ε), i.e., independent of n.
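As an illustration of the test just described, here is a minimal sketch over GF(2), where vector addition is bitwise XOR; the function names and the constant number of trials are illustrative, not from the talk:

```python
import random

def blr_linearity_test(f, n, eps, rng=random):
    """Sketch of the [BLR] linearity test over GF(2): sample Theta(1/eps)
    pairs x,y and reject iff some pair violates f(x)+f(y) = f(x+y).
    Here x is an n-bit integer encoding a vector in {0,1}^n, and
    addition of vectors is bitwise XOR; the constant 3 is arbitrary."""
    trials = max(1, int(3 / eps))
    for _ in range(trials):
        x, y = rng.getrandbits(n), rng.getrandbits(n)
        if f(x) ^ f(y) != f(x ^ y):
            return False   # REJECT: found a violating pair
    return True            # ACCEPT: no violation seen

# A linear function (parity of bits 0, 1, 3) and a non-linear one:
def parity_013(x):
    return bin(x & 0b1011).count("1") % 2

def majority3(x):
    return 1 if bin(x & 0b111).count("1") >= 2 else 0
```

A linear function such as parity_013 has no violating pairs at all, so it is always accepted; majority3 on 3 bits is 1/4-far from linear and has violating pairs (e.g., x = 011, y = 101), so the test rejects it w.h.p.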
This is in contrast to learning, where Ω(n) queries/examples are needed.
Theorem: If f is linear then the test accepts w.p. 1, and if f is ε-far from linear then the test rejects with probability at least 2/3.
Lemma: If f is accepted with probability greater than 1/3, then f is ε-close to linear.

Linearity Testing Cont'
Suppose f is accepted w.p. > 1/3 ⇒ there is a small (< ε/2) fraction of violating pairs (f(x)+f(y) ≠ f(x+y)).
Define the self-corrected version of f, denoted g: for each x,y let Vy(x) = f(x+y) − f(y) (the vote of y on x), and let g(x) = Plurality_y(Vy(x)).
Can show (conditioned on there being a < ε/2 fraction of violating pairs) that:
(1) g is linear.
(2) dist(f,g) < ε.

Testing "Basic" Properties of Functions [Parnas, R, Samorodnitsky]
Considers the following function classes:
• Singletons: f(x) = xi
• Monomials: f(x) = xi ∧ xj ∧ … ∧ xk
• DNF: f(x) = (xi ∧ xj) ∨ (xk ∧ xl ∧ xm) ∨ …

Testing "Basic" Properties of Functions Cont'
• Can test whether f is a singleton using O(1/ε) queries.
• Can test whether f is a monomial using O(1/ε) queries.
• Can test whether f is a monotone DNF with at most t terms using Õ(t²/ε) queries.
Common theme: no dependence of the query complexity on the input size n, and a linear dependence on the distance parameter ε (as opposed to learning these classes, where there is a (logarithmic) dependence on n).
A recent result of [Diakonikolas, Lee, Matulef, Onak, Rubinfeld, Servedio, Wan] greatly extends the above.

Testing (Monotone) Singletons
Singletons satisfy:
(1) Pr[f(x) = 1] = 1/2
(2) f(x ∧ y) = f(x) ∧ f(y) for all x,y
Natural test: check, by sampling, that the conditions hold (approximately). The natural test can be analyzed for the case that the distance between the function and the class of singletons is not too big (bounded away from 1/2).

Testing Singletons II - Parity Testing
Observation: Singletons are a special case of parity functions (i.e., linear functions g(x) = ⊕_{i∈S} xi).
Claim: Let g(x) = ⊕_{i∈S} xi. If |S| ≥ 2, then Pr[g(x∧y) ≠ g(x) ∧ g(y)] ≥ 1/4.
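The plurality-vote self-corrector used in the linearity analysis above can be sketched as follows; this is a sketch over GF(2), where both '+' and '−' are XOR, and the names and number of votes are illustrative:

```python
import random
from collections import Counter

def self_correct(f, n, x, votes=101, rng=random):
    """Sketch of the [BLR] self-corrector: g(x) = Plurality_y(V_y(x))
    with V_y(x) = f(x+y) - f(y).  Over GF(2) both '+' and '-' are
    bitwise XOR; `votes` random choices of y approximate the plurality."""
    counts = Counter()
    for _ in range(votes):
        y = rng.getrandbits(n)
        counts[f(x ^ y) ^ f(y)] += 1   # the vote of y on x
    return counts.most_common(1)[0][0]

# A parity function on 4 bits, corrupted on two of the 16 inputs:
def noisy_parity(x):
    clean = bin(x & 0b1111).count("1") % 2
    return 1 - clean if x in (0b0001, 0b0110) else clean
```

On this 1/8-corrupted parity, each vote is wrong only if y or x⊕y lands on a corrupted point, so the plurality recovers the clean parity value w.h.p. at every input, including the corrupted points themselves.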
Modified algorithm:
(1) Test whether f is a parity function (with distance parameter ε) using the algorithm of [BLR].
(2) Uniformly select a constant number of pairs x,y and check whether any is a violating pair (i.e., f(x∧y) ≠ f(x) ∧ f(y)).

Testing Singletons III - Self-Correcting
This "almost works": if f is a singleton it is always accepted, and if f is ε-far from parity it is rejected w.h.p. But if f is ε-close to a parity function g, we cannot simply apply the claim to argue that there are many violating pairs w.r.t. f. If we could only test violations w.r.t. g instead of f ... Use the self-corrector of [BLR] to "fix" f into a parity function g, and then test violations on the self-corrected version.

Testing Singletons IV - The Algorithm
Final algorithm for testing singletons:
(1) Test whether f is a parity function with distance parameter ε using the algorithm of [BLR].
(2) Uniformly select a constant number of pairs x,y and verify that Self-Cor(f,x) ∧ Self-Cor(f,y) = Self-Cor(f,x∧y).
(3) Verify that Self-Cor(f,1ⁿ) = 1 (where 1ⁿ is the all-ones input).

Testing Monomials and Monotone DNF
The monomial testing algorithm has a structure similar to the singleton testing algorithm. (Here too it suffices to find a test for monotone monomials.)
- The first stage (linearity testing) is replaced by affinity testing: if f is a monomial then F1 = {x : f(x)=1} is an affine subspace. [Fact: H is an affine subspace iff for all x,y,z ∈ H, x⊕y⊕z ∈ H.] The affinity test is similar to the parity test: select x,y ∈ F1 and z ∈ {0,1}ⁿ, and verify that f(x⊕y⊕z) = f(x) ∧ f(y) ∧ f(z).
- The second stage is also analogous to the singleton test (check for violating pairs). Here affinity adds structure that helps analyze the second stage.
Testing monotone DNF: use the monomial test as a subroutine.
The result of [DLMORSW07], which extends to other families (e.g., non-monotone DNF), uses different techniques.

Testing of Clustering [Alon, Dar, Parnas, R]
Notation: X is a set of points, |X| = n; dist(x,y) is the distance between points x and y. Assume the triangle inequality holds (dist(x,y) ≤ dist(x,z) + dist(z,y)).
For any subset S of X, the diameter of S is d(S) = max_{x,y∈S} dist(x,y).

Clustering Cont'
X is (k,b)-clusterable if there exists a k-way partition (clustering) of X such that each cluster has diameter at most b.
X is ε-far from being (k,b')-clusterable (b' ≥ b) if there is no k-way partition of any Y ⊆ X with |Y| ≥ (1-ε)n such that each cluster has diameter at most b'. (In particular, we will look at b' = (1+β)b, β ≤ 1.)
In the first case the algorithm should accept, and in the second it should reject, with probability ≥ 2/3.

Clustering Cont'
Testing algorithm (input: k, b, ε, β):
(1) Take a sample of m = m(k, ε, β) points from X.
(2) If the sample is (k,b)-clusterable then accept; otherwise reject.
If X is (k,b)-clusterable then the algorithm always accepts. Suppose X is ε-far from being (k,(1+β)b)-clusterable; we show that it rejects w.p. at least 2/3. We prove this for a general metric and β = 1, where m = O(k/ε).
Note: for a general metric one cannot go below β = 1 unless m = Ω(|X|^{1/2}) is allowed, but one can for the Euclidean distance in d dimensions (in that case there must be a dependence on (1/β)^{d/2}).
Other sublinear clustering work (e.g., for k-median) includes: [Indyk], [Mishra, Oblinger, Pitt], [Ben-David], [Meyerson, O'Callaghan, Plotkin].

Clustering Cont'
Consider the following mental experiment:
- Let the points in the sample be x1,x2,…,xm.
- Construct a (growing) set REPS of "cluster representatives"; initially, REPS = {x1}.
- At each step, take the next point that is at distance > b from every x in REPS, and add it to REPS.
Claim: If X is ε-far from being (k,2b)-clusterable then w.h.p. |REPS| > k, causing the algorithm to reject as required.
Proof idea: At each step consider the clustering "enforced" by REPS: each point in X \ REPS is assigned to the closest xi in REPS. Additional sample points "test" this clustering.

Tolerant Testing of Clustering [Parnas, R, Rubinfeld]
Tolerant testing: reject when ε-far but accept when ε'-close.
Tolerant testing algorithm (input: k, b, ε, ε', β):
(1) Take a sample of m = m(k, ε, ε', β) points from X.
(2) If the sample is (ε' + (ε-ε')/2)-close to being (k,b)-clusterable then accept; otherwise reject.
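The sampling test and the greedy-representatives argument above can be sketched as follows. This is a sketch for a general metric with β = 1; accepting iff at most k greedy representatives are needed stands in for checking that the sample is (k,b)-clusterable, and the sample-size constant and all names are illustrative:

```python
import random

def greedy_reps(points, b, dist):
    """The 'cluster representatives' procedure: scan the points, keeping
    each one that is at distance > b from all current representatives."""
    reps = []
    for p in points:
        if all(dist(p, r) > b for r in reps):
            reps.append(p)
    return reps

def diameter_clustering_test(X, k, b, eps, dist, seed=None):
    """Sketch of the (k,b)-diameter clustering test (general metric,
    beta = 1): sample m = O(k/eps) points and accept iff at most k
    greedy representatives are needed.  Two representatives at distance
    > b cannot share a cluster of diameter b, so a (k,b)-clusterable X
    is always accepted; the constant 10 is arbitrary."""
    rng = random.Random(seed)
    m = max(1, int(10 * k / eps))
    sample = [rng.choice(X) for _ in range(m)]
    return len(greedy_reps(sample, b, dist)) <= k
```

For instance, on a set of 1-D points forming three tight groups the test always accepts with k = 3, while on 100 well-separated points it finds more than k representatives w.h.p. and rejects.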
The analysis uses a generalization of a framework by Czumaj & Sohler for (standard) testing that captures aspects of the "enforce & test" approach. The sample size has a quadratic dependence on 1/(ε-ε'), and the same dependence on the other parameters as the (standard) testing algorithm.

Distribution-Free Testing (with queries)
First results [GGR]: trivial positive results that follow from the learning-to-testing reduction, and simple negative results for combinatorial properties (e.g., bipartiteness).
First non-trivial positive results [Halevi, Kushilevitz]:
(1) Linearity and low-degree polynomials; in general, whenever a self-corrector exists.
(2) Monotonicity in low dimensions.
(They also had positive results for graph properties.)
On the other hand, [Halevi, Kushilevitz] showed hardness of distribution-free testing of monotonicity in high dimensions (i.e., exponential in d over {0,1}^d).
Recently, [Glasner, Servedio] showed that for several classes (monomials, decision lists, linear threshold functions), Ω((n/log n)^{1/5}) queries are needed for distribution-free testing.

Conclusions and Open Problems
Property testing is a relaxation of learning where one should determine (w.h.p.) whether there exists a good approximation of the function f in the class F, rather than find such an approximation. It can serve as a preliminary step to learning.
For quite a few function classes there are testing algorithms that are more query-efficient than learning algorithms. Some extend to distribution-free testing; for some there are strong lower bounds.
Much is still left to be understood about the relation between testing and learning (which has the flavor of the relation between decision and search).

Thanks

Property Testing and Learning: Basic Comments/Observations
Motivation I: Testing can be used as a preliminary step to learning, i.e., for efficiently selecting a good hypothesis class.
Motivation II: The relation between testing and learning has similarity to the relation between decision and search.
Testing is no harder than (proper) learning: if we have a learning algorithm for a function class F that outputs a hypothesis in F, then we can use it to test the property of belonging to F. That is, run the learning algorithm with accuracy parameter set to ε/2, and check on an independent sample that the hypothesis it outputs is at most 3ε/4-far from f. We want the testing algorithm to be significantly more efficient than the learning algorithm.

Lemma: If f is accepted with probability greater than 1/3, then f is ε-close to linear.

Linearity Testing Cont'
Suppose f is accepted w.p. > 1/3 ⇒ there is a small (< ε/2) fraction of violating pairs (f(x)+f(y) ≠ f(x+y)).
Define the self-corrected version of f, denoted g: for each x,y let Vy(x) = f(x+y) − f(y) (the vote of y on x), and let g(x) = Plurality_y(Vy(x)).
Can show (conditioned on there being a < ε/2 fraction of violating pairs) that: (1) g is linear; (2) dist(f,g) < ε.
Main Technical Lemma (informal): if there are few violating pairs, then for every x, for almost all y, Vy(x) = g(x).

Learning Boolean Formulae
• Can learn singletons and monomials under the uniform distribution using O(log n/ε) queries/examples (a variation on Occam's razor).
• Can properly learn monotone DNF with t terms and r literals using Õ(r·log²n/ε + t(r + 1/ε)) queries [Angluin + Bshouty & Jackson & Tamon].
Main difference w.r.t. testing results: in testing there is no dependence on n, and a different algorithmic approach is used.