Download Testing of Clustering

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
Property Testing:
A Learning Theory Perspective
Dana Ron
Tel Aviv University
Property Testing (Informal Definition)
For a fixed property P and any object O,
determine whether O has property P,
or whether O is far from having property P
(i.e., far from any other object having P ).
?
?
?
?
?
Task should be performed by inspecting the object (in
as few places as possible).
Examples
• The object can be a function and the property
can be linearity.
• The object can be a string and the property can
be membership in a fixed regular language L.
• The object can be a graph and the property can
be 3-colorabilty.
Two Views of Property Testing
• A relaxation of exactly deciding whether the object
has the property or does not have the property.
• A relaxation of learning the object (with membership
queries and under the uniform distribution).
In either case want testing algorithm to be significantly
more efficient than decision/learning algorithm.
Q: Which view is “more right”?
A: Depends…
Mainly on type of objects and properties studied:
Combinatorial objects and properties vs. function
classes that are of interest to learning community
A Little Background
• Initially defined by Rubinfeld and Sudan in the
context of Program Testing (of algebraic functions).
• With Goldreich and Goldwasser initiated study of
testing properties of combinatorial objects, and
in particular graphs.
• Growing body of work deals with properties of
functions, graphs, strings, sets of points ...
Many algorithms with complexity that is sub-linear in
(or even independent of) size of object.
Formal Definition (“standard” model)
A property testing algorithm for property P is given a distance
parameter  and query access(*) to a function f.
 If f has property P then the algorithm should accept w.h.p.
 If f is -far from any function having property P then the
algorithm should reject w.h.p.
Distance is measured with respect to the uniform dist.(**)
(*) May consider testing with random examples only
(**) May consider other distributions (unknown dist.)
Property Testing and Learning:
Basic Comments/Observations
 Comment (Motivation): Can use testing as preliminary step
to learning: That is, for efficiently selecting good hypothesis
class.
 Observation: Testing is no harder than (proper) learning:
If have learning algorithm for function class F that outputs
hypothesis in F, then can use to test the property of belonging
to F.
That is, run learning alg with accuracy parameter
set to /2, and check that the hypothesis it
outputs is at most 3/4 far from f on independent
sample.
Want testing algorithm to be more efficient than learning
algorithm.
Classes/Properties for which testing is
more efficient then learning










Linear functions
Low-degree polynomials
Singletons, Monomials
(small) DNF and general Boolean formula and circuits
Monotone functions
Juntas
Halfspaces
Decision Lists, Decision Trees, Branching Programs
Clustering
Properties of Distributions
Two recurring approaches: “Self-Correcting” approach, and
“Enforce&Test” Approach.
Linearity Testing [Blum,Luby,Rubinfeld]
Def1: Let F be a finite field. A function f : Fn  F is called
linear (multi-linear) if there exists constants a1,…,an  F
s.t. for every x=x1,…,xn  Fn it holds that
f(x) =  aixi .
Def2: A function f is said to be -far from linear if for
every linear function g, dist(f,g)>, where
dist(f,g)=Pr[f(x)  g(x)] (x selected uniformly in Fn).
Fact: A function f : Fn  F is linear i.f.f for every x,y  Fn it
holds that f(x)+f(y)=f(x+y) .
Linearity Testing Cont’
Linearity Test
1) Uniformly and independently select (1/) pairs of
elements x,y  Fn .
2) For every pair x,y selected, verify that f(x)+f(y) = f(x+y).
3) If for any of the pairs selected linearity is violated (i.e.,
f(x)+f(y)  f(x+y)), then REJECT, otherwise ACCEPT.
Query complexity: (1/), i.e., independent of n. In
contrast to learning where need (n) queries/examples.
Theorem: If f is linear then test accepts w.p. 1., and if f
is -far from linear then with probability at least 2/3 the
test rejects it.
Lemma: If f is accepted with probability greater than 1/3,
then f is -close to linear.
Lemma: If f is accepted with probability
greater than 1/3 , then f is -close to linear.
Linearity Testing Cont’
Suppose f is accepted w.p > 1/3
small (< /2) fraction of violating pairs (f(x)+f(y)f(x+y))
Define self-corrected version of f, denote g:
For each x,y let Vy(x) = f(x+y)-f(y) (the vote of y on x)
g(x) = Plurality(Vy(x))
Can show that (conditioned on < /2 fraction of violating pairs)
(1) g is linear.
(2) dist(f,g)  
Testing “Basic” Properties of Functions
[Parnas, R, Samorodnitsky]
Considers following function classes:
• Singletons: f ( x)  xi
f ( x)  xi  x j  xk
f ( x)  ( xi  x j )  ( xk  x  xm )
• Monomials:
• DNF:
Testing “Basic” Properties of Functions Cont’
• Can test whether f is a singleton using O(1/) queries.
• Can test whether f is a monomial using O(1/) queries.
• Can test whether f is a monotone DNF with at most t
terms using Õ(t2/) queries.
Common theme: no dependence in query complexity on
size of input, n, and linear dependence on distance
parameter,  (as opposed to learning these classes where
have dependence on n (logarithmic))
Recent result of [Diakonikolas, Lee, Matulef, Onak, Rubinfeld,
Servedio, Wan] greatly extends the above
Testing (Monotone) Singletons
Singletons satisfy: (1)
(2)
Pr[ f ( x)  1]  1 / 2
f ( x  y)  f ( x)  f ( y) x, y
Natural test: check, by sampling, that conditions hold
(approximately).
Can analyze natural test for case that distance between
function and class of singletons is not too big (bounded
away from 1/2).
Testing Singletons II - Parity Testing
Observation: Singletons are a special case of parity functions
(i.e., linear functions: g ( x)   xi .)
iS
Claim: Let
g ( x)   xi . If | S | 2
iS
then
Pr[ g ( x  y )  g ( x)  g ( y )]  1 / 4
Modified algorithm:
(1) Test whether f is a parity function (with dist. par. ) using
algorithm of [BLR] .
(2) Uniformly select constant number of pairs x,y and check
whether any is a violating pair (i.e.: f ( x  y )  f ( x)  f ( y ) ).
Testing Singletons III - Self Correcting
This “almost works”: If f is singleton - always accepted.
If f is -far from parity - rejected w.h.p.
But if f is -close to parity function g, then cannot simply
apply claim to argue that many violating pairs w.r.t. f.
If we could only test violations w.r.t. g instead of f ...
Use Self-Corrector of [BLR] to “fix” f into parity function (g),
and then test violations on self-corrected version.
Testing Singletons IV - The Algorithm
Final Algorithm for Testing Singletons:
(1) Test whether f is a parity function with dist. par.  using
algorithm of [BLR] .
(2) Uniformly select constant number of pairs x,y. Verify that
Self-Cor(f,x)  Self-Cor(f,y) = Self-Cor(f,xy) .

(3) Verify that Self-Cor( f ,1) = 1 .
Testing Monomials and Monotone DNF
Monomial testing algorithm has similar structure to Singleton
testing algorithm. (Here too suffice to find test for monotone
monomials.)
- The first stage of linearity testing is replaced by Affinity Testing:
if f is a monomial then F1={x: f(x)=1} is an affine subspace.
[ Fact: H is affine subspace i.f.f x,y,zH, xyz H ].
Affinity test is similar to parity test: select x,yF1, z{0,1}n, verify that
f(xyz)=f(x)f(y)f(z).
- The second stage is also analogous singleton test (check for
violating pairs). Here affinity adds structure that helps analyze
second stage.
Testing monotone DNF: use monomial test as sub-routine
Result of [DLMORSW07] which extends to other families (e.g.,
non-monotone DNF) uses different techniques.
Testing of Clustering [Alon,Dar,Parnas,R]
Notation:
X - set of points |X| = n
dist(x,y) - distance between points x and y.
Assume that triangle inequality holds (dist(x,y)≤ dist(x,z)+dist(z,y)).
For any subset S of X:
Diameter of S
d(S) = maxx,ySdist(x,y)
Clustering Cont’
X is (k,b)-clusterable if exists a k-way
partition (clustering) of X s.t. each
cluster has diameter at most b.
b
X is -far from being (k,b’)-clusterable (b’  b)
if there is no k-way partition of any Y  X, |Y| ≥ (1-)n
s.t. each cluster has diameter at most b’.
(In particular, will look at b’=(1+)b, ≤1 .)
In first case algorithm should accept
and in second reject with probability ≥ 2/3
Clustering Cont’
Testing Algorithm (input: k,b,, )
(1) Take sample of m=m(k, ,) points from X.
(2) If sample is (k,b)-clusterable then accept, o.w. reject
If X is (k,b)-clusterable then always accept.
Suppose X is -far from being (k,(1+)b)-clusterable.
Show that reject w.p. at least 2/3.
Will prove for general metric and =1 where m=O(k/)
Note: for general metric cannot go below =1 unless allow
m=(|X|1/2), but can do so for Euclidean distance in d
dimensions (in that case must have dependence on (1/ )d/2 ).
Other sublinear clustering work (e.g. for k-median) include: [Indyk],
[Mishra,Oblinger,Pitt], [Ben-David], [Meyerson,O’Callaghan,Plotkin]
Clustering Cont’
Consider following mental experiment:
- Let points in sample be x1,x2,…,xm
- Construct (growing) set of “cluster representatives” REPS.
- Initially, REPS = {x1}.
- At each step, take next point that is at distance > b from every
x in REPS, and add to REPS.
>b
Claim: If X is -far from being (k, 2b)-clusterable then w.h.p.
|REPS|>k, causing the algorithm to reject as required.
Proof Idea: At each step consider clustering “enforced” by
REPS: Each point in X\REPS is assigned to closest xi in REPS.
Additional sample points “test” this clustering.
Tolerant Testing of Clustering
[Parnas,R,Rubinfeld]
Tolerant Testing: Reject when -far but accept when ’-close
Tolerant Testing Algorithm (input: k, b, ,’,  )
(1) Take sample of m=m(k, ,’, ) points from X.
(2) If sample is (’ + ( - ’)/2)-close to (k,b)-clusterable
then accept, o.w. reject
Can analyze using a generalization of a framework by
Czumaj & Sohler for (standard) testing that captures
aspects of “enforce&test” approach.
Sample has quadratic dependence on 1/( - ’), and
same dependence on other parameters as (standard)
testing algorithm.
Distribution-Free Testing (with queries)
First results [GGR]: trivial positive results that follow from
learning  testing, and simple negative results for
combinatorial properties (e.g., bipartiteness)
First non-trivial positive results [Halevi, Kushilevitz]:
(1) Linearity and Low-degree polynomials. In general: when
have self-corrector.
(2) Monotonicity in low dimensions
(Also had positive results for graph properties)
On the other hand, [Halevi, Kushilevitz] showed hardness of
distribution free testing of monotonicity in high-dimensions
(i.e., exponential in d over {0,1}d).
Recently, [Glasner,Servedio] showed that for several classes
(monomials, decisions lists, linear threshold functions) need
((n/log n)1/5) queries for dist-free testing.
Conclusions and Open Problems
 Property testing: A relaxation of learning where should
determine (w.h.p) if exists good approximation of function
f in class F rather than find such approximation.
 Can serve as preliminary step to learning.
 For quite a few function classes have testing algorithms
that are more query-efficient than learning algorithms.
 Some extend to dist-free testing, for some have strong
lower bounds.
 Still much is left to be understood about relation between
testing and learning (has flavor of relation between decision
and search).
Thanks
Property Testing and Learning:
Basic Comments/Observations
 Motivation I: Can use testing as preliminary step to learning:
That is, for efficiently selecting good hypothesis class.
 Motivation II: Relation between testing and learning has
similarity to relation between decision and search.
 Testing is no harder than (proper) learning: If have learning
algorithm for function class F that outputs hypothesis in F, then
can use to test the property of belonging to F.
That is, run learning alg with accuracy parameter
set to /2, and check that the hypothesis it
outputs is at most 3/4 far from f on independent
sample.
Want testing algorithm to be significantly more efficient
than learning algorithm.
Lemma: If f is accepted with probability
greater than 1/3 , then f is -close to linear.
Linearity Testing Cont’
Suppose f is accepted w.p > 1/3
small (< /2) fraction of violating pairs (f(x)+f(y)f(x+y))
Define self-corrected version of f, denote g:
For each x,y let Vy(x) = f(x+y)-f(y) (the vote of y on x)
g(x) = Plurality(Vy(x))
Can show that (conditioned on < /2 fraction of violating pairs)
(1) g is linear.
(2) dist(f,g)  
Main Technical Lemma (informal): if few violating pairs
then x we have that for almost all y, Vy(x)=g(x)
Learning Boolean Formulae
• Can learn singletons and monomials under uniform
distribution using O(log n/) queries/examples
(variation on Occam’s razor)
• Can properly learn monotone DNF with t terms and r literals
using Õ(r log2n/ + t(r + 1/ )) queries
[Angluin+Bshouty&Jackson&Tamon].
Main difference w.r.t testing results: in testing there is no
dependence on n and different algorithmic approach.