Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Relative Approximations in Geometric Hypergraphs Esther Ezra New York University Range Counting P - Set of points in Rd (objects) . R - Family of regions in Rd (ranges) . Goal: Preprocess P in order to efficiently answer counting queries. Counting query: Given a range R, report the size of P . Various Problem Settings (GIS, DB…) Input: Cities in the USA (mapped as points in R2 ) . Query: How many cities are located within a certain distance from a given location? (disk in R2 ) Input: people with personal details, as age, yearly income, address, etc. (mapped as points in Rd ) Query: How many people are in the age [35,50] and have an income of at least 50,000$ ? (axis-parallel boxes in Rd ) Approximate Range Counting The Range Counting problem is well studied [AM-94, AS-93, Chaz-93, Chaz-94, Mat-93, Mat-94, MW-92,…] Too expensive Computing the exact count requires storing the entire set X. If approximate count is sufficient then we can save space: Compute a small subset Z X that approximates | X| well for each R . Specifically, we would like the two proportions | X| / |X| , | Z| / |Z| to be roughly the same. Range Spaces The approximate range counting is an instance of a more general and abstract setting: Range space (X, R) : X – Objects R – Ranges: Subsets of X . |R| 2|X| Abstract form: Hypergraphs. X – vertices. R – hyperedges. Geometric Range Spaces specification: X Rd, R = set of simply-shaped regions in Rd X – Points on the real line. R – Intervals. X – Points on the plane. R – halfplanes, disks,… Assume X is finite, |X| = n: |R| is polynomial in |X|, and this continues to hold for any “projected” range space. The approximate count The measure of a range R is the quantity: X( ) = | X | / |X| Let 0 < < 1 be the error parameter. Ideally, Equivalent to we would like to have a relatively small |Z( ) - X( )| X( ) subset Z X s.t., X( ) (1 - ) Z( ) X( ) (1 + ) But this requirement is too strong. At the extreme, when Z = , X must also be empty! But this property may guarantee |Z| = X | ) ! Cannot distinguish between empty/non-empty ranges when Z is small! Relative (p,)-Approximation Thus, refine the definition for the approximate count by introducing an additional parameter 0 < p < 1. Definition: A subset Z X is a relative (p,)-approximation for (X, R) if: | Z( ) - X( ) | X( ) , if X( ) p , and | Z( ) - X( ) | p , otherwise. In particular, when Z = , X( ) does not exceed p . How small can we make Z? Upper bounds [LLS-01] Theorem [Li , Long, Srinivasan 2001]: Bound does not depend on n. A random sample of O( log (1 / p) / 2 p ) elements of X is a relative (p, )-approximation for (X, R) with constant probability. For X( ) p Example: Points and intervals on the real line: Put ,|Z| = 2/p . We have | Z( ) - X( ) | p/2 X( ) p pn Previous Results Points and halfplanes in 2D: O( log 4/3 (1/p) / (4/3 p) ) [Har-Peled Sharir 2011] Points and halfplanes in 3D: O( log 3/2 (1/p) / (3/2 p) ) [Har-Peled Sharir 2011] • This case is somewhat restricted (bound corresponds to multiple subsets of this overall size). We improve the dependency on p for a wider family of range spaces. Our Result: Well-Behaved Range Spaces We consider “well-behaved” range spaces (X, R) : 1. For any parameter 1 k n , the number of ranges of size at most k is nearly-linear in n and polynomial in k. bound is n (n) , where (n) , is a slowly growing function. 2. kc , for some constant c > 0 This property holds for any “projected” range space (X’, R’) . That is, X’ X, and R’ = { X’ | R } . Well-Behaved Primal Range Spaces in Geometry Range Space Points and halfspaces in 2- and 3-dimensions. Points and disks in the plane. # ranges of size k O(nk), O(nk 2) O(nk2) Points and axis-parallel boxes O(nk 2 log n) , in 2- and 3-dimension. O(nk 2 log3 n) Involves some canonization Points and -fat triangles in the plane O(nk 3 log2 n) Each angle Well-Behaved Dual Range Spaces in Geometry Range Space # ranges of size k Halfspaces and points in 2- and 3-dimensions. O(nk), O(nk2) Pseudo-disks and points in the plane. O(nk) -fat triangles and points in the plane. O(nk log* n) Locally -fat objects and points nk 2 O(log* n) in the plane. Our Result Theorem: For simplicity of presentation assume Any well-behaved range space (X, R) admits () grows slower a relative (p, )-approximation Z of size than log () O( { log log (1 /p) + log(1 / ) } / 2 p ) , Constant of prop depends on degree of polynomial in the bound on |R| An improvement when p < and Z can be constructed in expected polynomial time. Corollary: Each of the well-behaved geometric range spaces listed above admits a relative (p, )-approximation of size O( { log log (1 /p) + log(1 / ) } / 2 p ) . Highlights of the Construction F – a relative (p, )-approximation for (X, R) , F X . T – the set of ranges projected onto F . Remove the dependency on n Replace X by a F: |F | = O( log (1/ p) / 2 p ) . Now, compute an improved relative (p, )- approximation for (F, T ) . lose only a constant factor in and in p Since (X, R) is well-behaved, then by definition, (F, T ) is also well-behaved. Classifying the objects in F An object in F is heavy if it participates in “many” ranges (of T ). Otherwise, this object is light. Put all of them in the target approximation Claim: The number of heavy objects is at most O(1 / p) << |F | The proof follows from the fact the range space (F, T ) is well-behaved. Handwaving argument: Consider the case where all ranges have size ~ k, and number of such ranges is O(|F|k). Then number of object/range incidences is O(|F|k 2) , and then there are only O(|F|/k) (heavy) objects that participate in > k 3 ranges. A random sampling scheme for the light objects The overall majority of F consists of light objects. Sample each light object independently with probability := (log log (1 /p) + log(1 / )) / (log(1/p) + log (1/ ) ) Let F1 be the resulting sample. E[|F1 |] = O( log log (1 /p) + log(1 / ) / 2 p ) . Goal: Show that the heavy objects and F1 comprise a relative (p, )-approximation for (F, T) . This relative approximation is weighted Analysis The analysis for the light objects consists of two major steps. 1. 2. Show that, for a fixed range , the probability to have a relative error between its original measure (w.r.t. the light objects) and its approximated measure (w.r.t. F1 ), is small. Use standard Chernoff's bound A Number of events (ranges) is too large! But they admit a relatively small “degree of dependency”. So we can apply the (asymmetric) Lovasz Local Lemma in order to conclude that, with positive probability, none of these events happen. The Lovasz Local Lemma (LLL) Goal: Show that there exists an assignment x from the set of events to (0,1) , s.t., () Prob[A ] x (A ) A ~ A’, ’ (1 - x (A’ )) For each A . Then the Local Lemma of Lovasz implies: Prob [ A ] (1 - x (A )) > 0 . There exists a sample F1 that well approximates all range counts. Applying the Local Lemma of Lovasz: Dependency among events Observation: Due to our sampling model, for a pair of ranges , ’ the two events A , A’ are mutually independent iff there is no light object in F that participates in both , ’. Omitting further technical details, the degree of dependency for a fixed range depends on its size. These values are non-uniform! (which is the reason we need to resort to the asymmetric version of LLL) ’ Non-Uniform Degree of Dependency We overcome this difficulty by applying an exponential partition of the ranges according to their count: A range T lies at the i th layer of T if, for i = 1, …, log(1/p) 2i-1 p | F | / |F | 2i p , and lies at the 0 th layer if 0 | F | / |F | p Wrapping Up Then the bound on Prob[A ] (computed by Chernoff's bound) is sensitive to the layer: Prob[A ] < ( / log(1/p)) B 2^{i-1} The (asymmetric) Lovasz Local Lemma “connects” between all layers: x (A ) = exp{2i+1} Prob[A ] , for each Si . This yields one (universal) sample for all ranges at all layers. Remark: The simpler version of LLL can be applied in each layer separately, but that produces multiple samples and not a single one. Keeping the Sample Small We have shown that We sample each light object independently with prob. E[ | F1 | ] = O( log log (1 /p) + log(1 / ) / 2 p ) . Nevertheless, we need to show that a sample of that size exists, and that it also well approximates all ranges (w.r.t. light objects). Let B be the (bad) event: |F1 | > (1 + ) E[ | F1 |] . is a sufficiently large constant Our analysis extends the Local Lemma to include this event. Applying the Constructive LLL Apply the randomized algorithm of Moser and Tardos [MT 2010]. Our scenario matches the properties of the setting in [MT 2010] : The light objects induce a set of mutually independent random variables : Each of which is chosen randomly and independently with probability . Each bad event A and B ) is determined by these variables. A sample F1 with the above desired properties can be constructed in expected polynomial time. Application: Set Multi-Cover Problem A set cover for (X, R) is a subset S R, s.t., any x X is covered by S. Goal: find smallest set cover. Finding a set cover of smallest size is NP-hard, even for geometric range spaces! Use an approximation algorithm instead [Lovasz 75, Chvatal 79, Clarkson 93, ERS 05...]. Application: Set Multi-Cover Problem In set multi-cover each item x X is associated with a demand d(x) . Goal: Find a smallest subset of R , s.t., each item x X is covered at least d(x) times. Model: Allow repetitions. Set Multi-Cover with Repetitions Theorem [Chekuri, Clarkson, Har-Peled]: If (X, R) admits a relative (p, -approximation of size (1 / p) / 2 p) , Computable in polynomial time then there exists a polynomial-time algorithm that approximates the smallest set multi-cover up to a factor of O( (OPT)) . Applying the standard bound O ( log (1 / p) / 2 p ) yields an approximation factor of O(log OPT) . Applying our bound for well-behaved range spaces yields an approximation factor of O(log log OPT) . Further Research Can we remove the factor log(1 / ) in the enumerator of our bound? and thus obtain the bound O( { log log (1 /p) } / 2 p ) ? Li , Long, and Srinivasan [LLS-01] obtained such an improvement for more general range spaces: They reduced the bound O( (log (1 / p) + log(1 / )) / 2 p ) to O( log (1 / p) / 2 p ) . Does our technique have any implications to combinatorial discrepancy? Can one integrate this machinery with existing techniques in order to obtain improved bounds on certain scenarios? The Case of Points and Halfplanes [Har-Peled Sharir 2011] The bound for points and halfplanes is achieved via combinatorial discrepancy. X – set of n points. H – all halfplanes defined on X. Informally, we want to color each point of X either red or blue, in such a way that any halfplane in H has roughly the same number of red and blue points. The maximum deviation from an even splitting, over all halfplanes, is the discrepancy of H. The Case of Points and Halfplanes [Har-Peled Sharir 2011] Formally, a coloring of X is any mapping : X → {−1, +1}. The discrepancy of H, denoted by disc(H ), is the minimum, over all colorings , of disc(χ, H) := max hH disc(χ, h) , where disc(χ,h) = |x∈h χ(x) |. What is the best coloring for X ? Theorem: [Matousek 95] disc(H ) = O(n¼ ) The Case of Points and Halfplanes [Har-Peled Sharir 2011] The analysis of Har-Peled and Sharir shows that the discrepancy is sensitive to the size of the halfspace: Theorem: [Har-Peled Sharir 2011] Given a set X of n points in the plane, there exists a coloring : X → {−1, +1}, s.t., for any halfplane h H , disc( h ) = O( | h X | ¼ log n ) . This property yields a bound of O( log 4/3 (1/p) / (4/3 p) ) on the size of the relative (p, )-approximation using “halving technique”. Thank you