Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Stochastic Skyline Operator Xuemin Lin School of Computer Science University of New South Wales Australia Joint Work with: Ying Zhang (UNSW), Wenjie Zhang (UNSW), Muhammad Aamir Cheema (UNSW) Introduction: Skyline 2 a user preference ≺ is given on each dimension of Rd. two points in Rd, u dominates v (u ≺ v) i (1 ≤ i ≤ d), u.i ≺= v.i; j (1 ≤ j ≤ d), u.j ≺ v.j Skyline: Points not dominated by another point. Multiple criteria optimal decision making: minimum set of candidates of best options regarding any monotonic functions. Skyline of Uncertain Objects Probabilistic Skyline: (VLDB07, PODS09, etc) Skyline probabilities by possible worlds. Providing the probabilities not worse than any other objects. Provide minimal candidate set of optimal solutions? How to define optimal options? How to characterize the minimum candidate set? Expected Utility & Stochastic Order Expected Utility Principle: Given a set U of uncertain objects and a decreasing utility function f, select U in U to maxmize E[f (U)]. Stochastic Order: Given a family ℱ of utility functions, U ≺ℱ V if for each f in ℱ E[f(U)] ≥ E [f(V)] Decreasing Multiplicative Functions: ℱ= { d i 1 f i ( xi )} where f is nonnegative decreasing. i Low orthant order: the stochastic order is defined over the family of decreasing multiplicative functions. Athlete Instance 1 /probability Instance 2 /probability A (1,4) / 0.5 (3,2) / 0.5 B (2,5) / 0.5 (4,3) / 0.5 C (5,1) / 0.01 (3,4) / 0.99 Utility function: f f1 (h) f 2 (t ) o f1 : nonnegative decreasing [0, ) [0,1] o f 2 : nonnegative decreasing [0, ) [0,1] 1 1 f1 (4) f 2 (1) f1 (2) f 2 (3) 2 2 1 1 E[ f ( B)] f1 (5) f 2 (2) f1 (3) f 2 (4) 2 2 E[ f ( A)] E[ f (C )] 1 99 f1 (1) f 2 (5) f1 (4) f 2 (3) 100 100 e.g. f1 (h) e a (1 h ) f 2 (t ) ebt a 0 b 0 ; f 2 (t ) ebt a 0 b 0 ; Contributions Introduce a novel skyline operator: stochastic skyline. Guarantee the minimal candidate set to the optimal solutions regarding decreasing multiplicative functions. NP-Completeness of computing stochastic skyline regarding dimensionality d. Novel statistic base pruning techniques. Efficient partition base verification algorithms: polynomial if d is fixed. Problem Statement Stochastic Order (lower orthant order): Given U & V, U stochastically dominates V (U ≺sd V) if for any x, U.cdf (x) ≥ V.cdf (x) and exists y such that U.cdf (y) > V.cdf (y). U.cdf (x): probability mass of U in the rectangular region R ((0,0,…0), x); see the shaded region. Stochastic Skyline: the objects in U not stochastically dominated by any others, called stochastic skyline. Problem Statement: efficiently compute stochastic skyline regarding discrete cases. Minimality of stochastic skyline Stochastic skyline removes all objects not preferred by any non-negative decreasing functions! Framework Phase 1: filtering. Remove non-promising objects. Phase 2: verification. Test stochastic dominance between two objects. BBS combing with a heap: the “near” progressiveness only need to test either U ≺sd V or V ≺sd U in most cases (but not both). Testing if U ≺sd V Violation point: a point x in Rd+ is a violation point regarding U ≺sd V if U.cdf (x) < V.cdf (x). Testing algorithm: if no violation points, then U ≺sd V. Not enough to test instances. 1 pu1 pu2 2 1 pv1 pv2 pv3 3 Reduce to Grid Points Test if U.cdf ≥ V.cdf against grid points only (see (a)). Testing the switching grid points only (see solid lines (b)). Algorithm Given a rectangular region R (x, y), if U.cdf (x) ≥ V.cdf (y), then no violation point in R (x, y). Partition base testing algorithm: Get switching points Initial check Iteratively partition the grid to throw away non-promising sub-grids Complexity The algorithm runs O (dm log m + md (T (Uartree) + T (Vartree))) where m is the number of instances in V. NP-Complete regarding d. Covert (the decision version of) the minimal set cover problem to a special case of the testing problem. Filtering Techniques Pruning Rule 1: throw away fully dominated entries. Filtering Techniques Pruning Rules 2: applying Cantelli’s Inequality to get upper-bonds. Size Estimation: Expected size: size of stochastic skyline in Rd is bounded by that of conventional skyline in Rd+1; i.e., lnd (n)/(d+1)! Empirical Study C++ with STL compiled with GNU GCC on 2.4GHz Debian Real data set: NBA player’s game-by-game statistics Synthetic dataset: anti-correlated, correlated, independent Summary a novel skyline operator: stochastic skyline guarantee minimality . NP-complete to test stochastic order (lower orthant order) . novel efficient algorithms to compute stochastic order. Future work: F is a set of all decreasing functions? Thank you!