* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Probabilistic Skylines on Uncertain Data
Survey
Document related concepts
Transcript
Probabilistic Skylines on Uncertain Data VLDB2007 Outline Motivation Traditional and Probabilistic Skyline Problem Definition Computation Problem and Algorithms (Top down and Bottom up) Experimental Results Motivation Skyline Analysis on NBA players performance (#Assists) Uncertainty Each Player has multiple records (#Rebounds) Motivation Skyline Analysis on NBA players with multiple records Motivation Skyline Analysis on NBA players with multiple records Easy Approach – Averaging Arbor (x) is better in assist than Eddy, but Eddy (point b) dominates all games of Arbor (x). “not so fair to say Eddy is a worse in assist than Arbor” Bob (point a) bias the aggregate value Motivation Motivating result using Probabilistic Skyline Olajuwon and Kobe Bryant are missing from Aggregate Skyline but present in Probabilistic Skyline Their performance vary a lot over games Details in experiment analysis Traditional and Probabilistic Skyline Semantics difference of Dominance between objects Certain Data Uncertain Data Dominance Certain model: an object dominate another object with Probability 1. Uncertain model: an object dominate another object with Probability P. Traditional and Probabilistic Skyline Semantics difference of Dominance between objects Certain Data Uncertain Data Dominance Certain model: an object dominate another object with Probability 1. Uncertain model: an object dominate another object with Probability P. Traditional and Probabilistic Skyline Semantics difference of Dominance between objects Certain Data Uncertain Data Dominance Certain model: an object dominate another object with Probability 1. Uncertain model: an object dominate another object with Probability P. Traditional and Probabilistic Skyline Semantics difference of Dominance between objects Certain Data Uncertain Data Dominance Certain model: an object dominate another object with Probability 1. Uncertain model: an object dominate another object with Probability P. Probabilistic Skyline Calculation of Probability Object A dominating Object C Pr [A≺C] = 1/4*1/3 (4+..) Probabilistic Skyline Calculation of Probability Object A dominates Object C Pr [A≺C] =1/4*1/3 (4+4+..) Probabilistic Skyline Calculation of Probability Object A dominates Object C Pr [A≺C] = 1/4*1/3 (4+4+0) Probabilistic Skyline Probabilistic Skyline: From Dominance to Skyline Intuition of finding Skyline, probability of an object not to be dominated by other objects 0 (1/3)(1/3) Computation Problem of p-skyline First, each uncertain object may have many instances. We have to process a large number of instances. Second, we have to consider many probabilities in deriving the probabilistic skylines. Algorithms (Top down and Bottom up) Data Multiple records of objects in the hope of approximating the probability density function Techniques: Bounding Pruning Refining Bottom-up Algorithm Technique – Minimum Bounding Box (MBB) Bottom-up Algorithm - Pruning Techniques (1/3) using Umin, Umax to decide membership of p-skyline For an uncertain object U and probability threshold p, if Pr(Umin) < p, then U is not in the p-skyline. If Pr(Umax) ≥ p, then U is in the p-skyline Bottom-up Algorithm - Pruning Techniques (2/3) using Umax to prune instances of objects Let U and V be uncertain objects such that U V . If u is an instance of U and Vmax ≺ u, then Pr(u) = 0. C2 is dominated by Umax, dominated by all instances in object D Bottom-up Algorithm - Pruning Techniques (3/3) using subset of instance to prune objects Estimate Pr(Vmin) upper bound by Pr(Umax’) Pr(Vmin) = (1 – |U’|/|U|)(..)(..) If |U’| is large, more instances dominate Vmin, then Pr(Vmin) is low Top-down Algorithm Idea of bounding The skyline probability of each subset of uncertain object can be bounded using its MBB. The skyline probability of the uncertain object can be bounded as the weighted mean of the bounds of subsets. Top-down Algorithm supporting data structure : partition tree D B A C D B A C B A D C Top-down Algorithm partition tree for bounding D B A C A’ B A D A’ D’ C’ B’ D C C’ B’ C B A D’ B’ A’ D’ C’ Compare the partition of U with other partition tree as follows: traverse the partition tree of other uncertain object V, in the depth-first manner. Top-down Algorithm all possible situations during partition trees traversal B D A C A B C D B A B’ D’ A’ C’ A’ D C B’ D’ C’ B’ A’ D’ C’ Top-down Algorithm Pruning partition tree B D A C A B C D B A D C Experiment Other Experiment results Scalability with respect to probability threshold Conclusion Multiple records MBB