Multidimensional Databases
ISI'02
• Challenge: representation for efficient storage, indexing & querying
• Examples (time-series, images)
• New multidimensional data sets & approaches
  • Graphs (e.g., road networks)
  • Immersidata (e.g., haptic)
  • User profiles & aggregation/clustering

Challenges
• Storing multidimensional data (matrix vs. relations)
• Indexing multidimensional data (R-tree)
• Queries
  • Search for similar objects (similarity search) [ICDE'00, ICME'00]
  • Spatial and temporal queries [IDEAS'00, ACM-GIS'01, KAIS'02]
• Multidimensional data mining
  • Aggregation [EDBT'02, PODS'02]
  • Clustering [ACM-MMj'02]
  • Classification [INFORMS'02]
  • Finding outliers [SSDBM'01]

Stock Prices
[Figure: price series S1 … Sn ($price vs. day, days 1 to 365), each mapped to a point by feature functions f and g (e.g., std, avg) or by transformed features f1 … f5]
• A point in 365 dimensions (computationally complex)
• A point in 2 dimensions (not accurate enough)
• A point in 5 dimensions, transformation-based: FFT, Wavelet [SSDBM'00, 01]

More Similarity Search & Clustering
[Figure: example R, G, B color histograms, a shape outline with angles j1 … j9, and a page-hit table; one label reads "More accurate"]
• Images: color histograms (Red, Green, Blue)
• Shapes: angle sequences = [j1, j2, j3, j4, j5, j6, j7, j8, j9] [ICDE'99 … ICME'00]
• Web navigations: (hit) feature vectors over pages P1, P2, P3, P4, P5, … [RIDE'97 … WebKDD'01]

On-Line Analytical Processing (OLAP)
• Multidimensional data sets:
  • Dimension attributes (e.g., Store, Product, Date)
  • Measure attributes (e.g., Sale, Price)
• Range-sum queries:
  • Average sale of shoes in CA in 2001
  • Number of jackets sold in Seattle in Sep. 2001
• Tougher queries:
  • Covariance of sale and price of jackets in CA in 2001 (correlation)
  • Variance of price of jackets in 2001 in Seattle

Market-Relation
Store Location   Product   Date      Sale      Price
LA               Shoes     Jan. 01   $21,500   $85.99
NY               Jacket    June 01   $28,700   $45.99
...              ...       ...       ...       ...

Query plan: Avg(sale) over σ(d <in> 2001), σ(s <in> CA), σ(p = shoe) applied to Market-Relation

Example Solution (Pre-computation): Prefix-sum [Agrawal et al. 1997]
[Figure: an Age/Salary table and an Age × Salary grid divided into regions I to IV]
• Query: Sum(salary) when (25 < age < 40) and (55k < salary < 150k)
• Result: I – II – III + IV
Issues:
• Measure attribute should be pre-selected
• Aggregation function should be pre-selected (sum or count)
• Updates are expensive (need re-computation)

Spatial & Temporal Data: Complex Queries [ACM-GIS'01, VLDB'01]
Data types:
• A point: <latitude, longitude, altitude> or <x, y, z>
• A line-segment: <x1, y1, x2, y2>
• A line: sequence of line-segments
• A region: a closed set of lines
• Moving point: <x, y, t> (e.g., car, train, …)
• Changing region: <region, value, t> (e.g., changing temperature of a county)
Queries:
• Rivers <intersect> Countries
• Hospitals <in> Cities
• Taxi <within> 5km of Home <in the next> 10 min
• Experiments <overlap> BrainR [Visual'99]

Spatial & Temporal Data & Queries
[Figure label: Station]
Data types:
• A point: <latitude, longitude, altitude> or <x, y, z>
• A line-segment: <x1, y1, x2, y2>
• A line: sequence of line-segments
• A region: a closed set of lines
• Moving point: <x, y, t> (e.g., objects, car, train, …)
Queries:
• Molecules <intersect> Microbes
• Train-stations <in> Cities
• Round objects <within> 5cm of Hand <in the next> 10 s
• Number of distractions in <south-east> of subject

Spatial & Temporal Data & Queries …
• K Nearest Neighbor queries: find the k nearest objects to a query point (e.g., the 5 closest hospitals to my car)
• What is nearest? In a road network (or a graph), distance is the "shortest path", which is complex to compute in real time for all points of interest
• Approach: embed the graph into a high-dimensional space where computationally simple Minkowski metrics (e.g., Euclidean) can approximate real distances [ACM-GIS'02?]
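The embedding approach on this slide can be illustrated with a small sketch. It assumes an undirected road graph stored as a nested dictionary and uses a Lipschitz-style embedding in which each coordinate of a node is its shortest-path distance to a reference set. The names `ROADS`, `REFERENCE_SETS`, `lipschitz_embed`, and `k_nearest` are hypothetical, the toy network is invented for illustration, and the Chebyshev (L-infinity) metric is chosen here as one member of the Minkowski family because, on an undirected graph, it never overestimates the true shortest-path distance. The slides do not specify which metric or reference sets the authors used.

```python
import heapq
import math


def distances_to_set(graph, sources):
    """Multi-source Dijkstra: distance from every node to its nearest source."""
    dist = {node: math.inf for node in graph}
    heap = []
    for s in sources:
        dist[s] = 0.0
        heap.append((0.0, s))
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def lipschitz_embed(graph, reference_sets):
    """Map each node to a vector with one coordinate per reference set."""
    columns = [distances_to_set(graph, ref) for ref in reference_sets]
    return {node: tuple(col[node] for col in columns) for node in graph}


def chebyshev(p, q):
    """L-infinity Minkowski metric between two embedded points."""
    return max(abs(a - b) for a, b in zip(p, q))


# Hypothetical toy road network (undirected; weights are road lengths in km).
ROADS = {
    "A": {"B": 2.0, "C": 5.0},
    "B": {"A": 2.0, "C": 1.0, "D": 4.0},
    "C": {"A": 5.0, "B": 1.0, "D": 1.0},
    "D": {"B": 4.0, "C": 1.0},
}
# Hypothetical reference sets; in practice these are usually sampled at random.
REFERENCE_SETS = [{"A"}, {"D"}, {"B", "C"}]

# The expensive shortest-path work happens once, offline.
EMBEDDING = lipschitz_embed(ROADS, REFERENCE_SETS)


def k_nearest(query, candidates, k):
    """Approximate kNN: rank candidates by cheap distance in the embedded space."""
    return sorted(
        candidates, key=lambda c: chebyshev(EMBEDDING[query], EMBEDDING[c])
    )[:k]
```

With this layout, a query such as `k_nearest("A", ["B", "C", "D"], 2)` needs only a few subtractions per candidate at query time. The ranking is approximate in general, so a real system might re-check the top candidates with exact shortest paths.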
[Figure: points A, B, C in 2-D space mapped into n-D space by embedding techniques (e.g., Lipschitz)]

Immersidata and Mining Queries [CIKM'01, UACHI'01]

Immersidata and Mining Queries …
• A dynamic sign, e.g., ASL colors …
• L:

User Profiles & Clustering: Offline Processes
[Figure: offline pipeline with the following labels]
• Item Database
• User Profiles (User 1 … User 6; User U-6 … User U)
• PPED Similarity Measure
• Clustering
• Clusters
• Favorite Features
• Voting (Rock = High, Classical = Low, Pop = Low, Rap = High)
• Fuzzy Aggregation
• Cluster Wish-list (scores 0.87, 0.83, 0.72, 0.61, 0.47)

User Profiles & Clustering: Online Processes
[Figure: online pipeline with the following labels]
• Clusters
• Cluster Wish-lists (each with scores 0.87, 0.83, 0.72, 0.61, 0.47)
• Current User's Profile
• PPED Similarity Measure
• A List of Similarity Values
• Fuzzy Aggregation
• User Wish-List (scores in descending order, from 0.87 down to 0.42)
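
As a supplement to the prefix-sum pre-computation slide earlier in the deck, the following sketch shows how a 2-D prefix-sum array might answer a range-sum query with four lookups, mirroring the "I – II – III + IV" result on that slide. The grid values, the bucket layout (rows as age buckets, columns as salary buckets), and the names `GRID`, `build_prefix_sum`, and `range_sum` are assumptions made for illustration. They are not taken from the slides or from the cited paper.

```python
def build_prefix_sum(grid):
    """P[i][j] holds the sum of grid[0..i-1][0..j-1]; row 0 and column 0 are zero."""
    rows, cols = len(grid), len(grid[0])
    prefix = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            prefix[i + 1][j + 1] = (
                grid[i][j] + prefix[i][j + 1] + prefix[i + 1][j] - prefix[i][j]
            )
    return prefix


def range_sum(prefix, r1, r2, c1, c2):
    """Sum of grid[r1..r2][c1..c2] (inclusive) using four prefix lookups."""
    return (
        prefix[r2 + 1][c2 + 1]  # region I: everything up to the far corner
        - prefix[r1][c2 + 1]    # region II: rows before the range
        - prefix[r2 + 1][c1]    # region III: columns before the range
        + prefix[r1][c1]        # region IV: corner subtracted twice, added back
    )


# Hypothetical measure grid: rows are age buckets, columns are salary buckets,
# and each cell is the total salary (in $k) of the employees in that bucket.
GRID = [
    [50, 0, 0],
    [0, 65, 0],
    [55, 0, 120],
    [0, 100, 130],
]
P = build_prefix_sum(GRID)

# Total salary for age buckets 1..2 and salary buckets 1..2.
TOTAL = range_sum(P, 1, 2, 1, 2)
```

The issues listed on the slide are visible here. The measure (salary) and the aggregate (sum) are fixed when `P` is built, and changing a single cell of `GRID` invalidates every prefix entry at or beyond that cell, so `P` has to be recomputed.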