Multidimensional Databases

• Challenge: representation for efficient storage, indexing & querying
• Examples (time-series, images)
• New multidimensional data sets & approaches
  • Graphs (e.g., road networks)
  • Immersidata (e.g., haptic)
  • User profiles & aggregation/clustering
ISI’02
Challenges
• Storing multidimensional data (matrix vs. relations)
• Indexing multidimensional data (R-tree)
• Queries
  • Search for similar objects (similarity search) [ICDE’00, ICME’00]
  • Spatial and temporal queries [IDEAS’00, ACM-GIS’01, KAIS’02]
• Multidimensional data mining
  • Aggregation [EDBT’02, PODS’02]
  • Clustering [ACM-MMj’02]
  • Classification [INFORMS’02]
  • Finding outliers [SSDBM’01]
Stock Prices
[Figure: stock series S1 … Sn, each plotting $price against day (1 to 365); summary features f(Si) (e.g., std) and g(Si) (e.g., avg); transform features f1 … f5]
• A point in 365 dimensions (computationally complex)
• A point in 2 dimensions (not accurate enough)
• A point in 5 dimensions, transformation-based: FFT, Wavelet [SSDBM’00, 01]
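The transformation-based reduction above can be illustrated with a small sketch. It assumes a plain discrete Fourier transform with orthonormal scaling and keeps the first five complex coefficients; the series `s1` and `s2` are synthetic, and the function names are hypothetical rather than taken from the cited papers. With this scaling, the distance between truncated coefficient vectors never exceeds the distance between the full series, which is what makes the reduced points usable as a filter.

```python
import cmath
import math


def dft_features(series, k):
    """Return the first k DFT coefficients of series, scaled by 1/sqrt(n)."""
    n = len(series)
    feats = []
    for f in range(k):
        coeff = sum(x * cmath.exp(-2j * math.pi * f * t / n)
                    for t, x in enumerate(series))
        feats.append(coeff / math.sqrt(n))
    return feats


def euclidean(a, b):
    """Euclidean distance; abs() makes it work for complex coordinates too."""
    return math.sqrt(sum(abs(x - y) ** 2 for x, y in zip(a, b)))


# Two synthetic 365-day price series (illustrative data, not from the slides).
s1 = [100 + 10 * math.sin(2 * math.pi * d / 365) for d in range(365)]
s2 = [102 + 8 * math.sin(2 * math.pi * d / 365 + 0.3) for d in range(365)]

full_dist = euclidean(s1, s2)                                 # 365 dimensions
reduced_dist = euclidean(dft_features(s1, 5), dft_features(s2, 5))  # 5 coefficients

print(full_dist, reduced_dist)
```

Because the reduced distance is a lower bound, a similarity search can discard candidates using the short vectors and compute the full 365-dimensional distance only for the survivors.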
More Similarity Search & Clustering
• Images: Color Histograms over Red, Green, Blue (R, G, B) values in 0 to 255 (e.g., 208 125 100)
• Shapes: Angle Sequences = [j1,j2,j3,j4,j5,j6,j7,j8,j9] [ICDE’99 … ICME’00]
• Web Navigations: (Hit) Feature Vectors over pages P1 P2 P3 P4 P5 … (e.g., 3 0 8 7) [RIDE’97 … WebKDD’01]
[Figure annotation: More accurate]
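As a rough illustration of the (Hit) feature vectors for web navigations, the sketch below counts how often each page appears in one session. The page names and the counts 3 0 8 7 follow the slide; the session itself and the function name `hit_vector` are made up for the example.

```python
from collections import Counter


def hit_vector(session, pages):
    """Count hits per page, in the fixed page order given by pages."""
    counts = Counter(session)
    return [counts[p] for p in pages]


pages = ["P1", "P2", "P3", "P4"]

# Hypothetical session: 3 hits on P1, none on P2, 8 on P3, 7 on P4.
session = ["P1"] * 3 + ["P3"] * 8 + ["P4"] * 7

vector = hit_vector(session, pages)
print(vector)
```

Each session becomes one point whose dimensionality equals the number of pages, so the same similarity search and clustering machinery applies as for histograms and angle sequences.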
On-Line Analytical Processing (OLAP)
• Multidimensional data sets:
  • Dimension attributes (e.g., Store, Product, Date)
  • Measure attributes (e.g., Sale, Price)
• Range-sum queries
  • Average sale of shoes in CA in 2001
  • Number of jackets sold in Seattle in Sep. 2001
• Tougher queries:
  • Covariance of sale and price of jackets in CA in 2001 (correlation)
  • Variance of price of jackets in 2001 in Seattle

Market-Relation
Store (Location)   Product   Date      Sale      Price
LA                 Shoes     Jan. 01   $21,500   $85.99
NY                 Jacket    June 01   $28,700   $45.99
...                ...       ...       ...       ...

Avg(sale) ( σ(d <in> 2001) ( σ(s <in> CA) ( σ(p=shoe) ( Market-Relation ) ) ) )
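The queries above can be sketched directly over a small in-memory relation. Only the column names come from the slide; the rows, the extra `state` column, and the helper names `select` and `covariance` are assumptions made for illustration. The covariance here is the population form, which is one possible reading of the "tougher" query.

```python
from statistics import mean

# Hypothetical Market-Relation rows; "state" is an added column for the CA filter.
market = [
    {"store": "LA", "state": "CA", "product": "Shoes", "year": 2001, "sale": 21500.0, "price": 85.99},
    {"store": "NY", "state": "NY", "product": "Jacket", "year": 2001, "sale": 28700.0, "price": 45.99},
    {"store": "SF", "state": "CA", "product": "Shoes", "year": 2001, "sale": 18500.0, "price": 79.99},
    {"store": "LA", "state": "CA", "product": "Shoes", "year": 2000, "sale": 30000.0, "price": 89.99},
]


def select(rows, **conditions):
    """Selection: keep rows whose attributes equal every given condition."""
    return [r for r in rows if all(r[k] == v for k, v in conditions.items())]


def covariance(xs, ys):
    """Population covariance of two equally long lists."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


ca_shoes_2001 = select(market, state="CA", product="Shoes", year=2001)
sales = [r["sale"] for r in ca_shoes_2001]
prices = [r["price"] for r in ca_shoes_2001]

avg_sale = mean(sales)                     # Avg(sale) over the selection
cov_sale_price = covariance(sales, prices)
print(avg_sale, cov_sale_price)
```

Scanning the relation like this costs time proportional to its size for every query, which is the motivation for pre-computation.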
Example Solution (Pre-computation): Prefix-sum [Agrawal et al. 1997]
[Figure: Age vs. Salary grid with the query rectangle expressed through prefix-sum regions I, II, III and IV]
Issues:
• Measure attribute should be pre-selected
• Aggregation function should be pre-selected (sum or count)
• Updates are expensive (need re-computation)
Query: Sum(salary) when (25 < age < 40) and (55k < salary < 150k)
Result: I – II – III + IV
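A minimal sketch of the prefix-sum idea in two dimensions follows. The grid values and function names are illustrative assumptions; the four terms in `range_sum` appear to correspond to the slide's I – II – III + IV, though the exact labelling of the regions in the original figure is not recoverable here.

```python
def build_prefix_sums(grid):
    """prefix[r][c] holds the sum of grid[0..r-1][0..c-1] (one row/column of padding)."""
    rows, cols = len(grid), len(grid[0])
    prefix = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            prefix[r + 1][c + 1] = (grid[r][c] + prefix[r][c + 1]
                                    + prefix[r + 1][c] - prefix[r][c])
    return prefix


def range_sum(prefix, r1, c1, r2, c2):
    """Sum of grid[r1..r2][c1..c2] (inclusive) from four prefix look-ups."""
    return (prefix[r2 + 1][c2 + 1]   # whole area up to the far corner
            - prefix[r1][c2 + 1]     # strip above the query rectangle
            - prefix[r2 + 1][c1]     # strip left of the query rectangle
            + prefix[r1][c1])        # corner that was subtracted twice


# Illustrative 3 x 3 cube of a pre-selected measure (e.g., summed salary per cell).
grid = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
prefix = build_prefix_sums(grid)
print(range_sum(prefix, 1, 1, 2, 2))
```

Any range sum costs four look-ups regardless of the rectangle's size, but changing a single cell forces every prefix entry below and to the right of it to be recomputed, which is the update cost listed under Issues.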
Spatial & Temporal Data
[ACM-GIS’01, VLDB’01]
Complex Queries
Data types:
• A point: <latitude, longitude, altitude> or <x, y, z>
• A line-segment: <x1, y1, x2, y2>
• A line: sequence of line-segments
• A region: A closed set of lines
• Moving point: <x, y, t> (e.g., car, train, …)
• Changing region: <region, value, t> (e.g., changing temperature of a county)
Queries:
• Rivers <intersect> Countries
• Hospitals <in> Cities
• Taxi <within> 5km of Home <in the next> 10 min
• Experiments <overlap> BrainR
[Visual’99]
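The moving-point data type and the "Taxi <within> 5km of Home <in the next> 10 min" query can be sketched as follows. This assumes planar coordinates in kilometres, time in minutes, and that the taxi's predicted positions are already available as <x, y, t> samples; how those predictions are produced is outside the sketch, and all names and values are hypothetical.

```python
import math


def within(track, center, radius, t_start, t_end):
    """True if any <x, y, t> sample in the time window lies within radius of center."""
    cx, cy = center
    return any(
        t_start <= t <= t_end and math.hypot(x - cx, y - cy) <= radius
        for x, y, t in track
    )


# Hypothetical predicted taxi positions: (x km, y km, t minutes from now).
taxi = [(12.0, 9.0, 0), (8.0, 6.0, 5), (3.0, 4.0, 9), (0.5, 0.5, 14)]
home = (0.0, 0.0)

print(within(taxi, home, 5.0, 0, 10))  # window covers the sample at t = 9
print(within(taxi, home, 5.0, 0, 5))   # window ends before the taxi gets close
```

A linear scan over samples is only workable for a handful of objects; with many moving points the same predicate would normally be evaluated through a spatio-temporal index.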
Spatial & Temporal Data & Queries
Data types:
• A point: <latitude, longitude, altitude> or <x, y, z>
• A line-segment: <x1, y1, x2, y2>
• A line: sequence of line-segments
• A region: A closed set of lines
• Moving point: <x, y, t> (e.g., objects, car, train, …)
Queries:
• Molecules <intersect> Microbes
• Train-stations <in> Cities
• Round objects <within> 5cm of Hand <in the next> 10 s
• Number of distractions in <south-east> of subject
Spatial & Temporal Data & Queries …

• K Nearest Neighbor queries: find the k nearest objects to a query point (e.g., the 5 closest hospitals to my car)
• What is nearest? In a road network (or a graph), distance is the “shortest path”, which is complex to compute in real time for all points of interest
• Approach: embed the graph into a high-dimensional space where computationally simple Minkowski metrics (e.g., Euclidean) can approximate real distances [ACM-GIS’02?]
[Figure: points A, B, C in 2-D Space mapped by Embedding Techniques (e.g., Lipschitz) into n-D Space]
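One way to picture the embedding approach is the sketch below. It assumes an undirected weighted graph, computes shortest paths with Dijkstra's algorithm, and builds a Lipschitz-style embedding in which coordinate i is the network distance to the nearest node of reference set i. The graph, the reference sets, and the function names are illustrative choices, not the configuration used in the cited work.

```python
import heapq
import math


def dijkstra(graph, source):
    """Shortest-path distance from source to every node of a weighted graph."""
    dist = {node: math.inf for node in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def lipschitz_embedding(graph, reference_sets):
    """Coordinate i of a node = distance to the nearest member of reference_sets[i]."""
    from_ref = {r: dijkstra(graph, r) for s in reference_sets for r in s}
    return {
        node: [min(from_ref[r][node] for r in s) for s in reference_sets]
        for node in graph
    }


def minkowski(a, b, p=2):
    """Minkowski distance; p=2 is Euclidean, p=math.inf is Chebyshev."""
    if p == math.inf:
        return max(abs(x - y) for x, y in zip(a, b))
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


# Hypothetical undirected road network: node -> {neighbour: edge length}.
road = {
    "A": {"B": 4.0, "C": 1.0},
    "B": {"A": 4.0, "C": 2.0, "D": 5.0},
    "C": {"A": 1.0, "B": 2.0, "D": 8.0},
    "D": {"B": 5.0, "C": 8.0},
}
points = lipschitz_embedding(road, [{"A"}, {"B", "D"}])

print(points["C"])
print(minkowski(points["A"], points["D"]))            # Euclidean approximation
print(minkowski(points["A"], points["D"], math.inf))  # Chebyshev lower bound
```

The embedding is computed once offline; at query time a k-nearest-neighbour search only needs vector arithmetic. With the Chebyshev metric the embedded distance never exceeds the true network distance (by the triangle inequality), so it can be used to prune candidates before any exact shortest-path computation.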
Immersidata and Mining Queries
[CIKM’01, UACHI’01]
Immersidata and Mining Queries …
A dynamic sign, e.g., ASL colors
User Profiles & Clustering
Offline Processes
[Figure components: Item Database; User Profiles (User 1 … User 6); PPED Similarity Measure and Clustering; Clusters (User U, User U-1 … User U-6); Voting on Favorite Features (Rock=High, Classical=Low, Pop=Low, Rap=High); Fuzzy Aggregation; Cluster Wish-list (e.g., 0.87, 0.83, 0.72, 0.61, 0.47)]
User Profiles & Clustering
Online Processes
[Figure components: Current User’s Profile; Clusters; PPED Similarity Measure; A List of Similarity Values; Cluster Wish-lists (e.g., 0.87, 0.83, 0.72, 0.61, 0.47); Fuzzy Aggregation; User Wish-List (e.g., 0.87, 0.83, 0.82, 0.79, 0.72, 0.70, 0.68, …)]