CIDR presentation
Here are my Data Files. Here are my Queries. Where are my Results?
Stratos Idreos*, Ioannis Alagiannis‡, Anastasia Ailamaki‡, Ryan Johnson§
*CWI, Amsterdam
‡École Polytechnique Fédérale de Lausanne
§University of Toronto
CERN ($20B physics experiment)
  Last year: 35PB!
Experiments, simulation, user data…
  All stored in flat files
Database only stores metadata
  Custom solutions & scripts
Almost never a DBMS
Why???
Why don't people use a DBMS?
Requirements analysis
Define a schema
Load the data
Iterate to convergence
Tune the system
Evolving requirements => no convergence
Data import & tuning
Flat files → massage data → load tuples → database
The DBMS owns the data now
Why wait? Why a complete load? Which format? Hire a DB expert?
Not worth the startup cost
Avoiding up-front overheads
[Figure: flat file with attributes a1, a2, a3, …, a10, …; label: hot data]
Flat files are an integral part of the system
Query over flat files
Adaptive loads
Tuning in the background
DBMS actions driven by workload
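To make "DBMS actions driven by workload" concrete, here is a minimal Python sketch of one possible policy, assuming a simple access counter: each query reports the attributes it touches, and an attribute counts as hot once at least `threshold` queries have referenced it. The class name `WorkloadMonitor` and the threshold rule are hypothetical illustrations, not the mechanism described in the talk.

```python
from collections import Counter


class WorkloadMonitor:
    """Hypothetical policy: an attribute is 'hot' once it has been
    referenced by at least `threshold` queries."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.accesses = Counter()  # attribute name -> number of queries that used it

    def record_query(self, attributes: set) -> None:
        # Each attribute counts once per query, however often the query mentions it.
        self.accesses.update(set(attributes))

    def hot_attributes(self) -> list:
        # Candidates for background loading, most frequently used first.
        return [a for a, n in self.accesses.most_common() if n >= self.threshold]


if __name__ == "__main__":
    monitor = WorkloadMonitor(threshold=2)
    monitor.record_query({"a1", "a2"})
    monitor.record_query({"a2"})
    print(monitor.hot_attributes())  # ['a2']
```

A background tuner could poll `hot_attributes()` and load those columns first; the slides do not say how hotness is actually measured.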
Adaptive loading
[Figure: data moves from a flat file (a1, a2, a3, a4, …) into storage via a column load (metadata lists the loaded columns: a2, a3), a partial load (metadata lists the loaded parts: …, a2, a3) or a full load]
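A minimal sketch of the column-load variant, assuming a comma-separated flat file of integers held in memory and a hypothetical naming convention where attribute `aN` is field N. Columns are copied into storage only when a query first asks for them, and the `loaded` set plays the role of the metadata listing loaded columns. Partial loads (loading only parts of a column) are not shown, and `AdaptiveColumnStore` is an illustrative name, not an API from the paper.

```python
import csv
import io


class AdaptiveColumnStore:
    """Column loads: a column is copied out of the flat file the first time
    a query needs it and is served from storage afterwards."""

    def __init__(self, flat_file_text: str) -> None:
        self.flat_file_text = flat_file_text  # stands in for the flat file on disk
        self.columns = {}                     # storage: attribute name -> list of values
        self.loaded = set()                   # metadata: names of loaded columns
        self.file_scans = 0                   # passes made over the flat file

    def _load(self, names):
        # One pass over the flat file loads all requested missing columns together.
        self.file_scans += 1
        positions = {n: int(n[1:]) - 1 for n in names}  # hypothetical: "a1" -> field 0
        for n in positions:
            self.columns[n] = []
        for row in csv.reader(io.StringIO(self.flat_file_text)):
            for n, i in positions.items():
                self.columns[n].append(int(row[i]))
        self.loaded.update(positions)

    def get(self, names):
        # Load only the columns that the metadata does not list yet.
        missing = [n for n in names if n not in self.loaded]
        if missing:
            self._load(missing)
        return {n: self.columns[n] for n in names}


if __name__ == "__main__":
    store = AdaptiveColumnStore("1,10,100,1000\n2,20,200,2000\n")
    print(store.get(["a2", "a3"]))  # first query: triggers a column load
    print(store.get(["a2"]))        # served from storage, no new scan
    print(store.file_scans, sorted(store.loaded))
```

Batching all missing columns into one scan keeps the cost of a first touch to a single pass over the file, however many new attributes a query needs.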
Dynamic file adaptation
[Figure: the original flat file (a1, a2, a3, …) is adapted into new flat files (a1, a2, …, a4, …)]
a) Parse only the needed columns
b) Create a new flat file per attribute
Analyze the non-tokenized attributes
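Both adaptations can be sketched in a few lines of Python, assuming comma-separated fields with no quoting. `parse_needed` tokenizes each line only up to the last needed field, leaving the rest of the line as one untokenized string, and `write_attribute_files` writes one new flat file per parsed attribute. The function names and the `aN.txt` file naming are hypothetical.

```python
import os
import tempfile


def parse_needed(lines, needed):
    """a) Tokenize each line only as far as the last needed field (0-based indices)."""
    last = max(needed)
    columns = {i: [] for i in needed}
    for line in lines:
        # maxsplit=last + 1 separates fields 0..last; whatever follows stays
        # one untokenized string that could be analyzed later, if ever.
        fields = line.rstrip("\n").split(",", last + 1)
        for i in needed:
            columns[i].append(fields[i])
    return columns


def write_attribute_files(columns, directory):
    """b) Write one new flat file per parsed attribute (hypothetical name aN.txt)."""
    paths = {}
    for i, values in columns.items():
        path = os.path.join(directory, f"a{i + 1}.txt")
        with open(path, "w") as f:
            f.write("\n".join(values) + "\n")
        paths[i] = path
    return paths


if __name__ == "__main__":
    raw = ["1,10,100,1000", "2,20,200,2000"]
    cols = parse_needed(raw, [1, 2])  # only a2 and a3 are needed
    with tempfile.TemporaryDirectory() as d:
        for i, p in write_attribute_files(cols, d).items():
            with open(p) as f:
                print(os.path.basename(p), f.read().split())
```

Later queries on a2 or a3 can then read a small per-attribute file instead of re-tokenizing the wide original.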
Adaptive loading in practice
Query: select sum(a1), avg(a2) from R where a1<v1 and a2<v2
[Figure: response time (seconds) over a sequence of 20 queries for MonetDB, MySQL CSV, column loads and partial loads]
Figure annotations:
Q1: loading cost + first query
Constant performance for all queries
Q11: load from flat file (FF)
Filtering on-the-fly
Q1: half the cost
a) On-the-fly load
b) Cache data
Amortize the loading cost over the query sequence
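The benchmark query can be answered straight from the flat file by filtering while parsing. This is a minimal sketch, assuming a1 and a2 are the first two comma-separated integer fields of each line; `sum_avg_on_the_fly` is a hypothetical name, and nothing is loaded or cached, so every call rescans the file.

```python
def sum_avg_on_the_fly(lines, v1, v2):
    """select sum(a1), avg(a2) from R where a1 < v1 and a2 < v2,
    answered while scanning the raw flat file, without a load step."""
    total_a1 = 0
    total_a2 = 0
    count = 0
    for line in lines:
        # Only a1 and a2 are needed: split off the first two fields, leave the rest.
        fields = line.split(",", 2)
        a1, a2 = int(fields[0]), int(fields[1])
        # Filter on the fly: non-qualifying tuples are never stored.
        if a1 < v1 and a2 < v2:
            total_a1 += a1
            total_a2 += a2
            count += 1
    if count == 0:
        return None, None  # SQL SUM and AVG over zero rows yield NULL
    return total_a1, total_a2 / count


if __name__ == "__main__":
    flat_file = ["1,10,100", "2,20,200", "3,30,300"]
    print(sum_avg_on_the_fly(flat_file, v1=3, v2=25))  # (3, 15.0)
```

Caching the parsed a1 and a2 values after the first scan (option b on the slide) would let later queries skip the parsing work, which is how the loading cost gets spread over the query sequence.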
Towards a fully autonomous system
Give me your data as is. Give me your queries. Get your results!
[Figure: an invisible DBMS (supports SQL + your tools, e.g. grep, awk) built from an adaptive load, an adaptive data store and an adaptive kernel]
Challenge: make this invisible