Here are my Data Files. Here are my Queries. Where are my Results?

Stratos Idreos* Ioannis Alagiannis‡ Anastasia Ailamaki‡ Ryan Johnson§
*CWI, Amsterdam
‡École Polytechnique Fédérale de Lausanne
§University of Toronto

Slide 2: CERN ($20B physics experiment)
Last year: 35PB!
Experiments, simulation, user data…
All stored in flat files
Database only stores metadata
Custom solutions & scripts
Almost never a DBMS
Why???

Slide 3: Why don't people use a DBMS?
Requirements analysis
Define a schema
Load the data
Iterate to convergence
Tune the system
Evolving requirements => no convergence

Slide 4: Data import & tuning
[Diagram: Flat Files -> Massage Data -> Load Tuples -> Database]
DBMS owns the data now
Why wait?
Why complete load?
Which format?
Hire DB expert?
Not worth the startup cost

Slide 5: Avoiding up-front overheads
[Diagram: Flat File with attributes a1 a2 a3 … a10 …; Hot data]
Flat files an integral part of the system
Query over flat files
Adaptive loads
Tuning in background
DBMS actions driven by workload

Slide 6: Adaptive loading
[Diagram: Flat File with attributes a1 a2 a3 a4 …; Metadata; Storage]
Column Load. Metadata: Loaded Columns: a2 a3
Partial Load. Metadata: Loaded Parts: … a2 a3
Full Load

Slide 7: Dynamic file adaptation
[Diagram: Original Flat File (a1 a2 a3 …) split into New Flat Files (a1, a2, …, a4, …)]
a) Parse only needed columns
b) New flat file per attribute
Analyze non-tokenized attributes

Slide 8: Adaptive loading in practice
[Plot: Response Time (seconds), log scale 1 to 100, versus Query Sequence, 1 to 20. Series: MonetDB, MySQL CSV, Column Loads, Partial Loads]
Query: select sum(a1), avg(a2) from R where a1<v1 and a2<v2
Plot annotations:
Q1: Loading Cost + First Query
Constant performance for all queries
Q1: half the cost
Q11: load from FF
Filtering on-the-fly
a) On-the-fly load
b) Cache data
Amortize loading cost over the query sequence

Slide 9: Towards a fully autonomous system
Give me your data as is
Give me your queries
Get your results!
[Diagram: Invisible DBMS made of Adaptive Load, Adaptive Data Store, Adaptive Kernel]
Supports SQL + your tools (grep, awk)
Challenge: make this invisible
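
The column-load idea from the "Adaptive loading" slides (parse only the attributes a query touches, record them as loaded, and reuse them for later queries) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the class `AdaptiveStore`, the method `sum_avg`, the `file_scans` counter, and the in-memory three-column flat file are all hypothetical names and data chosen for the example.

```python
import csv
import io


class AdaptiveStore:
    """Hypothetical sketch: load columns from a flat file only when a query needs them."""

    def __init__(self, open_file, column_names):
        self._open_file = open_file        # callable returning a fresh text stream over the flat file
        self._column_names = column_names  # attribute names in file order, e.g. ["a1", "a2", "a3"]
        self.loaded = {}                   # metadata + storage: column name -> list of values
        self.file_scans = 0                # how many times the flat file was actually parsed

    def _ensure_loaded(self, needed):
        # Column load: parse only the attributes that are needed and not yet cached.
        missing = [c for c in needed if c not in self.loaded]
        if not missing:
            return
        positions = {c: self._column_names.index(c) for c in missing}
        columns = {c: [] for c in missing}
        with self._open_file() as f:
            for row in csv.reader(f):
                for name, pos in positions.items():
                    columns[name].append(float(row[pos]))
        self.loaded.update(columns)
        self.file_scans += 1

    def sum_avg(self, v1, v2):
        """select sum(a1), avg(a2) from R where a1 < v1 and a2 < v2"""
        self._ensure_loaded(["a1", "a2"])
        rows = [
            (x, y)
            for x, y in zip(self.loaded["a1"], self.loaded["a2"])
            if x < v1 and y < v2
        ]
        if not rows:
            return 0.0, None
        total_a1 = sum(x for x, _ in rows)
        mean_a2 = sum(y for _, y in rows) / len(rows)
        return total_a1, mean_a2


# Assumed toy flat file with attributes a1, a2, a3 (a3 is never queried).
FLAT_FILE = "1,10,100\n2,20,200\n3,30,300\n4,40,400\n"

store = AdaptiveStore(lambda: io.StringIO(FLAT_FILE), ["a1", "a2", "a3"])
first = store.sum_avg(4, 35)   # pays the loading cost for a1 and a2
second = store.sum_avg(3, 35)  # answered from the cached columns
print(first, second, store.file_scans, sorted(store.loaded))
```

In this sketch the first query pays for parsing the file and every later query over the same attributes reuses the cached columns, which is the amortization over the query sequence that the "Adaptive loading in practice" slide refers to. Partial loads (caching only the qualifying parts of a column) and per-attribute flat files are not modeled here.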