The Researcher’s Guide to the Data Deluge: Querying a Scientific Database in just a Few Seconds

Martin L. Kersten, Stratos Idreos, Stefan Manegold, Erietta Liarou (and members of the CWI database group)

Science Feb’11 Data
http://www.sciencemag.org/site/special/data/

…. We have recently passed the point where more data is being collected than we can physically store. This storage gap will widen rapidly in data-intensive fields. Thus, decisions will be needed on which data to archive and which to discard. A separate problem is how to access and use these data. Many data sets are becoming too large to download. Even fields with well-established data archives, such as genomics, are facing new and growing challenges in data volume and management. And even where accessible, much data in many fields is too poorly organized to enable it to be efficiently used….

Database research vision
• Throwing away data before harvesting is the worst ROI one can imagine.
• LSST budget is 100 M$
  – During its ten-year survey, LSST will acquire 5.6 million 15-second images, spread over 2.8 million pointings.
  – 20 billion rows in the Object table, 3 trillion rows in the Source table

Database technology is not designed for the challenges
All sizes don’t fit

The Dawn of a new Database Era
Capture the query intent!

FIVE STEPS INTO THE FUTURE
• One-minute DBMS for real-time performance.
• Multi-scale query processing for gradual exploration.
• Post processing for conveying meaningful data.
• Query morphing to adjust for proximity results.
• Query alternatives to cope with lack of providence.

One-minute database kernels
Step 1: Do the BEST you can within a given time frame!
• Research how to …
  – organize query evaluation around what is available at low cost
  – redesign algorithms and operators such that they adaptively avoid expensive steps normally needed for correctness and completeness
  – stop the process after an agreed-upon time
  – ensure continuation upon request.

Multi-scale query processing
Step 2: Use a staging scheme for query evaluation!
• Research how to …
  – partition the database for producing incremental valuable results
    D => D1 union (D2.1 union (D2.2 union (D2.3 union ..
  – avoid harmful SELECT * FROM table queries
  – break a query into a converging query sequence
    Q => Q1 union Q2
      => Q1 union Q2.1 union Q2.2
      => Q1 union Q2.1 union Q2.2.1 union Q2.2.2 …….

Result-set post processing
Step 3: Use meaningful compression to convey more!
• Research how to …
  – post-process result sets statistically
  – prepare for faceted query answers
  – show sort for boundaries first
• Min/max domain enclosures for all attributes

Query morphing
Step 4: Bend the search towards interesting areas!
• Research how to …
  – explore the query expression space?
  – transform a query with a small result set such that it produces relevant, nearby answers

Query alternatives
Step 5: Ignore stupid questions, give hints instead!
• Research how to …
  – find alternative queries in terms of expressiveness + performance
  – better exploit the query log for hints

SELECT * FROM PhotoObj

-- Q1: Using the time budget. (36291322 tuples)
SELECT ra, dec, band1, intensity1, type FROM PhotoObj;

-- Q2: Using data statistics. (879300 tuples)
SELECT * FROM PhotoObj WHERE ra BETWEEN 53 AND 54 AND dec BETWEEN 80 AND 82;

-- Q3: Using query statistics. (899 tuples)
SELECT * FROM PhotoObj WHERE ra BETWEEN 53 AND 54 AND dec BETWEEN 80 AND 82 AND distance(ra,dec,radius) < 10;

The Dawn of a new Database Era
Brought to you by the CWI database research group
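
Steps 1 and 2 above (stop after an agreed-upon time, continue upon request, and evaluate over a staged partitioning D => D1 union D2.1 union ...) can be sketched together in a few lines of Python. This is only an illustration under stated assumptions, not the CWI group's implementation: the names `partition` and `BudgetedScan`, the doubling piece size, the (ra, dec) row layout, and the injectable clock are all hypothetical choices made for this sketch.

```python
import time
from typing import Callable, Iterator, List, Sequence, Tuple

Row = Tuple[float, float]  # hypothetical (ra, dec) pair, for illustration only


def partition(rows: Sequence[Row], first: int = 2) -> Iterator[Sequence[Row]]:
    """Yield consecutive pieces of doubling size: D1, D2.1, D2.2, ...

    The doubling schedule is an assumption of this sketch; the slides only
    say that the database is partitioned to give incremental results.
    """
    start, size = 0, first
    while start < len(rows):
        yield rows[start:start + size]
        start += size
        size *= 2


class BudgetedScan:
    """Evaluate a predicate piece by piece within a time budget.

    The scan keeps its position between calls, so a later call to run()
    continues where the previous one stopped ("continuation upon request").
    """

    def __init__(
        self,
        rows: Sequence[Row],
        predicate: Callable[[Row], bool],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._pieces = partition(rows)
        self._predicate = predicate
        self._clock = clock  # injectable so the behaviour can be tested
        self.done = False    # True once every piece has been scanned

    def run(self, budget_seconds: float) -> List[Row]:
        """Return the matches found before the budget runs out.

        The budget is only checked between pieces, so a single large
        piece can overrun it; a real kernel would need finer control.
        """
        deadline = self._clock() + budget_seconds
        found: List[Row] = []
        while self._clock() < deadline:
            piece = next(self._pieces, None)
            if piece is None:
                self.done = True
                break
            found.extend(row for row in piece if self._predicate(row))
        return found
```

A caller would invoke `run` once with the agreed time budget, present the partial answer, and call `run` again only if the user asks for more. The union of the partial answers equals the full query answer once `done` is true.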