Supporting Join Queries

Talk by: Andy Cooke
Collaborators: Alasdair Gray, Lisha Ma, and Werner Nutt
Heriot-Watt University

What queries would users like to ask? (1)

A continuously executing query that might involve matching tuples across several streams.
• Example: “stream to me the average net traffic passing between two ComputingElements (CEs)”
• The query needs to specify the age of tuples that can be matched (a “sliding window”), e.g. “consider only tuples no older than 5 min. from now”
Possibly interesting?

What queries would users like to ask? (2)

A “latest snapshot” query that joins the latest values of keys.
• Example: “return all CEs that Steve is allowed to use” (Resource Broker)
• This query would involve joining tuples from CE tables, VO tables and denied users tables
Probably interesting!

A “history” query involving self-joins and aggregation.
• Example: “what was the growth in net traffic since last week?”
Possibly interesting?

How can R-GMA answer such queries?

Observation:
• If all the relevant tuples are inside one DBMS, then we can pass the query on to that DBMS's query engine. - EASY!
• If there is more than one relevant producer, then our mediator probably needs an execution engine! - HARD!

In any case, we know that some R-GMA users are defining Archivers and querying these directly. However:
• the local answer may only be a subset of the global answer.
• they may get a wrong answer (if the query involves max, avg, count, etc.)

Answering Joins using Archivers

tables: cpuLoad, discspace
condition: country = ‘britain’

Requirements:
• Complete views (I publish everything!)
• “Latest” or “History” query type (so the data is in a database, not a buffer).
• A smart registry (hmm.. just need to go to 1 Archiver).
• Tuple matching always needs to take place in the same database, and never across databases.

e.g. “SELECT * FROM cpuload c, discspace s WHERE c.site = s.site” can easily be answered using site archivers

Problems with Answering Joins using Archivers

• Archivers can’t access the tuples introduced by LatestProducers and DatabaseProducers.
• If a new LatestProducer is registered, the Archiver cannot stream from it and so cannot access its tuples, because LatestProducers can’t answer stream queries.
• The mediator then needs to mediate between two producers, but doesn’t have a query engine!

Answering Joins without Archivers

Query Planning and Execution:
• What are the relevant Producers?
• What sub-queries should we send them?
• How should results be combined and operated on? (need a query engine!) Where?

Possible Query Engines:
• MySQL - dump all the data into MySQL … easy!
• Polar Star (Manchester) ? … compatibility?

Conclusions

We could support some “global” join queries quite easily:
• when just one Archiver is enough (needs a smarter Registry)
• suggestions could be given when there isn’t one Archiver available (consumer.getPlan())
• and/or ad hoc joins could be answered (inefficiently) by first loading data into MySQL

But:
• what queries do users want to pose?
• shouldn’t we restrict users to using only StreamProducers?
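
The “sliding window” idea from the first query type can be illustrated with a small sketch. This is not R-GMA code: the two stream names ("A" and "B"), the tuple layout (timestamp, stream, key, value) and the function name window_join are assumptions made for illustration only. It shows a symmetric join that only matches tuples no older than 5 minutes relative to the newest arrival.

```python
from collections import deque

WINDOW_SECONDS = 5 * 60  # "consider only tuples no older than 5 min. from now"


def window_join(events, window=WINDOW_SECONDS):
    """Join two streams on a key, matching only tuples inside the window.

    `events` is an iterable of (timestamp, stream, key, value) tuples,
    sorted by timestamp, where stream is "A" or "B" (hypothetical names).
    Yields (key, a_value, b_value) for every matching pair.
    """
    buffers = {"A": deque(), "B": deque()}
    for ts, stream, key, value in events:
        other = "B" if stream == "A" else "A"
        # Drop tuples that have fallen out of the sliding window.
        for buf in buffers.values():
            while buf and ts - buf[0][0] > window:
                buf.popleft()
        # Probe the other stream's buffer for tuples with the same key.
        for _, other_key, other_value in buffers[other]:
            if other_key == key:
                if stream == "A":
                    yield (key, value, other_value)
                else:
                    yield (key, other_value, value)
        # Remember this tuple so later arrivals can match against it.
        buffers[stream].append((ts, key, value))


events = [
    (0, "A", "ce1", 10),
    (100, "B", "ce1", 7),   # within 5 min of the A tuple: matches
    (350, "B", "ce1", 9),   # the A tuple at t=0 has expired: no match
    (400, "A", "ce2", 3),
    (420, "B", "ce2", 4),   # matches the A tuple at t=400
]
matches = list(window_join(events))
```

The window bounds the state the join has to keep, which is what makes a continuously executing join over unbounded streams feasible at all.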
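
The “dump all the data into MySQL” option for ad hoc joins can be sketched roughly as follows. To keep the example self-contained it uses Python's built-in sqlite3 module as a stand-in for MySQL; the column names (cpu_load, free_gb), the sample tuples and the function name adhoc_join are all hypothetical, and only the table names and the join condition come from the slides.

```python
import sqlite3

# Hypothetical tuples, as if collected from two separate producers.
cpuload_tuples = [("ral", 0.42), ("hw", 0.91)]
discspace_tuples = [("ral", 120), ("hw", 35), ("cern", 900)]


def adhoc_join(cpuload, discspace):
    """Load both tuple sets into one database, then let its engine join them."""
    db = sqlite3.connect(":memory:")  # stand-in for a MySQL instance
    db.execute("CREATE TABLE cpuload (site TEXT, cpu_load REAL)")
    db.execute("CREATE TABLE discspace (site TEXT, free_gb INTEGER)")
    db.executemany("INSERT INTO cpuload VALUES (?, ?)", cpuload)
    db.executemany("INSERT INTO discspace VALUES (?, ?)", discspace)
    # Same join condition as the slide's example query.
    rows = db.execute(
        "SELECT c.site, c.cpu_load, s.free_gb "
        "FROM cpuload c, discspace s "
        "WHERE c.site = s.site ORDER BY c.site"
    ).fetchall()
    db.close()
    return rows


joined = adhoc_join(cpuload_tuples, discspace_tuples)
```

All tuples are copied before any matching happens, which is why the slides call this approach inefficient: the cost grows with the total data held by the producers rather than with the size of the answer.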